<a href="https://colab.research.google.com/github/isabella-as/Intelligent-Systems-Assignments/blob/main/NN_Classification_A2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#**CLASSIFICATION PROBLEM**
**Neural Network Approach (Task 3)**

In [1]:
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error,accuracy_score,classification_report
import matplotlib.pyplot as plt
import torch.nn.functional as F
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader
import pandas

The code below loads the dataset, separates the features "x" from the target "y". The target column is mapped from the original labels (“tested_negative” and “tested_positive”) to binary values 0 and 1 for classification. The
feature matrix "X" is converted to floats, and calling "X.shape" confirms its dimensions (8 features and 768 samples).

In [2]:
# Classification dataset

from sklearn.datasets import fetch_openml
diabetes = fetch_openml("diabetes", version=1, as_frame=True)

X = diabetes.data.values.astype(float)
y = diabetes.target.map({'tested_negative': 0, 'tested_positive': 1}).astype(int).values

X.shape



(768, 8)

In this step, the dataset is split into training and testing sets using an 80/20 ratio, where 80% of the data (Xtr, ytr) is used to train the model and 20% (Xte, yte) is reserved for testing. The parameter random_state=42 ensures that the split is reproducible, so the same division of data will be obtained each time the code is ran.

In [3]:
#train test spliting
test_size=0.2
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=test_size, random_state=42)

The input features are standardized using StandardScaler. They are rescaled so that each feature has a mean of 0 and a standard deviation of 1. This prevents features with larger numeric ranges from dominating the learning process. The scaler is fit on the training set and then applied to both the training and testing sets to avoid data leakage.

In [4]:
# Standardize features
scaler=StandardScaler()
Xtr= scaler.fit_transform(Xtr)
Xte= scaler.transform(Xte)

This code defines a multilayer perceptron (MLP) model for regression using PyTorch. The network takes the input features and passes them through four fully connected hidden layers, each with 64 neurons and ReLU activation functions for non-linearity. To reduce overfitting, dropout is applied after each hidden layer, randomly setting 50% of the neurons to zero during training. Finally, the output layer (self.out) produces a single value (output_size=1), which corresponds to the continuous regression target. The forward method specifies the sequence of operations that the data follows as it moves through the network.

In [5]:
class MLP(nn.Module):
    def __init__(self, input_size, output_size=1, dropout_prob=0.5):
        super(MLP, self).__init__()

        self.fc1 = nn.Linear(input_size, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 64)
        self.fc4 = nn.Linear(64, 64)
        self.out = nn.Linear(64, output_size)

        self.dropout = nn.Dropout(p=dropout_prob)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.dropout(x)

        x = F.relu(self.fc2(x))
        x = self.dropout(x)

        x = F.relu(self.fc3(x))
        x = self.dropout(x)

        x = F.relu(self.fc4(x))
        x = self.dropout(x)

        x = self.out(x)
        return x

In the following code the main training hyperparameters are defined. The model is trained for 100 epochs- the entire training set will be passed through the network 100 times. The learning rate is set to 0.0005 (controls the step size during optimization). A dropout rate of 0.1 implies that 10% of the neurons are randomly deactivated during training to reduce overfitting. Finally, the batch size is set to 64, so the model processes 64 samples at a time before updating the weights. These parameters affect how fast the network learns and how well it generalizes.

In [6]:
num_epochs=500
lr=0.0005
dropout=0.1
batch_size=64

At this stage, the training and testing sets are converted from NumPy arrays into PyTorch tensors so they can be processed by the neural network (type float 32). The training inputs (Xtr) and outputs (ytr) are combined into a TensorDataset, which stores them as pairs. This dataset is then wrapped in a DataLoader, which automatically handles splitting the data into mini-batches (here of size 64, as previously defined) and shuffles the order of the samples at each epoch, helping the model generalize better by preventing it from seeing the data in the same order every time.

In [7]:
Xtr = torch.tensor(Xtr, dtype=torch.float32)
ytr = torch.tensor(ytr, dtype=torch.float32)
Xte = torch.tensor(Xte, dtype=torch.float32)
yte = torch.tensor(yte, dtype=torch.float32)

# Wrap Xtr and ytr into a dataset
train_dataset = TensorDataset(Xtr, ytr)

# Create DataLoader
train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

Here we set up the model, loss function, and optimizer. The code first checks whether a GPU is available and assigns the device accordingly. The MLP model is then created with an input size matching the number of features and the specified dropout probability, and it is moved to the chosen device. For a classification task a BCEWithLogitsLoss() loss funtion is used. Finally, the Adam optimizer is defined with the chosen learning rate (lr), which controls de rate in which the model’s parameters are updated during training.

In [8]:
# Model, Loss, Optimizer
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = MLP(input_size=Xtr.shape[1], dropout_prob=dropout).to(device)
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=lr)

This block implements the training loop for the neural network. For each epoch, the model is set to training mode and the data is processed in small batches from the DataLoader. Each batch of features and labels is moved to the selected device (CPU or GPU), goes through the model to produce predictions (logits), and compared to the true targets using the chosen loss function (for regression- MSELoss). Before backpropagation, the optimizer’s gradients are reset with zero_grad(). The backward() call computes gradients of the loss with respect to the model parameters, and optimizer.step() updates those parameters. The average loss of each epoch (from each epoch's batches) is printed to monitor the training progress.

In [9]:
# Training loop
for epoch in range(num_epochs):
    model.train()
    epoch_loss = 0.0

    for batch_x, batch_y in train_dataloader:
        batch_x = batch_x.to(device)
        batch_y = batch_y.to(device)

        logits = model(batch_x)
        loss = criterion(logits, batch_y.view(-1, 1))

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        epoch_loss += loss.item()

    avg_loss = epoch_loss / len(train_dataloader)
    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {avg_loss:.4f}")

Epoch [1/500], Loss: 0.6834
Epoch [2/500], Loss: 0.6733
Epoch [3/500], Loss: 0.6558
Epoch [4/500], Loss: 0.6304
Epoch [5/500], Loss: 0.5896
Epoch [6/500], Loss: 0.5389
Epoch [7/500], Loss: 0.4949
Epoch [8/500], Loss: 0.4749
Epoch [9/500], Loss: 0.4614
Epoch [10/500], Loss: 0.4712
Epoch [11/500], Loss: 0.4491
Epoch [12/500], Loss: 0.4585
Epoch [13/500], Loss: 0.4470
Epoch [14/500], Loss: 0.4481
Epoch [15/500], Loss: 0.4460
Epoch [16/500], Loss: 0.4352
Epoch [17/500], Loss: 0.4410
Epoch [18/500], Loss: 0.4347
Epoch [19/500], Loss: 0.4352
Epoch [20/500], Loss: 0.4329
Epoch [21/500], Loss: 0.4315
Epoch [22/500], Loss: 0.4272
Epoch [23/500], Loss: 0.4310
Epoch [24/500], Loss: 0.4295
Epoch [25/500], Loss: 0.4361
Epoch [26/500], Loss: 0.4144
Epoch [27/500], Loss: 0.4297
Epoch [28/500], Loss: 0.4231
Epoch [29/500], Loss: 0.4261
Epoch [30/500], Loss: 0.4213
Epoch [31/500], Loss: 0.4310
Epoch [32/500], Loss: 0.4080
Epoch [33/500], Loss: 0.4296
Epoch [34/500], Loss: 0.4331
Epoch [35/500], Loss: 0

To finalize, the model is used to make predictions on the test data (Xte), and these predictions (y_pred) are compared with the true target values (yte) to evaluate performance. For classification problems is used the metric of accuracy. It can be calculated where predictions greater than 0.5 are treated as class 1 and those below as class 0.

In [10]:
y_pred=model(Xte)
print(f'ACC:{accuracy_score(yte.detach().numpy(),y_pred.detach().numpy()>0.5)}')

ACC:0.7012987012987013
