<a href="https://colab.research.google.com/github/enridagoo/enridagoo-DataScience-GenAI-Submissions/blob/main/Assignment_12/6_02_DNN_101_COMPLETED.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![](https://drive.google.com/uc?export=view&id=1xqQczl0FG-qtNA2_WQYuWePW9oU8irqJ)

# 6.02 Dense Neural Network (with PyTorch)
This notebook expands on our logistic regression example and takes us through building our first neural network. If you haven't already, check (and if necessary switch to) GPU processing by clicking Runtime > Change runtime type and selecting GPU. We can test that this has worked with the following code:

In [13]:
import torch

# Check for GPU availability
print("Num GPUs Available: ", torch.cuda.device_count())

Num GPUs Available:  1
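
If the count is 0, the rest of this notebook will still run on the CPU, just more slowly. As a minimal sketch (the `device` variable mirrors the one defined later in the training cell), the fallback can be made explicit:

```python
import torch

# Pick the GPU when one is visible to PyTorch, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

# Only query the GPU name when a GPU is actually present
if torch.cuda.is_available():
    print("GPU name:", torch.cuda.get_device_name(0))
```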


Hopefully your code shows you have 1 GPU available! Next let's get some data. We'll start with another in-built dataset:

In [14]:
# load a built-in (OK, semi-built-in) scikit-learn dataset
from sklearn.datasets import load_diabetes

import pandas as pd
import numpy as np

# import the data
data = load_diabetes()
data

{'data': array([[ 0.03807591,  0.05068012,  0.06169621, ..., -0.00259226,
          0.01990749, -0.01764613],
        [-0.00188202, -0.04464164, -0.05147406, ..., -0.03949338,
         -0.06833155, -0.09220405],
        [ 0.08529891,  0.05068012,  0.04445121, ..., -0.00259226,
          0.00286131, -0.02593034],
        ...,
        [ 0.04170844,  0.05068012, -0.01590626, ..., -0.01107952,
         -0.04688253,  0.01549073],
        [-0.04547248, -0.04464164,  0.03906215, ...,  0.02655962,
          0.04452873, -0.02593034],
        [-0.04547248, -0.04464164, -0.0730303 , ..., -0.03949338,
         -0.00422151,  0.00306441]]),
 'target': array([151.,  75., 141., 206., 135.,  97., 138.,  63., 110., 310., 101.,
         69., 179., 185., 118., 171., 166., 144.,  97., 168.,  68.,  49.,
         68., 245., 184., 202., 137.,  85., 131., 283., 129.,  59., 341.,
         87.,  65., 102., 265., 276., 252.,  90., 100.,  55.,  61.,  92.,
        259.,  53., 190., 142.,  75., 142., 155., 225.,  59

We are working on a regression problem, with "structured" data which has already been cleaned and normalised. We can skip the usual cleaning/engineering steps. However, we do need to get the data into PyTorch:

In [15]:
# Convert data to PyTorch tensors
X = torch.tensor(data.data, dtype=torch.float32)
y = torch.tensor(data.target, dtype=torch.float32).reshape(-1, 1) # Reshape y to be a column vector
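
The reshape matters because the model will output one column per prediction, with shape `(N, 1)`. A small sketch with made-up numbers shows what can go wrong if the targets are left as a flat vector of shape `(N,)`: the subtraction broadcasts to an `(N, N)` matrix instead of comparing element by element.

```python
import torch

# Made-up stand-ins for model outputs (column vector) and targets (flat vector)
outputs = torch.tensor([[1.0], [2.0], [3.0]])   # shape (3, 1)
targets_flat = torch.tensor([1.0, 2.0, 3.0])    # shape (3,)

# Broadcasting silently expands the difference to every output/target pair
bad_diff = outputs - targets_flat
print(bad_diff.shape)

# Reshaping the targets to a column vector restores element-wise comparison
targets_col = targets_flat.reshape(-1, 1)       # shape (3, 1)
good_diff = outputs - targets_col
print(good_diff.shape)
```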

Now that our data is stored in tensors, we can do train/test splitting as before (in fact, we can use sklearn as before):

In [16]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

torch.Size([353, 10]) torch.Size([353, 1])
torch.Size([89, 10]) torch.Size([89, 1])
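
Note that `random_state=42` only fixes the train/test split. The model weights and batch shuffling used later in this notebook draw on PyTorch's own random number generator, so results will differ slightly from run to run. A minimal sketch of making weight initialisation repeatable, assuming a seed of 42 chosen purely for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(42)          # seed PyTorch's global random number generator
layer_a = nn.Linear(10, 5)     # weights drawn from the seeded generator

torch.manual_seed(42)          # reset to the same seed
layer_b = nn.Linear(10, 5)     # draws the same initial weights again

print(torch.equal(layer_a.weight, layer_b.weight))
```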


Now we can set up our batches for training. With 353 training samples, a batch size of 50 gives 8 batches per epoch (seven full batches plus a final batch of 3). We'll also wrap the features and labels in a `TensorDataset` so each batch yields both:

In [17]:
from torch.utils.data import TensorDataset, DataLoader

# Create TensorDatasets and DataLoaders
train_dataset = TensorDataset(X_train, y_train)
train_loader = DataLoader(train_dataset, batch_size=50, shuffle=True)

test_dataset = TensorDataset(X_test, y_test)
test_loader = DataLoader(test_dataset, batch_size=50, shuffle=False)
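
The batch count follows from the training set size, because `DataLoader` keeps the final partial batch by default (`drop_last=False`). A quick check of the arithmetic:

```python
import math

n_train = 353      # training rows, from the shape printed above
batch_size = 50    # as passed to the DataLoader

n_batches = math.ceil(n_train / batch_size)     # the last, partial batch still counts
last_batch_size = n_train - (n_batches - 1) * batch_size

print(n_batches, last_batch_size)
```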

Now it's time to build our model. We'll keep it simple: a model with an input layer of 10 features followed by 2x _Dense_ (fully connected) layers, each with 5 neurons and ReLU activation. Our output layer will be size=1, given that this is a regression problem and we want a single output value per prediction.

This will be easier to understand if you have read through the logistic regression tutorial.

In [18]:
import torch
import torch.nn as nn

# Define the model
class DiabetesModel(nn.Module):
    def __init__(self):
        super(DiabetesModel, self).__init__()
        # we'll set up the layers as a sequence using nn.Sequential
        self.layers = nn.Sequential(

            # first layer will be a linear layer that has 5x neurons
            # (5x sets of linear regression)
            # the layer takes the 10 features as input (i.e. 10, 5)
            nn.Linear(10, 5),

            nn.ReLU(), # ReLU activation

            # second linear layer again has 5 neurons
            # this time taking the input as the output of the last layer
            # (which had 5x neurons)
            nn.Linear(5, 5),

            nn.ReLU(), # ReLU again

            # last linear layer takes the output from the previous 5 neurons
            # this time it's a single output with no activation
            # i.e. this is the prediction (regression)
            nn.Linear(5, 1)
        )

    def forward(self, x):
        return self.layers(x) # pass the data through the layers
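
A quick way to sanity-check an architecture like this is to push a dummy batch through it and look at the output shape. The sketch below rebuilds the same layer layout inline (so it runs on its own) and uses random numbers rather than the diabetes data:

```python
import torch
import torch.nn as nn

# Same layer layout as DiabetesModel above, written inline so this runs on its own
layers = nn.Sequential(
    nn.Linear(10, 5), nn.ReLU(),
    nn.Linear(5, 5), nn.ReLU(),
    nn.Linear(5, 1),
)

dummy_batch = torch.randn(4, 10)   # 4 made-up rows with 10 features each
out = layers(dummy_batch)
print(out.shape)                   # one prediction per row

# Each Linear layer holds in*out weights plus out biases
n_params = sum(p.numel() for p in layers.parameters())
print(n_params)
```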

As before we need to create a model object, specify the loss (criterion) and an optimiser (which we cover next week):

In [19]:
import torch.optim as optim

# Initialize the model, loss function, and optimizer
model = DiabetesModel()
criterion = nn.MSELoss() # MSE loss function
optimiser = optim.Adam(model.parameters(), lr=0.001)
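
For reference, `nn.MSELoss()` with its default settings returns the mean of the squared differences between predictions and targets. A small sketch with made-up numbers, comparing it against the same calculation done by hand:

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()                       # default reduction is the mean

preds = torch.tensor([[2.0], [4.0]])           # made-up predictions
targets = torch.tensor([[1.0], [6.0]])         # made-up targets

loss = criterion(preds, targets)
manual = ((preds - targets) ** 2).mean()       # (1^2 + (-2)^2) / 2

print(loss.item(), manual.item())
```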

Now we can train the model. Again, the logistic regression tutorial (6.01) may help you understand this:

In [20]:
# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Training loop (example - you'll likely want to add more epochs)
epochs = 100 # 100 epochs

for epoch in range(epochs):
  # use the train_loader to pass the inputs (x) and targets (y)
  for inputs, targets in train_loader:
    # pass to the GPU (hopefully)
    inputs, targets = inputs.to(device), targets.to(device)

    # pass model to GPU as well
    model.to(device)

    model.train() # put the model object in train mode
    optimiser.zero_grad() # reset the gradients
    outputs = model(inputs) # create outputs
    loss = criterion(outputs, targets) # compare with y to get loss
    loss.backward() # backpropagate the loss (next week)
    optimiser.step() # update the parameters based on this round of training

    # on every 10th epoch, print the loss for each batch
    # (this check sits inside the batch loop, so it prints once per batch)
    if (epoch+1) % 10 == 0: # modular arithmetic
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {round(loss.item(), 4)}')

Epoch [10/100], Loss: 22010.3945
Epoch [10/100], Loss: 32505.5977
Epoch [10/100], Loss: 22927.0996
Epoch [10/100], Loss: 27025.1289
Epoch [10/100], Loss: 34037.6133
Epoch [10/100], Loss: 37536.4297
Epoch [10/100], Loss: 31655.5586
Epoch [10/100], Loss: 20590.4297
Epoch [20/100], Loss: 30466.7891
Epoch [20/100], Loss: 29802.0234
Epoch [20/100], Loss: 31118.6387
Epoch [20/100], Loss: 23590.8242
Epoch [20/100], Loss: 32351.5918
Epoch [20/100], Loss: 30529.1797
Epoch [20/100], Loss: 28690.332
Epoch [20/100], Loss: 31233.3281
Epoch [30/100], Loss: 24351.1816
Epoch [30/100], Loss: 24252.4453
Epoch [30/100], Loss: 30116.666
Epoch [30/100], Loss: 27620.7988
Epoch [30/100], Loss: 27814.4199
Epoch [30/100], Loss: 37503.5234
Epoch [30/100], Loss: 34972.793
Epoch [30/100], Loss: 18055.1855
Epoch [40/100], Loss: 29464.9492
Epoch [40/100], Loss: 28618.3535
Epoch [40/100], Loss: 24681.2168
Epoch [40/100], Loss: 29328.8418
Epoch [40/100], Loss: 25440.0078
Epoch [40/100], Loss: 33465.6992
Epoch [40/100

We can see the loss is significantly lower at the end than it was at the start, but it is still bouncing around, which suggests the model needs more training (100 epochs is not a lot in deep learning terms). Even so, let's evaluate as before:
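
Part of the bouncing comes from what is being printed: `loss` is the loss of a single batch, and batches differ from one another. A minimal sketch of tracking the average loss over each epoch instead, using random stand-in data and a tiny stand-in model (both are assumptions for illustration, not the diabetes data or `DiabetesModel`):

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)
X_demo = torch.randn(120, 10)                  # random stand-in features
y_demo = torch.randn(120, 1)                   # random stand-in targets
loader = DataLoader(TensorDataset(X_demo, y_demo), batch_size=50, shuffle=True)

model_demo = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 1))
criterion = nn.MSELoss()
optimiser = torch.optim.Adam(model_demo.parameters(), lr=0.001)

epoch_losses = []
for epoch in range(3):
    model_demo.train()
    running, seen = 0.0, 0
    for inputs, targets in loader:
        optimiser.zero_grad()
        loss = criterion(model_demo(inputs), targets)
        loss.backward()
        optimiser.step()
        # weight each batch loss by its size so a small final batch counts less
        running += loss.item() * inputs.size(0)
        seen += inputs.size(0)
    epoch_losses.append(running / seen)        # mean loss per sample this epoch

print(epoch_losses)
```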

In [21]:
# Evaluation (example)
model.eval() # testing mode
mse_values = [] # collect the MSE scores

with torch.no_grad():
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs) # predict the test data

        # Calculate Mean Squared Error
        mse = criterion(outputs, targets) # calculate MSE for the batch
        mse_values.append(mse.item()) # add to the list of MSE values

# Calculate and print the average MSE
avg_mse = np.mean(mse_values)
print(f"Average MSE on test set: {avg_mse}")

Average MSE on test set: 22762.103515625


The test MSE is in line with the training losses, so there is no obvious sign of overfitting. However, we can probably get better results with tuning and more epochs.
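
One caveat: the test loader yields batches of 50 and 39 rows, so a plain average of the two batch MSEs gives each row in the smaller batch slightly more weight. A sketch of the size-weighted alternative, with made-up batch MSE values purely for illustration:

```python
# Made-up per-batch MSEs and the batch sizes produced by 89 test rows
batch_mses = [100.0, 200.0]
batch_sizes = [50, 39]

# Plain average: every batch counts equally, whatever its size
plain_avg = sum(batch_mses) / len(batch_mses)

# Weighted average: every row counts equally
weighted_avg = sum(m * n for m, n in zip(batch_mses, batch_sizes)) / sum(batch_sizes)

print(plain_avg, round(weighted_avg, 2))
```

With batches this close in size the difference is small, but it grows when the final batch is much smaller than the rest.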

Let's run the loop again a little differently to collect the predicted values (y_hat) and actuals (y) and add them to a DataFrame for comparison:

In [22]:
# Evaluation
model.eval()
predictions = []
actuals = []

with torch.no_grad():
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs)
        predictions.extend(outputs.cpu().numpy())
        actuals.extend(targets.cpu().numpy())

# Create DataFrame
results_df = pd.DataFrame({'Predicted': np.array(predictions).flatten(), 'Actual': np.array(actuals).flatten()})
results_df

|  | Predicted | Actual |
|---|---|---|
| 0 | 12.954713 | 219.0 |
| 1 | 12.204578 | 70.0 |
| 2 | 12.677232 | 202.0 |
| 3 | 15.577978 | 230.0 |
| 4 | 12.401243 | 111.0 |
| ... | ... | ... |
| 84 | 11.120073 | 153.0 |
| 85 | 10.312742 | 98.0 |
| 86 | 9.356843 | 37.0 |
| 87 | 9.790593 | 63.0 |


Side-by-side, they don't look great. Can you improve them?
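
To put a number on how far apart the two columns are, we can add a per-row absolute error. The sketch below uses the first five rows of the comparison above as stand-in data; the `demo_df` and `AbsError` names are illustrative only:

```python
import pandas as pd

# First five rows of the comparison shown above, copied in as stand-in data
demo_df = pd.DataFrame({
    "Predicted": [12.954713, 12.204578, 12.677232, 15.577978, 12.401243],
    "Actual": [219.0, 70.0, 202.0, 230.0, 111.0],
})

# Absolute error per row, and its mean across the rows
demo_df["AbsError"] = (demo_df["Predicted"] - demo_df["Actual"]).abs()
print(demo_df)
print("Mean absolute error:", demo_df["AbsError"].mean())
```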

<br><br>

## EXERCISE #1
Try increasing the number of epochs to 1,000 (once the model is fairly well trained, the loss printed every 10 epochs will be fairly stable and not change much). Does this give better results?

<br><br>

## EXERCISE #2 (optional)
Try experimenting with the architecture (number of neurons and/or number of layers). Can we reach an optimal architecture?

In [None]:
# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Training loop (example - you'll likely want to add more epochs)
epochs = 1000 # 1000 epochs

for epoch in range(epochs):
  # use the train_loader to pass the inputs (x) and targets (y)
  for inputs, targets in train_loader:
    # pass to the GPU (hopefully)
    inputs, targets = inputs.to(device), targets.to(device)

    # pass model to GPU as well
    model.to(device)

    model.train() # put the model object in train mode
    optimiser.zero_grad() # reset the gradients
    outputs = model(inputs) # create outputs
    loss = criterion(outputs, targets) # compare with y to get loss
    loss.backward() # backpropagate the loss (next week)
    optimiser.step() # update the parameters based on this round of training

  # every 10 epochs, print the loss of the last batch
  if (epoch+1) % 10 == 0: # modular arithmetic
    print(f'Epoch [{epoch+1}/{epochs}], Loss: {round(loss.item(), 4)}')

Let's evaluate the model's performance on the test set after 1000 epochs of training by calculating the Mean Squared Error (MSE).

In [28]:
# Evaluation
model.eval() # Set the model to evaluation mode
mse_values = [] # List to store MSE for each batch

with torch.no_grad(): # Disable gradient calculation during evaluation
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs) # Get predictions

        # Calculate Mean Squared Error for the current batch
        mse = criterion(outputs, targets)
        mse_values.append(mse.item()) # Add the batch MSE to the list

# Calculate and print the average MSE across all test batches
avg_mse = np.mean(mse_values)
print(f"Average MSE on test set after 1000 epochs: {avg_mse}")

Average MSE on test set after 1000 epochs: 2845.820556640625
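
MSE is in squared units of the target, which makes it hard to read. Taking the square root gives the root mean squared error (RMSE), which is on the same scale as the target values. Applying this to the two test MSEs reported in this notebook gives roughly 151 and 53:

```python
import math

mse_100_epochs = 22762.103515625     # test MSE reported after the first 100 epochs
mse_1000_epochs = 2845.820556640625  # test MSE reported after the further 1000 epochs

rmse_100 = math.sqrt(mse_100_epochs)
rmse_1000 = math.sqrt(mse_1000_epochs)

print(round(rmse_100, 1), round(rmse_1000, 1))
```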


Now, let's look at a side-by-side comparison of the predicted values (`y_hat`) and the actual target values (`y`) from the test set.

In [29]:
# Collect predictions and actuals for comparison
model.eval() # Ensure model is in evaluation mode
predictions = []
actuals = []

with torch.no_grad():
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs)
        predictions.extend(outputs.cpu().numpy()) # Move predictions to CPU and convert to NumPy array
        actuals.extend(targets.cpu().numpy()) # Move actuals to CPU and convert to NumPy array

# Create a Pandas DataFrame for easy viewing
results_df = pd.DataFrame({'Predicted': np.array(predictions).flatten(), 'Actual': np.array(actuals).flatten()})
display(results_df)

|  | Predicted | Actual |
|---|---|---|
| 0 | 142.918137 | 219.0 |
| 1 | 181.063156 | 70.0 |
| 2 | 139.865234 | 202.0 |
| 3 | 297.205597 | 230.0 |
| 4 | 123.258530 | 111.0 |
| ... | ... | ... |
| 84 | 115.583130 | 153.0 |
| 85 | 88.791550 | 98.0 |
| 86 | 80.685448 | 37.0 |
| 87 | 65.376556 | 63.0 |


In this second case the test MSE has plummeted, from about 22,762 to about 2,846, which indicates a substantial improvement in the model's performance: a lower MSE means the predictions are, on average, closer to the actual values.

# Task
Define a new PyTorch neural network class (`ImprovedDiabetesModel`) with an enhanced architecture by increasing the number of neurons in the existing layers and adding an additional hidden layer. Initialize an instance of this new model, set up the MSE loss function and Adam optimizer, and then train the model for 1000 epochs using the `train_loader`, monitoring the loss during training. After training, evaluate the `ImprovedDiabetesModel` on the test set by calculating the Mean Squared Error (MSE) and display a DataFrame comparing predicted and actual values. Finally, summarize the performance improvements, if any, of the new model architecture compared to the original one.

## Define New Model Architecture

### Subtask:
Define a new PyTorch neural network class (`ImprovedDiabetesModel`) with a modified architecture by increasing the number of neurons in existing layers and adding a new hidden layer.


**Reasoning**:
The subtask requires defining a new PyTorch neural network class with a modified architecture. This code block will define the `ImprovedDiabetesModel` class as specified in the instructions, including the increased neuron counts and the additional hidden layer with ReLU activations.



In [51]:
import torch.nn as nn

class ImprovedDiabetesModel(nn.Module):
    def __init__(self):
        super(ImprovedDiabetesModel, self).__init__()
        self.layers = nn.Sequential(
            nn.Linear(10, 100), # First layer: 10 input features, 100 output neurons
            nn.ReLU(),
            nn.Linear(100, 50), # Second layer: 100 input neurons, 50 output neurons
            nn.ReLU(),
            nn.Linear(50, 20), # Third layer: 50 input neurons, 20 output neurons
            nn.ReLU(),
            nn.Linear(20, 1)  # Output layer: 20 input neurons, 1 output neuron (regression)
        )

    def forward(self, x):
        return self.layers(x)
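
To get a feel for how much bigger this network is than the original, we can count trainable parameters by hand. The sketch below uses a hypothetical helper, `count_params`, which is not part of PyTorch:

```python
def count_params(sizes):
    """Weights (in * out) plus biases (out) for each consecutive pair of layer sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

original = count_params([10, 5, 5, 1])           # DiabetesModel layout
improved = count_params([10, 100, 50, 20, 1])    # ImprovedDiabetesModel layout

print(original, improved)
```

That is 91 parameters for the original against 7,191 for the new model.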


**Reasoning**:
Now that the `ImprovedDiabetesModel` class is defined, the next step is to initialize an instance of this model, set the loss function to Mean Squared Error (MSE), and configure the Adam optimizer with the model's parameters and a specified learning rate. This prepares the model for training.



In [52]:
import torch.optim as optim

# Initialize the improved model
model = ImprovedDiabetesModel()

# Define the loss function (MSE for regression)
criterion = nn.MSELoss()

# Define the optimizer (Adam optimizer with a learning rate of 0.001)
optimiser = optim.Adam(model.parameters(), lr=0.001)

print("ImprovedDiabetesModel initialized with MSELoss and Adam optimizer.")

ImprovedDiabetesModel initialized with MSELoss and Adam optimizer.


**Reasoning**:
The next step is to train the `ImprovedDiabetesModel` for 1000 epochs using the `train_loader`, while monitoring and printing the loss every 10 epochs. This involves iterating through the `train_loader`, moving data to the appropriate device (GPU if available), performing forward and backward passes, and updating the model parameters.



In [53]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

epochs = 1000

for epoch in range(epochs):
  for inputs, targets in train_loader:
    inputs, targets = inputs.to(device), targets.to(device)
    model.to(device)

    model.train()
    optimiser.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    loss.backward()
    optimiser.step()

  if (epoch + 1) % 10 == 0:
    print(f'Epoch [{epoch+1}/{epochs}], Loss: {round(loss.item(), 4)}')


Epoch [10/1000], Loss: 56505.5938
Epoch [20/1000], Loss: 6760.3999
Epoch [30/1000], Loss: 2231.9084
Epoch [40/1000], Loss: 2193.4204
Epoch [50/1000], Loss: 4155.2188
Epoch [60/1000], Loss: 4050.1531
Epoch [70/1000], Loss: 4612.3066
Epoch [80/1000], Loss: 2855.2019
Epoch [90/1000], Loss: 2635.2688
Epoch [100/1000], Loss: 4603.814
Epoch [110/1000], Loss: 1615.5364
Epoch [120/1000], Loss: 1744.0194
Epoch [130/1000], Loss: 1504.366
Epoch [140/1000], Loss: 434.1941
Epoch [150/1000], Loss: 5718.2378
Epoch [160/1000], Loss: 42.3805
Epoch [170/1000], Loss: 2524.2864
Epoch [180/1000], Loss: 5891.96
Epoch [190/1000], Loss: 3282.2378
Epoch [200/1000], Loss: 2963.1345
Epoch [210/1000], Loss: 870.3259
Epoch [220/1000], Loss: 4348.9009
Epoch [230/1000], Loss: 1285.8198
Epoch [240/1000], Loss: 3061.0298
Epoch [250/1000], Loss: 3122.4883
Epoch [260/1000], Loss: 9173.9688
Epoch [270/1000], Loss: 4521.9575
Epoch [280/1000], Loss: 454.9415
Epoch [290/1000], Loss: 753.5175
Epoch [300/1000], Loss: 3422.928

**Reasoning**:
After training the `ImprovedDiabetesModel`, the next logical step is to evaluate its performance on the test set by calculating the Mean Squared Error (MSE), similar to how the previous model was evaluated.



In [54]:
model.eval() # Set the model to evaluation mode
mse_values = [] # List to store MSE for each batch

with torch.no_grad(): # Disable gradient calculation during evaluation
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs) # Get predictions

        # Calculate Mean Squared Error for the current batch
        mse = criterion(outputs, targets)
        mse_values.append(mse.item()) # Add the batch MSE to the list

# Calculate and print the average MSE across all test batches
avg_mse_improved_model = np.mean(mse_values)
print(f"Average MSE on test set for ImprovedDiabetesModel: {avg_mse_improved_model}")

Average MSE on test set for ImprovedDiabetesModel: 2687.8408203125


**Reasoning**:
To complete the evaluation of the `ImprovedDiabetesModel`, the next step is to collect its predictions on the test set and compare them side-by-side with the actual target values, displaying them in a Pandas DataFrame.



In [55]:
model.eval() # Ensure model is in evaluation mode
predictions = []
actuals = []

with torch.no_grad():
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs)
        predictions.extend(outputs.cpu().numpy()) # Move predictions to CPU and convert to NumPy array
        actuals.extend(targets.cpu().numpy()) # Move actuals to CPU and convert to NumPy array

# Create a Pandas DataFrame for easy viewing
results_df = pd.DataFrame({'Predicted': np.array(predictions).flatten(), 'Actual': np.array(actuals).flatten()})
display(results_df)

|  | Predicted | Actual |
|---|---|---|
| 0 | 143.963547 | 219.0 |
| 1 | 182.958969 | 70.0 |
| 2 | 143.829117 | 202.0 |
| 3 | 308.166626 | 230.0 |
| 4 | 115.608459 | 111.0 |
| ... | ... | ... |
| 84 | 105.330750 | 153.0 |
| 85 | 84.642410 | 98.0 |
| 86 | 89.110268 | 37.0 |
| 87 | 75.093369 | 63.0 |


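Finally, the `ImprovedDiabetesModel` reaches a slightly lower test MSE than the original model (about 2,688 versus 2,846), so the larger architecture helps, but only modestly. It could probably be improved further, for example by making the network deeper with more hidden layers, each followed by a ReLU activation.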
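
For experimenting with depth, it can be convenient to build the layer stack from a list of sizes rather than writing each layer out. A minimal sketch, assuming a hypothetical helper named `make_mlp` and arbitrary example sizes (neither comes from the notebook above):

```python
import torch
import torch.nn as nn

def make_mlp(sizes):
    """Build Linear layers for consecutive sizes, with ReLU between hidden layers."""
    layers = []
    for i, (n_in, n_out) in enumerate(zip(sizes, sizes[1:])):
        layers.append(nn.Linear(n_in, n_out))
        if i < len(sizes) - 2:          # no activation after the output layer
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

# A deeper variant than ImprovedDiabetesModel; the sizes are arbitrary examples
deeper_model = make_mlp([10, 100, 50, 20, 10, 1])
print(deeper_model(torch.randn(4, 10)).shape)
```

The resulting model can be trained with the same loss, optimiser, and loop as before.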