<a href="https://colab.research.google.com/github/guezhenxue/FIT3181-Deep-Learning/blob/main/Tutorial/%20FIT3181_5215_Tute1b_PyTorch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <span style="color:#0b486b">  FIT3181/5215: Deep Learning (2025)</span>
***
*CE/Lecturer (Clayton):*  **Dr Trung Le** | trunglm@monash.edu <br/>
*Lecturer (Clayton):* **A/Prof Zongyuan Ge** | zongyuan.ge@monash.edu <br/>
*Lecturer (Malaysia):*  **Dr Arghya Pal** | arghya.pal@monash.edu <br/>
*Lecturer (Malaysia):*  **Dr Lim Chern Hong** | lim.chernhong@monash.edu <br/>  <br/>
*Head Tutor 3181:*  **Ms Ruda Nie H** |  RudaNie.H@monash.edu <br/>
*Head Tutor 5215:*  **Ms Leila Mahmoodi** |  leila.mahmoodi@monash.edu

<br/> <br/>
Faculty of Information Technology, Monash University, Australia
***

# Tutorial 1b: Logistic Regression with PyTorch


This tutorial aims to introduce the Logistic Regression which can be regarded as a feed-forward neural network with one layer.

## Import Necessary Libraries

In [1]:
print("A")

A


In [None]:
import torch
import torch.nn as nn
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler      # for feature scaling
from sklearn.model_selection import train_test_split  # for train/test split

## Prepare Data

We first load the `breast cancer` dataset from `sklean` datasets and then split into 80% for training and 20% for testing.

In [None]:
# Prepare data
bc = datasets.load_breast_cancer()
X, y = bc.data, bc.target

n_samples, n_features = X.shape
print(f'number of samples: {n_samples}, number of features: {n_features}')

# split data to 80% for training and 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1234)


number of samples: 569, number of features: 30


**<span style="color:red">Exercise 1</span>:** Write the code to print out the first 10 feature vectors in `X_train` and `y_train`. Write the code to show the unique labels in `y_train`.

In [None]:
#Your answer here
print(X_train[:10])
print(y_train[:10])
print(np.unique(y_train))



tensor([[-0.3618, -0.2652, -0.3172, -0.4671,  1.8038,  1.1817, -0.5169,  0.1065,
         -0.3901,  1.3914,  0.1437, -0.1208,  0.1601, -0.1326, -0.5863, -0.1248,
         -0.5787,  0.1091, -0.2819, -0.1889, -0.2571, -0.2403, -0.2442, -0.3669,
          0.5449,  0.2481, -0.7109, -0.0797, -0.5280,  0.2506],
        [-0.8633,  0.7156, -0.8565, -0.7967, -0.0586, -0.4285, -0.5170, -0.6814,
          0.7948,  0.3882, -0.4545,  0.4009, -0.4357, -0.5216, -1.1631,  0.2724,
          0.0675, -0.2392,  1.1130,  0.3502, -0.8894,  0.3847, -0.8880, -0.7897,
         -1.0429, -0.4824, -0.5631, -0.7698,  0.4431, -0.2099],
        [-0.4334,  0.3251, -0.4129, -0.5036,  0.2029,  0.3169,  0.2114,  0.2923,
         -0.2941,  1.1295, -0.2249,  0.9890, -0.0743, -0.4596,  1.8909,  0.8176,
          0.5919,  1.7726,  0.1356,  0.7924, -0.6160, -0.0636, -0.5528, -0.6284,
         -0.1823, -0.1924, -0.2601, -0.0660, -1.1169,  0.0329],
        [-0.4191,  1.0410, -0.3904, -0.4502,  1.1198,  0.4183,  0.2901,  0.5127

We use `StandardScaler()` from `sklearn` to normalize the training/testing sets. We convert the training/testing numpy arrays to PyTorch arrays and then reshape them.

In [None]:
# scale data
sc = StandardScaler()

X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# convert to tensors
X_train = torch.from_numpy(X_train.astype(np.float32))
X_test = torch.from_numpy(X_test.astype(np.float32))
y_train = torch.from_numpy(y_train.astype(np.int64))
y_test = torch.from_numpy(y_test.astype(np.int64))


## Training/Testing Procedure

We now present the `fundamental workflow of PyTorch` including training a model based on the training set and testing the trained model on the testing set. This fundamental workflow is the same for various PyTorch models.

### Prepare Model

First, we need to declare and define a model, which is a computational graph showing how to compute the model output from the input vector $x$. Specifically, given a data point $x$ (i.e., [1,30]) a batch $x$ (i.e., [64,30]), or even the entire training set $x$ (i.e., [569,30]), we compute
- logits = xW + b
- pred_probs = softmax(logits)

In [None]:
# Create model
# f = wx + b, softmax at the end
class LogisticRegression(nn.Module):

    def __init__(self, n_input_features):
        super(LogisticRegression, self).__init__()
        self.linear = nn.Linear(n_input_features, 2)
        self.X_train, self.y_train = None, None
        self.X_test, self.y_test = None, None

    def forward(self, x):
        logits = self.linear(x)
        # pred_probs = torch.nn.Softmax(dim=-1)(logits) #for asking question only
        return logits #return the logits

    def train_model(self, X_train, y_train, learning_rate, num_epochs, loss_fn, optimizer):
        self.X_train, self.y_train = X_train.to(device), y_train.to(device) #load the data to device (GPU or CPU)
        for epoch in range(num_epochs):
            # forward pass and loss
            logits = self.forward(self.X_train)

            loss = loss_fn(logits, self.y_train.squeeze().long())

            # backward pass to compute the gradient
            loss.backward()

            # updates the model parameter based on the gradient
            optimizer.step()

            # zero gradients
            optimizer.zero_grad()

            if (epoch+1) % 10 == 0:
                print(f'epoch: {epoch+1}, loss = {loss.item():.4f}')

    def evaluate(self, X_test, y_test):
        # Ensure the model is in evaluation mode
        self.eval()

        # Disable gradient calculation
        with torch.no_grad():
            # Load the data to the device (GPU or CPU)
            self.X_test, self.y_test = X_test.to(device), y_test.to(device)
            # Get the model's predictions
            logits = self.forward(self.X_test.type(torch.float32))
            # Compute the predicted class
            y_predicted = torch.argmax(logits, dim=1)

            # Calculate the number of correct predictions
            corrects = (y_predicted == self.y_test).sum().item()
            print(f'correct = {corrects}')

            # Get the total number of samples
            totals = self.y_test.size(0)
            print(f'totals = {totals}')

            # Compute the accuracy
            acc = corrects / totals
            print(f'accuracy = {acc * 100:.2f}%')

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f'device = {device}')
# model = LogisticRegression(n_features).to(device)  #load the model to the current device

device = cpu


**<span style="color:red">Exercise 2</span>:** Explain the forward function. What are the meanings and dimensions of `logits` and `pred_probs`?

Given a mini-batch $x$, `logits` represents the logits/discriminative scores/values of data points in the batch, while `pred_probs` represents the prediction probabilities of data points in the batch

### Prepare Loss and Optimizer

We declare `loss_fn` as the cross entropy loss. To train our logistic regression, we invoke the SGD optimizer with the learning rate $0.01$.

In [None]:
# Loss and optimizer
learning_rate = 0.01
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

In [None]:
print(y_train.shape, y_train.squeeze().shape)

torch.Size([455]) torch.Size([455])


### Train Model By Feeding the Training Set All-in-Once

We train the model in $200$ epochs (i.e., going through the entire training set $100$ times). Here in each epoch, we input entire training set to the model to compute the cross-entropy loss over the training set and then use the optimizer to update the model parameters (i.e., W and b).

In [None]:
# training loop
num_epochs = 200

for epoch in range(num_epochs):
    # forward pass and loss
    X_train, y_train = X_train.to(device), y_train.to(device) #load the data to device (GPU or CPU)
    logits = model(X_train)

    loss = loss_fn(logits, y_train.squeeze().long())

    # backward pass to compute the gradient
    loss.backward()

    # updates the model parameter based on the gradient
    optimizer.step()

    # zero gradients
    optimizer.zero_grad()

    if (epoch+1) % 10 == 0:
        print(f'epoch: {epoch+1}, loss = {loss.item():.4f}')

epoch: 10, loss = 0.5796
epoch: 20, loss = 0.4290
epoch: 30, loss = 0.3549
epoch: 40, loss = 0.3097
epoch: 50, loss = 0.2784
epoch: 60, loss = 0.2551
epoch: 70, loss = 0.2368
epoch: 80, loss = 0.2219
epoch: 90, loss = 0.2094
epoch: 100, loss = 0.1988
epoch: 110, loss = 0.1895
epoch: 120, loss = 0.1814
epoch: 130, loss = 0.1741
epoch: 140, loss = 0.1676
epoch: 150, loss = 0.1618
epoch: 160, loss = 0.1565
epoch: 170, loss = 0.1517
epoch: 180, loss = 0.1472
epoch: 190, loss = 0.1432
epoch: 200, loss = 0.1394


In [None]:
model = LogisticRegression(n_features).to(device)  #load the model to the current device
model.train()

LogisticRegression(
  (linear): Linear(in_features=30, out_features=2, bias=True)
)

### Evaluate Trained Model on Testing Set

We compute the accuracy on the testing set (i.e., the testing accuracy).

In [None]:
# Ensure the model is in evaluation mode
model.eval()

# Disable gradient calculation
with torch.no_grad():
    # Load the data to the device (GPU or CPU)
    X_test, y_test = X_test.to(device), y_test.to(device)
    # Get the model's predictions
    logits = model(X_test.type(torch.float32))
    # Compute the predicted class
    y_predicted = torch.argmax(logits, dim=1)

    # Calculate the number of correct predictions
    corrects = (y_predicted == y_test).sum().item()
    print(f'correct = {corrects}')

    # Get the total number of samples
    totals = y_test.size(0)
    print(f'totals = {totals}')

    # Compute the accuracy
    acc = corrects / totals
    print(f'accuracy = {acc * 100:.2f}%')


correct = 79
totals = 114
accuracy = 69.30%


**<span style="color:red">Exercise 3</span>:** Explain the code above to compute the testing accuracy. What are `logits` and `y_predicted`?

`logits` is a 2D tensor of the shape $[114,2]$ in which each row is the logits of a data point in the testing set. `y_predicted` is a 1D tensor that contains the predicted labels of the data points in the testing set. You can print out the values of `logits` and `y_predicted` for more information.

**<span style="color:red">Exercise 4</span>:** Package the above code in a function, allowing you to try with different learning rates. Then, train the logistic regression models with different learning rates (i.e., 0.05, 0.04, 0.005, 0.001) and observe the loss tendency and testing accuracies.

In [None]:
def train_and_evaluate_logistic_regression(learning_rate, n_features, X_train, y_train, X_test, y_test, num_epochs):
    # Initialize model, loss, and optimizer
    model = LogisticRegression(n_features).to(device)
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

    print(f"\nTraining with learning rate: {learning_rate}")

    # Train the model
    model.train_model(X_train, y_train, learning_rate, num_epochs, loss_fn, optimizer)

    # Evaluate the model
    model.evaluate(X_test, y_test)

# Train and evaluate with different learning rates
learning_rates_to_try = [0.05, 0.04, 0.005, 0.001]
for lr in learning_rates_to_try:
    train_and_evaluate_logistic_regression(lr, n_features, X_train, y_train, X_test, y_test, num_epochs)


Training with learning rate: 0.05
epoch: 10, loss = 0.2527
epoch: 20, loss = 0.1800
epoch: 30, loss = 0.1473
epoch: 40, loss = 0.1282
epoch: 50, loss = 0.1156
epoch: 60, loss = 0.1066
epoch: 70, loss = 0.0997
epoch: 80, loss = 0.0944
epoch: 90, loss = 0.0900
epoch: 100, loss = 0.0863
epoch: 110, loss = 0.0832
epoch: 120, loss = 0.0805
epoch: 130, loss = 0.0782
epoch: 140, loss = 0.0761
epoch: 150, loss = 0.0742
epoch: 160, loss = 0.0725
epoch: 170, loss = 0.0710
epoch: 180, loss = 0.0696
epoch: 190, loss = 0.0683
epoch: 200, loss = 0.0671
correct = 110
totals = 114
accuracy = 96.49%

Training with learning rate: 0.04
epoch: 10, loss = 0.2969
epoch: 20, loss = 0.2077
epoch: 30, loss = 0.1690
epoch: 40, loss = 0.1465
epoch: 50, loss = 0.1317
epoch: 60, loss = 0.1210
epoch: 70, loss = 0.1130
epoch: 80, loss = 0.1066
epoch: 90, loss = 0.1014
epoch: 100, loss = 0.0971
epoch: 110, loss = 0.0934
epoch: 120, loss = 0.0902
epoch: 130, loss = 0.0874
epoch: 140, loss = 0.0849
epoch: 150, loss = 

----

**The end**