<a href="https://colab.research.google.com/github/elhamod/BA865-2024/blob/main/hands-on/First_Pytorch_NN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Welcome to your first PyTorch Neural Net!

## Things we will investigate:

- How to load and pre-process the data.
- How to construct an MLP.
- How to train an MLP (Loss and optimization).
- How to utilize a GPU.
- How the complexity of the model affects its performance.
- How to measure the performance of the model.
- The effects of hyper-parameters:
  - Learning rate.
  - Optimizer.
  - Batch size.
- How to use WandB.
- How to use SCC.


## Import some packages

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, random_split

Check whether a GPU is available:

In [2]:
torch.cuda.is_available()

False
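A common device-agnostic pattern, sketched here on the assumption that the same notebook should run with or without a GPU, is to pick the device once and move both the model and each batch to it. The names `device`, `toy_layer`, and `toy_batch` are illustrative and are not used elsewhere in this notebook.

```python
import torch
import torch.nn as nn

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A toy layer and batch, only to show where `.to(device)` goes.
toy_layer = nn.Linear(4, 2).to(device)
toy_batch = torch.randn(3, 4).to(device)

# Model and data live on the same device, so the forward pass works either way.
toy_output = toy_layer(toy_batch)
print(toy_output.shape, toy_output.device.type)
```

With this pattern, `.to(device)` can replace the hard-coded `.cuda()` calls, and the code still runs when `torch.cuda.is_available()` returns `False`.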

Some extra fancy but optional packages:

- `torchmetrics` for calculating accuracy
- `wandb` for logging

In [3]:
# !pip install wandb -qU

In [4]:

# import wandb
# wandb.login()

## Hyper-parameters

Define your hyper-parameters here.

In [5]:
# Hyperparameters

experiment_name = "experiment_33"

# Data
input_size = 28 * 28  # MNIST images are 28x28
output_size = 10  # 10 classes for the digits 0-9
batch_size = 32

# MLP
hidden_size = 64

# Optimization
learning_rate = 0.001
epochs = 10
weight_decay = 0.0
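To get a feel for what `batch_size` controls, the sketch below counts how many batches (and therefore weight updates) one epoch contains. The helper `steps_per_epoch` is a hypothetical name for illustration. The figure of 60,000 is the size of the full MNIST training set; holding out some images for validation would lower it.

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """Number of batches per epoch when the last, smaller batch is kept."""
    return math.ceil(num_samples / batch_size)

num_train = 60_000  # full MNIST training set
for bs in (32, 128, 512):
    print(f"batch_size={bs}: {steps_per_epoch(num_train, bs)} updates per epoch")
```

Smaller batches mean more (and noisier) updates per epoch, while larger batches mean fewer updates that each average over more samples.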


## Data

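Load your dataset, hold out part of the training set for validation, and create `DataLoader`s that handle the batching and shuffling.

In [6]:
# Transformations
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])

# MNIST dataset
train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)

# Hold out a validation set (the 50,000/10,000 split is an assumed choice, not part of MNIST itself)
train_dataset, val_dataset = random_split(train_dataset, [50_000, 10_000])

# Data loaders
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(dataset=val_dataset, batch_size=batch_size, shuffle=False)
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)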

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 9912422/9912422 [00:00<00:00, 106170992.37it/s]


Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 28881/28881 [00:00<00:00, 30729501.22it/s]

Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz



100%|██████████| 1648877/1648877 [00:00<00:00, 34151032.28it/s]


Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 4542/4542 [00:00<00:00, 5373915.03it/s]


Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw
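To see what the transform and the `DataLoader` do without downloading anything, here is a small sketch on made-up data. The random tensors stand in for MNIST images and the normalization is written out by hand, so this only illustrates the mechanics and is not the pipeline above.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for MNIST: 100 fake "images" with pixel values in [0, 1),
# roughly the range that ToTensor() produces.
fake_images = torch.rand(100, 1, 28, 28)
fake_labels = torch.randint(0, 10, (100,))

# Normalize((0.5,), (0.5,)) computes (x - mean) / std per channel,
# which maps [0, 1] onto [-1, 1].
normalized = (fake_images - 0.5) / 0.5

loader = DataLoader(TensorDataset(normalized, fake_labels), batch_size=32, shuffle=True)

# Each iteration yields one batch; the last one holds the leftover samples.
batch_shapes = [tuple(images.shape) for images, _ in loader]
print(batch_shapes)
```

With 100 samples and a batch size of 32, the loader yields three full batches and a final batch of 4.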



## Define and create your model

In [7]:
# MLP model
class MLP(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(MLP, self).__init__()
        self.input_size = input_size
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_size)

    # Defines the forward pass.
    def forward(self, x):
        x = x.view(-1, self.input_size)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

# Could also be written as this:
# class MLP(nn.Module):
#     def __init__(self, input_size, hidden_size, output_size):
#         super(MLP, self).__init__()
#         self.model = nn.Sequential(
#             nn.Linear(input_size, hidden_size),
#             nn.ReLU(),
#             nn.Linear(hidden_size, output_size)
#         )

#     def forward(self, x):
#         x = x.view(x.size(0), -1)  # flatten each image without relying on a global input_size
#         x = self.model(x)
#         return x


Appending `.cuda()` moves your model to the GPU. It is commented out here because no GPU is available in this session.

In [8]:
model = MLP(input_size, hidden_size, output_size) #.cuda()

Use `torch-summary` for more information about the model.

In [9]:
# !pip install torch-summary

In [10]:
from torchsummary import summary

_ = summary(model, (28, 28))  # device=torch.device("cuda")

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                   [-1, 64]          50,240
              ReLU-2                   [-1, 64]               0
            Linear-3                   [-1, 10]             650
Total params: 50,890
Trainable params: 50,890
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.19
Estimated Total Size (MB): 0.20
----------------------------------------------------------------
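The parameter counts in the summary can be checked by hand: a fully connected layer has one weight per input-output pair plus one bias per output. A minimal sketch (`linear_params` is a hypothetical helper, not a PyTorch function):

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features

fc1_params = linear_params(28 * 28, 64)  # input layer -> hidden layer
fc2_params = linear_params(64, 10)       # hidden layer -> output layer
total_params = fc1_params + fc2_params

print(fc1_params, fc2_params, total_params)  # 50240 650 50890
```

Because the first layer dominates the count, changing `hidden_size` scales the model size almost linearly.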


## Loss

For classification, we use cross-entropy.

In [11]:
criterion = nn.CrossEntropyLoss()
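`nn.CrossEntropyLoss` expects raw scores (logits) and integer class labels, and applies the softmax internally, which is why the MLP above has no softmax layer. The sketch below, using invented numbers, checks this against a by-hand computation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])  # raw scores, no softmax applied
targets = torch.tensor([0, 2])             # class indices, not one-hot vectors

loss_builtin = nn.CrossEntropyLoss()(logits, targets)

# The same quantity by hand: mean negative log-probability of the true class.
log_probs = F.log_softmax(logits, dim=1)
loss_manual = -log_probs[torch.arange(len(targets)), targets].mean()

print(loss_builtin.item(), loss_manual.item())
```

The two values agree up to floating-point error. By default the loss is averaged over the batch.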

## Optimizer

In [12]:
optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
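When comparing optimizers, it can help to select one by name so that the choice becomes just another hyper-parameter. The helper below is a hypothetical sketch; `make_optimizer` is not part of PyTorch, and the momentum value of 0.9 for SGD is an assumed default.

```python
import torch.nn as nn
import torch.optim as optim

def make_optimizer(name, params, lr, weight_decay=0.0):
    """Hypothetical helper: build an optimizer from a string name."""
    if name == "adam":
        return optim.Adam(params, lr=lr, weight_decay=weight_decay)
    if name == "sgd":
        # momentum=0.9 is an assumed, commonly used value
        return optim.SGD(params, lr=lr, momentum=0.9, weight_decay=weight_decay)
    raise ValueError(f"Unknown optimizer: {name}")

toy_model = nn.Linear(4, 2)  # stand-in for the MLP
sgd = make_optimizer("sgd", toy_model.parameters(), lr=0.01)
print(type(sgd).__name__)
```

A learning rate that works for one optimizer is often a poor choice for another, so it is usually worth tuning the two together.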

## Training!

In [13]:
# wandb.init(
#     # Set the project where this run will be logged
#     project="First PyTorch NN",
#     # We pass a run name (otherwise it’ll be randomly assigned, like sunshine-lollypop-10)
#     name=experiment_name,
#     # Track hyperparameters and run metadata
#     config={
#     "learning_rate": learning_rate,
#     "epochs": epochs,
#     "batch_size": batch_size,
#     "weight_decay": weight_decay,
#     "notes for me": "This is a very lovely experiment. please work!"
#     })

Define a helper that calculates accuracy on any data loader (training, validation, or test).

In [14]:
# !pip install -U torchmetrics
# import torchmetrics

def get_accuracy(dataloader, model):
    # Use whichever device the model lives on, so this works on CPU and GPU alike.
    device = next(model.parameters()).device

    acc = 0
    # <OR>
    # acc = torchmetrics.Accuracy(task="multiclass", num_classes=output_size).to(device)

    with torch.no_grad():
        for images, labels in dataloader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)  # get predictions

            # Update accuracy for this batch
            acc = acc + torch.sum(torch.argmax(outputs, dim=1) == labels)
            # <OR>
            # acc.update(outputs, labels)

    # Compute the accuracy
    acc = acc / len(dataloader.dataset)  # normalizes by the number of samples
    # <OR>
    # acc = acc.compute()

    return acc
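The core of the accuracy computation is an `argmax` over the class scores followed by a comparison with the labels. A tiny example with invented numbers shows the individual steps:

```python
import torch

logits = torch.tensor([[0.1, 2.0, -1.0],
                       [1.5, 0.3, 0.2],
                       [0.0, 0.1, 0.9],
                       [2.2, 0.1, 0.4]])
labels = torch.tensor([1, 0, 1, 0])

predictions = torch.argmax(logits, dim=1)    # index of the largest score in each row
num_correct = (predictions == labels).sum()  # True counts as 1
accuracy = num_correct.item() / len(labels)

print(predictions.tolist(), accuracy)  # [1, 0, 2, 0] 0.75
```

Three of the four predictions match their labels, giving an accuracy of 0.75.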



In [15]:
def get_loss(loader):
    with torch.no_grad():  # Disables gradient tracking, so use this for evaluation only, not training!
        loss = 0
        for images, labels in loader:  # The batches.
            # step 1: move data to cuda. Make sure the model is on cuda too!
            # images = images.cuda()
            # labels = labels.cuda()

            # step 2: Forward pass
            outputs = model(images)

            # step 3: accumulate the loss. criterion returns the batch mean,
            # so weight it by the batch size before summing.
            loss = loss + criterion(outputs, labels).item() * labels.size(0)
    return loss / len(loader.dataset)  # mean loss per sample

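Early-stopping class. `early_stop` returns `True` once the validation loss has been worse than its best value for `patience` checks since the last improvement.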

In [16]:
class EarlyStopper:
    def __init__(self, patience=1):
        self.patience = patience
        self.counter = 0
        self.min_validation_loss = float('inf')

    def early_stop(self, validation_loss):
        if validation_loss < self.min_validation_loss:
            self.min_validation_loss = validation_loss
            self.counter = 0
        elif validation_loss > self.min_validation_loss:
            self.counter += 1
            if self.counter >= self.patience:
                return True
        return False
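As a sketch of how the class might be used, the example below feeds it a made-up sequence of validation losses in place of real training. The class is repeated so that the snippet runs on its own. In the notebook, the value passed in would come from something like `get_loss` on a validation loader.

```python
class EarlyStopper:
    def __init__(self, patience=1):
        self.patience = patience
        self.counter = 0
        self.min_validation_loss = float('inf')

    def early_stop(self, validation_loss):
        if validation_loss < self.min_validation_loss:
            self.min_validation_loss = validation_loss
            self.counter = 0
        elif validation_loss > self.min_validation_loss:
            self.counter += 1
            if self.counter >= self.patience:
                return True
        return False

# Invented validation losses, one per epoch.
simulated_val_losses = [0.90, 0.70, 0.65, 0.66, 0.64, 0.67, 0.68, 0.69]

stopper = EarlyStopper(patience=2)
stopped_at = None
for epoch, val_loss in enumerate(simulated_val_losses, start=1):
    if stopper.early_stop(val_loss):
        stopped_at = epoch
        break  # in a real loop, this would end training

print(stopped_at, stopper.min_validation_loss)  # 7 0.64
```

The single bad epoch 4 does not trigger a stop because epoch 5 sets a new best and resets the counter. Epochs 6 and 7 are both worse than 0.64, so training stops at epoch 7.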

Train!

In [17]:


# Training loop
for epoch in range(epochs): # The epochs.
    for i, (images, labels) in enumerate(train_loader): # The batches.
        # step 1: Zero out the gradients.
        optimizer.zero_grad()

        # step 1.1 move data to cuda. Make sure the model is on cuda too!
        # images = images.cuda()
        # labels = labels.cuda()

        # step2: Forward pass
        outputs = model(images)

        # step 3: calculate the loss.
        loss = criterion(outputs, labels)

        # step 4: Backward pass
        loss.backward()

        # step 5: Update the weights
        optimizer.step()

        # Print the loss every 100 batches
        if i % 100 == 0:
            print("Epoch", epoch + 1, " batch", i + 1, ". Training Loss: ", loss.item())
            # wandb.log({"loss": loss.item()})

    # Compute train and validation accuracy after each epoch
    train_acc = get_accuracy(train_loader, model)
    val_acc = get_accuracy(val_loader, model)

    print(f'Epoch [{epoch + 1}/{epochs}], Train Accuracy: {train_acc.item():.4f}, Validation Accuracy: {val_acc.item():.4f}')
    # wandb.log({"epoch": epoch + 1, "train_accuracy": train_acc.item(), "val_accuracy": val_acc.item()})



Epoch 1  batch 1 . Training Loss:  2.381603479385376
Epoch 1  batch 101 . Training Loss:  0.8024726510047913
Epoch 1  batch 201 . Training Loss:  0.4466899037361145
Epoch 1  batch 301 . Training Loss:  0.4637675881385803
Epoch 1  batch 401 . Training Loss:  0.4477194547653198
Epoch 1  batch 501 . Training Loss:  0.3261856138706207


KeyboardInterrupt: 

Test

In [None]:
acc = get_accuracy(test_loader, model)

print(f'Test Accuracy: {acc.item():.4f}')
# wandb.summary['Test Accuracy'] = acc.item()

In [None]:
# wandb.finish()