# Lab 9: Techniques for Training Deep Neural Networks

```
- Machine Learning, Innopolis University (Fall semester 2023)
- Professor: Adil Khan
- Teaching Assistant: Gcinizwe Dlamini
```
<hr>


```
In this lab, you will practice techniques used to improve the performance of deep learning models in PyTorch.

Lab Plan
1. Data Augmentation examples
2. Batch normalization, Dropout, ...
3. Adaptive Learning rate and Optimizers
4. Using TensorBoard
5. Using Pretrained models (Transfer learning)

```
<hr>

# 1. CNN with PyTorch

## 1.1. Data Loading

In [None]:
import torch
import torch.nn as nn
# import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

batch_size = 32
test_batch_size = 100

# Transformations
data_transformations = transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))
                       ])

# Data Source
mnist_train = datasets.MNIST('../data', train=True, download=True,
                       transform=data_transformations)
mnist_test = datasets.MNIST('../data', train=False,
                            transform=data_transformations)


# Data loaders
train_loader = DataLoader(mnist_train,
                          batch_size=batch_size, shuffle=True)
test_loader = DataLoader(mnist_test,
                         batch_size=test_batch_size, shuffle=False)
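
The constants passed to `transforms.Normalize` are the commonly quoted mean (0.1307) and standard deviation (0.3081) of MNIST pixel intensities after `ToTensor` scales them to [0, 1]. The sketch below reproduces the per-pixel arithmetic in plain Python. The helper name `normalize_pixel` is made up for this illustration and is not part of torchvision.

```python
# Illustrative only: the same arithmetic Normalize applies to each pixel,
# written in plain Python. `normalize_pixel` is a hypothetical helper.
MNIST_MEAN = 0.1307  # value used in the cell above
MNIST_STD = 0.3081   # value used in the cell above

def normalize_pixel(value, mean=MNIST_MEAN, std=MNIST_STD):
    """Return (value - mean) / std for a pixel already scaled to [0, 1]."""
    return (value - mean) / std

background = normalize_pixel(0.0)  # black background pixel
stroke = normalize_pixel(1.0)      # fully white stroke pixel
print(f"background -> {background:.4f}, stroke -> {stroke:.4f}")
```

After this step the inputs are roughly centered around zero, which tends to make gradient-based training better conditioned.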


## 1.2 Define CNN model

In [None]:
class CNN(nn.Module):
    # Convolution formula: ((n + 2p - f) / s) + 1

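    # The constructor must be named __init__; under any other name it is never
    # called, no layers are registered, and the model reports 0 parameters.
    def __init__(self):
        super(CNN, self).__init__()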
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.conv2_bn = nn.BatchNorm2d(20)
        self.fc1 = nn.Linear(320, 50)
        self.fc1_bn = nn.BatchNorm1d(50)
        self.fc2 = nn.Linear(50, 10)
        self.fc_drop = nn.Dropout(p=0.5)

    def forward(self, x):
        x = torch.relu(torch.max_pool2d(self.conv1(x), 2))
        #x = torch.relu(torch.max_pool2d(self.conv2(x), 2))
        x = torch.relu(torch.max_pool2d(self.conv2_bn(self.conv2(x)), 2))
        #x = torch.relu(torch.max_pool2d(self.conv2_bn(self.conv2_drop(self.conv2(x))), 2))
        x = x.view(-1, 320)
        x = torch.relu(self.fc1_bn(self.fc1(x)))
        #x = self.fc_drop(x)
        x = self.fc2(x)
        return torch.nn.functional.log_softmax(x, dim=1)

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
model_cnn = CNN().to(device)
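
The comment in the class states the convolution output-size formula, ((n + 2p - f) / s) + 1. As a rough check, the sketch below applies it to a 28x28 MNIST image to show where the 320 input features of `fc1` come from. The helper `conv_out` is a hypothetical name, and the sketch assumes the settings used above: no padding, stride 1 for the convolutions, and a pooling stride equal to the pooling window.

```python
def conv_out(n, f, p=0, s=1):
    """Output side length: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

side = 28                          # MNIST images are 28x28
side = conv_out(side, f=5)         # conv1, 5x5 kernel   -> 24
side = conv_out(side, f=2, s=2)    # 2x2 max pooling     -> 12
side = conv_out(side, f=5)         # conv2, 5x5 kernel   -> 8
side = conv_out(side, f=2, s=2)    # 2x2 max pooling     -> 4

channels = 20                      # out_channels of conv2
flat_features = channels * side * side
print(flat_features)               # 320
```

If a kernel size or the input resolution changes, the same calculation gives the new value to use in `x.view(-1, ...)` and in the first linear layer.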

## 2.2 Fully-connected model from the last class

In [None]:
import torch.nn.functional as F

In [None]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=10,
                               kernel_size=5,
                               stride=1)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_bn = nn.BatchNorm2d(20)
        self.dense1 = nn.Linear(in_features=320, out_features=50)
        self.dense1_bn = nn.BatchNorm1d(50)
        self.dense2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_bn(self.conv2(x)), 2))
        x = x.view(-1, 320) #reshape
        x = F.relu(self.dense1_bn(self.dense1(x)))
        x = F.relu(self.dense2(x))
        return F.log_softmax(x, dim=1)  # pass dim explicitly; omitting it triggers a deprecation warning

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
model_nn = Net().to(device)
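
Both models place batch normalization after `conv2` and after the first linear layer. As a minimal sketch of what such a layer computes in training mode, the plain-Python function below standardizes one feature across a batch using the batch mean and the biased batch variance, then applies a scale and a shift. The function name `batch_norm_1feature` and the sample values are assumptions for illustration; `gamma`, `beta`, and `eps` mirror the usual initial values (1, 0, and 1e-5).

```python
import math

def batch_norm_1feature(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a batch, then scale and shift."""
    mean = sum(values) / len(values)
    # Biased (population) variance, as used for normalization during training
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

batch = [2.0, 4.0, 6.0, 8.0]       # hypothetical activations of one feature
normalized = batch_norm_1feature(batch)
print([round(v, 3) for v in normalized])
```

In evaluation mode (`model.eval()`), the layer uses running averages collected during training instead of the statistics of the current batch.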

In [None]:
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

In [None]:
# Let's compare the number of parameters of these models:
print("Number of params in the Fully-connected model:", count_parameters(model_nn))
print("Number of params in the CNN model:", count_parameters(model_cnn))

Number of params in the Fully-connected model: 21980
Number of params in the CNN model: 0
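
The figure of 21980 trainable parameters reported above for `model_nn` can be checked by hand from the layer shapes. The sketch below does the arithmetic in plain Python for the layers defined in `Net`; the helper names are hypothetical. Each convolution has `in * out * k * k` weights plus `out` biases, each linear layer has `in * out + out`, and each batch-norm layer has a scale and a shift per feature.

```python
def conv2d_params(c_in, c_out, k):
    return c_in * c_out * k * k + c_out   # weights + biases

def linear_params(n_in, n_out):
    return n_in * n_out + n_out           # weights + biases

def batchnorm_params(n_features):
    return 2 * n_features                 # learnable scale and shift

layers = {
    "conv1": conv2d_params(1, 10, 5),     # 260
    "conv2": conv2d_params(10, 20, 5),    # 5020
    "conv2_bn": batchnorm_params(20),     # 40
    "dense1": linear_params(320, 50),     # 16050
    "dense1_bn": batchnorm_params(50),    # 100
    "dense2": linear_params(50, 10),      # 510
}
total = sum(layers.values())
print(total)                              # 21980
```

Dropout layers add no parameters, and the running mean and variance of batch normalization are buffers rather than parameters, so neither appears in `count_parameters`.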


Task: Try changing the fully-connected model so that it has the same number of parameters as the CNN, then compare the resulting performance.

## 3. Training and testing

In [None]:
def train(model, device, train_loader, optimizer, epoch, log_interval=700):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = torch.nn.functional.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                       100. * batch_idx / len(train_loader), loss.item()))

def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += torch.nn.functional.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.2f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
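
Both functions pair `nll_loss` with models that end in `log_softmax`; together the two steps amount to the cross-entropy loss. A minimal plain-Python sketch for a single sample follows, with made-up logits and hypothetical helper names. It is meant to show the arithmetic, not to replace the library functions.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax for one sample."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def nll(log_probs, target):
    """Negative log-likelihood of the target class."""
    return -log_probs[target]

logits = [2.0, 0.5, -1.0]          # made-up scores for three classes
log_probs = log_softmax(logits)
loss = nll(log_probs, target=0)
prediction = max(range(len(log_probs)), key=log_probs.__getitem__)  # argmax
print(f"loss={loss:.4f}, predicted class={prediction}")
```

The `argmax` over log-probabilities picks the same class as an `argmax` over the raw logits, which is why `test` can compare `output.argmax(dim=1)` with the targets directly.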

## Training the CNN model

In [None]:
epochs = 10
lr = 0.01
momentum = 0.5
log_interval = 700
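
To make the roles of `lr` and `momentum` concrete, here is a minimal plain-Python sketch of the momentum update in the form documented for `optim.SGD`, without dampening, weight decay, or Nesterov momentum: the velocity accumulates gradients, and the parameter moves against the velocity. The toy objective f(x) = x^2 and the function name `sgd_momentum_step` are assumptions made for illustration.

```python
def sgd_momentum_step(param, grad, velocity, lr=0.01, momentum=0.5):
    """One update: v <- momentum * v + grad; param <- param - lr * v."""
    velocity = momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity

x, v = 5.0, 0.0                    # arbitrary start, zero initial velocity
for step in range(500):
    grad = 2.0 * x                 # gradient of the toy objective f(x) = x^2
    x, v = sgd_momentum_step(x, grad, v)

print(f"x after 500 steps: {x:.2e}")
```

With a steady gradient the velocity settles near `grad / (1 - momentum)`, so `momentum = 0.5` behaves roughly like doubling the step size in directions where the gradient keeps its sign.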

In [None]:
# training CNN model
model = model_cnn
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)

for epoch in range(1, epochs + 1):
    train(model, device, train_loader, optimizer, epoch, log_interval)
    test(model, device, test_loader)


Test set: Average loss: 0.1318, Accuracy: 9599/10000 (95.99%)


Test set: Average loss: 0.0957, Accuracy: 9707/10000 (97.07%)


Test set: Average loss: 0.0734, Accuracy: 9776/10000 (97.76%)


Test set: Average loss: 0.0686, Accuracy: 9781/10000 (97.81%)


Test set: Average loss: 0.0592, Accuracy: 9826/10000 (98.26%)


Test set: Average loss: 0.0539, Accuracy: 9831/10000 (98.31%)


Test set: Average loss: 0.0531, Accuracy: 9830/10000 (98.30%)


Test set: Average loss: 0.0498, Accuracy: 9835/10000 (98.35%)


Test set: Average loss: 0.0445, Accuracy: 9862/10000 (98.62%)


Test set: Average loss: 0.0463, Accuracy: 9854/10000 (98.54%)



## Training the fully-connected model

In [None]:
model = model_nn
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)

for epoch in range(1, epochs + 1):
    train(model, device, train_loader, optimizer, epoch, log_interval)
    test(model, device, test_loader)

torch.save(model.state_dict(), "mnist_nn.pt")




Test set: Average loss: 0.0638, Accuracy: 9841/10000 (98.41%)


Test set: Average loss: 0.0542, Accuracy: 9858/10000 (98.58%)


Test set: Average loss: 0.0342, Accuracy: 9905/10000 (99.05%)


Test set: Average loss: 0.0357, Accuracy: 9888/10000 (98.88%)


Test set: Average loss: 0.0378, Accuracy: 9880/10000 (98.80%)


Test set: Average loss: 0.0281, Accuracy: 9911/10000 (99.11%)


Test set: Average loss: 0.0267, Accuracy: 9910/10000 (99.10%)


Test set: Average loss: 0.0277, Accuracy: 9917/10000 (99.17%)


Test set: Average loss: 0.0259, Accuracy: 9914/10000 (99.14%)


Test set: Average loss: 0.0267, Accuracy: 9915/10000 (99.15%)



## Self-practice Task
