<center><img src='https://drive.google.com/uc?id=1_utx_ZGclmCwNttSe40kYA6VHzNocdET' height="60"></center>

AI TECH - Akademia Innowacyjnych Zastosowań Technologii Cyfrowych. Program Operacyjny Polska Cyfrowa na lata 2014-2020
<hr>

<center><img src='https://drive.google.com/uc?id=1BXZ0u3562N_MqCLcekI-Ens77Kk4LpPm'></center>

<center>
Projekt współfinansowany ze środków Unii Europejskiej w ramach Europejskiego Funduszu Rozwoju Regionalnego
Program Operacyjny Polska Cyfrowa na lata 2014-2020,
Oś Priorytetowa nr 3 "Cyfrowe kompetencje społeczeństwa" Działanie  nr 3.2 "Innowacyjne rozwiązania na rzecz aktywizacji cyfrowej"
Tytuł projektu:  „Akademia Innowacyjnych Zastosowań Technologii Cyfrowych (AI Tech)”
    </center>

Code based on https://github.com/pytorch/examples/blob/master/mnist/main.py

In this exercise we are using high level abstractions from torch.nn like nn.Linear.
Note: during the next lab session we will go one level deeper and implement more things
with bare hands.

Tasks:

    1. Read the code.

    2. Check that the given implementation reaches 95% test accuracy for architecture input-128-128-10 after few epochs.

    3. Add the option to use SGD with momentum instead of ADAM.

    4. Experiment with different learning rates, plot the learning curves for different
    learning rates for both ADAM and SGD with momentum.

    5. Parameterize the constructor by a list of sizes of hidden layers of the MLP.
    Note that this requires creating a list of layers as an atribute of the Net class,
    and one can't use a standard python list containing nn.Modules (why?).
    Check torch.nn.ModuleList.


In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

In [2]:
class Net(nn.Module):
    def __init__(self, hidden_layers_sizes=[128, 128]):
        super(Net, self).__init__()
        # After flattening an image of size 28x28 we have 784 inputs
        self.layers = nn.ModuleList()
        self.layers.append(nn.Linear(784, hidden_layers_sizes[0]))
        self.layers.extend([nn.Linear(hidden_layers_sizes[i], hidden_layers_sizes[i+1]) for i in range(len(hidden_layers_sizes)-1)])
        self.layers.append(nn.Linear(hidden_layers_sizes[-1], 10))
        
    def forward(self, x):
        x = torch.flatten(x, 1)
        for f in self.layers:
            x = f(x)
            x = F.relu(x)
        output = F.log_softmax(x, dim=1)
        return output


def train(model, device, train_loader, optimizer, epoch, log_interval):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
    return loss.item()


def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))



In [3]:
batch_size = 256
test_batch_size = 1000
epochs = 5
lr = 1e-2
seed = 1
log_interval = 10
use_cuda = torch.cuda.is_available()

In [4]:
torch.manual_seed(seed)
device = torch.device("cuda" if use_cuda else "cpu")

train_kwargs = {'batch_size': batch_size}
test_kwargs = {'batch_size': test_batch_size}
if use_cuda:
    cuda_kwargs = {'num_workers': 1,
                    'pin_memory': True,
                    'shuffle': True}
    train_kwargs.update(cuda_kwargs)
    test_kwargs.update(cuda_kwargs)

In [5]:
transform=transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
    ])
dataset1 = datasets.MNIST('../data', train=True, download=True,
                    transform=transform)
dataset2 = datasets.MNIST('../data', train=False,
                    transform=transform)
train_loader = torch.utils.data.DataLoader(dataset1,**train_kwargs)
test_loader = torch.utils.data.DataLoader(dataset2, **test_kwargs)

In [6]:
model = Net().to(device)
optimizer = optim.Adam(model.parameters(), lr=lr)

for epoch in range(1, epochs + 1):
    train(model, device, train_loader, optimizer, epoch, log_interval)
    test(model, device, test_loader)


Test set: Average loss: 1.0139, Accuracy: 5943/10000 (59%)


Test set: Average loss: 0.9963, Accuracy: 5966/10000 (60%)


Test set: Average loss: 0.9740, Accuracy: 5973/10000 (60%)


Test set: Average loss: 0.9779, Accuracy: 5987/10000 (60%)


Test set: Average loss: 0.9880, Accuracy: 5981/10000 (60%)


In [7]:
 # 3. Add the option to use SGD with momentum instead of ADAM.
momentum = 0.9
model_with_momentum = Net().to(device)
optimizer = optim.SGD(model_with_momentum.parameters(), lr=lr, momentum=momentum)

for epoch in range(1, epochs + 1):
    train(model_with_momentum, device, train_loader, optimizer, epoch, log_interval)
    test(model_with_momentum, device, test_loader)


Test set: Average loss: 0.5025, Accuracy: 8345/10000 (83%)


Test set: Average loss: 0.4144, Accuracy: 8566/10000 (86%)


Test set: Average loss: 0.3764, Accuracy: 8653/10000 (87%)


Test set: Average loss: 0.3523, Accuracy: 8701/10000 (87%)


Test set: Average loss: 0.3359, Accuracy: 8746/10000 (87%)


In [8]:
### Your code goes here ###
%env CLEARML_WEB_HOST=https://app.clear.ml
%env CLEARML_API_HOST=https://api.clear.ml
%env CLEARML_FILES_HOST=https://files.clear.ml
%env CLEARML_API_ACCESS_KEY=DHUSDP5H5QRF6ZJGXYAS
%env CLEARML_API_SECRET_KEY=w7yowOv0p7WHKK4tU3AlEYNfG1gNU80jx2XwS0z3w9XaSaMnPR
###########################

from clearml import Task
task = Task.init(project_name="michal_mierzejewski_lab4", task_name="pytorch")

env: CLEARML_WEB_HOST=https://app.clear.ml
env: CLEARML_API_HOST=https://api.clear.ml
env: CLEARML_FILES_HOST=https://files.clear.ml
env: CLEARML_API_ACCESS_KEY=DHUSDP5H5QRF6ZJGXYAS
env: CLEARML_API_SECRET_KEY=w7yowOv0p7WHKK4tU3AlEYNfG1gNU80jx2XwS0z3w9XaSaMnPR
ClearML Task: created new task id=2b638735f84145d09710dd43d19e7488
2023-11-16 13:02:29,715 - clearml.Task - INFO - Storing jupyter notebook directly as code
ClearML results page: https://app.clear.ml/projects/3d1e60ba8ae5463ead62d15e76117409/experiments/2b638735f84145d09710dd43d19e7488/output/log


In [None]:
# 4. Experiment with different learning rates, plot the learning curves for different
# learning rates for both ADAM and SGD with momentum.
import matplotlib.pyplot as plt
grid = {'lr': [1e-4]}#, 1e-3, 1e-2, 1e-1, 1]}
for lr in grid['lr']:
    losses = []
    model = Net().to(device)
    optimizer = optim.Adam(model.parameters(), lr=lr)
    for epoch in range(1, epochs + 1):
        losses.append(train(model, device, train_loader, optimizer, epoch, log_interval))
        test(model, device, test_loader)
    print(losses)
    plt.plot(losses, label=f'lr={lr}')
    plt.show()
    
        
for lr in grid['lr']:
    losses = []
    model_with_momentum = Net().to(device)
    optimizer = optim.SGD(model_with_momentum.parameters(), lr=lr, momentum=momentum)
    for epoch in range(1, epochs + 1):
        losses.append(train(model_with_momentum, device, train_loader, optimizer, epoch, log_interval))
        test(model_with_momentum, device, test_loader)
    print(losses)
    plt.plot(losses, label=f'lr={lr}')
    plt.show()




In [None]:
"""5. Parameterize the constructor by a list of sizes of hidden layers of the MLP.
Note that this requires creating a list of layers as an atribute of the Net class,
and one can't use a standard python list containing nn.Modules (why?).
Check torch.nn.ModuleList."""
