# Lab. 2 Multi Layered Networks

### Loading data

PyTorch, or more precisely the `torchvision` package, provides several useful utilities that we will use in today's class.

Let's start with downloading and loading the data. [`torchvision.datasets`](https://pytorch.org/docs/stable/torchvision/datasets.html) contains popular datasets; today we will work with MNIST.

In [1]:
import torch
from torchvision.datasets import MNIST
from torchvision.transforms import Compose, ToTensor, Normalize, Lambda

train_dataset = MNIST(root='.', download=True, train=True, transform=ToTensor())

train_mean = (train_dataset.data.type(torch.float32) / 255).mean()
train_std = (train_dataset.data.type(torch.float32) / 255).std()

print('Training data mean: {}'.format(train_mean))
print('Training data std: {}'.format(train_std))

Training data mean: 0.13054749369621277
Training data std: 0.30810782313346863


In [2]:
transform = Compose([
    ToTensor(),
    Normalize(mean=[train_mean], std=[train_std]),
    Lambda(lambda x: x.flatten())
])

train_data = MNIST(root='.', download=True, train=True, transform=transform)
test_data = MNIST(root='.', download=True, train=False, transform=transform)

train_data

Dataset MNIST
    Number of datapoints: 60000
    Split: train
    Root Location: .
    Transforms (if any): Compose(
                             ToTensor()
                             Normalize(mean=[tensor(0.1305)], std=[tensor(0.3081)])
                             Lambda()
                         )
    Target Transforms (if any): None

In addition, `torch` itself provides the [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), which handles many useful things for us, such as shuffling and batching the data.
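
To make batching and shuffling concrete, here is a minimal sketch on a synthetic ten-element dataset. The `TensorDataset`, the toy values, and the seeded `torch.Generator` are assumptions made for illustration; they are not part of the lab code.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset: ten one-dimensional examples whose label equals their index.
features = torch.arange(10, dtype=torch.float32).unsqueeze(1)  # shape (10, 1)
labels = torch.arange(10)
toy_data = TensorDataset(features, labels)

# Without shuffling, batches follow the dataset order; the last batch is smaller.
ordered = DataLoader(toy_data, batch_size=4)
ordered_sizes = [len(y) for _, y in ordered]
first_ordered_labels = next(iter(ordered))[1].tolist()

# With shuffle=True the order is permuted, but every example still appears once per pass.
generator = torch.Generator().manual_seed(0)
shuffled = DataLoader(toy_data, batch_size=4, shuffle=True, generator=generator)
seen = torch.cat([y for _, y in shuffled])

print(ordered_sizes)         # batch sizes in one pass
print(first_ordered_labels)  # labels of the first unshuffled batch
print(sorted(seen.tolist())) # all ten labels, each exactly once
```

Shuffling is off by default, so the loaders in this notebook iterate over the data in dataset order.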

In [3]:
from torch.utils.data import DataLoader

train_loader = DataLoader(train_data, batch_size=10)

for x, y in train_loader:
    print(x.shape)
    print(x.dtype)
    print(y)
    break

torch.Size([10, 784])
torch.float32
tensor([5, 0, 4, 1, 9, 2, 1, 3, 1, 4])


It looks like we do not get everything for free: without a `transform`, the `MNIST` class returns the data as [PIL](https://pillow.readthedocs.io/en/stable/) objects. We have to do something about that.

## Task 1.
1. Using [`transforms`](https://pytorch.org/docs/stable/torchvision/transforms.html), modify the code above so that it works.  
**HINT**: check which arguments the `MNIST` class accepts.
2. Compute the mean and standard deviation of a single pixel value over the entire training set and use them to normalize the training data.  
**HINT**: torchvision should make this easier too.
3. Change the "shape" of a single example from `28x28` to `784`.  
**HINT**: [`Lambda`](https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.Lambda)

Note: pay attention to what exactly the _transforms_ you use do!

## Task 2.

Manually implement a simple network with one hidden layer. The network should have:
1. One hidden layer of size 500 with weights initialized from the standard normal distribution.
2. Learned _biases_, initialized to 0, in both operations of the layer (input to hidden and hidden to output).

**HINT**: For the normal distribution it is best to use [`torch.randn`](https://pytorch.org/docs/stable/torch.html#torch.randn). Check which important arguments this function accepts!

In addition, implement a training loop using PyTorch's _cross entropy_ loss function and the SGD optimizer.
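
A small sketch of what `cross_entropy` expects and computes may help here. It takes raw, unnormalized scores (logits) and integer class indices, and it matches a manual log-softmax followed by the mean negative log-probability of the true class. The tensor sizes and the seed below are illustrative assumptions.

```python
import torch
from torch.nn.functional import cross_entropy, log_softmax

torch.manual_seed(0)
logits = torch.randn(4, 10)            # raw scores for 4 examples and 10 classes
targets = torch.tensor([3, 0, 9, 1])   # integer class indices, not one-hot vectors

# Built-in loss: applies log-softmax internally and averages over the batch.
builtin = cross_entropy(logits, targets)

# Manual equivalent: log-softmax, select the true-class entry, negate, average.
log_probs = log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(4), targets].mean()

print(torch.allclose(builtin, manual))
```

Since the softmax is part of the loss, the network itself only needs to output one unnormalized score per class.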

In [4]:
from typing import List

class CustomNetwork(object):
    """
    Simple 1-hidden layer linear neural network
    """
    def __init__(self, input_size, n_classes, hidden_layer_size):
        """
        Initialize network's weights 
        """
        self.weight_1: torch.Tensor = torch.randn(size=(input_size, hidden_layer_size), requires_grad=True)
        self.bias_1: torch.Tensor = torch.zeros(size=(hidden_layer_size,), requires_grad=True)        
        self.weight_2: torch.Tensor = torch.randn(size=(hidden_layer_size, n_classes), requires_grad=True)
        self.bias_2: torch.Tensor = torch.zeros(size=(n_classes,), requires_grad=True)

        
    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        """
        Forward pass through the network
        """
        out1 = x.mm(self.weight_1) + self.bias_1
        out2 = out1.mm(self.weight_2) + self.bias_2
        return out2
    
    
    def parameters(self) -> List[torch.Tensor]:
        """
        Returns all trainable parameters 
        """
        return [self.weight_1, self.bias_1, self.weight_2, self.bias_2]
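
One property of this network is worth checking numerically: with no non-linearity between them, two stacked affine layers compute the same function as a single affine layer. The sketch below uses small, arbitrary sizes and `float64` (both assumptions, chosen to keep the comparison tight) rather than the lab's actual dimensions.

```python
import torch

torch.manual_seed(0)
x = torch.randn(5, 8, dtype=torch.float64)
w1 = torch.randn(8, 6, dtype=torch.float64)
b1 = torch.randn(6, dtype=torch.float64)
w2 = torch.randn(6, 3, dtype=torch.float64)
b2 = torch.randn(3, dtype=torch.float64)

# Two stacked affine layers, written the same way as in CustomNetwork.__call__.
two_layer = (x.mm(w1) + b1).mm(w2) + b2

# The same function expressed as one affine layer with merged weights and bias.
w = w1.mm(w2)
b = b1 @ w2 + b2
one_layer = x.mm(w) + b

print(torch.allclose(two_layer, one_layer))
```

This is why the hidden layer alone does not make the model more expressive than a linear classifier, and why Task 3 asks for non-linear activations.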

In [5]:
from torch.optim import SGD
from torch.nn.functional import cross_entropy

# some hyperparams
batch_size: int = 64
epoch: int = 3
lr: float = 0.01
momentum: float = 0.9
input_size: int = 784
n_classes : int = 10
hidden_layer_size : int = 500

# prepare data loaders, based on the already loaded datasets
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size)

# initialize the model
model: CustomNetwork = CustomNetwork(input_size=input_size,
                                     n_classes=n_classes,
                                     hidden_layer_size=hidden_layer_size)

# initialize the optimizer
optimizer: torch.optim.Optimizer = SGD(model.parameters(), lr=lr, momentum=momentum)

# training loop
for e in range(epoch):
    for i, (x, y) in enumerate(train_loader):
        # reset the gradients from previous iteration
        optimizer.zero_grad()
        # pass through the network
        output: torch.Tensor = model(x)
        # calculate loss
        loss: torch.Tensor = cross_entropy(output, y)
        # backward pass through the network
        loss.backward() 
        # apply the gradients
        optimizer.step()
        # log the loss value
        if (i + 1) % 100 == 0:
            print(f"Epoch {e} iter {i+1}/{len(train_data) // batch_size} loss: {loss.item()}", end="\r")
            
    # at the end of an epoch run evaluation on the test set
    with torch.no_grad():
        # initialize the number of correct predictions
        correct: int = 0
        for x, y in test_loader:
            # pass through the network
            output: torch.Tensor = model(x)
            # update the number of correctly predicted examples
            correct += (output.argmax(dim=1) == y).sum().item()

        print(f"\nTest accuracy: {correct / len(test_data)}")

        
# this is your test
assert correct / len(test_data) > 0.8, "Subject to random seed you should be able to get >80% accuracy"

Epoch 0 iter 900/937 loss: 42.645645141601566
Test accuracy: 0.8298
Epoch 1 iter 900/937 loss: 35.844032287597656
Test accuracy: 0.8401
Epoch 2 iter 900/937 loss: 27.460433959960938
Test accuracy: 0.8528
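
The denominator `937` in the log lines above comes from `len(train_data) // batch_size`. Assuming the `DataLoader` keeps the final partial batch (its default behaviour, `drop_last=False`), an epoch actually contains one more iteration. A short arithmetic sketch:

```python
import math

n_examples = 60000  # size of the MNIST training split reported earlier
batch_size = 64     # batch size used in the training loop

# Floor division is what the progress message prints as the total.
floor_batches = n_examples // batch_size
# Number of batches when the last, smaller batch is kept.
actual_batches = math.ceil(n_examples / batch_size)
# Size of that last batch.
last_batch_size = n_examples - floor_batches * batch_size

print(floor_batches, actual_batches, last_batch_size)  # prints: 937 938 32
```

`len(train_loader)` reports the number of batches directly, so it can be used as the denominator instead.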


## Task 3.

1. Rewrite the whole network in PyTorch using [`torch.nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) and [`torch.nn.Linear`](https://pytorch.org/docs/stable/nn.html#torch.nn.Linear).
2. Add [non-linear activations](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity) and an additional layer so as to reach at least 95% test accuracy within 3 epochs.

In [6]:
from torch.nn import LeakyReLU, ELU

class TorchNetwork(torch.nn.Module):
    """
    Simple 2-hidden layer non-linear neural network
    """
    def __init__(self, input_size, n_classes, hidden_layer_1_size, hidden_layer_2_size):
        super().__init__()
        self.linear1 = torch.nn.Linear(input_size, hidden_layer_1_size)
        self.linear2 = torch.nn.Linear(hidden_layer_1_size, hidden_layer_2_size)
        self.linear3 = torch.nn.Linear(hidden_layer_2_size, n_classes)
        self.activation1 = LeakyReLU(0.1)
        self.activation2 = ELU()
    
    
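    def forward(self, x):
        """
        Forward pass through the network
        """
        # no activation follows linear1, so linear1 and linear2 compose into one affine map
        hidden_1_out = self.linear1(x)
        # LeakyReLU is the only non-linearity before the output layer
        hidden_2_out = self.activation1(self.linear2(hidden_1_out))
        # ELU is applied to the output scores that are later passed to cross_entropy
        out = self.activation2(self.linear3(hidden_2_out))
        return out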
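
The forward pass above applies no activation after the first layer and applies `ELU` to the output scores. For comparison, a more conventional layout puts a non-linearity after every hidden layer and returns raw logits, since `cross_entropy` performs the log-softmax itself. The sketch below is a hypothetical variant, not the solution used in this notebook: the class name `ConventionalNetwork`, the default sizes, and the choice of `ReLU` are assumptions, and no accuracy is claimed for it.

```python
import torch
from torch import nn


class ConventionalNetwork(nn.Module):
    """Hypothetical variant: activation after each hidden layer, raw logits out."""

    def __init__(self, input_size=784, n_classes=10, hidden_1=500, hidden_2=300):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_size, hidden_1),
            nn.ReLU(),
            nn.Linear(hidden_1, hidden_2),
            nn.ReLU(),
            nn.Linear(hidden_2, n_classes),  # no activation: the loss expects logits
        )

    def forward(self, x):
        return self.layers(x)


sketch_model = ConventionalNetwork()
batch = torch.randn(4, 784)  # stand-in for four flattened MNIST images
logits = sketch_model(batch)
n_params = sum(p.numel() for p in sketch_model.parameters())

print(logits.shape)
print(n_params)
```

Registering the layers as attributes of an `nn.Module` is what lets `parameters()` find the weights and biases automatically, replacing the hand-written `parameters` method from Task 2.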

In [7]:
from torch.optim import SGD
from torch.nn.functional import cross_entropy

# some hyperparams
batch_size: int = 64
epoch: int = 3
lr: float = 0.01
momentum: float = 0.9
input_size: int = 784
n_classes : int = 10
hidden_layer_1_size : int = 500
hidden_layer_2_size : int = 300

# prepare data loaders, based on the already loaded datasets
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size)

# initialize the model
model: TorchNetwork = TorchNetwork(input_size=input_size, 
                                   n_classes=n_classes,
                                   hidden_layer_1_size=hidden_layer_1_size, 
                                   hidden_layer_2_size=hidden_layer_2_size)

# initialize the optimizer
optimizer: torch.optim.Optimizer = SGD(model.parameters(), lr=lr, momentum=momentum)

# training loop
for e in range(epoch):
    for i, (x, y) in enumerate(train_loader):
        
        optimizer.zero_grad()
        output: torch.Tensor = model(x)
        loss: torch.Tensor = cross_entropy(output, y)
        loss.backward()
        optimizer.step()
        if (i + 1) % 100 == 0:
            print(f"Epoch {e} iter {i+1}/{len(train_data) // batch_size} loss: {loss.item()}", end="\r")
            
    # at the end of an epoch run evaluation on the test set
    with torch.no_grad():
        correct: int = 0
        for x, y in test_loader:
            output: torch.Tensor = model(x)
            correct += (output.argmax(dim=1) == y).sum().item()

        print(f"\nTest accuracy: {correct / len(test_data)}")
            
            
# this is your test       
assert correct / len(test_data) > 0.95, "Subject to random seed you should be able to get >95% accuracy"

Epoch 0 iter 900/937 loss: 0.06910865753889084
Test accuracy: 0.9377
Epoch 1 iter 900/937 loss: 0.03264394402503967
Test accuracy: 0.9548
Epoch 2 iter 900/937 loss: 0.022021561861038208
Test accuracy: 0.9596
