
# Keywords: modules, optimizers, dense layer


# High level concepts

## Modules

Modules help organize and compose functions together with their inputs (weights).

In [0]:
from torch import nn
from torch.nn import init
from torch.nn.modules import loss
import torch

Some examples:

In [0]:
linear = nn.Linear(10, 10)
linear

In [0]:
linear(torch.tensor([1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,0.0]))
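
To make concrete what `nn.Linear` computes, the sketch below checks that calling the module matches the affine map `x @ W.T + b` written out by hand. The names `layer`, `module_out`, `manual_out` and the seed are illustrative and not part of the notebook.

```python
import torch
from torch import nn

torch.manual_seed(0)  # illustrative seed, only for reproducibility

layer = nn.Linear(10, 10)                  # same shape as `linear` above
x = torch.arange(10, dtype=torch.float32)

module_out = layer(x)
# nn.Linear stores its weight with shape (out_features, in_features)
manual_out = x @ layer.weight.T + layer.bias

print(torch.allclose(module_out, manual_out, atol=1e-6))
```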

In [0]:
relu = nn.ReLU()
relu


In [0]:
x = torch.tensor([-1.0])
relu(x)

In [0]:
tanh = nn.Tanh()
tanh

In [0]:
dropout = nn.Dropout(0.5)
dropout

In [0]:
sequential = nn.Sequential(nn.Linear(10, 100), nn.Tanh(), nn.Linear(100,100), nn.Tanh(), nn.Linear(100,10))
sequential

In [0]:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.lin1 = nn.Linear(10,100)
        self.act1 = nn.Tanh()
        self.lin2 = nn.Linear(100,100)
        self.lin3 = nn.Linear(100,100)
        self.lin4 = nn.Linear(100,10)
        
    def forward(self, x):
        x = self.lin1(x)
        x = self.act1(x)
        x = self.lin2(x)
        x = self.act1(x)
        x = self.lin3(x)
        x = self.act1(x)
        x = self.lin4(x)
        return x
net = Net()
net


In [0]:
cross_entropy = loss.CrossEntropyLoss()
cross_entropy


In [0]:
from torch.nn import Module

In [0]:
from torch.nn import Parameter

In [0]:
class Power(Module):

    __constants__ = ['exponent']

    def __init__(self, exponent=3):
        super().__init__()
        self.exponent = exponent

    def forward(self, input):
        return torch.pow(input, self.exponent)

    def extra_repr(self):
        return f'exponent={self.exponent}'

In [0]:
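import math

class WPower(Module):
    def __init__(self):
        super().__init__()
        # a single learnable exponent, registered as a module parameter
        self.exponent = Parameter(torch.empty(1))
        self.reset_parameters()

    def reset_parameters(self):
        # uniform_ requires a <= b; the range [0, sqrt(5)) is an assumed choice
        init.uniform_(self.exponent, a=0.0, b=math.sqrt(5))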

    def forward(self, input):
        return torch.pow(input, self.exponent)
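
The difference between a plain attribute and a `Parameter` can be seen in a small self-contained sketch. `FixedPower` and `LearnablePower` are hypothetical names that mirror the two modules above. Only the wrapped exponent shows up in `parameters()` and receives a gradient.

```python
import torch
from torch import nn

class FixedPower(nn.Module):
    """Exponent is a plain Python number: invisible to optimizers."""
    def __init__(self, exponent=3.0):
        super().__init__()
        self.exponent = exponent

    def forward(self, x):
        return torch.pow(x, self.exponent)

class LearnablePower(nn.Module):
    """Exponent is wrapped in nn.Parameter: registered and trainable."""
    def __init__(self, exponent=3.0):
        super().__init__()
        self.exponent = nn.Parameter(torch.tensor([exponent]))

    def forward(self, x):
        return torch.pow(x, self.exponent)

fixed = FixedPower()
learnable = LearnablePower()

n_fixed = len(list(fixed.parameters()))          # no registered parameters
n_learnable = len(list(learnable.parameters()))  # one registered parameter

# Gradient reaches the exponent: d/dp x**p = x**p * ln(x)
x = torch.tensor([2.0])
learnable(x).sum().backward()
print(n_fixed, n_learnable, learnable.exponent.grad)
```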


## Parameters

Some modules are not just functions: they also hold internal parameters (weights, which act as trainable graph inputs).

In [0]:
list(linear.parameters())


In [0]:
linear.weight


In [0]:
linear.bias


In [0]:
list(tanh.parameters())


In [0]:
list(dropout.parameters())


In [0]:
dropout.p 

In [0]:
list(cross_entropy.parameters())


In [0]:
list(map(lambda x: x.shape, list(sequential.parameters())))


In [0]:
list(map(lambda x: x.shape, list(net.parameters())))


In [0]:
list(map(lambda x: x.requires_grad, list(net.parameters())))
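
A common use of `parameters()` is counting how many scalar weights a model has. The sketch below does this for a small stand-in network. `model_sketch` is an illustrative name, not one of the models defined above.

```python
from torch import nn

model_sketch = nn.Sequential(nn.Linear(10, 100), nn.Tanh(), nn.Linear(100, 10))

# numel() is the number of scalar entries in each parameter tensor
total = sum(p.numel() for p in model_sketch.parameters())
trainable = sum(p.numel() for p in model_sketch.parameters() if p.requires_grad)

# By hand: (10*100 + 100) + (100*10 + 10) = 2110
print(total, trainable)
```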


## Eval

Each module can be in either `eval` or `train` state.

In [0]:
dropout.train()


In [0]:
dropout(torch.ones(10))


In [0]:
dropout.eval()


In [0]:
newseq = nn.Sequential(nn.Dropout(), nn.Dropout())
newseq(torch.ones(10))


In [0]:
newseq.eval()
newseq(torch.ones(10))

**Important!** Train/eval mode has nothing to do with weight training. It only changes the behaviour of some modules (e.g. `dropout`, `batchnorm`). For composite modules, `.eval()`/`.train()` sets the corresponding mode for each of their components.
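
A minimal sketch of this point, using illustrative names (`drop`, `block`): dropout behaves differently in the two modes, gradients still flow in eval mode, and `.eval()` on a container propagates to its children.

```python
import torch
from torch import nn

drop = nn.Dropout(0.5)
x = torch.ones(1000, requires_grad=True)

drop.train()
y_train = drop(x)   # entries are zeroed with p=0.5, survivors scaled by 1/(1-p) = 2

drop.eval()
y_eval = drop(x)    # dropout is the identity in eval mode

# eval() does not switch off autograd: gradients are still computed
y_eval.sum().backward()

print(set(y_train.detach().tolist()) <= {0.0, 2.0})
print(torch.equal(y_eval, x), x.grad is not None)

block = nn.Sequential(nn.Linear(4, 4), nn.Dropout())
block.eval()
print([m.training for m in block.modules()])
```

To disable gradient tracking, use `torch.no_grad()`, which is independent of the train/eval mode.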

## Initialization

Most modules have a default way of initializing their parameters, but sometimes we want to initialize them explicitly.

In [0]:
linear.weight

In [0]:
init.xavier_uniform_(linear.weight)


In [0]:
init.constant_(linear.weight, 1.0)


In [0]:
list(linear.parameters())


In [0]:
for param in linear.parameters():
    init.uniform_(param, -12, 12)
list(linear.parameters())

You can find more initialization functions here: https://pytorch.org/docs/master/nn.html#torch-nn-init.
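
For models with several layers, one possible pattern is to put the initialization in a function and pass it to `Module.apply`, which visits every submodule. This is a sketch: `init_weights` and `mlp` are hypothetical names, and Xavier weights with zero biases are just one example choice.

```python
import torch
from torch import nn
from torch.nn import init

def init_weights(module):
    # hypothetical helper: only touch nn.Linear layers
    if isinstance(module, nn.Linear):
        init.xavier_uniform_(module.weight)
        init.zeros_(module.bias)

mlp = nn.Sequential(nn.Linear(10, 100), nn.Tanh(), nn.Linear(100, 10))
mlp.apply(init_weights)  # apply() calls the function on every submodule recursively

print(mlp[0].bias.abs().sum().item(), mlp[0].weight.abs().max().item())
```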

## Optimizers

Torch has a rich collection of built-in optimizers.

In [0]:
import torch.optim as optim

In [0]:
x = torch.tensor([1.0], requires_grad = True)

In [0]:
sgd = optim.SGD([x], lr=0.1)

In [0]:
y = x * 2


In [0]:
y.backward()

In [0]:
x.grad

In [0]:
sgd.step()

In [0]:
x

In [0]:
x.grad


In [0]:
sgd.zero_grad()


In [0]:
x.grad
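
The cells above can be condensed into one sketch that also spells out the arithmetic of a single SGD step. The names `w` and `opt` are illustrative. Note that, depending on the PyTorch version, `zero_grad()` either sets the gradient to `None` or fills it with zeros, so the sketch accepts both.

```python
import torch
import torch.optim as optim

w = torch.tensor([1.0], requires_grad=True)
opt = optim.SGD([w], lr=0.1)

loss = (w * 2).sum()   # d(loss)/dw = 2
loss.backward()
grad_before_step = w.grad.clone()

opt.step()             # w <- w - lr * grad = 1.0 - 0.1 * 2 = 0.8
w_after_step = w.detach().clone()

opt.zero_grad()        # clears the stored gradient before the next backward pass
grad_cleared = w.grad is None or bool((w.grad == 0).all())

print(grad_before_step, w_after_step, grad_cleared)
```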


# First Training Loop

In [0]:
from torchvision import datasets, transforms

Let's download MNIST, a dataset of handwritten digits.

In [0]:
train_dataset = datasets.MNIST('../data', train=True, download=True,
                                transform=transforms.Compose([
                                    transforms.ToTensor(),
                                    transforms.Normalize((0.1307,), (0.3081,))
                                ]))


In [0]:
test_dataset = datasets.MNIST('../data', train=False, download=True,
                                transform=transforms.Compose([
                                    transforms.ToTensor(),
                                    transforms.Normalize((0.1307,), (0.3081,))
                                ]))

Dataloaders are responsible for data loading. They split the dataset into batches and can shuffle it on every epoch (if the data were stored sorted by label, each unshuffled batch would contain variants of a single digit only). We will look inside them later.

In [0]:
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=True)
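
The effect of `batch_size` and `shuffle` is easy to see on a tiny stand-in dataset. The sketch below uses a made-up `TensorDataset` of 10 samples sorted by label. This data is an assumption for illustration, not MNIST.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in: 10 samples, deliberately sorted by label
features = torch.arange(10, dtype=torch.float32).unsqueeze(1)
labels = torch.tensor([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
toy_dataset = TensorDataset(features, labels)

ordered = DataLoader(toy_dataset, batch_size=5, shuffle=False)
shuffled = DataLoader(toy_dataset, batch_size=5, shuffle=True)

ordered_labels = [y.tolist() for _, y in ordered]
shuffled_labels = [y.tolist() for _, y in shuffled]

print(ordered_labels)    # without shuffling, each batch holds a single class
print(shuffled_labels)   # with shuffling, the order is re-drawn on every pass
```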

In [0]:
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
toPIL = transforms.ToPILImage()

In [0]:
def example(i):
    print(train_dataset[i][1])
    return toPIL(train_dataset[i][0]).resize((256, 256))

In [0]:
example(10)


In [0]:
next(iter(train_loader))[1]


In [0]:
next(iter(train_loader))[0].shape


In [0]:
toPIL(next(iter(train_loader))[0][0]).resize((256, 256))

Let's write a simple helper module.

In [0]:
class Flatten(torch.nn.Module):
    def forward(self, x):
        batch_size = x.shape[0]
        return x.view(batch_size, -1)
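
A quick shape check illustrates what this helper does. The sketch assumes a batch shaped like the MNIST batches above, `(64, 1, 28, 28)`, and compares the `view` call with the built-in `nn.Flatten`, which by default flattens everything except the batch dimension.

```python
import torch
from torch import nn

batch = torch.zeros(64, 1, 28, 28)   # (N, C, H, W), like one MNIST batch

flat_view = batch.view(batch.shape[0], -1)   # what the helper module does
flat_builtin = nn.Flatten()(batch)           # built-in equivalent

print(flat_view.shape, flat_builtin.shape)

# Without flattening, nn.Linear(784, ...) sees a last dimension of 28, not 784
try:
    nn.Linear(784, 512)(batch)
    shape_error = False
except RuntimeError:
    shape_error = True
print(shape_error)
```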


In [0]:
model = nn.Sequential(Flatten(), 
                      nn.Linear(784, 512), 
                      nn.Tanh(),
                      nn.Linear(512, 64), 
                      nn.Tanh(),
                      nn.Linear(64, 10))
for param in model.parameters():
    init.uniform_(param, -0.1, 0.1)

Why do we need the `Flatten` module?

Set up an optimizer:


In [0]:
optimizer = optim.SGD(model.parameters(), lr=0.1)

Choose a loss function:

In [0]:
loss_function = loss.CrossEntropyLoss()
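
It helps to know what this loss expects: raw scores (logits) of shape `(batch, classes)` and integer class indices as targets. The sketch below, on random made-up data, checks that it equals `log_softmax` followed by `nll_loss`, and that the default result is the mean over the batch.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)                   # illustrative seed
logits = torch.randn(4, 10)            # raw scores, no softmax applied
targets = torch.tensor([3, 0, 9, 1])   # class indices, not one-hot vectors

ce = nn.CrossEntropyLoss()(logits, targets)
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# reduction='none' keeps one loss value per sample
per_sample = nn.CrossEntropyLoss(reduction='none')(logits, targets)

print(ce.item(), manual.item(), per_sample.shape)
```

This is why the model above ends with a plain `nn.Linear` layer and no softmax.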


And start training:

In [0]:
def train(model, train_loader, optimizer, loss_function, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model(data)
        loss = loss_function(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 200 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

In [0]:
def test(model, test_loader, loss_function):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            output = model(data)
            # the loss is a batch mean by default, so weight it by the batch size
            test_loss += loss_function(output, target).item() * data.size(0)
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

In [0]:
%%time
for epoch in range(1, 100):
    train(model, train_loader, optimizer, loss_function, epoch)
    test(model, test_loader, loss_function)

# Assignment

## Due by 10 AM, 20.05.2020

## 1. MNIST playground [10]

**Important!** This task is not too hard, but it is pretty time-consuming. Total computation time is about 4 hours.

1. Find out how many epochs are needed for our network to stop improving on the test dataset (let's stop after 5 epochs without accuracy improvement on the test set). How long does it take? [1]
2. Find some problematic examples and show them with the `example()` function we defined in class. [1]
3. Draw a confusion matrix for your model on the test dataset. It is a 10x10 matrix in which cell `(i,j)` holds the number of digits `i` classified as digit `j`. [1]
4. By default, the weight of a linear layer is initialized with the `kaiming_uniform` function and the bias is initialized with the `uniform` function (see the `reset_parameters` method of the Linear class: https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/linear.py). Initialize all weights as `uniform(-0.1,0.1)` and test. How does this modification affect the training process? Is it faster/slower? Is the end result better/worse? Same question for `uniform(-1, 1)`. Same question for `constant(0)` initialization. Don't forget to recreate the optimizer for your new model (otherwise the optimizer will still hold the parameters of the old model and the new model will not be trained). [1]
5. Try replacing the `Tanh` activation with `Sigmoid` and test. How does this modification affect the training process? This and the following questions assume that you are changing the initial model (i.e. all modifications from the previous step are undone). [1]
6. Try changing the output dimension of the first linear layer (and the input of the second) to `256`, then to `1024`. How does this modification affect the training process? How does the number of model parameters change? [1]
7. Our model has 2 hidden layers of sizes `512` and `64`. Let's use 3 hidden layers of sizes `512`, `256` and `64`. How does this modification affect the training process? How does the number of model parameters change? Same question for 3 layers of sizes `512`, `5` and `64` (don't forget to add an activation function between linear layers). [1]
8. Try adding dropout after the first/second layer. How does this modification affect the training process? [1]
9. Try disabling shuffle in the train dataloader (leave it unchanged in the test dataloader, otherwise testing will not be fair). How does this modification affect the training process? Do not forget to reset the trained weights of the model. [1]
10. Try training on half of the training dataset, then on 30% and on 10%. How does this affect the training process? Do not forget to reset the trained weights of the model. [1]
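
The confusion matrix definition from item 3 can be sketched on toy data as follows. This is only one possible way to build it: `confusion_matrix` is a hypothetical helper, and the labels and predictions are made up (3 classes instead of 10).

```python
import torch

def confusion_matrix(targets, preds, num_classes=10):
    """Cell (i, j) counts samples with true class i predicted as class j."""
    matrix = torch.zeros(num_classes, num_classes, dtype=torch.long)
    for t, p in zip(targets.tolist(), preds.tolist()):
        matrix[t, p] += 1
    return matrix

# Made-up labels and predictions for illustration
toy_targets = torch.tensor([0, 0, 1, 1, 2, 2])
toy_preds = torch.tensor([0, 1, 1, 1, 2, 0])

cm = confusion_matrix(toy_targets, toy_preds, num_classes=3)
print(cm)

# The diagonal holds the correct predictions, so accuracy is its share of the total
accuracy = cm.diag().sum().item() / cm.sum().item()
print(accuracy)
```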

