<a href="https://colab.research.google.com/github/mataney/APES/blob/master/notebooks/3_Building_your_own_net_With_Answers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Train your own network

## Load CIFAR10

In [0]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

In [0]:
import torchvision.transforms as transforms
transform = transforms.Compose([transforms.ToTensor()])

In [3]:
batch_size = 32

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')


Files already downloaded and verified
Files already downloaded and verified


### Train and Evaluation loop

In [0]:
def train(model, num_epochs, trainloader, optimizer, criterion, device):
  model.train()
  for epoch in range(num_epochs):
      running_loss = 0.0
      for i, data in enumerate(trainloader, 0):
          # get the inputs
          inputs, labels = data
          inputs, labels = inputs.to(device), labels.to(device)

          # zero the parameter gradients
          optimizer.zero_grad()

          # forward + backward + optimize
          outputs = model(inputs)
          loss = criterion(outputs, labels)
          loss.backward()
          optimizer.step()

          # print statistics
          running_loss += loss.item()
          if i % 200 == 199:    # print every 200 mini-batches
              print('[%d, %5d] loss: %.3f' %
                    (epoch + 1, i + 1, running_loss / 200))
              running_loss = 0.0

  print('Finished Training')

In [0]:
def evaluate(model, dataloader, device):
  correct = 0
  total = 0
  model.eval()
  with torch.no_grad():
      for data in dataloader:
          inputs, labels = data
          inputs, labels = inputs.to(device), labels.to(device)
          outputs = model(inputs)
          _, predicted = torch.max(outputs.data, 1)
          total += labels.size(0)
          correct += (predicted == labels).sum().item()

  print('Accuracy of the network on the 10000 test images: %d %%' % (
      100 * correct / total))
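
The prediction step in `evaluate` relies on `torch.max` along dimension 1, which returns both the largest score and its index; the index is the predicted class. A minimal sketch with made-up logits (the values and labels below are illustrative assumptions, not model outputs):

```python
import torch

# Hypothetical logits for a batch of 3 images and 4 classes.
logits = torch.tensor([[0.1, 2.0, -1.0, 0.3],
                       [1.5, 0.2, 0.1, 0.0],
                       [0.0, 0.1, 0.2, 3.0]])
labels = torch.tensor([1, 2, 3])

# torch.max along dim=1 returns (max values, indices of the max).
_, predicted = torch.max(logits, 1)

# The indices are the predicted classes; argmax gives the same result.
same_as_argmax = torch.equal(predicted, logits.argmax(dim=1))

# Count matches with the labels, exactly as the evaluation loop does.
correct = (predicted == labels).sum().item()
accuracy = 100 * correct / labels.size(0)
print(predicted.tolist(), correct, accuracy)
```

Here two of the three predictions match the labels, so the accuracy is about 67%.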

## More Transformations!

### Description

Before we update our model, let's try to give it better inputs.  
Let's add:
 - Data augmentation (more varied training data usually generalizes better).
 - Normalization of the input images (in theory, a network trains better when its inputs are normalized; this is related to the way the weights are initialized).

How to do this:  

We define a `transform` instance and read our data using it.  
Reread the **train data** with your own `transform` instance.

- Horizontally flip the given PIL Image randomly with a given probability; use the default probability.
  - **Hint:** Look at [`RandomHorizontalFlip`](https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.RandomHorizontalFlip).
- Normalize the inputs such that, for each channel, the mean is 0 and the standard deviation is 1.
  - Normalize using standard normalization:  
    $x' = \frac{x-\mu}{\sigma}$, or  
    `input[channel] = (input[channel]-mean[channel]) / std[channel]`  
  - **Hint:** Use [`Normalize`](https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.Normalize) to perform this transformation.  
    We need 2 vectors of size `channels` to represent the mean and std of each channel; find these.
- Don't forget to use the same `ToTensor` transformation we used.
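
The per-channel formula above can be sketched directly with broadcasting. This is only an illustration on random data (the tensor `images` is a hypothetical stand-in, not CIFAR10); it does not replace the `Normalize` transform you are asked to use.

```python
import torch

# Hypothetical batch: 4 RGB images of size 32x32 with values in [0, 1].
torch.manual_seed(0)
images = torch.rand(4, 3, 32, 32)

# Per-channel statistics: reduce over batch, height and width.
mean = images.mean(dim=(0, 2, 3))
std = images.std(dim=(0, 2, 3))

# Reshape to [1, 3, 1, 1] so each statistic broadcasts over its channel:
# input[channel] = (input[channel] - mean[channel]) / std[channel]
normalized = (images - mean.view(1, 3, 1, 1)) / std.view(1, 3, 1, 1)

# After normalization every channel has mean ~0 and std ~1.
new_mean = normalized.mean(dim=(0, 2, 3))
new_std = normalized.std(dim=(0, 2, 3))
print(new_mean, new_std)
```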

Reread the **test data** using the same normalization as the train data, but don't augment the data. (Again, don't forget to use `ToTensor`)


Yes, you can do this in Python with loops etc., but try to do this with native Torch methods such as `mean()` and `std()`.  
**Hint:** You will probably want to stack all the images into one tensor. Use `torch.stack([t[0] for t in trainset])`; you will then have a tensor of size `[50000, 3, 32, 32]` with all the images.

### Your Implementation

### View implementation

Let's have a look at the training set.

We need to check the mean and std of the data. For this, we stack all the images into a single tensor.

In [0]:
tensored_trainset = torch.stack([t[0] for t in trainset])

In [7]:
print(tensored_trainset.size())

torch.Size([50000, 3, 32, 32])


A small note about calculating the mean and std.  
What is the right way of calculating the std, for example?  
Should we assume nothing, find the std of each image, and normalize each image w.r.t. its own std? This results in a distribution with mean 0 and std 1.  
Should we assume all the images come from the same distribution, and therefore calculate the std over all the images at once? This also results in a distribution with mean 0 and std 1.

For now, assume the latter.  
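
To make the two options concrete, here is a small sketch on synthetic data. The tensor `images` and its brightness/contrast `scales` are assumptions made up for illustration. Both options give an overall mean of about 0 and std of about 1, but per-image normalization also erases differences in contrast between images, while dataset-level normalization keeps them.

```python
import torch

torch.manual_seed(0)
# Hypothetical data: 8 single-channel "images" with different brightness/contrast.
scales = torch.linspace(0.5, 2.0, 8).view(8, 1, 1, 1)
images = torch.randn(8, 1, 16, 16) * scales + scales

# Option 1: normalize every image with its own mean and std.
per_image_mean = images.mean(dim=(1, 2, 3), keepdim=True)
per_image_std = images.std(dim=(1, 2, 3), keepdim=True)
per_image = (images - per_image_mean) / per_image_std

# Option 2: one mean and one std computed over the whole dataset.
global_normed = (images - images.mean()) / images.std()

# Spread inside each individual image after normalization.
per_image_spread = per_image.std(dim=(1, 2, 3))   # ~1 for every image
global_spread = global_normed.std(dim=(1, 2, 3))  # still differs per image
print(per_image_spread)
print(global_spread)
```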

So, let's find mean and std for each channel

In [8]:
print(tensored_trainset.mean([0, 2, 3]))
print(tensored_trainset.std([0, 2, 3]))

tensor([0.4914, 0.4822, 0.4465])
tensor([0.2470, 0.2435, 0.2616])


Define transformations with the mean and std we found.

In [0]:
import torchvision.transforms as transforms

transform_train = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

transform_test = transforms.Compose([
     transforms.ToTensor(),
     transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616))])


In [10]:
batch_size = 32

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform_train)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform_test)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Files already downloaded and verified
Files already downloaded and verified


Let's sanity check that it is normalized:

In [11]:
tensored_trainset = torch.stack([t[0] for t in trainset])
print(tensored_trainset.mean([0, 2, 3]))
print(tensored_trainset.std([0, 2, 3]))

tensor([-1.2867e-06, -1.7074e-04,  1.1819e-04])
tensor([1.0001, 0.9999, 1.0000])


# CNN

Define the following CNN:
 - It should have 3 convolution layers.
 - It should have a deep FC layer.
 
 
 Let's start with the latter part:

## Deep Fully Connected


While it will be the last part of our network, let's start with the Deep FC network.  
We should define this network independently of the CNN,  
so that we can reuse it when we want to.

Define the following network (x is input)

$x \rightarrow dropout \rightarrow linearLayer_1 \rightarrow relu \rightarrow linearLayer_2 \rightarrow relu \rightarrow dropout \rightarrow linearLayer_3$

Make the input/hidden/output sizes and the dropout probability constructor arguments.

### Your implementation

In [0]:
class FC(nn.Module):
  pass

### View implementation

In [0]:
class FC(nn.Module):
  def __init__(self, in_size1, in_size2, in_size3, out_size, drop_prob):
    super(FC, self).__init__()
    self.linear1 = nn.Linear(in_size1, in_size2)
    self.linear2 = nn.Linear(in_size2, in_size3)
    self.out_linear = nn.Linear(in_size3, out_size)
    
    self.drop = nn.Dropout(drop_prob)
      
  def forward(self, x):
    x = self.drop(x)
    x = self.linear1(x)
    x = F.relu(x)
    x = self.linear2(x)
    x = F.relu(x)
    x = self.drop(x)
    x = self.out_linear(x)
    
    return x
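
As an alternative sketch (not the notebook's implementation), the same pipeline can be written with `nn.Sequential`, followed by a quick shape check. The helper name `make_fc` and the small layer sizes are hypothetical and chosen only for illustration.

```python
import torch
import torch.nn as nn

def make_fc(in_size, hidden1, hidden2, out_size, drop_prob):
    """Hypothetical helper: the FC pipeline expressed as an nn.Sequential."""
    return nn.Sequential(
        nn.Dropout(drop_prob),
        nn.Linear(in_size, hidden1),
        nn.ReLU(),
        nn.Linear(hidden1, hidden2),
        nn.ReLU(),
        nn.Dropout(drop_prob),
        nn.Linear(hidden2, out_size),
    )

fc = make_fc(64, 32, 16, 10, 0.5)  # illustrative sizes
fc.eval()  # disable dropout so the check is deterministic

dummy = torch.randn(2, 64)  # batch of 2 flattened feature vectors
out = fc(dummy)
out_shape = tuple(out.shape)
print(out_shape)
```

A class with an explicit `forward`, as above, is easier to extend; `nn.Sequential` is more compact when the data simply flows from one layer to the next.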

## Deep Convolution Layer

We want to define a single Deep Convolution network

Define the following network (x is input)

$x \rightarrow ConvLayer_1 \rightarrow batch~normalization \rightarrow relu \rightarrow ConvLayer_2 \rightarrow relu \rightarrow pooling \rightarrow dropout $

Each ConvLayer is defined with a few arguments: 
- number of input channels
- number of output channels
- kernel size
- padding

(There are other arguments, like stride and dilation, we won't use these here)

Set these accordingly:  
- Set $Convolution_1$ to be `(c_in, c_hidden, 3, 1)` respectively.  
- Set $Convolution_2$ to be `(c_hidden, c_out, 3, 1)` respectively.  
(Hint: Check out `nn.Conv2d`)

- Set Batch Normalization to be the same size as the $Convolution_1$ output (`c_hidden`).  
(Hint: Check out `nn.BatchNorm2d`)

- Set pooling of `kernel_size=2` and `stride=2`.  
(Hint: Check out `nn.MaxPool2d`)

- Set Dropout with dropout probability of `.0`.  
(Hint: Check out `nn.Dropout2d`)

### Your Implementation

In [0]:
class DeepConvLayer(nn.Module):
  pass

### View Implementation

In [0]:
class DeepConvLayer(nn.Module):
  def __init__(self, in_channels, conv1_out_channels, conv2_out_channels, drop_prob=.0):
    super(DeepConvLayer, self).__init__()
    self.conv1 = nn.Conv2d(in_channels=in_channels,
                           out_channels=conv1_out_channels,
                           kernel_size=3, padding=1)
    self.batch_norm = nn.BatchNorm2d(conv1_out_channels) 
    self.conv2 = nn.Conv2d(in_channels=conv1_out_channels,
                           out_channels=conv2_out_channels,
                           kernel_size=3, padding=1)
    self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
    self.drop = nn.Dropout2d(p=drop_prob)
  
  def forward(self, x):
    x = self.conv1(x)
    x = self.batch_norm(x)
    x = F.relu(x)
    x = self.conv2(x)
    x = F.relu(x)
    x = self.pool(x)
    x = self.drop(x)
    
    return x
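
To see what these settings do to the tensor shape, here is a small sketch that passes a dummy batch through one convolution and one pooling layer. The batch size and channel counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

x = torch.randn(8, 3, 32, 32)  # hypothetical batch of CIFAR10-sized images

conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1)
pool = nn.MaxPool2d(kernel_size=2, stride=2)

after_conv = conv(x)           # 3x3 kernel with padding 1 keeps 32x32
after_pool = pool(after_conv)  # 2x2 pooling with stride 2 halves it to 16x16

conv_shape = tuple(after_conv.shape)
pool_shape = tuple(after_pool.shape)
print(conv_shape, pool_shape)
```

The convolution changes only the number of channels, while the pooling changes only the spatial size.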

## Deep CNN

Now that we have these 2 new networks, let's use them to create our end-to-end CNN model.

Define the following network (x is input)

$x \rightarrow DeepConvLayer_1 \rightarrow DeepConvLayer_2 \rightarrow DeepConvLayer_3 \rightarrow FC$

You can also stack the three DeepConvLayers into a single module using the `nn.Sequential` container. Give this a go.

For our ConvLayers set `(c_in, c_hidden, c_out, dropout_p=.0)` to be:  
- $DeepConvLayer_1:$ `(3, 32, 64)` respectively.
- $DeepConvLayer_2:$ `(64, 128, 128, 0.05)` respectively.
- $DeepConvLayer_3:$ `(128, 256, 256)` respectively.

For the FC, set  
- The input, two hidden, and output sizes are 4096, 1024, 512 and 10 respectively.  
- Dropout probability is 0.1



**Something you might need to think about:**  
You should always keep track of the sizes each layer receives and returns.  
For example, what is the output size of the third convolution block?  
What is the input size of the FC?  
This is not an easy one; you can just run it and see why it fails, as we did before :)  
Do these match?

### Your Implementation

In [0]:
class CNN(nn.Module):
  pass

### View Implementation

In [0]:
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv = nn.Sequential(DeepConvLayer(3, 32, 64), 
                                  DeepConvLayer(64, 128, 128, 0.05),
                                  DeepConvLayer(128, 256, 256))
        self.fc_layer = FC(4096, 1024, 512, 10, 0.1)


    def forward(self, x):
        x = self.conv(x)
        x = x.view(x.size(0), -1)
        x = self.fc_layer(x)

        return x
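
The `4096` passed to `FC` can be checked with a short calculation. The sketch below uses the standard output-size formula for convolution and pooling windows (with dilation 1), $\lfloor (n + 2p - k) / s \rfloor + 1$; the helper name `conv_out` is hypothetical.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Output side length of a convolution or pooling window (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32  # CIFAR10 images are 32x32
for _ in range(3):  # three DeepConvLayer blocks
    size = conv_out(size, kernel=3, padding=1)  # conv1 keeps the size
    size = conv_out(size, kernel=3, padding=1)  # conv2 keeps the size
    size = conv_out(size, kernel=2, stride=2)   # max pooling halves it

channels = 256  # output channels of the last block
flat_features = channels * size * size
print(size, flat_features)
```

Each block halves the spatial size (32, 16, 8, 4), so flattening the final `256 x 4 x 4` feature map gives 4096 values per image.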

## Train Model

Initialize the model and set it to run on CUDA (if available).

In [0]:
model = CNN()

In [19]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)

CNN(
  (conv): Sequential(
    (0): DeepConvLayer(
      (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (batch_norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (drop): Dropout2d(p=0.0)
    )
    (1): DeepConvLayer(
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (batch_norm): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (drop): Dropout2d(p=0.05)
    )
    (2): DeepConvLayer(
      (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (batch_norm): BatchNorm2d(256, e

In [0]:
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-08)

In [21]:
num_epochs = 10
train(model, num_epochs, trainloader, optimizer, criterion, device)

[1,   200] loss: 1.883
[1,   400] loss: 1.596
[1,   600] loss: 1.453
[1,   800] loss: 1.374
[1,  1000] loss: 1.271
[1,  1200] loss: 1.202
[1,  1400] loss: 1.134
[2,   200] loss: 1.054
[2,   400] loss: 1.014
[2,   600] loss: 0.969
[2,   800] loss: 0.965
[2,  1000] loss: 0.901
[2,  1200] loss: 0.901
[2,  1400] loss: 0.869
[3,   200] loss: 0.818
[3,   400] loss: 0.798
[3,   600] loss: 0.776
[3,   800] loss: 0.780
[3,  1000] loss: 0.791
[3,  1200] loss: 0.758
[3,  1400] loss: 0.740
[4,   200] loss: 0.706
[4,   400] loss: 0.692
[4,   600] loss: 0.678
[4,   800] loss: 0.673
[4,  1000] loss: 0.684
[4,  1200] loss: 0.673
[4,  1400] loss: 0.672
[5,   200] loss: 0.618
[5,   400] loss: 0.617
[5,   600] loss: 0.612
[5,   800] loss: 0.621
[5,  1000] loss: 0.621
[5,  1200] loss: 0.614
[5,  1400] loss: 0.597
[6,   200] loss: 0.567
[6,   400] loss: 0.543
[6,   600] loss: 0.559
[6,   800] loss: 0.557
[6,  1000] loss: 0.563
[6,  1200] loss: 0.558
[6,  1400] loss: 0.549
[7,   200] loss: 0.490
[7,   400] 

## Saving the Model

We are using notebooks, so it is not a problem to use the trained model in the next cell.  
But what if this is not the case and we want to save and load our model?

What should we do?

One option is:
```
torch.save(model, PATH)
...
model = torch.load(PATH)
model.eval()
```

This save/load process uses the most intuitive syntax and involves the least amount of code. Saving a model in this way will save the entire module using Python’s pickle module. The disadvantage of this approach is that the serialized data is bound to the specific classes and the exact directory structure used when the model is saved. The reason for this is because pickle does not save the model class itself. Rather, it saves a path to the file containing the class, which is used during load time. Because of this, your code can break in various ways when used in other projects or after refactors.

The recommended way is to save only the model's `state_dict`:

```
torch.save(model.state_dict(), PATH)
...
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH))
```
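
A self-contained round trip might look like the sketch below. The tiny `nn.Linear` model and the file name `model.pt` are stand-ins chosen for illustration; the same pattern applies to any `nn.Module`.

```python
import os
import tempfile

import torch
import torch.nn as nn

# A tiny stand-in model; any nn.Module works the same way.
model = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pt")  # hypothetical file name
    torch.save(model.state_dict(), path)

    # Rebuild the architecture first, then load the saved parameters.
    restored = nn.Linear(4, 2)
    restored.load_state_dict(torch.load(path))
    restored.eval()  # switch to inference mode before evaluating

weights_match = torch.equal(model.weight, restored.weight)
bias_match = torch.equal(model.bias, restored.bias)
print(weights_match, bias_match)
```

Note that the `state_dict` holds only parameters and buffers, so the class definition must be available when loading.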

What about other components in our code, such as the optimizer state?
To resume training later, save a checkpoint that bundles them together:
```
torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': loss,
            ...
            }, PATH)
...
model = TheModelClass(*args, **kwargs)
optimizer = TheOptimizerClass(*args, **kwargs)

checkpoint = torch.load(PATH)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']
```
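
The checkpoint pattern can be tried end to end with a toy setup. Everything in the sketch below (the `nn.Linear` model, the random data, the MSE loss and the file name `checkpoint.pt`) is an assumption made for illustration, not part of the CIFAR10 pipeline.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(4, 2)  # tiny stand-in model
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.MSELoss()

# One training step on random data so the optimizer has state to save.
inputs, targets = torch.randn(8, 4), torch.randn(8, 2)
optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()
optimizer.step()

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "checkpoint.pt")  # hypothetical file name
    torch.save({
        'epoch': 1,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'loss': loss.item(),  # store a plain float rather than the tensor
    }, path)

    # Recreate the objects, then restore their state.
    resumed_model = nn.Linear(4, 2)
    resumed_optimizer = optim.Adam(resumed_model.parameters(), lr=0.001)
    checkpoint = torch.load(path)
    resumed_model.load_state_dict(checkpoint['model_state_dict'])
    resumed_optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
    start_epoch = checkpoint['epoch'] + 1

resumed_model.train()  # continue training; call .eval() for inference instead
print(start_epoch, checkpoint['loss'])
```

Restoring the optimizer matters for optimizers such as Adam, which keep running statistics per parameter; without them, training restarts from a cold optimizer state.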

In [0]:
PATH = './model'
torch.save(model.state_dict(), PATH)

### Test the network on the test data

In [23]:
model = CNN()
print(model.conv[0].conv1.weight[0, 0 , 0, 0])
model.load_state_dict(torch.load(PATH))
print(model.conv[0].conv1.weight[0, 0 , 0, 0])
model.to(device)

tensor(-0.0964, grad_fn=<SelectBackward>)
tensor(-0.0273, grad_fn=<SelectBackward>)


CNN(
  (conv): Sequential(
    (0): DeepConvLayer(
      (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (batch_norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (drop): Dropout2d(p=0.0)
    )
    (1): DeepConvLayer(
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (batch_norm): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (drop): Dropout2d(p=0.05)
    )
    (2): DeepConvLayer(
      (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (batch_norm): BatchNorm2d(256, e

In [24]:
evaluate(model, testloader, device)

Accuracy of the network on the 10000 test images: 83 %


Source: https://zhenye-na.github.io/2018/09/28/pytorch-cnn-cifar10.html  
https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html#sphx-glr-beginner-blitz-neural-networks-tutorial-py  
https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py