<a href="https://colab.research.google.com/github/mataney/PyTorchCourse/blob/master/notebooks/3_Building_your_own_net_Skeleton.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Train your own network

This is the skeleton for writing your own upgraded classifier.  
The next few cells are copy-pastes from the previous notebook.  
Run them, then read along for a description of the tasks.  

## Load CIFAR10

In [0]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

In [0]:
import torchvision.transforms as transforms
transform = transforms.Compose([transforms.ToTensor()])

In [0]:
batch_size = 32

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')


Files already downloaded and verified
Files already downloaded and verified


### Train and Evaluation loop

In [0]:
def train(model, num_epochs, trainloader, optimizer, criterion, device):
  model.train()
  for epoch in range(num_epochs):
      running_loss = 0.0
      for i, data in enumerate(trainloader, 0):
          # get the inputs
          inputs, labels = data
          inputs, labels = inputs.to(device), labels.to(device)

          # zero the parameter gradients
          optimizer.zero_grad()

          # forward + backward + optimize
          outputs = model(inputs)
          loss = criterion(outputs, labels)
          loss.backward()
          optimizer.step()

          # print statistics
          running_loss += loss.item()
          if i % 200 == 199:    # print every 200 mini-batches
              print('[%d, %5d] loss: %.3f' %
                    (epoch + 1, i + 1, running_loss / 200))
              running_loss = 0.0

  print('Finished Training')

In [0]:
def evaluate(model, dataloader, device):
  correct = 0
  total = 0
  model.eval()
  with torch.no_grad():
      for data in dataloader:
          inputs, labels = data
          inputs, labels = inputs.to(device), labels.to(device)
          outputs = model(inputs)
          _, predicted = torch.max(outputs.data, 1)
          total += labels.size(0)
          correct += (predicted == labels).sum().item()

  print('Accuracy of the network on the 10000 test images: %d %%' % (
      100 * correct / total))

## More Transformations!

### Description

Before we update our model, let's try to give it better inputs.  
Let's add:
 - Data augmentation (more varied training data usually generalizes better).
 - Normalization of the input images (in theory, a network trains better when its inputs are normalized; this is related to the way the weights are initialized).

How to do this:  

We define a `transform` instance and read our data using it.  
Reread the **train data** with your own `transform` instance.

- Randomly flip each image horizontally, using the default probability. 
 - **Hint:** Look at [`RandomHorizontalFlip`](https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.RandomHorizontalFlip).
-  Normalize the inputs so that each channel has mean 0 and standard deviation 1. 
  - Normalize using standard normalization:  
    $x' = \frac{x-\mu}{\sigma}$, or  
    `input[channel] = (input[channel]-mean[channel]) / std[channel]`  
  - **Hint:** use [`Normalize`](https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.Normalize) to perform this transformation.  
    It needs two vectors of size `channels` holding the mean and std of each channel; compute these from the train data.
- Don't forget to use the same `ToTensor` transformation we used.

Reread the **test data** using the same normalization as the train data, but don't augment the data. (Again, don't forget to use `ToTensor`)


Yes, you can do this in Python with loops, but try to use native Torch methods such as `mean()` and `std()`.  
**Hint:** You will probably want to stack all the images into one tensor. Use `torch.stack([t[0] for t in trainset])` to get a tensor of size `[50000, 3, 32, 32]` holding all the images.
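One possible way to get the per-channel statistics is to reduce over every dimension except the channel one. The sketch below uses a small random tensor as a stand-in for the stacked train images, and the helper name `channel_stats` is hypothetical.

```python
import torch


def channel_stats(images):
    """Return per-channel mean and std for a float tensor of shape [N, C, H, W]."""
    # Reduce over batch, height and width; keep the channel dimension.
    mean = images.mean(dim=(0, 2, 3))
    std = images.std(dim=(0, 2, 3))
    return mean, std


# Stand-in for torch.stack([t[0] for t in trainset]) (assumption: random data).
images = torch.rand(100, 3, 32, 32)
mean, std = channel_stats(images)

# Standard normalization x' = (x - mu) / sigma, broadcast over N, H, W.
normalized = (images - mean[None, :, None, None]) / std[None, :, None, None]
print(mean.shape, std.shape)
```

After this normalization each channel of `normalized` has mean close to 0 and standard deviation close to 1.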

### Your Implementation

# CNN

Define the following CNN:
 - It should have 3 Convolution layers.
 - It should have a Deep FC layer.

## Deep Fully Connected


Although it will be the last layer of our network, let's start with the Deep FC network.  
We should define it independently of the CNN,  
so that we can reuse it later.

Define the following network (x is input)

$x \rightarrow dropout \rightarrow linearLayer_1 \rightarrow relu \rightarrow linearLayer_2 \rightarrow relu \rightarrow dropout \rightarrow linearLayer_3$

Make the input/hidden/output sizes and the dropout probability constructor arguments.
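The chain above could be sketched roughly as follows. The class name `FCSketch` and the argument names (`in_size`, `hidden1`, `hidden2`, `out_size`, `dropout_p`) are assumptions made for illustration, so treat this as one possible layout rather than the expected solution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FCSketch(nn.Module):
    """x -> dropout -> linear1 -> relu -> linear2 -> relu -> dropout -> linear3"""

    def __init__(self, in_size, hidden1, hidden2, out_size, dropout_p):
        super().__init__()
        self.dropout = nn.Dropout(dropout_p)
        self.linear1 = nn.Linear(in_size, hidden1)
        self.linear2 = nn.Linear(hidden1, hidden2)
        self.linear3 = nn.Linear(hidden2, out_size)

    def forward(self, x):
        x = self.dropout(x)
        x = F.relu(self.linear1(x))
        x = F.relu(self.linear2(x))
        x = self.dropout(x)
        # No activation on the last layer: nn.CrossEntropyLoss expects raw logits.
        return self.linear3(x)


fc = FCSketch(4096, 1024, 512, 10, dropout_p=0.1)
fc.eval()  # disable dropout for a deterministic shape check
logits = fc(torch.randn(2, 4096))
print(logits.shape)
```

Reusing one `nn.Dropout` module in two places is fine here because dropout has no learnable parameters.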

### Your implementation

In [1]:
class FC(nn.Module):
  pass

## Deep Convolution Layer

We want to define a single Deep Convolution network (we will then instantiate it 3 times).

Define the following network (x is input)

$x \rightarrow ConvLayer_1 \rightarrow batch~normalization \rightarrow relu \rightarrow ConvLayer_2 \rightarrow relu \rightarrow pooling \rightarrow dropout $

Each ConvLayer is defined with a few arguments: 
- number of input channels
- number of output channels
- kernel size
- padding

(There are other arguments, such as stride and dilation, but we don't use them here.)

Set these accordingly:  
- Set $Convolution_1$ to be `(c_in, c_hidden, 3, 1)` respectively.  
- Set $Convolution_2$ to be `(c_hidden, c_out, 3, 1)` respectively.  
(Hint: Check out `nn.Conv2d`)

- Set Batch Normalization to the number of channels output by $Convolution_1$ (`c_hidden`).  
(Hint: Check out `nn.BatchNorm2d`)

- Set pooling of `kernel_size=2` and `stride=2`.  
(Hint: Check out `nn.MaxPool2d`)

- Set Dropout with a dropout probability `dropout_p` that defaults to `.0`.  
(Hint: Check out `nn.Dropout2d`)
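A minimal sketch of such a block, assuming the constructor signature `(c_in, c_hidden, c_out, dropout_p=0.0)`, is shown below. The class name `ConvLayerSketch` is hypothetical. Note that batch normalization directly follows the first convolution, so it must be sized to that convolution's output channels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvLayerSketch(nn.Module):
    """x -> conv1 -> batch norm -> relu -> conv2 -> relu -> pooling -> dropout"""

    def __init__(self, c_in, c_hidden, c_out, dropout_p=0.0):
        super().__init__()
        self.conv1 = nn.Conv2d(c_in, c_hidden, kernel_size=3, padding=1)
        # Batch norm sees the output of conv1, which has c_hidden channels.
        self.bn = nn.BatchNorm2d(c_hidden)
        self.conv2 = nn.Conv2d(c_hidden, c_out, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.dropout = nn.Dropout2d(dropout_p)

    def forward(self, x):
        x = F.relu(self.bn(self.conv1(x)))
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        return self.dropout(x)


block = ConvLayerSketch(3, 32, 64)
block.eval()
y = block(torch.randn(2, 3, 32, 32))
print(y.shape)
```

With `kernel_size=3` and `padding=1` the convolutions keep the spatial size, so only the pooling halves height and width.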

### Your Implementation

In [0]:
class ConvLayer(nn.Module):
  pass

## Deep CNN

Now that we have these 2 new networks, let's use them to create our end2end CNN model.

Define the following network (x is input)

$x \rightarrow DeepConvLayer_1 \rightarrow DeepConvLayer_2 \rightarrow DeepConvLayer_3 \rightarrow FC$

You can also stack the three DeepConvLayers into a single module using the `nn.Sequential` container. Give this a go.

For our ConvLayers set `(c_in, c_hidden, c_out, dropout_p=.0)` to be:  
- $DeepConvLayer_1:$ `(3, 32, 64)` respectively.
- $DeepConvLayer_2:$ `(64, 128, 128, 0.05)` respectively.
- $DeepConvLayer_3:$ `(128, 256, 256)` respectively.

For the FC, set  
- The input, two hidden, and output sizes are 4096, 1024, 512, and 10 respectively.  
- Dropout probability is 0.1
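To show how the pieces might fit together, here is a compact sketch that builds the blocks with helper functions instead of the `ConvLayer` and `FC` classes you write yourself. The names `conv_block`, `fc_head`, and `CNNSketch` are hypothetical, and the flatten step between the convolutions and the FC is an assumption about how the two parts are connected.

```python
import torch
import torch.nn as nn


def conv_block(c_in, c_hidden, c_out, dropout_p=0.0):
    # Stand-in for a ConvLayer instance (assumption).
    return nn.Sequential(
        nn.Conv2d(c_in, c_hidden, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_hidden),
        nn.ReLU(),
        nn.Conv2d(c_hidden, c_out, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=2),
        nn.Dropout2d(dropout_p),
    )


def fc_head(in_size, hidden1, hidden2, out_size, dropout_p):
    # Stand-in for an FC instance (assumption).
    return nn.Sequential(
        nn.Dropout(dropout_p),
        nn.Linear(in_size, hidden1),
        nn.ReLU(),
        nn.Linear(hidden1, hidden2),
        nn.ReLU(),
        nn.Dropout(dropout_p),
        nn.Linear(hidden2, out_size),
    )


class CNNSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # The three deep convolution blocks stacked with nn.Sequential.
        self.features = nn.Sequential(
            conv_block(3, 32, 64),
            conv_block(64, 128, 128, dropout_p=0.05),
            conv_block(128, 256, 256),
        )
        self.classifier = fc_head(4096, 1024, 512, 10, dropout_p=0.1)

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)  # keep the batch dimension, flatten the rest
        return self.classifier(x)


sketch = CNNSketch()
sketch.eval()
scores = sketch(torch.randn(2, 3, 32, 32))
print(scores.shape)
```

The shape check at the end uses random inputs with the CIFAR10 image size, so it only confirms that the layer sizes are compatible.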



**Something you might need to think about:**  
You should always keep track of the sizes each layer receives and returns.  
For example, what is the output size of the third convolution?  
What is the input size of the FC?  
This is not an easy one; you can just run the model and see where it fails, as we did before :)  
Do these match?
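One way to check the sizes without running the model is to apply the standard output-size formula, `(size + 2 * padding - kernel_size) // stride + 1`, layer by layer. The helper names `conv2d_out` and `pool_out` below are hypothetical, and the sketch assumes 32x32 inputs and the layer settings given above.

```python
def conv2d_out(size, kernel_size, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1


def pool_out(size, kernel_size=2, stride=2):
    """Spatial output size of a max-pooling layer along one dimension."""
    return (size - kernel_size) // stride + 1


size = 32  # CIFAR10 images are 32x32
for _ in range(3):  # three deep convolution blocks
    size = conv2d_out(size, kernel_size=3, padding=1)  # first convolution
    size = conv2d_out(size, kernel_size=3, padding=1)  # second convolution
    size = pool_out(size)                              # halves the size

channels = 256  # output channels of the third block
flat = channels * size * size
print(size, flat)  # 4 4096
```

The flattened size of 4096 agrees with the FC input size listed above.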

### Your Implementation

In [0]:
class CNN(nn.Module):
  pass

## Train Model

Initialize the model and set it to run on CUDA if it is available.

In [0]:
model = CNN()

In [0]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)

In [0]:
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-08)

In [0]:
num_epochs = 2
train(model, num_epochs, trainloader, optimizer, criterion, device)

### Test the network on the test data

In [0]:
evaluate(model, testloader, device)