# Convolutional neural network on CIFAR10
**Outline**
- Convolutional layers
- Pooling layers
- Dropout
- [model.train()](https://pytorch.org/docs/stable/nn.html?highlight=train#torch.nn.Module.train) and [model.eval()](https://pytorch.org/docs/stable/nn.html?highlight=eval#torch.nn.Module.eval)
- Residual block

## Convolution's properties
![convolution](./images/conv.png)
Some properties make convolution a popular choice for deep learning algorithms:

1. **Sparse interactions**: in a fully connected layer every output interacts with every input, because the layer is a matrix multiplication. In a CNN the kernels are much smaller than the input, under the assumption that the relevant interactions are local. Fewer parameters need to be stored, which reduces both the memory requirements of the model and the number of operations needed to compute the output.

![sparsity](./images/sparsity.png)

2. **Parameter sharing**: in a densely connected layer, each weight defines a single interaction between one element of the input and one element of the output, and it is used exactly once when computing the output. CNNs, on the other hand, have *tied weights*: a relatively small set of weights is reused across all positions of the input to produce the next layer's elements.

3. **Equivariant representations**: a function $f$ is equivariant to a function $g$ if $f(g(x)) = g(f(x))$. If $g$ is a function that shifts the input, then convolution is equivariant to $g$. In other words, suppose the convolution associates a specific representation with a particular event in a time series. If the same event occurs $N$ time steps later, the output for the shifted sequence shows exactly the same representation $N$ steps later (assuming the transformation involves no resizing, and ignoring border effects). The same reasoning applies to spatial shifts of an image.
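The three properties above can be checked numerically. The sketch below is illustrative only: the layer sizes are chosen to match a CIFAR10-sized input, the dense layer is counted arithmetically rather than built, and circular padding is an assumption made here so that a shift wraps around and equivariance holds exactly at the borders.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Sparse interactions and parameter sharing: a 3x3 convolution from 3 to 32
# channels stores one small kernel per (input channel, output channel) pair.
conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)
conv_params = sum(p.numel() for p in conv.parameters())

# A dense layer mapping the same 3x32x32 input to the same 32x30x30 output
# needs one weight per (input element, output element) pair, plus biases.
# Counted arithmetically to avoid allocating the weight matrix.
n_in, n_out = 3 * 32 * 32, 32 * 30 * 30
dense_params = n_in * n_out + n_out

print(conv_params)   # 896
print(dense_params)  # 88502400

# Equivariance: shifting the input and then convolving gives the same result
# as convolving and then shifting the output.
conv1d = nn.Conv1d(1, 1, kernel_size=3, padding=1, padding_mode="circular")
x = torch.randn(1, 1, 16)
shift = 3
with torch.no_grad():
    shifted_then_conv = conv1d(torch.roll(x, shifts=shift, dims=-1))
    conv_then_shifted = torch.roll(conv1d(x), shifts=shift, dims=-1)

equivariant = torch.allclose(shifted_then_conv, conv_then_shifted, atol=1e-6)
print(equivariant)
```

With zero padding instead of circular padding, the two results would still agree away from the borders.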

## Pooling

![max pooling](./images/max_pooling.png)
A pooling function is a transformation that summarizes the local properties for a certain location within the input. 

A common form of pooling is **max pooling**, which reports the maximum value within a window. This introduces invariance to small translations and shrinks the feature map, reducing the number of parameters needed by the following layers.
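As a small worked example, the sketch below applies a 2x2 max pooling to a hand-written 4x4 input. The tensor values are made up for illustration; `F.max_pool2d` is the functional counterpart of the `nn.MaxPool2d` layer used later in this notebook.

```python
import torch
import torch.nn.functional as F

# A single-channel 4x4 "image", with batch and channel dimensions added.
x = torch.tensor([[1., 2., 5., 6.],
                  [3., 4., 7., 8.],
                  [9., 10., 13., 14.],
                  [11., 12., 15., 16.]]).view(1, 1, 4, 4)

# The stride defaults to the kernel size, so the windows do not overlap.
pooled = F.max_pool2d(x, kernel_size=2)
print(tuple(pooled.shape))   # (1, 1, 2, 2)
print(pooled.view(2, 2))     # maxima of the four windows: 4, 8, 12, 16

# Moving the maximum to another position inside the same window
# leaves the pooled output unchanged.
x_moved = x.clone()
x_moved[0, 0, 0, 0] = 4.0
x_moved[0, 0, 1, 1] = 1.0
unchanged = torch.equal(F.max_pool2d(x_moved, kernel_size=2), pooled)
print(unchanged)
```

Pooling has no learnable weights; it only halves each spatial dimension here, which is what shrinks the input of the layer that follows.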

## Dropout
![dropout](./images/dropout.png)

By dropping out a neuron we mean removing it from the network for a single training iteration. It is implemented by assigning each neuron of a layer a probability $p$ that its output is multiplied by zero for that iteration. During training, $p$ is usually set between $0.25$ and $0.5$. Dropout must be disabled (equivalently, $p = 0.0$) during validation and test; in PyTorch this is what `model.eval()` does, while `model.train()` switches it back on.
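The difference between the two modes can be seen on a vector of ones. This is a minimal sketch with an arbitrary input size and seed. Note that `nn.Dropout` also rescales the surviving activations by $1/(1-p)$ during training, so that their expected value matches what the layer outputs in evaluation mode.

```python
import torch
from torch import nn

torch.manual_seed(0)

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()           # training mode: dropout is active
y_train = drop(x)      # each value is either 0.0 or 1 / (1 - 0.5) = 2.0

drop.eval()            # evaluation mode: dropout is the identity
y_eval = drop(x)

print(y_train.unique())          # the distinct values seen in training mode
print(torch.equal(y_eval, x))    # True
```

Calling `net.train()` or `net.eval()` on a whole model sets the mode of every submodule, including any `nn.Dropout` layer inside it.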

In [2]:
import numpy as np
from utils.misc import get_params_num, get_accuracy

import torch
from torch import nn
from torch import optim
import torch.nn.functional as F

import torchvision
from torchvision import transforms
from IPython import display

torch.manual_seed(2)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print('Device: {}'.format(device))

Device: cuda:0


In [3]:
# Load CIFAR10; Normalize maps each channel from [0, 1] to [-1, 1]
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64,
                                         shuffle=False, num_workers=2)

print("CIFAR images shape: {}".format(tuple(trainset[0][0].shape)))

Files already downloaded and verified
Files already downloaded and verified
CIFAR images shape: (3, 32, 32)


In [4]:
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.input_dim = 3 * 32 * 32
        self.n_classes = 10
        
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)
        self.conv2 = nn.Conv2d(32, 128, kernel_size=3)
        self.maxpool = nn.MaxPool2d(kernel_size=2)
        self.out = nn.Linear(128 * 6 * 6, self.n_classes)
        
    def forward(self, x, verbose=False):
        x = F.relu(self.conv1(x))
        x = self.maxpool(x) # F.max_pool2d(x, kernel_size=2)
        x = F.relu(self.conv2(x))
        x = self.maxpool(x) # F.max_pool2d(x, kernel_size=2)
        x = x.view(-1, 128 * 6 * 6)
        x = self.out(x)
        return x
        
        
net = CNN()
net.to(device)
print("# of parameters: {}".format(get_params_num(net)))
print(net)

# of parameters: 83978
CNN(
  (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(32, 128, kernel_size=(3, 3), stride=(1, 1))
  (maxpool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (out): Linear(in_features=4608, out_features=10, bias=True)
)
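The `128 * 6 * 6` input size of the last layer and the parameter count printed above can both be reproduced by hand. In the sketch below, `count_params` is a hypothetical stand-in for `get_params_num` (whose source is not shown in this notebook), and the layers of the CNN are listed in an `nn.Sequential` only so that each intermediate shape can be printed.

```python
import torch
from torch import nn


def count_params(model):
    """Hypothetical stand-in for get_params_num: count trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# Same layers as the CNN above, in forward order.
layers = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3),    # 32x32 -> 30x30 (no padding)
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),        # 30x30 -> 15x15
    nn.Conv2d(32, 128, kernel_size=3),  # 15x15 -> 13x13
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),        # 13x13 -> 6x6 (the last row/column is dropped)
    nn.Flatten(),                       # 128 * 6 * 6 = 4608 features
    nn.Linear(128 * 6 * 6, 10),
)

x = torch.zeros(1, 3, 32, 32)  # one dummy CIFAR10-sized image
shapes = []
with torch.no_grad():
    for layer in layers:
        x = layer(x)
        shapes.append(tuple(x.shape))
        print(type(layer).__name__, tuple(x.shape))

n_params = count_params(layers)
print(n_params)  # 83978 = 896 (conv1) + 36992 (conv2) + 46090 (linear)
```

More than half of the parameters sit in the final linear layer, even though it is the only fully connected layer in the model.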


In [5]:
lr = 0.001
momentum = 0.9
epochs = 20

n_batches = len(trainloader)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=lr, momentum=momentum)

net.train() 
for e in range(epochs):
    for i, data in enumerate(trainloader):
        batch = data[0].to(device)
        labels = data[1].to(device)      
        outputs = net(batch)
        loss = criterion(outputs, labels)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if i % 50 == 0:
            print("[EPOCH]: {}, [BATCH]: {}/{}, [LOSS]: {}".format(e, i, n_batches, loss.item()))
            display.clear_output(wait=True)

[EPOCH]: 19, [BATCH]: 750/782, [LOSS]: 0.6600993871688843


In [6]:
acc_train = get_accuracy(trainloader, net, device=device)
acc_test = get_accuracy(testloader, net, device=device)
print("Train accuracy: {}\nTest accuracy: {}".format(acc_train, acc_test))

Train accuracy: 0.73458
Test accuracy: 0.6867
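`get_accuracy` comes from `utils.misc`, which is not shown here. A helper of this kind could look like the sketch below. The name `accuracy_sketch` and its body are assumptions, not the actual implementation; the relevant parts are the switch to evaluation mode and the `torch.no_grad()` context.

```python
import torch
from torch import nn


def accuracy_sketch(loader, model, device="cpu"):
    """Fraction of correctly classified samples (hypothetical helper)."""
    was_training = model.training
    model.eval()                      # disable dropout and similar layers
    correct, total = 0, 0
    with torch.no_grad():             # gradients are not needed for evaluation
        for batch, labels in loader:
            batch, labels = batch.to(device), labels.to(device)
            preds = model(batch).argmax(dim=1)   # predicted class per sample
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    model.train(was_training)         # restore the mode the model was in
    return correct / total


# Toy usage: a random linear classifier on two random batches.
torch.manual_seed(0)
toy_model = nn.Linear(4, 3)
toy_loader = [(torch.randn(8, 4), torch.randint(0, 3, (8,))) for _ in range(2)]
acc = accuracy_sketch(toy_loader, toy_model)
print(0.0 <= acc <= 1.0)
```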


### Scrambled CIFAR

In [7]:
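def scramble_image(tensor, indices):
    # Flatten the (3, 32, 32) image, reorder its 3072 values, then restore the shape
    tensor = tensor.view(-1)[indices].view(3, 32, 32)
    return tensor

# One fixed random permutation, shared by every training and test image
indices = np.arange(3*32*32)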
np.random.shuffle(indices)
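Scrambling keeps every pixel value but moves it to a fixed random position, so no information is lost; only the spatial neighbourhoods that a small kernel relies on are destroyed. The sketch below checks this on a random tensor. The names `perm` and `inverse` and the seed are assumptions introduced here, with `perm` playing the role of `indices`.

```python
import numpy as np
import torch

rng = np.random.default_rng(0)                        # seeded only for repeatability
perm = torch.from_numpy(rng.permutation(3 * 32 * 32))  # plays the role of `indices`
inverse = torch.argsort(perm)                          # the permutation that undoes `perm`

image = torch.randn(3, 32, 32)
scrambled = image.view(-1)[perm].view(3, 32, 32)
restored = scrambled.view(-1)[inverse].view(3, 32, 32)

# The scrambled image holds exactly the same values as the original...
same_values = torch.equal(scrambled.flatten().sort().values,
                          image.flatten().sort().values)
# ...and the original can be recovered exactly.
recovered = torch.equal(restored, image)
print(same_values, recovered)
```

A fully connected network would be unaffected by such a fixed permutation of its inputs, whereas a CNN depends on neighbouring pixels being related.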

In [8]:
# Reload CIFAR10, applying the same fixed permutation to every image
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
     transforms.Lambda(lambda tens: scramble_image(tens, indices))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=False, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=False, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64,
                                         shuffle=False, num_workers=2)

In [9]:
net = CNN()
net.to(device)

lr = 0.001
momentum = 0.9
epochs = 20

n_batches = len(trainloader)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=lr, momentum=momentum)

net.train() 
for e in range(epochs):
    for i, data in enumerate(trainloader):
        batch = data[0].to(device)
        labels = data[1].to(device)      
        outputs = net(batch)
        loss = criterion(outputs, labels)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if i % 50 == 0:
            print("[EPOCH]: {}, [BATCH]: {}/{}, [LOSS]: {}".format(e, i, n_batches, loss.item()))
            display.clear_output(wait=True)

[EPOCH]: 19, [BATCH]: 750/782, [LOSS]: 1.1913037300109863


In [10]:
acc_train = get_accuracy(trainloader, net, device=device)
acc_test = get_accuracy(testloader, net, device=device)
print("Train accuracy: {}\nTest accuracy: {}".format(acc_train, acc_test))

Train accuracy: 0.59886
Test accuracy: 0.5008


## Making our CNN a bit fancier

In [11]:
# Reload the original (unscrambled) CIFAR10
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=False, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=False, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64,
                                         shuffle=False, num_workers=2)

In [12]:
class Bottleneck(nn.Module):
    def __init__(self, in_channels, reduction_factor):
        super(Bottleneck, self).__init__()
        self.bottleneck = nn.Conv2d(in_channels,in_channels // reduction_factor, kernel_size=1)
        self.conv = nn.Conv2d(in_channels // reduction_factor, in_channels // reduction_factor, 
                              padding=1, kernel_size=3)
        self.expansion = nn.Conv2d(in_channels // reduction_factor, in_channels, kernel_size=1)
        self.act = nn.LeakyReLU()
        
    def forward(self, x):
        x = self.act(self.bottleneck(x))
        x = self.act(self.conv(x))
        x = self.expansion(x)
        return x
    

class ResidualBlock(nn.Module):
    def __init__(self, in_channels, reduction_factor):
        super(ResidualBlock, self).__init__()
        self.bottleneck = Bottleneck(in_channels, reduction_factor)
        
    def forward(self, x):
        return x + self.bottleneck(x)
    
    
class GlobalAveragePooling(nn.Module):
    def forward(self, x):
        return torch.mean(x.view(x.size(0), x.size(1), -1), dim=2)
    
    
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        C = 256
        n_classes = 10
        
        
        self.network = nn.Sequential(
                nn.Conv2d(in_channels=3, out_channels=C, kernel_size=3),
                nn.LeakyReLU(),
                nn.MaxPool2d(kernel_size=2),
                ResidualBlock(C, 2),
                nn.LeakyReLU(),
                GlobalAveragePooling(),
                nn.Linear(C, 100),
                nn.LeakyReLU(),
                nn.Dropout(p=0.25),
                nn.Linear(100, n_classes)
        )
        
    def forward(self, x, verbose=False):
        return self.network(x)
        
        
net = CNN()
net.to(device)
print("# of parameters: {}".format(get_params_num(net)))
print(net)

# of parameters: 247382
CNN(
  (network): Sequential(
    (0): Conv2d(3, 256, kernel_size=(3, 3), stride=(1, 1))
    (1): LeakyReLU(negative_slope=0.01)
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): ResidualBlock(
      (bottleneck): Bottleneck(
        (bottleneck): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1))
        (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (expansion): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1))
        (act): LeakyReLU(negative_slope=0.01)
      )
    )
    (4): LeakyReLU(negative_slope=0.01)
    (5): GlobalAveragePooling()
    (6): Linear(in_features=256, out_features=100, bias=True)
    (7): LeakyReLU(negative_slope=0.01)
    (8): Dropout(p=0.25, inplace=False)
    (9): Linear(in_features=100, out_features=10, bias=True)
  )
)
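Two details of this architecture can be checked in isolation: the residual block preserves the shape of its input (which is what makes the sum `x + self.bottleneck(x)` possible), and global average pooling reduces each channel to a single number. The sketch below uses a compact re-implementation, `TinyResidualBlock`, which is a hypothetical name introduced here; it has the same layers as `ResidualBlock` above but is not the class used for training.

```python
import torch
from torch import nn


class TinyResidualBlock(nn.Module):
    """Compact variant of the residual bottleneck block (illustrative only)."""

    def __init__(self, channels, reduction_factor):
        super().__init__()
        mid = channels // reduction_factor
        self.body = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1),        # reduce channels
            nn.LeakyReLU(),
            nn.Conv2d(mid, mid, kernel_size=3, padding=1),  # padding keeps H and W
            nn.LeakyReLU(),
            nn.Conv2d(mid, channels, kernel_size=1),        # expand channels back
        )

    def forward(self, x):
        return x + self.body(x)  # skip connection


torch.manual_seed(0)
block = TinyResidualBlock(256, 2)
x = torch.randn(2, 256, 15, 15)   # shape after the first conv and max pooling

with torch.no_grad():
    out = block(x)
print(out.shape == x.shape)       # True: the block preserves the shape

# The bottleneck is cheaper than a single 3x3 convolution on 256 channels.
block_params = sum(p.numel() for p in block.parameters())
plain_params = 256 * 256 * 3 * 3 + 256
print(block_params, plain_params)  # 213504 590080

# If the last convolution outputs zero, the block reduces to the identity.
with torch.no_grad():
    block.body[-1].weight.zero_()
    block.body[-1].bias.zero_()
    is_identity = torch.equal(block(x), x)
print(is_identity)

# Global average pooling: the view-based version used above matches
# a mean over the two spatial dimensions.
gap_view = torch.mean(x.view(x.size(0), x.size(1), -1), dim=2)
gap_mean = x.mean(dim=(2, 3))
gap_match = torch.allclose(gap_view, gap_mean, atol=1e-6)
print(tuple(gap_view.shape), gap_match)
```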


In [13]:
lr = 0.001
epochs = 20

n_batches = len(trainloader)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=lr)

net.train() 
for e in range(epochs):
    for i, data in enumerate(trainloader):
        batch = data[0].to(device)
        labels = data[1].to(device)      
        outputs = net(batch)
        loss = criterion(outputs, labels)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if i % 50 == 0:
            print("[EPOCH]: {}, [BATCH]: {}/{}, [LOSS]: {}".format(e, i, n_batches, loss.item()))
            display.clear_output(wait=True)

[EPOCH]: 19, [BATCH]: 750/782, [LOSS]: 0.3826696276664734


In [14]:
net.eval()  # disable dropout before evaluating (harmless if get_accuracy already does this)
acc_train = get_accuracy(trainloader, net, device=device)
acc_test = get_accuracy(testloader, net, device=device)
print("Train accuracy: {}\nTest accuracy: {}".format(acc_train, acc_test))

Train accuracy: 0.83066
Test accuracy: 0.7623


In [15]:
torch.save(net.state_dict(), 'saved_models/fancy_net_CIFAR10.pt')

In [None]:
# net = CNN()
# net.load_state_dict(torch.load('saved_models/fancy_net_CIFAR10.pt'))
# net.eval()
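The save and load cells above can be exercised end to end on a small model. The following is a minimal sketch under stated assumptions: the toy architecture, the temporary directory and the file name `toy_net.pt` are made up for illustration. It also shows two details that are easy to miss: `torch.save` does not create missing directories, and `map_location` allows weights saved on a GPU to be loaded on a CPU-only machine.

```python
import os
import tempfile

import torch
from torch import nn


def make_toy_net():
    # Hypothetical small model with a dropout layer, standing in for CNN()
    return nn.Sequential(nn.Linear(8, 4), nn.Dropout(p=0.25), nn.Linear(4, 2))


torch.manual_seed(0)
model = make_toy_net()

with tempfile.TemporaryDirectory() as tmp_dir:
    save_dir = os.path.join(tmp_dir, "saved_models")
    os.makedirs(save_dir, exist_ok=True)   # the target directory must exist
    path = os.path.join(save_dir, "toy_net.pt")
    torch.save(model.state_dict(), path)

    # Rebuild the same architecture, then load the saved weights into it.
    restored = make_toy_net()
    restored.load_state_dict(torch.load(path, map_location="cpu"))

# Switch both models to evaluation mode so that dropout is inactive.
model.eval()
restored.eval()

x = torch.randn(3, 8)
with torch.no_grad():
    same_output = torch.equal(model(x), restored(x))
print(same_output)
```

Only the weights are stored in a `state_dict`, so the class definition has to be available when the model is loaded.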

### Resources:
[Deep Residual Learning for Image Recognition](https://arxiv.org/pdf/1512.03385.pdf)

[Dropout](http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf)

[A nice explanation in three parts](https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/)

[Saving and loading models](https://pytorch.org/tutorials/beginner/saving_loading_models.html)
