# Project 1 - Classification, weight sharing, auxiliary losses

This notebook contains my ideas concerning the first question of the project. 

>The goal of the project is to compare different architectures, and assess the performance improvement
that can be achieved through weight sharing, or using auxiliary losses. For the latter, the training can
in particular take advantage of the availability of the classes of the two digits in each pair, beside the
Boolean value truly of interest. 

>All the experiments should be done with 1000 pairs for training and test. A convnet with around 70000
parameters can be trained with 25 epochs in the VM in less than 2s and should achieve around 15% error
rate. 

>Performance estimates provided in your report should be estimated through 10+ rounds for each
architecture, where both data and weight initialization are randomized, and you should provide estimates
of standard deviations.
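Below is a minimal sketch of the bookkeeping for the 10+ randomized rounds and the standard-deviation estimate. It assumes a hypothetical `run_round()` callable that regenerates the pairs, builds and trains a fresh model, and returns the test error rate in percent. Reseeding PyTorch's global RNG each round only controls the data draw if the data generator uses that RNG, which is an assumption here. `summarize_rounds` and `estimate_error` are also hypothetical names.

```python
import torch

def summarize_rounds(error_rates):
    """Mean and unbiased standard deviation of per-round error rates."""
    errors = torch.tensor(error_rates, dtype=torch.float64)
    if errors.numel() < 2:
        raise ValueError("need at least two rounds to estimate a standard deviation")
    # Tensor.std applies Bessel's correction (divides by n - 1) by default
    return errors.mean().item(), errors.std().item()

def estimate_error(run_round, nb_rounds=10):
    """Run `nb_rounds` independent rounds and summarize their error rates.

    `run_round` is a hypothetical callable: new data, new weight init,
    training, then it returns the test error rate.
    """
    errors = []
    for seed in range(nb_rounds):
        torch.manual_seed(seed)  # different data draw and weight init per round
        errors.append(run_round())
    return summarize_rounds(errors)
```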

```python
# Imports
import torch

from torch import nn
from torch.nn import functional as F
from torch import optim
from torchvision import datasets

# Data-generation helper
from dlc_practical_prologue import generate_pair_sets

# Plotting
import matplotlib.pyplot as plt
```

## Import data

```python
# Get data
train_input, train_target, train_classes, test_input, test_target, test_classes = generate_pair_sets(nb=1000)

# Print dimensions
print(train_input.size())
print(train_target.size())
print(train_classes.size())
```

```
torch.Size([1000, 2, 14, 14])
torch.Size([1000])
torch.Size([1000, 2])
```
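The shapes show 1000 pairs of 14x14 digit images (two channels per sample), one Boolean target per pair, and two digit classes per pair. The Boolean target can be derived from the two digit classes. The small sketch below assumes that `generate_pair_sets` sets the target to 1 when the first digit is less than or equal to the second. This notebook does not show that definition, and `target_from_classes` is a hypothetical helper.

```python
import torch

def target_from_classes(classes):
    """Hypothetical helper: rebuild the Boolean target from digit classes.

    Assumption: target = 1 iff first digit <= second digit.
    `classes` has shape (N, 2), like `train_classes`.
    """
    return (classes[:, 0] <= classes[:, 1]).long()

# Toy example with three pairs: (3, 7), (9, 2), (5, 5)
classes = torch.tensor([[3, 7], [9, 2], [5, 5]])
print(target_from_classes(classes))  # tensor([1, 0, 1])
```

If the assumption holds, `(target_from_classes(train_classes) == train_target).all()` should be `True` on the data loaded above.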


```python
# Normalize both sets with the training-set statistics only
train_mean = train_input.mean()
train_std = train_input.std()

train_input -= train_mean
test_input -= train_mean

train_input /= train_std
test_input /= train_std
```
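The cell above normalizes in place, using a mean and standard deviation computed on the training set only, so no test-set information enters preprocessing. Below is the same step written as a function that returns new tensors and leaves its inputs untouched. `normalize_with_train_stats` is a hypothetical name.

```python
import torch

def normalize_with_train_stats(train, test):
    """Return normalized copies of `train` and `test`.

    Only the training set's global mean and standard deviation are used,
    so the test set is transformed exactly like the training set.
    """
    mean, std = train.mean(), train.std()
    return (train - mean) / std, (test - mean) / std
```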

## Defining the network

- How to use multiple losses in PyTorch: https://stackoverflow.com/questions/53994625/how-can-i-process-multi-loss-in-pytorch/53995165
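A common way to train with several losses is to add them and back-propagate once. Autograd then accumulates the gradients of both terms into every tensor they depend on. In the toy sketch below, random logits stand in for the two heads of the network. The `aux_weight` factor is a hypothetical knob for balancing the terms; a value of 1.0 matches the plain sum used in the training loop of this notebook.

```python
import torch
from torch import nn

torch.manual_seed(0)
criterion = nn.CrossEntropyLoss()

# Toy logits for a batch of 4 pairs: 2-way comparison head and 10-way digit head.
# The digit head has 2 * 4 rows: all first digits, then all second digits.
out = torch.randn(4, 2, requires_grad=True)
out_aux = torch.randn(8, 10, requires_grad=True)
target = torch.tensor([1, 0, 1, 1])
digits = torch.tensor([3, 9, 5, 0, 7, 2, 5, 4])

aux_weight = 1.0  # hypothetical; 1.0 reproduces a plain sum of the two losses
loss = criterion(out, target) + aux_weight * criterion(out_aux, digits)

# A single backward pass propagates gradients from both terms
loss.backward()
print(out.grad.shape, out_aux.grad.shape)  # torch.Size([4, 2]) torch.Size([8, 10])
```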

```python
class FelixNet(nn.Module):
    def __init__(self):
        super().__init__()

        # Define layers (the convolutional trunk is shared by both digits)
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=16,
                               kernel_size=5,
                               padding=2)
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=32,
                               kernel_size=3,
                               padding=2)
        self.bn1 = nn.BatchNorm2d(num_features=16)
        self.bn2 = nn.BatchNorm2d(num_features=32)
        self.fc_numbers = nn.Linear(in_features=32, out_features=10)
        self.fc_comparison = nn.Linear(in_features=64, out_features=2)

    def forward(self, x):
        # Split input into the two digit images, keeping a channel dimension
        x1 = x[:, 0:1, :, :]
        x2 = x[:, 1:2, :, :]

        # Apply the same (weight-shared) trunk to both digits
        x1, y1_number = self.shared_forward(x1)
        x2, y2_number = self.shared_forward(x2)

        # Auxiliary digit logits: all first digits, then all second digits -> (2N, 10)
        out_aux = torch.cat((y1_number, y2_number), dim=0)
        # Concatenate the two 32-d feature vectors -> (N, 64)
        x = torch.cat((x1, x2), dim=1)

        # Comparing numbers
        y = self.fc_comparison(x)

        return y, out_aux

    def shared_forward(self, x):
        # First layer: (N, 1, 14, 14) -> (N, 16, 14, 14)
        x = self.conv1(x)
        x = self.bn1(x)
        x = F.relu(x)

        # Second layer: -> (N, 32, 16, 16); padding=2 with a 3x3 kernel grows the map
        x = self.conv2(x)
        x = self.bn2(x)
        x = F.relu(x)

        # Global average pooling to (N, 32), then digit classification
        y = F.avg_pool2d(x, 16).view(x.size(0), -1)
        y_numbers = self.fc_numbers(y)
        return y, y_numbers
```
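Below is a quick shape and parameter-count check of the layers used by `FelixNet`, rebuilt standalone so that it runs without the class above. It confirms that the second convolution (3x3 kernel, padding 2) grows the 14x14 map to 16x16, which is why `avg_pool2d(x, 16)` reduces it to a single 32-d vector. Because the trunk is shared between the two digits, its weights are counted once. The total is 5612 trainable parameters, roughly an order of magnitude below the ~70000-parameter convnet mentioned in the assignment.

```python
import torch
from torch import nn
from torch.nn import functional as F

# Same layer hyper-parameters as FelixNet, rebuilt standalone for a shape check
conv1 = nn.Conv2d(1, 16, kernel_size=5, padding=2)   # 14x14 -> 14x14
conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=2)  # 14x14 -> 16x16
bn1, bn2 = nn.BatchNorm2d(16), nn.BatchNorm2d(32)
fc_numbers = nn.Linear(32, 10)
fc_comparison = nn.Linear(64, 2)

x = torch.randn(4, 1, 14, 14)  # one digit image per sample
h = F.relu(bn1(conv1(x)))
print(h.shape)  # torch.Size([4, 16, 14, 14])
h = F.relu(bn2(conv2(h)))
print(h.shape)  # torch.Size([4, 32, 16, 16])
y = F.avg_pool2d(h, 16).view(h.size(0), -1)
print(y.shape)  # torch.Size([4, 32])

# Trainable parameters only (BatchNorm running statistics are buffers)
layers = [conv1, conv2, bn1, bn2, fc_numbers, fc_comparison]
nb_params = sum(p.numel() for layer in layers for p in layer.parameters())
print(nb_params)  # 5612
```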

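The auxiliary loss relies on the digit labels being ordered like `out_aux`, which `forward()` builds with `torch.cat(..., dim=0)`: first all the first digits of the batch, then all the second digits. The toy check below shows that `classes.T.reshape(-1)` produces that order. A plain `classes.reshape(-1)` would instead interleave the digits and silently mismatch the predictions.

```python
import torch

# Digit classes for 3 pairs, shape (N, 2): column 0 = first digit, column 1 = second
classes = torch.tensor([[3, 7], [9, 2], [5, 5]])

# Order expected by out_aux: [all first digits; all second digits]
labels = classes.T.reshape(-1)
print(labels)  # tensor([3, 9, 5, 7, 2, 5])

# Same order as concatenating the two columns along dim 0
assert torch.equal(labels, torch.cat((classes[:, 0], classes[:, 1]), dim=0))

# A plain reshape interleaves the pairs instead: [3, 7, 9, 2, 5, 5]
print(classes.reshape(-1))
```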

```python
### TRAINING ###
model = FelixNet()

# Parameters
lr = 0.1
mini_batch_size = 100
nb_epochs = 100
verbose = True
print_every = int(0.1 * nb_epochs)

# Optimizer
optimizer = optim.SGD(model.parameters(), lr=lr)

# Loss criterion (used for both the target and the auxiliary digit loss)
criterion = nn.CrossEntropyLoss()

# Learning rate scheduler: multiply lr by gamma every step_size epochs
nb_steps = 10
step_size = 1 if nb_epochs <= nb_steps else int(nb_epochs / nb_steps)
gamma = 0.65
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=step_size, gamma=gamma)

# Iterate over epochs
for e in range(nb_epochs):
    # Set model in training mode
    model.train()

    # Iterate over minibatches
    for b in range(0, train_input.size(0), mini_batch_size):
        optimizer.zero_grad()

        out, out_aux = model(train_input.narrow(0, b, mini_batch_size))

        # Auxiliary labels ordered as all first digits, then all second digits,
        # matching the torch.cat(..., dim=0) order used in forward()
        loss_target = criterion(out, train_target.narrow(0, b, mini_batch_size))
        loss_aux = criterion(out_aux, train_classes.narrow(0, b, mini_batch_size).T.reshape(-1))
        loss = loss_target + loss_aux

        loss.backward()
        optimizer.step()

    # Learning rate scheduler step (once per epoch, after the optimizer steps)
    lr_scheduler.step()

    if verbose and (e % print_every == 0 or e + 1 == nb_epochs):
        # Put model in evaluation mode (BatchNorm uses its running statistics)
        model.eval()

        err, err_aux = 0, 0

        # Loop over the test set without tracking gradients
        with torch.no_grad():
            for b in range(0, test_input.size(0), mini_batch_size):
                out, out_aux = model(test_input.narrow(0, b, mini_batch_size))

                out = torch.argmax(out, dim=1)
                out_aux = torch.argmax(out_aux, dim=1)

                err += (out != test_target.narrow(0, b, mini_batch_size)).sum().item()
                err_aux += (out_aux != test_classes.narrow(0, b, mini_batch_size).T.reshape(-1)).sum().item()

        # The auxiliary error is over two digits per pair, hence the 2 * N denominator
        print("### Epoch {:3d}: Target error ={:.2f}%  Auxiliary error={:.2f}% ###".format(
            e + 1, 100 * err / test_input.size(0), 100 * err_aux / (2 * test_input.size(0))))
```


```
### Epoch   1: Target error =46.00%  Auxiliary error=87.85% ###
### Epoch  11: Target error =37.80%  Auxiliary error=73.15% ###
### Epoch  21: Target error =30.70%  Auxiliary error=66.70% ###
### Epoch  31: Target error =25.90%  Auxiliary error=59.65% ###
### Epoch  41: Target error =24.00%  Auxiliary error=55.00% ###
### Epoch  51: Target error =23.10%  Auxiliary error=51.50% ###
### Epoch  61: Target error =22.40%  Auxiliary error=48.75% ###
### Epoch  71: Target error =22.20%  Auxiliary error=47.25% ###
### Epoch  81: Target error =22.20%  Auxiliary error=46.35% ###
### Epoch  91: Target error =22.30%  Auxiliary error=46.00% ###
### Epoch 100: Target error =22.30%  Auxiliary error=45.55% ###
```
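With `lr = 0.1`, `step_size = 10` and `gamma = 0.65`, `StepLR` gives the closed form `lr_e = 0.1 * 0.65 ** (e // 10)` for epoch `e`. By the last epoch the learning rate has therefore been divided by about 48 (0.65^9 ≈ 0.0207). Stepping the scheduler once per epoch gives the same schedule whether the step happens at the end of each epoch or at the start of every epoch but the first. The sketch below replays the schedule on a dummy parameter to check the closed form.

```python
import torch
from torch import nn, optim

lr, gamma, step_size, nb_epochs = 0.1, 0.65, 10, 100

# Dummy parameter: only the optimizer's learning rate matters here
optimizer = optim.SGD([nn.Parameter(torch.zeros(1))], lr=lr)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=step_size, gamma=gamma)

lrs = []
for e in range(nb_epochs):
    lrs.append(optimizer.param_groups[0]["lr"])  # lr used during epoch e
    optimizer.step()  # stands in for the mini-batch updates (no grads, so a no-op)
    scheduler.step()

# Closed form: lr_e = lr * gamma ** (e // step_size)
closed_form = [lr * gamma ** (e // step_size) for e in range(nb_epochs)]
print(max(abs(a - b) for a, b in zip(lrs, closed_form)) < 1e-12)  # True
print(lrs[-1])  # about 0.00207
```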
