# Small data and deep learning
This practical session studies several techniques for improving performance in a challenging context where little data and few resources are available.

# Introduction
Assume we are in a context where few "gold" labeled data are available for training, say $\mathcal{X}_{\text{train}}\triangleq\{(x_n,y_n)\}_{n\leq N_{\text{train}}}$, where $N_{\text{train}}$ is small. A large test set $\mathcal{X}_{\text{test}}$ is available. A large amount of unlabeled data, $\mathcal{X}$, is available. We also assume that we have a limited computational budget (e.g., no GPUs).

For each question, write a commented *Code* or a complete answer as a *Markdown*. When the objective of a question is to report a CNN accuracy, please use the following format to report it, at the end of the question:

| Model | Number of  epochs  | Train accuracy | Test accuracy |
|------|------|------|------|
|   XXX  | XXX | XXX | XXX |

If applicable, please add the field corresponding to the  __Accuracy on Full Data__ as well as a link to the __Reference paper__ you used to report those numbers. (You do not need to train a CNN on the full CIFAR10 dataset)

In your final report, please keep the logs of each training procedure you used. We will only run this notebook if we have doubts about your implementation. 

__The total file sizes should be reasonable (feasible with 2MB only!). You will be asked to hand in the notebook, together with any necessary files required to run it if any.__


You can use https://colab.research.google.com/ to run your experiments.

## Training set creation
__Question 1 (2 points):__ Propose a dataloader or modify the file located at https://github.com/pytorch/vision/blob/master/torchvision/datasets/cifar.py in order to obtain a training loader that will only use the first 100 samples of the CIFAR-10 training set. 

In [None]:
## Download the CIFAR10 dataset using the PyTorch dataloaders
import torch
import torchvision
import torchvision.transforms as transforms

# *****START CODE 
## Data
##Here you are free to add further transform functions if you wish
print('==> Preparing data..')
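# NOTE: (0.1307,) and (0.3081,) are the usual MNIST statistics; the single value is
# broadcast over the three CIFAR-10 channels. They are kept as-is so that the
# accuracies reported below remain comparable.
transform_train = transforms.Compose([
    transforms.ToTensor(),  transforms.Normalize((0.1307,), (0.3081,))])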

transform_test = transforms.Compose([
    transforms.ToTensor(),  transforms.Normalize((0.1307,), (0.3081,))])

transform_train_transfer = transforms.Compose([transforms.Resize((64,64)),
    transforms.ToTensor(),  transforms.Normalize((0.1307,), (0.3081,))])

transform_test_transfer = transforms.Compose([transforms.Resize((64,64)),
    transforms.ToTensor(),  transforms.Normalize((0.1307,), (0.3081,))])




trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_train)
trainset_transfer = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_train_transfer)


L_train = []
L_train_transfer =[]
for i in range(100):
    L_train.append(trainset[i])
    L_train_transfer.append(trainset_transfer[i])

trainloader = torch.utils.data.DataLoader(L_train, batch_size= 10 , shuffle=True)
trainloader_transfer = torch.utils.data.DataLoader(L_train_transfer, batch_size= 10 , shuffle=True)

testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_test)
testset_transfer = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_test_transfer)

L_test = []
L_test_transfer =[]
for i in range(len(testset)):
    L_test.append(testset[i])
    L_test_transfer.append(testset_transfer[i])
    
testloader = torch.utils.data.DataLoader(L_test, batch_size= 10, shuffle=False)
testloader_transfer = torch.utils.data.DataLoader(L_test_transfer, batch_size= 10, shuffle=False)


L_weak = []
for i in range(100, 3000):
    L_weak.append(trainset_transfer[i])
    
weakloader = torch.utils.data.DataLoader(L_weak, batch_size= 100, shuffle=False)

In [None]:
trainset[1][0].shape

This is our dataset $\mathcal{X}_{\text{train}}$; it will be used until the end of this project. The remaining samples correspond to $\mathcal{X}$. The testing set $\mathcal{X}_{\text{test}}$ corresponds to the whole testing set of CIFAR-10.

## Testing procedure
__Question 2 (1.5 points):__ Explain why the evaluation of the training procedure is difficult. Propose several solutions.

With only 100 labeled samples, both the training procedure and its evaluation are difficult: any validation split held out from $\mathcal{X}_{\text{train}}$ is tiny, so accuracy estimates are noisy, and the model quickly overfits the few training images. Several solutions can mitigate this. One is data augmentation, that is, applying transformations such as rotations, flips, contrast or color-balance changes, and noise to the real images. 
We can also rely on multi-task learning: if the auxiliary tasks are sufficiently related, they can improve accuracy on the target task. 
Finally, a last solution could be to add some noise to the data, which acts as a regularizer. 
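
One common way to evaluate a model with so few labeled samples is k-fold cross-validation over the 100 training indices, so that every sample serves once for validation. The sketch below is a minimal illustration in plain Python; the helper name `k_fold_indices`, the number of folds, and the seed are assumptions made for the example and are not part of the original notebook.

```python
import random

def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is used exactly once for validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    fold_size = n_samples // n_folds
    for k in range(n_folds):
        start = k * fold_size
        # The last fold absorbs the remainder when n_samples is not divisible by n_folds.
        end = n_samples if k == n_folds - 1 else start + fold_size
        val_idx = indices[start:end]
        train_idx = indices[:start] + indices[end:]
        yield train_idx, val_idx

folds = list(k_fold_indices(100, n_folds=5))
for k, (train_idx, val_idx) in enumerate(folds):
    print(f"fold {k}: {len(train_idx)} train / {len(val_idx)} validation samples")
```

Each index list could then be wrapped with `torch.utils.data.Subset` to build per-fold loaders, and the validation accuracies averaged over the folds.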

# Raw approach: the baseline

In this section, the goal is to train a CNN on $\mathcal{X}_{\text{train}}$ and compare its performance with reported numbers from the literature. You will have to re-use and/or design a standard classification pipeline. You should optimize your pipeline to obtain the best performance (image size, data augmentation by flip, ...).

The key ingredients for training a CNN are the batch size and the learning rate schedule, i.e., how to decrease the learning rate as a function of the number of epochs. A possible schedule is to start the learning rate at 0.1 and divide it by 10 every 30 epochs. In case of divergence, reduce the learning rate. A potential batch size could be 10, yet this can be cross-validated.
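
The step schedule described above can be written as a small function. This is only a sketch: the name `step_lr` and its keyword arguments are chosen for the illustration.

```python
def step_lr(epoch, base_lr=0.1, step_size=30, gamma=0.1):
    """Learning rate at a given epoch: base_lr multiplied by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)

# Print the learning rate around each decay boundary.
for epoch in (0, 29, 30, 59, 60, 90):
    print(f"epoch {epoch:2d}: lr = {step_lr(epoch):.0e}")
```

In PyTorch, the same rule is available as `torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)`, with `scheduler.step()` called once per epoch.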

You can get some baseline accuracies in this paper: http://openaccess.thecvf.com/content_cvpr_2018/papers/Keshari_Learning_Structure_and_CVPR_2018_paper.pdf. Obviously, those researchers worked in a different context, since they had access to GPUs.

## ResNet architectures

__Question 3 (4 points):__ Write a classification pipeline for $\mathcal{X}_{\text{train}}$, train from scratch and evaluate a *ResNet-18* architecture specific to CIFAR10 (details about the ImageNet model can be found here: https://arxiv.org/abs/1512.03385). Please report the accuracy obtained on the whole dataset as well as the reference paper/GitHub link.

*Hint:* You can re-use the following code: https://github.com/kuangliu/pytorch-cifar. During a training of 10 epochs, a batch size of 10 and a learning rate of 0.01, one obtains 40% accuracy on $\mathcal{X}_{\text{train}}$ (\~2 minutes) and 20% accuracy on $\mathcal{X}_{\text{test}}$ (\~5 minutes).

All the functions below come from the GitHub link provided above. 

In [3]:
import os
import argparse


In [4]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torch.backends.cudnn as cudnn

import torchvision
import torchvision.transforms as transforms
import numpy as np


In [5]:
import os
import sys
import time
import math

import torch.nn as nn
import torch.nn.init as init


term_width = 80

TOTAL_BAR_LENGTH = 65.
last_time = time.time()
begin_time = last_time

def progress_bar(current, total, msg=None):
    global last_time, begin_time
    if current == 0:
        begin_time = time.time()  # Reset for new bar.

    cur_len = int(TOTAL_BAR_LENGTH*current/total)
    rest_len = int(TOTAL_BAR_LENGTH - cur_len) - 1

    sys.stdout.write(' [')
    for i in range(cur_len):
        sys.stdout.write('=')
    sys.stdout.write('>')
    for i in range(rest_len):
        sys.stdout.write('.')
    sys.stdout.write(']')

    cur_time = time.time()
    step_time = cur_time - last_time
    last_time = cur_time
    tot_time = cur_time - begin_time

    L = []
    L.append('  Step: %s' % format_time(step_time))
    L.append(' | Tot: %s' % format_time(tot_time))
    if msg:
        L.append(' | ' + msg)

    msg = ''.join(L)
    sys.stdout.write(msg)
    for i in range(term_width-int(TOTAL_BAR_LENGTH)-len(msg)-3):
        sys.stdout.write(' ')

    # Go back to the center of the bar.
    for i in range(term_width-int(TOTAL_BAR_LENGTH/2)+2):
        sys.stdout.write('\b')
    sys.stdout.write(' %d/%d ' % (current+1, total))

    if current < total-1:
        sys.stdout.write('\r')
    else:
        sys.stdout.write('\n')
    sys.stdout.flush()

def format_time(seconds):
    days = int(seconds / 3600/24)
    seconds = seconds - days*3600*24
    hours = int(seconds / 3600)
    seconds = seconds - hours*3600
    minutes = int(seconds / 60)
    seconds = seconds - minutes*60
    secondsf = int(seconds)
    seconds = seconds - secondsf
    millis = int(seconds*1000)

    f = ''
    i = 1
    if days > 0:
        f += str(days) + 'D'
        i += 1
    if hours > 0 and i <= 2:
        f += str(hours) + 'h'
        i += 1
    if minutes > 0 and i <= 2:
        f += str(minutes) + 'm'
        i += 1
    if secondsf > 0 and i <= 2:
        f += str(secondsf) + 's'
        i += 1
    if millis > 0 and i <= 2:
        f += str(millis) + 'ms'
        i += 1
    if f == '':
        f = '0ms'
    return f

In [None]:
class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_planes, planes, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(
            in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)

        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion*planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion*planes,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion*planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out

    

class ResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=10):
        super(ResNet, self).__init__()
        self.in_planes = 64

        self.conv1 = nn.Conv2d(3, 64, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        self.linear = nn.Linear(512*block.expansion, num_classes)

    def _make_layer(self, block, planes, num_blocks, stride):
        strides = [stride] + [1]*(num_blocks-1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_planes, planes, stride))
            self.in_planes = planes * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out


def ResNet18():
    return ResNet(BasicBlock, [2, 2, 2, 2])


def test():
    net = ResNet18()
    y = net(torch.randn(1, 3, 32, 32))
    print(y.size())

In [None]:
parser = argparse.ArgumentParser(description='PyTorch CIFAR10 Training')
parser.add_argument('-f')
parser.add_argument('--lr', default=0.01, type=float, help='learning rate')
parser.add_argument('--resume', '-r', action='store_true',
                    help='resume from checkpoint')
args = parser.parse_args()
device = 'cuda' if torch.cuda.is_available() else 'cpu'

In [None]:
# Model
print('==> Building model..')
model = ResNet18()

model = model.to(device)
if device == 'cuda':
    model = torch.nn.DataParallel(model)
    cudnn.benchmark = True

if args.resume:
    # Load checkpoint.
    print('==> Resuming from checkpoint..')
    assert os.path.isdir('checkpoint'), 'Error: no checkpoint directory found!'
    checkpoint = torch.load('./checkpoint/ckpt.pth')
    model.load_state_dict(checkpoint['net'])
    best_acc = checkpoint['acc']
    start_epoch = checkpoint['epoch']

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=args.lr,
                      momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

In [None]:
# Training
def train(epoch, data):
    print('\nEpoch: %d' % epoch)
    model.train()
    train_loss = []
    correct = 0
    total = 0
    compt = 0
    for batch_idx, (inputs, targets) in enumerate(data):

        
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        targets = targets.long()
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        train_loss += [loss.item()]
        _, predicted = outputs.max(1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()
    # Report the mean loss over the epoch in both the printed and the returned message.
    msg = "Iteration: {0} | Loss: {1} | Training accuracy: {2}%".format(
        epoch+1, np.mean(train_loss), 100. * correct / total)
    print(msg)
    return msg

In [None]:
def test(epoch, data):
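    # NOTE: best_acc is reset on every call, so the checkpoint below is rewritten
    # whenever the accuracy is non-zero; keep it outside the function to track the best epoch.
    best_acc = 0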
    model.eval()
    test_loss = []
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(data):
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, targets)

            test_loss += [loss.item()]
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()

    # Save checkpoint.
    acc = 100.*correct/total
    if acc > best_acc:
        print('Saving..')
        state = {
            'net': model.state_dict(),
            'acc': acc,
            'epoch': epoch,
        }
        if not os.path.isdir('checkpoint'):
            os.mkdir('checkpoint')
        torch.save(state, './checkpoint/ckpt.pth')
        best_acc = acc
    print("Iteration: {0} | Loss: {1} | Test accuracy: {2}%".format(epoch+1, np.mean(test_loss), 100. *correct/total))
    return ("Iteration: {0} | Loss: {1} | Test accuracy: {2}%".format(epoch+1, np.mean(test_loss), 100. *correct/total))

In [None]:
for epoch in range(10):  # You should get about 53% accuracy on train and 21% on test
    train(epoch,trainloader)
    test(epoch,testloader)
    scheduler.step()

In [None]:
for batch_idx, (inputs, targets) in enumerate(trainloader):
  inputs, targets = inputs.to(device), targets.to(device)

In [None]:
import pandas as pd 

d1 = {'Model': ['Resnet18()'], 'Number of epochs': [10], 'Train accuracy': ['53%'], 'Test accuracy': ['21%']}
df1 = pd.DataFrame(data = d1)

In [None]:
df1

# Transfer learning

We propose to use pre-trained models on a classification and generative task, in order to improve the results of our setting.

## ImageNet features

Now, we will use some pre-trained models on ImageNet and see how well they compare on CIFAR. A list is available on: https://pytorch.org/docs/stable/torchvision/models.html.

__Question 4 (3 points):__ Pick a model from the list above, adapt it for CIFAR and retrain its final layer (or a block of layers, depending on the resources you have access to). Report its accuracy.

For this part, we chose the AlexNet model. 

In [None]:
import torchvision.models as models
import torch.nn  as nn

In [None]:
# Model - Newmodel = alexnet
print('==> Building model..')

model = models.alexnet(pretrained = True)


for param in model.parameters():
    param.requires_grad = False

num_ftrs = model.classifier[6].in_features
model.classifier[6] = nn.Linear(num_ftrs, 10)

model = model.to(device)

if device == 'cuda':
    model = torch.nn.DataParallel(model)
    cudnn.benchmark = True

if args.resume:
    # Load checkpoint.
    print('==> Resuming from checkpoint..')
    assert os.path.isdir('checkpoint'), 'Error: no checkpoint directory found!'
    checkpoint = torch.load('./checkpoint/ckpt.pth')
    model.load_state_dict(checkpoint['net'])
    best_acc = checkpoint['acc']
    start_epoch = checkpoint['epoch']

criterion = nn.CrossEntropyLoss()
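# Optimize only the parameters left trainable (the new final layer). Filtering on
# requires_grad works with or without the DataParallel wrapper, i.e. also on CPU.
optimizer = optim.SGD([p for p in model.parameters() if p.requires_grad], lr=args.lr,
                      momentum=0.9, weight_decay=5e-4)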
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

In [None]:
for epoch in range(10):  #You should get about 97% for training and 32% accuracy for testing
    train(epoch,trainloader_transfer)
    test(epoch, testloader_transfer)
    scheduler.step()

In [None]:
d2 = {'Model': ['AlexNet'], 'Number of epochs': [10], 'Train accuracy': ['97%'], 'Test accuracy': ['31.8%']}
df2 = pd.DataFrame(data = d2)

In [None]:
df2

# Incorporating *a priori*
Geometrical *a priori* are appealing for image classification tasks. For now, we only consider linear transformations $\mathcal{T}$ of the inputs $x:\mathbb{S}^2\rightarrow\mathbb{R}$ where $\mathbb{S}$ is the support of an image, meaning that:

$$\forall u\in\mathbb{S}^2,\mathcal{T}(\lambda x+\mu y)(u)=\lambda \mathcal{T}(x)(u)+\mu \mathcal{T}(y)(u)\,.$$

For instance if an image had an infinite support, a translation $\mathcal{T}_a$ by $a$ would lead to:

$$\forall u, \mathcal{T}_a(x)(u)=x(u-a)\,.$$

Otherwise, one has to handle several boundary effects.
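
The linearity condition and the translation formula above can be checked numerically on a finite image when the boundary is handled periodically (wrap-around). The following sketch assumes NumPy; the helper `translate_periodic` is a hypothetical name, and `np.roll` is used here as one possible way to avoid boundary effects.

```python
import numpy as np

def translate_periodic(x, a):
    """Translate a 2-D image by a = (rows, cols), wrapping around the borders."""
    return np.roll(x, shift=a, axis=(0, 1))

rng = np.random.default_rng(0)
x = rng.random((32, 32))
y = rng.random((32, 32))
lam, mu = 2.0, -0.5
a = (3, 5)

# Linearity: T(lam * x + mu * y) == lam * T(x) + mu * T(y)
lhs = translate_periodic(lam * x + mu * y, a)
rhs = lam * translate_periodic(x, a) + mu * translate_periodic(y, a)
print("linear:", np.allclose(lhs, rhs))

# T_a(x)(u) = x(u - a), with indices taken modulo the image size
print("shifted pixel matches:", translate_periodic(x, a)[3, 5] == x[0, 0])
```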

__Question 5 (1.5 points):__ Explain the issues when dealing with translations, rotations, scaling effects, color changes on $32\times32$ images. Propose several ideas to tackle them.

### Rotation
One issue with rotation is that the image content may not be preserved: rotating a square image by a multiple of 90 degrees keeps every pixel inside the frame, but any other angle pushes the corners outside and leaves empty regions that must be filled. On $32\times32$ images, the required resampling also distorts the few pixels available.

### Scaling effects
An image can be scaled outward or inward. Scaling outward makes the result larger than the original size, so it has to be cropped; scaling inward makes it smaller, which forces us to make assumptions about what lies beyond the boundary.


### Translations
The main drawback of translations is that the relevant information can be lost if the shift is too large. We can address this by bounding the maximum translation.
Overall, however, translations force the convolutional network to look everywhere in the image.


### Ideas to tackle the lack of background after the transformation
Several standard padding modes can fill the region left empty by a transformation:
  - **Constant**: fill the unknown region with a constant value. This may not work for natural images, but can work for images taken against a monochromatic background.

  - **Edge**: extend the edge values of the image beyond the boundary. This method can work for mild translations.

  - **Reflect**: reflect the image pixel values along the image boundary. This method is useful for continuous or natural backgrounds containing trees, mountains, etc.

  - **Wrap**: repeat the image beyond its boundary. This method is less popular than the others, as it does not make sense in many scenarios.
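
The four fill strategies listed above can be illustrated on a one-dimensional signal translated by two samples. This is a minimal sketch assuming NumPy; `translate_1d` is a hypothetical helper that relies on the `mode` argument of `np.pad`.

```python
import numpy as np

def translate_1d(signal, shift, mode):
    """Shift a 1-D signal to the right by `shift` samples, filling the gap with np.pad."""
    padded = np.pad(signal, (shift, 0), mode=mode)
    return padded[: len(signal)]

signal = np.array([1, 2, 3, 4])
results = {mode: translate_1d(signal, 2, mode)
           for mode in ("constant", "edge", "reflect", "wrap")}
for mode, out in results.items():
    print(f"{mode:8s} -> {out}")
```

In torchvision, `transforms.Pad` and `transforms.RandomCrop` expose a similar choice through their `padding_mode` argument (`constant`, `edge`, `reflect`, `symmetric`).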


### Color changes
The results can look more artistic than realistic and the computational cost is quite high, but the results are very good.


## Data augmentations

__Question 6 (3 points):__ Propose a set of geometric transformations beyond translation, and incorporate them in your training pipeline. Train the model of __Question 3__ with them and report the accuracies.

In this part, we tried different data augmentation methods and report the accuracy obtained for some combinations of them.

In [None]:
from imgaug import augmenters as iaa
import imgaug as ia

In [None]:
class ImgAugTransform:
    def __init__(self):
        self.aug = iaa.Sequential([
            iaa.Scale((320, 96)),
            iaa.Sometimes(0.25, iaa.GaussianBlur(sigma=(0, 3.0))),
            iaa.Fliplr(0.5),
            iaa.Affine(rotate=(-20, 20), mode='symmetric'),
            iaa.Sometimes(0.25,
                      iaa.OneOf([iaa.Dropout(p=(0, 0.1)),
                                 iaa.CoarseDropout(0.1, size_percent=0.5)])),
            iaa.AddToHueAndSaturation(value=(-10, 10), per_channel=True)
        ])
      
    def __call__(self, img):
        img = np.array(img)
        return self.aug.augment_image(img)

In [None]:
##Here you are free to add further transform functions if you wish
import PIL

print('==> Preparing data..')
transform_trainbis = torchvision.transforms.Compose([
  #  torchvision.transforms.Resize((320,96)),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.ColorJitter(hue=.05, saturation=.05),
#    transforms.ColorJitter(
 #           brightness=0.1*torch.abs(torch.randn(1)).item(),
 #           contrast=0.1*torch.abs(torch.randn(1)).item(),
  #          saturation=0.1*torch.abs(torch.randn(1)).item(),
  #          hue=0.1*torch.abs(torch.randn(1)).item()),
    torchvision.transforms.RandomCrop(32, padding = 4),
    torchvision.transforms.RandomHorizontalFlip(p=0.5),
    #torchvision.transforms.RandomVerticalFlip(p=0.5),
    #torchvision.transforms.RandomRotation(40, resample=PIL.Image.NEAREST),
    torchvision.transforms.Normalize((0.1307,),(0.3081,))
    
])

# The test transform uses the same normalization as transform_trainbis.
transform_testbis = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize((0.1307,),(0.3081,)),
])

trainset_bis = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_trainbis)

# Subset applies transform_trainbis lazily, so each epoch sees freshly sampled
# augmentations (a pre-built list would freeze a single augmented copy).
trainset_bis_small = torch.utils.data.Subset(trainset_bis, range(100))

trainloaderbis = torch.utils.data.DataLoader(trainset_bis_small, batch_size= 5, shuffle=True)

testset_bis = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_testbis)

testloaderbis = torch.utils.data.DataLoader(testset_bis, batch_size= 5, shuffle=False)

In [None]:
# Model
print('==> Building model..')
model = ResNet18()

model = model.to(device)
if device == 'cuda':
    model = torch.nn.DataParallel(model)
    cudnn.benchmark = True

if args.resume:
    # Load checkpoint.
    print('==> Resuming from checkpoint..')
    assert os.path.isdir('checkpoint'), 'Error: no checkpoint directory found!'
    checkpoint = torch.load('./checkpoint/ckpt.pth')
    model.load_state_dict(checkpoint['net'])
    best_acc = checkpoint['acc']
    start_epoch = checkpoint['epoch']

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=args.lr,
                      momentum=0.9, weight_decay=5e-4)
#optimizer = optim.Adam(model.parameters(), lr=args.lr, weight_decay= 5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

In [None]:
import random
#random.shuffle(dataset)
for epoch in range(20):
    train(epoch,trainloaderbis)
    test(epoch, testloaderbis)
    scheduler.step()

In [None]:
# All results are collected in a DataFrame.

pd.set_option('display.max_colwidth', 400)


d2 = {'Model': ['Resnet18() + RandomHorizontalFlip(p=0.5) + RandomVerticalFlip(p=0.5) + RandomRotation(45, resample=PIL.Image.NEAREST) + Normalize((0.1307,),(0.3081,))',
                'Resnet18() + ColorJitter + RandomHorizontalFlip(p=0.5) + RandomVerticalFlip(p=0.5) + RandomRotation(45, resample=PIL.Image.NEAREST) + Normalize((0.1307,),(0.3081,))',
                'Resnet18() + RandomCrop(32, padding = 4) + RandomHorizontalFlip(p=0.5) + Normalize((0.1307,),(0.3081,))',
                'Resnet18() + RandomCrop(32, padding = 4) + RandomHorizontalFlip(p=0.5) + Normalize((0.1307,),(0.3081,))',
                'Resnet18() + Resize((64,64)) + ColorJitter + RandomCrop(32, padding = 4) + RandomHorizontalFlip(p=0.5) + RandomVerticalFlip(p=0.5) + RandomRotation(45, resample=PIL.Image.NEAREST) + Normalize((0.1307,),(0.3081,))',
                'Resnet18() + Brightness =0.5 + Normalize((0.1307,),(0.3081,))',
                'Resnet18() + Brightness =0.8 + Saturation = 0.8 + Normalize((0.1307,),(0.3081,))'],
      'Parameters': ['batch_size = 10 + SGD + lr = 0.01',
                     'batch_size = 5 + Adam + lr = 0.01',
                     'batch_size = 5 + SGD + lr = 0.01',
                     'batch_size = 5 + SGD + lr = 0.01',
                     'batch_size = 10 + SGD + lr = 0.01',
                     'batch_size = 10 + Adam + lr = 0.0001',
                     'batch_size = 10 + Adam + lr = 0.0001'],
      'Number of epochs': [15, 20, 20, 30, 30, 30, 30],
      'Train accuracy': ['89%', '41%', '91%', '96%', '96%', '100%', '100%'],
      'Test accuracy': ['19.66%', '18.27%', '23.07%', '22.61%', '20.85%', '23.62%', '20.82%']}


df2 = pd.DataFrame(data = d2)

In [None]:
df2

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Helper to display images, useful to check visually which augmentations are applied


def imshow(img):
    img = img / 2 + 0.5     # approximate unnormalization (assumes mean 0.5, std 0.5)
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()


# get some random training images
dataiter = iter(trainloaderbis)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))

# Conclusions

__Question 7 (5 points):__ Write a short report explaining the pros and the cons of each of the methods that you implemented. 25% of the grade of this project will correspond to this question, thus, it should be done carefully. In particular, please add a plot that will summarize all your numerical results.


During this lab, we dealt with three different methods for training a model on a small dataset: **training a ResNet from scratch**, **transfer learning**, and **data augmentation**. Let us now compare them and discuss their advantages.

- **Transfer learning**: 
This method consists in using a model already trained on another dataset, for a similar task. We only replace the last fully connected layer, which performs the classification, and train this final layer on the training set. It is a very convenient method that allows us to reach a good accuracy on the test set with a reduced computational time. 

  However, it has some disadvantages that need to be taken into account. Above all, the model must have been pre-trained on the same kind of data (type of image, size, ...) and on a very large dataset.

- **Data augmentation**: Here we "artificially" increase the size of our dataset by applying transformations to the images. This gives us new images and a bigger dataset. 

  This method is very useful when we have a small dataset, but if it is not used carefully, it can quickly lead to an unrealistic dataset. Indeed, some transformations produce nonsensical images that do not exist in real life (e.g., an upside-down elephant). Each transformation therefore needs to be chosen carefully, so that the model trains on images close to the original dataset.

- **ResNet**: The network here simply does not have enough images to capture the patterns of each class in the dataset.

To sum up our experiments, the method that reaches the best test accuracy is transfer learning (32%), followed by data augmentation (23%) and finally the ResNet trained from scratch (21%).
It would be interesting to combine the first two methods in order to get a better accuracy.

In [None]:
# Here we plot the results we got based on the three methods used
import matplotlib.pyplot as plt

list_methode = ['Resnet18' , 'Alexnet(pre-trained)', 'data augmentation']
list_acc_train = [53,97,100]
list_acc_test = [21,32,23.62]

plt.plot(list_methode,list_acc_train, marker='o', label='Train accuracy')
plt.plot(list_methode,list_acc_test, marker='o', label='Test accuracy')
plt.ylabel('Accuracy (%)')
plt.legend()
plt.show()

# Weak supervision

__Bonus \[open\] question (up to 3 points):__ Pick a weakly supervised method that will potentially use $\mathcal{X}\cup\mathcal{X}_{\text{train}}$ to train a representation (a subset of $\mathcal{X}$ is also fine). Evaluate it and report the accuracies. You should be careful in the choice of your method, in order to avoid heavy computational effort.

In this part, we tried a weakly supervised method based on pseudo-labeling. The model is first trained on the small labeled dataset (100 samples). It then predicts labels for a subset of the remaining training images (2,900 samples), and these predictions are assumed to be correct. 
We then train a new model on these pseudo-labeled samples and evaluate it on the test set. 
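
A common refinement of this idea is to keep only the pseudo-labels the model is confident about. The sketch below is an assumed variant and not what the notebook runs (the notebook keeps every prediction): the helper names, the toy logits, and the 0.8 threshold are all hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def select_pseudo_labels(batch_logits, threshold=0.8):
    """Return (sample_index, predicted_class) for confident predictions only."""
    selected = []
    for i, logits in enumerate(batch_logits):
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            selected.append((i, best))
    return selected

# Toy logits for three unlabeled samples and three classes (illustrative values).
toy_logits = [
    [4.0, 0.5, 0.1],   # confident in class 0
    [1.0, 1.1, 0.9],   # ambiguous, discarded by the threshold
    [0.2, 0.1, 5.0],   # confident in class 2
]
kept = select_pseudo_labels(toy_logits, threshold=0.8)
print(kept)
```

Discarding ambiguous samples trades dataset size for label quality, which may matter when the teacher model is only about 32% accurate on the test set.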

In [None]:
# Model - Newmodel = alexnet
print('==> Building model..')

model = models.alexnet(pretrained = True)


for param in model.parameters():
    param.requires_grad = False
    

num_ftrs = model.classifier[6].in_features
model.classifier[6] = nn.Linear(num_ftrs, 10)

model = model.to(device)

if device == 'cuda':
    model = torch.nn.DataParallel(model)
    cudnn.benchmark = True

if args.resume:
    # Load checkpoint.
    print('==> Resuming from checkpoint..')
    assert os.path.isdir('checkpoint'), 'Error: no checkpoint directory found!'
    checkpoint = torch.load('./checkpoint/ckpt.pth')
    model.load_state_dict(checkpoint['net'])
    best_acc = checkpoint['acc']
    start_epoch = checkpoint['epoch']

criterion = nn.CrossEntropyLoss()
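# Optimize only the parameters left trainable (the new final layer). Filtering on
# requires_grad works with or without the DataParallel wrapper, i.e. also on CPU.
optimizer = optim.SGD([p for p in model.parameters() if p.requires_grad], lr=args.lr,
                      momentum=0.9, weight_decay=5e-4)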
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

for epoch in range(10):  # You should get about 97% for training and 32% accuracy for testing
    train(epoch,trainloader_transfer)
    test(epoch,testloader_transfer)
    scheduler.step()

In [None]:
import torch
import numpy as np
from torch.utils.data import TensorDataset, DataLoader
import torch.nn.functional as F

# Predict pseudo-labels for the unlabeled images with the fine-tuned AlexNet.
model.eval()
pseudo_inputs, pseudo_labels = [], []
with torch.no_grad():
    for inputs, _ in weakloader:  # the true labels are ignored
        inputs = inputs.to(device)
        predicted = model(inputs).argmax(dim=1)
        pseudo_inputs.append(inputs.cpu())
        pseudo_labels.append(predicted.cpu())

inputs_all = torch.cat(pseudo_inputs)   # shape (2900, 3, 64, 64)
labels_all = torch.cat(pseudo_labels)   # shape (2900,)

# Downsample back to 32x32 so the images fit the CIFAR ResNet-18.
inputs_all = F.interpolate(inputs_all, size=(32, 32))

big_dataset = TensorDataset(inputs_all, labels_all) # new, bigger pseudo-labeled dataset
big_dataloader = DataLoader(big_dataset, batch_size=10, shuffle=True) # corresponding dataloader


In [None]:
for batch_idx, (inputs, targets) in enumerate(big_dataloader):
  inputs, targets = inputs.to(device), targets.to(device)

In [None]:
model = ResNet18()

model = model.to(device)
if device == 'cuda':
    model = torch.nn.DataParallel(model)
    cudnn.benchmark = True

if args.resume:
    # Load checkpoint.
    print('==> Resuming from checkpoint..')
    assert os.path.isdir('checkpoint'), 'Error: no checkpoint directory found!'
    checkpoint = torch.load('./checkpoint/ckpt.pth')
    model.load_state_dict(checkpoint['net'])
    best_acc = checkpoint['acc']
    start_epoch = checkpoint['epoch']

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=args.lr,
                      momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)
model.train()


for epoch in range(20):  # You should get about 90% for training and 26% accuracy for testing
    train(epoch,big_dataloader)
    test(epoch, testloader)
    scheduler.step()
  