# Homework 2, *part 2* (60 points)

In this assignment you will build a heavy convolutional neural net (CNN) to solve Tiny ImageNet image classification. Try to achieve as high accuracy as possible.

## Deliverables

* This file,
* a "checkpoint file" from `torch.save(model.state_dict(), ...)` that contains the model's weights (a TA should be able to load it to verify your accuracy).

## Grading

* 9 points for reproducible training code and a filled report below.
* 12 points for building a network that gets above 20% accuracy.
* 6.5 points for beating each of these milestones on the validation set:
  * 25.0%
  * 30.0%
  * 32.5%
  * 35.0%
  * 37.5%
  * 40.0%
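
As a rough sanity check of the arithmetic above, the rubric can be tallied as follows. This is only a sketch: the helper name `expected_points`, the `report_ok` flag, and the strict `>` comparison for "above" and "beating" are assumptions, not part of the official grading.

```python
# Milestones (validation accuracy, in percent) listed in the rubric above.
MILESTONES = [25.0, 30.0, 32.5, 35.0, 37.5, 40.0]

def expected_points(val_acc_percent, report_ok=True):
    """Tally points for a given validation accuracy (hypothetical helper)."""
    points = 9.0 if report_ok else 0.0           # reproducible code + filled report
    if val_acc_percent > 20.0:                   # assumed strict comparison
        points += 12.0
    # 6.5 points for every milestone that is beaten
    points += 6.5 * sum(val_acc_percent > m for m in MILESTONES)
    return points

full_score = expected_points(49.0)     # all milestones beaten
partial_score = expected_points(33.0)  # only the first three milestones beaten
```

With all six milestones beaten, the three components add up to the 60 points in the title.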
    
## Restrictions

* Don't use pretrained networks.

## Tips

* One change at a time: never test several new things at once.
* Google a lot.
* Use GPU.
* Use regularization: L2, batch normalization, dropout, data augmentation.
* Use Tensorboard ([non-Colab](https://github.com/lanpa/tensorboardX) or [Colab](https://medium.com/@tommytao_54597/use-tensorboard-in-google-colab-16b4bb9812a6)) or a similar interactive tool for viewing progress.
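
To make the regularization tip concrete, here is a minimal sketch of how batch normalization, dropout, and L2 regularization are typically wired together in PyTorch. The toy network, its layer sizes, and the hyperparameters are illustrative assumptions, not a reference solution; data augmentation is configured separately through the dataset transforms.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical toy network, far too small to reach the milestones.
toy_net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(16),        # batch normalization after the convolution
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),   # global average pooling -> (N, 16, 1, 1)
    nn.Flatten(),              # -> (N, 16)
    nn.Dropout(p=0.5),         # dropout before the classifier
    nn.Linear(16, 200),        # 200 Tiny ImageNet classes
)

# L2 regularization is passed to the optimizer as `weight_decay`.
toy_optimizer = optim.SGD(toy_net.parameters(), lr=0.1, weight_decay=1e-4)

# Tiny ImageNet images are 64x64 RGB; random data stands in for a real batch.
toy_logits = toy_net(torch.randn(4, 3, 64, 64))
```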

In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
import tiny_imagenet
tiny_imagenet.download(".")

./tiny-imagenet-200 already exists, not downloading


Training and validation images are now in `tiny-imagenet-200/train` and `tiny-imagenet-200/val`.

In [3]:
# Your code here
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as data
from torch.optim import lr_scheduler

import torchvision

import time, copy
from pathlib import Path

In [5]:
# Taken from the PyTorch tutorial (https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html)
# Almost unchanged: only the `scheduler` argument was added.
def train_model(model, dataloaders, criterion, optimizer, scheduler, num_epochs=25, is_inception=False):
    since = time.time()

    val_acc_history = []

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
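            if phase == 'train':
                # NOTE: since PyTorch 1.1, scheduler.step() is expected after
                # optimizer.step(); stepping here follows the older convention.
                scheduler.step()
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode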

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    # Get model outputs and calculate loss
                    # Special case for inception because in training it has an auxiliary output. In train
                    #   mode we calculate the loss by summing the final output and the auxiliary output
                    #   but in testing we only consider the final output.
                    if is_inception and phase == 'train':
                        # From https://discuss.pytorch.org/t/how-to-optimize-inception-model-with-auxiliary-classifiers/7958
                        outputs, aux_outputs = model(inputs)
                        loss1 = criterion(outputs, labels)
                        loss2 = criterion(aux_outputs, labels)
                        loss = loss1 + 0.4 * loss2
                    else:
                        outputs = model(inputs)
                        loss = criterion(outputs, labels)

                    _, preds = torch.max(outputs, 1)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc = running_corrects.double() / len(dataloaders[phase].dataset)

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
            if phase == 'val':
                val_acc_history.append(epoch_acc)

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model, val_acc_history

In [6]:
transforms = {
    'train':torchvision.transforms.Compose([
        torchvision.transforms.RandomHorizontalFlip(),
        torchvision.transforms.RandomCrop(56),
        torchvision.transforms.ColorJitter(saturation=.05, hue=.05),
        torchvision.transforms.ToTensor()
    ]),
    'val': torchvision.transforms.Compose([
        torchvision.transforms.TenCrop(56),
        torchvision.transforms.Lambda(
            lambda crops: torch.stack(
                [torchvision.transforms.ToTensor()(crop) for crop in crops]
            )
        )
    ])
}

data_dir = Path('tiny-imagenet-200')
image_datasets = {x: torchvision.datasets.ImageFolder(data_dir / x,
                                                      transforms[x]) 
                  for x in ['train', 'val']}

data_loaders = {x: data.DataLoader(image_datasets[x],
                                   batch_size=100,
                                   shuffle=(x == 'train'),  # validation data need not be shuffled
                                   num_workers=64)
              for x in ['train', 'val']}

In [7]:
list(torchvision.models.resnet18().children())

[Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False),
 BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
 ReLU(inplace),
 MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False),
 Sequential(
   (0): BasicBlock(
     (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
     (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
     (relu): ReLU(inplace)
     (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
     (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
   )
   (1): BasicBlock(
     (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
     (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
     (relu): ReLU(inplace)
     (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bi

In [8]:
class TunedResNet18(nn.Module):
    def __init__(self):
        super().__init__()
        model = torchvision.models.resnet18()
        model.fc = nn.Linear(model.fc.in_features, 200) # replace the 1000-class head with 200 outputs for Tiny ImageNet
        self._model = model
    
    def forward(self, inputs):
        if inputs.dim() > 4:
            bs, ncrops, c, h, w = inputs.size()
            result = self._model(inputs.view(-1, c, h, w)) 
            outputs = result.view(bs, ncrops, -1).mean(1)
        else:
            outputs = self._model(inputs)
        
        return outputs

In [11]:
model = TunedResNet18()

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)

#Multi GPU
prll_model = nn.DataParallel(model)

#Loss Function
criterion = nn.CrossEntropyLoss()

# All parameters are optimized; weight_decay adds L2 regularization
optimizer = optim.Adam(prll_model.parameters(), weight_decay=1e-4)

# Alternative: decay LR by a factor of 0.1 every 7 epochs
#scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

# Cosine annealing, used here for a single smooth decay rather than as a periodic schedule
scheduler = lr_scheduler.CosineAnnealingLR(optimizer, T_max=20, eta_min=1e-6)

```python
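#WARNING: Don't run this cell on a weak laptop!
#Train
# train_model returns (model, val_acc_history); `model` shares its weights with prll_model.module
prll_model, val_acc_history = train_model(prll_model, 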
                    data_loaders, 
                    criterion, 
                    optimizer, 
                    scheduler,
                    num_epochs=15)
```

When everything is done, please compute accuracy on the validation set and report it below.

In [12]:
val_accuracy = .49
print("Validation accuracy: %.2f%%" % (val_accuracy * 100))

Validation accuracy: 49.00%


Save the model weights in Colab and download them with the following code:
```python
torch.save(model.state_dict(), 'model_weights-HW2.pth') 
from google.colab import files
files.download('model_weights-HW2.pth')
```
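
For the loading side, a minimal round-trip sketch is shown below. It assumes a tiny `nn.Linear` as a stand-in for the real network and a temporary file instead of `model_weights-HW2.pth`; both are illustrative only. It also shows why the unwrapped model is the one to save: the state dict of an `nn.DataParallel` wrapper prefixes every key with `module.`.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in for the real network; any nn.Module behaves the same way.
net = nn.Linear(8, 200)
wrapped = nn.DataParallel(net)  # the wrapper stores the network as `.module`

# Keys of the wrapper's state dict carry a "module." prefix; the plain model's do not.
wrapped_keys = list(wrapped.state_dict().keys())
plain_keys = list(net.state_dict().keys())

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "weights.pth")
    torch.save(net.state_dict(), path)   # save the unwrapped weights
    restored = nn.Linear(8, 200)         # fresh model with the same architecture
    # map_location lets the checkpoint load on a machine without a GPU
    restored.load_state_dict(torch.load(path, map_location="cpu"))

same_weights = torch.equal(restored.weight, net.weight)
```

A checkpoint saved from the wrapper instead would need `nn.DataParallel` (or key renaming) on the loading side as well.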

## Test model

In [None]:
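# PLEASE USE `transforms['val']` (it is mandatory for my model)
test_dataset = torchvision.datasets.ImageFolder(<TA, SPECIFY DIR HERE>,
                                                transforms['val']) 

test_loader = data.DataLoader(test_dataset, batch_size=100, num_workers= <IF_NEEDED>)

model.eval()  # use the running BatchNorm statistics
correct = 0
with torch.no_grad():
    # iterate over the whole set in small batches (each image expands to ten crops)
    for inputs, labels in test_loader:
        Y_pred = model(inputs.to(device)).argmax(1)
        correct += (Y_pred == labels.to(device)).sum().item()
res = correct / len(test_dataset)
print("accuracy calculated on the test set:", res)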

# Report

Below, please mention:

* a brief history of tweaks and improvements;
* the final architecture and why you chose it;
* the training method (batch size, optimization algorithm, ...) and why;
* any regularization and other techniques applied, and their effects.

The reference format is:

*"I have analyzed these and these articles|sources|blog posts, tried that and that to adapt them to my problem and the conclusions are such and such".*

# LOGS
***A brief history of tweaks***  
* Data augmentation
    - apply a horizontal flip with probability 0.5
    - take a random $56 \times 56$ crop of each image
    - add a small random variation in color (color jitter on saturation and hue)
* Validation 
    - To predict the class of an image, first take ten $56 \times 56$ crops of it (the four corners and the center, plus their horizontal flips). Run the network on each crop to get its class scores (the network's outputs), then average the ten score vectors to obtain the final prediction for the original image.
* Architecture & Experiment Setting
    - ResNet18 (I chose it because it showed the best performance in the [article](http://cs231n.stanford.edu/reports/2017/pdfs/937.pdf) I followed for this task)
    - Adam (it performed well in HW1)
    - L2 regularization (because the assignment suggests it and it is easy to add by passing the `weight_decay` keyword argument to the optimizer)
    - other details: I left the batch size, learning rate, and number of epochs unchanged from the [code](https://github.com/tjmoon0104/Tiny-ImageNet-Classifier/blob/master/ResNet18_Baseline.ipynb) I used as a template, because I was doing the 2nd part of the HW in the last hours before the deadline
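
The ten-crop validation step described above can be sketched in isolation. The tiny linear `backbone` and the random input are stand-ins (assumptions for illustration) for ResNet18 and real `TenCrop` output; the helper name `predict_ten_crop` is hypothetical, and the reshaping mirrors the `forward` method of `TunedResNet18`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical tiny classifier standing in for the ResNet18 backbone.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 56 * 56, 200))
backbone.eval()

def predict_ten_crop(model, crops):
    """Average class scores over crops; `crops` has shape (batch, ncrops, C, H, W)."""
    bs, ncrops, c, h, w = crops.size()
    scores = model(crops.view(-1, c, h, w))          # fold crops into the batch dimension
    return scores.view(bs, ncrops, -1).mean(dim=1)   # (batch, classes)

crops = torch.randn(2, 10, 3, 56, 56)  # random data in place of real TenCrop output
with torch.no_grad():
    averaged = predict_ten_crop(backbone, crops)
    # Reference: score each crop separately, then average.
    per_crop = torch.stack([backbone(crops[:, i]) for i in range(10)], dim=1)
```

Folding the crops into the batch dimension gives the same result as scoring each crop separately, but needs only one forward pass.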
    
**Small Discussion**  
I managed to run "my code" (built from different snippets) on a GPU twice.
The first time, I ran it without crops and without the "validation trick" (which I took from the same [article](http://cs231n.stanford.edu/reports/2017/pdfs/937.pdf)) and got slightly above 40%. After adding these tweaks and running it a second time, the model did about 10 percentage points better. I was also going to try cyclic learning rates from the same [article](http://cs231n.stanford.edu/reports/2017/pdfs/937.pdf). I may test this later (the day after the deadline) and download the new weights if it improves the results. See the *After Deadline* section.

**REFERENCES**  
[Code which ~~I used abusively~~ I reworked](https://github.com/tjmoon0104/Tiny-ImageNet-Classifier/blob/master/ResNet18_Baseline.ipynb) - according to a friend, this code was written by a student of a similar Stanford course  
[Good article](http://cs231n.stanford.edu/reports/2017/pdfs/937.pdf) - Techniques for Image Classification on Tiny-ImageNet  
[Favourite documentation](https://pytorch.org/docs/stable/index.html) - PyTorch documentation

**Conclusions**  
I have revised a lot about CNNs; these things are interesting even on a second reading. I have gained some practical knowledge of data augmentation in PyTorch, and I have become acquainted with the generous Google Colab.

***After Deadline***  
++_Git commit style_++  
- Increase the number of epochs from 15 to 20  
(just to check whether it helps: no, it does not)  
- Try CosineAnnealingLR   
  (a cyclic LR schedule, used here only to obtain the desired decay of the learning rate)  
  \[a small improvement - just over 2% (48 $\longrightarrow \;\gtrsim$ 50)\]
- Add image normalization 
  (improvement of only $\sim 0.5 \%$, so I dropped it)
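
The CosineAnnealingLR entry above can be illustrated with the closed-form cosine schedule. This is a sketch under assumptions: the starting value is Adam's default learning rate of 1e-3 (the optimizer in the notebook is created without an explicit `lr`), the function name `cosine_annealing_lr` is hypothetical, and PyTorch's scheduler is expected to follow this curve only when nothing else modifies the learning rate.

```python
import math

def cosine_annealing_lr(epoch, base_lr=1e-3, eta_min=1e-6, t_max=20):
    """Closed-form cosine annealing: base_lr at epoch 0, eta_min at epoch t_max."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2

# Learning rate for epochs 0..20 with the settings used in the notebook.
schedule = [cosine_annealing_lr(e) for e in range(21)]

for epoch in (0, 5, 10, 15, 20):
    print("epoch {:2d}: lr = {:.2e}".format(epoch, schedule[epoch]))
```

Over the first `T_max` epochs the curve only decreases, which is why the schedule acts as a smooth decay here rather than as a periodic one; with 15 epochs of training the learning rate stops well above `eta_min`.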