# [Scene Recognition with Deep Learning](https://www.cc.gatech.edu/~hays/compvision/proj5/)
For this project we are going to focus on scene classification for 15 scene types with a state-of-the-art approach: deep learning. The task is also known as image classification. 

Basic learning objectives of this project:
1. Construct the fundamental pipeline for performing deep learning using PyTorch;
2. Understand the concepts behind different layers, optimizers, and learning schedules;
3. Experiment with different models and observe the performance.

The starter code is mostly initialized to 'placeholder' just so that the starter
code does not crash when run unmodified and you can get a preview of how
results are presented.

In [None]:
# flag to adapt the notebook for Colab; set it to True if you want to run on Colab
use_colab = False 

## Part 0: Setup for Colab
You can skip this part if you are not running your notebook on Colab.

### Download Data

Download the data for training the network. It is exactly the same data that has been provided to you, but we'll fetch it from the cloud to keep uploads small.

In [None]:
# uncomment for running on colab
# !wget "https://cc.gatech.edu/~hays/compvision/proj5/data.zip" -O data.zip && unzip -qq data.zip
# !rm ./data.zip

### Upload code and unit tests

Once you have finished your code, run `python ./zip_for_colab.py` and all the required files and tests will be written to `cv_proj5.zip`.

Click the folder icon on the left of the colab UI, and click on the upload button right below the "Files" heading. You should have done a similar process for Project 4.

Run the cell below once your upload completes to extract your uploaded files.

In [None]:
# uncomment for running on colab
# !unzip -qq cv_proj5.zip -d ./

### Preparation

We'll import the required functions and set up GPU computation.

Click on Runtime $\rightarrow$ Change Runtime Type, and select "GPU" under hardware accelerator.

In [None]:
import os

import torch

from proj5_code.runner import Trainer
from proj5_code.optimizer import get_optimizer
from proj5_code.simple_net import SimpleNet
from proj5_code.simple_net_final import SimpleNetFinal
from proj5_code.my_resnet import MyResNet18
from proj5_code.data_transforms import (get_fundamental_transforms,
                                        get_fundamental_normalization_transforms,
                                        get_fundamental_augmentation_transforms,
                                        get_all_transforms)
from proj5_code.stats_helper import compute_mean_and_std
from proj5_code.confusion_matrix import (generate_confusion_data, generate_confusion_matrix,
                                         plot_confusion_matrix, get_pred_images_for_target,
                                         generate_and_plot_confusion_matrix)
from proj5_code.dl_utils import save_trained_model_weights

%load_ext autoreload
%autoreload 2

In [None]:
from proj5_unit_tests.utils import verify
from proj5_unit_tests.test_stats_helper import test_mean_and_variance
from proj5_unit_tests.test_image_loader import (test_dataset_length, test_unique_vals,
                                                test_class_values, test_load_img_from_path)
from proj5_unit_tests.test_data_transforms import (test_fundamental_transforms, 
                                                   test_data_augmentation_transforms,
                                                   test_data_augmentation_with_normalization_transforms)
from proj5_unit_tests.test_dl_utils import test_compute_accuracy, test_compute_loss
from proj5_unit_tests.test_simple_net import test_simple_net
from proj5_unit_tests.test_simple_net_final import test_simple_net_final
from proj5_unit_tests.test_my_resnet import test_my_resnet
from proj5_unit_tests.test_confusion_matrix import (test_generate_confusion_matrix, 
                                                    test_generate_confusion_matrix_normalized)

In [None]:
is_cuda = True
is_cuda = is_cuda and torch.cuda.is_available() # will turn off cuda if the machine doesn't have a GPU

In [None]:
data_path = '../data/' if not use_colab else './data/'
model_path = '../model_checkpoints/' if not use_colab else './model_checkpoints/'

## Part 1: SimpleNet
To train a network in PyTorch, we need 4 components:
1. **Dataset** - an object which can load the data and labels given an index.
2. **Model** - an object that contains the network architecture definition.
3. **Loss function** - a function that measures how far the network output is from the ground truth label.
4. **Optimizer** - an object that optimizes the network parameters to reduce the loss value.

### Part 1.1: Datasets
Now let's create the **Dataset** object to be used later. Remember how, back in Project 1, we initialized such a class to load 5 images? Here the task is similar: we have to load each image as well as its classification label. The key idea is to store the paths to all the images in your dataset, and then be able to provide the image file path and its ground truth class id when given the index of a data example.

We will map the scene names (text) into indices 0 to 14 in the image loader. You can choose any mapping you want but once fixed, it has to be consistent throughout this notebook.

**TODO 1:** complete the `image_loader.py`
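
The "store every path with its class id" idea can be sketched without PyTorch. This is only an illustration: the folder layout `<root>/<split>/<class_name>/<image>.jpg`, the helper name `index_split`, and the class names in the demo are assumptions, not part of the starter code.

```python
import os
import tempfile


def index_split(root, split):
    """Return (class_to_idx, samples) for a folder laid out as root/split/class/image.jpg."""
    split_dir = os.path.join(root, split)
    # Sort the class names so the name -> index mapping is the same on every run.
    class_names = sorted(
        d for d in os.listdir(split_dir) if os.path.isdir(os.path.join(split_dir, d))
    )
    class_to_idx = {name: i for i, name in enumerate(class_names)}

    samples = []
    for name in class_names:
        class_dir = os.path.join(split_dir, name)
        for fname in sorted(os.listdir(class_dir)):
            if fname.lower().endswith(".jpg"):
                samples.append((os.path.join(class_dir, fname), class_to_idx[name]))
    return class_to_idx, samples


# Tiny demonstration on a throwaway directory tree (made-up class names).
with tempfile.TemporaryDirectory() as demo_root:
    for cls in ("kitchen", "bedroom"):
        os.makedirs(os.path.join(demo_root, "train", cls))
        for i in range(2):
            open(os.path.join(demo_root, "train", cls, f"img{i}.jpg"), "w").close()
    demo_classes, demo_samples = index_split(demo_root, "train")
```

Sorting the class names is one simple way to keep the mapping consistent throughout the notebook; a dataset class would then return the loaded image and the stored index for `samples[i]`.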

In [None]:
inp_size = (64,64)
print("Testing your image loader (length):", verify(test_dataset_length))
print("Testing your image loader (values):", verify(test_unique_vals))
print("Testing your image loader (classes):", verify(test_class_values))
print("Testing your image loader (paths):", verify(test_load_img_from_path))

### Data transforms
**TODO 2:** complete the function `get_fundamental_transforms()` in `data_transforms.py` to compile the following fundamental transforms:
1. Resize the input image to the desired shape;
2. Convert it to a tensor.

In [None]:
print("Testing your fundamental data transforms: ", verify(test_fundamental_transforms))

### Part 1.2: Model
The data is ready! Now we are preparing to move to the actual core of deep learning: the architecture. To get you started in this part, simply define a **2-layer** model in the `simple_net.py`. Here by "2 layers" we mean **2 convolutional layers**, so you need to figure out the supporting utilities like ReLU, Max Pooling, and Fully Connected layers, and configure them with proper parameters to make the tensor flow.

You may refer to Figure 2 in the proj5 handout for a sample network architecture (it's the architecture the TAs used in their implementation and is sufficient to get you past Part 1).

**TODO 3**: Do the following in ```simple_net.py```:
- ```self.conv_layers```
- ```self.fc_layers```
- ```forward()```

Leave the ```self.loss_criterion = None``` for now.
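
To see how the tensor shapes have to line up, here is a rough sketch of a network with 2 convolutional layers. The class name `TinyNet`, the channel counts, kernel sizes, and the assumption of single-channel 64x64 inputs are illustrative choices, not necessarily those of Figure 2 in the handout.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Illustrative 2-conv-layer classifier for 1x64x64 inputs and 15 classes."""

    def __init__(self, num_classes=15):
        super().__init__()
        self.conv_layers = nn.Sequential(
            nn.Conv2d(1, 10, kernel_size=5),   # 1x64x64 -> 10x60x60
            nn.MaxPool2d(kernel_size=3),       # -> 10x20x20 (stride defaults to 3)
            nn.ReLU(),
            nn.Conv2d(10, 20, kernel_size=5),  # -> 20x16x16
            nn.MaxPool2d(kernel_size=3),       # -> 20x5x5
            nn.ReLU(),
        )
        self.fc_layers = nn.Sequential(
            nn.Linear(20 * 5 * 5, 100),        # must match the flattened conv output
            nn.ReLU(),
            nn.Linear(100, num_classes),
        )

    def forward(self, x):
        features = self.conv_layers(x)
        return self.fc_layers(torch.flatten(features, start_dim=1))


demo_logits = TinyNet()(torch.zeros(4, 1, 64, 64))
```

The first `nn.Linear` input size is the part that usually needs care: it has to equal channels x height x width of the last convolutional output.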

In [None]:
print("Testing your SimpleNet architecture: ", verify(test_simple_net))

In [None]:
simple_model = SimpleNet()

### Loss function
When defining your model architecture, also initialize the `loss_criterion` variable there. Remember that this is a multi-class classification problem, so choosing the [appropriate loss function](https://pytorch.org/docs/stable/nn.html#loss-functions) is important here.

**TODO 4:** Assign a loss function to ```self.loss_criterion``` in ```simple_net.py```.
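
One common choice for multi-class classification is cross-entropy. Whether it is the intended answer for TODO 4 is an assumption here; the sketch only shows how `nn.CrossEntropyLoss` consumes raw logits and integer labels.

```python
import math

import torch
from torch import nn

criterion = nn.CrossEntropyLoss()  # expects raw logits, not softmax outputs

logits = torch.zeros(4, 15)           # 4 examples, 15 classes, all scores equal
labels = torch.tensor([0, 3, 7, 14])  # integer class ids
loss = criterion(logits, labels)

# With identical scores the softmax is uniform, so the loss is ln(15).
expected = math.log(15)
```

A freshly initialized 15-class network often starts with a loss near this value, which can serve as a quick sanity check.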

In [None]:
print(simple_model)

### Optimizer
**TODO 5:** **initialize the following cell with proper values for learning rate and weight decay** (you will need to come back and tune these values for better performance once the trainer section is done)

In [None]:
# TODO: add a decent initial setting and tune from there. The values are intentionally bad.
optimizer_config = {
  "optimizer_type": "adam",
  "lr": 1e-3,
  "weight_decay": 1e-1
}

**TODO 6:** complete the ```get_optimizer()``` function in ```optimizer.py```. The helper function accepts the three basic configurations defined in the cell above. Any other configuration is optional. The *SGD* optimizer type must be supported; anything else is optional.
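
A minimal sketch of mapping such a config dictionary to a PyTorch optimizer might look like the following. The helper name `build_optimizer` and the fallback defaults are hypothetical; the real `get_optimizer()` signature is defined by the starter code.

```python
import torch
from torch import nn


def build_optimizer(model, config):
    """Hypothetical helper: map a config dict to a torch.optim optimizer."""
    opt_type = config.get("optimizer_type", "sgd").lower()
    lr = config.get("lr", 1e-3)                      # assumed default
    weight_decay = config.get("weight_decay", 0.0)   # assumed default

    if opt_type == "sgd":
        return torch.optim.SGD(model.parameters(), lr=lr, weight_decay=weight_decay)
    if opt_type == "adam":
        return torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    raise ValueError(f"Unsupported optimizer_type: {opt_type!r}")


demo_opt = build_optimizer(
    nn.Linear(4, 2), {"optimizer_type": "sgd", "lr": 0.01, "weight_decay": 1e-5}
)
```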

In [None]:
optimizer = get_optimizer(simple_model, optimizer_config)

### Part 1.3: Trainer

**TODO 7:** Next we define the trainer for the model; to start, we will need to do the following in ```dl_utils.py```:
- ```compute_loss()```: use the model's loss criterion and compute the corresponding loss between the model output and the ground-truth labels.
- ```compute_accuracy()```: compute the classification accuracy given the prediction logits and the ground-truth labels.
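
As a rough illustration of the accuracy computation, the predicted class is the index of the largest logit in each row. The function name `accuracy_from_logits` is made up for this sketch and the data is synthetic.

```python
import torch


def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    predictions = torch.argmax(logits, dim=1)
    return (predictions == labels).float().mean().item()


demo_logits = torch.tensor([[2.0, 0.1, 0.3],
                            [0.2, 0.1, 1.5],
                            [0.9, 1.1, 0.4],
                            [0.3, 0.2, 0.1]])
demo_labels = torch.tensor([0, 2, 0, 0])
demo_acc = accuracy_from_logits(demo_logits, demo_labels)  # 3 of 4 rows are correct
```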

In [None]:
print("Testing your trainer (loss values): ", verify(test_compute_loss))
print("Testing your trainer (accuracy computation): ", verify(test_compute_accuracy))

Then pass in the model, optimizer, transforms for both the training and testing datasets into the trainer, and proceed to the next cell to train it. If you have implemented everything correctly, you should be seeing a decreasing loss value.
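
The internals of `Trainer` are not shown in this notebook, so the following is only a generic sketch of what one optimization loop in PyTorch typically looks like, run on synthetic data with a stand-in linear model.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic stand-in data: 32 examples, 8 features, 3 classes.
inputs = torch.randn(32, 8)
labels = torch.randint(0, 3, (32,))

model = nn.Linear(8, 3)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

loss_history = []
for step in range(20):
    optimizer.zero_grad()                    # clear gradients from the previous step
    loss = criterion(model(inputs), labels)  # forward pass and loss
    loss.backward()                          # backpropagate
    optimizer.step()                         # update the parameters
    loss_history.append(loss.item())
```

Plotting `loss_history` for a loop like this is the same kind of check as watching the loss printed during training.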

**Note** that in this project we will be using the test set as the validation set (i.e. using it to guide our decisions about models and hyperparameters while training). In actual practice, you would not interact with the test set until reporting the final results.

**Note** that your CPU should be sufficient to handle the training process for all networks in this project, and the following training cells will take less than 5 minutes; you may also want to decrease the value for `num_epochs` and quickly experiment with your parameters. The default value of **30** is good enough to get you around the threshold for Part 1, and you are free to increase it a bit and adjust other parameters in this part.

In [None]:
# re-init the model so that the weights are all random
simple_model_base = SimpleNet()
optimizer = get_optimizer(simple_model_base, optimizer_config)

trainer = Trainer(data_dir=data_path, 
                  model = simple_model_base,
                  optimizer = optimizer,
                  model_dir = os.path.join(model_path, 'simple_net'),
                  train_data_transforms = get_fundamental_transforms(inp_size),
                  val_data_transforms = get_fundamental_transforms(inp_size),
                  batch_size = 32,
                  load_from_disk = False,
                  cuda = is_cuda
                 )

In [None]:
%%time
trainer.run_training_loop(num_epochs=30)

After you have finished the training process, plot the loss and accuracy history. You can also check the final accuracy for both training and testing data. Copy the accuracy plots and values onto the report, and answer the questions there.

In [None]:
trainer.plot_loss_history()
trainer.plot_accuracy()

In [None]:
train_accuracy = trainer.train_accuracy_history[-1]
validation_accuracy = trainer.validation_accuracy_history[-1]
print('Train Accuracy = {}; Validation Accuracy = {}'.format(train_accuracy, validation_accuracy))

**TODO 8:** Obtain **45%** validation accuracy to receive full credit for Part 1. You can go back to TODO 5 to tune your parameters using the following tips:
1. If the loss decreases very slowly, try increasing the value of the lr (learning rate).
2. Initially keep the value of weight decay (L2-regularization) very low.
3. Try to first adjust lr in multiples of 3 initially. When you are close to reasonable performance, do a more granular adjustment.
4. If you want to increase the validation accuracy by a little bit, try increasing the weight_decay to prevent overfitting. Do not use tricks from Section 6 just yet.
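
Tip 3 can be made concrete with a small grid of candidate learning rates spaced by a factor of 3. The helper name `lr_candidates` and the choice of two steps in each direction are assumptions for illustration.

```python
def lr_candidates(base_lr, steps_down=2, steps_up=2, factor=3.0):
    """Learning rates spaced by a constant factor around base_lr."""
    return [base_lr * factor ** k for k in range(-steps_down, steps_up + 1)]


# Five values centered on 1e-3, each 3x the previous one.
coarse_grid = lr_candidates(1e-3)
```

Once one of these values gives reasonable performance, a finer grid around it (a smaller `factor`) is the "more granular adjustment".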

### Save the model for your SimpleNet
Once you are satisfied with the performance of your trained model, you need to save it so that you can upload it to Gradescope along with the other models.

We'll save the model to the current directory. If you're running locally on your computer, this should be in the `proj5_code` folder, which is the desired location for uploading to gradescope.

If you are running on Colab, make sure you download the trained `.pt` files that will be generated. This process is similar to that of Project 4.
- Click on the folder icon in the left hand side menu
- Select the 3 dots next to the `<out_name>.pt` file that is generated and click download
- Store the file in the `proj5_code` folder.

In [None]:
save_trained_model_weights(simple_model_base, out_dir="./")

## Part 2: SimpleNet with additional modifications

In Part 1 we implemented a basic CNN, but it doesn’t perform very well. Let’s try a few tricks to see if we can improve our model performance. You can start by copying your `SimpleNet` architecture from `simple_net.py` into the `SimpleNetFinal` class in `simple_net_final.py`.

### Part 2.1: Problem 1: We don’t have enough training data. Let’s “jitter.”

We are going to increase our amount of training data by left-right mirroring and color jittering the training images during the learning process.

**TODO 9:** complete the `get_fundamental_augmentation_transforms()` function in `data_transforms.py`: first copy your existing fundamental transform implementation into this function, and then insert a couple of other transforms that perform the mirroring and jittering described above.

Useful functions:`transforms.RandomHorizontalFlip`, `transforms.ColorJitter`

In [None]:
print("Testing your data transforms with data augmentation: ", verify(test_data_augmentation_transforms))

### Part 2.2: Problem 2: The images aren’t zero-centered and variance-normalized.

We are going to "zero-center" and "normalize" the dataset so that each entry has zero mean and the overall standard deviation is 1. 

**TODO 10**:  fill in the `compute_mean_and_std()` in `stats_helper.py` to compute the **mean** and **standard deviation** of both training and validation data.
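
One way to get the statistics without holding every image in memory is to accumulate the pixel count, the sum, and the sum of squares. The sketch below assumes a population (not sample) standard deviation and uses random tensors in place of real images; `running_mean_std` is a hypothetical name.

```python
import torch


def running_mean_std(images):
    """Mean and population std over every pixel of an iterable of tensors."""
    count = 0
    total = 0.0
    total_sq = 0.0
    for img in images:
        img = img.double()
        count += img.numel()
        total += img.sum().item()
        total_sq += (img ** 2).sum().item()
    mean = total / count
    std = (total_sq / count - mean ** 2) ** 0.5  # sqrt(E[x^2] - E[x]^2)
    return mean, std


torch.manual_seed(0)
demo_images = [torch.rand(1, 8, 8), torch.rand(1, 6, 10)]  # different sizes are fine
demo_mean, demo_std = running_mean_std(demo_images)

# Applying (x - mean) / std afterwards gives zero mean and unit std overall.
normalized = torch.cat([((img - demo_mean) / demo_std).flatten() for img in demo_images])
```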

In [None]:
print("Testing your mean and std computation: ", verify(test_mean_and_variance))
dataset_mean, dataset_std = compute_mean_and_std(data_path)

In [None]:
print('Dataset mean = {}, standard deviation = {}'.format(dataset_mean, dataset_std))

**TODO 11**: complete the `get_all_transforms()` function in `data_transforms.py` to normalize the input using the passed-in mean and standard deviation: you need to copy your implementation of `get_fundamental_augmentation_transforms()` into this function first.

In [None]:
print("Testing your normalized data transforms: ", verify(test_data_augmentation_with_normalization_transforms))

In [None]:
inp_size = (64,64)

### Part 2.3-2.5: Problem 3 ~ 5: Modify the network.

**TODO 12:** modify the layers in the `SimpleNetFinal` class in `simple_net_final.py`:
1. Add the dropout layer
2. Add one or two more blocks of “conv/pool/relu”.
3. Add a batch normalization layer after each convolutional layer (except for the last)
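
As a rough sketch, a single block with batch normalization and dropout could be assembled as follows. The channel counts, the dropout probability, and the placement of dropout are assumptions for illustration, not the required design.

```python
import torch
from torch import nn

# One "conv / batch norm / pool / relu" block followed by dropout.
block = nn.Sequential(
    nn.Conv2d(1, 10, kernel_size=5),  # 1x64x64 -> 10x60x60
    nn.BatchNorm2d(10),               # normalizes each of the 10 channels
    nn.MaxPool2d(kernel_size=3),      # -> 10x20x20
    nn.ReLU(),
    nn.Dropout(p=0.5),                # active only in train() mode
)

block.eval()  # dropout becomes the identity; batch norm uses running statistics
with torch.no_grad():
    demo_out = block(torch.zeros(2, 1, 64, 64))
```

Both dropout and batch normalization behave differently in `train()` and `eval()` mode, so the mode matters when measuring validation accuracy.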

In [None]:
print("Testing your SimpleNetFinal architecture: ", verify(test_simple_net_final))

In [None]:
simple_model_final = SimpleNetFinal()
print(simple_model_final)

Similar to the previous part, **initialize the following cell with proper values for learning rate and weight decay**.

In [None]:
# TODO: add a decent initial setting and tune from there. The values are intentionally bad.
optimizer_config = {
  "optimizer_type": "adam",
  "lr": 1e-3,
  "weight_decay": 1e-1
}

The following cell will take longer than Part 1.3, as now we have more data (and more variability), and the model is slightly more complicated than before as well; however, it should finish within 10~15 minutes anyway, and the default num_epochs is also good enough as a starting point for you to pass this part.

In [None]:
simple_model_final = SimpleNetFinal()
optimizer = get_optimizer(simple_model_final, optimizer_config)

trainer = Trainer(data_dir=data_path, 
                  model = simple_model_final,
                  optimizer = optimizer,
                  model_dir = os.path.join(model_path, 'simple_model_final'),
                  train_data_transforms = get_all_transforms(inp_size, [dataset_mean], [dataset_std]),
                  val_data_transforms = get_fundamental_normalization_transforms(inp_size, [dataset_mean], [dataset_std]),
                  batch_size = 32,
                  load_from_disk = False,
                  cuda = is_cuda
                 )

In [None]:
%%time
trainer.run_training_loop(num_epochs=30)

Similar to Part 1, now plot out the loss and accuracy history. Also copy the plots onto the report, and answer the questions accordingly.

In [None]:
trainer.plot_loss_history()
trainer.plot_accuracy()

In [None]:
train_accuracy = trainer.train_accuracy_history[-1]
validation_accuracy = trainer.validation_accuracy_history[-1]
print('Train Accuracy = {}; Validation Accuracy = {}'.format(train_accuracy, validation_accuracy))

**TODO 13:** Obtain a **55%** validation accuracy to receive full credit for Part 2.

### Save the model for your SimpleNetFinal

In [None]:
save_trained_model_weights(simple_model_final, out_dir="./")

### Part 2.6: Analysis using confusion matrix
A confusion matrix is a helpful tool for visualizing the performance of classification algorithms. Each row of the matrix represents the instances of an actual (target) class, while each column represents the instances of a predicted class. Each entry of the confusion matrix counts the number of instances of a given (target, prediction) pair. We can use this to understand the classification behaviour.

A confusion matrix can also be normalized by dividing each row by the total number of instances of the target class. This is helpful for comparing between large and small datasets, as well as when there is significant class imbalance.
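
The counting and row normalization can be sketched on a toy example. This assumes rows index the target class and columns the predicted class, which is consistent with normalizing each row by the target-class total; the function names are hypothetical.

```python
import torch


def confusion_counts(targets, predictions, num_classes):
    """matrix[t, p] = number of examples with target t that were predicted as p."""
    matrix = torch.zeros(num_classes, num_classes)
    for t, p in zip(targets, predictions):
        matrix[t, p] += 1
    return matrix


def normalize_rows(matrix):
    """Divide each row by its sum (the number of examples of that target class)."""
    row_totals = matrix.sum(dim=1, keepdim=True).clamp(min=1)  # avoid division by zero
    return matrix / row_totals


demo_targets = [0, 0, 0, 1, 1, 2]
demo_preds = [0, 0, 1, 1, 2, 2]
demo_cm = confusion_counts(demo_targets, demo_preds, num_classes=3)
demo_cm_norm = normalize_rows(demo_cm)
```

In the normalized matrix the diagonal entry of each row is the per-class recall, so off-diagonal mass shows which classes are being confused with which.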

**TODO 14:** Do the following to visualize the confusion matrix:

1. Implement the code to extract the predictions and targets from a model and a dataset
2. Implement the code to generate the confusion matrix, and its normalized form
3. Plot the confusion matrix and try to understand how your model is performing, and where it falls short. We'll use this later on for the report.

In [None]:
print(verify(test_generate_confusion_matrix))
print(verify(test_generate_confusion_matrix_normalized))

In [None]:
%%time
targets, predictions, class_labels = generate_confusion_data(trainer.model,
                                                             trainer.val_dataset,
                                                             use_cuda=is_cuda)

In [None]:
confusion_matrix = generate_confusion_matrix(targets, predictions, len(class_labels))

In [None]:
plot_confusion_matrix(confusion_matrix, class_labels)

## Part 3: ResNet
You can see that after the above adjustment, our model performance increases in terms of testing accuracy. Although the training accuracy drops, now it's closer to the testing values and that's more natural in terms of performance. But we are not satisfied with the final performance yet. Our model, in the end, is still a 2-layer SimpleNet and it might be capable of capturing some features, but could be improved a lot if we go **deeper**. In this part we are going to see the power of a famous model: ResNet18.

In [None]:
inp_size = (224, 224)

### Part 3.1 & 3.2: Fine-tuning the ResNet
Now let's define a ResNet which can be fit onto our dataset. PyTorch provides pre-trained models like ResNet18, so what you want to do is to load the model first, and then adjust some of the layers so that it fits our own dataset, instead of outputting scores for 1000 classes as the original ResNet18 model does.


**TODO 15:** Switch to `my_resnet.py`, and copy the network architecture and weights of all but the last fc layers from the pretrained network.

After you have defined the correct architecture of the model, make some tweaks to the existing layers: **freeze** the **convolutional** layers and first 2 **linear** layers so we don't update the weights of them; more details can be found in the instruction webpage.

Note that you are allowed to add more layers/unfreeze more layers if you see fit.

In [None]:
print("Testing your ResNet architecture: ", verify(test_my_resnet))

In [None]:
my_resnet = MyResNet18()
print(my_resnet)

In [None]:
# TODO: add a decent initial setting and tune from there. The values are intentionally bad.
optimizer_config = {
  "optimizer_type": "sgd",
  "lr": 1e-10,
  "weight_decay": 1e-1
}

In [None]:
my_resnet = MyResNet18()
optimizer = get_optimizer(my_resnet, optimizer_config)

trainer = Trainer(data_dir=data_path, 
                  model = my_resnet,
                  optimizer = optimizer,
                  model_dir = os.path.join(model_path, 'resnet18'),
                  train_data_transforms = get_all_transforms(inp_size, [dataset_mean], [dataset_std]),
                  val_data_transforms = get_fundamental_normalization_transforms(inp_size, [dataset_mean], [dataset_std]),
                  batch_size = 32,
                  load_from_disk = False,
                  cuda = is_cuda
                 )

The following training cell will take roughly 20 minutes or slightly more using a CPU (but possibly under 5 minutes using a GPU, depending on the batch size; the TAs got it within 3 minutes on a GTX1060).

In [None]:
%%time
trainer.run_training_loop(num_epochs=5)

Like both previous sections, you are required to pass a threshold of **85%** for this part. Copy the plots and values onto the report and answer questions accordingly.

In [None]:
trainer.plot_loss_history()
trainer.plot_accuracy()

In [None]:
train_accuracy = trainer.train_accuracy_history[-1]
validation_accuracy = trainer.validation_accuracy_history[-1]
print('Train Accuracy = {}; Validation Accuracy = {}'.format(train_accuracy, validation_accuracy))

**TODO 16**: Obtain an **85%** validation accuracy to receive full credit for Part 3.

### Save Trained MyResnet18 model

In [None]:
save_trained_model_weights(my_resnet, out_dir="./")

### Part 3.3: Visualize and Analyze Confusion Matrix

**TODO 17:** Visualize and analyze the confusion matrix.

You'll need to find an example of an image that is misclassified for the report. Use the confusion matrix and the `get_pred_images_for_target` function to help your analysis.

In [None]:
generate_and_plot_confusion_matrix(my_resnet, trainer.val_dataset, use_cuda=is_cuda)

In [None]:
#########################
# Use this cell to visualize your images depending on the confusion matrix visualization
#########################