# CS231a PSET 3 Problem 2: Representation Learning with Self-Supervised Learning


# Overview

In this notebook we will be using the [Fashion MNIST dataset](https://github.com/zalandoresearch/fashion-mnist), a variation on the classic [MNIST dataset](https://en.wikipedia.org/wiki/MNIST_database), to showcase how self-supervised representation learning can be utilized for more efficient training in downstream tasks. We will do the following things:

1. Train a classifier from scratch on the Fashion MNIST dataset and observe how fast and well it learns.

2. Train useful representations via predicting image rotations, rather than classifying clothing types.

3. Transfer our rotation pretraining features to solve the classification task with much less data than in step 1.

First, upload the files in the 'code/p2' directory to a location of your choosing in Drive, then run the following cell to access them. You can also skip this step and upload the files directly using the Files tab, though any changes you make will be lost if you close the tab or the Colab runtime ends.

In [None]:
from google.colab import drive

drive.mount('/content/drive', force_remount=True)

# Enter the foldername in your Drive where you have saved the unzipped
# '.py' files from the p2 folder
# e.g. 'cs231a/pset3/p2'
FOLDERNAME = 'cs231a/pset3/p2'

assert FOLDERNAME is not None, "[!] Enter the foldername."

%cd drive/My\ Drive
%cd $FOLDERNAME

You should now be able to click on the folder icon to the left and see a folder that says 'drive' above a folder that says 'sample_data'. Open it, go to MyDrive, and then navigate to where you put the files. You can double click on any .py file to modify it within this Colab notebook, and we recommend you work on these problems using that. Let's confirm the files are uploaded and accessible:

In [None]:
import import_test

If the above import works, you are ready to get going with the rest of this problem! Before that, let's make sure you allocate a GPU so that the code runs faster: click Runtime -> Change runtime type -> Hardware Accelerator -> GPU, and your Colab instance will automatically be backed by GPU compute.
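
To confirm that a GPU was actually allocated, a quick check along the following lines can help. This is a minimal sketch rather than part of the assignment code; the `device` variable is an illustrative name, and the later cells in this notebook call `.cuda()` directly, which fails if no GPU is available.

```python
import torch

# Pick the GPU when Colab has allocated one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

# A small tensor operation confirms that the chosen device is usable.
x = torch.ones(2, 2, device=device)
print(x.sum().item())
```

If this prints `cpu`, revisit the runtime settings before running the training cells.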

# Fashion MNIST Data Preparation

First, let's get the data prepared. Luckily, PyTorch has a handy function to download it for us in its [torchvision.datasets](https://pytorch.org/docs/stable/torchvision/datasets.html#cifar) package. Go ahead and install the required torchvision version by running the following cell; you only need to install it once, after which you should click Runtime -> Restart runtime to move on. Every time you restart the runtime, you'll need to re-run everything.

In [None]:
!pip install torchvision==0.2.1 #need this version to get processed data

Now we can go ahead and get the data:

In [None]:
# Download Fashion MNIST dataset from PyTorch 
import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose([
                              transforms.Resize((32,32)),
                              transforms.ToTensor(),
                              ])
PATH_TO_STORE_DATA = 'problem2/data/'
dataset_train = torchvision.datasets.FashionMNIST(PATH_TO_STORE_DATA, download=True, train=True, 
                                             transform=transform)
dataset_test = torchvision.datasets.FashionMNIST(PATH_TO_STORE_DATA, download=True, train=False, 
                                            transform=transform)

Now that we have downloaded the data, we will implement a PyTorch [Dataset](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html) so that we can load subsets of the full Fashion MNIST dataset and use either clothing type or image rotation as the label for a given image. Fill in the requisite bits of code in data.py marked with TODO (you can either do so directly through the file explorer on the left or do so locally and re-upload it), and try to execute the following:

In [None]:
from importlib import reload  
import data 

Now, let's create an instance of this Dataset for training Fashion MNIST classification. We will create two versions of the training dataset, one with all the data and one with a small subset. If you have bugs in your code, simply modify data.py and re-run this cell.

In [None]:
data = reload(data) #reload for making changes during debugging
train_full_dataset = data.MNISTDatasetWrapper(dataset_train, pct=1.0)
test_full_dataset = data.MNISTDatasetWrapper(dataset_test, pct=1.0)
print('Full dataset: {0} Training Samples | {1} Test Samples'.format(
    len(train_full_dataset), len(test_full_dataset)))

train_small_dataset = data.MNISTDatasetWrapper(dataset_train, pct=0.05)
print('Small train dataset: {0} Training Samples'.format(len(train_small_dataset)))
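
For intuition about what the `pct` argument does, here is a minimal sketch of taking a random fraction of a dataset. It is not the assignment's `data.py`: `ToyImages` and `take_fraction` are hypothetical names, and the random images merely stand in for Fashion MNIST so the block runs without downloading anything.

```python
import torch
from torch.utils.data import Dataset, Subset


class ToyImages(Dataset):
    """Hypothetical stand-in for Fashion MNIST: random 1x32x32 images, 10 classes."""

    def __init__(self, n=200, seed=0):
        g = torch.Generator().manual_seed(seed)
        self.images = torch.rand(n, 1, 32, 32, generator=g)
        self.labels = torch.randint(0, 10, (n,), generator=g)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]


def take_fraction(dataset, pct, seed=0):
    """Return a random subset holding roughly pct of the dataset (at least 1 item)."""
    n_keep = max(1, int(len(dataset) * pct))
    g = torch.Generator().manual_seed(seed)
    indices = torch.randperm(len(dataset), generator=g)[:n_keep].tolist()
    return Subset(dataset, indices)


full = ToyImages()
small = take_fraction(full, pct=0.05)
print(len(full), len(small))
```

Fixing the seed keeps the subset identical across runs, which makes comparisons between training setups fairer.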

Let's use the handy show_batch function to get an idea of what's in the dataset:

In [None]:
train_full_dataset.show_batch()

# PyTorch Vision Model

Next, we need to define our neural net architectures for training on the data. Because we ultimately want to train for two objectives (clothing type classification and rotation classification), we will split the model into several classes so that the weights learned during representation learning can be re-used later for more efficient clothing classification.
Fill in the marked portions of models.py, and try to execute the following:

In [None]:
import models
models = reload(models) #reload for making changes during debugging

image_embed_net = models.ImageEmbedNet().cuda()
classify_net = models.ClassifyNet(10).cuda()
mnist_classify_model = models.ImageClassifyModel(image_embed_net, classify_net)


If running the above results in errors, revise your code in models.py and re-run as before.
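
To illustrate the idea of splitting the model into a reusable embedding network and a task-specific head, here is a minimal sketch. The class names (`TinyEmbedNet`, `TinyClassifyNet`, `TinyClassifyModel`), layer sizes, and the 64-dimensional embedding are all assumptions made for illustration; they are not the architecture expected in `models.py`.

```python
import torch
import torch.nn as nn


class TinyEmbedNet(nn.Module):
    """Hypothetical encoder: 1x32x32 image -> embed_dim feature vector."""

    def __init__(self, embed_dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # -> 16 x 16 x 16
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # -> 32 x 8 x 8
        )
        self.project = nn.Linear(32 * 8 * 8, embed_dim)

    def forward(self, x):
        return torch.relu(self.project(self.features(x).flatten(1)))


class TinyClassifyNet(nn.Module):
    """Hypothetical head: feature vector -> class logits."""

    def __init__(self, num_classes, embed_dim=64):
        super().__init__()
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, z):
        return self.fc(z)


class TinyClassifyModel(nn.Module):
    """Chains an encoder and a head so the encoder can be swapped or re-used."""

    def __init__(self, embed_net, classify_net):
        super().__init__()
        self.embed_net = embed_net
        self.classify_net = classify_net

    def forward(self, x):
        return self.classify_net(self.embed_net(x))


model = TinyClassifyModel(TinyEmbedNet(), TinyClassifyNet(10))
logits = model(torch.rand(4, 1, 32, 32))
print(logits.shape)
```

Because the encoder is its own module, the same encoder instance (or its weights) can later be paired with a different head for a different task.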

# Training for Fashion MNIST Class Prediction

Let's now implement a method for training on the dataset with the model we defined above. We will create a re-usable function that can be used for both representation learning and learning to classify Fashion MNIST images. This will involve the following:
*   Given the dataset, creating a PyTorch [DataLoader](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), which takes care of shuffling the dataset as well as combining multiple images into batches.
*   Creating a PyTorch loss function that can be used for optimizing our model for the task of classification. We will use the standard [Cross Entropy Loss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html).
*   Creating a PyTorch [optimizer](https://pytorch.org/docs/stable/optim.html) to update the weights of the model given the loss computation.
*   Lastly, our two training loops (one for the number of epochs, and one for iterating over the dataset) in which we use all the above to train the model.
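
The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the expected contents of `training.py`: the function name `train_sketch`, the choice of Adam, the learning rate, and the argument order (dataset, model, batch size, number of epochs) are all assumptions, and the toy data and linear model exist only so the block runs on its own on the CPU.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_sketch(dataset, model, batch_size, num_epochs, lr=1e-3):
    """Train a classifier and return the training accuracy of each epoch."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # assumed optimizer
    history = []
    model.train()
    for epoch in range(num_epochs):          # outer loop: epochs
        correct, total = 0, 0
        for images, labels in loader:        # inner loop: batches
            optimizer.zero_grad()
            logits = model(images)
            loss = loss_fn(logits, labels)
            loss.backward()
            optimizer.step()
            correct += (logits.argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
        history.append(correct / total)
    return history


torch.manual_seed(0)
toy_data = TensorDataset(torch.rand(64, 1, 32, 32), torch.randint(0, 10, (64,)))
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 10))
history = train_sketch(toy_data, toy_model, batch_size=16, num_epochs=2)
print(history)
```

On a GPU runtime, the images and labels would also need to be moved to the same device as the model inside the inner loop.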

Fill in the relevant portions of code in training.py. Once training.py is finished, we just need to call its train function to train on the Fashion MNIST classification task:

In [None]:
import training

training = reload(training)
# Create fresh model before every run to make sure we start from scratch
image_embed_net = models.ImageEmbedNet().cuda()
classify_net = models.ClassifyNet(10).cuda()
mnist_classify_model = models.ImageClassifyModel(image_embed_net, 
                                                          classify_net)

training.train(train_full_dataset, mnist_classify_model, 16, 10)

You should get training accuracy of around 0.92. With the model now trained, let's implement a test function and call it to see how well it works on the test set. Finish the marked portions in testing.py and run the following:

In [None]:
import testing

testing = reload(testing)
testing.test(test_full_dataset, mnist_classify_model, 16)

You should get test set accuracy slightly lower than the train set accuracy. The accuracy is not great; on such simple data it should be fairly easy to get close to perfect accuracy. We'll try to address this with representation learning.
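
For reference, an evaluation routine of this kind typically disables gradient tracking and switches the model to evaluation mode. The sketch below is an assumption about the general shape of such a function, not the expected contents of `testing.py`; `evaluate_sketch` and `AlwaysZero` are hypothetical names, and the dummy model always predicts class 0 so the result is easy to check by hand.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate_sketch(dataset, model, batch_size):
    """Return the fraction of samples the model classifies correctly."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)
    model.eval()  # disable training-only behavior such as dropout
    correct, total = 0, 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for images, labels in loader:
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total


class AlwaysZero(nn.Module):
    """Dummy model whose highest logit is always class 0."""

    def forward(self, x):
        logits = torch.zeros(x.size(0), 10)
        logits[:, 0] = 1.0
        return logits


labels = torch.tensor([0, 0, 1, 2])
toy_test = TensorDataset(torch.rand(4, 1, 32, 32), labels)
accuracy = evaluate_sketch(toy_test, AlwaysZero(), batch_size=2)
print(accuracy)  # 0.5, since two of the four labels are class 0
```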

Before that, let's try training on the smaller train set, and see how well the model can work on the test set.

In [None]:
image_embed_net = models.ImageEmbedNet().cuda()
classify_net = models.ClassifyNet(10).cuda()
mnist_classify_model = models.ImageClassifyModel(image_embed_net, classify_net)
training.train(train_small_dataset, mnist_classify_model, 16, 10)
testing.test(test_full_dataset, mnist_classify_model, 16)

You should get both lower training and testing accuracy, since we are now training with much less data. If we iterate over the data for more epochs it is possible to get better results, but they remain below the accuracy achieved with the full dataset.

# Representation Learning via Rotation Classification


Now, let's define new datasets for doing our representation learning by predicting the rotation of Fashion MNIST images, and once again call show_batch to get a look at the data:

In [None]:
data = reload(data) #reload for making changes during debugging

train_rotation_dataset = data.MNISTDatasetWrapper(dataset_train, 
                                          pct=1.0, for_rotation_classification=True)
test_rotation_dataset = data.MNISTDatasetWrapper(dataset_test, 
                                         pct=1.0, for_rotation_classification=True)
train_rotation_dataset.show_batch()
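
To make the rotation pretext task concrete, here is a minimal sketch in which the label is simply the number of 90-degree turns applied to the image. This is an assumption for illustration only: the notebook builds its rotation classifier with 8 output classes, so the scheme in the assignment's `data.py` presumably differs, and `RotationDataset` is a hypothetical name.

```python
import torch
from torch.utils.data import Dataset


class RotationDataset(Dataset):
    """Hypothetical wrapper: each image appears 4 times, rotated by k * 90 degrees."""

    def __init__(self, images):
        self.images = images  # tensor of shape (N, C, H, W)

    def __len__(self):
        return len(self.images) * 4

    def __getitem__(self, idx):
        image = self.images[idx // 4]
        k = idx % 4  # rotation label in {0, 1, 2, 3}
        # Rotate in the (H, W) plane; the channel dimension is left untouched.
        return torch.rot90(image, k, dims=(1, 2)), k


images = torch.rand(3, 1, 32, 32)
rotation_data = RotationDataset(images)
rotated, label = rotation_data[5]  # second image, one quarter turn
print(len(rotation_data), rotated.shape, label)
```

No human annotation is involved: the labels come for free from the transformation itself, which is what makes the task self-supervised.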

Now, let's train a model on the rotation prediction task by once again using our train function:

In [None]:
rotation_image_embed_net = models.ImageEmbedNet().cuda()
rotation_classify_net = models.ClassifyNet(8).cuda()
mnist_rotation_classify_model = models.ImageClassifyModel(rotation_image_embed_net, 
                                                   rotation_classify_net)
training.train(train_rotation_dataset, mnist_rotation_classify_model, 16, 20)

As you should see, the network gets quite good at predicting rotations, reaching around 0.98 training accuracy.

We should once again get testing accuracy similar to training accuracy (around 0.98):

In [None]:
testing.test(test_rotation_dataset, mnist_rotation_classify_model, 16)

# Fine-Tuning for Fashion MNIST classification

Now that we have pretrained our model on the rotation prediction task, let's reuse its image embedding network to train for the task of clothing classification. We will use load_state_dict to transfer the weights from the trained model to a new instance of it, so we can later re-use the same representation learning weights in a different setup. Let's first try it on the full dataset and see how fast it converges compared to when we did not pretrain it.

In [None]:
image_embed_net = models.ImageEmbedNet().cuda()
image_embed_net.load_state_dict(rotation_image_embed_net.state_dict())
classify_net = models.ClassifyNet(10).cuda()
mnist_classify_model = models.ImageClassifyModel(image_embed_net, classify_net)
training.train(train_full_dataset, mnist_classify_model, 16, 10)

In [None]:
testing.test(test_full_dataset, mnist_classify_model, 16)

As we can see, it improves faster and achieves better train and test performance, although the improvement is not huge.
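
The weight transfer itself can be checked in isolation. The following minimal sketch uses two small stand-in encoders (not the notebook's `ImageEmbedNet`) to show that `load_state_dict` copies parameter values into the new module rather than sharing them, so fine-tuning the copy leaves the pretrained weights untouched.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two independently initialized encoders with the same architecture.
pretrained = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 64), nn.ReLU())
fresh = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 64), nn.ReLU())

x = torch.rand(2, 1, 32, 32)
same_before = torch.allclose(pretrained(x), fresh(x))

# Copy every parameter tensor from the pretrained encoder into the fresh one.
fresh.load_state_dict(pretrained.state_dict())
same_after = torch.allclose(pretrained(x), fresh(x))

# Modifying the copy does not affect the original: the values were copied, not shared.
with torch.no_grad():
    fresh[1].weight.add_(1.0)
still_equal = torch.equal(pretrained[1].weight, fresh[1].weight)

print(same_before, same_after, still_equal)
```

This is why the same rotation-pretrained weights can be loaded into a new encoder for each of the fine-tuning experiments.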

Now, let's try training with the small dataset again and see how well that works:

In [None]:
image_embed_net = models.ImageEmbedNet().cuda()
image_embed_net.load_state_dict(rotation_image_embed_net.state_dict())
classify_net = models.ClassifyNet(10).cuda()
mnist_classify_model = models.ImageClassifyModel(image_embed_net, classify_net)
training.train(train_small_dataset, mnist_classify_model, 16, 10)

In [None]:
testing.test(test_full_dataset, mnist_classify_model, 16)

Now we can see that with the smaller dataset the pretrained features make a big difference, as we get a substantial improvement in training and test accuracy!

What if we just train for longer? With such a small training dataset, it's possible to achieve perfect training accuracy:

In [None]:
image_embed_net = models.ImageEmbedNet().cuda()
image_embed_net.load_state_dict(rotation_image_embed_net.state_dict())
classify_net = models.ClassifyNet(10).cuda()
mnist_classify_model = models.ImageClassifyModel(image_embed_net, classify_net)
training.train(train_small_dataset, mnist_classify_model, 16, 50)


In [None]:
testing.test(test_full_dataset, mnist_classify_model, 16)

As we can see, while training for longer on the small dataset gets perfect train accuracy, the test accuracy is no better than what we got before. Part of the benefit of having pre-trained features is greater robustness to this sort of overfitting.

# Conclusion

That's it, you are done! Remember to submit your .py files to the autograder.


Credits: Aspects of this notebook have been adapted from [here](https://colab.research.google.com/github/AmarSaini/Epoching-Blog/blob/master/_notebooks/2020-03-23-Self-Supervision-with-FastAI.ipynb#scrollTo=lsQmOOQsMVFT)