We will review the concepts of transfer learning and semi-supervised learning and try implementing them.  

### Transfer Learning : 

Deep learning models are very data hungry, and it is not always possible to collect and annotate millions of samples for the task we care about. Transfer learning offers a way around this.

Transfer learning is a technique in which we take an existing ConvNet model trained on a very large dataset and use it either as a fixed feature extractor or as an initialization. We then tune this model for our task of interest. The major transfer learning scenarios are as follows:

- #### ConvNet as fixed feature extractor : 

We take an existing Convolutional Neural Network trained on the ImageNet dataset (1.2 million images in 1000 categories) and remove the fully connected layers of the pretrained model. We then freeze the layers of the ConvNet (that is, we don't update their weights). Finally, we add our custom fully connected layers after the feature extractor and train the model. Because the feature extractor (the pretrained model) is frozen, only our newly defined fully connected layers are updated during training; the feature extractor weights stay fixed.

- #### Fine-tuning the ConvNet : 

The other strategy is to train the model as a whole. In fine-tuning, we not only replace and retrain the added layers on the new dataset but also fine-tune the feature extractor itself. The pretrained model is then no longer a fixed feature extractor; it is adapted to the new dataset by updating its weights. Intuitively, it is better to treat the pretrained model as a good initialization rather than as a fixed feature extractor. Depending on the size of the available dataset, we fine-tune either all the layers of the ConvNet or only some of them.
> - If the dataset is too small (<1000 samples), fine-tuning the whole model increases the chances of overfitting. <br>
> - With a medium-sized dataset, the whole model can be fine-tuned, but it should be regularized carefully to prevent overfitting.


In this assignment we will implement transfer learning by taking a pretrained ConvNet and using it as a fixed feature extractor. We import the "Resnet-18" model from the torchvision `models` package and use it as a feature extractor. 
We freeze all the layers of Resnet-18 and add a fully connected layer on top of it. As discussed above, we then train the model (frozen feature extractor + new fc layer) on the new dataset.
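
To make the difference between the two scenarios concrete, here is a minimal sketch that counts the trainable parameters in each case. `TinyConvNet`, `build`, and `count_trainable` are hypothetical names introduced only for illustration; the tiny network stands in for Resnet-18 so the block runs without downloading pretrained weights, and the 10-class head is an arbitrary assumption.

```python
import torch
import torch.nn as nn


class TinyConvNet(nn.Module):
    """Hypothetical stand-in for a pretrained ConvNet such as Resnet-18."""

    def __init__(self, num_classes=1000):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(8, num_classes)

    def forward(self, x):
        x = torch.relu(self.conv(x))
        x = self.pool(x).flatten(1)
        return self.fc(x)


def build(freeze_backbone, num_classes=10):
    model = TinyConvNet()
    if freeze_backbone:
        # Fixed feature extractor: stop updates for every existing weight.
        for param in model.parameters():
            param.requires_grad = False
    # A freshly created layer is trainable by default, so the new head
    # is updated in both scenarios.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model


def count_trainable(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


feature_extractor = build(freeze_backbone=True)   # only the new head trains
fine_tuned = build(freeze_backbone=False)         # every layer trains
print(count_trainable(feature_extractor), count_trainable(fine_tuned))
```

With the frozen backbone only the new head's weights and biases are counted, while the fine-tuned variant also counts the convolutional weights.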

In [2]:
import os
import torch
import random
import dataset
import numpy as np
import torch.nn as nn
import torch.optim as optim
from torchvision import models
from data_list import ImageList
from torch.utils.data import DataLoader


In [3]:
# define a pretrained model "Resnet-18" from the models package
model = None

Define a fully connected layer compatible with our dataset and use it to replace the existing layer. To initialize an fc layer we need two parameters: the input size and the output size.

- Input dim is the output dim of the flattened layer of the feature extractor.
- Output size is the number of classes
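
As a rough illustration of the two sizes, the sketch below builds a replacement layer with `nn.Linear`. The 512-to-1000 shape of `old_fc` mirrors the head of torchvision's Resnet-18, and `num_classes = 31` is a made-up value; substitute the number of classes in your own dataset.

```python
import torch
import torch.nn as nn

# Stand-in for the existing head; torchvision's Resnet-18 ends in Linear(512, 1000).
old_fc = nn.Linear(512, 1000)

num_classes = 31                     # hypothetical number of classes
in_features = old_fc.in_features     # output size of the flattened features

# The new layer keeps the input size and changes only the output size.
new_fc = nn.Linear(in_features, num_classes)

features = torch.randn(4, in_features)   # a batch of 4 flattened feature vectors
logits = new_fc(features)
print(logits.shape)
```

Reading `in_features` from the existing layer avoids hard-coding the feature dimension, which differs between architectures.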


In [None]:
# The output dimension of the flatten layer is the same as the input dimension of the fc layer,
# so we read the input dimension from the model's existing fc layer.
inpSize = model.fc.in_features
output_dim = None

# define the fully connected layer below.
fc_layer = None


Replace the existing fully connected layer in Resnet-18 with the newly defined fully connected layer. We can access the fully connected layer in model as **model.fc**

In [1]:
model.fc = None


So far we have defined the model. However, we want the pretrained model to act as a fixed feature extractor. As discussed in the theory above, this means freezing every layer in the model other than the fully connected layer.

Freezing a layer means preventing its weights from being updated in the optimization step. As discussed in the PyTorch tutorial, we can freeze layers by changing the requires_grad attribute of each of their parameters from True to False.
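
One possible way to do this is sketched below on a small hypothetical model whose last layer is named `fc`, as in Resnet-18. Selecting parameters by the `"fc."` name prefix is an assumption about how the layers are named, not the only way to pick them. After a backward pass, the frozen parameters have no gradient at all.

```python
from collections import OrderedDict

import torch
import torch.nn as nn

# Hypothetical small model; only the layer names matter for this sketch.
model = nn.Sequential(OrderedDict([
    ("conv", nn.Conv2d(3, 8, kernel_size=3, padding=1)),
    ("relu", nn.ReLU()),
    ("pool", nn.AdaptiveAvgPool2d(1)),
    ("flatten", nn.Flatten()),
    ("fc", nn.Linear(8, 5)),
]))

# Freeze every parameter that does not belong to the fc layer.
for name, param in model.named_parameters():
    if not name.startswith("fc."):
        param.requires_grad = False

# Run one backward pass on random input and see which parameters got gradients.
model(torch.randn(2, 3, 16, 16)).sum().backward()
frozen = [name for name, p in model.named_parameters() if p.grad is None]
updated = [name for name, p in model.named_parameters() if p.grad is not None]
print("frozen:", frozen)
print("updated:", updated)
```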

In [None]:
## Using a for loop, we iterate through all the parameters of the model.
## We need to turn off requires_grad for all the layers other than the fc layer.
## Update the if condition so that the fc layer is not considered.
## Update param.requires_grad so that the layers are frozen.

for name, param in model.named_parameters():
    if _______:
        param.requires_grad = None

We must define an optimizer for updating the weights and biases of the model. In this assignment we will use the Adam optimizer. *Take a look at the PyTorch tutorial on how to use the optimizer*. Note that the optimizer expects the model parameters as an input argument. In our assignment, we froze the feature extractor and update only the fully connected layer, so to optimize only the trainable parameters we filter out the non-trainable ones.

We can use the **filter** function and a **lambda** expression to filter out all the non-trainable parameters.
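
A minimal sketch of this filtering step, assuming a small hypothetical two-layer model whose first layer is frozen. The learning rate of `1e-3` is an arbitrary illustrative value, not one prescribed by the assignment.

```python
import torch.nn as nn
import torch.optim as optim

# Hypothetical model: the first layer plays the role of the frozen feature extractor.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
for param in model[0].parameters():
    param.requires_grad = False

# Keep only the parameters that still require gradients.
trainable_params = filter(lambda p: p.requires_grad, model.parameters())
optimizer = optim.Adam(trainable_params, lr=1e-3)

# The optimizer now tracks just the weight and bias of the last layer.
num_optimized = sum(len(group["params"]) for group in optimizer.param_groups)
print(num_optimized)
```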



In [None]:
## With the model fully set up, we now define the hyperparameters used for training.
epochs = 5
learning_rate = None
batch_size = None


With the model initialized, we now train the model with our new dataset. We follow the same steps we worked with in the previous assignment to train our model. We also record the losses for every epoch and plot them. 
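
The overall shape of such a loop might look like the sketch below. It trains on random synthetic data so that it runs on its own. The linear model, the cross-entropy loss, the batch size of 16, and the learning rate are all assumptions made for illustration; in the assignment they would be the modified Resnet-18, your chosen criterion, and your own hyperparameters.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for the real dataset: 64 small "images" in 4 classes.
images = torch.randn(64, 3, 8, 8)
labels = torch.randint(0, 4, (64,))
train_loader = DataLoader(TensorDataset(images, labels), batch_size=16, shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 4))  # placeholder model
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-3)

epochs = 5
epoch_losses = []  # one average loss value per epoch
for epoch in range(epochs):
    running_loss = 0.0
    for idx, (batch_images, batch_labels) in enumerate(train_loader):
        optimizer.zero_grad()                       # clear gradients from the last step
        loss = criterion(model(batch_images), batch_labels)
        loss.backward()                             # compute gradients
        optimizer.step()                            # update trainable parameters
        running_loss += loss.item()
    epoch_losses.append(running_loss / len(train_loader))

print(epoch_losses)
```

The recorded `epoch_losses` list can then be passed to a plotting library such as matplotlib to draw the loss curve.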

In [None]:
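## Training loop
## We iterate over the training data; `train_loader` is assumed to be the DataLoader built for the training split.
for epoch in range(epochs):
    for idx, (images, labels) in enumerate(train_loader):
        pass  # forward pass, loss computation, backward pass and optimizer step go here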