<a href="https://colab.research.google.com/github/rahiakela/deep-learning-research-and-practice/blob/main/deep-learning-with-pytorch-step-by-step/Part-II-Computer-Vision/05_vanishing_and_exploding_gradients.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Vanishing and Exploding Gradients

Now, let's consider what transfer learning is.

The idea is quite simple. First, some big tech company, which has access to virtually
infinite amounts of data and computing power, develops and trains a huge model
for their own purpose. 

Next, once it is trained, its architecture and the corresponding trained weights (the pre-trained model) are released. Finally,
everyone else can use these weights as a starting point and fine-tune them
further for a different (but similar) purpose.

That’s transfer learning in a nutshell.

The necessary steps to use transfer learning with pre-trained models for
computer vision tasks are: using ImageNet statistics to pre-process the inputs,
freezing layers (or not), replacing the "top" layer, and optionally speeding up
training by generating features and training the "top" of the model
independently.



## Setup

In [None]:
from IPython.display import display, HTML
display(HTML("<style>.container { width:80% !important; }</style>"))

In [None]:
try:
    import google.colab
    import requests
    url = 'https://raw.githubusercontent.com/dvgodoy/PyTorchStepByStep/master/config.py'
    r = requests.get(url, allow_redirects=True)
    # The context manager closes the file once the content is written
    with open('config.py', 'wb') as f:
        f.write(r.content)
except ModuleNotFoundError:
    pass

from config import *
config_chapter7()
# This is needed to render the plots in this chapter
from plots.chapter7 import *

Downloading files from GitHub repo to Colab...
Finished!


In [None]:
import numpy as np
from PIL import Image

import torch
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F

from torch.utils.data import DataLoader, Dataset, random_split, TensorDataset
from torchvision.transforms import Compose, ToTensor, Normalize, Resize, ToPILImage, CenterCrop, RandomResizedCrop
from torchvision.datasets import ImageFolder
from torchvision.models import alexnet, resnet18, inception_v3
#from torchvision.models.alexnet import model_urls
try:
  from torchvision.models.utils import load_state_dict_from_url
except ImportError:
  from torch.hub import load_state_dict_from_url

from stepbystep.v3 import StepByStep
from data_generation.rps import download_rps

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [None]:
import os
# /content/gdrive/MyDrive/kaggle-keys is the Google Drive folder where kaggle.json is stored
os.environ['KAGGLE_CONFIG_DIR'] = "/content/gdrive/MyDrive/kaggle-keys"

In [None]:
%%shell

# Download the dataset from Kaggle. URL: https://www.kaggle.com/datasets/sanikamal/rock-paper-scissors-dataset
kaggle datasets download -d sanikamal/rock-paper-scissors-dataset

unzip -qq rock-paper-scissors-dataset.zip
rm -rf rock-paper-scissors-dataset.zip

Downloading rock-paper-scissors-dataset.zip to /content
 97% 438M/452M [00:04<00:00, 100MB/s]
100% 452M/452M [00:04<00:00, 102MB/s]




In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"

In [None]:
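def freeze_model(model):
  # Turns off gradient computation for every parameter, so the optimizer leaves them unchanged
  for parameter in model.parameters():
    parameter.requires_grad = False

def preprocessed_dataset(model, loader, device=None):
  # Runs every mini-batch through the model and stores its outputs as features
  if device is None:
    device = next(model.parameters()).device

  features = None
  labels = None

  # Evaluation mode, so layers such as batch normalization use their running statistics
  model.eval()
  for i, (x, y) in enumerate(loader):
    x = x.to(device)
    output = model(x)
    if i == 0:
      features = output.detach().cpu()
      labels = y.cpu()
    else:
      # Appends the features and labels of this mini-batch to the ones collected so far
      features = torch.cat([features, output.detach().cpu()])
      labels = torch.cat([labels, y.cpu()])

  dataset = TensorDataset(features, labels)
  return dataset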

## Data Preparation

The data preparation step will be a bit more demanding this time since we’ll be
standardizing the images. Besides, we can use the `ImageFolder` dataset now.

The Rock Paper Scissors dataset is organized like this:

```
rps/paper/paper01-000.png
rps/paper/paper01-001.png

rps/rock/rock01-000.png
rps/rock/rock01-001.png

rps/scissors/scissors01-000.png
rps/scissors/scissors01-001.png
```

The dataset is also perfectly balanced, with each sub-folder containing 840 images
of its particular class.
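
As a rough illustration of how this folder layout turns into labels: `ImageFolder` sorts the names of the class sub-folders and assigns each one an integer index in that order. The sketch below mimics that mapping using the standard library only. The temporary folder and the variable names are made up for illustration, and no images are read.

```python
import os
import tempfile

# Build a throwaway folder tree shaped like the dataset (hypothetical, no images inside)
root = tempfile.mkdtemp()
for class_name in ["scissors", "rock", "paper"]:
    os.makedirs(os.path.join(root, class_name))

# Class names are the sub-folder names, sorted alphabetically
classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())

# Each class gets an integer label following that sorted order
class_to_idx = {class_name: idx for idx, class_name in enumerate(classes)}
print(class_to_idx)
```

For a dataset built with `ImageFolder`, the same mapping should be available in its `class_to_idx` attribute.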

In [None]:
ROOT_FOLDER = "Rock-Paper-Scissors"

Since we’re using a pre-trained model, we need to use the standardization
parameters used to train the original model. 

In other words, we need to use the
statistics of the original dataset used to train that model.
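
To make the standardization concrete, here is a minimal sketch of the per-channel arithmetic: each channel has its own mean subtracted and is then divided by its own standard deviation. It uses a small random tensor as a stand-in for an image (an assumption for illustration) together with the ImageNet statistics used in this notebook.

```python
import torch

# ImageNet channel statistics (RGB), reshaped to broadcast over a (C, H, W) image
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

# Hypothetical stand-in for an image tensor: shape (C, H, W), values in [0, 1)
torch.manual_seed(0)
image = torch.rand(3, 4, 4)

# Channel-wise standardization: shift and scale each channel by its own statistics
standardized = (image - mean) / std

# Reversing the operation recovers the original pixel values
recovered = standardized * std + mean
```

A pixel whose value equals the channel mean ends up at zero, and the original image can be recovered by reversing the two operations, which is handy for plotting standardized images.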

So, the data preparation step for the Rock Paper Scissors dataset looks like this now:

In [None]:
normalizer = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
composer = Compose([
  Resize(256),
  CenterCrop(224),
  ToTensor(),
  normalizer
])

train_data = ImageFolder(root=f"{ROOT_FOLDER}/train", transform=composer)
val_data = ImageFolder(root=f"{ROOT_FOLDER}/test", transform=composer)

# Builds a loader of each set
train_loader = DataLoader(train_data, batch_size=16, shuffle=True)
val_loader = DataLoader(val_data, batch_size=16)

## Fine-Tuning

Let's use the smallest version of the `ResNet` model (`resnet18`) and either
fine-tune it or use it as a feature extractor only.

In [None]:
torch.manual_seed(42)

# "IMAGENET1K_V1" names the pre-trained ImageNet weights; passing True is deprecated
model = resnet18(weights="IMAGENET1K_V1")
model.fc = nn.Linear(512, 3)

There is no freezing since fine-tuning entails the training of all the weights, not only
those from the "top" layer.
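
To see what freezing would change, the sketch below counts the trainable parameters in both scenarios. It uses a tiny, hypothetical `TinyNet` model with an `fc` attribute as a stand-in for `resnet18`, so nothing is downloaded. The helper `count_trainable` is also made up for this illustration.

```python
import torch.nn as nn

def count_trainable(model):
    # Number of individual weights that would receive gradient updates
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

class TinyNet(nn.Module):
    # Hypothetical stand-in for a pre-trained model with a "top" layer named fc
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Flatten(), nn.Linear(8, 4), nn.ReLU())
        self.fc = nn.Linear(4, 10)

    def forward(self, x):
        return self.fc(self.body(x))

# Fine-tuning: replace the "top" layer and leave everything trainable
fine_tuned = TinyNet()
fine_tuned.fc = nn.Linear(4, 3)
n_fine_tuned = count_trainable(fine_tuned)

# Frozen alternative: freeze first, then replace the "top" layer,
# since a newly created layer requires gradients by default
frozen = TinyNet()
for parameter in frozen.parameters():
    parameter.requires_grad = False
frozen.fc = nn.Linear(4, 3)
n_frozen = count_trainable(frozen)

trainable_names = [name for name, p in frozen.named_parameters() if p.requires_grad]
print(n_fine_tuned, n_frozen, trainable_names)
```

The order matters in the frozen case: freezing after replacing the "top" layer would freeze the new layer as well, leaving nothing to train.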

In [None]:
multi_loss_fn = nn.CrossEntropyLoss(reduction="mean")
optimizer_model = optim.Adam(model.parameters(), lr=3e-4)

We have everything set to train.

In [None]:
sbs_transfer = StepByStep(model, multi_loss_fn, optimizer_model)
sbs_transfer.set_loaders(train_loader, val_loader)
sbs_transfer.train(1)

Let’s see what the model can accomplish after training for a single epoch.

In [None]:
StepByStep.loader_apply(val_loader, sbs_transfer.correct)

tensor([[124, 124],
        [124, 124],
        [124, 124]])

If we had frozen the layers in the model above, it would have been a case of
feature extraction suitable for data augmentation since we would be training the
"top" layer while it was still attached to the rest of the model.

## Feature Extraction

This time, we modify the model by replacing the "top" layer with an identity
layer, so its output is the set of features produced by the remaining layers. We
use it to generate a dataset of features first, and then we train the real "top"
layer independently on that dataset.

In [None]:
# Model Configuration
# "IMAGENET1K_V1" names the pre-trained ImageNet weights; passing True is deprecated
model = resnet18(weights="IMAGENET1K_V1").to(device)
model.fc = nn.Identity()
freeze_model(model)

In [None]:
# Data Preparation — Preprocessing
train_preproc = preprocessed_dataset(model, train_loader)
val_preproc = preprocessed_dataset(model, val_loader)
train_preproc_loader = DataLoader(train_preproc, batch_size=16, shuffle=True)
val_preproc_loader = DataLoader(val_preproc, batch_size=16)

Once the dataset of features and its corresponding loaders are ready, we only need
to create a model corresponding to the "top" layer and train it in the usual way.

In [None]:
# Model Configuration — Top Model
torch.manual_seed(42)

top_model = nn.Sequential(nn.Linear(512, 3))

multi_loss_fn = nn.CrossEntropyLoss(reduction="mean")
optimizer_top = optim.Adam(top_model.parameters(), lr=3e-4)

In [None]:
# Model Training — Top Model
sbs_top = StepByStep(top_model, multi_loss_fn, optimizer_top)
sbs_top.set_loaders(train_preproc_loader, val_preproc_loader)
sbs_top.train(10)

In [None]:
# The top model can already be evaluated on the preprocessed features
StepByStep.loader_apply(val_preproc_loader, sbs_top.correct)

tensor([[ 98, 124],
        [124, 124],
        [104, 124]])
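
A quick way to read this output, assuming each row holds the number of correct predictions and the number of samples for one class (the row order is assumed to follow the dataset's label indices): summing the columns gives the overall accuracy, and dividing them row by row gives the accuracy per class. The tensor below simply copies the values printed above.

```python
import torch

# Values copied from the output above: one (correct, total) pair per class
results = torch.tensor([[98, 124],
                        [124, 124],
                        [104, 124]])

# Overall accuracy: all correct predictions over all samples
correct, total = results.sum(dim=0)
accuracy = (correct / total).item()

# Per-class accuracy: correct predictions over samples, row by row
per_class = results[:, 0] / results[:, 1]

print(f"{correct.item()}/{total.item()} = {accuracy:.4f}")
print(per_class)
```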

But, if we want to try it out on the original dataset (containing the images), we need to reattach the "top" layer.

In [None]:
model.fc = top_model
sbs_temp = StepByStep(model, None, None)

In this case, both the loss function and the optimizer are set to `None` since
we won’t be training the model anymore.

In [None]:
StepByStep.loader_apply(val_loader, sbs_temp.correct)

tensor([[ 98, 124],
        [124, 124],
        [104, 124]])

We got the same results, as expected.
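
The reason the results match can be sketched with a tiny, hypothetical `TinyNet` model standing in for `resnet18` (an assumption to keep the example free of downloads): applying the "top" model to the extracted features is the same computation as running the images through the model with the "top" model reattached.

```python
import torch
import torch.nn as nn

torch.manual_seed(42)

class TinyNet(nn.Module):
    # Hypothetical stand-in for the frozen model; fc starts as an identity layer
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Flatten(), nn.Linear(12, 5), nn.ReLU())
        self.fc = nn.Identity()

    def forward(self, x):
        return self.fc(self.body(x))

model = TinyNet().eval()
top_model = nn.Sequential(nn.Linear(5, 3)).eval()

# Hypothetical mini-batch of four small "images"
x = torch.rand(4, 3, 2, 2)

with torch.no_grad():
    # Step 1: with the identity layer in place, the model outputs features
    features = model(x)
    # Step 2: the "top" model turns features into logits
    logits_from_features = top_model(features)
    # Step 3: reattach the "top" model and feed the images directly
    model.fc = top_model
    logits_from_images = model(x)

print(torch.allclose(logits_from_features, logits_from_images))
```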