# Lab: Fine-tuning an Image Model for Weather Classification

As discussed in class, training large computer vision models from scratch takes enormous amounts of data and computation.  If you don't have days to wait, it is best to **fine-tune** a well-trained, pre-existing model.  In this lab, we will demonstrate how to fine-tune a model for weather classification.  

In going through this lab, you will learn to:

* Download a dataset from **Kagglehub**
* Use Python's operating system calls to explore files and perform tasks such as splitting the data into training and test sets
* Download a **base model**.  In this lab, we use the simple MobileNetV2 since it is computationally lightweight
* **Fine-tune** the model with a simple **classifier head**
* Add progress bars to the training and evaluation loops so you can monitor them while training is occurring


## Using Google Colab's Free Tier GPU
The lab is greatly assisted with a GPU.  While most laptops have GPUs, they are generally not useful for ML.  So, assuming you do not have access to a machine designed for ML, I suggest you use Google Colab.  You can follow the instructions on this [Stanford blog](https://rcpedia.stanford.edu/blog/2024/03/28/train-machine-learning-models-on-colab-gpu/#:~:text=in%20your%20Drive.-,Switch%20to%20Using%20a%20GPU,Click%20Save%20.) to select a GPU.  

The free tier of Google Colab comes with a Tesla T4 GPU, which is sufficient for this lab.  For more money, you can select the A100 GPU, which is a bit faster.  But, for this relatively small model, there is not much gain in going to the A100 GPU (I found it runs only about 50% faster).




## Loading the dataset from Kaggle

We will use a [Kaggle dataset](https://www.kaggle.com/datasets/jehanbhathena/weather-dataset) with images of scenes in different weather conditions like frost, rain, ...  
First, we download the dataset with the following command.  The total dataset is almost 1 GB, so the download could take a few minutes.

In [None]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("jehanbhathena/weather-dataset")

print("Path to dataset files:", path)

After the download, there will be one sub-folder for each category.  The directory structure will look like:

~~~
dataset_path/
├── dew/
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
├── fogsmog/
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
├── frost/
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
├── glaze/
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
~~~


Print the names of the sub-folders and the number of files in each sub-folder.  Also, create a list `categories` with the category names.  Some useful commands are:
* `os.listdir(dataset_path)`:  lists all the entries (files and sub-folders) in `dataset_path`.
* `os.path.join(dataset_path, d)`:  creates the path for the subfolder `d` in `dataset_path`.
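
As a small illustration of these two calls, the sketch below counts files per sub-folder in a throwaway directory.  The folder names, file names, and the `demo_` variables are made up for this example; they are not part of the weather dataset.

```python
import os
import tempfile

# Build a tiny stand-in dataset: two category folders with a few empty files.
# The folder and file names here are hypothetical, for illustration only.
with tempfile.TemporaryDirectory() as demo_root:
    for cat, n in [("rain", 3), ("snow", 2)]:
        os.makedirs(os.path.join(demo_root, cat))
        for i in range(n):
            open(os.path.join(demo_root, cat, f"img{i}.jpg"), "w").close()

    # Keep only the entries that are directories, sorted for a stable order
    demo_categories = sorted(
        d for d in os.listdir(demo_root)
        if os.path.isdir(os.path.join(demo_root, d))
    )

    # Count the files inside each category folder
    demo_counts = [
        len(os.listdir(os.path.join(demo_root, d))) for d in demo_categories
    ]

print(dict(zip(demo_categories, demo_counts)))
```

Filtering with `os.path.isdir` is a defensive choice in case the dataset folder also contains stray files.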

In [None]:
import os
import pandas as pd
dataset_path = os.path.join(path, "dataset")


# TODO:  Populate the lists below with category names and their corresponding file counts
#   categories = ...
#   file_counts = ...


# TODO: Create a DataFrame from the lists and print it


Next, randomly select an image from each category and display it.

* Use `fig, axs = plt.subplots(...)` to create an array of axes to plot the images.  
* Loop over the categories
* In each category directory, randomly select an image file
* Use `img = Image.open(img_path)` to load the image.
* Use `axs.flat[i].imshow(img)` to display the image.  Since `axs` is a 2D array, `axs.flat` lets you index it with a single index `i`.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import random
from PIL import Image



nclass = len(categories)
nrow = 2
ncol = (nclass + 1) // nrow

# Create the subplot grid with nrow and ncol rows and columns
#   fig, axs = plt.subplots(...)
fig, axs = plt.subplots(nrow, ncol, figsize=(15, 5))

# TODO:  Randomly select an image from each category and display it



## Creating Training and Test Datasets
This particular dataset has a single set of images.  Write code that will create two directories, `train` and `test`, each with one sub-folder of images per class.  You should randomly place a fraction `split_ratio` of the images (for example, `split_ratio = 0.8`) in the `train` folder, and the remaining images in the `test` folder.
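
One possible way to do the split for a single category is sketched below on a temporary folder of empty files.  The helper `split_folder`, the fixed `seed`, and the choice to copy rather than move files are assumptions made for this illustration, not requirements of the lab.

```python
import os
import random
import shutil
import tempfile


def split_folder(src_dir, train_dir, test_dir, split_ratio, seed=0):
    """Copy a random fraction split_ratio of the files in src_dir into
    train_dir and the remaining files into test_dir (hypothetical helper)."""
    files = sorted(os.listdir(src_dir))
    random.Random(seed).shuffle(files)          # reproducible shuffle
    n_train = int(split_ratio * len(files))
    os.makedirs(train_dir, exist_ok=True)
    os.makedirs(test_dir, exist_ok=True)
    for i, fname in enumerate(files):
        dst = train_dir if i < n_train else test_dir
        shutil.copy(os.path.join(src_dir, fname), os.path.join(dst, fname))
    return n_train, len(files) - n_train


# Demonstrate on a throwaway folder with 10 empty "images"
with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "dataset", "rain")
    os.makedirs(src)
    for i in range(10):
        open(os.path.join(src, f"img{i}.jpg"), "w").close()

    n_train, n_test = split_folder(
        src,
        os.path.join(root, "train", "rain"),
        os.path.join(root, "test", "rain"),
        split_ratio=0.8,
    )

print(n_train, n_test)
```

Copying keeps the original download intact, at the cost of extra disk space; moving the files would be the alternative.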

In [None]:
# Set the training and test directory paths
train_dir = os.path.normpath(os.path.join(path, '..',"train"))
test_dir = os.path.normpath(os.path.join(path, '..',"test"))

# TODO:  Loop over each category and split the images into training and test sets
# in each category according to the split_ratio


# TODO:  Count the number of images in each category for training and test sets
# and display the counts in a DataFrame


## Downloading the base model

PyTorch has a number of excellent pre-trained models that we can use for fine-tuning.  For this lab, to make the training easy, we will use a lightweight model called **MobileNetV2**.  MobileNetV2 is a CNN developed by Google that targets computationally limited mobile devices. It uses an architecture with inverted residual blocks and linear bottlenecks to improve performance while keeping computational costs low. A nice summary of the model can be found in this [Medium post](https://medium.com/codex/a-summary-of-the-mobilenetv2-inverted-residuals-and-linear-bottlenecks-paper-e19b187cb78a).  We can download the model as follows.

First we import the key packages and set the `transform` needed for MobileNetV2.  The transform resizes each image to 224 x 224 and normalizes it with the ImageNet channel means and standard deviations.

In [None]:
import os
import torch
import torchvision
import torchvision.transforms as transforms
from torchvision import datasets, models
from torch.utils.data import DataLoader
import torch.nn as nn
import torch.optim as optim

In [None]:
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])


Next, we create `DataLoader` objects to load the data.
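
To see what a `DataLoader` produces, here is a minimal sketch that batches random tensors instead of real images.  The `TensorDataset`, the tensor sizes, and the `demo_` names are stand-ins chosen for illustration; the lab itself uses `datasets.ImageFolder`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 10 random "images" of shape (3, 224, 224) with integer labels
demo_images = torch.randn(10, 3, 224, 224)
demo_labels = torch.randint(0, 4, (10,))
demo_dataset = TensorDataset(demo_images, demo_labels)

# The loader yields (images, labels) mini-batches; the last batch may be smaller
demo_loader = DataLoader(demo_dataset, batch_size=4, shuffle=True)

batch_shapes = [tuple(images.shape) for images, labels in demo_loader]
print(batch_shapes)
```

Shuffling is typically enabled for the training loader and left off for the test loader.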


In [None]:
# TODO:  Create the training and test datasets.  Use the `datasets.ImageFolder` class
# with the appropriate directory and transform.
#    train_dataset = ...
#    test_dataset = ...


# TODO:  Create the training and test dataloaders.  Use a batch size of 16.
#    train_loader = ...
#    test_loader = ...


# TODO:  Print the number of classes.



Now we download the pre-trained model.

In [None]:
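# `weights=` replaces the deprecated `pretrained=True` (torchvision 0.13 and later)
model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V1)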



The MobileNetV2 model has two main components:
- `model.features`: This is the **base model**, also known as the **backbone**. It contains the convolutional layers that extract features from the input image.
  👉 In transfer learning, we typically freeze this part to retain the pretrained feature extractor.
- `model.classifier`: This is the **classifier head**. It takes the output from `model.features` and maps it to the final class predictions.
  👉 We **replace and train** this part to adapt the model to our specific task (e.g., weather classification).


First, we can see all the layers with the following command.

In [None]:
model.features

To get some insight into the model:

* Loop over the layers in the model with `for name, layer in model.features.named_children()`
* For each layer, get the layer type with `layer_type = layer.__class__.__name__`
* Get the total number of parameters in the layer
    * You can loop over the parameters with `for p in layer.parameters()`
    * Then count the number of parameters with `p.numel()`
* Print a `pandas.DataFrame` with the layer `name`, `layer_type`, and number of parameters.
* Also, print the total number of parameters.

You should see that `model.features` has about 2.2M parameters.
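
The same bookkeeping can be tried first on a toy network whose parameter counts are easy to check by hand.  The network `toy` below is made up for this illustration and has nothing to do with MobileNetV2.

```python
import pandas as pd
import torch.nn as nn

toy = nn.Sequential(
    nn.Linear(4, 3),   # 4*3 weights + 3 biases = 15 parameters
    nn.ReLU(),         # no parameters
    nn.Linear(3, 2),   # 3*2 weights + 2 biases = 8 parameters
)

rows = []
for name, layer in toy.named_children():
    rows.append({
        "name": name,
        "layer_type": layer.__class__.__name__,
        "num_param": sum(p.numel() for p in layer.parameters()),
    })

toy_df = pd.DataFrame(rows)
print(toy_df)
print("Total parameters:", toy_df["num_param"].sum())
```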

In [None]:
# TODO:  Get the layer names, types, and number of parameters in each layer of
# model.features
#    layer_data = []
#    for name, layer in model.features.named_children():
#       layer_datai = {'name': ..., 'layer_type': ..., 'num_param': ...}
#       layer_data.append(layer_datai)


# TODO:  Create a pandas DataFrame and print the layer data


# TODO:  Print the total number of parameters



Next, we will get the number of features output by the final layer of `model.features`.  This value will be the number of inputs to the classifier head.

In [None]:
# TODO:  Get the number of outputs of the final layer of model.features
#    num_features = ...

Now, let's look at the classifier.  This is a simple model with two layers:
* A dropout layer
* A fully connected layer with 1000 output features for a 1000-way softmax (recall the original ImageNet has 1000 classes).

In [None]:
model.classifier

For fine-tuning, we will replace the classifier head with a small MLP:
*  A `nn.Dropout` layer with 0.2 dropout
*  A `nn.Linear` layer taking the `num_features` inputs to `num_hidden` outputs
* A ReLU activation
* A final linear layer with one output for each class.

In [None]:
num_hidden = 100

# TODO:  Replace the classifier head
#  model.classifier = nn.Sequential(...)



Next, we **freeze** the parameters in all the `model.features` layers, so we only train the new classifier head.  This will make the training much faster.  You can loop over `model.features.parameters()` and set `param.requires_grad = False`.

In [None]:
# TODO:  Freeze all the parameters in model.features

To confirm we set everything correctly, loop over `model.parameters()` and find the total number of parameters that are trainable and the total number that are fixed.  You should find that only about 129,000 are trainable.
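
Freezing and counting can be checked by hand on a toy model before applying the same idea to MobileNetV2.  The two-layer model and the `toy_` names below are made up for this illustration.

```python
import torch.nn as nn

toy_backbone = nn.Linear(4, 8)   # 4*8 weights + 8 biases = 40 parameters
toy_head = nn.Linear(8, 2)       # 8*2 weights + 2 biases = 18 parameters
toy_model = nn.Sequential(toy_backbone, toy_head)

# Freeze the backbone: no gradients are computed for these parameters
for param in toy_backbone.parameters():
    param.requires_grad = False

toy_trainable = sum(p.numel() for p in toy_model.parameters() if p.requires_grad)
toy_fixed = sum(p.numel() for p in toy_model.parameters() if not p.requires_grad)
print(f"trainable={toy_trainable}, fixed={toy_fixed}")
```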

In [None]:
# TODO:  Print the total number of trainable and fixed parameters
#   trainable = ...
#   fixed = ...


## Loading the model to a GPU
For training image models, it greatly helps to use a GPU.  
* Use `torch.cuda.is_available()` to see if a GPU is available.  
* If so, print the number of GPUs with `torch.cuda.device_count()`
* Also, print the GPU name with `torch.cuda.get_device_name` and `torch.cuda.current_device()`.
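
A minimal sketch of such a check follows.  It uses `torch.cuda.current_device()` to get the index of the active GPU and passes that index to `torch.cuda.get_device_name`; the variable `n_gpu` and the wording of the messages are arbitrary choices for this example.

```python
import torch

if torch.cuda.is_available():
    n_gpu = torch.cuda.device_count()
    idx = torch.cuda.current_device()            # index of the active GPU
    name = torch.cuda.get_device_name(idx)       # e.g. the GPU model string
    print(f"{n_gpu} GPU(s) available; current device {idx}: {name}")
else:
    n_gpu = 0
    print("No CUDA GPU found; computations will run on the CPU")
```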


In [None]:
# TODO:  See if there is a GPU and what type of GPU



Finally, we move the model to the GPU with the following command.

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)



## Training the model
We are now ready to train our model.  First we select the loss function criterion and optimizer.

In [None]:
# TODO:  Set loss function and optimizer
#   criterion = ...  (use cross entropy loss)
#   optimizer = ...  (use Adam with a lr=1e-4)


You can now train the model by completing the following code.  With a T4 GPU, each epoch should complete in a few minutes.  You should be able to get about 85% accuracy.  You can get higher accuracy with a larger model, a larger classifier head, and more training time.  But, I want you to just understand the basic ideas.
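
If the pattern of a training step is unfamiliar, the toy sketch below shows it on a tiny linear classifier with synthetic data.  The data, the model, the learning rate of 0.05, and the `toy_` names are all made up for this illustration; only the order of the calls carries over to the lab.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Toy data: 64 points with 4 features; the label depends on the sign of feature 0
X = torch.randn(64, 4)
y = (X[:, 0] > 0).long()

toy_model = nn.Linear(4, 2)
toy_criterion = nn.CrossEntropyLoss()
toy_optimizer = optim.Adam(toy_model.parameters(), lr=0.05)

losses = []
toy_model.train()
for step in range(100):
    toy_optimizer.zero_grad()            # clear gradients from the previous step
    outputs = toy_model(X)               # forward pass: logits of shape (64, 2)
    loss = toy_criterion(outputs, y)     # cross-entropy against integer labels
    loss.backward()                      # back-propagate the gradients
    toy_optimizer.step()                 # update the weights
    losses.append(loss.item())

# Accuracy: the predicted class is the index of the largest logit
toy_model.eval()
with torch.no_grad():
    preds = toy_model(X).argmax(dim=1)
    toy_correct = (preds == y).sum().item()
    toy_total = y.size(0)

print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, "
      f"accuracy {100 * toy_correct / toy_total:.1f}%")
```

In the lab, the same five calls run once per mini-batch, after the batch has been moved to `device`.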



In [None]:
from tqdm import tqdm

nepochs = 5
for epoch in range(nepochs):  # Adjust epochs as needed
    model.train()
    running_loss = 0.0

    loop = tqdm(train_loader, desc=f"Epoch {epoch+1}", leave=True)
    for images, labels in loop:

        # TODO:  Move the images and labels to the device (GPU if available)
        #   images = images.to(...)
        #   labels = labels.to(...)
        

        # TODO:  Perform the back-prop on the data
        #    optimizer.zero_grad()
        #    outputs = ...
        #    loss = ...
        #    ...
        

        # Update the running loss and progress bar
        running_loss += loss.item()
        loop.set_postfix(loss=loss.item())

    print(f"Epoch {epoch+1}, Average Training Loss: {running_loss / len(train_loader):.4f}")

    # Evaluation after each epoch
    model.eval()
    correct = 0
    total = 0

    with torch.no_grad():
      loop = tqdm(test_loader, desc="Evaluating", leave=True)

      for images, labels in loop:

          # TODO:  Move the images and labels to the GPU
          #   images = images.to(...)
          #   labels = labels.to(...)
          

          # TODO:  Update the total number of correct and total number of images
          #    correct += ...
          #    total += ...
          

          # Update postfix with current accuracy
          accuracy = 100 * correct / total
          loop.set_postfix({'Accuracy': f'{accuracy:.2f}%'})

    accuracy = 100 * correct / total
    print(f"Epoch {epoch+1}, Test Accuracy: {accuracy:.2f}%\n")



In [None]:
# Save the model
torch.save(model.state_dict(), "mobilenetv2_weights.pth")


## Evaluating the model

Let's conclude by evaluating the model.  First we rebuild the model architecture and then load the saved parameters.

In [None]:
# Set parameters
num_hidden = 100

# Load base model
model = models.mobilenet_v2()

# TODO:  Rebuild the classifier head
#  model.classifier = nn.Sequential(...)


# Load weights
model.load_state_dict(torch.load("mobilenetv2_weights.pth"))
model.to(device)
model.eval();


Now run the model and create a confusion matrix.
* Evaluate the model on the `test_loader` dataset
* Add a `tqdm` loop to display the progress as it evaluates the data
* Create a confusion matrix and display it with the `ConfusionMatrixDisplay` class.
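
As a quick reminder of the scikit-learn calls, here is a sketch on a handful of made-up labels.  The label lists and class names are hypothetical; in the lab they would come from the collected test labels and predictions.

```python
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Hypothetical true and predicted class indices for six samples
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0]
toy_names = ["rain", "snow", "hail"]

# Rows are true classes, columns are predicted classes
toy_cm = confusion_matrix(toy_true, toy_pred)
print(toy_cm)

# Draw the matrix with class names on the axes
disp = ConfusionMatrixDisplay(confusion_matrix=toy_cm, display_labels=toy_names)
disp.plot(xticks_rotation=45)
```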

In [None]:
from tqdm import tqdm
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay


# TODO:  Evaluate the model on the test_loader dataset.
#
#  with torch.no_grad():
#      for images, labels in test_loader:
#        ...
#
#  cm  = confusion_matrix(...)

model.eval()
correct = 0
total = 0
all_preds = []
all_labels = []


Finally, print the top `k=10` confusion matrix pairs with the highest percent errors.  Your final list should be something like:
~~~
Top 10 pairs with the highest error:
      snow ->      glaze:  error=0.08750
     frost ->       snow:  error=0.07586
     glaze ->       hail:  error=0.07563
      rain ->  lightning:  error=0.07512
      hail ->      glaze:  error=0.06250
     glaze ->       snow:  error=0.05517
     frost ->  sandstorm:  error=0.05303
      snow ->      frost:  error=0.05128
 sandstorm ->      frost:  error=0.04487
      hail ->       snow:  error=0.04138
~~~
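
One way to rank the off-diagonal entries is sketched below on a small made-up matrix.  It assumes rows are true classes and defines the error rate as the count divided by the number of true samples in that row; the listing above may use a different normalization, so treat this only as a starting point.

```python
import numpy as np

toy_names = ["rain", "snow", "hail"]

# Hypothetical confusion matrix: rows are true classes, columns are predictions
toy_cm = np.array([
    [7, 2, 1],
    [4, 5, 1],
    [0, 3, 7],
])

# Assumption: error rate = count / number of true samples in the row
rates = toy_cm / toy_cm.sum(axis=1, keepdims=True)
np.fill_diagonal(rates, 0.0)   # correct predictions are not errors

k = 3
# Flat indices of the k largest entries, in descending order
flat_order = np.argsort(rates, axis=None)[::-1][:k]
top_pairs = [np.unravel_index(idx, rates.shape) for idx in flat_order]

print(f"Top {k} pairs with the highest error:")
for i, j in top_pairs:
    print(f"{toy_names[i]:>10s} -> {toy_names[j]:>10s}:  error={rates[i, j]:.5f}")
```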




In [None]:
# TODO
# Print the top k=10 category pairs with the highest error.
