In [1]:
# %run supportvectors-common.ipynb

# Lab Exercise: Classifying Willow Tree vs. Oak Tree Using CNN

## Objective:
The goal of this lab is to design and implement a Convolutional Neural Network (CNN) to classify between two types of trees: **Willow** and **Oak**. You will work with a dataset that contains images of these two tree classes, divided into subfolders. The exercise will guide you through creating a custom dataset class, designing a CNN model, and writing a training script to perform the binary classification.


## Tasks Breakdown:

### 1. Dataset Preparation
   Create a PyTorch Dataset class to load and preprocess images from the `trees` folder.
   - **Folder Structure:**
     - `trees/Willowtree/` – contains images of Willow trees.
     - `trees/Oaktree/` – contains images of Oak trees.
   - **Objective:** 
     - Write a custom PyTorch Dataset class to:
       - Load images from both folders.
       - Convert images to PyTorch tensors.
       - Apply standard image transformations (e.g., resizing, normalization).
       - Assign labels: 0 for Willow trees and 1 for Oak trees.


### 2. CNN Architecture Design
  Design a CNN architecture for binary classification.
   - **Objective:**
     - Create a PyTorch CNN model that:
       - Contains several convolutional, activation, and pooling layers.
       - Includes fully connected layers at the end for binary classification.


### 3. Training Script
  Implement a training script to train the CNN model on the dataset.
   - **Objective:**
     - Write a script to:
       - Split the dataset into training and validation sets.
       - Define a loss function (binary cross-entropy) and an optimizer (e.g., Adam).
       - Train the model for a specified number of epochs.
       - Evaluate the model on the validation set after each epoch.
       - Output training and validation accuracy at each step.
   

### 4. Evaluation
  Evaluate the trained model and report performance metrics.
   - **Objective:**
     - After training, evaluate the model on a test dataset and report:
       - Accuracy.
       - Precision and recall.
       - Confusion matrix.



In [2]:
# imports

import matplotlib.pyplot as plt

# svlearn
from svlearn.trees.tree_dataset import TreeDataset
from svlearn.trees.preprocess import Preprocessor
from svlearn.config.configuration import ConfigurationMixin
from svlearn.train.visualization_utils import (
    show_image_with_denormalization,
    show_sample_image,
    visualize_classification_training_results,
)
from svlearn.train.simple_trainer import train_simple_network

from sklearn.metrics import accuracy_score

# torch
import torch
from torch import nn
from torch.utils.data import DataLoader
from torch.optim import AdamW
from torch.optim.lr_scheduler import StepLR


In [None]:
config = ConfigurationMixin().load_config()
data_dir = config['tree-classification']['data']
results_dir = config['tree-classification']['results']


## 1. Dataset Preparation

Let's load the images from our `data_dir`. The `Preprocessor` handles the preprocessing: loading image paths, encoding labels, and splitting the dataset for training and evaluation.

The `preprocess` method returns:
 - `train_df`, `val_df` - each containing image paths and their corresponding integer labels.
 - `label_encoder`, which we will later use for inference.

In [None]:
preprocessor = Preprocessor()
train_df, val_df, label_encoder = preprocessor.preprocess(data_dir)
train_df.head()

## Image Dataset

### Image Transformations

Let's load the images into our tree dataset and apply some transformations. While transforming images for training, we want to create as much variability as possible so that the model generalizes well. We randomly distort the images to make it harder for the model to overfit. When evaluating, we don't apply these random transformations and instead keep the image as close to the original as possible.

In [5]:
from torchvision.transforms import v2


# Training transforms: random augmentations, then normalization
train_transform = v2.Compose([
    v2.ToImage(),
    v2.RandomResizedCrop(224, scale=(0.5, 1)),  # randomly crop and resize to 224x224
    v2.RandomHorizontalFlip(p=0.5),  # flip the image horizontally with a 50% chance
    v2.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),  # randomly change brightness, contrast and saturation
    v2.ToDtype(torch.float32, scale=True),  # convert to float32 and scale values to [0, 1]
    v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),  # normalize each channel
])

# Evaluation transforms: deterministic resize, then the same normalization
test_transform = v2.Compose([
    v2.ToImage(),
    v2.Resize(size=(224, 224)),  # resize all images to a standard size suitable for the CNN model
    v2.ToDtype(torch.float32, scale=True),  # convert to float32 and scale values to [0, 1]
    v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),  # normalize each channel
])


**Normalization** is an essential preprocessing step in computer vision tasks. It rescales pixel values in an image to a common range, typically between 0 and 1, or -1 and 1. 

Why Normalize?
 - **Consistency**: Images can have different lighting, contrast, or color variations. Normalization reduces the impact of these differences, making images more consistent for the model to learn from.

 - **Prevents Large Gradients**: If the input values are too large, it can cause large gradient values during backpropagation, leading to instability or slow learning. Normalization keeps the gradients in a reasonable range, making the learning process smoother.


In this case, the `mean` and `std` (standard deviation) values are the per-channel statistics of the ImageNet dataset, which are commonly reused for general image datasets. Subtracting the mean and dividing by the standard deviation gives each channel a mean of roughly 0 and a standard deviation of roughly 1 on ImageNet-like images, which helps the CNN learn more effectively.
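To make the arithmetic concrete, here is a small sketch of the per-channel formula `(x - mean) / std` applied to the extreme pixel values 0 and 1 (the range produced by `ToDtype(..., scale=True)`). The helper name `normalize_pixel` is ours and exists only for illustration.

```python
# ImageNet channel statistics used by the transforms above (R, G, B).
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(value: float, mean: float, std: float) -> float:
    """Apply the per-channel normalization formula (x - mean) / std."""
    return (value - mean) / std


# Range each channel can take after normalizing values that lie in [0, 1].
for name, mean, std in zip("RGB", MEAN, STD):
    low = normalize_pixel(0.0, mean, std)
    high = normalize_pixel(1.0, mean, std)
    print(f"{name}: [{low:.3f}, {high:.3f}]")
```

With these statistics the normalized values fall roughly between -2.1 and 2.6, so the result is centered near zero rather than confined to a fixed interval such as [-1, 1].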



In [6]:
train_dataset = TreeDataset(train_df, transform=train_transform)
val_dataset = TreeDataset(val_df, transform=test_transform)
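
The source of `TreeDataset` is not shown in this notebook. As a rough sketch of what such a class might look like, the example below assumes the DataFrame has columns named `image_path` and `label`; both column names and the class name `SketchTreeDataset` are assumptions, not the `svlearn` implementation.

```python
from typing import Callable, Optional

import pandas as pd
import torch
from PIL import Image
from torch.utils.data import Dataset


class SketchTreeDataset(Dataset):
    """Illustrative image dataset backed by a DataFrame of paths and labels.

    Assumes columns named `image_path` and `label`; the real TreeDataset
    may use different names.
    """

    def __init__(self, df: pd.DataFrame, transform: Optional[Callable] = None):
        self.df = df.reset_index(drop=True)
        self.transform = transform

    def __len__(self) -> int:
        return len(self.df)

    def __getitem__(self, idx: int):
        row = self.df.iloc[idx]
        image = Image.open(row["image_path"]).convert("RGB")  # force 3 channels
        if self.transform is not None:
            image = self.transform(image)
        label = torch.tensor(int(row["label"]), dtype=torch.long)
        return image, label
```

Because the transform is applied inside `__getitem__`, random augmentations are re-drawn every time a sample is fetched.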


### How do the inputs look?

Now that the dataset is created, let's take a sample from the validation dataset: an image of an Oak tree. After all these transformations, what does the image look like?

In [None]:
show_sample_image(val_dataset , 2)

The pixel values are not too pleasing to the human eye. This is specifically because of `Normalize`. We can undo this transformation (while retaining all the other transformations) by denormalizing, i.e. we multiply each pixel value by the standard deviation and add the mean back.

In [None]:
show_image_with_denormalization(val_dataset, 2)
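
The internals of `show_image_with_denormalization` are not shown here. A minimal sketch of the denormalization step itself, assuming a `(3, H, W)` tensor normalized with the ImageNet statistics, could look like this (the function name `denormalize` is hypothetical):

```python
import torch

MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)


def denormalize(image: torch.Tensor) -> torch.Tensor:
    """Invert Normalize on a (3, H, W) tensor: x * std + mean, clipped to [0, 1]."""
    return (image * STD + MEAN).clamp(0.0, 1.0)


# Round trip on a random "image" with values in [0, 1].
original = torch.rand(3, 4, 4)
normalized = (original - MEAN) / STD
restored = denormalize(normalized)
print(torch.allclose(original, restored, atol=1e-6))
```

To display the result with matplotlib, the channel axis would need to move last, for example with `restored.permute(1, 2, 0)`.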

Let's also see a sample of a willow tree image, this time from the train dataset. Notice that each time you run the cell below, the image is slightly different. This is because of the `train_transform` we applied previously: every time a sample is requested, the dataset returns a freshly randomized version of the image.

In [None]:
show_image_with_denormalization(train_dataset, 205)

### DataLoaders
We create dataloaders from the train and validation datasets.

In [10]:
train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(dataset=val_dataset, batch_size=32, shuffle=False)
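
To see what a dataloader hands to the model, it can help to iterate over one and print the batch shapes. The sketch below uses a `TensorDataset` of random tensors as a stand-in for the tree images, so the sizes are illustrative only.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset: 10 fake RGB images of size 224x224 with binary labels.
images = torch.randn(10, 3, 224, 224)
labels = torch.randint(0, 2, (10,))
loader = DataLoader(TensorDataset(images, labels), batch_size=4, shuffle=False)

for x, y in loader:
    print(x.shape, y.shape)  # the last batch is smaller: 10 = 4 + 4 + 2
```

Each batch has shape `(B, 3, 224, 224)`, which is the layout the convolution layers expect.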

## 2. CNN Model
Next, let's design a simple CNN architecture that uses the tools we learned in theory class. We want the model (with all its weights and biases) to classify the two types of trees: Weeping Willow and Oak.

Each block will have a convolution layer, a batch norm layer, an activation, and a pooling layer.
 - **Convolution Layer** applies filters (small matrices) to input images to detect specific patterns. As the filter slides across the image, it creates feature maps, highlighting the presence of these patterns. Each filter learns to detect different patterns, and deeper layers learn more complex features.
 - **Batch Normalization Layer** standardizes the input to a layer so that its mean is close to 0 and its variance is near 1, then applies a learnable scale and shift.
 - **Activation Layer** introduces non-linearity to the network.
 - **Max Pooling** reduces the spatial dimensions (height and width) of feature maps while preserving the most important information. It makes the network more **robust to minor distortions** or variations in the input.

After the convolutions, we enter the familiar territory of fully connected layers, which in the end produce one output (logit) per class representing the model's prediction.
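The spatial size after each convolution or pooling layer follows `floor((size + 2 * padding - kernel) / stride) + 1`. The short sketch below applies that formula to three blocks with convolution kernels of size 5, 5 and 4 (the values used in the model defined next), each followed by a 2x2 max pool; the helper name `conv_out` is ours.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one spatial axis of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 224
# Convolution kernel sizes of the three blocks; each convolution is
# followed by a 2x2 max pool with stride 2.
for conv_kernel in (5, 5, 4):
    size = conv_out(size, kernel=conv_kernel)   # convolution, stride 1, no padding
    size = conv_out(size, kernel=2, stride=2)   # max pooling
    print(size)

flattened = 32 * size * size  # 32 channels leave the last block
print(flattened)
```

The final value is the number of input features the first fully connected layer has to accept.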


In [11]:
num_classes = 2

model = nn.Sequential(
        # ----------------------------------------------------------------------------------------------------------------------------

        # Convolution Block 1
            nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5, padding=0),     # ( B , 3 , 224 , 224 ) ->  ( B , 6 , 220 , 220 )
            nn.BatchNorm2d(num_features=6),                                     
            nn.ReLU(), 
            nn.MaxPool2d(kernel_size=2, stride=2),                                  # ( B , 6 , 220 , 220 ) ->  ( B , 6 , 110 , 110 )

        # ----------------------------------------------------------------------------------------------------------------------------
        # Convolution Block 2
            nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5, padding=0),    # ( B , 6 , 110 , 110 ) ->  ( B , 16 , 106 , 106 )
            nn.BatchNorm2d(num_features=16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),                                   # ( B , 16 , 106 , 106 ) ->  ( B , 16 , 53 , 53 )

        # ----------------------------------------------------------------------------------------------------------------------------
        # Convolution Block 3
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=4),              # ( B , 16 , 53 , 53 ) ->  ( B , 32 , 50 , 50 )                           
            nn.BatchNorm2d(num_features=32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),                                   # ( B , 32 , 50 , 50 )   ->  ( B , 32 , 25 , 25 ) 

        # ----------------------------------------------------------------------------------------------------------------------------
            nn.Flatten(), # Change from 2D image to 1D tensor to be able to pass inputs to linear layer
        # ----------------------------------------------------------------------------------------------------------------------------
    
        # Linear Block 1
            nn.Linear(in_features=32 * 25 * 25, out_features=180),
            nn.ReLU(),

        # ----------------------------------------------------------------------------------------------------------------------------
        # Linear block 2
            nn.Linear(in_features=180, out_features=84),
            nn.ReLU(),

        # ----------------------------------------------------------------------------------------------------------------------------
            nn.Linear(in_features=84, out_features=num_classes)
        # ----------------------------------------------------------------------------------------------------------------------------
        )
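
The shape comments in the model can be verified empirically by pushing a dummy batch through the layers one at a time. The helper below, `trace_shapes`, is a hypothetical utility written for this sketch; it is shown on a small stand-in network, and passing the `model` defined above instead would trace the real one.

```python
import torch
from torch import nn


def trace_shapes(model: nn.Sequential, x: torch.Tensor) -> list:
    """Run x through each layer in turn and record the output shapes."""
    shapes = []
    with torch.no_grad():
        for layer in model:
            x = layer(x)
            shapes.append((layer.__class__.__name__, tuple(x.shape)))
    return shapes


# Small stand-in network mirroring the first convolution block.
demo = nn.Sequential(
    nn.Conv2d(3, 6, kernel_size=5),
    nn.BatchNorm2d(6),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
)
demo.eval()  # use BatchNorm running statistics instead of batch statistics
for name, shape in trace_shapes(demo, torch.zeros(1, 3, 224, 224)):
    print(f"{name:12s} {shape}")
```

A mismatch between the flattened size and the `in_features` of the first linear layer is a common source of runtime errors, so a check like this is worth running whenever the architecture changes.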

Next, let's define the optimizer that updates the model parameters and a scheduler that halves the learning rate every 3 epochs.

In [12]:
optimizer = AdamW(model.parameters(), lr = 0.001)
scheduler = StepLR(optimizer, step_size=3, gamma=0.5)
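
To see what `StepLR(step_size=3, gamma=0.5)` does to the learning rate, the sketch below drives the same optimizer and scheduler settings with a single dummy parameter and records the rate used in each of 10 epochs. It assumes the scheduler is stepped once per epoch, which may or may not match how `train_simple_network` calls it.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import StepLR

# A single dummy parameter is enough to drive the optimizer and scheduler.
param = torch.nn.Parameter(torch.zeros(1))
demo_optimizer = AdamW([param], lr=0.001)
demo_scheduler = StepLR(demo_optimizer, step_size=3, gamma=0.5)

lrs = []
for epoch in range(10):
    lrs.append(demo_scheduler.get_last_lr()[0])  # learning rate used in this epoch
    demo_optimizer.step()                        # (training would happen here)
    demo_scheduler.step()                        # advance the schedule once per epoch

print(lrs)
```

The rate stays at 0.001 for epochs 0 to 2, then halves at epochs 3, 6 and 9.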


## 3. Training Script

We will reuse a classification trainer script that we previously used for binary classification.

In [None]:
result = train_simple_network(
                        model=model,
                        optimizer=optimizer,
                        lr_scheduler=scheduler,
                        loss_func=nn.CrossEntropyLoss(),
                        train_loader=train_loader,
                        test_loader=val_loader,
                        epochs=10,
                        score_funcs={'accuracy': accuracy_score},
                        classify=True,
                        checkpoint_file=f"{results_dir}/cnn-model-trial.pt")
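
The internals of `train_simple_network` are not shown in this notebook. As a rough idea of what one epoch of training and evaluation involves, here is a minimal sketch on toy data; the function `run_epoch` and the toy model are assumptions made for illustration, and the same loader is reused for evaluation only to keep the example short.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def run_epoch(model, loader, loss_func, optimizer=None):
    """One pass over `loader`; trains if an optimizer is given, else evaluates.

    Returns (mean loss, accuracy).
    """
    training = optimizer is not None
    model.train(training)
    total_loss, correct, seen = 0.0, 0, 0
    with torch.set_grad_enabled(training):
        for x, y in loader:
            logits = model(x)
            loss = loss_func(logits, y)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * len(y)
            correct += (logits.argmax(dim=1) == y).sum().item()
            seen += len(y)
    return total_loss / seen, correct / seen


# Toy data: two well-separated 2-D clusters stand in for image features.
torch.manual_seed(0)
x = torch.cat([torch.randn(50, 2) - 3, torch.randn(50, 2) + 3])
y = torch.cat([torch.zeros(50, dtype=torch.long), torch.ones(50, dtype=torch.long)])
loader = DataLoader(TensorDataset(x, y), batch_size=20, shuffle=True)

toy_model = nn.Linear(2, 2)
toy_optimizer = torch.optim.AdamW(toy_model.parameters(), lr=0.1)
for epoch in range(5):
    train_loss, train_acc = run_epoch(toy_model, loader, nn.CrossEntropyLoss(), toy_optimizer)
    val_loss, val_acc = run_epoch(toy_model, loader, nn.CrossEntropyLoss())
    print(f"epoch {epoch}: train loss {train_loss:.3f}, val acc {val_acc:.2f}")
```

Note that `nn.CrossEntropyLoss` expects raw logits of shape `(B, num_classes)` and integer class labels, which is why the model above ends in a plain linear layer with two outputs and no softmax.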

Let's print out the results to see the learning progress of our model.

In [None]:
result

## 4. Evaluation

In [None]:

visualize_classification_training_results(result['train loss'] , 
                                          result['test loss'] , 
                                          result['train accuracy'] , 
                                          result['test accuracy'], 
                                          dir_path=results_dir, 
                                          filename="image classification")
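
The task list also asks for precision, recall, and a confusion matrix. One possible way to compute them is to collect predictions over a loader and hand them to scikit-learn, as sketched below. The helper `collect_predictions` and the toy model are ours; with the trained network you would pass `model` and `val_loader` instead.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from sklearn.metrics import confusion_matrix, precision_score, recall_score


def collect_predictions(model, loader):
    """Return (true labels, predicted labels) as Python lists for a whole loader."""
    model.eval()
    y_true, y_pred = [], []
    with torch.no_grad():
        for x, y in loader:
            y_pred.extend(model(x).argmax(dim=1).tolist())
            y_true.extend(y.tolist())
    return y_true, y_pred


# Stand-in model and data; substitute the trained `model` and `val_loader`.
toy_model = nn.Linear(1, 2)
with torch.no_grad():
    toy_model.weight.copy_(torch.tensor([[-1.0], [1.0]]))  # predicts 1 when x > 0
    toy_model.bias.zero_()
x = torch.tensor([[-2.0], [-1.0], [1.0], [2.0], [3.0], [-3.0]])
y = torch.tensor([0, 0, 1, 1, 0, 1])
loader = DataLoader(TensorDataset(x, y), batch_size=4)

y_true, y_pred = collect_predictions(toy_model, loader)
print(confusion_matrix(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
```

In scikit-learn's confusion matrix, rows are true classes and columns are predicted classes; precision and recall here are reported for class 1 by default.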

## Model Inference

Next, let's test our model on an image. We load the saved model weights and biases from our checkpoint file and reset our model's parameters to these values.

In [None]:
checkpoint = torch.load(f"{results_dir}/cnn-model-trial.pt")

model.load_state_dict(checkpoint['model_state_dict'])
model.eval();
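
The sketch below shows a checkpoint round trip on a tiny model. Only the key name `model_state_dict` is taken from the cell above; the rest of the checkpoint layout (for example the `epoch` entry) and the file name are assumptions.

```python
import os
import tempfile

import torch
from torch import nn

# Save a checkpoint dictionary for a small stand-in model.
source = nn.Linear(4, 2)
path = os.path.join(tempfile.mkdtemp(), "demo-checkpoint.pt")
torch.save({"model_state_dict": source.state_dict(), "epoch": 3}, path)

# map_location="cpu" lets a checkpoint saved on a GPU load on a CPU-only machine.
demo_checkpoint = torch.load(path, map_location="cpu")
restored = nn.Linear(4, 2)
restored.load_state_dict(demo_checkpoint["model_state_dict"])
restored.eval()  # switch BatchNorm/Dropout layers (if any) to inference mode

print(torch.equal(source.weight, restored.weight))
```

`load_state_dict` requires the receiving model to have the same architecture as the one that was saved, which is why the notebook loads the weights into the existing `model` object.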

### Load an image from file
Download an image from the internet and assign its path to `img_path`.

In [None]:
from PIL import Image

img_path = "/home/chandar/data/trees/Oak/images379.jpg"
image = Image.open(img_path).convert("RGB")

# `Image.open` already returns a PIL image, so it can be displayed directly

# Display the image
plt.figure(figsize=(4, 4))
plt.imshow(image)
plt.title("Sample Image")
plt.axis('off')  # Hide axis for better visualization
plt.show()

Let's transform the image into the input that our model expects. By calling `unsqueeze(0)` we add a leading dimension that represents a batch (of size 1).

In [None]:
input = test_transform(image).unsqueeze(0)
input.shape

### Get prediction
Did the model classify the image correctly? 

In [None]:
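with torch.no_grad():  # gradients are not needed for inference
    y_hat = model(input)
prediction = torch.argmax(y_hat, dim=1).item()  # index of the larger logit as a Python int
label_encoder.inverse_transform([prediction])[0]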
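
The model outputs raw logits, so `argmax` gives the predicted class but not how confident the model is. A softmax over the class dimension turns logits into probabilities. The logits below are made up for illustration and do not come from the trained model.

```python
import torch

# Made-up logits for a batch of one image and two classes.
logits = torch.tensor([[0.3, 2.1]])

probabilities = torch.softmax(logits, dim=1)   # each row sums to 1
predicted_class = probabilities.argmax(dim=1).item()
confidence = probabilities[0, predicted_class].item()

print(predicted_class, round(confidence, 3))
```

Since softmax preserves the ordering of the logits, the predicted class is the same with or without it.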

## Finetuning a Pretrained model

### VGG16

We designed our CNN model from scratch. Let us now use a pretrained model, VGG16, and fine-tune only its final classification layer with the help of our dataset.

In [None]:
import torchvision.models as models

train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(dataset=val_dataset, batch_size=32, shuffle=False)

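# Load the VGG16 model with ImageNet weights
# (the `weights` argument replaces the deprecated `pretrained=True`)
vgg_model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Freeze all pretrained parameters (feature extractor and classifier)
for param in vgg_model.parameters():
    param.requires_grad = False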

# Modify the classifier for 2 classes
vgg_model.classifier[6] = nn.Linear(4096, 2) 

optimizer = torch.optim.Adam(vgg_model.classifier[6].parameters(), lr=0.001)
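
A quick way to confirm that only the new head will be trained is to count parameters with `requires_grad=True`. The sketch below uses a small stand-in network rather than VGG16 so that it runs without downloading weights; the helper `count_parameters` is ours and works on any `nn.Module`, including `vgg_model`.

```python
from torch import nn


def count_parameters(model: nn.Module):
    """Return (trainable, total) parameter counts."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total


# Small stand-in for a pretrained network whose last layer maps 4096 -> 1000.
net = nn.Sequential(nn.Linear(8, 4096), nn.ReLU(), nn.Linear(4096, 1000))

for param in net.parameters():      # freeze everything
    param.requires_grad = False
net[2] = nn.Linear(4096, 2)         # new head; its parameters are trainable

trainable, total = count_parameters(net)
print(trainable, total)
```

A freshly constructed `nn.Linear(4096, 2)` contributes 4096 * 2 weights plus 2 biases, so 8194 trainable parameters is also what you should see for the replaced `classifier[6]` layer.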

In [None]:
result = train_simple_network(
                        model=vgg_model,
                        optimizer=optimizer,
                        loss_func=nn.CrossEntropyLoss(),
                        train_loader=train_loader,
                        test_loader=val_loader,
                        epochs=10,
                        score_funcs={'accuracy': accuracy_score},
                        classify=True,
                        checkpoint_file=f"{results_dir}/vgg-model-01.pt")

In [None]:
result

In [None]:
visualize_classification_training_results(result['train loss'] , 
                                          result['test loss'] , 
                                          result['train accuracy'] , 
                                          result['test accuracy'], 
                                          dir_path=results_dir, 
                                          filename="image classification")