<p style="text-align:center">
    <a href="https://www.ict.mahidol.ac.th/en/" target="_blank">
    <img src="https://www3.ict.mahidol.ac.th/ICTSurveysV2/Content/image/MUICT2.png" width="400" alt="Faculty of ICT">
    </a>
</p>

# Lab12: Computer Vision with Pretrained Models

This lab assignment provides a practical introduction to image classification using pre-trained deep learning models with PyTorch. You will work with the EfficientNet-b0 model, a powerful and efficient architecture, and apply it to a small "Ants vs. Bees" dataset. The lab focuses on two key transfer learning techniques: feature extraction and fine-tuning. You will learn how to load and modify pre-trained models, preprocess image data, train and evaluate your models, and compare the performance of feature extraction versus fine-tuning. The lab also includes saving the trained models for potential deployment.

Upon completion of this lab, you will be able to:

1. **Load and Modify Pre-trained Models**: Load a pre-trained EfficientNet-b0 model from torchvision and modify its classifier layer to adapt it to a new dataset.
2. **Implement Feature Extraction**: Freeze the pre-trained model's layers and train only the newly added classifier for feature extraction.
3. **Perform Fine-tuning**: Unfreeze and train some or all of the pre-trained model's layers along with the new classifier for fine-tuning.
4. **Preprocess Image Data**: Apply necessary image transformations (resizing, normalization, data augmentation) for pre-trained models.
5. **Train and Evaluate Models**: Train the feature extraction and fine-tuning models using PyTorch and evaluate their performance using metrics like accuracy, precision, recall, F1-score, and confusion matrix.
6. **Compare Model Performance**: Analyze and compare the performance of the feature extraction and fine-tuning models, discussing the differences in results.
7. **Save Trained Models**: Save the trained models for later use, such as deployment in a Streamlit application.
8. **Understand Transfer Learning**: Gain a practical understanding of transfer learning concepts, including the trade-offs between feature extraction and fine-tuning.
9. **Deploy the classification model**: Gain practical experience on deploying the fine-tuned model with a simple Streamlit application.


__Intructions:__
1. Append your ID at the end of this jupyter file name. For example, ```ITCS227_Lab12_Assignment_6788123.ipynb```
2. Complete each task in the lab.
3. Once finished, raise your hand to call a TA.
4. The TA will check your work and give you an appropriate score.
5. Submit the source code to MyCourse as record-keeping.

## Task01: Classification Model Development

In this lab, we will use the "Ants vs. Bees" dataset, available as part of the lab package. 

In [1]:
#Set the path to the dataset. 
data_dir = 'lab12_datasets\hymenoptera_data'    #<-- Change it to the actual path

###  1. Setup and Dataset Loading

In [2]:
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.optim as optim
from sklearn.metrics import confusion_matrix, classification_report
import matplotlib.pyplot as plt
import numpy as np
import os
from PIL import Image

In [3]:
# Check if the data directory exists
if not os.path.exists(data_dir):
    print(f"Error: Data directory '{data_dir}' not found. Please download and organize the dataset.")
    exit()

In [4]:
# Define data transformations
transform_train = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

transform_test = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

In [5]:
# Load the dataset
trainset = torchvision.datasets.ImageFolder(root=os.path.join(data_dir, 'train'), transform=transform_train)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True, num_workers=2)

testset = torchvision.datasets.ImageFolder(root=os.path.join(data_dir, 'val'), transform=transform_test)
testloader = torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False, num_workers=2)

classes = trainset.classes

### 2. Load Pre-trained EfficientNet-b0 (Feature Extraction)

Load the pre-trained EfficientNet-b0 model

In [6]:
model_feature_extraction = torchvision.models.efficientnet_b0(pretrained=True)



In [7]:
# Freeze all layers (feature extraction)
for param in model_feature_extraction.parameters():
    param.requires_grad = False

# Modify the classifier
num_features = model_feature_extraction.classifier[1].in_features
model_feature_extraction.classifier[1] = nn.Linear(num_features, len(classes))

# Move the model to the device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model_feature_extraction.to(device)

EfficientNet(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): SiLU(inplace=True)
    )
    (1): Sequential(
      (0): MBConv(
        (block): Sequential(
          (0): Conv2dNormActivation(
            (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
            (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): SiLU(inplace=True)
          )
          (1): SqueezeExcitation(
            (avgpool): AdaptiveAvgPool2d(output_size=1)
            (fc1): Conv2d(32, 8, kernel_size=(1, 1), stride=(1, 1))
            (fc2): Conv2d(8, 32, kernel_size=(1, 1), stride=(1, 1))
            (activation): SiLU(inplace=True)
            (scale_activation): Sigmoid()
          )
          (2): Conv2dNormActivat

### 3. Train the Feature Extraction Model

In [8]:
# Setting the hyperparameters
criterion = nn.CrossEntropyLoss()
optimizer_feature_extraction = optim.Adam(model_feature_extraction.classifier[1].parameters(), lr=0.001)

In [9]:
num_epochs = 5
for epoch in range(num_epochs):
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data[0].to(device), data[1].to(device)
        optimizer_feature_extraction.zero_grad()
        outputs = model_feature_extraction(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer_feature_extraction.step()
        running_loss += loss.item()
    print(f"Epoch {epoch+1}, Feature Extraction Loss: {running_loss/len(trainloader)}")

Epoch 1, Feature Extraction Loss: 0.6861136555671692
Epoch 2, Feature Extraction Loss: 0.5333259887993336
Epoch 3, Feature Extraction Loss: 0.42565224692225456
Epoch 4, Feature Extraction Loss: 0.3468816466629505
Epoch 5, Feature Extraction Loss: 0.3669058382511139


### 4. Evaluation of Feature Extraction Model

In [10]:
correct = 0
total = 0
y_true_fe = []
y_pred_fe = []

with torch.no_grad():
    for data in testloader:
        images, labels = data[0].to(device), data[1].to(device)
        outputs = model_feature_extraction(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

        y_true_fe.extend(labels.cpu().numpy())
        y_pred_fe.extend(predicted.cpu().numpy())

print(f"Accuracy of Feature Extraction model: {100 * correct / total}%")
print(classification_report(y_true_fe, y_pred_fe, target_names=classes))

Accuracy of Feature Extraction model: 52.287581699346404%
              precision    recall  f1-score   support

        ants       0.48      0.44      0.46        70
        bees       0.56      0.59      0.57        83

    accuracy                           0.52       153
   macro avg       0.52      0.52      0.52       153
weighted avg       0.52      0.52      0.52       153



### 5. Load Pre-trained EfficientNet-b0 (Fine-tuning)

In [11]:
model_fine_tuning = torchvision.models.efficientnet_b0(pretrained=True)
num_features = model_fine_tuning.classifier[1].in_features
model_fine_tuning.classifier[1] = nn.Linear(num_features, len(classes))
model_fine_tuning.to(device)



EfficientNet(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): SiLU(inplace=True)
    )
    (1): Sequential(
      (0): MBConv(
        (block): Sequential(
          (0): Conv2dNormActivation(
            (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
            (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): SiLU(inplace=True)
          )
          (1): SqueezeExcitation(
            (avgpool): AdaptiveAvgPool2d(output_size=1)
            (fc1): Conv2d(32, 8, kernel_size=(1, 1), stride=(1, 1))
            (fc2): Conv2d(8, 32, kernel_size=(1, 1), stride=(1, 1))
            (activation): SiLU(inplace=True)
            (scale_activation): Sigmoid()
          )
          (2): Conv2dNormActivat

### 6. Train the Fine-tuning Model

In [12]:
criterion = nn.CrossEntropyLoss()
optimizer_fine_tuning = optim.Adam(model_fine_tuning.parameters(), lr=0.0001) # Lower learning rate for fine-tuning

In [13]:
# Adjust the num_epochs as needed. This cell can take several minutes with CPU.

num_epochs = 5
for epoch in range(num_epochs):
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data[0].to(device), data[1].to(device)
        optimizer_fine_tuning.zero_grad()
        outputs = model_fine_tuning(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer_fine_tuning.step()
        running_loss += loss.item()
    print(f"Epoch {epoch+1}, Fine-tuning Loss: {running_loss/len(trainloader)}")

Epoch 1, Fine-tuning Loss: 0.6363744139671326
Epoch 2, Fine-tuning Loss: 0.5152878686785698
Epoch 3, Fine-tuning Loss: 0.3994421809911728
Epoch 4, Fine-tuning Loss: 0.3174458369612694
Epoch 5, Fine-tuning Loss: 0.24865223839879036


### 7. Evaluation of Fine-tuning Model

In [14]:
correct = 0
total = 0
y_true_ft = []
y_pred_ft = []

with torch.no_grad():
    for data in testloader:
        images, labels = data[0].to(device), data[1].to(device)
        outputs = model_fine_tuning(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

        y_true_ft.extend(labels.cpu().numpy())
        y_pred_ft.extend(predicted.cpu().numpy())

print(f"Accuracy of Fine-tuning model: {100 * correct / total}%")
print(classification_report(y_true_ft, y_pred_ft, target_names=classes))

Accuracy of Fine-tuning model: 59.47712418300654%
              precision    recall  f1-score   support

        ants       0.56      0.54      0.55        70
        bees       0.62      0.64      0.63        83

    accuracy                           0.59       153
   macro avg       0.59      0.59      0.59       153
weighted avg       0.59      0.59      0.59       153



### 8. Comparison of Models

In [15]:
print("Comparison of Models:")
print("Feature Extraction Model:")
print(classification_report(y_true_fe, y_pred_fe, target_names=classes))
print("Fine-tuning Model:")
print(classification_report(y_true_ft, y_pred_ft, target_names=classes))

Comparison of Models:
Feature Extraction Model:
              precision    recall  f1-score   support

        ants       0.48      0.44      0.46        70
        bees       0.56      0.59      0.57        83

    accuracy                           0.52       153
   macro avg       0.52      0.52      0.52       153
weighted avg       0.52      0.52      0.52       153

Fine-tuning Model:
              precision    recall  f1-score   support

        ants       0.56      0.54      0.55        70
        bees       0.62      0.64      0.63        83

    accuracy                           0.59       153
   macro avg       0.59      0.59      0.59       153
weighted avg       0.59      0.59      0.59       153



### 9. Save the Models

In [16]:
torch.save(model_feature_extraction.state_dict(), 'ants_bees_feature_extraction.pth')
torch.save(model_fine_tuning.state_dict(), 'ants_bees_fine_tuning.pth')

### 10. Answer the following questions

**Q1**: *What is the primary difference between feature extraction and fine-tuning in the context of transfer learning?*

A1: Fine tuning is better than Feature Extraction because is study on added datasets which make F1 score bettwe

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```
A1: Feature extraction involves freezing the pre-trained model's layers and training only a newly added classifier on top of the pre-trained features. Fine-tuning, on the other hand, involves unfreezing some or all of the pre-trained model's layers and training them along with the new classifier, allowing the model to adapt its learned features to the specific task.
```
</details>

**Q2**: *Why is it important to use the same image transformations during inference (evaluation) as were used during training?*

A2: To ensure that input data is in the same format and does not make bias to accuracy

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```
A2: Pre-trained models are trained on data that has been preprocessed in a specific way. Using the same transformations during inference ensures that the input data is in the same format and distribution as the data the model was trained on, leading to consistent and accurate predictions.
```
</details>

**Q3**: *What are the advantages of using a pre-trained model like EfficientNet-b0 for image classification, compared to training a model from scratch?*

A3: Reducing training time and Resource consumtion

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```
A3: Using a pre-trained model offers several advantages:
* Reduced training time: The model has already learned general image features.
* Less data required: Fine-tuning or feature extraction often requires significantly less data than training from scratch.
* Improved performance: Pre-trained models often achieve higher accuracy due to the rich feature representations learned from large datasets.
```
</details>

**Q4**: *What is the purpose of freezing the pre-trained model's layers when performing feature extraction?*

A4: To prevent weight from being updated

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```
A4: Freezing the pre-trained model's layers prevents their weights from being updated during training. This ensures that the learned features from the pre-trained model are preserved and used as fixed feature extractors.
```
</details>

**Q5**: *What metrics are used to evaluate the performance of the image classification models in this lab?*

A5: Accuracy, precision, recall, F1-score, Confusion matrix

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```
A5: The models are evaluated using accuracy, precision, recall, F1-score, and a confusion matrix. These metrics provide a comprehensive understanding of the model's performance in terms of overall correctness, class-specific performance, and potential misclassifications.
```
</details>

**Q6**: *What is the purpose of the torch.save() function in the lab, and what information is saved?*

A6: To save the model

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```
A6: The torch.save() function is used to save the trained models' state dictionaries (the learned weights and biases). This allows the models to be loaded and used later for inference or deployment without needing to retrain them.
```
</details>

**Q7**: *Why do we add the unsqueeze(0) when using a single image for inference?*

A7: To make input in the correct format

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```
A7: Most PyTorch models, especially those for computer vision, expect input data in batches (even if it's a batch of size 1). unsqueeze(0) adds a batch dimension to the image tensor, making it compatible with the model's input format.
```
</details>

**Q8**: *What is the difference between `.eval()` and `.train()` in PyTorch?* 

A8: evaluation mode and training mode

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```
A8: .eval() sets the model to evaluation mode, which turns off features like dropout and batch normalization that are used during training. .train() sets the model to training mode, enabling these features. It's crucial to use .eval() during inference to ensure consistent predictions.
```
</details>

**Q9**: *In the lab, which model (feature extraction or fine-tuned) achieved higher accuracy on the validation dataset, and why might this be the case?* 

A9: fine-tuning because it's studied on added datasets

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```
A9: Typically, the fine-tuned model achieves higher accuracy. This is because fine-tuning allows the model to adapt the pre-trained weights to the specific characteristics of the "Ants vs. Bees" dataset, leading to more specialized and accurate feature representations.
```
</details>

**Q10**: *If you had a much larger dataset of ants and bees images, how might that change the performance difference between feature extraction and fine-tuning?*

A10: fine tuning will be better because it get more data to learn

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```
A10: With a larger dataset, fine-tuning would likely show a more significant improvement in performance compared to feature extraction. The larger dataset would provide enough data for the model to effectively adapt its pre-trained weights without overfitting, leading to more accurate and robust feature representations.
```
</details>

## Task02: Deploy Your Best Image Classification Model with Streamlit

1. **Select Your Best Model**: Determine which model (`ants_bees_fine_tuning.pth` or `ants_bees_feature_extraction.pth`) achieved the highest validation accuracy during the lab.

2. **Create a Streamlit App**:
    - Write a Python script (`app.py`) using Streamlit.
    - Load the selected model's state dictionary (`.pth` file). Remember to define the EfficientNet-b0 model architecture in your script.
    - Implement an image upload functionality using `st.file_uploader()`.
    - Apply the same image preprocessing transformations (resizing, normalization) used during training.
    - Perform inference on the uploaded image using your loaded model.
    - Display the predicted class (ants or bees) and the corresponding confidence score.
    - Optionally, display the uploaded image and the class probabilities.

3. **Use the Tutorial as a Guide**: You can use the "Cats vs. Dogs Image Classification" Streamlit tutorial provided earlier as a template. Adapt the code to load your "Ants vs. Bees" model and display the appropriate results.

4. **Run Your App**: Run your Streamlit app from the command line using streamlit run app.py.

5. **Test Your App**: Upload various ant and bee images to test the performance of your deployed model.

### Deliverables
Along with this Notebook file, please submit the followings:
* The app.py script containing your Streamlit application.
* A screenshot of your running Streamlit application displaying a successful prediction.

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```python
# app.py - Streamlit app for Ants vs. Bees image classification

import streamlit as st
import torch
import torchvision.transforms as T
from PIL import Image
import os
import torch.nn as nn

# 1. Load the Fine-tuned Model

# Load the saved model (make sure the model definition is available in your Streamlit app)
# You need to have the same model architecture definition as in your training script.
# For example, if you trained with EfficientNet-B0:

model = torch.hub.load('pytorch/vision:main', 'efficientnet_b0', pretrained=False)
num_features = model.classifier[1].in_features
model.classifier[1] = nn.Linear(num_features, 2)  # 2 classes (ants, bees)
model.load_state_dict(torch.load('ants_bees_fine_tuning.pth', map_location=torch.device('cpu'))) # Load to CPU
model.eval()

# Define the image transformations (same as in training/validation)
transform = T.Compose([
    T.Resize(256),       # Resize for EfficientNet
    T.CenterCrop(224),   # Center crop for consistent input size
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) # ImageNet stats
])

classes = ['ants', 'bees']  # Class names (same as in training)

# 2. Create the Streamlit App

st.title("Ants vs. Bees Image Classifier")

uploaded_file = st.file_uploader("Choose an image...", type=["jpg", "png", "jpeg"])

if uploaded_file is not None:
    image = Image.open(uploaded_file).convert("RGB")
    st.image(image, caption="Uploaded Image", use_column_width=True)

    if st.button("Classify"):
        with st.spinner("Classifying..."):  # Show a spinner while processing
            input_tensor = transform(image).unsqueeze(0)  # Add batch dimension

            with torch.no_grad():
                output = model(input_tensor)
                probabilities = torch.nn.functional.softmax(output[0], dim=0) # Softmax for probabilities
                predicted_class_index = torch.argmax(probabilities).item()
                predicted_class = classes[predicted_class_index]
                confidence = probabilities[predicted_class_index].item() * 100

            st.header("Prediction")
            st.write(f"The image is a {predicted_class} with {confidence:.2f}% confidence.")

            # Display probabilities for each class (optional)
            st.subheader("Class Probabilities")
            for i, class_name in enumerate(classes):
              st.write(f"{class_name}: {probabilities[i].item()*100:.2f}%")
```
</details>

<p style="text-align:center;">That's it! Congratulations! <br> 
    Now, call an LA to check your solution. Then, upload your code on MyCourses.</p>