### **Tutorial 06: Using Pre-Trained Models**

**Pre-trained models (networks)** are models that have already been trained on large benchmark datasets such as ImageNet and are available for reuse. Their key advantage is that they have already learned to extract general image features, from edges and textures to more complex patterns. They can therefore be fine-tuned for a specific task with much less data and computation than training a model from scratch.


##### **Common Pre-trained Models in TorchVision**

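TorchVision offers a wide range of models pre-trained on ImageNet. They can be used directly for inference or fine-tuned for specific tasks. Since torchvision 0.13, pre-trained weights are selected with the `weights` argument (for example, `weights="IMAGENET1K_V1"`); the older `pretrained=True` flag is deprecated.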

**ResNet (Residual Networks)**
- ResNet models (ResNet18, ResNet34, ResNet50, ResNet101, etc.) are some of the most popular architectures for image classification tasks. These models use residual blocks (skip connections) to mitigate the vanishing gradient problem in deep networks.
- Example: `torchvision.models.resnet50(weights="IMAGENET1K_V1")`

**VGG**
- VGG networks (VGG11, VGG16, VGG19) are known for their simplicity and deep architecture. These networks consist of stacked convolutional layers followed by fully connected layers.
- Example: `torchvision.models.vgg16(weights="IMAGENET1K_V1")`

**DenseNet**
- DenseNet (DenseNet121, DenseNet169) connects each layer to all preceding layers within a dense block in a feed-forward fashion, improving gradient flow and feature reuse.
- Example: `torchvision.models.densenet121(weights="IMAGENET1K_V1")`

**AlexNet**
- One of the first deep neural networks for image classification, AlexNet consists of multiple convolutional layers followed by fully connected layers.
- Example: `torchvision.models.alexnet(weights="IMAGENET1K_V1")`

**InceptionV3**
- InceptionV3 is a deep neural network designed to optimize computational efficiency while maintaining high performance. It uses inception blocks that apply multiple filters of different sizes in parallel. Note that it expects 299x299 inputs rather than the 224x224 inputs used by most other models in this list.
- Example: `torchvision.models.inception_v3(weights="IMAGENET1K_V1")`

**MobileNetV2**
- MobileNetV2 is a lightweight model optimized for mobile and edge devices. It uses depthwise separable convolutions for computational efficiency.
- Example: `torchvision.models.mobilenet_v2(weights="IMAGENET1K_V1")`

**EfficientNet**
- EfficientNet is a family of models that scale depth, width, and input resolution together (compound scaling) to achieve higher accuracy with fewer parameters.
- Example: `torchvision.models.efficientnet_b0(weights="IMAGENET1K_V1")`

**Vision Transformer (ViT)**
- Vision Transformers are based on the Transformer architecture, which was originally designed for natural language processing tasks. They are becoming increasingly popular in image classification.
- Example: `torchvision.models.vit_b_16(weights="IMAGENET1K_V1")`

### Using a Pre-trained Model to Test Your Data

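How you set up a pre-trained model depends on what you want to do with it. To run inference with the original ImageNet classes, load the model with its weights and keep the output layer unchanged. To adapt the model to your own classes, replace the final layer so that it matches your task (for example, 10 outputs for a 10-class classification problem) and then fine-tune the model. The new layer is randomly initialized, so its predictions are not meaningful until it has been trained.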

---

### **Task**: Using Pre-trained Models on Your Own Images

##### **Objective**:
Learn how to use pre-trained models (at least 5 different models) and evaluate their performance on the images in the **pet images** folder by computing classification metrics: True Positives (TP), False Positives (FP), and Accuracy. Report the results in a table whose first column is the model name and whose remaining columns are TP, FP, and Accuracy.

#### **Pre-trained Models**:
Evaluate the following models on the **pet images** dataset. 

1. **ResNet50**
2. **AlexNet**
3. **VGG16**
4. **DenseNet121**
5. **MobileNetV2**

These models are all pre-trained on the ImageNet dataset and are available in PyTorch through `torchvision.models`.

#### **Metrics to Compute**:
For each model, compute the following:

- **True Positives (TP)**: Positive samples correctly classified as positive.
- **False Positives (FP)**: Negative samples incorrectly classified as positive.
- **Accuracy**: The percentage of all samples that are classified correctly.
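
These definitions can be sketched in plain Python. The task does not say which class counts as "positive", so the example below assumes `"cat"` as the positive class; `binary_metrics` is a hypothetical helper name and the labels are toy data.

```python
def binary_metrics(y_true, y_pred, positive="cat"):
    """Compute TP, FP, and accuracy (in percent) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = 100.0 * correct / len(pairs)
    return {"TP": tp, "FP": fp, "Accuracy": accuracy}


# Toy data: the positive class "cat" is an assumption made for this example.
y_true = ["cat", "cat", "dog", "dog", "cat"]
y_pred = ["cat", "dog", "cat", "dog", "cat"]

metrics = binary_metrics(y_true, y_pred)
print(metrics)  # {'TP': 2, 'FP': 1, 'Accuracy': 60.0}
```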

#### **Compare the Results**:
After evaluating all models, compile the results into a table with the following structure (the values shown only illustrate the layout):

| Model          | True Positives (TP) | False Positives (FP) | Accuracy (%) |
|----------------|---------------------|----------------------|--------------|
| ResNet50       | 500                 | 20                   | 95.0         |
| AlexNet        | 450                 | 50                   | 90.0         |
| VGG16          | 480                 | 40                   | 92.0         |
| DenseNet121    | 470                 | 30                   | 93.5         |
| MobileNetV2    | 490                 | 10                   | 96.0         |
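
A table in this layout can be generated from a dictionary of per-model metrics. The sketch below assumes each entry has the keys `TP`, `FP`, and `Accuracy`; `results_table` is a hypothetical helper name, and the entry used to demonstrate it is a placeholder, not a measured result.

```python
def results_table(results):
    """Format {model name: {"TP", "FP", "Accuracy"}} as a markdown table."""
    header = "| Model | True Positives (TP) | False Positives (FP) | Accuracy (%) |"
    divider = "|---|---|---|---|"
    rows = [
        f"| {name} | {m['TP']} | {m['FP']} | {m['Accuracy']:.1f} |"
        for name, m in results.items()
    ]
    return "\n".join([header, divider, *rows])


# Placeholder entry to show the output format only.
example = {"ModelA": {"TP": 2, "FP": 1, "Accuracy": 60.0}}
table = results_table(example)
print(table)
```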


##### **Code Helper**:
The code below tests a single image using the ResNet50 model.




```python
import torch
import torchvision.models as models
from torchvision.models import ResNet50_Weights
from torchvision import transforms
from PIL import Image
from utils import data_loader  # local helper module used by this tutorial

# Load ResNet50 with its ImageNet weights and keep the original 1000-class head,
# so that the predicted index can be looked up in the ImageNet label table.
model = models.resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
model.eval()  # inference mode: use the stored batch-norm statistics

# Load the image as RGB
image_path = "./data/pet_images/cat_01.jpg"
image_pil = Image.open(image_path).convert("RGB")

# Standard ImageNet preprocessing
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
input_tensor = preprocess(image_pil)
input_batch = input_tensor.unsqueeze(0)  # add a batch dimension: (1, 3, 224, 224)

with torch.no_grad():  # No gradients needed for inference
    output = model(input_batch)  # raw logits, shape (1, 1000)

# Convert logits to probabilities, then take the most likely class
probabilities = torch.softmax(output, dim=1)
prob, predicted_class = torch.max(probabilities, 1)
class_idx = predicted_class.item()
print(f"Predicted class: {class_idx}, Prob: {prob.item():.4f}")

# ImageNet class labels
dl = data_loader.DataLoader()
class_idx_to_label = dl.imagenet1000_cls_id_label()
predicted_label = class_idx_to_label[str(class_idx)][1]
print(f"Predicted label: {predicted_label}")
```

If the final layer is instead replaced by a new 2-class head (`model.fc = torch.nn.Linear(model.fc.in_features, 2)`) and the model is not fine-tuned, the network can only output index 0 or 1, and `torch.max` applied to the raw output returns a logit rather than a probability. Looking that index up in the ImageNet table then gives a meaningless label, as in this run:

```
Predicted class: 1, Prob: tensor([-0.3211])
Predicted label: goldfish
```
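
The pre-trained models predict one of 1000 ImageNet classes, while the metrics in the task need a coarse pet label. The sketch below shows one possible mapping, under two assumptions that are not stated in the task: the folder contains cats and dogs, and each filename starts with its class name, as in `cat_01.jpg`. In the standard ImageNet-1k ordering, indices 281 to 285 are domestic cat classes and indices 151 to 268 are dog breeds. `imagenet_index_to_pet` and `label_from_filename` are hypothetical helper names.

```python
from pathlib import Path

CAT_INDICES = range(281, 286)  # tabby ... Egyptian cat
DOG_INDICES = range(151, 269)  # Chihuahua ... Mexican hairless


def imagenet_index_to_pet(class_idx):
    """Collapse an ImageNet class index into 'cat', 'dog', or 'other'."""
    if class_idx in CAT_INDICES:
        return "cat"
    if class_idx in DOG_INDICES:
        return "dog"
    return "other"


def label_from_filename(path):
    """Assumed naming convention: 'cat_01.jpg' -> 'cat'."""
    return Path(path).stem.split("_")[0].lower()


print(imagenet_index_to_pet(281))                           # cat
print(imagenet_index_to_pet(1))                             # other
print(label_from_filename("./data/pet_images/cat_01.jpg"))  # cat
```

Comparing `label_from_filename(path)` with `imagenet_index_to_pet(class_idx)` for every image gives the true and predicted labels needed for TP, FP, and Accuracy.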
