# Transfer Learning with Pre-trained Models
Use built-in torchvision models with pre-trained weights and fine-tune them on the Food-11 dataset.

In [1]:
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

val_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

test_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])


In [2]:
from torchvision import datasets

train_path = '/kaggle/input/food11-image-dataset/training'
val_path = '/kaggle/input/food11-image-dataset/validation'
test_path = '/kaggle/input/food11-image-dataset/evaluation'

train_dataset = datasets.ImageFolder(root=train_path, transform=train_transform)
val_dataset = datasets.ImageFolder(root=val_path, transform=val_transform)
test_dataset = datasets.ImageFolder(root=test_path, transform=test_transform)


In [3]:
from torch.utils.data import DataLoader

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

data_iter = iter(train_loader)
images, labels = next(data_iter)

print("Batch image size:", images.size())  
print("Batch labels:", labels) 


Batch image size: torch.Size([32, 3, 224, 224])
Batch labels: tensor([ 4,  0,  1, 10,  6,  4,  0,  5,  0,  0,  0,  3,  2,  8,  7,  8,  0,  5,
         9, 10,  2,  2,  2,  4,  9,  9,  4,  5,  5,  5,  2,  6])


## Step 1: Select at least THREE different pre-trained models
Select at least THREE different pre-trained models, e.g. ShuffleNet, Inception V3, and MobileNet V3. Check the PyTorch documentation for more details. Justify your choice of models, considering their architectural strengths and suitability for the task.

**Models chosen**
----



**1. ShuffleNet**


**i. Efficiency:** ShuffleNet was designed for efficiency. The original ShuffleNet combines pointwise group convolutions with a channel shuffle operation that lets information flow between groups; ShuffleNet V2, the variant used here, replaces the group convolutions with a channel split followed by a channel shuffle. Both designs reduce computation while keeping accuracy relatively high.

**ii. Low latency:** Its low computational cost suits real-time applications and deployment on mobile or embedded devices, where processing time and memory are limited.

**iii. Good for image classification:** Despite its lightweight architecture, ShuffleNet performs well on image classification, including multi-class problems such as Food-11.
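The channel shuffle step can be illustrated as a reshape, transpose, and reshape over the channel axis. The sketch below is a NumPy illustration on a single (channels, height, width) feature map, not torchvision's implementation; ShuffleNet applies the same permutation to batched tensors inside its blocks.

```python
import numpy as np


def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """Interleave channels across groups for a (C, H, W) feature map."""
    channels, height, width = x.shape
    if channels % groups != 0:
        raise ValueError("channels must be divisible by groups")
    # (C, H, W) -> (groups, C // groups, H, W)
    x = x.reshape(groups, channels // groups, height, width)
    # Swap the group axis and the per-group axis so channels from different groups interleave
    x = x.transpose(1, 0, 2, 3)
    # Flatten back to (C, H, W)
    return x.reshape(channels, height, width)


# Six 1x1 channels labelled 0..5, split into 2 groups: [0, 1, 2] and [3, 4, 5]
feature_map = np.arange(6).reshape(6, 1, 1)
shuffled = channel_shuffle(feature_map, groups=2)
print(shuffled.ravel().tolist())  # [0, 3, 1, 4, 2, 5]
```

After the shuffle, each group in the next layer receives channels from every group of the previous layer, which is what keeps grouped or split branches from staying isolated.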

**Why it suits Food-11**
1. The dataset has a manageable number of categories (11 food types), so a lightweight classifier can be trained and run quickly with limited resources.
2. It suits applications intended for deployment on mobile devices, without a large sacrifice in accuracy.



---



**2. EfficientNet V2**

**i. Squeeze-and-excitation blocks:** The model uses squeeze-and-excitation (SE) blocks, which recalibrate channel-wise feature responses so that informative channels are amplified and less useful ones are suppressed. This helps the network focus on the attributes most relevant for classification, which matters for food images, where classes often differ in texture, color, or fine detail.

**ii. High accuracy:** Its architecture was found with training-aware Neural Architecture Search (NAS) combined with scaling of depth, width, and resolution, jointly optimizing accuracy, training speed, and parameter efficiency. The resulting architecture is fixed rather than adapted to each dataset, but it achieves high ImageNet accuracy with relatively few parameters.

**iii. Fine-tunes well:** The torchvision weights are pre-trained on ImageNet-1K, so the network already provides a strong, general-purpose feature extractor. This makes it well suited to transfer learning, where good accuracy can be reached with less labeled data, as when fine-tuning on a specialized dataset like Food-11.
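A squeeze-and-excitation gate can be sketched in a few lines. This NumPy version assumes a single (channels, height, width) feature map, random weights, no biases, and a ReLU in the bottleneck; EfficientNet-family blocks use SiLU there and implement the two projections as learned 1x1 convolutions, so treat this only as an illustration of the squeeze, excite, and rescale steps.

```python
import numpy as np


def sigmoid(v: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-v))


def squeeze_excite(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Apply a squeeze-and-excitation gate to a (C, H, W) feature map.

    w1 has shape (C // r, C) and w2 has shape (C, C // r), where r is the
    reduction ratio. Biases are omitted to keep the sketch short.
    """
    # Squeeze: global average pooling gives one descriptor per channel
    z = x.mean(axis=(1, 2))                # shape (C,)
    # Excitation: bottleneck projection, ReLU, expansion, sigmoid -> weights in (0, 1)
    hidden = np.maximum(w1 @ z, 0.0)       # shape (C // r,)
    scale = sigmoid(w2 @ hidden)           # shape (C,)
    # Recalibrate: multiply every channel by its own weight
    return x * scale[:, None, None]


rng = np.random.default_rng(0)
channels, reduction = 8, 4
x = rng.standard_normal((channels, 5, 5))
w1 = rng.standard_normal((channels // reduction, channels))
w2 = rng.standard_normal((channels, channels // reduction))
out = squeeze_excite(x, w1, w2)
print(out.shape)  # (8, 5, 5)
```

Because each channel is multiplied by a single factor between 0 and 1, the block can only emphasize or de-emphasize whole channels; it does not change spatial structure.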

**Why it suits Food-11**
1. Food recognition models are often used in real-time applications, such as photographing meals for calorie tracking, restaurant menu applications, and identifying food items.
2. EfficientNetV2 trains and runs quickly for its accuracy level, although the S variant used here is considerably larger than ShuffleNet V2 and MobileNet V3, so it is less suited to the most constrained devices.

---



**3. MobileNet V3**

**i. Mobile optimization:** MobileNet V3 is optimized for mobile devices, a major advantage when model size and inference time are deciding factors.

**ii. Efficiency:** MobileNet V3 uses depthwise separable convolutions to reduce parameters and computation, which makes it much lighter than conventional convolutional networks without sacrificing much accuracy.

**iii. Good trade-off between accuracy and speed:** Inference speed always has to be traded against accuracy, and MobileNet V3 is a good choice when both performance and efficiency matter.
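To make the savings of depthwise separable convolutions concrete, the arithmetic below compares a standard 3x3 convolution with its depthwise separable counterpart for 256 input and output channels. The channel count is an arbitrary example and biases are ignored; the same ratio also holds for multiply-adds per output position.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k convolution mapping c_in to c_out channels (no bias)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise k x k conv (one filter per input channel) plus a 1x1 pointwise conv."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


k, c_in, c_out = 3, 256, 256
standard = standard_conv_params(k, c_in, c_out)
separable = depthwise_separable_params(k, c_in, c_out)
print(standard, separable, round(standard / separable, 1))  # 589824 67840 8.7
```

Most of the remaining cost sits in the 1x1 pointwise convolution, which is why efficient architectures pay close attention to channel widths.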

**Why it suits Food-11**
1. Food-11 images are diverse, and MobileNet V3's ImageNet-pretrained features should transfer to them without heavy computation.
2. It is a good choice when accuracy must be balanced against computational cost, especially on mobile devices or systems with limited computational power.

## Step 2: For each chosen model

### a. Load the pre-trained model and modify the classification head
Load the pre-trained model and modify the classification head (the final fully connected layer) to match the number of classes in the Food-11 dataset.

**ShuffleNet**

In [8]:
import torch
import torch.nn as nn
from torchvision import models

shufflenet = models.shufflenet_v2_x1_0(weights=models.ShuffleNet_V2_X1_0_Weights.IMAGENET1K_V1)

food_num_shuffle = shufflenet.fc.in_features
shufflenet.fc = nn.Linear(food_num_shuffle, 11)  

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
shufflenet = shufflenet.to(device)


**EfficientNet V2**

In [6]:
import torch
import torch.nn as nn
import torchvision
from torchvision import models

efficientnet_v2 = models.efficientnet_v2_s(weights=models.EfficientNet_V2_S_Weights.IMAGENET1K_V1)

food_num_eff = efficientnet_v2.classifier[1].in_features
efficientnet_v2.classifier[1] = nn.Linear(food_num_eff, 11)  

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
efficientnet_v2 = efficientnet_v2.to(device)


Downloading: "https://download.pytorch.org/models/efficientnet_v2_s-dd5fe13b.pth" to /root/.cache/torch/hub/checkpoints/efficientnet_v2_s-dd5fe13b.pth
100%|██████████| 82.7M/82.7M [00:01<00:00, 83.8MB/s]


**MobileNet V3**

In [7]:
import torch
import torch.nn as nn
import torchvision
from torchvision import models

mobilenet_v3 = models.mobilenet_v3_large(weights=models.MobileNet_V3_Large_Weights.IMAGENET1K_V1)

food_num_mobile = mobilenet_v3.classifier[3].in_features
mobilenet_v3.classifier[3] = nn.Linear(food_num_mobile, 11)  

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
mobilenet_v3 = mobilenet_v3.to(device)


Downloading: "https://download.pytorch.org/models/mobilenet_v3_large-8738ca79.pth" to /root/.cache/torch/hub/checkpoints/mobilenet_v3_large-8738ca79.pth
100%|██████████| 21.1M/21.1M [00:00<00:00, 85.4MB/s]


### b. Fine-tune the model
Fine-tune the model. Experiment with different hyperparameter settings (learning rate, batch size, etc.) to optimize performance. Explain your tuning strategy.

**ShuffleNet**

In [13]:
import torch.optim as optim
from torch.utils.data import DataLoader
import torch.nn.functional as F
from torch import nn

optimizer = optim.Adam(shufflenet.parameters(), lr=1e-4, weight_decay=1e-4)

scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

num_epochs = 10

for epoch in range(num_epochs):
    shufflenet.train()
    running_loss = 0.0
    for i, lab in train_loader:
        i, lab = i.to(device), lab.to(device)

        optimizer.zero_grad()

        outputs = shufflenet(i)
        loss = F.cross_entropy(outputs, lab)

        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    scheduler.step()

    shufflenet.eval()
    val_loss = 0.0
    with torch.no_grad():
        for i, lab in val_loader:
            i, lab = i.to(device), lab.to(device)
            outputs = shufflenet(i)
            val_loss += F.cross_entropy(outputs, lab).item()

    val_loss /= len(val_loader)
    print(f"Epoch {epoch+1}/{num_epochs}, Training Loss: {running_loss/len(train_loader)}, Validation Loss: {val_loss}")


Epoch 1/10, Training Loss: 0.643575984779685, Validation Loss: 0.5480339040427848
Epoch 2/10, Training Loss: 0.5131644405014693, Validation Loss: 0.45035087072324975
Epoch 3/10, Training Loss: 0.4208368671075426, Validation Loss: 0.42166456880254877
Epoch 4/10, Training Loss: 0.3556286305382028, Validation Loss: 0.4001387778531622
Epoch 5/10, Training Loss: 0.30858215899432745, Validation Loss: 0.3836018298321438
Epoch 6/10, Training Loss: 0.27670327578837045, Validation Loss: 0.39514666949226346
Epoch 7/10, Training Loss: 0.2340304879140121, Validation Loss: 0.37874830495221196
Epoch 8/10, Training Loss: 0.19494248260548006, Validation Loss: 0.3709305880108365
Epoch 9/10, Training Loss: 0.18461875595200603, Validation Loss: 0.3661576414791246
Epoch 10/10, Training Loss: 0.17464023715783283, Validation Loss: 0.3631447854589809


**EfficientNet V2**

In [20]:
import torch
import torch.nn as nn
import torch.optim as optim

# Reuse train_loader / val_loader from In [3]; they already apply
# train_transform and val_transform, so no new transform is needed here.
criterion = nn.CrossEntropyLoss()
# Unlike the ShuffleNet and MobileNet runs, no weight decay or LR scheduler is used here.
optimizer = optim.Adam(efficientnet_v2.parameters(), lr=1e-4)

num_epochs = 10

for epoch in range(num_epochs):
    efficientnet_v2.train()
    running_train_loss = 0.0
    correct_train = 0
    total_train = 0
    
    for i, lab in train_loader:
        i, lab = i.to(device), lab.to(device)
        
        optimizer.zero_grad()
        outputs = efficientnet_v2(i)
        loss = criterion(outputs, lab)
        loss.backward()
        optimizer.step()
        
        running_train_loss += loss.item()
        
        _, predicted = torch.max(outputs, 1)
        total_train += lab.size(0)
        correct_train += (predicted == lab).sum().item()
    
    avg_train_loss = running_train_loss / len(train_loader)
    
    efficientnet_v2.eval()
    running_val_loss = 0.0
    correct_val = 0
    total_val = 0
    
    with torch.no_grad():
        for i, lab in val_loader:
            i, lab = i.to(device), lab.to(device)
            
            outputs = efficientnet_v2(i)
            loss = criterion(outputs, lab)
            running_val_loss += loss.item()
            
            _, predicted = torch.max(outputs, 1)
            total_val += lab.size(0)
            correct_val += (predicted == lab).sum().item()
    
    avg_val_loss = running_val_loss / len(val_loader)
    
    print(f"Epoch [{epoch+1}/{num_epochs}], Training Loss: {avg_train_loss:.4f}, Validation Loss: {avg_val_loss:.4f} ")
         
    


Epoch [1/10], Training Loss: 0.2296, Validation Loss: 0.2404 
Epoch [2/10], Training Loss: 0.1199, Validation Loss: 0.2094 
Epoch [3/10], Training Loss: 0.0762, Validation Loss: 0.2374 
Epoch [4/10], Training Loss: 0.0671, Validation Loss: 0.2447 
Epoch [5/10], Training Loss: 0.0535, Validation Loss: 0.2618 
Epoch [6/10], Training Loss: 0.0372, Validation Loss: 0.2635 
Epoch [7/10], Training Loss: 0.0456, Validation Loss: 9.7429 
Epoch [8/10], Training Loss: 0.0357, Validation Loss: 0.6203 
Epoch [9/10], Training Loss: 0.0369, Validation Loss: 0.2145 
Epoch [10/10], Training Loss: 0.0354, Validation Loss: 2.0196 


**MobileNet V3**

In [21]:
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader

optimizer = optim.Adam(mobilenet_v3.parameters(), lr=1e-4, weight_decay=1e-4)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

num_epochs = 10

mobilenet_v3 = mobilenet_v3.to(device)  

for epoch in range(num_epochs):
    mobilenet_v3.train()
    running_loss = 0.0
    for i, lab in train_loader:
        i, lab = i.to(device), lab.to(device)

        optimizer.zero_grad()

        outputs = mobilenet_v3(i)
        loss = F.cross_entropy(outputs, lab)

        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    scheduler.step()

    mobilenet_v3.eval()
    val_loss = 0.0
    with torch.no_grad():
        for i, lab in val_loader:
            i, lab = i.to(device), lab.to(device)
            outputs = mobilenet_v3(i)
            val_loss += F.cross_entropy(outputs, lab).item()

    val_loss /= len(val_loader)
    print(f"Epoch {epoch+1}/{num_epochs}, Training Loss: {(running_loss/len(train_loader)):.4f}, Validation Loss: {val_loss:.4f}")


Epoch 1/10, Training Loss: 0.8798, Validation Loss: 0.4516
Epoch 2/10, Training Loss: 0.3443, Validation Loss: 0.3555
Epoch 3/10, Training Loss: 0.2271, Validation Loss: 0.3496
Epoch 4/10, Training Loss: 0.1564, Validation Loss: 0.3903
Epoch 5/10, Training Loss: 0.1073, Validation Loss: 0.3477
Epoch 6/10, Training Loss: 0.0879, Validation Loss: 0.3612
Epoch 7/10, Training Loss: 0.0662, Validation Loss: 0.3659
Epoch 8/10, Training Loss: 0.0370, Validation Loss: 0.3276
Epoch 9/10, Training Loss: 0.0318, Validation Loss: 0.3221
Epoch 10/10, Training Loss: 0.0267, Validation Loss: 0.3264


**Fine-Tuning Strategy**

In the runs above, all layers were fine-tuned from the start with Adam (learning rate 1e-4) and batch size 32; the ShuffleNet and MobileNet runs also used weight decay 1e-4 and a StepLR scheduler (step size 7, gamma 0.1). The recommended strategy is:

1. Freeze early layers: Start by freezing the earlier layers of the pre-trained model. These layers have learned low-level features such as edges and textures, which are useful for almost any dataset, so fine-tuning initially happens only in the later layers.

2. Unfreeze layers gradually: After a few epochs of training only the later layers, unfreeze the earlier layers so the model can adapt further to Food-11-specific features.

3. Learning rate: Fine-tune with a low learning rate, since a high learning rate can overwrite the useful features the pre-trained model has already learned.

4. Optimizer: Adam or SGD can serve as the optimizer. Adam is often a good choice for transfer learning because it adapts the step size of each parameter based on its gradient history.

5. Batch size: Test different batch sizes. Common choices are 32 or 64, adjusted to the available GPU memory and observed performance: smaller batches give noisier gradient estimates, while larger batches can make training more stable.

6. Learning rate scheduler: A scheduler such as StepLR or ReduceLROnPlateau adjusts the learning rate during training, either on a fixed schedule or when the validation loss stops improving.
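Steps 1, 2, 3 and 6 can be combined roughly as follows. This is a sketch under stated assumptions: `TinyNet` and `set_trainable` are hypothetical names, with `TinyNet` standing in for a pretrained network split into a `features` backbone and a `classifier` head (the layout torchvision uses for MobileNet V3 and EfficientNet V2; ShuffleNet V2 names its head `fc` instead). The learning rates, patience, and simulated validation losses are illustrative, not tuned values.

```python
import torch
import torch.nn as nn
import torch.optim as optim


class TinyNet(nn.Module):
    """Hypothetical stand-in for a pretrained model with `features` and `classifier` parts."""

    def __init__(self, num_classes: int = 11):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.classifier = nn.Linear(8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


def set_trainable(module: nn.Module, trainable: bool) -> None:
    """Freeze (False) or unfreeze (True) every parameter in a module."""
    for param in module.parameters():
        param.requires_grad = trainable


model = TinyNet()

# Phase 1: freeze the backbone and train only the new head
set_trainable(model.features, False)
head_optimizer = optim.Adam(model.classifier.parameters(), lr=1e-3)
# ... a few epochs of the usual training loop with head_optimizer ...

# Phase 2: unfreeze the backbone and fine-tune everything, giving the
# pretrained layers a smaller learning rate than the head
set_trainable(model.features, True)
optimizer = optim.Adam(
    [
        {"params": model.features.parameters(), "lr": 1e-5},
        {"params": model.classifier.parameters(), "lr": 1e-4},
    ],
    weight_decay=1e-4,
)

# Cut every group's learning rate by 10x once validation loss has failed
# to improve for more than 2 consecutive epochs
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=2)

# Simulated validation losses (illustrative only): one improvement, then a plateau
for val_loss in [0.50, 0.40, 0.41, 0.42, 0.43]:
    scheduler.step(val_loss)

print([group["lr"] for group in optimizer.param_groups])
```

Giving the backbone a smaller learning rate than the head is one way to apply step 3 while still letting the freshly initialized head learn quickly.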

### c. Evaluate the performance of each fine-tuned model
Evaluate the performance of each fine-tuned model on the Food-11 dataset.

**ShuffleNet**

In [22]:
import torch
import torch.nn.functional as F

# A single pass is enough: the model is fixed in eval mode, so repeated
# passes over the test set would print identical results.
shufflenet.eval()

test_loss = 0.0
correct = 0
total = 0

with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)

        outputs = shufflenet(images)
        test_loss += F.cross_entropy(outputs, labels).item()

        predicted = outputs.argmax(dim=1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

avg_test_loss = test_loss / len(test_loader)
test_acc = 100 * correct / total
print(f"Test Loss: {avg_test_loss:.4f}, Test Accuracy: {test_acc:.2f}%")


Test Loss: 0.3017, Test Accuracy: 89.84%


**EfficientNet V2**

In [23]:
import torch
import torch.nn.functional as F

# A single pass is enough: the model is fixed in eval mode, so repeated
# passes over the test set would print identical results.
efficientnet_v2.eval()

test_loss = 0.0
correct = 0
total = 0

with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)

        outputs = efficientnet_v2(images)
        test_loss += F.cross_entropy(outputs, labels).item()

        predicted = outputs.argmax(dim=1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

avg_test_loss = test_loss / len(test_loader)
test_acc = 100 * correct / total
print(f"Test Loss: {avg_test_loss:.4f}, Test Accuracy: {test_acc:.2f}%")


Test Loss: 1.0026, Test Accuracy: 95.10%


**MobileNet V3**

In [24]:
import torch
import torch.nn.functional as F

# A single pass is enough: the model is fixed in eval mode, so repeated
# passes over the test set would print identical results.
mobilenet_v3.eval()

test_loss = 0.0
correct = 0
total = 0

with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)

        outputs = mobilenet_v3(images)
        test_loss += F.cross_entropy(outputs, labels).item()

        predicted = outputs.argmax(dim=1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

avg_test_loss = test_loss / len(test_loader)
test_accuracy = 100 * correct / total
print(f"Test Loss: {avg_test_loss:.4f}, Test Accuracy: {test_accuracy:.2f}%")


Test Loss: 0.2717, Test Accuracy: 92.35%


### d. Compare the results obtained with the different pre-trained models
Discuss which model performed best and analyze the reasons for the observed differences in performance.
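A per-class breakdown can help explain differences between models, since overall accuracy hides which food categories get confused with each other. The sketch below uses plain Python with made-up toy labels; `confusion_matrix` and `per_class_accuracy` are hypothetical helper names, not library functions.

```python
def confusion_matrix(true_labels, pred_labels, num_classes):
    """matrix[t][p] counts samples whose true class is t and predicted class is p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_cls, pred_cls in zip(true_labels, pred_labels):
        matrix[true_cls][pred_cls] += 1
    return matrix


def per_class_accuracy(matrix):
    """Fraction of each class's samples predicted correctly (per-class recall)."""
    accuracies = []
    for cls, row in enumerate(matrix):
        n = sum(row)
        accuracies.append(row[cls] / n if n else float("nan"))
    return accuracies


# Toy labels and predictions for 3 classes (made up for illustration).
# With a trained model, they could be collected inside the test loop via
#   pred_labels.extend(outputs.argmax(dim=1).tolist()); true_labels.extend(lab.tolist())
true_labels = [0, 0, 1, 1, 1, 2]
pred_labels = [0, 1, 1, 1, 2, 2]

matrix = confusion_matrix(true_labels, pred_labels, num_classes=3)
print(matrix)  # [[1, 1, 0], [0, 2, 1], [0, 0, 1]]
print([round(a, 2) for a in per_class_accuracy(matrix)])  # [0.5, 0.67, 1.0]
```

Comparing these per-class numbers across the three models would show whether, for example, the lighter models lose accuracy uniformly or mainly on visually similar categories.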

**Summary of model performance**

| Model           | Best Validation Loss (epoch) | Final Validation Loss | Final Training Loss | Test Loss | Test Accuracy (%) |
|-----------------|------------------------------|-----------------------|---------------------|-----------|-------------------|
| ShuffleNet V2   | 0.3631 (10)                  | 0.3631                | 0.1746              | 0.3017    | 89.84             |
| EfficientNet V2 | 0.2094 (2)                   | 2.0196                | 0.0354              | 1.0026    | 95.10             |
| MobileNet V3    | 0.3221 (9)                   | 0.3264                | 0.0267              | 0.2717    | 92.35             |


**Best Performing Model: EfficientNetV2 with 95.10% accuracy**

**Analysis:** 

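Overall, EfficientNetV2 performed best: it reached the highest test accuracy (95.10%) and the lowest validation loss of any model (0.2094 at epoch 2), making it the strongest model for Food-11 classification in these runs. Its training was less stable, however: validation loss spiked to 9.7429 at epoch 7 and ended at 2.0196, and its test loss (1.0026) is the highest of the three, which suggests that the final weights make a small number of very confident mistakes.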
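EfficientNetV2 reached the highest test accuracy (95.10%) but also the highest test loss (1.0026). Accuracy only checks whether the top prediction is correct, while cross-entropy averages -log of the probability assigned to the true class, so a few confidently wrong predictions can dominate it. The toy numbers below are hypothetical and only illustrate the effect.

```python
import math


def mean_cross_entropy(true_class_probs):
    """Mean of -log(p), where p is the probability the model gave the true class."""
    return sum(-math.log(p) for p in true_class_probs) / len(true_class_probs)


# Hypothetical 20-sample batch. Each value is the probability assigned to the true
# class; assume 0.9 means a correct prediction and the last sample is a mistake
# (another class received most of the remaining probability). Accuracy is 95% in both.
confident_mistake = [0.9] * 19 + [1e-6]
hesitant_mistake = [0.9] * 19 + [0.3]

print(f"{mean_cross_entropy(confident_mistake):.3f}")  # 0.791
print(f"{mean_cross_entropy(hesitant_mistake):.3f}")   # 0.160
```

Both batches have identical accuracy, yet one confident mistake raises the mean loss roughly fivefold, which is why loss and accuracy should be read together.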

MobileNetV3 also achieved good accuracy (92.35%) and the lowest test loss (0.2717); with roughly a quarter of EfficientNetV2's parameters, it is also cheaper to deploy.

ShuffleNet V2 is the most efficient of the three, but it had the lowest test accuracy (89.84%) and the highest best validation loss (0.3631), making it the least effective model here.
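The test cells evaluate whatever weights each model had after its last training epoch. For EfficientNet V2, the logged validation loss was lowest at epoch 2 (0.2094) but spiked to 9.7429 at epoch 7 and ended at 2.0196 at epoch 10, so the evaluated weights were not the best ones seen during training by that measure. A common remedy is to keep a copy of the weights from the epoch with the lowest validation loss and restore them before testing. The sketch below replays the logged losses against a hypothetical `nn.Linear` stand-in whose weights are randomly perturbed each "epoch"; it is not the notebook's training loop.

```python
import copy

import torch
import torch.nn as nn

# Validation losses copied from the EfficientNet V2 training log above
logged_val_losses = [0.2404, 0.2094, 0.2374, 0.2447, 0.2618,
                     0.2635, 9.7429, 0.6203, 0.2145, 2.0196]

model = nn.Linear(4, 11)  # hypothetical stand-in for efficientnet_v2
best_val_loss = float("inf")
best_epoch = None
best_state = None

for epoch, val_loss in enumerate(logged_val_losses, start=1):
    # Stand-in for one epoch of training: nudge the weights
    with torch.no_grad():
        for param in model.parameters():
            param.add_(0.01 * torch.randn_like(param))

    if val_loss < best_val_loss:
        best_val_loss = val_loss
        best_epoch = epoch
        # deepcopy so later updates do not overwrite the saved tensors
        best_state = copy.deepcopy(model.state_dict())

# Restore the best weights before running the test-set evaluation
model.load_state_dict(best_state)
print(f"Restored epoch {best_epoch} (validation loss {best_val_loss:.4f})")
```

In a real run, `best_state` could also be written to disk with `torch.save` so the best checkpoint survives a notebook restart.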


**Reasons for model performance**

**EfficientNet V2**
1. Compound scaling balances the network's depth, width, and input resolution, which improves feature extraction. This helps the model distinguish subtle differences between food images and improves classification performance.
2. Dropout and stochastic depth regularize training, and Squeeze-and-Excitation layers reweight channels toward the most informative features. These likely contribute to the high test accuracy, although the unstable validation loss shows they did not fully prevent overfitting in this run.

3. All three models use ImageNet-1K pre-trained weights, so pre-training alone does not explain the gap. EfficientNetV2-S's larger capacity (about 21.5M parameters with its ImageNet head, versus about 5.5M for MobileNetV3-Large and 2.3M for ShuffleNetV2 x1.0) is a more likely factor in how well it adapts to Food-11.
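Compound scaling was introduced with the original EfficientNet: for a scaling exponent phi, depth grows by alpha^phi, width by beta^phi, and input resolution by gamma^phi, with alpha = 1.2, beta = 1.1, gamma = 1.15 chosen so that alpha * beta^2 * gamma^2 is about 2. EfficientNetV2 reuses the idea with modifications (for example, capping the maximum image size), so the sketch below illustrates the principle rather than EfficientNetV2-S's exact configuration.

```python
# Compound scaling bases from the original EfficientNet paper
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution


def compound_scale(phi: float) -> tuple[float, float, float, float]:
    """Return depth, width, and resolution multipliers plus the approximate FLOPs multiplier."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    # Conv FLOPs grow roughly linearly with depth and quadratically with width and resolution
    flops = depth * width ** 2 * resolution ** 2
    return depth, width, resolution, flops


for phi in range(4):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}, FLOPs x{f:.2f}")
```

Each step of phi roughly doubles compute (about 1.92x with these bases), spread across all three dimensions instead of spent on one.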

**MobileNetV3**
1. It uses depthwise separable convolutions to reduce computation, which makes training and inference cheaper and suits real-time food classification apps.
2. It achieved the lowest test loss (0.2717), which suggests its predicted probabilities generalize well to unseen test data, at a fraction of EfficientNetV2's computational cost.

**ShuffleNetV2** 
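1. It is a lightweight model and did not perform as well as the larger architectures; its smaller capacity may limit how well it captures the fine details that separate food classes.
2. Its validation loss plateaued around 0.36 while its training loss was still 0.1746 after 10 epochs, higher than that of the other two models, which suggests it fits Food-11 less closely; its test loss (0.3017) is also higher than MobileNetV3's (0.2717).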


## Step 3: References
Include details on all the resources used to complete this part.

1. https://pytorch.org/hub/pytorch_vision_shufflenet_v2/
2. https://medium.com/aimonks/shufflenet-revolutionizing-mobile-deep-learning-e15237239f47
3. https://pytorch.org/vision/main/models/efficientnetv2.html
4. https://medium.com/towards-data-science/efficientnetv2-faster-smaller-and-higher-accuracy-than-vision-transformers-98e23587bf04
5. https://medium.com/@RobuRishabh/understanding-and-implementing-mobilenetv3-422bd0bdfb5a
6. https://pytorch.org/vision/main/models/mobilenetv3.html