# Transfer Learning with Pre-trained Models [3 points]
Use built-in torchvision models with pre-trained weights and apply them to the Food-11 dataset.

## Step 1: Select at least THREE different pre-trained models
Select at least THREE different pre-trained models, e.g. ShuffleNet, Inception V3, and MobileNet V3. Check the PyTorch documentation for more details. Justify your choice of models, considering their architectural strengths and suitability for the task.

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
import torchvision.models as models
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
from tqdm import tqdm
from sklearn.metrics import accuracy_score

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

# ImageNet statistics, matching the data the pre-trained weights were trained on
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def transformation_function(model_name):
    # Inception v3 expects 299x299 inputs; ShuffleNet and MobileNet expect 224x224
    image_size = 299 if model_name == "inception" else 224
    training_data_transformation = transforms.Compose([
        transforms.Resize((image_size, image_size)),
        transforms.RandomHorizontalFlip(),   # light augmentation, training data only
        transforms.RandomRotation(10),
        transforms.ToTensor(),
        transforms.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD)
    ])
    validation_data_transformation = transforms.Compose([
        transforms.Resize((image_size, image_size)),
        transforms.ToTensor(),
        transforms.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD)
    ])
    return training_data_transformation, validation_data_transformation

Using device: cuda


### ShuffleNet
Efficiency and Speed:
ShuffleNet is known for its lightweight architecture and fast inference, which makes it especially useful when computational resources are limited or when the model needs to run on mobile or embedded devices.

Channel Shuffle Mechanism:
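ShuffleNet splits its channels into groups (grouped convolutions in V1, a channel split into two branches in V2), which on their own would keep information inside each group. The channel shuffle operation permutes channels across groups so that information flows between them at almost no extra computational cost, which helps preserve accuracy while keeping the network fast.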
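
The shuffle itself is just a reshape, transpose, and reshape. The sketch below is a minimal standalone version (the function name `channel_shuffle` is our own; torchvision's ShuffleNet V2 code uses the same idea internally), showing how two groups of three channels get interleaved.

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Permute channels so that each output group mixes channels from every input group.

    x has shape (batch, channels, height, width); channels must be divisible by groups.
    """
    batch, channels, height, width = x.shape
    if channels % groups != 0:
        raise ValueError("channels must be divisible by groups")
    # (B, C, H, W) -> (B, groups, C // groups, H, W)
    x = x.view(batch, groups, channels // groups, height, width)
    # Swap the group axis with the within-group axis, then flatten back to (B, C, H, W)
    x = x.transpose(1, 2).contiguous()
    return x.view(batch, channels, height, width)

# Six channels labelled 0..5 in two groups: [0, 1, 2] and [3, 4, 5]
x = torch.arange(6, dtype=torch.float32).view(1, 6, 1, 1)
print(channel_shuffle(x, groups=2).flatten().tolist())  # [0.0, 3.0, 1.0, 4.0, 2.0, 5.0]
```

After the shuffle, each half of the channels contains members of both original groups, so the next grouped operation sees information from both.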

### Inception
Multi-Scale Feature Extraction:
Inception (specifically Inception v3) employs multiple parallel convolutional filters of various sizes. This design enables the model to capture details at different scales, which is essential for accurately distinguishing among diverse food items that vary in texture and detail.
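
To make the multi-branch idea concrete, here is a simplified Inception-style block. It is a hypothetical `ToyInceptionBlock`, not torchvision's actual module: parallel 1x1, 3x3, "5x5" (factorized into two stacked 3x3 convolutions, as Inception v3 does) and pooling branches run on the same input and are concatenated along the channel axis. BatchNorm and activations are omitted for brevity.

```python
import torch
import torch.nn as nn

class ToyInceptionBlock(nn.Module):
    """Parallel branches with different receptive fields, concatenated on the channel axis."""

    def __init__(self, in_channels, branch_channels=16):
        super().__init__()
        self.branch1x1 = nn.Conv2d(in_channels, branch_channels, kernel_size=1)
        self.branch3x3 = nn.Sequential(
            nn.Conv2d(in_channels, branch_channels, kernel_size=1),  # cheap 1x1 bottleneck
            nn.Conv2d(branch_channels, branch_channels, kernel_size=3, padding=1),
        )
        # Two stacked 3x3 convs cover the same 5x5 receptive field with fewer weights
        self.branch5x5 = nn.Sequential(
            nn.Conv2d(in_channels, branch_channels, kernel_size=1),
            nn.Conv2d(branch_channels, branch_channels, kernel_size=3, padding=1),
            nn.Conv2d(branch_channels, branch_channels, kernel_size=3, padding=1),
        )
        self.branch_pool = nn.Sequential(
            nn.AvgPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_channels, branch_channels, kernel_size=1),
        )

    def forward(self, x):
        branches = [self.branch1x1(x), self.branch3x3(x), self.branch5x5(x), self.branch_pool(x)]
        return torch.cat(branches, dim=1)

block = ToyInceptionBlock(in_channels=8, branch_channels=16)
out = block(torch.randn(2, 8, 32, 32))
print(out.shape)  # torch.Size([2, 64, 32, 32]): 4 branches x 16 channels, spatial size kept
```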

Auxiliary Classifiers and Higher Resolution:
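Inception v3 attaches an auxiliary classifier to an intermediate layer. It was originally motivated as a way to push extra gradient into the lower layers, and the Inception v3 authors report that it acts mainly as a regularizer. Note that the training loop in this notebook computes the loss on the main logits only, so the auxiliary head does not contribute during fine-tuning here. In addition, Inception's 299×299 input size (versus 224×224 for the other two models) retains more image detail, which matters when differences between classes are subtle.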
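
If the auxiliary classifier should actually take part in fine-tuning, its loss has to be added explicitly. The sketch below is one way to do it; the helper name `inception_training_loss` is hypothetical, and the weight 0.4 is a common convention (used, for example, in PyTorch's fine-tuning tutorial), not a value tuned on Food-11.

```python
from collections import namedtuple

import torch
import torch.nn as nn

def inception_training_loss(outputs, labels, criterion, aux_weight=0.4):
    """Loss for torchvision's inception_v3 that also trains the auxiliary head.

    In train mode with aux_logits=True the model returns a namedtuple with
    `.logits` and `.aux_logits`; in eval mode it returns a plain logits tensor.
    """
    aux_logits = getattr(outputs, "aux_logits", None)
    if aux_logits is not None:
        main_loss = criterion(outputs.logits, labels)
        aux_loss = criterion(aux_logits, labels)
        return main_loss + aux_weight * aux_loss
    # Eval mode (or aux head disabled): only the main logits are available
    logits = outputs.logits if hasattr(outputs, "logits") else outputs
    return criterion(logits, labels)

# Self-check with random tensors shaped like a batch of 4 images and 11 classes
FakeOutputs = namedtuple("FakeOutputs", ["logits", "aux_logits"])
criterion = nn.CrossEntropyLoss()
labels = torch.tensor([0, 3, 7, 10])
main, aux = torch.randn(4, 11), torch.randn(4, 11)
loss = inception_training_loss(FakeOutputs(main, aux), labels, criterion)
```

This would replace the `outputs.logits` branch and the `loss_function(outputs, labels)` call in the training part of `train_the_model_function`; validation can keep using the main logits. It also requires the auxiliary head to output 11 classes, e.g. `model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, number_classes)`.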

### MobileNet
Optimized for Mobile and Embedded Systems:
MobileNet V3 is designed with efficiency in mind. Its architecture, which relies on depthwise separable convolutions, significantly reduces both the number of parameters and the computational overhead, making it ideal for applications on resource-constrained devices.
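
The parameter saving from depthwise separable convolutions is easy to check. This sketch (helper names `depthwise_separable` and `num_params` are our own) compares a standard 3x3 convolution from 32 to 64 channels with its depthwise separable factorization. MobileNet V3 wraps this core idea in inverted residual blocks with squeeze-and-excitation and h-swish, which are not shown here.

```python
import torch
import torch.nn as nn

def depthwise_separable(in_channels, out_channels, kernel_size=3):
    """Depthwise convolution (one filter per input channel) followed by a 1x1 pointwise convolution."""
    return nn.Sequential(
        nn.Conv2d(in_channels, in_channels, kernel_size, padding=kernel_size // 2,
                  groups=in_channels, bias=False),                  # spatial filtering, per channel
        nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),  # mixes channels
    )

def num_params(module):
    return sum(p.numel() for p in module.parameters())

standard = nn.Conv2d(32, 64, kernel_size=3, padding=1, bias=False)
separable = depthwise_separable(32, 64)
x = torch.randn(1, 32, 56, 56)
print(num_params(standard), num_params(separable))  # 18432 2336
print(standard(x).shape == separable(x).shape)       # True
```

The standard layer needs 3·3·32·64 = 18,432 weights, while the factorized version needs 3·3·32 + 32·64 = 2,336, roughly 8 times fewer, for the same output shape.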

Balance of Speed and Accuracy:
Despite its streamlined design, MobileNet V3 reaches competitive accuracy on ImageNet, so it should transfer well to Food-11 while still offering fast and responsive inference.

### Suitability for the Food-11 Task
Handling Diverse Visual Features:
Food-11 includes a variety of food items where both fine details and overall shapes are important. Inception’s ability to extract multi-scale features is especially advantageous here, while the efficiency of ShuffleNet and MobileNet allows for effective deployment in settings with limited resources.

Practical Deployment Considerations:
The combination of these models provides flexibility. In scenarios where high accuracy is paramount and computational resources are ample, Inception is a strong candidate. Conversely, for applications requiring rapid inference on mobile devices, ShuffleNet and MobileNet are excellent choices.

## Step 2: For each chosen model

### a. Load the pre-trained model and modify the classification head
Load the pre-trained model and modify the classification head (the final fully connected layer) to match the number of classes in the Food-11 dataset.

In [None]:
number_classes = 11
training_directory = "data/training"
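validation_directory = "data/validation"

def loading_the_model_function(model_name):
    # `weights=` replaces the deprecated `pretrained=True` flag (torchvision >= 0.13);
    # IMAGENET1K_V1 is the checkpoint that `pretrained=True` loaded.
    if model_name == "shufflenet":
        model = models.shufflenet_v2_x1_0(weights=models.ShuffleNet_V2_X1_0_Weights.IMAGENET1K_V1)
        model.fc = nn.Linear(model.fc.in_features, number_classes)
    elif model_name == "inception":
        model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1, aux_logits=True)
        model.fc = nn.Linear(model.fc.in_features, number_classes)
        # Resize the auxiliary head too, so its outputs also match the 11 Food-11 classes
        model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, number_classes)
    elif model_name == "mobilenet":
        model = models.mobilenet_v3_large(weights=models.MobileNet_V3_Large_Weights.IMAGENET1K_V1)
        # The last element of MobileNet V3's classifier Sequential is the output Linear layer
        model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, number_classes)
    else:
        raise ValueError(f"Model not supported: {model_name}")
    return model.to(device)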

### b. Fine-tune the model
Fine-tune the model. Experiment with different hyperparameter settings (learning rate, batch size, etc.) to optimize performance. Explain your tuning strategy.

In [None]:
def train_the_model_function(model, train_loader, val_loader, learning_rate, num_epochs, model_exp_name):
    loss_function = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    training_losses, validation_losses = [], []
    best_val_loss = float('inf')
    best_model_state = None
    best_val_acc = 0.0

    for epoch in range(num_epochs):
        model.train()
        running_train_loss = 0.0

        for images, labels in tqdm(train_loader, desc=f"Training {model_exp_name} Epoch {epoch+1}/{num_epochs}"):
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(images)
            if hasattr(outputs, 'logits'):
                outputs = outputs.logits
            loss = loss_function(outputs, labels)
            loss.backward()
            optimizer.step()
            running_train_loss += loss.item() * images.size(0)

        epoch_train_loss = running_train_loss / len(train_loader.dataset)
        training_losses.append(epoch_train_loss)
        model.eval()
        running_val_loss = 0.0
        correct = 0
        total = 0

        with torch.no_grad():

            for images, labels in val_loader:
                images, labels = images.to(device), labels.to(device)
                outputs = model(images)
                if hasattr(outputs, 'logits'):
                    outputs = outputs.logits
                loss = loss_function(outputs, labels)
                running_val_loss += loss.item() * images.size(0)
                _, predicted = torch.max(outputs, 1)
                correct += (predicted == labels).sum().item()
                total += labels.size(0)

        epoch_val_loss = running_val_loss / len(val_loader.dataset)
        validation_losses.append(epoch_val_loss)
        val_acc = correct / total
        print(f"{model_exp_name} Epoch {epoch+1}: Train Loss={epoch_train_loss:.4f}, Val Loss={epoch_val_loss:.4f}, Val Acc={val_acc:.4f}")
        if epoch_val_loss < best_val_loss:
            best_val_loss = epoch_val_loss
            best_val_acc = val_acc
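            # state_dict() returns references to the live parameters, so copy them;
            # otherwise later epochs silently overwrite the "best" snapshot
            best_model_state = {k: v.detach().clone() for k, v in model.state_dict().items()}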
            
    model.load_state_dict(best_model_state)
    return model, training_losses, validation_losses, best_val_loss, best_val_acc


In [None]:
hyperparameters_configurations = [
    {"learning_rate": 0.0005, "batch_size": 32, "num_epochs": 10},
    {"learning_rate": 0.001,  "batch_size": 32, "num_epochs": 10},
    {"learning_rate": 0.0005, "batch_size": 64, "num_epochs": 10}
]
models_to_train = ["inception", "mobilenet"]
results = {}

for model_name in models_to_train:
    results[model_name] = {}
    training_data_transformation, validation_data_transformation = transformation_function(model_name)

    for a, config in enumerate(hyperparameters_configurations, start=1):
        experiment_name = f"{model_name}_exp{a}"
        print(f"\n=== Running Experiment: {experiment_name} ===")
        training_dataset = ImageFolder(root=training_directory, transform=training_data_transformation)
        validation_dataset   = ImageFolder(root=validation_directory, transform=validation_data_transformation)
        train_loader = DataLoader(training_dataset, batch_size=config["batch_size"], shuffle=True)
        val_loader   = DataLoader(validation_dataset, batch_size=config["batch_size"], shuffle=False)
        model = loading_the_model_function(model_name)
        trained_model, training_losses, validation_losses, best_val_loss, best_val_acc = train_the_model_function(
            model, train_loader, val_loader,
            learning_rate=config["learning_rate"],
            num_epochs=config["num_epochs"],
            model_exp_name=experiment_name
        )
        torch.save(trained_model.state_dict(), f"{experiment_name}_best.pth")
        results[model_name][experiment_name] = {
            "config": config,
            "training_losses": training_losses,
            "validation_losses": validation_losses,
            "best_val_loss": best_val_loss,
            "best_val_acc": best_val_acc
        }

print("\n=== Summary of Experiments ===")

for model_name in results:
    
    for experiment_name, res in results[model_name].items():
        print(f"{experiment_name}: Config: {res['config']} | Best Val Loss: {res['best_val_loss']:.4f} | Best Val Acc: {res['best_val_acc']:.4f}")


=== Running Experiment: inception_exp1 ===


Training inception_exp1 Epoch 1/10: 100%|██████████| 309/309 [02:28<00:00,  2.08it/s]


inception_exp1 Epoch 1: Train Loss=0.8551, Val Loss=0.8925, Val Acc=0.7236


Training inception_exp1 Epoch 2/10: 100%|██████████| 309/309 [01:29<00:00,  3.45it/s]


inception_exp1 Epoch 2: Train Loss=0.5599, Val Loss=0.5518, Val Acc=0.8213


Training inception_exp1 Epoch 3/10: 100%|██████████| 309/309 [01:29<00:00,  3.44it/s]


inception_exp1 Epoch 3: Train Loss=0.4603, Val Loss=0.5029, Val Acc=0.8455


Training inception_exp1 Epoch 4/10: 100%|██████████| 309/309 [01:29<00:00,  3.45it/s]


inception_exp1 Epoch 4: Train Loss=0.3794, Val Loss=0.4944, Val Acc=0.8504


Training inception_exp1 Epoch 5/10: 100%|██████████| 309/309 [01:29<00:00,  3.43it/s]


inception_exp1 Epoch 5: Train Loss=0.3118, Val Loss=0.4784, Val Acc=0.8516


Training inception_exp1 Epoch 6/10: 100%|██████████| 309/309 [01:29<00:00,  3.45it/s]


inception_exp1 Epoch 6: Train Loss=0.2981, Val Loss=0.5771, Val Acc=0.8309


Training inception_exp1 Epoch 7/10: 100%|██████████| 309/309 [01:29<00:00,  3.44it/s]


inception_exp1 Epoch 7: Train Loss=0.2582, Val Loss=0.5962, Val Acc=0.8210


Training inception_exp1 Epoch 8/10: 100%|██████████| 309/309 [01:29<00:00,  3.44it/s]


inception_exp1 Epoch 8: Train Loss=0.2280, Val Loss=0.4029, Val Acc=0.8770


Training inception_exp1 Epoch 9/10: 100%|██████████| 309/309 [01:30<00:00,  3.43it/s]


inception_exp1 Epoch 9: Train Loss=0.2161, Val Loss=0.5993, Val Acc=0.8309


Training inception_exp1 Epoch 10/10: 100%|██████████| 309/309 [01:29<00:00,  3.44it/s]


inception_exp1 Epoch 10: Train Loss=0.1813, Val Loss=0.5301, Val Acc=0.8542

=== Running Experiment: inception_exp2 ===


Training inception_exp2 Epoch 1/10: 100%|██████████| 309/309 [01:31<00:00,  3.36it/s]


inception_exp2 Epoch 1: Train Loss=1.1615, Val Loss=1.0543, Val Acc=0.6641


Training inception_exp2 Epoch 2/10: 100%|██████████| 309/309 [01:31<00:00,  3.39it/s]


inception_exp2 Epoch 2: Train Loss=0.8281, Val Loss=1.2044, Val Acc=0.6408


Training inception_exp2 Epoch 3/10: 100%|██████████| 309/309 [01:30<00:00,  3.40it/s]


inception_exp2 Epoch 3: Train Loss=0.6745, Val Loss=0.9630, Val Acc=0.7143


Training inception_exp2 Epoch 4/10: 100%|██████████| 309/309 [01:31<00:00,  3.39it/s]


inception_exp2 Epoch 4: Train Loss=0.5905, Val Loss=0.7614, Val Acc=0.7668


Training inception_exp2 Epoch 5/10: 100%|██████████| 309/309 [01:30<00:00,  3.40it/s]


inception_exp2 Epoch 5: Train Loss=0.5262, Val Loss=0.7996, Val Acc=0.7583


Training inception_exp2 Epoch 6/10: 100%|██████████| 309/309 [01:31<00:00,  3.39it/s]


inception_exp2 Epoch 6: Train Loss=0.4659, Val Loss=0.6333, Val Acc=0.8029


Training inception_exp2 Epoch 7/10: 100%|██████████| 309/309 [01:32<00:00,  3.35it/s]


inception_exp2 Epoch 7: Train Loss=0.4225, Val Loss=0.6154, Val Acc=0.8122


Training inception_exp2 Epoch 8/10: 100%|██████████| 309/309 [01:31<00:00,  3.39it/s]


inception_exp2 Epoch 8: Train Loss=0.3790, Val Loss=0.6886, Val Acc=0.7991


Training inception_exp2 Epoch 9/10: 100%|██████████| 309/309 [01:31<00:00,  3.37it/s]


inception_exp2 Epoch 9: Train Loss=0.3470, Val Loss=0.5587, Val Acc=0.8338


Training inception_exp2 Epoch 10/10: 100%|██████████| 309/309 [01:31<00:00,  3.39it/s]


inception_exp2 Epoch 10: Train Loss=0.3151, Val Loss=0.5034, Val Acc=0.8452

=== Running Experiment: inception_exp3 ===


Training inception_exp3 Epoch 1/10: 100%|██████████| 155/155 [01:28<00:00,  1.76it/s]


inception_exp3 Epoch 1: Train Loss=0.7334, Val Loss=0.7489, Val Acc=0.7545


Training inception_exp3 Epoch 2/10: 100%|██████████| 155/155 [01:27<00:00,  1.77it/s]


inception_exp3 Epoch 2: Train Loss=0.4194, Val Loss=0.6259, Val Acc=0.8052


Training inception_exp3 Epoch 3/10: 100%|██████████| 155/155 [01:27<00:00,  1.77it/s]


inception_exp3 Epoch 3: Train Loss=0.3125, Val Loss=0.7090, Val Acc=0.7904


Training inception_exp3 Epoch 4/10: 100%|██████████| 155/155 [01:29<00:00,  1.73it/s]


inception_exp3 Epoch 4: Train Loss=0.2897, Val Loss=0.5879, Val Acc=0.8120


Training inception_exp3 Epoch 5/10: 100%|██████████| 155/155 [01:27<00:00,  1.76it/s]


inception_exp3 Epoch 5: Train Loss=0.2394, Val Loss=0.4608, Val Acc=0.8665


Training inception_exp3 Epoch 6/10: 100%|██████████| 155/155 [01:27<00:00,  1.77it/s]


inception_exp3 Epoch 6: Train Loss=0.2112, Val Loss=0.4387, Val Acc=0.8638


Training inception_exp3 Epoch 7/10: 100%|██████████| 155/155 [01:28<00:00,  1.75it/s]


inception_exp3 Epoch 7: Train Loss=0.1788, Val Loss=0.4587, Val Acc=0.8691


Training inception_exp3 Epoch 8/10: 100%|██████████| 155/155 [01:28<00:00,  1.76it/s]


inception_exp3 Epoch 8: Train Loss=0.1667, Val Loss=0.6109, Val Acc=0.8341


Training inception_exp3 Epoch 9/10: 100%|██████████| 155/155 [01:28<00:00,  1.75it/s]


inception_exp3 Epoch 9: Train Loss=0.1095, Val Loss=0.6372, Val Acc=0.8408


Training inception_exp3 Epoch 10/10: 100%|██████████| 155/155 [01:27<00:00,  1.77it/s]


inception_exp3 Epoch 10: Train Loss=0.1613, Val Loss=0.4851, Val Acc=0.8717

=== Running Experiment: mobilenet_exp1 ===


Training mobilenet_exp1 Epoch 1/10: 100%|██████████| 309/309 [01:15<00:00,  4.11it/s]


mobilenet_exp1 Epoch 1: Train Loss=0.6697, Val Loss=0.4378, Val Acc=0.8598


Training mobilenet_exp1 Epoch 2/10: 100%|██████████| 309/309 [01:14<00:00,  4.16it/s]


mobilenet_exp1 Epoch 2: Train Loss=0.3400, Val Loss=0.4188, Val Acc=0.8630


Training mobilenet_exp1 Epoch 3/10: 100%|██████████| 309/309 [01:14<00:00,  4.14it/s]


mobilenet_exp1 Epoch 3: Train Loss=0.2377, Val Loss=0.5434, Val Acc=0.8440


Training mobilenet_exp1 Epoch 4/10: 100%|██████████| 309/309 [01:13<00:00,  4.20it/s]


mobilenet_exp1 Epoch 4: Train Loss=0.1997, Val Loss=0.6146, Val Acc=0.8306


Training mobilenet_exp1 Epoch 5/10: 100%|██████████| 309/309 [01:14<00:00,  4.14it/s]


mobilenet_exp1 Epoch 5: Train Loss=0.1627, Val Loss=0.4228, Val Acc=0.8755


Training mobilenet_exp1 Epoch 6/10: 100%|██████████| 309/309 [01:13<00:00,  4.20it/s]


mobilenet_exp1 Epoch 6: Train Loss=0.1469, Val Loss=0.4620, Val Acc=0.8723


Training mobilenet_exp1 Epoch 7/10: 100%|██████████| 309/309 [01:14<00:00,  4.14it/s]


mobilenet_exp1 Epoch 7: Train Loss=0.1270, Val Loss=0.5601, Val Acc=0.8484


Training mobilenet_exp1 Epoch 8/10: 100%|██████████| 309/309 [01:13<00:00,  4.18it/s]


mobilenet_exp1 Epoch 8: Train Loss=0.1148, Val Loss=0.5659, Val Acc=0.8603


Training mobilenet_exp1 Epoch 9/10: 100%|██████████| 309/309 [01:13<00:00,  4.20it/s]


mobilenet_exp1 Epoch 9: Train Loss=0.1116, Val Loss=0.4442, Val Acc=0.8875


Training mobilenet_exp1 Epoch 10/10: 100%|██████████| 309/309 [01:14<00:00,  4.15it/s]


mobilenet_exp1 Epoch 10: Train Loss=0.0930, Val Loss=0.5917, Val Acc=0.8571

=== Running Experiment: mobilenet_exp2 ===


Training mobilenet_exp2 Epoch 1/10: 100%|██████████| 309/309 [01:13<00:00,  4.23it/s]


mobilenet_exp2 Epoch 1: Train Loss=0.7867, Val Loss=0.9222, Val Acc=0.7423


Training mobilenet_exp2 Epoch 2/10: 100%|██████████| 309/309 [01:14<00:00,  4.14it/s]


mobilenet_exp2 Epoch 2: Train Loss=0.4659, Val Loss=1.7529, Val Acc=0.5904


Training mobilenet_exp2 Epoch 3/10: 100%|██████████| 309/309 [01:17<00:00,  3.99it/s]


mobilenet_exp2 Epoch 3: Train Loss=0.3808, Val Loss=0.5728, Val Acc=0.8329


Training mobilenet_exp2 Epoch 4/10: 100%|██████████| 309/309 [01:14<00:00,  4.17it/s]


mobilenet_exp2 Epoch 4: Train Loss=0.3494, Val Loss=0.5780, Val Acc=0.8376


Training mobilenet_exp2 Epoch 5/10: 100%|██████████| 309/309 [01:15<00:00,  4.09it/s]


mobilenet_exp2 Epoch 5: Train Loss=0.2787, Val Loss=0.6113, Val Acc=0.8210


Training mobilenet_exp2 Epoch 6/10: 100%|██████████| 309/309 [01:14<00:00,  4.15it/s]


mobilenet_exp2 Epoch 6: Train Loss=0.2608, Val Loss=0.8426, Val Acc=0.7886


Training mobilenet_exp2 Epoch 7/10: 100%|██████████| 309/309 [01:14<00:00,  4.13it/s]


mobilenet_exp2 Epoch 7: Train Loss=0.2368, Val Loss=0.5821, Val Acc=0.8490


Training mobilenet_exp2 Epoch 8/10: 100%|██████████| 309/309 [01:15<00:00,  4.12it/s]


mobilenet_exp2 Epoch 8: Train Loss=0.2241, Val Loss=0.6655, Val Acc=0.8277


Training mobilenet_exp2 Epoch 9/10: 100%|██████████| 309/309 [01:13<00:00,  4.19it/s]


mobilenet_exp2 Epoch 9: Train Loss=0.1912, Val Loss=0.7307, Val Acc=0.8163


Training mobilenet_exp2 Epoch 10/10: 100%|██████████| 309/309 [01:14<00:00,  4.15it/s]


mobilenet_exp2 Epoch 10: Train Loss=0.1778, Val Loss=0.6153, Val Acc=0.8431

=== Running Experiment: mobilenet_exp3 ===


Training mobilenet_exp3 Epoch 1/10: 100%|██████████| 155/155 [01:08<00:00,  2.26it/s]


mobilenet_exp3 Epoch 1: Train Loss=0.6593, Val Loss=0.5335, Val Acc=0.8388


Training mobilenet_exp3 Epoch 2/10: 100%|██████████| 155/155 [01:08<00:00,  2.27it/s]


mobilenet_exp3 Epoch 2: Train Loss=0.2653, Val Loss=0.4475, Val Acc=0.8676


Training mobilenet_exp3 Epoch 3/10: 100%|██████████| 155/155 [01:08<00:00,  2.27it/s]


mobilenet_exp3 Epoch 3: Train Loss=0.1786, Val Loss=0.5296, Val Acc=0.8580


Training mobilenet_exp3 Epoch 4/10: 100%|██████████| 155/155 [01:08<00:00,  2.26it/s]


mobilenet_exp3 Epoch 4: Train Loss=0.1435, Val Loss=0.4464, Val Acc=0.8799


Training mobilenet_exp3 Epoch 5/10: 100%|██████████| 155/155 [01:09<00:00,  2.24it/s]


mobilenet_exp3 Epoch 5: Train Loss=0.1051, Val Loss=0.5860, Val Acc=0.8647


Training mobilenet_exp3 Epoch 6/10: 100%|██████████| 155/155 [01:07<00:00,  2.28it/s]


mobilenet_exp3 Epoch 6: Train Loss=0.0935, Val Loss=0.5347, Val Acc=0.8653


Training mobilenet_exp3 Epoch 7/10: 100%|██████████| 155/155 [01:08<00:00,  2.28it/s]


mobilenet_exp3 Epoch 7: Train Loss=0.0862, Val Loss=0.5808, Val Acc=0.8633


Training mobilenet_exp3 Epoch 8/10: 100%|██████████| 155/155 [01:08<00:00,  2.26it/s]


mobilenet_exp3 Epoch 8: Train Loss=0.0586, Val Loss=0.5473, Val Acc=0.8688


Training mobilenet_exp3 Epoch 9/10: 100%|██████████| 155/155 [01:07<00:00,  2.29it/s]


mobilenet_exp3 Epoch 9: Train Loss=0.0481, Val Loss=0.5398, Val Acc=0.8773


Training mobilenet_exp3 Epoch 10/10: 100%|██████████| 155/155 [01:08<00:00,  2.27it/s]


mobilenet_exp3 Epoch 10: Train Loss=0.0851, Val Loss=0.5209, Val Acc=0.8799

=== Summary of Experiments ===
inception_exp1: Config: {'learning_rate': 0.0005, 'batch_size': 32, 'num_epochs': 10} | Best Val Loss: 0.4029 | Best Val Acc: 0.8770
inception_exp2: Config: {'learning_rate': 0.001, 'batch_size': 32, 'num_epochs': 10} | Best Val Loss: 0.5034 | Best Val Acc: 0.8452
inception_exp3: Config: {'learning_rate': 0.0005, 'batch_size': 64, 'num_epochs': 10} | Best Val Loss: 0.4387 | Best Val Acc: 0.8638
mobilenet_exp1: Config: {'learning_rate': 0.0005, 'batch_size': 32, 'num_epochs': 10} | Best Val Loss: 0.4188 | Best Val Acc: 0.8630
mobilenet_exp2: Config: {'learning_rate': 0.001, 'batch_size': 32, 'num_epochs': 10} | Best Val Loss: 0.5728 | Best Val Acc: 0.8329
mobilenet_exp3: Config: {'learning_rate': 0.0005, 'batch_size': 64, 'num_epochs': 10} | Best Val Loss: 0.4464 | Best Val Acc: 0.8799


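Our strategy is a small one-factor-at-a-time search around a baseline of learning rate 0.0005 and batch size 32: experiment 2 changes only the learning rate (to 0.001), and experiment 3 changes only the batch size (to 64). The same three configurations are used for every model so that the results can be compared across architectures.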

Learning Rate:
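We compare 0.0005 and 0.001. A lower learning rate (0.0005) tends to give slower but more stable convergence, whereas a higher learning rate (0.001) can speed up training but risks overshooting good solutions. Because Adam updates every layer here, including the pre-trained backbone, too high a rate can also disrupt the pre-trained features; the MobileNet run at 0.001, whose validation loss jumped to 1.7529 in epoch 2, is consistent with this.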
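
The overshooting effect can be seen on the simplest possible loss. This toy sketch uses plain gradient descent (not Adam, and not a neural network) on the quadratic f(w) = 0.5·c·w² with its minimum at w = 0; the function name and the curvature value are illustrative assumptions. Each step multiplies w by (1 − lr·c), so iterates overshoot the minimum once lr·c > 1 and diverge once lr·c > 2.

```python
def gradient_descent_1d(lr, steps=20, w0=1.0, curvature=2.0):
    """Plain gradient descent on f(w) = 0.5 * curvature * w**2; returns the final w."""
    w = w0
    for _ in range(steps):
        grad = curvature * w   # derivative of 0.5 * c * w**2
        w = w - lr * grad      # equivalent to w *= (1 - lr * curvature)
    return w

for lr in (0.1, 0.9, 1.1):
    print(lr, gradient_descent_1d(lr))
```

With `lr=0.1` the iterate shrinks smoothly towards 0, with `lr=0.9` it flips sign every step (overshooting) but still converges, and with `lr=1.1` it grows without bound.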

Batch Size:
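We test batch sizes of 32 and 64. A smaller batch size (32) gives noisier gradient estimates, which can help the model escape poor local minima; a larger batch size (64) gives smoother updates but needs more GPU memory. Because the number of epochs is fixed, batch size 64 also halves the number of optimizer updates (155 instead of 309 batches per epoch in the logs above), so experiment 3 changes the update count as well as the gradient noise.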
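
The update-count difference is simple arithmetic. The sketch below assumes a training set of 9,866 images (an assumption: any size from 9,857 to 9,888 produces the 309 and 155 batches per epoch shown in the logs) and uses DataLoader's default of keeping the last partial batch.

```python
import math

def optimizer_steps(num_images, batch_size, num_epochs):
    """Total optimizer updates for a run; DataLoader keeps the last partial batch by default (drop_last=False)."""
    steps_per_epoch = math.ceil(num_images / batch_size)
    return steps_per_epoch * num_epochs

NUM_TRAIN_IMAGES = 9866  # assumption: consistent with 309 / 155 batches per epoch in the logs
for batch_size in (32, 64):
    print(batch_size, optimizer_steps(NUM_TRAIN_IMAGES, batch_size, num_epochs=10))
# 32 3090
# 64 1550
```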

Epochs:
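Fixing the number of epochs at 10 for each configuration keeps the training budget comparable, so the main differences in performance come from the learning rate and batch size. It does not isolate them perfectly: each run starts from a freshly initialized classification head, data shuffling and augmentation are random, and no random seed is set, so differences of about one percentage point between runs may partly be noise.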
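
Seeding each experiment would remove much of that run-to-run noise. This is a minimal sketch with hypothetical helpers `set_seed` and `seeded_generator`; even with these settings, some GPU kernels remain nondeterministic unless `torch.use_deterministic_algorithms(True)` is also enabled.

```python
import random

import numpy as np
import torch

def set_seed(seed):
    """Seed Python, NumPy and PyTorch so that repeated runs start from the same random state."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)            # CPU generator (recent PyTorch also seeds CUDA here)
    torch.cuda.manual_seed_all(seed)   # all GPUs; silently ignored when CUDA is unavailable
    # Trade some cuDNN speed for repeatable convolution algorithm choices
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

def seeded_generator(seed):
    """Generator to pass as DataLoader(..., shuffle=True, generator=...) for a fixed shuffle order."""
    generator = torch.Generator()
    generator.manual_seed(seed)
    return generator
```

In the experiment loop, `set_seed(0)` would be called before `loading_the_model_function(model_name)`, and `generator=seeded_generator(0)` passed to the training `DataLoader`.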

### c. Evaluate the performance of each fine-tuned model
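Evaluate the performance of each fine-tuned model on the Food-11 dataset. The cell below runs the same three configurations for ShuffleNet (trained separately from Inception and MobileNet). For every model, the per-epoch validation loss and accuracy serve as the evaluation, and each summary reports the accuracy at the epoch with the lowest validation loss.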

In [None]:
models_to_train = ["shufflenet"]
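# Keep the Inception and MobileNet results from the previous cell instead of overwriting them
results = globals().get("results", {})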

for model_name in models_to_train:
    results[model_name] = {}
    training_data_transformation, validation_data_transformation = transformation_function(model_name)

    for a, config in enumerate(hyperparameters_configurations, start=1):
        experiment_name = f"{model_name}_exp{a}"
        print(f"\n=== Running Experiment: {experiment_name} ===")
        training_dataset = ImageFolder(root=training_directory, transform=training_data_transformation)
        validation_dataset   = ImageFolder(root=validation_directory, transform=validation_data_transformation)
        train_loader = DataLoader(training_dataset, batch_size=config["batch_size"], shuffle=True)
        val_loader   = DataLoader(validation_dataset, batch_size=config["batch_size"], shuffle=False)
        model = loading_the_model_function(model_name)
        trained_model, training_losses, validation_losses, best_val_loss, best_val_acc = train_the_model_function(
            model, train_loader, val_loader,
            learning_rate=config["learning_rate"],
            num_epochs=config["num_epochs"],
            model_exp_name=experiment_name
        )
        torch.save(trained_model.state_dict(), f"{experiment_name}_best.pth")
        results[model_name][experiment_name] = {
            "config": config,
            "training_losses": training_losses,
            "validation_losses": validation_losses,
            "best_val_loss": best_val_loss,
            "best_val_acc": best_val_acc
        }

print("\n=== Summary of Experiments ===")

for model_name in models_to_train:  # summarize only the models trained in this cell
    
    for experiment_name, res in results[model_name].items():
        print(f"{experiment_name}: Config: {res['config']} | Best Val Loss: {res['best_val_loss']:.4f} | Best Val Acc: {res['best_val_acc']:.4f}")


=== Running Experiment: shufflenet_exp1 ===


Training shufflenet_exp1 Epoch 1/10: 100%|██████████| 309/309 [01:11<00:00,  4.31it/s]


shufflenet_exp1 Epoch 1: Train Loss=1.2059, Val Loss=0.5976, Val Acc=0.8178


Training shufflenet_exp1 Epoch 2/10: 100%|██████████| 309/309 [01:11<00:00,  4.33it/s]


shufflenet_exp1 Epoch 2: Train Loss=0.5280, Val Loss=0.4475, Val Acc=0.8583


Training shufflenet_exp1 Epoch 3/10: 100%|██████████| 309/309 [01:11<00:00,  4.30it/s]


shufflenet_exp1 Epoch 3: Train Loss=0.3863, Val Loss=0.4404, Val Acc=0.8560


Training shufflenet_exp1 Epoch 4/10: 100%|██████████| 309/309 [01:11<00:00,  4.29it/s]


shufflenet_exp1 Epoch 4: Train Loss=0.3111, Val Loss=0.4265, Val Acc=0.8574


Training shufflenet_exp1 Epoch 5/10: 100%|██████████| 309/309 [01:12<00:00,  4.29it/s]


shufflenet_exp1 Epoch 5: Train Loss=0.2526, Val Loss=0.3964, Val Acc=0.8802


Training shufflenet_exp1 Epoch 6/10: 100%|██████████| 309/309 [01:12<00:00,  4.27it/s]


shufflenet_exp1 Epoch 6: Train Loss=0.2196, Val Loss=0.4410, Val Acc=0.8688


Training shufflenet_exp1 Epoch 7/10: 100%|██████████| 309/309 [01:11<00:00,  4.31it/s]


shufflenet_exp1 Epoch 7: Train Loss=0.1833, Val Loss=0.4018, Val Acc=0.8825


Training shufflenet_exp1 Epoch 8/10: 100%|██████████| 309/309 [01:12<00:00,  4.27it/s]


shufflenet_exp1 Epoch 8: Train Loss=0.1768, Val Loss=0.4489, Val Acc=0.8691


Training shufflenet_exp1 Epoch 9/10: 100%|██████████| 309/309 [01:11<00:00,  4.34it/s]


shufflenet_exp1 Epoch 9: Train Loss=0.1452, Val Loss=0.4323, Val Acc=0.8764


Training shufflenet_exp1 Epoch 10/10: 100%|██████████| 309/309 [01:11<00:00,  4.30it/s]


shufflenet_exp1 Epoch 10: Train Loss=0.1309, Val Loss=0.4417, Val Acc=0.8773

=== Running Experiment: shufflenet_exp2 ===


Training shufflenet_exp2 Epoch 1/10: 100%|██████████| 309/309 [01:11<00:00,  4.34it/s]


shufflenet_exp2 Epoch 1: Train Loss=1.0484, Val Loss=0.6717, Val Acc=0.7735


Training shufflenet_exp2 Epoch 2/10: 100%|██████████| 309/309 [01:11<00:00,  4.31it/s]


shufflenet_exp2 Epoch 2: Train Loss=0.5546, Val Loss=0.5046, Val Acc=0.8417


Training shufflenet_exp2 Epoch 3/10: 100%|██████████| 309/309 [01:11<00:00,  4.33it/s]


shufflenet_exp2 Epoch 3: Train Loss=0.4463, Val Loss=0.4406, Val Acc=0.8569


Training shufflenet_exp2 Epoch 4/10: 100%|██████████| 309/309 [01:12<00:00,  4.29it/s]


shufflenet_exp2 Epoch 4: Train Loss=0.3761, Val Loss=0.4856, Val Acc=0.8513


Training shufflenet_exp2 Epoch 5/10: 100%|██████████| 309/309 [01:11<00:00,  4.33it/s]


shufflenet_exp2 Epoch 5: Train Loss=0.3205, Val Loss=0.4632, Val Acc=0.8583


Training shufflenet_exp2 Epoch 6/10: 100%|██████████| 309/309 [01:11<00:00,  4.33it/s]


shufflenet_exp2 Epoch 6: Train Loss=0.3005, Val Loss=0.4924, Val Acc=0.8481


Training shufflenet_exp2 Epoch 7/10: 100%|██████████| 309/309 [01:10<00:00,  4.36it/s]


shufflenet_exp2 Epoch 7: Train Loss=0.2521, Val Loss=0.4546, Val Acc=0.8586


Training shufflenet_exp2 Epoch 8/10: 100%|██████████| 309/309 [01:12<00:00,  4.29it/s]


shufflenet_exp2 Epoch 8: Train Loss=0.2412, Val Loss=0.4669, Val Acc=0.8615


Training shufflenet_exp2 Epoch 9/10: 100%|██████████| 309/309 [01:11<00:00,  4.33it/s]


shufflenet_exp2 Epoch 9: Train Loss=0.2277, Val Loss=0.5190, Val Acc=0.8513


Training shufflenet_exp2 Epoch 10/10: 100%|██████████| 309/309 [01:11<00:00,  4.31it/s]


shufflenet_exp2 Epoch 10: Train Loss=0.2053, Val Loss=0.5418, Val Acc=0.8542

=== Running Experiment: shufflenet_exp3 ===


Training shufflenet_exp3 Epoch 1/10: 100%|██████████| 155/155 [01:06<00:00,  2.33it/s]


shufflenet_exp3 Epoch 1: Train Loss=1.4357, Val Loss=0.7105, Val Acc=0.7679


Training shufflenet_exp3 Epoch 2/10: 100%|██████████| 155/155 [01:06<00:00,  2.33it/s]


shufflenet_exp3 Epoch 2: Train Loss=0.5624, Val Loss=0.4902, Val Acc=0.8484


Training shufflenet_exp3 Epoch 3/10: 100%|██████████| 155/155 [01:06<00:00,  2.32it/s]


shufflenet_exp3 Epoch 3: Train Loss=0.3813, Val Loss=0.4222, Val Acc=0.8688


Training shufflenet_exp3 Epoch 4/10: 100%|██████████| 155/155 [01:07<00:00,  2.30it/s]


shufflenet_exp3 Epoch 4: Train Loss=0.3039, Val Loss=0.4158, Val Acc=0.8676


Training shufflenet_exp3 Epoch 5/10: 100%|██████████| 155/155 [01:06<00:00,  2.34it/s]


shufflenet_exp3 Epoch 5: Train Loss=0.2315, Val Loss=0.4181, Val Acc=0.8697


Training shufflenet_exp3 Epoch 6/10: 100%|██████████| 155/155 [01:06<00:00,  2.33it/s]


shufflenet_exp3 Epoch 6: Train Loss=0.1861, Val Loss=0.4145, Val Acc=0.8781


Training shufflenet_exp3 Epoch 7/10: 100%|██████████| 155/155 [01:06<00:00,  2.33it/s]


shufflenet_exp3 Epoch 7: Train Loss=0.1555, Val Loss=0.4890, Val Acc=0.8598


Training shufflenet_exp3 Epoch 8/10: 100%|██████████| 155/155 [01:06<00:00,  2.34it/s]


shufflenet_exp3 Epoch 8: Train Loss=0.1420, Val Loss=0.4805, Val Acc=0.8644


Training shufflenet_exp3 Epoch 9/10: 100%|██████████| 155/155 [01:06<00:00,  2.33it/s]


shufflenet_exp3 Epoch 9: Train Loss=0.1212, Val Loss=0.4225, Val Acc=0.8799


Training shufflenet_exp3 Epoch 10/10: 100%|██████████| 155/155 [01:06<00:00,  2.33it/s]


shufflenet_exp3 Epoch 10: Train Loss=0.1122, Val Loss=0.4302, Val Acc=0.8764

=== Summary of Experiments ===
shufflenet_exp1: Config: {'learning_rate': 0.0005, 'batch_size': 32, 'num_epochs': 10} | Best Val Loss: 0.3964 | Best Val Acc: 0.8802
shufflenet_exp2: Config: {'learning_rate': 0.001, 'batch_size': 32, 'num_epochs': 10} | Best Val Loss: 0.4406 | Best Val Acc: 0.8569
shufflenet_exp3: Config: {'learning_rate': 0.0005, 'batch_size': 64, 'num_epochs': 10} | Best Val Loss: 0.4145 | Best Val Acc: 0.8781
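
Overall accuracy hides which food categories are confused with each other. The sketch below (a hypothetical `evaluate_with_confusion` helper) computes overall accuracy, per-class accuracy and a confusion matrix for any model and loader; it could be applied to a saved checkpoint, e.g. `loading_the_model_function("shufflenet")` followed by `load_state_dict(torch.load("shufflenet_exp1_best.pth"))`. Because the validation set was also used to pick the best epoch, a held-out split, if available, would give a less optimistic estimate.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate_with_confusion(model, loader, num_classes, device="cpu"):
    """Return overall accuracy, per-class accuracy and a confusion matrix (rows = true, cols = predicted)."""
    model.eval()
    confusion = torch.zeros(num_classes, num_classes, dtype=torch.long)
    with torch.no_grad():
        for images, labels in loader:
            predicted = model(images.to(device)).argmax(dim=1).cpu()
            # Encode each (true, predicted) pair as one index and count all pairs at once
            pair_index = labels.cpu() * num_classes + predicted
            confusion += torch.bincount(pair_index, minlength=num_classes * num_classes).view(num_classes, num_classes)
    correct_per_class = confusion.diag()
    total_per_class = confusion.sum(dim=1)
    overall_acc = correct_per_class.sum().item() / total_per_class.sum().item()
    per_class_acc = correct_per_class.float() / total_per_class.clamp(min=1).float()
    return overall_acc, per_class_acc, confusion

# Tiny self-check: an identity "model" whose inputs are already logits for 3 classes
demo_inputs = torch.tensor([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]])
demo_labels = torch.tensor([0, 1, 0, 2])
demo_loader = DataLoader(TensorDataset(demo_inputs, demo_labels), batch_size=2)
acc, per_class, confusion = evaluate_with_confusion(nn.Identity(), demo_loader, num_classes=3)
```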


### d. Compare the results obtained with the different pre-trained models
Discuss which model performed best and analyze the reasons for the observed differences in performance.

ShuffleNet:
With a learning rate of 0.0005 and a batch size of 32, ShuffleNet reached the lowest validation loss of all runs (0.3964) and a validation accuracy of 88.02% at that epoch. Its lightweight design, in which channel shuffling lets separate channel groups exchange information cheaply, appears to capture the essential features well; its small capacity may also make it less prone to overfitting, although these experiments do not test that directly.

MobileNet:
MobileNet's best result came from a learning rate of 0.0005 and a batch size of 64, with a validation accuracy of 87.99%. It appears more sensitive to the training settings than ShuffleNet: switching to batch size 32 lowered this figure to 86.30%, and with a learning rate of 0.001 the validation loss jumped to 1.7529 in epoch 2 (accuracy 59.04%) before recovering.

Inception:
Inception, known for its multi-scale feature extraction, achieved a validation accuracy of 87.70% with a learning rate of 0.0005 and a batch size of 32. Its architectural strengths did not translate into a clear advantage here, and it appears sensitive to the hyperparameters: raising the learning rate to 0.001 lowered the result to 84.52%, and even in the best run the validation loss swung from 0.4029 in epoch 8 to 0.5993 in epoch 9. Its auxiliary classifier also played no part, since the training loop computes the loss on the main logits only.

Overall Assessment:
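All three models performed strongly, and ShuffleNet slightly outperformed the others in this set of experiments. The differences likely reflect how each architecture responds to the specific training conditions. The lower learning rate (0.0005) gave the better result for every model. The effect of batch size was mixed: 32 was better for Inception (87.70% vs 86.38%) and marginally better for ShuffleNet (88.02% vs 87.81%), whereas MobileNet did better with 64 (87.99% vs 86.30%). All accuracies quoted in this comparison are measured at the epoch with the lowest validation loss, which the training loop uses to select its best model, not the highest accuracy seen during training (MobileNet, for example, reached 88.75% in epoch 9 of its first configuration).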
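
These comparisons can be recomputed directly from the experiment summaries. The sketch below copies the "Best Val Acc" values printed above into a dictionary (the helper names are our own) and derives the best configuration per model and the effect of each single change.

```python
# Accuracy at the lowest-validation-loss epoch, copied from the experiment summaries above
summary = {
    ("inception", 0.0005, 32): 0.8770,
    ("inception", 0.001, 32): 0.8452,
    ("inception", 0.0005, 64): 0.8638,
    ("mobilenet", 0.0005, 32): 0.8630,
    ("mobilenet", 0.001, 32): 0.8329,
    ("mobilenet", 0.0005, 64): 0.8799,
    ("shufflenet", 0.0005, 32): 0.8802,
    ("shufflenet", 0.001, 32): 0.8569,
    ("shufflenet", 0.0005, 64): 0.8781,
}

def best_per_model(summary):
    """Map each model to its (learning_rate, batch_size, accuracy) with the highest accuracy."""
    best = {}
    for (model, lr, batch_size), acc in summary.items():
        if model not in best or acc > best[model][2]:
            best[model] = (lr, batch_size, acc)
    return best

def lr_effect(summary, model):
    """Accuracy change when raising the learning rate from 0.0005 to 0.001 (batch size 32)."""
    return summary[(model, 0.001, 32)] - summary[(model, 0.0005, 32)]

def batch_effect(summary, model):
    """Accuracy change when raising the batch size from 32 to 64 (learning rate 0.0005)."""
    return summary[(model, 0.0005, 64)] - summary[(model, 0.0005, 32)]

best = best_per_model(summary)
for model in sorted(best):
    print(model, best[model], f"lr: {lr_effect(summary, model):+.4f}", f"batch: {batch_effect(summary, model):+.4f}")
```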

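In conclusion, ShuffleNet was the best performer in this scenario, but the margins are small: the best ShuffleNet, MobileNet, and Inception runs reached 88.02%, 87.99%, and 87.70% respectively. With a single unseeded run per configuration, a ranking based on differences of about 0.3 percentage points should be treated as tentative.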

## Step 3: References
Include details on all the resources used to complete this part.

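1) Food-11 image dataset (Kaggle): https://www.kaggle.com/datasets/trolukovich/food11-image-dataset
2) GeeksforGeeks, Inception V2 and V3 (Inception network versions): https://www.geeksforgeeks.org/inception-v2-and-v3-inception-network-versions/
3) PyTorch Hub, ShuffleNet v2: https://pytorch.org/hub/pytorch_vision_shufflenet_v2/
4) Keras API documentation, MobileNet applications: https://keras.io/api/applications/mobilenet/
5) GeeksforGeeks, Image resizing using OpenCV (Python): https://www.geeksforgeeks.org/image-resizing-using-opencv-python/
6) GeeksforGeeks, How to normalize images in PyTorch: https://www.geeksforgeeks.org/how-to-normalize-images-in-pytorch/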