## BBM 409 - Programming Assignment 3

###  GAZİ KAĞAN SOYSAL 2210356050

## 1. Implementing a CNN from Scratch

### 1.1. Introduction

**TASK:** In this task, we classify animal images with a CNN. The network applies learned filters that extract features such as edges and corners from the photos and uses them to predict which animal each image shows.

##### What are the main components of a CNN architecture?
- Convolutional Layers: Extract features using filters/kernels.
- Pooling Layers: Downsample feature maps to reduce dimensionality.
- Activation Functions: Introduce non-linearity (e.g., ReLU).
- Fully Connected Layers: Map extracted features to output classes.
- Dropout/Regularization: Prevent overfitting.

##### Why do we use CNNs in image classification?
- CNNs automatically learn spatial hierarchies of features (edges, shapes, textures).
- They reduce parameters compared to fully connected networks, improving efficiency.
- Pooling layers add a degree of translation invariance, which makes predictions more robust to small shifts in position.
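
To make the parameter saving concrete, the following sketch compares a single 3×3 convolution with a fully connected layer applied to the same 256×256 RGB input. The layer sizes (16 output channels and 16 output units) are illustrative assumptions, not measurements from this notebook.

```python
# Parameter counts for one convolutional layer vs. one fully connected layer.
# Assumed sizes: 256x256 RGB input, 16 output channels / units.

def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights plus one bias per output channel."""
    return out_channels * (in_channels * kernel_size * kernel_size) + out_channels

def linear_params(in_features, out_features):
    """Weights plus one bias per output unit."""
    return in_features * out_features + out_features

conv_count = conv2d_params(in_channels=3, out_channels=16, kernel_size=3)
fc_count = linear_params(in_features=3 * 256 * 256, out_features=16)

print(conv_count)  # 448
print(fc_count)    # 3145744
```

The convolution reuses the same small filter at every position, which is why its parameter count does not depend on the image size.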

**Dataset:** The dataset contains a total of 4500 images, 450 for each of 10 different animal species.

### 1.2. Data Loading and Preprocessing

In [1]:
## Import necessary libraries
import os
from collections import defaultdict

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import classification_report, confusion_matrix
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader, Dataset, Subset
from torchvision import datasets, transforms

In [None]:
## Load the dataset using PyTorch's data loading utilities
## Apply necessary preprocessing such as resizing and normalization
dataset = datasets.ImageFolder(root="/content/pa3_subset_animal")

train_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.RandomAffine(degrees=10, translate=(0.1, 0.1), scale=(0.8, 1.2)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])



test_val_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

- Because our dataset is small, we augment the training data so that the model sees more variation and generalizes better.

- Greater variety helps the model capture finer details, so we apply random transformations (rotation, shifting, scaling, flipping, color jitter) to the training data.

- We also resize every image to a standard size (256, 256) and normalize the pixel values so that computations are more consistent.

- We do not apply the random transformations to the validation and test sets, because evaluation should measure performance on unmodified, real-world images.
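
As a small worked example of the normalization step, `transforms.Normalize` computes `(x - mean) / std` independently for each channel. The sketch below reproduces that arithmetic in plain Python for a single pixel. The pixel values are made up for illustration; the mean and standard deviation are the ones passed to `Normalize` in the cell above.

```python
# Per-channel normalization, as applied by transforms.Normalize: (x - mean) / std
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Normalize one RGB pixel whose channels are already scaled to [0, 1]."""
    return [(value - m) / s for value, m, s in zip(rgb, MEAN, STD)]

# Hypothetical pixel: a channel equal to its mean maps to 0,
# one standard deviation above the mean maps to about 1, and so on.
pixel = [0.485, 0.680, 0.181]
normalized = normalize_pixel(pixel)
print([round(v, 3) for v in normalized])
```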

In [None]:
class TransformedSubset(Dataset):
    def __init__(self, original_dataset, indices, transform=None):
        self.original_dataset = original_dataset
        self.indices = indices
        self.transform = transform

    def __getitem__(self, idx):
        image, label = self.original_dataset[self.indices[idx]]

        if self.transform:
            image = self.transform(image)

        return image, label

    def __len__(self):
        return len(self.indices)


- We use this class so that each split (train, validation, test) can apply its own transform to images drawn from the same underlying dataset.
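
The idea behind this wrapper can be shown without PyTorch. In the sketch below, a plain list stands in for the image dataset and simple functions stand in for the transforms. `TransformedView` and the sample data are hypothetical names used only for illustration.

```python
# A plain-Python analogue of TransformedSubset: one shared dataset,
# several views that each apply their own transform on access.

class TransformedView:
    def __init__(self, data, indices, transform=None):
        self.data = data
        self.indices = indices
        self.transform = transform

    def __getitem__(self, idx):
        item, label = self.data[self.indices[idx]]
        if self.transform:
            item = self.transform(item)
        return item, label

    def __len__(self):
        return len(self.indices)

# Stand-in dataset of (value, label) pairs
data = [(1, "a"), (2, "a"), (3, "b"), (4, "b")]

train_view = TransformedView(data, [0, 2], transform=lambda x: x * 10)
val_view = TransformedView(data, [1, 3])  # no transform

print(train_view[0], val_view[0])  # (10, 'a') (2, 'a')
print(data[0])                     # (1, 'a'), the source data is unchanged
```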

In [None]:
class_indices = defaultdict(list)
# dataset.targets holds the label of every sample, so no image has to be loaded here
for idx, label in enumerate(dataset.targets):
    class_indices[label].append(idx)

In [None]:
train_indices, val_indices, test_indices = [], [], []

for label, indices in class_indices.items():
    train_indices.extend(indices[:300])
    val_indices.extend(indices[300:375])
    test_indices.extend(indices[375:])

train_dataset = TransformedSubset(dataset, train_indices, train_transform)
val_dataset = TransformedSubset(dataset, val_indices, test_val_transform)
test_dataset = TransformedSubset(dataset, test_indices, test_val_transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

- We take an equal number of images from each class for every split: 300 for training, 75 for validation, and 75 for testing.

- We use data loaders to serve the data in batches of 64 images, which keeps memory usage manageable and makes computation efficient.
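
The split sizes and the number of batches per epoch follow directly from these numbers. The short check below assumes the loaders keep the last, smaller batch, which is the `DataLoader` default.

```python
import math

IMAGES_PER_CLASS = 450
NUM_CLASSES = 10
BATCH_SIZE = 64

# Per-class slices used for the split: [0:300], [300:375], [375:450]
per_class = {"train": 300, "val": 375 - 300, "test": IMAGES_PER_CLASS - 375}

split_summary = {}
for name, count in per_class.items():
    total = count * NUM_CLASSES
    # The last batch may be smaller, so the number of batches is rounded up
    batches = math.ceil(total / BATCH_SIZE)
    split_summary[name] = (total, batches)
    print(f"{name}: {total} images, {batches} batches")
```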

### 1.3. Define your CNN architecture

##### Reasons for the Architecture:
- Five convolutional layers give a reasonable balance between underfitting and overfitting on this dataset.

- Increasing the number of filters with depth lets the deeper layers detect more detailed features.

- A 3×3 kernel with padding 1 and stride 1 preserves the spatial size of the feature map and focuses on small local regions.

- 2×2 max pooling with stride 2 summarizes each region and halves the width and height, so five pooling steps reduce a 256×256 input to 8×8.

- The first fully connected layer combines the extracted features. The second fully connected layer outputs one score per class, which indicates which class the image most likely belongs to.
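
The spatial sizes in this architecture can be checked by hand. The sketch below applies the standard output-size formula to the layers defined in the model (3×3 convolutions with padding 1 and the default stride of 1, each followed by `MaxPool2d(2, 2)`), assuming the 256×256 input produced by the transforms.

```python
def conv_out(size, kernel=3, padding=1, stride=1):
    """Output width/height of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Output width/height of max pooling without padding."""
    return (size - kernel) // stride + 1

size = 256
channels = [16, 32, 64, 128, 256]
for block, out_channels in enumerate(channels, start=1):
    size = pool_out(conv_out(size))
    print(f"block {block}: {out_channels} x {size} x {size}")

flattened = channels[-1] * size * size
print(flattened)  # 16384, the input size of fc1 (256 * 8 * 8)
```

If the input resolution changed, the same trace would give the new value to use for the first fully connected layer.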

##### Why do we use ReLU as the activation function?
- ReLU sets negative values to 0, which introduces non-linearity and lets the network learn more complex relationships.
- Gradients are cheap to compute because the derivative is simply 0 for negative inputs and 1 for positive inputs.
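
A minimal plain-Python sketch of ReLU and its derivative illustrates both points. The input values are arbitrary.

```python
def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # Derivative is 0 for negative inputs and 1 for positive inputs
    # (the value at exactly 0 is a convention; 0 is used here)
    return 1.0 if x > 0 else 0.0

inputs = [-2.0, -0.5, 0.0, 1.5]
print([relu(x) for x in inputs])       # [0.0, 0.0, 0.0, 1.5]
print([relu_grad(x) for x in inputs])  # [0.0, 0.0, 0.0, 1.0]
```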

In [None]:
## Design CNN architecture
class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()

        self.conv1 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.conv3 = nn.Conv2d(32, 64, 3, padding=1)
        self.conv4 = nn.Conv2d(64, 128, 3, padding=1)
        self.conv5 = nn.Conv2d(128, 256, 3, padding=1)

        self.pool = nn.MaxPool2d(2, 2)

        self.fc1 = nn.Linear(256 * 8 * 8, 512)
        self.fc2 = nn.Linear(512, 10)

    def forward(self, X):
        X = self.pool(F.relu(self.conv1(X)))
        X = self.pool(F.relu(self.conv2(X)))
        X = self.pool(F.relu(self.conv3(X)))
        X = self.pool(F.relu(self.conv4(X)))
        X = self.pool(F.relu(self.conv5(X)))

        X = X.view(-1, 256 *8 * 8)

        X = F.relu(self.fc1(X))

        X = self.fc2(X)

        return X

### 1.4 Prepare the model for training

In [None]:
cnn_model = CNNModel()

In [None]:
## Define appropriate loss function for multi-class classification
loss_func = nn.CrossEntropyLoss()

- Cross-entropy loss depends on the probability the model assigns to the true class of each sample. Minimizing it pushes that probability as high as possible, which suits our multi-class problem.
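
For a single sample, the loss can be written out by hand. `nn.CrossEntropyLoss` takes raw scores (logits) and applies a softmax internally; the sketch below does the same in plain Python. The logits are made-up values for illustration.

```python
import math

def cross_entropy(logits, target):
    """Softmax followed by negative log-likelihood for one sample."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    prob_true = exps[target] / sum(exps)
    return -math.log(prob_true)

uniform = cross_entropy([0.0] * 10, target=3)
confident = cross_entropy([0.0] * 3 + [5.0] + [0.0] * 6, target=3)

print(round(uniform, 4))    # 2.3026, i.e. ln(10) for a uniform guess over 10 classes
print(confident < uniform)  # True
```

A training loss near 2.30 at the start of training therefore corresponds to a model that is still guessing uniformly across the 10 classes.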

In [None]:
## Choose an optimizer (SGD or Adam) and set its parameters (e.g., learning rate)
optimizer = optim.Adam(cnn_model.parameters(), lr=0.0005, weight_decay=1e-5)

- The Adam optimizer adapts the step size for each parameter based on the history of that parameter's gradients.

- We set a weight decay value to keep the weights from growing too large. It penalizes large weights, which acts as regularization against overfitting.
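
The update rule can be sketched for a single scalar parameter. This follows the standard Adam equations, with weight decay treated as an L2 term added to the gradient; that treatment is our understanding of how `optim.Adam` applies `weight_decay` and should be read as an assumption. The starting parameter and gradient are made up.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.0005, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=1e-5):
    """One Adam update for a single scalar parameter (t starts at 1)."""
    grad = grad + weight_decay * param          # L2 penalty added to the gradient
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

param, m, v = adam_step(param=1.0, grad=0.5, m=0.0, v=0.0, t=1)
print(param)
```

On the first step the bias-corrected ratio is close to 1, so the parameter moves by roughly the learning rate regardless of the gradient's magnitude.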

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
cnn_model = cnn_model.to(device)

- To train faster, we move the model to the GPU when one is available.

### 1.5 Train and Validate the CNN model

In [None]:
## Iterate over the training dataset in mini-batches
## Implement forward pass, compute loss, and backward pass for gradient computation
## Update model parameters using the optimizer based on computed gradients
## Validate the model on the validation set periodically and plot the validation loss
## Repeat the training process for a suitable number of epochs (at least 30 epochs)

train_losses = []
val_losses = []
val_accuracies = []

true_prediction = 0
total = 0

epochs = 40

for epoch in range(epochs):
    cnn_model.train()
    running_loss = 0.0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()

        outputs = cnn_model(inputs)

        loss = loss_func(outputs, labels)

        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    train_loss = running_loss / len(train_loader)
    train_losses.append(train_loss)

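    val_running_loss = 0.0
    # Reset the counters every epoch so that accuracy is not accumulated across epochs
    true_prediction = 0
    total = 0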
    cnn_model.eval()
    with torch.no_grad():
        for val_inputs, val_labels in val_loader:
            val_inputs, val_labels = val_inputs.to(device), val_labels.to(device)
            val_outputs = cnn_model(val_inputs)
            val_loss = loss_func(val_outputs, val_labels)
            val_running_loss += val_loss.item()

            _, predicted_target = torch.max(val_outputs.data, 1)
            total += val_labels.size(0)
            true_prediction += (predicted_target == val_labels).sum().item()

    val_loss = val_running_loss / len(val_loader)
    val_losses.append(val_loss)

    val_accuracy = 100 * true_prediction / total
    val_accuracies.append(val_accuracy)

    print(f"Epoch {epoch+1}/{epochs}, Train Loss: {train_loss:.4f}, Val Loss: {val_loss:.4f}, Val Accuracy: {val_accuracy:.2f}%")

In [None]:
plt.plot(train_losses, label='Training Loss')
plt.plot(val_losses, label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.title('Training and Validation Loss')
plt.show()

- The plot shows the loss on the training and validation sets. The training loss keeps decreasing as the model fits the training set, while the validation loss levels off after a point. This indicates that the model is no longer improving on unseen data; if the validation loss starts to increase, the model is overfitting.

- The learning rate we chose (0.0005) gave a good balance between training speed and stable progress.

- We chose a batch size of 64 because a smaller batch size makes the gradients noisier and training slower, while a larger batch size increases memory usage.

In [None]:
plt.plot(val_accuracies, label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.title('Validation Accuracy')
plt.show()

- Validation accuracy increases as training progresses, and after a certain point the rate of increase slows down. We stop training at that point, while the model is near its best.
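
The notebook trains for a fixed number of epochs and the stopping point is judged from the curves. One common way to automate that decision is early stopping with a patience window; the sketch below is an illustration of the idea, not what the training loop above does. The function name and the loss values are hypothetical.

```python
def best_epoch_with_patience(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 1-based.

    Training stops once the validation loss has not improved
    for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses), best_epoch

# Made-up validation losses for illustration only
losses = [2.0, 1.6, 1.4, 1.35, 1.36, 1.38, 1.41, 1.45]
print(best_epoch_with_patience(losses))  # (7, 4)
```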

### 1.6 Evaluate the trained model on the test set

In [None]:
## Test the trained model on the test set to evaluate its performance

cnn_model.eval()

true_prediction = 0
total = 0

all_preds = []
all_labels = []

with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = cnn_model(inputs)
        _, predicted_target = torch.max(outputs.data, 1)
        total += labels.size(0)
        true_prediction += (predicted_target == labels).sum().item()

        all_preds.extend(predicted_target.cpu().numpy())
        all_labels.extend(labels.cpu().numpy())

print(f"Accuracy: {100 * true_prediction / total}%")

In [None]:
## Compute metrics such as accuracy, precision, recall, and F1-score to assess classification performance.
report_cnn = classification_report(all_labels, all_preds, target_names=['cane', 'cavallo', 'elefante', 'farfalla', 'gallina', 'gatto', 'mucca', 'pecora', 'ragno', 'scoiattolo'], output_dict = True)
df_report_cnn = pd.DataFrame(report_cnn).transpose()
df_report_cnn.round(4)

In [None]:
## Visualize confusion matrix to understand the model's behavior across different classes
conf_matrix = confusion_matrix(all_labels, all_preds)

plt.figure(figsize=(10, 8))
sns.heatmap(conf_matrix, annot=True, fmt="d", cmap="Blues", xticklabels=['cane', 'cavallo', 'elefante', 'farfalla', 'gallina', 'gatto', 'mucca', 'pecora', 'ragno', 'scoiattolo'], yticklabels=['cane', 'cavallo', 'elefante', 'farfalla', 'gallina', 'gatto', 'mucca', 'pecora', 'ragno', 'scoiattolo'])
plt.ylabel('Real Labels')
plt.xlabel('Predicted')
plt.title('Confusion Matrix')
plt.show()
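
The per-class numbers in the classification report can be derived from the confusion matrix itself. The sketch below shows the arithmetic on a small hypothetical 3-class matrix (rows are true labels, columns are predictions, as in `confusion_matrix`). The values are not results from this notebook.

```python
def per_class_metrics(conf_matrix):
    """Precision, recall and F1 per class; rows are true labels, columns are predictions."""
    n = len(conf_matrix)
    metrics = []
    for c in range(n):
        tp = conf_matrix[c][c]
        predicted_c = sum(conf_matrix[r][c] for r in range(n))  # column sum
        actual_c = sum(conf_matrix[c])                          # row sum
        precision = tp / predicted_c if predicted_c else 0.0
        recall = tp / actual_c if actual_c else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics.append((precision, recall, f1))
    return metrics

# Hypothetical 3-class confusion matrix, not results from this notebook
example = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 3, 7],
]
example_metrics = per_class_metrics(example)
for label, (p, r, f1) in enumerate(example_metrics):
    print(label, round(p, 2), round(r, 2), round(f1, 2))
```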

### 1.7 Conclusion and interpretation

- The accuracy on the test set was 59.3%, which is a moderate result. It suggests that the model learned the general features but not the fine details.

- During training, the small size of the dataset made it hard to train the model, so we applied various transformations to diversify the data. It was important to decide which transformations were useful and to make sure they matched the kind of images the model would see at test time.

- We could have decreased the learning rate after a certain epoch. However, since the loss and validation accuracy had not yet reached the desired level, doing so would not have changed the result much. Such a schedule is more useful once the model is near a good solution, where smaller steps allow finer adjustments and can help limit overfitting.

- We could have learned finer details by using more filters, but when we tried this the model overfit quickly, so this approach was not successful.
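
The learning-rate decay mentioned above can be sketched as a step schedule, which is the rule that `StepLR` (imported at the top of the notebook but not used) implements. The step size and decay factor below are assumed values chosen for illustration, not settings we tested.

```python
def step_decay_lr(initial_lr, epoch, step_size, gamma):
    """Learning rate after `epoch` completed epochs under step decay."""
    return initial_lr * gamma ** (epoch // step_size)

# Assumed schedule: start at 0.0005 and halve the rate every 10 epochs
schedule = [step_decay_lr(0.0005, epoch, step_size=10, gamma=0.5) for epoch in range(40)]
print(schedule[0], schedule[10], schedule[20], schedule[39])
```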

### 1.8 Kaggle Test Result

In [None]:
import os
from PIL import Image
import torch
import torchvision.transforms as transforms
import pandas as pd

# Step 1: Load the paths of test set images
test_dir = 'test-images2' # Adjust the path to your test images directory
test_image_paths = [os.path.join(test_dir, img_name) for img_name in os.listdir(test_dir)]
# Sort numerically by the digits in the file name only (the directory name also contains a digit)
sorted_files = sorted(test_image_paths, key=lambda x: int(''.join(filter(str.isdigit, os.path.basename(x)))))

# Step 2: Preprocess the test set images
test_images = []
for img_path in sorted_files:
    img = Image.open(img_path).convert('RGB').resize((256, 256))  # Ensure image is in RGB format
    img = transforms.ToTensor()(img)  # Convert to tensor
    img = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])(img)  # Normalize pixel values
    test_images.append(img)


# Convert the list of images to a single tensor
test_images = torch.stack(test_images)

# Step 3: Load the best performing model
model = cnn_model
model.eval()

# Step 4: Predict class labels for test set images
predictions = []
with torch.no_grad():  # Inference only, so gradients are not needed
    for image in test_images:
        image = image.to(device)
        image = image.unsqueeze(0)  # Add batch dimension
        output = model(image)
        predicted_class = output.argmax(dim=1).item()  # Find the index with maximum score
        predictions.append(predicted_class)

# Step 5: Map predicted class labels to corresponding class names
class_labels = {
    0: 'cane', 1: 'cavallo', 2: 'elefante', 3: 'farfalla',
    4: 'gallina', 5: 'gatto', 6: 'mucca', 7: 'pecora',
    8: 'ragno', 9: 'scoiattolo'
}

# Step 6: Save predictions to CSV file
df = pd.DataFrame({'ID': range(1, len(predictions) + 1), 'Label': [class_labels[p] for p in predictions]})
df.to_csv('cnn_predictions.csv', index=False)

#### Kaggle Result: 57.7% (user: Kağan Soysal)

## 2. Exploring Transfer Learning with ResNet18 and MobileNet

### 2.1. Introduction

**TASK:** We fine-tune parts of the ResNet18 and MobileNet models, which were pre-trained on millions of images, on our own training set so that they adapt to our dataset. This should make the predictions more accurate than training from scratch.

##### Why do we freeze the rest and train only last layers?
- The first layers learn general features (edges, corners) that transfer well across tasks. The last layers learn more task-specific details, so these are the layers we adapt to our dataset.
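
To give a sense of how little is trained when only the final layer is unfrozen, the sketch below counts the parameters of the replacement classifier heads. The input widths (512 for ResNet18's `fc`, 1280 for MobileNetV2's `last_channel`) are the values we expect from torchvision; in the notebook they are read from the models rather than hard-coded.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

NUM_CLASSES = 10

# Input widths of the final classifier layers (512 for ResNet18, 1280 for MobileNetV2)
resnet18_head = linear_params(512, NUM_CLASSES)
mobilenet_v2_head = linear_params(1280, NUM_CLASSES)

print(resnet18_head)      # 5130
print(mobilenet_v2_head)  # 12810
```

Both heads are tiny compared with the millions of parameters in the frozen backbones, which is why training only the last layer is fast.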

### 2.2. Load the pre-trained ResNet18 model


In [None]:
!pip install torchinfo

In [None]:
import torchvision.models as models
from torchinfo import summary

In [None]:
## Utilize torchvision library to load the pre-trained ResNet18 model
## Ensure that the model's architecture matches ResNet18, by checking the model summary.
res_net_18_model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
summary(res_net_18_model, input_size=(1, 3, 224, 224))

### 2.3 Modify the ResNet18 model for transfer learning

- For the first model, we update only the last layer.

In [None]:
res_net_18_model_1 = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

## Freeze all layers of the ResNet18 model.
for param in res_net_18_model_1.parameters():
    param.requires_grad = False

## Replace the final fully connected layer with a new FC layer matching the number of classes
res_net_18_model_1.fc = nn.Linear(res_net_18_model_1.fc.in_features, 10)

## Unfreeze the final FC layer
for param in res_net_18_model_1.fc.parameters():
    param.requires_grad = True

## Define appropriate loss function and optimizer for training
loss_func = nn.CrossEntropyLoss()
optimizer = optim.Adam(res_net_18_model_1.fc.parameters(), lr=0.001)

## Train the modified ResNet18 model on the animal-10 image dataset. (base model)
res_net_18_model_1 = res_net_18_model_1.to(device)

train_losses_1 = []
val_losses_1 = []
val_accuracy_1 = []

epochs = 30

for epoch in range(epochs):
    # Switch back to training mode at the start of every epoch
    res_net_18_model_1.train()
    running_loss = 0.0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()

        outputs = res_net_18_model_1(inputs)
        loss = loss_func(outputs, labels)

        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    # Average the training loss once per epoch, after all batches
    train_loss = running_loss / len(train_loader)
    train_losses_1.append(train_loss)

    val_running_loss = 0.0

    res_net_18_model_1.eval()

    true_prediction = 0
    total = 0

    with torch.no_grad():
        for val_inputs, val_labels in val_loader:
            val_inputs, val_labels = val_inputs.to(device), val_labels.to(device)
            val_outputs = res_net_18_model_1(val_inputs)
            val_loss = loss_func(val_outputs, val_labels)
            val_running_loss += val_loss.item()
            _, predicted_target = torch.max(val_outputs.data, 1)
            total += val_labels.size(0)
            true_prediction += (predicted_target == val_labels).sum().item()

    val_loss = val_running_loss / len(val_loader)
    val_losses_1.append(val_loss)
    val_accuracy = 100 * true_prediction / total
    val_accuracy_1.append(val_accuracy)

    print(f"Epoch {epoch+1}/{epochs}, Train Loss: {train_loss:.4f}, Val Loss: {val_loss:.4f}, Val Accuracy: {val_accuracy:.2f}%")

- Validation accuracy was already high (81%) at the start, and training only the final layer raised it to 92%.


In [None]:
plt.plot(train_losses_1, label='Training Loss')
plt.plot(val_losses_1, label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.title('Training and Validation Loss')
plt.show()

In [None]:
plt.plot(val_accuracy_1, label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Validation Accuracy')
plt.legend()
plt.title('Validation Accuracy')
plt.show()

- For the second model, we also update the 3rd and 4th residual stages (`layer3` and `layer4`) in addition to the last layer.

- Because we update more layers, we use a smaller learning rate than in the first model; otherwise we could corrupt the parameters that were already learned.

In [None]:
## Define another ResNet18 model
res_net_18_model_2 = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

## Freeze all layers of the ResNet18 model.
for param in res_net_18_model_2.parameters():
    param.requires_grad = False

## Replace the final fully connected layer with a new FC layer matching the number of classes
res_net_18_model_2.fc = nn.Linear(res_net_18_model_2.fc.in_features, 10)

## Unfreeze the final FC layer
for param in res_net_18_model_2.fc.parameters():
    param.requires_grad = True

## Unfreeze convolutional layers 3 and 4 of the ResNet18 model and again proceed with training. (second model)
for param in res_net_18_model_2.layer3.parameters():
    param.requires_grad = True

for param in res_net_18_model_2.layer4.parameters():
    param.requires_grad = True

loss_func = nn.CrossEntropyLoss()
optimizer = optim.Adam(filter(lambda p: p.requires_grad, res_net_18_model_2.parameters()), lr=0.0001)

res_net_18_model_2 = res_net_18_model_2.to(device)

train_losses_2 = []
val_losses_2 = []
val_accuracy_2 = []

epochs = 30

for epoch in range(epochs):
    # Switch back to training mode each epoch; the eval() call below would otherwise persist
    res_net_18_model_2.train()
    running_loss = 0.0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()

        outputs = res_net_18_model_2(inputs)
        loss = loss_func(outputs, labels)

        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    train_loss = running_loss / len(train_loader)
    train_losses_2.append(train_loss)

    val_running_loss = 0.0

    res_net_18_model_2.eval()

    true_prediction = 0
    total = 0

    with torch.no_grad():
        for val_inputs, val_labels in val_loader:
            val_inputs, val_labels = val_inputs.to(device), val_labels.to(device)
            val_outputs = res_net_18_model_2(val_inputs)
            val_loss = loss_func(val_outputs, val_labels)
            val_running_loss += val_loss.item()
            _, predicted_target = torch.max(val_outputs.data, 1)
            total += val_labels.size(0)
            true_prediction += (predicted_target == val_labels).sum().item()

    val_loss = val_running_loss / len(val_loader)
    val_losses_2.append(val_loss)
    val_accuracy = 100 * true_prediction / total
    val_accuracy_2.append(val_accuracy)

    print(f"Epoch {epoch+1}/{epochs}, Train Loss: {train_loss:.4f}, Val Loss: {val_loss:.4f}, Val Accuracy: {val_accuracy:.2f}%")

- Likewise, unfreezing these additional layers raised the validation accuracy further.

In [None]:
plt.plot(train_losses_2, label='Training Loss')
plt.plot(val_losses_2, label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.title('Training and Validation Loss')
plt.show()

In [None]:
plt.plot(val_accuracy_2, label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Validation Accuracy')
plt.legend()
plt.title('Validation Accuracy')
plt.show()

- For the third model, we update all layers without freezing any of them.

- We make the updates more gradual by using a smaller learning rate than in the other models.

In [None]:
## Define another ResNet18 model
res_net_18_model_3 = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

## Replace the final fully connected layer with a new FC layer matching the number of classes and proceed with training. (third model)
res_net_18_model_3.fc = nn.Linear(res_net_18_model_3.fc.in_features, 10)

res_net_18_model_3 = res_net_18_model_3.to(device)

loss_func = nn.CrossEntropyLoss()
optimizer = optim.Adam(res_net_18_model_3.parameters(), lr=0.00005)

train_losses_3 = []
val_losses_3 = []
val_accuracy_3 = []

epochs = 30

for epoch in range(epochs):
    res_net_18_model_3.train()
    running_loss = 0.0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()

        outputs = res_net_18_model_3(inputs)
        loss = loss_func(outputs, labels)

        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    train_loss = running_loss / len(train_loader)
    train_losses_3.append(train_loss)

    val_running_loss = 0.0

    res_net_18_model_3.eval()

    true_prediction = 0
    total = 0

    with torch.no_grad():
        for val_inputs, val_labels in val_loader:
            val_inputs, val_labels = val_inputs.to(device), val_labels.to(device)
            val_outputs = res_net_18_model_3(val_inputs)
            val_loss = loss_func(val_outputs, val_labels)
            val_running_loss += val_loss.item()
            _, predicted_target = torch.max(val_outputs.data, 1)
            total += val_labels.size(0)
            true_prediction += (predicted_target == val_labels).sum().item()

    val_loss = val_running_loss / len(val_loader)
    val_losses_3.append(val_loss)
    val_accuracy = 100 * true_prediction / total
    val_accuracy_3.append(val_accuracy)

    print(f"Epoch {epoch+1}/{epochs}, Train Loss: {train_loss:.4f}, Val Loss: {val_loss:.4f}, Val Accuracy: {val_accuracy:.2f}%")

In [None]:
plt.plot(train_losses_3, label='Training Loss')
plt.plot(val_losses_3, label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.title('Training and Validation Loss')
plt.show()

In [None]:
plt.plot(val_accuracy_3, label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Validation Accuracy')
plt.legend()
plt.title('Validation Accuracy')
plt.show()

- Validation accuracy was slightly higher for the third model than for the others. Because we updated all layers, the entire model could adapt to our dataset.

### 2.4 Evaluate the fine-tuned ResNet18 model

In [None]:
## Test the best model on the test set to evaluate its performance.
res_net_18_model_3.eval()

true_prediction = 0
total = 0

all_preds = []
all_labels = []

with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = res_net_18_model_3(inputs)
        _, predicted_target = torch.max(outputs.data, 1)
        total += labels.size(0)
        true_prediction += (predicted_target == labels).sum().item()

        all_preds.extend(predicted_target.cpu().numpy())
        all_labels.extend(labels.cpu().numpy())

print(f"Accuracy: {100 * true_prediction / total}%")

In [None]:
## Compute metrics such as accuracy, precision, recall, and F1-score to assess classification performance.
report_resnet18 = classification_report(all_labels, all_preds, target_names=['cane', 'cavallo', 'elefante', 'farfalla', 'gallina', 'gatto', 'mucca', 'pecora', 'ragno', 'scoiattolo'], output_dict = True)
df_report_resnet18 = pd.DataFrame(report_resnet18).transpose()
df_report_resnet18.round(4)

In [None]:
## Visualize confusion matrix to understand the model's behavior across different classes
conf_matrix = confusion_matrix(all_labels, all_preds)

plt.figure(figsize=(10, 8))
sns.heatmap(conf_matrix, annot=True, fmt="d", cmap="Blues", xticklabels=['cane', 'cavallo', 'elefante', 'farfalla', 'gallina', 'gatto', 'mucca', 'pecora', 'ragno', 'scoiattolo'], yticklabels=['cane', 'cavallo', 'elefante', 'farfalla', 'gallina', 'gatto', 'mucca', 'pecora', 'ragno', 'scoiattolo'])
plt.ylabel('Real Labels')
plt.xlabel('Predicted')
plt.title('Confusion Matrix')
plt.show()

### 2.5 Kaggle Test Result

In [None]:
import os
from PIL import Image
import torch
import torchvision.transforms as transforms
import pandas as pd

# Step 1: Load the paths of test set images
test_dir = 'test-images2' # Adjust the path to your test images directory
test_image_paths = [os.path.join(test_dir, img_name) for img_name in os.listdir(test_dir)]
# Sort numerically by the digits in the file name only (the directory name also contains a digit)
sorted_files = sorted(test_image_paths, key=lambda x: int(''.join(filter(str.isdigit, os.path.basename(x)))))

# Step 2: Preprocess the test set images
test_images = []
for img_path in sorted_files:
    img = Image.open(img_path).convert('RGB').resize((256, 256))  # Ensure image is in RGB format
    img = transforms.ToTensor()(img)  # Convert to tensor
    img = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])(img)  # Normalize pixel values
    test_images.append(img)


# Convert the list of images to a single tensor
test_images = torch.stack(test_images)

# Step 3: Load the best performing model
model = res_net_18_model_3
model.eval()

# Step 4: Predict class labels for test set images
predictions = []
with torch.no_grad():  # Inference only, so gradients are not needed
    for image in test_images:
        image = image.to(device)
        image = image.unsqueeze(0)  # Add batch dimension
        output = model(image)
        predicted_class = output.argmax(dim=1).item()  # Find the index with maximum score
        predictions.append(predicted_class)

# Step 5: Map predicted class labels to corresponding class names
class_labels = {
    0: 'cane', 1: 'cavallo', 2: 'elefante', 3: 'farfalla',
    4: 'gallina', 5: 'gatto', 6: 'mucca', 7: 'pecora',
    8: 'ragno', 9: 'scoiattolo'
}

# Step 6: Save predictions to CSV file
df = pd.DataFrame({'ID': range(1, len(predictions) + 1), 'Label': [class_labels[p] for p in predictions]})
df.to_csv('resnet18_predictions.csv', index=False)

#### Kaggle Result: 93.2% (user: Kağan Soysal)

### 2.7. Load the pre-trained MobileNet model

In [None]:
import torchvision.models as models

In [None]:
## Utilize torchvision library to load the pre-trained MobileNetV2 model
## Ensure that the model's architecture matches MobileNetV2, by checking the model summary.
mobile_net_1 = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT)
summary(mobile_net_1, input_size=(1, 3, 224, 224))

### 2.8 Modify the MobileNet model for transfer learning

- In the first MobileNet model, we update only the last layer.

In [None]:
## Freeze all layers of the MobileNet model.
for param in mobile_net_1.parameters():
    param.requires_grad = False

## Replace the final fully connected layer with a new FC layer matching the number of classes
mobile_net_1.classifier[1] = nn.Linear(mobile_net_1.last_channel, 10)

## Unfreeze the final FC layer
for param in mobile_net_1.classifier[1].parameters():
    param.requires_grad = True

## Define appropriate loss function and optimizer for training
loss_func = nn.CrossEntropyLoss()
optimizer = optim.Adam(mobile_net_1.classifier[1].parameters(), lr=0.001)

## Train the modified MobileNet model on the animal-10 image dataset. (base model)
epochs = 30
train_losses_1 = []
val_losses_1 = []
val_accuracy_1 = []

mobile_net_1 = mobile_net_1.to(device)

for epoch in range(epochs):
    mobile_net_1.train()
    running_loss = 0.0

    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = mobile_net_1(inputs)
        loss = loss_func(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    train_loss = running_loss / len(train_loader)
    train_losses_1.append(train_loss)

    mobile_net_1.eval()
    val_running_loss = 0.0
    correct = 0
    total = 0

    with torch.no_grad():
        for val_inputs, val_labels in val_loader:
            val_inputs, val_labels = val_inputs.to(device), val_labels.to(device)
            val_outputs = mobile_net_1(val_inputs)
            val_loss = loss_func(val_outputs, val_labels)
            val_running_loss += val_loss.item()

            _, predicted_target = torch.max(val_outputs.data, 1)
            total += val_labels.size(0)
            correct += (predicted_target == val_labels).sum().item()

    val_loss = val_running_loss / len(val_loader)
    val_losses_1.append(val_loss)
    val_accuracy = 100 * correct / total
    val_accuracy_1.append(val_accuracy)

    print(f"Epoch {epoch+1}/{epochs}, Train Loss: {train_loss:.4f}, Val Loss: {val_loss:.4f}, Val Accuracy: {val_accuracy:.2f}%")

In [None]:
plt.plot(train_losses_1, label='Training Loss')
plt.plot(val_losses_1, label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.title('Training and Validation Loss')
plt.show()

In [None]:
plt.plot(val_accuracy_1, label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Validation Accuracy')
plt.legend()
plt.title('Validation Accuracy')
plt.show()

- In the second MobileNet model, we update all layers of the model.

In [None]:
## Define another MobileNet model
mobile_net_2 = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT)

## Replace the final fully connected layer with a new FC layer matching the number of classes and proceed with training. (second model)
mobile_net_2.classifier[1] = nn.Linear(mobile_net_2.last_channel, 10)

loss_func = nn.CrossEntropyLoss()
optimizer = optim.Adam(mobile_net_2.parameters(), lr=0.0001)

epochs = 10
train_losses_2 = []
val_losses_2 = []
val_accuracy_2 = []

mobile_net_2 = mobile_net_2.to(device)

for epoch in range(epochs):
    mobile_net_2.train()
    running_loss = 0.0

    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = mobile_net_2(inputs)
        loss = loss_func(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    train_loss = running_loss / len(train_loader)
    train_losses_2.append(train_loss)

    mobile_net_2.eval()
    val_running_loss = 0.0
    correct = 0
    total = 0

    with torch.no_grad():
        for val_inputs, val_labels in val_loader:
            val_inputs, val_labels = val_inputs.to(device), val_labels.to(device)
            val_outputs = mobile_net_2(val_inputs)
            val_loss = loss_func(val_outputs, val_labels)
            val_running_loss += val_loss.item()

            _, predicted_target = torch.max(val_outputs.data, 1)
            total += val_labels.size(0)
            correct += (predicted_target == val_labels).sum().item()

    val_loss = val_running_loss / len(val_loader)
    val_losses_2.append(val_loss)
    val_accuracy = 100 * correct / total
    val_accuracy_2.append(val_accuracy)

    print(f"Epoch {epoch+1}/{epochs}, Train Loss: {train_loss:.4f}, Val Loss: {val_loss:.4f}, Val Accuracy: {val_accuracy:.2f}%")

In [None]:
plt.plot(train_losses_2, label='Training Loss')
plt.plot(val_losses_2, label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.title('Training and Validation Loss')
plt.show()

In [None]:
plt.plot(val_accuracy_2, label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Validation Accuracy')
plt.legend()
plt.title('Validation Accuracy')
plt.show()

- Because all layers were updated in the second model, it reached higher accuracy on our dataset than the first model.
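
The difference between the two setups comes down to which parameters receive gradients. The sketch below illustrates this on a tiny stand-in network rather than MobileNetV2 itself. `TinyNet`, its layer sizes, and the `count_trainable` helper are hypothetical, and the sketch assumes the first setup freezes the whole feature extractor, which may not match the exact layers frozen in the first model above.

```python
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in with a features/classifier split like MobileNetV2."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.ReLU())
        self.classifier = nn.Sequential(nn.Dropout(0.2), nn.Linear(8, num_classes))

    def forward(self, x):
        x = self.features(x).mean(dim=(2, 3))  # global average pooling
        return self.classifier(x)


def count_trainable(model):
    """Number of parameters that will be updated by the optimizer."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# Setup A (assumed): freeze the feature extractor, train only the classifier.
frozen = TinyNet()
for param in frozen.features.parameters():
    param.requires_grad = False

# Setup B: leave every layer trainable (full fine-tuning).
full = TinyNet()

frozen_count = count_trainable(frozen)
full_count = count_trainable(full)
print(f"trainable (frozen features): {frozen_count}")
print(f"trainable (all layers):      {full_count}")
```

With all layers trainable, the optimizer can adapt the low-level filters to the target images as well, at the cost of more computation per step and a higher risk of overfitting on a small dataset.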

### 2.9 Evaluate the fine-tuned MobileNet model

In [None]:
## Test the best model on the test set to evaluate its performance.
mobile_net_2.eval()

true_prediction = 0
total = 0

all_preds = []
all_labels = []

with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = mobile_net_2(inputs)
        _, predicted_target = torch.max(outputs.data, 1)
        total += labels.size(0)
        true_prediction += (predicted_target == labels).sum().item()

        all_preds.extend(predicted_target.cpu().numpy())
        all_labels.extend(labels.cpu().numpy())

print(f"Accuracy: {100 * true_prediction / total:.2f}%")

## Comment on the results

In [None]:
## Compute metrics such as accuracy, precision, recall, and F1-score to assess classification performance.
report = classification_report(all_labels, all_preds, target_names=['cane', 'cavallo', 'elefante', 'farfalla', 'gallina', 'gatto', 'mucca', 'pecora', 'ragno', 'scoiattolo'], output_dict = True)
df_report_mobilenet = pd.DataFrame(report).transpose()
df_report_mobilenet.round(4)

In [None]:
## Visualize confusion matrix to understand the model's behavior across different classes
conf_matrix = confusion_matrix(all_labels, all_preds)
class_names = ['cane', 'cavallo', 'elefante', 'farfalla', 'gallina', 'gatto', 'mucca', 'pecora', 'ragno', 'scoiattolo']

plt.figure(figsize=(10, 8))
sns.heatmap(conf_matrix, annot=True, fmt="d", cmap="Blues", xticklabels=class_names, yticklabels=class_names)
plt.ylabel('Real Labels')
plt.xlabel('Predicted')
plt.title('Confusion Matrix')
plt.show()
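
Each per-class score in the classification report can be read off the confusion matrix. With true labels on the rows and predictions on the columns (the scikit-learn convention), recall divides a diagonal entry by its row sum and precision divides it by its column sum. Below is a minimal sketch on a made-up 3-class matrix; the numbers and the `per_class_scores` helper are illustrative and are not results from this notebook.

```python
def per_class_scores(matrix):
    """Return (precision, recall, f1) per class from a square confusion matrix.

    Rows are true labels, columns are predicted labels.
    """
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        row_sum = sum(matrix[k])                       # samples whose true class is k
        col_sum = sum(matrix[r][k] for r in range(n))  # samples predicted as class k
        precision = tp / col_sum if col_sum else 0.0
        recall = tp / row_sum if row_sum else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append((precision, recall, f1))
    return scores


# Made-up counts, for illustration only.
toy_matrix = [
    [9, 1, 0],
    [2, 6, 2],
    [1, 1, 8],
]
toy_scores = per_class_scores(toy_matrix)
for k, (p, r, f1) in enumerate(toy_scores):
    print(f"class {k}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

A class with high recall but lower precision (class 0 in the toy matrix) is one the model rarely misses but often predicts for images of other classes, which shows up as off-diagonal counts in that class's column.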

### 2.10 Kaggle Test Result

In [None]:
import os
from PIL import Image
import torch
import torchvision.transforms as transforms
import pandas as pd

# Step 1: Load the paths of test set images
test_dir = 'test-images2' # Adjust the path to your test images directory
test_image_paths = [os.path.join(test_dir, img_name) for img_name in os.listdir(test_dir)]
# Sort the files numerically using only the digits in the file name (not the directory name)
sorted_files = sorted(test_image_paths, key=lambda x: int(''.join(filter(str.isdigit, os.path.basename(x)))))

# Step 2: Preprocess the test set images
test_images = []
for img_path in sorted_files:
    img = Image.open(img_path).convert('RGB').resize((256, 256))  # Convert to RGB and resize to 256x256
    img = transforms.ToTensor()(img)  # Convert to tensor
    img = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])(img)  # Normalize pixel values
    test_images.append(img)


# Convert the list of images to a single tensor
test_images = torch.stack(test_images)

# Step 3: Load the best performing model
model = mobile_net_2
model.eval()

# Step 4: Predict class labels for test set images
predictions = []
with torch.no_grad():  # Inference only, so gradient tracking is not needed
    for image in test_images:
        image = image.to(device)
        image = image.unsqueeze(0)  # Add batch dimension
        output = model(image)
        predicted_class = output.argmax(dim=1).item()  # Find the index with maximum score
        predictions.append(predicted_class)

# Step 5: Map predicted class labels to corresponding class names
class_labels = {
    0: 'cane', 1: 'cavallo', 2: 'elefante', 3: 'farfalla',
    4: 'gallina', 5: 'gatto', 6: 'mucca', 7: 'pecora',
    8: 'ragno', 9: 'scoiattolo'
}

# Step 6: Save predictions to CSV file
df = pd.DataFrame({'ID': range(1, len(predictions) + 1), 'Label': [class_labels[p] for p in predictions]})
df.to_csv('mobilenet_predictions.csv', index=False)

#### Kaggle Result: 94.0% (user: Kağan Soysal)
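
Step 1 of the prediction cell depends on ordering the test files by the number in their names, so that row i of the CSV corresponds to image i; plain lexicographic sorting would place `10.jpg` before `2.jpg`. The sketch below shows that ordering on made-up file names of the form `<number>.<ext>`. The `numeric_key` helper is hypothetical and looks only at the base name, so digits in the directory name or extension cannot affect the key.

```python
import os


def numeric_key(path):
    """Sort key: the integer formed by the digits in the file's base name."""
    stem = os.path.splitext(os.path.basename(path))[0]
    digits = "".join(ch for ch in stem if ch.isdigit())
    return int(digits)  # raises ValueError if the name contains no digits


# Made-up file names, for illustration only.
names = ["10.jpg", "2.jpg", "1.jpg", "007.jpg"]
paths = [os.path.join("test-images2", name) for name in names]

lexicographic = [os.path.basename(p) for p in sorted(paths)]
numeric = [os.path.basename(p) for p in sorted(paths, key=numeric_key)]

print("lexicographic:", lexicographic)
print("numeric:      ", numeric)
```

If the file names carried no number at all, the key would fail loudly instead of silently producing a wrong order, which is preferable when the order determines the submitted IDs.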

## 3. Analyze advantages and disadvantages

**Transfer Learning**

*Advantages*:
- Because the model is pre-trained, the training (fine-tuning) process takes much less time.

- It gives good results even on small datasets because it has already learned general features.

*Disadvantages*:
- It is less effective when the dataset the model was pre-trained on differs significantly from the target dataset.

- If the model is very large and complex, it can overfit (memorize) the target dataset.

**Training from Scratch**

*Advantages*:
- Since it is trained from scratch, it can fully adapt to the dataset.

- The architecture can be designed specifically for the problem at hand.

*Disadvantages*:
- Requires a lot of time and computational resources.

- Requires a large and diverse dataset for high accuracy.

In [None]:
## Compare the best fine-tuned MobileNet model performance with the best CNN model implemented from scratch
# keys= labels each model's columns so the two reports can be told apart
result_df1 = pd.concat([df_report_mobilenet, df_report_cnn], axis=1, keys=['MobileNetV2', 'CNN'])
result_df1

In [None]:
## Compare the best fine-tuned MobileNet model performance with the best ResNet18 model implemented from scratch
# keys= labels each model's columns so the two reports can be told apart
result_df2 = pd.concat([df_report_mobilenet, df_report_resnet18], axis=1, keys=['MobileNetV2', 'ResNet18'])
result_df2