### **Libraries**

In [1]:
import time
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import torchvision.models as models
import torchvision.datasets as datasets
from torch.utils.data import DataLoader, Dataset
from torchvision.datasets import ImageFolder
import tensorflow_datasets as tfds
from sklearn.metrics import f1_score
import matplotlib.pyplot as plt
import numpy as np
from efficientnet_pytorch import EfficientNet
from PIL import Image


In [2]:
# Check GPU availability
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

Using device: cuda


### **Dataset**

The **Oxford-IIIT Pet** dataset consists of 7,349 color images of varying sizes, divided into 37 classes (cat and dog breeds) with around 200 images per class.

The images are resized to 224x224 pixels, the input size that most of the models expect because they were pretrained on ImageNet at that resolution. InceptionV3 is the exception and expects 299x299 inputs, so it gets its own transformation.

In [3]:
# Size transformation
transform = transforms.Compose([
    transforms.Resize((224, 224)),  # Resize to match models' expected input size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # ImageNet per-channel mean and std
])

# Size transformation for InceptionV3
transform_inception = transforms.Compose([
    transforms.Resize((299, 299)),  # Resize to 299x299 pixels for InceptionV3
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Load the Oxford-IIIT Pet dataset
(ds_train, ds_test), ds_info = tfds.load(
    'oxford_iiit_pet',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)

# Convert TFDS datasets to ImageFolder-like structure for PyTorch
def tfds_to_imagefolder(ds, transform):
    images, labels = [], []
    for image, label in tfds.as_numpy(ds):
        image = transform(transforms.ToPILImage()(image))
        images.append(image)
        labels.append(label)
    dataset = list(zip(images, labels))
    return dataset

trainset = tfds_to_imagefolder(ds_train, transform)
testset = tfds_to_imagefolder(ds_test, transform)
trainset_inception = tfds_to_imagefolder(ds_train, transform_inception)
testset_inception = tfds_to_imagefolder(ds_test, transform_inception)

# Create DataLoaders
trainloader = DataLoader(trainset, batch_size=32, shuffle=True, num_workers=2)
testloader = DataLoader(testset, batch_size=32, shuffle=False, num_workers=2)
trainloader_inception = DataLoader(trainset_inception, batch_size=32, shuffle=True, num_workers=2)
testloader_inception = DataLoader(testset_inception, batch_size=32, shuffle=False, num_workers=2)
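
The conversion above stores every transformed image as a tensor in memory, once for the 224x224 pipeline and once for the 299x299 one. A lighter alternative, sketched here under the assumption that the raw `(image, label)` pairs fit in memory, is to keep the raw samples once and apply the transform on access. `LazyTransformDataset` is a hypothetical helper name, not part of the original notebook, and the toy data below stands in for real images.

```python
from torch.utils.data import Dataset


class LazyTransformDataset(Dataset):
    """Hypothetical helper: keeps raw samples and transforms them on access."""

    def __init__(self, samples, transform):
        self.samples = samples      # list of (raw_image, label) pairs
        self.transform = transform  # any callable, e.g. a transforms.Compose

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        image, label = self.samples[index]
        return self.transform(image), int(label)


# Toy stand-ins: numbers instead of images, scaling instead of resizing.
raw = [(1.0, 0), (2.0, 1), (3.0, 0)]
doubled = LazyTransformDataset(raw, transform=lambda x: 2 * x)
tripled = LazyTransformDataset(raw, transform=lambda x: 3 * x)
print(len(doubled), doubled[1], tripled[1])
```

Both wrappers share the same `raw` list, so two pipelines cost one copy of the data; the trade-off is that the transform runs again every epoch.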


The CNN architectures that will be considered are the following:


- **AlexNet:** Original architecture
- **ResNet:** ResNet-50
- **VGGNet:** VGG-16
- **GoogleNet:** Inception v3
- **EfficientNet:** EfficientNet-B1, EfficientNet-B3, and EfficientNet-B5

In [4]:
# Define models
models = {
    'AlexNet': torchvision.models.alexnet(pretrained=True),
    'ResNet50': torchvision.models.resnet50(pretrained=True),
    'VGG16': torchvision.models.vgg16(pretrained=True),
    'InceptionV3': torchvision.models.inception_v3(pretrained=True, aux_logits=True),
    'EfficientNet-B1': torchvision.models.efficientnet_b1(pretrained=True),
    'EfficientNet-B3': torchvision.models.efficientnet_b3(pretrained=True),
    'EfficientNet-B5': torchvision.models.efficientnet_b5(pretrained=True)
}



The following code adapts each model to the Oxford-IIIT Pet dataset by replacing its final layer with a 37-output linear layer. For AlexNet, VGG, and EfficientNet, the layer replaced is the last layer of the classifier; for Inception and ResNet, it is the final fully connected layer (`fc`).

In [5]:
# Modify the final layer to match the 37 Oxford-IIIT Pet classes
for name, model in models.items():
    if 'EfficientNet' in name:
        num_ftrs = model.classifier[1].in_features
        model.classifier[1] = nn.Linear(num_ftrs, 37)
    elif 'Inception' in name:
        num_ftrs = model.fc.in_features
        model.fc = nn.Linear(num_ftrs, 37)
    elif 'AlexNet' in name or 'VGG' in name:
        num_ftrs = model.classifier[-1].in_features
        model.classifier[-1] = nn.Linear(num_ftrs, 37)
    else:
        num_ftrs = model.fc.in_features
        model.fc = nn.Linear(num_ftrs, 37)
    models[name] = model.to(device)
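
To make the head replacement concrete without downloading pretrained weights, the sketch below applies the same pattern to a small stand-in network. `TinyNet` is a hypothetical toy model, not one of the architectures above; only the `classifier[-1]` replacement mirrors the notebook's code.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 37  # number of breeds in Oxford-IIIT Pet


class TinyNet(nn.Module):
    """Toy stand-in for a pretrained network with a 1000-way classifier."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.classifier = nn.Sequential(nn.Dropout(0.2), nn.Linear(16, 1000))

    def forward(self, x):
        return self.classifier(self.features(x))


model = TinyNet()
# Read the input width of the old head, then swap in a new 37-way head.
in_features = model.classifier[-1].in_features
model.classifier[-1] = nn.Linear(in_features, NUM_CLASSES)

model.eval()
with torch.no_grad():
    logits = model(torch.randn(2, 3, 32, 32))
print(logits.shape)
```

The new layer is randomly initialized, so it is the only part of a pretrained network that starts without ImageNet knowledge.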

This part initializes the loss function used for training the models. Cross-entropy loss measures how well a model's predicted probability distribution matches the actual distribution of the labels.

For a dataset, the cross-entropy loss is defined as:

$$
L = -\frac{1}{N} \sum_{n=1}^N \sum_{i=1}^C y_{ni} \log(p_{ni})
$$

- \(N\): The number of samples.
- \(C\): The number of classes.
- \(y_{ni}\): The actual binary indicator for sample \(n\) and class \(i\).
- \(p_{ni}\): The predicted probability for sample \(n\) and class \(i\).
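
As a worked instance of the formula, the sketch below evaluates the loss in plain Python for integer labels, which is equivalent to one-hot \(y_{ni}\). It is an illustration only: `nn.CrossEntropyLoss` takes raw logits and applies log-softmax internally, whereas this hypothetical `cross_entropy` helper takes probabilities directly.

```python
import math


def cross_entropy(probabilities, labels):
    """Mean cross-entropy; only the true class contributes for one-hot labels."""
    total = 0.0
    for p, label in zip(probabilities, labels):
        total += -math.log(p[label])
    return total / len(labels)


num_classes = 37
# A model that assigns the same probability to every class.
uniform = [[1.0 / num_classes] * num_classes for _ in range(4)]
labels = [0, 5, 12, 36]

chance_loss = cross_entropy(uniform, labels)
print(f"{chance_loss:.4f}")  # ln(37), about 3.6109
```

This value is a useful baseline when reading training logs: a loss that stays near \(\ln 37 \approx 3.61\) means the model is doing no better than uniform guessing over the 37 classes.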

In [6]:
# Define loss function
criterion = nn.CrossEntropyLoss()

The next cell defines three functions:

The **train_model** function trains a given model on the training dataset for a specified number of epochs while tracking the loss and accuracy.

The **evaluate_model** function evaluates the trained model on the test dataset and returns its accuracy and weighted F1 score.

The **train_model_inception** function trains the InceptionV3 model with accuracy tracking. In training mode with *aux_logits* set to *True*, the model returns a tuple containing the main and auxiliary outputs. The code checks whether the output is a tuple and uses only the main output, ignoring the auxiliary one.

In [7]:
# Function to train the model with accuracy tracking
def train_model(model, trainloader, criterion, optimizer, num_epochs=25):
    model.train()
    for epoch in range(num_epochs):
        running_loss = 0.0
        correct = 0
        total = 0
        start_time = time.time()
        for i, data in enumerate(trainloader, 0):
            inputs, labels = data[0].to(device), data[1].to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()

            _, preds = torch.max(outputs, 1)
            correct += torch.sum(preds == labels).item()
            total += labels.size(0)

        epoch_loss = running_loss / len(trainloader)
        epoch_accuracy = correct / total
        end_time = time.time()

        print(f"Epoch {epoch+1}, Loss: {epoch_loss:.4f}, Accuracy: {epoch_accuracy:.4f}, Time: {end_time - start_time:.2f} seconds")

# Function to evaluate the model
def evaluate_model(model, testloader):
    model.eval()
    all_preds = []
    all_labels = []
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, labels in testloader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)
            correct += torch.sum(preds == labels).item()
            total += labels.size(0)
            all_preds.extend(preds.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())

    accuracy = correct / total
    f1 = f1_score(all_labels, all_preds, average='weighted')
    return accuracy, f1

# Define a modified train function for InceptionV3
def train_model_inception(model, trainloader, criterion, optimizer, num_epochs=25):
    model.train()
    for epoch in range(num_epochs):
        running_loss = 0.0
        start_time = time.time()
        correct = 0
        total = 0
        for i, data in enumerate(trainloader, 0):
            inputs, labels = data[0].to(device), data[1].to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            if isinstance(outputs, tuple):
                outputs = outputs[0]  # Use only the main output
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
            _, preds = torch.max(outputs, 1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
        end_time = time.time()
        epoch_accuracy = correct / total
        print(f"Epoch {epoch+1}, Loss: {running_loss / len(trainloader)}, Accuracy: {epoch_accuracy * 100:.2f}%, Time: {end_time - start_time} seconds")
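
The Inception training loop above discards the auxiliary output. A common alternative, sketched below, adds the auxiliary loss with a small weight. The weight of 0.4 and the helper name `inception_loss` are assumptions for illustration, not part of the notebook, and `FakeInceptionOutputs` is a stand-in for the named tuple that torchvision's model returns. Note that using the auxiliary loss on this dataset would also require replacing `model.AuxLogits.fc` with a 37-way layer, which the notebook does not do.

```python
from collections import namedtuple

import torch
import torch.nn as nn

# Stand-in for the (logits, aux_logits) named tuple returned in training mode.
FakeInceptionOutputs = namedtuple("FakeInceptionOutputs", ["logits", "aux_logits"])


def inception_loss(outputs, labels, criterion, aux_weight=0.4):
    """Return (loss, main_logits) for tuple outputs or a plain tensor."""
    if isinstance(outputs, tuple):
        main, aux = outputs
        loss = criterion(main, labels) + aux_weight * criterion(aux, labels)
        return loss, main
    return criterion(outputs, labels), outputs


criterion = nn.CrossEntropyLoss()
labels = torch.tensor([0, 3])
main = torch.zeros(2, 37)  # all-zero logits give a uniform prediction
aux = torch.zeros(2, 37)

train_loss, logits = inception_loss(FakeInceptionOutputs(main, aux), labels, criterion)
eval_loss, _ = inception_loss(main, labels, criterion)
print(f"{train_loss.item():.4f} {eval_loss.item():.4f}")
```

With uniform predictions, the plain-tensor case gives \(\ln 37\) and the tuple case gives \(1.4 \ln 37\), which makes the weighting easy to check by hand.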

### **Model's training**

Each of the architectures listed above is trained in the same way: on the training data for 25 epochs, using the Adam optimizer with a learning rate of 0.001. The most important definitions for this section are the following:

- **Epoch** refers to one complete pass of the entire training dataset through the learning algorithm. 

- **Adam optimizer** (*optim.Adam*) is an adaptive learning rate optimization algorithm designed specifically for training deep neural networks. It combines the advantages of two other extensions of stochastic gradient descent: AdaGrad and RMSProp.

- **Learning rate** controls how much to change the model in response to the estimated error each time the model weights are updated during training. 
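
To show how the learning rate and Adam's running averages interact, here is a minimal scalar sketch of the Adam update rule, minimizing \(f(x) = x^2\). It uses the same hyperparameter defaults as `optim.Adam`; the function name `adam_minimize` and the toy objective are assumptions for illustration.

```python
import math


def adam_minimize(grad_fn, x, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=1):
    """Scalar Adam: running means of the gradient (m) and squared gradient (v)."""
    m = 0.0
    v = 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


def grad(x):
    return 2 * x  # derivative of x**2


x_after_one = adam_minimize(grad, x=1.0, steps=1)
x_after_many = adam_minimize(grad, x=1.0, steps=500)
print(f"{x_after_one:.6f}")  # 0.999000
print(x_after_many)
```

On the first step the bias-corrected ratio is the gradient divided by its own magnitude, so the parameter moves by almost exactly the learning rate regardless of how large the gradient is.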

After training, the model is evaluated on the test data (testloader) to calculate the accuracy and F1 score. Finally, the results including the accuracy, F1 score, and training time are printed to the console.
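
`evaluate_model` calls `f1_score` with `average='weighted'`, which computes one F1 score per class and averages them weighted by the number of true samples in each class. The plain-Python sketch below reproduces that calculation on a toy example; `weighted_f1` is a hypothetical helper written for illustration, not the scikit-learn implementation.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_cls in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n_cls - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n_cls / total) * f1
    return score


y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 2, 2]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1 = weighted_f1(y_true, y_pred)
print(f"accuracy={accuracy:.4f} weighted_f1={f1:.4f}")
```

Here the per-class F1 scores are 0.8, 0.5, and 2/3, giving a weighted F1 of about 0.6778 against an accuracy of about 0.6667, so the two metrics can differ even on a small example.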

#### **AlexNet**

In [8]:
# Train and evaluate AlexNet
model_name = 'AlexNet'
print(f"\nTraining {model_name}...\n")
model = models[model_name]
optimizer = optim.Adam(model.parameters(), lr=0.001)  # lr=0.001 is Adam's default learning rate
start_time = time.time()
train_model(model, trainloader, criterion, optimizer, num_epochs=25) # num_epochs=25 due to hardware limitation
training_time = time.time() - start_time
accuracy, f1 = evaluate_model(model, testloader)

print(f"{model_name} - Top-1 Accuracy: {accuracy * 100:.2f}%")
print(f"{model_name} - F1 Score: {f1 * 100:.2f}%")
print(f"{model_name} - Training Time: {training_time:.2f} seconds")


Training AlexNet...

Epoch 1, Loss: 3.6336, Accuracy: 0.0234, Time: 28.92 seconds
Epoch 2, Loss: 3.6128, Accuracy: 0.0217, Time: 16.61 seconds
Epoch 3, Loss: 3.6116, Accuracy: 0.0239, Time: 14.09 seconds
Epoch 4, Loss: 3.6116, Accuracy: 0.0220, Time: 14.21 seconds
Epoch 5, Loss: 3.6114, Accuracy: 0.0261, Time: 14.88 seconds
Epoch 6, Loss: 3.6114, Accuracy: 0.0226, Time: 14.03 seconds
Epoch 7, Loss: 3.6116, Accuracy: 0.0207, Time: 15.83 seconds
Epoch 8, Loss: 3.6115, Accuracy: 0.0193, Time: 14.14 seconds
Epoch 9, Loss: 3.6117, Accuracy: 0.0255, Time: 14.14 seconds
Epoch 10, Loss: 3.6145, Accuracy: 0.0239, Time: 14.28 seconds
Epoch 11, Loss: 3.6171, Accuracy: 0.0236, Time: 14.03 seconds
Epoch 12, Loss: 3.6124, Accuracy: 0.0212, Time: 13.81 seconds
Epoch 13, Loss: 3.6117, Accuracy: 0.0266, Time: 13.94 seconds
Epoch 14, Loss: 3.6115, Accuracy: 0.0207, Time: 13.92 seconds
Epoch 15, Loss: 3.6118, Accuracy: 0.0245, Time: 14.33 seconds
Epoch 16, Loss: 3.6115, Accuracy: 0.0245, Time: 13.80 sec

#### **ResNet-50**

In [9]:
# Train and evaluate ResNet50
model_name = 'ResNet50'
print(f"\nTraining {model_name}...\n")
model = models[model_name]
optimizer = optim.Adam(model.parameters(), lr=0.001)
start_time = time.time()
train_model(model, trainloader, criterion, optimizer, num_epochs=25)
training_time = time.time() - start_time
accuracy, f1 = evaluate_model(model, testloader)

print(f"{model_name} - Top-1 Accuracy: {accuracy * 100:.2f}%")
print(f"{model_name} - F1 Score: {f1 * 100:.2f}%")
print(f"{model_name} - Training Time: {training_time:.2f} seconds")


Training ResNet50...

Epoch 1, Loss: 2.2090, Accuracy: 0.3519, Time: 292.24 seconds
Epoch 2, Loss: 1.4822, Accuracy: 0.5380, Time: 290.04 seconds
Epoch 3, Loss: 1.0116, Accuracy: 0.6715, Time: 286.74 seconds
Epoch 4, Loss: 0.8466, Accuracy: 0.7220, Time: 302.06 seconds
Epoch 5, Loss: 0.6111, Accuracy: 0.7943, Time: 291.81 seconds
Epoch 6, Loss: 0.4466, Accuracy: 0.8598, Time: 285.60 seconds
Epoch 7, Loss: 0.4088, Accuracy: 0.8679, Time: 285.66 seconds
Epoch 8, Loss: 0.4812, Accuracy: 0.8432, Time: 292.26 seconds
Epoch 9, Loss: 0.3014, Accuracy: 0.8995, Time: 277.23 seconds
Epoch 10, Loss: 0.2504, Accuracy: 0.9255, Time: 276.89 seconds
Epoch 11, Loss: 0.1651, Accuracy: 0.9503, Time: 276.93 seconds
Epoch 12, Loss: 0.2067, Accuracy: 0.9364, Time: 280.37 seconds
Epoch 13, Loss: 0.2453, Accuracy: 0.9234, Time: 300.74 seconds
Epoch 14, Loss: 0.2667, Accuracy: 0.9079, Time: 290.67 seconds
Epoch 15, Loss: 0.2254, Accuracy: 0.9266, Time: 290.46 seconds
Epoch 16, Loss: 0.1484, Accuracy: 0.9500,

#### **VGG-16**

In [8]:
# Train and evaluate VGG16
model_name = 'VGG16'
print(f"\nTraining {model_name}...\n")
model = models[model_name]
optimizer = optim.Adam(model.parameters(), lr=0.001)  # lr=0.001 is Adam's default learning rate
start_time = time.time()
train_model(model, trainloader, criterion, optimizer, num_epochs=25) # num_epochs=25 due to hardware limitation
training_time = time.time() - start_time
accuracy, f1 = evaluate_model(model, testloader)

print(f"{model_name} - Top-1 Accuracy: {accuracy * 100:.2f}%")
print(f"{model_name} - F1 Score: {f1 * 100:.2f}%")
print(f"{model_name} - Training Time: {training_time:.2f} seconds")


Training VGG16...

Epoch 1, Loss: 3.6632, Accuracy: 0.0253, Time: 652.60 seconds
Epoch 2, Loss: 3.6194, Accuracy: 0.0291, Time: 628.79 seconds
Epoch 3, Loss: 3.6097, Accuracy: 0.0318, Time: 623.50 seconds
Epoch 4, Loss: 3.5931, Accuracy: 0.0389, Time: 651.59 seconds
Epoch 5, Loss: 3.5831, Accuracy: 0.0351, Time: 633.40 seconds
Epoch 6, Loss: 3.5656, Accuracy: 0.0383, Time: 615.75 seconds
Epoch 7, Loss: 3.5575, Accuracy: 0.0424, Time: 604.34 seconds
Epoch 8, Loss: 3.5214, Accuracy: 0.0495, Time: 605.08 seconds
Epoch 9, Loss: 3.4742, Accuracy: 0.0573, Time: 604.79 seconds
Epoch 10, Loss: 3.4518, Accuracy: 0.0668, Time: 612.66 seconds
Epoch 11, Loss: 3.3979, Accuracy: 0.0753, Time: 613.74 seconds
Epoch 12, Loss: 3.3268, Accuracy: 0.0927, Time: 613.15 seconds
Epoch 13, Loss: 3.2965, Accuracy: 0.0981, Time: 614.07 seconds
Epoch 14, Loss: 3.1939, Accuracy: 0.1255, Time: 614.23 seconds
Epoch 15, Loss: 3.1582, Accuracy: 0.1313, Time: 614.65 seconds
Epoch 16, Loss: 3.0005, Accuracy: 0.1701, Ti

#### **GoogleNet: Inception V3**

In [11]:
# Training and evaluating InceptionV3
print("\nTraining InceptionV3...\n")
model_inceptionv3 = models['InceptionV3']
optimizer_inceptionv3 = optim.Adam(model_inceptionv3.parameters(), lr=0.001)

start_time = time.time()
train_model_inception(model_inceptionv3, trainloader_inception, criterion, optimizer_inceptionv3, num_epochs=25)
end_time = time.time()

accuracy_inceptionv3, f1_inceptionv3 = evaluate_model(model_inceptionv3, testloader_inception)
print(f"InceptionV3 - Top-1 Accuracy: {accuracy_inceptionv3 * 100:.2f}%")
print(f"InceptionV3 - F1 Score: {f1_inceptionv3 * 100:.2f}%")
print(f"InceptionV3 - Training Time: {end_time - start_time:.2f} seconds")


Training InceptionV3...

Epoch 1, Loss: 1.609910361663155, Accuracy: 52.88%, Time: 1269.5452840328217 seconds
Epoch 2, Loss: 0.923236914821293, Accuracy: 72.15%, Time: 1257.8459775447845 seconds
Epoch 3, Loss: 0.7464942776638529, Accuracy: 76.36%, Time: 1260.9836275577545 seconds
Epoch 4, Loss: 0.5269273367912873, Accuracy: 83.75%, Time: 1267.3705883026123 seconds
Epoch 5, Loss: 0.45045023275458296, Accuracy: 85.22%, Time: 1274.8507289886475 seconds
Epoch 6, Loss: 0.28518909782819124, Accuracy: 90.92%, Time: 1283.828443288803 seconds
Epoch 7, Loss: 0.26359525316435356, Accuracy: 91.39%, Time: 1293.1750679016113 seconds
Epoch 8, Loss: 0.2554628939408323, Accuracy: 91.82%, Time: 1301.6602628231049 seconds
Epoch 9, Loss: 0.2650840433395427, Accuracy: 91.85%, Time: 1311.5917851924896 seconds
Epoch 10, Loss: 0.23584739147968914, Accuracy: 92.31%, Time: 1322.4633672237396 seconds
Epoch 11, Loss: 0.19334339828675856, Accuracy: 94.21%, Time: 1333.7199556827545 seconds
Epoch 12, Loss: 0.188160

#### **EfficientNet-B1**

In [12]:
# Training and evaluating EfficientNet-B1
print("\nTraining EfficientNet-B1...\n")
model_efficientnet_b1 = models['EfficientNet-B1'].to(device)  # Ensure the model is on GPU
optimizer_efficientnet_b1 = optim.Adam(model_efficientnet_b1.parameters(), lr=0.001)

start_time = time.time()
train_model(model_efficientnet_b1, trainloader, criterion, optimizer_efficientnet_b1, num_epochs=25)
training_time = time.time() - start_time

accuracy_efficientnet_b1, f1_efficientnet_b1 = evaluate_model(model_efficientnet_b1, testloader)
print(f"EfficientNet-B1 - Top-1 Accuracy: {accuracy_efficientnet_b1 * 100:.2f}%")
print(f"EfficientNet-B1 - F1 Score: {f1_efficientnet_b1:.2f}")
print(f"EfficientNet-B1 - Training Time: {training_time:.2f} seconds")


Training EfficientNet-B1...

Epoch 1, Loss: 1.0842, Accuracy: 0.7073, Time: 480.45 seconds
Epoch 2, Loss: 0.3706, Accuracy: 0.8842, Time: 480.69 seconds
Epoch 3, Loss: 0.1906, Accuracy: 0.9421, Time: 484.11 seconds
Epoch 4, Loss: 0.1444, Accuracy: 0.9568, Time: 485.00 seconds
Epoch 5, Loss: 0.1425, Accuracy: 0.9554, Time: 486.38 seconds
Epoch 6, Loss: 0.1237, Accuracy: 0.9611, Time: 487.85 seconds
Epoch 7, Loss: 0.1460, Accuracy: 0.9543, Time: 490.78 seconds
Epoch 8, Loss: 0.1115, Accuracy: 0.9649, Time: 493.35 seconds
Epoch 9, Loss: 0.0842, Accuracy: 0.9736, Time: 495.58 seconds
Epoch 10, Loss: 0.0753, Accuracy: 0.9780, Time: 496.48 seconds
Epoch 11, Loss: 0.0549, Accuracy: 0.9815, Time: 499.75 seconds
Epoch 12, Loss: 0.0652, Accuracy: 0.9793, Time: 501.13 seconds
Epoch 13, Loss: 0.0757, Accuracy: 0.9761, Time: 505.63 seconds
Epoch 14, Loss: 0.0855, Accuracy: 0.9731, Time: 505.63 seconds
Epoch 15, Loss: 0.0816, Accuracy: 0.9723, Time: 507.61 seconds
Epoch 16, Loss: 0.0650, Accuracy: 

#### **EfficientNet-B3**

In [13]:
# Training and evaluating EfficientNet-B3
print("\nTraining EfficientNet-B3...\n")
model_efficientnet_b3 = models['EfficientNet-B3'].to(device)  # Ensure the model is on GPU
optimizer_efficientnet_b3 = optim.Adam(model_efficientnet_b3.parameters(), lr=0.001)

start_time = time.time()
train_model(model_efficientnet_b3, trainloader, criterion, optimizer_efficientnet_b3, num_epochs=25)
training_time = time.time() - start_time

accuracy_efficientnet_b3, f1_efficientnet_b3 = evaluate_model(model_efficientnet_b3, testloader)
print(f"EfficientNet-B3 - Top-1 Accuracy: {accuracy_efficientnet_b3 * 100:.2f}%")
print(f"EfficientNet-B3 - F1 Score: {f1_efficientnet_b3:.2f}")
print(f"EfficientNet-B3 - Training Time: {training_time:.2f} seconds")


Training EfficientNet-B3...

Epoch 1, Loss: 1.1324, Accuracy: 0.7019, Time: 904.40 seconds
Epoch 2, Loss: 0.3803, Accuracy: 0.8837, Time: 903.99 seconds
Epoch 3, Loss: 0.1780, Accuracy: 0.9484, Time: 925.47 seconds
Epoch 4, Loss: 0.1479, Accuracy: 0.9492, Time: 951.82 seconds
Epoch 5, Loss: 0.1219, Accuracy: 0.9633, Time: 904.46 seconds
Epoch 6, Loss: 0.1098, Accuracy: 0.9674, Time: 902.85 seconds
Epoch 7, Loss: 0.1035, Accuracy: 0.9685, Time: 905.43 seconds
Epoch 8, Loss: 0.0886, Accuracy: 0.9715, Time: 902.53 seconds
Epoch 9, Loss: 0.1027, Accuracy: 0.9688, Time: 902.52 seconds
Epoch 10, Loss: 0.1015, Accuracy: 0.9707, Time: 962.46 seconds
Epoch 11, Loss: 0.0630, Accuracy: 0.9804, Time: 902.94 seconds
Epoch 12, Loss: 0.0728, Accuracy: 0.9785, Time: 942.58 seconds
Epoch 13, Loss: 0.0972, Accuracy: 0.9709, Time: 924.88 seconds
Epoch 14, Loss: 0.1045, Accuracy: 0.9655, Time: 899.98 seconds
Epoch 15, Loss: 0.0862, Accuracy: 0.9745, Time: 901.31 seconds
Epoch 16, Loss: 0.0666, Accuracy: 

#### **EfficientNet-B5**

In [15]:

# Training and evaluating EfficientNet-B5
print("\nTraining EfficientNet-B5...\n")
model_efficientnet_b5 = models['EfficientNet-B5'].to(device)  # Ensure the model is on GPU
optimizer_efficientnet_b5 = optim.Adam(model_efficientnet_b5.parameters(), lr=0.001)

start_time = time.time()
train_model(model_efficientnet_b5, trainloader, criterion, optimizer_efficientnet_b5, num_epochs=25)
training_time = time.time() - start_time

accuracy_efficientnet_b5, f1_efficientnet_b5 = evaluate_model(model_efficientnet_b5, testloader)
print(f"EfficientNet-B5 - Top-1 Accuracy: {accuracy_efficientnet_b5 * 100:.2f}%")
print(f"EfficientNet-B5 - F1 Score: {f1_efficientnet_b5:.2f}")
print(f"EfficientNet-B5 - Training Time: {training_time:.2f} seconds")



Training EfficientNet-B5...



OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 
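
One plausible contributor to this error is that the model-adaptation cell moves all seven models to the GPU at once, so the previously trained networks still occupy memory when EfficientNet-B5 starts. The sketch below shows one possible mitigation under that assumption: moving a finished model back to the CPU and releasing cached memory. `release_model` is a hypothetical helper, and the small linear layer stands in for a trained network.

```python
import gc

import torch
import torch.nn as nn


def release_model(model):
    """Move a finished model off the GPU and free cached CUDA memory."""
    model.to("cpu")
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    return model


# Toy stand-in for a model that has finished training.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
finished = nn.Linear(8, 37).to(device)
finished = release_model(finished)
print(next(finished.parameters()).device)
```

The optimizer for a finished model also holds GPU tensors, so deleting it (for example with `del optimizer`) before calling the helper frees more memory. Passing a smaller `batch_size` to the `DataLoader` used for EfficientNet-B5 is another option worth trying.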