# Fine-tune AlexNet to classify bees and ants

Fine-tune AlexNet on the [Hymenoptera dataset](https://www.kaggle.com/datasets/ajayrana/hymenoptera-data) from Kaggle.

The training set has 124 images of ants and 121 images of bees. One training image labelled as an ant just says "No image found". I removed it, which leaves 123 images of ants and 244 training images in total.

The validation set has 70 images of ants and 83 of bees.

According to [Wikipedia](https://en.wikipedia.org/wiki/Hymenoptera), the order Hymenoptera includes insects besides ants and bees (such as wasps). The Kaggle dataset, however, includes only images of ants and bees.

The *Mastering PyTorch* text references the same Kaggle site as the source of the data, but says that there are 240 training images and 150 validation images, equally split between the two classes. I don't know why the counts differ.

I freeze all of the parameters of the pretrained model except for the last two linear layers of the classifier. 

- Imports.
- Calculate the mean and std for data normalization.
- Datasets and dataloaders. The datasets are labeled train and val, but the val set is really a test set.
- Download pretrained AlexNet model. Modify the last layer so that it outputs 2 classes (rather than the 1,000 ImageNet classes). Freeze all parameters except for those of the last two linear layers. This leaves about 16.8 million trainable parameters.
- Define a function to train the model.
- Define the device (GPU if available, otherwise CPU), optimizer, and loss function, and set the seed.
- Briefly train the model.

There's no hyperparameter tuning here. 

### Imports

In [1]:
import os
import time
import numpy as np

import torch
import torch.nn as nn
from torchvision import datasets, models, transforms
from torchvision.models import alexnet, AlexNet_Weights
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
from einops import rearrange, reduce

path = 'data/hymenoptera_data'

### Calculate mean and standard deviation for normalization

In [2]:
untransformed_train_dataset = datasets.ImageFolder(os.path.join(path, 'train'),
                                  transform=transforms.ToTensor())
len(untransformed_train_dataset)                                 

244

In [3]:
image_sizes = torch.empty(len(untransformed_train_dataset), dtype=torch.int32)
image_sums = torch.empty((len(untransformed_train_dataset), 3), dtype=torch.float32)

for idx, (image, _) in enumerate(untransformed_train_dataset):
    image = rearrange(image, 'c h w -> c (h w)')
    image_sum = reduce(image, 'c x -> c', 'sum')
    image_sizes[idx] = image.shape[-1]
    image_sums[idx] = image_sum

total_sum = reduce(image_sums, 'n c -> c', 'sum')
total_size = reduce(image_sizes, 'n -> ()', 'sum')
means = total_sum / rearrange(total_size, '() -> () 1') 
means

tensor([[0.5172, 0.4753, 0.3484]])

In [4]:
image_vars = torch.empty((len(untransformed_train_dataset), 3), dtype=torch.float32)
for idx, (image, _) in enumerate(untransformed_train_dataset):
    image = rearrange(image, 'c h w -> c (h w)')
    image -= rearrange(means, '1 c -> c 1')
    image *= image
    image_sum = reduce(image, 'c x -> c', 'sum')
    image_vars[idx] = image_sum

total_var = reduce(image_vars, 'n c -> c', 'sum')
std_sqs = total_var / rearrange(total_size, '() -> () 1')  
stds = torch.sqrt(std_sqs)
stds

tensor([[0.2776, 0.2575, 0.2865]])

In [5]:
means = rearrange(means, '1 c -> c')
stds = rearrange(stds, '1 c -> c')
means, stds

(tensor([0.5172, 0.4753, 0.3484]), tensor([0.2776, 0.2575, 0.2865]))

The text uses `[0.490, 0.449, 0.411]` and `[0.231, 0.221, 0.230]` for the channel means and standard deviations respectively. I don't know where these numbers come from. Since we seem to be using different versions of the dataset, it makes sense to compute the means and standard deviations from the images actually used here.
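
The two loops above pass over the training set twice: once for the channel means and once for the squared deviations. As a rough sketch of an alternative, the same pooled statistics can be accumulated in a single pass from per-channel sums and sums of squares, using Var[x] = E[x²] − (E[x])². The example uses plain Python lists as stand-ins for image channels so the arithmetic is easy to follow. The function `pooled_channel_stats` and the toy pixel values are hypothetical and not part of the notebook.

```python
import math

def pooled_channel_stats(images):
    """Return per-channel (means, stds) pooled over every pixel of every image.

    `images` is a list of images; each image is a list of channels; each
    channel is a flat list of pixel values. Images may differ in size.
    """
    n_channels = len(images[0])
    sums = [0.0] * n_channels
    sq_sums = [0.0] * n_channels
    n_pixels = 0  # pixels per channel, summed over all images

    for image in images:
        n_pixels += len(image[0])
        for c, channel in enumerate(image):
            sums[c] += sum(channel)
            sq_sums[c] += sum(v * v for v in channel)

    means = [s / n_pixels for s in sums]
    # Population variance E[x^2] - (E[x])^2, clamped at 0 against round-off.
    stds = [math.sqrt(max(sq / n_pixels - m * m, 0.0))
            for sq, m in zip(sq_sums, means)]
    return means, stds

# Two toy "images" with two channels each: one has 4 pixels, the other 2.
toy_images = [
    [[0.0, 0.5, 1.0, 0.5], [0.2, 0.2, 0.2, 0.2]],
    [[1.0, 1.0], [0.4, 0.0]],
]
means, stds = pooled_channel_stats(toy_images)
print(means, stds)
```

The one-pass formula subtracts two numbers of similar size, so on a real dataset it is probably safer to accumulate in 64-bit floats than in `float32`.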

### Datasets and dataloaders

In [6]:
data_transformers = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(means, stds)
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(means, stds)
    ])
}
image_datasets = {phase: datasets.ImageFolder(os.path.join(path, phase), data_transformers[phase]) for phase in ['train', 'val']}

data_loaders = {phase: DataLoader(image_datasets[phase], batch_size=8,
    shuffle=True) for phase in ['train', 'val']}

dataset_sizes = {phase: len(image_datasets[phase]) for phase in ['train', 'val']}

classes = image_datasets['train'].classes
classes

['ants', 'bees']
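
Neither dataset size is a multiple of the batch size, and `DataLoader` keeps the final, smaller batch unless `drop_last=True` is passed. A small sketch of the resulting batch counts, using the sizes from this notebook (244 training images and 70 + 83 = 153 validation images). The helper `batches_per_epoch` is a hypothetical name introduced only for this illustration.

```python
import math

def batches_per_epoch(n_samples, batch_size):
    """Number of batches, and size of the final batch, when incomplete batches are kept."""
    n_batches = math.ceil(n_samples / batch_size)
    last_batch_size = n_samples - (n_batches - 1) * batch_size
    return n_batches, last_batch_size

for phase, n_samples in {'train': 244, 'val': 153}.items():
    n_batches, last_batch_size = batches_per_epoch(n_samples, batch_size=8)
    print(f'{phase}: {n_batches} batches, final batch size {last_batch_size}')
```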

### Modifying pretrained model; freezing most parameters

In [7]:
alexnet_to_finetune = alexnet(weights=AlexNet_Weights.IMAGENET1K_V1)

for name, module in alexnet_to_finetune.named_children():
    print(name, module)

features Sequential(
  (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
  (1): ReLU(inplace=True)
  (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (4): ReLU(inplace=True)
  (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (7): ReLU(inplace=True)
  (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (9): ReLU(inplace=True)
  (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (11): ReLU(inplace=True)
  (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
)
avgpool AdaptiveAvgPool2d(output_size=(6, 6))
classifier Sequential(
  (0): Dropout(p=0.5, inplace=False)
  (1): Linear(in_features=9216, out_features=4096, bias=True)
  (2): ReLU(inplace=True)
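  (3): Dropout(p=0.5, inplace=False)
  (4): Linear(in_features=4096, out_features=4096, bias=True)
  (5): ReLU(inplace=True)
  (6): Linear(in_features=4096, out_features=1000, bias=True)
)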

In [8]:
alexnet_to_finetune.classifier[6] = nn.Linear(4096, 2)

for param in alexnet_to_finetune.features.parameters():
    param.requires_grad = False

for param in alexnet_to_finetune.classifier.parameters():
    param.requires_grad = False

for classifier_layer in [4, 6]:
    for param in alexnet_to_finetune.classifier[classifier_layer].parameters():
        param.requires_grad = True

def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f'The model has {count_parameters(alexnet_to_finetune):,} trainable parameters')

The model has 16,789,506 trainable parameters
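
The trainable count can be checked by hand: a linear layer has `in_features * out_features` weights plus `out_features` biases, and only `classifier[4]` (4096 to 4096) and the new `classifier[6]` (4096 to 2) are left unfrozen. A minimal sketch of that arithmetic, where `linear_params` is a hypothetical helper rather than a PyTorch function:

```python
def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

# The two unfrozen layers: classifier[4] and the replaced classifier[6].
trainable = linear_params(4096, 4096) + linear_params(4096, 2)
print(f'{trainable:,}')  # 16,789,506

# For comparison, the frozen first linear layer, classifier[1], is larger still.
frozen_first_linear = linear_params(9216, 4096)
print(f'{frozen_first_linear:,}')  # 37,752,832
```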


### Function to fine-tune pretrained model

In [9]:
def finetine_model(model, device, dataloader, optimizer, 
                    loss_function, epochs=10):
    start_time = time.time()
    model.to(device)

    for epoch in range(epochs):
        print(f'Epoch {epoch+1}/{epochs}')
        print('-'*10)

        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()
            else:
                model.eval()

            running_loss = 0.
            running_corrects = 0

            for images, labels in dataloader[phase]:
                images, labels = images.to(device), labels.to(device)
                optimizer.zero_grad()
                # Track gradients only during the training phase.
                with torch.set_grad_enabled(phase=='train'):
                    # The model returns raw logits; CrossEntropyLoss applies
                    # log-softmax internally.
                    logits = model(images)
                    loss = loss_function(logits, labels)
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()
                # loss.item() is the mean loss over the batch, so epoch_loss
                # below is roughly the per-image loss divided by the batch size.
                running_loss += loss.item()
                predictions = logits.argmax(dim=1)
                batch_correct = predictions.eq(labels).sum().item()
                running_corrects += batch_correct

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects / dataset_sizes[phase]
            print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')

    time_elapsed = time.time() - start_time
    print(f'Training complete in {time_elapsed//60:.0f}m {time_elapsed%60:.0f}s')
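
`nn.CrossEntropyLoss` averages over the batch by default, so `loss.item()` is a per-batch mean. Summing those means and dividing by the number of images, as `finetine_model` does, gives a value close to the mean per-image loss divided by the batch size. The sketch below contrasts the two calculations. The batch losses are made up for illustration and are not results from this notebook, and both function names are hypothetical.

```python
def reported_loss(batch_means, n_samples):
    """What the training function prints: sum of batch means / dataset size."""
    return sum(batch_means) / n_samples

def per_sample_loss(batch_means, batch_sizes):
    """Sample-weighted mean: each batch mean counts once per image in the batch."""
    total = sum(mean * size for mean, size in zip(batch_means, batch_sizes))
    return total / sum(batch_sizes)

# Hypothetical epoch: three full batches of 8 images and a final batch of 4.
batch_sizes = [8, 8, 8, 4]
batch_means = [0.50, 0.40, 0.45, 0.30]

reported = reported_loss(batch_means, sum(batch_sizes))
per_sample = per_sample_loss(batch_means, batch_sizes)
print(f'reported: {reported:.4f}, per-sample: {per_sample:.4f}')
```

Inside the training loop, the sample-weighted version corresponds to accumulating `loss.item() * images.size(0)` instead of `loss.item()`.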

### Define device, optimizer, and loss function, and set seed

In [10]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Frozen parameters receive no gradients, so SGD leaves them unchanged.
optimizer = torch.optim.SGD(alexnet_to_finetune.parameters(), lr=0.001)
loss_function = nn.CrossEntropyLoss()
torch.manual_seed(0)

<torch._C.Generator at 0x2521869c3b0>

### Train model

In [11]:
finetine_model(model=alexnet_to_finetune, device=device, 
        dataloader=data_loaders, optimizer=optimizer,
        loss_function=loss_function, epochs=10) 

Epoch 1/10
----------
train Loss: 0.0623 Acc: 0.7787
val Loss: 0.0403 Acc: 0.8693
Epoch 2/10
----------
train Loss: 0.0427 Acc: 0.8566
val Loss: 0.0322 Acc: 0.9085
Epoch 3/10
----------
train Loss: 0.0336 Acc: 0.9098
val Loss: 0.0490 Acc: 0.8889
Epoch 4/10
----------
train Loss: 0.0350 Acc: 0.8934
val Loss: 0.0275 Acc: 0.9216
Epoch 5/10
----------
train Loss: 0.0257 Acc: 0.9139
val Loss: 0.0280 Acc: 0.9150
Epoch 6/10
----------
train Loss: 0.0291 Acc: 0.8975
val Loss: 0.0292 Acc: 0.8954
Epoch 7/10
----------
train Loss: 0.0277 Acc: 0.9139
val Loss: 0.0286 Acc: 0.9150
Epoch 8/10
----------
train Loss: 0.0242 Acc: 0.9426
val Loss: 0.0269 Acc: 0.9085
Epoch 9/10
----------
train Loss: 0.0233 Acc: 0.9344
val Loss: 0.0272 Acc: 0.9150
Epoch 10/10
----------
train Loss: 0.0184 Acc: 0.9631
val Loss: 0.0265 Acc: 0.9150
Training complete in 1m 38s
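
With only 153 validation images, each image is worth about 0.65 percentage points of accuracy, so small epoch-to-epoch changes in validation accuracy may be mostly noise. A rough sketch that recovers the number of correct predictions from a printed accuracy and attaches a binomial standard error. This assumes the images are independent draws, which is only an approximation, and `accuracy_summary` is a hypothetical helper.

```python
import math

def accuracy_summary(accuracy, n_samples):
    """Recover the correct-prediction count and a binomial standard error."""
    n_correct = round(accuracy * n_samples)
    std_err = math.sqrt(accuracy * (1 - accuracy) / n_samples)
    return n_correct, std_err

# Final-epoch validation accuracy printed above.
n_correct, std_err = accuracy_summary(0.9150, 153)
print(f'{n_correct}/153 correct, standard error about {std_err:.3f}')
```

Under that approximation, the spread in validation accuracy from epoch 4 onward (0.8954 to 0.9216) is of similar size to one standard error.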
