## Goal of the project 
- The goal of your project is to develop a supervised machine learning algorithm (deep
or not) that is able to predict presence/absence of at least 5 classes in a test image.
- animal_classes = ["bird", "cat", "cow", "dog", "horse", "sheep"]


## About the dataset
- PASCAL Visual Object Classes (VOC) challenge
(http://host.robots.ox.ac.uk/pascal/VOC/voc2012/ ) has been used heavily by the
computer vision community for the developments of machine (deep) learning
algorisms in classification, detection and segmentation. There are 20 classes in total. 

### Dataset Structure
1. `Annotations`: Contains XML files with bounding box annotations for objects in each image.
2. `ImageSets`: Contains lists of image IDs for different splits (e.g., train, val, trainval).
3. `JPEGImages`: Contains the actual image files.
4. `SegmentationClass`: Contains semantic segmentation masks for images.
5. `SegmentationObject` : Contains instance segmentation masks for images.

<br>
VOCdevkit/ <br>
    ├── VOC2012/ <br>
    │   ├── Annotations/    # XML files for bounding boxes <br>
    │   ├── ImageSets/      # Splits: train.txt, val.txt <br>
    │   ├── JPEGImages/     # Raw images <br>
    │   ├── SegmentationClass/     # Raw images <br>
    │   ├── SegmentationObject/     # Raw images <br>   

# Step 1 - Set up environment by importing libraries 
- $ `pip install numpy pandas torch torchvision matplotlib opencv-python xmltodict`


In [75]:
import os                                            # working with file systems
import xml.etree.ElementTree as ET                   # Used for parsing XML files (Annotations)
import cv2                                           # Computer Vision Library used to manipulate (resize, convert format, normalize pixel values) images.
import numpy as np                                   # Numerical computations for multi-dimensional arrays
import torch                                         # deep learning library 
import torch.nn as nn                                # contains modules to build neural netwroks
import torch.optim as optim                          # Provides optimization algorithms for training neural networks.
from sklearn.metrics import classification_report    # machine learning library, we specifically need metrics
from torch.utils.data import DataLoader, TensorDataset #  used for handling datasets and data loading
from sklearn.model_selection import KFold

print("Libraries loaded")

Libraries loaded


# Step 2 -  Extract Information 
- Extract relevant data (image paths and labels) from the XML annotation files. We do this by importing a custom method called `parse_annotations` from `methods.py`
- I am only interested in the animals of the dataset which is why i add the animal_classes to filter out non-animals

In [76]:
from methods import parse_annotations  # custom method to parse annotations
from methods import parse_animal_annotations

annotations_directory = "./data/VOCdevkit/VOC2012/Annotations"
image_directory = "./data/VOCdevkit/VOC2012/JPEGImages"
trainval_file = "./data/VOCdevkit/VOC2012/ImageSets/Main/trainval.txt"
print("paths extracted ...")


# Define your target classes
animal_classes = ["bird", "cat", "cow", "dog", "horse", "sheep"]

# extract annotations for the training set
data = parse_animal_annotations(annotations_directory, image_directory, trainval_file, animal_classes)
# View the first 5 results
print("train data extracted here are the first 5 results ...")
print(data[:5])

paths extracted ...
train data extracted here are the first 5 results ...
[('./data/VOCdevkit/VOC2012/JPEGImages\\2008_000008.jpg', {'horse'}), ('./data/VOCdevkit/VOC2012/JPEGImages\\2008_000009.jpg', {'cow'}), ('./data/VOCdevkit/VOC2012/JPEGImages\\2008_000019.jpg', {'dog'}), ('./data/VOCdevkit/VOC2012/JPEGImages\\2008_000026.jpg', {'dog'}), ('./data/VOCdevkit/VOC2012/JPEGImages\\2008_000053.jpg', {'dog'})]


In [77]:
all_labels = {label for _, labels in data for label in labels}
print(f"Extracted classes: {all_labels}")


Extracted classes: {'horse', 'dog', 'bird', 'sheep', 'cat', 'cow'}


# Step 3: Preprocess the Data
- `Resize image` to a fixed `224x224` size to ensure uniformity. This is also the standard size for pretrained models and deep learning models as they require uniformity.
- `Normalize pixel` values to 0, 1 for easier processing. This is done by utilizing the custom preprocess_images method from `methods.py`
- `Convert labels to multi-hot vectors`

In [78]:
from methods import preprocess_images

images = preprocess_images(data, size=(224, 224))
print(f"Preprocessed {len(images)} images.")

Preprocessed 4199 images.


### Step 3.1 Convert labels into Multi-hot vectors so the machine understands it
- 1 -> appears
- 0 -> did not appear
- animal_classes = ["bird", "cat", "cow", "dog", "horse", "sheep"]


In [79]:
from methods import convert_labels


# Convert training labels to multi-hot vectors
labels = convert_labels(data, animal_classes)
print(f"Label vectors shape: {labels.shape}")


Label vectors shape: (4199, 6)


# Step 4 Train and Evaluate Model

In [29]:
from AnimalClassifier import AnimalClassifier

num_classes = len(animal_classes)  # 6 total animals hence 6 classes
model = AnimalClassifier(num_classes)

# Loss Function and Optimizer
loss_function = nn.BCELoss()  # binary cross entropy Loss
optimizer = optim.Adam(model.parameters(), lr=0.001) # adam optimizer 

print("Model, loss function, and optimizer instantiated ...")


Model, loss function, and optimizer instantiated ...


In [22]:
# Convert preprocessed data to PyTorch tensors
images_tensor = torch.tensor(images).permute(0, 3, 1, 2).float()  # (N, C, H, W)
labels_tensor = torch.tensor(labels).float()  # (N, num_classes)

# Create a DataLoader for batching
dataset = TensorDataset(images_tensor, labels_tensor)
loader = DataLoader(dataset, batch_size=32, shuffle=True)


In [32]:
# Training loop
num_epochs = 5  # Number of epochs

for epoch in range(num_epochs):
    model.train()  # Set the model to training mode
    running_loss = 0.0
    
    for inputs, labels in loader:
        # Zero the parameter gradients
        optimizer.zero_grad()
        
        # Forward pass
        outputs = model(inputs)
        
        # Compute loss
        loss = loss_function(outputs, labels)
        
        # Backward pass and optimize
        loss.backward()
        optimizer.step()
        
        # Track the running loss
        running_loss += loss.item()
    
    print(f"Epoch {epoch+1}/{num_epochs}, Loss: {running_loss / len(loader)}")


Epoch 1/5, Loss: 0.33866116539998486
Epoch 2/5, Loss: 0.2508147062
Epoch 3/5, Loss: 0.13672318117636623
Epoch 4/5, Loss: 0.0617789377345506
Epoch 5/5, Loss: 0.024150511697922466


## Step 5 Evaluate model with kfold cross validation to get real results
- The results from the previous model are unrealiable as it will definitely get a perfect score as it is being evaluated on things it has seen to get more results we must run k-fold cross validation to ensure reliable results


In [80]:
# Initialize k-Fold Cross-Validation
k = 5  # Number of folds
kf = KFold(n_splits=k, shuffle=True, random_state=42)

# Convert images and labels to NumPy arrays
images = np.array(images) # to be used by kfold to split as it needs to be iterable 
labels = np.array(labels)


In [81]:
print(images.shape)
print(labels.shape)

(4199, 224, 224, 3)
(4199, 6)


In [82]:
# Perform k-Fold Cross-Validation
fold_results = []  # Store F1-scores for each fold

for fold, (train_idx, val_idx) in enumerate(kf.split(images)):
    print(f"Fold {fold + 1}/{k}")
    
    # Split data into training and validation sets for this fold
    train_images, val_images = images[train_idx], images[val_idx]
    train_labels, val_labels = labels[train_idx], labels[val_idx]
    
    # Debugging: Check shapes and unique labels
    print(f"Fold {fold + 1}: train_labels shape = {train_labels.shape}, val_labels shape = {val_labels.shape}")
    print(f"Fold {fold + 1}: train unique labels = {np.unique(train_labels.argmax(axis=1))}")
    print(f"Fold {fold + 1}: val unique labels = {np.unique(val_labels.argmax(axis=1))}")
    
    # Convert to PyTorch tensors
    train_images_tensor = torch.tensor(train_images).permute(0, 3, 1, 2).float()
    train_labels_tensor = torch.tensor(train_labels).float()
    val_images_tensor = torch.tensor(val_images).permute(0, 3, 1, 2).float()
    val_labels_tensor = torch.tensor(val_labels).float()
    
    # Create DataLoaders
    train_dataset = TensorDataset(train_images_tensor.clone(), train_labels_tensor.clone())
    val_dataset = TensorDataset(val_images_tensor.clone(), val_labels_tensor.clone())
    train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)
    
    # Initialize the model, loss function, and optimizer
    model = AnimalClassifier(num_classes=len(animal_classes))
    criterion = nn.BCELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    
    # Train the model
    for epoch in range(2):  # Adjust epochs based on your setup
        model.train()
        running_loss = 0.0
        for batch_idx, (inputs, labels) in enumerate(train_loader):
            # Debugging: Check input and label shapes
            print(f"Epoch {epoch + 1}, Batch {batch_idx}: inputs shape = {inputs.shape}, labels shape = {labels.shape}")
            
            optimizer.zero_grad()
            outputs = model(inputs)
            
            # Debugging: Check output shape
            print(f"Epoch {epoch + 1}, Batch {batch_idx}: outputs shape = {outputs.shape}")
            
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        print(f"Epoch {epoch + 1}, Loss: {running_loss / len(train_loader)}")
    
    # Validate the model on this fold's validation set
    model.eval()
    val_predictions = []
    val_true_labels = []
    with torch.no_grad():
        for inputs, labels in val_loader:
            outputs = model(inputs)
            preds = (outputs > 0.5).int()
            val_predictions.extend(preds.tolist())
            val_true_labels.extend(labels.tolist())
    
    # Debugging: Reset validation metrics
    print(f"Fold {fold + 1}: Resetting validation predictions and labels.")
    val_predictions.clear()
    val_true_labels.clear()

    # Restrict predictions and true labels to valid classes
    val_true_labels = np.array(val_true_labels)[:, :len(animal_classes)]
    val_predictions = np.array(val_predictions)[:, :len(animal_classes)]

    # Calculate F1-score for this fold
    from sklearn.metrics import f1_score
    fold_f1 = f1_score(val_true_labels, val_predictions, average="macro")
    print(f"Fold {fold + 1} F1-Score: {fold_f1}")
    fold_results.append(fold_f1)

# Final output of fold results
print(f"Cross-validation F1-scores: {fold_results}")


Fold 1/5
Fold 1: train_labels shape = (3359, 6), val_labels shape = (840, 6)
Fold 1: train unique labels = [0 1 2 3 4 5]
Fold 1: val unique labels = [0 1 2 3 4 5]
Epoch 1, Batch 0: inputs shape = torch.Size([32, 3, 224, 224]), labels shape = torch.Size([32, 6])
Epoch 1, Batch 0: outputs shape = torch.Size([32, 6])
Epoch 1, Batch 1: inputs shape = torch.Size([32, 3, 224, 224]), labels shape = torch.Size([32, 6])
Epoch 1, Batch 1: outputs shape = torch.Size([32, 6])
Epoch 1, Batch 2: inputs shape = torch.Size([32, 3, 224, 224]), labels shape = torch.Size([32, 6])
Epoch 1, Batch 2: outputs shape = torch.Size([32, 6])
Epoch 1, Batch 3: inputs shape = torch.Size([32, 3, 224, 224]), labels shape = torch.Size([32, 6])
Epoch 1, Batch 3: outputs shape = torch.Size([32, 6])
Epoch 1, Batch 4: inputs shape = torch.Size([32, 3, 224, 224]), labels shape = torch.Size([32, 6])
Epoch 1, Batch 4: outputs shape = torch.Size([32, 6])
Epoch 1, Batch 5: inputs shape = torch.Size([32, 3, 224, 224]), labels s

IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

In [73]:
print("Number of images:", len(images))
print("Number of labels:", len(labels))


Number of images: 4199
Number of labels: 8
