## Goal of the project 
- The goal of this project is to develop a supervised machine learning algorithm (deep
or not) that predicts the presence or absence of at least 5 classes in a test image.
- The target classes used here are the six animal classes: animal_classes = ["bird", "cat", "cow", "dog", "horse", "sheep"]


## About the dataset
- The PASCAL Visual Object Classes (VOC) challenge
(http://host.robots.ox.ac.uk/pascal/VOC/voc2012/) has been used heavily by the
computer vision community for developing machine (deep) learning
algorithms for classification, detection and segmentation. There are 20 classes in total.

### Dataset Structure
1. `Annotations`: Contains XML files with bounding box annotations for objects in each image.
2. `ImageSets`: Contains lists of image IDs for different splits (e.g., train, val, trainval).
3. `JPEGImages`: Contains the actual image files.
4. `SegmentationClass`: Contains semantic segmentation masks for images.
5. `SegmentationObject`: Contains instance segmentation masks for images.

<br>
VOCdevkit/ <br>
    ├── VOC2012/ <br>
    │   ├── Annotations/    # XML files for bounding boxes <br>
    │   ├── ImageSets/      # Splits: train.txt, val.txt <br>
    │   ├── JPEGImages/     # Raw images <br>
    │   ├── SegmentationClass/     # Semantic segmentation masks <br>
    │   ├── SegmentationObject/    # Instance segmentation masks <br>

# Step 1 - Set up environment by importing libraries 
- $ `pip install numpy pandas torch torchvision matplotlib opencv-python xmltodict scikit-learn`


In [1]:
import os                                            # working with file systems
import xml.etree.ElementTree as ET                   # parsing XML files (Annotations)
import cv2                                           # computer vision library for manipulating images (resize, convert format, normalize pixel values)
import numpy as np                                   # numerical computations on multi-dimensional arrays
import torch                                         # deep learning library
import torch.nn as nn                                # modules for building neural networks
import torch.optim as optim                          # optimization algorithms for training neural networks
from sklearn.metrics import classification_report    # per-class precision/recall/F1 report
from torch.utils.data import DataLoader, TensorDataset  # dataset handling and batched data loading
from sklearn.model_selection import KFold            # k-fold cross-validation splits

print("Libraries loaded")

Libraries loaded


# Step 2 - Extract Information
- Extract the relevant data (image paths and labels) from the XML annotation files. We do this with custom functions from `methods.py`: `parse_annotations` and `parse_animal_annotations`, the latter being the one used below.
- I am only interested in the animals in the dataset, which is why I pass `animal_classes` to filter out non-animals.
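
`methods.py` is not shown in this notebook, so the following is only a minimal sketch of what the filtering step might look like, assuming the standard VOC layout (one image ID per line in the split file, one `<object><name>` entry per annotated object). The names `parse_animal_annotation_xml` and `parse_animal_annotations_sketch` are hypothetical and are not the project's actual functions.

```python
import os
import xml.etree.ElementTree as ET


def parse_animal_annotation_xml(xml_text, target_classes):
    """Return the set of target class names found in one VOC annotation."""
    root = ET.fromstring(xml_text.strip())
    # Each annotated object has a direct <name> child holding its class.
    names = {obj.findtext("name") for obj in root.iter("object")}
    return names & set(target_classes)


def parse_animal_annotations_sketch(annotations_dir, image_dir, split_file, target_classes):
    """Hypothetical stand-in for methods.parse_animal_annotations."""
    with open(split_file) as f:
        image_ids = [line.strip() for line in f if line.strip()]

    data = []
    for image_id in image_ids:
        xml_path = os.path.join(annotations_dir, image_id + ".xml")
        with open(xml_path) as xml_file:
            labels = parse_animal_annotation_xml(xml_file.read(), target_classes)
        if labels:  # keep only images that contain at least one target animal
            data.append((os.path.join(image_dir, image_id + ".jpg"), labels))
    return data


# Small in-memory example so the parsing logic can be checked without the dataset.
sample_xml = """
<annotation>
  <filename>example.jpg</filename>
  <object><name>dog</name></object>
  <object><name>person</name></object>
  <object><name>cat</name></object>
</annotation>
"""
sample_labels = parse_animal_annotation_xml(
    sample_xml, ["bird", "cat", "cow", "dog", "horse", "sheep"]
)
print(sample_labels)
```

Returning a set of labels per image matches the `(path, {labels})` tuples printed in the cell output, and dropping images with an empty set is one way to end up with an animals-only subset.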

In [3]:
from methods import parse_annotations  # custom method to parse annotations
from methods import parse_animal_annotations

annotations_directory = "./data/VOCdevkit/VOC2012/Annotations"
image_directory = "./data/VOCdevkit/VOC2012/JPEGImages"
train_file = "./data/VOCdevkit/VOC2012/ImageSets/Main/train.txt"
val_file = "./data/VOCdevkit/VOC2012/ImageSets/Main/val.txt"
print("paths extracted ...")


# Define your target classes
animal_classes = ["bird", "cat", "cow", "dog", "horse", "sheep"]

# extract annotations for the training and validation sets
train_data = parse_animal_annotations(annotations_directory, image_directory, train_file, animal_classes)
val_data = parse_animal_annotations(annotations_directory, image_directory, val_file, animal_classes)

# View the first 5 results
print("train data extracted here are the first 5 results ...")
print(train_data[:5])
print("val data extracted here are the first 5 results ...")
print(val_data[:5])

paths extracted ...
train data extracted here are the first 5 results ...
[('./data/VOCdevkit/VOC2012/JPEGImages\\2008_000008.jpg', {'horse'}), ('./data/VOCdevkit/VOC2012/JPEGImages\\2008_000019.jpg', {'dog'}), ('./data/VOCdevkit/VOC2012/JPEGImages\\2008_000053.jpg', {'dog'}), ('./data/VOCdevkit/VOC2012/JPEGImages\\2008_000060.jpg', {'cat'}), ('./data/VOCdevkit/VOC2012/JPEGImages\\2008_000066.jpg', {'dog'})]
val data extracted here are the first 5 results ...
[('./data/VOCdevkit/VOC2012/JPEGImages\\2008_000009.jpg', {'cow'}), ('./data/VOCdevkit/VOC2012/JPEGImages\\2008_000026.jpg', {'dog'}), ('./data/VOCdevkit/VOC2012/JPEGImages\\2008_000054.jpg', {'bird'}), ('./data/VOCdevkit/VOC2012/JPEGImages\\2008_000056.jpg', {'cat'}), ('./data/VOCdevkit/VOC2012/JPEGImages\\2008_000059.jpg', {'dog'})]


In [4]:
all_labels = {label for _, labels in train_data for label in labels}
print(f"Extracted classes: {all_labels}")


Extracted classes: {'cat', 'sheep', 'dog', 'cow', 'bird', 'horse'}


# Step 3 - Preprocess the Data
- `Resize image` to a fixed `224x224` size so that every input has the same shape. This is also the input size commonly used by pretrained ImageNet models.
- `Normalize pixel` values to the range [0, 1] for easier processing. This is done with the custom `preprocess_images` method from `methods.py`.
- `Convert labels to multi-hot vectors`

In [5]:
from methods import preprocess_images

train_images = preprocess_images(train_data, size=(224, 224))
print(f"Preprocessed {len(train_images)} images.")

val_images = preprocess_images(val_data, size=(224, 224))
print(f"Preprocessed {len(val_images)} images.")

Preprocessed 2091 images.
Preprocessed 2108 images.


### Step 3.1 Convert labels into multi-hot vectors so the model can use them
- 1 -> the class appears in the image
- 0 -> the class does not appear in the image
- The vector positions follow the order of animal_classes = ["bird", "cat", "cow", "dog", "horse", "sheep"]
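
As an illustration of the encoding, here is a minimal sketch of a multi-hot conversion. `convert_labels_sketch` is a hypothetical stand-in for `convert_labels` in `methods.py` (which is not shown here), and the two example entries are made up.

```python
import numpy as np


def convert_labels_sketch(data, classes):
    """Turn (image_path, {labels}) pairs into an (N, num_classes) multi-hot array."""
    class_to_index = {name: i for i, name in enumerate(classes)}
    vectors = np.zeros((len(data), len(classes)), dtype=np.float32)
    for row, (_, labels) in enumerate(data):
        for label in labels:
            vectors[row, class_to_index[label]] = 1.0
    return vectors


animal_classes = ["bird", "cat", "cow", "dog", "horse", "sheep"]
example_data = [("a.jpg", {"horse"}), ("b.jpg", {"cat", "dog"})]  # made-up entries
example_vectors = convert_labels_sketch(example_data, animal_classes)
print(example_vectors)
# [[0. 0. 0. 0. 1. 0.]
#  [0. 1. 0. 1. 0. 0.]]
```

An image with several animals simply gets several ones in its row, which is what makes this a multi-label rather than a multi-class encoding.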


In [6]:
from methods import convert_labels


# Convert training and validation labels to multi-hot vectors
train_labels = convert_labels(train_data, animal_classes)
val_labels = convert_labels(val_data, animal_classes)
print(f"Train Label vectors shape: {train_labels.shape}")
print(f"val Label vectors shape: {val_labels.shape}")

Train Label vectors shape: (2091, 6)
val Label vectors shape: (2108, 6)


# Step 4 - Train and Evaluate Model

In [7]:
from AnimalClassifier import AnimalClassifier

num_classes = len(animal_classes)  # 6 total animals hence 6 classes
model = AnimalClassifier(num_classes)

# Loss function and optimizer
loss_function = nn.BCELoss()  # binary cross-entropy loss; expects probabilities in [0, 1] (e.g. sigmoid outputs)
optimizer = optim.Adam(model.parameters(), lr=0.001)  # Adam optimizer

print("Model, loss function, and optimizer instantiated ...")


Model, loss function, and optimizer instantiated ...
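
For reference, `nn.BCELoss` with its default mean reduction averages the binary cross-entropy over all $N$ samples in a batch and all $C$ classes. Here $y_{nc}$ is the multi-hot target and $p_{nc}$ the predicted probability for class $c$ in image $n$ (symbols introduced here for illustration):

$$
\mathcal{L} = -\frac{1}{NC}\sum_{n=1}^{N}\sum_{c=1}^{C}\Big[\, y_{nc}\log p_{nc} + (1 - y_{nc})\log(1 - p_{nc}) \Big]
$$

Because of the logarithms, each $p_{nc}$ must lie in $[0, 1]$. This assumes that `AnimalClassifier`, whose definition is not shown in this notebook, ends in a sigmoid activation.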


In [8]:
# Convert to PyTorch tensors; permute images from (N, H, W, C) to the (N, C, H, W) layout PyTorch expects
train_images_tensor = torch.tensor(train_images, dtype=torch.float32).permute(0, 3, 1, 2)  
val_images_tensor = torch.tensor(val_images, dtype=torch.float32).permute(0, 3, 1, 2)
train_labels_tensor = torch.tensor(train_labels, dtype=torch.float32)
val_labels_tensor = torch.tensor(val_labels, dtype=torch.float32)

In [10]:
# Create DataLoaders
train_loader = DataLoader(TensorDataset(train_images_tensor, train_labels_tensor), batch_size=16, shuffle=True)
val_loader = DataLoader(TensorDataset(val_images_tensor, val_labels_tensor), batch_size=16)


In [11]:
num_epochs = 5  # Number of epochs

# Training loop
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(images)
        loss = loss_function(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"Epoch {epoch+1}/{num_epochs}, Loss: {running_loss/len(train_loader)}")


Epoch 1/5, Loss: 0.4696575514687837
Epoch 2/5, Loss: 0.41325606235111034
Epoch 3/5, Loss: 0.39668550882630677
Epoch 4/5, Loss: 0.3669066847735689
Epoch 5/5, Loss: 0.30983609507102094


In [12]:
# Evaluation loop
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for images, labels in val_loader:
        outputs = model(images)
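        predicted = (outputs > 0.5).float()  # threshold each class probability at 0.5
        correct += (predicted == labels).sum().item()  # count matching (image, class) entries
        total += labels.numel()  # total number of (image, class) entries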
print(f"Validation Accuracy: {correct / total * 100:.2f}%")

Validation Accuracy: 82.04%
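
A caveat on reading this number: it is an element-wise accuracy over every (image, class) entry, and most entries in a multi-hot label matrix are 0. As a point of reference, if every image carried exactly one of the six labels, a predictor that always outputs all zeros would already score 5/6, or about 83%. Per-class precision and recall are more informative. The sketch below shows this on toy data; the function name `per_class_precision_recall` and the arrays are illustrative and not part of the project code.

```python
import numpy as np


def per_class_precision_recall(y_true, y_pred, class_names):
    """Compute precision and recall separately for each class column."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    report = {}
    for i, name in enumerate(class_names):
        tp = int(np.sum(y_true[:, i] & y_pred[:, i]))
        fp = int(np.sum(~y_true[:, i] & y_pred[:, i]))
        fn = int(np.sum(y_true[:, i] & ~y_pred[:, i]))
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        report[name] = {"precision": precision, "recall": recall}
    return report


animal_classes = ["bird", "cat", "cow", "dog", "horse", "sheep"]

# Toy data: six images with exactly one label each, and a predictor that
# never predicts any class.
toy_true = np.eye(6, dtype=int)
toy_pred = np.zeros((6, 6), dtype=int)

elementwise_accuracy = float((toy_true == toy_pred).mean())
toy_report = per_class_precision_recall(toy_true, toy_pred, animal_classes)

print(f"Element-wise accuracy: {elementwise_accuracy:.4f}")
for name, metrics in toy_report.items():
    print(name, metrics)
```

On this toy input the all-zeros predictor reaches 5/6 element-wise accuracy while its recall is 0 for every class. scikit-learn's `classification_report`, already imported in Step 1, reports similar per-class metrics for multi-label arrays.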


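# Step 5 - Evaluate the model with k-fold cross-validation to get more reliable results
- The result from the previous step is unreliable because it comes from a single train/validation split: the score depends on which images happened to land in each split. The model was evaluated on the separate validation images, not on its training images, but one split still gives only one estimate. K-fold cross-validation trains and evaluates the model on k different splits and averages the scores, which gives a more reliable estimate.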
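
The splitting logic can be sketched with scikit-learn's `KFold` as below. This is only an outline under stated assumptions: `kfold_scores`, `zero_baseline`, and the random toy arrays are hypothetical, and the placeholder baseline stands in for the real per-fold training of `AnimalClassifier`.

```python
import numpy as np
from sklearn.model_selection import KFold


def kfold_scores(features, labels, train_and_score, n_splits=5, seed=42):
    """Run train_and_score on each fold and return the list of fold scores."""
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, val_idx in kfold.split(features):
        score = train_and_score(
            features[train_idx], labels[train_idx],
            features[val_idx], labels[val_idx],
        )
        scores.append(score)
    return scores


def zero_baseline(train_x, train_y, val_x, val_y):
    """Placeholder 'model': predicts a class as present only if it appears in
    more than half of the training rows, then returns element-wise accuracy.
    In the real notebook this is where a fresh model would be trained."""
    predicted = (train_y.mean(axis=0) > 0.5).astype(train_y.dtype)
    return float((val_y == predicted).mean())


# Toy data standing in for the image and label arrays.
rng = np.random.default_rng(0)
toy_features = rng.random((20, 4))
toy_labels = (rng.random((20, 6)) > 0.8).astype(np.float32)

fold_scores = kfold_scores(toy_features, toy_labels, zero_baseline)
print(f"Mean score: {np.mean(fold_scores):.3f} (std {np.std(fold_scores):.3f})")
```

The important design point is that the model and optimizer are re-created inside each fold, so no weights learned on one fold's validation images carry over to the next.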
