# Utils

This notebook contains shared functions for data preparation. This way I can reuse them in other notebooks without rewriting or copy-pasting the functions.

In [1]:
# Imports
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision.transforms import v2
import matplotlib.pyplot as plt
import os
import polars as pl
import torch
import torch.nn as nn

## Data Preparation

First, the data needs to be prepared properly. I implemented a PyTorch Dataset so that the data can be accessed conveniently during training and validation.

I also applied some data augmentation, namely:

    - random horizontal flip
    - random rotation
    - random height and width shift

I chose not to use a vertical flip because, for example, flipping a basket upside down would destroy the meaning of the image.
Since the images are very low-resolution grayscale images, I decided not to apply any additional noise or color transforms.

Stats of the data:

    - 5 classes
    - 10,000 images per class in the training set (50k total)
    - 5,000 images per class in the test set (25k total)
    - 28 x 28 images -> 784 features

In [2]:
# Dataset class for QuickDraw
class QuickDrawDataset(Dataset):
    def __init__(self, annotations_file, img_dir, transform=None, class_label=None):
        self.img_labels = pl.read_csv(annotations_file)
        if class_label is not None:
            self.img_labels = self.img_labels.filter(pl.col("class_label") == class_label)
        self.img_dir = img_dir
        self.transform = transform

    def __len__(self):
        return len(self.img_labels)

    def __getitem__(self, idx):
        # Skip the first CSV column; the remaining two hold the relative image path and the label
        img_path, label = self.img_labels.row(idx)[1:]
        img_path = os.path.join(self.img_dir, img_path)
        image = Image.open(img_path)

        if self.transform:
            image = self.transform(image)

        return image, label

In [3]:
# Calculate mean and std for normalization
norm_dataset = QuickDrawDataset('../dataset/train.csv', '../dataset/images', transform=v2.Compose([v2.ToImage(), v2.ToDtype(torch.float32, scale=True)]))
loader = DataLoader(norm_dataset, batch_size=64, shuffle=False, num_workers=6)

mean = 0.
std = 0.
nb_samples = 0

for data, _ in loader:
    batch_samples = data.size(0)  # actual batch size (the last batch may be smaller than 64)
    data = data.view(batch_samples, data.size(1), -1)  # flatten H and W
    mean += data.mean(2).sum(0)  # mean per channel summed over batch
    std += data.std(2).sum(0)    # std per channel summed over batch
    nb_samples += batch_samples

mean /= nb_samples
std /= nb_samples

print('Mean:', mean)
print('Std:', std)

Mean: tensor([0.1982])
Std: tensor([0.3426])
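
Note that the loop above averages the per-image standard deviations, which is in general not identical to the standard deviation taken over all pixels of the dataset. If the exact dataset-wide value is wanted, one option is to accumulate the sum and the sum of squares per channel. The following is only a sketch on random stand-in data; `running_mean_std` and `fake_batches` are hypothetical names, and the numbers it produces say nothing about the QuickDraw images.

```python
import torch


def running_mean_std(batches):
    """Per-channel mean and population std over all pixels of all batches."""
    n_pixels = 0
    channel_sum = None
    channel_sq_sum = None
    for data in batches:
        # data has shape (B, C, H, W); flatten H and W, use float64 for stability
        b, c = data.shape[:2]
        flat = data.reshape(b, c, -1).double()
        s = flat.sum(dim=(0, 2))
        sq = (flat ** 2).sum(dim=(0, 2))
        channel_sum = s if channel_sum is None else channel_sum + s
        channel_sq_sum = sq if channel_sq_sum is None else channel_sq_sum + sq
        n_pixels += b * flat.shape[2]  # pixels per channel seen so far

    mean = channel_sum / n_pixels
    var = channel_sq_sum / n_pixels - mean ** 2  # E[x^2] - E[x]^2
    std = var.clamp(min=0).sqrt()
    return mean.float(), std.float()


# Random stand-in for the batches a DataLoader would yield
torch.manual_seed(0)
fake_batches = [torch.rand(8, 1, 28, 28) for _ in range(4)]
exact_mean, exact_std = running_mean_std(fake_batches)
```

With the real data, the batches would come from `loader` (taking only the image part of each `(data, label)` pair). Whether the difference to the averaged per-image value matters for training would have to be checked empirically.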


In [4]:
# data augmentation transforms
train_transforms = v2.Compose([
    v2.Grayscale(num_output_channels=1),
    v2.RandomHorizontalFlip(p=0.5),
    v2.RandomRotation(degrees=10),
    v2.RandomAffine(degrees=0, translate=(0.1, 0.1)),  # Width and height shift
    v2.ToImage(),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize(mean=mean, std=std)
])

test_transforms = v2.Compose([
    v2.Grayscale(num_output_channels=1),
    v2.ToImage(),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize(mean=mean, std=std)
])

# Classification targets & subfolder names
classes = {
    0: 'basket',
    1: 'eye',
    2: 'binoculars',
    3: 'rabbit',
    4: 'hand',
}

In [5]:
# Create Train and Test Datasets
train_data = QuickDrawDataset('../dataset/train.csv', '../dataset/images', train_transforms)
test_data = QuickDrawDataset('../dataset/test.csv', '../dataset/images', test_transforms)

## Base Module

At least for the classifier, I want to construct different model architectures and evaluate them against each other. The following code serves as a common base to which an arbitrary number of layers can be added.

In [6]:
class BaseModule(nn.Module):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.layers = nn.Sequential()

    def forward(self, x):
        return self.layers(x)
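
A minimal sketch of how a concrete model might extend this base. `TinyMLP`, its layer names and its layer sizes are hypothetical and only illustrate the pattern; they are not one of the evaluated architectures.

```python
import torch
import torch.nn as nn


class BaseModule(nn.Module):
    # Repeated from the cell above so the sketch runs on its own
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.layers = nn.Sequential()

    def forward(self, x):
        return self.layers(x)


class TinyMLP(BaseModule):
    """Hypothetical example model built on top of BaseModule."""

    def __init__(self, n_features=28 * 28, n_hidden=64, n_classes=5):
        super().__init__()
        # Layers are registered on the shared nn.Sequential in order
        self.layers.add_module("flatten", nn.Flatten())
        self.layers.add_module("hidden", nn.Linear(n_features, n_hidden))
        self.layers.add_module("relu", nn.ReLU())
        self.layers.add_module("out", nn.Linear(n_hidden, n_classes))


model = TinyMLP()
batch = torch.zeros(4, 1, 28, 28)  # four blank 28 x 28 grayscale images
logits = model(batch)              # one score per class and image
```

Because `forward` is inherited, a subclass only has to decide which layers go into `self.layers`.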

## Device

Initialize a common device to speed up training if MPS or CUDA acceleration is available.

In [7]:
DEVICE = "mps" if torch.backends.mps.is_available() else "cuda" if torch.cuda.is_available() else "cpu"

## Sampling

To test the classification model visually, I wrote a function that randomly samples n images of each class and returns them as tuples of class label and relative path.

In [8]:
# Function to sample a number of images of each class from the test data
def sample_image_from_each_class(n=1):
    df = pl.read_csv("../dataset/test.csv")
    sampled_images = df.group_by("class_label").map_groups(lambda group: group.sample(n))
    return list(zip(sampled_images["class_label"].to_list(), sampled_images["relative_path"].to_list()))

## Classwise Dataset

Since training a conditional generative model did not provide the output I expected, I will use this classwise dataset, which only loads the images of one particular class.

In [10]:
test = QuickDrawDataset('../dataset/test.csv', '../dataset/images', class_label=4)
len(test)

5000