## Demo 6 - movie poster classification

Multi-label (meaning that there is more than one "true" label) classification of movie poster images by genre.

In [1]:
import os
import sys
from skimage import io
from torch import nn
import torch
import numpy as np
import pandas as pd

### Load the data

We use a PyTorch `Dataset` to represent the data, which means we must implement `__init__`, `__len__` and `__getitem__`.  For efficiency's sake, we ideally want to load the image data (movie posters and their genre classifications) and represent them as a `Tensor` in memory.

The movie posters had to be converted to images of the same size and colour channels.  The resizing can be done inside Python but is slow, so they were converted on-disk using command-line tools. The colour channels are more efficient than resizing in Python, so that was done in Python.

Note that we need to permute the dimensions of the `Tensor` we create from `skimage` NumPy arrays. The latter represent the colour channels (three, for red-green-blue) as the innermost dimension (the fourth dimension, or dimension 3, meaning that pixels are represented as an array of three colour values).  The convolutional layers in PyTorch require the colour channel to be the second dimension (after the batch dimension, meaning that the image is represented as three overlapping images, one for each colour, where each pixel is a single value).  You need to use `Tensor.permute` not `Tensor.view`---the latter just redraws the "boxes" inside the array, it doesn't rearrange the data for a different order of dimensions.

In [2]:
device = "cuda:2"

In [3]:
from torch.utils.data import Dataset
from skimage import transform
from skimage import color

class MoviePosterDataset(Dataset):
    def __init__(self, csvfile, imagedir, device=device):
        self.posterlist = pd.read_csv(csvfile)
        self.imagedir = imagedir
        
        imageids = list(self.posterlist["Id"])
        imagefiles = ["{}/{}.jpg".format(self.imagedir, x) for x in imageids]
        images = [np.array(io.imread(x)) for x in imagefiles]
        images = np.array([color.gray2rgb(x) if len(x.shape) < 3 else x for x in images])
        
        truths = self.posterlist[self.posterlist.columns[2:]]
        self.truths = torch.Tensor(truths.to_numpy())
    
        tns = torch.from_numpy(images)
        self.images = tns.permute(0, 3, 1, 2)
        
        if device != "cpu":
            self.device = torch.device(device)
            self.images = self.images.to(self.device)
            self.truths = self.truths.to(self.device)
        
    def __len__(self):
        return len(self.posterlist)
    
    def __getitem__(self, idx):            
        truths = self.truths[idx]
        images = self.images[idx]
        
        return images, truths

We keep the `Tensor` on the CPU because of memory limitations created by sticking to one GPU (which only has 10GB of space to itself).  We will move it to the GPU in batches instead.

In [4]:
mpd = MoviePosterDataset("Multi_Label_dataset/train.csv", 
                         "Multi_Label_dataset/ImageSmaller", device=device)
print("Finished loading the dataset.")

Finished loading the dataset.


### Define the model

Defining a model with convolutional layers for images is technically a lot easier than defining an RNN-based model for human language.  The `Conv2d` layer automatically moves a 5x5 filter across the entire image, no effort required to manage padding and packing and unpacking and sequence issues. We do have to flatten the output of the `MaxPool2d` layer to feed it to the subsequent `Linear` layers.  The output of the model comes from `Sigmoid` over the number of classes, so that we have a binary classification for the 25 separate movie labels.  

In [5]:
class PosterClassifier(nn.Module):
    def __init__(self, dropout=0.1):
        super().__init__()
        self.conv2d = nn.Conv2d(3,3,5,padding=2)
        self.maxpool = nn.MaxPool2d(5,padding=2)
        self.relu = nn.ReLU()
        self.linear0 = nn.Linear(3*90*60, 3*90*60)
        self.dropout1 = nn.Dropout(dropout)
        self.tanh = nn.Tanh()
        self.linear1 = nn.Linear(3*90*60, 25)
        self.dropout2 = nn.Dropout(dropout)
        self.sigmoid = nn.Sigmoid()
        
    def forward(self, x):
        output = self.conv2d(x)
        output = self.maxpool(output)
        output = output.view(-1, 3*90*60)
        output = self.relu(output)
        output = self.linear0(output)
        output = self.dropout1(output)
        output = self.tanh(output)
        output = self.linear1(output)
        output = self.dropout2(output)
        output = self.sigmoid(output)
        
        return output

### Arrange the data for the model

We're going to do a 60/40 train/test split using PyTorch's own samplers and data loaders.

In [6]:
len(mpd)

7254

In [7]:
totalindices = list(range(len(mpd)))

In [8]:
import random
import math

random.shuffle(totalindices)
splitindex = math.floor(len(mpd)*0.6)


In [9]:
splitindex

4352

## Part #1: validation data (4 points)

Adjust the code in the notebook to give a 60/20/20 train/validation/testing split of the data. Testing data is split 50/50 into test and validation sets.

In [10]:
trainingindices = totalindices[:splitindex]
testingindices_ = totalindices[splitindex:]

In [11]:
splitindex2 = math.floor(len(testingindices_)*0.5)

In [12]:
splitindex2

1451

In [13]:
testingindices = testingindices_[:splitindex2]
validationindices = testingindices_[splitindex2:]

In [14]:
trainingsampler = torch.utils.data.SubsetRandomSampler(trainingindices)
testingsampler = torch.utils.data.SubsetRandomSampler(testingindices)
validationsampler = torch.utils.data.SubsetRandomSampler(validationindices)

In [15]:
len(trainingsampler), len(testingsampler),len(validationsampler)

(4352, 1451, 1451)

In [16]:
batches = 16

In [17]:
traindl = torch.utils.data.DataLoader(mpd, batch_size=batches, 
                                      sampler=trainingsampler, pin_memory=False)
valdl = torch.utils.data.DataLoader(mpd, batch_size=batches, sampler=validationsampler, pin_memory=False)

testdl = torch.utils.data.DataLoader(mpd, sampler=testingsampler)

# Jackard Index
Should be of shape [batch_size, image_height, image_width]

In [18]:
def jaccard(prediction, ground_truth):
    union = prediction + ground_truth
    union[union == 2] = 1
    intersection = prediction * ground_truth
    union = union.sum(axis=(1, 2))
    intersection = intersection.sum(axis=(1, 2))
    ji_nonezero_union = intersection[union != 0] / union[union != 0]
    ji = ji = torch.zeros(intersection.shape)
    if device != "cpu":
        ji = ji.cuda()
    ji[union != 0] = ji_nonezero_union
    return ji

In [19]:
def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs

### Write and run the training loop

We use the training loop to send the data to the GPU, batch by batch.

In [20]:
evaluation_metric = "jaccard"
evaluation_metric = "BCE"

model_name = "image_model"

In [21]:
import torch.optim as optim
import time

def train(train_dataloader, val_dataloader, epochs=3):
    torch.cuda.empty_cache()
    model = PosterClassifier()
    model = model.to(device)
    optimizer = optim.Adam(model.parameters())

    criterion = nn.BCELoss()
    for epoch in range(epochs):
        start_time = time.time()
        
        # Training
        train_loss = 0
        train_batches = 0     
        for c, data in enumerate(train_dataloader):
            images, truth = data
            optimizer.zero_grad()
            output = model(images.float().to(device))
            if evaluation_metric != "jaccard":
                loss = criterion(output, truth.to(device))
            train_loss += loss
            train_batches += 1.0
            loss.backward()
            optimizer.step()

        # Validation
        best_val_loss = 0
        val_loss = 0
        val_batches = 0
        model.eval()
        for c, data in enumerate(val_dataloader):
            images, truth = data
            output = model(images.float().to(device))
            with torch.no_grad():
                if evaluation_metric != "jaccard":
                    loss = criterion(output, truth.to(device))           
            val_loss += loss
            val_batches += 1.0
            
        end_time = time.time()

        epoch_mins, epoch_secs = epoch_time(start_time, end_time)
        
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            print("Best loss so far!")
            # save best model
            #torch.save(model.state_dict(), model_name)
            torch.save(model, model_name)
    
        #print("In epoch {}, training loss = {}".format(epoch, train_loss/train_batches))        
        #print("In epoch {}, validation loss = {}".format(epoch, val_loss/val_batches))
        
        print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
        print(f'\tTrain Loss: {train_loss/train_batches:.3f}')
        print(f'\t Validation Loss: {val_loss/val_batches:.3f}')
        
        
    return model

In [22]:
trained = train(traindl,valdl, epochs=10)

Epoch: 01 | Epoch Time: 0m 23s
	Train Loss: 0.687%
	 Val. Loss: 0.833%
Epoch: 02 | Epoch Time: 0m 23s
	Train Loss: 0.737%
	 Val. Loss: 0.768%
Epoch: 03 | Epoch Time: 0m 23s
	Train Loss: 0.809%
	 Val. Loss: 0.807%
Epoch: 04 | Epoch Time: 0m 23s
	Train Loss: 0.821%
	 Val. Loss: 0.835%
Epoch: 05 | Epoch Time: 0m 23s
	Train Loss: 0.824%
	 Val. Loss: 0.781%
Epoch: 06 | Epoch Time: 0m 23s
	Train Loss: 0.833%
	 Val. Loss: 0.830%
Epoch: 07 | Epoch Time: 0m 23s
	Train Loss: 0.957%
	 Val. Loss: 0.999%
Epoch: 08 | Epoch Time: 0m 23s
	Train Loss: 1.194%
	 Val. Loss: 1.261%
Epoch: 09 | Epoch Time: 0m 23s
	Train Loss: 1.270%
	 Val. Loss: 1.391%
Epoch: 10 | Epoch Time: 0m 23s
	Train Loss: 1.434%
	 Val. Loss: 1.334%


This is quite bad---the loss only gets worse. We need to make adjustments to the model...

### Write and run a testing routine

For memory purposes, we're keeping the testing data on the CPU memory still.  We have to put the model into evaluation mode with `model.eval`, which turns off the dropout and other regularization useful in training---we want the test to represent a deterministic result of the trained model. We also test a single epoch as one big batch.

In [None]:
trained = torch.load(model_name)

In [None]:
def test(model, dataloader):
    model = model.to(device)
    model.eval()
    criterion = nn.BCELoss()
    sumloss = 0
    items = 0
    for c, data in enumerate(dataloader):
        images, truth = data
        output = model(images.float())
        loss = criterion(output, truth)
        sumloss += loss
        items += 1.0
    print("Loss on test data = {}".format(sumloss/items))

In [None]:
test(trained, testdl)

Loss is not quite the right way to evaluate a model like this.  However, it is "encouraging" that the loss on the test data is not wholly out of line from the loss in the training data.