This notebook demonstrates our basic setup for training on the presented dataset with PyTorch. Because much of this code is later 'hidden away' from the end user in SageMaker, it is worth getting a first impression of what is going on.

In [12]:
import torch
from torchvision import datasets, models, transforms
import torchvision
import os

from torch.autograd import Variable
import torch.nn.functional as F
import torch.nn as nn
import torch.optim as optim
import numpy as np
import PIL


## Transform Pipelines

We ended the last notebook (exploration) with many insights into how the preprocessing is done in this project. While there are plenty of Python libraries for image manipulation, we should always focus on compatibility with our target framework, which is PyTorch on AWS SageMaker.
Hence I will demonstrate here how the `imgaug` library and the PyTorch `torchvision.transforms` library play together and can be used for the later training steps.
First, let's install `imgaug` again and define a transform object that performs some sequential operations:

In [3]:
!pip install git+https://github.com/aleju/imgaug
from imgaug import augmenters as iaa
import imgaug as ia

Collecting git+https://github.com/aleju/imgaug
  Cloning https://github.com/aleju/imgaug to /tmp/pip-req-build-mbxda9hs
Building wheels for collected packages: imgaug
  Running setup.py bdist_wheel for imgaug ... done
  Stored in directory: /tmp/pip-ephem-wheel-cache-j8oodfiz/wheels/9c/f6/aa/41dcf2f29cc1de1da4ad840ef5393514bead64ac9e644260ff
Successfully built imgaug
fastai 1.0.52 requires nvidia-ml-py3, which is not installed.
thinc 6.12.1 has requirement msgpack<0.6.0,>=0.5.6, but you'll have msgpack 0.6.0 which is incompatible.
You are using pip version 10.0.1, however version 19.2.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [4]:
class AdvancedAugmentation:
  def __init__(self):
    self.aug = iaa.Sequential([
        iaa.Resize((224, 224)),
        # blur images with a sigma of 0 to 3.0
        iaa.Sometimes(0.25, iaa.GaussianBlur(sigma=(0, 3.0))),
        # horizontally flip 50% of the images
        iaa.Fliplr(0.5),
        # rotate by -20 to +20 degrees. The default mode is 'constant', which fills the area
        # that was 'rotated out' with a constant value. A better mode is 'symmetric', which
        # 'Pads with the reflection of the vector mirrored along the edge of the array' (see docs)
        iaa.Affine(rotate=(-20, 20), mode='symmetric'),
        # do edge detection on 25% of the pictures
        iaa.Sometimes(0.25,
                      iaa.EdgeDetect(alpha=(0.5, 1.0))),
    ])
      
  def __call__(self, img):
    img = np.array(img)

    '''
    # This is an experimental attempt to get even more shape
    # information via the imgaug heatmap feature.
    # While this code works, I could not find out how
    # to create custom heatmaps (not the quokka_heatmap) for our images

    heatmap = ia.quokka_heatmap(size=0.25)
    print(f'img shape via heatmap.shape {heatmap.shape}')
    print(f'img shape {img.shape}')
    image_aug, heatmap_aug = self.aug(image=img, heatmaps=heatmap)
    return np.hstack([image_aug, heatmap_aug.draw(cmap="gray")[0]])
    
    '''

    # either return a PIL.Image here or use PyTorch Lambda transform later on
    # return PIL.Image.fromarray(self.aug.augment_image(img))    
    return self.aug.augment_image(img)
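
The `iaa.Sometimes(0.25, ...)` wrappers above apply their inner augmenter to roughly a quarter of the images. The idea can be sketched in plain Python as follows. This is only an illustration under simplifying assumptions: the `Sometimes` class and `maybe_invert` below are hypothetical stand-ins, not imgaug's implementation, and a single integer stands in for an image.

```python
import random


class Sometimes:
    """Hypothetical stand-in that mimics the idea behind iaa.Sometimes."""

    def __init__(self, p, then, rng):
        self.p = p        # probability of applying the wrapped step
        self.then = then  # callable that performs the augmentation
        self.rng = rng    # seeded random.Random, for reproducibility

    def __call__(self, x):
        # Apply the wrapped step with probability p, otherwise pass x through.
        return self.then(x) if self.rng.random() < self.p else x


# A single pixel value stands in for a whole image here.
maybe_invert = Sometimes(0.25, lambda pixel: 255 - pixel, random.Random(0))
outputs = [maybe_invert(10) for _ in range(10_000)]
fraction_changed = sum(o != 10 for o in outputs) / len(outputs)
print(f"fraction of inputs that were augmented: {fraction_changed:.3f}")
```

Over many inputs the augmented fraction settles near the configured probability, which is why the blur and edge-detection steps only affect part of each epoch's images.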

PyTorch makes it easy to specify a transformation pipeline with `torchvision.transforms`. The `Compose` object accepts a list of transform steps that are executed in sequence. To include our custom `imgaug` preprocessing step, we place `AdvancedAugmentation` first and hand its output (the result of `self.aug.augment_image(img)`) to the next step. Because `augment_image` returns an `ndarray`, while following steps such as `RandomGrayscale` expect an image object, I included a step that converts the `ndarray` back to an image: `torchvision.transforms.Lambda(lambda x: PIL.Image.fromarray(x))`.
For the `valid` and `test` transforms, we don't want to manipulate the original images too much: we only resize and center-crop them (so that every input has the size our CNN expects) and normalize the data.

In [5]:
data_transforms = {
    'train': torchvision.transforms.Compose([
        # this is where our imgaug transformations get included
        AdvancedAugmentation(),
        torchvision.transforms.Lambda(lambda x: PIL.Image.fromarray(x)),
        torchvision.transforms.RandomGrayscale(p=0.3),
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ]),
    'valid': torchvision.transforms.Compose([
        torchvision.transforms.Resize(224),
        torchvision.transforms.CenterCrop(224),
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ]),
    'test': torchvision.transforms.Compose([
        torchvision.transforms.Resize(224),
        torchvision.transforms.CenterCrop(224),
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ]) 
}
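
To make the chaining idea concrete, here is a minimal plain-Python sketch of how a `Compose`-style pipeline passes the output of one callable to the next. It is a simplified stand-in, not torchvision's implementation: `Compose`, `FlipRow` and `Normalize` below are hypothetical re-creations, and a list of pixel values stands in for an image. The mean and standard deviation are the red-channel constants used above.

```python
class Compose:
    """Simplified stand-in for the chaining idea behind transforms.Compose."""

    def __init__(self, steps):
        self.steps = steps

    def __call__(self, x):
        # Feed the output of each step into the next one.
        for step in self.steps:
            x = step(x)
        return x


class FlipRow:
    """Hypothetical custom step: any object with __call__ can be a step."""

    def __call__(self, row):
        return row[::-1]


class Normalize:
    """Per-value (x - mean) / std, the arithmetic behind normalization."""

    def __init__(self, mean, std):
        self.mean, self.std = mean, std

    def __call__(self, row):
        return [(x - self.mean) / self.std for x in row]


pipeline = Compose([
    FlipRow(),                           # custom callable class
    lambda row: [p / 255 for p in row],  # scale 0..255 to 0..1
    Normalize(mean=0.485, std=0.229),    # red-channel constants
])

result = pipeline([0, 255])
print(result)
```

This is why `AdvancedAugmentation` only needs a `__call__` method to be usable as a pipeline step, and why a small adapter step is enough whenever two neighbouring steps disagree about the data type.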

Before we use `data_transforms`, we create a smaller `dogImages` folder that suits the purpose of this notebook, which is simply to demonstrate the learning pipeline. The whole dataset leads to `No space left on device` errors, so training on the real data should run in a separate training container, as will be done in later notebooks.

In [33]:
!pwd
!mkdir dogImages_local_training

/home/ec2-user/SageMaker/cnn-classifier
mkdir: cannot create directory ‘dogImages_local_training’: No space left on device


Now that we have the specification of our data transformers for each dataset `train`, `valid` and `test`, we create the corresponding datasets & dataloaders:

In [6]:
data_dir = 'dogImages'

image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'valid','test']}

# ATTENTION: WE NEED TO SET num_workers to 0! Otherwise we will run into errors
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=8,
                                             shuffle=True, num_workers=0)
              for x in ['train', 'valid', 'test']}
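
`ImageFolder` expects one subfolder per class and derives the integer labels from the alphabetically sorted folder names. The sketch below re-creates that mapping with the standard library only. It is an illustration, not torchvision's code: `find_classes` is a hypothetical helper, and the folder names are made-up examples rather than the actual contents of `dogImages`.

```python
import os
import tempfile


def find_classes(root):
    """Map each subfolder of root to an integer label, in sorted order."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    return classes, class_to_idx


with tempfile.TemporaryDirectory() as root:
    # Hypothetical class folders, created out of order on purpose.
    for name in ["002.Afghan_hound", "001.Affenpinscher", "003.Airedale_terrier"]:
        os.makedirs(os.path.join(root, name))
    classes, class_to_idx = find_classes(root)

print(class_to_idx)
```

Because the labels depend only on the sorted names, the `train`, `valid` and `test` folders must contain the same set of class subfolders for the labels to line up across the three datasets.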


## Training & Testing

After setting up our data loaders, the next step is to define the training and test functions that will train and evaluate our PyTorch model on the data. The following functions are very similar to the ones we learned during the course, so I will not comment on them in detail:

In [7]:
def train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path):
    """returns trained model"""
    
    valid_loss_min = np.inf
    
    if os.path.exists(save_path):
        model.load_state_dict(torch.load(save_path))
    
    for epoch in range(1, n_epochs+1):
        # initialize variables to monitor training and validation loss
        train_loss = 0.0
        valid_loss = 0.0
        
        ###################
        # train the model #
        ###################
        model.train()
        for data, target in loaders['train']:
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            
            # clear the gradients of all optimized variables
            optimizer.zero_grad()
            # forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            # calculate the batch loss
            loss = criterion(output, target)
            # backward pass: compute gradient of the loss with respect to model parameters
            loss.backward()
            # perform a single optimization step (parameter update)
            optimizer.step()
            # update training loss
            train_loss += loss.item()*data.size(0)
            
        ######################    
        # validate the model #
        ######################
        model.eval()
        # no gradients are needed for validation, which saves memory and time
        with torch.no_grad():
            for data, target in loaders['valid']:
                # move to GPU
                if use_cuda:
                    data, target = data.cuda(), target.cuda()
                # forward pass: compute predicted outputs by passing inputs to the model
                output = model(data)
                # calculate the batch loss
                loss = criterion(output, target)
                # accumulate the validation loss, weighted by batch size
                valid_loss += loss.item()*data.size(0)
            
        # calculate average losses
        train_loss = train_loss/len(loaders['train'].dataset)
        valid_loss = valid_loss/len(loaders['valid'].dataset)
        
        # print training/validation statistics 
        print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
            epoch, 
            train_loss,
            valid_loss
            ))
        
        # save model if validation loss has decreased
        if valid_loss <= valid_loss_min:
            print('Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...'.format(
            valid_loss_min,
            valid_loss))
            torch.save(model.state_dict(), save_path)
            valid_loss_min = valid_loss
    # return trained model
    return model
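
The training loop multiplies each batch loss by the batch size and divides by the dataset size at the end of the epoch. The small sketch below shows why: the last batch is usually smaller than the others, so a plain mean over batches would give its samples too much weight. The loss values and batch sizes are made-up numbers chosen for illustration.

```python
# (mean loss of the batch, number of samples in the batch); hypothetical values
batches = [(2.0, 8), (1.0, 8), (4.0, 4)]

# Same accumulation as loss.item() * data.size(0) in the training loop.
weighted_sum = sum(loss * size for loss, size in batches)
num_samples = sum(size for _, size in batches)
epoch_loss = weighted_sum / num_samples

# Naive alternative: average the batch means without weighting.
naive_mean = sum(loss for loss, _ in batches) / len(batches)

print(f"per-sample mean: {epoch_loss:.4f}")
print(f"mean of batch means: {naive_mean:.4f}")
```

The two numbers differ whenever the batches are not all the same size, and the weighted version is the true average loss per sample.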

In [8]:
def test(loaders, model, criterion, use_cuda):

    # monitor test loss and accuracy
    test_loss = 0.
    correct = 0.
    total = 0.

    model.eval()
    for batch_idx, (data, target) in enumerate(loaders['test']):
        # move to GPU
        if use_cuda:
            data, target = data.cuda(), target.cuda()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the loss
        loss = criterion(output, target)
        # update average test loss 
        test_loss = test_loss + ((1 / (batch_idx + 1)) * (loss.item() - test_loss))
        # convert output scores (logits) to predicted class
        pred = output.data.max(1, keepdim=True)[1]
        # compare predictions to true label
        correct += np.sum(np.squeeze(pred.eq(target.data.view_as(pred))).cpu().numpy())
        total += data.size(0)
            
    print('Test Loss: {:.6f}\n'.format(test_loss))

    print('\nTest Accuracy: %2d%% (%2d/%2d)' % (
        100. * correct / total, correct, total))
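
The update `test_loss + (1 / (batch_idx + 1)) * (loss - test_loss)` is an incremental mean: after each batch it holds the average of all batch losses seen so far, without storing them. The sketch below checks this on made-up loss values; `running_mean` is a hypothetical helper written for this illustration.

```python
def running_mean(values):
    """Incrementally average values using the same update as in test()."""
    mean = 0.0
    for idx, value in enumerate(values):
        mean = mean + (1 / (idx + 1)) * (value - mean)
    return mean


batch_losses = [0.9, 1.1, 1.0, 0.6]  # hypothetical per-batch losses
incremental = running_mean(batch_losses)
direct = sum(batch_losses) / len(batch_losses)

print(f"incremental: {incremental:.6f}, direct: {direct:.6f}")
```

Note that this is an unweighted mean over batches, so it can differ slightly from the per-sample average used in `train` when the last batch is smaller than the rest.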

## Model definition

In our project proposal we committed to using a deep learning technique called transfer learning. The next cells demonstrate how it can be applied in PyTorch:

In [13]:
# define vgg16 model
model = models.vgg16(pretrained=True)

# Freeze parameters so we don't backprop through them
for param in model.parameters():
    param.requires_grad = False

# vgg16
n_inputs = model.classifier[6].in_features
last_layer = nn.Linear(n_inputs, 133)
model.classifier[6] = last_layer

# set the cuda flag
use_cuda = torch.cuda.is_available()

if use_cuda:
    model = model.cuda()

optimizer = optim.SGD(model.classifier.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

Next, we can train our model:

In [None]:
# Fix for cuda error resulting from truncated images
# https://stackoverflow.com/a/23575424/7434289

from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True

model = train(5, dataloaders, model, optimizer, criterion, use_cuda, 'model.pt')

Epoch: 1 	Training Loss: 4.010934 	Validation Loss: 2.289984
Validation loss decreased (inf --> 2.289984).  Saving model ...
Epoch: 2 	Training Loss: 2.844379 	Validation Loss: 1.430140
Validation loss decreased (2.289984 --> 1.430140).  Saving model ...
Epoch: 3 	Training Loss: 2.353899 	Validation Loss: 1.086405
Validation loss decreased (1.430140 --> 1.086405).  Saving model ...
Epoch: 4 	Training Loss: 2.133385 	Validation Loss: 0.911726
Validation loss decreased (1.086405 --> 0.911726).  Saving model ...


And finally it is time to measure the performance of our model via accuracy:

In [None]:
test(dataloaders, model, criterion, use_cuda)