This notebook demonstrates how to use transfer learning to build and train an image classification model on your own image dataset with MXNet and Amazon SageMaker.

As an example, we create a trash classification model which, given an image, classifies it into one of three classes: compost, landfill, or recycle. This is based on the [Show Before You Throw](https://www.youtube.com/watch?v=Ut1VGG6TOOw) project from an AWS DeepLens hackathon.

# Part 1: Prepare training and validation data as RecordIO files

It is assumed that your custom dataset's images are present in an S3 bucket and that different classes are separated by named folders, as shown in the following directory structure:
```
|-deeplens-blog-bucket
    |-images
        |-Compost
        |-Landfill
        |-Recycle
```

#### This section has been adapted from this [mxnet-examples repository](https://github.com/sharmalakshay93/mxnet-examples/tree/master/custom_dataset) and the [MXNet API documentation](https://mxnet.incubator.apache.org/api/python/io/io.html#mxnet.recordio.MXIndexedRecordIO)

#### Import relevant modules

In [1]:
import os
import urllib.request
import boto3

####  Download image data from S3 to the notebook instance's local storage

Note: the code below downloads all the objects in the bucket. Adjust the S3 path parameter accordingly if your bucket contains other objects that are irrelevant to this task.

In [2]:
!mkdir deeplens-blog-bucket
!aws s3 cp --recursive s3://deeplens-blog-bucket deeplens-blog-bucket

download: s3://deeplens-blog-bucket/images/Compost/20180403-195616.jpg to deeplens-blog-bucket/images/Compost/20180403-195616.jpg
download: s3://deeplens-blog-bucket/images/Compost/20180403-195643.jpg to deeplens-blog-bucket/images/Compost/20180403-195643.jpg
download: s3://deeplens-blog-bucket/images/Compost/20180403-195651.jpg to deeplens-blog-bucket/images/Compost/20180403-195651.jpg
download: s3://deeplens-blog-bucket/images/Compost/20180403-195736.jpg to deeplens-blog-bucket/images/Compost/20180403-195736.jpg
download: s3://deeplens-blog-bucket/images/Compost/20180403-195729.jpg to deeplens-blog-bucket/images/Compost/20180403-195729.jpg
download: s3://deeplens-blog-bucket/images/Compost/20180403-195833.jpg to deeplens-blog-bucket/images/Compost/20180403-195833.jpg
download: s3://deeplens-blog-bucket/images/Compost/20180403-195701.jpg to deeplens-blog-bucket/images/Compost/20180403-195701.jpg
download: s3://deeplens-blog-bucket/images/Compost/20180403-195746.jpg to deeplens-blog-bu

Ensure that the newly created directories containing the downloaded data are structured as shown at the beginning of this tutorial.
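
One way to check the layout is to count the files in each class subdirectory. The following is a minimal sketch; the helper name `count_images_per_class` is hypothetical, and it assumes the images were downloaded to `deeplens-blog-bucket/images` as in the cell above.

```python
import os

def count_images_per_class(root):
    """Return {class_name: number_of_files} for each subdirectory of `root`."""
    counts = {}
    for entry in sorted(os.listdir(root)):
        class_dir = os.path.join(root, entry)
        if os.path.isdir(class_dir):
            # Count regular files only; nested directories are ignored
            counts[entry] = sum(
                1 for name in os.listdir(class_dir)
                if os.path.isfile(os.path.join(class_dir, name))
            )
    return counts

# Assumed local path from the download step; skipped if it does not exist
image_root = 'deeplens-blog-bucket/images'
if os.path.isdir(image_root):
    for class_name, n in count_images_per_class(image_root).items():
        print(class_name, n)
```

Each class should appear once with a non-zero count; an unexpected extra directory here would become an extra class label in the next step.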

####  Find MXNet package path, and save this path as a variable

In [3]:
import mxnet as mx
mxnet_path = os.path.dirname(mx.__file__)
print(mxnet_path)

/home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet


#### Prepare "list" files with train-val split

To create the `.rec` files, we first shuffle the data, split it into training and validation sets, and create a list file for each set (two list files in total). Here the split is 70-30 (specified by the `--train-ratio 0.7` argument below). `processed_rec` is the prefix attached to the output files, and `deeplens-blog-bucket/images` is the directory containing the images, with one subdirectory per class. This step also assigns a non-negative integer to each class label.

In [4]:
!python $mxnet_path/tools/im2rec.py --list --recursive --train-ratio 0.7 processed_rec deeplens-blog-bucket/images

Compost 0
Landfill 1
Recycling 2


In the Jupyter directory navigation pane, you should be able to see two new files in the directory where this Jupyter notebook exists: `processed_rec_train.lst`, `processed_rec_val.lst`.
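
To see how the classes are distributed across the two splits, the list files can be read back. This sketch assumes each line of a `.lst` file is tab-separated as index, label, relative path, with the label written as a float (for example `0.000000`); the helper name `count_labels` is hypothetical.

```python
import os
from collections import Counter

def count_labels(lst_path):
    """Count how many entries of an im2rec list file carry each integer label."""
    counts = Counter()
    with open(lst_path) as f:
        for line in f:
            fields = line.rstrip('\n').split('\t')
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            # Assumed layout: index <TAB> label <TAB> relative path
            counts[int(float(fields[1]))] += 1
    return counts

for name in ('processed_rec_train.lst', 'processed_rec_val.lst'):
    if os.path.exists(name):
        print(name, dict(count_labels(name)))
```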

#### Prepare RecordIO files

The .rec files can now be prepared. The images are resized such that the shorter edge has a length of `240px`. Some other parameters are passed in, which are explained [here](https://github.com/apache/incubator-mxnet/blob/d2a856a3a2abb4e72edc301b8b821f0b75f30722/tools/im2rec.py#L206).

Note that the `processed_rec` and `deeplens-blog-bucket/images` arguments are the same as in the previous step.
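
The effect of resizing by the shorter edge can be sketched as plain arithmetic. The function below is an illustration, not the `im2rec.py` code itself: the name `resized_shape` is hypothetical, and it assumes the longer edge is scaled by the same factor and rounded down to an integer.

```python
def resized_shape(height, width, shorter_edge=240):
    """Return (height, width) after scaling so the shorter edge equals `shorter_edge`.

    Assumption: the longer edge is scaled proportionally with floor rounding.
    """
    if height <= 0 or width <= 0:
        raise ValueError('height and width must be positive')
    if height > width:
        # Portrait: width is the shorter edge
        return height * shorter_edge // width, shorter_edge
    # Landscape or square: height is the shorter edge
    return shorter_edge, width * shorter_edge // height

print(resized_shape(480, 640))  # (240, 320)
print(resized_shape(640, 480))  # (320, 240)
```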

In [5]:
!python $mxnet_path/tools/im2rec.py --resize 240 --quality 95 --num-thread 16 processed_rec deeplens-blog-bucket/images

Creating .rec file from /home/ec2-user/SageMaker/processed_rec_train.lst in /home/ec2-user/SageMaker
time: 0.9254565238952637  count: 0
Creating .rec file from /home/ec2-user/SageMaker/processed_rec_val.lst in /home/ec2-user/SageMaker
time: 0.31188416481018066  count: 0


You should be able to see four new files in the directory where this Jupyter notebook exists (viewable via the Jupyter directory navigation pane): `processed_rec_train.idx`, `processed_rec_val.idx`, `processed_rec_train.rec`, and `processed_rec_val.rec`.

#### Save RecordIO files to S3

Of the above, `processed_rec_train.rec` and `processed_rec_val.rec` need to be saved to an S3 location so that they can be used to train the SageMaker model.

In [6]:
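def save_to_s3(file_name):
    # Upload a .rec file under rec_files/train/ or rec_files/val/.
    # Relies on the global `bucket` defined in the next cell.
    assert 'train' in file_name or 'val' in file_name, "Unable to determine if train or val"
    s3 = boto3.resource('s3')
    prefix = 'train/' if 'train' in file_name else 'val/'
    key = 'rec_files/' + prefix + file_name
    # Open inside a context manager so the file handle is closed after the upload
    with open(file_name, "rb") as data:
        s3.Bucket(bucket).put_object(Key=key, Body=data)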

In [7]:
rec_file_list = ['processed_rec_train.rec', 'processed_rec_val.rec']
bucket = 'deeplens-blog-bucket'

for f in rec_file_list:
    save_to_s3(f)
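
The key layout produced by the upload can be previewed without touching S3. This sketch mirrors the prefix logic of `save_to_s3`; the helper name `s3_key_for` is hypothetical.

```python
def s3_key_for(file_name, root='rec_files'):
    """Return the S3 object key a .rec file is stored under."""
    if 'train' in file_name:
        channel = 'train'
    elif 'val' in file_name:
        channel = 'val'
    else:
        raise ValueError('Unable to determine if train or val: ' + file_name)
    return '{}/{}/{}'.format(root, channel, file_name)

for name in ('processed_rec_train.rec', 'processed_rec_val.rec'):
    print(s3_key_for(name))
# rec_files/train/processed_rec_train.rec
# rec_files/val/processed_rec_val.rec
```

These prefixes are the ones the training job later reads from, so the training and validation files must not share a prefix.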

#### Retrieve image dimensions

As part of the model building phase, the input image dimensions will need to be known. The shorter edge was a user-specified input above (240). The longer edge's length may be retrieved by checking the dimensions of any sample in the dataset.

Note: it is assumed that all images have equal dimensions.

In [8]:
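record = mx.recordio.MXRecordIO('processed_rec_train.rec', 'r')
item = record.read()
record.close()
# unpack_img returns the decoded image as a (height, width, channels) array
_, img = mx.recordio.unpack_img(item)
# The training job expects the shape as "channels,height,width"
height, width, channels = img.shape
img_shape_str = ','.join(str(v) for v in (channels, height, width))
print(img_shape_str)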

3,240,424
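
Because the shape is taken from a single sample, it can be worth confirming the equal-dimensions assumption on a collection of shapes. The sketch below works on plain `(height, width, channels)` tuples so it does not depend on MXNet; both helper names are hypothetical.

```python
def image_shape_str(hwc_shape):
    """Format a (height, width, channels) shape as 'channels,height,width'."""
    height, width, channels = hwc_shape
    return ','.join(str(v) for v in (channels, height, width))

def common_shape(shapes):
    """Return the single shape shared by all entries, or raise if they differ."""
    unique = set(tuple(s) for s in shapes)
    if len(unique) != 1:
        raise ValueError('expected one shape, found {}'.format(sorted(unique)))
    return unique.pop()

shapes = [(240, 424, 3), (240, 424, 3)]
print(image_shape_str(common_shape(shapes)))  # 3,240,424
```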


#### Retrieve dataset size

The size of the dataset will also need to be known.

In [9]:
train_records = mx.recordio.MXIndexedRecordIO('processed_rec_train.idx', 'processed_rec_train.rec', 'r')
train_samples = len(train_records.idx)
val_records = mx.recordio.MXIndexedRecordIO('processed_rec_val.idx', 'processed_rec_val.rec', 'r')
val_samples = len(val_records.idx)
print('train_samples:', train_samples)
print('val_samples:', val_samples)

train_samples: 384
val_samples: 165
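
As a quick sanity check, the two counts should reflect the 70-30 split requested earlier. A small sketch (the helper name `train_fraction` is hypothetical):

```python
def train_fraction(train_samples, val_samples):
    """Return the share of samples that ended up in the training set."""
    total = train_samples + val_samples
    if total == 0:
        raise ValueError('no samples found')
    return train_samples / total

print(round(train_fraction(384, 165), 3))  # 0.699
```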


This marks the end of the data preparation phase.

# Part 2: Model building and training

#### This section has been adapted from the [AWS Machine Learning Blog](https://aws.amazon.com/blogs/machine-learning/build-your-own-object-classification-model-in-sagemaker-and-import-it-to-deeplens/) 

#### Import relevant modules

In [10]:
%%time
import re
from sagemaker import get_execution_role
import time
from time import gmtime, strftime
%matplotlib inline

CPU times: user 977 ms, sys: 56.5 ms, total: 1.03 s
Wall time: 6.95 s


#### Import SageMaker execution role

In [11]:
role = get_execution_role()

#### Define Docker containers and select the relevant training image based on your region

In [12]:
containers = {'us-west-2': '433757028032.dkr.ecr.us-west-2.amazonaws.com/image-classification:latest',
              'us-east-1': '811284229777.dkr.ecr.us-east-1.amazonaws.com/image-classification:latest',
              'us-east-2': '825641698319.dkr.ecr.us-east-2.amazonaws.com/image-classification:latest',
              'eu-west-1': '685385470294.dkr.ecr.eu-west-1.amazonaws.com/image-classification:latest'}
training_image = containers[boto3.Session().region_name]
print(training_image)

811284229777.dkr.ecr.us-east-1.amazonaws.com/image-classification:latest
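
The dictionary lookup raises a bare `KeyError` when the notebook runs in a region that is not listed. A hedged sketch of a lookup with a clearer error message follows; the helper name `training_image_for` is hypothetical, and the single entry is copied from the cell above.

```python
def training_image_for(region, containers):
    """Return the training image URI for `region`, or explain which regions are known."""
    try:
        return containers[region]
    except KeyError:
        raise ValueError(
            'No training image listed for region {!r}; known regions: {}'.format(
                region, ', '.join(sorted(containers)))
        ) from None

known = {'us-east-1': '811284229777.dkr.ecr.us-east-1.amazonaws.com/image-classification:latest'}
print(training_image_for('us-east-1', known))
```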


#### Define dataset parameters

Other than `train_samples` and `img_shape_str` above, a few more dataset parameters have to be specified.

In [14]:
num_classes = 3
mini_batch_size = 14

#### Define model and training parameters

The model used is ResNet-18, and therefore `num_layers = 18`. Because pretrained weights are available for this architecture, it helps to start from them and fine-tune for the new classification task rather than train from scratch. For this reason, `use_pretrained_model = 1`.

In [15]:
num_layers = 18
mini_batch_size = 14
epochs = 10
learning_rate = 0.01
use_pretrained_model = 1
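
To get a feel for what these values imply, the number of mini-batches per epoch can be estimated from the dataset size. This is a rough sketch with a hypothetical helper name; it assumes a final partial batch is still processed, which may not match how the algorithm actually handles the remainder.

```python
import math

def batches_per_epoch(num_samples, batch_size):
    """Estimate mini-batches per epoch, counting a final partial batch."""
    if batch_size <= 0:
        raise ValueError('batch_size must be positive')
    return math.ceil(num_samples / batch_size)

print(batches_per_epoch(384, 14))       # 28
print(batches_per_epoch(384, 14) * 10)  # 280 over 10 epochs
```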

#### Define job name

In [16]:
job_name_prefix = 'sbyt-deeplens-blog'
timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())
job_name = job_name_prefix + timestamp
print(job_name)

sbyt-deeplens-blog-2019-04-06-22-17-16
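
Training job names are restricted in length and character set. The sketch below builds the same kind of name and checks it before use. The limits encoded here (at most 63 characters; letters, digits, and hyphens; no leading or trailing hyphen) are my understanding of the SageMaker constraints and should be treated as an assumption; `make_job_name` is a hypothetical helper.

```python
import re
import time

# Assumed constraint: alphanumerics separated by hyphens, at most 63 characters
JOB_NAME_PATTERN = re.compile(r'^[a-zA-Z0-9](-*[a-zA-Z0-9])*$')
MAX_JOB_NAME_LENGTH = 63

def make_job_name(prefix, now=None):
    """Return '<prefix>-<UTC timestamp>' after validating it as a job name."""
    if now is None:
        now = time.gmtime()
    name = '{}-{}'.format(prefix, time.strftime('%Y-%m-%d-%H-%M-%S', now))
    if len(name) > MAX_JOB_NAME_LENGTH or not JOB_NAME_PATTERN.match(name):
        raise ValueError('invalid training job name: ' + name)
    return name

print(make_job_name('sbyt-deeplens-blog'))
```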


#### Define SageMaker params

A number of parameters are passed to SageMaker to create and train the model. Several of them were defined in the steps above.

In [17]:
training_params = \
{
    # specify the training docker image
    "AlgorithmSpecification": {
        "TrainingImage": training_image,
        "TrainingInputMode": "File"
    },
    "RoleArn": role,
    "OutputDataConfig": {
        "S3OutputPath": 's3://{}/{}/output'.format(bucket, job_name_prefix)
    },
    "ResourceConfig": {
        "InstanceCount": 1,
        "InstanceType": "ml.p2.xlarge",
        "VolumeSizeInGB": 50
    },
    "TrainingJobName": job_name,
    "HyperParameters": {
        "image_shape": img_shape_str,
        "num_layers": str(num_layers),
        "num_training_samples": str(train_samples),
        "num_classes": str(num_classes),
        "mini_batch_size": str(mini_batch_size),
        "epochs": str(epochs),
        "learning_rate": str(learning_rate),
        "use_pretrained_model": str(use_pretrained_model)
    },
    "StoppingCondition": {
        "MaxRuntimeInSeconds": 360000
    },
    # Training data should be inside a subdirectory called "train"
    # Validation data should be inside a subdirectory called "val"
    # The algorithm currently only supports the FullyReplicated mode (where data is copied onto each machine)
    "InputDataConfig": [
        {
            "ChannelName": "train",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": 's3://{}/rec_files/train/'.format(bucket),
                    "S3DataDistributionType": "FullyReplicated"
                }
            },
            "ContentType": "application/x-recordio",
            "CompressionType": "None"
        },
        {
            "ChannelName": "validation",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": 's3://{}/rec_files/val/'.format(bucket),
                    "S3DataDistributionType": "FullyReplicated"
                }
            },
            "ContentType": "application/x-recordio",
            "CompressionType": "None"
        }
    ]
}
print('Training job name: {}'.format(job_name))
print('\nInput Data Location: {}'.format(training_params['InputDataConfig'][0]['DataSource']['S3DataSource']))

Training job name: sbyt-deeplens-blog-2019-04-06-22-17-16

Input Data Location: {'S3DataType': 'S3Prefix', 'S3Uri': 's3://deeplens-blog-bucket/rec_files/train/', 'S3DataDistributionType': 'FullyReplicated'}
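
Every value under `HyperParameters` is passed as a string, which is why each entry above is wrapped in `str()`. The same conversion can be sketched as a single helper; the name `as_hyperparameters` is hypothetical.

```python
def as_hyperparameters(values):
    """Convert a dict of Python values into the string-to-string form used above."""
    return {name: str(value) for name, value in values.items()}

print(as_hyperparameters({'num_layers': 18, 'epochs': 10, 'learning_rate': 0.01}))
# {'num_layers': '18', 'epochs': '10', 'learning_rate': '0.01'}
```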


#### Create and run SageMaker job

In [18]:
# create the Amazon SageMaker training job
sagemaker = boto3.client(service_name='sagemaker')
sagemaker.create_training_job(**training_params)

# confirm that the training job has started
status = sagemaker.describe_training_job(TrainingJobName=job_name)['TrainingJobStatus']
print('Training job current status: {}'.format(status))

try:
    # wait for the job to finish and report final status
    sagemaker.get_waiter('training_job_completed_or_stopped').wait(TrainingJobName=job_name)
    training_info = sagemaker.describe_training_job(TrainingJobName=job_name)
    status = training_info['TrainingJobStatus']
    print("Training job ended with status: " + status)
except Exception:
    # an exception here usually means the job failed (the waiter raises on a Failed status)
    training_info = sagemaker.describe_training_job(TrainingJobName=job_name)
    message = training_info.get('FailureReason', 'no failure reason reported')
    print('Training failed with the following error: {}'.format(message))

Training job current status: InProgress
Training job ended with status: Completed
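
As an alternative to the waiter, the job status can be polled explicitly. The following is a sketch, not the notebook's method: `wait_for_training_job` is a hypothetical helper that takes any client object exposing `describe_training_job`, such as the boto3 SageMaker client created above.

```python
import time

# Statuses after which the job will not change state again
TERMINAL_STATES = {'Completed', 'Failed', 'Stopped'}

def wait_for_training_job(client, job_name, poll_seconds=30, max_polls=1000, sleep=time.sleep):
    """Poll until the job reaches a terminal state; return (status, failure_reason)."""
    for _ in range(max_polls):
        info = client.describe_training_job(TrainingJobName=job_name)
        status = info['TrainingJobStatus']
        if status in TERMINAL_STATES:
            # FailureReason is only expected when the job failed
            return status, info.get('FailureReason')
        sleep(poll_seconds)
    raise TimeoutError('job {} did not finish after {} polls'.format(job_name, max_polls))
```

With the client from the cell above, a call would look like `status, reason = wait_for_training_job(sagemaker, job_name)`.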


#### Training completion status and accuracy can be viewed by navigating to `Training -> Training Jobs -> job_name -> View logs` in the SageMaker console 

#### Trained model artifacts in S3

The model trained above can now be found in the `s3://deeplens-blog-bucket/sbyt-deeplens-blog/output` path, and can be further used to create a SageMaker model instance that can be deployed to the DeepLens device.
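
Within that output path, SageMaker normally places the artifacts for each job under a folder named after the job, as `<job_name>/output/model.tar.gz`. The sketch below builds that URI under this assumption; `expected_model_artifact_uri` is a hypothetical helper, and the authoritative location is the `ModelArtifacts` entry returned by `describe_training_job`.

```python
def expected_model_artifact_uri(s3_output_path, job_name):
    """Build the conventional model artifact URI for a finished training job."""
    return '{}/{}/output/model.tar.gz'.format(s3_output_path.rstrip('/'), job_name)

print(expected_model_artifact_uri(
    's3://deeplens-blog-bucket/sbyt-deeplens-blog/output',
    'sbyt-deeplens-blog-2019-04-06-22-17-16'))
```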

This marks the end of the model building and training phase.