# Image classification transfer learning demo

1. [Introduction](#Introduction)
2. [Prerequisites and Preprocessing](#Prerequisites-and-Preprocessing)
3. [Fine-tuning the Image classification model](#Fine-tuning-the-Image-classification-model)
4. [Training parameters](#Training-parameters)
5. [Start the training](#Start-the-training)
6. [Inference](#Inference)


## Prerequisites and Preprocessing

### Permissions and environment variables

Here we set up the linkage and authentication to AWS services. There are three parts to this:

* The role used to give training and hosting access to your data. It is obtained automatically from the role used to start the notebook.
* The S3 bucket that you want to use for training and model data.
* The Amazon SageMaker image classification Docker image, which need not be changed.

In [13]:
%%time
import sagemaker
from sagemaker import get_execution_role

role = get_execution_role()
sess = sagemaker.Session()
bucket = sess.default_bucket()
prefix = 'ic-transfer-learning'

CPU times: user 564 ms, sys: 36 ms, total: 600 ms
Wall time: 1.72 s


In [14]:
from sagemaker.amazon.amazon_estimator import get_image_uri

training_image = get_image_uri(sess.boto_region_name, 'image-classification', repo_version="latest")

## Preparing data for our model

In [8]:
base_dir='/tmp'

data_bucket_name='s3webcamuploader83a65c76f8384092b63d212639122190'
dataset_name = 'fingers-e'

%env BASE_DIR=$base_dir
%env S3_DATA_BUCKET_NAME = $data_bucket_name
%env DATASET_NAME = $dataset_name



env: BASE_DIR=/tmp
env: S3_DATA_BUCKET_NAME=s3webcamuploader83a65c76f8384092b63d212639122190
env: DATASET_NAME=fingers-e


In [9]:
%%bash
# Pull our images from S3
set -x
aws s3 sync s3://$S3_DATA_BUCKET_NAME/public/$DATASET_NAME $BASE_DIR/$DATASET_NAME --quiet

+ aws s3 sync s3://s3webcamuploader83a65c76f8384092b63d212639122190/public/fingers-e /tmp/fingers-e --quiet


In [10]:
# Find im2rec in our environment
import sys,os

suffix='/mxnet/tools/im2rec.py'
im2rec = list(filter( (lambda x: os.path.isfile(x + suffix )), sys.path))[0] + suffix
%env IM2REC=$im2rec

env: IM2REC=/home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/tools/im2rec.py


In [11]:
%%bash
set -x
# Use the IM2REC script to convert our images into RecordIO files

cd $BASE_DIR

# First we need to create two LST files (training and test lists), noting the correct label class for each image
# We'll also save the output of the LST files command, since it includes a list of all of our label classes
echo "Creating LST files"
python $IM2REC --list --recursive --pass-through --test-ratio=0.3 --train-ratio=0.7 $DATASET_NAME $DATASET_NAME > ${DATASET_NAME}_classes

echo "Label classes:"
cat ${DATASET_NAME}_classes

# Then we create RecordIO files from the LST files
echo "Creating RecordIO files"
python $IM2REC --num-thread=4 ${DATASET_NAME}_train.lst $DATASET_NAME
python $IM2REC --num-thread=4 ${DATASET_NAME}_test.lst $DATASET_NAME
ls -lh *.rec

Creating LST files
Label classes:
1 0
2 1
3 2
Creating RecordIO files
Creating .rec file from /tmp/fingers-e_train.lst in /tmp
time: 0.0054509639739990234  count: 0
Creating .rec file from /tmp/fingers-e_test.lst in /tmp
time: 0.0053479671478271484  count: 0
-rw-rw-r-- 1 ec2-user ec2-user 1.1M Sep 26 01:00 fingers-c_test.rec
-rw-rw-r-- 1 ec2-user ec2-user 2.5M Sep 26 01:00 fingers-c_train.rec
-rw-rw-r-- 1 ec2-user ec2-user 1.4M Sep 26 02:18 fingers-e_test.rec
-rw-rw-r-- 1 ec2-user ec2-user 3.3M Sep 26 02:18 fingers-e_train.rec


+ cd /tmp
+ echo 'Creating LST files'
+ python /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/tools/im2rec.py --list --recursive --pass-through --test-ratio=0.3 --train-ratio=0.7 fingers-e fingers-e
+ echo 'Label classes:'
+ cat fingers-e_classes
+ echo 'Creating RecordIO files'
+ python /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/tools/im2rec.py --num-thread=4 fingers-e_train.lst fingers-e
+ python /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/tools/im2rec.py --num-thread=4 fingers-e_test.lst fingers-e
+ ls -lh fingers-c_test.rec fingers-c_train.rec fingers-e_test.rec fingers-e_train.rec
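
The LST files written by `im2rec.py --list` are tab-separated text: an index, a label value, and a relative image path on each line. As a minimal sketch, assuming that single-label layout, the class balance of a list can be checked before building the RecordIO files. The helper name `count_labels` and the sample text are hypothetical and not part of the notebook.

```python
from collections import Counter


def count_labels(lst_text):
    """Count images per label in the text of an im2rec .lst file.

    Assumes the single-label layout: index <TAB> label <TAB> relative path.
    """
    counts = Counter()
    for line in lst_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        _index, label, _path = line.split("\t")
        # Labels are written as floats (e.g. "0.000000"), so convert via float
        counts[int(float(label))] += 1
    return counts


# Made-up sample in the assumed layout (not taken from the real dataset)
sample = "0\t0.000000\t1/a.jpg\n1\t2.000000\t3/b.jpg\n2\t0.000000\t1/c.jpg\n"
print(count_labels(sample))
```

Running the same function on the contents of the train and test lists would show whether the 70/30 split left any class under-represented.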


In [15]:
# Upload our train and test RecordIO files to S3 in the bucket that our sagemaker session is using

s3train_path = 's3://{}/{}/train/'.format(bucket, prefix)
s3validation_path = 's3://{}/{}/validation/'.format(bucket, prefix)

# Clean up any existing data
!aws s3 rm s3://{bucket}/{prefix}/train --recursive
!aws s3 rm s3://{bucket}/{prefix}/validation --recursive

# Upload the rec files to the train and validation channels
!aws s3 cp /tmp/{dataset_name}_train.rec $s3train_path
!aws s3 cp /tmp/{dataset_name}_test.rec $s3validation_path



delete: s3://sagemaker-us-west-2-541003905521/ic-transfer-learning/train/fingers-c_train.rec
delete: s3://sagemaker-us-west-2-541003905521/ic-transfer-learning/validation/fingers-c_test.rec
upload: ../../../tmp/fingers-e_train.rec to s3://sagemaker-us-west-2-541003905521/ic-transfer-learning/train/fingers-e_train.rec
upload: ../../../tmp/fingers-e_test.rec to s3://sagemaker-us-west-2-541003905521/ic-transfer-learning/validation/fingers-e_test.rec


## Input data specification
Set the data type and channels used for training.

In [16]:
train_data = sagemaker.session.s3_input(
    s3train_path, 
    distribution='FullyReplicated', 
    content_type='application/x-recordio', 
    s3_data_type='S3Prefix'
)

validation_data = sagemaker.session.s3_input(
    s3validation_path, 
    distribution='FullyReplicated', 
    content_type='application/x-recordio', 
    s3_data_type='S3Prefix'
)

data_channels = {'train': train_data, 'validation': validation_data}

Once the data is available in the correct format, the next step is to train the model on it. Before training, we need to set up the training parameters. The next section explains these parameters in detail.

## Training
Now that all the setup is done, we are ready to train our image classifier. To begin, let us create a ``sagemaker.estimator.Estimator`` object. This estimator will launch the training job.
### Training parameters
There are two kinds of parameters that need to be set for training. The first kind is the parameters for the training job. These include:

* **Training instance count**: The number of instances on which to run the training. When the number of instances is greater than one, the image classification algorithm runs in a distributed setting.
* **Training instance type**: The type of machine on which to run the training. Typically, we use GPU instances for this kind of training.
* **Output path**: The S3 folder in which the training output is stored.

In [17]:
s3_output_location = 's3://{}/{}/output'.format(bucket, prefix)
ic = sagemaker.estimator.Estimator(
    training_image,
    role, 
    train_instance_count=1, 
    train_instance_type='ml.p3.2xlarge',
    train_volume_size = 10,
    train_max_run = 360000,
    input_mode= 'File',
    output_path=s3_output_location,
    sagemaker_session=sess
)

Apart from the above set of parameters, there are hyperparameters that are specific to the algorithm. These are:

* **num_layers**: The number of layers (depth) of the network. This notebook does not set it, so the algorithm default applies (152, according to the default configuration printed in the training log); other values such as 18 or 50 can be used.
* **use_pretrained_model**: Set to 1 to use a pretrained model for transfer learning.
* **image_shape**: The input image dimensions, 'num_channels, height, width', for the network. It should be no larger than the actual image size. The number of channels should be the same as in the actual image.
* **num_classes**: The number of output classes for the new dataset. ImageNet was trained with 1000 output classes, but the number of output classes can be changed for fine-tuning. Here it is derived from the number of class sub-directories in the dataset (3).
* **num_training_samples**: The total number of training samples. Here it is the number of lines in the training LST file (168 with the current split).
* **mini_batch_size**: The number of training samples used for each mini batch. In distributed training, the number of training samples used per batch will be N * mini_batch_size, where N is the number of hosts on which training is run.
* **epochs**: The number of training epochs.
* **learning_rate**: The learning rate for training.
* **precision_dtype**: Training datatype precision (default: float32). If set to 'float16', training is done in mixed-precision mode and will be faster than in float32 mode.
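
To make the relationship between `num_training_samples`, `mini_batch_size` and the number of hosts concrete, here is a small arithmetic sketch. The helper `batches_per_epoch` is hypothetical, and it assumes that a final partial batch still counts as one step; the algorithm's own handling of the last batch may differ.

```python
import math


def batches_per_epoch(num_training_samples, mini_batch_size, num_hosts=1):
    """Estimate optimizer steps per epoch.

    Assumes each step consumes num_hosts * mini_batch_size samples and that
    a final partial batch still counts as one step.
    """
    effective_batch = num_hosts * mini_batch_size
    return math.ceil(num_training_samples / effective_batch)


# 168 samples and a mini batch of 5 are the values used in this notebook
single_host_steps = batches_per_epoch(168, 5)
# A two-host job is hypothetical; this notebook trains on one instance
two_host_steps = batches_per_epoch(168, 5, num_hosts=2)
print(single_host_steps, two_host_steps)
```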


In [24]:
mod = None

num_classes=! ls -l {base_dir}/{dataset_name} | wc -l
num_classes=int(num_classes[0]) - 1

num_training_samples=! cat {base_dir}/{dataset_name}_train.lst | wc -l
num_training_samples = int(num_training_samples[0])

common_hyperparameters=dict(
    use_pretrained_model=1,
    image_shape='3,224,224',
    num_classes=num_classes,
    num_training_samples=num_training_samples,
    precision_dtype='float32',
)


hyperparameters={
    **common_hyperparameters, 
    **dict(
        epochs=12,
        learning_rate=0.01,
        mini_batch_size=5,
    )
}


ic.set_hyperparameters(**hyperparameters)

hyperparameters

{'use_pretrained_model': 1,
 'image_shape': '3,224,224',
 'num_classes': 3,
 'num_training_samples': 168,
 'precision_dtype': 'float32',
 'epochs': 12,
 'learning_rate': 0.01,
 'mini_batch_size': 5}
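
The cell above derives `num_classes` and `num_training_samples` by shelling out to `ls` and `wc`. As an alternative sketch, the same counts can be computed in Python, assuming one sub-directory per class and one image per non-empty LST line. The helper names and the throwaway demo directory are hypothetical.

```python
import os
import tempfile


def count_class_dirs(dataset_dir):
    """Count immediate sub-directories, assuming one per class."""
    with os.scandir(dataset_dir) as entries:
        return sum(1 for entry in entries if entry.is_dir())


def count_lst_samples(lst_path):
    """Count non-empty lines in an LST file, assuming one image per line."""
    with open(lst_path) as f:
        return sum(1 for line in f if line.strip())


# Demo on a throwaway directory laid out like the dataset (three class folders)
with tempfile.TemporaryDirectory() as tmp:
    for name in ("1", "2", "3"):
        os.mkdir(os.path.join(tmp, name))
    lst_path = os.path.join(tmp, "demo_train.lst")
    with open(lst_path, "w") as f:
        f.write("0\t0.000000\t1/a.jpg\n1\t1.000000\t2/b.jpg\n")
    demo_classes = count_class_dirs(tmp)
    demo_samples = count_lst_samples(lst_path)

print(demo_classes, demo_samples)
```

In the notebook these would be called as `count_class_dirs(f"{base_dir}/{dataset_name}")` and `count_lst_samples(f"{base_dir}/{dataset_name}_train.lst")`. Unlike `ls -l | wc -l`, this avoids the off-by-one correction for the `total` line.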

## Start the training
Start training by calling the fit method in the estimator

In [None]:
ic.fit(inputs=data_channels, logs=True)
# 341 secs on a p2.xlarge and did not converge well after 30 epochs
# 169 secs on a p3.2xlarge AND converged T and V to 1.0 in 10 epochs! Test data shows .85 accuracy
# 169 secs on a p3.2xlarge with 12 epochs and .86 test accuracy...
# 

INFO:sagemaker:Creating training-job with name: image-classification-2018-09-26-02-27-23-709


2018-09-26 02:27:23 Starting - Starting the training job...
Launching requested ML instances...
Preparing the instances for training......
2018-09-26 02:29:19 Downloading - Downloading input data...
2018-09-26 02:29:33 Training - Downloading the training image.....
Docker entrypoint called with argument(s): train
[09/26/2018 02:30:28 INFO 140590368741184] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/image_classification/default-input.json: {u'beta_1': 0.9, u'gamma': 0.9, u'beta_2': 0.999, u'optimizer': u'sgd', u'use_pretrained_model': 0, u'eps': 1e-08, u'epochs': 30, u'lr_scheduler_factor': 0.1, u'num_layers': 152, u'image_shape': u'3,224,224', u'precision_dtype': u'float32', u'mini_batch_size': 32, u'weight_decay': 0.0001, u'learning_rate': 0.1, u'momentum': 0}
[09/26/2018 02:30:28 INFO 140590368741184] Reading provided configuration from /opt/ml/input/config/hyperparameters.json: {u'learning_rate': u'0.01', u'use_pretrained_model':

  for idx, event in sagemaker.logs.multi_stream_iter(client, log_group, stream_names, positions):


## Running a model locally

In [46]:
mod = None

import gc
gc.collect()

702

In [20]:
job = ic.latest_training_job
model_path = f"{base_dir}/{job.name}"

# Download the trained model from S3
! mkdir -p {model_path}
! aws s3 cp {ic.output_path}/{job.name}/output/model.tar.gz {model_path}/ --quiet
! cd {model_path} && tar -xzf model.tar.gz && rm model.tar.gz
! echo "Model extracted to {model_path}"

Model extracted to /tmp/image-classification-2018-09-26-02-19-48-589


In [21]:
import mxnet as mx

# MXNet wants the filename prefix for the model, which we can infer from prefix of the symbol.json file's name
symbol_filename = ! ls {model_path}/*-symbol.json
symbol_filename = symbol_filename[0].split('/')[-1]
model_files_prefix = f"{model_path}/{symbol_filename.replace('-symbol.json', '')}"

# Which snapshot to use?
epoch_snapshot_number=12

# Run our model on the CPU
ctx = mx.cpu()

# Initialize our model
sym, arg_params, aux_params = mx.model.load_checkpoint(model_files_prefix, epoch_snapshot_number)
mod = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
mod.bind(
    for_training=False, 
    data_shapes=[('data', (1,3,224,224))], 
    label_shapes=mod._label_shapes
)
mod.set_params(arg_params, aux_params, allow_missing=True)

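# Load classes from file (the captured output of the lst file creation)
# Use the classes file of the dataset we trained on rather than a hard-coded name
with open(f"{base_dir}/{dataset_name}_classes", 'r') as f:
    classes_lines = f.read().splitlines()
classes=[parts.split(' ')[0] for parts in classes_lines]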
classes

# Imports for working with image data
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

# Define a simple data batch (used for wrapping up an image to submit to the model)
from collections import namedtuple
Batch = namedtuple('Batch', ['data'])

def get_image(fname, show=False):
    img = mx.image.imread(fname)
    if img is None:
        return None
    if show:
        plt.imshow(img.asnumpy())
        plt.axis('off')
    # convert into format (batch, channels, height, width)
    img = mx.image.imresize(img, 224, 224) # resize
    img = img.transpose((2, 0, 1)) # Channel first
    img = img.expand_dims(axis=0) # batchify
    return img

def classify_local(img_path, classes):
    img = get_image(img_path, show=False)
    # compute the predict probabilities
    mod.forward(Batch([img]))
    prob = mod.get_outputs()[0].asnumpy()
    # get highest prob index
    best_prob_index = np.argmax(prob)
#     # print the top-5
#     prob = np.squeeze(prob)
#     a = np.argsort(prob)[::-1]
#     for i in a[0:5]:
#         print('probability=%f, class=%s' %(prob[i], classes[i]))
    return(classes[best_prob_index], np.squeeze(prob)[best_prob_index])

#classify_local('/tmp/fingers-d/1/19a7f84f-46e4-4b79-a66c-54feac6649b0.jpg', classes)
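
`get_image` turns a decoded image laid out as (height, width, channels) into a batch of shape (1, 3, 224, 224). The following NumPy-only sketch illustrates just the transpose and batch steps, with the resize left out and a zero array standing in for a real image. The helper name `to_batch` is hypothetical.

```python
import numpy as np


def to_batch(img_hwc):
    """Rearrange an image from (height, width, channels) to (1, channels, height, width)."""
    chw = np.transpose(img_hwc, (2, 0, 1))  # channels first
    return np.expand_dims(chw, axis=0)      # add the batch dimension


# Stand-in for an already resized 224x224 RGB image
dummy = np.zeros((224, 224, 3), dtype=np.uint8)
batch = to_batch(dummy)
print(batch.shape)
```

The resulting shape matches the `data_shapes` entry passed to `mod.bind`, which is what the module expects for each forward pass.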


In [179]:
# Clean up our local model to free RAM
mod = None

### Helper function to classify all images in a directory (with one sub-dir per class to test)

In [2]:
def classify_all(imgs_dir, classifier_func, classes, expected_class_from_filename_func):
    results = []

    for root, dirs, files in os.walk(imgs_dir):
        for file in files:
            if file.endswith(".jpg"):
                full_path = os.path.join(root, file)
                (predicted_class, probability) = classifier_func(full_path, classes)
                expected_class = expected_class_from_filename_func(full_path)
                is_correct = predicted_class == expected_class
                result = dict(
                    file=full_path,
                    is_correct=is_correct,
                    predicted_class=predicted_class,
                    expected_class=expected_class,
                    probability=probability
                )
                results.append(result)

    return results

### Classify some new test images never seen by the network

In [22]:
# Download a fresh data set that the model has not seen
test_dataset_name = 'fingers-f'
! aws s3 sync s3://{data_bucket_name}/public/{test_dataset_name}/ {base_dir}/{test_dataset_name} 

download: s3://s3webcamuploader83a65c76f8384092b63d212639122190/public/fingers-f/1/59d6540a-5af1-41f4-b430-a82d733985a1.jpg to ../../../tmp/fingers-f/1/59d6540a-5af1-41f4-b430-a82d733985a1.jpg
download: s3://s3webcamuploader83a65c76f8384092b63d212639122190/public/fingers-f/1/150c6e0a-54d9-4fc6-bcc6-177c3f13da26.jpg to ../../../tmp/fingers-f/1/150c6e0a-54d9-4fc6-bcc6-177c3f13da26.jpg
download: s3://s3webcamuploader83a65c76f8384092b63d212639122190/public/fingers-f/1/19288ba0-2795-4f85-ad07-1ca6f617c661.jpg to ../../../tmp/fingers-f/1/19288ba0-2795-4f85-ad07-1ca6f617c661.jpg
download: s3://s3webcamuploader83a65c76f8384092b63d212639122190/public/fingers-f/1/89e43b31-d8de-4364-b2ed-1724a5d84935.jpg to ../../../tmp/fingers-f/1/89e43b31-d8de-4364-b2ed-1724a5d84935.jpg
download: s3://s3webcamuploader83a65c76f8384092b63d212639122190/public/fingers-f/1/8ef344e3-700d-4acd-9083-9acc27c18812.jpg to ../../../tmp/fingers-f/1/8ef344e3-700d-4acd-9083-9acc27c18812.jpg
download: s3://s3webcamuploader83a6

In [23]:
import os

# Run our unseen data set through the model and calculate the percentage of correct classifications
test_dir = base_dir + '/' + test_dataset_name
# The expected class is the name of the directory that contains the image
expected_class_from_filename_func = lambda f: os.path.basename(os.path.dirname(f))

results = classify_all(test_dir, classify_local, classes, expected_class_from_filename_func)
            
rights = [r for r in results if r['is_correct'] is True]
wrongs = [r for r in results if r['is_correct'] is False]
n_right = len(rights) * 1.0
n_wrong = len(wrongs) * 1.0

print("\n\n Total accuracy on new images: " + str(n_right / (n_right+n_wrong)))

mod = None



 Total accuracy on new images: 0.9
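
A single accuracy figure hides which classes the model confuses. As a minimal sketch, result dictionaries shaped like those returned by `classify_all` can be grouped by expected class. The helper `per_class_accuracy` and the sample results are hypothetical and for illustration only.

```python
from collections import defaultdict


def per_class_accuracy(results):
    """Return {expected_class: fraction correct} for classify_all-style result dicts."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for r in results:
        totals[r["expected_class"]] += 1
        if r["is_correct"]:
            correct[r["expected_class"]] += 1
    return {cls: correct[cls] / totals[cls] for cls in totals}


# Made-up results for illustration only
sample_results = [
    {"expected_class": "1", "predicted_class": "1", "is_correct": True},
    {"expected_class": "1", "predicted_class": "2", "is_correct": False},
    {"expected_class": "2", "predicted_class": "2", "is_correct": True},
]
print(per_class_accuracy(sample_results))
```

Calling `per_class_accuracy(results)` on the list built above would show whether the errors cluster in one class.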


## Deploy the trained model

In [None]:
# Deploying a model to an endpoint takes a few minutes. Are you sure you want to do this?
CONFIRM_DEPLOY = False # Change to True to deploy
assert CONFIRM_DEPLOY

ic_classifier = ic.deploy(
    initial_instance_count = 1,
    instance_type = 'ml.t2.medium'
)

### Calling a deployed endpoint

In [98]:
import json
import numpy as np
import os

def classify_deployed(file_name, classes):
    payload = None
    with open(file_name, 'rb') as f:
        payload = f.read()
        payload = bytearray(payload)

    ic_classifier.content_type = 'application/x-image'
    result = json.loads(ic_classifier.predict(payload))
    best_prob_index = np.argmax(result)
    return (classes[best_prob_index], result[best_prob_index])
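
`classify_deployed` relies on the response being a JSON array of per-class probabilities in the same order as `classes`. Under that assumption, the sketch below decodes such a response and ranks the most likely classes without calling an endpoint. The helper `top_k_classes` and the response bytes are hypothetical.

```python
import json


def top_k_classes(response_body, classes, k=2):
    """Decode a JSON list of probabilities and return the k most likely (class, probability) pairs."""
    probabilities = json.loads(response_body)
    ranked = sorted(zip(classes, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up response body for a three-class model
fake_response = b"[0.1, 0.7, 0.2]"
top_two = top_k_classes(fake_response, ["1", "2", "3"])
print(top_two)
```

Looking at the runner-up class as well as the winner can help when the top probability is low.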



### Clean up

When we're done with the endpoint, we can just delete it and the backing instances will be released.  Run the following cell to delete the endpoint.

In [97]:
ic_classifier.delete_endpoint()

INFO:sagemaker:Deleting endpoint with name: image-classification-2018-09-25-02-07-14-375


ClientError: An error occurred (ValidationException) when calling the DeleteEndpoint operation: Could not find endpoint "arn:aws:sagemaker:us-west-2:541003905521:endpoint/image-classification-2018-09-25-02-07-14-375".