![@mikegchambers](../../images/header.png)

# Image Classification with SageMaker built-in algorithm

In this notebook, we use the SageMaker SDK to train an image classification model with an Amazon SageMaker built-in algorithm, deploy it to an endpoint, and run a test prediction.

## Import Libraries

In [None]:
import sagemaker
import json 
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure

## Set up the SageMaker session

The SageMaker SDK has some convenience methods for getting a reference to a role, setting up a session, and getting the location of a 'default' bucket that can be used.

In [None]:
role = sagemaker.get_execution_role()
sess = sagemaker.Session()
bucket = sess.default_bucket()

## Locate Data

When the SageMaker training container launches, we pass it the location of the training data, which is expected to be in an S3 bucket. We set the locations here so that we can use them later. The data lives in a publicly accessible bucket in one of my accounts.

We also set the location where we will save the completed model.

In [None]:
s3train = 's3://aws-mls-c01/cifar10/train/cifar10_train.rec'
s3validation = 's3://aws-mls-c01/cifar10/validation/cifar10_val.rec'

s3_output_location = 's3://{}/image-classification/output'.format(bucket)
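An S3 URI like the ones above is just a bucket name followed by an object key. As a small sketch (the helper name `split_s3_uri` is made up for illustration and is not part of the SageMaker SDK), this is one way to pull the two parts apart using only the standard library:

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # The path keeps its leading slash, which is not part of the object key
    return parsed.netloc, parsed.path.lstrip("/")


bucket_name, key = split_s3_uri("s3://aws-mls-c01/cifar10/train/cifar10_train.rec")
print(bucket_name)
print(key)
```

This can be handy for a quick sanity check of a path before handing it to a training job.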

## Define the training image

Here we point SageMaker to the container we want to use.  In this case, we use the built-in 'image-classification' algorithm/container.

In [None]:
# Use the session's region so the image URI matches where the training job runs
training_image = sagemaker.image_uris.retrieve('image-classification', region=sess.boto_region_name)

## Create a SageMaker Estimator

The SageMaker Estimator is one of the key object types in the SageMaker SDK.  Here we initialise the estimator specifying the instance type, how many instances we want to use, and other parameters including the use of spot instances. 

This is how SageMaker manages infrastructure for us with a simple SDK call to the API.

In [None]:
ic = sagemaker.estimator.Estimator(
    training_image,
    role,
    instance_count=1,
    instance_type='ml.p2.xlarge',
    volume_size=50,               # EBS volume size in GB
    max_run=7200,                 # maximum training time in seconds
    input_mode='File',
    output_path=s3_output_location,
    sagemaker_session=sess,
    use_spot_instances=True,
    max_wait=7200,                # maximum total time in seconds, including waiting for spot capacity
)
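When spot instances are enabled, `max_wait` has to be at least as large as `max_run`, because it covers both the time spent waiting for spot capacity and the training time itself. The following is only a sketch of that rule in plain Python; `spot_wait_slack` is a hypothetical helper, not an SDK function:

```python
def spot_wait_slack(max_run, max_wait):
    """Return how many seconds are left over for waiting on spot capacity.

    Hypothetical helper: raises if max_wait is shorter than max_run.
    """
    if max_wait < max_run:
        raise ValueError("max_wait must be greater than or equal to max_run")
    return max_wait - max_run


# The values used in the estimator above
slack = spot_wait_slack(max_run=7200, max_wait=7200)
print("Seconds available for waiting on spot capacity:", slack)
```

With both values set to 7200 there is no slack, so any time spent waiting for capacity comes out of the same two-hour budget. Raising `max_wait` gives the job more room.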

Then we set some hyperparameters:

In [None]:
ic.set_hyperparameters(
    use_pretrained_model=1,       # start from pretrained weights (transfer learning)
    num_layers=50,                # network depth
    image_shape="3,32,32",        # channels, height, width
    num_classes=10,
    num_training_samples=50000,
    mini_batch_size=64,
    epochs=5,
    learning_rate=0.001,
    optimizer='adam',
)
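To get a feel for what these numbers imply, here is a rough back-of-the-envelope calculation in plain Python. It assumes a partial final batch counts as one step, which may not match exactly how the built-in algorithm iterates:

```python
import math

# Values copied from the hyperparameters above
image_shape = "3,32,32"
num_training_samples = 50000
mini_batch_size = 64
epochs = 5

# image_shape is given as "channels,height,width"
channels, height, width = (int(v) for v in image_shape.split(","))
values_per_image = channels * height * width

# Assumption: a partial final batch counts as one step
steps_per_epoch = math.ceil(num_training_samples / mini_batch_size)
total_steps = steps_per_epoch * epochs

print("Values per image:", values_per_image)   # 3072
print("Steps per epoch:", steps_per_epoch)     # 782
print("Total steps:", total_steps)             # 3910
```

Increasing `epochs` scales the total number of steps (and the training time) roughly linearly.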

Define our input channels:

In [None]:
train_data = sagemaker.inputs.TrainingInput(     s3train, 
                                                 distribution='FullyReplicated', 
                                                 content_type='application/x-recordio', 
                                                 s3_data_type='S3Prefix')

validation_data = sagemaker.inputs.TrainingInput(s3validation, 
                                                 distribution='FullyReplicated', 
                                                 content_type='application/x-recordio', 
                                                 s3_data_type='S3Prefix')

data_channels = {'train': train_data, 'validation': validation_data}
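Behind the scenes, each channel ends up as one entry in the training job request. The sketch below builds a plain dictionary in roughly the shape that the `CreateTrainingJob` API expects for an input channel. The helper `channel_config` is hypothetical and only for illustration; the SDK builds this for us from the `TrainingInput` objects.

```python
def channel_config(name, s3_uri, content_type="application/x-recordio"):
    """Build one InputDataConfig-style entry (hypothetical helper, not an SDK call)."""
    return {
        "ChannelName": name,
        "ContentType": content_type,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }


channels = [
    channel_config("train", "s3://aws-mls-c01/cifar10/train/cifar10_train.rec"),
    channel_config("validation", "s3://aws-mls-c01/cifar10/validation/cifar10_val.rec"),
]

for channel in channels:
    print(channel["ChannelName"], "->", channel["DataSource"]["S3DataSource"]["S3Uri"])
```

The channel names `train` and `validation` are the keys of the `data_channels` dictionary above.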

## Train the model

And finally, we call `fit` to train the model.

In [None]:
ic.fit(inputs=data_channels)

## Create an inference endpoint

Now that the model is trained (and saved to S3 under `s3_output_location`), we can deploy it to an endpoint and use it to run inference on new data.

In [None]:
ic_classifier = ic.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')

Within the SageMaker SDK, we don't need to know the endpoint name, as we can simply reference it from the classifier we just made.  But it might be useful to know, so let's find out: 

In [None]:
ic_classifier.endpoint_name

Now let's create some data to use for testing.  Here we set the labels and load a test image:

In [None]:
labels = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

# Read the test image as raw bytes; the context manager closes the file for us
with open('./test-images/plane.jpg', 'rb') as f:
    data = f.read()
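Since the endpoint receives the image as raw bytes, it can be worth checking that the payload really is a JPEG before sending it. A minimal sketch, using the fact that JPEG files begin with the bytes `FF D8 FF` (the helper `looks_like_jpeg` and the sample payloads are made up for illustration):

```python
def looks_like_jpeg(payload: bytes) -> bool:
    """Return True if the bytes start with the JPEG start-of-image marker."""
    return payload[:3] == b"\xff\xd8\xff"


# Made-up payloads standing in for real file contents
fake_jpeg = b"\xff\xd8\xff\xe0" + b"\x00" * 16
fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16

print(looks_like_jpeg(fake_jpeg))
print(looks_like_jpeg(fake_png))
```

In the notebook, the same check could be applied to the `data` variable read from the test image.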

## Make an inference/prediction

With our endpoint deployed and sample data ready, we can call `predict` and see what we find.

As this model was trained quickly, don't expect anything too amazing!  If you want to improve the accuracy, change some of the hyperparameters and train again.  The first thing to try is increasing the number of epochs.

In [None]:
prediction = ic_classifier.predict(data, initial_args={"ContentType": "application/x-image"})
probs = json.loads(prediction)

print(probs)
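The response body is a JSON list with one probability per class, in the same order as the labels. As a sketch of how to read it, the example below ranks the classes for a made-up response (the numbers are invented for illustration and are not real model output):

```python
import json

class_names = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

# Made-up response body, NOT real model output
sample_response = b'[0.62, 0.05, 0.08, 0.03, 0.02, 0.02, 0.01, 0.02, 0.12, 0.03]'
sample_probs = json.loads(sample_response)

# Pair each class name with its probability and sort from most to least likely
ranked = sorted(zip(class_names, sample_probs), key=lambda pair: pair[1], reverse=True)
top3 = ranked[:3]

for name, prob in top3:
    print("{}: {:.2f}".format(name, prob))
```

Looking at the top few classes, rather than only the single best one, gives a better sense of how confident the model is.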

And make it pretty with a bar chart of the class probabilities:

In [None]:
figure(num=None, figsize=(8, 5), dpi=80, facecolor='w', edgecolor='k')
plt.bar(range(10), probs)
plt.xticks(range(10), labels)
plt.show()

In [None]:
# Pick the class with the highest probability
index_of_prediction = np.argmax(probs)
label_of_prediction = labels[index_of_prediction]

print("This looks like a {}.".format(label_of_prediction))

## Clean up

Now let's tear down the endpoint, as we are charged whilst it's up and running.  Uncomment the line below and run it when you are finished.

In [None]:
# ic_classifier.delete_endpoint()