# sagemaker-demo-notebook

This notebook is an interactive companion to the article. In it we will do the following:

* Build a machine learning model image and store it on ECR, Amazon's container registry service.
* Train a machine learning model based on the image we just pushed.
* Deploy that model to a web endpoint.
* Deploy an arbitrary SageMaker-compliant model artifact to a web endpoint.
* Perform a batch classification job using a SageMaker-compliant model artifact (unfinished?).

You may run this notebook either locally or in an AWS SageMaker instance.

If you are running locally, make sure that the account you are running this notebook under has all of the necessary permissions: `S3ReadOnlyAccess`, `SagemakerFullAccess`, `iam:GetRole`, and `ECRFullAccess`.

If you are running on AWS SageMaker, make sure that the role you pass to the notebook instance has all of these permissions available. Note that the default SageMaker execution context is **not** enough; it has the first two permissions in the list above but not the latter two. You need to attach those permissions to the instance yourself.


## Getting the code

We start by downloading the code from [its repository](https://github.com/ResidentMario/quilt-sagemaker-demo) on GitHub.

In [1]:
!rm -rf quilt-sagemaker-demo > /dev/null 2>&1
!git clone https://github.com/ResidentMario/quilt-sagemaker-demo

Cloning into 'quilt-sagemaker-demo'...
remote: Enumerating objects: 95, done.
remote: Counting objects: 100% (95/95), done.
remote: Compressing objects: 100% (66/66), done.
remote: Total 95 (delta 48), reused 72 (delta 25), pack-reused 0
Unpacking objects: 100% (95/95), done.


In [2]:
%ls quilt-sagemaker-demo

app.py       Dockerfile             requirements.txt
build.ipynb  health-check-data.csv  run.sh*


The files are:
* `build.ipynb` &mdash; A Jupyter notebook that walks through building and training a model for classifying clothing that is based on the Fashion MNIST dataset.
* `app.py` &mdash; A simple `flask` app that serves a SageMaker-compliant model-as-an-app.
* `health-check-data.csv` &mdash; A small sample dataset used to ping the web service for health checks.
* `Dockerfile` &mdash; A Dockerfile that builds an image suitable for distribution on SageMaker.
* `run.sh` &mdash; The image runtime entrypoint.
* `requirements.txt` &mdash; A list of dependencies necessary for building or running the model (locally or remotely).

...and this notebook.
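
For orientation: a SageMaker-compliant serving container is expected to answer `GET /ping` health checks and `POST /invocations` prediction requests. The sketch below illustrates that contract using only the Python standard library. It is not the repository's `app.py` (which uses `flask`), and `predict_rows` and `call` are hypothetical names introduced here purely for illustration.

```python
import io


def predict_rows(rows):
    """Hypothetical stand-in for a model: label each CSV row with its field count."""
    return [len(row.split(",")) for row in rows]


def app(environ, start_response):
    """Minimal WSGI app exposing the two routes SageMaker calls on a serving container."""
    method = environ["REQUEST_METHOD"]
    path = environ.get("PATH_INFO", "")

    if method == "GET" and path == "/ping":
        # Health check: a 200 response signals that the container is up.
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b""]

    if method == "POST" and path == "/invocations":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(length).decode("utf-8")
        rows = [line for line in body.splitlines() if line.strip()]
        payload = "\n".join(str(p) for p in predict_rows(rows))
        start_response("200 OK", [("Content-Type", "text/csv")])
        return [payload.encode("utf-8")]

    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


def call(method, path, body=b""):
    """Invoke the app in-process (no socket) and return (status, body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    chunks = app(environ, start_response)
    return captured["status"], b"".join(chunks)
```

Calling `call("POST", "/invocations", b"1,2,3\n4,5\n")` exercises the prediction route without starting a server, which is handy for checking the request handling before building an image.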

## Pushing the container

The following shell script, inlined in this notebook, builds the Docker image we've imported and stores it in ECR.

In [3]:
%%sh

# construct the ECR name.
account=$(aws sts get-caller-identity --query Account --output text)
region=$(aws configure get region)
fullname="${account}.dkr.ecr.${region}.amazonaws.com/quiltdata/sagemaker-demo:latest"

# If the repository doesn't exist in ECR, create it.
# The redirection sends stdout to /dev/null and stderr along with it.
# It's just there to silence the error.
aws ecr describe-repositories --repository-names "quiltdata/sagemaker-demo" > /dev/null 2>&1

# Check the exit code: if it's non-zero, the command failed, so we assume no repo exists.
if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "quiltdata/sagemaker-demo" > /dev/null
fi

# Get the login command from ECR and execute it directly.
# Note: `get-login` exists only in AWS CLI v1; v2 replaced it with `get-login-password`.
$(aws ecr get-login --region ${region} --no-include-email)

# Build the docker image, tag it with the full name, and push it to ECR
docker build  -t "quiltdata/sagemaker-demo" quilt-sagemaker-demo/
docker tag "quiltdata/sagemaker-demo" ${fullname}

docker push ${fullname}

Login Succeeded
Sending build context to Docker daemon  5.806MB
Step 1/15 : FROM python:3.6
 ---> 55fb8aca33df
Step 2/15 : RUN ["mkdir", "app"]
 ---> Using cache
 ---> 820a184440b0
Step 3/15 : WORKDIR "app"
 ---> Using cache
 ---> f1212684fe5c
Step 4/15 : COPY "requirements.txt" .
 ---> Using cache
 ---> 9f5df3de1353
Step 5/15 : RUN ["pip", "install", "-r", "requirements.txt"]
 ---> Using cache
 ---> ef8ee61411cd
Step 6/15 : COPY "app.py" .
 ---> d1e09b8de2ca
Step 7/15 : COPY "run.sh" .
 ---> f814dd9b407b
Step 8/15 : COPY "build.ipynb" .
 ---> 483adaf0e2c9
Step 9/15 : COPY "catalog-screencap.png" .
 ---> 8655d266443b
Step 10/15 : COPY "health-check-data.csv" .
 ---> 75f7bae55dc5
Step 11/15 : ENV FLASK_APP app.py
 ---> Running in 2f7a7b2f3710
Removing intermediate container 2f7a7b2f3710
 ---> 90038dd03705
Step 12/15 : RUN ["chmod", "+x", "./run.sh"]
 ---> Running in 83505dab532c
Removing intermediate container 83505dab532c
 ---> 05ce05a4a6cc
Step 13/15 : EXPOSE 5000
 ---> Running in 3

https://docs.docker.com/engine/reference/commandline/login/#credentials-store



## Training a model

We use a `sagemaker.estimator.Estimator` object to perform model training.

Note that the `Estimator` object is parameterized with the image URI (the full name of the image in ECR), a role and session (passed down from the role executing this notebook instance), an instance count and instance type, and an output path.

The `output_path` is an interesting case. By default, the algorithms that ship with SageMaker write a `*.tar.gz` model artifact to an S3 bucket, and the presence of this argument encourages you to follow the same pattern when using a custom image.

Our image serializes model objects itself instead of relying on SageMaker to do it for us, so this argument has no effect on where our model ends up. However, it is not wise to omit it: if you do, SageMaker will automatically create a fresh run-dependent bucket for you.

**User note**: change `output_path` in the code cell that follows to an S3 bucket that you own, or to a bucket name that hasn't been claimed yet.

In [5]:
import boto3
import re

import os
import numpy as np
import pandas as pd

from sagemaker import get_execution_role
import sagemaker as sage

In [7]:
# This line of code requires the additional iam:GetRole permission.
role = get_execution_role()

sess = sage.Session()

account = sess.boto_session.client('sts').get_caller_identity()['Account']
region = sess.boto_session.region_name
image = '{}.dkr.ecr.{}.amazonaws.com/quiltdata/sagemaker-demo'.format(account, region)
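
The image name built here follows the same pattern as the `fullname` variable in the shell script earlier, minus the `:latest` tag (Docker conventionally treats an untagged name as `latest`). As a small sketch, the pattern can be wrapped in a helper; `ecr_image_uri` is a hypothetical name, and the account ID in the usage line is a placeholder rather than a real account.

```python
def ecr_image_uri(account, region, repository, tag="latest"):
    """Build a fully qualified ECR image name from its parts."""
    return "{}.dkr.ecr.{}.amazonaws.com/{}:{}".format(account, region, repository, tag)


# Placeholder account ID, for illustration only.
print(ecr_image_uri("123456789012", "us-east-1", "quiltdata/sagemaker-demo"))
```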

Once the model is defined, training is performed via `Estimator.fit`, mimicking the `scikit-learn` API.

In [8]:
clf = sage.estimator.Estimator(image,
                               role, 1, 'ml.c4.2xlarge',
                               output_path="s3://alpha-quilt-storage/junk",
                               sagemaker_session=sess)

clf.fit()

INFO:sagemaker:Creating training-job with name: sagemaker-demo-2019-01-16-23-24-48-787


2019-01-16 23:24:48 Starting - Starting the training job...
2019-01-16 23:24:50 Starting - Launching requested ML instances......
2019-01-16 23:25:55 Starting - Preparing the instances for training...
2019-01-16 23:26:43 Downloading - Downloading input data
2019-01-16 23:26:43 Training - Downloading the training image.....
[NbConvertApp] Converting notebook build.ipynb to notebook
[NbConvertApp] Executing notebook with kernel: python3

2019-01-16 23:27:24 Training - Training image download completed. Training in progress.
2019-01-16 23:27:51.608769: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

2019-01-16 23:28:20 Uploading - Uploading generated training model
[NbConvertApp] Writing 397368 bytes to build.ipynb

2019-01-16 23:28:25 Completed - Training job completed
Billable seconds: 110


Running this code block trains our model and deposits it in a `clf.tar.gz` file in an S3 bucket.
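
A `*.tar.gz` model artifact is an ordinary gzip-compressed tar archive. How the image packs its own artifact is not shown in this notebook, so the following is only a sketch of the general idea using the standard `tarfile` module. The helper names and the `clf.h5` file name are assumptions made for illustration.

```python
import os
import tarfile
import tempfile


def pack_artifact(model_path, archive_path):
    """Write a gzip-compressed tar archive containing a single model file."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname drops the local directory so the file sits at the archive root.
        tar.add(model_path, arcname=os.path.basename(model_path))
    return archive_path


def list_artifact(archive_path):
    """Return the member names stored in a model archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


# Usage with a throwaway file standing in for real serialized weights.
workdir = tempfile.mkdtemp()
model_file = os.path.join(workdir, "clf.h5")  # hypothetical file name
with open(model_file, "wb") as f:
    f.write(b"not a real model")

archive = pack_artifact(model_file, os.path.join(workdir, "clf.tar.gz"))
print(list_artifact(archive))
```

Listing the archive contents before uploading is a cheap way to confirm that the file names inside match what the serving code expects to load.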

## Deploying a model

### Deploy a fitted model as an endpoint

Because our image writes the model artifact itself, we have to tell SageMaker where that artifact lives by overwriting the `model_data` class property as follows.

In [9]:
sage.estimator.Estimator.model_data =\
    "s3://alpha-quilt-storage/aleksey/fashion_mnist_clf/clf.tar.gz"

In [10]:
from sagemaker.predictor import csv_serializer
predictor = clf.deploy(1, 'ml.m4.xlarge', serializer=csv_serializer)

INFO:sagemaker:Creating model with name: sagemaker-demo-2019-01-16-23-29-01-518
INFO:sagemaker:Creating endpoint with name sagemaker-demo-2019-01-16-23-24-48-787


----------------------------------------------------------------------------!

In [30]:
# This fails because the request lacks authentication.
# It might be possible to reconstruct the actual POST request being made, e.g. using
# predictor.sagemaker_session.boto_session.get_credentials().token
# but the AWS docs are unclear about what name this header has.

# !curl -X "POST" -H "Content-Type: text/csv" -d @health-check-data.csv URI

In [32]:
X_test = pd.read_csv("./fashion-mnist_train.csv").head().iloc[:, 1:].values

In [33]:
predictor.predict(X_test)

b'4,\n9,\n4,\n0,\n3'
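
The endpoint returns raw bytes with one prediction per line, each followed by a comma. A minimal sketch for turning a response shaped like the one above into a list of integer class labels follows; `parse_predictions` is a hypothetical helper, and it assumes the response format stays exactly as shown.

```python
def parse_predictions(response):
    """Convert a bytes response with one comma-terminated label per line into ints."""
    text = response.decode("utf-8")
    labels = []
    for line in text.splitlines():
        # Each line looks like "4," (the last one may lack the comma).
        value = line.strip().rstrip(",")
        if value:
            labels.append(int(value))
    return labels
```

With a live endpoint this would be used as `labels = parse_predictions(predictor.predict(X_test))`.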

In [84]:
sess.delete_endpoint(predictor.endpoint)

### Deploy a pre-trained model artifact as an endpoint

In [81]:
clf = sage.estimator.Estimator(image, role, 1, 'ml.c4.2xlarge',
                               output_path="s3://alpha-quilt-storage/junk", 
                               sagemaker_session=sess).create_model()
predictor = clf.deploy(1, 'ml.c4.2xlarge')

In [None]:
predictor.predict(X_test)

In [77]:
sess.delete_endpoint(predictor.endpoint)

### Use a model artifact to perform a batch prediction run

In order to perform a batch transform you must have a model.

In [82]:
clf = sage.estimator.Estimator(image, role, 1, 'ml.c4.2xlarge',
                               output_path="s3://alpha-quilt-storage/junk", 
                               sagemaker_session=sess).create_model()

In [94]:
transformer = sage.transformer.Transformer(
    base_transform_job_name='Batch-Transform',
    model_name='sagemaker-demo-2019-01-17-02-00-21-619',  # take this from a past training session
    instance_count=1,
    instance_type='ml.c4.xlarge',
    output_path='s3://alpha-quilt-storage/junk',
    sagemaker_session=sess
)

In [96]:
# start the job
# note: this will fail because the data is not quite in the right input format
# but it gets the idea across
transformer.transform(
    's3://alpha-quilt-storage/aleksey/fashion_mnist/fashion-mnist_train.csv', 
    content_type='text/csv', 
    split_type='Line'
)

# wait until transform job is completed
transformer.wait()
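
The comment in the cell above does not say what is wrong with the input format. One plausible cause, offered here only as an assumption, is that the training CSV still contains the header row and the label column, both of which the endpoint example removes (`pd.read_csv` consumes the header and `.iloc[:, 1:]` drops the first column). The sketch below shows how such a file could be reduced to feature-only rows; `to_batch_input` is a hypothetical helper.

```python
import csv
import io


def to_batch_input(training_csv_text, has_header=True, label_columns=1):
    """Drop the header row and the leading label column(s) from CSV text."""
    reader = csv.reader(io.StringIO(training_csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for i, row in enumerate(reader):
        if has_header and i == 0:
            continue  # skip the column names
        if row:
            writer.writerow(row[label_columns:])
    return out.getvalue()


sample = "label,pixel1,pixel2\n4,0,255\n9,12,0\n"
print(to_batch_input(sample))
```

The cleaned text would then need to be uploaded to S3 and passed to `transformer.transform` in place of the original training file.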

In [99]:
# TODO: test that this code works
import boto3

# `bucket` must hold the name of the S3 bucket that contains the transform output.
s3_client = boto3.client('s3')
s3_client.download_file(bucket, 'kmeans_batch_example/output/valid-data.csv.out', 'valid-result')
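
The object key in the cell above does not correspond to the transform job defined earlier. Assuming batch transform writes its result under `output_path` with `.out` appended to the input object's file name, the location for a single-file input can be derived as sketched below. `transform_output_location` is a hypothetical helper, and the naming rule should be verified against an actual job before relying on it.

```python
from urllib.parse import urlparse


def transform_output_location(input_uri, output_path):
    """Guess the (bucket, key) of the result for a single-object batch transform input."""
    # File name of the input object, e.g. "fashion-mnist_train.csv".
    filename = urlparse(input_uri).path.rsplit("/", 1)[-1]
    parsed = urlparse(output_path)
    prefix = parsed.path.strip("/")
    # Join prefix and file name, skipping the prefix when output_path is a bare bucket.
    key = "/".join(part for part in (prefix, filename + ".out") if part)
    return parsed.netloc, key


bucket, key = transform_output_location(
    "s3://alpha-quilt-storage/aleksey/fashion_mnist/fashion-mnist_train.csv",
    "s3://alpha-quilt-storage/junk",
)
print(bucket, key)
```

The resulting `bucket` and `key` could then be passed to `download_file` together with a local file name.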