# sagemaker-demo-notebook

This notebook is an interactive companion to the article. In it we will do the following:

* Build a machine learning model image and store it on ECR, Amazon's container registry service.
* Train a machine learning model based on the image we just pushed.
* Deploy that model to a web endpoint.
* Deploy an arbitrary SageMaker-compliant model artifact to a web endpoint.
* Perform a batch classification job using a SageMaker-compliant model artifact (unfinished?).

You may run this notebook either locally or in an AWS SageMaker instance.

If you are running locally, make sure that the account you are running this notebook under has all of the necessary permissions: `S3ReadOnlyAccess`, `SagemakerFullAccess`, `iam:GetRole`, and `ECRFullAccess`.

If you are running on AWS SageMaker, make sure that the role you pass to the notebook instance has all of these permissions available. Note that the default SageMaker execution context is **not** enough; it has the first two permissions in the list above but not the latter two. You need to attach those permissions to the instance yourself.


## Getting the code

We start by downloading the code from [its repository](https://github.com/ResidentMario/quilt-sagemaker-demo) on GitHub.

In [1]:
# !rm -rf quilt-sagemaker-demo > /dev/null 2>&1
# !git clone https://github.com/ResidentMario/quilt-sagemaker-demo

Cloning into 'quilt-sagemaker-demo'...
remote: Enumerating objects: 108, done.
remote: Counting objects: 100% (108/108), done.
remote: Compressing objects: 100% (72/72), done.
remote: Total 108 (delta 58), reused 82 (delta 32), pack-reused 0
Receiving objects: 100% (108/108), 861.00 KiB | 35.87 MiB/s, done.
Resolving deltas: 100% (58/58), done.


In [1]:
%ls quilt-sagemaker-demo

Dockerfile                     fashion-mnist_train.csv
README.md                      health-check-data.csv
app.py                         requirements.txt
build.ipynb                    run.sh*
clf.h5                         sagemaker-demo-notebook.ipynb


The files are:
* `build.ipynb` &mdash; A Jupyter notebook that walks through building and training a model for classifying clothing that is based on the Fashion MNIST dataset.
* `app.py` &mdash; A simple `flask` app that serves a SageMaker-compliant model-as-an-app.
* `health-check-data.csv` &mdash; A small sample dataset used to ping the web service for health checks.
* `Dockerfile` &mdash; A Dockerfile that builds an image suitable for distribution on SageMaker.
* `run.sh` &mdash; The image runtime entrypoint.
* `requirements.txt` &mdash; A list of dependencies necessary for building or running the model (locally or remotely).

...and this notebook.
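
To make "SageMaker-compliant" concrete: SageMaker hosting expects the serving container to listen on port 8080, answer `GET /ping` with HTTP 200 for health checks, and accept inference requests on `POST /invocations`. The sketch below illustrates that contract with Python's standard library rather than `flask`. It is not the repository's `app.py`; the `route` function, the `ContractHandler` class, and the always-predict-`0` placeholder model are assumptions made purely for illustration.

```python
from http.server import BaseHTTPRequestHandler


def route(method, path, body=b""):
    """Return (status, response_body) for one request under the hosting contract."""
    if method == "GET" and path == "/ping":
        # Health check: an empty 200 response tells SageMaker the container is up.
        return 200, b""
    if method == "POST" and path == "/invocations":
        # Treat the request body as CSV text, one sample per non-empty line.
        rows = [line for line in body.decode("utf-8").splitlines() if line.strip()]
        # Placeholder "model" (an assumption): predict class 0 for every row.
        return 200, "\n".join("0" for _ in rows).encode("utf-8")
    return 404, b""


class ContractHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper that delegates every request to route()."""

    def _reply(self, status, payload):
        self.send_response(status)
        self.send_header("Content-Type", "text/csv")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_GET(self):
        self._reply(*route("GET", self.path))

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self._reply(*route("POST", self.path, self.rfile.read(length)))
```

Passing `ContractHandler` to `http.server.HTTPServer(("0.0.0.0", 8080), ContractHandler)` and calling `serve_forever()` would expose both routes on the port SageMaker probes.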

## Pushing the container

The following shell script, inlined in this notebook, builds the Docker image we've imported and stores it in ECR.

In [2]:
%%sh

# construct the ECR name.
account=$(aws sts get-caller-identity --query Account --output text)
region=$(aws configure get region)
fullname="${account}.dkr.ecr.${region}.amazonaws.com/quiltdata/sagemaker-demo:latest"

echo "DONE1"
# If the repository doesn't exist in ECR, create it.
# Redirecting stdout and stderr to /dev/null just silences the error message.
# The exit status is tested directly by `if`, since any command run in
# between (such as an echo) would overwrite $?.
if ! aws ecr describe-repositories --repository-names "quiltdata/sagemaker-demo" > /dev/null 2>&1
then
    aws ecr create-repository --repository-name "quiltdata/sagemaker-demo" > /dev/null
fi

echo "DONE2"


echo "DONE3"
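# Get the login command from ECR and execute it directly.
# (`aws ecr get-login` is an AWS CLI v1 command; CLI v2 replaces it with `aws ecr get-login-password`.)
$(aws ecr get-login --region ${region} --no-include-email)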

echo "DONE4"
# Build the docker image, tag it with the full name, and push it to ECR
docker build  -t "quiltdata/sagemaker-demo" quilt-sagemaker-demo/
docker tag "quiltdata/sagemaker-demo" ${fullname}
echo "DONE5"
docker push ${fullname}

DONE1
DONE2
DONE3
Login Succeeded
DONE4
Sending build context to Docker daemon  1.358GB
Step 1/14 : FROM python:3.6
 ---> 1c515a624542
Step 2/14 : RUN ["mkdir", "app"]
 ---> Using cache
 ---> 051fdec606bc
Step 3/14 : WORKDIR "app"
 ---> Using cache
 ---> 1050e38336a5
Step 4/14 : COPY "requirements.txt" .
 ---> fcf01e8fb82f
Step 5/14 : RUN ["pip", "install", "-r", "requirements.txt"]
 ---> Running in a15439a13430
Collecting git+https://github.com/quiltdata/t4.git#subdirectory=api/python (from -r requirements.txt (line 3))
  Cloning https://github.com/quiltdata/t4.git to /tmp/pip-req-build-mtypd_ju
  Running command git clone -q https://github.com/quiltdata/t4.git /tmp/pip-req-build-mtypd_ju
Collecting numpy (from -r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/75/92/57179ed45307ec6179e344231c47da7f3f3da9e2eee5c8ab506bd279ce4e/numpy-1.17.1-cp36-cp36m-manylinux1_x86_64.whl (20.4MB)
Collecting pandas (from -r requirements.txt (line 2))
  Downl



## Training a model

We use a `sagemaker.estimator.Estimator` object to perform model training.

Note that the `Estimator` object is parameterized with the image URI (the image's address in ECR), a role and session (passed down from the role executing this notebook instance), an instance type and instance count, and an output path.
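
As a rough illustration of the first of those parameters, the image reference can be assembled from the account ID, region, and repository name. The helper name `ecr_image_uri` and the example account ID are hypothetical; the address pattern is the one this notebook uses for ECR.

```python
def ecr_image_uri(account, region, repository, tag=None):
    """Build an ECR image reference; Docker treats a missing tag as 'latest'."""
    uri = "{}.dkr.ecr.{}.amazonaws.com/{}".format(account, region, repository)
    return "{}:{}".format(uri, tag) if tag else uri


# Example with a made-up account ID:
print(ecr_image_uri("123456789012", "us-east-1", "quiltdata/sagemaker-demo", "latest"))
```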

The `output_path` is an interesting case. By default, the algorithms that SageMaker comes packaged with write a `*.tar.gz` model artifact to an S3 bucket, and the presence of this argument encourages you to follow the same design pattern when using a custom image.

Our image serializes model objects itself instead of relying on SageMaker to do it for us, rendering this argument useless. However, it's not wise to omit it: if you do, SageMaker falls back to a default bucket of its own, creating it for you if it doesn't already exist.

**User note**: you should change `output_path` in the code cell that follows to any random S3 bucket that you own or that hasn't been claimed yet.

In [1]:
import boto3
import re

import os
import numpy as np
import pandas as pd

from sagemaker import get_execution_role
import sagemaker as sage

In [2]:
# this line of code requires additional iam:GetRole permissions.
# role = get_execution_role()
# role = "arn:aws:iam::645729154151:role/krunal-sagemaker-full-access"
role = "arn:aws:iam::645729154151:role/service-role/AmazonSageMaker-ExecutionRole-20190828T170129"
sess = sage.Session()

account = sess.boto_session.client('sts').get_caller_identity()['Account']
region = sess.boto_session.region_name
image = '{}.dkr.ecr.{}.amazonaws.com/quiltdata/sagemaker-demo'.format(account, region)

Once the model is defined, training is performed via `Estimator.fit`, mimicking the `scikit-learn` API.

In [4]:
clf = sage.estimator.Estimator(image,
                               role, 1, 'ml.c4.2xlarge',
                               output_path="s3://krunal-bdso-sagemaker/quilt/quilt_sagemaker_demo/model",
                               sagemaker_session=sess)

clf.fit()

2019-08-29 21:33:32 Starting - Starting the training job...
2019-08-29 21:33:33 Starting - Launching requested ML instances......
2019-08-29 21:34:37 Starting - Preparing the instances for training...
2019-08-29 21:35:33 Downloading - Downloading input data
2019-08-29 21:35:33 Training - Downloading the training image........
[NbConvertApp] Converting notebook build.ipynb to notebook
[NbConvertApp] Executing notebook with kernel: python3

2019-08-29 21:36:37 Training - Training image download completed. Training in progress.
2019-08-29 21:37:02.446973: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-08-29 21:37:02.476718: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2900060000 Hz
2019-08-29 21:37:02.477248: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5635788bea10 executing computations on platform 

Running this code block trains our model and deposits it as a `model.tar.gz` artifact under the `output_path` in S3.
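
Judging by the artifact path used later in this notebook, the artifact lands at `<output_path>/<training job name>/output/model.tar.gz`. A small sketch of that layout follows; the helper name `model_artifact_uri` is hypothetical, and the job name is the one that appears in the deployment cells further down.

```python
def model_artifact_uri(output_path, job_name):
    """Compose the S3 location of a training job's model artifact."""
    return "{}/{}/output/model.tar.gz".format(output_path.rstrip("/"), job_name)


print(model_artifact_uri(
    "s3://krunal-bdso-sagemaker/quilt/quilt_sagemaker_demo/model",
    "sagemaker-demo-2019-08-29-21-33-31-876",
))
```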

## Deploying a model

### Deploy a fitted model as an endpoint

In [5]:
from sagemaker.predictor import csv_serializer
predictor = clf.deploy(1, 'ml.m4.xlarge', serializer=csv_serializer)

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------

KeyboardInterrupt: 

In [None]:
# This fails because it lacks an authentication token.
# It might be possible to reconstruct the actual POST request being made.
# predictor.sagemaker_session.boto_session.get_credentials().token
# But the AWS docs are unclear about what name this header has.

# !curl -X "POST" -H "Content-Type: text/csv" -d @health-check-data.csv URI
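
For reference, the `URI` placeholder in the commented-out `curl` call would be the endpoint's invocation address on the SageMaker runtime service. Requests to it must carry an AWS Signature Version 4 `Authorization` header (plus `X-Amz-Security-Token` when temporary credentials are used), which is why an unsigned `curl` is rejected and why the SDK's predictor object is the easier route. The helper name `invocation_url` and the endpoint name in the example are hypothetical.

```python
def invocation_url(region, endpoint_name):
    """Build the HTTPS address that InvokeEndpoint requests are sent to."""
    return "https://runtime.sagemaker.{}.amazonaws.com/endpoints/{}/invocations".format(
        region, endpoint_name
    )


# Example with a made-up endpoint name:
print(invocation_url("us-east-1", "my-demo-endpoint"))
```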

In [None]:
X_test = pd.read_csv("./fashion-mnist_train.csv").head().iloc[:, 1:].values

In [None]:
sess.delete_endpoint(predictor.endpoint)

### Deploy a pre-trained model artifact as an endpoint

In [6]:
from sagemaker import Model

In [8]:
model = Model(
    model_data='s3://krunal-bdso-sagemaker/quilt/quilt_sagemaker_demo/model/sagemaker-demo-2019-08-29-21-33-31-876/output/model.tar.gz',
    image=image,
    role=role,
    sagemaker_session=sess
)
# model.deploy(1, 'ml.c4.2xlarge')
predictor = model.deploy(1, 'ml.t2.medium')

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------*

UnexpectedStatusException: Error hosting endpoint sagemaker-demo-2019-08-29-22-52-20-738: Failed. Reason:  The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint..

In [8]:
predictor = sage.predictor.RealTimePredictor(
    'sagemaker-demo-2019-01-18-22-48-00-247', 
    sagemaker_session=sess, 
    content_type="text/csv")

In [7]:
inp = "\n".join([",".join(l) for l in X_test.astype('str').tolist()])
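
The payload built above is plain CSV text: one sample per line, values separated by commas, no header row. A dependency-free sketch of the same conversion follows, with a hypothetical helper name `rows_to_csv` and made-up pixel values.

```python
def rows_to_csv(rows):
    """Serialize an iterable of rows into header-less CSV text."""
    return "\n".join(",".join(str(value) for value in row) for row in rows)


payload = rows_to_csv([[0, 255, 12], [7, 0, 34]])
print(payload)
```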

In [6]:
response = predictor.predict(inp)

In [5]:
response

In [11]:
import sagemaker as sage

# get and pass the auth role and image path, same as before
# this step is unchanged from the training script
role = "arn:aws:iam::645729154151:role/service-role/AmazonSageMaker-ExecutionRole-20190828T170129"
sess = sage.Session()
account = sess.boto_session.client('sts').get_caller_identity()['Account']
region = sess.boto_session.region_name
image = '{}.dkr.ecr.{}.amazonaws.com/quiltdata/sagemaker-demo'.format(account, region)

# create a new Model object
clf = Model(
    # insert model path below
    model_data='s3://krunal-bdso-sagemaker/quilt/quilt_sagemaker_demo/model/sagemaker-demo-2019-08-29-21-33-31-876/output/model.tar.gz',
    image=image,
    role=role,
    sagemaker_session=sess
)

# deploy it to an endpoint
# predictor = clf.deploy(1, 'ml.c4.2xlarge')
predictor = clf.deploy(1, 'ml.t2.medium')

# connect to the endpoint
predictor = sage.predictor.RealTimePredictor(
    'sagemaker-demo-[...]',  # insert endpoint name here
    sagemaker_session=sess, 
    content_type="text/csv"
)

-------------------------------------------------------------------------------------------------------------------------

KeyboardInterrupt: 

### Use a model artifact to perform a batch prediction run

In order to perform a batch transform you must already have a model registered in SageMaker; the transformer refers to it by name via `model_name`.

In [4]:
transformer = sage.transformer.Transformer(
    base_transform_job_name='Batch-Transform',
    model_name='sagemaker-demo-2019-01-18-22-48-00-247',  # take this from a past training session
    instance_count=1,
    instance_type='ml.c4.xlarge',
    output_path='s3://quilt-example/quilt/quilt_sagemaker_demo/model',
    sagemaker_session=sess
)

In [3]:
# start the job
# note: this requires that the input data be in exactly the format expected by the model!
transformer.transform(
    's3://alpha-quilt-storage/aleksey/fashion_mnist/fashion-mnist_train.csv', 
    content_type='text/csv', 
    split_type='Line'
)

# wait until transform job is completed
transformer.wait()
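
Once the job finishes, the predictions should appear under the transformer's `output_path`. The sketch below assumes SageMaker's documented convention of appending `.out` to the input object's name, and assumes a single input object (inputs nested under a prefix keep their relative subfolder); it is worth confirming by listing the bucket. The helper names `transform_output_uri` and `split_s3_uri` are hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an S3 URI: {!r}".format(uri))
    return parsed.netloc, parsed.path.lstrip("/")


def transform_output_uri(input_uri, output_path):
    """Guess the output location for a single input object."""
    filename = posixpath.basename(urlparse(input_uri).path)
    return "{}/{}.out".format(output_path.rstrip("/"), filename)


output_uri = transform_output_uri(
    "s3://alpha-quilt-storage/aleksey/fashion_mnist/fashion-mnist_train.csv",
    "s3://quilt-example/quilt/quilt_sagemaker_demo/model",
)
bucket, key = split_s3_uri(output_uri)
print(bucket, key)
```

The resulting bucket and key are the two pieces a boto3 S3 download call needs, alongside a local filename.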

In [1]:
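import boto3
s3_client = boto3.client('s3')

In [2]:
# download_file takes a bucket name, an object key, and a local filename.
# The key is relative to the bucket and follows the transformer's output_path;
# the local filename is an arbitrary placeholder.
s3_client.download_file('quilt-example', 'quilt/quilt_sagemaker_demo/model/[...]', 'batch-transform-output.csv')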