# Deploy Scikit-Learn models in Amazon SageMaker

This example extends the blog post [Train and host Scikit-Learn models in Amazon SageMaker](https://aws.amazon.com/blogs/machine-learning/train-and-host-scikit-learn-models-in-amazon-sagemaker-by-building-a-scikit-docker-container/) by Thomas Hughes and Morgan Du. Here the model is trained outside of SageMaker, and SageMaker is used only to host the endpoint.

For questions, please reach out to Josiah Davis (davjosia@amazon.com).

### (1) Train the model

In [1]:
import pandas as pd
train = pd.read_csv('iris_train.csv', header=None)
train.head()

|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | setosa | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | setosa | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | setosa | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | setosa | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | setosa | 5.0 | 3.6 | 1.4 | 0.2 |


In [2]:
from sklearn.ensemble import RandomForestClassifier
y_train = train.iloc[:,0]
X_train = train.iloc[:,1:]

clf = RandomForestClassifier(n_estimators=20)
clf = clf.fit(X_train, y_train)

### (2) Save the model

In [3]:
import pickle
with open('model.pkl', 'wb') as out:
    pickle.dump(clf, out)
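
The deploy log further down shows `model.pkl` being archived into `model.tar.gz` before the upload, which matches the gzipped tar format SageMaker expects for model artifacts. As a minimal sketch of that packaging step in Python (the helper names `package_model` and `load_packaged_model` are hypothetical and not part of `deploy.sh`), the following also reads the pickle back out of the archive to confirm it round-trips:

```python
import pickle
import tarfile


def package_model(model, pickle_path="model.pkl", archive_path="model.tar.gz"):
    """Pickle `model` and wrap the pickle in a gzipped tar archive."""
    with open(pickle_path, "wb") as out:
        pickle.dump(model, out)
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store the file at the archive root so it unpacks as model.pkl.
        tar.add(pickle_path, arcname="model.pkl")
    return archive_path


def load_packaged_model(archive_path="model.tar.gz"):
    """Read model.pkl back out of the archive without unpacking it to disk."""
    with tarfile.open(archive_path, "r:gz") as tar:
        member = tar.extractfile("model.pkl")
        return pickle.load(member)
```

Calling `package_model(clf)` and then `load_packaged_model().predict(X_train.head())` is a quick local check before deploying. Pickled scikit-learn models are generally only safe to load with the same scikit-learn version that wrote them, so the version in the container should match the one used for training.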

### (4) Update the Dockerfile to install additional packages if needed.
The Dockerfile used here is shown below.

In [4]:
!cat Dockerfile

# Build an image that can do training and inference in SageMaker
# This is a Python 3 image that uses the nginx, gunicorn, flask stack
# for serving inferences in a stable way.

FROM ubuntu:16.04

MAINTAINER Amazon AI <sage-learner@amazon.com>

RUN apt-get -y update && apt-get install -y --no-install-recommends \
         wget \
         python3 \
         nginx \
         ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Here we get all python packages.
# Pip leaves the install caches populated which uses a significant amount of space.
# This optimization saves a fair amount of space in the image, which reduces start up time.
RUN wget https://bootstrap.pypa.io/get-pip.py && python3 get-pip.py && \
    pip install scipy scikit-learn pandas flask gevent gunicorn && rm -rf /root/.cache

# Set some environment variables. PYTHONUNBUFFERED keeps Python from buffering our standard
# output stream, which means that logs can be delivered to the user quickly. PYTHONDO

### (5) Run the `deploy.sh` script from the command line:

This script does three things:

1. Uploading the scikit model to S3
2. Building, tagging, and pushing the Docker image to Amazon ECR
3. Creating the SageMaker endpoint

**Note 1**: the execution role must have the following policies attached:
- AmazonSageMakerFullAccess
- AmazonS3FullAccess 
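
One way to confirm the role is set up before deploying is to list its attached managed policies. The sketch below assumes a boto3 IAM client is passed in; the function name `missing_policies` is hypothetical.

```python
REQUIRED_POLICIES = {"AmazonSageMakerFullAccess", "AmazonS3FullAccess"}


def missing_policies(iam_client, role_name, required=REQUIRED_POLICIES):
    """Return the required managed policies that are not attached to the role."""
    attached = set()
    # list_attached_role_policies is paginated, so walk every page.
    paginator = iam_client.get_paginator("list_attached_role_policies")
    for page in paginator.paginate(RoleName=role_name):
        attached.update(p["PolicyName"] for p in page["AttachedPolicies"])
    return sorted(set(required) - attached)


# Example usage (requires AWS credentials):
#   import boto3
#   print(missing_policies(boto3.client("iam"),
#                          "AmazonSageMaker-ExecutionRole-20171204T150334"))
```

An empty list means both policies are attached. Inline policies are not covered by this check.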

**Note 2**: First-time users can update the default `s3_model_location` and `sagemaker_execution_role` values in the `deploy.sh` script to avoid entering them manually each time. Once these defaults are set, they no longer need to be specified in subsequent deployments.

`./deploy.sh <image_name> [<s3_model_location>] [<sagemaker_execution_role>]`

For example:

`./deploy.sh iris-model`

Or:

`./deploy.sh iris-model s3://sagemaker-demo-samples/iris-model/input/model.tar.gz AmazonSageMaker-ExecutionRole-20171204T150334`
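
For readers who prefer Python to the shell script, the upload and endpoint-creation steps can be sketched with boto3 as below. This is not the contents of `deploy.sh`. The helper names, the `ml.t2.medium` instance type, and the choice to reuse one name for the model, endpoint configuration, and endpoint are all assumptions made for illustration. The image build and push is a Docker CLI task and is left out, so `image_uri` is assumed to point at an image that has already been pushed.

```python
def endpoint_requests(name, image_uri, model_data_url, role_arn,
                      instance_type="ml.t2.medium"):
    """Build the keyword arguments for the three SageMaker create_* calls."""
    model = {
        "ModelName": name,
        "PrimaryContainer": {"Image": image_uri, "ModelDataUrl": model_data_url},
        "ExecutionRoleArn": role_arn,  # full role ARN, not just the role name
    }
    config = {
        "EndpointConfigName": name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": name,
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,  # assumed default, adjust as needed
        }],
    }
    endpoint = {"EndpointName": name, "EndpointConfigName": name}
    return model, config, endpoint


def deploy(sagemaker_client, s3_client, archive_path, bucket, key,
           name, image_uri, role_arn):
    """Upload the model archive to S3, then create model, config, and endpoint."""
    s3_client.upload_file(archive_path, bucket, key)
    model, config, endpoint = endpoint_requests(
        name, image_uri, f"s3://{bucket}/{key}", role_arn)
    sagemaker_client.create_model(**model)
    sagemaker_client.create_endpoint_config(**config)
    sagemaker_client.create_endpoint(**endpoint)
    return name


# Example usage (requires AWS credentials):
#   import boto3
#   deploy(boto3.client("sagemaker"), boto3.client("s3"), "model.tar.gz",
#          "<bucket>", "<prefix>/model.tar.gz", "<name>", "<image-uri>", "<role-arn>")
```

Keeping the request dictionaries in a separate function makes it easy to inspect exactly what will be sent before any AWS call is made.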

In [6]:
!./deploy.sh iris-model-randomforest

a model.pkl
move: ./model.tar.gz to s3://sagemaker-demo-samples/iris-model-randomforest/input/model.tar.gz
Model uploaded to  s3://sagemaker-demo-samples/iris-model-randomforest/input/model.tar.gz
Login Succeeded
Sending build context to Docker daemon  102.4kB
Step 1/11 : FROM ubuntu:16.04
 ---> 0b1edfbffd27
Step 2/11 : MAINTAINER Amazon AI <sage-learner@amazon.com>
 ---> Using cache
 ---> 0b5849031ec8
Step 3/11 : RUN apt-get -y update && apt-get install -y --no-install-recommends          wget          python3          nginx          ca-certificates     && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> 5411fbf3c61a
Step 4/11 : RUN wget https://bootstrap.pypa.io/get-pip.py && python3 get-pip.py &&     pip install scipy scikit-learn pandas flask gevent gunicorn && rm -rf /root/.cache
 ---> Using cache
 ---> a3b829cbce12
Step 5/11 : ENV PYTHONUNBUFFERED=TRUE
 ---> Using cache
 ---> f173a55c7809
Step 6/11 : ENV PYTHONDONTWRITEBYTECODE=TRUE
 ---> Using cache
 ---> 91c3f5da4217
Step 7/

In [7]:
!aws sagemaker list-endpoints --name-contains iris-model-randomforest-20180829-105236

{
    "Endpoints": [
        {
            "EndpointName": "iris-model-randomforest-20180829-105236",
            "EndpointArn": "arn:aws:sagemaker:us-east-1:216321755658:endpoint/iris-model-randomforest-20180829-105236",
            "CreationTime": 1535557960.087,
            "LastModifiedTime": 1535557960.087,
            "EndpointStatus": "Creating"
        }
    ]
}
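
The endpoint cannot serve predictions while its status is `Creating`. A minimal polling sketch, assuming a boto3 SageMaker client is passed in (the function name and the polling defaults are hypothetical), could look like this:

```python
import time


def wait_until_in_service(sagemaker_client, endpoint_name,
                          poll_seconds=30, max_polls=60):
    """Poll describe_endpoint until the endpoint is InService or has failed."""
    for _ in range(max_polls):
        desc = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
        status = desc["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            reason = desc.get("FailureReason", "unknown")
            raise RuntimeError(f"endpoint {endpoint_name} failed: {reason}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"endpoint {endpoint_name} not in service after {max_polls} polls")


# Example usage (requires AWS credentials):
#   import boto3
#   wait_until_in_service(boto3.client("sagemaker"),
#                         "iris-model-randomforest-20180829-105236")
```

boto3 also ships a built-in waiter for this, `client.get_waiter("endpoint_in_service")`, which is the shorter option when custom error handling is not needed.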


View the created objects in the management console:
- [model](https://console.aws.amazon.com/sagemaker/home?region=us-east-1#/models)
- [endpoint configuration](https://console.aws.amazon.com/sagemaker/home?region=us-east-1#/endpointConfig)
- [endpoint](https://console.aws.amazon.com/sagemaker/home?region=us-east-1#/endpoints)

### (6) Predict new observations

Note that for the new predictions there is no target variable and the features are formatted exactly as before.

In [8]:
iris_predict = pd.read_csv('iris_predict.csv', header=None)
iris_predict.head()

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 5.0 | 3.5 | 1.3 | 0.3 |
| 1 | 4.5 | 2.3 | 1.3 | 0.3 |
| 2 | 4.4 | 3.2 | 1.3 | 0.2 |
| 3 | 5.0 | 3.5 | 1.6 | 0.6 |
| 4 | 5.1 | 3.8 | 1.9 | 0.4 |


In [12]:
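import io, boto3

# Serialize the feature rows to CSV with no header and no index column
data_stream = io.StringIO()
iris_predict.to_csv(data_stream, header=False, index=False)

sess = boto3.Session()
response = sess.client('sagemaker-runtime').invoke_endpoint(
    EndpointName='iris-model-randomforest-20180829-105236',
    Body=data_stream.getvalue(),
    ContentType='text/csv',
    Accept='text/csv'  # MIME type requested for the response
)
# The response body is a stream: read it once and decode it to text
print(response['Body'].read().decode('utf-8'))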

setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
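
The response shown above is plain text with one label per line. As a small sketch (the helper names are hypothetical), it can be turned into a Python list and checked against the number of rows that were sent:

```python
from collections import Counter


def parse_predictions(body_text):
    """Split the newline-delimited response into a list of labels."""
    return [line.strip() for line in body_text.splitlines() if line.strip()]


def summarize(labels, expected_rows=None):
    """Count labels, optionally checking one prediction came back per input row."""
    if expected_rows is not None and len(labels) != expected_rows:
        raise ValueError(f"expected {expected_rows} predictions, got {len(labels)}")
    return Counter(labels)
```

Because the response stream can only be read once, keep the decoded text in a variable, for example `text = response['Body'].read().decode('utf-8')`, and then call `summarize(parse_predictions(text), expected_rows=len(iris_predict))`.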



### (7) Delete the endpoint

In [13]:
!aws sagemaker delete-endpoint --endpoint-name iris-model-randomforest-20180829-105236 --profile default
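
Deleting the endpoint does not remove the endpoint configuration or the model object that were created alongside it. A fuller cleanup can be sketched with boto3 as follows (the function name is hypothetical). It looks up the related names from the endpoint itself rather than assuming how the deploy script named them:

```python
def delete_endpoint_resources(sagemaker_client, endpoint_name):
    """Delete an endpoint plus the endpoint config and model(s) behind it."""
    # Look up the related resource names before anything is deleted.
    endpoint = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sagemaker_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [variant["ModelName"] for variant in config["ProductionVariants"]]

    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
    sagemaker_client.delete_endpoint_config(EndpointConfigName=config_name)
    for model_name in model_names:
        sagemaker_client.delete_model(ModelName=model_name)
    return config_name, model_names


# Example usage (requires AWS credentials):
#   import boto3
#   delete_endpoint_resources(boto3.client("sagemaker"),
#                             "iris-model-randomforest-20180829-105236")
```

The model archive in S3 and the container image are separate resources and are not touched by this sketch.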

Alternative workflows:
- Run this Jupyter notebook itself in a Docker container so that the environment and package versions stay consistent.
- Use a CloudFormation template to create the IAM role for SageMaker execution with both AmazonSageMakerFullAccess and AmazonS3FullAccess attached.
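
As a rough illustration of the CloudFormation option, the template can be generated from Python and written out as JSON. This is a sketch rather than a vetted template. The function name, the logical resource ID, and the default role name are hypothetical, while the two managed policy ARNs correspond to the policies named in Note 1.

```python
import json


def execution_role_template(role_name="SageMakerExecutionRole"):
    """Return a CloudFormation template (as a dict) for a SageMaker execution role."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "SageMakerExecutionRole": {
                "Type": "AWS::IAM::Role",
                "Properties": {
                    "RoleName": role_name,
                    # Trust policy: lets the SageMaker service assume this role.
                    "AssumeRolePolicyDocument": {
                        "Version": "2012-10-17",
                        "Statement": [{
                            "Effect": "Allow",
                            "Principal": {"Service": "sagemaker.amazonaws.com"},
                            "Action": "sts:AssumeRole",
                        }],
                    },
                    "ManagedPolicyArns": [
                        "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",
                        "arn:aws:iam::aws:policy/AmazonS3FullAccess",
                    ],
                },
            }
        },
    }


def write_template(path="sagemaker-role.json"):
    """Write the template to disk so it can be passed to CloudFormation."""
    with open(path, "w") as out:
        json.dump(execution_role_template(), out, indent=2)
    return path
```

Because the template sets an explicit `RoleName`, creating the stack requires acknowledging the `CAPABILITY_NAMED_IAM` capability.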