# Chapter 05. Model Deployment and Inference
---

## 5.1 Deploying a model for Batch Inference using SageMaker Batch Transform
---

In [1]:
# Change into the notebooks directory.
#%cd /root/sagemaker-course/notebooks/

/root/sagemaker-course/speaker


In [2]:
import boto3
import pandas as pd
import sagemaker
from sagemaker import get_execution_role

# S3 bucket information
BUCKET = 'sagemaker-course-20200517'
PREFIX = 'churn'
LOCAL_DATA_DIRECTORY = f'../data/{PREFIX}'
print(f"Artifacts will be written to s3://{BUCKET}/{PREFIX}")

# Session variables we'll use throughout the notebook
sagemaker_session = sagemaker.Session()
boto_session = sagemaker_session.boto_session
sagemaker_client = boto_session.client('sagemaker')
role = get_execution_role()
print(f'Role ARN: {role}')

Artifacts will be written to s3://sagemaker-course-20200517/churn
Role ARN: arn:aws:iam::209970524256:role/service-role/AmazonSageMaker-ExecutionRole-20200414T065516


#### Step 1. Prepare Input Data


An example of input file content (one record per line, attributes separated by commas):

```
Record1-Attribute1, Record1-Attribute2, Record1-Attribute3, ..., Record1-AttributeM
Record2-Attribute1, Record2-Attribute2, Record2-Attribute3, ..., Record2-AttributeM
Record3-Attribute1, Record3-Attribute2, Record3-Attribute3, ..., Record3-AttributeM
...
RecordN-Attribute1, RecordN-Attribute2, RecordN-Attribute3, ..., RecordN-AttributeM
```
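A minimal sketch of producing a file in this shape with Python's standard `csv` module. The helper name `to_batch_csv` and the three-attribute records are hypothetical. The sketch assumes the file carries features only, with no header row and no label column, which matches the sample shown in the next cell.

```python
import csv
import io


def to_batch_csv(records):
    """Serialize feature rows as header-less CSV, one record per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


# Hypothetical records with three attributes each (features only, no label).
sample_records = [
    [186, 0, 137.8],
    [132, 25, 113.2],
]
batch_csv = to_batch_csv(sample_records)
print(batch_csv)
```

Writing the returned string to a local file gives something that `upload_data` can send to S3 in the same way as `test-batch.csv`.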

In [7]:
#!head -5 ../data/churn/test-batch.csv

186,0,137.8,97,187.7,118,146.4,85,8.7,6,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0
132,25,113.2,96,269.9,107,229.1,87,7.1,7,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1
112,17,183.2,95,252.8,125,156.7,95,9.7,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1
91,24,93.5,112,183.4,128,240.7,133,9.9,3,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1
22,0,110.3,107,166.5,93,202.3,96,9.5,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0


In [8]:
local_file_path = '../data/churn/test-batch.csv'

inference_data = sagemaker_session.upload_data(
    local_file_path,
    bucket=BUCKET,
    key_prefix=PREFIX)

inference_data

's3://sagemaker-course-20200517/churn/test-batch.csv'

#### Step 2. Initialize a `Transformer` object

In [None]:
# See the `sagemaker.model.Model` [API reference](https://sagemaker.readthedocs.io/en/stable/model.html) for more details.
from sagemaker import model

In [13]:
model_data = 's3://sagemaker-course-20200517/churn/builtin-xgboost-2020-05-17-13-59-20-155/output/model.tar.gz'
image = '257758044811.dkr.ecr.us-east-2.amazonaws.com/sagemaker-xgboost:0.90-2-cpu-py3'

churn_model = model.Model(model_data=model_data,
                          image_uri=image,
                          role=role)

See the `sagemaker.transformer.Transformer` [API reference](https://sagemaker.readthedocs.io/en/stable/transformer.html) for more details.

In [14]:
churn_transformer = churn_model.transformer(instance_count=1,
                                            instance_type='ml.m4.xlarge',
                                            strategy='MultiRecord',
                                            assemble_with='Line',
                                            output_path=f"s3://{BUCKET}/{PREFIX}/transform")

#### Step 3. Run Transform Job

**NOTE**: This takes about 3-5 minutes to return.

In [16]:
# Start a transform job and wait for it to finish
churn_transformer.transform(data=inference_data,
                            content_type='text/csv',
                            split_type='Line')

print('Waiting for transform job: ' + churn_transformer.latest_transform_job.job_name)
churn_transformer.wait()

Waiting for transform job: sagemaker-xgboost-2020-05-19-15-14-19-8-2020-05-19-15-20-35-923
.....................[2020-05-19 15:23:50 +0000] [16] [INFO] Starting gunicorn 19.10.0
[2020-05-19 15:23:50 +0000] [16] [INFO] Listening at: unix:/tmp/gunicorn.sock (16)
[2020-05-19 15:23:50 +0000] [16] [INFO] Using worker: gevent
[2020-05-19 15:23:50 +0000] [23] [INFO] Booting worker with pid: 23
[2020-05-19 15:23:50 +0000] [24] [INFO] Booting worker with pid: 24
[2020-05-19 15:23:50 +0000] [25] [INFO] Booting worker with pid: 25
[2020-05-19 15:23:50 +0000] [29] [INFO] Booting worker with pid: 29
[2020-05-19:15:24:08:INFO] No GPUs detected (normal if no gpus installed)
169.254.255.130 - - [19/May/2020:15:24:08 +0000] "GET /ping HTTP/1.1" 200 0 "-" "Go-http-client/1.1"
[2020-05-19:15:24:08:INFO] No GPUs detected (normal if no gpus installed)
169.254.255.130 - - [19/May/2020:15:24:08 +0000] "GET /executi

In [17]:
churn_transformer.output_path

's3://sagemaker-course-20200517/churn/transform'
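For reference, the `Transformer` settings and `transform()` arguments used above correspond roughly to a low-level `CreateTransformJob` request. The sketch below only builds the request dictionary. The job and model names are hypothetical placeholders, and the helper `build_transform_job_request` is not part of the SageMaker SDK. `SplitType='Line'` treats each input line as one record, `BatchStrategy='MultiRecord'` lets SageMaker pack several records into one request, and `AssembleWith='Line'` writes one result per line in the output file.

```python
def build_transform_job_request(job_name, model_name, input_uri, output_uri):
    """Return a CreateTransformJob request mirroring the settings used above."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "BatchStrategy": "MultiRecord",  # strategy='MultiRecord'
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_uri}
            },
            "ContentType": "text/csv",  # content_type='text/csv'
            "SplitType": "Line",  # split_type='Line'
        },
        "TransformOutput": {
            "S3OutputPath": output_uri,  # output_path=...
            "AssembleWith": "Line",  # assemble_with='Line'
        },
        "TransformResources": {
            "InstanceType": "ml.m4.xlarge",
            "InstanceCount": 1,
        },
    }


request = build_transform_job_request(
    job_name="churn-batch-example",  # hypothetical job name
    model_name="churn-xgboost-example",  # hypothetical model name
    input_uri="s3://sagemaker-course-20200517/churn/test-batch.csv",
    output_uri="s3://sagemaker-course-20200517/churn/transform",
)
print(request["TransformInput"]["SplitType"], request["TransformOutput"]["AssembleWith"])

# To submit it (not run here): sagemaker_client.create_transform_job(**request)
```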

In [20]:
# List files in S3 bucket
s3_client = boto_session.client('s3')
s3_client.list_objects(Bucket=BUCKET, Prefix=f'{PREFIX}/transform')

{'ResponseMetadata': {'RequestId': '36A771A0516F41EB',
  'HostId': '439fsc6wz+7T2Ropf0f6yE8YaGFOFVOHuyHSI2YO96oq/B/cmkZlR4OVut673TrSlwkEgISTwU4=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': '439fsc6wz+7T2Ropf0f6yE8YaGFOFVOHuyHSI2YO96oq/B/cmkZlR4OVut673TrSlwkEgISTwU4=',
   'x-amz-request-id': '36A771A0516F41EB',
   'date': 'Tue, 19 May 2020 16:21:59 GMT',
   'x-amz-bucket-region': 'us-east-2',
   'content-type': 'application/xml',
   'transfer-encoding': 'chunked',
   'server': 'AmazonS3'},
  'RetryAttempts': 0},
 'IsTruncated': False,
 'Marker': '',
 'Contents': [{'Key': 'churn/transform/test-batch.csv.out',
   'LastModified': datetime.datetime(2020, 5, 19, 15, 24, 9, tzinfo=tzlocal()),
   'ETag': '"e4e5cc7367fa2e3e835ebb4ee7c3d59f"',
   'Size': 6793,
   'StorageClass': 'STANDARD'}],
 'Name': 'sagemaker-course-20200517',
 'Prefix': 'churn/transform',
 'MaxKeys': 1000,
 'EncodingType': 'url'}

In [21]:
# Download the output data from S3 to local filesystem
s3_client.download_file(
    Bucket=BUCKET,
    Key=f"{PREFIX}/transform/test-batch.csv.out",
    Filename=f'{LOCAL_DATA_DIRECTORY}/test-batch.csv.out')

In [24]:
#!head -5 ../data/churn/test-batch.csv.out

0.010853796266019344
0.005068291909992695
0.008791499771177769
0.16663919389247894
0.004287515766918659
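Each output line holds one score for the input record in the same position. A minimal sketch of turning those scores into class labels follows. It assumes the scores are churn probabilities, and the 0.5 cut-off and the helper name `label_scores` are illustrative choices rather than anything prescribed by the model.

```python
def label_scores(scores, threshold=0.5):
    """Map each score to 1 (churn) if it meets the threshold, else 0."""
    return [1 if score >= threshold else 0 for score in scores]


# The first five lines of test-batch.csv.out shown above.
output_lines = [
    "0.010853796266019344",
    "0.005068291909992695",
    "0.008791499771177769",
    "0.16663919389247894",
    "0.004287515766918659",
]
scores = [float(line) for line in output_lines]
labels = label_scores(scores)
print(labels)
```

A lower threshold flags more customers as likely churners at the cost of more false positives, so the cut-off is best chosen against a validation set.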


## 5.2 Creating a SageMaker Endpoint for Online Inference
---

See the `sagemaker.sklearn.SKLearnModel` [API reference](https://sagemaker.readthedocs.io/en/stable/sagemaker.sklearn.html#sagemaker.sklearn.model.SKLearnModel) for more details.

In [4]:
from sagemaker import sklearn

In [5]:
model_data = 's3://sagemaker-course-20200517/churn/custom-code-sklearn-2020-05-17-14-49-12-285/output/model.tar.gz'

sklearn_model = sklearn.SKLearnModel(framework_version='0.20.0',
                                     model_data=model_data, 
                                     role=role, entry_point='../scripts/sklearn/sklearn_rf.py')

See the `sagemaker.predictor.Predictor` [API reference](https://sagemaker.readthedocs.io/en/stable/predictors.html) for more details. (This class was named `RealTimePredictor` in version 1 of the SageMaker Python SDK.)

**NOTE**: This takes about 6-8 minutes to return.

In [6]:
sklearn_predictor = sklearn_model.deploy(initial_instance_count=1,
                                         instance_type="ml.m4.xlarge")

-------------!

In [28]:
type(sklearn_predictor)

sagemaker.sklearn.model.SKLearnPredictor

In [25]:
df = pd.read_csv(f'{LOCAL_DATA_DIRECTORY}/test-dataset.csv', header=None)

# Remove first column which contains labels
X = df.drop(labels=0, axis=1)
X.head()

Unnamed: 0,1,2,3,4,5,6,7,8,9,10,...,60,61,62,63,64,65,66,67,68,69
0,186,0,137.8,97,187.7,118,146.4,85,8.7,6,...,0,0,0,0,0,1,1,0,1,0
1,132,25,113.2,96,269.9,107,229.1,87,7.1,7,...,0,0,0,0,1,0,1,0,0,1
2,112,17,183.2,95,252.8,125,156.7,95,9.7,3,...,0,0,0,0,1,0,1,0,0,1
3,91,24,93.5,112,183.4,128,240.7,133,9.9,3,...,0,0,0,0,0,1,0,1,0,1
4,22,0,110.3,107,166.5,93,202.3,96,9.5,5,...,0,0,0,1,0,0,1,0,1,0


In [29]:
preds = sklearn_predictor.predict(X)
print(preds)

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0
 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0
 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0]
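The predictor object wraps a call to the endpoint's runtime API. The sketch below shows roughly what such a call looks like with `boto3` directly. It assumes the endpoint's inference script accepts `text/csv` input, which a custom `input_fn` in the entry-point script may or may not do. The helper names and the three-feature rows are hypothetical, and `invoke_with_csv` is defined but not called because it needs a live endpoint.

```python
import csv
import io


def rows_to_csv_payload(rows):
    """Encode feature rows as a header-less CSV string for a text/csv request."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def invoke_with_csv(endpoint_name, rows):
    """Send rows to an endpoint via the low-level runtime API (needs AWS access)."""
    import boto3  # imported lazily so the payload helper works without boto3

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=rows_to_csv_payload(rows),
    )
    # The response body is a stream; its format depends on the inference script.
    return response["Body"].read().decode("utf-8")


# Hypothetical three-feature rows; the real rows have 69 features.
payload = rows_to_csv_payload([[186, 0, 137.8], [132, 25, 113.2]])
print(payload)
```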


## 5.3 Configuring Autoscaling for a Hosted Endpoint
---

**To configure autoscaling for a model using the console**:

1. Open the Amazon SageMaker console at https://console.aws.amazon.com/sagemaker/.
2. In the navigation pane, choose **Endpoints**.
3. Choose the endpoint that you want to configure.
4. Under the **Endpoint runtime settings** heading, select the radio button corresponding to the model variant that you want to configure and click **Configure autoscaling**. The **Configure variant automatic scaling** page appears.
5. The **Variant automatic scaling** section lets us configure the min/max number of instances.
    * For **Minimum instance count**, type the minimum number of instances that you want the scaling policy to maintain. At least 1 instance is required.
    * For **Maximum instance count**, type the maximum number of instances that you want the scaling policy to maintain.
6. The **Built-in scaling policy** section lets us configure the conditions under which to scale the instances.
    * For the **Target value**, type the average number of invocations per instance per minute for the model. To determine this value, follow the load testing guidelines in the Developer Guide. Application Auto Scaling adds or removes instances to keep the metric close to the value that you specify.
    * For **Scale-in cool down** and **Scale-out cool down**, type the number of seconds for each cool down period. A cool down period is the time Application Auto Scaling waits after a scaling activity completes before it starts another one of the same kind.
    * Select **Disable scale in** if you want the variant to scale out under increased traffic but do not need instances removed to reduce costs when traffic decreases. With this option selected, the scaling policy never deletes variant instances. Scale-out activities are always enabled, so the scaling policy can still create endpoint instances as needed.
7. Choose **Save**.

More details, including how to define a custom scaling policy, can be found in the [Developer Guide](https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling-add-console.html).
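The same settings can also be applied programmatically through the Application Auto Scaling API. The sketch below builds the two requests that correspond to the console fields above: a scalable target (minimum and maximum instance count) and a target-tracking policy (target value, cool downs, disable scale in). The endpoint name, policy name, and all numeric values are hypothetical placeholders, and the helper functions are illustrative rather than part of any SDK. `AllTraffic` is assumed as the variant name because it is the default used by the SageMaker Python SDK.

```python
def build_autoscaling_requests(endpoint_name, variant_name, min_instances,
                               max_instances, target_invocations,
                               scale_in_cooldown, scale_out_cooldown,
                               disable_scale_in=False):
    """Build the two Application Auto Scaling requests for an endpoint variant."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    target = dict(common, MinCapacity=min_instances, MaxCapacity=max_instances)
    policy = dict(
        common,
        PolicyName=f"{endpoint_name}-{variant_name}-invocations",  # hypothetical name
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": float(target_invocations),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": scale_in_cooldown,
            "ScaleOutCooldown": scale_out_cooldown,
            "DisableScaleIn": disable_scale_in,
        },
    )
    return target, policy


def apply_autoscaling(target, policy):
    """Send both requests to Application Auto Scaling (needs AWS access)."""
    import boto3  # imported lazily; building the requests does not need it

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**target)
    client.put_scaling_policy(**policy)


# All values below are hypothetical placeholders, not recommendations.
target, policy = build_autoscaling_requests(
    endpoint_name="churn-endpoint-example",
    variant_name="AllTraffic",
    min_instances=1,
    max_instances=3,
    target_invocations=100,
    scale_in_cooldown=300,
    scale_out_cooldown=60,
)
print(target["ResourceId"])
```

Calling `apply_autoscaling(target, policy)` against a deployed endpoint would register the variant and attach the policy. Keeping request construction separate from the API calls makes the configuration easy to inspect before anything is changed.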

### Endpoint cleanup 

In [30]:
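# Delete the SageMaker model that was created for the batch transform job
churn_model.delete_model()

# Delete the real-time endpoint, then the model behind it
sklearn_predictor.delete_endpoint()
sklearn_model.delete_model()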
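If one of these calls raises, for example because a resource was already deleted, the remaining calls in the cell are skipped. One possible way to guard against that is sketched below. The helper `run_cleanup` is hypothetical, and the stand-in functions replace the real delete calls so that the example runs without AWS access.

```python
def run_cleanup(steps):
    """Run every cleanup step, returning (name, error) pairs for failures."""
    failures = []
    for name, step in steps:
        try:
            step()
        except Exception as error:  # keep going so later resources are still removed
            failures.append((name, error))
    return failures


# Stand-in callables for illustration; in the notebook these would be
# churn_model.delete_model, sklearn_predictor.delete_endpoint, and so on.
def _ok():
    pass


def _fails():
    raise RuntimeError("resource not found")


failures = run_cleanup([
    ("batch model", _ok),
    ("endpoint", _fails),
    ("endpoint model", _ok),
])
print([name for name, _ in failures])
```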