# 4. Performing Inference with a Trained Model

* Goals
    * Using a model trained on Day 1:
        * Host the model as an **endpoint** for online inference.
        * Use **Batch Transform** to perform batch inference using the model.
        * Demonstrate how to enable autoscaling.
* Code adapted from the [scikit_learn_iris](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk/scikit_learn_iris) and [batch_transform_pca_dbscan_movie_clusters](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker_batch_transform/introduction_to_batch_transform/batch_transform_pca_dbscan_movie_clusters.ipynb) sample notebooks.

---
## 1. Setup

Change into the notebooks directory.

In [1]:
%cd /root/sagemaker-workshop-420/notebooks

/root/sagemaker-workshop-420/notebooks


In [2]:
import boto3
import numpy as np
import pandas as pd
import sagemaker
from sagemaker import get_execution_role
from sagemaker.sklearn import SKLearnModel
from sklearn import datasets

First, let's create our SageMaker session and role.

In [3]:
boto_session = boto3.Session()
region = boto_session.region_name
sagemaker_session = sagemaker.Session()
role = get_execution_role()
print(role)

arn:aws:iam::209970524256:role/service-role/AmazonSageMaker-ExecutionRole-20200414T065516


## 2. Host a pretrained Sklearn Model for Online Inference

We will deploy the sklearn model we trained on the Iris dataset as a hosted endpoint for online inference.

In [31]:
BUCKET = 'sagemaker-workshop-420'
PREFIX = 'iris'

LOCAL_DATA_DIRECTORY = f'../data/{PREFIX}'

print(f"\nArtifacts will be written to/read from s3://{BUCKET}/{PREFIX}")


Artifacts will be written to/read from s3://sagemaker-workshop-420/iris


### Step 1. Locate the S3 path of the serialized model

To utilize a trained model, we need to pass in the S3 URI of the serialized model artifact. We can find this by looking through the metadata of the Training Job. This can be done in the SageMaker Studio UI or in the AWS SageMaker console under Training Jobs.

In [23]:
model_s3_path = 's3://sagemaker-workshop-420/iris/sagemaker-scikit-learn-2020-04-15-22-35-55-023/output/model.tar.gz'
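
The same lookup can also be scripted instead of copied from the console. The following is a minimal sketch, assuming a boto3 SageMaker client is passed in; the helper name `get_model_artifact_uri` is hypothetical and not part of the workshop code. It relies on the `describe_training_job` call, which reports the artifact location under `ModelArtifacts` / `S3ModelArtifacts`.

```python
def get_model_artifact_uri(sagemaker_client, training_job_name):
    """Return the S3 URI of the model.tar.gz produced by a training job.

    `sagemaker_client` is assumed to be a boto3 SageMaker client,
    for example boto3.client("sagemaker").
    """
    description = sagemaker_client.describe_training_job(
        TrainingJobName=training_job_name
    )
    # The serialized model location is nested under ModelArtifacts.
    return description["ModelArtifacts"]["S3ModelArtifacts"]
```

For example, calling `get_model_artifact_uri(boto_session.client('sagemaker'), 'sagemaker-scikit-learn-2020-04-15-22-35-55-023')` should return the path assigned to `model_s3_path` above, provided the training job still exists in the account.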

### Step 2. Initialize a `sagemaker.sklearn.SKLearnModel` that can be deployed as an `Endpoint`

Next we create a `sagemaker.sklearn.SKLearnModel` object which allows us to deploy our pretrained model as an `Endpoint`. See the `sagemaker.sklearn.SKLearnModel` [API reference](https://sagemaker.readthedocs.io/en/stable/sagemaker.sklearn.html#sagemaker.sklearn.model.SKLearnModel) for more details.

In [24]:
sklearn_model = SKLearnModel(model_data=model_s3_path,
                             role=role, 
                             entry_point='../scripts/sklearn_iris.py')

### Step 3. Deploy the `Model` as an `Endpoint` and return a `RealTimePredictor` object

See the `sagemaker.predictor.RealTimePredictor` [API reference](https://sagemaker.readthedocs.io/en/stable/predictors.html) for more details.

**NOTE: This takes about 6-8 minutes to return.**

In [22]:
sklearn_predictor = sklearn_model.deploy(initial_instance_count=1,
                                         instance_type="ml.m4.xlarge")

-------------!
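
The progress marks above appear to be the SDK polling until the endpoint is ready. If the deployment is started without waiting, or the notebook kernel is restarted mid-deploy, the status can be checked by hand. Below is a minimal sketch, assuming a boto3 SageMaker client is passed in; the helper name `wait_for_endpoint` and its default polling values are hypothetical. It uses `describe_endpoint`, whose response includes an `EndpointStatus` field.

```python
import time


def wait_for_endpoint(sagemaker_client, endpoint_name, poll_seconds=30, max_polls=40):
    """Poll an endpoint until it leaves the 'Creating' state.

    Returns the last status seen, e.g. 'InService' or 'Failed'.
    If the endpoint is still 'Creating' after max_polls checks,
    'Creating' is returned and the caller can decide what to do.
    """
    status = "Creating"
    for _ in range(max_polls):
        response = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status != "Creating":
            break
        time.sleep(poll_seconds)
    return status
```

With the version 1 SDK used in this notebook, the endpoint name should be available as `sklearn_predictor.endpoint`; treat that attribute name as an assumption and check it against your installed SDK version.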

### Step 4. Perform Inference

Next we can load in data and use our hosted endpoint for inference.

In [29]:
# Load Iris dataset
iris = datasets.load_iris()
iris.data[:5]

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2]])

In [26]:
iris_preds = sklearn_predictor.predict(iris.data)

In [27]:
print(iris_preds)

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.
 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.
 2. 2. 2. 2. 2. 2.]
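
The endpoint returns numeric class indices. To make them easier to read, they can be mapped back to species names. This is a small sketch with hypothetical helper names (`to_species`, `summarize`); the name list mirrors `iris.target_names` from scikit-learn and is written out here only to keep the example self-contained.

```python
from collections import Counter

# Species names in the order scikit-learn's Iris dataset encodes them (0, 1, 2).
IRIS_SPECIES = ["setosa", "versicolor", "virginica"]


def to_species(predictions):
    """Convert numeric class predictions (e.g. 0.0, 1.0, 2.0) to species names."""
    return [IRIS_SPECIES[int(round(float(p)))] for p in predictions]


def summarize(predictions):
    """Count how many predictions fall into each species."""
    return Counter(to_species(predictions))
```

Calling `summarize(iris_preds)` on the output shown above gives 50 predictions for each of the three species.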


### Configure Autoscaling

**To configure autoscaling for a model using the console**

1. Open the Amazon SageMaker console at https://console.aws.amazon.com/sagemaker/.
2. In the navigation pane, choose **Endpoints**.
3. Choose the endpoint that you want to configure.
4. For **Endpoint runtime settings**, choose the model variant that you want to configure.
5. For **Endpoint runtime settings**, choose **Configure autoscaling**. The **Configure variant automatic scaling** page appears.
6. For **Minimum capacity**, type the minimum number of instances that you want the scaling policy to maintain. At least 1 instance is required.
7. For **Maximum capacity**, type the maximum number of instances that you want the scaling policy to maintain.
8. For the **target value**, type the average number of invocations per instance per minute for the model. To determine this value, follow the guidelines in Load testing. Application Auto Scaling adds or removes instances to keep the metric close to the value that you specify.
9. For **Scale-in cool down (seconds)** and **Scale-out cool down (seconds)**, type the number of seconds for each cool down period.
10. Select **Disable scale in** to prevent the scaling policy from deleting variant instances. Do this if you want to ensure that your variant scales out to address increased traffic but are not concerned with removing instances to reduce costs when traffic decreases. Scale-out activities are always enabled so that the scaling policy can create endpoint instances as needed.
11. Choose **Save**.

More details, including how to define a custom scaling policy, can be found in the [Developer Guide](https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling-add-console.html).
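
The console steps above can also be sketched in code using the Application Auto Scaling API. The example below is a sketch under stated assumptions, not the workshop's own method: the helper names are hypothetical, all capacity, target, and cool down values are placeholders, and `AllTraffic` is assumed to be the variant name the SDK assigned at deploy time (verify it in the console). `register_scalable_target` and `put_scaling_policy` are operations of the boto3 `application-autoscaling` client.

```python
def build_autoscaling_requests(endpoint_name, variant_name="AllTraffic",
                               min_capacity=1, max_capacity=2,
                               target_invocations_per_minute=100.0,
                               scale_in_cooldown=300, scale_out_cooldown=300,
                               disable_scale_in=False):
    """Build the two request payloads for target-tracking autoscaling.

    All numeric defaults are illustrative placeholders.
    """
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"
    # Steps 6-7: minimum and maximum capacity for the variant.
    target = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }
    # Steps 8-10: target value, cool down periods, and the scale-in switch.
    policy = {
        "PolicyName": f"{endpoint_name}-invocations-target-tracking",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_invocations_per_minute),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": scale_in_cooldown,
            "ScaleOutCooldown": scale_out_cooldown,
            "DisableScaleIn": disable_scale_in,
        },
    }
    return target, policy


def apply_autoscaling(autoscaling_client, target, policy):
    """Send both requests; the client is assumed to be
    boto3.client("application-autoscaling")."""
    autoscaling_client.register_scalable_target(**target)
    autoscaling_client.put_scaling_policy(**policy)
```

Separating the payload construction from the API calls makes it possible to inspect or log the exact settings before anything is changed on the endpoint.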

### Endpoint cleanup <a class="anchor" id="endpoint_cleanup"></a>

When you're done with the endpoint, you'll want to clean it up.

In [15]:
sklearn_predictor.delete_endpoint()

## 3. Perform batch inference using SageMaker Batch Transform 

We can also use the trained model for asynchronous batch inference on S3 data using SageMaker Batch Transform.

### Step 1. Prepare Input Data

We will run inference on the same Iris features the model was trained on yesterday, written out as a headerless CSV file with one record per line.

```
An example of input file content:
                Record1-Attribute1, Record1-Attribute2, Record1-Attribute3, ..., Record1-AttributeM
                Record2-Attribute1, Record2-Attribute2, Record2-Attribute3, ..., Record2-AttributeM
                Record3-Attribute1, Record3-Attribute2, Record3-Attribute3, ..., Record3-AttributeM
                ...
                RecordN-Attribute1, RecordN-Attribute2, RecordN-Attribute3, ..., RecordN-AttributeM
```         

In [32]:
iris_X = pd.DataFrame(iris.data)

local_file_path = f'{LOCAL_DATA_DIRECTORY}/iris_batch_inference.csv'
iris_X.to_csv(local_file_path, header=False, index=False)

print(local_file_path)

../data/iris/iris_batch_inference.csv


In [33]:
!head ../data/iris/iris_batch_inference.csv

5.1,3.5,1.4,0.2
4.9,3.0,1.4,0.2
4.7,3.2,1.3,0.2
4.6,3.1,1.5,0.2
5.0,3.6,1.4,0.2
5.4,3.9,1.7,0.4
4.6,3.4,1.4,0.3
5.0,3.4,1.5,0.2
4.4,2.9,1.4,0.2
4.9,3.1,1.5,0.1


In [34]:
inference_data = sagemaker_session.upload_data(
    local_file_path,
    bucket=BUCKET,
    key_prefix=PREFIX)

inference_data

's3://sagemaker-workshop-420/iris/iris_batch_inference.csv'

### Step 2. Initialize a `Transformer` object

See the `sagemaker.transformer.Transformer` [API reference](https://sagemaker.readthedocs.io/en/stable/transformer.html) for more details.

* Hardware specification (instance count and type): Prediction is embarrassingly parallel, so feel free to test this with multiple instances, but since our dataset is not enormous, we'll stick to one.
* `strategy`: Determines how records are batched into each prediction request within the batch transform job. 'MultiRecord' may be faster, but some use cases may require 'SingleRecord'.
* `assemble_with`: Controls how predictions are output. 'None' does not perform any special processing; 'Line' places each prediction on its own line.
* `output_path`: The S3 location where batch transform output is written. Output files are named by appending '.out' to the input file names; in our case this will be 'iris_batch_inference.csv.out'. Multiple batch transform runs will overwrite existing output unless this path is updated.
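
The output naming rule can be sketched as a small helper. This assumes a single input object (for inputs spread over nested prefixes the layout may differ), and the function name `expected_output_uri` is hypothetical.

```python
import posixpath


def expected_output_uri(input_uri, output_path):
    """Return the S3 URI where the output for one input object is expected.

    Assumes the output is named after the input file with '.out' appended
    and placed directly under output_path.
    """
    filename = posixpath.basename(input_uri)
    return f"{output_path.rstrip('/')}/{filename}.out"
```

For the input file used here, `expected_output_uri('s3://sagemaker-workshop-420/iris/iris_batch_inference.csv', 's3://sagemaker-workshop-420/iris/transform')` yields `s3://sagemaker-workshop-420/iris/transform/iris_batch_inference.csv.out`.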

In [35]:
sklearn_transformer = sklearn_model.transformer(instance_count=1,
                                                instance_type='ml.m4.xlarge',
                                                strategy='MultiRecord',
                                                assemble_with='Line',
                                                output_path=f"s3://{BUCKET}/{PREFIX}/transform")

### Step 3. Run Transform Job 

Using the Transformer, run a transform job on the S3 input data.

A critical parameter to set properly here is `split_type`. Since we are using CSV, we'll specify 'Line', which tells Batch Transform to split the input file on line boundaries so that each line is treated as one record. Records are then grouped into requests according to `strategy`. Had we not specified this, the entire file would be sent as a single request, which for a large file could exceed the payload limit or exhaust our transformer instance's memory.

Note: Here we pass the S3 path of the inference data as input, rather than the input we used in `.fit()`.

**NOTE: This takes about 3-5 minutes to return.**
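
To build intuition for how line splitting and record batching interact, here is a simplified illustration. It is not SageMaker's actual implementation: the function name `batch_lines` and the byte limit are hypothetical, and the real limit is set in megabytes (the `max_payload` argument of `transformer()`). The sketch splits the input on line boundaries and packs as many whole lines as fit into each request payload.

```python
def batch_lines(csv_text, max_payload_bytes):
    """Group newline-delimited records into payloads of bounded size.

    Each payload contains only whole lines. A single line that is larger
    than the limit still gets its own payload.
    """
    batches = []
    current = []
    current_size = 0
    for line in csv_text.splitlines():
        if not line:
            continue  # ignore blank lines
        size = len(line.encode("utf-8")) + 1  # +1 for the trailing newline
        if current and current_size + size > max_payload_bytes:
            # The next line would not fit, so close the current payload.
            batches.append("\n".join(current) + "\n")
            current = []
            current_size = 0
        current.append(line)
        current_size += size
    if current:
        batches.append("\n".join(current) + "\n")
    return batches
```

A very small limit produces one record per payload, similar in spirit to 'SingleRecord', while a generous limit packs many records into each request, as 'MultiRecord' does.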

In [36]:
# Start a transform job and wait for it to finish
sklearn_transformer.transform(data=inference_data,
                              content_type='text/csv',
                              split_type='Line')

print('Waiting for transform job: ' + sklearn_transformer.latest_transform_job.job_name)
sklearn_transformer.wait()

Waiting for transform job: sagemaker-scikit-learn-2020-04-16-11-29-2020-04-16-11-29-58-815
...................Processing /opt/ml/code
Building wheels for collected packages: sklearn-iris
  Building wheel for sklearn-iris (setup.py): started
  Building wheel for sklearn-iris (setup.py): finished with status 'done'
  Created wheel for sklearn-iris: filename=sklearn_iris-1.0.0-py2.py3-none-any.whl size=7005 sha256=779d88c57bd20a2ee7153bf24a0149d458194f70ee36bee893e4875ddd38caf9
  Stored in directory: /tmp/pip-ephem-wheel-cache-jby6p02t/wheels/35/24/16/37574d11bf9bde50616c67372a334f94fa8356bc7164af8ca3
Successfully built sklearn-iris
Installing collected packages: sklearn-iris
Successfully installed sklearn-iris-1.0.0
  import imp
[2020-04-16 11:32:57 +0000] [38] [INFO] Starting gunicorn 19.9.0
[2020-04-16 11:32:57 +0000] [38] [INFO] Listening at: unix:/tmp/gunicorn.sock (38)
[2020-04-16 11:32:57 +0000] [38] [

### Step 4. Check Output Data

After the transform job has completed, download the output data from S3. For each file **FILENAME** in the input data, there is a corresponding file **FILENAME.out** containing the predicted labels for each input row. We can then compare the predicted labels to the true labels, which are available as `iris.target`.

In [37]:
batch_output = sklearn_transformer.output_path
batch_output

's3://sagemaker-workshop-420/iris/transform'

In [38]:
# Download the output data from S3 to local filesystem
boto_session.client('s3').download_file(
    Bucket=BUCKET,
    Key=f"{PREFIX}/transform/iris_batch_inference.csv.out",
    Filename=f'{LOCAL_DATA_DIRECTORY}/iris_batch_inference.csv.out')
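
The bucket and key above are written out by hand. As an alternative, they could be derived from an S3 URI such as the transformer's `output_path`. A minimal sketch using only the standard library follows; the helper name `split_s3_uri` is hypothetical.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")
```

For instance, `split_s3_uri(batch_output)` returns `('sagemaker-workshop-420', 'iris/transform')`, and appending `/iris_batch_inference.csv.out` to the second element gives the key passed to `download_file`.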

In [39]:
f'{LOCAL_DATA_DIRECTORY}/iris_batch_inference.csv.out'

'../data/iris/iris_batch_inference.csv.out'

In [40]:
!cat ../data/iris/iris_batch_inference.csv.out

[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
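
To compare these predictions with the true labels, the output file can be parsed and scored. The sketch below assumes the format shown above, where each non-empty line of the `.out` file is a JSON list of predictions; other models or `accept` settings may produce a different layout. The helper names `parse_batch_output` and `accuracy` are hypothetical.

```python
import json


def parse_batch_output(text):
    """Parse .out file content into a flat list of predictions.

    Assumes every non-empty line is a JSON list of numbers.
    """
    predictions = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            predictions.extend(json.loads(line))
    return predictions


def accuracy(predictions, labels):
    """Fraction of predictions whose class index matches the true label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(int(p) == int(y) for p, y in zip(predictions, labels))
    return correct / len(labels)
```

With the file downloaded earlier, this could be used as `accuracy(parse_batch_output(open(f'{LOCAL_DATA_DIRECTORY}/iris_batch_inference.csv.out').read()), iris.target)`. Keep in mind that the model is being scored on the data it was trained on, so the result is an optimistic estimate.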
