# Lab 3: Hosting & Evaluation

**Goal:**

   In this lab, we'll explore the options for hosting the model built in Lab 2 on SageMaker, and we'll evaluate how well that model predicts whether a transaction is recurring.
   
   * **Lab Outcome:** Demonstrate multiple hosting options and evaluate our model.
---

# Step 1: Setup & Configure Environment

In [None]:
# Retrieve the stored variables (variables stored in Lab 2)
%store -r training_job_name
%store -r data_prefix

In [None]:
%%time

import os
import boto3
import re
import sagemaker
from sagemaker.predictor import csv_serializer    # Converts strings for HTTP POST requests on inference


role = sagemaker.get_execution_role()
region = boto3.Session().region_name
sess = sagemaker.Session()

bucket = sagemaker.Session().default_bucket()
prefix = 'workshop/hosting'

print('S3 Bucket for model artifact:', bucket)
print('S3 Prefix for model artifact:', prefix)
print('S3 Prefix for model evaluation data:', data_prefix)


# customize to your bucket where you have stored the data
bucket_path = 'https://s3-{}.amazonaws.com/{}'.format(region, bucket)

---
# Step 2: Model Hosting & Evaluation - Persistent Endpoints

In this step, we'll host our model behind a real-time persistent endpoint.  This lab covers both hosting methods: (1) persistent endpoints and (2) batch transform.  We demonstrate both purely to showcase the hosting options on Amazon SageMaker; in practice, the choice between persistent endpoints and batch transform should be driven by the use case.

### Configure & Deploy Endpoint

Below we will host a single model (built in Lab 2) behind a persistent endpoint and use that endpoint for inference.

In [None]:
# Retrieve model trained in Lab2 
from sagemaker.estimator import Estimator

# attach() is a method in the SageMaker SDK that attaches to an existing training job and creates an estimator
# bound to that training job.  We are not re-training; we are attaching to the training job.  If it has completed,
# we can then deploy.  In this case we already know it completed in Lab 2 and are just attaching to deploy.
xgb = Estimator.attach(training_job_name)

Now that we've attached to the trained model from Lab 2, let's set up our endpoint and deploy.

*Note: You could optionally deploy with the [AWS SDK for Python (boto3)](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html) by calling `create_endpoint_config` and `create_endpoint` directly.  In this example we use the SageMaker Python SDK, which abstracts those underlying calls.*

In [None]:
# Configure endpoint & deploy

xgb_predictor = xgb.deploy(initial_instance_count=1,
                           instance_type='ml.m4.xlarge')
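
For comparison, here is a minimal sketch of the lower-level route mentioned in the note above. It assumes a SageMaker model already exists under `model_name`; the function name, the `-config` suffix, and the `AllTraffic` variant name are illustrative choices, not part of this lab. The client is passed in so the request shapes can be inspected without touching AWS.

```python
def deploy_with_boto3(sm_client, model_name, endpoint_name,
                      instance_type='ml.m4.xlarge', instance_count=1):
    """Create an endpoint configuration and an endpoint for an existing model."""
    # Hypothetical naming convention for the endpoint configuration
    config_name = endpoint_name + '-config'

    # Step 1: describe what to host (which model, on which instances)
    sm_client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[{
            'VariantName': 'AllTraffic',
            'ModelName': model_name,
            'InitialInstanceCount': instance_count,
            'InstanceType': instance_type,
        }],
    )

    # Step 2: create the endpoint from that configuration
    sm_client.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=config_name,
    )
    return config_name

# Usage (needs AWS credentials, so it is left commented out):
# import boto3
# sm_client = boto3.client('sagemaker')
# deploy_with_boto3(sm_client, model_name='<existing-model-name>',
#                   endpoint_name='<your-endpoint-name>')
```

The `deploy()` call above wraps these steps (plus model creation and waiting for the endpoint to come into service), which is why the lab uses it.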

### Model Evaluation

There are many ways to evaluate the performance of a machine learning model, but let's start by simply comparing actual to predicted values. In this case, we're trying to predict whether a given transaction is a recurring payment (1) or not a recurring payment (0).

First we'll need to determine how we pass data into and receive data from our endpoint. Our data is currently held as NumPy arrays in the memory of our notebook instance. To send it in an HTTP POST request, we'll serialize it as a CSV string and then decode the CSV that comes back.

Note: For inference with CSV format, SageMaker XGBoost requires that the data does NOT include the target variable.

In [None]:
from sagemaker.predictor import csv_serializer    # Converts strings for HTTP POST requests on inference

xgb_predictor.content_type = 'text/csv'
xgb_predictor.serializer = csv_serializer

Now, we'll use a simple function to:

* Loop over our test dataset
* Split it into mini-batches of rows
* Convert those mini-batches to CSV string payloads (notice, we drop the target variable from our dataset first)
* Retrieve mini-batch predictions by invoking the XGBoost endpoint
* Collect predictions and convert from the CSV output our model provides into a NumPy array

In [None]:
import numpy as np
import pandas as pd

def predict(data, rows=500):
    split_array = np.array_split(data, int(data.shape[0] / float(rows) + 1))
    predictions = ''
    for array in split_array:
        predictions = ','.join([predictions, xgb_predictor.predict(array).decode('utf-8')])

    return np.fromstring(predictions[1:], sep=',')
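
The batching and parsing logic can also be exercised without a live endpoint. The sketch below is dependency-free and uses a stand-in `fake_endpoint` that returns a made-up score per row; all helper names and the data are hypothetical. Unlike `np.array_split`, which produces near-equal chunks, this version cuts fixed-size slices.

```python
def split_into_batches(rows, batch_size=500):
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

def to_csv_payload(batch):
    """Serialize feature rows (no target column) as newline-separated CSV."""
    return '\n'.join(','.join(str(v) for v in row) for row in batch)

def parse_csv_scores(response_text):
    """Parse a comma- or newline-separated response body into floats."""
    parts = response_text.replace('\n', ',').split(',')
    return [float(p) for p in parts if p.strip()]

def fake_endpoint(payload):
    """Stand-in for the real endpoint: one made-up score per input row."""
    n_rows = len(payload.split('\n'))
    return ','.join(['0.5'] * n_rows)

# Synthetic feature rows, for illustration only
rows = [[i, i * 2.0] for i in range(1200)]

scores = []
for batch in split_into_batches(rows, batch_size=500):
    scores.extend(parse_csv_scores(fake_endpoint(to_csv_payload(batch))))

print(len(scores), 'scores collected')
```

Keeping each payload to a few hundred rows keeps individual requests small, which is the reason `predict` splits the data before invoking the endpoint.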

In [None]:
%store -r test_data
predictions = predict(test_data.drop(['Recurring_Label_No', 'Recurring_Label_Yes'], axis=1).values)  # .values replaces as_matrix(), which was removed in pandas 1.0

A confusion matrix is one way to evaluate your model: it lets you visualize the model's accuracy by comparing actual and predicted values.  In this case, we are evaluating whether a transaction is a recurring payment (1) or not a recurring payment (0).  Let's create a confusion matrix to evaluate our model.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

df_cm1 = pd.crosstab(index=test_data['Recurring_Label_Yes'], columns=np.round(predictions), rownames=['actuals'], colnames=['predictions'])
# Rows are actuals (0, 1) and columns are predictions (0, 1), so [0,0] is the true-negative cell
tn = df_cm1.iloc[0,0]
fp = df_cm1.iloc[0,1]
fn = df_cm1.iloc[1,0]
tp = df_cm1.iloc[1,1]
results = [tp,fp,fn,tn]

# display results
sns.heatmap(df_cm1, annot=True, fmt='d', cmap="YlGnBu").set_title('Confusion Matrix')  

Reading a Confusion Matrix: 

| Actuals/Predicted |  Prediction-RecurringNo     | Prediction-RecurringYes |
| ----- | --------- | ------|
| Actual-RecurringNo |  TRUE NEGATIVE (TN)     | FALSE POSITIVE (FP) |
| Actual-RecurringYes | FALSE NEGATIVE (FN) | TRUE POSITIVE (TP)    |
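
To make the cell layout concrete, here is a small dependency-free sketch that counts the four cells from paired labels. The labels are made up for illustration and the helper name is hypothetical.

```python
from collections import Counter

def confusion_counts(actuals, predicted):
    """Return (tn, fp, fn, tp) for binary labels where 1 means recurring."""
    cells = Counter(zip(actuals, predicted))
    return (
        cells[(0, 0)],  # actual No,  predicted No  -> true negative
        cells[(0, 1)],  # actual No,  predicted Yes -> false positive
        cells[(1, 0)],  # actual Yes, predicted No  -> false negative
        cells[(1, 1)],  # actual Yes, predicted Yes -> true positive
    )

# Synthetic labels, for illustration only
demo_actuals   = [0, 0, 0, 0, 1, 1, 1, 0]
demo_predicted = [0, 0, 1, 0, 1, 0, 1, 0]

demo_counts = confusion_counts(demo_actuals, demo_predicted)
print(demo_counts)  # (4, 1, 1, 2)
```

The tuple order matches reading the table above left to right, top to bottom.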


Let's use the values above to calculate some additional metrics for evaluation of the model.  There are many packages/libraries you can use to automatically calculate metrics; however, we are going to create a function to manually calculate metrics.  

In [None]:
def get_metrics(results):
    # Manually calculate metrics from the confusion matrix counts
    tp, fp, fn, tn = results

    precision = tp / (tp + fp)
    print('Precision is: ', precision)
    recall = tp / (tp + fn)
    print('Recall is: ', recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    print('Accuracy is: ', accuracy)
    f1 = 2 * (precision * recall) / (precision + recall)
    print('F1 Score is: ', f1)
    fpr = fp / (fp + tn)
    print('False Positive rate is:', fpr)
    tpr = tp / (tp + fn)   # identical to recall
    print('True Positive rate is:', tpr)

In [None]:
get_metrics(results)
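
As a sanity check on the formulas, the sketch below computes the same metrics from made-up counts and returns them instead of printing. The function name and the counts are hypothetical; the guard returns NaN when a denominator is zero, which can happen if a cutoff leaves no predicted positives.

```python
def metrics_from_counts(tp, fp, fn, tn):
    """Compute evaluation metrics from confusion matrix counts."""
    def ratio(num, den):
        # Avoid dividing by zero when a class is never predicted or present
        return num / den if den else float('nan')

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        'precision': precision,
        'recall': recall,
        'accuracy': ratio(tp + tn, tp + fp + fn + tn),
        'f1': ratio(2 * precision * recall, precision + recall),
        'fpr': ratio(fp, fp + tn),
    }

# Made-up counts: 100 predicted positives, 80 of them correct
demo_metrics = metrics_from_counts(tp=80, fp=20, fn=10, tn=890)
for name, value in demo_metrics.items():
    print(name, round(value, 4))
```

With these counts, precision is 80/100 and accuracy is 970/1000, so the model looks strong on accuracy even though one in five flagged transactions is wrong. That gap is why precision matters for this use case.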

An important point here is that, because of the *np.round()* function above, we are using a simple threshold (or cutoff) of 0.5. Our predictions from XGBoost with binary:logistic as the objective come out as continuous values between 0 and 1, and we force them into the binary classes that we began with. However, because identifying a transaction as recurring when it is not may result in a negative customer experience (sending the customer unnecessary communication), we may want to adjust the threshold.

Let's first look at the continuous values of our predictions.

In [None]:
plt.hist(predictions)
plt.show()

The continuous-valued predictions coming from our model tend to skew toward 0 (Not Recurring), which is expected given our use case.  However, some predictions fall in the 0.7-0.8 range, which may be too low a confidence if we want to avoid sending customers unnecessary communication about recurring transactions.  Let's raise the cutoff from 0.5 to 0.9 to lower our false positives.

In [None]:
#pd.crosstab(index=test_data.iloc[:, 0], columns=np.where(predictions > 0.3, 1, 0))

df_cm2 = pd.crosstab(index=test_data['Recurring_Label_Yes'], columns=np.where(predictions > 0.9,1,0), rownames=['actuals'], colnames=['predictions'])
# Rows are actuals (0, 1) and columns are predictions (0, 1), so [0,0] is the true-negative cell
tn = df_cm2.iloc[0,0]
fp = df_cm2.iloc[0,1]
fn = df_cm2.iloc[1,0]
tp = df_cm2.iloc[1,1]
results = [tp,fp,fn,tn]

# display results
sns.heatmap(df_cm2, annot=True, fmt='d', cmap="YlGnBu").set_title('Confusion Matrix')  

Our false positives have been reduced, so let's take a second look at our metrics with the new boundary set to 0.9.

In [None]:
get_metrics(results)
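
Raising the cutoff trades false positives for false negatives. The sketch below makes that trade-off concrete on a handful of made-up scores; the helper name and the data are hypothetical and not drawn from the lab's dataset.

```python
def errors_at_cutoff(actuals, scores, cutoff):
    """Return (false_positives, false_negatives) when predicting 1 for score > cutoff."""
    fp = sum(1 for a, s in zip(actuals, scores) if a == 0 and s > cutoff)
    fn = sum(1 for a, s in zip(actuals, scores) if a == 1 and s <= cutoff)
    return fp, fn

# Synthetic labels and model scores, for illustration only
demo_labels = [0, 0, 0, 0, 1, 1, 1, 1]
demo_scores = [0.05, 0.20, 0.75, 0.60, 0.95, 0.85, 0.92, 0.40]

for cutoff in (0.5, 0.9):
    fp, fn = errors_at_cutoff(demo_labels, demo_scores, cutoff)
    print('cutoff', cutoff, '-> false positives:', fp, 'false negatives:', fn)
```

On this toy data the false positives drop from 2 to 0 while the false negatives rise from 1 to 2, so the right cutoff depends on which kind of error is more costly for the use case.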

### View Endpoint Configuration & Endpoint in SageMaker Console 

You will see the endpoint created in the output above; you can also view your endpoint configuration and your endpoint from the SageMaker console.  The links below point at the us-east-1 region, so adjust the region if you are working elsewhere.

* Click [HERE](https://console.aws.amazon.com/sagemaker/home?region=us-east-1#/endpointConfig/) to view your endpoint configuration from the console. 
* Click [HERE](https://console.aws.amazon.com/sagemaker/home?region=us-east-1#/endpoints) to view your endpoint from the console. You can also scroll down to monitor and evaluate system metrics (CPU/Memory/Disk) to help right-size hosting instances for cost and performance.

# Step 3: Model Hosting & Evaluation - Batch Transform

In the previous step we created a persistent endpoint that we could use to make real-time predictions.  Real-time endpoints stay up and running until you shut them down.  You can view the endpoint created above in the SageMaker service console under [**Inference --> Endpoints**](https://console.aws.amazon.com/sagemaker/home?region=us-east-1#/endpoints).  There are many use cases, such as forecasting, where you do not need a persistent endpoint because you submit a batch of records and obtain batch results on an ad-hoc or scheduled basis. In these cases, [Batch Transform](https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html) is a cost-effective option.

### Batch Transform Setup
In this step, we'll explore hosting our model using batch transform. With batch transform, we send in a batch of records for prediction and receive batch results.  For demonstration, we will reuse the data formatting performed above for test_data.

In [None]:
# Batch transform expects prediction input to be in S3. 
# So we need to load the same data used in our inference above to S3 as a single batch prediction *.csv.
bucket = sagemaker.Session().default_bucket()

df = pd.DataFrame(test_data)

# Drop labels (batch inference input must not include the target columns)
df.drop(labels=['Recurring_Label_No', 'Recurring_Label_Yes'], axis=1, inplace=True)
df.to_csv('./test.csv', header=False, index=False)

# Upload to S3 (upload_file returns None, so there is nothing to assign)
data_prefix = 'workshop/data'
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(data_prefix, 'test/test.csv')).upload_file('./test.csv')

# S3 bucket for saving batch prediction results
batch_output_prefix = 'workshop/batch-predictions-out'
from sagemaker.content_types import CONTENT_TYPE_CSV, CONTENT_TYPE_JSON

In [None]:
!cat test.csv

In [None]:
# Build a unique transform job name from the current timestamp
from datetime import datetime
timestampStr = datetime.now().strftime("%d%m%Y-%H%M%S%f")
job_name = 'serial-inference-batch-' + timestampStr

# Assumption: the model registered by deploy() in Step 2 carries the training job's name
model_name = training_job_name

output_data_path = 's3://{}/{}'.format(bucket, batch_output_prefix)
batch_prediction_data  = 's3://{}/{}/test/test.csv'.format(bucket, data_prefix)

transformer = sagemaker.transformer.Transformer(
    model_name = model_name,
    instance_count = 1,
    instance_type = 'ml.m4.xlarge',
    strategy = 'SingleRecord',
    assemble_with = 'Line',
    output_path = output_data_path,
    base_transform_job_name='serial-inference-batch',
    sagemaker_session=sess,
    accept = CONTENT_TYPE_CSV
)
transformer.transform(data = batch_prediction_data,
                      job_name = job_name,
                      content_type = CONTENT_TYPE_CSV,
                      join_source='Input',
                      split_type = 'Line')
transformer.wait()

### Check Output Results

After the transform job above is done, download the output from the S3 location we specified in *output_path*. Because we specified *join_source='Input'* above, our output will include both the data sent for inference and the prediction result.  Without that parameter, only the prediction result is returned.

In [None]:
# Download the output data from S3 to local filesystem
batch_output = transformer.output_path
!mkdir -p batch_data/output
!aws s3 cp --recursive $batch_output/ batch_data/output/

# Head to see what the batch output looks like
!head batch_data/output/*
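
To work with the joined output programmatically, each line can be split into its input features and its prediction. The sketch below assumes the prediction is appended as the last CSV column of each line, which is how joined CSV output is expected to be laid out here; verify this against your own output file. The sample text is made up and the helper name is hypothetical.

```python
import csv
import io

def split_joined_output(text):
    """Split each CSV line into (features, score), taking the last column as the score."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        *features, score = row
        records.append(([float(v) for v in features], float(score)))
    return records

# Synthetic stand-in for the contents of a batch transform output file
sample_output = "12.5,1,0,0.0312\n99.0,0,1,0.9721\n"

for features, score in split_joined_output(sample_output):
    print(features, '->', score)
```

To parse the real results, read one of the downloaded files under `batch_data/output/` and pass its text to the same function.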

# Congratulations - You've completed Lab 3

In this lab we explored hosting our model using both a real-time endpoint and batch transform.  We also evaluated how well the model addresses the ML problem we are trying to solve.