# Amazon SageMaker Customer Churn Model Training
_**Using Gradient Boosted Trees to Predict Mobile Customer Departure**_

---


## Contents

1. [Background](#Background) - Predict customer churn with XGBoost
1. [Data](#Data) - Prep the dataset and upload it to Amazon S3
1. [Train](#Train) - Train with the Amazon SageMaker XGBoost algorithm
  - Trial 1 - XGBoost in algorithm mode
  - Trial 2 - XGBoost in framework mode
1. [Host](#Host-the-model) - Run a batch transform job and deploy the model to a real-time endpoint


---

## Background

This notebook has been adapted from an [AWS blog post](https://aws.amazon.com/blogs/ai/predicting-customer-churn-with-amazon-machine-learning/). 

Losing customers is costly for any business.  Identifying unhappy customers early on gives you a chance to offer them incentives to stay.  This notebook describes using machine learning (ML) for automated identification of unhappy customers, also known as customer churn prediction. It uses Amazon SageMaker features for managing experiments, training and debugging the model, and monitoring it after it has been deployed. 

Import the Python libraries we'll need for this walkthrough.

In [13]:
import time
start_time = time.time()

In [14]:
import sys
!{sys.executable} -m pip install -qU awscli boto3 "sagemaker>=1.71.0,<2.0.0"
!{sys.executable} -m pip install sagemaker-experiments



In [15]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import io
import os
import sys
import time
import json
from IPython.display import display
from time import strftime, gmtime
import boto3
import re


import sagemaker
from sagemaker import get_execution_role
from sagemaker.predictor import csv_serializer
from sagemaker.debugger import rule_configs, Rule, DebuggerHookConfig
from sagemaker.model_monitor import DataCaptureConfig, DatasetFormat, DefaultModelMonitor
from sagemaker.s3 import S3Uploader, S3Downloader

from smexperiments.experiment import Experiment
from smexperiments.trial import Trial
from smexperiments.trial_component import TrialComponent
from smexperiments.tracker import Tracker

In [16]:
sess = boto3.Session()
sm = sess.client('sagemaker')
role = sagemaker.get_execution_role()

---
## Data

Mobile operators have historical records that tell them which customers ended up churning and which continued using the service. We can use this historical information to train an ML model that can predict customer churn. After training the model, we can pass the profile information of an arbitrary customer (the same profile information that we used to train the model) to the model to have the model predict whether this customer will churn. 

The dataset we use is publicly available and was mentioned in [Discovering Knowledge in Data](https://www.amazon.com/dp/0470908742/) by Daniel T. Larose. It is attributed by the author to the University of California Irvine Repository of Machine Learning Datasets. The `data` folder that came with this notebook contains the dataset, which we've already preprocessed for this walkthrough. The dataset has been split into training and validation sets. To see how the dataset was preprocessed, see this notebook: [XGBoost customer churn notebook that starts with the original dataset](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_applying_machine_learning/xgboost_customer_churn/xgboost_customer_churn.ipynb). 

We'll train on a .csv file without the header. But for now, the following cell uses `pandas` to load some of the data from a version of the training data that has a header. 

Explore the data to see the dataset's features and the data that will be used to train the model.

In [18]:
# Download the data file that goes with this notebook
!wget https://raw.githubusercontent.com/tom5610/sagemaker-immersion-day/master/notebooks/customer_churn/data/training-dataset-with-header.csv

%cd /home/ec2-user/SageMaker
local_data_path = './training-dataset-with-header.csv'
data = pd.read_csv(local_data_path)
pd.set_option('display.max_columns', 500)     # Make sure we can see all of the columns
pd.set_option('display.max_rows', 10)         # Keep the output on one page
data

--2020-08-27 09:12:01--  https://raw.githubusercontent.com/tom5610/sagemaker-immersion-day/master/notebooks/customer_churn/data/training-dataset-with-header.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.28.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.28.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 375873 (367K) [text/plain]
Saving to: ‘training-dataset-with-header.csv.4’


2020-08-27 09:12:01 (30.7 MB/s) - ‘training-dataset-with-header.csv.4’ saved [375873/375873]

/home/ec2-user/SageMaker


Unnamed: 0,Churn,Account Length,VMail Message,Day Mins,Day Calls,Eve Mins,Eve Calls,Night Mins,Night Calls,Intl Mins,Intl Calls,CustServ Calls,State_AK,State_AL,State_AR,State_AZ,State_CA,State_CO,State_CT,State_DC,State_DE,State_FL,State_GA,State_HI,State_IA,State_ID,State_IL,State_IN,State_KS,State_KY,State_LA,State_MA,State_MD,State_ME,State_MI,State_MN,State_MO,State_MS,State_MT,State_NC,State_ND,State_NE,State_NH,State_NJ,State_NM,State_NV,State_NY,State_OH,State_OK,State_OR,State_PA,State_RI,State_SC,State_SD,State_TN,State_TX,State_UT,State_VA,State_VT,State_WA,State_WI,State_WV,State_WY,Area Code_408,Area Code_415,Area Code_510,Int'l Plan_no,Int'l Plan_yes,VMail Plan_no,VMail Plan_yes
0,0,106,0,274.4,120,198.6,82,160.8,62,6.0,3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0
1,0,28,0,187.8,94,248.6,86,208.8,124,10.6,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1,0,1,0
2,1,148,0,279.3,104,201.6,87,280.8,99,7.9,2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1,0
3,0,132,0,191.9,107,206.9,127,272.0,88,12.6,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0
4,0,92,29,155.4,110,188.5,104,254.9,118,8.0,4,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2328,0,106,0,194.8,133,213.4,73,190.8,92,11.5,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,1,0
2329,1,125,0,143.2,80,88.1,94,233.2,135,8.8,7,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0
2330,0,129,0,143.7,114,297.8,98,212.6,86,11.4,8,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1,0,1,0
2331,0,159,0,198.8,107,195.5,91,213.3,120,16.5,7,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0
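
The file used for training differs from this preview: the built-in SageMaker XGBoost algorithm expects CSV input with no header row and with the label in the first column. The following is a minimal sketch, using only the standard library, of how a file with a header could be reshaped into that layout. The helper name `to_training_csv` and the three-column sample are hypothetical; the `train.csv` used in this walkthrough has already been preprocessed, so you don't need to run this step.

```python
import csv
import io

def to_training_csv(text, label="Churn", drop=("Unnamed: 0",)):
    """Return header-less CSV text with the label column moved to the front."""
    reader = csv.DictReader(io.StringIO(text))
    # Keep every column except the label and the columns we want to discard
    features = [c for c in reader.fieldnames if c != label and c not in drop]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([row[label]] + [row[c] for c in features])
    return out.getvalue()

# Hypothetical sample in the same spirit as the preview above
sample = "Unnamed: 0,Account Length,Churn\n0,106,0\n1,28,0\n2,148,1\n"
training_text = to_training_csv(sample)
print(training_text)
```

The index column (`Unnamed: 0`) is dropped because it carries no information about the customer.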


Next, we'll download the training and validation files, create an S3 bucket for the data if one does not already exist, and upload the files to S3 for training.

In [19]:
# download training and validation data
!wget https://raw.githubusercontent.com/JerryChenZeyun/sagemaker-immersion-day/master/notebooks/customer_churn/data/train.csv
!wget https://raw.githubusercontent.com/JerryChenZeyun/sagemaker-immersion-day/master/notebooks/customer_churn/data/validation.csv
    

--2020-08-27 09:12:06--  https://raw.githubusercontent.com/JerryChenZeyun/sagemaker-immersion-day/master/notebooks/customer_churn/data/train.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.28.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.28.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 375182 (366K) [text/plain]
Saving to: ‘train.csv.3’


2020-08-27 09:12:07 (27.5 MB/s) - ‘train.csv.3’ saved [375182/375182]

--2020-08-27 09:12:07--  https://raw.githubusercontent.com/JerryChenZeyun/sagemaker-immersion-day/master/notebooks/customer_churn/data/validation.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.28.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.28.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 107098 (105K) [text/plain]
Saving to: ‘validation.csv.3’


2020-08-27 09:12:08 (26.2 MB/s) - ‘validati

In [20]:
account_id = sess.client('sts', region_name=sess.region_name).get_caller_identity()["Account"]
bucket = 'sagemaker-notebook-{}-{}'.format(sess.region_name, account_id)
prefix = 'xgboost-churn'

from botocore.exceptions import ClientError

try:
    if sess.region_name == "us-east-1":
        sess.client('s3').create_bucket(Bucket=bucket)
    else:
        sess.client('s3').create_bucket(Bucket=bucket, 
                                        CreateBucketConfiguration={'LocationConstraint': sess.region_name})
except ClientError as e:
    # Only an existing bucket that we own is safe to ignore; re-raise anything else
    if e.response['Error']['Code'] != 'BucketAlreadyOwnedByYou':
        raise
    print("Looks like you already have a bucket of this name. That's good. Uploading the data files...")

# Return the URLs of the uploaded files, so they can be reviewed or used elsewhere
s3url = S3Uploader.upload('./train.csv', 's3://{}/{}/{}'.format(bucket, prefix,'train'))
print(s3url)
s3url = S3Uploader.upload('./validation.csv', 's3://{}/{}/{}'.format(bucket, prefix,'validation'))
print(s3url)

Looks like you already have a bucket of this name. That's good. Uploading the data files...
s3://sagemaker-studio-ap-southeast-2-910003606845/xgboost-churn/train/train.csv
s3://sagemaker-studio-ap-southeast-2-910003606845/xgboost-churn/validation/validation.csv
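
The region check in the bucket-creation cell exists because S3 does not accept `us-east-1` as a `LocationConstraint`; buckets in that region are created without a `CreateBucketConfiguration`. As a small sketch of that rule, the helper below (the name `create_bucket_kwargs` is hypothetical) only builds the keyword arguments and makes no AWS calls:

```python
def create_bucket_kwargs(bucket, region):
    """Build keyword arguments for an S3 create_bucket call in the given region."""
    kwargs = {"Bucket": bucket}
    if region != "us-east-1":
        # Every region other than us-east-1 needs an explicit location constraint
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs

print(create_bucket_kwargs("example-bucket", "us-east-1"))
print(create_bucket_kwargs("example-bucket", "ap-southeast-2"))
```

With a boto3 S3 client, the result could be passed as `create_bucket(**create_bucket_kwargs(bucket, region))`.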


---
## Train

We'll use the XGBoost library to train a class of models known as gradient boosted decision trees on the data that we just uploaded. 

Because we're using XGBoost, we first need to specify the locations of the XGBoost algorithm containers.

In [21]:
from sagemaker.amazon.amazon_estimator import get_image_uri
docker_image_name = get_image_uri(boto3.Session().region_name, 'xgboost', repo_version='0.90-2')



Then, because we're training with the CSV file format, we'll create `s3_input`s that our training function can use as a pointer to the files in S3.

In [22]:
s3_input_train = sagemaker.s3_input(s3_data='s3://{}/{}/train'.format(bucket, prefix), content_type='csv')
s3_input_validation = sagemaker.s3_input(s3_data='s3://{}/{}/validation/'.format(bucket, prefix), content_type='csv')



#### Hyperparameters
Now we can specify our XGBoost hyperparameters.  Among them are these key hyperparameters:
- `max_depth` Controls how deep each tree within the algorithm can be built.  Deeper trees can lead to better fit, but are more computationally expensive and can lead to overfitting.  Typically, you need to explore some trade-offs in model performance between a large number of shallow trees and a smaller number of deeper trees.
- `subsample` Controls sampling of the training data.  This hyperparameter can help reduce overfitting, but setting it too low can also starve the model of data.
- `num_round` Controls the number of boosting rounds.  Each round trains an additional tree on the residuals of the previous rounds.  Again, more rounds should produce a better fit on the training data, but can be computationally expensive or lead to overfitting.
- `eta` Controls how aggressive each round of boosting is.  It is the step size (learning rate) applied to each new tree.  Smaller values lead to more conservative boosting.
- `gamma` Controls how aggressively trees are grown.  Larger values lead to more conservative models.
- `min_child_weight` Also controls how aggressively trees are grown.  Larger values lead to a more conservative model.

For more information about these hyperparameters, see [XGBoost's hyperparameters GitHub page](https://github.com/dmlc/xgboost/blob/master/doc/parameter.rst).
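
To make the interplay between `eta` and `num_round` concrete, here is a deliberately simplified sketch. It is not XGBoost: it is plain squared-error boosting with one-split "stumps" on a made-up one-dimensional dataset, with no regularization. The function names and the data are assumptions for illustration only. It shows that a small step size fits the training data more slowly, so it needs more rounds to reach the same error.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals (least squares)."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        err = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean

def boost(xs, ys, num_round, eta):
    """Run num_round boosting rounds and return the training mean squared error."""
    preds = [0.0] * len(xs)
    for _ in range(num_round):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        # eta scales how much of each new stump is added to the ensemble
        preds = [p + eta * stump(x) for p, x in zip(preds, xs)]
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

# Made-up data for illustration
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 5.2, 5.0]

mse_small_eta = boost(xs, ys, num_round=5, eta=0.1)
mse_large_eta = boost(xs, ys, num_round=5, eta=0.5)
mse_small_eta_more_rounds = boost(xs, ys, num_round=50, eta=0.1)
print(mse_small_eta, mse_large_eta, mse_small_eta_more_rounds)
```

Training error alone says nothing about overfitting, which is why the training jobs in this notebook also report a validation error.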

In [23]:
hyperparams = {"max_depth":5,
               "subsample":0.8,
               "num_round":600,
               "eta":0.2,
               "gamma":4,
               "min_child_weight":6,
               "silent":0,
               "objective":'binary:logistic'}

#### Trial 1 - XGBoost in algorithm mode

For our first trial, we'll use the built-in XGBoost algorithm to train a model without supplying any additional code. This way, we can use XGBoost to train and deploy a model as we would with other Amazon SageMaker built-in algorithms.

To train the model, we'll create an estimator and specify a few parameters, like the type of training instances we'd like to use and how many, and where the artifacts of the trained model should be stored. 

Calling the `fit` method of the `estimator` then starts the training job, using the training and validation inputs that we defined above.

In [24]:
sess = sagemaker.session.Session()

xgb = sagemaker.estimator.Estimator(image_name=docker_image_name,
                                    role=role,
                                    hyperparameters=hyperparams,
                                    train_instance_count=1, 
                                    train_instance_type='ml.m5.large',
                                    output_path='s3://{}/{}/output'.format(bucket, prefix),
                                    base_job_name="demo-xgboost-customer-churn",
                                    sagemaker_session=sess)

xgb.fit({'train': s3_input_train, 'validation': s3_input_validation})


INFO:sagemaker:Creating training-job with name: demo-xgboost-customer-churn-2020-08-27-09-12-19-755


2020-08-27 09:12:19 Starting - Starting the training job...
2020-08-27 09:12:22 Starting - Launching requested ML instances......
2020-08-27 09:13:46 Starting - Preparing the instances for training.........
2020-08-27 09:15:09 Downloading - Downloading input data...
2020-08-27 09:15:50 Training - Training image download completed. Training in progress.
2020-08-27 09:15:50 Uploading - Uploading generated training model
INFO:sagemaker-containers:Imported framework sagemaker_xgboost_container.training
INFO:sagemaker-containers:Failed to parse hyperparameter objective value binary:logistic to Json.
Returning the value itself
INFO:sagemaker-containers:No GPUs detected (normal if no gpus installed)
INFO:sagemaker_xgboost_container.training:Running XGBoost Sagemaker in algorithm mode
INFO:root:Determined delimiter of CSV input is ','
INFO:root:Determined delimiter of CSV input is ','
INFO:root:Determined delimiter of CSV input


2020-08-27 09:15:57 Completed - Training job completed
Training seconds: 48
Billable seconds: 48


#### Trial 2 - XGBoost in framework mode

For the next trial, we'll train a similar model, but use XGBoost in framework mode. If you've worked with the open source XGBoost, using XGBoost this way will be familiar to you. Using XGBoost as a framework provides more flexibility than using it as a built-in algorithm because it enables more advanced scenarios that allow incorporating preprocessing and post-processing scripts into your training script. Specifically, we'll be able to specify a list of rules that we want Amazon SageMaker Debugger to evaluate our training against.

#### Fit estimator

To use XGBoost as a framework, you need to specify an entry-point script that can incorporate additional processing into your training jobs.
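
As a rough idea of what such an entry-point script deals with, the sketch below shows only the argument-handling part. It is not the `xgboost_customer_churn.py` script used in this notebook; the function name `parse_args` and the default values are assumptions. It relies on two conventions of SageMaker script mode: hyperparameters are passed to the script as command-line arguments, and the container exposes data and model locations through environment variables such as `SM_CHANNEL_TRAIN`, `SM_CHANNEL_VALIDATION`, and `SM_MODEL_DIR`.

```python
import argparse
import os

def parse_args(argv=None):
    """Parse hyperparameters and data locations the way a script-mode entry point typically does."""
    parser = argparse.ArgumentParser()
    # Hyperparameters arrive as command-line arguments
    parser.add_argument("--max_depth", type=int, default=5)
    parser.add_argument("--eta", type=float, default=0.2)
    parser.add_argument("--num_round", type=int, default=100)
    # Data and model locations come from environment variables set by the container;
    # the local fallbacks are only for running this sketch outside SageMaker
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "./train"))
    parser.add_argument("--validation", default=os.environ.get("SM_CHANNEL_VALIDATION", "./validation"))
    parser.add_argument("--model_dir", default=os.environ.get("SM_MODEL_DIR", "./model"))
    # Ignore any hyperparameters this sketch does not declare
    args, _unknown = parser.parse_known_args(argv)
    return args

args = parse_args(["--max_depth", "5", "--eta", "0.2", "--num_round", "600", "--gamma", "4"])
print(args.max_depth, args.eta, args.num_round, args.train)
```

A real script would go on to load the CSV files from the channel directories, train a booster, and save it under the model directory.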

We've made a couple of simple changes to enable the Amazon SageMaker Debugger `smdebug` library. We created a `SessionHook`, which we pass as a callback function when creating a `Booster`. We passed a `SaveConfig` object that tells the hook to save the evaluation metrics, feature importances, and SHAP values at regular intervals. (Debugger is highly configurable. You can choose exactly what to save.) We describe the changes in more detail after we train this example. For even more detail, see the [Developer Guide for XGBoost](https://github.com/awslabs/sagemaker-debugger/tree/master/docs/xgboost).

In [25]:
!wget https://raw.githubusercontent.com/JerryChenZeyun/sagemaker-immersion-day/master/notebooks/customer_churn/xgboost_customer_churn.py
!pygmentize xgboost_customer_churn.py

--2020-08-27 09:16:32--  https://raw.githubusercontent.com/JerryChenZeyun/sagemaker-immersion-day/master/notebooks/customer_churn/xgboost_customer_churn.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.28.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.28.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4321 (4.2K) [text/plain]
Saving to: ‘xgboost_customer_churn.py.1’


2020-08-27 09:16:32 (41.9 MB/s) - ‘xgboost_customer_churn.py.1’ saved [4321/4321]

import argparse
import json
import os
import pickle
import random
import tempfile
import urllib.request

import xgboost

Let's create our framework estimator and call `fit` to start the training job. Because we are running in framework mode, we also need to pass additional parameters, like the entry point script and the framework version, to the estimator. 

As training progresses, you'll be able to see Amazon SageMaker Debugger logs that evaluate the rule against the training job.
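
The job log also reports a `train-error` and a `validation-error` for every boosting round. If you save that log text, a small parser makes it easier to see where the validation error stops improving. The following is a minimal sketch; the names `ROUND_PATTERN` and `parse_round_errors` are hypothetical, and the sample lines follow the format printed by the training job in this notebook, where fields are separated by a tab that may appear as `#011`.

```python
import re

# Matches lines such as "[563]#011train-error:0.021003#011validation-error:0.061562"
ROUND_PATTERN = re.compile(
    r"\[(\d+)\](?:\s|#011)*train-error:([\d.]+)(?:\s|#011)*validation-error:([\d.]+)"
)

def parse_round_errors(log_text):
    """Return (round, train_error, validation_error) tuples found in the log text."""
    return [(int(r), float(t), float(v)) for r, t, v in ROUND_PATTERN.findall(log_text)]

sample_log = "\n".join([
    "[568]#011train-error:0.021003#011validation-error:0.061562",
    "[569]\ttrain-error:0.020574\tvalidation-error:0.061562",
    "[576]#011train-error:0.021003#011validation-error:0.061562",
])

rounds = parse_round_errors(sample_log)
# Earliest round with the lowest validation error
best = min(rounds, key=lambda item: item[2])
print(best)
```

A validation error that stays flat while the training error keeps moving suggests that the later rounds add little, which is one reason to revisit `num_round`.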

In [27]:
entry_point_script = "xgboost_customer_churn.py"

framework_xgb = sagemaker.xgboost.XGBoost(image_name=docker_image_name,
                                          entry_point=entry_point_script,
                                          role=role,
                                          framework_version="0.90-2",
                                          py_version="py3",
                                          hyperparameters=hyperparams,
                                          train_instance_count=1, 
                                          train_instance_type='ml.m5.xlarge',
                                          output_path='s3://{}/{}/output'.format(bucket, prefix),
                                          base_job_name="demo-xgboost-customer-churn",
                                          sagemaker_session=sess,
                                          )

framework_xgb.fit({'train': s3_input_train, 'validation': s3_input_validation})

INFO:sagemaker:Creating training-job with name: demo-xgboost-customer-churn-2020-08-27-09-22-46-516


2020-08-27 09:22:46 Starting - Starting the training job...
2020-08-27 09:22:48 Starting - Launching requested ML instances......
2020-08-27 09:23:54 Starting - Preparing the instances for training...
2020-08-27 09:24:46 Downloading - Downloading input data
2020-08-27 09:24:46 Training - Downloading the training image...
2020-08-27 09:25:12 Training - Training image download completed. Training in progress..
INFO:sagemaker-containers:Imported framework sagemaker_xgboost_container.training
INFO:sagemaker-containers:No GPUs detected (normal if no gpus installed)
INFO:sagemaker_xgboost_container.training:Invoking user training script.
INFO:sagemaker-containers:Module xgboost_customer_churn does not provide a setup.py. 
Generating setup.py
INFO:sagemaker-containers:Generating setup.cfg
INFO:sagemaker-containers:Generating MANIFEST.in
INFO:sagemaker-containers:Installing module with the following command:
/miniconda

[563]    train-error:0.021003    validation-error:0.061562
[564]    train-error:0.021003    validation-error:0.061562
[565]    train-error:0.021003    validation-error:0.061562
[566]    train-error:0.021003    validation-error:0.061562
[567]    train-error:0.021003    validation-error:0.061562
[568]    train-error:0.021003    validation-error:0.061562
[569]    train-error:0.020574    validation-error:0.061562
[570]    train-error:0.020574    validation-error:0.061562
[571]    train-error:0.020574    validation-error:0.061562
[572]    train-error:0.020574    validation-error:0.061562
[573]    train-error:0.020574    validation-error:0.061562
[574]    train-error:0.020574    validation-error:0.061562
[575]    train-error:0.020574    validation-error:0.061562
[576]    train-error:0.021003    validation-error:0.061562
[577]    train-error:0.020574    validation

---
## Host the model

Now that we've trained the model, let's deploy it to a hosted endpoint. To monitor the model after it's hosted and serving requests, we'll also add configurations to capture data that is being sent to the endpoint.

### Batch transform

Use batch transform when you don't need millisecond-level latency and want to run inference on a whole batch of data at once. We first upload the test data to the S3 bucket, then launch a batch transform job to get the inference results.

In [28]:
# download test dataset from github and store it in S3 bucket
!wget https://raw.githubusercontent.com/JerryChenZeyun/sagemaker-immersion-day/master/notebooks/customer_churn/data/test_sample_20200827.csv

s3url = S3Uploader.upload('./test_sample_20200827.csv', 's3://{}/{}/{}'.format(bucket, prefix,'test'))
print(s3url)



--2020-08-27 09:25:58--  https://raw.githubusercontent.com/JerryChenZeyun/sagemaker-immersion-day/master/notebooks/customer_churn/data/test_sample_20200827.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.28.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.28.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19223 (19K) [text/plain]
Saving to: ‘test_sample_20200827.csv.1’


2020-08-27 09:25:59 (20.9 MB/s) - ‘test_sample_20200827.csv.1’ saved [19223/19223]

s3://sagemaker-studio-ap-southeast-2-910003606845/xgboost-churn/test/test_sample_20200827.csv


In [29]:
# The location of the test dataset
batch_input = 's3://{}/{}/{}'.format(bucket, prefix, 'test')

# The location to store the results of the batch transform job
batch_output = 's3://{}/{}/{}'.format(bucket, prefix, 'batch-inference')

batch_output
 

's3://sagemaker-studio-ap-southeast-2-910003606845/xgboost-churn/batch-inference'

In [30]:
transformer = xgb.transformer(instance_count=1, instance_type='ml.m4.4xlarge', output_path=batch_output)

transformer.transform(data=batch_input, data_type='S3Prefix', content_type='text/csv', split_type='Line')

transformer.wait()


INFO:sagemaker:Creating model with name: demo-xgboost-customer-churn-2020-08-27-09-12-19-755
INFO:sagemaker:Creating transform job with name: demo-xgboost-customer-churn-2020-08-27-09-26-00-821


...........................
[2020-08-27:09:30:20:INFO] No GPUs detected (normal if no gpus installed)
[2020-08-27:09:30:20:INFO] No GPUs detected (normal if no gpus installed)
[2020-08-27:09:30:20:INFO] nginx config: 
worker_processes auto;
daemon off;
pid /tmp/nginx.pid;
error_log  /dev/stderr;

worker_rlimit_nofile 4096;

events {
  worker_connections 2048;
}

http {
  include /etc/nginx/mime.types;
  default_type application/octet-stream;
  access_log /dev/stdout combined;

  upstream gunicorn {
    server unix:/tmp/gunicorn.sock;
  }

  server {
    listen 8080 deferred;
    client_max_body_size 0;

    keepalive_timeout 3;

    location ~ ^/(ping|invocations|execution-parameters) {
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header Host $http_host;
      proxy_redirect off;
      proxy_read_timeout 60s;
      proxy_pass http://gunicorn;
    }

    l

Once the batch transform job has completed, check the `batch-inference` prefix in your S3 bucket for the inference results.
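
Batch transform writes one output object per input object, named after the input file with `.out` appended. The sketch below derives that expected location and parses the scores from the downloaded output text. The helper names and the `example-bucket` URIs are hypothetical, and the parser accepts scores separated by either newlines or commas, because the exact layout depends on the container and the job's assembly settings; open the file to confirm what your job produced.

```python
import posixpath
import re

def expected_output_uri(input_uri, output_prefix):
    """Return the S3 URI where batch transform is expected to write results for input_uri."""
    filename = posixpath.basename(input_uri)
    return "{}/{}.out".format(output_prefix.rstrip("/"), filename)

def parse_scores(output_text):
    """Parse churn scores separated by newlines or commas."""
    return [float(tok) for tok in re.split(r"[,\s]+", output_text.strip()) if tok]

uri = expected_output_uri(
    "s3://example-bucket/xgboost-churn/test/test_sample_20200827.csv",
    "s3://example-bucket/xgboost-churn/batch-inference",
)
print(uri)

# Made-up output text for illustration
scores = parse_scores("0.0147\n0.8758\n")
print(scores)
```

The scores are in the same order as the input rows, so they can be matched back to customers by position.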

### Endpoint hosting

In [31]:
data_capture_prefix = '{}/datacapture'.format(prefix)

endpoint_name = "demo-xgboost-customer-churn-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print("EndpointName = {}".format(endpoint_name))

EndpointName = demo-xgboost-customer-churn-2020-08-27-09-30-43


In [32]:
xgb_predictor = xgb.deploy(initial_instance_count=1, 
                           instance_type='ml.m4.xlarge',
                           endpoint_name=endpoint_name,
                           data_capture_config=DataCaptureConfig(enable_capture=True,
                                                                 sampling_percentage=100,
                                                                 destination_s3_uri='s3://{}/{}'.format(bucket, data_capture_prefix)
                                                                )
                           )

INFO:sagemaker:Creating model with name: demo-xgboost-customer-churn-2020-08-27-09-12-19-755
INFO:sagemaker:Creating endpoint with name demo-xgboost-customer-churn-2020-08-27-09-30-43


---------------!

### Invoke the deployed model

Now that we have a hosted endpoint running, we can make real-time predictions from our model by making an HTTP POST request.  But first, we need to set up the serializer and deserializer for passing rows of our test CSV file to the model behind the endpoint.

In [33]:
xgb_predictor.content_type = 'text/csv'
xgb_predictor.serializer = csv_serializer
xgb_predictor.deserializer = None

Now, we'll loop over our test dataset and collect predictions by invoking the XGBoost endpoint:

In [34]:
print("Sending test traffic to the endpoint {}. \nPlease wait for a minute...".format(endpoint_name))

with open('./test_sample_20200827.csv', 'r') as f:
    for row in f:
        payload = row.rstrip('\n')
        response = xgb_predictor.predict(data=payload)
        print(response)
        time.sleep(0.5)

Sending test traffic to the endpoint demo-xgboost-customer-churn-2020-08-27-09-30-43. 
Please wait for a minute...
b'0.014719205908477306'
b'0.005068291909992695'
b'0.008791499771177769'
b'0.16663919389247894'
b'0.004287515766918659'
b'0.028125004842877388'
b'0.8758227825164795'
b'0.03185996040701866'
b'0.11987227201461792'
b'0.010174904949963093'
b'0.009708206169307232'
b'0.03329237550497055'
b'0.005242553073912859'
b'0.024490639567375183'
b'0.014462091028690338'
b'0.004954594187438488'
b'0.025912262499332428'
b'0.023918986320495605'
b'0.3170226514339447'
b'0.035495929419994354'
b'0.0046775382943451405'
b'0.6329114437103271'
b'0.011705470271408558'
b'0.024390051141381264'
b'0.018761157989501953'
b'0.08423447608947754'
b'0.009342672303318977'
b'0.9935926795005798'
b'0.01245774608105421'
b'0.130371555685997'
b'0.009074637666344643'
b'0.03049078769981861'
b'0.014549369923770428'
b'0.010767386294901371'
b'0.006765507161617279'
b'0.7319309115409851'
b'0.03967547044157982'
b'0.0049095554277
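
Each response is a byte string holding the predicted probability that the customer churns. To turn these into yes/no predictions, decode the bytes and apply a cutoff. The sketch below uses four of the responses shown above; the 0.5 cutoff is an assumption, and in practice you would choose it based on the relative cost of missing a churner versus offering an unnecessary incentive.

```python
# A few responses copied from the endpoint output above
responses = [
    b'0.014719205908477306',
    b'0.8758227825164795',
    b'0.3170226514339447',
    b'0.6329114437103271',
]

CUTOFF = 0.5  # assumed threshold, tune for your own cost trade-off

scores = [float(r.decode('utf-8')) for r in responses]
labels = [int(score >= CUTOFF) for score in scores]
print(labels)
```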

In [36]:
print("--- %s mins ---" % str((time.time() - start_time) / 60))

--- 101.32662258148193 mins ---


## Clean up

If you no longer need this notebook, clean up your environment by running the following cell. It removes the hosted endpoint that you created for this walkthrough and prevents you from incurring charges for running an instance that you no longer need. 

You might also want to delete artifacts stored in the S3 bucket used in this notebook. To do so, open the Amazon S3 console, find the `sagemaker-notebook-<region-name>-<account-id>` bucket, and delete the files associated with this notebook.

In [37]:
sess.delete_endpoint(xgb_predictor.endpoint)

INFO:sagemaker:Deleting endpoint with name: demo-xgboost-customer-churn-2020-08-27-09-30-43
