<h1>SMS Spam Classifier</h1>
<br />
This notebook shows how to implement a basic spam classifier for SMS messages using the built-in Linear Learner algorithm of Amazon SageMaker.
The idea is to use the SMS Spam Collection dataset available at <a href="https://archive.ics.uci.edu/ml/datasets/sms+spam+collection">https://archive.ics.uci.edu/ml/datasets/sms+spam+collection</a> to train and deploy a binary classification model.

Amazon SageMaker's Linear Learner algorithm extends typical linear models by training many models in parallel, in a computationally efficient manner. Each model has a different set of hyperparameters, and the algorithm then selects the set that optimizes a specific criterion. This can provide substantially more accurate models than typical linear algorithms at the same, or lower, cost.
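To make the "many models, pick the best" idea concrete, here is a minimal sketch in plain NumPy. It is not the Linear Learner implementation: it trains the candidates one after another rather than in parallel, it varies only an L2 penalty, and the synthetic data, the function names and the candidate values are all assumptions made for illustration.

```python
import numpy as np

def train_logistic(X, y, l2, lr=0.5, epochs=200):
    """Fit a logistic regression with an L2 penalty by batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b

def cross_entropy(X, y, w, b):
    """Mean binary cross-entropy of the model (w, b) on the data (X, y)."""
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Synthetic, hypothetical data: 400 samples, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
true_w = np.array([2.0, -1.0, 0.0, 0.5, 0.0])
y = (X @ true_w + rng.normal(scale=0.5, size=400) > 0).astype(float)
X_train, y_train, X_val, y_val = X[:300], y[:300], X[300:], y[300:]

# One model per candidate hyperparameter value; keep the validation loss of each.
candidates = [0.0, 0.01, 0.1, 1.0]
results = {}
for l2 in candidates:
    w, b = train_logistic(X_train, y_train, l2)
    results[l2] = cross_entropy(X_val, y_val, w, b)

# The "winner" is the candidate with the lowest validation loss.
best_l2 = min(results, key=results.get)
print(best_l2, results[best_l2])
```

The selection criterion here is validation cross-entropy; other criteria, such as accuracy or F1, would plug into the same loop.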

Let's get started by setting some configuration variables and getting the Amazon SageMaker session and the current execution role, using the Amazon SageMaker high-level SDK for Python.

In [4]:
from sagemaker import get_execution_role

bucket_name = '<bucket-name>'

role = get_execution_role()
bucket_key_prefix = 'sms-spam-classifier'
vocabulary_length = 9013

print(role)

AmazonSageMaker-ExecutionRole-20180311T170786


We now download the spam collection dataset, unzip it and read the first 10 rows.

In [5]:
!mkdir -p dataset
!curl https://archive.ics.uci.edu/ml/machine-learning-databases/00228/smsspamcollection.zip -o dataset/smsspamcollection.zip
!unzip -o dataset/smsspamcollection.zip -d dataset
!head -10 dataset/SMSSpamCollection

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  198k  100  198k    0     0  45348      0  0:00:04  0:00:04 --:--:-- 45354
Archive:  dataset/smsspamcollection.zip
  inflating: dataset/SMSSpamCollection  
  inflating: dataset/readme          
ham	Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...
ham	Ok lar... Joking wif u oni...
spam	Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)T&C's apply 08452810075over18's
ham	U dun say so early hor... U c already then say...
ham	Nah I don't think he goes to usf, he lives around here though
spam	FreeMsg Hey there darling it's been 3 week's now and no word back! I'd like some fun you up for it still? Tb ok! XxX std chgs to send, £1.50 to rcv
ham	Even my brother is not like to speak with me. They treat

We now load the dataset into a Pandas dataframe and prepare the data.
More specifically we have to:
<ul>
    <li>replace the target column values (ham/spam) with numeric values (0/1)</li>
    <li>tokenize the SMS messages and encode them based on word counts</li>
    <li>split into training and validation sets</li>
    <li>upload to an S3 bucket for training</li>
</ul>
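The encoding step relies on helper functions from <code>sms_spam_classifier_utilities</code>, whose source is not shown in this notebook. The sketch below illustrates one common way to do this kind of encoding: hash each word to an index, then build a fixed-length vector per message. The names <code>hash_encode</code> and <code>to_multi_hot</code> are hypothetical, and the sketch marks word presence only; the real helpers may tokenize, hash or count differently.

```python
import re
import zlib
import numpy as np

def hash_encode(messages, vocabulary_length):
    """Map each message to a list of word indices in [1, vocabulary_length)."""
    encoded = []
    for message in messages:
        words = re.findall(r"[a-z0-9']+", message.lower())
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        encoded.append([zlib.crc32(word.encode("utf-8")) % (vocabulary_length - 1) + 1
                        for word in words])
    return encoded

def to_multi_hot(sequences, vocabulary_length):
    """Build one row per message with a 1.0 at every index that occurs in it."""
    matrix = np.zeros((len(sequences), vocabulary_length), dtype=np.float32)
    for row, indices in enumerate(sequences):
        for index in indices:
            matrix[row, index] = 1.0
    return matrix

sample = ["Free entry now", "ok"]
vectors = to_multi_hot(hash_encode(sample, 50), 50)
print(vectors.shape)
```

Because different words can hash to the same index, a larger vocabulary length reduces collisions at the cost of wider feature vectors.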

In [6]:
import pandas as pd
import numpy as np
import pickle
from sms_spam_classifier_utilities import one_hot_encode
from sms_spam_classifier_utilities import vectorize_sequences

df = pd.read_csv('dataset/SMSSpamCollection', sep='\t', header=None)
df[df.columns[0]] = df[df.columns[0]].map({'ham': 0, 'spam': 1})

targets = df[df.columns[0]].values
messages = df[df.columns[1]].values

# one hot encoding for each SMS message
one_hot_data = one_hot_encode(messages, vocabulary_length)
encoded_messages = vectorize_sequences(one_hot_data, vocabulary_length)

df2 = pd.DataFrame(encoded_messages)
df2.insert(0, 'spam', targets)

# Split into training and validation sets (80%/20% split)
split_index = int(np.ceil(df.shape[0] * 0.8))
train_set = df2[:split_index]
val_set = df2[split_index:]

train_set.to_csv('dataset/sms_train_set.csv', header=False, index=False)
val_set.to_csv('dataset/sms_val_set.csv', header=False, index=False)
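The split above takes the first 80% of rows for training and the rest for validation, without shuffling. A cheap sanity check is to compare the spam rate of the two parts. The following is a minimal sketch on a toy dataframe; the helper names and the toy data are assumptions, and only the split arithmetic mirrors the cell above.

```python
import numpy as np
import pandas as pd

def sequential_split(frame, train_fraction=0.8):
    """Split by row order, using the same ceil-based index as the cell above."""
    split_index = int(np.ceil(frame.shape[0] * train_fraction))
    return frame[:split_index], frame[split_index:]

def spam_rate(frame, label_column="spam"):
    """Fraction of rows labelled 1 in the given column."""
    return float(frame[label_column].mean())

# Toy data: 10 rows, 3 of them spam.
toy = pd.DataFrame({"spam": [0, 0, 1, 0, 0, 0, 1, 0, 0, 1], "f0": range(10)})
train_part, val_part = sequential_split(toy)
print(len(train_part), len(val_part))
print(spam_rate(train_part), spam_rate(val_part))
```

If the two rates differ a lot on the real data, shuffling the rows before splitting is one possible remedy.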

We now upload the two files to Amazon S3 so that the Amazon SageMaker training cluster can access them.

In [7]:
import boto3

s3 = boto3.resource('s3')
target_bucket = s3.Bucket(bucket_name)

with open('dataset/sms_train_set.csv', 'rb') as data:
    target_bucket.upload_fileobj(data, '{0}/train/sms_train_set.csv'.format(bucket_key_prefix))
    
with open('dataset/sms_val_set.csv', 'rb') as data:
    target_bucket.upload_fileobj(data, '{0}/val/sms_val_set.csv'.format(bucket_key_prefix))

<h2>Training the model with Linear Learner</h2>

We are now ready to run the training using the Amazon SageMaker Linear Learner built-in algorithm. First let's get the Linear Learner container.

In [8]:
import boto3

from sagemaker.amazon.amazon_estimator import get_image_uri
container = get_image_uri(boto3.Session().region_name, 'linear-learner', repo_version="latest")

Next we'll kick off the base estimator, making sure to pass in the necessary hyperparameters. Notice:

<ul>
    <li>feature_dim is set to the vocabulary length, so that it matches the width of the encoded messages.</li>
<li>predictor_type is set to 'binary_classifier' since we are trying to predict whether an SMS message is spam or not.</li>
<li>mini_batch_size is set to 100.</li>
</ul>

In [9]:
import sagemaker

output_path = 's3://{0}/{1}/output'.format(bucket_name, bucket_key_prefix)

linear = sagemaker.estimator.Estimator(container,
                                       role, 
                                       train_instance_count=1, 
                                       train_instance_type='ml.c5.2xlarge',
                                       output_path=output_path,
                                       base_job_name='sms-spam-classifier-ll')
linear.set_hyperparameters(feature_dim=vocabulary_length,
                           predictor_type='binary_classifier',
                           mini_batch_size=100)

train_config = sagemaker.session.s3_input('s3://{0}/{1}/train/{2}'
                                          .format(bucket_name, bucket_key_prefix, 'sms_train_set.csv'), 
                                          content_type='text/csv')
test_config = sagemaker.session.s3_input('s3://{0}/{1}/val/{2}'
                                         .format(bucket_name, bucket_key_prefix, 'sms_val_set.csv'), 
                                         content_type='text/csv')

linear.fit({'train': train_config, 'test': test_config })

INFO:sagemaker:Creating training-job with name: sms-spam-classifier-ll-2018-10-21-19-03-59-244


2018-10-21 19:04:30 Starting - Starting the training job...
Launching requested ML instances...
Preparing the instances for training...
2018-10-21 19:05:59 Downloading - Downloading input data
2018-10-21 19:06:06 Training - Downloading the training image...
Training image download completed. Training in progress.
[31mDocker entrypoint called with argument(s): train[0m
[31m[10/21/2018 19:06:37 INFO 140393867704128] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/algorithm/default-input.json: {u'loss_insensitivity': u'0.01', u'epochs': u'15', u'init_bias': u'0.0', u'lr_scheduler_factor': u'auto', u'num_calibration_samples': u'10000000', u'accuracy_top_k': u'3', u'_num_kv_servers': u'auto', u'use_bias': u'true', u'num_point_for_scaler': u'10000', u'_log_level': u'info', u'quantile': u'0.5', u'bias_lr_mult': u'auto', u'lr_scheduler_step': u'auto', u'init_method': u'uniform', u'init_sigma': u'0.01', u'lr_scheduler_minimum_lr': u'auto', u'target_recall': u'0.8',

[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.554088320501826, "sum": 0.554088320501826, "min": 0.554088320501826}}, "EndTime": 1540148836.441351, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1540148836.441281}
[0m
[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.5861643652482467, "sum": 0.5861643652482467, "min": 0.5861643652482467}}, "EndTime": 1540148836.441422, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1540148836.441413}
[0m
[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.5693944204124537, "sum": 0.5693944204124537, "min": 0.5693944204124537}}, "EndTime": 1540148836.441447, "Dimensions": {"model": 2, "Host": "algo-1", "Operation": "training", "Algori

[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.18855312327261675, "sum": 0.18855312327261675, "min": 0.18855312327261675}}, "EndTime": 1540148849.399728, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1540148849.399656}
[0m
[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.19962540338323875, "sum": 0.19962540338323875, "min": 0.19962540338323875}}, "EndTime": 1540148849.399805, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1540148849.399796}
[0m
[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.20308738333596424, "sum": 0.20308738333596424, "min": 0.20308738333596424}}, "EndTime": 1540148849.39984, "Dimensions": {"model": 2, "Host": "algo-1", "Operation": "trainin

[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.05340552661906589, "sum": 0.05340552661906589, "min": 0.05340552661906589}}, "EndTime": 1540148862.483417, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1540148862.483348}
[0m
[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.05567015592516823, "sum": 0.05567015592516823, "min": 0.05567015592516823}}, "EndTime": 1540148862.48349, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1540148862.483482}
[0m
[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.06905136190354824, "sum": 0.06905136190354824, "min": 0.06905136190354824}}, "EndTime": 1540148862.483521, "Dimensions": {"model": 2, "Host": "algo-1", "Operation": "trainin

[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.020019603774628855, "sum": 0.020019603774628855, "min": 0.020019603774628855}}, "EndTime": 1540148875.47513, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1540148875.47506}
[0m
[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.017479563733901488, "sum": 0.017479563733901488, "min": 0.017479563733901488}}, "EndTime": 1540148875.4752, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1540148875.47519}
[0m
[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.02438414247537201, "sum": 0.02438414247537201, "min": 0.02438414247537201}}, "EndTime": 1540148875.475237, "Dimensions": {"model": 2, "Host": "algo-1", "Operation": "train

[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.012077039569108324, "sum": 0.012077039569108324, "min": 0.012077039569108324}}, "EndTime": 1540148888.554211, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 5}, "StartTime": 1540148888.554139}
[0m
[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.010173524652861737, "sum": 0.010173524652861737, "min": 0.010173524652861737}}, "EndTime": 1540148888.554286, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 5}, "StartTime": 1540148888.554276}
[0m
[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.013700132963501595, "sum": 0.013700132963501595, "min": 0.013700132963501595}}, "EndTime": 1540148888.554316, "Dimensions": {"model": 2, "Host": "algo-1", "Operation"

[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.009566558315841989, "sum": 0.009566558315841989, "min": 0.009566558315841989}}, "EndTime": 1540148901.553847, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 6}, "StartTime": 1540148901.553774}
[0m
[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.006876852870495482, "sum": 0.006876852870495482, "min": 0.006876852870495482}}, "EndTime": 1540148901.55392, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 6}, "StartTime": 1540148901.553911}
[0m
[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.010296031287024644, "sum": 0.010296031287024644, "min": 0.010296031287024644}}, "EndTime": 1540148901.553946, "Dimensions": {"model": 2, "Host": "algo-1", "Operation":

[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.007715694633397189, "sum": 0.007715694633397189, "min": 0.007715694633397189}}, "EndTime": 1540148914.54823, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 7}, "StartTime": 1540148914.54816}
[0m
[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.004796923304992643, "sum": 0.004796923304992643, "min": 0.004796923304992643}}, "EndTime": 1540148914.548304, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 7}, "StartTime": 1540148914.548296}
[0m
[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.008164696855555204, "sum": 0.008164696855555204, "min": 0.008164696855555204}}, "EndTime": 1540148914.54833, "Dimensions": {"model": 2, "Host": "algo-1", "Operation": "

[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.0063177989397875285, "sum": 0.0063177989397875285, "min": 0.0063177989397875285}}, "EndTime": 1540148927.542138, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 8}, "StartTime": 1540148927.54207}
[0m
[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.0035489617647941818, "sum": 0.0035489617647941818, "min": 0.0035489617647941818}}, "EndTime": 1540148927.542215, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 8}, "StartTime": 1540148927.542206}
[0m
[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.006524865352971987, "sum": 0.006524865352971987, "min": 0.006524865352971987}}, "EndTime": 1540148927.542248, "Dimensions": {"model": 2, "Host": "algo-1", "Opera

[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.005282840216955678, "sum": 0.005282840216955678, "min": 0.005282840216955678}}, "EndTime": 1540148940.489161, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 9}, "StartTime": 1540148940.489091}
[0m
[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.0030238344439898024, "sum": 0.0030238344439898024, "min": 0.0030238344439898024}}, "EndTime": 1540148940.489232, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 9}, "StartTime": 1540148940.489223}
[0m
[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.00520600397055122, "sum": 0.00520600397055122, "min": 0.00520600397055122}}, "EndTime": 1540148940.48926, "Dimensions": {"model": 2, "Host": "algo-1", "Operation":

[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.004434023734859445, "sum": 0.004434023734859445, "min": 0.004434023734859445}}, "EndTime": 1540148953.5073, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 10}, "StartTime": 1540148953.507229}
[0m
[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.0026638986432755535, "sum": 0.0026638986432755535, "min": 0.0026638986432755535}}, "EndTime": 1540148953.507374, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 10}, "StartTime": 1540148953.507365}
[0m
[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.004201584551906721, "sum": 0.004201584551906721, "min": 0.004201584551906721}}, "EndTime": 1540148953.507404, "Dimensions": {"model": 2, "Host": "algo-1", "Operati

[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.00373285640115765, "sum": 0.00373285640115765, "min": 0.00373285640115765}}, "EndTime": 1540148966.539704, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 11}, "StartTime": 1540148966.539635}
[0m
[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.002365260732952844, "sum": 0.002365260732952844, "min": 0.002365260732952844}}, "EndTime": 1540148966.539776, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 11}, "StartTime": 1540148966.539767}
[0m
[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.003571359642611986, "sum": 0.003571359642611986, "min": 0.003571359642611986}}, "EndTime": 1540148966.539803, "Dimensions": {"model": 2, "Host": "algo-1", "Operation":

[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.003270748879083178, "sum": 0.003270748879083178, "min": 0.003270748879083178}}, "EndTime": 1540148979.657596, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 12}, "StartTime": 1540148979.657524}
[0m
[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.002110379346307706, "sum": 0.002110379346307706, "min": 0.002110379346307706}}, "EndTime": 1540148979.657674, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 12}, "StartTime": 1540148979.657663}
[0m
[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.003220084719359875, "sum": 0.003220084719359875, "min": 0.003220084719359875}}, "EndTime": 1540148979.657709, "Dimensions": {"model": 2, "Host": "algo-1", "Operatio

[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.003000006884844466, "sum": 0.003000006884844466, "min": 0.003000006884844466}}, "EndTime": 1540148992.841241, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 13}, "StartTime": 1540148992.841172}
[0m
[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.0018871596099978144, "sum": 0.0018871596099978144, "min": 0.0018871596099978144}}, "EndTime": 1540148992.841311, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 13}, "StartTime": 1540148992.841303}
[0m
[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.0029664307671853087, "sum": 0.0029664307671853087, "min": 0.0029664307671853087}}, "EndTime": 1540148992.84134, "Dimensions": {"model": 2, "Host": "algo-1", "Ope

[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.002794088620099832, "sum": 0.002794088620099832, "min": 0.002794088620099832}}, "EndTime": 1540149006.017467, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 14}, "StartTime": 1540149006.017396}
[0m
[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.0016904247221960263, "sum": 0.0016904247221960263, "min": 0.0016904247221960263}}, "EndTime": 1540149006.017544, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 14}, "StartTime": 1540149006.017533}
[0m
[31m#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.0027570253115316683, "sum": 0.0027570253115316683, "min": 0.0027570253115316683}}, "EndTime": 1540149006.017578, "Dimensions": {"model": 2, "Host": "algo-1", "Op


2018-10-21 19:10:33 Uploading - Uploading generated training model
2018-10-21 19:10:39 Completed - Training job completed
[31m[10/21/2018 19:10:30 INFO 140393867704128] #train_score (algo-1) : ('binary_classification_cross_entropy_objective', 0.010428856975558147)[0m
[31m[10/21/2018 19:10:30 INFO 140393867704128] #train_score (algo-1) : ('binary_classification_accuracy', 0.9982054733064154)[0m
[31m[10/21/2018 19:10:30 INFO 140393867704128] #train_score (algo-1) : ('binary_f_1.000', 0.9933993399339934)[0m
[31m[10/21/2018 19:10:30 INFO 140393867704128] #train_score (algo-1) : ('precision', 0.9868852459016394)[0m
[31m[10/21/2018 19:10:30 INFO 140393867704128] #train_score (algo-1) : ('recall', 1.0)[0m
[31m[10/21/2018 19:10:30 INFO 140393867704128] #quality_metric: host=algo-1, train binary_classification_cross_entropy_objective <loss>=0.0104288569756[0m
[31m[10/21/2018 19:10:30 INFO 140393867704128] #quality_metric: host=algo-1, train binary_classification_accuracy <score>=0

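As a quick check on the training scores in the log above, the reported <code>binary_f_1.000</code> value can be reproduced from the reported precision and recall. The sketch below uses the standard F-beta formula; the function name is ours, and the two input values are copied from the log.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score; beta=1.0 gives the harmonic mean of precision and recall."""
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)

# Values copied from the #train_score lines of the training log.
precision = 0.9868852459016394
recall = 1.0

f1 = f_beta(precision, recall)
print(f1)
```

These are scores on the training data, so they say little about how the model behaves on unseen messages.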
<h3><span style="color:red">THE FOLLOWING STEPS ARE NOT MANDATORY IF YOU PLAN TO DEPLOY TO AWS LAMBDA AND ARE INCLUDED IN THIS NOTEBOOK FOR EDUCATIONAL PURPOSES.</span></h3>

<h2>Deploying the model</h2>

Let's deploy the trained model to a real-time inference endpoint fully managed by Amazon SageMaker.

In [11]:
pred = linear.deploy(initial_instance_count=1,
                     instance_type='ml.m5.large')

INFO:sagemaker:Creating model with name: linear-learner-2018-10-21-19-11-38-520
INFO:sagemaker:Creating endpoint with name sms-spam-classifier-ll-2018-10-21-19-03-59-244


----------------------------------------------------------!

<h2>Executing Inferences</h2>

Now we can invoke the Amazon SageMaker real-time endpoint to run some inferences: we send encoded SMS messages and get back, for each one, the predicted label (SPAM = 1, HAM = 0) and the related probability.

In [14]:
from sagemaker.predictor import RealTimePredictor
from sms_spam_classifier_utilities import one_hot_encode
from sms_spam_classifier_utilities import vectorize_sequences
from sagemaker.predictor import csv_serializer, json_deserializer

# Uncomment the following line to connect to an existing endpoint.
# pred = RealTimePredictor('<endpoint_name>')

pred.content_type = 'text/csv'
pred.serializer = csv_serializer
pred.deserializer = json_deserializer

test_messages = ["FreeMsg: Txt: CALL to No: 86888 & claim your reward of 3 hours talk time to use from your phone now! ubscribe6GBP/ mnth inc 3hrs 16 stop?txtStop"]
one_hot_test_messages = one_hot_encode(test_messages, vocabulary_length)
encoded_test_messages = vectorize_sequences(one_hot_test_messages, vocabulary_length)

result = pred.predict(encoded_test_messages)
print(result)

{'predictions': [{'score': 0.9999786615371704, 'predicted_label': 1.0}]}
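The response is a dictionary with one entry per message under <code>predictions</code>. A minimal sketch for turning it into readable labels follows; the helper name <code>summarize_predictions</code> is hypothetical, and the sample response is the one printed above.

```python
def summarize_predictions(response):
    """Turn a response of the shape shown above into (label_name, score) pairs."""
    summaries = []
    for prediction in response["predictions"]:
        label_name = "spam" if int(prediction["predicted_label"]) == 1 else "ham"
        summaries.append((label_name, prediction["score"]))
    return summaries

# Response copied from the cell output above.
sample_response = {'predictions': [{'score': 0.9999786615371704, 'predicted_label': 1.0}]}

for label_name, score in summarize_predictions(sample_response):
    print('{0} (score {1:.4f})'.format(label_name, score))
```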


<h2>Cleaning up</h2>

When done, we can delete the Amazon SageMaker real-time inference endpoint.

In [15]:
pred.delete_endpoint()

INFO:sagemaker:Deleting endpoint with name: sms-spam-classifier-ll-2018-10-21-19-03-59-244
