<h1>SMS Spam Classifier</h1>
<br />
This notebook shows how to implement a basic spam classifier for SMS messages using Apache MXNet as the deep learning framework.
The idea is to use the SMS Spam Collection dataset available at <a href="https://archive.ics.uci.edu/ml/datasets/sms+spam+collection">https://archive.ics.uci.edu/ml/datasets/sms+spam+collection</a> to train and deploy a neural network model using the built-in open-source container for Apache MXNet available in Amazon SageMaker.

Let's get started by setting some configuration variables and getting the current execution role, using the Amazon SageMaker high-level SDK for Python.

In [1]:
from sagemaker import get_execution_role

bucket_name = 'hlin-model'

role = get_execution_role()
bucket_key_prefix = 'sms-spam-classifier'
vocabulary_length = 9013

print(role)

arn:aws:iam::458129807189:role/service-role/AmazonSageMaker-ExecutionRole-20220413T101121


<h2>Training the model with MXNet</h2>

We are now ready to run the training using the MXNet estimator object of the SageMaker Python SDK.

In [2]:
from sagemaker.mxnet import MXNet

output_path = 's3://{0}/{1}/output'.format(bucket_name, bucket_key_prefix)
code_location = 's3://{0}/{1}/code'.format(bucket_name, bucket_key_prefix)

m = MXNet('spam_classifier.py',
          role=role,
          instance_count=1,
          instance_type='ml.m4.xlarge',
          output_path=output_path,
          base_job_name='sms-spam-classifier-mxnet',
          framework_version='1.2.1',
          py_version='py3',
          code_location=code_location,
          hyperparameters={'batch_size': 100,
                           'epochs': 10,
                           'learning_rate': 0.01})

inputs = {'train': 's3://{0}/{1}/train/'.format(bucket_name, bucket_key_prefix),
          'val': 's3://{0}/{1}/val/'.format(bucket_name, bucket_key_prefix)}

m.fit(inputs)

2022-04-18 15:25:58 Starting - Starting the training job...
2022-04-18 15:26:27 Starting - Preparing the instances for training
ProfilerReport-1650295558: InProgress
.........
2022-04-18 15:27:50 Downloading - Downloading input data...
2022-04-18 15:28:26 Training - Training image download completed. Training in progress..
2022-04-18 15:28:27,789 INFO - root - running container entrypoint
2022-04-18 15:28:27,789 INFO - root - starting train task
2022-04-18 15:28:27,794 INFO - container_support.training - Training starting
2022-04-18 15:28:28,913 INFO - mxnet_container.train - MXNetTrainingEnvironment: {'sagemaker_region': 'us-east-1', 'resource_config': {'current_instance_type': 'ml.m4.xlarge', 'instance_groups': [{'instance_group_name': 'homogeneousCluster', 'hosts': ['algo-1'], 'instance_type': 'ml.m4.xlarge'}], 'current_group_name': 'homogeneousCluster', 'hosts': ['algo-1'], 'current_host': 'algo-1', 'network_interface_name': 'eth0'}, 'output_dir': '/op

<h2>Deploying the model</h2>

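Let's deploy the trained model to an existing real-time inference endpoint that is fully managed by Amazon SageMaker. To do so, we register the model, create a new endpoint configuration for it, and update the endpoint to use that configuration.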

In [3]:
import random
import boto3

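# Register the trained model artifacts as a SageMaker model.
model = m.create_model()
session = model.sagemaker_session

container_def = model.prepare_container_def(instance_type='ml.m5.large')
model_name = str(random.random())[2:]  # random digits used as a unique name
session.create_model(model_name, role, container_def)

# Name of the endpoint that already exists and will be updated in place.
endpoint_name = 'sms-spam-classifier-mxnet-2022-04-13-14-46-36-485'

# Create a new endpoint configuration that points at the new model.
config_name = str(random.random())[2:]
session.create_endpoint_config(name=config_name,
                               model_name=model_name,
                               initial_instance_count=1,
                               instance_type='ml.m4.xlarge')

# Switch the existing endpoint over to the new configuration.
client = boto3.client('sagemaker')
client.update_endpoint(EndpointName=endpoint_name,
                       EndpointConfigName=config_name)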

{'EndpointArn': 'arn:aws:sagemaker:us-east-1:458129807189:endpoint/sms-spam-classifier-mxnet-2022-04-13-14-46-36-485',
 'ResponseMetadata': {'RequestId': '9f965503-acbd-42b1-b83a-8ef1edf47929',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '9f965503-acbd-42b1-b83a-8ef1edf47929',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '117',
   'date': 'Mon, 18 Apr 2022 15:30:32 GMT'},
  'RetryAttempts': 0}}
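
Note that <code>update_endpoint</code> returns as soon as the request is accepted, so the endpoint may still be updating when the next cell runs. One way to wait for it is the boto3 <code>endpoint_in_service</code> waiter. The following is a minimal sketch, not part of the original notebook: the helper name <code>wait_until_in_service</code> and its default polling values are assumptions chosen for illustration.

```python
def wait_until_in_service(client, endpoint_name, delay_seconds=30, max_attempts=60):
    """Block until the endpoint reports InService, then return its status.

    `client` is expected to be a boto3 SageMaker client. The helper name and
    the default polling values are illustrative assumptions.
    """
    waiter = client.get_waiter('endpoint_in_service')
    waiter.wait(EndpointName=endpoint_name,
                WaiterConfig={'Delay': delay_seconds, 'MaxAttempts': max_attempts})
    # Read the status back once the waiter has returned.
    return client.describe_endpoint(EndpointName=endpoint_name)['EndpointStatus']

# Usage with the names from the cell above (requires AWS credentials):
# status = wait_until_in_service(client, endpoint_name)
```

The client is passed in as an argument so the same helper can be exercised with a stub in tests, without calling AWS.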

<h2>Executing inferences</h2>

We can now invoke the Amazon SageMaker real-time endpoint to run some inferences: we send encoded SMS messages and get back the predicted label (SPAM = 1, HAM = 0) and the associated probability.
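
The cell below imports two helpers from <code>spam_classifier_utilities</code>, a module that is not reproduced in this notebook. To give an idea of what such an encoding step might look like, here is a rough sketch under stated assumptions: it assumes the first helper maps each word to an integer index by hashing and the second turns each index list into a fixed-length 0/1 vector. The names <code>sketch_one_hot_encode</code> and <code>sketch_vectorize_sequences</code> are hypothetical, and the real helpers may tokenize and index words differently, so these functions are not a drop-in replacement for inference against the trained model.

```python
import hashlib

def sketch_one_hot_encode(messages, vocabulary_length):
    """Map each message to a list of word indices in [1, vocabulary_length).

    Hypothetical stand-in: the hashing scheme is an assumption, not the
    behaviour of the real spam_classifier_utilities.one_hot_encode.
    """
    encoded = []
    for message in messages:
        indices = []
        for word in message.lower().split():
            digest = hashlib.md5(word.encode('utf-8')).hexdigest()
            # Index 0 is left unused; hashed words land in 1..vocabulary_length-1.
            indices.append(1 + int(digest, 16) % (vocabulary_length - 1))
        encoded.append(indices)
    return encoded

def sketch_vectorize_sequences(sequences, vocabulary_length):
    """Turn each index list into a vector with 1.0 at every index that occurs."""
    vectors = []
    for sequence in sequences:
        vector = [0.0] * vocabulary_length
        for index in sequence:
            vector[index] = 1.0
        vectors.append(vector)
    return vectors

sequences = sketch_one_hot_encode(["claim your reward now"], 9013)
vectors = sketch_vectorize_sequences(sequences, 9013)
print(len(vectors[0]), sum(vectors[0]))
```

Whatever the exact scheme, the vector length must match the <code>vocabulary_length</code> used at training time, which is why the same value is passed to both helpers.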

In [4]:
from sagemaker.mxnet.model import MXNetPredictor
from spam_classifier_utilities import one_hot_encode
from spam_classifier_utilities import vectorize_sequences

mxnet_pred = MXNetPredictor(endpoint_name)

test_messages = ["FreeMsg: Txt: CALL to No: 86888 & claim your reward of 3 hours talk time to use from your phone now! ubscribe6GBP/ mnth inc 3hrs 16 stop?txtStop"]
one_hot_test_messages = one_hot_encode(test_messages, vocabulary_length)
encoded_test_messages = vectorize_sequences(one_hot_test_messages, vocabulary_length)

result = mxnet_pred.predict(encoded_test_messages)
print(result)

{'predicted_probability': [[0.999946117401123]], 'predicted_label': [[1.0]]}
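
The response can be turned into a more readable summary. The sketch below assumes the response always has the shape shown above, with one inner list per input message; <code>summarize_prediction</code> is a hypothetical helper name, not part of the SageMaker SDK.

```python
def summarize_prediction(result, messages):
    """Pair each message with its label name and predicted probability.

    Assumes `result` has the shape printed above: one single-element list
    per message under 'predicted_label' and 'predicted_probability'.
    """
    summaries = []
    rows = zip(messages, result['predicted_label'], result['predicted_probability'])
    for message, label_row, probability_row in rows:
        label = 'SPAM' if int(label_row[0]) == 1 else 'HAM'
        summaries.append((label, probability_row[0], message))
    return summaries

# Response copied from the output above.
result = {'predicted_probability': [[0.999946117401123]], 'predicted_label': [[1.0]]}
for label, probability, message in summarize_prediction(result, ["FreeMsg: Txt: CALL to No: 86888"]):
    print('{0} ({1:.4f}): {2}'.format(label, probability, message))
```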
