# Sentiment Analysis with MXNet on SageMaker

In this notebook, we will build and train a sentiment analysis model with MXNet on SageMaker.
Our model will learn to classify movie reviews as positive (1) or negative (0).

We will use the SST-2 dataset (Stanford Sentiment Treebank 2), which consists of movie reviews with one sentence per review.

## Session initialization and imports

We will start by importing the required modules and creating a SageMaker session.

In [8]:
import os
import boto3
import sagemaker
from sagemaker.mxnet import MXNet
from sagemaker import get_execution_role

sagemaker_session = sagemaker.Session()

## Dataset download and preparation

Next, let's download the dataset files into a local data directory and then upload them to SageMaker's S3 bucket.
Each line in the dataset consists of space-separated tokens; the first token is the label: 1 for positive and 0 for negative.
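
To make the line format concrete, here is a minimal sketch that splits one line into its label and its tokens. The helper name `parse_line` and the two sample lines are made up for illustration; they are not taken from the dataset or from the training script.

```python
def parse_line(line):
    """Split one dataset line into (label, tokens).

    Assumes the layout described above: the first space-separated
    token is the label (1 or 0) and the remaining tokens are the review.
    """
    parts = line.split()
    label = int(parts[0])
    tokens = parts[1:]
    return label, tokens


# Hypothetical sample lines in the same layout (not real dataset rows).
sample_lines = [
    "1 a fun and moving film\n",
    "0 dull and far too long\n",
]

parsed = [parse_line(line) for line in sample_lines]
for label, tokens in parsed:
    print(label, tokens)
```

The same function could be applied to every line of data/train and data/test once the files are downloaded.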

We can also check out the downloaded files in Jupyter!

In [None]:
# Download the training data. We're downloading the Stanford Sentiment dataset
# https://nlp.stanford.edu/sentiment/index.html

!mkdir -p data
!curl https://raw.githubusercontent.com/saurabh3949/Text-Classification-Datasets/master/stsa.binary.phrases.train > data/train
!curl https://raw.githubusercontent.com/saurabh3949/Text-Classification-Datasets/master/stsa.binary.test > data/test

In [9]:
inputs = sagemaker_session.upload_data(path='data', key_prefix='data/sentiment-analysis')

INFO:sagemaker:Created S3 bucket: sagemaker-us-east-1-968277166688
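
The value returned by `upload_data` is the S3 location that we later pass to `fit`. For a directory upload it should be the bucket name followed by the key prefix. The sketch below only builds that string locally so we know what to expect in `inputs`; `expected_upload_uri` is a hypothetical helper, not part of the SageMaker SDK, and the bucket name is copied from the log line above.

```python
def expected_upload_uri(bucket, key_prefix):
    """Build the S3 URI we expect for a directory uploaded under key_prefix."""
    return "s3://{}/{}".format(bucket, key_prefix)


# Bucket name taken from the log output above; yours will differ.
uri = expected_upload_uri("sagemaker-us-east-1-968277166688",
                          "data/sentiment-analysis")
print(uri)
```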


## Implementing the training function

Next, we implement the training logic that will run on the SageMaker platform.
The training script is essentially the same as one you would write for local training, except that it must provide a train function with a specific signature.

When SageMaker calls your function, it passes in arguments that describe the training environment. Let's check out the example below.

In [15]:
!cat 'sentiment-analysis.py'

from __future__ import print_function

import logging
import mxnet as mx
from mxnet import gluon, autograd, nd
from mxnet.gluon import nn
import numpy as np
import json
import time
import re
from mxnet.io import DataIter, DataBatch, DataDesc
import bisect, random
from collections import Counter
from itertools import chain, islice


logging.basicConfig(level=logging.DEBUG)

# ------------------------------------------------------------ #
# Training methods                                             #
# ------------------------------------------------------------ #

def train(current_host, hosts, num_cpus, num_gpus, channel_input_dirs, model_dir, hyperparameters, **kwargs):
    # retrieve the hyperparameters we set in notebook (with some defaults)
    batch_size = hyperparameters.get('batch_size', 8)
    epochs = hyperparameters.get('epochs', 2)
    learning_rate = hyperparameters.get('learning_rate', 0.01)
    log_interval = hyperparameters.get('log_interval'

## Running the training script on SageMaker

SageMaker's MXNet class allows us to run our training function on SageMaker infrastructure. 
We need to configure it with our training script, an IAM role, the number of training instances, the training instance type, and the hyperparameters.

In [35]:
role = get_execution_role()

m = MXNet("sentiment-analysis.py", 
          role=role, 
          train_instance_count=1, 
          train_instance_type="ml.c5.4xlarge",
          hyperparameters={'batch_size': 8, 
                         'epochs': 2, 
                         'learning_rate': 0.01, 
                         'embedding_size': 50, 
                         'log_interval': 1000})

After we've constructed our MXNet object, we can fit it using the data we uploaded to S3. SageMaker makes sure our data is available in the local filesystem, so our training script can simply read the data from disk.
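
As a rough sketch of what "read the data from disk" can look like inside a training script, the example below resolves a file inside a channel directory and reads its lines. It assumes the channel is named `training` and that the files keep the names train and test from the download step. The helper `read_channel_file` is hypothetical, and a temporary directory stands in for the directory that SageMaker provides.

```python
import os
import tempfile


def read_channel_file(channel_input_dirs, channel, filename):
    """Return the non-empty lines of a file inside a channel directory."""
    path = os.path.join(channel_input_dirs[channel], filename)
    with open(path) as f:
        return [line.rstrip("\n") for line in f if line.strip()]


# Simulate the local directory of the 'training' channel with a temporary
# directory; on SageMaker the real path is supplied by the platform.
with tempfile.TemporaryDirectory() as tmp_dir:
    with open(os.path.join(tmp_dir, "train"), "w") as f:
        f.write("1 a fun and moving film\n0 dull and far too long\n")

    channel_input_dirs = {"training": tmp_dir}
    train_lines = read_channel_file(channel_input_dirs, "training", "train")

print(len(train_lines))
```

Inside a train function, the `channel_input_dirs` argument would play the role of the dictionary built by hand here.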

In [36]:
m.fit(inputs)

INFO:sagemaker:Created S3 bucket: sagemaker-us-east-1-968277166688
INFO:sagemaker:Creating training-job with name: sagemaker-mxnet-2018-06-15-07-17-04-550


................
2018-06-15 07:19:19,397 INFO - root - running container entrypoint
2018-06-15 07:19:19,397 INFO - root - starting train task
2018-06-15 07:19:19,403 INFO - container_support.training - Training starting
2018-06-15 07:19:21,378 INFO - mxnet_container.train - MXNetTrainingEnvironment: {'enable_cloudwatch_metrics': False, 'available_gpus': 0, 'channels': {u'training': {u'TrainingInputMode': u'File', u'RecordWrapperType': u'None', u'S3DistributionType': u'FullyReplicated'}}, '_ps_verbose': 0, 'resource_config': {u'current_host': u'algo-1', u'network_interface_name': u'ethwe', u'hosts': [u'algo-1']}, 'user_script_name': u'sentiment-analysis.py', 'input_config_dir': '/opt/ml/input/config', 'channel_dirs': {u'training': u'/opt/ml/input/data/training'}, 'code_dir': '/opt/ml/code', 'output_data_dir': '/opt/ml/output/data/', 'output_dir': '/opt/ml/output', 'model_dir': '/opt/ml/model', 'hyperparameters': {u'sagemaker_program': u'sentiment-analysis

## Hosting our trained model for inference

As the training logs show, the model reached over 80% accuracy on the test set.
After training, we can host the trained MXNet model, and use it for inference.

Let's deploy the model, starting with a single C5 instance:

In [19]:
predictor = m.deploy(initial_instance_count=1, instance_type='ml.c5.4xlarge')

INFO:sagemaker:Creating model with name: sagemaker-mxnet-2018-06-15-06-48-04-494
INFO:sagemaker:Creating endpoint with name sagemaker-mxnet-2018-06-15-06-48-04-494


--------------------------------------------------!

Let's use the created predictor object and run inference:

In [33]:
data = ["this was an awesome movie!",
        "come on, you call this a movie?",
        "best one I've seen in ages",
        "i just could not watch it till the end.",
        "the movie was so enthralling !"]

response = predictor.predict(data)
print(response)

[1, 0, 1, 0, 1]
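
To make the output easier to read, we can pair each sentence with a label name. This is a small sketch that works on a copy of the response shown above rather than calling the endpoint; the `LABEL_NAMES` mapping is an assumption based on the label convention stated at the top of the notebook (1 for positive, 0 for negative).

```python
# Assumed mapping, following the notebook's convention: 1 = positive, 0 = negative.
LABEL_NAMES = {0: "negative", 1: "positive"}

data = ["this was an awesome movie!",
        "come on, you call this a movie?",
        "best one I've seen in ages",
        "i just could not watch it till the end.",
        "the movie was so enthralling !"]

# Copied from the output above instead of querying the live endpoint.
response = [1, 0, 1, 0, 1]

named = [LABEL_NAMES[label] for label in response]
for sentence, name in zip(data, named):
    print("{:<8} {}".format(name, sentence))
```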


## Cleanup

Once you have finished with this example and no longer need the endpoint, remember to delete it to release the instance associated with it.

In [34]:
sagemaker.Session().delete_endpoint(predictor.endpoint)

INFO:sagemaker:Deleting endpoint with name: sagemaker-mxnet-2018-06-15-06-48-04-494
