# Creating a Sentiment Analysis Web App
## Using PyTorch and SageMaker

## Downloading the data



In [0]:
%mkdir ../data
!wget -O ../data/aclImdb_v1.tar.gz http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
!tar -zxf ../data/aclImdb_v1.tar.gz -C ../data

mkdir: cannot create directory ‘../data’: File exists
--2019-03-08 19:31:42--  http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
Resolving ai.stanford.edu (ai.stanford.edu)... 171.64.68.10
Connecting to ai.stanford.edu (ai.stanford.edu)|171.64.68.10|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 84125825 (80M) [application/x-gzip]
Saving to: ‘../data/aclImdb_v1.tar.gz’


2019-03-08 19:31:46 (25.8 MB/s) - ‘../data/aclImdb_v1.tar.gz’ saved [84125825/84125825]



## Preparing and Processing the data


In [0]:
import os
import glob

def read_imdb_data(data_dir='../data/aclImdb'):
    data = {}
    labels = {}
    
    for data_type in ['train', 'test']:
        data[data_type] = {}
        labels[data_type] = {}
        
        for sentiment in ['pos', 'neg']:
            data[data_type][sentiment] = []
            labels[data_type][sentiment] = []
            
            path = os.path.join(data_dir, data_type, sentiment, '*.txt')
            files = glob.glob(path)
            
            for f in files:
                with open(f) as review:
                    data[data_type][sentiment].append(review.read())
                    # Here we represent a positive review by '1' and a negative review by '0'
                    labels[data_type][sentiment].append(1 if sentiment == 'pos' else 0)
                    
            assert len(data[data_type][sentiment]) == len(labels[data_type][sentiment]), \
                    "{}/{} data size does not match labels size".format(data_type, sentiment)
                
    return data, labels

In [0]:
data, labels = read_imdb_data()
print("IMDB reviews: train = {} pos / {} neg, test = {} pos / {} neg".format(
            len(data['train']['pos']), len(data['train']['neg']),
            len(data['test']['pos']), len(data['test']['neg'])))

IMDB reviews: train = 12500 pos / 12500 neg, test = 12500 pos / 12500 neg


In [0]:
from sklearn.utils import shuffle

def prepare_imdb_data(data, labels):
    """Prepare training and test sets from IMDb movie reviews."""
    
    #Combine positive and negative reviews and labels
    data_train = data['train']['pos'] + data['train']['neg']
    data_test = data['test']['pos'] + data['test']['neg']
    labels_train = labels['train']['pos'] + labels['train']['neg']
    labels_test = labels['test']['pos'] + labels['test']['neg']
    
    #Shuffle reviews and corresponding labels within training and test sets
    data_train, labels_train = shuffle(data_train, labels_train)
    data_test, labels_test = shuffle(data_test, labels_test)
    
    # Return the combined training data, test data, training labels, and test labels
    return data_train, data_test, labels_train, labels_test

In [0]:
train_X, test_X, train_y, test_y = prepare_imdb_data(data, labels)
print("IMDb reviews (combined): train = {}, test = {}".format(len(train_X), len(test_X)))

IMDb reviews (combined): train = 25000, test = 25000


In [0]:
print(train_X[100])
print(train_y[100])

One of my favorite movies which has been overlooked by too many movie goers, an observation which mystifies me. Not only directed by the acclaimed Ang Lee,it had many young actors who were to become major stars, e.g., Tobey Maguire (before Spiderman), Skeet Ulrich (before Jericho), Jonathan Rhys Meyers (before Tudors), James Caviezel, Simon Baker, Mark Ruffalo, Jeffrey Wright, Tom Wilkinson, and Jewel. All of the acting was superb and each of the actors mentioned gave memorable performances, especially Meyers who portrayed an evil villain who killed for the sake of killing.<br /><br />When the biographies and accomplishments of the director ( even when he won an academy award) and the actors are listed, this film is usually omitted from their past performances. I discovered the film on DVD by accident and it became one of my most often watched films. However, it is seldom every seen on cable. I look forward to reading what others suggest are the reasons this film is not well known.
1


In [0]:
import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import *

import re
from bs4 import BeautifulSoup

def review_to_words(review):
    nltk.download("stopwords", quiet=True)
    stemmer = PorterStemmer()
    stop_words = set(stopwords.words("english")) # Build the stopword set once for fast lookups
    
    text = BeautifulSoup(review, "html.parser").get_text() # Remove HTML tags
    text = re.sub(r"[^a-zA-Z0-9]", " ", text.lower()) # Convert to lower case and replace non-alphanumeric characters with spaces
    words = text.split() # Split string into words
    words = [w for w in words if w not in stop_words] # Remove stopwords
    words = [stemmer.stem(w) for w in words] # Stem each word, reusing a single stemmer
    
    return words

In [0]:
review_to_words(train_X[100])

['one',
 'favorit',
 'movi',
 'overlook',
 'mani',
 'movi',
 'goer',
 'observ',
 'mystifi',
 'direct',
 'acclaim',
 'ang',
 'lee',
 'mani',
 'young',
 'actor',
 'becom',
 'major',
 'star',
 'e',
 'g',
 'tobey',
 'maguir',
 'spiderman',
 'skeet',
 'ulrich',
 'jericho',
 'jonathan',
 'rhi',
 'meyer',
 'tudor',
 'jame',
 'caviezel',
 'simon',
 'baker',
 'mark',
 'ruffalo',
 'jeffrey',
 'wright',
 'tom',
 'wilkinson',
 'jewel',
 'act',
 'superb',
 'actor',
 'mention',
 'gave',
 'memor',
 'perform',
 'especi',
 'meyer',
 'portray',
 'evil',
 'villain',
 'kill',
 'sake',
 'kill',
 'biographi',
 'accomplish',
 'director',
 'even',
 'academi',
 'award',
 'actor',
 'list',
 'film',
 'usual',
 'omit',
 'past',
 'perform',
 'discov',
 'film',
 'dvd',
 'accid',
 'becam',
 'one',
 'often',
 'watch',
 'film',
 'howev',
 'seldom',
 'everi',
 'seen',
 'cabl',
 'look',
 'forward',
 'read',
 'other',
 'suggest',
 'reason',
 'film',
 'well',
 'known']

In [0]:
import pickle

cache_dir = os.path.join("../cache", "sentiment_analysis")  # where to store cache files
os.makedirs(cache_dir, exist_ok=True)  # ensure cache directory exists

def preprocess_data(data_train, data_test, labels_train, labels_test,
                    cache_dir=cache_dir, cache_file="preprocessed_data.pkl"):
    """Convert each review to words; read from cache if available."""

    # If cache_file is not None, try to read from it first
    cache_data = None
    if cache_file is not None:
        try:
            with open(os.path.join(cache_dir, cache_file), "rb") as f:
                cache_data = pickle.load(f)
            print("Read preprocessed data from cache file:", cache_file)
        except Exception:
            pass  # unable to read from cache (missing or unreadable file), but that's okay
    
    # If cache is missing, then do the heavy lifting
    if cache_data is None:
        # Preprocess training and test data to obtain words for each review
        #words_train = list(map(review_to_words, data_train))
        #words_test = list(map(review_to_words, data_test))
        words_train = [review_to_words(review) for review in data_train]
        words_test = [review_to_words(review) for review in data_test]
        
        # Write to cache file for future runs
        if cache_file is not None:
            cache_data = dict(words_train=words_train, words_test=words_test,
                              labels_train=labels_train, labels_test=labels_test)
            with open(os.path.join(cache_dir, cache_file), "wb") as f:
                pickle.dump(cache_data, f)
            print("Wrote preprocessed data to cache file:", cache_file)
    else:
        # Unpack data loaded from cache file
        words_train, words_test, labels_train, labels_test = (cache_data['words_train'],
                cache_data['words_test'], cache_data['labels_train'], cache_data['labels_test'])
    
    return words_train, words_test, labels_train, labels_test

In [0]:
# Preprocess data
train_X, test_X, train_y, test_y = preprocess_data(train_X, test_X, train_y, test_y)

Read preprocessed data from cache file: preprocessed_data.pkl
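
The caching logic above follows a common load-or-compute pattern. The following is a minimal standalone sketch of that pattern, not the notebook's own function: the helper name `load_or_compute` and the temporary file are hypothetical and exist only for illustration. Note that the notebook's cache stores the labels together with the words, since the reviews were shuffled before preprocessing and the cached words only line up with the labels saved alongside them.

```python
import os
import pickle
import tempfile

def load_or_compute(path, compute):
    """Return (value, cache_hit). Hypothetical helper, for illustration only."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f), True
    except (OSError, pickle.UnpicklingError, EOFError):
        # Cache is missing or unreadable: do the work and store the result.
        value = compute()
        with open(path, "wb") as f:
            pickle.dump(value, f)
        return value, False

with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "demo.pkl")
    # First call computes and writes the cache.
    first, first_hit = load_or_compute(cache_path, lambda: [w.lower() for w in ["A", "B"]])
    # Second call reads the cache; its compute function is never invoked.
    second, second_hit = load_or_compute(cache_path, lambda: ["never", "called"])

print(first, first_hit)
print(second, second_hit)
```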


## Transform the data

### Create a word dictionary

In [0]:
import numpy as np
from collections import Counter

def build_dict(data, vocab_size = 5000):
    """Construct and return a dictionary mapping each of the most frequently appearing words to a unique integer."""
    
    # Determine how often each word appears in `data`. Note that `data` is a list of sentences and that a
    #       sentence is a list of words.
    
    word_count = Counter([each for sentence in data for each in sentence]) # A dict storing the words that appear in the reviews along with how often they occur
    
    # Sort the words found in `data` so that sorted_words[0] is the most frequently appearing word and
    #       sorted_words[-1] is the least frequently appearing word.
    
    sorted_words = sorted(word_count, key=word_count.get, reverse=True)
    
    word_dict = {} # This is what we are building, a dictionary that translates words into integers
    for idx, word in enumerate(sorted_words[:vocab_size - 2]): # The -2 saves room for the 'no word' (0)
        word_dict[word] = idx + 2                              # and 'infrequent' (1) labels
        
    return word_dict

In [0]:
word_dict = build_dict(train_X)

In [0]:
# five most frequently appearing words in the training set.
list(word_dict.keys())[:5]

['movi', 'film', 'one', 'like', 'time']
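
To see the index assignment on a small scale, the sketch below applies the same idea to a toy corpus. The compact `build_dict` here is a condensed restatement for illustration, and the toy reviews and `vocab_size=5` are made up. With a vocabulary size of 5, only the three most frequent words receive an index, because 0 and 1 are reserved.

```python
from collections import Counter

def build_dict(data, vocab_size=5000):
    # Count every word across all sentences, then rank by frequency.
    word_count = Counter(word for sentence in data for word in sentence)
    sorted_words = sorted(word_count, key=word_count.get, reverse=True)
    # Indices start at 2: 0 is 'no word' and 1 is 'infrequent'.
    return {word: idx + 2 for idx, word in enumerate(sorted_words[:vocab_size - 2])}

# Toy, already-stemmed reviews (made up): movi x4, great x3, plot x2, bad x1
toy_reviews = [
    ['movi', 'great', 'movi'],
    ['movi', 'great', 'plot'],
    ['great', 'plot', 'movi', 'bad'],
]

toy_dict = build_dict(toy_reviews, vocab_size=5)
print(toy_dict)  # {'movi': 2, 'great': 3, 'plot': 4}
```

The least frequent word, `'bad'`, falls outside the vocabulary and would later be encoded as the infrequent label 1.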

### Save `word_dict`

In [0]:
data_dir = '../data/pytorch' # The folder we will use for storing data
if not os.path.exists(data_dir): # Make sure that the folder exists
    os.makedirs(data_dir)

In [0]:
with open(os.path.join(data_dir, 'word_dict.pkl'), "wb") as f:
    pickle.dump(word_dict, f)

### Transform the reviews

In [0]:
def convert_and_pad(word_dict, sentence, pad=500):
    NOWORD = 0 # We will use 0 to represent the 'no word' category
    INFREQ = 1 # and we use 1 to represent the infrequent words, i.e., words not appearing in word_dict
    
    working_sentence = [NOWORD] * pad
    
    for word_index, word in enumerate(sentence[:pad]):
        if word in word_dict:
            working_sentence[word_index] = word_dict[word]
        else:
            working_sentence[word_index] = INFREQ
            
    return working_sentence, min(len(sentence), pad)

def convert_and_pad_data(word_dict, data, pad=500):
    result = []
    lengths = []
    
    for sentence in data:
        converted, leng = convert_and_pad(word_dict, sentence, pad)
        result.append(converted)
        lengths.append(leng)
        
    return np.array(result), np.array(lengths)

In [0]:
train_X, train_X_len = convert_and_pad_data(word_dict, train_X)
test_X, test_X_len = convert_and_pad_data(word_dict, test_X)

In [0]:
# Use this cell to examine one of the processed reviews to make sure everything is working as intended.
x = [each for each in train_X[100] if not each==0]
y = [each for each in train_X[100] if each==0]
print('Post processed review: ',x)
print('Non-zeros(words): ', len(x),' Remaining 0\'s(empty): ', len(y))
print(len(train_X[100]))

Post processed review:  [26, 2, 14, 74, 1024, 23, 260, 26, 132, 155, 70, 1339, 14, 77, 42, 576, 322, 305, 1543, 776, 44, 72, 39, 77, 29, 63, 1, 2373, 5, 5, 2007, 465, 907, 3487, 10, 1, 2373, 1144, 26, 2, 14, 99, 23, 1, 3487, 25, 11, 1684, 77, 16, 77, 132, 23, 1065, 2, 14, 47, 1684, 323, 6, 57, 2, 352, 456, 26, 193, 519, 107, 138, 59, 1571, 161, 48, 624, 44, 72, 4480, 12]
Non-zeros(words):  78  Remaining 0's(empty):  422
500
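
The same check can be made by hand on a tiny input. This sketch restates `convert_and_pad` in condensed form and runs it with a made-up dictionary and `pad=6`, showing both padding of a short review and truncation of a long one.

```python
def convert_and_pad(word_dict, sentence, pad=500):
    NOWORD, INFREQ = 0, 1
    working = [NOWORD] * pad
    for i, word in enumerate(sentence[:pad]):
        # Unknown words map to the 'infrequent' label.
        working[i] = word_dict.get(word, INFREQ)
    return working, min(len(sentence), pad)

toy_dict = {'movi': 2, 'great': 3, 'plot': 4}  # made-up vocabulary

# Short review: 'bad' is not in the dictionary, and the tail is padded with 0.
encoded, length = convert_and_pad(toy_dict, ['great', 'movi', 'bad'], pad=6)
print(encoded, length)  # [3, 2, 1, 0, 0, 0] 3

# Long review: truncated to the pad length.
long_encoded, long_length = convert_and_pad(toy_dict, ['movi'] * 10, pad=6)
print(long_encoded, long_length)  # [2, 2, 2, 2, 2, 2] 6
```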


## Upload the data to S3

### Save the processed training dataset locally

In [0]:
import pandas as pd
    
pd.concat([pd.DataFrame(train_y), pd.DataFrame(train_X_len), pd.DataFrame(train_X)], axis=1) \
        .to_csv(os.path.join(data_dir, 'train.csv'), header=False, index=False)
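
Each row written to `train.csv` holds the label first, then the review length, then the padded review, so with `pad=500` a row has 1 + 1 + 500 = 502 columns. The sketch below round-trips one toy row with the standard `csv` module to show that layout; the values are made up and a pad of 6 keeps it short.

```python
import csv
import io

# Toy values (made up): a positive review of 3 words padded to length 6.
label, length, review = 1, 3, [3, 2, 1, 0, 0, 0]

# Write one row in the same column order: label, length, padded review.
buffer = io.StringIO()
csv.writer(buffer).writerow([label, length] + review)

# Read it back and split the columns apart again.
row = [int(value) for value in next(csv.reader(io.StringIO(buffer.getvalue())))]
row_label, row_length, row_review = row[0], row[1], row[2:]

print(row_label, row_length, row_review)  # 1 3 [3, 2, 1, 0, 0, 0]
```

This is why later cells treat column 0 as the label and everything after it as the model input.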

### Uploading the training data


In [0]:
import sagemaker

sagemaker_session = sagemaker.Session()

bucket = sagemaker_session.default_bucket()
prefix = 'sagemaker/sentiment_rnn'

role = sagemaker.get_execution_role()

In [0]:
input_data = sagemaker_session.upload_data(path=data_dir, bucket=bucket, key_prefix=prefix)

## Build and Train the PyTorch Model


In [0]:
!pygmentize train/model.py

import torch.nn as nn

class LSTMClassifier(nn.Module):
    """
    This is the simple RNN model we will be using to perform Sentiment Analysis.
    """

    def __init__(self, embedding_dim, hidden_dim, vocab_size):
        """
        Initialize the model by setting up the various layers.
        """
        super(LSTMClassifier, self).__init__()

        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        self.dense = nn.Linear(in_features=hidden_dim, out_features=1)
        self.sig = nn.Sigm

In [0]:
import torch
import torch.utils.data

# Read in only the first 250 rows
train_sample = pd.read_csv(os.path.join(data_dir, 'train.csv'), header=None, names=None, nrows=250)

# Turn the input pandas dataframe into tensors
train_sample_y = torch.from_numpy(train_sample[[0]].values).float().squeeze()
train_sample_X = torch.from_numpy(train_sample.drop([0], axis=1).values).long()

# Build the dataset
train_sample_ds = torch.utils.data.TensorDataset(train_sample_X, train_sample_y)
# Build the dataloader
train_sample_dl = torch.utils.data.DataLoader(train_sample_ds, batch_size=50)

### Training method

In [0]:
def train(model, train_loader, epochs, optimizer, loss_fn, device):
    for epoch in range(1, epochs + 1):
        model.train()
        total_loss = 0
        for batch in train_loader:         
            batch_X, batch_y = batch
            
            batch_X = batch_X.to(device)
            batch_y = batch_y.to(device)
            
            # TODO: Complete this train method to train the model provided.
            # clear the gradients of all optimized variables
            optimizer.zero_grad()
            # forward pass: compute predicted outputs by passing inputs to the model
            output = model(batch_X)
            # calculate the batch loss
            loss = loss_fn(output, batch_y)
            # backward pass: compute gradient of the loss with respect to model parameters
            loss.backward()
            # perform a single optimization step (parameter update)
            optimizer.step()
            total_loss += loss.item()
        print("Epoch: {}, BCELoss: {}".format(epoch, total_loss / len(train_loader)))

In [0]:
import torch.optim as optim
from train.model import LSTMClassifier

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LSTMClassifier(32, 100, 5000).to(device)
optimizer = optim.Adam(model.parameters())
loss_fn = torch.nn.BCELoss()

train(model, train_sample_dl, 5, optimizer, loss_fn, device)

Epoch: 1, BCELoss: 0.6919290065765381
Epoch: 2, BCELoss: 0.6814518690109252
Epoch: 3, BCELoss: 0.6716896176338196
Epoch: 4, BCELoss: 0.6600986003875733
Epoch: 5, BCELoss: 0.6447454452514648


### Training the model

In [0]:
from sagemaker.pytorch import PyTorch

estimator = PyTorch(entry_point="train.py",
                    source_dir="train",
                    role=role,
                    framework_version='0.4.0',
                    train_instance_count=1,
                    train_instance_type='ml.m4.4xlarge', # Use a somewhat more powerful SageMaker instance
                    hyperparameters={
                        'epochs': 10,
                        'hidden_dim': 200,
                    })

In [0]:
estimator.fit({'training': input_data})

INFO:sagemaker:Creating training-job with name: sagemaker-pytorch-2019-03-08-19-32-35-134


2019-03-08 19:32:35 Starting - Starting the training job...
2019-03-08 19:32:37 Starting - Launching requested ML instances......
2019-03-08 19:33:39 Starting - Preparing the instances for training...
2019-03-08 19:34:33 Downloading - Downloading input data
2019-03-08 19:34:33 Training - Downloading the training image..
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2019-03-08 19:34:42,248 sagemaker-containers INFO     Imported framework sagemaker_pytorch_container.training
2019-03-08 19:34:42,250 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2019-03-08 19:34:42,264 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2019-03-08 19:34:43,671 sagemaker_pytorch_container.training INFO     Invoking user training script.
2019-03-08 19:34:43,885 sagemaker-containers INFO     Module train does not provide

Epoch: 1, BCELoss: 0.674865647238128
Epoch: 2, BCELoss: 0.6149366279037631
Epoch: 3, BCELoss: 0.5503111141068595
Epoch: 4, BCELoss: 0.4773198414822014
Epoch: 5, BCELoss: 0.42277174093285386
Epoch: 6, BCELoss: 0.3790618613058207
Epoch: 7, BCELoss: 0.3327182774641076
Epoch: 8, BCELoss: 0.31351238854077396
Epoch: 9, BCELoss: 0.28995625127334984

2019-03-08 20:18:30 Uploading - Uploading generated training model
2019-03-08 20:18:30 Completed - Training job completed
Epoch: 10, BCELoss: 0.2736748508652862
2019-03-08 20:18:23,002 sagemaker-containers INFO     Reporting training SUCCESS
Billable seconds: 2645


## Testing the model

## Deploy the model for testing

In [0]:
# TODO: Deploy the trained model
predictor = estimator.deploy(initial_instance_count = 1, instance_type = 'ml.m4.xlarge')

INFO:sagemaker:Creating model with name: sagemaker-pytorch-2019-03-08-19-32-35-134
INFO:sagemaker:Creating endpoint with name sagemaker-pytorch-2019-03-08-19-32-35-134


---------------------------------------------------------------!

## Use the model for testing

In [0]:
test_X = pd.concat([pd.DataFrame(test_X_len), pd.DataFrame(test_X)], axis=1)

In [0]:
# We split the data into chunks and send each chunk separately, accumulating the results.

def predict(data, rows=512):
    split_array = np.array_split(data, int(data.shape[0] / float(rows) + 1))
    predictions = np.array([])
    for array in split_array:
        predictions = np.append(predictions, predictor.predict(array))
    
    return predictions
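
To check that the chunk count above keeps every request at or below `rows`, the arithmetic can be reproduced without NumPy. The sketch below is an illustration only: `chunk_sizes` is a hypothetical helper that mirrors the documented behaviour of `np.array_split`, which gives the first `n % k` chunks one extra row.

```python
def chunk_sizes(n_rows, rows=512):
    """Sizes of the chunks produced by the splitting rule in predict()."""
    n_chunks = int(n_rows / float(rows) + 1)
    base, extra = divmod(n_rows, n_chunks)
    # The first `extra` chunks get one additional row each.
    return [base + 1] * extra + [base] * (n_chunks - extra)

# The test set in this notebook has 25000 reviews.
sizes = chunk_sizes(25000)
print(len(sizes), max(sizes), min(sizes), sum(sizes))  # 49 511 510 25000
```

For the 25000 test reviews this gives 49 requests of 510 or 511 rows each, all under the 512-row limit.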

In [0]:
predictions = predict(test_X.values)
predictions = [round(num) for num in predictions]

In [0]:
from sklearn.metrics import accuracy_score
accuracy_score(test_y, predictions)

0.84084

### Delete the endpoint

In [0]:
estimator.delete_endpoint()

INFO:sagemaker:Deleting endpoint with name: sagemaker-pytorch-2019-03-08-19-32-35-134


## Deploy the model for the web app

In [0]:
!pygmentize serve/predict.py

import argparse
import json
import os
import pickle
import sys
import sagemaker_containers
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data

from model import LSTMClassifier

from utils import review_to_words, convert_and_pad

### Deploying the model

In [0]:
from sagemaker.predictor import RealTimePredictor
from sagemaker.pytorch import PyTorchModel

class StringPredictor(RealTimePredictor):
    def __init__(self, endpoint_name, sagemaker_session):
        super(StringPredictor, self).__init__(endpoint_name, sagemaker_session, content_type='text/plain')

model = PyTorchModel(model_data=estimator.model_data,
                     role = role,
                     framework_version='0.4.0',
                     entry_point='predict.py',
                     source_dir='serve',
                     predictor_cls=StringPredictor)
predictor = model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')

INFO:sagemaker:Creating model with name: sagemaker-pytorch-2019-03-08-20-32-17-532
INFO:sagemaker:Creating endpoint with name sagemaker-pytorch-2019-03-08-20-32-17-532


---------------------------------------------------------------------------!

### Testing the model

In [0]:
import glob

def test_reviews(data_dir='../data/aclImdb', stop=250):
    
    results = []
    ground = []
    
    # We make sure to test both positive and negative reviews    
    for sentiment in ['pos', 'neg']:
        
        path = os.path.join(data_dir, 'test', sentiment, '*.txt')
        files = glob.glob(path)
        
        files_read = 0
        
        print('Starting ', sentiment, ' files')
        
        # Iterate through the files and send them to the predictor
        for f in files:
            with open(f) as review:
                # First, we store the ground truth (was the review positive or negative)
                if sentiment == 'pos':
                    ground.append(1)
                else:
                    ground.append(0)
                # Read in the review and convert to 'utf-8' for transmission via HTTP
                review_input = review.read().encode('utf-8')
                # Send the review to the predictor and store the results
                results.append(float(predictor.predict(review_input)))
                
            # Sending reviews to our endpoint one at a time takes a while so we
            # only send a small number of reviews
            files_read += 1
            if files_read == stop:
                break
            
    return ground, results

In [0]:
ground, results = test_reviews()

Starting  pos  files
Starting  neg  files


In [0]:
from sklearn.metrics import accuracy_score
accuracy_score(ground, results)

0.828

## Use the model for the web app

### Set up a Lambda function

#### Create an IAM Role for the Lambda function

#### Create a Lambda function

```python
# We need to use the low-level library to interact with SageMaker since the SageMaker API
# is not available natively through Lambda.
import boto3

def lambda_handler(event, context):

    # The SageMaker runtime is what allows us to invoke the endpoint that we've created.
    runtime = boto3.Session().client('sagemaker-runtime')

    # Now we use the SageMaker runtime to invoke our endpoint, sending the review we were given
    response = runtime.invoke_endpoint(EndpointName = '**ENDPOINT NAME HERE**',    # The name of the endpoint we created
                                       ContentType = 'text/plain',                 # The data format that is expected
                                       Body = event['body'])                       # The actual review

    # The response is an HTTP response whose body contains the result of our inference
    result = response['Body'].read().decode('utf-8')

    return {
        'statusCode' : 200,
        'headers' : { 'Content-Type' : 'text/plain', 'Access-Control-Allow-Origin' : '*' },
        'body' : result
    }
```

Once you have copied and pasted the code above into the Lambda code editor, replace the `**ENDPOINT NAME HERE**` portion with the name of the endpoint that we deployed earlier. You can determine the name of the endpoint using the code cell below.

In [0]:
# endpoint name
predictor.endpoint

'sagemaker-pytorch-2019-03-08-20-32-17-532'
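
Before wiring the function to API Gateway, it can help to check the event and response shapes locally. The sketch below is an assumption-laden illustration, not the deployed handler: `FakeRuntime` is a hypothetical stand-in for the boto3 `sagemaker-runtime` client, `handle` takes the client and endpoint name as arguments so that no AWS access is needed, and the "model" is a trivial keyword rule.

```python
import io

class FakeRuntime:
    """Hypothetical stand-in for the boto3 'sagemaker-runtime' client."""
    def invoke_endpoint(self, EndpointName, ContentType, Body):
        # Pretend the endpoint labels any review containing 'great' as positive.
        result = '1.0' if 'great' in Body else '0.0'
        return {'Body': io.BytesIO(result.encode('utf-8'))}

def handle(event, runtime, endpoint_name):
    # Same request and response shape as the Lambda handler above.
    response = runtime.invoke_endpoint(EndpointName=endpoint_name,
                                       ContentType='text/plain',
                                       Body=event['body'])
    result = response['Body'].read().decode('utf-8')
    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'text/plain', 'Access-Control-Allow-Origin': '*'},
        'body': result,
    }

# 'hypothetical-endpoint' is a placeholder, not a real endpoint name.
reply = handle({'body': 'A great film.'}, FakeRuntime(), 'hypothetical-endpoint')
print(reply['statusCode'], reply['body'])  # 200 1.0
```

The review text arrives in `event['body']`, and the prediction is returned as plain text in the `body` of the response.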


### Set up API Gateway

## Deploy the web app using

In [0]:
# Finally deleting the endpoint to stop incurring charges
predictor.delete_endpoint()

INFO:sagemaker:Deleting endpoint configuration with name: sagemaker-pytorch-2019-03-08-20-32-17-532
INFO:sagemaker:Deleting endpoint with name: sagemaker-pytorch-2019-03-08-20-32-17-532
