# Modeling sentiment analysis in MXNet with MLP

# What is MLP

A multilayer perceptron (MLP) is a feedforward artificial neural network model that maps sets of input data onto a set of appropriate outputs. An MLP consists of multiple layers of nodes in a directed graph, with each layer fully connected to the next one. Except for the input nodes, each node is a neuron (or processing element) with a nonlinear activation function. An MLP is trained with a supervised learning technique called backpropagation.
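
To make the description above concrete, here is a minimal NumPy sketch of a forward pass through an MLP with one hidden layer. The layer sizes, the random weights, and the helper names `relu` and `mlp_forward` are illustrative assumptions and are not part of this lab; training via backpropagation is not shown.

```python
import numpy as np

def relu(x):
    # Nonlinear activation: negative values become 0, positive values pass through
    return np.maximum(x, 0.0)

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    # Hidden layer: fully connected linear map followed by the nonlinearity
    hidden = relu(x @ w_hidden + b_hidden)
    # Output layer: fully connected linear map producing one score per class
    return hidden @ w_out + b_out

rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 8))                  # 4 samples with 8 input features each
w_hidden = rng.normal(scale=0.1, size=(8, 16))   # 8 inputs -> 16 hidden units
b_hidden = np.zeros(16)
w_out = rng.normal(scale=0.1, size=(16, 2))      # 16 hidden units -> 2 output scores
b_out = np.zeros(2)

scores = mlp_forward(batch, w_hidden, b_hidden, w_out, b_out)
print(scores.shape)  # (4, 2)
```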

<center><img src='images/mlp.svg' width=700></center> 

## What is MXNet

Apache MXNet is an open-source machine learning framework for building, training, and porting deep learning models. MXNet is supported by AWS, Intel, and Microsoft, as well as educational institutions such as MIT, Carnegie Mellon, and the University of Washington. MXNet combines flexibility, high performance, and scalability, which makes it a good choice for many machine learning use cases. MXNet also includes the Gluon library, which provides a clear and intuitive API for deep learning models without sacrificing performance or scalability.

Over the course of these labs, you'll learn how to use MXNet, Gluon, and specialized toolkits to solve a variety of machine learning problems.


# Problem 

Sentiment analysis is the use of natural language processing (NLP) to determine the attitude expressed by an author in a piece of written text towards a topic, for example a movie review. The attitude can be positive, neutral, or negative.
From a machine learning perspective, sentiment analysis can be treated as a classification problem. In this tutorial, we will train an MLP-based model for sentiment analysis.
While there are other algorithms, such as Recurrent Neural Networks (RNNs), that are better at capturing the syntactic structure of a sentence for sentiment analysis, an MLP is a simple, straightforward network that is quick to train.


# How to Use This Tutorial
You can use this tutorial by executing each snippet of Python code in the order it appears in the notebook. An easy way to do so is to click the "run cell, select below" arrow to the left of the "stop" icon in the toolbar. In this tutorial, we will train an MLP on the IMDB dataset, producing a neural network that can predict the sentiment of movie reviews.


# Prerequisites
- Skills: Familiarity with MXNet, Python, NumPy, and the basics of MLP networks.
- Resource: SageMaker Notebook instances.

# Dataset Overview

The training and testing dataset is the IMDB movie review database. It contains a total of 50,000 movie reviews that are tagged (labeled) with either a negative (0) or a positive (1) sentiment. We will split the dataset into 35,000 reviews for training, 10,050 for validation, and 4,950 for testing. Refer to the official [dataset documentation](https://ai.stanford.edu/~amaas/data/sentiment/) for more details.
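
The split sizes quoted above can be reproduced with a little arithmetic. The sketch below assumes a two-step split (30% of all reviews held out first, then 33% of that held-out part reserved for testing) and assumes the held-out count is rounded up; the helper `split_sizes` is hypothetical and only illustrates the calculation.

```python
import math

def split_sizes(n_samples, held_out_fraction):
    # Assumption: the held-out part is rounded up and the remainder stays in the first part.
    # round(..., 6) guards against tiny floating-point error before taking the ceiling.
    n_held_out = math.ceil(round(n_samples * held_out_fraction, 6))
    return n_samples - n_held_out, n_held_out

n_train, n_held_out = split_sizes(50_000, 0.30)   # training vs. everything else
n_val, n_test = split_sizes(n_held_out, 0.33)     # validation vs. testing

print(n_train, n_val, n_test)  # 35000 10050 4950
```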

Below, we download the IMDB dataset locally and un-archive it.

In [None]:
# Download dataset from public sources ~3 minutes
!wget https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz

In [None]:
!tar zxf aclImdb_v1.tar.gz # un-archive dataset ~3 minutes
!rm -r aclImdb_v1.tar.gz # deleting tar.gz 
!ls -l aclImdb # let's peek inside un-archived dataset and confirm that it's there

# Load Modules

In [None]:
# Load all the libraries and modules
import numpy as np
import sys
import os
import re

from text import Tokenizer
from matplotlib import pyplot
from six.moves.urllib.request import urlopen
from sequence import pad_sequences
from sklearn.model_selection import train_test_split 

from IPython.display import display 
from ipywidgets import widgets

# Enable logging so we will see output during the training
import logging
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

# import MXNet packages
import mxnet as mx
from mxnet import nd
from mxnet.gluon import nn, loss, Trainer
from mxnet import init
from mxnet.gluon.contrib.estimator import estimator
from mxnet.gluon.contrib.estimator.event_handler import TrainBegin, TrainEnd, EpochEnd, CheckpointHandler # this is optional
from mxnet.gluon.data import DataLoader, ArrayDataset

# Process Movie Review Data

The raw reviews are in the aclImdb directory. We will process the unarchived raw reviews into training, validation, and test datasets. Additionally, we need to convert the text into a machine-friendly format. For the sentiment analysis problem, it is simple and computationally cheap to represent all words as a vocabulary consisting of {index: word} pairs (e.g. {0 : "movie"}). We then use the word indices as input to our model. The process of representing words as numerical values is called "encoding".
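
As a rough illustration of encoding, the pure-Python sketch below builds a small frequency-ranked vocabulary and replaces each word with its index. It does not reproduce the `Tokenizer` class used in this lab: the helper names are hypothetical, index 0 is reserved for out-of-vocabulary words by assumption, and the tokenization is a plain whitespace split.

```python
from collections import Counter

def build_vocab(texts, vocab_size):
    # Count how often each lowercase word occurs across all texts
    counts = Counter(word for text in texts for word in text.lower().split())
    # Index 0 is reserved for unknown words, so keep the (vocab_size - 1) most frequent words
    most_common = counts.most_common(vocab_size - 1)
    return {word: index for index, (word, _) in enumerate(most_common, start=1)}

def encode(text, vocab):
    # Replace every word with its index; words outside the vocabulary become 0
    return [vocab.get(word, 0) for word in text.lower().split()]

sample_texts = ["great movie great cast great", "boring movie"]
sample_vocab = build_vocab(sample_texts, vocab_size=3)

print(sample_vocab)                                # {'great': 1, 'movie': 2}
print(encode("great boring movie", sample_vocab))  # [1, 0, 2]
```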

## LAB INSTRUCTION
- Enter **10000** as the value for the **vocabsize** variable. This is the limit on the size of the word vocabulary. Any word outside the vocabulary will be encoded as 0.

In [None]:
# We specify number of words to index and this is also the size of vocabulary
vocabsize =  #Follow instruction above

# This is the directory where the raw review data is located
path = "aclImdb/"

# List all the files for the reviews in the following directories
ff = [path + "train/pos/" + x for x in os.listdir(path + "train/pos")] + \
     [path + "train/neg/" + x for x in os.listdir(path + "train/neg")] + \
     [path + "test/pos/" + x for x in os.listdir(path + "test/pos")] + \
     [path + "test/neg/" + x for x in os.listdir(path + "test/neg")]

# Find all HTML tags using following regex pattern
TAG_RE = re.compile(r'<[^>]+>')

# Remove all found HTML tags
def remove_tags(text):
    return TAG_RE.sub('', text)

input_label = ([1] * 12500 + [0] * 12500) * 2
input_text  = []

for f in ff:
    with open(f) as fin:
        # Read the whole review and strip any HTML tags from it
        input_text += [remove_tags(" ".join(fin.readlines()))]
            
# Initialize a tokenizer with the vocabulary size and train on data input text to create a vocabulary for all 
# the unique words found in the text inputs
tok = Tokenizer(vocabsize)
tok.fit_on_texts(input_text)

        
# Create training (70% of reviews), validation (about 20% of reviews), and testing (about 10% of reviews) datasets.
# Words will be replaced with their word indexes.
tok_input_text = tok.texts_to_sequences(input_text)
X_train, X_val, y_train, y_val = train_test_split(tok_input_text, input_label, test_size = 0.3, random_state=1)
X_val, X_test, y_val, y_test = train_test_split(X_val, y_val, test_size = 0.33, random_state=1)

print("Reviews in training dataset %d" % len(X_train))
print("Reviews in validation dataset %d" % len(X_val))
print("Reviews in testing dataset %d" % len(X_test))


Let's take a look at some basic metrics of the datasets, including the number of unique words, the unique label values, and the mean and standard deviation of the review length.

## LAB INSTRUCTION:
- Answer the following questions
   - What are the unique label values?
   - What is the mean size of all the review texts?

In [None]:
# Let's do some analysis of the data

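# Combine training and validation reviews with plain list concatenation,
# because the reviews have different lengths and do not form a rectangular array
X = list(X_train) + list(X_val)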

# Summarize review length
print("Number of unique words : %i" % len(np.unique(np.hstack(X))))
print ('')
print ("Label value")
print (np.unique(y_train))
print ('')
print("Review length: ")

result = [len(x) for x in X]
print("Mean %.2f words with %f standard deviation" % (np.mean(result), np.std(result)))

# plot review length distribution
pyplot.boxplot(result)
pyplot.show()

# Additional Data Processing

We will pad the data to a fixed length and create [MXNet DataLoader objects](https://beta.mxnet.io/api/gluon/_autogen/mxnet.gluon.data.DataLoader.html) to be used for training later.
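
To show what padding to a fixed length means, here is a minimal NumPy sketch. It assumes zeros are added at the front of short reviews and that long reviews keep only their last `maxlen` tokens, which is the default behavior of Keras-style `pad_sequences`; whether the `pad_sequences` helper used in this lab behaves identically is an assumption. The function name `pad_to_length` is hypothetical.

```python
import numpy as np

def pad_to_length(sequences, maxlen, pad_value=0):
    # Start from a matrix filled with the padding value
    padded = np.full((len(sequences), maxlen), pad_value, dtype=np.int64)
    for row, seq in enumerate(sequences):
        if not seq:
            continue  # an empty review stays all padding
        kept = seq[-maxlen:]                # keep at most the last maxlen tokens
        padded[row, -len(kept):] = kept     # right-align the tokens, zeros go in front
    return padded

example = pad_to_length([[5, 3], [1, 2, 3, 4, 5, 6]], maxlen=4)
print(example)
# [[0 0 5 3]
#  [3 4 5 6]]
```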

## LAB INSTRUCTIONS
- Enter **500** as the value for the **maxtextlen** variable.
- Enter **250** as the value for the **Batch_Size** variable.
- Answer the questions below
    - Why do you think we want the maximum length to be 500?


In [None]:
# Maximum text length for each review in the training data
maxtextlen =    #Follow instruction above
Batch_Size =    #Follow instruction above

# Specify the maximum length of the reviews we want to process and pad the training and test data 

X_train = pad_sequences(X_train, maxlen=maxtextlen)
X_val = pad_sequences(X_val, maxlen=maxtextlen)
X_test = pad_sequences(X_test, maxlen=maxtextlen)

# Convert the label lists to NumPy arrays, as mxnet.gluon.data.ArrayDataset accepts NumPy arrays
y_train = np.asarray(y_train)
y_val = np.asarray(y_val)
y_test = np.asarray(y_test)


# Create DataLoaders which return batches of features and labels during the training and validation processes.
# Please note that the datasets are shuffled to ensure randomness of the data.
train_data = ArrayDataset(X_train,y_train)
train_data = DataLoader(train_data, batch_size=Batch_Size, shuffle=True)

val_data = ArrayDataset(X_val, y_val)
val_data = DataLoader(val_data, batch_size=Batch_Size, shuffle=True)

# Review Sample Input Data

## LAB INSTRUCTION
- Answer the following questions:
    - What does each integer represent in the vector?
    - What is the length of the vector?

In [None]:
# Let's also take a look at 1 row of the training data
# The integers represent a word in the original text 
print ('Review Example - Coded with word index')
print (X_train[0])

# Build an MLP Network

We will build a simple MLP network with an embedding layer, one hidden dense layer, and an output layer to classify the sentiment of a movie review as negative or positive.

## Understanding model output
Because we need to predict a binary category (positive or negative sentiment), our output layer has only 2 units, one for each category. The numeric values in these two units determine whether the network predicts positive or negative sentiment for a given review.

To make the model outputs easier to interpret, we apply the [Softmax function](https://en.wikipedia.org/wiki/Softmax_function) to them. It takes a vector of arbitrary real numbers as input and normalizes it into a probability distribution with the following properties:
- the probabilities sum to 1;
- each probability lies within (0, 1).

This allows us to take any model output for K categories and represent it as the probability of each of the K categories being true.
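
The two properties listed above can be checked with a small NumPy sketch of softmax. The function name `softmax_rows` and the example scores are illustrative; subtracting the row maximum is a common numerical-stability trick and does not change the result.

```python
import numpy as np

def softmax_rows(scores):
    # Subtract the row maximum so that exp() cannot overflow
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exponentials = np.exp(shifted)
    # Normalize each row so that it sums to 1
    return exponentials / exponentials.sum(axis=-1, keepdims=True)

example_scores = np.array([[2.0, -1.0],   # the first class is clearly preferred
                           [0.0, 0.0]])   # both classes are equally likely
example_probs = softmax_rows(example_scores)

print(example_probs)
print(example_probs.sum(axis=1))  # every row sums to 1
```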

## Training model using loss function
In most cases, machine learning models are trained using a loss function. A loss function takes the model predictions and the true labels and scores how far the predictions are from the true labels. If you have a good model and good data, you will see the loss score decrease during training, which indicates that the model is learning from the data and making more accurate predictions over time. For a binary classification problem such as sentiment analysis, it is common to use the [cross-entropy loss function](https://en.wikipedia.org/wiki/Cross_entropy#Cross-entropy_loss_function_and_logistic_regression), which compares the distribution of the true labels with the model predictions.
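
As a rough illustration of how cross-entropy scores predictions, the NumPy sketch below computes the per-example loss from raw model scores and integer labels. It is a hand-written sketch with a hypothetical function name, not the Gluon loss class itself; the example scores are made up.

```python
import numpy as np

def cross_entropy_from_scores(scores, true_labels):
    # Log-softmax computed in a numerically stable way
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # The loss is the negative log-probability assigned to the true class
    return -log_probs[np.arange(len(true_labels)), true_labels]

example_scores = np.array([[0.0, 0.0],    # undecided prediction
                           [4.0, -4.0],   # confident in class 0
                           [4.0, -4.0]])  # confident in class 0
example_labels = np.array([1, 0, 1])      # the third prediction is confidently wrong

example_losses = cross_entropy_from_scores(example_scores, example_labels)
# Confident and correct gives the smallest loss, confident and wrong gives the largest
print(example_losses)
```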

In [None]:
# Create MLP network using MXNet Gluon library

# We define a new function build_net which takes Gluon block as an input and stacks layers on top of it.
# Function returns MLP network ready for training.
def build_net(net):
    with net.name_scope():
        # We embed the integer representation for each word into a vector of size 32.
        # Embedding is a technique that places related words close together in vector space.
        # This helps improve the accuracy of model.
        # input_dim is the size of the vocabulary.  output_dim is the dimension of the output embedded vector.
        net.add(nn.Embedding(input_dim=vocabsize, output_dim=32))

        # The output from the embedding layer is a 2-dimensional matrix per review (words x embedding size).
        # Since the MLP only accepts a 1-dimensional vector per review, we need to flatten it back to one dimension.
        net.add(nn.Flatten())

        # We create a fully connected (in other words, densely connected) layer with 250 neurons.
        # This layer takes the flattened input and performs a linear calculation on the input data: f(x) = ⟨w, x⟩ + b
        # Please note that we specify the "relu" activation so the MLP model can "learn" non-linear data patterns.
        net.add(nn.Dense(units=250, activation="relu"))


        # Choose whether to add a Dropout layer to introduce regularization.
        # If the dropout rate is 0, then the Dropout layer won't be added.
        dropout_rate = 0.5 
        if dropout_rate > 0.0:
            net.add(nn.Dropout(dropout_rate))

        # We add the output layer with 2 units, as we have 2 desired outputs (1, 0) - positive or negative review.
        net.add(nn.Dense(units=2))
        
        return net

# MLP model is simple feed forward network architecture, 
# so we'll use Gluon Sequential block, which stacks layers sequentially one on top of another.
net = build_net(nn.Sequential())

# mxnet.gluon.loss.SoftmaxCrossEntropyLoss includes softmax function. 
# Therefore, we didn't include "softmax" activation into output Dense layer above.
softmax_cross_entropy = loss.SoftmaxCrossEntropyLoss() 

# Model Training

Now we are ready to train the model.  We also need to define some hyper-parameters for model training.

## LAB INSTRUCTION
- Enter **10** as the value for variable **num_epoch**  - (This is number of epochs to train the model)
- Enter **"adam"** as the value for variable **optimizer** - (This is the optimizer for updating the weights)
- Enter **mx.metric.Accuracy()** as the value for variable **eval_metric**  (This is the performance evaluation metric)
- Enter **0.01** as the value for variable **learning_rate** - (This parameter defines how much we adjust the weights of our network during training).


In [None]:
# training parameters
num_epoch =        # Follow instruction above 
optimizer =        # Follow instruction above
eval_metric =      # Follow instruction above
learning_rate =    # Follow instruction above

# MXNet allows users to choose whether to run computation on GPU or CPU devices. 
# Code line below defaults to computation on GPU if it's available.
device = mx.gpu() if mx.context.num_gpus() > 0 else mx.cpu()

# To successfully train our model, we need to initialize model parameters (weights and biases).
# We draw the initial parameter values from a normal distribution using the mx.init package.
net.collect_params().initialize(mx.init.Normal(sigma=.1), force_reinit=True, ctx=device)

# model training
trainer = Trainer(net.collect_params(), optimizer, {'learning_rate': learning_rate})


# Define the estimator, by passing to it the model, loss function, metrics, trainer object and context
est = estimator.Estimator(net=net,
                          loss=softmax_cross_entropy,
                          val_metrics=eval_metric,
                          trainer=trainer,
                          context=device)

est.fit(train_data=train_data, val_data=val_data, epochs=num_epoch)

## LAB INSTRUCTION

- Add a new cell by clicking on the **"+"** sign on the tool bar  
- Type **net.summary(nd.ones((2500,500), ctx=device))** in the new cell and run the cell to visualize the network

Answer the following questions:
- What are the params of the Embedding and Dense layers? What happened to the layer params during model training?
- Why do the Activation and Dropout layers have no params?
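
One way to sanity-check the numbers reported by the summary is to count the trainable parameters by hand. The sketch below assumes the values used in this lab (a vocabulary of 10,000 words, 32-dimensional embeddings, reviews padded to 500 tokens, a 250-unit hidden layer, and a 2-unit output layer); the helper names are hypothetical.

```python
def embedding_param_count(vocab_size, embedding_dim):
    # One embedding vector per word in the vocabulary, no bias
    return vocab_size * embedding_dim

def dense_param_count(in_units, out_units):
    # One weight per input/output pair plus one bias per output unit
    return in_units * out_units + out_units

vocab_size, embedding_dim, review_length = 10_000, 32, 500

param_counts = {
    "embedding": embedding_param_count(vocab_size, embedding_dim),
    # Flatten turns each review into review_length * embedding_dim features
    "dense_hidden": dense_param_count(review_length * embedding_dim, 250),
    "dense_output": dense_param_count(250, 2),
}

print(param_counts)                # {'embedding': 320000, 'dense_hidden': 4000250, 'dense_output': 502}
print(sum(param_counts.values()))  # 4320752
```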

In [None]:
net.summary(nd.ones((2500,500), ctx=device))

# Model Evaluation
We evaluated the model during training using the validation dataset. Now let's evaluate accuracy on the test dataset, which was "unseen" by the model during training.

Answer the following questions:
- Is the model accuracy on the test dataset comparable to the model accuracy on the training and validation datasets?
- How can you explain the difference?
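
For reference, accuracy is simply the fraction of reviews whose predicted class matches the true label. The NumPy sketch below illustrates the calculation on made-up scores; the function name and the example values are hypothetical and independent of the trained model.

```python
import numpy as np

def accuracy_from_scores(scores, true_labels):
    # The predicted class is the index of the largest score in each row
    predicted = scores.argmax(axis=1)
    # Accuracy is the share of predictions that match the true labels
    return float((predicted == true_labels).mean())

example_scores = np.array([[2.0, -1.0],
                           [0.5, 1.5],
                           [-0.2, 0.3],
                           [1.0, 0.0]])
example_labels = np.array([0, 1, 0, 0])

print(accuracy_from_scores(example_scores, example_labels))  # 0.75
```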

In [None]:
# Convert evaluation data and labels into MXNet NDArray class as Gluon model expects it.
labels = mx.nd.array(y_test, ctx=device)
test_data = mx.nd.array(X_test, ctx=device)

# Get prediction using trained model and doing one pass forward using net(x) method
predictions = net(test_data)
# Convert float model output to binary values: 0 (negative) or 1 (positive)
predictions = predictions.argmax(axis=1)

# Create MXNet Metric object to evaluate accuracy of predictions.
metric = mx.metric.Accuracy(axis = 0)
metric.update(preds = predictions, labels = labels)
print(metric.get())

# Saving The Model

Now that the model is fully trained, we can save it for later use.


## LAB INSTRUCTION
- After running the cell below, check that the file with the parameters was created.

In [None]:
# Save the model
filename = "mlp.params"
net.save_parameters(filename)

---
# Making Predictions


# Load Saved Model



In [None]:
# Let's create a new network with the exact same architecture and load the previously trained parameters.
new_net = build_net(nn.Sequential())
new_net.load_parameters(filename, ctx=device)

In [None]:
# Some helper functions for making predictions

# This function takes a text string and returns an array of word indexes
def prepare_imdb_list(text, maxlen=500, vocabsize=10000):
    imdb_word_index = tok.word_index
    
    sentence = []

    sentence.append(str(text))
    

    #tokenize the input sentence
    tokens = Tokenizer()
    tokens.fit_on_texts(sentence)

    # get a list of words from the encoding, ordered by their index in the sentence tokenizer
    words = []
    for rank in range(len(tokens.word_index)):
        words += [key for key, value in tokens.word_index.items() if value == rank + 1]
    
    # create an IMDB-based sequence from the words and the specified vocab size
    imdb_seq = []
    err_count = 0
    for w in words:
        try:
            idx = imdb_word_index[w]
            if idx < vocabsize:
                imdb_seq.append(idx)
        except KeyError:
            # the word does not appear in the IMDB vocabulary, so skip it
            err_count = err_count + 1

    # next we need to create a list of lists so we can use pad_sequences to pad the inputs
    new_list = []
    new_list.append(imdb_seq)

    new_list = pad_sequences(new_list, maxlen=maxlen)
    
    return new_list


def predict_sentiment(model, text_nd):
    # Convert input data into expected MXNet NDArray format
    pred_data = mx.nd.array(text_nd, ctx=device)
    
    # Get prediction using trained model and doing one pass forward using net(x) method
    predictions = model(pred_data)
    predictions = nd.softmax(predictions, axis=1) # convert float model output into softmax probabilities
    
    return predictions
    

# Sample Movie Review Text For Testing

You can use the samples below - or any other review text - to try out the predictive power of the model. 

## Negative sentiment review samples
- Blake Edwards' legendary fiasco, begins to seem pointless after just 10 minutes. A combination of The Eagle Has Landed, Star!, Oh! What a Lovely War!, and Edwards' Pink Panther films, Darling Lili never engages the viewer; the aerial sequences, the musical numbers, the romance, the comedy, and the espionage are all ho hum. At what point is the viewer supposed to give a damn? This disaster wavers in tone, never decides what it wants to be, and apparently thinks it's a spoof, but it's pathetically and grindingly square. Old fashioned in the worst sense, audiences understandably stayed away in droves. It's awful. James Garner would have been a vast improvement over Hudson who is just cardboard, and he doesn't connect with Andrews and vice versa. And both Andrews and Hudson don't seem to have been let in on the joke and perform with a miscalculated earnestness. Blake Edwards' SOB isn't much more than OK, but it's the only good that ever came out of Darling Lili. The expensive and professional look of much of Darling Lili, only make what it's all lavished on even more difficult to bear. To quote Paramount chief Robert Evans, 24 million dollars worth of film and no picture.

- A mean spirited, repulsive horror film about 3 murderous children. Susan Strasberg is totally wasted in a 5-minute cameo, even though she receives star billing. If you are a Julie Brown fan, you'll want to check it out, since she's naked in a couple of shots. All others,avoid.


## Positive sentiment review samples
- I went and saw this movie last night after being coaxed to by a few friends of mine. I'll admit that I was reluctant to see it because from what I knew of Ashton Kutcher he was only able to do comedy. I was wrong. Kutcher played the character of Jake Fischer very well, and Kevin Costner played Ben Randall with such professionalism. The sign of a good movie is that it can toy with our emotions. This one did exactly that. The entire theater (which was sold out) was overcome by laughter during the first half of the movie, and were moved to tears during the second half. While exiting the theater I not only saw many women in tears, but many full grown men as well, trying desperately not to let anyone see them crying. This movie was great, and I suggest that you go see it before you judge.

- This is one of my three all-time favorite movies. My only quibble is that the director, Peter Yates, had too many cuts showing the actors individually instead of together as a scene, but the performances were so great I forgive him. Albert Finney and Tom are absolutely marvelous; brilliant. The script is great, giving a very good picture of life in the theatre during World War II (and, therefore, what it was like in the 30s as well). Lots of great, subtle touches, lots of broad, overplayed strokes, all of it perfectly done. Scene after scene just blows me away, and then there's the heartbreaking climax.

Enter your review in the variable **text_to_predict** below, then run the next cell to predict the sentiment of this text snippet. An example is shown below.

In [None]:
text_to_predict = """
I went and saw this movie last night after being coaxed to by a few friends of mine. 
I'll admit that I was reluctant to see it because from what I knew of Ashton Kutcher he was 
only able to do comedy. I was wrong. Kutcher played the character of Jake Fischer very well,
and Kevin Costner played Ben Randall with such professionalism. The sign of a good movie is 
that it can toy with our emotions. This one did exactly that. The entire theater (which was sold 
out) was overcome by laughter during the first half of the movie, and were moved to tears 
during the second half. While exiting the theater I not only saw many women in tears, 
but many full grown men as well, trying desperately not to let anyone see them crying. 
This movie was great, and I suggest that you go see it before you judge.
"""

In [None]:
text_nd = prepare_imdb_list(text_to_predict)   
predictions = predict_sentiment(new_net, text_nd)

print('Probability for negative sentiment (0):  %0.4f ' % predictions.asnumpy()[0, 0])
print('Probability for positive sentiment (1):   %0.4f ' % predictions.asnumpy()[0, 1])
