# Sentiment Analysis

This notebook uses TFLearn to train a sentiment analyzer on a set of IMDB movie reviews. Once trained, it classifies a piece of input text as either positive or negative. The model is a recurrent neural network built on a Long Short-Term Memory (LSTM) layer.



##### Short digression on LSTM

In a traditional recurrent neural network, during the gradient back-propagation phase, the gradient signal can end up being multiplied a large number of times (as many as the number of timesteps) by the weight matrix associated with the connections between the neurons of the recurrent hidden layer. This means that the magnitude of the weights in the transition matrix can have a strong impact on the learning process.

If the weights in this matrix are small (or, more formally, if the leading eigenvalue of the weight matrix is smaller than 1.0), it can lead to a situation called vanishing gradients, where the gradient signal gets so small that learning either becomes very slow or stops working altogether. It can also make it harder to learn long-term dependencies in the data. Conversely, if the weights in this matrix are large (or, again, more formally, if the leading eigenvalue of the weight matrix is larger than 1.0), the gradient signal can become so large that learning diverges. This is often referred to as exploding gradients.
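
The effect of repeated multiplication is easy to see with a single number. The sketch below is only an illustration: it replaces the weight matrix with one scalar weight (standing in for the leading eigenvalue), and the function name `repeated_scale` is made up for this example.

```python
def repeated_scale(weight, timesteps, signal=1.0):
    """Multiply a gradient signal by the same recurrent weight once per timestep."""
    for _ in range(timesteps):
        signal *= weight
    return signal

vanishing = repeated_scale(0.9, 100)  # weight below 1.0: the signal shrinks
exploding = repeated_scale(1.1, 100)  # weight above 1.0: the signal blows up
print(f"0.9 applied 100 times: {vanishing:.3e}")
print(f"1.1 applied 100 times: {exploding:.3e}")
```

After 100 timesteps the first signal is smaller than 1e-4 and the second is larger than 1e4, even though both weights are close to 1.0.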

These issues are the main motivation behind the LSTM model which introduces a new structure called a memory cell (see Figure 1 below). A memory cell is composed of four main elements: an input gate, a neuron with a self-recurrent connection (a connection to itself), a forget gate and an output gate. The self-recurrent connection has a weight of 1.0 and ensures that, barring any outside interference, the state of a memory cell can remain constant from one timestep to another. The gates serve to modulate the interactions between the memory cell itself and its environment. The input gate can allow incoming signal to alter the state of the memory cell or block it. On the other hand, the output gate can allow the state of the memory cell to have an effect on other neurons or prevent it. Finally, the forget gate can modulate the memory cell's self-recurrent connection, allowing the cell to remember or forget its previous state, as needed.
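
As a rough illustration of how the gates interact, here is a minimal sketch of one timestep of a memory cell with scalar inputs and state. It follows the commonly used LSTM update equations, not TFLearn's internal code; the function `lstm_cell_step`, the weight layout, and the numeric values are all assumptions chosen for the example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, w):
    """One timestep of a scalar LSTM cell; `w` maps a gate name to (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = w[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))        # input gate: how much new signal enters the cell
    f = sigmoid(pre("forget"))       # forget gate: how much of the old state is kept
    o = sigmoid(pre("output"))       # output gate: how much of the state is exposed
    g = math.tanh(pre("candidate"))  # candidate value offered to the cell
    c = f * c_prev + i * g           # self-recurrent update of the memory cell
    h = o * math.tanh(c)             # what the other neurons get to see
    return h, c

# Hypothetical weights: input gate forced shut, forget gate forced open.
weights = {
    "input": (0.0, 0.0, -20.0),
    "forget": (0.0, 0.0, 20.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 1.0, 0.0),
}
h, c = lstm_cell_step(x=3.0, h_prev=0.2, c_prev=0.5, w=weights)
print(c)  # stays very close to the previous state, 0.5
```

With the input gate closed and the forget gate open, the cell state carries over almost unchanged, which is the "remain constant from one timestep to another" behaviour described above.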

###### source: http://deeplearning.net/tutorial/lstm.html

In [1]:
'''TFlearn is a modular and transparent deep learning library built 
on top of Tensorflow. It was designed to provide a higher-level API 
to TensorFlow in order to facilitate and speed-up experimentations, 
while remaining fully transparent and compatible with it.'''
import tflearn
from tflearn.data_utils import to_categorical, pad_sequences



In [2]:
# Import the IMDB dataset loader (it downloads the data on first use)
from tflearn.datasets import imdb

In [3]:
# Load the IMDB dataset.
#   n_words: keep only the n_words most frequent words
#   valid_portion: fraction of the training data held out for validation
train, test, _ = imdb.load_data(path='imdb.pkl', n_words=10000,
                                valid_portion=0.1)

# The second split returned is the held-out portion set by valid_portion;
# this notebook uses it as its test set.
trainX, trainY = train
testX, testY = test

In [4]:
# Data processing
# Sequence padding:
#   pad_sequences makes every sequence in a list the same length.
#   With maxlen=100, shorter sequences are padded with 0 (value=0.) and
#   longer ones are truncated to 100 tokens. TFLearn pads and truncates
#   at the end of the sequence by default (padding='post').
trainX = pad_sequences(trainX, maxlen=100, value=0.)
testX = pad_sequences(testX, maxlen=100, value=0.)
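
To make the effect of `maxlen` concrete, here is a small pure-Python sketch of padding and truncating to a fixed length. It is not TFLearn's implementation: `pad_or_truncate` is a hypothetical helper, and padding at the end of the sequence is an assumption of this sketch (libraries expose the side as a setting).

```python
def pad_or_truncate(seq, maxlen, value=0):
    """Return `seq` cut, or right-padded with `value`, to exactly `maxlen` items."""
    seq = list(seq)[:maxlen]
    return seq + [value] * (maxlen - len(seq))

short_seq = pad_or_truncate([5, 8, 2], maxlen=5)
long_seq = pad_or_truncate([1, 2, 3, 4, 5, 6, 7], maxlen=5)
print(short_seq)  # [5, 8, 2, 0, 0]
print(long_seq)   # [1, 2, 3, 4, 5]
```

Every review ends up as a vector of the same length, which is what the fixed-size input layer of the network expects.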

In [6]:
# Convert labels to binary vectors:
#   to_categorical turns a vector of integer class labels
#   into a one-hot (binary) class matrix
trainY = to_categorical(trainY, nb_classes=2)
testY = to_categorical(testY, nb_classes=2)
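
The conversion itself is simple. The sketch below shows the idea with plain lists; `one_hot` is a hypothetical stand-in, not the TFLearn function, and the sample labels are made up.

```python
def one_hot(labels, nb_classes):
    """Turn integer class labels into one-hot rows of length `nb_classes`."""
    rows = []
    for label in labels:
        row = [0.0] * nb_classes
        row[label] = 1.0  # mark the position of the true class
        rows.append(row)
    return rows

encoded = one_hot([0, 1, 1], nb_classes=2)
print(encoded)  # [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
```

Each row has one entry per class, which lines up with the two-unit softmax output used later in the network.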

In [7]:
# Network building
# Input layer: batches of any size, each sample a sequence of 100 word ids
net = tflearn.input_data([None, 100])




In [8]:
''' Machine learning models take vectors (arrays of numbers) as input.
When working with text, the first thing we must do is come up with a
strategy to convert strings to numbers (or to "vectorize" the text)
before feeding it to the model.
Word embeddings give us a way to use an efficient, dense representation
in which similar words have a similar encoding. Importantly, we do not
have to specify this encoding by hand. An embedding is a dense vector of
floating point values (the length of the vector is a parameter you specify).
Instead of specifying the values for the embedding manually, they are
trainable parameters (weights learned by the model during training, in
the same way a model learns weights for a dense layer).
MORE: https://www.tensorflow.org/tutorials/text/word_embeddings
'''
# input_dim=10000 matches the vocabulary size (n_words=10000 when the
# data was loaded), so every word id has its own row in the embedding
# table; output_dim=128 is the length of each embedding vector
net = tflearn.embedding(net, input_dim=10000, output_dim=128)

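
Conceptually, an embedding layer is a lookup table with one row per word id. The sketch below illustrates that idea with toy sizes; the helper names and the random initialisation range are assumptions for the example, and in the real layer the rows are adjusted during training.

```python
import random

def make_embedding_table(input_dim, output_dim, seed=0):
    """Build a table with one small random vector per word id."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(output_dim)]
            for _ in range(input_dim)]

def embed(table, word_ids):
    """Replace each word id by its row in the table."""
    return [table[i] for i in word_ids]

table = make_embedding_table(input_dim=10, output_dim=4)  # toy sizes, not 10000 x 128
vectors = embed(table, [3, 7, 3])
print(len(vectors), len(vectors[0]))  # 3 4
```

The same word id always maps to the same vector, so a sequence of 100 word ids becomes a sequence of 100 dense vectors.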


In [9]:
# Long Short-Term Memory. This layer allows the network to remember data
# from the beginning of the sequence, which improves the prediction.
# dropout is set to 0.8. Dropout is a technique that helps prevent
# overfitting by randomly switching pathways in the network on and off;
# TFLearn treats this value as the probability of keeping a unit.
net = tflearn.lstm(net, 128, dropout=0.8)

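
The following sketch shows what dropout does to a list of activations, assuming the `dropout` value passed to the layer is a keep probability. The function is a plain-Python illustration, not TFLearn's code, and scaling the surviving values by `1 / keep_prob` is the usual convention assumed here.

```python
import random

def dropout(values, keep_prob, rng):
    """Keep each value with probability `keep_prob` and scale survivors by 1/keep_prob."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

rng = random.Random(0)
activations = [1.0] * 10000
dropped = dropout(activations, keep_prob=0.8, rng=rng)
kept_fraction = sum(1 for v in dropped if v != 0.0) / len(dropped)
print(f"fraction of units kept: {kept_fraction:.3f}")
```

Roughly 80% of the units survive each pass, and the rescaling keeps the expected total activation unchanged.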


In [10]:
# Our next layer is fully connected which means that every neuron in
# the previous layer is connected to every neuron in this layer. We have
# a set of learned feature vectors from previous layers, and adding a fully
# connected layer is a computationally cheap way of learning non-linear
# combinations of them
net = tflearn.fully_connected(net, 2, activation='softmax')
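
The `softmax` activation turns the two raw outputs of this layer into probabilities that sum to 1. A minimal sketch, with made-up input scores:

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 0.5])
print(probs)
```

The larger score receives the larger probability, so the predicted class is simply the index of the biggest entry.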

In [11]:
# Adam: an adaptive variant of gradient descent; the loss to minimize
# is the categorical cross-entropy
net = tflearn.regression(net, optimizer='adam', learning_rate=0.001,
                         loss='categorical_crossentropy')


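
For a single example, categorical cross-entropy compares the one-hot label with the predicted probabilities. The sketch below is an illustration with made-up predictions; the clipping constant `eps` is an assumption added only to avoid `log(0)`.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-10):
    """Cross-entropy between a one-hot label and predicted class probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

confident_right = categorical_crossentropy([0.0, 1.0], [0.1, 0.9])
confident_wrong = categorical_crossentropy([0.0, 1.0], [0.9, 0.1])
print(confident_right, confident_wrong)
```

The loss is small when the network puts most of the probability on the true class and large when it is confidently wrong, which is the signal the optimizer uses to update the weights.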


In [None]:
# Training
model = tflearn.DNN(net, tensorboard_verbose=0)
model.fit(trainX, trainY, validation_set=(testX, testY), show_metric=True,
          batch_size=32)

Training Step: 1885  | total loss: 0.24049 | time: 54.871s
| Adam | epoch: 003 | loss: 0.24049 - acc: 0.9018 -- iter: 15264/22500
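
The numbers in this log line can be cross-checked with a little arithmetic. The sketch below assumes that one training step processes one batch and that the last, partial batch of an epoch still counts as a step; the variable names are chosen for this example.

```python
import math

train_examples = 22500   # denominator of "iter: 15264/22500" in the log
batch_size = 32          # batch_size passed to model.fit
completed_epochs = 2     # the log shows epoch 003 in progress
examples_seen = 15264    # numerator of "iter: 15264/22500"

steps_per_epoch = math.ceil(train_examples / batch_size)
current_step = completed_epochs * steps_per_epoch + examples_seen // batch_size
print(steps_per_epoch, current_step)  # 704 1885
```

The result matches the reported `Training Step: 1885`. The 22500 training examples are also consistent with keeping 90% of a 25,000-review training set when `valid_portion=0.1`, assuming the usual IMDB split.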
