# Sentiment analysis

This notebook walks through simple preprocessing and builds an LSTM neural network. The network is tasked with classifying the sentiment of IMDb movie reviews.

In [1]:
# Set up the eGPU for training by selecting PlaidML as the Keras backend
import os

os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"

### Preprocessing

In [2]:
# Import libraries
from __future__ import print_function
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Embedding
from keras.layers import LSTM
from keras import optimizers
# Load dataset
from keras.datasets import imdb


print('Loading data...')

Loading data...


Using plaidml.keras.backend backend.


In [3]:
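max_features = 20000  # vocabulary size: keep only the most frequent words
maxlen = 80           # every review is cut or padded to this many tokens
batch_size = 32

# Load the dataset, which is already split into training and testing sets
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)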

# Print length of new datasets
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

25000 train sequences
25000 test sequences
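
Each review is loaded as a list of integer word ids rather than text. The sketch below shows how such a list could be mapped back to words, assuming the default settings of `imdb.load_data` (id 0 for padding, 1 for the start marker, 2 for out-of-vocabulary words, and real words shifted by 3). The vocabulary `toy_word_index` and the helper `decode_review` are hypothetical stand-ins; in the notebook the real mapping would come from `imdb.get_word_index()`.

```python
# Hypothetical miniature vocabulary (word -> frequency rank), shaped like
# the dict returned by imdb.get_word_index(). Not the real IMDb vocabulary.
toy_word_index = {"the": 1, "movie": 2, "was": 3, "great": 4}

# Default special ids and offset used by imdb.load_data
PAD, START, OOV, INDEX_FROM = 0, 1, 2, 3


def decode_review(encoded, word_index):
    """Map a list of integer ids back to a readable string."""
    # Real words are stored as rank + INDEX_FROM
    id_to_word = {rank + INDEX_FROM: word for word, rank in word_index.items()}
    specials = {PAD: "<pad>", START: "<start>", OOV: "<oov>"}
    tokens = []
    for i in encoded:
        if i in specials:
            tokens.append(specials[i])
        else:
            # Ids missing from the vocabulary are shown as out-of-vocabulary
            tokens.append(id_to_word.get(i, "<oov>"))
    return " ".join(tokens)


decoded = decode_review([1, 4, 5, 6, 7, 2], toy_word_index)
print(decoded)  # <start> the movie was great <oov>
```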


In [4]:
# Pad data to equal lengths to feed into the model

print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

Pad sequences (samples x time)
x_train shape: (25000, 80)
x_test shape: (25000, 80)
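
To make the padding step concrete, here is a minimal pure-Python sketch of what `pad_sequences` does with its default arguments: sequences shorter than `maxlen` are padded with zeros on the left, and longer ones keep only their last `maxlen` tokens. The helper name `pad_pre` is hypothetical, and the real function returns a NumPy array rather than nested lists.

```python
def pad_pre(seqs, maxlen, value=0):
    """Left-pad and left-truncate each sequence to exactly maxlen items."""
    padded = []
    for seq in seqs:
        kept = list(seq[-maxlen:])  # keep the last maxlen tokens
        padded.append([value] * (maxlen - len(kept)) + kept)
    return padded


demo = pad_pre([[5, 6], [1, 2, 3, 4, 5, 6]], maxlen=4)
print(demo)  # [[0, 0, 5, 6], [3, 4, 5, 6]]
```

With `maxlen = 80`, this means the model only sees the final 80 tokens of each long review.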


### Build and Train Model

In [10]:
# Build model

print('Build model...')
model = Sequential()
# Embedding layer maps each integer word index to a dense 128-dimensional vector
model.add(Embedding(max_features, 128, input_length=maxlen))
# LSTM layer with dropout to prevent overfitting
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
# Output layer: a single sigmoid unit giving the probability of a positive review
model.add(Dense(1, activation='sigmoid'))

Build model...
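
As a rough sanity check on the size of this model, the number of trainable parameters can be worked out by hand. The sketch below assumes a standard LSTM with four gates and a bias term, which should match what `model.summary()` reports; the variable names are introduced here for illustration.

```python
max_features = 20000  # vocabulary size
embed_dim = 128       # embedding output dimension
lstm_units = 128      # LSTM hidden size

# One embed_dim-sized vector per word in the vocabulary
embedding_params = max_features * embed_dim

# Four gates, each with input weights, recurrent weights, and a bias
lstm_params = 4 * (lstm_units * (embed_dim + lstm_units) + lstm_units)

# One weight per LSTM unit plus a single bias
dense_params = lstm_units * 1 + 1

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
# 2560000 131584 129 2691713
```

Almost all of the parameters sit in the embedding layer, which is one reason the model can memorize the training set fairly quickly.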


In [11]:
# Create optimizer
o = optimizers.Adam(lr=0.0001)

# Compile model
model.compile(loss='binary_crossentropy',
              optimizer=o,
              metrics=['accuracy'])
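
For intuition about the `binary_crossentropy` loss paired with the sigmoid output, here is a small pure-Python sketch of both functions. It is a simplified illustration, not the backend implementation Keras uses, and the clipping constant `eps` is an assumption added to avoid `log(0)`.

```python
import math


def sigmoid(z):
    """Squash a real-valued score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of 0/1 labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# A confident correct prediction costs little, a confident wrong one costs a lot
good = binary_crossentropy([1, 0], [0.9, 0.1])
bad = binary_crossentropy([1, 0], [0.1, 0.9])
print(good < bad)  # True
```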

In [12]:
# Train model with training dataset
print('Train...')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=8,
          validation_data=(x_test, y_test))

# Evaluate the trained model on the testing dataset
score, acc = model.evaluate(x_test, y_test,
                            batch_size=batch_size)

# Print results
print('Test score:', score)
print('Test accuracy:', acc)

Train...
Train on 25000 samples, validate on 25000 samples
Epoch 1/8


INFO:plaidml:Analyzing Ops: 2413 of 9426 operations complete
INFO:plaidml:Analyzing Ops: 7806 of 9426 operations complete
INFO:plaidml:Analyzing Ops: 2246 of 9427 operations complete
INFO:plaidml:Analyzing Ops: 7963 of 9427 operations complete


Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
Test score: 0.4278096935367584
Test accuracy: 0.82164


After training for eight epochs, we can see that the model became better at predicting sentiment on the training dataset but did not improve in the same way on the test dataset, which is also what `validation_data` reports on here. Looking at the validation accuracy, the model became worse at classifying sentiment after the third epoch, decreasing from 0.8421 to 0.8216. This is not a huge decrease, but it suggests that the model may be overfitting to the training dataset.
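
One common response to this pattern is early stopping: halt training once validation accuracy has not improved for a few epochs and keep the best epoch. The sketch below shows that logic in plain Python. The helper `early_stopping_epoch` and the `patience` value are assumptions for illustration, and the accuracy list is made up except for the epoch 3 and epoch 8 values quoted above.

```python
def early_stopping_epoch(val_scores, patience=2):
    """Return (best_epoch, stop_epoch), both 1-based.

    Training stops once `patience` consecutive epochs pass without
    beating the best validation score seen so far.
    """
    best_epoch, best_score = 0, float("-inf")
    for epoch, score in enumerate(val_scores, start=1):
        if score > best_score:
            best_epoch, best_score = epoch, score
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_scores)


# Illustrative values only: just 0.8421 (epoch 3) and 0.8216 (epoch 8)
# come from the run above, the rest are placeholders.
val_acc = [0.80, 0.83, 0.8421, 0.84, 0.835, 0.83, 0.825, 0.8216]
print(early_stopping_epoch(val_acc, patience=2))  # (3, 5)
```

Keras offers this behavior through its `EarlyStopping` callback, which can be passed to `model.fit` via the `callbacks` argument.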