# Recurrent Neural Networks with Keras
## Sentiment analysis from movie reviews

The dataset consists of user-generated movie reviews, each labeled by
whether the user liked the movie or not, based on the rating attached
to the review.

More info on the dataset is here:

https://keras.io/datasets/#imdb-movie-reviews-sentiment-classification

So we are going to use an RNN to do sentiment analysis on full-text movie reviews!

We will train an artificial neural network to "read" movie reviews and guess from them whether the author liked the movie or not.

Since understanding written language requires keeping track of all the words in a sentence, we need a recurrent neural network to keep a "memory" of the words that have come before as it "reads" sentences over time.

In particular, we'll use LSTM (Long Short-Term Memory) cells because we don't really want to "forget" words too quickly - words early on in a sentence can affect the meaning of that sentence significantly.
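
To make the "memory" idea concrete, here is a minimal sketch of a single LSTM time step, reduced to scalar inputs and states so that it runs in plain Python. The weights are hypothetical, hand-picked values (not taken from any trained model), and the function name `lstm_step` is our own; the Keras `LSTM` layer works with vectors and weight matrices rather than scalars.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    """One time step of a scalar LSTM cell.

    w maps a gate name to (input weight, recurrent weight, bias).
    """
    def pre(gate):
        wx, wh, b = w[gate]
        return wx * x + wh * h_prev + b

    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    i = sigmoid(pre("input"))        # how much new information to write
    o = sigmoid(pre("output"))       # how much of the cell state to expose
    g = math.tanh(pre("candidate"))  # the new information itself
    c = f * c_prev + i * g           # updated cell state (the "memory")
    h = o * math.tanh(c)             # updated hidden state (the output)
    return h, c

# Hypothetical weights, chosen only for illustration.
weights = {
    "forget": (0.0, 0.0, 4.0),     # large bias keeps the forget gate near 1
    "input": (2.0, 0.0, -1.0),
    "output": (0.0, 0.0, 1.0),
    "candidate": (1.5, 0.0, 0.0),
}

h, c = 0.0, 0.0
history = []
# One informative input followed by four "empty" time steps.
for x in [1.0, 0.0, 0.0, 0.0, 0.0]:
    h, c = lstm_step(x, h, c, weights)
    history.append(c)

print(history)
```

With the forget gate held close to 1, the cell state written at the first step decays only slowly, so after four further steps it still holds more than 90% of its initial value. This is the sense in which an LSTM avoids "forgetting" early words too quickly.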

In [1]:
# Importing libraries

from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding
from tensorflow.keras.layers import LSTM
from tensorflow.keras.datasets import imdb

In [2]:
print('Data is loading...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=20000)

Data is loading...




Each movie review has been converted into a sequence of integers, one per word, where each integer is the word's index in a vocabulary ranked by frequency. Each review also comes with a binary sentiment label (0 for negative, 1 for positive) to learn from.
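
As a rough illustration of this integer encoding, the sketch below mimics the default conventions of `imdb.load_data`: index 1 marks the start of a review, index 2 replaces out-of-vocabulary words, and real word ranks are shifted by 3. The vocabulary `toy_rank` and the function `encode` are hypothetical stand-ins; the real dataset ships already encoded.

```python
# Reserved indices used by the defaults of imdb.load_data.
PAD, START, OOV = 0, 1, 2
INDEX_FROM = 3  # offset added to every word rank

# Hypothetical toy vocabulary: word -> frequency rank (1 = most frequent).
toy_rank = {"the": 1, "movie": 2, "was": 3, "great": 4, "boring": 5}

def encode(text, num_words):
    """Encode a review as integers, keeping only indices below num_words."""
    ids = [START]
    for word in text.lower().split():
        rank = toy_rank.get(word)
        if rank is not None and rank + INDEX_FROM < num_words:
            ids.append(rank + INDEX_FROM)
        else:
            # Too rare for the num_words cap (or unknown in this toy vocabulary)
            ids.append(OOV)
    return ids

print(encode("The movie was great", num_words=100))  # [1, 4, 5, 6, 7]
print(encode("The movie was great", num_words=7))    # [1, 4, 5, 6, 2]
```

Lowering `num_words` pushes rarer words into the out-of-vocabulary index, which is what `num_words=20000` does to the real reviews.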

In [3]:
# Pad or truncate every review to exactly 100 word indices
x_train = sequence.pad_sequences(x_train, maxlen=100)
x_test = sequence.pad_sequences(x_test, maxlen=100)
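
By default, `pad_sequences` pads with zeros at the front and, for reviews that are too long, drops words from the front as well, so the last `maxlen` words are kept. The plain-Python function below is a simplified sketch of that default behaviour for a single sequence; `pad_pre` is a hypothetical name, and it assumes `maxlen` is positive.

```python
def pad_pre(seq, maxlen, value=0):
    """Left-pad with `value`, or keep only the last `maxlen` items."""
    seq = list(seq)[-maxlen:]  # truncate from the front if too long
    return [value] * (maxlen - len(seq)) + seq

print(pad_pre([1, 2, 3], 5))              # [0, 0, 1, 2, 3]
print(pad_pre([1, 2, 3, 4, 5, 6, 7], 5))  # [3, 4, 5, 6, 7]
```

With `maxlen=100`, any review longer than 100 words is therefore represented by its final 100 words only.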

In [4]:
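model = Sequential()
# Map each of the 20,000 word indices to a 128-dimensional vector
model.add(Embedding(20000, 128))
# One LSTM layer with 128 units; dropout on the inputs and on the recurrent connections
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
# A single sigmoid unit outputs the predicted probability of a positive review
model.add(Dense(1, activation='sigmoid'))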
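
To get a feel for the size of this model, the arithmetic below estimates the number of trainable parameters per layer. It assumes the standard LSTM formulation, with four gates that each have input weights, recurrent weights and one bias vector; the variable names are our own.

```python
vocab_size, embed_dim, units = 20000, 128, 128

# Embedding: one 128-dimensional vector per word index.
embedding_params = vocab_size * embed_dim

# LSTM: 4 gates, each with input weights, recurrent weights and a bias.
lstm_params = 4 * ((embed_dim + units) * units + units)

# Dense: one weight per LSTM unit plus a single bias.
dense_params = units * 1 + 1

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
# 2560000 131584 129 2691713
```

Most of the parameters sit in the embedding table, not in the LSTM itself. Calling `model.summary()` once the model has been built should report the same counts.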

In [5]:
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
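
For reference, the `binary_crossentropy` loss can be sketched in a few lines of plain Python. This is an illustrative reimplementation with a hypothetical clipping constant `eps`, not the code Keras actually runs.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
undecided = binary_crossentropy([1, 0], [0.5, 0.5])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])
print(confident_right, undecided, confident_wrong)
```

The loss is small for confident correct predictions, equals ln 2 (about 0.693) for a 50/50 guess, and grows quickly for confident mistakes.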

In [6]:
model.fit(x_train, y_train, batch_size=32, epochs=5, verbose=2, validation_data=(x_test, y_test))

Epoch 1/5
782/782 - 183s - loss: 0.4244 - accuracy: 0.7994 - val_loss: 0.3718 - val_accuracy: 0.8358
Epoch 2/5
782/782 - 175s - loss: 0.2437 - accuracy: 0.9046 - val_loss: 0.3554 - val_accuracy: 0.8449
Epoch 3/5
782/782 - 172s - loss: 0.1572 - accuracy: 0.9426 - val_loss: 0.4920 - val_accuracy: 0.8415
Epoch 4/5
782/782 - 157s - loss: 0.1090 - accuracy: 0.9607 - val_loss: 0.5494 - val_accuracy: 0.8295
Epoch 5/5
782/782 - 273s - loss: 0.0747 - accuracy: 0.9731 - val_loss: 0.5630 - val_accuracy: 0.8294


<tensorflow.python.keras.callbacks.History at 0x7f2db3e51f50>
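
In the log above, training accuracy keeps rising while the validation loss is lowest at epoch 2 and gets worse afterwards, which suggests the model starts to overfit. The sketch below uses the logged validation losses to find the best epoch and to show where a simple patience rule would have stopped. The function `stop_epoch` is a hypothetical, simplified stand-in for what the Keras `EarlyStopping` callback does.

```python
# Validation losses copied from the training log, epochs 1 to 5.
val_loss = [0.3718, 0.3554, 0.4920, 0.5494, 0.5630]

# Epoch (1-based) with the lowest validation loss.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1

def stop_epoch(losses, patience):
    """Return the epoch at which training stops after `patience`
    consecutive epochs without a new best loss."""
    best, wait = float("inf"), 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(losses)

print(best_epoch)                        # 2
print(stop_epoch(val_loss, patience=2))  # 4
```

With a patience of 2, training would have ended after epoch 4, and the weights from epoch 2 would have been the ones worth keeping.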

In [7]:
score, acc = model.evaluate(x_test, y_test,
                            batch_size=32,
                            verbose=2)
print('Test score:', score)
print('Test accuracy:', acc)

782/782 - 17s - loss: 0.5630 - accuracy: 0.8294
Test score: 0.5629750490188599
Test accuracy: 0.8293600082397461
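
The test loss and accuracy match the `val_loss` and `val_accuracy` of the final epoch because the same test set was passed as `validation_data` during training. As a small sanity check on the numbers, the arithmetic below relates them to the dataset size, assuming the standard 25,000-review test split of the Keras IMDB dataset.

```python
import math

num_reviews = 25000  # assumed size of the IMDB test split
batch_size = 32

# Number of batches per pass; the last batch is only partly filled.
steps = math.ceil(num_reviews / batch_size)

test_accuracy = 0.8293600082397461  # from the output above
correct = round(test_accuracy * num_reviews)

print(steps)    # 782
print(correct)  # 20734
```

So roughly 20,734 of the 25,000 test reviews were classified correctly, and the `782/782` in the progress line is the number of batches of 32.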
