Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in the field of deep learning.

A common LSTM unit is composed of a cell, an input gate, an output gate and a forget gate. The cell remembers values over arbitrary time intervals and the three gates regulate the flow of information into and out of the cell.
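
To make the roles of the cell and the three gates concrete, here is a minimal NumPy sketch of one forward step of a standard LSTM unit. The function name `lstm_step`, the stacked weight layout (input gate, forget gate, candidate, output gate) and the toy dimensions are assumptions made for this illustration, not the internals of the Keras layer used later.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM unit (illustrative layout).

    W: (4*H, D) input weights, U: (4*H, H) recurrent weights, b: (4*H,) biases,
    stacked in the order: input gate, forget gate, candidate, output gate.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: how much new information enters the cell
    f = sigmoid(z[H:2 * H])        # forget gate: how much of the old cell state is kept
    g = np.tanh(z[2 * H:3 * H])    # candidate values to write into the cell
    o = sigmoid(z[3 * H:4 * H])    # output gate: how much of the cell is exposed
    c = f * c_prev + i * g         # updated cell state
    h = o * np.tanh(c)             # updated hidden state (the unit's output)
    return h, c

# Toy run: 5 time steps, input size D=3, hidden size H=4 (arbitrary values)
rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h = np.zeros(H)
c = np.zeros(H)
for x in rng.normal(size=(5, D)):
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape, c.shape)
```

When the forget gate is close to 1 and the input gate is close to 0, the cell state passes through a step almost unchanged, which is how the cell can hold a value over many time steps.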

LSTM networks are well-suited to classifying, processing and making predictions based on time series data, since there can be lags of unknown duration between important events in a time series. LSTMs were developed to deal with the vanishing gradient problem that can be encountered when training traditional RNNs. Relative insensitivity to gap length is an advantage of LSTM over traditional RNNs, hidden Markov models and other sequence learning methods in numerous applications.
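
The vanishing gradient problem can be illustrated with a deliberately simplified scalar calculation. The sketch below compares the gradient that flows back through 50 steps of a one-unit tanh RNN with the gradient along an LSTM cell-state path whose forget gate is held fixed. The weight `w = 0.9`, the fixed gate value `f = 0.99` and the step count are assumptions chosen for illustration; real gates are learned and vary per step.

```python
import numpy as np

T = 50  # number of time steps between an event and the loss

# Plain RNN, scalar case: h_t = tanh(w * h_{t-1}).
# d h_T / d h_0 is the product of w * (1 - h_t^2) over all steps.
w = 0.9
h = 0.5
rnn_grad = 1.0
for _ in range(T):
    h = np.tanh(w * h)
    rnn_grad *= w * (1.0 - h ** 2)

# LSTM cell state: c_t = f_t * c_{t-1} + i_t * g_t.
# Along the cell-state path alone, d c_T / d c_0 is the product of the
# forget gates (here a constant f, which is a simplification).
f = 0.99
lstm_grad = f ** T

print("RNN gradient after %d steps:  %.2e" % (T, rnn_grad))
print("LSTM cell-path gradient:      %.2e" % lstm_grad)
```

Each factor in the plain RNN product is at most `w`, so the gradient shrinks geometrically with the gap length, while a forget gate near 1 keeps the cell-state path close to an identity map.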

In [191]:
from keras.models import Sequential
# Import layers from keras.layers; the core/recurrent/embeddings submodule
# paths are not available in newer Keras releases
from keras.layers import Activation, Dense, Dropout, Embedding, LSTM
from keras.preprocessing import sequence
from sklearn.model_selection import train_test_split
import numpy as np
import collections
import nltk

print('Using UMICH SI650 dataset')

# First pass over the data: count word frequencies and find the longest sentence
maxlen = 0
word_freqs = collections.Counter()
num_recs = 0
with open("tweets.txt", 'r', encoding = 'utf-8') as ftrain:
    for line in ftrain:
        label, sentence = line.strip().split("\t")
        words = nltk.word_tokenize(sentence.lower())
        maxlen = max(maxlen, len(words))
        word_freqs.update(words)
        num_recs += 1

MAX_FEATURES = 2000
MAX_SENTENCE_LENGTH = 40

print('Forming vocabulary...')

vocab_size = min(MAX_FEATURES, len(word_freqs)) + 2
word2index = {x[0]: i+2 for i, x in 
                enumerate(word_freqs.most_common(MAX_FEATURES))}
word2index["PAD"] = 0
word2index["UNK"] = 1
index2word = {v:k for k, v in word2index.items()}

# Second pass: convert each sentence to a sequence of word indices (unknown words map to UNK)
X = np.empty((num_recs, ), dtype = list)
y = np.zeros((num_recs, ))
with open("tweets.txt", 'r', encoding = 'utf-8') as ftrain:
    for i, line in enumerate(ftrain):
        label, sentence = line.strip().split("\t")
        words = nltk.word_tokenize(sentence.lower())
        X[i] = [word2index.get(word, word2index["UNK"]) for word in words]
        y[i] = int(label)
# Left-pad (or truncate) every sequence to MAX_SENTENCE_LENGTH
X = sequence.pad_sequences(X, maxlen = MAX_SENTENCE_LENGTH)

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size = 0.2, random_state = 214)

EMBEDDING_SIZE = 128
HIDDEN_LAYER_SIZE = 32
BATCH_SIZE = 16
NUM_EPOCHS = 9

model = Sequential()
model.add(Embedding(vocab_size, EMBEDDING_SIZE, input_length = MAX_SENTENCE_LENGTH))
model.add(Dropout(0.2))
model.add(LSTM(HIDDEN_LAYER_SIZE, dropout = 0.2, recurrent_dropout = 0.2, bias_regularizer=None, recurrent_activation='sigmoid'))
model.add(Dense(1))
model.add(Activation("sigmoid"))

print('Compiling LSTM-based model...')

model.compile(loss="binary_crossentropy", optimizer="rmsprop", metrics=["accuracy"])

history = model.fit(Xtrain, ytrain, batch_size = BATCH_SIZE, epochs = NUM_EPOCHS, validation_data = (Xtest, ytest))

score, acc = model.evaluate(Xtest, ytest, batch_size = BATCH_SIZE)
print('Accuracy on TEST SET: ',(100*acc),'%')

Using UMICH SI650 dataset
Forming vocabulary...
Compiling LSTM-based model...
Train on 854 samples, validate on 214 samples
Epoch 1/9
Epoch 2/9
Epoch 3/9
Epoch 4/9
Epoch 5/9
Epoch 6/9
Epoch 7/9
Epoch 8/9
Epoch 9/9
Accuracy on TEST SET:  98.13084006309509 %


In [194]:
print('First column - prediction, second column - target, third column - sentence. Have fun!')

for i in range(10):
    # Pick a random test example and reshape it into a batch of one
    idx = np.random.randint(len(Xtest))
    xtest = Xtest[idx].reshape(1, MAX_SENTENCE_LENGTH)
    ylabel = ytest[idx]
    ypred = model.predict(xtest)[0][0]
    # Rebuild the sentence from its indices, skipping PAD (index 0)
    sent = " ".join([index2word[x] for x in xtest[0].tolist() if x != 0])
    print("%.0f\t%d\t%s" % (ypred, ylabel, sent))

First column - prediction, second column - target, third column - sentence. Have fun!
1	1	i liked the harry potter lines , but ron is my favourite character ...
1	1	if theres one thing i love as much as harry potter , its avatar ! ! ! ..
0	0	i wo n't go too far into my rant about why these people are deluded if they think harry potter is evil ( and is anything less than a loose allegory of the bible .
1	1	then again , my opinion may be a bit biased because i loved the da vinci code soundtrack . ) .
0	0	its freezing cold up there ! - ... ... -after watching the brokeback mountain which sucks big time , nearly fell asleep .
1	1	i miss the harry potter hookup .
0	0	i 'm not even halfway through this movie , but i think brokeback mountain is terrible..
1	1	i love harry potter , but every few months or so i 'll go through an intense harry potter phase .
0	0	i heard da vinci code sucked pretty hard , which is too bad , because i like ron howard .
1	1	i liked the harry potter lines , but ron is my f

In [195]:
from keras.models import load_model

In [196]:
model.save('sentim_model.h5')
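
To score new text with the saved model, a sentence has to be turned into indices and padded the same way as the training data. The helper below is a minimal, Keras-free sketch of that step. The name `encode_sentence`, the toy vocabulary and the whitespace tokenizer are assumptions for illustration (the notebook itself tokenizes with `nltk.word_tokenize`); the left-padding and left-truncation mirror the defaults of `pad_sequences`.

```python
import numpy as np

def encode_sentence(sentence, word2index, maxlen):
    """Map a sentence to a (1, maxlen) array of word indices, left-padded with 0."""
    # Assumption: plain whitespace tokenization instead of nltk.word_tokenize
    tokens = sentence.lower().split()
    ids = [word2index.get(tok, word2index["UNK"]) for tok in tokens]
    ids = ids[-maxlen:]  # keep the last maxlen tokens, like truncating='pre'
    padded = np.zeros((1, maxlen), dtype="int32")
    if ids:
        padded[0, -len(ids):] = ids  # left-pad, like padding='pre'
    return padded

# Hypothetical toy vocabulary using the same PAD=0 / UNK=1 convention as above
toy_word2index = {"PAD": 0, "UNK": 1, "i": 2, "love": 3, "harry": 4, "potter": 5}
x = encode_sentence("I love Harry Potter movies", toy_word2index, maxlen=8)
print(x)  # [[0 0 0 2 3 4 5 1]]
```

With the real vocabulary, the result of `encode_sentence(text, word2index, MAX_SENTENCE_LENGTH)` could be passed to `model.predict`, or to a model restored with `load_model('sentim_model.h5')`. Note that `word2index` is not stored in the `.h5` file, so it would need to be saved separately.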