# Machine Learning Final Project
## Sentiment analysis of hotel reviews using neural networks

In this project, I build on the work from my capstone project. There, we used sklearn's TF-IDF vectorizer to encode word n-grams and used them to classify each review as a score from 1-5, for use in that project's recommender system. With a lot of computing power, that method achieved high accuracy (95%) on the entire set of reviews. However, as noted in detail below, it was far less accurate when trained on only a sample of the reviews.

In this case, we are looking at sentiment analysis of the review text, with the goal of building a neural network model that outperforms the bag-of-words model. The secondary goal is to get practice building different types of neural networks and to understand how to tune them.

In [42]:
import pandas as pd
import numpy as np
import glob
import json
from bs4 import BeautifulSoup
from os import path

First, I define a function that loads the review text (and other metadata) from each hotel's JSON file into a DataFrame.

In [43]:
#function to turn the json files into a dataframe
def json_to_df(filename):
    """Return a pandas DataFrame with one row per review in the given JSON file."""

    with open(filename) as f:
        data = json.load(f)
    reviews = data['Reviews']
    info = data['HotelInfo']

    #build one Series per review, then create the DataFrame once
    #(DataFrame.append is slow in a loop and was removed in pandas 2.0)
    rows = []
    for review in reviews:
        rr = pd.Series(review['Ratings'], name=review['ReviewID'])
        rr['Date'] = review.get('Date')
        rr['Author'] = review.get('Author')
        #keep only the last part of the location, e.g. the state or country
        location = review.get('AuthorLocation')
        rr['AuthorLocation'] = location.split(', ')[-1] if location else np.nan
        rr['Review'] = review['Content']
        rows.append(rr)
    df = pd.DataFrame(rows)

    # Hotel Info
    df['HotelID'] = int(info['HotelID'])
    df['Hotel'] = info.get('Name', "")
    #price is a range string (e.g. "$120 - $250"); keep the digits of each end
    try:
        price_range = [int(''.join(el for el in price if el.isdigit()))
                       for price in info['Price'].split('-')]
        df['PriceMin'] = price_range[0]
        df['PriceMax'] = price_range[-1]
    except (KeyError, AttributeError, ValueError):
        df['PriceMin'] = np.nan
        df['PriceMax'] = np.nan
    #the region is stored inside an HTML snippet in the Address field
    try:
        address = BeautifulSoup(info['Address'], 'lxml')
        region = address.find('span', property='v:region').text
    except Exception:
        region = np.nan
    df['HotelLocation'] = region

    return df
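To make the expected input concrete, here is a minimal sketch that builds the same kind of per-review rows from an in-memory dictionary. The dictionary is a hypothetical, trimmed example: only the keys the function reads (`HotelInfo`, `HotelID`, `Reviews`, `ReviewID`, `Ratings`, `Content`) come from the code above, and all values are made up. It also shows that a DataFrame built from a list of named Series uses each Series' name (here the review ID) as the row index.

```python
import pandas as pd

# hypothetical, trimmed example of one hotel file (values are made up)
data = {
    "HotelInfo": {"HotelID": "12345", "Name": "Example Hotel"},
    "Reviews": [
        {"ReviewID": "UR1", "Ratings": {"Overall": "5.0", "Service": "5"},
         "Content": "Great stay."},
        {"ReviewID": "UR2", "Ratings": {"Overall": "2.0", "Service": "1"},
         "Content": "Noisy room."},
    ],
}

# one Series per review, named by its review ID
rows = []
for review in data["Reviews"]:
    row = pd.Series(review["Ratings"], name=review["ReviewID"])
    row["Review"] = review["Content"]
    rows.append(row)

# the Series names become the row index; hotel-level fields are broadcast to every row
df = pd.DataFrame(rows)
df["HotelID"] = int(data["HotelInfo"]["HotelID"])
print(df)
```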

With the function defined (including some error handling, since reviews are often missing fields), we can sample and load the data.
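Because the loading cell calls `np.random.choice` without a seed, each run draws a different set of hotels, so scores from different sessions are not directly comparable. A minimal sketch of a reproducible draw using NumPy's `default_rng`; the seed value 42, the helper name `sample_files`, and the placeholder file names are all assumptions for illustration:

```python
import numpy as np

# placeholder file names standing in for the glob results
all_files = [f"hotel_{n}.json" for n in range(1000)]

def sample_files(files, k, seed=42):
    """Hypothetical helper: draw k distinct files; the same seed gives the same sample."""
    rng = np.random.default_rng(seed)
    return list(rng.choice(files, size=k, replace=False))

first = sample_files(all_files, 200)
second = sample_files(all_files, 200)
print(first == second)  # True: same seed, same sample
```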

In [45]:
from timeit import default_timer as timer #for timing

start = timer()
#load data into dataframe
json_dir = '/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json'
allFiles = glob.glob(json_dir + "/*.json")
#due to computing constraints, we have to sample the data set and can't use the whole thing
#so we randomly sample 200 of the hotel files
#1000 hotels takes over half an hour to load on my laptop
randFiles = np.random.choice(allFiles, 200, replace=False)
list_ = []

#switch between either sampling or all of the files
for i, file_ in enumerate(randFiles, start=1):
#for i, file_ in enumerate(allFiles, start=1):
    try:
        list_.append(json_to_df(file_))
    except Exception:
        #skip unreadable files instead of re-adding the previous hotel's reviews
        print(file_ + "... skipped  -", i)
        continue
    print(file_ + "... done  -", i)

#concatenate once at the end (concatenating inside the loop is quadratic);
#sort=False keeps the column order and silences the pandas FutureWarning
df_reviews = pd.concat(list_, sort=False)
end = timer()

df_reviews.drop_duplicates(keep='first', inplace=True)
print(df_reviews.head(2))
print(end - start)
print(len(df_reviews))


/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/1910036.json... done  - 1
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/484017.json... done  - 2
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/614389.json... done  - 3
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/305486.json... done  - 4
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/841981.json... done  - 5
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/589554.json... done  - 6
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/73855.json... done  - 7
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/294930.json... done  - 8
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/234947.json... done  - 9
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/636152.json... done  - 10
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/1223953.json... done  - 11
/Users/Ryan/Desktop/Programming/SMU/Caps

/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/73884.json... done  - 94
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/2514827.json... done  - 95
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/91607.json... done  - 96
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/281234.json... done  - 97
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/268433.json... done  - 98
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/1174552.json... done  - 99
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/507538.json... done  - 100
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/678622.json... done  - 101
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/99365.json... done  - 102
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/230355.json... done  - 103
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/78135.json... done  - 104
/Users/Ryan/Desktop/Programmi

/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/2515536.json... done  - 186
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/85007.json... done  - 187
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/582157.json... done  - 188
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/89357.json... done  - 189
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/1196040.json... done  - 190
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/232321.json... done  - 191
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/1434751.json... done  - 192
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/229441.json... done  - 193
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/2514871.json... done  - 194
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/642994.json... done  - 195
/Users/Ryan/Desktop/Programming/SMU/Capstone/TripAdvisor/json/227457.json... done  - 196
/Users/Ryan/Desktop





Since this analysis only needs the Overall score and the review text, we drop the rest of the columns. I am also skipping EDA, since it was already done in the Capstone/Thesis project.

In [46]:
#drop extra columns
df_reviews = df_reviews.filter(items=['Overall', 'Review'])
df_reviews.head()

| ReviewID | Overall | Review |
|---|---|---|
| UR126450078 | 5.0 | My wife and I recently stayed here for 5 night... |
| UR126433804 | 5.0 | Great location, friendly staff, amazing rooms ... |
| UR126126502 | 3.0 | Ok, this hotel looks great, the interior decor... |
| UR125766797 | 5.0 | This really could be a one word review, with t... |
| UR125671278 | 5.0 | Stayed at La Belle Juliette for 4 nights. It i... |


Next, each review text must be classified as either positive (1) or negative (0). Many projects have humans label the reviews manually, but that isn't feasible here. Instead, I infer the label from the Overall rating that accompanies each review: if the Overall rating is 3 or higher, I assume the written review is positive; if it is 1 or 2, I assume it is negative.

As I noted in the capstone project, this isn't an ideal method. In some instances the review text and the Overall rating don't match up; a paper I referenced there backs this up, noting that the review text and the score are sometimes not strongly correlated. Since they do agree in most cases, this labeling should hopefully be good enough for our purposes.
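The labeling rule can also be written as a single vectorized comparison. It is also worth checking the resulting class balance: with a 3-or-higher threshold, most reviews are likely to be labeled positive, and the majority-class share is the accuracy a model would get by always predicting that class. A sketch on a small set of made-up ratings standing in for `df_reviews['Overall']`:

```python
import pandas as pd

# made-up Overall ratings standing in for df_reviews['Overall']
overall = pd.Series([5.0, 4.0, 3.0, 2.0, 1.0, 5.0])

# 1 = positive (rating >= 3), 0 = negative (rating 1 or 2)
sentiment = (overall.astype(float) >= 3).astype(int)
print(sentiment.tolist())  # [1, 1, 1, 0, 0, 1]

# share of each class; the larger share is the "always predict the majority" accuracy
balance = sentiment.value_counts(normalize=True)
print(balance)
```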

In [47]:
#this is lazy, but I can't classify tens or hundreds of thousands of reviews manually
#the label is binary (1 = positive, 0 = negative), as the neural networks need
#add to df

sentiment = []

for row in df_reviews['Overall']:
    if float(row) >= 3:
        sentiment.append(1)
    else:
        sentiment.append(0)
        
df_reviews['sentiment'] = sentiment

df_reviews.head(2)
        

| ReviewID | Overall | Review | sentiment |
|---|---|---|---|
| UR126450078 | 5.0 | My wife and I recently stayed here for 5 night... | 1 |
| UR126433804 | 5.0 | Great location, friendly staff, amazing rooms ... | 1 |


Now that we have the reviews classified as positive and negative in the "sentiment" column, we can drop the Overall rating of the hotels.

In [48]:
#drop overall scores (drop returns a new DataFrame, so reassign it)
df_reviews = df_reviews.drop(columns=['Overall'])
df_reviews.head(2)

| ReviewID | Review | sentiment |
|---|---|---|
| UR126450078 | My wife and I recently stayed here for 5 night... | 1 |
| UR126433804 | Great location, friendly staff, amazing rooms ... | 1 |


Finally, before building the models, we do an 80/20 train/test split on the data, stratified by sentiment so both sets keep the same positive/negative ratio.

In [49]:
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(df_reviews['Review'],df_reviews['sentiment'], test_size=0.2,random_state=42,stratify=df_reviews['sentiment'])
print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)

(19700,) (4925,) (19700,) (4925,)
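To show what `stratify=df_reviews['sentiment']` does, here is a small sketch on made-up, imbalanced labels (80 positive, 20 negative). With stratification, both halves of the split keep the same 80/20 class ratio:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# made-up data: 80 positive and 20 negative labels
texts = np.array([f"review {i}" for i in range(100)])
labels = np.array([1] * 80 + [0] * 20)

x_tr, x_te, y_tr, y_te = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels)

# share of positive labels in each split
print(y_tr.mean(), y_te.mean())
```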


## TF-IDF Vectorizer

In my capstone project, I used this approach to classify review text into a score from 1-5. Using the entire corpus of 1.6+ million reviews, that model achieved 95% accuracy. However, this required Amazon Web Services and isn't feasible for this project. Using the sampling method above with between 100 and 1,000 hotels' worth of reviews, I achieved only 60-65% accuracy, a significant drop compared to using the whole data set.

In [50]:
from sklearn.feature_extraction.text import TfidfVectorizer
import datetime

#uses groups of 1 or 2 words
vectorizer = TfidfVectorizer(ngram_range=(1,2))
t1 = datetime.datetime.now()
vectors = vectorizer.fit_transform(df_reviews['Review'])
print(datetime.datetime.now()-t1)

0:00:18.044872




In [51]:
#test train split on this data using vectors df
X_train2, X_test2, y_train2, y_test2 = train_test_split(vectors, df_reviews['sentiment'], test_size=.2, random_state=42)

In [52]:
from sklearn.svm import LinearSVC

classifier = LinearSVC()

t1 = datetime.datetime.now()
classifier.fit(X_train2, y_train2)
print(datetime.datetime.now()-t1)

0:00:00.972765


preds = classifier.predict(X_test2)
print (list(preds[:10]))
print(list(y_test2[:10]))

In [54]:
from sklearn.metrics import accuracy_score

print ("Accuracy Score: ", accuracy_score(y_test2, preds))

Accuracy Score:  0.9476142131979696


The accuracy achieved here is much higher than expected. As noted above, using this method on a sample to classify reviews on a 1-5 scale gave only 0.60-0.65 accuracy. For this binary sentiment problem, the accuracy is 0.95. That is a high bar for the deep learning models to beat.
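One caveat when comparing this score with the neural networks: the vectorizer above is fit on every review, including those that end up in the test set, and its split is drawn without `stratify`, so its test set is not necessarily the one the neural networks are scored on. A sketch of the usual fix, wrapping TF-IDF and `LinearSVC` in an sklearn `Pipeline` so the vectorizer is fit on training text only, shown on a few made-up reviews:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# made-up training and test reviews (1 = positive, 0 = negative)
train_text = ["great location and friendly staff", "lovely room, would stay again",
              "dirty room and rude staff", "terrible noise, never again"]
train_y = [1, 1, 0, 0]
test_text = ["friendly staff, lovely room", "rude staff and dirty bathroom"]

# the vectorizer is fit inside the pipeline, so its vocabulary comes from training text only
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(train_text, train_y)
preds = model.predict(test_text)
print(preds)
```

Words that appear only in the test text (here "bathroom") are simply ignored at prediction time instead of leaking into the learned vocabulary and IDF weights.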

## RNN

In [55]:
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.text import text_to_word_sequence
from keras.preprocessing.sequence import pad_sequences

from keras.models import Model, Sequential, load_model

from keras.layers import Input, Dense, Embedding, Conv1D, Conv2D, MaxPooling1D, MaxPool2D
from keras.layers import Reshape, Flatten, Dropout, Concatenate
from keras.layers import SpatialDropout1D, concatenate
from keras.layers import GRU, Bidirectional, GlobalAveragePooling1D, GlobalMaxPooling1D

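from keras.callbacks import Callback, ModelCheckpoint, EarlyStopping

#save the weights each time validation accuracy reaches a new best
#(file name pattern as it appears in the training logs below)
filepath = '/Users/Ryan/Desktop/Programming/SMU/Machine Learning/weights-improvement-{epoch:02d}-{val_acc:.4f}.hdf5'
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')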

In [56]:
#setting this to 100,000 words
MAX_WORDS = 100000

#initialize tokenizer and fit to review texts
tokenizer = Tokenizer(num_words=MAX_WORDS)
tokenizer.fit_on_texts(df_reviews['Review'])


In [57]:
#apply tokenizer to train and test sets
train_sequences = tokenizer.texts_to_sequences(x_train)
test_sequences = tokenizer.texts_to_sequences(x_test)


In [64]:
#pad length to make them all the same length, as required for RNN
MAX_LENGTH = 200
#increasing the max length dramatically increases processing time
padded_train = pad_sequences(train_sequences, maxlen=MAX_LENGTH)
padded_test = pad_sequences(test_sequences, maxlen=MAX_LENGTH)

print(padded_train.shape, padded_test.shape)


(19700, 200) (4925, 200)
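By default, Keras' `pad_sequences` both pads and truncates at the front (`padding='pre'`, `truncating='pre'`), so a review longer than `MAX_LENGTH` tokens keeps only its last 200 tokens. A pure-NumPy sketch of that default behavior on toy token IDs; `pad_pre` is a hypothetical helper written for illustration, not the Keras implementation:

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences and keep the last `maxlen` tokens of long ones,
    mirroring pad_sequences' defaults padding='pre', truncating='pre'."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for row, seq in enumerate(sequences):
        tail = seq[-maxlen:]              # keep the end of long sequences
        if len(tail) > 0:
            out[row, -len(tail):] = tail  # right-align; the fill value pads the front
    return out

print(pad_pre([[5, 6], [1, 2, 3, 4, 5, 6]], maxlen=4))
# [[0 0 5 6]
#  [3 4 5 6]]
```

Passing `truncating='post'` to `pad_sequences` would instead keep the beginning of each long review.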


In [65]:
#rnn model

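embedding_dim = 300
#randomly initialized embedding weights (no pretrained vectors); they are trained along with the model
embedding_matrix = np.random.random((MAX_WORDS, embedding_dim))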

inp = Input(shape=(MAX_LENGTH, ))
x = Embedding(input_dim=MAX_WORDS, output_dim=embedding_dim, input_length=MAX_LENGTH, weights=[embedding_matrix],trainable=True)(inp)
x = SpatialDropout1D(0.1)(x)
x = Bidirectional(GRU(100, return_sequences=True))(x)
avg_pool = GlobalAveragePooling1D()(x)
max_pool = GlobalMaxPooling1D()(x)
conc = concatenate([avg_pool, max_pool])
outp = Dense(1, activation="sigmoid")(conc)

rnn_simple_model = Model(inputs = inp, outputs=outp)
rnn_simple_model.compile(loss="binary_crossentropy", optimizer='adam',metrics=['accuracy'])

In [66]:
print(len(padded_train), len(y_train), len(padded_test), len(y_test))

19700 19700 4925 4925


In [69]:
history = rnn_simple_model.fit(x = padded_train, 
                               y = y_train, 
                               validation_data=(padded_test, y_test),
                               batch_size=256, 
                               callbacks=[checkpoint], 
                               epochs=10, #5, 10
                               verbose=1)

y_pred_rnn_simple = rnn_simple_model.predict(padded_test, verbose=1, batch_size=1024)

y_pred_rnn_simple = pd.DataFrame(y_pred_rnn_simple, columns=['prediction'])
y_pred_rnn_simple['prediction'] = y_pred_rnn_simple['prediction'].map(lambda p: 1 if p >= 0.5 else 0)
y_pred_rnn_simple.to_csv('/Users/Ryan/Desktop/Programming/SMU/Machine Learning/y_pred_rnn_simple.csv', index=False)


Train on 19700 samples, validate on 4925 samples
Epoch 1/10

Epoch 00001: val_acc improved from 0.93868 to 0.94112, saving model to /Users/Ryan/Desktop/Programming/SMU/Machine Learning/weights-improvement-01-0.9411.hdf5
Epoch 2/10

Epoch 00002: val_acc did not improve from 0.94112
Epoch 3/10

Epoch 00003: val_acc did not improve from 0.94112
Epoch 4/10

Epoch 00004: val_acc did not improve from 0.94112
Epoch 5/10

Epoch 00005: val_acc did not improve from 0.94112
Epoch 6/10

Epoch 00006: val_acc improved from 0.94112 to 0.94132, saving model to /Users/Ryan/Desktop/Programming/SMU/Machine Learning/weights-improvement-06-0.9413.hdf5
Epoch 7/10

Epoch 00007: val_acc improved from 0.94132 to 0.94193, saving model to /Users/Ryan/Desktop/Programming/SMU/Machine Learning/weights-improvement-07-0.9419.hdf5
Epoch 8/10

Epoch 00008: val_acc did not improve from 0.94193
Epoch 9/10

Epoch 00009: val_acc did not improve from 0.94193
Epoch 10/10

Epoch 00010: val_acc did not improve from 0.94193


In [70]:
from sklearn.metrics import accuracy_score 
y_pred_rnn_simple = pd.read_csv('/Users/Ryan/Desktop/Programming/SMU/Machine Learning/y_pred_rnn_simple.csv')
print(accuracy_score(y_test, y_pred_rnn_simple))

0.9417258883248731


The RNN's accuracy on this sample (0.942) is almost the same as the TF-IDF model's (0.948), so it does very well.

Next, I try a CNN model. CNNs have traditionally been used less often for NLP, but recent work suggests they can be very effective.

## CNN

In [71]:
from keras.optimizers import Adam

embedding_dim = 300
    
filter_sizes = [2, 3, 5]
num_filters = 256
drop = 0.3

inputs = Input(shape=(MAX_LENGTH,), dtype='int32')
embedding = Embedding(input_dim=MAX_WORDS,
                        output_dim=embedding_dim,
                        weights=[embedding_matrix],
                        input_length=MAX_LENGTH,
                        trainable=True)(inputs)

reshape = Reshape((MAX_LENGTH, embedding_dim, 1))(embedding)
conv_0 = Conv2D(num_filters, 
                kernel_size=(filter_sizes[0], embedding_dim), 
                padding='valid', kernel_initializer='normal', 
                activation='relu')(reshape)

conv_1 = Conv2D(num_filters, 
                kernel_size=(filter_sizes[1], embedding_dim), 
                padding='valid', kernel_initializer='normal', 
                activation='relu')(reshape)
conv_2 = Conv2D(num_filters, 
                kernel_size=(filter_sizes[2], embedding_dim), 
                padding='valid', kernel_initializer='normal', 
                activation='relu')(reshape)

maxpool_0 = MaxPool2D(pool_size=(MAX_LENGTH - filter_sizes[0] + 1, 1), 
                strides=(1,1), padding='valid')(conv_0)

maxpool_1 = MaxPool2D(pool_size=(MAX_LENGTH - filter_sizes[1] + 1, 1), 
                strides=(1,1), padding='valid')(conv_1)

maxpool_2 = MaxPool2D(pool_size=(MAX_LENGTH - filter_sizes[2] + 1, 1), 
                strides=(1,1), padding='valid')(conv_2)
concatenated_tensor = Concatenate(axis=1)(
    [maxpool_0, maxpool_1, maxpool_2])
flatten = Flatten()(concatenated_tensor)
dropout = Dropout(drop)(flatten)
output = Dense(units=1, activation='sigmoid')(dropout)

cnn_model_multi_channel = Model(inputs=inputs, outputs=output)
adam = Adam(lr=1e-4, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)

cnn_model_multi_channel.compile(optimizer=adam, loss='binary_crossentropy', metrics=['accuracy'])
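Each `Conv2D` kernel above spans the full embedding width, so it slides only along the word axis, acting like a 1D convolution over windows of 2, 3, or 5 words; the following `MaxPool2D` pools over all remaining positions, keeping one maximum per filter (max-over-time pooling). A NumPy sketch of that computation for a single filter, using small made-up sizes rather than the model's 200 x 300 input:

```python
import numpy as np

rng = np.random.default_rng(0)
# toy sizes (assumed); the model uses 200 words x 300 dimensions and k in {2, 3, 5}
seq_len, emb_dim, k = 6, 4, 2
x = rng.normal(size=(seq_len, emb_dim))  # one embedded review: one row per word
w = rng.normal(size=(k, emb_dim))        # one filter covering k words x the full embedding width
b = 0.1

# 'valid' convolution that slides along the word axis only
conv = np.array([np.sum(x[i:i + k] * w) + b for i in range(seq_len - k + 1)])
conv = np.maximum(conv, 0)               # ReLU, as in the Conv2D layers
print(conv.shape)                        # (5,)

# a pool size of seq_len - k + 1 covers every position, leaving one value per filter
feature = conv.max()

# in the model: positions per filter size, and the length of the flattened vector
filter_sizes, num_filters, max_length = [2, 3, 5], 256, 200
positions = [max_length - size + 1 for size in filter_sizes]
print(positions)                         # [199, 198, 196]
print(len(filter_sizes) * num_filters)   # 768
```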

In [72]:
batch_size = 256
epochs = 10

history = cnn_model_multi_channel.fit(x=padded_train, 
                    y=y_train, 
                    validation_data=(padded_test, y_test), 
                    batch_size=batch_size, 
                    callbacks=[checkpoint], 
                    epochs=epochs, 
                    verbose=1)

y_pred_cnn_multi_channel = cnn_model_multi_channel.predict(padded_test, verbose=1, batch_size=2048)

y_pred_cnn_multi_channel = pd.DataFrame(y_pred_cnn_multi_channel, columns=['prediction'])
y_pred_cnn_multi_channel['prediction'] = y_pred_cnn_multi_channel['prediction'].map(lambda p: 1 if p >= 0.5 else 0)
y_pred_cnn_multi_channel.to_csv('/Users/Ryan/Desktop/Programming/SMU/Machine Learning/y_pred_cnn_multi_channel.csv', index=False)

Train on 19700 samples, validate on 4925 samples
Epoch 1/10

Epoch 00001: val_acc did not improve from 0.94193
Epoch 2/10

Epoch 00002: val_acc did not improve from 0.94193
Epoch 3/10

Epoch 00003: val_acc did not improve from 0.94193
Epoch 4/10

Epoch 00004: val_acc did not improve from 0.94193
Epoch 5/10

Epoch 00005: val_acc did not improve from 0.94193
Epoch 6/10

Epoch 00006: val_acc did not improve from 0.94193
Epoch 7/10

Epoch 00007: val_acc did not improve from 0.94193
Epoch 8/10

Epoch 00008: val_acc did not improve from 0.94193
Epoch 9/10

Epoch 00009: val_acc did not improve from 0.94193
Epoch 10/10

Epoch 00010: val_acc did not improve from 0.94193


In [73]:
y_pred_cnn_multi_channel = pd.read_csv('/Users/Ryan/Desktop/Programming/SMU/Machine Learning/y_pred_cnn_multi_channel.csv')
print(accuracy_score(y_test, y_pred_cnn_multi_channel))

0.9084263959390863


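The CNN's accuracy (0.908) is about three points lower than the RNN's (0.942), but still good. This is not surprising, since CNNs are used less often for NLP than RNNs. Note that the "did not improve from 0.94193" messages in the training log do not mean the CNN never improved: the same `checkpoint` callback was reused from the RNN run, so each epoch was compared against the RNN's best validation accuracy.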

My research also suggests that combining a CNN with an RNN produces a better model than either one alone, so I try that next.

## CNN + RNN

In [32]:
embedding_dim = 300

inp = Input(shape=(MAX_LENGTH, ))
x = Embedding(MAX_WORDS, embedding_dim, weights=[embedding_matrix], input_length=MAX_LENGTH, trainable=True)(inp)
x = SpatialDropout1D(0.3)(x)
x = Bidirectional(GRU(100, return_sequences=True))(x)
x = Conv1D(64, kernel_size = 2, padding = "valid", kernel_initializer = "he_uniform")(x)
avg_pool = GlobalAveragePooling1D()(x)
max_pool = GlobalMaxPooling1D()(x)
conc = concatenate([avg_pool, max_pool])
outp = Dense(1, activation="sigmoid")(conc)
    
rnn_cnn_model = Model(inputs=inp, outputs=outp)
rnn_cnn_model.compile(loss='binary_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])
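In the combined model, the `Conv1D` layer runs over the bidirectional GRU's per-step outputs rather than over the raw embeddings. A small sketch that only tracks the per-example tensor shapes through the layers, assuming `MAX_LENGTH = 200` and the layer sizes used above; `rnn_cnn_shapes` is a hypothetical helper for illustration:

```python
def rnn_cnn_shapes(max_length=200, embedding_dim=300, gru_units=100,
                   conv_filters=64, kernel_size=2):
    """Hypothetical helper: per-example output shape after each layer of the CNN + RNN model."""
    shapes = {}
    shapes["embedding"] = (max_length, embedding_dim)
    # Bidirectional concatenates the forward and backward GRU outputs at every step
    shapes["bi_gru"] = (max_length, 2 * gru_units)
    # a 'valid' Conv1D shortens the sequence by kernel_size - 1 steps
    shapes["conv1d"] = (max_length - kernel_size + 1, conv_filters)
    # average and max pooling over time each give one value per filter, then concatenate
    shapes["pooled"] = (2 * conv_filters,)
    return shapes

for layer, shape in rnn_cnn_shapes().items():
    print(layer, shape)
```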

In [33]:
batch_size = 256
epochs = 4

history = rnn_cnn_model.fit(x=padded_train, 
                    y=y_train, 
                    validation_data=(padded_test, y_test), 
                    batch_size=batch_size, 
                    callbacks=[checkpoint], 
                    epochs=epochs, 
                    verbose=1)

y_pred_rnn_cnn = rnn_cnn_model.predict(padded_test, verbose=1, batch_size=2048)

y_pred_rnn_cnn = pd.DataFrame(y_pred_rnn_cnn, columns=['prediction'])
y_pred_rnn_cnn['prediction'] = y_pred_rnn_cnn['prediction'].map(lambda p: 1 if p >= 0.5 else 0)
y_pred_rnn_cnn.to_csv('/Users/Ryan/Desktop/Programming/SMU/Machine Learning/y_pred_rnn_cnn.csv', index=False)

Train on 20879 samples, validate on 5220 samples
Epoch 1/4

Epoch 00001: val_acc did not improve from 0.92816
Epoch 2/4

Epoch 00002: val_acc did not improve from 0.92816
Epoch 3/4

Epoch 00003: val_acc did not improve from 0.92816
Epoch 4/4

Epoch 00004: val_acc did not improve from 0.92816


In [34]:
y_pred_rnn_cnn = pd.read_csv('/Users/Ryan/Desktop/Programming/SMU/Machine Learning/y_pred_rnn_cnn.csv')
print(accuracy_score(y_test, y_pred_rnn_cnn))

0.9256704980842911


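This model's accuracy (0.926) is better than the CNN alone, but not as good as the RNN. In the research I read, the CNN+RNN combination performed slightly better than an RNN alone, but I was not able to replicate this. Note also that these cells were run in an earlier session on a different random sample (20,879 training and 5,220 validation reviews, per the log), so this score is not directly comparable to the others.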

Some last-minute research indicates that an LSTM is a useful layer to use in an RNN, so I implement that next to see how it does. LSTM stands for "long short-term memory"; it handles the vanishing gradient problem well because it "remembers" information over more time steps than a simple RNN.
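To make the "remembers for longer" idea concrete: an LSTM keeps a separate cell state that is updated additively, with a forget gate deciding how much of the old state to keep, which helps gradients survive over more time steps than in a plain RNN. A NumPy sketch of one LSTM step using the standard textbook gate equations, with made-up sizes and random weights (an illustration of the technique, not Keras' implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W: (4n, d), U: (4n, n), b: (4n,); gates stacked as i, f, g, o."""
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:n])          # input gate: how much new information to write
    f = sigmoid(z[n:2 * n])      # forget gate: how much of the old cell state to keep
    g = np.tanh(z[2 * n:3 * n])  # candidate values
    o = sigmoid(z[3 * n:4 * n])  # output gate
    c = f * c_prev + i * g       # additive cell-state update
    h = o * np.tanh(c)           # hidden state passed to the next step
    return h, c

rng = np.random.default_rng(0)
d, n = 3, 2                      # toy input and hidden sizes
W, U, b = rng.normal(size=(4 * n, d)), rng.normal(size=(4 * n, n)), np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
for x in rng.normal(size=(5, d)):  # run over a 5-step toy sequence
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape, c.shape)
```

With the forget gate saturated near 1 and the input gate near 0, the cell state passes through a step unchanged, which is the mechanism behind the longer memory.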

## LSTM RNN

In [86]:
from keras.layers import LSTM

lstm_out = 196

inp = Input(shape=(MAX_LENGTH, ))
x = Embedding(MAX_WORDS, embedding_dim, weights=[embedding_matrix], input_length=MAX_LENGTH, trainable=True)(inp)
x = SpatialDropout1D(0.2)(x)
x = Bidirectional(GRU(100, return_sequences=True))(x)
x = LSTM(lstm_out, dropout=0.2, recurrent_dropout=0.2)(x)
outp = Dense(1, activation="sigmoid")(x)

lstm_rnn_model = Model(inputs = inp, outputs=outp)
lstm_rnn_model.compile(loss = 'binary_crossentropy', optimizer='adam', metrics = ['accuracy'])


In [87]:
history = lstm_rnn_model.fit(x = padded_train, 
                               y = y_train, 
                               validation_data=(padded_test, y_test),
                               batch_size=256, 
                               callbacks=[checkpoint], 
                               epochs=10, #5, 10, 15
                               verbose=1)

y_pred_lstm = lstm_rnn_model.predict(padded_test, verbose=1, batch_size=1024)

y_pred_lstm = pd.DataFrame(y_pred_lstm, columns=['prediction'])
y_pred_lstm['prediction'] = y_pred_lstm['prediction'].map(lambda p: 1 if p >= 0.5 else 0)
y_pred_lstm.to_csv('/Users/Ryan/Desktop/Programming/SMU/Machine Learning/y_pred_lstm.csv', index=False)

Train on 19700 samples, validate on 4925 samples
Epoch 1/10

Epoch 00001: val_acc did not improve from 0.94193
Epoch 2/10

Epoch 00002: val_acc did not improve from 0.94193
Epoch 3/10

Epoch 00003: val_acc did not improve from 0.94193
Epoch 4/10

Epoch 00004: val_acc did not improve from 0.94193
Epoch 5/10

Epoch 00005: val_acc did not improve from 0.94193
Epoch 6/10

Epoch 00006: val_acc did not improve from 0.94193
Epoch 7/10

Epoch 00007: val_acc did not improve from 0.94193
Epoch 8/10

Epoch 00008: val_acc did not improve from 0.94193
Epoch 9/10

Epoch 00009: val_acc did not improve from 0.94193
Epoch 10/10

Epoch 00010: val_acc did not improve from 0.94193


In [88]:
y_pred_lstm = pd.read_csv('/Users/Ryan/Desktop/Programming/SMU/Machine Learning/y_pred_lstm.csv')
print(accuracy_score(y_test, y_pred_lstm))

0.9260913705583756


The LSTM model (0.926) performed worse than the RNN (0.942), better than the CNN (0.908), and about the same as the CNN+RNN combo (0.926). It shares the RNN model's embedding and bidirectional GRU layers, but replaces the average and max pooling with an LSTM layer.

## Conclusion

The goal of this project was two-fold: to try to improve on a simple bag-of-words model using neural networks, and to gain expertise with neural networks. On the first goal, despite extensive tuning and long training runs, the neural networks were not able to beat the bag-of-words model in this case. I think one design change made this task easier than the previous work: there we classified reviews into a score of 1-5, while here it was only a binary sentiment classification. This may give the simpler model an edge that it would not have on more challenging problems, where a neural network could perform better.

Current research such as https://arxiv.org/abs/1808.03867 indicates that 2D CNNs can outperform other models on some language tasks, such as translation. That is similar to how I built the CNN model above, but it did not perform as well here. That said, none of the models were bad: each scored above 90% accuracy. I suspect closing the gap is largely a matter of more tuning, and with more time I think I could build a model that beats the bag of words.

For the second goal, I think this has opened my mind to all the things I don't know about neural networks, particularly in the realm of NLP. I took this project on for more experience tuning and writing models, and while I got that, there's so much more to learn. This has given me great motivation to go forward and keep at it, as I am serious about machine learning using neural networks and NLP.