# Sentiment Challenge

Welcome to my Kernel on Movie Review Sentiment. The goal of this Kernel is to predict the sentiment of movie reviews. Sentiment is rated from 0 to 4: 0 is negative, 1 slightly negative, 2 neutral, 3 slightly positive, and 4 positive. This Kernel walks through data preparation and construction of the model. Links to the resources that helped me learn the methods used in this Kernel are also listed below.

In [None]:
import pandas as pd
import numpy as np

train = pd.read_csv('../input/movie-review-sentiment-analysis-kernels-only/train.tsv', sep='\t')
test = pd.read_csv('../input/movie-review-sentiment-analysis-kernels-only/test.tsv', sep='\t')

In [None]:
xtrain = train["Phrase"]
ytrain = pd.get_dummies(train["Sentiment"])
xtest = test["Phrase"]
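The `pd.get_dummies` call above turns the single Sentiment column into one indicator column per class, which is the target shape a five-way softmax output expects. As a rough illustration of that encoding, here is a pure-Python sketch. The helper `one_hot` and the labels in `toy_labels` are made up for this example and are not part of the competition data.

```python
# Pure-Python sketch of one-hot encoding for the five sentiment classes (0-4).
NUM_CLASSES = 5


def one_hot(label, num_classes=NUM_CLASSES):
    """Return a list with a 1 at position `label` and 0 everywhere else."""
    row = [0] * num_classes
    row[label] = 1
    return row


toy_labels = [2, 0, 4]  # hypothetical sentiment labels
toy_encoded = [one_hot(label) for label in toy_labels]

for label, row in zip(toy_labels, toy_encoded):
    print(label, row)
```

Note that `pd.get_dummies` only creates columns for values that actually occur, so this matches the sketch only when all five classes appear in the training data.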

Below we tokenize the text using the Tokenizer class from Keras, which splits the text into words, also known as tokens. The parameter "lower" converts each word to lower case as it is tokenized, and "num_words" restricts the vocabulary to the most frequent words (set to 20,000 here; Keras keeps only indices below num_words, so 19,999 words in practice). The tokens are then converted to sequences of integer indices using texts_to_sequences from Keras.
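To make the idea concrete, here is a simplified pure-Python sketch of what the tokenizer does: rank words by frequency, then replace each word by its rank. This is not the Keras implementation (the real Tokenizer also strips punctuation, for example). The helper names and the phrases in `toy_phrases` are hypothetical.

```python
from collections import Counter


def fit_word_index(texts):
    """Rank words by frequency: the most frequent word gets index 1."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # sorted() is stable, so ties keep their first-seen order
    ranked = sorted(counts.items(), key=lambda kv: -kv[1])
    return {word: rank for rank, (word, _) in enumerate(ranked, start=1)}


def to_sequences(texts, word_index, num_words=None):
    """Replace each known word by its index, dropping indices >= num_words."""
    sequences = []
    for text in texts:
        ids = [word_index[w] for w in text.lower().split() if w in word_index]
        if num_words is not None:
            ids = [i for i in ids if i < num_words]
        sequences.append(ids)
    return sequences


toy_phrases = ["A good movie", "a bad movie", "A good good cast"]  # hypothetical
toy_index = fit_word_index(toy_phrases)
toy_sequences = to_sequences(toy_phrases, toy_index, num_words=4)

print(toy_index)
print(toy_sequences)
```

With `num_words=4` only indices 1 to 3 survive, so the rarer words "bad" and "cast" are dropped from the sequences.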

Inspiration for data preprocessing: https://www.kaggle.com/antmarakis/cnn-baseline-model

More info on preparing text for deep learning:

https://machinelearningmastery.com/prepare-text-data-deep-learning-keras/

https://keras.io/preprocessing/text/

In [None]:
from keras.preprocessing.text import Tokenizer
from keras.preprocessing import sequence

print(xtrain.head(5))

In [None]:
Token = Tokenizer(lower = True, num_words = 20000)
Token.fit_on_texts(xtrain)
Token.fit_on_texts(xtest)

In [None]:
xtrain = Token.texts_to_sequences(xtrain)
xtest = Token.texts_to_sequences(xtest)

word_index = Token.word_index

xtrain = sequence.pad_sequences(xtrain, maxlen = 300)
xtest = sequence.pad_sequences(xtest, maxlen = 300)
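The `pad_sequences` calls above give every phrase the same length of 300. By default Keras pads with zeros at the front and, for sequences that are too long, keeps only the last `maxlen` entries. A minimal pure-Python sketch of that default behaviour follows. The helper `pad_pre` and the toy sequences are hypothetical, and the sketch assumes a positive `maxlen`.

```python
def pad_pre(seq, maxlen, value=0):
    """Left-pad with `value`, or keep only the last `maxlen` items."""
    seq = list(seq)[-maxlen:]  # truncate from the front if too long
    return [value] * (maxlen - len(seq)) + seq


toy_padded = [pad_pre(s, maxlen=4) for s in [[5, 2], [7, 1, 3, 9, 4, 8]]]
print(toy_padded)
```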


Next we prepare the embedding layer of the model. An embedding represents each word as a dense vector, positioned so that words with related meanings lie close together. The code below loads a pre-trained embedding known as GloVe.

Using Pre-Trained Embeddings: https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html

GloVe: https://nlp.stanford.edu/projects/glove/

Background on Embedding: https://machinelearningmastery.com/use-word-embedding-layers-deep-learning-keras/
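One common way to see what "close together" means for word vectors is cosine similarity. The sketch below uses made-up 3-dimensional vectors, not real GloVe values, purely to show the calculation. The names `cosine_similarity` and `toy_vectors` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional vectors, NOT real GloVe values
toy_vectors = {
    "good": [0.9, 0.1, 0.3],
    "great": [0.8, 0.2, 0.4],
    "terrible": [-0.7, 0.9, 0.1],
}

sim_close = cosine_similarity(toy_vectors["good"], toy_vectors["great"])
sim_far = cosine_similarity(toy_vectors["good"], toy_vectors["terrible"])
print(sim_close, sim_far)
```

In this toy setup "good" and "great" point in nearly the same direction, while "good" and "terrible" do not.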

In [None]:
embedding_index = {}
# Each line of the GloVe file holds a word followed by its 100 vector components
with open('../input/glove6b/glove.6B.100d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coef = np.asarray(values[1:], dtype='float32')
        embedding_index[word] = coef

In [None]:
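# +1 because Keras word indices start at 1; row 0 is reserved for padding
vocab = len(word_index) + 1

# Words without a GloVe vector keep an all-zero row
embedding_matrix = np.zeros((vocab, 100))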
for word, i in word_index.items():
    embedding_vector = embedding_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector
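Words that have no GloVe vector keep an all-zero row in the matrix, so it can be useful to check how much of the vocabulary is actually covered. Below is a small sketch of such a check. The helper `embedding_coverage` and the toy dictionaries are hypothetical and only illustrate the idea.

```python
def embedding_coverage(word_index, embedding_index):
    """Return (fraction of words with a vector, list of words without one)."""
    missing = [w for w in word_index if w not in embedding_index]
    covered = len(word_index) - len(missing)
    fraction = covered / len(word_index) if word_index else 0.0
    return fraction, missing


# Hypothetical toy data standing in for the real word_index / embedding_index
toy_word_index = {"good": 1, "movie": 2, "zzyzx": 3}
toy_embedding_index = {"good": [0.1, 0.2], "movie": [0.3, 0.4]}

toy_fraction, toy_missing = embedding_coverage(toy_word_index, toy_embedding_index)
print(toy_fraction, toy_missing)
```

The same function should work when called with the `word_index` and `embedding_index` built above.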

The model below stacks three convolutional layers, each followed by max pooling, in front of a recurrent LSTM layer. Inspiration for the model can be found below:

https://machinelearningmastery.com/use-word-embedding-layers-deep-learning-keras/
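To get a feel for how the convolution and pooling layers shorten the sequence before it reaches the LSTM, the sketch below traces the sequence length through the stack. It assumes the settings used in this Kernel (input length 300, `padding='same'` convolutions, pooling size 3 with the default stride equal to the pool size). The helper `pooled_length` is hypothetical.

```python
def pooled_length(length, pool_size):
    """Output length of max pooling with stride == pool_size and no padding."""
    return (length - pool_size) // pool_size + 1


seq_len = 300  # padded input length
length_trace = [seq_len]
for _ in range(3):
    # Conv1D with padding='same' and stride 1 leaves the length unchanged,
    # so only the pooling step shortens the sequence
    seq_len = pooled_length(seq_len, 3)
    length_trace.append(seq_len)

print(length_trace)
```

Under these assumptions the LSTM sees a much shorter sequence than the original 300 tokens, with 32 features per step coming from the last convolution.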

In [None]:
from keras.models import Sequential
# Importing every layer from keras.layers also works on Keras versions
# where the convolutional and embeddings submodules are no longer available
from keras.layers import Dense, LSTM, Conv1D, MaxPooling1D, Embedding

model = Sequential()
emb = Embedding(vocab, 100, weights = [embedding_matrix], input_length = 300, trainable = False)
model.add(emb)
model.add(Conv1D(32, 3, padding = 'same', activation = 'relu'))
model.add(MaxPooling1D(3))
model.add(Conv1D(32, 3, padding = 'same', activation = 'relu'))
model.add(MaxPooling1D(3))
model.add(Conv1D(32, 3, padding = 'same', activation = 'relu'))
model.add(MaxPooling1D(3))
model.add(LSTM(100))
model.add(Dense(5, activation = 'softmax'))
model.compile(optimizer='adam',loss='categorical_crossentropy', metrics=['accuracy'])

In [None]:
model.fit(xtrain, ytrain, epochs = 10)

In [None]:
Submission = pd.read_csv('../input/movie-review-sentiment-analysis-kernels-only/sampleSubmission.csv')
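# predict_classes is not available in recent Keras releases; taking the argmax of the softmax output gives the same labels
Submission['Sentiment'] = np.argmax(model.predict(xtest), axis=1)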
Submission.to_csv("SentimentSubmission.csv", index = False)
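The final layer outputs five softmax probabilities per phrase, and the predicted sentiment is simply the index of the largest one. Here is a tiny pure-Python sketch of that step. The probabilities in `toy_probs` are invented for illustration and are not model output.

```python
def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical softmax outputs for two phrases, NOT real predictions
toy_probs = [
    [0.05, 0.10, 0.60, 0.20, 0.05],
    [0.02, 0.03, 0.10, 0.25, 0.60],
]
toy_predictions = [argmax(row) for row in toy_probs]
print(toy_predictions)
```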

I hope you enjoyed this Kernel on using deep learning to predict Movie Review Sentiment. If you liked what you saw, feel free to upvote, and leave a comment with any questions you may have.