## LSTM(Long Short Term Memory) on Amazon Fine Food Reviews

### The Amazon Fine Food Reviews dataset consists of reviews of fine foods from Amazon.

Number of reviews: 568,454
Number of users: 256,059
Number of products: 74,258
Timespan: Oct 1999 - Oct 2012
Number of Attributes/Columns in data: 10

### Attribute Information:

Id
ProductId - unique identifier for the product
UserId - unqiue identifier for the user
ProfileName
HelpfulnessNumerator - number of users who found the review helpful
HelpfulnessDenominator - number of users who indicated whether they found the review helpful or not
Score - rating between 1 and 5
Time - timestamp for the review
Summary - brief summary of the review
Text - text of the review

### Objective:
Given a review, determine whether the review is positive (Rating of 4 or 5) or negative (rating of 1 or 2).

## Loading,Cleaning & Preprocessing the data

The dataset is available in two forms
1. .csv file
2. SQLite Database

In order to load the data, We have used the SQLITE dataset as it easier to query the data and visualise the data efficiently.
<br> 

Here as we only want to get the global sentiment of the recommendations (positive or negative), we will purposefully ignore all Scores equal to 3. If the score id above 3, then the recommendation wil be set to "positive". Otherwise, it will be set to "negative".*italicized text*

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/amazon-fine-food-reviews/database.sqlite
/kaggle/input/amazon-fine-food-reviews/hashes.txt
/kaggle/input/amazon-fine-food-reviews/Reviews.csv


### Import All Required Libraries 

In [2]:
%matplotlib inline
import warnings 
 
warnings.filterwarnings("ignore")

In [3]:
import sqlite3
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM,Dropout
from keras.layers.embeddings import Embedding
from keras.preprocessing import sequence
import string
import matplotlib.pyplot as plt
import seaborn as sns
import pickle
from tqdm import tqdm
import os
from tqdm import tqdm
import os
import pdb
import pickle
import re
import nltk
from collections import Counter
from itertools import islice
from sklearn.model_selection import train_test_split
from keras.models import Sequential 
from keras.preprocessing import sequence
from keras.initializers import he_normal
from keras.layers import BatchNormalization, Dense, Dropout, Flatten, LSTM
from keras.layers.embeddings import Embedding
from keras.regularizers import L1L2

## Connect to sqlite and fetch the data using SQL Query

In [4]:
con=sqlite3.connect('../input/amazon-fine-food-reviews/database.sqlite')

filtered_data=pd.read_sql_query("""SELECT * FROM Reviews WHERE Score !=3""",con)
filtered_data.head(3)

Unnamed: 0,Id,ProductId,UserId,ProfileName,HelpfulnessNumerator,HelpfulnessDenominator,Score,Time,Summary,Text
0,1,B001E4KFG0,A3SGXH7AUHU8GW,delmartian,1,1,5,1303862400,Good Quality Dog Food,I have bought several of the Vitality canned d...
1,2,B00813GRG4,A1D87F6ZCVE5NK,dll pa,0,0,1,1346976000,Not as Advertised,Product arrived labeled as Jumbo Salted Peanut...
2,3,B000LQOCH0,ABXLMWJIXXAIN,"Natalia Corres ""Natalia Corres""",1,1,4,1219017600,"""Delight"" says it all",This is a confection that has been around a fe...


In [5]:
filtered_data.shape

(525814, 10)

In [6]:
def partition(x):
  if x < 3 :
    return 'negative'
  return 'positive'

actualScore=filtered_data['Score']
positive_negative=actualScore.map(partition)
filtered_data['Score']=positive_negative
print("Number of datapoints",filtered_data.shape)
filtered_data.head(3)

Number of datapoints (525814, 10)


Unnamed: 0,Id,ProductId,UserId,ProfileName,HelpfulnessNumerator,HelpfulnessDenominator,Score,Time,Summary,Text
0,1,B001E4KFG0,A3SGXH7AUHU8GW,delmartian,1,1,positive,1303862400,Good Quality Dog Food,I have bought several of the Vitality canned d...
1,2,B00813GRG4,A1D87F6ZCVE5NK,dll pa,0,0,negative,1346976000,Not as Advertised,Product arrived labeled as Jumbo Salted Peanut...
2,3,B000LQOCH0,ABXLMWJIXXAIN,"Natalia Corres ""Natalia Corres""",1,1,positive,1219017600,"""Delight"" says it all",This is a confection that has been around a fe...


In [7]:
display = pd.read_sql_query("""
SELECT UserId, ProductId, ProfileName, Time, Score, Text, COUNT(*)
FROM Reviews
GROUP BY UserId
HAVING COUNT(*)>1
""", con)

In [8]:
print(display.shape)
display.head(3)

(80668, 7)


Unnamed: 0,UserId,ProductId,ProfileName,Time,Score,Text,COUNT(*)
0,#oc-R115TNMSPFT9I7,B005ZBZLT4,Breyton,1331510400,2,Overall its just OK when considering the price...,2
1,#oc-R11D9D7SHXIJB9,B005HG9ESG,"Louis E. Emory ""hoppy""",1342396800,5,"My wife has recurring extreme muscle spasms, u...",3
2,#oc-R11DNU2NBKQ23Z,B005ZBZLT4,Kim Cieszykowski,1348531200,1,This coffee is horrible and unfortunately not ...,2


In [9]:
display[display["UserId"]=='AZY10LLTJ71NX']

Unnamed: 0,UserId,ProductId,ProfileName,Time,Score,Text,COUNT(*)
80638,AZY10LLTJ71NX,B001ATMQK2,"undertheshrine ""undertheshrine""",1296691200,5,I bought this 6 pack because for the price tha...,5


In [10]:
display['COUNT(*)'].sum()

393063

In [11]:
display= pd.read_sql_query("""
SELECT *
FROM Reviews
WHERE Score != 3 AND UserId="AR5J8UI46CURR"
ORDER BY ProductID
""", con)
display.head()

Unnamed: 0,Id,ProductId,UserId,ProfileName,HelpfulnessNumerator,HelpfulnessDenominator,Score,Time,Summary,Text
0,78445,B000HDL1RQ,AR5J8UI46CURR,Geetha Krishnan,2,2,5,1199577600,LOACKER QUADRATINI VANILLA WAFERS,DELICIOUS WAFERS. I FIND THAT EUROPEAN WAFERS ...
1,138317,B000HDOPYC,AR5J8UI46CURR,Geetha Krishnan,2,2,5,1199577600,LOACKER QUADRATINI VANILLA WAFERS,DELICIOUS WAFERS. I FIND THAT EUROPEAN WAFERS ...
2,138277,B000HDOPYM,AR5J8UI46CURR,Geetha Krishnan,2,2,5,1199577600,LOACKER QUADRATINI VANILLA WAFERS,DELICIOUS WAFERS. I FIND THAT EUROPEAN WAFERS ...
3,73791,B000HDOPZG,AR5J8UI46CURR,Geetha Krishnan,2,2,5,1199577600,LOACKER QUADRATINI VANILLA WAFERS,DELICIOUS WAFERS. I FIND THAT EUROPEAN WAFERS ...
4,155049,B000PAQ75C,AR5J8UI46CURR,Geetha Krishnan,2,2,5,1199577600,LOACKER QUADRATINI VANILLA WAFERS,DELICIOUS WAFERS. I FIND THAT EUROPEAN WAFERS ...


In [13]:
sorted_data=filtered_data.sort_values('ProductId',axis=0,ascending=True,inplace=False,kind='quicksort',na_position='last')

final_data=sorted_data.drop_duplicates(subset={"UserId","ProfileName","Time","Text"},keep='first',inplace=False)
final_data.shape

(364173, 10)

In [14]:
(final_data['Id'].size*1.0)/(filtered_data['Id'].size*1.0)*100

69.25890143662969

In [15]:
display= pd.read_sql_query("""
SELECT *
FROM Reviews
WHERE Score != 3 AND Id=44737 OR Id=64422
ORDER BY ProductID
""", con)

display.head()

Unnamed: 0,Id,ProductId,UserId,ProfileName,HelpfulnessNumerator,HelpfulnessDenominator,Score,Time,Summary,Text
0,64422,B000MIDROQ,A161DK06JJMCYF,"J. E. Stephens ""Jeanne""",3,1,5,1224892800,Bought This for My Son at College,My son loves spaghetti so I didn't hesitate or...
1,44737,B001EQ55RW,A2V0I904FH7ABY,Ram,3,2,4,1212883200,Pure cocoa taste with crunchy almonds inside,It was almost a 'love at first bite' - the per...


In [16]:
final_data=final_data[final_data.HelpfulnessNumerator<=final_data.HelpfulnessDenominator]

## Count the Positive and Negative Review Counts

In [17]:
print(final_data.shape)

#How many positive and negative reviews are present in our dataset?
final_data['Score'].value_counts()

(364171, 10)


positive    307061
negative     57110
Name: Score, dtype: int64

## Import nltk library

In [18]:

from nltk.corpus import stopwords
stopping_words=set(stopwords.words('english'))
print(stopping_words)

{'yourselves', "won't", 'ma', 'herself', 'them', 'didn', 'should', 'your', 'an', "haven't", 'wasn', 'his', 'who', "you're", 'don', 'now', 'on', 'until', 'there', 'than', 'just', 'they', 'he', 'once', 'have', 'own', "mustn't", 'then', 'being', 'before', "mightn't", 'hadn', 'nor', 'after', 'no', 'whom', 'under', 'over', 'when', 'will', 'll', 'did', 'out', 'am', 'are', "aren't", "shouldn't", 'same', 'their', 'wouldn', 're', 'me', 'ours', 'why', "doesn't", 'mightn', 'does', 'of', 'at', 'each', 'again', 'shouldn', 'too', "needn't", 'y', 'through', 'aren', 'all', 'themselves', 'mustn', 'very', 'what', 'shan', 'had', 'but', 'to', 'and', 'during', 'can', "you'll", 'few', 'as', 'myself', 'be', 'between', 'off', 't', 'our', 'for', 'a', 'o', 'some', "wasn't", 'below', "it's", 'any', 'theirs', 'further', 'we', "don't", 'm', 'if', 'ourselves', "you'd", 'because', 'were', 'isn', 'those', 've', 'other', 'it', 'couldn', 'i', 'the', 'where', 'from', "weren't", 'both', 'ain', 'doing', 'down', "didn't", 

In [19]:
def clean_html(text):
    clean_r = re.compile('<,*?>')
    clean_text = re.sub(clean_r,'',text)
    return clean_text

def Clean_punc(text):
    clean_sentence = re.sub(r'[?|!|\'|"|#]',r' ',text)
    clean_data = re.sub(r'[.|,|)|(|\|/)]',r' ',clean_sentence)
    return clean_data

In [20]:
stem_no = nltk.stem.SnowballStemmer('english')

if not os.path.isfile('final_data.sqlite'):
    final_string=[]
    all_positive_words=[]
    all_negative_words=[]
    for i,sentence in enumerate(tqdm(final_data['Text'].values)):
        filtered_sentence=[]
        sent_without_html_tags=clean_html(sentence)
        #pdb.set_trace()
        for w in sent_without_html_tags.split():
            for cleaned_words in Clean_punc(w).split():
                if ((cleaned_words.isalpha()) & (len(cleaned_words) > 2)):
                    if(cleaned_words.lower() not in stopping_words) :
                        stemming=(stem_no.stem(cleaned_words.lower())).encode('utf8')
                        filtered_sentence.append(stemming)
                        if(final_data['Score'].values)[i]=='positive':
                            all_positive_words.append(stemming)
                        if(final_data['Score'].values)[i]=='negative':
                            all_negative_words.append(stemming)
        str1 = b" ".join(filtered_sentence)
        final_string.append(str1)
        
    final_data['Cleaned_text']=final_string
    final_data['Cleaned_text']=final_data['Cleaned_text'].str.decode("utf-8")    
    
    conn = sqlite3.connect('final_data.sqlite')
    cursor=conn.cursor
    conn.text_factory = str
    final_data.to_sql('Reviews',conn,schema=None,if_exists='replace',index=True,index_label=None,chunksize=None,dtype=None)
    conn.close()
    
    
    with open('positive_words.pkl','wb') as f :
        pickle.dump(all_positive_words,f)
    with open('negative_words.pkl','wb') as f :
        pickle.dump(all_negative_words,f)

100%|██████████| 364171/364171 [08:20<00:00, 727.25it/s] 


In [21]:
final_data['total_words'] = [len(x.split()) for x in final_data['Cleaned_text'].tolist()] 

In [22]:
final_data.head(3)

Unnamed: 0,Id,ProductId,UserId,ProfileName,HelpfulnessNumerator,HelpfulnessDenominator,Score,Time,Summary,Text,Cleaned_text,total_words
138706,150524,6641040,ACITT7DI6IDDL,shari zychinski,0,0,positive,939340800,EVERY book is educational,this witty little book makes my son laugh at l...,witti littl book make son laugh loud recit car...,34
138688,150506,6641040,A2IW4PEEKO2R0U,Tracy,1,1,positive,1194739200,"Love the book, miss the hard cover version","I grew up reading these Sendak books, and watc...",grew read sendak book watch realli rosi movi i...,27
138689,150507,6641040,A1S4A3IQ2MU7V4,"sally sue ""sally sue""",1,1,positive,1191456000,chicken soup with rice months,This is a fun way for children to learn their ...,fun way children learn month year learn poem t...,15


In [23]:
final_data.shape

(364171, 12)

## Pick the top 100K Data Points from the Dataset


In [24]:
final_data_100K=final_data[0:100000]
amazon_polarity_labels=final_data_100K['Score'].values
final_data_100K.head(2)

Unnamed: 0,Id,ProductId,UserId,ProfileName,HelpfulnessNumerator,HelpfulnessDenominator,Score,Time,Summary,Text,Cleaned_text,total_words
138706,150524,6641040,ACITT7DI6IDDL,shari zychinski,0,0,positive,939340800,EVERY book is educational,this witty little book makes my son laugh at l...,witti littl book make son laugh loud recit car...,34
138688,150506,6641040,A2IW4PEEKO2R0U,Tracy,1,1,positive,1194739200,"Love the book, miss the hard cover version","I grew up reading these Sendak books, and watc...",grew read sendak book watch realli rosi movi i...,27


In [25]:
X_Data_Text=final_data_100K['Cleaned_text']

In [26]:
Y_Data_Score=final_data_100K['Score']

In [27]:
X_Data_Text.head(3)

138706    witti littl book make son laugh loud recit car...
138688    grew read sendak book watch realli rosi movi i...
138689    fun way children learn month year learn poem t...
Name: Cleaned_text, dtype: object

In [28]:
X_Data_Text.index=[x for x in range(0,10**5)]

## Get the list of all the words from the "Cleaned Text"

In [29]:
list_of_words = []
for i in tqdm(X_Data_Text):
    list_of_words += i
list_of_words = ''.join(list_of_words)
list_of_all_words=list_of_words.split()

100%|██████████| 100000/100000 [00:00<00:00, 318290.67it/s]


In [30]:
list_of_all_words[0]

'witti'

## Convert the List of words into SET and List

In [30]:
vocab=set(list_of_all_words)
vocab_list=list(vocab)

In [31]:
type(vocab)

set

In [33]:
vocab_list[0]

'lisa'

In [34]:
dict_freq={}
for x in tqdm(vocab_list):
    dict_freq[x]=list_of_all_words.count(x)

100%|██████████| 103077/103077 [1:34:36<00:00, 18.16it/s] 


In [38]:
dict_freq

{'applyear': 1,
 'bouillon': 149,
 'calphalon': 3,
 'drinksugar': 1,
 'relishlove': 1,
 'walmartorder': 1,
 'ricepurchas': 4,
 'programexpect': 1,
 'awri': 1,
 'dangl': 7,
 'catyear': 1,
 'middlstar': 1,
 'sodiumfound': 1,
 'canfield': 9,
 'cherrigot': 2,
 'honey': 2493,
 'gatorad': 28,
 'soon': 1114,
 'ajemiandrink': 1,
 'nurient': 1,
 'purchashusband': 1,
 'flavormani': 2,
 'perfectmani': 1,
 'spotuse': 2,
 'orderdelici': 7,
 'betterenjoy': 4,
 'likepuf': 2,
 'ethiopian': 26,
 'foodservic': 1,
 'rey': 2,
 'yummibumbl': 1,
 'couchboston': 1,
 'sucros': 36,
 'enjoygive': 2,
 'seelike': 1,
 'customhavent': 1,
 'tasetpoland': 1,
 'orderlive': 3,
 'optionson': 1,
 'recoverigreat': 1,
 'stockalmost': 1,
 'loveperfect': 2,
 'changalthough': 1,
 'happineither': 1,
 'dillig': 1,
 'swore': 21,
 'peep': 17,
 'stocktook': 1,
 'teabold': 1,
 'juuuust': 1,
 'vagu': 96,
 'yet': 2209,
 'sleepsleepytim': 1,
 'goodinstant': 1,
 'mucinex': 1,
 'ouchknow': 1,
 'muffinproduct': 1,
 'areaorder': 1,
 'amaz

In [37]:
(pd.DataFrame.from_dict(data=dict_freq, orient='index')
   .to_csv('dict_file.csv', header=False))

## Use Counter function to count the number of Unique words and pick the top 5000 words and add in the dictionary 

In [32]:
print("Number of sentences in complete dataset : ",len(list_of_all_words))

counts_words = Counter(list_of_all_words)
print("Number of unique words present : ",len(counts_words.most_common()))
vocab_size = len(counts_words.most_common()) + 1
top_words_count = 5000
common_words = counts_words.most_common(top_words_count)

word_index = dict()
i = 1
for word,frequency in common_words:
    word_index[word] = i
    i += 1

print()
print("Top 25 words with their frequencies:")
print(counts_words.most_common(25))
print()
print("Top 25 words with their index:")
print(list(islice(word_index.items(), 25)))

Number of sentences in complete dataset :  3488783
Number of unique words present :  103077

Top 25 words with their frequencies:
[('like', 38877), ('tast', 37698), ('tea', 33156), ('good', 29934), ('product', 29213), ('use', 29058), ('flavor', 28866), ('one', 28605), ('great', 25821), ('love', 24096), ('make', 23406), ('tri', 23337), ('get', 21851), ('food', 18331), ('amazon', 17278), ('would', 17245), ('time', 17142), ('eat', 16362), ('buy', 15401), ('find', 15243), ('also', 14003), ('much', 13957), ('realli', 13946), ('bag', 13511), ('order', 13476)]

Top 25 words with their index:
[('like', 1), ('tast', 2), ('tea', 3), ('good', 4), ('product', 5), ('use', 6), ('flavor', 7), ('one', 8), ('great', 9), ('love', 10), ('make', 11), ('tri', 12), ('get', 13), ('food', 14), ('amazon', 15), ('would', 16), ('time', 17), ('eat', 18), ('buy', 19), ('find', 20), ('also', 21), ('much', 22), ('realli', 23), ('bag', 24), ('order', 25)]


In [53]:
type(X_Data_Text)

pandas.core.series.Series

## Create a new Column called "CleanedText_Index" and add the index of each word occured in the "Cleaned_text"

In [33]:
def use_index(row):   
    holder = []
    for word in row['Cleaned_text'].split():
        if word in word_index:
            holder.append(word_index[word]) 
        else:
            holder.append(0)            
    return holder


final_data_100K['CleanedText_Index'] = final_data_100K.apply(lambda row: use_index(row),axis=1)
final_data_100K.head(5)

Unnamed: 0,Id,ProductId,UserId,ProfileName,HelpfulnessNumerator,HelpfulnessDenominator,Score,Time,Summary,Text,Cleaned_text,total_words,CleanedText_Index
138706,150524,6641040,ACITT7DI6IDDL,shari zychinski,0,0,positive,939340800,EVERY book is educational,this witty little book makes my son laugh at l...,witti littl book make son laugh loud recit car...,34,"[0, 27, 932, 11, 384, 1976, 2578, 0, 1196, 123..."
138688,150506,6641040,A2IW4PEEKO2R0U,Tracy,1,1,positive,1194739200,"Love the book, miss the hard cover version","I grew up reading these Sendak books, and watc...",grew read sendak book watch realli rosi movi i...,27,"[995, 247, 0, 932, 551, 23, 0, 988, 2594, 10, ..."
138689,150507,6641040,A1S4A3IQ2MU7V4,"sally sue ""sally sue""",1,1,positive,1191456000,chicken soup with rice months,This is a fun way for children to learn their ...,fun way children learn month year learn poem t...,15,"[644, 56, 776, 731, 129, 35, 731, 0, 1410, 928..."
138690,150508,6641040,AZGXZ2UUK6X,"Catherine Hallberg ""(Kate)""",1,1,positive,1076025600,a good swingy rhythm for reading aloud,This is a great little book to read aloud- it ...,great littl book read nice rhythm well good re...,49,"[9, 27, 932, 247, 85, 0, 33, 4, 0, 27, 8, 1, 4..."
138691,150509,6641040,A3CMRKGE0P909G,Teresa,3,4,positive,1018396800,A great way to learn the months,This is a book of poetry about the months of t...,book poetri month year goe month cute littl po...,37,"[932, 0, 129, 35, 378, 129, 1475, 27, 0, 492, ..."


## Convert the Score to 1 & 0.
## positive - 1 
## negative - 0

In [34]:
final_data_100K['Score'] = final_data_100K['Score'].map(lambda x : 1 if x == 'positive' else 0)
final_data_100K.head(5)

Unnamed: 0,Id,ProductId,UserId,ProfileName,HelpfulnessNumerator,HelpfulnessDenominator,Score,Time,Summary,Text,Cleaned_text,total_words,CleanedText_Index
138706,150524,6641040,ACITT7DI6IDDL,shari zychinski,0,0,1,939340800,EVERY book is educational,this witty little book makes my son laugh at l...,witti littl book make son laugh loud recit car...,34,"[0, 27, 932, 11, 384, 1976, 2578, 0, 1196, 123..."
138688,150506,6641040,A2IW4PEEKO2R0U,Tracy,1,1,1,1194739200,"Love the book, miss the hard cover version","I grew up reading these Sendak books, and watc...",grew read sendak book watch realli rosi movi i...,27,"[995, 247, 0, 932, 551, 23, 0, 988, 2594, 10, ..."
138689,150507,6641040,A1S4A3IQ2MU7V4,"sally sue ""sally sue""",1,1,1,1191456000,chicken soup with rice months,This is a fun way for children to learn their ...,fun way children learn month year learn poem t...,15,"[644, 56, 776, 731, 129, 35, 731, 0, 1410, 928..."
138690,150508,6641040,AZGXZ2UUK6X,"Catherine Hallberg ""(Kate)""",1,1,1,1076025600,a good swingy rhythm for reading aloud,This is a great little book to read aloud- it ...,great littl book read nice rhythm well good re...,49,"[9, 27, 932, 247, 85, 0, 33, 4, 0, 27, 8, 1, 4..."
138691,150509,6641040,A3CMRKGE0P909G,Teresa,3,4,1,1018396800,A great way to learn the months,This is a book of poetry about the months of t...,book poetri month year goe month cute littl po...,37,"[932, 0, 129, 35, 378, 129, 1475, 27, 0, 492, ..."


In [35]:
final_data_100K['CleanedText_Index'].head(2)

138706    [0, 27, 932, 11, 384, 1976, 2578, 0, 1196, 123...
138688    [995, 247, 0, 932, 551, 23, 0, 988, 2594, 10, ...
Name: CleanedText_Index, dtype: object

## Split the Data into TEST and Train 

In [36]:
X_Train, X_Test, Y_Train, Y_Test = train_test_split(final_data_100K['CleanedText_Index'].values,final_data_100K['Score'],test_size=0.3,shuffle=False,random_state=0)

In [39]:
X_Train[6]

[365,
 1773,
 266,
 295,
 37,
 247,
 384,
 539,
 465,
 2206,
 776,
 932,
 8,
 31,
 3423,
 0,
 90,
 688,
 65,
 247,
 0,
 10,
 932,
 3810,
 129,
 35,
 1473,
 247,
 33,
 1964,
 130,
 776,
 932,
 200,
 2084,
 0,
 0,
 932,
 149,
 2024,
 3032,
 1033,
 774,
 0,
 31]

### Apply Padding 

In [37]:
from keras.preprocessing import sequence

max_review_length = 600
X_Train = sequence.pad_sequences(X_Train, maxlen=max_review_length)
X_Test = sequence.pad_sequences(X_Test, maxlen=max_review_length)

print(X_Train.shape)
print(X_Train[1])

(70000, 600)
[   0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    

In [38]:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 6202525998069733733
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 15694142288273005724
physical_device_desc: "device: XLA_CPU device"
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 15695549568
locality {
  bus_id: 1
  links {
  }
}
incarnation: 16992963004178590259
physical_device_desc: "device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 15462798390071149482
physical_device_desc: "device: XLA_GPU device"
]


## Apply LSTM 1

In [40]:
import numpy
numpy.random.seed(7)

embedding_vecor_length = 32
model = Sequential()
model.add(Embedding(top_words_count+1, embedding_vecor_length, input_length=max_review_length))
model.add(LSTM(100))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, 600, 32)           160032    
_________________________________________________________________
lstm (LSTM)                  (None, 100)               53200     
_________________________________________________________________
dense (Dense)                (None, 1)                 101       
Total params: 213,333
Trainable params: 213,333
Non-trainable params: 0
_________________________________________________________________
None


In [41]:
model.fit(X_Train, Y_Train, epochs=10, batch_size=64)
# Final evaluation of the model
scores = model.evaluate(X_Test, Y_Test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Accuracy: 90.80%


In [43]:
print('Test score: ',scores[0])
print('Test accuracy: ',scores[1])

Test score:  0.3073323667049408
Test accuracy:  0.9080333113670349


In [46]:
embedding_vecor_length = 32
model2 = Sequential()
model2.add(Embedding(top_words_count+1, embedding_vecor_length, input_length=max_review_length))
model2.add(LSTM(100,return_sequences=True))
model2.add(Dropout(0.25))
model2.add(LSTM(80))
model2.add(Dropout(0.5))
model2.add(Dense(1, activation='sigmoid'))
model2.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model2.summary())

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 600, 32)           160032    
_________________________________________________________________
lstm_3 (LSTM)                (None, 600, 100)          53200     
_________________________________________________________________
dropout_1 (Dropout)          (None, 600, 100)          0         
_________________________________________________________________
lstm_4 (LSTM)                (None, 80)                57920     
_________________________________________________________________
dropout_2 (Dropout)          (None, 80)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 81        
Total params: 271,233
Trainable params: 271,233
Non-trainable params: 0
________________________________________________

In [47]:
model2.fit(X_Train, Y_Train, epochs=10, batch_size=64)
# Final evaluation of the model
scores = model2.evaluate(X_Test, Y_Test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Accuracy: 90.68%


In [48]:
print('Test score: ',scores[0])
print('Test accuracy: ',scores[1])

Test score:  0.3317498564720154
Test accuracy:  0.9067999720573425


# CONCLUSION 

## After using LSTM Models , below are the conlusion made 

1. LSTM without Dropouts --> Accuracy is 90.80%
2. LSTM with Dropout --> Accuracy is 90.67%