<a href="https://colab.research.google.com/github/nabeel-gulzar/sequence_classification/blob/main/sequence_classification_imdb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Sequence Classification with LSTM

Reference: https://machinelearningmastery.com/how-to-make-classification-and-regression-predictions-for-deep-learning-models-in-keras/

In [1]:
# LSTM for sequence classification on the IMDB dataset
# (the unused Conv1D/MaxPooling1D imports are dropped; layers are imported
# from the public keras.layers namespace rather than private submodules)
import numpy
from keras.datasets import imdb
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Embedding
from keras.preprocessing import sequence
from keras.layers import Dropout


In [2]:
# fix random seed for reproducibility
numpy.random.seed(7)
# load the dataset but only keep the top n words; rarer words are replaced by the out-of-vocabulary index (2)
top_words = 5000
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=top_words)
# truncate and pad input sequences to a fixed length (pad_sequences pads and truncates at the front by default)
max_review_length = 500
X_train = sequence.pad_sequences(X_train, maxlen=max_review_length)
X_test = sequence.pad_sequences(X_test, maxlen=max_review_length)


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz


+ve: One of the other reviewers has mentioned that after watching just 1 Oz episode you'll be hooked. The...

+ve: A wonderful little production. The filming technique is very unassuming- very old-time-B...

-ve: Phil the Alien is one of those quirky films where the humour is based around the oddness of everythi...

-ve: I saw this movie when I was about 12 when it came out. I recall the scariest scene was the big bird ...
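The padding step in cell `In [2]` can be illustrated without Keras. The sketch below assumes the default `pad_sequences` behaviour (zeros added at the front, overlong sequences cut at the front); the helper name `pad_pre` and the toy sequences are made up for illustration and are not part of the notebook.

```python
# Pure-Python sketch of pre-padding and pre-truncation.
# pad_pre is a hypothetical helper, not a Keras function.
def pad_pre(seqs, maxlen, value=0):
    padded = []
    for seq in seqs:
        kept = list(seq)[-maxlen:]  # keep only the last maxlen tokens
        # left-pad with `value` until the sequence has maxlen entries
        padded.append([value] * (maxlen - len(kept)) + kept)
    return padded

toy = [[1, 14, 22], [1, 5, 6, 7, 8, 9]]  # toy word-id sequences
print(pad_pre(toy, maxlen=4))
# [[0, 1, 14, 22], [6, 7, 8, 9]]
```

Note that the second toy sequence loses its leading `1`. Under the same assumption, a review longer than `max_review_length` would lose its opening tokens, including the start marker.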

In [3]:
# create the model
embedding_vector_length = 32
model = Sequential()
model.add(Embedding(top_words, embedding_vector_length, input_length=max_review_length))
model.add(Dropout(0.2))
model.add(LSTM(100))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())


Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 500, 32)           160000    
                                                                 
 dropout (Dropout)           (None, 500, 32)           0         
                                                                 
 lstm (LSTM)                 (None, 100)               53200     
                                                                 
 dropout_1 (Dropout)         (None, 100)               0         
                                                                 
 dense (Dense)               (None, 1)                 101       
                                                                 
Total params: 213,301
Trainable params: 213,301
Non-trainable params: 0
_________________________________________________________________
None
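The parameter counts in the summary can be checked by hand. The sketch below reproduces them from the layer sizes used in the model; the variable names are chosen here for illustration, and the LSTM formula assumes the standard layout of four gates, each with input weights, recurrent weights and a bias.

```python
vocab_size = 5000      # top_words in the notebook
embedding_dim = 32     # embedding vector length
lstm_units = 100       # LSTM(100)

# One embedding_dim-sized vector per vocabulary entry
embedding_params = vocab_size * embedding_dim
# 4 gates x (input weights + recurrent weights + bias)
lstm_params = 4 * (lstm_units * (embedding_dim + lstm_units) + lstm_units)
# One output unit: a weight per LSTM unit plus a bias
dense_params = lstm_units * 1 + 1

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
# 160000 53200 101 213301
```

The dropout layers contribute no parameters, which matches the zeros in the summary.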


In [4]:
model.fit(X_train, y_train, epochs=3, batch_size=64)


Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x7fe8fb0c5750>

In [5]:
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

Accuracy: 87.79%
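The accuracy metric is the fraction of reviews whose thresholded sigmoid output matches the true label. A minimal sketch of that calculation, assuming a 0.5 threshold; the function name `binary_accuracy_sketch` and the toy probabilities and labels are invented for illustration and are not taken from the model.

```python
# Hypothetical helper showing how accuracy follows from sigmoid outputs.
def binary_accuracy_sketch(probs, labels, threshold=0.5):
    # Class 1 if the probability exceeds the threshold, else class 0
    preds = [1 if p > threshold else 0 for p in probs]
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return correct / len(labels)

toy_probs = [0.91, 0.12, 0.55, 0.40]   # made-up sigmoid outputs
toy_labels = [1, 0, 0, 0]              # made-up true labels
print("Accuracy: %.2f%%" % (binary_accuracy_sketch(toy_probs, toy_labels) * 100))
# Accuracy: 75.00%
```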


In [8]:
from keras.datasets.imdb import get_word_index

In [9]:
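word_to_id = get_word_index()
# load_data() shifts every word index by 3 to make room for the special tokens
word_to_id = {k:(v+3) for k,v in word_to_id.items()}
word_to_id["<PAD>"] = 0      # padding added by pad_sequences
word_to_id["<START>"] = 1    # start-of-review marker
word_to_id["<UNK>"] = 2      # out-of-vocabulary word
word_to_id["<UNUSED>"] = 3

# reverse mapping, used to decode reviews back into text
id_to_word = {value:key for key,value in word_to_id.items()}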
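To see how the offset of 3 and the special tokens interact, here is a small self-contained sketch. The three-word `toy_word_index` stands in for the real result of `get_word_index()` and is purely hypothetical, as are the `toy_` names, the `"<PAD>"` label for index 0 and the `decode` helper.

```python
# Hypothetical miniature word index (real indices start at 1, as here)
toy_word_index = {"the": 1, "movie": 2, "great": 3}

# Shift by 3 so that ids 0-3 are free for the special tokens
toy_word_to_id = {w: i + 3 for w, i in toy_word_index.items()}
toy_word_to_id["<PAD>"] = 0
toy_word_to_id["<START>"] = 1
toy_word_to_id["<UNK>"] = 2
toy_word_to_id["<UNUSED>"] = 3

toy_id_to_word = {i: w for w, i in toy_word_to_id.items()}

def decode(ids):
    # Skip padding (id 0) and map every other id back to its token
    return " ".join(toy_id_to_word.get(i, "<UNK>") for i in ids if i > 0)

encoded = [0, 0, 1, 4, 5, 2, 6]  # two pads, start marker, then word ids
print(decode(encoded))
# <START> the movie <UNK> great
```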

In [28]:
# predicted probabilities for the first 15 test reviews
probs = model.predict(X_test[:15])
# threshold at 0.5 to get class labels (1 = positive, 0 = negative)
ynew = (probs > 0.5).astype('int').flatten()

In [30]:
for i in range(15):
  # decode the first 21 non-padding tokens of review i
  # (avoids shadowing the built-ins str and id)
  words = [id_to_word[token] for token in X_test[i] if token > 0][:21]
  print(" ".join(words))
  print("Predicted=%s, True label=%s\n" % (ynew[i], y_test[i]))

<START> please give this one a miss br br <UNK> <UNK> and the rest of the cast <UNK> terrible performances the 
Predicted=0, True label=0

<START> this film requires a lot of <UNK> because it focuses on mood and character development the plot is very simple 
Predicted=1, True label=1

at a time when motion picture animation of all sorts was in its <UNK> br br the political <UNK> of the 
Predicted=1, True label=1

<START> i generally love this type of movie however this time i found myself wanting to kick the screen since i 
Predicted=0, True label=0

<START> like some other people wrote i'm a die hard mario fan and i loved this game br br this game 
Predicted=1, True label=1

<START> i'm absolutely <UNK> this movie isn't being sold all who love this movie should <UNK> disney and <UNK> the demand 
Predicted=1, True label=1

later used by frank <UNK> in mr <UNK> goes to town and meet john <UNK> but in <UNK> no one individual 
Predicted=1, True label=1

<START> the <UNK> richard <UNK> dog