<a href="https://colab.research.google.com/github/richa-patel-27/tensorflow/blob/master/tf_movie_reviews.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [23]:
"""
Text classification

Classification of movie reviews as positive or negative
"""

'\nText classification\n\nClassification of movie reviews as positive or negative\n'

In [24]:
import tensorflow as tf
from tensorflow import keras
import numpy as np


In [25]:
data = keras.datasets.imdb

In [26]:
(train_data, train_labels), (test_data, test_labels) = data.load_data(num_words=88000)  # consider 10000 most frequent words

In [27]:
# each word in a movie review is a number. So each review is an array of words.

# Before that lets get the dictionary of word and number mapping

word_index = data.get_word_index()
word_index = {k : (c+3) for k,c in word_index.items()}  # we start at 3 coz we're leaving 0,1,2 for special characters
word_index["<PAD>"] = 0
word_index["<START>"] = 1
word_index["<UNK>"] = 2
word_index["<UNUSED>"] = 3

In [28]:
# get a reverse dictionary of word_index (number: word)

reversed_word_index = {value: key for (key, value) in word_index.items()}


In [29]:
# function to get the words from a number represented review

def decode_review(text):
  return " ".join([reversed_word_index.get(i, "?") for i in text])  # join "?" if i not found

decode_review(train_data[0])

"<START> this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert redford's is an amazing actor and now the same being director norman's father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for retail and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also congratulations to the two little boy's that played the part's of norman and paul they were just brilliant children are often left out of the praising list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and sh

In [30]:
# Each review contains different no of words. Hence normalizing it is tricky. Also we need to have a fixed input length for our neural networks.
# So we fix a max length of string say 250. If review contains < 250 words, we do padding else we omit extra words

# we use keras for that.
# value - padding value, post - pad value to the end, maxlen - pad until len is 250
# NOTE: this does not normalize the data

train_data = keras.preprocessing.sequence.pad_sequences(train_data, value=word_index["<PAD>"], padding='post', maxlen=250)
test_data = keras.preprocessing.sequence.pad_sequences(test_data, value=word_index["<PAD>"], padding='post', maxlen=250)


In [31]:
# We will not normalize the data because we want same output numbers

# Create the model

model = keras.Sequential()
model.add(keras.layers.Embedding(88000, 16))  # each word is represented as a vector of fixed length 16
model.add(keras.layers.GlobalAveragePooling1D())  # scaling down the data
model.add(keras.layers.Dense(16, activation="relu"))
model.add(keras.layers.Dense(1, activation="sigmoid"))  # sigmoid - gives 0 or 1 depending on output percent. > 50 then 1 < 50% then 0

model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_1 (Embedding)     (None, None, 16)          1408000   
                                                                 
 global_average_pooling1d_1   (None, 16)               0         
 (GlobalAveragePooling1D)                                        
                                                                 
 dense_2 (Dense)             (None, 16)                272       
                                                                 
 dense_3 (Dense)             (None, 1)                 17        
                                                                 
Total params: 1,408,289
Trainable params: 1,408,289
Non-trainable params: 0
_________________________________________________________________


In [32]:
# Compile the model

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",   # output is binary 0 or 1
    metrics=['accuracy'],
)

In [33]:
# split the train data into train and validation data

# validation data is used to check how tuning model parameters affects the accuracy of the model

# 25000 entries in the data set

x_val = train_data[:10000]  # use upto 10000 for validation
x_train = train_data[10000:]  # use 10000 onwards for train

y_val = train_labels[:10000]  # use upto 10000 for validation
y_train = train_labels[10000:]  # use 10000 onwards for train

In [34]:
# Train the model

model.fit(
    x_train,
    y_train,
    epochs = 40,
    batch_size = 512,   # different batches of size 512 are used to train model at each epoch
    validation_data = (x_val, y_val),
    verbose = 1,
)

Epoch 1/40
Epoch 2/40
Epoch 3/40
Epoch 4/40
Epoch 5/40
Epoch 6/40
Epoch 7/40
Epoch 8/40
Epoch 9/40
Epoch 10/40
Epoch 11/40
Epoch 12/40
Epoch 13/40
Epoch 14/40
Epoch 15/40
Epoch 16/40
Epoch 17/40
Epoch 18/40
Epoch 19/40
Epoch 20/40
Epoch 21/40
Epoch 22/40
Epoch 23/40
Epoch 24/40
Epoch 25/40
Epoch 26/40
Epoch 27/40
Epoch 28/40
Epoch 29/40
Epoch 30/40
Epoch 31/40
Epoch 32/40
Epoch 33/40
Epoch 34/40
Epoch 35/40
Epoch 36/40
Epoch 37/40
Epoch 38/40
Epoch 39/40
Epoch 40/40


<keras.callbacks.History at 0x7a5b7a49f2e0>

In [35]:
# Evaluate the model

test_loss, test_acc = model.evaluate(test_data, test_labels)

print(f"Test accuracy = {test_acc}")
print(f"Test loss = {test_loss}")

Test accuracy = 0.8721200227737427
Test loss = 0.3297561705112457


In [36]:
# Test the model on a random test review

test_review = test_data[0]
predict = model.predict(np.array([test_review]))
print(f"Review: \n {decode_review(test_review)}")
print(f"Actual label: {str(test_labels[0])}")
print(f"Prediction: {str(predict[0])}")

Review: 
 <START> please give this one a miss br br kristy swanson and the rest of the cast rendered terrible performances the show is flat flat flat br br i don't know how michael madison could have allowed this one on his plate he almost seemed to know this wasn't going to work out and his performance was quite lacklustre so all you madison fans give this a miss <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PA

In [37]:
# save the model

model.save("movie_review_model.h5")

In [38]:
# load the saved model

saved_model = keras.models.load_model("movie_review_model.h5")

In [39]:
# convert string into list of numbers

def review_encode(s):
  encoded = [1]

  for word in s:
    if word in word_index:
      encoded.append(word_index[word])
    else:
      encoded.append(2)

  return encoded

In [40]:
# test the loaded model on a review in text.txt file

text = "Delightful animated feature from Walt Disney Pictures about a naive young lion cub destined for greatness. Born the son of a beloved and authoritative king he's groomed to be the next ruler of the kingdom, but along the way he encounters tragic detours at the hands of his villainous uncle and scheming hyena henchmen. Years later—as an adult—he decides to embrace his destiny and take his proper place in the Circle of Life. Warm, intelligent, laugh-out loud funny film is a triumph in every aspect; unforgettable songs, snappy dialogue, remarkable animation, and a stellar cast of voices make this a treat for all ages. A rousing adventure that you can enjoy again and again, and arguably one of the finest animated films ever made."

In [41]:
encode = review_encode(text)
encode = keras.preprocessing.sequence.pad_sequences([encode], value=word_index["<PAD>"], padding='post', maxlen=250)
predict = saved_model.predict(encode)
print(f"Review: {text}")
print(f"Encoded review: {encode}")
print(f"Prediction: {predict[0]}")


Review: Delightful animated feature from Walt Disney Pictures about a naive young lion cub destined for greatness. Born the son of a beloved and authoritative king he's groomed to be the next ruler of the kingdom, but along the way he encounters tragic detours at the hands of his villainous uncle and scheming hyena henchmen. Years later—as an adult—he decides to embrace his destiny and take his proper place in the Circle of Life. Warm, intelligent, laugh-out loud funny film is a triumph in every aspect; unforgettable songs, snappy dialogue, remarkable animation, and a stellar cast of voices make this a treat for all ages. A rousing adventure that you can enjoy again and again, and arguably one of the finest animated films ever made.
Encoded review: [[  13 3363    2  963 1964  963 1479 5135    2    6  590 1657  963 1148
   830    2    2 1206 3363 1209 1604 1479 1331  963  830  830    6  503
  2014  963    2  590 1604 3363 1331  590    2    2  590 3363    6 1657
  1657 5135    2 1095   1