MOVIE REVIEW CLASSIFICATION USING DEEP LEARNING AND TENSORFLOW


In [None]:
# Import the required modules.
import pandas as pd
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score

In [None]:
# Load the training and test sets.
train_data = pd.read_csv(r'/content/data-4.csv')
test_data = pd.read_csv(r'/content/data_test.csv')

In [None]:
train_data.head()

Unnamed: 0,"Homelessness (or Houselessness as George Carlin stated) has been an issue for years but never a plan to help those on the street that were once considered human who did everything from going to school, work, or vote for the matter. Most people think of the homeless as just a lost cause while worrying about things such as racism, the war on Iraq, pressuring kids to succeed, technology, the elections, inflation, or worrying if they'll be next to end up on the streets.<br /><br />But what if you were given a bet to live on the streets for a month without the luxuries you once had from a home, the entertainment sets, a bathroom, pictures on the wall, a computer, and everything you once treasure to see what it's like to be homeless? That is Goddard Bolt's lesson.<br /><br />Mel Brooks (who directs) who stars as Bolt plays a rich man who has everything in the world until deciding to make a bet with a sissy rival (Jeffery Tambor) to see if he can live in the streets for thirty days without the luxuries; if Bolt succeeds, he can do what he wants with a future project of making more buildings. The bet's on where Bolt is thrown on the street with a bracelet on his leg to monitor his every move where he can't step off the sidewalk. He's given the nickname Pepto by a vagrant after it's written on his forehead where Bolt meets other characters including a woman by the name of Molly (Lesley Ann Warren) an ex-dancer who got divorce before losing her home, and her pals Sailor (Howard Morris) and Fumes (Teddy Wilson) who are already used to the streets. They're survivors. Bolt isn't. 
He's not used to reaching mutual agreements like he once did when being rich where it's fight or flight, kill or be killed.<br /><br />While the love connection between Molly and Bolt wasn't necessary to plot, I found ""Life Stinks"" to be one of Mel Brooks' observant films where prior to being a comedy, it shows a tender side compared to his slapstick work such as Blazing Saddles, Young Frankenstein, or Spaceballs for the matter, to show what it's like having something valuable before losing it the next day or on the other hand making a stupid bet like all rich people do when they don't know what to do with their money. Maybe they should give it to the homeless instead of using it like Monopoly money.<br /><br />Or maybe this film will inspire you to help others.",positive
0,Brilliant over-acting by Lesley Ann Warren. Be...,positive
1,This is easily the most underrated film inn th...,positive
2,This is not the typical Mel Brooks film. It wa...,positive
3,"This isn't the comedic Robin Williams, nor is ...",positive
4,Yes its an art... to successfully make a slow ...,positive
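In the preview above, the column header is itself a full review followed by `positive`, which suggests the CSV has no header row and `read_csv` used the first record as column names. Here is a minimal sketch of the alternative, assuming the file really is header-less with two columns. The column names `review` and `sentiment` and the inline sample are hypothetical.

```python
import io
import pandas as pd

# Hypothetical two-line sample standing in for a header-less review file.
sample_csv = io.StringIO(
    '"A tender, observant comedy.",positive\n'
    '"Slow and forgettable.",negative\n'
)

# header=None tells pandas that the first line is data, not column names.
# The names "review" and "sentiment" are illustrative, not taken from the dataset.
df = pd.read_csv(sample_csv, header=None, names=["review", "sentiment"])

print(df.shape)
print(list(df.columns))
```

With `header=None` the first review stays in the data instead of being used as a column label.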


In [None]:
train_data.info()   # Summarize the columns, dtypes, and non-null counts of the training data.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24599 entries, 0 to 24598
Data columns (total 2 columns):
 #   Column

In [None]:
# Extract the reviews (first column) and labels (second column) from the training and test sets.
train_reviews = train_data.iloc[:, 0].values
train_labels = train_data.iloc[:, 1].values
test_reviews = test_data.iloc[:, 0].values
test_labels = test_data.iloc[:, 1].values

In [None]:
# Tokenize the reviews.
tokenizer = Tokenizer(num_words=10000, oov_token="<OOV>")  # limit the vocabulary to the most frequent words; map everything else to "<OOV>"
tokenizer.fit_on_texts(train_reviews)
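To make the effect of `num_words` and `oov_token` concrete, here is a simplified pure-Python sketch of a frequency-ranked vocabulary with an out-of-vocabulary slot. It is not the Keras implementation: it only lowercases and splits on whitespace, and the helper names `build_word_index` and `to_sequence` are hypothetical.

```python
from collections import Counter

def build_word_index(texts, oov_token="<OOV>"):
    """Rank words by frequency; index 1 is reserved for the OOV token."""
    counts = Counter(word for text in texts for word in text.lower().split())
    word_index = {oov_token: 1}
    for rank, (word, _) in enumerate(counts.most_common(), start=2):
        word_index[word] = rank
    return word_index

def to_sequence(text, word_index, num_words, oov_token="<OOV>"):
    """Map words to indices; unknown or too-rare words get the OOV index."""
    oov = word_index[oov_token]
    sequence = []
    for word in text.lower().split():
        idx = word_index.get(word)
        sequence.append(idx if idx is not None and idx < num_words else oov)
    return sequence

texts = ["good film good cast", "bad film"]
word_index = build_word_index(texts)
print(to_sequence("good film terrible", word_index, num_words=4))  # [2, 3, 1]
```

Only indices below `num_words` survive, so an unseen word ("terrible") and a word that is too rare for the limit ("cast") both collapse to index 1.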

In [None]:
# Convert the text data to integer sequences and pad them to a fixed length.
train_sequences = tokenizer.texts_to_sequences(train_reviews)  # convert the training reviews into sequences of word indices
train_padded = pad_sequences(train_sequences, padding='post', maxlen=200)

In [None]:
test_sequences = tokenizer.texts_to_sequences(test_reviews)
test_padded = pad_sequences(test_sequences, padding='post', maxlen=200)
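The calls above set `padding='post'` but leave `truncating` at its default of `'pre'`, so short reviews get zeros appended while long reviews lose tokens from the front. Here is a pure-Python sketch of that behavior for a single sequence, assuming a positive `maxlen`. The helper name `pad_post` is hypothetical.

```python
def pad_post(sequence, maxlen, value=0):
    """Mimic padding='post' with the default truncating='pre' for one sequence."""
    trimmed = sequence[-maxlen:]  # 'pre' truncation keeps the last maxlen tokens
    return trimmed + [value] * (maxlen - len(trimmed))  # 'post' padding appends zeros

print(pad_post([5, 8, 2], maxlen=5))              # [5, 8, 2, 0, 0]
print(pad_post([1, 2, 3, 4, 5, 6, 7], maxlen=5))  # [3, 4, 5, 6, 7]
```

For reviews longer than 200 tokens this means the model sees the end of the review rather than the beginning.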

In [None]:
# Encode the text labels as integers.
label_encoder = LabelEncoder()    # maps each distinct label string to an integer code
train_labels_encoded = label_encoder.fit_transform(train_labels)    # learn the mapping from the training labels and apply it
test_labels_encoded = label_encoder.transform(test_labels)    # reuse the same mapping for the test labels
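`LabelEncoder` assigns integer codes in the sorted order of the distinct labels. Here is a pure-Python sketch of the resulting mapping, assuming the two classes are `negative` and `positive`. Only `positive` is visible in the preview, so the second label is an assumption.

```python
# Hypothetical label column; "negative" is assumed, only "positive" appears in the preview.
labels = ["positive", "negative", "positive"]

classes = sorted(set(labels))                       # sorted distinct labels
class_to_id = {label: i for i, label in enumerate(classes)}
encoded = [class_to_id[label] for label in labels]

print(class_to_id)  # {'negative': 0, 'positive': 1}
print(encoded)      # [1, 0, 1]
```

Under that assumption a sigmoid output near 1 corresponds to a positive review.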

In [None]:
# Building the TensorFlow model
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(10000, 16, input_length=200),  # map each of the 10,000 token ids to a 16-dimensional vector; inputs are 200 tokens long
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),  # read the sequence in both directions with 32 units each
    tf.keras.layers.Dense(32, activation='relu'),  # hidden layer with ReLU activation
    tf.keras.layers.Dense(1, activation='sigmoid')  # output layer: sigmoid probability for binary classification
])
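As a sanity check on the model size, the trainable parameter count for the layers above can be worked out by hand. This sketch assumes the default Keras layer settings (biases enabled, standard LSTM with four gates). The variable names are illustrative.

```python
vocab_size, embed_dim, lstm_units, dense_units = 10000, 16, 32, 32

# Embedding: one vector of size embed_dim per token id.
embedding = vocab_size * embed_dim

# Each LSTM direction has 4 gates, each with input weights, recurrent weights, and a bias.
lstm_one_direction = 4 * ((embed_dim + lstm_units) * lstm_units + lstm_units)
bidirectional = 2 * lstm_one_direction

# The hidden Dense layer receives the concatenated forward and backward states.
dense_hidden = (2 * lstm_units) * dense_units + dense_units
dense_output = dense_units * 1 + 1

total = embedding + bidirectional + dense_hidden + dense_output
print(total)  # 174657
```

Most of the parameters (160,000 of them) sit in the embedding table, so the vocabulary size and embedding width dominate the model's footprint.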



In [None]:
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])  # binary cross-entropy loss, Adam optimizer, accuracy metric

In [None]:
# Train the model.
history = model.fit(
    train_padded, train_labels_encoded,
    epochs=10,  # number of passes over the training data
    validation_split=0.2,
    batch_size=32,
    verbose=1
)

Epoch 1/10
615/615 ━━━━━━━━━━━━━━━━━━━━ 93s 138ms/step - accuracy: 0.6175 - loss: 0.6584 - val_accuracy: 0.0067 - val_loss: 0.9131
Epoch 2/10
615/615 ━━━━━━━━━━━━━━━━━━━━ 143s 139ms/step - accuracy: 0.6712 - loss: 0.5952 - val_accuracy: 0.6571 - val_loss: 0.7790
Epoch 3/10
615/615 ━━━━━━━━━━━━━━━━━━━━ 139s 135ms/step - accuracy: 0.7861 - loss: 0.4619 - val_accuracy: 0.7486 - val_loss: 0.5869
Epoch 4/10
615/615 ━━━━━━━━━━━━━━━━━━━━ 142s 136ms/step - accuracy: 0.8835 - loss: 0.2851 - val_accuracy: 0.7445 - val_loss: 0.6139
Epoch 5/10
615/615 ━━━━━━━━━━━━━━━━━━━━ 142s 137ms/step - accuracy: 0.9251 - loss: 0.2041 - val_accuracy: 0.7140 - val_loss: 0.6787
Epoch 6/10
615/615 ━━━━━━━━━━━━━━━━━━━━ 84s 136ms/step - accuracy: 0.9474 - loss: 0.1523 - val_accuracy: 0.8114 - val_loss: 0.4646
Epoch 
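The first epoch reports a `val_accuracy` of 0.0067, far below chance. Keras takes the `validation_split` fraction from the last samples of the arrays, before any shuffling, so if the file happens to be ordered by label the validation slice could be dominated by one class. That ordering is a guess from the preview, not something the output confirms. Here is a minimal sketch of shuffling features and labels together before training, using hypothetical stand-in arrays:

```python
import numpy as np

# Hypothetical stand-ins for train_padded and train_labels_encoded,
# ordered by label to illustrate the suspected problem.
features = np.arange(20).reshape(10, 2)
labels = np.array([1] * 5 + [0] * 5)

# One shared permutation keeps each row aligned with its label.
rng = np.random.default_rng(seed=42)
order = rng.permutation(len(labels))
shuffled_features = features[order]
shuffled_labels = labels[order]

val_size = int(0.2 * len(labels))
print(labels[-val_size:])           # unshuffled tail: only class 0
print(shuffled_labels[-val_size:])  # tail after shuffling
```

Passing the shuffled arrays to `model.fit` would make the held-out 20% a random sample rather than the tail of the file.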

In [None]:
# Evaluate the model on the test data.
predictions = (model.predict(test_padded) > 0.5).astype("int32")  # threshold the sigmoid outputs at 0.5 to get class labels
accuracy = accuracy_score(test_labels_encoded, predictions)  # fraction of test reviews classified correctly

13/13 ━━━━━━━━━━━━━━━━━━━━ 3s 155ms/step


In [None]:
print(f"Accuracy on test data: {accuracy * 100:.2f}%")  # report the test accuracy as a percentage

Accuracy on test data: 79.70%
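The accuracy figure comes from thresholding one sigmoid probability per review and comparing the result with the encoded labels. Here is a small NumPy sketch of the same computation on made-up numbers. The probabilities and labels are illustrative, not taken from the model.

```python
import numpy as np

# Made-up sigmoid outputs in the (n, 1) shape that a Dense(1) output layer produces.
probabilities = np.array([[0.91], [0.12], [0.55], [0.40]])
true_labels = np.array([1, 0, 0, 0])

# Threshold at 0.5, then flatten the column so it lines up with the 1-D labels.
predicted = (probabilities > 0.5).astype("int32").ravel()
manual_accuracy = (predicted == true_labels).mean()

print(f"{manual_accuracy * 100:.2f}%")  # 75.00%
```

Accuracy alone hides which class the errors fall in, so comparing `predicted` and `true_labels` per class can be a useful follow-up.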
