# Sentiment Classification on IMDB Movie Reviews

In this notebook we will build **three simple models** to classify IMDB movie reviews as positive or negative:

- Simple RNN
- 1D CNN
- LSTM

We will:
1. Load the IMDB dataset from Keras.
2. Preprocess the text (token IDs + padding).
3. Build and train the three models.
4. Compare accuracy and training time.
5. Write a short conclusion.

## 1. Imports

In [1]:
import time

import numpy as np
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, LSTM, Dense, Conv1D, GlobalMaxPooling1D
from tensorflow.keras.optimizers import Adam

## 2. Load the IMDB dataset

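The IMDB dataset is already **tokenized as integers**. Each review is a list of word indices, where a smaller index means a more frequent word.
We will limit the vocabulary to the most frequent words so the models are small and easy to train; rarer words are replaced by a single out-of-vocabulary index.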

In [2]:
# Number of words to keep (most frequent)
vocab_size = 10000

# Load data (already tokenized as integer sequences)
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size)

print("Number of training samples:", len(x_train))
print("Number of test samples:", len(x_test))
print("Example review (first 20 word indices):", x_train[0][:20])
print("Label (0 = negative, 1 = positive):", y_train[0])

Number of training samples: 25000
Number of test samples: 25000
Example review (first 20 word indices): [1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25]
Label (0 = negative, 1 = positive): 1
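
To make the integer encoding concrete, here is a minimal sketch of how such a sequence maps back to words. It assumes the default conventions of `imdb.load_data` (0 = padding, 1 = start marker, 2 = out-of-vocabulary, real words stored as frequency rank + 3). `toy_word_index` and `decode_review` are hypothetical names used only for illustration; in the notebook the real mapping would come from `imdb.get_word_index()`.

```python
# Toy stand-in for the dict returned by imdb.get_word_index():
# word -> frequency rank, where rank 1 is the most frequent word.
toy_word_index = {"the": 1, "film": 2, "was": 3, "great": 4}

PAD_ID, START_ID, OOV_ID = 0, 1, 2   # assumed defaults of imdb.load_data
INDEX_FROM = 3                       # real words are stored as rank + 3


def decode_review(token_ids, word_index):
    """Map a list of token IDs back to a readable string."""
    rank_to_word = {rank: word for word, rank in word_index.items()}
    special = {PAD_ID: "<pad>", START_ID: "<start>", OOV_ID: "<unk>"}
    words = []
    for token_id in token_ids:
        if token_id in special:
            words.append(special[token_id])
        else:
            # Undo the offset; ranks missing from the toy index become <unk>.
            words.append(rank_to_word.get(token_id - INDEX_FROM, "<unk>"))
    return " ".join(words)


decoded = decode_review([1, 4, 5, 6, 7, 2], toy_word_index)
print(decoded)  # <start> the film was great <unk>
```

The leading `1` in the example review printed above is the same start marker.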


## 3. Preprocess text (pad sequences)

Reviews have **different lengths**, but Keras trains on batches stored as fixed-shape arrays, so every sequence must have the same length.
We therefore pad shorter reviews and truncate longer ones to a fixed length.

In [3]:
# Maximum length of each review (in number of word indices)
maxlen = 200  # small to make models faster

# Pad with zeros at the beginning if a review is shorter than maxlen;
# if it is longer, keep only its last maxlen tokens (pad_sequences defaults)
x_train_padded = pad_sequences(x_train, maxlen=maxlen)
x_test_padded = pad_sequences(x_test, maxlen=maxlen)

print("Shape of x_train_padded:", x_train_padded.shape)
print("Shape of x_test_padded:", x_test_padded.shape)

Shape of x_train_padded: (25000, 200)
Shape of x_test_padded: (25000, 200)
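
The default behaviour of `pad_sequences` (pad on the left, truncate on the left) can be sketched in a few lines of plain Python. This is an illustrative re-implementation for a single sequence, not the Keras code; `pad_pre` is a hypothetical helper name.

```python
def pad_pre(sequence, maxlen, value=0):
    """Left-pad with `value`, or keep only the last `maxlen` tokens."""
    if maxlen <= 0:
        raise ValueError("maxlen must be positive")
    kept = list(sequence)[-maxlen:]            # drop the oldest tokens if too long
    return [value] * (maxlen - len(kept)) + kept


short_review = [1, 14, 22]
long_review = [1, 14, 22, 16, 43, 530]

print(pad_pre(short_review, maxlen=5))  # [0, 0, 1, 14, 22]
print(pad_pre(long_review, maxlen=4))   # [22, 16, 43, 530]
```

Note that left truncation discards the start of a long review. If the opening of a review matters more than its ending, `pad_sequences` accepts `truncating="post"` to keep the first `maxlen` tokens instead.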


We will use the **same embedding size, batch size, and number of epochs** for all three models so that the comparison is fair.

In [4]:
embedding_dim = 32  # size of word vectors
batch_size = 128
epochs = 3  # keep small so training is quick
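
As a quick sanity check on these settings, the number of gradient steps per epoch can be worked out by hand. The sketch below assumes the `validation_split=0.2` that the training cells in this notebook pass to `fit()`, which holds out 20% of the 25,000 training reviews for validation.

```python
import math

n_train_total = 25000      # size of x_train reported in section 2
validation_split = 0.2     # value passed to fit() in the training cells
batch_size = 128

n_val = round(n_train_total * validation_split)   # reviews held out for validation
n_fit = n_train_total - n_val                     # reviews used for gradient updates
steps_per_epoch = math.ceil(n_fit / batch_size)   # the last batch is smaller
last_batch_size = n_fit - (steps_per_epoch - 1) * batch_size

print(n_fit, n_val, steps_per_epoch, last_batch_size)  # 20000 5000 157 32
```

This is why each training log in the sections below counts up to `157/157`.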

## 4. Simple RNN model

A **Simple RNN** reads one word at a time and updates a single hidden state vector that summarizes the words seen so far.
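
The recurrence behind this can be sketched in NumPy. This is a simplified illustration with random, untrained weights, assuming the `tanh` activation that `SimpleRNN` uses by default; `simple_rnn_last_state` and the weight names are hypothetical.

```python
import numpy as np


def simple_rnn_last_state(inputs, W_x, W_h, b):
    """Apply h_t = tanh(x_t @ W_x + h_{t-1} @ W_h + b) and return the final h."""
    h = np.zeros(W_h.shape[0])
    for x_t in inputs:                     # strictly sequential: step t needs step t-1
        h = np.tanh(x_t @ W_x + h @ W_h + b)
    return h


rng = np.random.default_rng(0)
timesteps, embedding_dim, units = 200, 32, 32

inputs = rng.normal(size=(timesteps, embedding_dim))       # one embedded review
W_x = rng.normal(scale=0.1, size=(embedding_dim, units))   # input-to-hidden weights
W_h = rng.normal(scale=0.1, size=(units, units))           # hidden-to-hidden weights
b = np.zeros(units)

h_last = simple_rnn_last_state(inputs, W_x, W_h, b)
print(h_last.shape)  # (32,)
```

Only the final hidden state is passed on to the `Dense` layer, so everything the classifier knows about the review has to survive 200 of these updates.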

In [5]:
# Build Simple RNN model
rnn_model = Sequential([
    Embedding(input_dim=vocab_size, output_dim=embedding_dim, input_length=maxlen),
    SimpleRNN(32),
    Dense(1, activation='sigmoid')  # binary classification
])

rnn_model.compile(optimizer=Adam(learning_rate=0.001),
                  loss='binary_crossentropy',
                  metrics=['accuracy'])

rnn_model.summary()



In [6]:
# Train Simple RNN
start_time = time.time()
history_rnn = rnn_model.fit(
    x_train_padded,
    y_train,
    epochs=epochs,
    batch_size=batch_size,
    validation_split=0.2,
    verbose=1
)
rnn_time = time.time() - start_time

# Evaluate on test data
rnn_loss, rnn_acc = rnn_model.evaluate(x_test_padded, y_test, verbose=0)
print(f"RNN test accuracy: {rnn_acc:.4f}")
print(f"RNN training time: {rnn_time:.2f} seconds")

Epoch 1/3
157/157 ━━━━━━━━━━━━━━━━━━━━ 15s 59ms/step - accuracy: 0.6245 - loss: 0.6294 - val_accuracy: 0.7832 - val_loss: 0.4765
Epoch 2/3
157/157 ━━━━━━━━━━━━━━━━━━━━ 10s 56ms/step - accuracy: 0.8361 - loss: 0.3886 - val_accuracy: 0.8064 - val_loss: 0.4445
Epoch 3/3
157/157 ━━━━━━━━━━━━━━━━━━━━ 9s 56ms/step - accuracy: 0.8926 - loss: 0.2733 - val_accuracy: 0.8498 - val_loss: 0.3697
RNN test accuracy: 0.8440
RNN training time: 33.87 seconds


## 5. 1D CNN model

A **1D CNN** applies convolution filters over neighboring words. It is good at
finding local patterns (like short phrases). It can also train faster because
it can process many words in parallel.
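
A minimal NumPy sketch of what the convolution and pooling layers compute, assuming "valid" padding (the `Conv1D` default) and random, untrained weights. `conv1d_relu_global_max` is a hypothetical helper, not a Keras function.

```python
import numpy as np


def conv1d_relu_global_max(inputs, kernels, bias):
    """Valid 1D convolution + ReLU, followed by global max pooling over time.

    inputs:  (timesteps, channels)
    kernels: (kernel_size, channels, filters)
    """
    kernel_size = kernels.shape[0]
    n_windows = inputs.shape[0] - kernel_size + 1
    # Every window of `kernel_size` neighboring words: (n_windows, kernel_size, channels)
    windows = np.stack([inputs[i:i + kernel_size] for i in range(n_windows)])
    # All windows are scored against all filters in one batched product.
    feature_map = np.einsum("wkc,kcf->wf", windows, kernels) + bias
    feature_map = np.maximum(feature_map, 0.0)           # ReLU
    return feature_map, feature_map.max(axis=0)          # strongest match per filter


rng = np.random.default_rng(0)
maxlen, embedding_dim, filters, kernel_size = 200, 32, 32, 3

embedded_review = rng.normal(size=(maxlen, embedding_dim))
kernels = rng.normal(scale=0.1, size=(kernel_size, embedding_dim, filters))
bias = np.zeros(filters)

feature_map, pooled = conv1d_relu_global_max(embedded_review, kernels, bias)
print(feature_map.shape, pooled.shape)  # (198, 32) (32,)
```

No window depends on the result of another window, which is what allows the positions to be processed in parallel. Global max pooling then keeps, for each filter, only its strongest response anywhere in the review.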

In [7]:
# Build 1D CNN model
cnn_model = Sequential([
    Embedding(input_dim=vocab_size, output_dim=embedding_dim, input_length=maxlen),
    Conv1D(filters=32, kernel_size=3, activation='relu'),
    GlobalMaxPooling1D(),
    Dense(1, activation='sigmoid')
])

cnn_model.compile(optimizer=Adam(learning_rate=0.001),
                  loss='binary_crossentropy',
                  metrics=['accuracy'])

cnn_model.summary()

In [8]:
# Train 1D CNN
start_time = time.time()
history_cnn = cnn_model.fit(
    x_train_padded,
    y_train,
    epochs=epochs,
    batch_size=batch_size,
    validation_split=0.2,
    verbose=1
)
cnn_time = time.time() - start_time

# Evaluate on test data
cnn_loss, cnn_acc = cnn_model.evaluate(x_test_padded, y_test, verbose=0)
print(f"CNN test accuracy: {cnn_acc:.4f}")
print(f"CNN training time: {cnn_time:.2f} seconds")

Epoch 1/3
157/157 ━━━━━━━━━━━━━━━━━━━━ 7s 25ms/step - accuracy: 0.7046 - loss: 0.6126 - val_accuracy: 0.7838 - val_loss: 0.4813
Epoch 2/3
157/157 ━━━━━━━━━━━━━━━━━━━━ 4s 22ms/step - accuracy: 0.8190 - loss: 0.4034 - val_accuracy: 0.8438 - val_loss: 0.3673
Epoch 3/3
157/157 ━━━━━━━━━━━━━━━━━━━━ 4s 23ms/step - accuracy: 0.8757 - loss: 0.3045 - val_accuracy: 0.8626 - val_loss: 0.3282
CNN test accuracy: 0.8586
CNN training time: 14.15 seconds


## 6. LSTM model

An **LSTM (Long Short-Term Memory)** network is a special type of RNN that is
better at remembering long-term information because its gated cell state mitigates (but does not fully remove) the vanishing gradient problem.
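
One LSTM time step can be sketched in NumPy as follows. This is the standard textbook formulation with input, forget, candidate, and output blocks; the weights are hand-picked for the demonstration rather than learned, and `lstm_step` is a hypothetical helper name.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (input_dim, 4*units), U: (units, 4*units), b: (4*units,)."""
    z = x_t @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4)            # input gate, forget gate, candidate, output gate
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c_t = f * c_prev + i * g               # keep part of the old memory, add new content
    h_t = o * np.tanh(c_t)                 # expose a gated view of the memory
    return h_t, c_t


units, input_dim = 32, 32
rng = np.random.default_rng(0)
x_t = rng.normal(size=input_dim)
h_prev = np.zeros(units)
c_prev = rng.normal(size=units)

# Hand-picked weights: the biases close the input gate and open the forget gate.
W = np.zeros((input_dim, 4 * units))
U = np.zeros((units, 4 * units))
b = np.concatenate([
    np.full(units, -20.0),   # input gate  ~ 0: ignore the current word
    np.full(units, 20.0),    # forget gate ~ 1: keep the old cell state
    np.zeros(units),         # candidate values
    np.zeros(units),         # output gate = 0.5
])

h_t, c_t = lstm_step(x_t, h_prev, c_prev, W, U, b)
print(np.allclose(c_t, c_prev, atol=1e-6))  # True
```

With the forget gate open and the input gate closed, the cell state passes through the step almost unchanged. A Simple RNN has no such path: its state is squashed through `tanh` and multiplied by the recurrent weights at every step.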

In [9]:
# Build LSTM model
lstm_model = Sequential([
    Embedding(input_dim=vocab_size, output_dim=embedding_dim, input_length=maxlen),
    LSTM(32),
    Dense(1, activation='sigmoid')
])

lstm_model.compile(optimizer=Adam(learning_rate=0.001),
                   loss='binary_crossentropy',
                   metrics=['accuracy'])

lstm_model.summary()
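
The `summary()` outputs are not shown in this notebook, so here is a hand calculation of the trainable-parameter counts implied by the layer sizes used above. It assumes the standard parameter formulas for `Embedding`, `SimpleRNN`, `LSTM`, `Conv1D`, and `Dense` layers with biases enabled.

```python
vocab_size, embedding_dim, units = 10000, 32, 32
kernel_size, filters = 3, 32

embedding_params = vocab_size * embedding_dim            # one vector per word
dense_params = units + 1                                 # 32 weights + 1 bias

# Recurrent layers: input weights + recurrent weights + bias, per unit.
simple_rnn_params = units * (embedding_dim + units + 1)
lstm_params = 4 * simple_rnn_params                      # four blocks of the same shape

# Convolution: one (kernel_size x embedding_dim) kernel plus a bias per filter.
conv1d_params = kernel_size * embedding_dim * filters + filters

totals = {
    "Simple RNN": embedding_params + simple_rnn_params + dense_params,
    "1D CNN": embedding_params + conv1d_params + dense_params,
    "LSTM": embedding_params + lstm_params + dense_params,
}
for name, total in totals.items():
    print(f"{name:<10} {total:,}")
```

The three models come out at 322,113, 323,137, and 328,353 parameters respectively. In every case the embedding table (320,000 parameters) dominates, so the models differ mainly in how they process the sequence rather than in size.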

In [10]:
# Train LSTM
start_time = time.time()
history_lstm = lstm_model.fit(
    x_train_padded,
    y_train,
    epochs=epochs,
    batch_size=batch_size,
    validation_split=0.2,
    verbose=1
)
lstm_time = time.time() - start_time

# Evaluate on test data
lstm_loss, lstm_acc = lstm_model.evaluate(x_test_padded, y_test, verbose=0)
print(f"LSTM test accuracy: {lstm_acc:.4f}")
print(f"LSTM training time: {lstm_time:.2f} seconds")

Epoch 1/3
157/157 ━━━━━━━━━━━━━━━━━━━━ 32s 174ms/step - accuracy: 0.7502 - loss: 0.5086 - val_accuracy: 0.8592 - val_loss: 0.3382
Epoch 2/3
157/157 ━━━━━━━━━━━━━━━━━━━━ 27s 174ms/step - accuracy: 0.8957 - loss: 0.2647 - val_accuracy: 0.8808 - val_loss: 0.2904
Epoch 3/3
157/157 ━━━━━━━━━━━━━━━━━━━━ 40s 166ms/step - accuracy: 0.9300 - loss: 0.1905 - val_accuracy: 0.8704 - val_loss: 0.3216
LSTM test accuracy: 0.8659
LSTM training time: 114.74 seconds


## 7. Compare accuracy and training time

In [11]:
print("\n=== Test Accuracy ===")
print(f"Simple RNN: {rnn_acc:.4f}")
print(f"1D CNN    : {cnn_acc:.4f}")
print(f"LSTM      : {lstm_acc:.4f}")

print("\n=== Training Time (seconds) ===")
print(f"Simple RNN: {rnn_time:.2f}")
print(f"1D CNN    : {cnn_time:.2f}")
print(f"LSTM      : {lstm_time:.2f}")


=== Test Accuracy ===
Simple RNN: 0.8440
1D CNN    : 0.8586
LSTM      : 0.8659

=== Training Time (seconds) ===
Simple RNN: 33.87
1D CNN    : 14.15
LSTM      : 114.74
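
To read the two tables together, one option is to express each model relative to the fastest one. The sketch below hard-codes the numbers printed above (from this single run) and is only one possible way to summarize the tradeoff.

```python
# Test accuracy and training time copied from the output above.
results = {
    "Simple RNN": {"acc": 0.8440, "seconds": 33.87},
    "1D CNN": {"acc": 0.8586, "seconds": 14.15},
    "LSTM": {"acc": 0.8659, "seconds": 114.74},
}

fastest = min(results, key=lambda name: results[name]["seconds"])
most_accurate = max(results, key=lambda name: results[name]["acc"])
baseline = results[fastest]

print(f"Fastest: {fastest}, most accurate: {most_accurate}")
for name, r in results.items():
    slowdown = r["seconds"] / baseline["seconds"]     # training time relative to the fastest model
    acc_delta = r["acc"] - baseline["acc"]            # accuracy relative to the fastest model
    print(f"{name:<10} {slowdown:4.1f}x time, {acc_delta:+.4f} accuracy vs {fastest}")
```

Relative to the 1D CNN, the LSTM gains about 0.007 in accuracy for roughly 8 times the training time, while the Simple RNN is about 2.4 times slower and about 0.015 less accurate.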


## 8. Short conclusion 


- **Which model performs best and why?**  
  Usually, the **LSTM** gets the best accuracy because its gated cell state captures long-range dependencies in the text better than a Simple RNN.

- **Which model was fastest?**  
  Often, the **1D CNN** trains fastest because convolutions can be parallelized and the model is shallow.

- **Which performed best in your run?**  
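  The **1D CNN** gave the best time-accuracy tradeoff: it trained fastest (14.15 s) and its test accuracy (0.8586) was only 0.0073 below the LSTM's. The **LSTM** gave the highest accuracy (0.8659) but took about 8 times longer to train (114.74 s). The Simple RNN was both slower (33.87 s) and less accurate (0.8440) than the CNN.
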
- **Why might LSTM outperform a Simple RNN?**  
  LSTMs have a special internal structure (gates and cell state) that helps them **remember important information for longer** and **forget unimportant information**. This makes them better at understanding the meaning of long sentences.

- **Why might CNN train faster?**  
  CNNs look at several words at once using filters and do not have to process the sequence step-by-step like RNNs. This parallel processing makes training faster while still capturing useful local patterns (like important phrases).