The IMDb dataset is a binary sentiment classification dataset of 50,000 movie reviews (25,000 for training and 25,000 for testing), evenly split between positive and negative labels. We'll train a recurrent neural network (RNN) with LSTM layers to classify each review as positive (1) or negative (0). LSTMs suit this task because their gating mechanism helps them capture long-term dependencies in sequential text, mitigating the vanishing-gradient problem that limits plain RNNs.
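To make the gating idea concrete, here is a minimal NumPy sketch of a single LSTM time step using the standard textbook equations. It is only an illustration: the function name `lstm_step`, the weight layout, and the gate ordering are assumptions for this example, and Keras' own `LSTM` layer may organize its weights and kernels differently.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step (illustrative, not the Keras implementation).

    W: (input_dim, 4 * units) input weights
    U: (units, 4 * units) recurrent weights
    b: (4 * units,) biases
    """
    z = x_t @ W + h_prev @ U + b
    # Split the pre-activations into input gate, forget gate, candidate, output gate
    i, f, g, o = np.split(z, 4, axis=-1)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    # The cell state is updated additively: the forget gate decides how much of
    # the old state survives, which is what lets information persist over many steps
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t

# Run the cell over a short random sequence (toy sizes, chosen arbitrarily)
rng = np.random.default_rng(0)
input_dim, units, steps = 8, 4, 5
W = rng.normal(scale=0.1, size=(input_dim, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

h = np.zeros(units)
c = np.zeros(units)
for x_t in rng.normal(size=(steps, input_dim)):
    h, c = lstm_step(x_t, h, c, W, U, b)

print(h.shape, c.shape)
```

When the forget gate `f` stays close to 1, the cell state `c` is carried forward almost unchanged, which is the mechanism usually credited with easing the vanishing-gradient problem.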

### Steps
- Load and Preprocess the Dataset: Use the IMDb dataset bundled with TensorFlow/Keras, which is already encoded as sequences of integer word indices, and pad the sequences to a uniform length.
- Build the Model: Create an RNN-LSTM model using Keras.
- Train the Model: Train on the training set, holding out 20% of it for validation.
- Evaluate the Model: Test the model’s performance on the test set.
- User Input Section: Allow users to input custom text and predict its sentiment.

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout
from tensorflow.keras.datasets import imdb

# 1. Load and Preprocess the IMDb Dataset
max_words = 10000  # Use the top 10,000 most frequent words
max_len = 200      # Maximum length of each review (truncate/pad to this length)

In [2]:
# Load the dataset
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_words)

# Pad sequences to ensure uniform input length
x_train = pad_sequences(x_train, maxlen=max_len)
x_test = pad_sequences(x_test, maxlen=max_len)

# Print dataset info
print(f"Training samples: {len(x_train)}, Test samples: {len(x_test)}")

# 2. Build the RNN-LSTM Model
model = Sequential()
model.add(Embedding(input_dim=max_words, output_dim=128, input_length=max_len))
model.add(LSTM(64, return_sequences=True))
model.add(Dropout(0.5))  # Prevent overfitting
model.add(LSTM(32))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))  # Binary classification (positive/negative)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
Training samples: 25000, Test samples: 25000
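To see what the padding step does to reviews of different lengths, here is a small pure-Python sketch that mimics the default behavior of `pad_sequences` (`padding='pre'`, `truncating='pre'`, `value=0`). The helper name `pad_pre` and the toy sequences are made up for illustration; the notebook itself keeps using the Keras function.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences and keep only the last `maxlen` items of long ones."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # truncate from the front, keeping the end of the review
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded

# Two toy "reviews" already encoded as word indices (hypothetical values)
reviews = [
    [1, 14, 22],                # shorter than maxlen: zeros are added on the left
    [1, 5, 6, 7, 8, 9, 10],     # longer than maxlen: the first tokens are dropped
]

padded = pad_pre(reviews, maxlen=5)
print(padded)  # [[0, 0, 1, 14, 22], [6, 7, 8, 9, 10]]
```

With these defaults, a review longer than `max_len` loses its beginning rather than its end, so with `max_len = 200` the model only sees the last 200 tokens of a long review.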


In [3]:
# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Model summary
model.summary()

# 3. Train the Model
history = model.fit(x_train, y_train, epochs=5, batch_size=32, validation_split=0.2, verbose=1)

# 4. Evaluate the Model
test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
print(f"\nTest Loss: {test_loss:.4f}, Test Accuracy: {test_accuracy:.4f}")

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 200, 128)          1280000   
                                                                 
 lstm (LSTM)                 (None, 200, 64)           49408     
                                                                 
 dropout (Dropout)           (None, 200, 64)           0         
                                                                 
 lstm_1 (LSTM)               (None, 32)                12416     
                                                                 
 dropout_1 (Dropout)         (None, 32)                0         
                                                                 
 dense (Dense)               (None, 1)                 33        
                                                                 
Total params: 1,341,857
Trainable params: 1,341,857
Non-
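
The parameter counts in the summary can be reproduced by hand, which is a useful sanity check on the architecture. The sketch below uses the standard formulas for embedding, LSTM, and dense layers; the helper function names are made up for this example.

```python
def embedding_params(vocab_size, embed_dim):
    # One embed_dim-sized vector per vocabulary entry
    return vocab_size * embed_dim

def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights, and a bias vector
    return 4 * ((input_dim + units) * units + units)

def dense_params(input_dim, units):
    # Weight matrix plus one bias per output unit
    return input_dim * units + units

counts = {
    "embedding": embedding_params(10000, 128),
    "lstm": lstm_params(128, 64),
    "lstm_1": lstm_params(64, 32),
    "dense": dense_params(32, 1),
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name:<10} {n:>10,}")
print(f"{'total':<10} {total:>10,}")
```

Almost all of the parameters (1,280,000 of 1,341,857) sit in the embedding layer, so the vocabulary size and embedding dimension dominate the model's size.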

In [None]:
# 5. User Input Section
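def preprocess_text(text, word_index, max_len, index_from=3):
    # imdb.load_data() reserves 0 (padding), 1 (start) and 2 (unknown) and shifts
    # every word rank by index_from=3, so apply the same shift here.
    tokens = text.lower().split()
    sequence = [1]  # start-of-sequence marker, as in the training data
    for word in tokens:
        index = word_index.get(word.strip('.,!?;:"()'), -1) + index_from
        # Words missing from the vocabulary or outside the top max_words map to 2 (unknown)
        sequence.append(index if index_from <= index < max_words else 2)
    # Pad the sequence
    padded_sequence = pad_sequences([sequence], maxlen=max_len)
    return padded_sequence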

# Get the word index from the IMDb dataset
word_index = imdb.get_word_index()

print("\n--- Sentiment Prediction for User Input ---")
while True:
    user_text = input("Enter a movie review (or 'quit' to exit): ")
    if user_text.lower() == 'quit':
        break
    # Preprocess user input
    processed_input = preprocess_text(user_text, word_index, max_len)
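    # Predict sentiment
    prediction = model.predict(processed_input, verbose=0)[0][0]
    sentiment = "Positive" if prediction >= 0.5 else "Negative"
    # The sigmoid output is P(positive); report the probability of the predicted class
    confidence = prediction if prediction >= 0.5 else 1 - prediction
    print(f"Predicted Sentiment: {sentiment} (Confidence: {confidence:.4f})")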

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json

--- Sentiment Prediction for User Input ---
Enter a movie review (or 'quit' to exit): This movie is good
Predicted Sentiment: Positive (Confidence: 0.5483)
Enter a movie review (or 'quit' to exit): This movie is bad
Predicted Sentiment: Positive (Confidence: 0.5359)
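
Custom text can only be scored sensibly if it is encoded the same way as the training data. With its default arguments, `imdb.load_data` reserves index 0 for padding, 1 for the start of a review, and 2 for unknown words, and shifts every rank from `imdb.get_word_index()` by 3. The sketch below checks that convention with an encode/decode round trip on a toy vocabulary; `toy_word_index`, `encode_review`, and `decode_review` are hypothetical names used only for this illustration.

```python
INDEX_FROM = 3                   # default shift applied by imdb.load_data
PAD, START, UNKNOWN = 0, 1, 2    # reserved indices

# Toy stand-in for imdb.get_word_index(): word -> frequency rank (hypothetical values)
toy_word_index = {"the": 1, "movie": 2, "good": 3, "bad": 4}

def encode_review(text, word_index, num_words):
    """Encode text with the same index convention as the training data."""
    ids = [START]
    for word in text.lower().split():
        idx = word_index.get(word, -1) + INDEX_FROM
        # Unseen words and words outside the top num_words become UNKNOWN
        ids.append(idx if INDEX_FROM <= idx < num_words else UNKNOWN)
    return ids

def decode_review(ids, word_index):
    """Map indices back to words so an encoding can be inspected by eye."""
    reverse = {rank + INDEX_FROM: word for word, rank in word_index.items()}
    special = {PAD: "<pad>", START: "<start>", UNKNOWN: "<unk>"}
    return " ".join(special.get(i, reverse.get(i, "<unk>")) for i in ids)

encoded = encode_review("The movie was good", toy_word_index, num_words=7)
print(encoded)                                  # [1, 4, 5, 2, 6]
print(decode_review(encoded, toy_word_index))   # <start> the movie <unk> good
```

Decoding a few encoded inputs like this is a quick way to confirm that the words the model receives are the ones that were typed.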
