
# Movie Review - NLP Binary Classification Example

This notebook demonstrates the **Universal ML Workflow** for binary sentiment classification on movie reviews.

## Learning Objectives

By the end of this notebook, you will be able to:
- Apply TF-IDF vectorization to text data for binary classification
- Handle nearly-balanced binary classification
- Build and evaluate binary sentiment classifiers
- Compare binary vs. multi-class NLP approaches
- Use **Hyperband** for efficient hyperparameter tuning

---

## Dataset Overview

| Attribute | Description |
|-----------|-------------|
| **Source** | [NLTK Movie Review Dataset](https://www.kaggle.com/datasets/nltkdata/movie-review) |
| **Problem Type** | Binary Classification (Positive/Negative) |
| **Data Balance** | Nearly Balanced (~51% Positive, ~49% Negative) |
| **Data Type** | Unstructured Text (Movie Reviews) |
| **Input Features** | TF-IDF Vectors (up to 5000 features, unigrams and bigrams) |
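To make the "Input Features" row concrete, here is a minimal pure-Python sketch of how TF-IDF features over unigrams and bigrams can be computed. It assumes whitespace tokenization and the smoothed IDF formula `ln((1 + n) / (1 + df)) + 1` followed by L2 normalization, which is scikit-learn's documented default. The helper names (`ngrams`, `tfidf_vectors`) and the toy reviews are hypothetical; the real `TfidfVectorizer` also applies its own tokenizer and the `max_features` cap.

```python
import math
from collections import Counter


def ngrams(text, max_n=2):
    """Return all word n-grams of length 1..max_n from whitespace tokens."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vectors(docs, max_n=2):
    """Compute one L2-normalized TF-IDF dictionary per document."""
    counts = [Counter(ngrams(doc, max_n)) for doc in docs]
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for c in counts for term in c)
    vectors = []
    for c in counts:
        # Smoothed IDF (assumption: mirrors scikit-learn's default formula).
        raw = {t: tf * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
               for t, tf in c.items()}
        # Guard against empty documents, whose norm would be zero.
        norm = math.sqrt(sum(v * v for v in raw.values())) or 1.0
        vectors.append({t: v / norm for t, v in raw.items()})
    return vectors


# Hypothetical toy reviews, not taken from the dataset.
toy_reviews = ["a great film", "a dull film", "great acting"]
vectors = tfidf_vectors(toy_reviews)
print(sorted(vectors[0]))
```

In the first toy review, the bigram `a great` appears in only one document, so it receives a higher weight than the unigram `a`, which appears in two.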

---

## 1. Defining the Problem and Assembling a Dataset

**Problem:** Classify movie reviews as positive or negative sentiment.

**Key difference from Twitter example:** This is binary (2 classes) vs. multi-class (3+ classes), affecting:
- **Output layer:** 1 neuron with sigmoid (binary) vs. N neurons with softmax (multi-class)
- **Loss function:** Binary cross-entropy vs. categorical cross-entropy
- **Label encoding:** Single 0/1 label vs. one-hot vectors
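The relationship between the two setups can be checked numerically. The sketch below, in plain Python with a hypothetical logit value, shows that a single sigmoid output with binary cross-entropy gives the same loss as a two-class softmax over the logits `[0, z]` with categorical cross-entropy. It is only an illustration of the math, not the code Keras runs internally.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def binary_cross_entropy(y_true, p):
    """y_true is a single 0/1 label; p is the predicted P(class 1)."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


def categorical_cross_entropy(one_hot, probs):
    """one_hot and probs are equal-length lists over the classes."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


z = 1.3      # hypothetical logit from a single-output network
label = 1    # positive review

p_binary = sigmoid(z)
p_softmax = softmax([0.0, z])  # two-class softmax with logits [0, z]

bce = binary_cross_entropy(label, p_binary)
cce = categorical_cross_entropy([0, 1], p_softmax)
print(round(bce, 6), round(cce, 6))
```

Both printed values agree, which is why the binary case needs only one output neuron and a single 0/1 label.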

**Why this matters:** Binary classification is simpler and often more robust, making it a good starting point for sentiment analysis tasks.

## 2. Choosing a Measure of Success

For this nearly-balanced binary classification:
- **Balanced Accuracy** is still useful for consistency
- Standard **Accuracy** is also meaningful here
- **Precision, Recall, AUC** for comprehensive evaluation
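As a small illustration of how the first two metrics differ, the sketch below computes accuracy and balanced accuracy (the mean of per-class recall) by hand. The labels are hypothetical and only mimic the roughly 49:51 class split; the notebook itself uses scikit-learn's `balanced_accuracy_score`.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(y_pred[i] == cls for i in idx)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical labels with a 49:51 negative/positive split.
y_true = [0] * 49 + [1] * 51
always_positive = [1] * 100  # a classifier that always predicts the majority class

print(accuracy(y_true, always_positive))           # 0.51
print(balanced_accuracy(y_true, always_positive))  # 0.5
```

A majority-class predictor scores slightly above 50% on plain accuracy but exactly 50% on balanced accuracy, so with nearly balanced classes the two metrics stay close.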

## 3. Deciding on an Evaluation Protocol

Standard approach: a stratified hold-out test set (10% of the data), a stratified validation set of the same size, and K-fold cross-validation.
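The split sizes implied by this protocol can be worked out from the class counts reported in this notebook (31,783 negative and 32,937 positive reviews), given `TEST_SIZE = 0.1` and a validation set that reuses the test-set size. The sketch assumes the fractional test size is rounded up to a whole number of rows; the variable names are illustrative.

```python
import math

n_neg, n_pos = 31783, 32937  # class counts reported in this notebook
n_total = n_neg + n_pos

# Assumption: a fractional test size is rounded up to a whole number of rows.
n_test = math.ceil(n_total / 10)  # TEST_SIZE = 0.1
n_val = n_test                    # validation split reuses the test-set size
n_train = n_total - n_test - n_val

print(n_total, n_train, n_val, n_test)  # 64720 51776 6472 6472
```

Under these assumptions, roughly 80% of the reviews remain for training after both hold-out sets are removed.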

## 4. Preparing Your Data

### 4.1 Import Libraries and Load Dataset

In [None]:
import pandas as pd
import numpy as np

from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.model_selection import StratifiedKFold
from sklearn.utils.class_weight import compute_class_weight
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.metrics import balanced_accuracy_score, confusion_matrix, ConfusionMatrixDisplay
from sklearn.feature_extraction.text import TfidfVectorizer

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
# np_utils is unused and unavailable in recent Keras releases, so it is not imported
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import RMSprop

# Keras Tuner for hyperparameter search
!pip install -q -U keras-tuner
import keras_tuner as kt

import itertools
import matplotlib.pyplot as plt

SEED = 204

tf.random.set_seed(SEED)
np.random.seed(SEED)

import warnings
warnings.filterwarnings('ignore')

In [2]:
reviews = pd.read_csv('movie_review.csv', sep=',')
reviews = reviews[['text', 'tag']]

reviews.head()

|   | text | tag |
|---|------|-----|
| 0 | films adapted from comic books have had plenty... | pos |
| 1 | for starters , it was created by alan moore ( ... | pos |
| 2 | to say moore and campbell thoroughly researche... | pos |
| 3 | the book ( or " graphic novel , " if you will ... | pos |
| 4 | in other words , don't dismiss this film becau... | pos |


In [3]:
TEST_SIZE = 0.1

(text_train, text_test,
 tag_train, tag_test) = train_test_split(reviews['text'], reviews['tag'],
                                         test_size=TEST_SIZE, stratify=reviews['tag'],
                                         shuffle=True, random_state=SEED)

In [4]:
MAX_FEATURES = 5000
NGRAMS = 2

tfidf = TfidfVectorizer(ngram_range=(1, NGRAMS), max_features=MAX_FEATURES)
tfidf.fit(text_train)

X_train, X_test = tfidf.transform(text_train).toarray(), tfidf.transform(text_test).toarray()

In [5]:
label_encoder = LabelEncoder()
label_encoder.fit(reviews['tag'])

y_train = label_encoder.transform(tag_train)
y_test = label_encoder.transform(tag_test)

In [6]:
VALIDATION_SIZE = X_test.shape[0]

X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, 
                                                 test_size=VALIDATION_SIZE, stratify=y_train,
                                                 shuffle=True, random_state=SEED)

## 5. Developing a Model That Does Better Than a Baseline

Baseline for nearly-balanced binary: ~51% (majority class)

In [7]:
counts = reviews.groupby(['tag']).count()
counts.reset_index(inplace=True)

counts

|   | tag | text |
|---|-----|------|
| 0 | neg | 31783 |
| 1 | pos | 32937 |


In [8]:
# The classes are slightly imbalanced, but only minimally

baseline = counts[counts['tag']=='pos']['text'].values[0] / counts['text'].sum()

baseline

0.5089153275648949

In [9]:
balanced_accuracy_baseline = balanced_accuracy_score(y_train, np.zeros(len(y_train)))

In [12]:
INPUT_DIMENSION = X_train.shape[1]
OUTPUT_DIMENSION = 1

OPTIMIZER = 'rmsprop'
LOSS_FUNC = 'binary_crossentropy'
METRICS = ['accuracy',
           tf.keras.metrics.Precision(name='precision'),
           tf.keras.metrics.Recall(name='recall'),
           # Single sigmoid output, so the default (non-multi-label) AUC applies
           tf.keras.metrics.AUC(name='auc')]



In [None]:
learning_rate = 0.001

slp_model = Sequential(name='Single_Layer_Perceptron')
slp_model.add(Dense(1, activation='sigmoid', input_shape=(INPUT_DIMENSION,)))
slp_model.compile(optimizer=RMSprop(learning_rate=learning_rate), loss=LOSS_FUNC, metrics=METRICS)

slp_model.summary()

In [None]:
batch_size = 512
EPOCHS = 100

In [15]:
weights = compute_class_weight('balanced', classes=np.unique(y_train), y=y_train)
CLASS_WEIGHTS = dict(enumerate(weights))

CLASS_WEIGHTS

{0: 1.0181303338970387, 1: 0.9825040798512278}

In [None]:
# The single-layer model is trained for a hard-coded 500 epochs rather than EPOCHS
history = slp_model.fit(X_train, y_train, class_weight=CLASS_WEIGHTS,
                        batch_size=batch_size, epochs=500,
                        validation_data=(X_val, y_val), verbose=0)
val_score = slp_model.evaluate(X_val, y_val, verbose=0)[1:]

In [None]:
print('Accuracy (Validation): {:.2f} (baseline={:.2f})'.format(val_score[0], baseline))
print('Precision (Validation): {:.2f}'.format(val_score[1]))
print('Recall (Validation): {:.2f}'.format(val_score[2]))
print('AUC (Validation): {:.2f}'.format(val_score[3]))

In [None]:
preds = slp_model.predict(X_val, verbose=0)

print('Balanced Accuracy (Validation): {:.2f} (baseline={:.2f})'.format(balanced_accuracy_score(y_val, (preds > 0.5).astype('int32')), balanced_accuracy_baseline))

In [19]:
def plot_training_history(history, monitors=('loss', 'auc')):
    """Plot training and validation curves for two monitored metrics."""
    # axs holds one Axes per monitored metric
    fig, axs = plt.subplots(1, 2, sharex='all', figsize=(15, 5))

    for ax, monitor in zip(axs.flat, monitors):
        # Keys must match the metric names recorded by Keras (e.g. 'auc', not 'AUC')
        train_values = history.history[monitor]
        val_values = history.history['val_' + monitor]

        # Display name: capitalize 'loss', keep other metric names as given
        label = monitor.capitalize() if monitor == 'loss' else monitor

        epochs = range(1, len(train_values) + 1)

        ax.plot(epochs, train_values, 'b.', label=label)
        ax.plot(epochs, val_values, 'r.', label='Validation ' + label)
        ax.set_xlim([0, len(train_values)])
        ax.set_title('Training and Validation ' + label)
        ax.set_xlabel('Epochs')
        ax.set_ylabel(label)
        ax.legend()
        ax.grid()

    plt.show()

In [None]:
plot_training_history(history, monitors=['loss', 'auc'])

## 6. Scaling Up: Developing a Model That Overfits

Adding hidden layers for increased capacity.
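The jump in capacity can be quantified by counting trainable parameters. The sketch below assumes the TF-IDF vocabulary reaches the full `MAX_FEATURES = 5000` columns and compares the single-layer model with a network that has one hidden layer of 64 units; `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units


n_features = 5000  # assumes the vectorizer fills all MAX_FEATURES columns

slp_params = dense_params(n_features, 1)
mlp_params = dense_params(n_features, 64) + dense_params(64, 1)

print(slp_params)  # 5001
print(mlp_params)  # 320129
```

Under that assumption, the hidden layer multiplies the parameter count by about 64, which is what makes overfitting possible on this dataset.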

In [None]:
learning_rate = 0.0002

mlp_model = Sequential(name='Multi_Layer_Perceptron')
mlp_model.add(Dense(64, activation='relu', input_shape=(INPUT_DIMENSION,)))
mlp_model.add(Dense(1, activation='sigmoid'))
mlp_model.compile(optimizer=RMSprop(learning_rate=learning_rate), loss=LOSS_FUNC, metrics=METRICS)

mlp_model.summary()

In [None]:
history = mlp_model.fit(X_train, y_train, class_weight=CLASS_WEIGHTS, batch_size=batch_size, epochs=EPOCHS, validation_data=(X_val, y_val), verbose=0)
val_score = mlp_model.evaluate(X_val, y_val, verbose=0)[1:]

In [None]:
plot_training_history(history, monitors=['loss', 'auc'])

In [None]:
print('Accuracy (Validation): {:.2f} (baseline={:.2f})'.format(val_score[0], baseline))
print('Precision (Validation): {:.2f}'.format(val_score[1]))
print('Recall (Validation): {:.2f}'.format(val_score[2]))
print('AUC (Validation): {:.2f}'.format(val_score[3]))

In [None]:
preds = mlp_model.predict(X_val, verbose=0)

print('Balanced Accuracy (Validation): {:.2f} (baseline={:.2f})'.format(balanced_accuracy_score(y_val, (preds > 0.5).astype('int32')), balanced_accuracy_baseline))

## 7. Regularizing Your Model and Tuning Hyperparameters

Using **Hyperband** for efficient hyperparameter tuning with a frozen architecture.

### Why Hyperband?

**Hyperband** is more efficient than grid search because it:
1. Starts training many configurations for a few epochs
2. Eliminates poor performers early
3. Allocates more resources to promising configurations
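The three steps above can be sketched as one round-based elimination loop (successive halving), which is the building block Hyperband repeats with different starting budgets. This is a simplified illustration and not Keras Tuner's implementation: `successive_halving`, `toy_score`, and the candidate configurations are all hypothetical, and the score function is a deterministic stand-in for a validation metric.

```python
def successive_halving(configs, evaluate, min_epochs=1, factor=3, max_epochs=20):
    """Keep the top 1/factor of configs each round while growing the epoch budget."""
    survivors = list(configs)
    epochs = min_epochs
    history = []  # (epochs per config, number of configs) for each round
    while True:
        # Steps 1 and 2: train every survivor briefly, then rank by score.
        scored = sorted(survivors, key=lambda c: evaluate(c, epochs), reverse=True)
        history.append((epochs, len(scored)))
        if len(scored) == 1 or epochs >= max_epochs:
            return scored[0], history
        # Step 3: only the best fraction continues, with a larger budget.
        survivors = scored[:max(1, len(scored) // factor)]
        epochs = min(epochs * factor, max_epochs)


def toy_score(config, epochs):
    """Hypothetical stand-in for validation AUC; best at a learning rate of 1e-3."""
    quality = 1.0 - abs(config["log10_lr"] + 3.0) / 3.0
    return quality * (1.0 - 0.5 ** epochs)


# Nine hypothetical learning rates from 1e-5 to 1e-1 on a log scale.
candidates = [{"log10_lr": -5.0 + 0.5 * i} for i in range(9)]
best, history = successive_halving(candidates, toy_score)

print(best)
print(history)
```

With nine candidates and `factor=3`, the budget goes from 9 configurations at 1 epoch to 3 at 3 epochs to 1 at 9 epochs, so most of the training time is spent on the configurations that looked promising early.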

In [None]:
# Hyperband Model Builder for Binary NLP Classification
def build_model_hyperband(hp):
    """
    Build Movie Review model with FROZEN architecture (2 layers: 64 -> 32 neurons).
    Only tunes regularization (Dropout) and learning rate.
    """
    model = keras.Sequential()
    model.add(layers.Input(shape=(INPUT_DIMENSION,)))

    # Fixed architecture: 2 hidden layers with 64 and 32 neurons
    # Layer 1: 64 neurons
    model.add(layers.Dense(64, activation='relu'))
    drop_0 = hp.Float('drop_0', 0.0, 0.5, step=0.1)
    model.add(layers.Dropout(drop_0))

    # Layer 2: 32 neurons
    model.add(layers.Dense(32, activation='relu'))
    drop_1 = hp.Float('drop_1', 0.0, 0.5, step=0.1)
    model.add(layers.Dropout(drop_1))

    # Output layer for binary classification
    model.add(layers.Dense(OUTPUT_DIMENSION, activation='sigmoid'))

    lr = hp.Float('lr', 1e-4, 1e-2, sampling='log')
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=lr),
        loss=LOSS_FUNC,
        metrics=METRICS
    )
    return model

In [None]:
# Configure Hyperband tuner
tuner = kt.Hyperband(
    build_model_hyperband,
    objective=kt.Objective('val_auc', direction='max'),  # state explicitly that AUC is maximized
    max_epochs=20,
    factor=3,
    directory='movie_review_hyperband',
    project_name='movie_review_tuning'
)

# Run Hyperband search
tuner.search(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=20,
    batch_size=batch_size,
    class_weight=CLASS_WEIGHTS
)

In [29]:
KFOLDS = 5

In [None]:
# Get best hyperparameters and build best model
best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
print("Best hyperparameters:")
print(f"  Dropout Layer 1: {best_hp.get('drop_0')}")
print(f"  Dropout Layer 2: {best_hp.get('drop_1')}")
print(f"  Learning Rate: {best_hp.get('lr')}")

opt_model = tuner.hypermodel.build(best_hp)
opt_model.summary()

In [None]:
# Train the optimized model
history = opt_model.fit(X_train, y_train, class_weight=CLASS_WEIGHTS, batch_size=batch_size, epochs=EPOCHS, validation_data=(X_val, y_val), verbose=0)
val_score = opt_model.evaluate(X_val, y_val, verbose=0)[1:]

In [None]:
# Plot training history
plot_training_history(history, monitors=['loss', 'auc'])

---

## Results Summary

The optimized model, built with the best hyperparameters found by Hyperband, is evaluated on the validation and test sets in the cells that follow the helper function below. See the Key Takeaways section for lessons learned.

---

## Appendix: Helper Functions

Reusable functions for binary classification model building and training.

In [None]:
def build_binary_classification_model(input_dimension, hidden_layers=None, 
                                       dropout=None, learning_rate=0.001,
                                       optimizer='rmsprop', loss='binary_crossentropy',
                                       metrics=['accuracy'], name=None):
    """
    Build a binary classification neural network model.
    
    Parameters:
    -----------
    input_dimension : int
        Number of input features
    hidden_layers : list of int, optional
        List of neurons per hidden layer (e.g., [64, 32] for 2 hidden layers)
    dropout : float, optional
        Dropout rate to apply after each hidden layer (0.0 to 1.0)
    learning_rate : float
        Learning rate for the optimizer
    optimizer : str
        Optimizer name ('rmsprop', 'adam', etc.)
    loss : str
        Loss function (default: 'binary_crossentropy')
    metrics : list
        Metrics to track during training
    name : str, optional
        Model name
    
    Returns:
    --------
    keras.Sequential : Compiled model
    """
    # Import everything from tensorflow.keras to avoid mixing Keras packages
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, Dropout
    from tensorflow.keras.optimizers import RMSprop, Adam
    
    model = Sequential(name=name)
    
    # Add hidden layers if specified
    if hidden_layers:
        for i, neurons in enumerate(hidden_layers):
            if i == 0:
                model.add(Dense(neurons, activation='relu', input_shape=(input_dimension,)))
            else:
                model.add(Dense(neurons, activation='relu'))
            if dropout and dropout > 0:
                model.add(Dropout(dropout))
    
    # Output layer for binary classification
    if hidden_layers:
        model.add(Dense(1, activation='sigmoid'))
    else:
        model.add(Dense(1, activation='sigmoid', input_shape=(input_dimension,)))
    
    # Select optimizer
    if optimizer == 'rmsprop':
        opt = RMSprop(learning_rate=learning_rate)
    elif optimizer == 'adam':
        opt = Adam(learning_rate=learning_rate)
    else:
        opt = optimizer
    
    model.compile(optimizer=opt, loss=loss, metrics=metrics)
    
    return model

In [None]:
# Print validation results
print('Accuracy (Validation): {:.2f} (baseline={:.2f})'.format(val_score[0], baseline))
print('Precision (Validation): {:.2f}'.format(val_score[1]))
print('Recall (Validation): {:.2f}'.format(val_score[2]))
print('AUC (Validation): {:.2f}'.format(val_score[3]))

preds = opt_model.predict(X_val, verbose=0)
print('Balanced Accuracy (Validation): {:.2f} (baseline={:.2f})'.format(balanced_accuracy_score(y_val, (preds > 0.5).astype('int32')), balanced_accuracy_baseline))

In [None]:
# Evaluate the optimized model on test data
test_probs = opt_model.predict(X_test, verbose=0).ravel()
test_labels = (test_probs > 0.5).astype('int32')  # threshold probabilities at 0.5

print('Accuracy (Test): {:.2f} (baseline={:.2f})'.format(accuracy_score(y_test, test_labels), baseline))
print('Precision (Test): {:.2f}'.format(precision_score(y_test, test_labels)))
print('Recall (Test): {:.2f}'.format(recall_score(y_test, test_labels)))
print('AUC (Test): {:.2f}'.format(roc_auc_score(y_test, test_probs)))
print('Balanced Accuracy (Test): {:.2f} (baseline={:.2f})'.format(balanced_accuracy_score(y_test, test_labels), balanced_accuracy_baseline))

---

## 8. Key Takeaways

### Binary vs. Multi-Class Classification

| Aspect | Binary (This Notebook) | Multi-Class (Twitter Airline) |
|--------|------------------------|------------------------------|
| **Output Layer** | 1 neuron, sigmoid | N neurons, softmax |
| **Loss Function** | Binary cross-entropy | Categorical cross-entropy |
| **Labels** | Single 0/1 value | One-hot vectors |
| **Prediction** | Threshold at 0.5 | argmax of probabilities |
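The "Prediction" row can be illustrated with two small decoding helpers. The function names and probability values below are hypothetical; the binary rule mirrors the `preds > 0.5` threshold used in this notebook.

```python
def decode_binary(prob_positive, threshold=0.5):
    """Map a sigmoid output to a 0/1 label."""
    return int(prob_positive > threshold)


def decode_multiclass(probs):
    """Return the index of the largest softmax probability."""
    return max(range(len(probs)), key=lambda i: probs[i])


print(decode_binary(0.73))                 # 1
print(decode_binary(0.41))                 # 0
print(decode_multiclass([0.2, 0.5, 0.3]))  # 1
```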

### Key Lessons Learned

1. **Binary Classification is Simpler:** With only two classes, the model architecture and loss function are more straightforward.

2. **Class Weights Optional for Balanced Data:** With a ~51:49 class distribution, class weights have minimal impact (here about 1.02 and 0.98).

3. **Same TF-IDF Pipeline Works:** The text preprocessing approach (TF-IDF with bigrams) applies equally well to binary and multi-class NLP.

4. **Hyperband Efficiently Finds Good Settings:** By eliminating weak configurations early, Hyperband quickly identifies effective dropout rates and learning rates.

5. **Balanced Accuracy Baseline:** For binary classification, random guessing achieves 50% balanced accuracy (vs. 33% for 3-class problems).
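Lesson 2 can be checked by hand. The sketch below applies the "balanced" weighting formula `n_samples / (n_classes * count)` to the full-dataset class counts reported in this notebook. The helper name is hypothetical, and because the notebook computes its weights on the training split only, the values here differ slightly from the ones shown earlier.

```python
def balanced_class_weights(class_counts):
    """weight_c = n_samples / (n_classes * count_c)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {c: n_samples / (n_classes * k) for c, k in class_counts.items()}


# Full-dataset counts from this notebook (0 = neg, 1 = pos).
weights = balanced_class_weights({0: 31783, 1: 32937})
print({c: round(w, 3) for c, w in weights.items()})
```

Both weights land within about 2% of 1.0, which is why omitting them would barely change training on this dataset.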