# Customer Review Sentiment Analysis

This notebook walks through binary sentiment classification of customer reviews, from data loading and preprocessing to training, evaluation, and model comparison, using several deep learning models in PyTorch.

## Overview
- **Objective**: Classify customer reviews as positive or negative
- **Models**: Neural Bag of Words (NBoW), LSTM, and CNN are trained here; a Transformer (BERT) model is imported but left as a next step
- **Framework**: PyTorch with modern NLP techniques

## Table of Contents
1. [Setup and Data Loading](#setup)
2. [Data Exploration](#exploration)
3. [Text Preprocessing](#preprocessing)
4. [Model Training](#training)
5. [Model Evaluation](#evaluation)
6. [Inference and Prediction](#inference)
7. [Model Comparison](#comparison)

## 1. Setup and Data Loading {#setup}

First, let's import the necessary libraries and load our data.

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from collections import Counter
import warnings
warnings.filterwarnings('ignore')

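# Set style ('seaborn-v0_8' exists in matplotlib >= 3.6; older versions call it 'seaborn')
plt.style.use('seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn')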
sns.set_palette("husl")

# Add the project root to the import path so the local `utils` and `models` packages resolve
import sys
import os
sys.path.append('..')

try:
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    print(f"PyTorch version: {torch.__version__}")
    print(f"Device: {'CUDA' if torch.cuda.is_available() else 'CPU'}")
except ImportError:
    print("PyTorch not installed. Please run: pip install -r requirements.txt")

# Import our modules
from utils.data_loader import ReviewDataset, CustomDataLoader
from utils.preprocessing import TextPreprocessor, VocabularyBuilder
from utils.training import Trainer
from utils.visualization import (
    plot_training_history,
    plot_confusion_matrix,
    plot_classification_report,
    plot_model_comparison,
)
from models.nbow import NBoW
from models.lstm import LSTMModel
from models.cnn import CNNModel
from models.transformer import TransformerModel

In [None]:
# Load sample data
data_path = '../data/sample_reviews.csv'

try:
    df = pd.read_csv(data_path)
    print(f"Data loaded successfully: {len(df)} reviews")
    print(f"Columns: {df.columns.tolist()}")
    display(df.head())
except FileNotFoundError:
    print(f"Data file not found at {data_path}")
    print("Creating sample data...")
    
    # Create sample data
    sample_reviews = [
        ("This product is amazing! I love it so much.", "positive"),
        ("Terrible quality, waste of money.", "negative"),
        ("Great value for money, highly recommend.", "positive"),
        ("Poor customer service, disappointed.", "negative"),
        ("Excellent product, exceeded my expectations.", "positive"),
        ("Not worth the price, very disappointed.", "negative"),
        ("Perfect for my needs, very satisfied.", "positive"),
        ("Broke after one day, terrible quality.", "negative"),
        ("Outstanding quality and fast delivery.", "positive"),
        ("Completely useless, don't buy this.", "negative")
    ]
    
    df = pd.DataFrame(sample_reviews, columns=['review_text', 'sentiment'])
    print("Sample data created:")
    display(df)

## 2. Data Exploration {#exploration}

Let's explore our dataset to understand its characteristics.

In [None]:
# Basic statistics
print("Dataset Overview:")
print(f"Total reviews: {len(df)}")
print(f"Columns: {df.columns.tolist()}")
print(f"Data types: {df.dtypes.to_dict()}")
print(f"Missing values: {df.isnull().sum().to_dict()}")

# Sentiment distribution
print("\nSentiment Distribution:")
sentiment_counts = df['sentiment'].value_counts()
print(sentiment_counts)
print(f"Positive ratio: {sentiment_counts.get('positive', 0) / len(df):.2%}")
print(f"Negative ratio: {sentiment_counts.get('negative', 0) / len(df):.2%}")

In [None]:
# Plot sentiment distribution
plt.figure(figsize=(10, 6))

# Sentiment distribution
plt.subplot(1, 2, 1)
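sentiment_counts = df['sentiment'].value_counts()
# Assign colors by label so they stay the same whichever class is more frequent
color_map = {'negative': '#FF6B6B', 'positive': '#4ECDC4'}
colors = [color_map.get(label, '#CCCCCC') for label in sentiment_counts.index]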
plt.pie(sentiment_counts.values, labels=sentiment_counts.index, autopct='%1.1f%%', colors=colors)
plt.title('Sentiment Distribution')

# Text length distribution
plt.subplot(1, 2, 2)
text_lengths = df['review_text'].str.len()
plt.hist(text_lengths, bins=20, alpha=0.7, color='skyblue', edgecolor='black')
plt.axvline(text_lengths.mean(), color='red', linestyle='--', label=f'Mean: {text_lengths.mean():.1f}')
plt.xlabel('Review Length (characters)')
plt.ylabel('Frequency')
plt.title('Review Length Distribution')
plt.legend()

plt.tight_layout()
plt.show()

In [None]:
# Word count analysis
df['word_count'] = df['review_text'].str.split().str.len()

print("Word Count Statistics:")
print(df['word_count'].describe())

# Word count by sentiment
plt.figure(figsize=(10, 6))
for sentiment in df['sentiment'].unique():
    subset = df[df['sentiment'] == sentiment]
    plt.hist(subset['word_count'], alpha=0.7, label=sentiment, bins=10)

plt.xlabel('Word Count')
plt.ylabel('Frequency')
plt.title('Word Count Distribution by Sentiment')
plt.legend()
plt.show()

## 3. Text Preprocessing {#preprocessing}

Now let's preprocess the text data for our models.

In [None]:
# Initialize preprocessor
preprocessor = TextPreprocessor()

# Example preprocessing
sample_text = "This product is AMAZING!!! I love it so much. Best purchase ever! 😍"

print("Text Preprocessing Example:")
print(f"Original: {sample_text}")
print(f"Cleaned: {preprocessor.clean_text(sample_text)}")
print(f"Tokenized: {preprocessor.tokenize(sample_text)}")
print(f"Preprocessed: {preprocessor.preprocess_text(sample_text)}")
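
The `TextPreprocessor` class lives in `utils/preprocessing.py` and is not shown in this notebook. As a rough mental model only, here is a minimal sketch of the kind of cleaning and tokenizing such a class might perform. The names `simple_clean` and `simple_tokenize` are hypothetical, and the real methods may treat emoji, punctuation, or stop words differently.

```python
import re

def simple_clean(text):
    """Lowercase, replace anything that is not a-z, 0-9, or whitespace with a space, then collapse spaces."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def simple_tokenize(text):
    """Split the cleaned text on whitespace (hypothetical stand-in for a real tokenizer)."""
    return simple_clean(text).split()

sample = "This product is AMAZING!!! I love it so much. Best purchase ever! 😍"
print(simple_clean(sample))
print(simple_tokenize(sample))
```

Note that this sketch discards the emoji and the exclamation marks, both of which can carry sentiment; whether to keep them is a preprocessing design choice.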

In [None]:
# Preprocess all reviews
print("Preprocessing all reviews...")
texts = df['review_text'].astype(str).tolist()
processed_texts = [preprocessor.preprocess_text(text) for text in texts]

# Show examples
print("\nProcessing Examples:")
for i in range(min(3, len(texts))):
    print(f"Original: {texts[i]}")
    print(f"Processed: {processed_texts[i]}")
    print("-" * 50)

In [None]:
# Build vocabulary
vocab_builder = VocabularyBuilder(min_freq=1, max_vocab_size=1000)
vocab = vocab_builder.build_from_texts(processed_texts)

print(f"Vocabulary size: {len(vocab)}")
print(f"Sample vocabulary: {list(vocab.items())[:10]}")

# Convert text to indices
max_length = 50
text_indices = [vocab_builder.text_to_indices(tokens, max_length) for tokens in processed_texts]

print("\nText to indices example:")
print(f"Tokens: {processed_texts[0]}")
print(f"Indices: {text_indices[0]}")
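
To make the vocabulary step concrete, here is a minimal sketch of how a frequency-based vocabulary and a fixed-length index conversion could work. It is not the `VocabularyBuilder` implementation, which is not shown here. The helper names `build_vocab` and `to_indices`, and the choice of index 0 for padding and 1 for unknown tokens, are assumptions made for illustration.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # assumed special tokens, not taken from utils.preprocessing

def build_vocab(token_lists, min_freq=1, max_vocab_size=1000):
    """Map the most frequent tokens to integer ids, reserving 0 for PAD and 1 for UNK."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    vocab = {PAD: 0, UNK: 1}
    for tok, freq in counts.most_common():  # sorted by descending frequency
        if freq < min_freq or len(vocab) >= max_vocab_size:
            break
        vocab[tok] = len(vocab)
    return vocab

def to_indices(tokens, vocab, max_length):
    """Truncate to max_length, map unknown tokens to UNK, and right-pad with PAD."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokens[:max_length]]
    return ids + [vocab[PAD]] * (max_length - len(ids))

docs = [["great", "value"], ["terrible", "value"]]
demo_vocab = build_vocab(docs)
print(demo_vocab)
print(to_indices(["great", "unknown", "value"], demo_vocab, max_length=5))  # [3, 1, 2, 0, 0]
```

Padding every review to the same `max_length` is what lets reviews of different lengths be stacked into one batch tensor.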

## 4. Model Training {#training}

Let's train different models and compare their performance.

In [None]:
# Prepare data for training
from sklearn.model_selection import train_test_split

# Convert labels to integers
labels = df['sentiment'].tolist()
label_to_idx = {'negative': 0, 'positive': 1}
y = [label_to_idx[label] for label in labels]

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    text_indices, y, test_size=0.2, random_state=42, stratify=y
)

# Create datasets
train_dataset = ReviewDataset(X_train, y_train)
test_dataset = ReviewDataset(X_test, y_test)

# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=8, shuffle=False)

print(f"Training samples: {len(X_train)}")
print(f"Test samples: {len(X_test)}")
print(f"Vocabulary size: {len(vocab)}")

In [None]:
# Train Neural Bag of Words model
print("Training Neural Bag of Words (NBoW) model...")

# Create model
nbow_model = NBoW(
    vocab_size=len(vocab),
    embed_dim=50,
    hidden_dim=64,
    num_classes=2,
    dropout=0.5
)

# Initialize trainer
device = 'cuda' if torch.cuda.is_available() else 'cpu'
trainer = Trainer(nbow_model, device=device)

# Train model
history = trainer.fit(
    train_loader=train_loader,
    val_loader=test_loader,  # Demo only: reusing the test set for validation makes the test metrics optimistic
    epochs=10,
    lr=1e-3,
    early_stopping_patience=5
)

print("NBoW training completed!")
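
The `early_stopping_patience=5` argument suggests a patience-based stopping rule. The `Trainer` internals are not shown here, so the following is only a sketch of how such a rule commonly works: training stops once the validation loss has failed to improve for `patience` consecutive epochs. The function name `stopping_epoch` and the loss values are made up for illustration.

```python
def stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None if it runs to the end."""
    best = float("inf")
    stale = 0  # epochs since the last improvement in validation loss
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None

illustrative_losses = [0.70, 0.62, 0.60, 0.61, 0.63, 0.60, 0.64]
print(stopping_epoch(illustrative_losses, patience=3))  # 6
print(stopping_epoch(illustrative_losses, patience=5))  # None
```

With `epochs=10` and a patience of 5, a rule like this can only trigger if the best validation loss occurs in the first five epochs.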

In [None]:
# Plot training history
plot_training_history(history)

In [None]:
# Evaluate model
print("Evaluating NBoW model...")
results = trainer.evaluate(test_loader)

print(f"Test Accuracy: {results['accuracy']:.4f}")
print(f"Test Precision: {results['precision']:.4f}")
print(f"Test Recall: {results['recall']:.4f}")
print(f"Test F1-Score: {results['f1']:.4f}")

# Plot confusion matrix
class_names = ['Negative', 'Positive']
plot_confusion_matrix(results['confusion_matrix'], class_names)
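
For reference, the four metrics printed above can all be derived from a 2x2 confusion matrix. The sketch below assumes rows are true labels and columns are predicted labels (the scikit-learn convention), with index 1 as the positive class. Whether `trainer.evaluate` reports binary or averaged metrics is not shown in this notebook, and the counts used here are illustrative rather than results.

```python
def binary_metrics(cm):
    """Compute accuracy, precision, recall, and F1 for the positive class from [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

illustrative_cm = [[40, 10], [5, 45]]  # made-up counts, not model output
for name, value in binary_metrics(illustrative_cm).items():
    print(f"{name}: {value:.4f}")
```

Precision answers "of the reviews predicted positive, how many were positive?", while recall answers "of the positive reviews, how many were found?".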

## 5. Model Evaluation {#evaluation}

Let's evaluate our model's performance in detail.

In [None]:
# Classification report
from sklearn.metrics import classification_report

print("Classification Report:")
print(classification_report(results['labels'], results['predictions'], 
                          target_names=class_names))

# Plot classification report
plot_classification_report(results['labels'], results['predictions'], class_names)

In [None]:
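# Analyze predictions
# Recover the raw test-set reviews by repeating the split with the same settings,
# so each review lines up with its label and prediction (test_loader is not shuffled).
_, test_texts = train_test_split(
    texts, test_size=0.2, random_state=42, stratify=y
)

test_df = pd.DataFrame({
    'review': test_texts,
    'true_label': [class_names[label] for label in results['labels']],
    'predicted_label': [class_names[pred] for pred in results['predictions']],
    'confidence': [max(prob) for prob in results['probabilities']]
})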

# Show correct predictions
correct_predictions = test_df[test_df['true_label'] == test_df['predicted_label']]
print(f"Correct predictions: {len(correct_predictions)}")
print("\nSample correct predictions:")
display(correct_predictions.head())

# Show incorrect predictions
incorrect_predictions = test_df[test_df['true_label'] != test_df['predicted_label']]
print(f"\nIncorrect predictions: {len(incorrect_predictions)}")
if len(incorrect_predictions) > 0:
    print("Sample incorrect predictions:")
    display(incorrect_predictions.head())

## 6. Inference and Prediction {#inference}

Let's test our model on new reviews.

In [None]:
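def predict_sentiment(text, model, vocab_builder, preprocessor, device, max_length=max_length):
    """Predict sentiment for a single text.

    `max_length` defaults to the value used when building the training data.
    Label names come from the notebook-level `class_names` list.
    """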
    model.eval()
    
    # Preprocess text
    tokens = preprocessor.preprocess_text(text)
    indices = vocab_builder.text_to_indices(tokens, max_length)
    
    # Convert to tensor
    input_tensor = torch.tensor(indices, dtype=torch.long).unsqueeze(0).to(device)
    
    # Make prediction
    with torch.no_grad():
        outputs = model(input_tensor)
        probabilities = torch.softmax(outputs, dim=1)
        predicted_class = torch.argmax(outputs, dim=1).item()
        confidence = probabilities[0][predicted_class].item()
    
    return {
        'sentiment': class_names[predicted_class],
        'confidence': confidence,
        'probabilities': {
            'negative': probabilities[0][0].item(),
            'positive': probabilities[0][1].item()
        }
    }

# Test on new reviews
test_reviews = [
    "This product exceeded my expectations!",
    "Terrible quality, complete waste of money.",
    "Pretty good, but could be better.",
    "Absolutely love this! Highly recommend.",
    "Not bad, but not great either."
]

print("Testing on new reviews:")
for review in test_reviews:
    result = predict_sentiment(review, nbow_model, vocab_builder, preprocessor, device)
    print(f"Review: {review}")
    print(f"Sentiment: {result['sentiment']} (confidence: {result['confidence']:.3f})")
    print(f"Probabilities: {result['probabilities']}")
    print("-" * 50)
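
The confidence reported above is the softmax probability of the predicted class. As a small worked example, here is the same computation in plain Python on a pair of made-up logits; the helper name `softmax_list` and the logit values are illustrative only.

```python
import math

def softmax_list(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

example_logits = [-0.4, 1.2]  # made-up scores for [negative, positive]
example_probs = softmax_list(example_logits)
predicted_idx = example_probs.index(max(example_probs))

print(f"Probabilities: {example_probs}")
print(f"Predicted: {['Negative', 'Positive'][predicted_idx]} "
      f"(confidence: {example_probs[predicted_idx]:.3f})")
```

Because softmax preserves the ordering of the logits, taking the argmax of the raw outputs, as `predict_sentiment` does, picks the same class as taking the argmax of the probabilities.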

In [None]:
# Interactive prediction
def interactive_prediction():
    """Interactive sentiment prediction."""
    print("Interactive Sentiment Analysis")
    print("Enter a review to analyze (or 'quit' to exit):")
    
    while True:
        user_input = input("\nYour review: ")
        
        if user_input.lower() == 'quit':
            break
        
        if user_input.strip():
            result = predict_sentiment(user_input, nbow_model, vocab_builder, preprocessor, device)
            print(f"Sentiment: {result['sentiment']}")
            print(f"Confidence: {result['confidence']:.3f}")
            print(f"Probabilities: {result['probabilities']}")
        else:
            print("Please enter a valid review.")

# Uncomment to run interactive prediction
# interactive_prediction()

## 7. Model Comparison {#comparison}

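Let's compare different model architectures on the same train/test split. NBoW was trained above for up to 10 epochs, while the LSTM and CNN below get 5, so treat the comparison as indicative rather than like-for-like.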

In [None]:
# Function to train and evaluate a model
def train_and_evaluate_model(model, model_name, train_loader, test_loader, epochs=10):
    """Train and evaluate a model."""
    print(f"Training {model_name}...")
    
    trainer = Trainer(model, device=device)
    
    history = trainer.fit(
        train_loader=train_loader,
        val_loader=test_loader,
        epochs=epochs,
        lr=1e-3,
        early_stopping_patience=5
    )
    
    results = trainer.evaluate(test_loader)
    
    return {
        'model': model,
        'trainer': trainer,
        'history': history,
        'results': results
    }

# Compare models
model_results = {}

# NBoW (already trained)
model_results['NBoW'] = {
    'accuracy': results['accuracy'],
    'precision': results['precision'],
    'recall': results['recall'],
    'f1': results['f1']
}

# LSTM Model
try:
    lstm_model = LSTMModel(
        vocab_size=len(vocab),
        embed_dim=50,
        hidden_dim=64,
        num_classes=2,
        num_layers=2,
        dropout=0.5
    )
    
    lstm_result = train_and_evaluate_model(lstm_model, 'LSTM', train_loader, test_loader, epochs=5)
    model_results['LSTM'] = {
        'accuracy': lstm_result['results']['accuracy'],
        'precision': lstm_result['results']['precision'],
        'recall': lstm_result['results']['recall'],
        'f1': lstm_result['results']['f1']
    }
except Exception as e:
    print(f"Error training LSTM: {e}")

# CNN Model
try:
    cnn_model = CNNModel(
        vocab_size=len(vocab),
        embed_dim=50,
        num_classes=2,
        filter_sizes=[3, 4, 5],
        num_filters=50,
        dropout=0.5
    )
    
    cnn_result = train_and_evaluate_model(cnn_model, 'CNN', train_loader, test_loader, epochs=5)
    model_results['CNN'] = {
        'accuracy': cnn_result['results']['accuracy'],
        'precision': cnn_result['results']['precision'],
        'recall': cnn_result['results']['recall'],
        'f1': cnn_result['results']['f1']
    }
except Exception as e:
    print(f"Error training CNN: {e}")

print("\nModel Comparison Results:")
comparison_df = pd.DataFrame(model_results).T
display(comparison_df)

In [None]:
# Plot model comparison
if len(model_results) > 1:
    plot_model_comparison(model_results)
else:
    print("Need at least 2 models for comparison")
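
Once a results dictionary like `model_results` is populated, ranking the models by a single metric is a quick way to read the comparison. The sketch below uses a hypothetical helper `rank_models` and placeholder numbers that are not results from this notebook.

```python
def rank_models(results, metric="f1"):
    """Return model names sorted from best to worst on the chosen metric."""
    return sorted(results, key=lambda name: results[name][metric], reverse=True)

# Placeholder values for illustration only, not measured results
placeholder_results = {
    "model_a": {"accuracy": 0.80, "f1": 0.78},
    "model_b": {"accuracy": 0.82, "f1": 0.81},
    "model_c": {"accuracy": 0.84, "f1": 0.79},
}

print(rank_models(placeholder_results, metric="f1"))
print(rank_models(placeholder_results, metric="accuracy"))
```

As the placeholder numbers show, the ranking can change with the metric, so it helps to decide up front which metric matters for the use case.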

## Summary

This notebook demonstrated:

1. **Data Loading and Exploration**: Understanding the dataset structure and characteristics
2. **Text Preprocessing**: Cleaning and preparing text data for neural networks
3. **Model Training**: Training different deep learning models for sentiment analysis
4. **Evaluation**: Comprehensive evaluation of model performance
5. **Inference**: Making predictions on new data
6. **Model Comparison**: Comparing different architectures

### Key Takeaways:
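- Preprocessing choices (cleaning, tokenization, vocabulary size, `max_length`) determine what the models see, so they directly affect results
- Architectures differ in what they capture: NBoW ignores word order, while LSTM and CNN can model sequential and local n-gram patterns
- Precision, recall, F1, and the confusion matrix show which class a model gets wrong, which accuracy alone can hide
- A single `predict_sentiment` function makes it easy to try the trained model on new or interactively entered reviews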

### Next Steps:
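1. Train the `TransformerModel` imported in the setup cell, ideally starting from pre-trained weights
2. Experiment with different hyperparameters
3. Use a larger dataset and a separate validation split; the 10-review fallback sample is too small for meaningful metrics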
4. Implement more sophisticated preprocessing techniques
5. Deploy the model for real-world applications