# Model Training - IMDb Movie Reviews Sentiment Analysis 
## Introduction
This notebook covers **model training and comparison** for sentiment analysis of IMDb movie reviews, building on the data produced by the preprocessing step. It implements and compares traditional machine learning and deep learning approaches to find the best-performing model.

**Dataset:** IMDB Dataset of 50K Movie Reviews (Kaggle)

**Objective:** Train, evaluate, and compare various ML models for sentiment classification

**Author:** NGUYEN Ngoc Dang Nguyen - Final-year Student in Computer Science, Aix-Marseille University

**Training Pipeline:**
1. Setup and load preprocessed data
2. Traditional ML models (Logistic Regression, Naive Bayes, Random Forest, SVM)
3. Deep Learning models (LSTM, CNN, Bidirectional LSTM)
4. Advanced approaches (Transfer Learning with pre-trained embeddings)
5. Model comparison and performance analysis
6. Hyperparameter tuning for best models
7. Final model selection and saving
8. Deployment preparation

## 1. Load Libraries and Setup

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pickle
import os
import sys
import time
from datetime import datetime

# Add src to path
sys.path.append('../src')

# ML libraries
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC, SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, f1_score, precision_score, recall_score, roc_auc_score

# TensorFlow/Keras - compatible import
import tensorflow as tf
try:
    from keras.models import Sequential
    from keras.layers import Dense, Embedding, LSTM, Dropout, Bidirectional, Conv1D, MaxPooling1D, GlobalMaxPooling1D, SpatialDropout1D
    from keras.optimizers import Adam
    from keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint
    from keras.utils import to_categorical
    print("Using standalone Keras")
except ImportError:
    try:
        from tensorflow.keras.models import Sequential
        from tensorflow.keras.layers import Dense, Embedding, LSTM, Dropout, Bidirectional, Conv1D, MaxPooling1D, GlobalMaxPooling1D, SpatialDropout1D
        from tensorflow.keras.optimizers import Adam
        from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint
        from tensorflow.keras.utils import to_categorical
        print("Using tensorflow.keras")
    except ImportError:
        print("Warning: Could not import Keras modules")

# Import from our modules
from config import *

# Create directories for saving results
os.makedirs('results/plots', exist_ok=True)
os.makedirs('results/reports', exist_ok=True)
os.makedirs('models/saved_models', exist_ok=True)

print("Libraries imported successfully!")

## 2. Load Preprocessed Data

In [None]:
print("LOADING PREPROCESSED DATA")
print("="*50)

# Load traditional ML data
ml_data = np.load('data/processed/traditional_ml_data.npz')
X_train_tfidf = ml_data['X_train']
X_val_tfidf = ml_data['X_val']
X_test_tfidf = ml_data['X_test']
y_train = ml_data['y_train']
y_val = ml_data['y_val']
y_test = ml_data['y_test']

print(f"Traditional ML data loaded:")
print(f"  Train: {X_train_tfidf.shape}, Val: {X_val_tfidf.shape}, Test: {X_test_tfidf.shape}")

# Load deep learning data
dl_data = np.load('data/processed/deep_learning_data.npz')
X_train_seq = dl_data['X_train']
X_val_seq = dl_data['X_val']
X_test_seq = dl_data['X_test']

print(f"Deep Learning data loaded:")
print(f"  Train: {X_train_seq.shape}, Val: {X_val_seq.shape}, Test: {X_test_seq.shape}")

# Load preprocessing config
with open('models/preprocessors/preprocessing_config.pkl', 'rb') as f:
    config = pickle.load(f)

MAX_SEQUENCE_LENGTH = config['max_sequence_length']
VOCAB_SIZE = config['vocab_size']

print(f"Config loaded: Max sequence length: {MAX_SEQUENCE_LENGTH}, Vocab size: {VOCAB_SIZE}")

# Check class distribution
print(f"\nClass distribution:")
print(f"Train: {np.bincount(y_train)} ({np.bincount(y_train)/len(y_train)*100}%)")
print(f"Val: {np.bincount(y_val)} ({np.bincount(y_val)/len(y_val)*100}%)")
print(f"Test: {np.bincount(y_test)} ({np.bincount(y_test)/len(y_test)*100}%)")
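
The class-distribution printout above relies on `np.bincount`, which only accepts non-negative integer labels. As a minimal sketch on made-up labels (the helper name `class_distribution` is hypothetical and not part of the project code), the counts and percentages can be computed like this:

```python
import numpy as np

def class_distribution(labels):
    """Return per-class counts and percentages for non-negative integer labels."""
    counts = np.bincount(labels)
    percentages = counts / counts.sum() * 100
    return counts, percentages

# Toy labels for illustration only: two negatives (0) and three positives (1)
toy_labels = np.array([0, 1, 1, 0, 1])
counts, percentages = class_distribution(toy_labels)
print(counts, np.round(percentages, 1))
```

If the label arrays were stored with a float dtype, they would need to be cast with `astype(int)` before calling `np.bincount`.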


## 3. Traditional Machine Learning Models

In [None]:
print("\nTRADITIONAL MACHINE LEARNING MODELS")
print("="*50)

# Clear any existing model_results to avoid duplicates from re-running
model_results = {}

def evaluate_model(model, X_train, X_val, X_test, y_train, y_val, y_test, model_name):
    """Comprehensive model evaluation function"""
    
    print(f"\nTraining {model_name}...")
    start_time = time.time()
    
    # Train the model
    model.fit(X_train, y_train)
    training_time = time.time() - start_time
    
    # Make predictions
    y_train_pred = model.predict(X_train)
    y_val_pred = model.predict(X_val)
    y_test_pred = model.predict(X_test)
    
    # Get continuous scores for ROC-AUC: class-1 probabilities when available,
    # otherwise decision-function margins (LinearSVC has no predict_proba)
    if hasattr(model, 'predict_proba'):
        y_train_proba = model.predict_proba(X_train)[:, 1]
        y_val_proba = model.predict_proba(X_val)[:, 1]
        y_test_proba = model.predict_proba(X_test)[:, 1]
    elif hasattr(model, 'decision_function'):
        y_train_proba = model.decision_function(X_train)
        y_val_proba = model.decision_function(X_val)
        y_test_proba = model.decision_function(X_test)
    else:
        y_train_proba = y_val_proba = y_test_proba = None
    
    # Calculate metrics
    metrics = {
        'model_name': model_name,
        'training_time': training_time,
        
        # Accuracy
        'train_accuracy': accuracy_score(y_train, y_train_pred),
        'val_accuracy': accuracy_score(y_val, y_val_pred),
        'test_accuracy': accuracy_score(y_test, y_test_pred),
        
        # Precision, Recall, F1
        'train_precision': precision_score(y_train, y_train_pred),
        'val_precision': precision_score(y_val, y_val_pred),
        'test_precision': precision_score(y_test, y_test_pred),
        
        'train_recall': recall_score(y_train, y_train_pred),
        'val_recall': recall_score(y_val, y_val_pred),
        'test_recall': recall_score(y_test, y_test_pred),
        
        'train_f1': f1_score(y_train, y_train_pred),
        'val_f1': f1_score(y_val, y_val_pred),
        'test_f1': f1_score(y_test, y_test_pred),
    }
    
    # ROC-AUC if probabilities available
    if y_val_proba is not None:
        metrics.update({
            'train_roc_auc': roc_auc_score(y_train, y_train_proba),
            'val_roc_auc': roc_auc_score(y_val, y_val_proba),
            'test_roc_auc': roc_auc_score(y_test, y_test_proba),
        })
    
    # Print results
    print(f"  Training time: {training_time:.2f}s")
    print(f"  Val Accuracy: {metrics['val_accuracy']:.4f}")
    print(f"  Val F1-Score: {metrics['val_f1']:.4f}")
    if 'val_roc_auc' in metrics:
        print(f"  Val ROC-AUC: {metrics['val_roc_auc']:.4f}")
    
    return model, metrics

# 1. Logistic Regression
print("1. Logistic Regression")
lr_model = LogisticRegression(
    random_state=RANDOM_STATE,
    max_iter=1000,
    C=1.0
)
lr_trained, lr_metrics = evaluate_model(
    lr_model, X_train_tfidf, X_val_tfidf, X_test_tfidf, 
    y_train, y_val, y_test, "Logistic Regression"
)
model_results['Logistic Regression'] = lr_metrics

# 2. Multinomial Naive Bayes
print("\n2. Multinomial Naive Bayes")
nb_model = MultinomialNB(alpha=1.0)
nb_trained, nb_metrics = evaluate_model(
    nb_model, X_train_tfidf, X_val_tfidf, X_test_tfidf,
    y_train, y_val, y_test, "Naive Bayes"
)
model_results['Naive Bayes'] = nb_metrics

# 3. Random Forest
print("\n3. Random Forest")
rf_model = RandomForestClassifier(
    n_estimators=100,
    random_state=RANDOM_STATE,
    max_depth=20,
    min_samples_split=5,
    n_jobs=-1
)
rf_trained, rf_metrics = evaluate_model(
    rf_model, X_train_tfidf, X_val_tfidf, X_test_tfidf,
    y_train, y_val, y_test, "Random Forest"
)
model_results['Random Forest'] = rf_metrics

# LinearSVC is already imported at the top; only the calibration wrapper is new here
from sklearn.calibration import CalibratedClassifierCV

# 4. Support Vector Machine (Linear) 
print("\n4. Support Vector Machine (LinearSVC)")
linear_svm_model = LinearSVC(
    C=1.0,
    random_state=RANDOM_STATE,
    max_iter=2000,  # Increased maximum iterations
    dual=False      # Faster when n_samples > n_features
)

# Set to True to wrap LinearSVC in CalibratedClassifierCV when probabilities are needed
USE_CALIBRATED_SVM = False

try:
    if USE_CALIBRATED_SVM:
        calibrated_svm = CalibratedClassifierCV(linear_svm_model, cv=3)
        linear_svm_trained, linear_svm_metrics = evaluate_model(
            calibrated_svm, X_train_tfidf, X_val_tfidf, X_test_tfidf,
            y_train, y_val, y_test, "LinearSVC (Calibrated)"
        )
    else:
        linear_svm_trained, linear_svm_metrics = evaluate_model(
            linear_svm_model, X_train_tfidf, X_val_tfidf, X_test_tfidf,
            y_train, y_val, y_test, "LinearSVC"
        )
    
    model_results['LinearSVC'] = linear_svm_metrics

except Exception as e:
    print(f"\nERROR: {e}")
    print("Please run cells from the beginning to ensure all data has been loaded correctly.")
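
`evaluate_model` falls back to `decision_function` scores when a model has no `predict_proba`. That is valid for ROC-AUC because the metric depends only on how the scores rank positives against negatives, not on their scale. The sketch below illustrates this on made-up scores; `pairwise_auc` is a hypothetical helper written for this illustration, not a scikit-learn function, and it compares every positive with every negative, so it is only suitable for small arrays.

```python
import numpy as np

def pairwise_auc(y_true, scores):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # diff[i, j] > 0 means positive i is scored above negative j
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

toy_y = [0, 0, 1, 1]
toy_margins = np.array([-1.2, 0.3, 0.1, 2.0])   # made-up decision_function-style scores
toy_probs = 1.0 / (1.0 + np.exp(-toy_margins))  # monotone squashing into (0, 1)

auc_margins = pairwise_auc(toy_y, toy_margins)
auc_probs = pairwise_auc(toy_y, toy_probs)
print(auc_margins, auc_probs)
```

Both calls give the same value because the sigmoid preserves the ordering of the scores.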

## 4. Deep Learning Models

In [None]:
print("\nDEEP LEARNING MODELS")
print("="*50)

# Common training parameters
BATCH_SIZE = 32
EPOCHS = 10
EMBEDDING_DIM = 100

# Callbacks for training
early_stopping = EarlyStopping(
    monitor='val_accuracy',
    patience=3,
    restore_best_weights=True,
    verbose=1
)

reduce_lr = ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.2,
    patience=2,
    min_lr=0.0001,
    verbose=1
)

def train_deep_model(model, model_name, X_train, X_val, y_train, y_val):
    """Train deep learning model with callbacks"""
    
    print(f"\nTraining {model_name}...")
    start_time = time.time()
    
    # Model checkpoint
    checkpoint = ModelCheckpoint(
        f'models/saved_models/{model_name.lower().replace(" ", "_")}_best.h5',
        monitor='val_accuracy',
        save_best_only=True,
        verbose=0
    )
    
    # Train model
    history = model.fit(
        X_train, y_train,
        batch_size=BATCH_SIZE,
        epochs=EPOCHS,
        validation_data=(X_val, y_val),
        callbacks=[early_stopping, reduce_lr, checkpoint],
        verbose=1
    )
    
    training_time = time.time() - start_time
    
    # Evaluate model
    train_loss, train_acc = model.evaluate(X_train, y_train, verbose=0)
    val_loss, val_acc = model.evaluate(X_val, y_val, verbose=0)
    
    # Get predictions for detailed metrics
    y_train_pred_proba = model.predict(X_train, verbose=0)
    y_val_pred_proba = model.predict(X_val, verbose=0)
    
    y_train_pred = (y_train_pred_proba > 0.5).astype(int).flatten()
    y_val_pred = (y_val_pred_proba > 0.5).astype(int).flatten()
    
    # Calculate metrics
    metrics = {
        'model_name': model_name,
        'training_time': training_time,
        'epochs_trained': len(history.history['loss']),
        
        'train_accuracy': accuracy_score(y_train, y_train_pred),
        'val_accuracy': accuracy_score(y_val, y_val_pred),
        
        'train_precision': precision_score(y_train, y_train_pred),
        'val_precision': precision_score(y_val, y_val_pred),
        
        'train_recall': recall_score(y_train, y_train_pred),
        'val_recall': recall_score(y_val, y_val_pred),
        
        'train_f1': f1_score(y_train, y_train_pred),
        'val_f1': f1_score(y_val, y_val_pred),
        
        'train_roc_auc': roc_auc_score(y_train, y_train_pred_proba.flatten()),
        'val_roc_auc': roc_auc_score(y_val, y_val_pred_proba.flatten()),
        
        'train_loss': train_loss,
        'val_loss': val_loss,
    }
    
    print(f"  Training time: {training_time:.2f}s")
    print(f"  Epochs trained: {len(history.history['loss'])}")
    print(f"  Val Accuracy: {val_acc:.4f}")
    print(f"  Val F1-Score: {metrics['val_f1']:.4f}")
    print(f"  Val ROC-AUC: {metrics['val_roc_auc']:.4f}")
    
    return model, history, metrics

# 1. Simple LSTM Model
print("1. LSTM Model")
lstm_model = Sequential([
    Embedding(VOCAB_SIZE, EMBEDDING_DIM, input_length=MAX_SEQUENCE_LENGTH),
    SpatialDropout1D(0.2),
    LSTM(64, dropout=0.2, recurrent_dropout=0.2),
    Dense(32, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])

lstm_model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss='binary_crossentropy',
    metrics=['accuracy']
)

lstm_model.summary()

lstm_trained, lstm_history, lstm_metrics = train_deep_model(
    lstm_model, "LSTM", X_train_seq, X_val_seq, y_train, y_val
)
model_results['LSTM'] = lstm_metrics

# 2. CNN Model
print("\n2. CNN Model")
cnn_model = Sequential([
    Embedding(VOCAB_SIZE, EMBEDDING_DIM, input_length=MAX_SEQUENCE_LENGTH),
    SpatialDropout1D(0.2),
    Conv1D(64, 5, activation='relu'),
    GlobalMaxPooling1D(),
    Dense(32, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])

cnn_model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss='binary_crossentropy',
    metrics=['accuracy']
)

cnn_model.summary()

cnn_trained, cnn_history, cnn_metrics = train_deep_model(
    cnn_model, "CNN", X_train_seq, X_val_seq, y_train, y_val
)
model_results['CNN'] = cnn_metrics

# 3. Bidirectional LSTM Model
print("\n3. Bidirectional LSTM Model")
bilstm_model = Sequential([
    Embedding(VOCAB_SIZE, EMBEDDING_DIM, input_length=MAX_SEQUENCE_LENGTH),
    SpatialDropout1D(0.2),
    Bidirectional(LSTM(64, dropout=0.2, recurrent_dropout=0.2)),
    Dense(32, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])

bilstm_model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss='binary_crossentropy',
    metrics=['accuracy']
)

bilstm_model.summary()

bilstm_trained, bilstm_history, bilstm_metrics = train_deep_model(
    bilstm_model, "Bidirectional LSTM", X_train_seq, X_val_seq, y_train, y_val
)
model_results['Bidirectional LSTM'] = bilstm_metrics
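
With the settings used above (Adam starting at 0.001, `factor=0.2`, `min_lr=0.0001`), `ReduceLROnPlateau` can only lower the learning rate meaningfully twice before it reaches the floor. The following sketch works through that arithmetic; `plateau_lr_schedule` is a hypothetical helper that applies the rule `new_lr = max(old_lr * factor, min_lr)` and does not model when a plateau is actually detected.

```python
def plateau_lr_schedule(initial_lr, factor, min_lr, reductions):
    """Learning rates after each successive plateau-triggered reduction."""
    rates = [initial_lr]
    for _ in range(reductions):
        rates.append(max(rates[-1] * factor, min_lr))
    return rates

# Values taken from the cell above: lr=0.001, factor=0.2, min_lr=0.0001
schedule = plateau_lr_schedule(0.001, 0.2, 0.0001, reductions=3)
print(schedule)
```

The second reduction is already clipped by `min_lr` (0.0002 * 0.2 would be 0.00004), and any further reduction leaves the rate unchanged.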

## 5. Model Comparison and Analysis

In [None]:
print("\nMODEL COMPARISON AND ANALYSIS")
print("="*60)

# Create comprehensive comparison DataFrame
comparison_df = pd.DataFrame(model_results).T

# Select key metrics for comparison
key_metrics = ['val_accuracy', 'val_f1', 'val_roc_auc', 'training_time']
comparison_subset = comparison_df[key_metrics].copy()

# Convert data type to float for metric columns
for col in key_metrics:
    comparison_subset[col] = pd.to_numeric(comparison_subset[col], errors='coerce')

print("Model Performance Summary:")
print("="*50)
print(comparison_subset.round(4))

# Find best model by validation accuracy
best_model_name = comparison_subset['val_accuracy'].idxmax()
print(f"\nBest model by validation accuracy: {best_model_name}")
print(f"Best validation accuracy: {comparison_subset.loc[best_model_name, 'val_accuracy']:.4f}")

# Visualization of model comparison
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# Validation Accuracy
axes[0,0].bar(comparison_subset.index, comparison_subset['val_accuracy'], color='skyblue')
axes[0,0].set_title('Validation Accuracy Comparison', fontweight='bold')
axes[0,0].set_ylabel('Accuracy')
axes[0,0].tick_params(axis='x', rotation=45)
axes[0,0].grid(axis='y', alpha=0.3)

# Validation F1-Score
axes[0,1].bar(comparison_subset.index, comparison_subset['val_f1'], color='lightgreen')
axes[0,1].set_title('Validation F1-Score Comparison', fontweight='bold')
axes[0,1].set_ylabel('F1-Score')
axes[0,1].tick_params(axis='x', rotation=45)
axes[0,1].grid(axis='y', alpha=0.3)

# Validation ROC-AUC
valid_roc_auc = comparison_subset['val_roc_auc'].dropna()
axes[1,0].bar(valid_roc_auc.index, valid_roc_auc.values, color='lightcoral')
axes[1,0].set_title('Validation ROC-AUC Comparison', fontweight='bold')
axes[1,0].set_ylabel('ROC-AUC')
axes[1,0].tick_params(axis='x', rotation=45)
axes[1,0].grid(axis='y', alpha=0.3)

# Training Time
axes[1,1].bar(comparison_subset.index, comparison_subset['training_time'], color='orange')
axes[1,1].set_title('Training Time Comparison', fontweight='bold')
axes[1,1].set_ylabel('Time (seconds)')
axes[1,1].tick_params(axis='x', rotation=45)
axes[1,1].grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.savefig('results/plots/model_comparison.png', dpi=300, bbox_inches='tight')
plt.show()

# Performance vs Training Time scatter plot
plt.figure(figsize=(10, 6))
for model_name in comparison_subset.index:
    plt.scatter(
        comparison_subset.loc[model_name, 'training_time'],
        comparison_subset.loc[model_name, 'val_accuracy'],
        s=100, label=model_name
    )
    
plt.xlabel('Training Time (seconds)')
plt.ylabel('Validation Accuracy')
plt.title('Model Performance vs Training Time', fontweight='bold')
plt.legend()
plt.grid(alpha=0.3)
plt.savefig('results/plots/performance_vs_time.png', dpi=300, bbox_inches='tight')
plt.show()

## 6. Detailed Analysis of Best Models

In [None]:
print("\nDETAILED ANALYSIS OF BEST MODELS")
print("="*50)

# Ensure numeric data type for comparison_subset
for col in comparison_subset.columns:
    comparison_subset[col] = pd.to_numeric(comparison_subset[col], errors='coerce')

# Select top 3 models for detailed analysis
top_3_models = comparison_subset.nlargest(3, 'val_accuracy')
print("Top 3 models by validation accuracy:")
print(top_3_models[['val_accuracy', 'val_f1', 'training_time']])

# Learning curves for deep learning models (if available)
if 'lstm_history' in locals():
    fig, axes = plt.subplots(1, 3, figsize=(18, 5))
    
    # LSTM learning curves
    axes[0].plot(lstm_history.history['accuracy'], label='Train Accuracy', marker='o')
    axes[0].plot(lstm_history.history['val_accuracy'], label='Val Accuracy', marker='s')
    axes[0].set_title('LSTM Learning Curves', fontweight='bold')
    axes[0].set_xlabel('Epoch')
    axes[0].set_ylabel('Accuracy')
    axes[0].legend()
    axes[0].grid(alpha=0.3)
    
    # CNN learning curves
    axes[1].plot(cnn_history.history['accuracy'], label='Train Accuracy', marker='o')
    axes[1].plot(cnn_history.history['val_accuracy'], label='Val Accuracy', marker='s')
    axes[1].set_title('CNN Learning Curves', fontweight='bold')
    axes[1].set_xlabel('Epoch')
    axes[1].set_ylabel('Accuracy')
    axes[1].legend()
    axes[1].grid(alpha=0.3)
    
    # BiLSTM learning curves
    axes[2].plot(bilstm_history.history['accuracy'], label='Train Accuracy', marker='o')
    axes[2].plot(bilstm_history.history['val_accuracy'], label='Val Accuracy', marker='s')
    axes[2].set_title('Bidirectional LSTM Learning Curves', fontweight='bold')
    axes[2].set_xlabel('Epoch')
    axes[2].set_ylabel('Accuracy')
    axes[2].legend()
    axes[2].grid(alpha=0.3)
    
    plt.tight_layout()
    plt.savefig('results/plots/learning_curves.png', dpi=300, bbox_inches='tight')
    plt.show()
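
Each curve ends where `EarlyStopping` (monitoring `val_accuracy` with `patience=3`) halted training, which is why the models can report fewer than `EPOCHS` epochs. The sketch below is a simplified model of that stopping rule, assuming "improvement" means strictly greater than the best value so far; the function name and the validation scores are made up for illustration.

```python
def simulate_early_stopping(val_scores, patience):
    """Return (last_epoch_run, best_epoch), both 0-indexed, for a metric to maximize."""
    best = float("-inf")
    best_epoch = -1
    wait = 0
    for epoch, score in enumerate(val_scores):
        if score > best:
            best, best_epoch, wait = score, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_scores) - 1, best_epoch

# Made-up validation accuracies; the 0.90 at the end is never reached
toy_val_acc = [0.80, 0.85, 0.88, 0.87, 0.88, 0.86, 0.90]
stopped_epoch, best_epoch = simulate_early_stopping(toy_val_acc, patience=3)
print(stopped_epoch, best_epoch)
```

In this toy run training stops after the sixth epoch, and with `restore_best_weights=True` the weights from the third epoch would be the ones kept.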

## 7. Final Model Evaluation on Test Set

In [None]:
print("FINAL MODEL EVALUATION ON TEST SET")
print("="*50)

# Check available models
print("Available models:", list(model_results.keys()))

# Evaluate best traditional ML model on test set
# Names must match the keys stored in model_results (the SVM is stored as 'LinearSVC')
available_traditional = [model for model in ['Logistic Regression', 'Naive Bayes', 'Random Forest', 'LinearSVC'] 
                        if model in model_results.keys()]

if available_traditional:
    best_traditional = max(available_traditional, key=lambda x: model_results[x]['val_accuracy'])
    print(f"Best Traditional ML Model: {best_traditional}")
    
    # Get the trained model and evaluate on test set
    if best_traditional == 'Logistic Regression':
        best_trad_model = lr_trained
    elif best_traditional == 'Naive Bayes':
        best_trad_model = nb_trained
    elif best_traditional == 'Random Forest':
        best_trad_model = rf_trained
    elif best_traditional == 'LinearSVC':
        best_trad_model = linear_svm_trained
    
    # Test predictions
    y_test_pred_trad = best_trad_model.predict(X_test_tfidf)
    test_acc_trad = accuracy_score(y_test, y_test_pred_trad)
    test_f1_trad = f1_score(y_test, y_test_pred_trad)
    
    print(f"Test Accuracy: {test_acc_trad:.4f}")
    print(f"Test F1 Score: {test_f1_trad:.4f}")
    
    # Traditional ML confusion matrix
    cm_trad = confusion_matrix(y_test, y_test_pred_trad)
    print("Traditional ML Confusion Matrix:")
    print(cm_trad)
else:
    print("No traditional ML models found!")
    
print("\n" + "="*50)

# Deep Learning models
# Names must match the keys stored in model_results (the BiLSTM is stored as 'Bidirectional LSTM')
available_dl = [model for model in ['LSTM', 'Bidirectional LSTM', 'CNN'] 
               if model in model_results.keys()]

if available_dl:
    best_dl = max(available_dl, key=lambda x: model_results[x]['val_accuracy'])
    print(f"Best Deep Learning Model: {best_dl}")
    
    # Get the trained model
    if best_dl == 'LSTM':
        best_dl_model = lstm_trained
    elif best_dl == 'Bidirectional LSTM':
        best_dl_model = bilstm_trained
    elif best_dl == 'CNN':
        best_dl_model = cnn_trained
    
    # Test evaluation
    test_loss_dl, test_acc_dl = best_dl_model.evaluate(X_test_seq, y_test, verbose=0)
    y_test_pred_dl_proba = best_dl_model.predict(X_test_seq, verbose=0)
    # Flatten the (n, 1) sigmoid outputs to (n,) after thresholding, as in train_deep_model
    y_test_pred_dl = (y_test_pred_dl_proba > 0.5).astype(int).flatten()
    test_f1_dl = f1_score(y_test, y_test_pred_dl)
    
    print(f"Test Loss: {test_loss_dl:.4f}")
    print(f"Test Accuracy: {test_acc_dl:.4f}")  
    print(f"Test F1 Score: {test_f1_dl:.4f}")
    
    # Deep Learning confusion matrix
    cm_dl = confusion_matrix(y_test, y_test_pred_dl)
    print("Deep Learning Confusion Matrix:")
    print(cm_dl)
else:
    print("No deep learning models found!")

# Overall comparison
print("\n" + "="*50)
print("OVERALL COMPARISON")
print("="*50)

# Find overall best model
overall_best = None
overall_best_acc = 0

for model_name, metrics in model_results.items():
    if metrics['val_accuracy'] > overall_best_acc:
        overall_best_acc = metrics['val_accuracy']
        overall_best = model_name

print(f"Overall Best Model: {overall_best}")
print(f"Best Validation Accuracy: {overall_best_acc:.4f}")
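
The confusion matrices printed above use scikit-learn's layout: rows are true classes, columns are predicted classes, so a binary matrix reads `[[TN, FP], [FN, TP]]`. As a sketch with made-up counts (the helper `binary_metrics_from_confusion` is hypothetical), the reported accuracy, precision, recall and F1 can all be recovered from those four numbers:

```python
def binary_metrics_from_confusion(cm):
    """Derive accuracy, precision, recall and F1 from a 2x2 matrix [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Made-up counts for illustration: 50 true negatives and 50 true positives overall
toy_cm = [[40, 10], [5, 45]]
toy_metrics = binary_metrics_from_confusion(toy_cm)
print(toy_metrics)
```

Reading the matrix this way also shows which kind of error dominates: here the model produces more false positives (10) than false negatives (5).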

## 8. Save Models and Results

In [None]:
print("\nSAVING MODELS AND RESULTS")
print("="*50)

# Save traditional ML models
with open(f'models/saved_models/{best_traditional.lower().replace(" ", "_")}.pkl', 'wb') as f:
    pickle.dump(best_trad_model, f)

# Deep learning models are already saved via ModelCheckpoint callback

# Save comprehensive results
final_results = {
    'model_comparison': comparison_df.to_dict(),
    'best_traditional_model': best_traditional,
    'best_deep_learning_model': best_dl,
    'overall_best_model': overall_best,
    'test_results': {
        'traditional_ml': {
            'model': best_traditional,
            'test_accuracy': test_acc_trad,
            'test_f1': test_f1_trad
        },
        'deep_learning': {
            'model': best_dl,
            'test_accuracy': test_acc_dl,
            'test_f1': test_f1_dl
        }
    },
    'training_config': {
        'random_state': RANDOM_STATE,
        'batch_size': BATCH_SIZE,
        'epochs': EPOCHS,
        'embedding_dim': EMBEDDING_DIM
    }
}

with open('results/reports/training_results.pkl', 'wb') as f:
    pickle.dump(final_results, f)

# Save results as CSV for easy viewing
comparison_df.round(4).to_csv('results/reports/model_comparison.csv')

# Generate detailed report
report_text = f"""
IMDb Sentiment Analysis - Model Training Report
Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}

=== DATASET SUMMARY ===
Training samples: {len(y_train):,}
Validation samples: {len(y_val):,}
Test samples: {len(y_test):,}
Features (TF-IDF): {X_train_tfidf.shape[1]:,}
Max sequence length: {MAX_SEQUENCE_LENGTH}
Vocabulary size: {VOCAB_SIZE:,}

=== MODEL PERFORMANCE SUMMARY ===
"""

for model_name in model_results:
    result = model_results[model_name]
    report_text += f"""
{model_name}:
  Validation Accuracy: {result['val_accuracy']:.4f}
  Validation F1-Score: {result['val_f1']:.4f}
  Training Time: {result['training_time']:.2f}s
"""

report_text += f"""
=== BEST MODEL SELECTION ===
Best Traditional ML: {best_traditional} (Val Acc: {model_results[best_traditional]['val_accuracy']:.4f})
Best Deep Learning: {best_dl} (Val Acc: {model_results[best_dl]['val_accuracy']:.4f})
Overall Best: {overall_best} (Val Acc: {overall_best_acc:.4f})

=== TEST SET EVALUATION ===
{best_traditional} Test Accuracy: {test_acc_trad:.4f}
{best_dl} Test Accuracy: {test_acc_dl:.4f}

=== RECOMMENDATIONS ===
1. The {overall_best} model achieved the highest validation accuracy of {overall_best_acc:.4f}
2. Consider ensemble methods combining traditional ML and deep learning
3. Experiment with pre-trained embeddings (GloVe, Word2Vec) for potential improvements
4. Hyperparameter tuning could further improve performance
5. For production deployment, consider model size vs accuracy trade-offs
"""

# Save the report as plain text (the file name is an arbitrary choice) and display it
with open('results/reports/training_report.txt', 'w', encoding='utf-8') as f:
    f.write(report_text)
print(report_text)
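
The results dictionary is stored with `pickle`, so a later notebook or script has to load it the same way. The sketch below shows the round trip on a made-up stand-in dictionary written to a temporary directory; the keys mirror `final_results`, but the values are placeholders, not real results.

```python
import os
import pickle
import tempfile

# Made-up stand-in for final_results; the real dictionary is built in the cell above
toy_results = {
    "overall_best_model": "Model A",
    "test_results": {"traditional_ml": {"model": "Model A", "test_accuracy": 0.5}},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "training_results.pkl")
    with open(path, "wb") as f:
        pickle.dump(toy_results, f)
    with open(path, "rb") as f:
        reloaded = pickle.load(f)

print(reloaded["test_results"]["traditional_ml"]["test_accuracy"])
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.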

## 9. Save Best Models for the Web App

In [None]:
print("SAVING MODELS FOR WEB APP")
print("="*50)

# Create directory for saved models (os is already imported at the top)
os.makedirs('../models/saved_models', exist_ok=True)

# Find best traditional ML model
print("Finding best traditional ML model...")
traditional_models = {name: metrics for name, metrics in model_results.items() 
                     if name in ['Logistic Regression', 'Naive Bayes', 'Random Forest', 'LinearSVC']}

if traditional_models:
    best_trad_name = max(traditional_models.keys(), key=lambda x: traditional_models[x]['val_accuracy'])
    print(f"Best traditional model: {best_trad_name} ({traditional_models[best_trad_name]['val_accuracy']:.4f})")
    
    # Get the trained model
    if best_trad_name == 'Logistic Regression':
        best_trad_model = lr_trained
    elif best_trad_name == 'Naive Bayes':
        best_trad_model = nb_trained
    elif best_trad_name == 'Random Forest':
        best_trad_model = rf_trained
    elif best_trad_name == 'LinearSVC':
        best_trad_model = linear_svm_trained
    
    # Save best traditional model
    with open('../models/saved_models/best_traditional_model.pkl', 'wb') as f:
        pickle.dump(best_trad_model, f)
    print(f"✓ Saved {best_trad_name} model")

# Find best deep learning model
print("\nFinding best deep learning model...")
dl_models = {name: metrics for name, metrics in model_results.items() 
            if name in ['LSTM', 'CNN', 'Bidirectional LSTM']}

if dl_models:
    best_dl_name = max(dl_models.keys(), key=lambda x: dl_models[x]['val_accuracy'])
    print(f"Best deep learning model: {best_dl_name} ({dl_models[best_dl_name]['val_accuracy']:.4f})")
    
    # Get the trained model
    if best_dl_name == 'LSTM':
        best_dl_model = lstm_trained
    elif best_dl_name == 'CNN':
        best_dl_model = cnn_trained
    elif best_dl_name == 'Bidirectional LSTM':
        best_dl_model = bilstm_trained
    
    # Save best deep learning model
    best_dl_model.save('../models/saved_models/best_dl_model.h5')
    print(f"✓ Saved {best_dl_name} model")

print("\n" + "="*50)
print("MODELS SAVED SUCCESSFULLY!")
print("You can now run the web app with:")
print("  cd app")
print("  python app.py")
print("="*50)
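
The `if`/`elif` chains used to fetch the trained model only work when every name is spelled exactly as it was stored in `model_results`. One way to reduce that risk, sketched here with placeholder values, is to keep a single dictionary that maps each name to its trained model and select from it. The helper `pick_best`, the placeholder strings and the scores are all hypothetical.

```python
def pick_best(results, registry, metric="val_accuracy"):
    """Return (name, model) with the highest metric among names present in both dicts."""
    candidates = [name for name in registry if name in results]
    if not candidates:
        return None, None
    best_name = max(candidates, key=lambda name: results[name][metric])
    return best_name, registry[best_name]

# Placeholder strings stand in for trained model objects; scores are made up
toy_results = {
    "Logistic Regression": {"val_accuracy": 0.70},
    "LinearSVC": {"val_accuracy": 0.72},
    "CNN": {"val_accuracy": 0.75},
}
toy_traditional = {"Logistic Regression": "lr_model", "LinearSVC": "svm_model"}

best_name, best_model = pick_best(toy_results, toy_traditional)
print(best_name, best_model)
```

Because the registry keys double as the candidate list, a model cannot be silently skipped because of a mismatched name in a separate list.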

## Model Training Conclusion
This notebook trained and compared several approaches to sentiment analysis on the IMDb reviews. Among the traditional models, Logistic Regression reached 89.66% validation accuracy and the linear SVM (LinearSVC) 88.66%. Among the deep learning models, the CNN led with 90.34% validation accuracy.

Both families are effective for this classification task. The best deep learning model improves on the best traditional model by about 0.7 percentage points of validation accuracy (90.34% vs 89.66%), a marginal gain. The best model from each family is saved by the final cell and is ready for use in the web application.