# Emotion Detection with Machine Learning

This notebook demonstrates how to build emotion classification models using text data. We'll train multiple models and evaluate their performance in classifying emotions like Happy, Sad, Angry, Neutral, Fear, and Surprise.

## Table of Contents
1. [Setup and Imports](#setup)
2. [Data Loading and Exploration](#data)
3. [Data Preprocessing](#preprocessing)
4. [Model Training](#training)
5. [Model Evaluation](#evaluation)
6. [Confusion Matrix Analysis](#confusion)
7. [Making Predictions](#predictions)
8. [Model Comparison](#comparison)
9. [Error Analysis](#errors)

## 1. Setup and Imports <a id="setup"></a>

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
warnings.filterwarnings('ignore')

# Set plotting style
plt.style.use('default')
sns.set_palette("husl")

# Import our custom classes
from emotion_classifier import EmotionClassifier
from data_processor import DataProcessor
from model_evaluator import ModelEvaluator
from utils import *

print("All libraries imported successfully!")
print("\nChecking system requirements...")
requirements = check_system_requirements()
for package, available in requirements.items():
    status = "✓" if available else "✗"
    print(f"{status} {package}")
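`check_system_requirements` comes from the project's `utils` module, and its source isn't shown here. Below is a minimal sketch of how such a check could work. It assumes the check only needs to know whether each package is importable. `check_packages` and the package list are hypothetical:

```python
import importlib.util


def check_packages(package_names):
    """Map each top-level package name to True if it is installed, without importing it."""
    # find_spec returns None when no module with that top-level name can be found
    return {name: importlib.util.find_spec(name) is not None for name in package_names}


# Hypothetical package list; the real helper may check different packages or versions
for package, available in check_packages(["json", "pandas", "not_a_real_package_xyz"]).items():
    status = "✓" if available else "✗"
    print(f"{status} {package}")
```

`importlib.util.find_spec` locates a module without executing it, so the check is fast and has no side effects. A minimum-version check would need `importlib.metadata.version` on top of this.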

## 2. Data Loading and Exploration <a id="data"></a>

In [None]:
def create_sample_dataset():
    """Create a sample emotion dataset for demonstration"""
    emotions = ['Happy', 'Sad', 'Angry', 'Neutral', 'Fear', 'Surprise']
    
    # Sample texts for each emotion
    sample_texts = {
        'Happy': [
            "I'm so excited about this amazing opportunity!",
            "What a beautiful sunny day, feeling fantastic!",
            "Just got the best news ever, I'm thrilled!",
            "Love spending time with my family and friends",
            "This is the best day of my life!",
            "I feel so grateful and blessed today",
            "Amazing performance, I'm so proud!",
            "Everything is going perfectly, I'm delighted!",
            "Celebrating this wonderful achievement!",
            "Feeling joy and happiness all around me"
        ],
        'Sad': [
            "I'm feeling really down today",
            "This is such a disappointing situation",
            "I miss my old friends so much",
            "Feeling lonely and isolated lately",
            "Nothing seems to be going right",
            "I'm having a really tough time",
            "This news made me feel so heartbroken",
            "I feel empty and lost right now",
            "Tears are flowing down my face",
            "Everything feels hopeless and dark"
        ],
        'Angry': [
            "This is absolutely infuriating!",
            "I can't believe how unfair this is",
            "I'm so frustrated with this situation",
            "This makes my blood boil!",
            "I'm outraged by this behavior",
            "This is completely unacceptable!",
            "I'm fed up with all these problems",
            "This injustice makes me furious!",
            "I'm livid about what happened",
            "This disrespect is making me mad"
        ],
        'Neutral': [
            "The weather today is partly cloudy",
            "I need to go to the grocery store",
            "The meeting is scheduled for 3 PM",
            "Please review the attached document",
            "The report is due next Friday",
            "I'll be working from home tomorrow",
            "The conference will be held virtually",
            "Please confirm your attendance",
            "The office hours are 9 AM to 5 PM",
            "The project deadline is next month"
        ],
        'Fear': [
            "I'm terrified of what might happen",
            "This situation makes me very anxious",
            "I'm worried about the future",
            "I feel scared and uncertain",
            "This gives me chills down my spine",
            "I'm afraid things will get worse",
            "The thought of this terrifies me",
            "I'm nervous about the outcome",
            "This makes me feel vulnerable and helpless",
            "I'm trembling with fear right now"
        ],
        'Surprise': [
            "I can't believe this actually happened!",
            "What a shocking turn of events!",
            "I never expected this to occur",
            "This is completely unexpected!",
            "I'm amazed by this revelation",
            "What a surprising discovery!",
            "This caught me completely off guard",
            "I'm stunned by this news!",
            "Wow, this is absolutely incredible!",
            "I'm blown away by this surprise"
        ]
    }
    
    # Create DataFrame
    data = []
    for emotion, texts in sample_texts.items():
        for text in texts:
            data.append({'text': text, 'emotion': emotion})
    
    return pd.DataFrame(data)

# Load the sample dataset
df = create_sample_dataset()

print("Dataset loaded successfully!")
print(f"Total samples: {len(df)}")
print(f"Unique emotions: {df['emotion'].nunique()}")
print("\nFirst few rows:")
df.head()

In [None]:
# Data exploration
print("=== Dataset Statistics ===")
print(f"Total samples: {len(df)}")
print(f"Unique emotions: {df['emotion'].nunique()}")
print(f"Average text length: {df['text'].str.len().mean():.1f} characters")
print(f"Min text length: {df['text'].str.len().min()} characters")
print(f"Max text length: {df['text'].str.len().max()} characters")

print("\n=== Emotion Distribution ===")
emotion_counts = df['emotion'].value_counts()
print(emotion_counts)

# Visualize emotion distribution
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Bar plot
emotion_counts.plot(kind='bar', ax=ax1, color='skyblue')
ax1.set_title('Emotion Distribution')
ax1.set_xlabel('Emotion')
ax1.set_ylabel('Count')
ax1.tick_params(axis='x', rotation=45)

# Pie chart
ax2.pie(emotion_counts.values, labels=emotion_counts.index, autopct='%1.1f%%')
ax2.set_title('Emotion Distribution (Percentage)')

plt.tight_layout()
plt.show()

In [None]:
# Text length analysis
df['text_length'] = df['text'].str.len()
df['word_count'] = df['text'].str.split().str.len()

# Box plots for text length by emotion
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Text length distribution by emotion
df.boxplot(column='text_length', by='emotion', ax=ax1)
ax1.set_title('Text Length Distribution by Emotion')
ax1.set_xlabel('Emotion')
ax1.set_ylabel('Text Length (characters)')

# Word count distribution by emotion
df.boxplot(column='word_count', by='emotion', ax=ax2)
ax2.set_title('Word Count Distribution by Emotion')
ax2.set_xlabel('Emotion')
ax2.set_ylabel('Word Count')

plt.tight_layout()
plt.show()

# Summary statistics
print("\n=== Text Length Statistics by Emotion ===")
text_stats = df.groupby('emotion')['text_length'].agg(['mean', 'std', 'min', 'max']).round(2)
print(text_stats)

## 3. Data Preprocessing <a id="preprocessing"></a>

In [None]:
# Initialize data processor
processor = DataProcessor()

# Show original vs processed text examples
print("=== Text Preprocessing Examples ===")
sample_texts = df['text'].head(5).tolist()

for i, text in enumerate(sample_texts):
    processed = processor.preprocess_text(text)
    print(f"\nExample {i+1}:")
    print(f"Original:  {text}")
    print(f"Processed: {processed}")
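`DataProcessor.preprocess_text` belongs to the project code and isn't shown. The conclusion says it uses NLTK. The sketch below is only a stand-in that illustrates the typical steps: lowercasing, punctuation removal, and stopword filtering. It replaces NLTK's stopword list with a tiny hypothetical `STOPWORDS` set:

```python
import re

# Hypothetical, deliberately tiny stopword list (NLTK's English list is much larger).
# Negations such as "not", "never", "cant" are intentionally NOT included.
STOPWORDS = {"a", "an", "the", "is", "am", "are", "i", "im", "this",
             "to", "of", "and", "so", "my", "me", "about"}


def preprocess_text(text):
    """Lowercase, drop apostrophes, replace other non-letters with spaces, remove stopwords."""
    text = text.lower().replace("'", "")   # "I'm" -> "im", "can't" -> "cant"
    text = re.sub(r"[^a-z\s]", " ", text)  # strip digits and punctuation
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


print(preprocess_text("I'm so excited about this amazing opportunity!"))
# excited amazing opportunity
```

The set deliberately leaves out negations. Standard lists such as NLTK's English stopwords include words like "not" and "no", and removing them can erase the cue that separates "not happy" from "happy".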

In [None]:
# Preprocess the entire dataset
print("Preprocessing the dataset...")
df_processed = processor.preprocess_data(df.copy())

print(f"\nDataset size before preprocessing: {len(df)}")
print(f"Dataset size after preprocessing: {len(df_processed)}")

# Show the processed dataset
print("\n=== Processed Dataset Sample ===")
print(df_processed.head(10))

# Get text statistics
stats = processor.get_text_statistics(df_processed)
print("\n=== Text Statistics (After Preprocessing) ===")
for key, value in stats.items():
    print(f"{key}: {value:.2f}" if isinstance(value, float) else f"{key}: {value}")
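The conclusion states that the models use TF-IDF features, and the training cell below passes `max_features=5000` and `ngram_range=(1, 2)`. How `EmotionClassifier` builds those features isn't shown. This plain-Python sketch illustrates what the two parameters typically mean: unigrams plus bigrams, a vocabulary capped at the most frequent terms, smoothed IDF, and L2-normalized rows. The IDF formula follows scikit-learn's `TfidfVectorizer` defaults (`smooth_idf=True`, `norm='l2'`). `ngrams` and `tfidf` are hypothetical helpers:

```python
import math
from collections import Counter


def ngrams(tokens, ngram_range):
    """Yield every n-gram (tokens joined by spaces) for n in the inclusive ngram_range."""
    lo, hi = ngram_range
    for n in range(lo, hi + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def tfidf(docs, ngram_range=(1, 2), max_features=5000):
    """Return (vocabulary, one {term: weight} dict per doc) with smoothed IDF and L2 norm."""
    doc_terms = [Counter(ngrams(doc.split(), ngram_range)) for doc in docs]

    # Cap the vocabulary at the max_features most frequent terms (ties broken alphabetically)
    totals = Counter()
    for counts in doc_terms:
        totals.update(counts)
    vocab = sorted(sorted(totals), key=lambda t: -totals[t])[:max_features]
    vocab_set = set(vocab)

    # Document frequency and smoothed IDF: ln((1 + n) / (1 + df)) + 1
    n_docs = len(docs)
    doc_freq = Counter(t for counts in doc_terms for t in counts if t in vocab_set)
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}

    # Raw term count times IDF, then scale each document vector to unit length
    vectors = []
    for counts in doc_terms:
        weights = {t: c * idf[t] for t, c in counts.items() if t in vocab_set}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vocab, vectors


vocab, vecs = tfidf(["feeling really down", "feeling really happy"], max_features=4)
print(vocab)  # ['feeling', 'feeling really', 'really', 'down']
```

Bigrams such as "really down" let a linear model pick up short phrases that single words miss. In a real pipeline, the vocabulary and IDF weights should be fitted on the training split only and then reused for the test texts.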

## 4. Model Training <a id="training"></a>

In [None]:
# Initialize the emotion classifier
classifier = EmotionClassifier()

# Training parameters
training_params = {
    'test_size': 0.2,
    'random_state': 42,
    'max_features': 5000,
    'ngram_range': (1, 2),
    'models_to_train': ["Naive Bayes", "SVM", "Random Forest", "Logistic Regression"]
}

print("=== Training Parameters ===")
for param, value in training_params.items():
    print(f"{param}: {value}")

print("\n=== Starting Model Training ===")
print("This may take a few minutes...")

# Train the models
training_results = classifier.train_models(df_processed, **training_params)

print("\n=== Training Results ===")
results_df = pd.DataFrame(training_results).T
results_df = results_df.sort_values('accuracy', ascending=False)
print(results_df)

# Find best model
best_model = results_df.index[0]
best_accuracy = results_df.loc[best_model, 'accuracy']
print(f"\n🏆 Best Model: {best_model} (Accuracy: {best_accuracy:.4f})")
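With 60 samples and `test_size=0.2`, the held-out set has only about 12 sentences, roughly 2 per emotion. A purely random split can leave an emotion out of the test set entirely, which makes its per-class metrics undefined. The notebook doesn't show whether `train_models` stratifies its split. scikit-learn's `train_test_split` supports stratification through its `stratify` argument. Here is a stdlib sketch of the idea, using the hypothetical helper `stratified_split`:

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, random_state=42):
    """Return sorted (train_idx, test_idx), holding out about test_size of each class.

    Assumes every class has at least two examples.
    """
    rng = random.Random(random_state)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        # Hold out at least one example per class so every emotion gets evaluated
        n_test = max(1, round(len(indices) * test_size))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


emotion_names = ["Happy", "Sad", "Angry", "Neutral", "Fear", "Surprise"]
toy_labels = [e for e in emotion_names for _ in range(10)]  # mirrors the 10-per-emotion dataset
train_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(test_idx))  # 48 12
```

Each of the 6 emotions contributes exactly 2 test sentences here, so every emotion is represented in the evaluation.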

In [None]:
# Visualize training results
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Accuracy comparison
models = list(training_results.keys())
accuracies = [training_results[model]['accuracy'] for model in models]
training_times = [training_results[model]['training_time'] for model in models]

bars1 = ax1.bar(models, accuracies, color='lightblue', alpha=0.7)
ax1.set_title('Model Accuracy Comparison')
ax1.set_ylabel('Accuracy')
ax1.set_ylim(0, 1)
ax1.tick_params(axis='x', rotation=45)

# Add accuracy values on bars
for bar, acc in zip(bars1, accuracies):
    ax1.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01,
             f'{acc:.3f}', ha='center', va='bottom')

# Training time comparison
bars2 = ax2.bar(models, training_times, color='lightcoral', alpha=0.7)
ax2.set_title('Training Time Comparison')
ax2.set_ylabel('Training Time (seconds)')
ax2.tick_params(axis='x', rotation=45)

# Add time values on bars
for bar, time in zip(bars2, training_times):
    ax2.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01,
             f'{time:.2f}s', ha='center', va='bottom')

plt.tight_layout()
plt.show()

## 5. Model Evaluation <a id="evaluation"></a>

In [None]:
# Initialize model evaluator
evaluator = ModelEvaluator(classifier)

# Evaluate all models
print("=== Detailed Model Evaluation ===")
all_evaluations = {}

for model_name in training_results.keys():
    print(f"\n--- Evaluating {model_name} ---")
    evaluation = evaluator.evaluate_model(model_name)
    all_evaluations[model_name] = evaluation
    
    print(f"Accuracy:  {evaluation['accuracy']:.4f}")
    print(f"Precision: {evaluation['precision']:.4f}")
    print(f"Recall:    {evaluation['recall']:.4f}")
    print(f"F1-Score:  {evaluation['f1_score']:.4f}")
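`evaluate_model` reports a single precision, recall, and F1 value per model, but the notebook doesn't show how these are averaged across the 6 emotions. Macro and weighted averages are both common, for example scikit-learn's `average='macro'` versus `average='weighted'`. Below is a sketch of the macro version, computed directly from label lists. `macro_scores` is a hypothetical helper, and the labels are toy data:

```python
def macro_scores(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1); undefined ratios count as 0."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    n = len(labels)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


toy_true = ["Happy", "Happy", "Sad", "Sad"]
toy_pred = ["Happy", "Sad", "Sad", "Sad"]
acc, prec, rec, f1 = macro_scores(toy_true, toy_pred)
print(f"{acc:.3f} {prec:.3f} {rec:.3f} {f1:.3f}")  # 0.750 0.833 0.750 0.733
```

Here, macro F1 is the mean of the per-class F1 scores (as in scikit-learn), not the harmonic mean of macro precision and macro recall, and the two can differ. When every emotion has the same number of test examples, the macro and weighted averages coincide.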

In [None]:
# Compare all models
comparison_df = evaluator.compare_models()
print("=== Model Comparison Summary ===")
print(comparison_df)

# Visualize model comparison
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
axes = axes.ravel()

metrics = ['Accuracy', 'Precision', 'Recall', 'F1-Score']
colors = ['lightblue', 'lightgreen', 'lightcoral', 'lightyellow']

for i, metric in enumerate(metrics):
    bars = axes[i].bar(comparison_df['Model'], comparison_df[metric], 
                      color=colors[i], alpha=0.7)
    axes[i].set_title(f'{metric} Comparison')
    axes[i].set_ylabel(metric)
    axes[i].set_ylim(0, 1)
    axes[i].tick_params(axis='x', rotation=45)
    
    # Add values on bars
    for bar, value in zip(bars, comparison_df[metric]):
        axes[i].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01,
                    f'{value:.3f}', ha='center', va='bottom')

plt.tight_layout()
plt.show()

## 6. Confusion Matrix Analysis <a id="confusion"></a>

In [None]:
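# Plot a row-normalized confusion matrix for each evaluated model
n_models = len(all_evaluations)
n_cols = 2
n_rows = int(np.ceil(n_models / n_cols))
fig, axes = plt.subplots(n_rows, n_cols, figsize=(16, 6 * n_rows))
axes = np.atleast_1d(axes).ravel()

for i, (model_name, evaluation) in enumerate(all_evaluations.items()):
    cm = np.asarray(evaluation['confusion_matrix'])
    labels = evaluation['labels']
    
    # Normalize each row (actual class) by its total. A class missing from the
    # test set has an all-zero row; leave it at 0 instead of dividing by zero.
    row_sums = cm.sum(axis=1, keepdims=True)
    cm_normalized = np.divide(cm.astype(float), row_sums,
                              out=np.zeros(cm.shape, dtype=float),
                              where=row_sums != 0)
    
    # Plot on a fixed 0-1 color scale so panels are comparable across models
    im = axes[i].imshow(cm_normalized, interpolation='nearest', cmap='Blues',
                        vmin=0, vmax=1)
    axes[i].set_title(f'Confusion Matrix - {model_name}\n(Normalized)')
    
    # Add labels
    axes[i].set_xticks(range(len(labels)))
    axes[i].set_yticks(range(len(labels)))
    axes[i].set_xticklabels(labels, rotation=45, ha='right')
    axes[i].set_yticklabels(labels)
    axes[i].set_xlabel('Predicted')
    axes[i].set_ylabel('Actual')
    
    # Annotate each cell; use white text on dark cells for readability
    for j in range(len(labels)):
        for k in range(len(labels)):
            value = cm_normalized[j, k]
            axes[i].text(k, j, f'{value:.2f}', ha="center", va="center",
                         color="white" if value > 0.5 else "black")
    
    # Add colorbar
    plt.colorbar(im, ax=axes[i], fraction=0.046, pad=0.04)

# Hide unused panels when the number of models doesn't fill the grid
for ax in axes[n_models:]:
    ax.axis('off')

plt.tight_layout()
plt.show()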

In [None]:
# Detailed classification report for the best model
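# Pick the highest-accuracy model explicitly instead of relying on comparison_df's row order
best_model_name = comparison_df.sort_values('Accuracy', ascending=False).iloc[0]['Model']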
print(f"=== Detailed Classification Report for {best_model_name} ===")

detailed_report = evaluator.get_detailed_classification_report(best_model_name)
print(detailed_report)

# Per-emotion performance
emotion_performance = evaluator.get_per_emotion_performance(best_model_name)
print(f"\n=== Per-Emotion Performance ({best_model_name}) ===")
print(emotion_performance)
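Row-normalizing a confusion matrix divides each row (an actual emotion) by its total. As a result, the diagonal becomes per-class recall, one of the columns a per-emotion report typically shows. Below is a stdlib sketch with a hypothetical `row_normalize` helper and a toy matrix. If an emotion never appears among the true labels, its row is all zeros, and the sketch leaves it at zero rather than dividing by zero:

```python
def row_normalize(cm):
    """Divide each row of a confusion matrix by its total; all-zero rows stay zero."""
    normalized = []
    for row in cm:
        total = sum(row)
        normalized.append([value / total if total else 0.0 for value in row])
    return normalized


# Toy 3-class matrix: rows = actual, columns = predicted; class 3 never occurs in y_true
toy_cm = [[2, 0, 0],
          [1, 1, 0],
          [0, 0, 0]]
norm = row_normalize(toy_cm)
per_class_recall = [norm[i][i] for i in range(len(norm))]
print(per_class_recall)  # [1.0, 0.5, 0.0]
```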

## 7. Making Predictions <a id="predictions"></a>

In [None]:
# Test predictions with sample texts
test_texts = [
    "I'm absolutely thrilled about this wonderful news!",
    "This situation is making me really upset and angry",
    "I feel so lonely and depressed today",
    "The meeting is scheduled for tomorrow at 2 PM",
    "I'm terrified about what might happen next",
    "Wow, I never saw that coming!"
]

expected_emotions = ['Happy', 'Angry', 'Sad', 'Neutral', 'Fear', 'Surprise']

print("=== Prediction Examples ===")
print(f"Using best model: {best_model_name}\n")

predictions_results = []

for i, text in enumerate(test_texts):
    prediction, probabilities = classifier.predict_single(text, best_model_name)
    
    print(f"Text {i+1}: {text}")
    print(f"Predicted: {prediction}")
    print(f"Expected:  {expected_emotions[i]}")
    print(f"Correct:   {'✓' if prediction == expected_emotions[i] else '✗'}")
    
    # Show top 3 probabilities
    sorted_probs = sorted(probabilities.items(), key=lambda x: x[1], reverse=True)
    print("Top 3 predictions:")
    for emotion, prob in sorted_probs[:3]:
        print(f"  {emotion}: {prob:.3f}")
    print("-" * 50)
    
    predictions_results.append({
        'text': text,
        'predicted': prediction,
        'expected': expected_emotions[i],
        'correct': prediction == expected_emotions[i],
        'confidence': max(probabilities.values())
    })

# Summary
correct_predictions = sum(1 for r in predictions_results if r['correct'])
accuracy = correct_predictions / len(predictions_results)
print(f"\nPrediction Accuracy: {accuracy:.2%} ({correct_predictions}/{len(predictions_results)})")
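The summary above reduces each prediction to correct or incorrect, plus a single confidence value, `max(probabilities.values())`. A high maximum is less reassuring when the runner-up is close behind, so it can be useful to look at the gap between the top two emotions as well. Below is a sketch with a hypothetical `top2_margin` helper. It assumes, as the cell above does, that `predict_single` returns a plain `{emotion: probability}` dictionary. The probabilities are toy values:

```python
def top2_margin(probabilities):
    """Return (top_label, top_prob, margin), where margin = top prob minus runner-up prob."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    top_label, top_prob = ranked[0]
    runner_up_prob = ranked[1][1] if len(ranked) > 1 else 0.0
    return top_label, top_prob, top_prob - runner_up_prob


# Toy probability dict shaped like predict_single's output (values are illustrative)
toy_probs = {"Fear": 0.45, "Sad": 0.40, "Neutral": 0.15}
label, confidence, margin = top2_margin(toy_probs)
print(label, confidence, round(margin, 2))  # Fear 0.45 0.05
```

A small margin, like the 0.05 here, flags a prediction that is close to a coin flip between two emotions, even though its confidence is the largest value in the dictionary.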

In [None]:
# Interactive prediction function
def predict_emotion_interactive(text, model_name=None):
    """Make prediction and show detailed results"""
    if model_name is None:
        model_name = best_model_name
    
    prediction, probabilities = classifier.predict_single(text, model_name)
    
    print(f"Input Text: {text}")
    print(f"Predicted Emotion: {prediction}")
    print(f"\nConfidence Scores:")
    
    # Sort probabilities
    sorted_probs = sorted(probabilities.items(), key=lambda x: x[1], reverse=True)
    
    for emotion, prob in sorted_probs:
        bar = '█' * int(prob * 20)
        print(f"  {emotion:10}: {prob:.3f} {bar}")
    
    return prediction, probabilities

# Example usage
print("=== Interactive Prediction Example ===")
sample_text = "I can't believe I won the lottery! This is incredible!"
predict_emotion_interactive(sample_text)

## 8. Model Comparison <a id="comparison"></a>

In [None]:
# Compare predictions across all models for the same text
comparison_text = "I'm really worried and scared about the future"

print("=== Model Comparison for Text ===")
print(f"Text: {comparison_text}\n")

model_predictions = {}

for model_name in training_results.keys():
    prediction, probabilities = classifier.predict_single(comparison_text, model_name)
    model_predictions[model_name] = {
        'prediction': prediction,
        'confidence': max(probabilities.values()),
        'probabilities': probabilities
    }
    
    print(f"{model_name}:")
    print(f"  Prediction: {prediction}")
    print(f"  Confidence: {max(probabilities.values()):.3f}")
    print()

# Visualize prediction comparison
emotions = list(classifier.emotion_labels)
n_models = len(model_predictions)

fig, ax = plt.subplots(figsize=(12, 8))

x = np.arange(len(emotions))
width = 0.8 / n_models  # split a fixed group width so bars never spill into neighboring groups

for i, (model_name, results) in enumerate(model_predictions.items()):
    probs = [results['probabilities'][emotion] for emotion in emotions]
    ax.bar(x + i * width, probs, width, label=model_name, alpha=0.8)

ax.set_xlabel('Emotions')
ax.set_ylabel('Probability')
ax.set_title(f'Model Predictions Comparison\nText: "{comparison_text}"')
ax.set_xticks(x + width * (n_models - 1) / 2)
ax.set_xticklabels(emotions)
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
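The grouped bar chart offsets model `i` by `i * width` and then moves the tick labels to the middle of each group. An equivalent approach is to center each group on its tick directly. The sketch below uses a hypothetical helper, `bar_offsets`, that also derives the bar width from the number of models, so that adding a fifth model doesn't make the groups overlap:

```python
def bar_offsets(n_series, group_width=0.8):
    """Return (bar_width, offsets) that center n_series side-by-side bars on each tick."""
    bar_width = group_width / n_series
    # Series i sits (i - (n - 1) / 2) bar widths from the tick, so the group is symmetric
    offsets = [(i - (n_series - 1) / 2) * bar_width for i in range(n_series)]
    return bar_width, offsets


bar_width, offsets = bar_offsets(4)
print(round(bar_width, 3), [round(o, 3) for o in offsets])  # 0.2 [-0.3, -0.1, 0.1, 0.3]
```

In the plotting cell, this would become `ax.bar(x + offsets[i], probs, bar_width, ...)` together with `ax.set_xticks(x)`.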

## 9. Error Analysis <a id="errors"></a>

In [None]:
# Analyze misclassified examples
print(f"=== Error Analysis for {best_model_name} ===")

misclassified = evaluator.get_misclassified_examples(best_model_name, num_examples=10)

if len(misclassified) > 0:
    print(f"Found {len(misclassified)} misclassified examples:\n")
    
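    # Number examples by position; the DataFrame index may not start at 0 or be contiguous
    for i, (_, row) in enumerate(misclassified.iterrows(), start=1):
        print(f"Example {i}:")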
        print(f"  Text: {row['Text']}")
        print(f"  True Emotion: {row['True_Emotion']}")
        print(f"  Predicted: {row['Predicted_Emotion']}")
        print()
else:
    print("No misclassified examples found! Perfect classification.")

# Error pattern analysis
evaluation = all_evaluations[best_model_name]
y_true = evaluation['y_true']
y_pred = evaluation['y_pred']

# Create error matrix
error_matrix = pd.crosstab(pd.Series(y_true, name='Actual'), 
                          pd.Series(y_pred, name='Predicted'), 
                          margins=True)

print(f"\n=== Error Matrix ({best_model_name}) ===")
print(error_matrix)
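The crosstab lists every (actual, predicted) count. To surface the most common mistakes first, you can count and rank the off-diagonal pairs. Below is a minimal sketch with a hypothetical `top_confusions` helper. It runs on toy labels, not on results from this notebook. In the cell above, it would be called as `top_confusions(y_true, y_pred)`:

```python
from collections import Counter


def top_confusions(y_true, y_pred, k=3):
    """Return the k most common (actual, predicted) pairs among misclassified examples."""
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return errors.most_common(k)


# Toy labels for illustration only
toy_true = ["Fear", "Fear", "Surprise", "Sad", "Happy"]
toy_pred = ["Sad", "Sad", "Happy", "Sad", "Happy"]
print(top_confusions(toy_true, toy_pred))
# [(('Fear', 'Sad'), 2), (('Surprise', 'Happy'), 1)]
```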

In [None]:
# Model performance summary
performance_summary = evaluator.get_model_performance_summary()

print("=== Final Model Performance Summary ===")
print(f"Best Model: {performance_summary['best_model']['name']}")
print(f"Best Accuracy: {performance_summary['best_model']['accuracy']:.4f}")

print("\nAll Models Performance:")
summary_df = pd.DataFrame({k: v for k, v in performance_summary.items() 
                          if k != 'best_model'}).T
summary_df = summary_df.sort_values('accuracy', ascending=False)
print(summary_df)

## Conclusion

This notebook demonstrated a complete emotion classification pipeline:

1. **Data Loading**: Built a small, balanced toy dataset of 60 hand-written sentences, 10 for each of the 6 emotion categories
2. **Preprocessing**: Cleaned and processed text data using NLTK
3. **Model Training**: Trained 4 ML models (Naive Bayes, SVM, Random Forest, Logistic Regression) on TF-IDF features built from unigrams and bigrams
4. **Evaluation**: Compared accuracy, precision, recall, and F1-score across all models
5. **Analysis**: Confusion matrices, error analysis, and model comparison

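### Key Findings:
- The best model's accuracy is printed in the final performance summary above. With only 60 samples and a 20% test split (about 12 test sentences), a single misclassification moves accuracy by roughly 8 percentage points, so treat the figure as a rough estimate rather than a reliable benchmark
- The per-emotion report shows which emotions are easier to classify. Overlapping phrasing is a likely source of confusion: for example, "I can't believe..." opens both an Angry and a Surprise example
- Preprocessing choices may affect performance, but this notebook does not compare runs with and without them
- Section 8 compares how each model spreads probability over the emotions for the same text; differences there hint at each model's strengths and weaknesses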

### Next Steps:
1. Try more advanced models (BERT, RoBERTa)
2. Collect more training data
3. Implement ensemble methods
4. Add more sophisticated text features
5. Deploy the model as a web service
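For the ensemble step, one simple option is soft voting: average each model's per-emotion probabilities and pick the highest. `classifier.predict_single(text, model_name)` already returns a probability dictionary per model, so its outputs could likely be fed in directly, though that wiring is an assumption. scikit-learn offers the same idea as `VotingClassifier(voting='soft')`. The helpers `soft_vote` and `hard_vote` below are hypothetical, and the probabilities are toy values:

```python
from collections import Counter


def soft_vote(prob_dicts):
    """Average per-emotion probabilities across models; return (label, averaged probs)."""
    labels = sorted(set().union(*prob_dicts))
    averaged = {
        label: sum(p.get(label, 0.0) for p in prob_dicts) / len(prob_dicts)
        for label in labels
    }
    return max(averaged, key=averaged.get), averaged


def hard_vote(predictions):
    """Return the most common predicted label (ties go to the label seen first)."""
    return Counter(predictions).most_common(1)[0][0]


# Toy probabilities from three hypothetical models for one input text
model_probs = [
    {"Fear": 0.6, "Sad": 0.3, "Neutral": 0.1},
    {"Fear": 0.2, "Sad": 0.7, "Neutral": 0.1},
    {"Fear": 0.5, "Sad": 0.4, "Neutral": 0.1},
]
hard = hard_vote([max(p, key=p.get) for p in model_probs])
soft, averaged = soft_vote(model_probs)
print(hard, soft)  # Fear Sad
```

The toy example shows why the choice matters. Two of the three models pick Fear, so a majority (hard) vote returns Fear. The averaged probabilities favor Sad, however, because the second model is far more confident than the other two.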

In [None]:
# Save the trained models and results
print("=== Saving Results ===")

# Save models
classifier.save_models('emotion_classifier_models.pkl')
print("✓ Models saved to 'emotion_classifier_models.pkl'")

# Save evaluation results
results_filename = save_results_to_csv(performance_summary)
print(f"✓ Results saved to '{results_filename}'")

# Export model summary
summary_filename = export_model_summary(classifier)
print(f"✓ Model summary saved to '{summary_filename}'")

print("\nAll results have been saved successfully!")
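The `.pkl` extension suggests that `save_models` uses pickle, though its format isn't shown. Below is a minimal round-trip sketch with the standard `pickle` module. `save_bundle`, `load_bundle`, and the bundle contents are hypothetical. Whatever the format, the fitted TF-IDF vectorizer must be saved alongside the models, because a model trained on one vocabulary can't score features built from another:

```python
import os
import pickle
import tempfile


def save_bundle(obj, path):
    """Pickle obj to path in binary mode using the highest available protocol."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_bundle(path):
    """Unpickle an object from path; only load files you created or trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical bundle; in practice it would hold the fitted vectorizer, models, and labels
bundle = {"emotion_labels": ["Happy", "Sad"], "best_model": "SVM"}
path = os.path.join(tempfile.mkdtemp(), "bundle.pkl")
save_bundle(bundle, path)
restored = load_bundle(path)
print(restored == bundle)  # True
```

Only unpickle files you created or otherwise trust, because loading a pickle can execute arbitrary code.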