# 04 - Market Direction Classification

This notebook demonstrates **Classification** - predicting whether the market will go UP or DOWN based on geopolitical event features.

---

## What is Classification?

Classification is a **supervised learning** task where we predict a categorical outcome:

- **Input**: Event features (Goldstein scale, mentions, tone, etc.)
- **Output**: Binary prediction (UP or DOWN)

Unlike regression (which predicts a continuous value like return magnitude), classification focuses on **direction**.
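
To make the difference in targets concrete, here is a minimal sketch that turns a handful of made-up daily returns into direction labels. The return values, and the convention that a return of exactly zero counts as DOWN, are assumptions for illustration rather than something taken from the project's data pipeline.

```python
import numpy as np

# Hypothetical daily returns (illustrative values only)
returns = np.array([0.012, -0.004, 0.0, 0.021, -0.015])

# Regression target: the return itself (a continuous value)
regression_target = returns

# Classification target: direction only (1 = UP, 0 = DOWN)
# Assumption: a return of exactly zero is labeled DOWN
direction = (returns > 0).astype(int)

print("Returns:  ", regression_target)
print("Direction:", direction)
```

The classifier only ever sees the second array, so a +0.1% day and a +5% day look identical to it.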

---

## Why Logistic Regression?

We use **Logistic Regression** because it's:

| Property | Why It Matters |
|----------|----------------|
| **Interpretable** | Each coefficient shows the direction of a feature's effect and, when features are on the same scale, its relative strength |
| **Probabilistic** | Gives confidence (60% UP vs 99% UP) |
| **Works with small data** | Doesn't need millions of samples |
| **Common baseline** | Widely used as a first model before trying anything more complex |
| **Fast to train** | Can retrain daily |

---

## Key Concepts You'll Learn

1. **Sigmoid function**: Maps any real value to a probability in (0, 1)
2. **Decision boundary**: Where probability crosses 50%
3. **Cross-validation**: Robust performance estimation
4. **Confusion matrix**: Understanding prediction errors
5. **Precision vs Recall**: Trade-offs in classification

---

We'll compare our **learning version** (manual gradient descent) with the **production version** (sklearn).

In [None]:
# ═══════════════════════════════════════════════════════════════════════════════
# IMPORTS
# ═══════════════════════════════════════════════════════════════════════════════
#
# Standard imports for ML notebooks:
#   - sys/pathlib: For path manipulation
#   - datetime: For date handling
#   - pandas: Data manipulation
#   - numpy: Numerical operations
#   - matplotlib/seaborn: Visualization
#
# sklearn is not imported here; the production version (section 3) uses it internally.
#
# ═══════════════════════════════════════════════════════════════════════════════

import sys
from pathlib import Path
from datetime import date, timedelta

# Add project root to path
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root))

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use('seaborn-v0_8-whitegrid')
%matplotlib inline

print("Imports successful!")

## 1. Understanding Logistic Regression

### The Problem with Linear Regression

If we used linear regression to predict UP (1) or DOWN (0), we'd get predictions like:
- y = 1.5 (impossible - can't be >100% UP)
- y = -0.3 (impossible - can't be negative probability)
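
The problem is easy to reproduce. The sketch below fits an ordinary least-squares line to 0/1 labels on a single synthetic feature (the data is made up for illustration) and checks the range of the fitted values.

```python
import numpy as np

# Synthetic one-feature dataset: label is 1 (UP) when x > 0, else 0 (DOWN)
x = np.linspace(-3, 3, 13)
y = (x > 0).astype(float)

# Ordinary least-squares line through the 0/1 labels
slope, intercept = np.polyfit(x, y, deg=1)
predictions = slope * x + intercept

print(f"Smallest prediction: {predictions.min():.3f}")
print(f"Largest prediction:  {predictions.max():.3f}")
```

The fitted line drops below 0 at one end and rises above 1 at the other, so its outputs cannot be read as probabilities.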

### The Sigmoid Solution

The **sigmoid function** squashes any real value into the range (0, 1):

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

Where:
- z = w₀ + w₁x₁ + w₂x₂ + ... (linear combination of features)
- σ(z) = probability of class 1 (UP)

### The Decision Rule

- If σ(z) > 0.5 → Predict UP
- If σ(z) ≤ 0.5 → Predict DOWN
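
As a small worked example of the rule, the sketch below computes z, applies the sigmoid, and picks a class. The weights and feature values are hypothetical numbers chosen for illustration, not coefficients learned from any data.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical model: intercept w0 plus two feature weights
w0 = 0.1
w = np.array([0.2, -0.5])

# Hypothetical (already scaled) feature values for one day
x = np.array([1.0, 0.4])

z = w0 + w @ x                      # linear combination
p_up = sigmoid(z)                   # probability of class 1 (UP)
label = "UP" if p_up > 0.5 else "DOWN"

print(f"z = {z:.2f}, P(UP) = {p_up:.3f}, prediction = {label}")
```

Here z is slightly positive, so P(UP) lands just above 0.5 and the prediction is UP, but only barely.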

In [None]:
# ═══════════════════════════════════════════════════════════════════════════════
# SIGMOID FUNCTION VISUALIZATION
# ═══════════════════════════════════════════════════════════════════════════════
#
# The sigmoid function is the heart of logistic regression.
#
# LEFT PLOT - The Sigmoid Curve:
#   - X-axis: z (the linear combination w₀ + w₁x₁ + ...)
#   - Y-axis: Probability of UP
#   - At z=0: probability = 0.5 (the decision boundary)
#   - As z → ∞: probability → 1
#   - As z → -∞: probability → 0
#
# RIGHT PLOT - Decision Boundary in 2D:
#   - Shows how logistic regression separates two classes
#   - The line is where probability = 0.5
#   - Points on one side → UP, other side → DOWN
#
# ═══════════════════════════════════════════════════════════════════════════════

def sigmoid(z):
    """The sigmoid (logistic) function.
    
    Maps any real number to the range (0, 1).
    This is how we convert a linear combination into a probability.
    
    Properties:
    - sigmoid(0) = 0.5
    - sigmoid(-∞) → 0
    - sigmoid(+∞) → 1
    - Derivative: sigmoid(z) * (1 - sigmoid(z)) - used in gradient descent
    """
    return 1 / (1 + np.exp(-z))

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# ─── LEFT: Sigmoid Curve ───
z = np.linspace(-6, 6, 100)
ax = axes[0]

ax.plot(z, sigmoid(z), linewidth=3, color='steelblue', label='σ(z) = 1/(1+e⁻ᶻ)')
ax.axhline(y=0.5, color='red', linestyle='--', linewidth=2, label='Decision boundary (p=0.5)')
ax.axvline(x=0, color='gray', linestyle='--', alpha=0.5)

# Shade the regions
ax.fill_between(z[z < 0], 0, sigmoid(z[z < 0]), alpha=0.2, color='red', label='Predict DOWN')
ax.fill_between(z[z >= 0], sigmoid(z[z >= 0]), 1, alpha=0.2, color='green', label='Predict UP')

ax.set_xlabel('z = w₀ + w₁x₁ + w₂x₂ + ...', fontsize=12)
ax.set_ylabel('Probability of UP', fontsize=12)
ax.set_title('The Sigmoid Function\n(Converts any z to probability)', fontsize=12)
ax.legend(loc='right')
ax.set_ylim(-0.1, 1.1)

# Add annotations
ax.annotate('High confidence\nUP', xy=(4, 0.98), fontsize=10, ha='center', color='green')
ax.annotate('High confidence\nDOWN', xy=(-4, 0.02), fontsize=10, ha='center', color='red')
ax.annotate('Uncertain\n(near 50%)', xy=(0, 0.5), xytext=(2, 0.3),
           arrowprops=dict(arrowstyle='->', color='gray'), fontsize=10)

# ─── RIGHT: 2D Decision Boundary ───
ax = axes[1]
np.random.seed(42)

# Generate sample data: UP days and DOWN days
up_points = np.random.randn(50, 2) + [1.5, 1.5]   # Cluster in top-right
down_points = np.random.randn(50, 2) + [-1.5, -1.5]  # Cluster in bottom-left

ax.scatter(up_points[:, 0], up_points[:, 1], c='green', label='UP days', alpha=0.6, s=60)
ax.scatter(down_points[:, 0], down_points[:, 1], c='red', label='DOWN days', alpha=0.6, s=60)

# Decision boundary (approximate line where P(UP) = 0.5)
x_line = np.linspace(-4, 4, 100)
ax.plot(x_line, -x_line, 'k--', linewidth=2, label='Decision boundary')

# Shade regions
ax.fill_between(x_line, -x_line, 5, alpha=0.1, color='green')
ax.fill_between(x_line, -x_line, -5, alpha=0.1, color='red')

ax.set_xlabel('Feature 1 (e.g., Average Goldstein Scale)', fontsize=11)
ax.set_ylabel('Feature 2 (e.g., Total Mentions)', fontsize=11)
ax.set_title('Logistic Regression Classification\n(Separating UP from DOWN days)', fontsize=12)
ax.legend()
ax.set_xlim(-4, 4)
ax.set_ylim(-4, 4)

plt.tight_layout()
plt.show()

print("\nHow Logistic Regression Makes Predictions:")
print("═" * 60)
print("")
print("1. COMPUTE LINEAR COMBINATION:")
print("   z = w₀ + w₁(goldstein) + w₂(mentions) + w₃(tone) + ...")
print("")
print("2. APPLY SIGMOID:")
print("   P(UP) = 1 / (1 + e⁻ᶻ)")
print("")
print("3. MAKE DECISION:")
print("   If P(UP) > 0.5 → Predict UP")
print("   If P(UP) ≤ 0.5 → Predict DOWN")

## 2. Learning Version: Manual Implementation

Our **learning version** implements logistic regression from scratch using **gradient descent**.

### Gradient Descent Algorithm

1. Initialize weights randomly
2. For each iteration:
   - Make predictions using current weights
   - Calculate the loss (how wrong are we?)
   - Compute gradients (which direction improves loss?)
   - Update weights in that direction
3. Repeat until convergence
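
The loop above can be sketched in a few lines of NumPy. This is an illustrative stand-in, not the project's `MarketClassifier`: the function name `fit_logistic_gd`, its defaults, and the synthetic two-cluster data are all assumptions, and it runs a fixed number of iterations instead of testing for convergence. The loss it tracks is the log loss described in the next subsection.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(X, y, learning_rate=0.1, max_iterations=500,
                    regularization=0.01, seed=0):
    """Batch gradient descent for L2-regularized logistic regression (sketch)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    Xb = np.hstack([np.ones((n_samples, 1)), X])     # column of 1s for the intercept
    w = rng.normal(0.0, 0.01, size=n_features + 1)   # step 1: small random weights
    losses = []
    for _ in range(max_iterations):
        p = sigmoid(Xb @ w)                          # predictions with current weights
        p_safe = np.clip(p, 1e-12, 1 - 1e-12)        # avoid log(0)
        # Log loss (recorded without the L2 penalty term)
        losses.append(-np.mean(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe)))
        grad = Xb.T @ (p - y) / n_samples            # gradient of the log loss
        grad[1:] += regularization * w[1:]           # gradient of (λ/2)·||w||², intercept excluded
        w -= learning_rate * grad                    # step against the gradient
    return w, np.array(losses)

# Synthetic data: two Gaussian clusters standing in for UP and DOWN days
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(1.5, 1.0, size=(50, 2)),
               rng.normal(-1.5, 1.0, size=(50, 2))])
y = np.concatenate([np.ones(50), np.zeros(50)])

w, losses = fit_logistic_gd(X, y)
Xb = np.hstack([np.ones((len(X), 1)), X])
accuracy = np.mean((sigmoid(Xb @ w) > 0.5) == y)

print(f"Loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
print(f"Training accuracy: {accuracy:.2%}")
```

On well-separated clusters like these the loss falls steadily from roughly log(2); real market data is far noisier, so expect a much flatter curve there.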

### The Loss Function

We use **log loss** (binary cross-entropy):

$$L = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i\log(p_i) + (1-y_i)\log(1-p_i)\right]$$

This penalizes confident wrong predictions more than uncertain ones.
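
A quick numeric check makes the penalty visible. The sketch below evaluates the per-sample loss for a day that actually went UP (y = 1) under three predicted probabilities; the helper name `log_loss_single` and the probabilities are chosen for illustration.

```python
import numpy as np

def log_loss_single(y, p):
    """Binary cross-entropy for one sample with true label y and predicted P(UP) = p."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# The day actually went UP (y = 1); compare three predictions
probabilities = (0.9, 0.6, 0.1)
losses = [log_loss_single(1, p) for p in probabilities]

for p, loss in zip(probabilities, losses):
    print(f"P(UP) = {p:.1f}  ->  loss = {loss:.3f}")
```

A confident correct prediction (0.9) costs very little, an unsure one (0.6) costs more, and a confident wrong one (0.1) costs several times as much again.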

In [None]:
# ═══════════════════════════════════════════════════════════════════════════════
# LEARNING VERSION: MarketClassifier
# ═══════════════════════════════════════════════════════════════════════════════
#
# Our educational implementation shows the complete training process:
#
# Key parameters:
#   - learning_rate (default 0.01):
#     * How large a step we take during gradient descent
#     * Too high → overshooting, too low → slow convergence
#     * 0.01 is a safe starting point
#
#   - max_iterations (default 1000):
#     * Maximum training epochs
#     * Usually converges before this
#
#   - regularization (default 0.01):
#     * L2 regularization strength (λ)
#     * Prevents overfitting by penalizing large weights
#     * Higher → simpler model, lower → more complex model
#
# ═══════════════════════════════════════════════════════════════════════════════

from src.analysis.classification import MarketClassifier, explain_classification

# Create classifier with explicit parameters
classifier = MarketClassifier(
    learning_rate=0.01,      # Step size for gradient descent
    max_iterations=1000,     # Maximum training epochs
    regularization=0.01,     # L2 penalty (prevents overfitting)
)

print("Classifier Configuration")
print("=" * 50)
print(f"  Learning rate (α): {classifier.learning_rate}")
print(f"  Max iterations: {classifier.max_iterations}")
print(f"  Regularization (λ): {classifier.regularization}")
print()
print("What these mean:")
print(f"  • α = {classifier.learning_rate}: Each step adjusts weights by α × gradient")
print(f"  • λ = {classifier.regularization}: Adds λ×||w||² penalty to prevent overfitting")
print(f"  • Stop after {classifier.max_iterations} iterations or when converged")

In [None]:
# ═══════════════════════════════════════════════════════════════════════════════
# TRAINING THE CLASSIFIER
# ═══════════════════════════════════════════════════════════════════════════════
#
# The train() method:
#   1. Fetches event and market data for the symbol
#   2. Prepares features (Goldstein, mentions, tone, etc.)
#   3. Creates labels (1 = UP, 0 = DOWN based on return)
#   4. Runs gradient descent to learn weights
#   5. Returns metrics (accuracy, precision, recall, F1)
#
# We use 6 months of data to have enough training samples.
#
# verbose=True shows the training progress.
#
# ═══════════════════════════════════════════════════════════════════════════════

# Define date range (6 months of data)
end_date = date.today()
start_date = end_date - timedelta(days=180)

print("Training classifier for CL=F (Crude Oil)")
print(f"Date range: {start_date} to {end_date}")
print("=" * 60)

# Train with verbose output
metrics = classifier.train('CL=F', start_date, end_date, verbose=True)

if metrics:
    print("\nTraining Complete!")
    print(f"  Training samples: {metrics.n_samples}")
    print(f"  Accuracy: {metrics.accuracy:.2%}")
else:
    print("\nTraining failed - insufficient data.")
    print("Ensure the database has both event and market data.")

In [None]:
# ═══════════════════════════════════════════════════════════════════════════════
# TRAINING HISTORY VISUALIZATION
# ═══════════════════════════════════════════════════════════════════════════════
#
# The learning version tracks training progress:
#   - Loss: How wrong are our predictions? (should decrease)
#   - Coefficient change: Are weights still updating? (should decrease)
#
# LEFT PLOT - Loss Over Iterations:
#   - Should show a decreasing curve
#   - Flat at the end = converged
#   - Increasing = something wrong (learning rate too high?)
#
# RIGHT PLOT - Convergence:
#   - Shows the max change in any coefficient per iteration
#   - When this gets small, we've converged
#
# ═══════════════════════════════════════════════════════════════════════════════

if classifier.training_history:
    history_df = pd.DataFrame(classifier.training_history)
    
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # ─── LEFT: Loss Over Iterations ───
    axes[0].plot(history_df['iteration'], history_df['loss'], linewidth=2, color='steelblue')
    axes[0].set_xlabel('Iteration', fontsize=12)
    axes[0].set_ylabel('Log Loss', fontsize=12)
    axes[0].set_title('Training Loss\n(Lower = Better Fit)', fontsize=12)
    
    # Mark initial and final loss (use the logged iteration numbers so the markers sit on the curve)
    first, last = history_df.iloc[0], history_df.iloc[-1]
    axes[0].scatter([first['iteration']], [first['loss']], color='red', s=100, zorder=5, label=f"Initial: {first['loss']:.4f}")
    axes[0].scatter([last['iteration']], [last['loss']], color='green', s=100, zorder=5, label=f"Final: {last['loss']:.4f}")
    axes[0].legend()
    
    # ─── RIGHT: Coefficient Changes ───
    axes[1].plot(history_df['iteration'], history_df['coefficient_change'], linewidth=2, color='orange')
    axes[1].set_xlabel('Iteration', fontsize=12)
    axes[1].set_ylabel('Max Coefficient Change', fontsize=12)
    axes[1].set_title('Convergence\n(Smaller = More Stable)', fontsize=12)
    axes[1].set_yscale('log')  # Log scale to see small changes
    
    plt.tight_layout()
    plt.show()
    
    print("\nTraining Analysis:")
    print("─" * 50)
    print(f"  Total iterations: {len(history_df)}")
    print(f"  Loss reduction: {history_df['loss'].iloc[0]:.4f} → {history_df['loss'].iloc[-1]:.4f}")
    print(f"  Improvement: {(1 - history_df['loss'].iloc[-1]/history_df['loss'].iloc[0])*100:.1f}%")
else:
    print("No training history available.")

In [None]:
# ═══════════════════════════════════════════════════════════════════════════════
# FEATURE IMPORTANCE
# ═══════════════════════════════════════════════════════════════════════════════
#
# One of logistic regression's biggest advantages: INTERPRETABILITY.
#
# Each coefficient tells you:
#   - Magnitude: How important is this feature?
#   - Sign: Does it increase (+) or decrease (-) P(UP)?
#
# Features used:
#   - goldstein_mean: Average Goldstein score (conflict vs cooperation)
#   - goldstein_min: Worst event of the day
#   - goldstein_max: Best event of the day
#   - mentions_total: Total media mentions
#   - avg_tone: Average media tone
#   - conflict_count: Number of conflict events
#   - cooperation_count: Number of cooperation events
#
# ═══════════════════════════════════════════════════════════════════════════════

if classifier.is_trained:
    importance = classifier.get_feature_importance('CL=F')
    
    print("Feature Importance (Absolute Coefficient Values)")
    print("=" * 60)
    print()
    print(f"{'Feature':<25} {'Importance':>12} {'Effect on P(UP)':>18}")
    print("─" * 60)
    
    for name, imp in sorted(importance.items(), key=lambda x: -x[1]):
        if name == 'intercept':
            continue
        # Get actual coefficient for direction
        coef = classifier.symbol_models['CL=F'][classifier.feature_names.index(name)]
        direction = '↑ increases' if coef > 0 else '↓ decreases'
        print(f"  {name:<23} {imp:>10.4f}   {direction}")
    
    print("─" * 60)
    print()
    print("Interpretation:")
    print("  • Higher importance = stronger predictor")
    print("  • ↑ = positive coefficient (increases probability of UP)")
    print("  • ↓ = negative coefficient (decreases probability of UP)")
else:
    print("Classifier not trained.")

In [None]:
# ═══════════════════════════════════════════════════════════════════════════════
# MAKING A PREDICTION
# ═══════════════════════════════════════════════════════════════════════════════
#
# Let's make a prediction for a hypothetical day.
#
# The predict() method:
#   1. Takes a dictionary of feature values
#   2. Computes z = w₀ + w₁x₁ + w₂x₂ + ...
#   3. Applies sigmoid to get P(UP)
#   4. Returns prediction, probability, and confidence level
#
# Confidence levels:
#   - LOW: 50-60% (nearly random)
#   - MEDIUM: 60-75% (some signal)
#   - HIGH: 75%+ (strong signal)
#
# ═══════════════════════════════════════════════════════════════════════════════

if classifier.is_trained:
    # Scenario: A day with negative geopolitical events
    sample_features = {
        'goldstein_mean': -3.5,      # Negative events on average
        'goldstein_min': -7.0,       # Some severe conflict
        'goldstein_max': 2.0,        # Some cooperation too
        'mentions_total': 500,       # Moderate media coverage
        'avg_tone': -2.0,            # Negative tone
        'conflict_count': 5,         # Several conflicts
        'cooperation_count': 2,      # Some cooperation
    }
    
    print("Sample Prediction")
    print("=" * 50)
    print()
    print("Input Features:")
    for k, v in sample_features.items():
        print(f"  {k}: {v}")
    print()
    
    prediction = classifier.predict('CL=F', sample_features)
    
    if prediction:
        print(explain_classification(prediction))
    else:
        print("Prediction failed.")
else:
    print("Classifier not trained.")

## 3. Production Version: sklearn

The **production version** uses sklearn, the industry-standard ML library.

### Advantages over Learning Version

| Feature | Learning | Production |
|---------|----------|------------|
| Optimizer | Manual gradient descent | LBFGS (quasi-Newton) |
| Cross-validation | Not included | Built-in k-fold |
| Regularization | Basic L2 | L1, L2, ElasticNet |
| Scalability | Single-threaded | Multi-threaded |
| Reliability | Educational | Battle-tested |

### Cross-Validation

**Cross-validation** gives a more robust accuracy estimate:
1. Split data into k folds (default 5)
2. Train on k-1 folds, test on remaining fold
3. Repeat k times, average the results

This makes the estimate less dependent on one lucky (or unlucky) train/test split.
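
The three steps can be sketched directly with scikit-learn's `cross_val_score`. This is an illustration on synthetic data, not the internals of `ProductionClassifier`: the two-cluster dataset and the scaling pipeline are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for (features, UP/DOWN labels)
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(1.0, 1.0, size=(60, 2)),
               rng.normal(-1.0, 1.0, size=(60, 2))])
y = np.concatenate([np.ones(60, dtype=int), np.zeros(60, dtype=int)])

# Scaling lives inside the pipeline so it is re-fit on each training fold only
model = make_pipeline(StandardScaler(), LogisticRegression())

# cv=5: train on 4 folds, score on the held-out fold, repeat 5 times
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")

print(f"Fold accuracies: {np.round(scores, 3)}")
print(f"Mean ± std: {scores.mean():.3f} ± {scores.std():.3f}")
```

The spread across folds is as informative as the mean: a model that scores well on average but varies widely between folds is not yet a reliable one.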

In [None]:
# ═══════════════════════════════════════════════════════════════════════════════
# PRODUCTION VERSION: ProductionClassifier
# ═══════════════════════════════════════════════════════════════════════════════
#
# sklearn's LogisticRegression with cross-validation.
#
# Key differences from learning version:
#   - Uses LBFGS optimizer (faster, more stable)
#   - Automatic cross-validation (5-fold by default)
#   - Computes confidence intervals on accuracy
#
# The metrics returned include:
#   - accuracy: Overall correct predictions
#   - precision: When we predict UP, how often correct?
#   - recall: Of actual UPs, how many did we catch?
#   - f1_score: Harmonic mean of precision and recall
#   - cv_accuracy: Cross-validated accuracy (more reliable)
#   - cv_std: Standard deviation across folds
#
# ═══════════════════════════════════════════════════════════════════════════════

from src.analysis.production_classifier import ProductionClassifier

# Create production classifier
prod_classifier = ProductionClassifier()

print("Training production classifier for CL=F")
print("=" * 60)

prod_metrics = prod_classifier.train('CL=F', start_date, end_date)

if prod_metrics:
    print("\nProduction Version Results")
    print("─" * 40)
    print()
    print("Training Metrics:")
    print(f"  Accuracy:  {prod_metrics.accuracy:.2%}")
    print(f"  Precision: {prod_metrics.precision:.2%}")
    print(f"  Recall:    {prod_metrics.recall:.2%}")
    print(f"  F1 Score:  {prod_metrics.f1_score:.2%}")
    print()
    print("Cross-Validation (5-fold):")
    print(f"  CV Accuracy: {prod_metrics.cv_accuracy:.2%} (±{prod_metrics.cv_std*2:.2%})")
    print()
    print("Interpretation:")
    if prod_metrics.cv_accuracy > 0.55:
        print("  ✓ Model performs better than random (50%)!")
    else:
        print("  ✗ Model near random - events may not predict this market well")
else:
    print("Training failed - insufficient data.")

In [None]:
# ═══════════════════════════════════════════════════════════════════════════════
# FEATURE IMPORTANCE VISUALIZATION
# ═══════════════════════════════════════════════════════════════════════════════
#
# Visual representation of which features drive predictions.
#
# Bar chart interpretation:
#   - Longer bar = more important feature
#   - Green bar = positive coefficient (increases P(UP))
#   - Red bar = negative coefficient (decreases P(UP))
#
# This is one of logistic regression's key strengths:
# Unlike neural networks, you can explain WHY a prediction was made!
#
# ═══════════════════════════════════════════════════════════════════════════════

if prod_classifier.models:
    prod_importance = prod_classifier.get_feature_importance('CL=F')
    
    fig, ax = plt.subplots(figsize=(10, 6))
    
    features = list(prod_importance.keys())
    values = list(prod_importance.values())
    
    # Get actual coefficients for coloring
    model = prod_classifier.models['CL=F']
    coefs = dict(zip(prod_classifier.feature_names, model.coef_[0]))
    colors = ['green' if coefs.get(f, 0) > 0 else 'red' for f in features]
    
    # Sort by importance
    sorted_idx = np.argsort(values)[::-1]
    features_sorted = [features[i] for i in sorted_idx]
    values_sorted = [values[i] for i in sorted_idx]
    colors_sorted = [colors[i] for i in sorted_idx]
    
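    ax.barh(features_sorted, values_sorted, color=colors_sorted, alpha=0.7, edgecolor='black')
    ax.invert_yaxis()  # barh draws the first item at the bottom; flip so the most important feature is on top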
    ax.set_xlabel('Absolute Coefficient Value', fontsize=12)
    ax.set_ylabel('Feature', fontsize=12)
    ax.set_title('Feature Importance\n(Green = ↑ P(UP), Red = ↓ P(UP))', fontsize=12)
    
    # Add a legend
    from matplotlib.patches import Patch
    legend_elements = [
        Patch(facecolor='green', alpha=0.7, label='Increases P(UP)'),
        Patch(facecolor='red', alpha=0.7, label='Decreases P(UP)'),
    ]
    ax.legend(handles=legend_elements, loc='lower right')
    
    plt.tight_layout()
    plt.show()
    
    print("\nFeature Analysis:")
    print("─" * 50)
    top_feature = features_sorted[0]
    top_direction = 'increases' if coefs.get(top_feature, 0) > 0 else 'decreases'
    print(f"  Most important: {top_feature}")
    print(f"  Effect: Higher {top_feature} → {top_direction} P(UP)")
else:
    print("No model trained.")

## 4. Training Across Multiple Markets

Let's see how well our model performs on different markets.

Different markets may respond differently to geopolitical events:
- **Oil (CL=F)**: Highly sensitive to Middle East events
- **Gold (GC=F)**: Safe haven, may rise during uncertainty
- **SPY**: Broad market, diverse factors
- **VIX**: Fear gauge, inversely related to sentiment

In [None]:
# ═══════════════════════════════════════════════════════════════════════════════
# MULTI-MARKET TRAINING
# ═══════════════════════════════════════════════════════════════════════════════
#
# Train separate models for each market.
#
# Why separate models?
#   - Each market has different sensitivities
#   - Oil reacts differently to events than gold
#   - A unified model would be too generic
#
# We'll compare:
#   - Training accuracy: How well we fit the training data
#   - CV accuracy: How well we generalize (more important!)
#
# ═══════════════════════════════════════════════════════════════════════════════

symbols = ['CL=F', 'GC=F', 'SPY', '^VIX', 'EURUSD=X']
symbol_names = {
    'CL=F': 'Crude Oil',
    'GC=F': 'Gold',
    'SPY': 'S&P 500',
    '^VIX': 'Volatility',
    'EURUSD=X': 'EUR/USD',
}

all_results = {}

print("Training classifiers for multiple markets...")
print("=" * 60)

for symbol in symbols:
    metrics = prod_classifier.train(symbol, start_date, end_date)
    if metrics:
        all_results[symbol] = metrics
        status = '✓' if metrics.cv_accuracy > 0.52 else '○'
        print(f"  {status} {symbol_names.get(symbol, symbol)}: "
              f"Acc={metrics.accuracy:.1%}, CV={metrics.cv_accuracy:.1%} (±{metrics.cv_std*2:.1%})")
    else:
        print(f"  ✗ {symbol_names.get(symbol, symbol)}: Insufficient data")

print()
print("Legend: ✓ = better than random, ○ = near random, ✗ = no data")

In [None]:
# ═══════════════════════════════════════════════════════════════════════════════
# CROSS-MARKET COMPARISON VISUALIZATION
# ═══════════════════════════════════════════════════════════════════════════════
#
# LEFT PLOT - Accuracy Comparison:
#   - Blue bars: Training accuracy (may overfit)
#   - Green bars: CV accuracy (more realistic)
#   - Error bars: ±2 std across CV folds (a rough spread, not a formal 95% confidence interval)
#   - Red dashed line: Random baseline (50%)
#
# RIGHT PLOT - F1 Score:
#   - Balances precision and recall
#   - Green = good (>0.5), Orange = okay (0.4-0.5), Red = poor (<0.4)
#
# ═══════════════════════════════════════════════════════════════════════════════

if all_results:
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # ─── LEFT: Accuracy Comparison ───
    ax = axes[0]
    
    symbols_list = list(all_results.keys())
    names = [symbol_names.get(s, s) for s in symbols_list]
    accuracies = [all_results[s].accuracy for s in symbols_list]
    cv_accuracies = [all_results[s].cv_accuracy for s in symbols_list]
    cv_stds = [all_results[s].cv_std for s in symbols_list]
    
    x = np.arange(len(symbols_list))
    width = 0.35
    
    bars1 = ax.bar(x - width/2, accuracies, width, label='Training Accuracy', 
                   color='steelblue', alpha=0.7)
    bars2 = ax.bar(x + width/2, cv_accuracies, width, label='CV Accuracy', 
                   color='green', alpha=0.7, yerr=[s*2 for s in cv_stds], capsize=5)
    
    ax.axhline(y=0.5, color='red', linestyle='--', linewidth=2, label='Random (50%)')
    
    ax.set_xlabel('Market', fontsize=12)
    ax.set_ylabel('Accuracy', fontsize=12)
    ax.set_title('Classification Accuracy by Market\n(CV accuracy is more reliable)', fontsize=12)
    ax.set_xticks(x)
    ax.set_xticklabels(names, rotation=15)
    ax.legend()
    ax.set_ylim(0, 1)
    
    # ─── RIGHT: F1 Scores ───
    ax = axes[1]
    
    f1_scores = [all_results[s].f1_score for s in symbols_list]
    
    # Color by performance
    colors = ['green' if f > 0.5 else 'orange' if f > 0.4 else 'red' for f in f1_scores]
    
    ax.bar(names, f1_scores, color=colors, alpha=0.7, edgecolor='black')
    ax.axhline(y=0.5, color='red', linestyle='--', linewidth=2, label='Baseline')
    
    ax.set_xlabel('Market', fontsize=12)
    ax.set_ylabel('F1 Score', fontsize=12)
    ax.set_title('F1 Score by Market\n(Balances Precision and Recall)', fontsize=12)
    ax.tick_params(axis='x', labelrotation=15)  # rotate the existing category labels
    ax.legend()
    
    plt.tight_layout()
    plt.show()
    
    # Summary
    print("\nSummary:")
    print("─" * 50)
    best_market = max(all_results.items(), key=lambda x: x[1].cv_accuracy)
    print(f"  Best market: {symbol_names.get(best_market[0], best_market[0])} "
          f"(CV accuracy: {best_market[1].cv_accuracy:.1%})")
    
    above_random = sum(1 for m in all_results.values() if m.cv_accuracy > 0.52)
    print(f"  Markets beating random: {above_random}/{len(all_results)}")
else:
    print("No results to visualize.")

## 5. Understanding the Metrics

Classification metrics can be confusing. Let's break them down.

### The Confusion Matrix

```
                 Predicted
                 UP      DOWN
Actual UP        TP      FN
Actual DOWN      FP      TN
```

- **TP (True Positive)**: Predicted UP, actually UP
- **TN (True Negative)**: Predicted DOWN, actually DOWN
- **FP (False Positive)**: Predicted UP, actually DOWN (oops!)
- **FN (False Negative)**: Predicted DOWN, actually UP (missed opportunity!)
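
As a minimal sketch, the four counts can be tallied directly from arrays of actual and predicted labels, and the usual summary metrics follow from them. The ten labels below are made up for illustration.

```python
import numpy as np

# Hypothetical labels for ten days (1 = UP, 0 = DOWN)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 1, 0, 0, 0, 0])

tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # predicted UP, actually UP
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # predicted DOWN, actually UP
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # predicted UP, actually DOWN
tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # predicted DOWN, actually DOWN

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")
print(f"Accuracy={accuracy:.2f}, Precision={precision:.2f}, Recall={recall:.2f}, F1={f1:.2f}")
```

In this toy case accuracy is 0.70, but precision (0.60) and recall (0.75) differ, which is exactly the distinction a single accuracy number hides.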

In [None]:
# ═══════════════════════════════════════════════════════════════════════════════
# METRIC EXPLANATIONS
# ═══════════════════════════════════════════════════════════════════════════════
#
# Understanding what each metric tells you:
#
# ACCURACY:
#   - (TP + TN) / Total
#   - How often are we correct overall?
#   - Can be misleading with imbalanced classes
#
# PRECISION:
#   - TP / (TP + FP)
#   - When we predict UP, how often are we right?
#   - "Don't cry wolf" - minimize false alarms
#
# RECALL (Sensitivity):
#   - TP / (TP + FN)
#   - Of all actual UPs, how many did we catch?
#   - "Don't miss any wolves" - minimize missed opportunities
#
# F1 SCORE:
#   - 2 × (Precision × Recall) / (Precision + Recall)
#   - Harmonic mean - balances both concerns
#   - Use when you care about both equally
#
# ═══════════════════════════════════════════════════════════════════════════════

print("UNDERSTANDING CLASSIFICATION METRICS")
print("═" * 60)
print()
print("THE CONFUSION MATRIX:")
print("─" * 40)
print("                   Predicted")
print("                   UP       DOWN")
print("  Actual UP        TP       FN")
print("  Actual DOWN      FP       TN")
print()
print("METRICS FROM THE MATRIX:")
print("─" * 40)
print()
print("  ACCURACY  = (TP + TN) / Total")
print("            = How often are we correct overall?")
print("            → Good for balanced datasets")
print()
print("  PRECISION = TP / (TP + FP)")
print("            = When we predict UP, how often correct?")
print("            → 'Don't cry wolf' - avoid false alarms")
print()
print("  RECALL    = TP / (TP + FN)")
print("            = Of all actual UPs, how many caught?")
print("            → 'Don't miss opportunities'")
print()
print("  F1 SCORE  = 2 × (Precision × Recall) / (Precision + Recall)")
print("            = Harmonic mean - balances both")
print("            → Use when both matter equally")
print()
print("CROSS-VALIDATION:")
print("─" * 40)
print("  = Average accuracy across multiple train/test splits")
print("  = More robust estimate of real-world performance")
print("  = Prevents overfitting to a single split")

In [None]:
# ═══════════════════════════════════════════════════════════════════════════════
# CONFUSION MATRIX VISUALIZATION
# ═══════════════════════════════════════════════════════════════════════════════
#
# Visual representation of a confusion matrix.
#
# The heatmap shows:
#   - Darker blue = more predictions in that cell
#   - Diagonal = correct predictions (want these high)
#   - Off-diagonal = errors (want these low)
#
# Reading the matrix:
#   - Top-left (TP): Correctly predicted UP
#   - Top-right (FN): Missed UP (predicted DOWN)
#   - Bottom-left (FP): False alarm (predicted UP, was DOWN)
#   - Bottom-right (TN): Correctly predicted DOWN
#
# ═══════════════════════════════════════════════════════════════════════════════

fig, ax = plt.subplots(figsize=(8, 6))

# Example confusion matrix
cm = np.array([
    [45, 10],   # Actual UP: 45 correct (TP), 10 missed (FN)
    [15, 30]    # Actual DOWN: 15 false alarms (FP), 30 correct (TN)
])

# Create heatmap
sns.heatmap(
    cm, 
    annot=True, 
    fmt='d', 
    cmap='Blues',
    xticklabels=['Predict UP', 'Predict DOWN'],
    yticklabels=['Actual UP', 'Actual DOWN'],
    ax=ax, 
    annot_kws={'size': 16}
)

ax.set_title('Example Confusion Matrix', fontsize=14)

# Calculate metrics from this matrix
tp, fn = cm[0]
fp, tn = cm[1]
total = cm.sum()

accuracy = (tp + tn) / total
precision = tp / (tp + fp) if (tp + fp) > 0 else 0
recall = tp / (tp + fn) if (tp + fn) > 0 else 0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0

# Add metrics as text
metrics_text = f'Accuracy: {accuracy:.2%}\nPrecision: {precision:.2%}\nRecall: {recall:.2%}\nF1: {f1:.2%}'
ax.text(2.5, 0.5, metrics_text, fontsize=12, verticalalignment='center',
        bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

# Add cell labels inside each cell, above the count (values come from cm, so they stay in sync)
ax.text(0.5, 0.25, f'TP={tp}', fontsize=10, color='green', fontweight='bold', ha='center')
ax.text(0.5, 1.25, f'FP={fp}', fontsize=10, color='red', fontweight='bold', ha='center')
ax.text(1.5, 0.25, f'FN={fn}', fontsize=10, color='red', fontweight='bold', ha='center')
ax.text(1.5, 1.25, f'TN={tn}', fontsize=10, color='green', fontweight='bold', ha='center')

plt.tight_layout()
plt.show()

print("\nReading this confusion matrix:")
print("─" * 50)
print(f"  • {tp} times we correctly predicted UP (True Positive)")
print(f"  • {tn} times we correctly predicted DOWN (True Negative)")
print(f"  • {fn} times we missed an UP (False Negative)")
print(f"  • {fp} times we falsely predicted UP (False Positive)")

## 6. Making Predictions

Let's use our trained model to make predictions for different scenarios.

In [None]:
# ═══════════════════════════════════════════════════════════════════════════════
# SCENARIO-BASED PREDICTIONS
# ═══════════════════════════════════════════════════════════════════════════════
#
# We'll test three scenarios:
#
# 1. CONFLICT DAY:
#    - Negative Goldstein scores
#    - High media mentions
#    - Many conflict events
#    → Expect model to predict DOWN with higher confidence
#
# 2. PEACEFUL DAY:
#    - Positive Goldstein scores
#    - Moderate coverage
#    - Cooperation events
#    → Expect model to predict UP with higher confidence
#
# 3. MIXED DAY:
#    - Neutral Goldstein
#    - Mix of conflict and cooperation
#    → Expect model to be uncertain (near 50%)
#
# ═══════════════════════════════════════════════════════════════════════════════

if prod_classifier.models:
    scenarios = [
        ('CONFLICT DAY', {
            'goldstein_mean': -5.0,      # Negative events on average
            'goldstein_min': -9.0,       # Severe conflict
            'goldstein_max': -2.0,       # Even best event was negative
            'mentions_total': 1000,      # High media coverage
            'avg_tone': -4.0,            # Very negative tone
            'conflict_count': 10,        # Many conflicts
            'cooperation_count': 1,      # Almost no cooperation
        }),
        ('PEACEFUL DAY', {
            'goldstein_mean': 4.0,       # Positive events
            'goldstein_min': 1.0,        # Even worst event was positive
            'goldstein_max': 7.0,        # Some great cooperation
            'mentions_total': 300,       # Lower media attention
            'avg_tone': 2.0,             # Positive tone
            'conflict_count': 1,         # Minimal conflict
            'cooperation_count': 8,      # Lots of cooperation
        }),
        ('MIXED DAY', {
            'goldstein_mean': 0.0,       # Neutral average
            'goldstein_min': -5.0,       # Some conflict
            'goldstein_max': 5.0,        # Some cooperation
            'mentions_total': 500,       # Moderate coverage
            'avg_tone': 0.0,             # Neutral tone
            'conflict_count': 3,         # Some conflict
            'cooperation_count': 3,      # Some cooperation
        }),
    ]
    
    print("SCENARIO-BASED PREDICTIONS FOR OIL (CL=F)")
    print("═" * 60)
    
    for name, features in scenarios:
        print(f"\n{name}")
        print("─" * 40)
        
        # Key feature summary
        print(f"  Goldstein: {features['goldstein_mean']:+.1f} (range: {features['goldstein_min']:+.1f} to {features['goldstein_max']:+.1f})")
        print(f"  Mentions: {features['mentions_total']}, Tone: {features['avg_tone']:+.1f}")
        print(f"  Conflicts: {features['conflict_count']}, Cooperations: {features['cooperation_count']}")
        
        pred = prod_classifier.predict('CL=F', features)
        if pred:
            confidence_bar = '█' * int(pred.probability * 20)
            print(f"\n  → Prediction: {pred.prediction}")
            print(f"  → P(UP): {pred.probability:.1%} [{confidence_bar:<20}]")
            print(f"  → Confidence: {pred.confidence.upper()}")
        else:
            print("  → Prediction failed")
    
    print("\n" + "═" * 60)
    print("EXPECTED PATTERN (compare with the predictions above):")
    print("  • Conflict days → lower P(UP), leaning DOWN")
    print("  • Peaceful days → higher P(UP), leaning UP")
    print("  • Mixed days → P(UP) near 50% (uncertain)")
    print("  If the output disagrees, the model has not learned this relationship.")
else:
    print("No models trained.")

## Summary

**Classification** predicts market direction (UP/DOWN) from geopolitical event features.

---

### Key Concepts

| Concept | Description | Formula/Notes |
|---------|-------------|---------------|
| **Sigmoid** | Maps any real value to (0, 1) | σ(z) = 1/(1+e⁻ᶻ) |
| **Decision boundary** | Where P(UP) = 0.5 | σ(z) = 0.5 at z = 0 |
| **Accuracy** | Overall correctness | (TP+TN)/Total |
| **Precision** | When we predict UP, how often we are right | TP/(TP+FP) |
| **Recall** | Of actual UPs, how many we caught | TP/(TP+FN) |
| **F1 Score** | Harmonic mean of P & R | 2×P×R/(P+R) |
| **Cross-validation** | Robust accuracy estimate | Average across k folds |

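As a worked example of the metric formulas in the table, the sketch below computes all four from a set of illustrative confusion-matrix counts (TP=45, FP=15, FN=10, TN=30). The function name `classification_metrics` is a hypothetical helper, not part of the notebook's classes. With these counts it reports 75% accuracy, 75% precision, about 81.8% recall, and an F1 of about 78.3%.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total > 0 else 0.0
    # Guard each ratio against an empty denominator
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall) > 0
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative counts only
metrics = classification_metrics(tp=45, fp=15, fn=10, tn=30)
for name, value in metrics.items():
    print(f"{name:>9}: {value:.2%}")
```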
---

### Two Implementations

| Version | Class | Key Features |
|---------|-------|-------------|
| **Learning** | `MarketClassifier` | Manual gradient descent, shows the math |
| **Production** | `ProductionClassifier` | sklearn, cross-validation, ready for use |

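To give a feel for what "manual gradient descent" means in the learning version, here is a minimal one-feature sketch. It illustrates the general technique and is not a copy of `MarketClassifier`: the function names, learning rate, epoch count, and toy data are all made up for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.1, epochs=500):
    """Fit P(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error (p - y) for every sample at the current w, b
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean log loss with respect to w and b
        grad_w = sum(err * x for err, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        # Step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data: label 1 (UP) tends to go with positive feature values
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
print(f"P(UP | x = 1.5)  = {sigmoid(w * 1.5 + b):.2f}")
print(f"P(UP | x = -1.5) = {sigmoid(w * -1.5 + b):.2f}")
```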
---

### Key Takeaways

1. **Logistic regression is interpretable** - coefficients explain predictions
2. **Cross-validation > training accuracy** - always report CV scores
3. **>50% is meaningful** - provided it holds out of sample and beats always guessing the more common direction; markets are hard to predict!
4. **Different markets, different models** - oil ≠ gold ≠ equities

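Takeaway 3 depends on what the score is compared against. One common reference point is the majority-class baseline, i.e. the accuracy of always predicting the more frequent direction. A minimal sketch, with hypothetical helper names and made-up label counts:

```python
from collections import Counter


def majority_baseline_accuracy(labels) -> float:
    """Accuracy of always predicting the most frequent label."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def beats_baseline(model_accuracy: float, labels) -> bool:
    """True if the model's accuracy exceeds the majority-class baseline."""
    return model_accuracy > majority_baseline_accuracy(labels)


# Made-up example: 54 UP days and 46 DOWN days
labels = ["UP"] * 54 + ["DOWN"] * 46

baseline = majority_baseline_accuracy(labels)
print(f"Baseline accuracy: {baseline:.0%}")
for acc in (0.52, 0.57):
    verdict = "beats" if beats_baseline(acc, labels) else "does not beat"
    print(f"Model at {acc:.0%} {verdict} the baseline")
```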
---

### Limitations

- **Markets are inherently unpredictable** - even 55% out-of-sample accuracy is impressive
- **Events are just one factor** - earnings, rates, technicals matter too
- **Past patterns may not persist** - regimes change
- **Not financial advice** - this is educational!

---

### Use Cases

| Use Case | Why It Fits |
|----------|----------|
| **Portfolio demonstration** | Shows ML skills, finance knowledge |
| **Learning ML fundamentals** | Classification, CV, metrics |
| **Interview prep** | Explain logistic regression, feature importance |
| **Research baseline** | Compare with more complex models |

---

**Congratulations!** You've completed the notebook series. You now understand:
1. **EDA** - Exploring data
2. **Event Studies** - Measuring event impact
3. **Anomaly Detection** - Finding unusual patterns
4. **Classification** - Predicting direction