# HMM Futures Analysis - Basic Tutorial

This notebook demonstrates the basic usage of the HMM Futures Analysis system for market regime detection and analysis.

## Overview

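A Hidden Markov Model (HMM) treats the market regime as a hidden state that is inferred from observed features such as returns and volatility, together with the probabilities of switching between states. The fitted states are unlabeled; this tutorial names them bull, bear, or sideways afterwards, based on each state's average return. This tutorial will walk you through: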

1. Loading and preprocessing futures data
2. Feature engineering for HMM analysis
3. Training an HMM model
4. Interpreting results and visualizing regimes
5. Analyzing state transitions and saving results
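
To make the idea of hidden regimes concrete, here is a minimal sketch of how a regime-switching process generates returns. The transition matrix, means, and volatilities are made-up illustration values, not estimates from any data set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-regime model: row i holds the probabilities of moving
# from regime i to regime 0 or regime 1 on the next step
transition = np.array([[0.95, 0.05],
                       [0.10, 0.90]])
means = np.array([0.001, -0.002])   # assumed mean return per regime
stds = np.array([0.01, 0.03])       # assumed return volatility per regime

n_steps = 500
states = np.zeros(n_steps, dtype=int)
for t in range(1, n_steps):
    # The next regime depends only on the current one (Markov property)
    states[t] = rng.choice(2, p=transition[states[t - 1]])

# Observed returns are drawn from the active regime's distribution
returns = rng.normal(means[states], stds[states])
print(f"Share of time in regime 1: {states.mean():.2f}")
```

Fitting an HMM runs this process in reverse: only `returns` is observed, and the model estimates the transition matrix, the per-regime distributions, and the most plausible `states`.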

Let's get started!

In [None]:
# Import necessary libraries
import sys
import warnings
from pathlib import Path

# Add src to path for imports
sys.path.insert(0, str(Path().absolute().parent / 'src'))

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Import HMM analysis modules
from data_processing.csv_parser import process_csv
from data_processing.data_validation import validate_data
from data_processing.feature_engineering import add_features
from model_training.hmm_trainer import train_model
from model_training.inference_engine import predict_states_comprehensive
from utils import ProcessingConfig, get_logger

# Set up logging
logger = get_logger(__name__)

# Configure plotting
%matplotlib inline
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')

print("✅ All imports successful!")

## 1. Load and Explore Data

First, let's load our futures data and examine its structure. We'll use the built-in CSV parser which handles various CSV formats automatically.
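
The rest of the notebook reads a `close` column, an optional `volume` column, and a date-like index from the loaded data. The sketch below shows one way to get that layout with plain pandas. It is not the project's `process_csv`; the sample rows and column names are hypothetical.

```python
import io
import pandas as pd

# Hypothetical sample in a common OHLCV layout; real files may differ
sample_csv = io.StringIO(
    "Date,Open,High,Low,Close,Volume\n"
    "2024-01-01,100.0,105.0,99.0,104.0,1200\n"
    "2024-01-02,104.0,106.0,101.0,102.0,900\n"
    "2024-01-03,102.0,103.0,98.0,99.5,1500\n"
)

sample = pd.read_csv(sample_csv, parse_dates=["Date"], index_col="Date")
sample.columns = sample.columns.str.lower()  # the notebook uses lowercase names
sample = sample.sort_index()                 # keep rows in chronological order

print(sample[["close", "volume"]])
```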

In [None]:
# Define data path
data_path = "../BTC.csv"  # Update this path to your data file

if not Path(data_path).exists():
    print(f"❌ Data file not found: {data_path}")
    print("Please update the data_path variable to point to your CSV file.")
else:
    print(f"📁 Loading data from: {data_path}")

    # Load data using the CSV parser
    config = ProcessingConfig(engine_type='streaming', chunk_size=1000)
    data = process_csv(data_path, config)

    print("✅ Data loaded successfully!")
    print(f"📊 Dataset shape: {data.shape}")
    print(f"📅 Date range: {data.index[0]} to {data.index[-1]}")
    print(f"📈 Columns: {list(data.columns)}")

In [None]:
# Display basic statistics
if 'data' in locals():
    print("📈 Basic Statistics:")
    display(data.describe())

    print("\n📊 First few rows:")
    display(data.head())

    print("\n💰 Price information:")
    print(f"Initial price: ${data['close'].iloc[0]:.2f}")
    print(f"Final price: ${data['close'].iloc[-1]:.2f}")
    print(f"Price change: {((data['close'].iloc[-1] / data['close'].iloc[0] - 1) * 100):.2f}%")

## 2. Data Validation

Let's validate our data to ensure it meets quality standards for HMM analysis.
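
The checks behind `validate_data` are not shown in this notebook. As a rough idea of what validating price data can involve, the sketch below counts a few common problems with plain pandas. The function name and the choice of checks are assumptions made for illustration.

```python
import pandas as pd

def basic_ohlcv_checks(df: pd.DataFrame) -> dict:
    """Count a few common data problems (illustrative, not exhaustive)."""
    price_cols = ["open", "high", "low", "close"]
    return {
        "missing_values": int(df.isna().sum().sum()),
        "duplicate_timestamps": int(df.index.duplicated().sum()),
        "non_positive_prices": int((df[price_cols] <= 0).any(axis=1).sum()),
        "high_below_low": int((df["high"] < df["low"]).sum()),
        "unsorted_index": not df.index.is_monotonic_increasing,
    }

# Hypothetical frame with one missing close, one repeated timestamp,
# and one bar whose high is below its low
demo = pd.DataFrame(
    {"open": [100.0, 101.0, 102.0], "high": [102.0, 100.0, 104.0],
     "low": [99.0, 100.5, 101.0], "close": [101.0, None, 103.0]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-02"]),
)
result = basic_ohlcv_checks(demo)
print(result)
```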

In [None]:
if 'data' in locals():
    print("🔍 Validating data quality...")

    # Validate data
    validation_result = validate_data(data)

    print("✅ Data validation completed!")
    print(f"📊 Data quality score: {validation_result.get('quality_score', 'N/A')}")
    print(f"⚠️  Issues found: {validation_result.get('issues_count', 0)}")
    print(f"🔧 Actions taken: {validation_result.get('actions_taken', 0)}")

## 3. Feature Engineering

HMMs work best with informative features. Let's engineer technical indicators (returns, volatility, moving averages) that help the model separate market regimes.
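
The exact definitions used by `add_features` are not shown here. The sketch below computes three typical features with pandas, assuming volatility means the rolling standard deviation of log returns. The window of 3 is shortened for the tiny sample, and the prices are made up.

```python
import numpy as np
import pandas as pd

close = pd.Series(
    [100.0, 101.0, 99.5, 102.0, 103.5, 101.0, 104.0, 105.5],
    index=pd.date_range("2024-01-01", periods=8, freq="D"),
    name="close",
)

feats = pd.DataFrame({"close": close})
# Log return: ln(P_t / P_{t-1}); additive over time, unlike simple returns
feats["log_ret"] = np.log(close / close.shift(1))
# Rolling volatility: standard deviation of recent log returns
feats["volatility_3"] = feats["log_ret"].rolling(3).std()
# Simple moving average of the close
feats["sma_3"] = close.rolling(3).mean()

# Rolling windows leave NaNs at the start; drop them before model fitting
clean = feats.dropna()
print(clean)
```

This is why the training cell calls `dropna()` on the feature matrix: the first rows of every rolling feature are undefined.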

In [None]:
if 'data' in locals():
    print("⚙️  Engineering features...")

    # Add features
    features = add_features(data.copy())

    print("✅ Feature engineering completed!")
    print(f"📊 Features shape: {features.shape}")
    print(f"📈 Total columns: {len(features.columns)}")

    # Display feature columns
    print("\n🔧 Engineered features:")
    feature_cols = [col for col in features.columns if col not in data.columns]
    for i, col in enumerate(feature_cols[:10], 1):  # Show first 10 features
        print(f"{i:2d}. {col}")
    if len(feature_cols) > 10:
        print(f"    ... and {len(feature_cols) - 10} more features")

In [None]:
# Visualize some key features
if 'features' in locals():
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    fig.suptitle('Key Features Visualization', fontsize=16)

    # Price and moving averages
    axes[0, 0].plot(features.index, features['close'], label='Close Price', alpha=0.7)
    if 'sma_20' in features.columns:
        axes[0, 0].plot(features.index, features['sma_20'], label='20-day SMA', alpha=0.8)
    axes[0, 0].set_title('Price & Moving Average')
    axes[0, 0].legend()
    axes[0, 0].grid(True, alpha=0.3)

    # Returns
    if 'log_ret' in features.columns:
        axes[0, 1].plot(features.index, features['log_ret'], alpha=0.7)
        axes[0, 1].axhline(y=0, color='red', linestyle='--', alpha=0.5)
        axes[0, 1].set_title('Log Returns')
        axes[0, 1].grid(True, alpha=0.3)

    # Volatility
    if 'volatility_14' in features.columns:
        axes[1, 0].plot(features.index, features['volatility_14'], alpha=0.7, color='orange')
        axes[1, 0].set_title('14-day Volatility')
        axes[1, 0].grid(True, alpha=0.3)

    # Volume
    if 'volume' in features.columns:
        axes[1, 1].bar(features.index, features['volume'], alpha=0.6, color='green')
        axes[1, 1].set_title('Trading Volume')
        axes[1, 1].grid(True, alpha=0.3)

    plt.tight_layout()
    plt.show()

## 4. HMM Model Training

Now let's train our Hidden Markov Model to identify market regimes. We'll start with a simple 3-state model (bull, bear, sideways).
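
The inference step later in this notebook passes a scaler taken from the training metadata, which suggests the features are standardized before fitting. How `train_model` does this is not shown, so the following is only a sketch of z-score scaling with NumPy on made-up numbers.

```python
import numpy as np

# Hypothetical feature matrix: rows are time steps, columns are features
# on very different scales (e.g. returns vs. a ratio-like indicator)
X = np.array([[0.010, 50.0],
              [-0.020, 55.0],
              [0.005, 45.0],
              [0.015, 60.0]])

mean = X.mean(axis=0)
std = X.std(axis=0)
X_scaled = (X - mean) / std   # each column now has mean 0 and unit variance

# Reuse the training mean/std on new data so both share one scale
x_new = np.array([[0.000, 52.0]])
x_new_scaled = (x_new - mean) / std
print(X_scaled.round(3))
```

Without scaling, a feature with large numeric values can dominate the Gaussian covariance estimates and make fitting less stable.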

In [None]:
if 'features' in locals():
    print("🤖 Training HMM model...")

    # Configure HMM parameters
    hmm_config = {
        'n_components': 3,  # 3 market regimes
        'covariance_type': 'full',
        'n_iter': 100,
        'random_state': 42,
        'tol': 1e-3,
        'num_restarts': 3  # Multiple restarts for better convergence
    }

    print("📊 HMM Configuration:")
    print(f"   Number of states: {hmm_config['n_components']}")
    print(f"   Covariance type: {hmm_config['covariance_type']}")
    print(f"   Max iterations: {hmm_config['n_iter']}")

    # Select features for HMM (use meaningful technical indicators)
    feature_columns = ['log_ret', 'volatility_14', 'price_position_20', 'hl_ratio_10']
    available_features = [col for col in feature_columns if col in features.columns]

    print(f"\n📈 Using features: {available_features}")

    # Prepare feature matrix
    feature_matrix = features[available_features].dropna()
    print(f"📊 Feature matrix shape: {feature_matrix.shape}")

    # Train HMM model
    model, metadata = train_model(feature_matrix, hmm_config)

    print("\n✅ HMM model trained successfully!")
    print("📊 Training metadata:")
    for key, value in metadata.items():
        if key != 'convergence_history':  # Skip long convergence history
            print(f"   {key}: {value}")

## 5. State Inference and Analysis

Let's use our trained model to infer market states and analyze the results.
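
A standard way to recover the most likely state sequence from a fitted HMM is the Viterbi algorithm. Whether `predict_states_comprehensive` uses it internally is not stated in this notebook. The sketch below implements Viterbi in log space for a hypothetical 2-state model with made-up probabilities.

```python
import numpy as np

def viterbi(log_start, log_trans, log_emit):
    """Most likely state path; log_emit[t, k] = log p(obs_t | state k)."""
    n_steps, n_states = log_emit.shape
    score = np.empty((n_steps, n_states))
    backptr = np.zeros((n_steps, n_states), dtype=int)
    score[0] = log_start + log_emit[0]
    for t in range(1, n_steps):
        # cand[i, j]: best score of being in state i at t-1, then moving to j
        cand = score[t - 1][:, None] + log_trans
        backptr[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[t]
    # Trace the best path backwards from the best final state
    path = np.empty(n_steps, dtype=int)
    path[-1] = score[-1].argmax()
    for t in range(n_steps - 2, -1, -1):
        path[t] = backptr[t + 1, path[t + 1]]
    return path

# Hypothetical 2-state model with "sticky" regimes
log_start = np.log([0.5, 0.5])
log_trans = np.log([[0.9, 0.1], [0.1, 0.9]])
# Likelihood of each observation under state 0 and state 1
emit = np.array([[0.8, 0.2], [0.7, 0.3], [0.45, 0.55],
                 [0.8, 0.2], [0.05, 0.95], [0.2, 0.8]])
path = viterbi(log_start, log_trans, np.log(emit))
print(path)
```

The third observation on its own slightly favors state 1, but the decoded path stays in state 0 there because switching regimes is costly under the sticky transition matrix. This smoothing effect is what makes HMM regimes more stable than per-period thresholds.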

In [None]:
if 'model' in locals() and 'feature_matrix' in locals():
    print("🔮 Inferring market states...")

    # Perform comprehensive inference
    inference_result = predict_states_comprehensive(
        model,
        metadata['scaler'],  # Use the scaler from training
        feature_matrix.values,
        available_features
    )

    print("✅ State inference completed!")
    print("📊 Inference results:")
    print(f"   Total samples: {inference_result.n_samples}")
    print(f"   Number of states: {inference_result.n_states}")
    print(f"   Log likelihood: {inference_result.log_likelihood:.2f}")

    # Create results dataframe
    results_df = feature_matrix.copy()
    results_df['hmm_state'] = inference_result.states

    # Add confidence metrics
    if inference_result.probabilities is not None:
        results_df['state_confidence'] = np.max(inference_result.probabilities, axis=1)

    print("\n📈 State distribution:")
    state_counts = pd.Series(inference_result.states).value_counts().sort_index()
    for state, count in state_counts.items():
        percentage = (count / len(inference_result.states)) * 100
        print(f"   State {state}: {count} periods ({percentage:.1f}%)")

In [None]:
# Visualize the results
if 'results_df' in locals():
    fig, axes = plt.subplots(3, 1, figsize=(15, 12))
    fig.suptitle('HMM Market Regime Analysis', fontsize=16)

    # Plot 1: Price with states
    ax1 = axes[0]
    scatter = ax1.scatter(results_df.index, features.loc[results_df.index, 'close'],
                         c=results_df['hmm_state'], cmap='viridis', alpha=0.7, s=1)
    ax1.set_title('Price with HMM States')
    ax1.set_ylabel('Price')
    plt.colorbar(scatter, ax=ax1, label='HMM State')
    ax1.grid(True, alpha=0.3)

    # Plot 2: States over time
    ax2 = axes[1]
    ax2.plot(results_df.index, results_df['hmm_state'], alpha=0.7, linewidth=0.5)
    ax2.set_title('Market Regimes Over Time')
    ax2.set_ylabel('State')
    ax2.grid(True, alpha=0.3)

    # Plot 3: State confidence
    if 'state_confidence' in results_df.columns:
        ax3 = axes[2]
        ax3.plot(results_df.index, results_df['state_confidence'], alpha=0.7, color='red')
        ax3.set_title('Model Confidence in State Assignment')
        ax3.set_ylabel('Confidence')
        ax3.set_xlabel('Date')
        ax3.grid(True, alpha=0.3)
    else:
        axes[2].set_visible(False)

    plt.tight_layout()
    plt.show()

## 6. State Analysis

Let's analyze the characteristics of each identified market regime.
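
HMM state numbers are arbitrary: state 0 in one run can be state 2 in another. Regime names therefore have to be assigned after fitting. The sketch below does this with a pandas `groupby`, using the same return thresholds as this notebook. The state labels and returns are made up.

```python
import pandas as pd

# Hypothetical decoded output: one state label and one log return per period
demo = pd.DataFrame({
    "hmm_state": [2, 2, 0, 0, 1, 1, 2, 0],
    "log_ret":   [0.012, 0.008, -0.015, -0.010, 0.0005, -0.0005, 0.010, -0.020],
})

summary = demo.groupby("hmm_state")["log_ret"].agg(["count", "mean", "std"])

def label_regime(avg_return, threshold=0.001):
    """Same thresholds as the notebook: +/-0.1% average return per period."""
    if avg_return > threshold:
        return "bull"
    if avg_return < -threshold:
        return "bear"
    return "sideways"

summary["regime"] = summary["mean"].apply(label_regime)
print(summary)
```

With a fixed threshold, two states can receive the same name, or a name may not appear at all. Ranking states by mean return is an alternative when exactly one state per name is wanted.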

In [None]:
if 'results_df' in locals():
    print("📊 Analyzing market regimes...")

    # Calculate statistics for each state
    state_stats = {}

    for state in sorted(results_df['hmm_state'].unique()):
        state_data = results_df[results_df['hmm_state'] == state]

        # Get corresponding price data
        state_prices = features.loc[state_data.index, 'close']
        state_returns = state_data['log_ret'] if 'log_ret' in state_data.columns else None

        stats = {
            'periods': len(state_data),
            'percentage': (len(state_data) / len(results_df)) * 100,
            'avg_price': state_prices.mean(),
            'price_volatility': state_prices.std(),
            'min_price': state_prices.min(),
            'max_price': state_prices.max(),
        }

        if state_returns is not None:
            stats.update({
                'avg_return': state_returns.mean(),
                'return_volatility': state_returns.std(),
                'positive_returns_pct': (state_returns > 0).mean() * 100,
            })

        if 'state_confidence' in state_data.columns:
            stats['avg_confidence'] = state_data['state_confidence'].mean()

        state_stats[state] = stats

    # Display state analysis
    print("\n🎯 Market Regime Analysis:")
    print("=" * 80)

    for state, stats in state_stats.items():
        print(f"\n📊 State {state} ({stats['percentage']:.1f}% of time):")
        print(f"   Periods: {stats['periods']}")
        print(f"   Average price: ${stats['avg_price']:.2f}")
        print(f"   Price range: ${stats['min_price']:.2f} - ${stats['max_price']:.2f}")
        print(f"   Price volatility: {stats['price_volatility']:.4f}")

        if 'avg_return' in stats:
            print(f"   Average daily return: {stats['avg_return']:.4f}")
            print(f"   Return volatility: {stats['return_volatility']:.4f}")
            print(f"   Positive returns: {stats['positive_returns_pct']:.1f}%")

        if 'avg_confidence' in stats:
            print(f"   Model confidence: {stats['avg_confidence']:.3f}")

        # Classify the regime
        if 'avg_return' in stats:
            if stats['avg_return'] > 0.001:  # > 0.1% daily return
                regime_type = "🟢 Bull Market"
            elif stats['avg_return'] < -0.001:  # < -0.1% daily return
                regime_type = "🔴 Bear Market"
            else:
                regime_type = "🟡 Sideways/Neutral"
            print(f"   Regime type: {regime_type}")

## 7. Transition Analysis

Let's analyze how the market transitions between different regimes.
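
Counting only state changes leaves out how often a regime persists. As a complement, the sketch below builds an empirical transition probability matrix that includes self-transitions and derives the expected run length per state. The state sequence is made up.

```python
import numpy as np

states = np.array([0, 0, 0, 1, 1, 0, 0, 2, 2, 2, 2, 0])
n_states = 3

# Count every consecutive pair (i -> j), including i == j
counts = np.zeros((n_states, n_states), dtype=int)
np.add.at(counts, (states[:-1], states[1:]), 1)

# Normalize each row into probabilities; rows without outgoing
# transitions are left as zeros
row_totals = counts.sum(axis=1, keepdims=True)
probs = np.divide(counts, row_totals,
                  out=np.zeros(counts.shape), where=row_totals > 0)

# For a Markov chain, the expected run length in state i is 1 / (1 - p_ii);
# this assumes p_ii < 1 for every state, as in this demo
stay = np.diag(probs)
expected_duration = 1.0 / (1.0 - stay)
print(probs.round(2))
print(expected_duration.round(2))
```

Comparing these empirical probabilities with the transition matrix estimated by the model is a quick consistency check on the decoded states.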

In [None]:
if 'results_df' in locals():
    print("🔄 Analyzing state transitions...")

    # Calculate transitions
    states = results_df['hmm_state'].values
    transitions = []

    for i in range(1, len(states)):
        if states[i] != states[i-1]:
            transitions.append((states[i-1], states[i]))

    print("✅ Transition analysis completed!")
    print(f"📊 Total transitions: {len(transitions)}")
    print(f"📈 Transition frequency: {len(transitions) / max(len(states) - 1, 1) * 100:.2f}% of period-to-period steps")

    # Transition matrix
    transition_counts = {}
    for from_state, to_state in transitions:
        key = (from_state, to_state)
        transition_counts[key] = transition_counts.get(key, 0) + 1

    print("\n🔄 Transition Matrix (count of state changes; diagonal is always 0):")
    print("=" * 50)

    # Create transition matrix display
    all_states = sorted(set(results_df['hmm_state']))

    # Header
    print("From\\To:", end="\t")
    for state in all_states:
        print(f"State {state}", end="\t")
    print()

    # Rows
    for from_state in all_states:
        print(f"State {from_state}", end="\t")
        for to_state in all_states:
            count = transition_counts.get((from_state, to_state), 0)
            print(f"{count}", end="\t")
        print()

    # Analyze persistence
    print("\n⏱️  State Persistence Analysis:")
    print("=" * 40)

    for state in all_states:
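        # Row positions where this state occurs. Positions are used instead of
        # index labels because the index may be datetime-based, where `label + 1` fails
        state_positions = np.flatnonzero((results_df['hmm_state'] == state).to_numpy())

        # Calculate consecutive periods
        consecutive_periods = []
        current_count = 0

        for i in range(len(state_positions)):
            if i == 0 or state_positions[i] != state_positions[i-1] + 1: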
                if current_count > 0:
                    consecutive_periods.append(current_count)
                current_count = 1
            else:
                current_count += 1

        if current_count > 0:
            consecutive_periods.append(current_count)

        if consecutive_periods:
            avg_duration = np.mean(consecutive_periods)
            max_duration = max(consecutive_periods)
            min_duration = min(consecutive_periods)

            print(f"State {state}:")
            print(f"   Average duration: {avg_duration:.1f} periods")
            print(f"   Max duration: {max_duration} periods")
            print(f"   Min duration: {min_duration} periods")
            print(f"   Number of occurrences: {len(consecutive_periods)}")
            print()

## 8. Save Results

Let's save our analysis results for future reference.
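
NumPy types need some care when writing JSON: `json.dump` rejects NumPy integers as dictionary keys, and its `default` hook is not applied to keys. The sketch below shows one way to convert a nested structure to plain Python first. The helper name `to_jsonable` and the sample data are made up for illustration.

```python
import json
import numpy as np

def to_jsonable(obj):
    """Recursively convert NumPy scalars/arrays and dict keys to plain Python."""
    if isinstance(obj, dict):
        return {str(k): to_jsonable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(v) for v in obj]
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, np.generic):   # NumPy scalar such as np.int64 or np.float64
        return obj.item()
    return obj

# Hypothetical statistics keyed by NumPy integer state labels
stats = {np.int64(0): {"avg_return": np.float64(0.002), "means": np.array([0.1, 0.2])}}
text = json.dumps(to_jsonable(stats), indent=2)
print(text)
```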

In [None]:
if 'results_df' in locals():
    # Create output directory
    output_dir = Path("../notebook_output")
    output_dir.mkdir(exist_ok=True)

    # Save results
    results_file = output_dir / "hmm_analysis_results.csv"
    results_df.to_csv(results_file)
    print(f"💾 Results saved to: {results_file}")

    # Save model metadata
    import json

    # Prepare metadata for saving (convert NumPy arrays and scalars to plain Python)
    save_metadata = {}
    for key, value in metadata.items():
        if hasattr(value, 'tolist'):
            save_metadata[key] = value.tolist()
        else:
            save_metadata[key] = value

    metadata_file = output_dir / "hmm_model_metadata.json"
    with open(metadata_file, 'w') as f:
        json.dump(save_metadata, f, indent=2, default=str)

    print(f"🤖 Model metadata saved to: {metadata_file}")

    # Save state statistics
    if 'state_stats' in locals():
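        # JSON object keys must be plain Python types; HMM state labels may be
        # NumPy integers, which json.dump rejects as keys
        serializable_stats = {int(state): stats for state, stats in state_stats.items()}

        stats_file = output_dir / "state_statistics.json"
        with open(stats_file, 'w') as f:
            json.dump(serializable_stats, f, indent=2, default=str)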

        print(f"📊 State statistics saved to: {stats_file}")

    print(f"\n✅ All results saved to {output_dir} directory!")

## 9. Summary and Next Steps

Congratulations! You've completed a basic HMM analysis of futures market data. Here's what we accomplished:

In [None]:
print("🎉 HMM Analysis Summary")
print("=" * 40)
print()
print("✅ What we accomplished:")
print("   1. Loaded and validated futures data")
print("   2. Engineered meaningful technical features")
print("   3. Trained a 3-state Hidden Markov Model")
print("   4. Identified market regimes (bull/bear/sideways)")
print("   5. Analyzed state transitions and persistence")
print("   6. Visualized results and saved outputs")
print()
print("📈 Key Insights:")
if 'state_stats' in locals():
    for state, stats in state_stats.items():
        if 'avg_return' in stats:
            if stats['avg_return'] > 0.001:
                regime = "Bull Market 🟢"
            elif stats['avg_return'] < -0.001:
                regime = "Bear Market 🔴"
            else:
                regime = "Sideways Market 🟡"
            print(f"   State {state}: {regime} ({stats['percentage']:.1f}% of time)")
print()
print("🔮 Potential Applications:")
print("   • Market regime detection and monitoring")
print("   • Risk management based on market states")
print("   • Strategy development for different regimes")
print("   • Portfolio allocation adjustments")
print()
print("🚀 Next Steps:")
print("   1. Try different numbers of states (2-5)")
print("   2. Experiment with different feature sets")
print("   3. Implement regime-based trading strategies")
print("   4. Test on different assets or timeframes")
print("   5. Add more sophisticated validation")
print()
print("💡 Advanced Features to Explore:")
print("   • Multi-timeframe analysis")
print("   • Regime prediction and forecasting")
print("   • Integration with backtesting framework")
print("   • Real-time regime monitoring")
print("   • Ensemble methods with multiple HMMs")

---

## 🎯 Challenge Exercises

Try these exercises to deepen your understanding:

1. **Change the number of states**: Modify the HMM configuration to use 2, 4, or 5 states. How does this affect the results?

2. **Feature selection**: Experiment with different combinations of features. Which features work best for regime detection?

3. **Time periods**: Test the model on different time periods (bull markets, bear markets, crisis periods).

4. **Visual analysis**: Create additional visualizations to better understand the regimes.

5. **Performance metrics**: Calculate basic performance metrics for each regime (win rate, average return, etc.).
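
For the first exercise, one common way to compare models with different numbers of states is the Bayesian information criterion (BIC), which penalizes the extra parameters that more states bring. The sketch below counts the free parameters of a Gaussian HMM with full covariance matrices and computes BIC. The log-likelihood values are hypothetical placeholders, not results from this notebook.

```python
import math

def gaussian_hmm_param_count(n_states: int, n_features: int) -> int:
    """Free parameters of a Gaussian HMM with full covariance matrices."""
    start = n_states - 1                                  # initial probabilities sum to 1
    trans = n_states * (n_states - 1)                     # each transition row sums to 1
    means = n_states * n_features
    covs = n_states * n_features * (n_features + 1) // 2  # symmetric matrices
    return start + trans + means + covs

def bic(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """Bayesian information criterion; lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)

# Hypothetical log-likelihoods for models fitted on 1000 samples of 4 features
for n_states, loglik in [(2, -5200.0), (3, -5050.0), (4, -5035.0)]:
    k = gaussian_hmm_param_count(n_states, 4)
    print(n_states, k, round(bic(loglik, k, 1000), 1))
```

The log-likelihood always improves as states are added, so it cannot be used alone to pick the number of states. BIC is one of several criteria that trade fit against complexity.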

Happy analyzing! 🚀