# HMM Regime Validation Results

**Analysis Period:** 2015-01-01 to 2025-12-31 (2,887 daily observations)

**Financial Stress Index:** ANFCI (Adjusted National Financial Conditions Index)
- Replaces STLFSI, which was discontinued in March 2020
- Correlation with STLFSI: 0.63 during the overlap period

**HMM Configuration:**
- 2 regimes (Low Stress vs High Stress)
- 7 features: VIX, ANFCI, realized volatility, absolute changes, stress interaction, term spread, dollar strength
- No standardization (raw values preserve extreme events)

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

# Create figures directory if it doesn't exist
Path('../results/figures').mkdir(parents=True, exist_ok=True)

sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (14, 6)

## 1. Data Overview

In [None]:
# Load processed data
df = pd.read_csv('../data/processed/full_processed_data_hmm.csv', index_col=0, parse_dates=True)

print(f"Data Shape: {df.shape}")
print(f"Date Range: {df.index.min().date()} to {df.index.max().date()}")
print(f"\nColumns: {df.columns.tolist()}")

## 2. Regime Statistics

In [None]:
# Load regime statistics
regime_stats = pd.read_csv('../results/tables/hmm_regime_statistics.csv')

print("Regime Characteristics:")
print("=" * 80)

# Rows 0 and 1 of the statistics table hold Low Stress and High Stress respectively
for regime, name in [(0, "Low Stress"), (1, "High Stress")]:
    row = regime_stats.loc[regime]
    print(f"\nRegime {regime} ({name}):")
    print(f"  Observations: {int(row['count']):,} ({row['frequency']*100:.1f}%)")
    print(f"  VIX mean:     {row['vix_mean']:.2f} ± {row['vix_std']:.2f}")
    print(f"  ANFCI mean:   {row['anfci_mean']:.4f} ± {row['anfci_std']:.4f}")
    print(f"  Realized vol: {row['realized_vol_mean']:.4f}")

## 3. Regime Transition Matrix

In [None]:
# Load transition matrix
transitions = pd.read_csv('../results/tables/regime_transitions.csv', index_col=0)

print("Regime Transition Probabilities:")
print("=" * 80)
print(transitions)
print(f"\nInterpretation:")
print(f"  - Low Stress → Low Stress:  {transitions.loc['Regime 0', 'Regime 0']*100:.1f}% (persistent)")
print(f"  - Low Stress → High Stress: {transitions.loc['Regime 0', 'Regime 1']*100:.1f}%")
print(f"  - High Stress → Low Stress: {transitions.loc['Regime 1', 'Regime 0']*100:.1f}%")
print(f"  - High Stress → High Stress: {transitions.loc['Regime 1', 'Regime 1']*100:.1f}% (persistent)")
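
The diagonal (persistence) probabilities translate directly into expected regime durations and long-run regime frequencies. A minimal sketch, assuming a two-state first-order Markov chain; the probabilities below are hypothetical placeholders, not values from `regime_transitions.csv`:

```python
def expected_duration(p_stay: float) -> float:
    """Expected number of consecutive days in a regime with self-transition p_stay."""
    return 1.0 / (1.0 - p_stay)


def stationary_high_stress(p_low_to_high: float, p_high_to_low: float) -> float:
    """Long-run share of days spent in High Stress for a two-state chain."""
    return p_low_to_high / (p_low_to_high + p_high_to_low)


# Hypothetical transition probabilities, for illustration only
p00, p11 = 0.98, 0.95
print(f"Expected Low Stress spell:  {expected_duration(p00):.1f} days")
print(f"Expected High Stress spell: {expected_duration(p11):.1f} days")
print(f"Long-run High Stress share: {stationary_high_stress(1 - p00, 1 - p11):.1%}")
```

Comparing the implied long-run share with the observed regime frequency in Section 2 is a quick consistency check on the fitted matrix.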

## 4. Year-by-Year Regime Distribution

In [None]:
# Add year column
df['year'] = df.index.year

# Compute year-by-year statistics
yearly_stats = []
for year in sorted(df['year'].unique()):
    year_data = df[df['year'] == year]
    total = len(year_data)
    regime_0 = (year_data['regime'] == 0).sum()
    regime_1 = (year_data['regime'] == 1).sum()
    
    yearly_stats.append({
        'year': year,
        'total_obs': total,
        'regime_0_obs': regime_0,
        'regime_1_obs': regime_1,
        'regime_0_pct': regime_0/total*100,
        'regime_1_pct': regime_1/total*100,
        'vix_mean': year_data['vix'].mean(),
        'anfci_mean': year_data['anfci'].mean(),
        'spread_mean': year_data['spread'].mean()
    })

yearly_df = pd.DataFrame(yearly_stats)

print("Year-by-Year Regime Distribution:")
print("=" * 100)
for _, row in yearly_df.iterrows():
    print(f"{int(row['year'])}: Total={int(row['total_obs']):3d} | "
          f"R0: {int(row['regime_0_obs']):3d} ({row['regime_0_pct']:5.1f}%) | "
          f"R1: {int(row['regime_1_obs']):3d} ({row['regime_1_pct']:5.1f}%) | "
          f"VIX={row['vix_mean']:5.2f} | ANFCI={row['anfci_mean']:7.4f} | "
          f"Spread={row['spread_mean']:5.2f}")

## 5. Crisis Detection Validation

In [None]:
# COVID-19 Crisis (Feb-May 2020)
covid_period = df.loc['2020-02-01':'2020-05-31']

print("COVID-19 Crisis Detection (Feb-May 2020):")
print("=" * 80)
print(f"Total observations: {len(covid_period)}")
print(f"\nRegime distribution:")
covid_regime_dist = covid_period['regime'].value_counts().sort_index()
for regime in covid_regime_dist.index:
    pct = covid_regime_dist[regime] / len(covid_period) * 100
    print(f"  Regime {regime}: {covid_regime_dist[regime]:3d} obs ({pct:5.1f}%)")

print(f"\nMarket indicators during COVID:")
print(f"  VIX:    mean={covid_period['vix'].mean():.2f}, max={covid_period['vix'].max():.2f}")
print(f"  Spread: mean={covid_period['spread'].mean():.2f}, max={covid_period['spread'].max():.2f}")
print(f"  ANFCI:  mean={covid_period['anfci'].mean():.4f}, max={covid_period['anfci'].max():.4f}")

# 2022 Inflation/Rate Hike Stress
stress_2022 = df.loc['2022-01-01':'2022-12-31']

print("\n2022 Inflation/Rate Hike Stress:")
print("=" * 80)
print(f"Total observations: {len(stress_2022)}")
print(f"\nRegime distribution:")
stress_regime_dist = stress_2022['regime'].value_counts().sort_index()
for regime in stress_regime_dist.index:
    pct = stress_regime_dist[regime] / len(stress_2022) * 100
    print(f"  Regime {regime}: {stress_regime_dist[regime]:3d} obs ({pct:5.1f}%)")

print(f"\nMarket indicators in 2022:")
print(f"  VIX:    mean={stress_2022['vix'].mean():.2f}, max={stress_2022['vix'].max():.2f}")
print(f"  Spread: mean={stress_2022['spread'].mean():.2f}")
print(f"  ANFCI:  mean={stress_2022['anfci'].mean():.4f}")

# 2017 Calm Period
calm_2017 = df.loc['2017-01-01':'2017-12-31']

print("\n2017 Calm Period (Baseline):")
print("=" * 80)
print(f"Total observations: {len(calm_2017)}")
print(f"\nRegime distribution:")
calm_regime_dist = calm_2017['regime'].value_counts().sort_index()
for regime in calm_regime_dist.index:
    pct = calm_regime_dist[regime] / len(calm_2017) * 100
    print(f"  Regime {regime}: {calm_regime_dist[regime]:3d} obs ({pct:5.1f}%)")

print(f"\nMarket indicators in 2017:")
print(f"  VIX:    mean={calm_2017['vix'].mean():.2f}")
print(f"  Spread: mean={calm_2017['spread'].mean():.2f}")
print(f"  ANFCI:  mean={calm_2017['anfci'].mean():.4f}")

## 6. Mean Reversion Tests

### 6.1 Unconditional Tests

In [None]:
# Load unconditional test results
unconditional = pd.read_csv('../results/tables/unconditional_tests.csv')

print("Unconditional Mean Reversion Tests:")
print("=" * 100)
print(unconditional[['horizon', 'n_obs', 'beta', 'p_beta', 'half_life', 'mean_reverting', 'significance']].to_string(index=False))

# Look up rows by horizon rather than by position so the summary survives row reordering
by_horizon = unconditional.set_index('horizon')

print("\nInterpretation:")
print("  - No mean reversion at 1-day and 5-day horizons")
print(f"  - Significant mean reversion at 10-day (β={by_horizon.loc[10, 'beta']:.4f}, p={by_horizon.loc[10, 'p_beta']:.4f})")
print(f"  - Strong mean reversion at 21-day (β={by_horizon.loc[21, 'beta']:.4f}, p={by_horizon.loc[21, 'p_beta']:.6f})")
print(f"  - Half-life at 21-day horizon: {by_horizon.loc[21, 'half_life']:.1f} days")
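
The regression specification behind `beta` is not shown in this notebook. A minimal sketch of one common form, assuming the h-day-ahead change in the spread is regressed on its current level (OLS via `numpy.polyfit`), and using a simulated AR(1) series rather than the project data:

```python
import numpy as np


def mean_reversion_beta(series: np.ndarray, horizon: int) -> float:
    """OLS slope of the horizon-step change on the current level."""
    level = series[:-horizon]
    change = series[horizon:] - series[:-horizon]
    slope, _intercept = np.polyfit(level, change, deg=1)
    return float(slope)


# Simulated mean-reverting series (AR(1) with phi = 0.95), for illustration only
rng = np.random.default_rng(0)
phi, n = 0.95, 5000
spread = np.zeros(n)
for t in range(1, n):
    spread[t] = phi * spread[t - 1] + rng.normal(scale=0.1)

for h in (1, 5, 10, 21):
    print(f"h={h:2d}: beta={mean_reversion_beta(spread, h):+.4f}")
```

A negative slope indicates mean reversion. With overlapping windows at h > 1 the residuals are serially correlated, so p-values would typically need a HAC (Newey-West) correction.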

### 6.2 Regime-Conditional Tests

In [None]:
# Load conditional test results
conditional = pd.read_csv('../results/tables/conditional_tests.csv')

print("Regime-Conditional Mean Reversion Tests:")
print("=" * 100)

for regime in sorted(conditional['regime'].unique()):
    regime_name = "Low Stress (High Liquidity)" if regime == 0 else "High Stress (Crisis)"
    print(f"\nRegime {regime}: {regime_name}")
    print("-" * 100)
    regime_results = conditional[conditional['regime'] == regime]
    print(regime_results[['horizon', 'n_obs', 'beta', 'p_beta', 'half_life', 'mean_reverting', 'significance']].to_string(index=False))
    
    # Summary
    sig_horizons = regime_results[regime_results['mean_reverting']]['horizon'].tolist()
    print(f"\nSummary:")
    print(f"  - Mean reversion detected at {len(sig_horizons)}/{len(regime_results)} horizons: {sig_horizons}")
    
    # Get 5-day and 21-day half-lives
    hl_5d = regime_results[regime_results['horizon'] == 5]['half_life'].values[0]
    hl_21d = regime_results[regime_results['horizon'] == 21]['half_life'].values[0]
    beta_5d = regime_results[regime_results['horizon'] == 5]['beta'].values[0]
    beta_21d = regime_results[regime_results['horizon'] == 21]['beta'].values[0]
    
    print(f"  - 5-day:  β={beta_5d:.4f}, half-life={hl_5d:.1f} days")
    print(f"  - 21-day: β={beta_21d:.4f}, half-life={hl_21d:.1f} days")

## 7. Key Findings Summary

### Regime Characteristics

| Metric | Regime 0 (Low Stress) | Regime 1 (High Stress) |
|--------|----------------------|------------------------|
| Frequency | 74% (2,134 obs) | 26% (753 obs) |
| VIX | 15.6 ± 3.7 | 25.9 ± 8.8 |
| ANFCI | -0.49 ± 0.11 | -0.26 ± 0.20 |
| Realized Vol | 0.010 | 0.027 |

### Mean Reversion by Regime

| Regime | 5-day β | 5-day half-life | Horizons with MR |
|--------|---------|-----------------|------------------|
| **Regime 0 (Low Stress)** | -0.017 | 41.5 days | 5d, 10d, 21d |
| **Regime 1 (High Stress)** | -0.099 | 6.6 days | 10d, 21d |
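
The half-life formula is not stated in this notebook. A minimal sketch, assuming deviations decay geometrically at rate (1 + β) per period, so that half-life = -ln(2) / ln(1 + β). Under that assumption β = -0.099 gives about 6.6 periods, matching the table, and β = -0.017 gives about 40.4, close to the reported 41.5 (the gap is consistent with β being rounded for display):

```python
import math


def half_life(beta: float) -> float:
    """Half-life in periods implied by a mean-reversion coefficient beta.

    Assumes deviations decay as (1 + beta)**t. Returns inf when beta >= 0
    (no mean reversion); undefined for beta <= -1.
    """
    if beta >= 0:
        return math.inf
    if beta <= -1:
        raise ValueError("beta <= -1 implies overshooting; half-life is undefined")
    return -math.log(2) / math.log(1 + beta)


for label, beta in [("Regime 0", -0.017), ("Regime 1", -0.099)]:
    print(f"{label}: beta={beta:+.3f} -> half-life ~ {half_life(beta):.1f} periods")
```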

### Crisis Detection Quality

- **2020 COVID Crisis:** 81.6% High Stress during the Feb-May peak, 60% for 2020 as a whole
- **2022 Inflation Stress:** 89.7% High Stress
- **2017 Calm Period:** 100% Low Stress (VIX=11.1)

### Critical Insight

**High Stress regime shows FASTER mean reversion** (5-day half-life of 6.6 days vs 41.5 days):
- Economic interpretation: During crises, spreads snap back more quickly, plausibly due to dealer inventory management
- Trading implication: More turnover opportunities in the High Stress regime
- Risk consideration: Higher volatility (realized vol 0.027 vs 0.010) offsets faster convergence

### Comparison to Original Research

| Metric | Original (2016-2020, weekly) | Current (2015-2025, daily) |
|--------|------------------------------|----------------------------|
| Sample | 4.5 years, weekly | 11 years, daily |
| Stress Index | STLFSI | ANFCI |
| Regimes | 3 regimes | 2 regimes |
| High Stress β | +0.219 (explosive) | -0.099 (mean-reverting) |
| Normal β | -0.165 | -0.017 (Regime 0) |
| Half-life | 4-5 days | 6.6 days (R1), 41.5 days (R0) |

**Note:** Differences in data frequency (weekly vs daily) and sample period likely drive most of the gap; the change of stress index and regime count may also contribute.

## 8. Visualization

In [None]:
# Plot spread with regime colors
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 10), sharex=True)

# Spread with regime shading
ax1.plot(df.index, df['spread'], color='black', linewidth=0.8, label='Credit Spread')

# Add regime shading: locate contiguous runs of the same regime
regime_changes = df['regime'].ne(df['regime'].shift())
block_starts = np.flatnonzero(regime_changes.to_numpy())
block_ends = np.append(block_starts[1:], len(df))

for start, end in zip(block_starts, block_ends):
    color = 'lightblue' if df['regime'].iloc[start] == 0 else 'lightcoral'
    # Extend each span to the start of the next block so shading has no gaps
    # and single-day blocks remain visible
    right = df.index[end] if end < len(df) else df.index[-1]
    ax1.axvspan(df.index[start], right, alpha=0.3, color=color)

ax1.set_ylabel('Credit Spread (bps)', fontsize=12)
ax1.set_title('Credit Spread with HMM Regime Classification (2015-2025)', fontsize=14, fontweight='bold')
ax1.grid(True, alpha=0.3)

# Combine the spread line and regime patches in a single legend
# (a second ax1.legend() call would replace the first one)
from matplotlib.patches import Patch
line_handles, _ = ax1.get_legend_handles_labels()
legend_elements = line_handles + [
    Patch(facecolor='lightblue', alpha=0.3, label='Regime 0: Low Stress'),
    Patch(facecolor='lightcoral', alpha=0.3, label='Regime 1: High Stress')
]
ax1.legend(handles=legend_elements, loc='upper right')

# VIX
ax2.plot(df.index, df['vix'], color='orange', linewidth=0.8)
ax2.axhline(y=20, color='red', linestyle='--', alpha=0.5, label='VIX=20 (stress threshold)')
ax2.set_ylabel('VIX', fontsize=12)
ax2.set_xlabel('Date', fontsize=12)
ax2.set_title('VIX (Implied Volatility)', fontsize=14, fontweight='bold')
ax2.grid(True, alpha=0.3)
ax2.legend()

plt.tight_layout()
plt.savefig('../results/figures/hmm_regime_validation.png', dpi=300, bbox_inches='tight')
plt.show()

print("✓ Saved figure: results/figures/hmm_regime_validation.png")

In [None]:
# Plot half-life by regime and horizon
fig, ax = plt.subplots(1, 1, figsize=(10, 6))

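# Sort both regimes by horizon so the paired bars line up
regime_0_data = conditional[conditional['regime'] == 0].sort_values('horizon')
regime_1_data = conditional[conditional['regime'] == 1].sort_values('horizon')

# Use evenly spaced bar positions; plotting at the raw horizon values
# with a fixed width gives thin, unevenly spaced bars
horizons = regime_0_data['horizon'].to_numpy()
x = np.arange(len(horizons))
width = 0.35

# Infinite half-lives (no mean reversion) cannot be drawn, so mask them as NaN
hl_0 = regime_0_data['half_life'].replace([np.inf, -np.inf], np.nan).to_numpy()
hl_1 = regime_1_data['half_life'].replace([np.inf, -np.inf], np.nan).to_numpy()

bars1 = ax.bar(x - width/2, hl_0, width, label='Regime 0 (Low Stress)', color='steelblue', alpha=0.8)
bars2 = ax.bar(x + width/2, hl_1, width, label='Regime 1 (High Stress)', color='coral', alpha=0.8)

ax.set_xlabel('Forecast Horizon (days)', fontsize=12)
ax.set_ylabel('Half-life (days)', fontsize=12)
ax.set_title('Mean Reversion Half-Life by Regime and Horizon', fontsize=14, fontweight='bold')
ax.set_xticks(x)
ax.set_xticklabels(horizons)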
ax.legend()
ax.grid(True, alpha=0.3, axis='y')

# Add value labels on bars
for bars in [bars1, bars2]:
    for bar in bars:
        height = bar.get_height()
        if not np.isinf(height) and height < 100:
            ax.text(bar.get_x() + bar.get_width()/2., height,
                    f'{height:.1f}',
                    ha='center', va='bottom', fontsize=9)

plt.tight_layout()
plt.savefig('../results/figures/half_life_by_regime.png', dpi=300, bbox_inches='tight')
plt.show()

print("✓ Saved figure: results/figures/half_life_by_regime.png")

## 9. Trading Strategy Implications

### Signal Construction
- **Entry trigger:** Spread deviation from rolling mean > threshold
- **Regime filter:** Different thresholds and position sizing by regime
- **Exit rules:** Immediate exit on regime change to High Stress
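
These rules could be prototyped along the following lines. This is a sketch under stated assumptions: the z-score definition, the 60-day window, the per-regime thresholds, and the function name `build_signal` are all hypothetical choices, not parameters from this project, and the demo data is synthetic.

```python
import numpy as np
import pandas as pd


def build_signal(spread: pd.Series, regime: pd.Series,
                 window: int = 60, thresholds=None) -> pd.Series:
    """Return -1 (short spread), +1 (long spread) or 0 (flat) for each day."""
    # Hypothetical entry thresholds in z-score units, keyed by regime
    thresholds = thresholds or {0: 1.5, 1: 2.5}

    rolling = spread.rolling(window)
    zscore = (spread - rolling.mean()) / rolling.std()
    limit = regime.map(thresholds)

    # Fade the deviation: short when the spread is wide, long when it is tight
    signal = pd.Series(0, index=spread.index, dtype=int)
    signal[zscore > limit] = -1
    signal[zscore < -limit] = 1

    # Exit rule: go flat on the day the regime switches into High Stress
    entered_high = (regime == 1) & (regime.shift() == 0)
    signal[entered_high] = 0
    return signal


# Synthetic demo data, for illustration only
idx = pd.date_range("2024-01-01", periods=200, freq="B")
rng = np.random.default_rng(1)
spread = pd.Series(rng.normal(size=200), index=idx)
regime = pd.Series(0, index=idx)
regime.iloc[150:] = 1

signal = build_signal(spread, regime)
print(signal.value_counts())
```

Days inside the warm-up window have an undefined z-score and therefore stay flat.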

### Position Sizing
- **Regime 0 (Low Stress):** Lower leverage, longer holding period (41.5-day half-life)
- **Regime 1 (High Stress):** Higher turnover, tighter stops (6.6-day half-life, higher vol)

### Risk Management
- **Max drawdown target:** <15%
- **Transaction costs:** 2-5 bps per trade
- **Stop-loss:** 2x realized volatility by regime

### Expected Performance (Hypothesis)
- **Sharpe ratio:** 0.7-1.2
- **Annual return:** 5-10%
- **Win rate:** 55-65% (regime-dependent)

**Next step:** Implement and backtest trading logic on out-of-sample data.