# Full Pipeline: Advanced Regime Analysis

**Production-Ready HMM Regime Detection with Complete Toolkit**

In previous notebooks:
- **Notebook 1**: Showed why log returns are the right observation for regime modeling
- **Notebook 2**: Learned HMM basics with minimal code

Now we'll use the library's **full feature set** for a comprehensive analysis.

## What You'll Learn

1. Using the pipeline factory for quick setup
2. Proper regime labeling (threshold-based, not forced)
3. Advanced analysis with FinancialAnalysis class
4. Comprehensive visualization tools
5. Best practices for production use

---

In [None]:
import sys
from pathlib import Path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Add parent to path
sys.path.insert(0, str(Path().absolute().parent))

# Import hidden_regime using the full API
import hidden_regime as hr

# Plotting
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette(sns.color_palette())

print("Imports complete")
print(f"hidden-regime version: {hr.__version__}")

## 1. Quick Start: Pipeline Factory

The library provides factory functions for quick pipeline creation.

In [None]:
# Create a complete financial pipeline with one function call
ticker = 'NVDA'
n_states = 3

print(f"Creating financial pipeline for {ticker} with {n_states} states...")

pipeline = hr.create_financial_pipeline(
    ticker=ticker,
    n_states=n_states,
    include_report=False  # We'll do custom analysis
)

print("Pipeline created!")
print(f"\nPipeline components:")
print(f"  Data:        {type(pipeline.data).__name__}")
print(f"  Observation: {type(pipeline.observation).__name__}")
print(f"  Model:       {type(pipeline.model).__name__}")
print(f"  Analysis:    {type(pipeline.analysis).__name__}")

## 2. Run the Complete Pipeline

Execute all stages with one command.

In [3]:
# Run the pipeline
print("Running pipeline...\n")

# Pipeline.update() returns a report string, not the DataFrame
report = pipeline.update()

# Get the actual analysis DataFrame from component outputs
result = pipeline.component_outputs['analysis']

print("\n" + "="*70)
print("PIPELINE EXECUTION COMPLETE")
print("="*70)
print(f"\nAnalysis result shape: {result.shape}")
print(f"Date range: {result.index[0].date()} to {result.index[-1].date()}")
print(f"\nAvailable columns ({len(result.columns)}):")
for col in result.columns:
    print(f"  - {col}")

# Optionally show the report
print("\n" + "="*70)
print("PIPELINE REPORT")
print("="*70)
print(report)

Running pipeline...

Training on 500 observations (removed 0 NaN values)

PIPELINE EXECUTION COMPLETE

Analysis result shape: (500, 37)
Date range: 2023-10-13 to 2025-10-10

Available columns (37):
  - predicted_state
  - confidence
  - state_0_prob
  - state_1_prob
  - state_2_prob
  - regime_name
  - regime_type
  - expected_return
  - expected_return_pct
  - expected_volatility
  - expected_duration
  - regime_strength
  - days_in_regime
  - regime_episode
  - expected_remaining_duration
  - rolling_return
  - expected_rolling_return
  - return_vs_expected
  - rolling_volatility
  - volatility_vs_expected
  - volatility_regime
  - rsi_value
  - rsi_signal
  - rsi_agreement
  - macd_value
  - macd_signal
  - macd_agreement
  - bollinger_bands_value
  - bollinger_bands_signal
  - bollinger_bands_agreement
  - moving_average_value
  - moving_average_signal
  - moving_average_agreement
  - indicator_consensus
  - regime_consensus_agreement
  - position_signal
  - signal_strength

## 3. Threshold-Based Regime Mapping

The library automatically classifies regimes based on **actual characteristics**, not state index.

In [None]:
# Show regime classification
print("REGIME CLASSIFICATION (Data-Driven)")
print("="*80)

# Get unique regimes
regimes = result['regime_name'].unique()

for regime in sorted(regimes):
    regime_data = result[result['regime_name'] == regime]
    n_days = len(regime_data)
    pct_time = n_days / len(result) * 100
    
    # Get state number
    state_num = regime_data['predicted_state'].iloc[0]
    
    print(f"\n{regime} (State {state_num})")
    print("-"*80)
    print(f"  Occurrence: {n_days} days ({pct_time:.1f}% of time)")
    
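    # Show actual observed returns in this regime
    # (fall back to log returns derived from closing prices when the
    # analysis output has no 'log_return' column)
    if 'log_return' in regime_data.columns:
        obs_returns = regime_data['log_return'].dropna()
    else:
        prices = pipeline.data.get_all_data()
        close_col = next(c for c in prices.columns if c.lower() == 'close')
        obs_returns = np.log(prices[close_col]).diff().reindex(regime_data.index).dropna()
    if len(obs_returns) > 0:
        avg_log_return = obs_returns.mean()
        avg_pct_return = hr.log_return_to_percent_change(avg_log_return) * 100
        vol = obs_returns.std() * 100

        print(f"  Avg Daily Return: {avg_pct_return:+.3f}%")
        print(f"  Daily Volatility: {vol:.3f}%")
        # Log returns add over time, so annualize in log space, then convert
        print(f"  Ann. Return:      {np.expm1(avg_log_return * 252) * 100:+.1f}%")
        print(f"  Ann. Volatility:  {vol*np.sqrt(252):.1f}%")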

print("\n" + "="*80)
print("Note: Regimes classified by ACTUAL return characteristics")
print("   NOT by arbitrarily sorting states and forcing Bear/Bull labels!")
print("="*80)
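
To make the contrast with forced labeling concrete, here is a minimal sketch of threshold-based labeling next to the sort-and-force approach. The function name `label_regime`, the threshold values, and the state means are all hypothetical illustrations, not the library's defaults or results from this notebook.

```python
def label_regime(mean_log_return, bear_below=-0.001, bull_above=0.001):
    """Label a state from its mean daily log return.

    The thresholds are illustrative assumptions, not library defaults.
    """
    if mean_log_return <= bear_below:
        return "Bear"
    if mean_log_return >= bull_above:
        return "Bull"
    return "Sideways"


# Hypothetical fitted state means (daily log returns), in arbitrary state order
state_means = {0: 0.0004, 1: 0.0031, 2: 0.0012}

# Threshold-based: each state is judged on its own characteristics
labels = {state: label_regime(mu) for state, mu in state_means.items()}

# Forced: sort states by mean and assign Bear/Sideways/Bull regardless of values
ordered = sorted(state_means, key=state_means.get)
forced = dict(zip(ordered, ["Bear", "Sideways", "Bull"]))

print("threshold:", labels)
print("forced:   ", forced)
```

With these made-up means, no state has a negative average return, so the threshold rule assigns no Bear label at all, while the forced mapping still calls the weakest state "Bear".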

## 4. Current Regime Status

What regime are we in right now?

In [5]:
# Current regime analysis
current = result.iloc[-1]

print("CURRENT REGIME STATUS")
print("="*70)
print(f"\nDate: {result.index[-1].date()}")
print(f"Current Regime: {current['regime_name']}")
print(f"Confidence: {current['confidence']*100:.1f}%")

# Days in current regime
current_regime = current['regime_name']
days_in_regime = 1
for i in range(len(result)-2, -1, -1):
    if result.iloc[i]['regime_name'] == current_regime:
        days_in_regime += 1
    else:
        break

print(f"Days in Regime: {days_in_regime}")

# Show probabilities for all states
print(f"\nState Probabilities:")
for state in range(n_states):
    col_name = f'state_{state}_prob'
    if col_name in result.columns:
        prob = current[col_name] * 100
        # Find regime name for this state (fall back to the index if never predicted)
        state_rows = result.loc[result['predicted_state'] == state, 'regime_name']
        state_regime = state_rows.iloc[0] if len(state_rows) > 0 else f"State {state}"
        print(f"  {state_regime:<15} {prob:>6.1f}%")

print("\n" + "="*70)

CURRENT REGIME STATUS

Date: 2025-10-10
Current Regime: Sideways
Confidence: 85.7%
Days in Regime: 22

State Probabilities:
  Bear              13.3%
  Sideways          85.7%
  Bull               1.1%



## 5. Regime Duration Analysis

How long do regimes typically last?

In [6]:
# Calculate regime durations
def calculate_regime_durations(df):
    """Calculate duration of each regime period"""
    durations = []
    current_regime = df.iloc[0]['regime_name']
    current_duration = 1
    
    for i in range(1, len(df)):
        if df.iloc[i]['regime_name'] == current_regime:
            current_duration += 1
        else:
            durations.append({
                'regime': current_regime,
                'duration': current_duration,
                'start': df.index[i-current_duration],
                'end': df.index[i-1]
            })
            current_regime = df.iloc[i]['regime_name']
            current_duration = 1
    
    # Add final period
    durations.append({
        'regime': current_regime,
        'duration': current_duration,
        'start': df.index[len(df)-current_duration],
        'end': df.index[-1]
    })
    
    return pd.DataFrame(durations)

durations_df = calculate_regime_durations(result)

print("REGIME DURATION ANALYSIS")
print("="*70)
print(f"\nTotal regime periods: {len(durations_df)}")
print(f"\nDuration statistics by regime:")
print("-"*70)

for regime in sorted(durations_df['regime'].unique()):
    regime_durs = durations_df[durations_df['regime'] == regime]['duration']
    
    print(f"\n{regime}:")
    print(f"  Count:    {len(regime_durs)} periods")
    print(f"  Average:  {regime_durs.mean():.1f} days")
    print(f"  Median:   {regime_durs.median():.0f} days")
    print(f"  Min:      {regime_durs.min()} days")
    print(f"  Max:      {regime_durs.max()} days")

print("\n" + "="*70)

# Show longest regimes
print("\nLongest regime periods:")
print("-"*70)
longest = durations_df.nlargest(5, 'duration')
for idx, row in longest.iterrows():
    print(f"  {row['regime']:<15} {row['duration']:>3} days  ({row['start'].date()} to {row['end'].date()})")

REGIME DURATION ANALYSIS

Total regime periods: 61

Duration statistics by regime:
----------------------------------------------------------------------

Bear:
  Count:    9 periods
  Average:  14.3 days
  Median:   13 days
  Min:      2 days
  Max:      40 days

Bull:
  Count:    26 periods
  Average:  3.0 days
  Median:   3 days
  Min:      1 days
  Max:      7 days

Sideways:
  Count:    26 periods
  Average:  11.3 days
  Median:   7 days
  Min:      2 days
  Max:      40 days


Longest regime periods:
----------------------------------------------------------------------
  Sideways         40 days  (2023-11-07 to 2024-01-04)
  Bear             40 days  (2025-02-24 to 2025-04-21)
  Sideways         39 days  (2025-07-16 to 2025-09-09)
  Sideways         30 days  (2024-11-08 to 2024-12-20)
  Sideways         22 days  (2025-09-11 to 2025-10-10)
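
The explicit loop in `calculate_regime_durations` is a run-length encoding of the label sequence. As a compact alternative, here is a sketch of the same idea using only the standard library; the function name `regime_episodes` and the sample labels are made up for illustration.

```python
from itertools import groupby


def regime_episodes(labels):
    """Collapse a label sequence into (regime, start_index, length) runs."""
    episodes = []
    start = 0
    for regime, run in groupby(labels):
        length = sum(1 for _ in run)  # consume the run to count its length
        episodes.append((regime, start, length))
        start += length
    return episodes


# Hypothetical daily labels
labels = ["Bull", "Bull", "Sideways", "Sideways", "Sideways", "Bear", "Bull"]
episodes = regime_episodes(labels)
n_switches = len(episodes) - 1  # a sequence with k episodes has k - 1 switches

for regime, start, length in episodes:
    print(f"{regime:<10} start={start} length={length}")
print("switches:", n_switches)
```

On a pandas Series `s`, the same grouping is commonly written as `s.groupby((s != s.shift()).cumsum())`, which avoids row-by-row `iloc` access on longer histories.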


## 6. Visualization

Comprehensive regime visualization using library tools.

In [None]:
# Use the analysis component's built-in plotting
print("Generating regime analysis visualization...")

fig = pipeline.analysis.plot()
plt.show()

print("\nVisualization complete")

## 7. Custom Visualization

Create custom plots using visualization tools.

In [None]:
# Create custom multi-panel visualization
from hidden_regime.visualization import RegimePlotter, get_regime_colors

# Get raw data
raw_data = pipeline.data.get_all_data()

# Handle case-insensitive column names
close_col = None
for col in raw_data.columns:
    if col.lower() == 'close':
        close_col = col
        break

if close_col is None:
    raise ValueError("Could not find 'close' column in data")

fig, axes = plt.subplots(3, 1, figsize=(16, 12), sharex=True)

# Define colors by regime type
# Get unique regime names and create color mapping
unique_regimes = sorted(result['regime_name'].unique())
color_list = get_regime_colors(len(unique_regimes), color_scheme="colorblind_safe")
regime_colors = dict(zip(unique_regimes, color_list))

# 1. Price with regimes
ax = axes[0]
ax.plot(raw_data.index, raw_data[close_col], linewidth=1.5, color='black', zorder=2)
ax.set_ylabel('Price ($)', fontsize=12)
ax.set_title(f'{ticker} Price with Detected Regimes', fontsize=14, fontweight='bold')
ax.grid(True, alpha=0.3)

# Shade by regime
for i in range(len(result)):
    regime = result.iloc[i]['regime_name']
    color = regime_colors.get(regime, 'gray')
    ax.axvspan(result.index[i], 
               result.index[min(i+1, len(result)-1)], 
               alpha=0.2, color=color, zorder=1)

# Add legend
from matplotlib.patches import Patch
legend_elements = [Patch(facecolor=regime_colors.get(r, 'gray'), alpha=0.3, label=r) 
                   for r in unique_regimes]
ax.legend(handles=legend_elements, loc='upper left')

# 2. Confidence levels
ax = axes[1]
ax.plot(result.index, result['confidence']*100, linewidth=1.5, color='blue')
ax.axhline(y=80, color='green', linestyle='--', alpha=0.5, label='High confidence')
ax.axhline(y=60, color='orange', linestyle='--', alpha=0.5, label='Medium confidence')
ax.set_ylabel('Confidence (%)', fontsize=12)
ax.set_title('Regime Detection Confidence', fontsize=14, fontweight='bold')
ax.grid(True, alpha=0.3)
ax.legend()
ax.set_ylim(0, 105)

# 3. Regime timeline
ax = axes[2]
# Create numeric mapping for regimes
regime_map = {r: i for i, r in enumerate(unique_regimes)}
regime_numeric = result['regime_name'].map(regime_map)

for i in range(len(result)):
    regime = result.iloc[i]['regime_name']
    color = regime_colors.get(regime, 'gray')
    ax.bar(result.index[i], 1, width=1, color=color, edgecolor='none', alpha=0.7)

ax.set_ylabel('Regime', fontsize=12)
ax.set_xlabel('Date', fontsize=12)
ax.set_title('Regime Sequence Timeline', fontsize=14, fontweight='bold')
ax.set_ylim(0, 1.2)
ax.set_yticks([])
ax.grid(True, alpha=0.3, axis='x')

plt.tight_layout()
plt.show()

print("\nCustom visualization complete")

## 8. Model Comparison

Compare different numbers of states.

In [None]:
# Compare 2, 3, 4, 5 states
print("Comparing models with different numbers of states...")
print("="*70)

comparison_results = []

for n in [2, 3, 4, 5]:
    print(f"\nTraining {n}-state model...")
    
    test_pipeline = hr.create_financial_pipeline(
        ticker=ticker,
        n_states=n,
        include_report=False
    )
    
    # Pipeline.update() returns a report string
    _ = test_pipeline.update()
    
    # Get the actual DataFrame from component outputs
    test_result = test_pipeline.component_outputs['analysis']
    
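    # Calculate metrics
    # Note: these are rough proxies, since the log-likelihood is not read here.
    # n_params counts raw entries: n*n transition probabilities plus a mean
    # and a variance per state (assumes univariate Gaussian emissions).
    n_params = n**2 + 2*n
    n_obs = len(test_result)
    
    # Count regime switches (diff() is NaN on the first row, which is not a switch)
    switches = int((test_result['predicted_state'].diff().fillna(0) != 0).sum())
    # Episodes = switches + 1, so this is the mean episode length
    avg_duration = n_obs / (switches + 1)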
    
    # Average confidence
    avg_conf = test_result['confidence'].mean() * 100
    
    comparison_results.append({
        'n_states': n,
        'n_params': n_params,
        'switches': switches,
        'avg_duration': avg_duration,
        'avg_confidence': avg_conf
    })
    
    print(f"  Parameters:     {n_params}")
    print(f"  Regime switches: {switches}")
    print(f"  Avg duration:   {avg_duration:.1f} days")
    print(f"  Avg confidence: {avg_conf:.1f}%")

print("\n" + "="*70)

# Create comparison dataframe
comp_df = pd.DataFrame(comparison_results)

print("\nModel Comparison Summary:")
print(comp_df.to_string(index=False))

print("\nObservations:")
print("   - More states = more parameters = higher risk of overfitting")
print("   - Very short durations suggest too many states")
print("   - Lower confidence suggests regime ambiguity")
print("   - 3-4 states often provide the best balance")
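
If the fitted log-likelihood is available, an information criterion gives a more principled comparison than switch counts alone. The following is a sketch only: it assumes univariate Gaussian emissions, and the log-likelihood values are hypothetical placeholders, not results from this notebook. How to read the log-likelihood from a `hidden_regime` model is not shown here.

```python
import math


def gaussian_hmm_free_params(n_states):
    """Free parameters of an HMM with univariate Gaussian emissions."""
    transitions = n_states * (n_states - 1)  # each row sums to 1
    initial = n_states - 1                   # initial distribution sums to 1
    emissions = 2 * n_states                 # one mean and one variance per state
    return transitions + initial + emissions


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical log-likelihoods per state count (placeholders for illustration)
log_likelihoods = {2: -1050.0, 3: -1020.0, 4: -1012.0, 5: -1008.0}
n_obs = 500

bics = {
    n: bic(ll, gaussian_hmm_free_params(n), n_obs)
    for n, ll in log_likelihoods.items()
}
best_n = min(bics, key=bics.get)

for n, value in bics.items():
    print(f"{n} states: k={gaussian_hmm_free_params(n):>2}  BIC={value:.1f}")
print("lowest BIC:", best_n, "states")
```

The penalty term grows roughly with the square of the state count, so a small gain in likelihood from an extra state has to be weighed against a quickly growing parameter count.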

## 9. Best Practices Summary

### ✓ DO

1. **Use log returns** for stationary observations
2. **Let data determine regimes** through threshold-based classification
3. **Check confidence levels** - low confidence indicates uncertainty
4. **Validate duration** - very short regimes suggest overfitting
5. **Compare models** - try 2-5 states and pick best balance
6. **Monitor regime changes** - large shifts may indicate market structure changes

### ⚠️ DON'T

1. **Don't use raw prices** - they're non-stationary
2. **Don't force Bull/Bear labels** by sorting state indices
3. **Don't ignore low confidence** - it's telling you something important
4. **Don't assume fixed regimes** - regime parameters can drift as markets evolve
5. **Don't overfit** - more states isn't always better
6. **Don't rely on a single random seed** - HMM training can converge to local optima
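
The last point can be illustrated without any HMM code. The sketch below uses a toy hill climber on a one-dimensional objective with two peaks as a stand-in for likelihood maximization: where a run ends up depends on where its seed places the starting point. All names (`toy_objective`, `hill_climb`, `best_of_restarts`) are hypothetical; with a real model, `fit` would train the HMM with the given seed and return its log-likelihood.

```python
import random


def toy_objective(x):
    """Two peaks: a local maximum of 1 near x = -1 and a global maximum of 2 near x = 2."""
    return max(1.0 - (x + 1.0) ** 2, 2.0 - (x - 2.0) ** 2)


def hill_climb(seed, steps=200, step_size=0.05):
    """Greedy local search from a seed-dependent start; returns (x, score)."""
    rng = random.Random(seed)
    x = rng.uniform(-3.0, 4.0)
    for _ in range(steps):
        for candidate in (x - step_size, x + step_size):
            if toy_objective(candidate) > toy_objective(x):
                x = candidate
    return x, toy_objective(x)


def best_of_restarts(fit, seeds):
    """Run fit(seed) -> (solution, score) for each seed and keep the best score."""
    runs = [(seed, *fit(seed)) for seed in seeds]
    return max(runs, key=lambda run: run[2])


best_seed, best_x, best_score = best_of_restarts(hill_climb, range(10))
print(f"best seed={best_seed}, x={best_x:.2f}, score={best_score:.3f}")
```

The same pattern applies to HMM fitting: train once per seed, record the log-likelihood of each fit, and keep the highest one instead of trusting a single run.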

### Trading Applications

**Position Sizing**:
- Increase exposure in positive regimes with high confidence
- Reduce positions in negative regimes or low confidence periods

**Risk Management**:
- Scale risk based on regime volatility
- Add buffers during regime transitions (lower confidence)

**Strategy Selection**:
- Trend-following in directional regimes
- Mean-reversion in sideways regimes
- Risk-off in crisis regimes
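
As one way to turn the position sizing and risk management points into a rule, here is a toy sketch that targets a fixed daily volatility and scales the result by regime confidence. Everything in it is an assumption for illustration: the function name, the 1% volatility target, the leverage cap, and the 0.6 confidence floor (chosen to echo the 60% "Medium confidence" line in the plot above). It is not a recommendation and not part of the library.

```python
def position_size(confidence, expected_daily_vol, expected_return,
                  target_daily_vol=0.01, min_confidence=0.6, max_leverage=1.0):
    """Toy sizing rule: volatility-targeted exposure scaled by confidence.

    Returns a fraction of capital in [0, max_leverage]. Stays flat when the
    regime's expected return is not positive or the confidence is too low.
    """
    if confidence < min_confidence:
        return 0.0
    if expected_return <= 0 or expected_daily_vol <= 0:
        return 0.0
    raw = target_daily_vol / expected_daily_vol  # higher regime vol -> smaller position
    return min(max_leverage, raw * confidence)


# Hypothetical regime readings: (confidence, expected daily vol, expected daily return)
calm_bull = position_size(0.90, 0.005, 0.001)
volatile_bull = position_size(0.90, 0.020, 0.001)
uncertain = position_size(0.50, 0.010, 0.001)
bear = position_size(0.95, 0.030, -0.002)

print(calm_bull, volatile_bull, uncertain, bear)
```

In the analysis DataFrame, the `confidence`, `expected_volatility`, and `expected_return` columns look like natural inputs for a rule of this shape, though their units should be checked before use.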

---

## Conclusion

This notebook demonstrated the **complete hidden-regime toolkit**:

1. ✓ Quick pipeline setup with factory functions
2. ✓ Threshold-based regime classification (data-driven)
3. ✓ Current regime status and confidence
4. ✓ Duration analysis and statistics
5. ✓ Comprehensive visualization
6. ✓ Model comparison and selection
7. ✓ Best practices for production

**You now have production-ready regime detection!**

---

*For more examples and advanced features, see the documentation at hiddenregime.com*