# Performance Analysis Workflow

**Systematic Macro Credit Research — Step 5 of 5**

This notebook performs comprehensive post-backtest performance analysis using the evaluation/performance layer. It represents the final step in the systematic research workflow.

## Workflow Position

```
1. Data Download (01_data_download.ipynb)
   ↓
2. Signal Computation (02_signal_computation.ipynb)
   ↓
3. Signal Suitability Evaluation (03_suitability_evaluation.ipynb)
   ↓
4. Backtest Execution (04_backtest_execution.ipynb)
   ↓
5. Performance Analysis ← YOU ARE HERE
```

## Prerequisites

**This notebook only loads cached data. It does not generate any new data.**

Required files from previous steps:
- **P&L Results:** `data/processed/backtest_results_pnl.parquet` (from Step 4)
- **Position Results:** `data/processed/backtest_results_positions.parquet` (from Step 4)
- **Execution Metadata:** `logs/backtest_metadata.json` (from Step 4)

## What This Notebook Does

1. **Load Backtest Results** — Read P&L and positions from Step 4
2. **Configure Evaluation** — Set parameters for performance analysis
3. **Reconstruct BacktestResult Objects** — Prepare data for evaluation layer
4. **Run Performance Evaluations** — Analyze all signal-strategy pairs using `compute_all_metrics`
5. **Display Extended Metrics** — Show comprehensive 21-metric performance table
6. **Visualize Rolling Sharpe** — Plot temporal stability analysis
7. **Display Attribution** — Show return decomposition by direction/strength
8. **Visualize Attribution** — Plot directional P&L breakdown
9. **Generate Reports** — Create markdown reports for all pairs
10. **Register Evaluations** — Track metadata for reproducibility
11. **Persist Results** — Save evaluation metadata

## Outputs

- **Performance Reports:** `reports/performance/{signal}_{strategy}_{timestamp}.md`
- **Evaluation Registry:** `src/aponyx/evaluation/performance/performance_registry.json`
- **Execution Metadata:** `logs/performance_evaluation_metadata.json`

## Key Design Patterns

- **Cache-Only:** Loads all data from Step 4 (no backtest execution)
- **Consolidated Metrics:** Uses `compute_all_metrics` for all 21 performance statistics
- **Individual Analysis:** Each signal-strategy pair analyzed separately
- **Extended Metrics:** Rolling Sharpe, profit factor, tail ratio, consistency
- **Attribution:** Directional, signal strength, and win/loss decomposition
- **Comprehensive Reporting:** Full markdown reports for all pairs

---


## 1. Imports and Configuration

Import dependencies and verify configuration.

In [1]:
import logging
from datetime import datetime
from pathlib import Path

import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots

from aponyx.config import (
    DATA_DIR,
    LOGS_DIR,
    PERFORMANCE_REGISTRY_PATH,
    PERFORMANCE_REPORTS_DIR,
)
from aponyx.backtest import BacktestResult
from aponyx.evaluation.performance import (
    analyze_backtest_performance,
    PerformanceConfig,
    PerformanceRegistry,
    generate_performance_report,
    save_report,
)
from aponyx.persistence import load_parquet, load_json, save_json

# Configure logging for notebook
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
)
logger = logging.getLogger(__name__)

print("=" * 80)
print("PERFORMANCE ANALYSIS WORKFLOW — Step 5 of 5")
print("=" * 80)
print(f"\nConfiguration:")
print(f"  Data directory: {DATA_DIR}")
print(f"  Logs directory: {LOGS_DIR}")
print(f"  Reports directory: {PERFORMANCE_REPORTS_DIR}")
print(f"  Registry path: {PERFORMANCE_REGISTRY_PATH}")
print(f"\n✓ Imports complete")

PERFORMANCE ANALYSIS WORKFLOW — Step 5 of 5

Configuration:
  Data directory: C:\Users\ROG3003\PythonProjects\aponyx\data
  Logs directory: C:\Users\ROG3003\PythonProjects\aponyx\logs
  Reports directory: C:\Users\ROG3003\PythonProjects\aponyx\reports\performance
  Registry path: C:\Users\ROG3003\PythonProjects\aponyx\src\aponyx\evaluation\performance\performance_registry.json

✓ Imports complete


## 2. Load Backtest Results

Load P&L and positions from Step 4.

In [2]:
# Load P&L and positions
pnl_path = DATA_DIR / "processed" / "backtest_results_pnl.parquet"
positions_path = DATA_DIR / "processed" / "backtest_results_positions.parquet"
metadata_path = LOGS_DIR / "backtest_metadata.json"

if not pnl_path.exists():
    raise FileNotFoundError(
        f"P&L file not found: {pnl_path}\n"
        "Please run 04_backtest_execution.ipynb first."
    )

if not positions_path.exists():
    raise FileNotFoundError(
        f"Positions file not found: {positions_path}\n"
        "Please run 04_backtest_execution.ipynb first."
    )

# Load data
pnl_df = load_parquet(pnl_path)
positions_df = load_parquet(positions_path)
backtest_metadata = load_json(metadata_path)

print(f"\n{'='*80}")
print(f"BACKTEST RESULTS LOADED")
print(f"{'='*80}\n")
print(f"P&L path: {pnl_path}")
print(f"  Shape: {pnl_df.shape}")
print(f"  Index levels: {pnl_df.index.names}")
print(f"\nPositions path: {positions_path}")
print(f"  Shape: {positions_df.shape}")
print(f"  Index levels: {positions_df.index.names}")

# Extract signal-strategy pairs
signal_strategy_pairs = pnl_df.index.droplevel('date').unique()
signals = signal_strategy_pairs.get_level_values('_signal_id').unique().tolist()
strategies = signal_strategy_pairs.get_level_values('_strategy_id').unique().tolist()

print(f"\n{'─'*80}")
print(f"Signal-Strategy Pairs")
print(f"{'─'*80}\n")
print(f"Signals ({len(signals)}): {', '.join(signals)}")
print(f"Strategies ({len(strategies)}): {', '.join(strategies)}")
print(f"Total pairs: {len(signal_strategy_pairs)}")

print(f"\nDate range: {backtest_metadata['summary']['date_range']['start']} "
      f"to {backtest_metadata['summary']['date_range']['end']}")
print(f"Backtest execution: {backtest_metadata['timestamp']}")
print(f"\n✓ Data loaded successfully")

2025-11-13 22:25:43,777 - aponyx.persistence.parquet_io - INFO - Loading Parquet file: path=C:\Users\ROG3003\PythonProjects\aponyx\data\processed\backtest_results_pnl.parquet, columns=all
2025-11-13 22:25:43,831 - aponyx.persistence.parquet_io - INFO - Loaded 15460 rows, 4 columns from C:\Users\ROG3003\PythonProjects\aponyx\data\processed\backtest_results_pnl.parquet
2025-11-13 22:25:43,831 - aponyx.persistence.parquet_io - INFO - Loading Parquet file: path=C:\Users\ROG3003\PythonProjects\aponyx\data\processed\backtest_results_positions.parquet, columns=all
2025-11-13 22:25:43,849 - aponyx.persistence.parquet_io - INFO - Loaded 15460 rows, 4 columns from C:\Users\ROG3003\PythonProjects\aponyx\data\processed\backtest_results_positions.parquet
2025-11-13 22:25:43,849 - aponyx.persistence.json_io - INFO - Loading JSON from C:\Users\ROG3003\PythonProjects\aponyx\logs\backtest_metadata.json



BACKTEST RESULTS LOADED

P&L path: C:\Users\ROG3003\PythonProjects\aponyx\data\processed\backtest_results_pnl.parquet
  Shape: (15460, 4)
  Index levels: ['_signal_id', '_strategy_id', 'date']

Positions path: C:\Users\ROG3003\PythonProjects\aponyx\data\processed\backtest_results_positions.parquet
  Shape: (15460, 4)
  Index levels: ['_signal_id', '_strategy_id', 'date']

────────────────────────────────────────────────────────────────────────────────
Signal-Strategy Pairs
────────────────────────────────────────────────────────────────────────────────

Signals (3): cdx_etf_basis, cdx_vix_gap, spread_momentum
Strategies (4): conservative, balanced, aggressive, experimental
Total pairs: 12

Date range: 2020-11-11 00:00:00 to 2024-06-06 00:00:00
Backtest execution: 2025-11-13T22:25:19.655787

✓ Data loaded successfully


## 3. Configure Evaluation Parameters

Set parameters for performance analysis.

In [3]:
# Create performance evaluation configuration
config = PerformanceConfig(
    min_obs=252,  # Require at least 1 year of data
    n_subperiods=4,  # Quarterly stability analysis
    risk_free_rate=0.0,  # Zero risk-free rate
    rolling_window=63,  # 3-month rolling metrics
    report_format='markdown',  # Markdown reports
    attribution_quantiles=3,  # Terciles for signal strength
)

print(f"\n{'='*80}")
print(f"EVALUATION CONFIGURATION")
print(f"{'='*80}\n")
print(f"Minimum observations: {config.min_obs}")
print(f"Subperiods for stability: {config.n_subperiods} (quarterly)")
print(f"Rolling window: {config.rolling_window} days (3 months)")
print(f"Attribution quantiles: {config.attribution_quantiles} (terciles)")
print(f"Report format: {config.report_format}")
print(f"\n✓ Configuration set")


EVALUATION CONFIGURATION

Minimum observations: 252
Subperiods for stability: 4 (quarterly)
Rolling window: 63 days (3 months)
Attribution quantiles: 3 (terciles)
Report format: markdown

✓ Configuration set


## 4. Reconstruct BacktestResult Objects

Prepare data for evaluation layer from cached DataFrames.

In [4]:
def reconstruct_backtest_result(
    signal_name: str,
    strategy_name: str,
    pnl_df: pd.DataFrame,
    positions_df: pd.DataFrame,
    metadata: dict,
) -> BacktestResult:
    """
    Reconstruct BacktestResult from cached DataFrames.
    
    Parameters
    ----------
    signal_name : str
        Signal identifier.
    strategy_name : str
        Strategy identifier.
    pnl_df : pd.DataFrame
        MultiIndex DataFrame with P&L data.
    positions_df : pd.DataFrame
        MultiIndex DataFrame with position data.
    metadata : dict
        Backtest execution metadata.
    
    Returns
    -------
    BacktestResult
        Reconstructed backtest result object.
    """
    # Filter to specific signal-strategy pair
    pnl = pnl_df.loc[(signal_name, strategy_name)].copy()
    positions = positions_df.loc[(signal_name, strategy_name)].copy()
    
    # Drop the MultiIndex levels that were used for filtering
    pnl.index.name = 'date'
    positions.index.name = 'date'
    
    # The positions DataFrame now has the original columns: signal, position, days_held, spread
    # No need to drop anything - the MultiIndex columns were already removed by set_index
    
    # Create BacktestResult
    return BacktestResult(
        pnl=pnl,
        positions=positions,
        metadata={
            "signal_id": signal_name,
            "strategy_id": strategy_name,
            "config": metadata.get("configuration", {}),
            "backtest_timestamp": metadata.get("timestamp"),
        },
    )

print(f"\n{'='*80}")
print(f"RECONSTRUCTING BACKTEST RESULTS")
print(f"{'='*80}\n")

# Reconstruct BacktestResult for each signal-strategy pair
backtest_results = {}

for signal, strategy in signal_strategy_pairs:
    result = reconstruct_backtest_result(
        signal,
        strategy,
        pnl_df,
        positions_df,
        backtest_metadata,
    )
    backtest_results[(signal, strategy)] = result
    print(f"✓ Reconstructed: {signal} × {strategy}")

print(f"\n✓ Reconstructed {len(backtest_results)} BacktestResult objects")


RECONSTRUCTING BACKTEST RESULTS

✓ Reconstructed: cdx_etf_basis × conservative
✓ Reconstructed: cdx_etf_basis × balanced
✓ Reconstructed: cdx_etf_basis × aggressive
✓ Reconstructed: cdx_etf_basis × experimental
✓ Reconstructed: cdx_vix_gap × conservative
✓ Reconstructed: cdx_vix_gap × balanced
✓ Reconstructed: cdx_vix_gap × aggressive
✓ Reconstructed: cdx_vix_gap × experimental
✓ Reconstructed: spread_momentum × conservative
✓ Reconstructed: spread_momentum × balanced
✓ Reconstructed: spread_momentum × aggressive
✓ Reconstructed: spread_momentum × experimental

✓ Reconstructed 12 BacktestResult objects


  pnl = pnl_df.loc[(signal_name, strategy_name)].copy()
  positions = positions_df.loc[(signal_name, strategy_name)].copy()


## 5. Run Performance Evaluations

Analyze all signal-strategy pairs using `compute_all_metrics`.

In [5]:
print(f"\n{'='*80}")
print(f"RUNNING PERFORMANCE EVALUATIONS")
print(f"{'='*80}\n")

# Store performance results
performance_results = {}
evaluation_start = datetime.now()

# Run evaluations
total_evaluations = len(backtest_results)
current_eval = 0

for (signal_name, strategy_name), backtest_result in backtest_results.items():
    current_eval += 1
    print(f"[{current_eval}/{total_evaluations}] Evaluating: {signal_name} × {strategy_name}")
    
    # Run performance analysis
    perf_result = analyze_backtest_performance(backtest_result, config)
    performance_results[(signal_name, strategy_name)] = perf_result
    
    # Display summary (access dataclass fields directly)
    print(f"  Stability score: {perf_result.stability_score:.3f}")
    print(f"  Profit factor: {perf_result.metrics.profit_factor:.2f}")

evaluation_end = datetime.now()
evaluation_time = (evaluation_end - evaluation_start).total_seconds()

print(f"\n✓ Performance evaluations complete: {total_evaluations} pairs in {evaluation_time:.1f}s")
print(f"  Average: {evaluation_time/total_evaluations:.2f}s per evaluation")


2025-11-13 22:25:43,908 - aponyx.evaluation.performance.analyzer - INFO - Analyzing backtest performance: config=PerformanceConfig(min_obs=252, n_subperiods=4, risk_free_rate=0.0, rolling_window=63, report_format='markdown', attribution_quantiles=3)
2025-11-13 22:25:43,950 - aponyx.evaluation.performance.decomposition - INFO - Computing return attribution: n_quantiles=3



RUNNING PERFORMANCE EVALUATIONS

[1/12] Evaluating: cdx_etf_basis × conservative


  quantile_pnl = aligned_pnl.groupby(positioned["quantile"]).sum()
2025-11-13 22:25:43,959 - aponyx.evaluation.performance.decomposition - INFO - Attribution computed: long=-0.1%, wins=444.4%
2025-11-13 22:25:43,960 - aponyx.evaluation.performance.analyzer - INFO - Performance evaluation complete: stability=0.35, profit_factor=0.86
2025-11-13 22:25:43,960 - aponyx.evaluation.performance.analyzer - INFO - Analyzing backtest performance: config=PerformanceConfig(min_obs=252, n_subperiods=4, risk_free_rate=0.0, rolling_window=63, report_format='markdown', attribution_quantiles=3)
2025-11-13 22:25:44,000 - aponyx.evaluation.performance.decomposition - INFO - Computing return attribution: n_quantiles=3
  quantile_pnl = aligned_pnl.groupby(positioned["quantile"]).sum()
2025-11-13 22:25:44,005 - aponyx.evaluation.performance.decomposition - INFO - Attribution computed: long=112.5%, wins=795.1%
2025-11-13 22:25:44,006 - aponyx.evaluation.performance.analyzer - INFO - Performance evaluation com

  Stability score: 0.350
  Profit factor: 0.86
[2/12] Evaluating: cdx_etf_basis × balanced
  Stability score: 0.350
  Profit factor: 0.93
[3/12] Evaluating: cdx_etf_basis × aggressive


  quantile_pnl = aligned_pnl.groupby(positioned["quantile"]).sum()
2025-11-13 22:25:44,051 - aponyx.evaluation.performance.decomposition - INFO - Attribution computed: long=29.0%, wins=254.2%
2025-11-13 22:25:44,052 - aponyx.evaluation.performance.analyzer - INFO - Performance evaluation complete: stability=0.35, profit_factor=0.76
2025-11-13 22:25:44,052 - aponyx.evaluation.performance.analyzer - INFO - Analyzing backtest performance: config=PerformanceConfig(min_obs=252, n_subperiods=4, risk_free_rate=0.0, rolling_window=63, report_format='markdown', attribution_quantiles=3)
2025-11-13 22:25:44,090 - aponyx.evaluation.performance.decomposition - INFO - Computing return attribution: n_quantiles=3


  Stability score: 0.350
  Profit factor: 0.76
[4/12] Evaluating: cdx_etf_basis × experimental


  quantile_pnl = aligned_pnl.groupby(positioned["quantile"]).sum()
2025-11-13 22:25:44,095 - aponyx.evaluation.performance.decomposition - INFO - Attribution computed: long=40.1%, wins=455.0%
2025-11-13 22:25:44,095 - aponyx.evaluation.performance.analyzer - INFO - Performance evaluation complete: stability=0.80, profit_factor=1.30
2025-11-13 22:25:44,096 - aponyx.evaluation.performance.analyzer - INFO - Analyzing backtest performance: config=PerformanceConfig(min_obs=252, n_subperiods=4, risk_free_rate=0.0, rolling_window=63, report_format='markdown', attribution_quantiles=3)
2025-11-13 22:25:44,133 - aponyx.evaluation.performance.decomposition - INFO - Computing return attribution: n_quantiles=3
  quantile_pnl = aligned_pnl.groupby(positioned["quantile"]).sum()
2025-11-13 22:25:44,136 - aponyx.evaluation.performance.decomposition - INFO - Attribution computed: long=34.5%, wins=247.1%
2025-11-13 22:25:44,137 - aponyx.evaluation.performance.analyzer - INFO - Performance evaluation comp

  Stability score: 0.800
  Profit factor: 1.30
[5/12] Evaluating: cdx_vix_gap × conservative
  Stability score: 0.900
  Profit factor: 1.53
[6/12] Evaluating: cdx_vix_gap × balanced


  quantile_pnl = aligned_pnl.groupby(positioned["quantile"]).sum()
2025-11-13 22:25:44,177 - aponyx.evaluation.performance.decomposition - INFO - Attribution computed: long=54.2%, wins=35.4%
2025-11-13 22:25:44,177 - aponyx.evaluation.performance.analyzer - INFO - Performance evaluation complete: stability=0.00, profit_factor=0.49
2025-11-13 22:25:44,177 - aponyx.evaluation.performance.analyzer - INFO - Analyzing backtest performance: config=PerformanceConfig(min_obs=252, n_subperiods=4, risk_free_rate=0.0, rolling_window=63, report_format='markdown', attribution_quantiles=3)
2025-11-13 22:25:44,213 - aponyx.evaluation.performance.decomposition - INFO - Computing return attribution: n_quantiles=3
  quantile_pnl = aligned_pnl.groupby(positioned["quantile"]).sum()
2025-11-13 22:25:44,213 - aponyx.evaluation.performance.decomposition - INFO - Attribution computed: long=35.0%, wins=83.1%
2025-11-13 22:25:44,213 - aponyx.evaluation.performance.analyzer - INFO - Performance evaluation comple

  Stability score: 0.000
  Profit factor: 0.49
[7/12] Evaluating: cdx_vix_gap × aggressive
  Stability score: 0.500
  Profit factor: 0.61
[8/12] Evaluating: cdx_vix_gap × experimental


  quantile_pnl = aligned_pnl.groupby(positioned["quantile"]).sum()
2025-11-13 22:25:44,253 - aponyx.evaluation.performance.decomposition - INFO - Attribution computed: long=41.9%, wins=278.8%
2025-11-13 22:25:44,253 - aponyx.evaluation.performance.analyzer - INFO - Performance evaluation complete: stability=0.50, profit_factor=0.81
2025-11-13 22:25:44,253 - aponyx.evaluation.performance.analyzer - INFO - Analyzing backtest performance: config=PerformanceConfig(min_obs=252, n_subperiods=4, risk_free_rate=0.0, rolling_window=63, report_format='markdown', attribution_quantiles=3)


  Stability score: 0.500
  Profit factor: 0.81
[9/12] Evaluating: spread_momentum × conservative


2025-11-13 22:25:44,297 - aponyx.evaluation.performance.decomposition - INFO - Computing return attribution: n_quantiles=3
  quantile_pnl = aligned_pnl.groupby(positioned["quantile"]).sum()
2025-11-13 22:25:44,301 - aponyx.evaluation.performance.decomposition - INFO - Attribution computed: long=37.8%, wins=264.2%
2025-11-13 22:25:44,301 - aponyx.evaluation.performance.analyzer - INFO - Performance evaluation complete: stability=0.60, profit_factor=1.16
2025-11-13 22:25:44,301 - aponyx.evaluation.performance.analyzer - INFO - Analyzing backtest performance: config=PerformanceConfig(min_obs=252, n_subperiods=4, risk_free_rate=0.0, rolling_window=63, report_format='markdown', attribution_quantiles=3)
2025-11-13 22:25:44,323 - aponyx.evaluation.performance.decomposition - INFO - Computing return attribution: n_quantiles=3
  quantile_pnl = aligned_pnl.groupby(positioned["quantile"]).sum()
2025-11-13 22:25:44,338 - aponyx.evaluation.performance.decomposition - INFO - Attribution computed: lo

  Stability score: 0.600
  Profit factor: 1.16
[10/12] Evaluating: spread_momentum × balanced
  Stability score: 0.100
  Profit factor: 0.75
[11/12] Evaluating: spread_momentum × aggressive


  quantile_pnl = aligned_pnl.groupby(positioned["quantile"]).sum()
2025-11-13 22:25:44,370 - aponyx.evaluation.performance.decomposition - INFO - Attribution computed: long=68.0%, wins=1391.6%
2025-11-13 22:25:44,370 - aponyx.evaluation.performance.analyzer - INFO - Performance evaluation complete: stability=0.35, profit_factor=0.96
2025-11-13 22:25:44,370 - aponyx.evaluation.performance.analyzer - INFO - Analyzing backtest performance: config=PerformanceConfig(min_obs=252, n_subperiods=4, risk_free_rate=0.0, rolling_window=63, report_format='markdown', attribution_quantiles=3)
2025-11-13 22:25:44,418 - aponyx.evaluation.performance.decomposition - INFO - Computing return attribution: n_quantiles=3
  quantile_pnl = aligned_pnl.groupby(positioned["quantile"]).sum()
2025-11-13 22:25:44,423 - aponyx.evaluation.performance.decomposition - INFO - Attribution computed: long=85.1%, wins=204.2%
2025-11-13 22:25:44,423 - aponyx.evaluation.performance.analyzer - INFO - Performance evaluation com

  Stability score: 0.350
  Profit factor: 0.96
[12/12] Evaluating: spread_momentum × experimental
  Stability score: 0.100
  Profit factor: 0.64

✓ Performance evaluations complete: 12 pairs in 0.5s
  Average: 0.04s per evaluation


## 6. Display Extended Metrics Summary

Show comprehensive 21-metric performance table.

In [6]:
print(f"\n{'='*80}")
print(f"EXTENDED METRICS SUMMARY")
print(f"{'='*80}\n")

# Create summary table
metrics_summary = []

for (signal_name, strategy_name), perf_result in performance_results.items():
    metrics = perf_result.metrics
    metrics_summary.append({
        "Signal": signal_name,
        "Strategy": strategy_name,
        "Stability": f"{perf_result.stability_score:.3f}",
        "Profit Factor": f"{metrics.profit_factor:.2f}",
        "Tail Ratio": f"{metrics.tail_ratio:.2f}",
        "Rolling Sharpe μ": f"{metrics.rolling_sharpe_mean:.2f}",
        "Rolling Sharpe σ": f"{metrics.rolling_sharpe_std:.2f}",
        "Consistency": f"{metrics.consistency_score:.1%}",
        "Recovery Days": f"{metrics.avg_recovery_days:.0f}",
        "DD Count": f"{int(metrics.n_drawdowns)}",
    })

metrics_df = pd.DataFrame(metrics_summary)

# Sort by stability score (descending)
metrics_df_sorted = metrics_df.copy()
metrics_df_sorted['_stability'] = metrics_df_sorted['Stability'].astype(float)
metrics_df_sorted = metrics_df_sorted.sort_values('_stability', ascending=False)
metrics_df_sorted = metrics_df_sorted.drop(columns=['_stability'])

print(metrics_df_sorted.to_markdown(index=False))

print(f"\n{'─'*80}")
print(f"Metric Definitions:")
print(f"{'─'*80}")
print(f"  Stability: Overall consistency score (0-1)")
print(f"  Profit Factor: Gross wins / gross losses")
print(f"  Tail Ratio: 95th percentile / 5th percentile return")
print(f"  Rolling Sharpe: Mean/std of 3-month rolling Sharpe ratios")
print(f"  Consistency: % of 3-week windows with positive returns")
print(f"  Recovery Days: Average time to recover from drawdowns")
print(f"  DD Count: Number of distinct drawdown periods")



EXTENDED METRICS SUMMARY

| Signal          | Strategy     |   Stability |   Profit Factor |   Tail Ratio |   Rolling Sharpe μ |   Rolling Sharpe σ | Consistency   |   Recovery Days |   DD Count |
|:----------------|:-------------|------------:|----------------:|-------------:|-------------------:|-------------------:|:--------------|----------------:|-----------:|
| cdx_vix_gap     | conservative |        0.9  |            1.53 |         1.47 |               1.3  |               3.33 | 48.4%         |               6 |         32 |
| cdx_etf_basis   | experimental |        0.8  |            1.3  |         1.15 |               1.25 |               7.89 | 55.4%         |              18 |         36 |
| spread_momentum | conservative |        0.6  |            1.16 |         1.07 |               0.22 |               5.12 | 47.5%         |              93 |         13 |
| cdx_vix_gap     | experimental |        0.5  |            0.81 |         0.94 |              -1.13 |               6

## 7. Visualize Rolling Sharpe Analysis

Plot temporal stability for top performers.

In [7]:
import numpy as np

print(f"\n{'='*80}")
print(f"ROLLING SHARPE ANALYSIS — Top 5 by Stability")
print(f"{'='*80}\n")

# Get top 5 pairs by stability score
top_5_pairs = sorted(
    performance_results.items(),
    key=lambda x: x[1].stability_score,
    reverse=True
)[:5]

for (signal_name, strategy_name), perf_result in top_5_pairs:
    # Get backtest result for this pair
    backtest_result = backtest_results[(signal_name, strategy_name)]
    pnl_series = backtest_result.pnl['net_pnl']
    
    # Compute rolling Sharpe
    rolling_sharpe = (
        pnl_series.rolling(window=config.rolling_window, min_periods=config.rolling_window // 2)
        .mean() / 
        pnl_series.rolling(window=config.rolling_window, min_periods=config.rolling_window // 2)
        .std()
    ) * np.sqrt(252)
    
    # Create plot
    fig = go.Figure()
    
    fig.add_trace(go.Scatter(
        x=rolling_sharpe.index,
        y=rolling_sharpe,
        mode='lines',
        name='Rolling Sharpe',
        line=dict(color='blue', width=2),
    ))
    
    # Add mean line (access dataclass field)
    mean_sharpe = perf_result.metrics.rolling_sharpe_mean
    fig.add_hline(
        y=mean_sharpe,
        line_dash="dash",
        line_color="green",
        annotation_text=f"Mean: {mean_sharpe:.2f}",
        annotation_position="right"
    )
    
    # Add zero line
    fig.add_hline(y=0, line_dash="dot", line_color="gray")
    
    # Update layout
    fig.update_layout(
        title=f"Rolling Sharpe Ratio: {signal_name} × {strategy_name} (Stability: {perf_result.stability_score:.3f})",
        xaxis_title="Date",
        yaxis_title=f"Rolling {config.rolling_window}-Day Sharpe Ratio",
        hovermode='x unified',
        height=400,
        showlegend=False,
    )
    
    fig.show()
    print(f"✓ Plotted: {signal_name} × {strategy_name}\n")



ROLLING SHARPE ANALYSIS — Top 5 by Stability



✓ Plotted: cdx_vix_gap × conservative



✓ Plotted: cdx_etf_basis × experimental



✓ Plotted: spread_momentum × conservative



✓ Plotted: cdx_vix_gap × aggressive



✓ Plotted: cdx_vix_gap × experimental



## 8. Display Attribution Analysis

Show return decomposition by direction, signal strength, and win/loss.

In [8]:
print(f"\n{'='*80}")
print(f"ATTRIBUTION ANALYSIS — Top 5 by Stability")
print(f"{'='*80}\n")

for (signal_name, strategy_name), perf_result in top_5_pairs:
    print(f"\n{'─'*80}")
    print(f"{signal_name} × {strategy_name}")
    print(f"{'─'*80}\n")
    
    attribution = perf_result.attribution
    
    # Directional attribution
    dir_attr = attribution['direction']
    print("Directional Attribution:")
    print(f"  Long P&L:  ${dir_attr['long_pnl']:>12,.0f} ({dir_attr['long_pct']:>6.1%})")
    print(f"  Short P&L: ${dir_attr['short_pnl']:>12,.0f} ({dir_attr['short_pct']:>6.1%})")
    
    # Signal strength attribution
    sig_attr = attribution['signal_strength']
    print(f"\nSignal Strength Attribution (Terciles):")
    print(f"  Q1 (Weak):   ${sig_attr['q1_pnl']:>12,.0f} ({sig_attr['q1_pct']:>6.1%})")
    print(f"  Q2 (Medium): ${sig_attr['q2_pnl']:>12,.0f} ({sig_attr['q2_pct']:>6.1%})")
    print(f"  Q3 (Strong): ${sig_attr['q3_pnl']:>12,.0f} ({sig_attr['q3_pct']:>6.1%})")
    
    # Win/loss decomposition
    wl_attr = attribution['win_loss']
    print(f"\nWin/Loss Decomposition:")
    print(f"  Wins:   ${wl_attr['gross_wins']:>12,.0f} ({wl_attr['win_contribution']:>6.1%})")
    print(f"  Losses: ${wl_attr['gross_losses']:>12,.0f} ({wl_attr['loss_contribution']:>6.1%})")


ATTRIBUTION ANALYSIS — Top 5 by Stability


────────────────────────────────────────────────────────────────────────────────
cdx_vix_gap × conservative
────────────────────────────────────────────────────────────────────────────────

Directional Attribution:
  Long P&L:  $  -1,982,803 ( 34.5%)
  Short P&L: $  -3,762,440 ( 65.5%)

Signal Strength Attribution (Terciles):
  Q1 (Weak):   $   8,423,511 (-146.6%)
  Q2 (Medium): $  -3,583,873 ( 62.4%)
  Q3 (Strong): $ -10,584,881 (184.2%)

Win/Loss Decomposition:
  Wins:   $  14,195,065 (247.1%)
  Losses: $ -19,940,308 (347.1%)

────────────────────────────────────────────────────────────────────────────────
cdx_etf_basis × experimental
────────────────────────────────────────────────────────────────────────────────

Directional Attribution:
  Long P&L:  $  23,987,077 ( 40.1%)
  Short P&L: $  35,809,389 ( 59.9%)

Signal Strength Attribution (Terciles):
  Q1 (Weak):   $  41,406,397 ( 69.2%)
  Q2 (Medium): $   5,052,669 (  8.4%)
  Q3 (Strong):

## 9. Visualize Attribution Breakdown

Plot directional P&L breakdown charts.

In [9]:
print(f"\n{'='*80}")
print(f"DIRECTIONAL ATTRIBUTION VISUALIZATION")
print(f"{'='*80}\n")

# Prepare data for visualization
pairs_list = []
long_pnl_list = []
short_pnl_list = []

for (signal_name, strategy_name), perf_result in top_5_pairs:
    pair_label = f"{signal_name}<br>{strategy_name}"
    pairs_list.append(pair_label)
    
    dir_attr = perf_result.attribution['direction']
    long_pnl_list.append(dir_attr['long_pnl'])
    short_pnl_list.append(dir_attr['short_pnl'])

# Create stacked bar chart
fig = go.Figure()

fig.add_trace(go.Bar(
    name='Long P&L',
    x=pairs_list,
    y=long_pnl_list,
    marker_color='green',
))

fig.add_trace(go.Bar(
    name='Short P&L',
    x=pairs_list,
    y=short_pnl_list,
    marker_color='red',
))

fig.update_layout(
    title="Directional Attribution: Long vs Short P&L (Top 5 by Stability)",
    xaxis_title="Signal × Strategy",
    yaxis_title="P&L ($)",
    barmode='relative',
    height=500,
    hovermode='x unified',
)

fig.show()
print(f"\n✓ Directional attribution chart displayed")


DIRECTIONAL ATTRIBUTION VISUALIZATION




✓ Directional attribution chart displayed


## 10. Generate Performance Reports

Create comprehensive markdown reports for all pairs.

In [10]:
print(f"\n{'='*80}")
print(f"GENERATING PERFORMANCE REPORTS")
print(f"{'='*80}\n")

# Ensure reports directory exists
PERFORMANCE_REPORTS_DIR.mkdir(parents=True, exist_ok=True)

# Store report paths
report_paths = {}

for (signal_name, strategy_name), perf_result in performance_results.items():
    print(f"Generating report: {signal_name} × {strategy_name}")
    
    # Generate markdown report
    report = generate_performance_report(
        perf_result,
        signal_name,
        strategy_name,
    )
    
    # Save report
    report_path = save_report(
        report,
        signal_name,
        strategy_name,
        PERFORMANCE_REPORTS_DIR,
    )
    
    report_paths[(signal_name, strategy_name)] = report_path
    print(f"  ✓ Saved: {report_path.name}")

print(f"\n✓ Generated {len(report_paths)} performance reports")
print(f"  Location: {PERFORMANCE_REPORTS_DIR}")

2025-11-13 22:25:44,754 - aponyx.evaluation.performance.report - INFO - Saved performance report to C:\Users\ROG3003\PythonProjects\aponyx\reports\performance\cdx_etf_basis_conservative_20251113_222544.md
2025-11-13 22:25:44,755 - aponyx.evaluation.performance.report - INFO - Saved performance report to C:\Users\ROG3003\PythonProjects\aponyx\reports\performance\cdx_etf_basis_balanced_20251113_222544.md
2025-11-13 22:25:44,757 - aponyx.evaluation.performance.report - INFO - Saved performance report to C:\Users\ROG3003\PythonProjects\aponyx\reports\performance\cdx_etf_basis_aggressive_20251113_222544.md
2025-11-13 22:25:44,758 - aponyx.evaluation.performance.report - INFO - Saved performance report to C:\Users\ROG3003\PythonProjects\aponyx\reports\performance\cdx_etf_basis_experimental_20251113_222544.md
2025-11-13 22:25:44,759 - aponyx.evaluation.performance.report - INFO - Saved performance report to C:\Users\ROG3003\PythonProjects\aponyx\reports\performance\cdx_vix_gap_conservative_20


GENERATING PERFORMANCE REPORTS

Generating report: cdx_etf_basis × conservative
  ✓ Saved: cdx_etf_basis_conservative_20251113_222544.md
Generating report: cdx_etf_basis × balanced
  ✓ Saved: cdx_etf_basis_balanced_20251113_222544.md
Generating report: cdx_etf_basis × aggressive
  ✓ Saved: cdx_etf_basis_aggressive_20251113_222544.md
Generating report: cdx_etf_basis × experimental
  ✓ Saved: cdx_etf_basis_experimental_20251113_222544.md
Generating report: cdx_vix_gap × conservative
  ✓ Saved: cdx_vix_gap_conservative_20251113_222544.md
Generating report: cdx_vix_gap × balanced
  ✓ Saved: cdx_vix_gap_balanced_20251113_222544.md
Generating report: cdx_vix_gap × aggressive
  ✓ Saved: cdx_vix_gap_aggressive_20251113_222544.md
Generating report: cdx_vix_gap × experimental
  ✓ Saved: cdx_vix_gap_experimental_20251113_222544.md
Generating report: spread_momentum × conservative
  ✓ Saved: spread_momentum_conservative_20251113_222544.md
Generating report: spread_momentum × balanced
  ✓ Saved: s

## 11. Register Evaluations

Track evaluation metadata in performance registry.

In [11]:
print(f"\n{'='*80}")
print(f"REGISTERING EVALUATIONS")
print(f"{'='*80}\n")

# Initialize performance registry
registry = PerformanceRegistry(PERFORMANCE_REGISTRY_PATH)

# Register all evaluations
evaluation_ids = []

for (signal_name, strategy_name), perf_result in performance_results.items():
    report_path = report_paths[(signal_name, strategy_name)]
    
    # Register evaluation
    eval_id = registry.register_evaluation(
        perf_result,
        signal_name,
        strategy_name,
        report_path,
    )
    
    evaluation_ids.append(eval_id)
    print(f"✓ Registered: {eval_id}")

print(f"\n✓ Registered {len(evaluation_ids)} evaluations")
print(f"  Registry: {PERFORMANCE_REGISTRY_PATH}")

# Display registry summary
all_evaluations = registry.list_evaluations()
print(f"\nRegistry Summary:")
print(f"  Total evaluations: {len(all_evaluations)}")
print(f"  Unique signals: {len(set(e.split('_')[0] for e in all_evaluations))}")
print(f"  Unique strategies: {len(set('_'.join(e.split('_')[1:-1]) for e in all_evaluations))}")

2025-11-13 22:25:44,775 - aponyx.persistence.json_io - INFO - Loading JSON from C:\Users\ROG3003\PythonProjects\aponyx\src\aponyx\evaluation\performance\performance_registry.json
2025-11-13 22:25:44,778 - aponyx.evaluation.performance.registry - INFO - Loaded existing performance registry: 36 evaluations
2025-11-13 22:25:44,781 - aponyx.persistence.json_io - INFO - Saving JSON to C:\Users\ROG3003\PythonProjects\aponyx\src\aponyx\evaluation\performance\performance_registry.json (37 top-level keys)
2025-11-13 22:25:44,793 - aponyx.evaluation.performance.registry - INFO - Registered performance evaluation: cdx_etf_basis_conservative_20251113_222544 (stability=0.350, sharpe=-0.46)
2025-11-13 22:25:44,796 - aponyx.persistence.json_io - INFO - Saving JSON to C:\Users\ROG3003\PythonProjects\aponyx\src\aponyx\evaluation\performance\performance_registry.json (38 top-level keys)
2025-11-13 22:25:44,811 - aponyx.evaluation.performance.registry - INFO - Registered performance evaluation: cdx_etf_b


REGISTERING EVALUATIONS

✓ Registered: cdx_etf_basis_conservative_20251113_222544
✓ Registered: cdx_etf_basis_balanced_20251113_222544


2025-11-13 22:25:44,813 - aponyx.persistence.json_io - INFO - Saving JSON to C:\Users\ROG3003\PythonProjects\aponyx\src\aponyx\evaluation\performance\performance_registry.json (39 top-level keys)
2025-11-13 22:25:44,827 - aponyx.evaluation.performance.registry - INFO - Registered performance evaluation: cdx_etf_basis_aggressive_20251113_222544 (stability=0.350, sharpe=-1.31)


✓ Registered: cdx_etf_basis_aggressive_20251113_222544


2025-11-13 22:25:44,829 - aponyx.persistence.json_io - INFO - Saving JSON to C:\Users\ROG3003\PythonProjects\aponyx\src\aponyx\evaluation\performance\performance_registry.json (40 top-level keys)
2025-11-13 22:25:44,839 - aponyx.evaluation.performance.registry - INFO - Registered performance evaluation: cdx_etf_basis_experimental_20251113_222544 (stability=0.800, sharpe=1.46)
2025-11-13 22:25:44,839 - aponyx.persistence.json_io - INFO - Saving JSON to C:\Users\ROG3003\PythonProjects\aponyx\src\aponyx\evaluation\performance\performance_registry.json (41 top-level keys)


✓ Registered: cdx_etf_basis_experimental_20251113_222544


2025-11-13 22:25:44,855 - aponyx.evaluation.performance.registry - INFO - Registered performance evaluation: cdx_vix_gap_conservative_20251113_222544 (stability=0.900, sharpe=1.05)
2025-11-13 22:25:44,857 - aponyx.persistence.json_io - INFO - Saving JSON to C:\Users\ROG3003\PythonProjects\aponyx\src\aponyx\evaluation\performance\performance_registry.json (42 top-level keys)
2025-11-13 22:25:44,871 - aponyx.evaluation.performance.registry - INFO - Registered performance evaluation: cdx_vix_gap_balanced_20251113_222544 (stability=0.000, sharpe=-2.61)
2025-11-13 22:25:44,872 - aponyx.persistence.json_io - INFO - Saving JSON to C:\Users\ROG3003\PythonProjects\aponyx\src\aponyx\evaluation\performance\performance_registry.json (43 top-level keys)


✓ Registered: cdx_vix_gap_conservative_20251113_222544
✓ Registered: cdx_vix_gap_balanced_20251113_222544


2025-11-13 22:25:44,881 - aponyx.evaluation.performance.registry - INFO - Registered performance evaluation: cdx_vix_gap_aggressive_20251113_222544 (stability=0.500, sharpe=-2.35)
2025-11-13 22:25:44,887 - aponyx.persistence.json_io - INFO - Saving JSON to C:\Users\ROG3003\PythonProjects\aponyx\src\aponyx\evaluation\performance\performance_registry.json (44 top-level keys)


✓ Registered: cdx_vix_gap_aggressive_20251113_222544


2025-11-13 22:25:44,900 - aponyx.evaluation.performance.registry - INFO - Registered performance evaluation: cdx_vix_gap_experimental_20251113_222544 (stability=0.500, sharpe=-1.15)
2025-11-13 22:25:44,903 - aponyx.persistence.json_io - INFO - Saving JSON to C:\Users\ROG3003\PythonProjects\aponyx\src\aponyx\evaluation\performance\performance_registry.json (45 top-level keys)


✓ Registered: cdx_vix_gap_experimental_20251113_222544


2025-11-13 22:25:44,922 - aponyx.evaluation.performance.registry - INFO - Registered performance evaluation: spread_momentum_conservative_20251113_222544 (stability=0.600, sharpe=0.63)


✓ Registered: spread_momentum_conservative_20251113_222544


2025-11-13 22:25:44,924 - aponyx.persistence.json_io - INFO - Saving JSON to C:\Users\ROG3003\PythonProjects\aponyx\src\aponyx\evaluation\performance\performance_registry.json (46 top-level keys)
2025-11-13 22:25:44,943 - aponyx.evaluation.performance.registry - INFO - Registered performance evaluation: spread_momentum_balanced_20251113_222544 (stability=0.100, sharpe=-1.40)
2025-11-13 22:25:44,944 - aponyx.persistence.json_io - INFO - Saving JSON to C:\Users\ROG3003\PythonProjects\aponyx\src\aponyx\evaluation\performance\performance_registry.json (47 top-level keys)
2025-11-13 22:25:44,957 - aponyx.evaluation.performance.registry - INFO - Registered performance evaluation: spread_momentum_aggressive_20251113_222544 (stability=0.350, sharpe=-0.22)


✓ Registered: spread_momentum_balanced_20251113_222544


2025-11-13 22:25:44,957 - aponyx.persistence.json_io - INFO - Saving JSON to C:\Users\ROG3003\PythonProjects\aponyx\src\aponyx\evaluation\performance\performance_registry.json (48 top-level keys)
2025-11-13 22:25:44,974 - aponyx.evaluation.performance.registry - INFO - Registered performance evaluation: spread_momentum_experimental_20251113_222544 (stability=0.100, sharpe=-2.58)


✓ Registered: spread_momentum_aggressive_20251113_222544
✓ Registered: spread_momentum_experimental_20251113_222544

✓ Registered 12 evaluations
  Registry: C:\Users\ROG3003\PythonProjects\aponyx\src\aponyx\evaluation\performance\performance_registry.json

Registry Summary:
  Total evaluations: 48
  Unique signals: 2
  Unique strategies: 24


## 12. Persist Evaluation Metadata

Save evaluation metadata for reproducibility.

In [12]:
print(f"\n{'='*80}")
print(f"PERSISTING EVALUATION METADATA")
print(f"{'='*80}\n")

# Create evaluation metadata
evaluation_metadata = {
    "timestamp": datetime.now().isoformat(),
    "execution_time_seconds": evaluation_time,
    "configuration": {
        "min_obs": config.min_obs,
        "n_subperiods": config.n_subperiods,
        "risk_free_rate": config.risk_free_rate,
        "rolling_window": config.rolling_window,
        "attribution_quantiles": config.attribution_quantiles,
    },
    "summary": {
        "total_evaluations": len(performance_results),
        "signals": signals,
        "strategies": strategies,
        "top_performer": {
            "signal": top_5_pairs[0][0][0],
            "strategy": top_5_pairs[0][0][1],
            "stability_score": float(top_5_pairs[0][1].stability_score),
            "profit_factor": float(top_5_pairs[0][1].metrics.profit_factor),
        },
        "reports_directory": str(PERFORMANCE_REPORTS_DIR),
        "registry_path": str(PERFORMANCE_REGISTRY_PATH),
    },
    "backtest_reference": {
        "timestamp": backtest_metadata["timestamp"],
        "date_range": backtest_metadata["summary"]["date_range"],
    },
}

# Save metadata
metadata_path = LOGS_DIR / "performance_evaluation_metadata.json"
save_json(evaluation_metadata, metadata_path)

print(f"✓ Saved evaluation metadata: {metadata_path}")
print(f"\nMetadata summary:")
print(f"  Evaluations: {evaluation_metadata['summary']['total_evaluations']}")
print(f"  Top performer: {evaluation_metadata['summary']['top_performer']['signal']} × "
      f"{evaluation_metadata['summary']['top_performer']['strategy']}")
print(f"  Stability: {evaluation_metadata['summary']['top_performer']['stability_score']:.3f}")
print(f"  Profit factor: {evaluation_metadata['summary']['top_performer']['profit_factor']:.2f}")


2025-11-13 22:25:44,987 - aponyx.persistence.json_io - INFO - Saving JSON to C:\Users\ROG3003\PythonProjects\aponyx\logs\performance_evaluation_metadata.json (5 top-level keys)



PERSISTING EVALUATION METADATA

✓ Saved evaluation metadata: C:\Users\ROG3003\PythonProjects\aponyx\logs\performance_evaluation_metadata.json

Metadata summary:
  Evaluations: 12
  Top performer: cdx_vix_gap × conservative
  Stability: 0.900
  Profit factor: 1.53


---

## Workflow Complete

Performance analysis successful! Comprehensive post-backtest evaluation has been completed for all signal-strategy pairs using consolidated metrics computation.

### What Was Accomplished

✓ **Backtest Results Loaded** — P&L and positions from Step 4  
✓ **Evaluation Configured** — Performance analysis parameters set  
✓ **BacktestResult Objects Reconstructed** — Data prepared for evaluation layer  
✓ **Performance Evaluations Executed** — All pairs analyzed using `compute_all_metrics`  
✓ **Extended Metrics Displayed** — All 21 comprehensive metrics (basic + extended)  
✓ **Rolling Sharpe Visualized** — Temporal stability analysis for top performers  
✓ **Attribution Analyzed** — Directional, signal strength, win/loss decomposition  
✓ **Attribution Visualized** — P&L breakdown charts generated  
✓ **Reports Generated** — Comprehensive markdown reports for all pairs  
✓ **Evaluations Registered** — Metadata tracked in PerformanceRegistry  
✓ **Metadata Persisted** — Execution summary saved

### Consolidated Metrics Architecture

The evaluation layer now uses **`compute_all_metrics`** for unified performance computation:

**All 21 Metrics (Basic + Extended):**
- **Returns:** total_return, annualized_return
- **Risk-Adjusted:** sharpe_ratio, sortino_ratio, calmar_ratio, max_drawdown, annualized_volatility
- **Trade Stats:** n_trades, hit_rate, avg_win, avg_loss, win_loss_ratio, avg_holding_days
- **Stability:** rolling_sharpe_mean, rolling_sharpe_std, max_dd_recovery_days, avg_recovery_days, n_drawdowns
- **Extended:** tail_ratio, profit_factor, consistency_score

**Optimizations:**
- Shared intermediates (running_max, drawdown, daily stats) computed once
- Single function call returns `PerformanceMetrics` dataclass
- Eliminates redundant calculations between basic and extended metrics

### Data Flow

```
Backtest Results (Step 4)
├─ P&L DataFrame (MultiIndex)
└─ Positions DataFrame (MultiIndex)
    ↓
Performance Analysis (this notebook)
├─ Reconstruct BacktestResult objects
├─ Run analyze_backtest_performance()
│   └─ Calls compute_all_metrics() internally
└─ Generate PerformanceResult objects
    ↓
Outputs
├─ Markdown reports (one per pair)
├─ Registry entries (performance_registry.json)
└─ Metadata (performance_evaluation_metadata.json)
```

### Re-Running This Notebook

- **Prerequisites:** Requires completed Step 4 (backtest execution)
- **Data Loading:** Loads cached P&L and positions automatically
- **Configuration:** Edit cell 3 to adjust evaluation parameters
- **Focus Visualization:** Top 5 by stability score (automatic)
- **Outputs:** Overwrites reports, registry, and metadata
- **Reports:** All signal-strategy pairs get full markdown reports

### Key Files Generated

```
reports/
└── performance/
    ├── {signal}_{strategy}_{timestamp}.md (one per pair)
    └── ... (typically 9-15 reports)

logs/
└── performance_evaluation_metadata.json

src/aponyx/evaluation/performance/
└── performance_registry.json (updated)
```

### Extended Metrics Explained

- **Stability Score (0-1):** Overall consistency across subperiods
- **Profit Factor:** Gross wins / gross losses (>1 is profitable)
- **Tail Ratio:** 95th percentile / 5th percentile return (asymmetry)
- **Rolling Sharpe:** Mean/std of 3-month rolling Sharpe ratios
- **Consistency:** Percentage of 3-week windows with positive returns
- **Recovery Days:** Average time to recover from drawdowns
- **Drawdown Count:** Number of distinct drawdown periods

### Attribution Components

**Directional:** Long vs short P&L contribution  
**Signal Strength:** Tercile breakdown (weak/medium/strong signals)  
**Win/Loss:** Positive vs negative day decomposition

### Troubleshooting

**Backtest results not found:**
- Run `04_backtest_execution.ipynb` first
- Verify files exist: `data/processed/backtest_results_pnl.parquet` and `backtest_results_positions.parquet`
- Check DATA_DIR configuration

**Evaluation errors:**
- Ensure sufficient observations (min 252 days by default)
- Verify P&L and positions DataFrames have matching structure
- Check DatetimeIndex is properly formatted
- Review ERROR logs for specific validation failures

**Report generation issues:**
- Verify PERFORMANCE_REPORTS_DIR exists and is writable
- Check disk space for markdown file creation
- Ensure all signal/strategy names are valid file name components
- Review report content in generated .md files

**Visualization issues:**
- Ensure plotly installed: `uv sync --extra viz`
- Verify notebook can render Plotly figures
- Check rolling window size relative to data length
- Reduce number of top performers if too many plots

**Registry errors:**
- Check PERFORMANCE_REGISTRY_PATH exists and is writable
- Verify JSON format is valid (delete and recreate if corrupted)
- Ensure unique evaluation IDs (timestamp-based)
- Review registry file permissions
