# Performance Analysis Workflow

**Systematic Macro Credit Research — Step 5 of 5**

This notebook performs comprehensive post-backtest performance analysis using the evaluation/performance layer. It represents the final step in the systematic research workflow.

## Workflow Position

```
1. Data Download (01_data_download.ipynb)
   ↓
2. Signal Computation (02_signal_computation.ipynb)
   ↓
3. Signal Suitability Evaluation (03_suitability_evaluation.ipynb)
   ↓
4. Backtest Execution (04_backtest.ipynb)
   ↓
5. Performance Analysis ← YOU ARE HERE
```

## Prerequisites

**This notebook only loads cached data. It does not generate any new data.**

Required files from previous steps:
- **P&L Results:** `data/processed/backtest_results_pnl.parquet` (from Step 4)
- **Position Results:** `data/processed/backtest_results_positions.parquet` (from Step 4)
- **Execution Metadata:** `logs/backtest_metadata.json` (from Step 4)

## What This Notebook Does

1. **Load Backtest Results** — Read P&L and positions from Step 4
2. **Configure Evaluation** — Set parameters for performance analysis
3. **Reconstruct BacktestResult Objects** — Prepare data for evaluation layer
4. **Run Performance Evaluations** — Analyze all signal-strategy pairs
5. **Display Extended Metrics** — Show comprehensive performance table
6. **Visualize Rolling Sharpe** — Plot temporal stability analysis
7. **Display Attribution** — Show return decomposition by direction/strength
8. **Visualize Attribution** — Plot directional P&L breakdown
9. **Generate Reports** — Create markdown reports for all pairs
10. **Register Evaluations** — Track metadata for reproducibility
11. **Persist Results** — Save evaluation metadata

## Outputs

- **Performance Reports:** `reports/performance/{signal}_{strategy}_{timestamp}.md`
- **Evaluation Registry:** `src/aponyx/evaluation/performance/performance_registry.json`
- **Execution Metadata:** `logs/performance_evaluation_metadata.json`

## Key Design Patterns

- **Cache-Only:** Loads all data from Step 4 (no backtest execution)
- **Individual Analysis:** Each signal-strategy pair analyzed separately
- **Extended Metrics:** Rolling Sharpe, profit factor, tail ratio, consistency
- **Attribution:** Directional, signal strength, and win/loss decomposition
- **Comprehensive Reporting:** Full markdown reports for all pairs

---

In [13]:
import logging
from datetime import datetime
from pathlib import Path

import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots

from aponyx.config import (
    DATA_DIR,
    LOGS_DIR,
    PERFORMANCE_REGISTRY_PATH,
    PERFORMANCE_REPORTS_DIR,
)
from aponyx.backtest import BacktestResult
from aponyx.evaluation.performance import (
    analyze_backtest_performance,
    PerformanceConfig,
    PerformanceRegistry,
    generate_performance_report,
    save_report,
)
from aponyx.persistence import load_parquet, load_json, save_json

# Configure logging for notebook
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
)
logger = logging.getLogger(__name__)

print("=" * 80)
print("PERFORMANCE ANALYSIS WORKFLOW — Step 5 of 5")
print("=" * 80)
print(f"\nConfiguration:")
print(f"  Data directory: {DATA_DIR}")
print(f"  Logs directory: {LOGS_DIR}")
print(f"  Reports directory: {PERFORMANCE_REPORTS_DIR}")
print(f"  Registry path: {PERFORMANCE_REGISTRY_PATH}")
print(f"\n✓ Imports complete")

PERFORMANCE ANALYSIS WORKFLOW — Step 5 of 5

Configuration:
  Data directory: C:\Users\ROG3003\PythonProjects\aponyx\data
  Logs directory: C:\Users\ROG3003\PythonProjects\aponyx\logs
  Reports directory: C:\Users\ROG3003\PythonProjects\aponyx\reports\performance
  Registry path: C:\Users\ROG3003\PythonProjects\aponyx\src\aponyx\evaluation\performance\performance_registry.json

✓ Imports complete


In [14]:
# Load P&L and positions
pnl_path = DATA_DIR / "processed" / "backtest_results_pnl.parquet"
positions_path = DATA_DIR / "processed" / "backtest_results_positions.parquet"
metadata_path = LOGS_DIR / "backtest_metadata.json"

if not pnl_path.exists():
    raise FileNotFoundError(
        f"P&L file not found: {pnl_path}\n"
        "Please run 04_backtest.ipynb first."
    )

if not positions_path.exists():
    raise FileNotFoundError(
        f"Positions file not found: {positions_path}\n"
        "Please run 04_backtest.ipynb first."
    )

# Load data
pnl_df = load_parquet(pnl_path)
positions_df = load_parquet(positions_path)
backtest_metadata = load_json(metadata_path)

print(f"\n{'='*80}")
print(f"BACKTEST RESULTS LOADED")
print(f"{'='*80}\n")
print(f"P&L path: {pnl_path}")
print(f"  Shape: {pnl_df.shape}")
print(f"  Index levels: {pnl_df.index.names}")
print(f"\nPositions path: {positions_path}")
print(f"  Shape: {positions_df.shape}")
print(f"  Index levels: {positions_df.index.names}")

# Extract signal-strategy pairs
signal_strategy_pairs = pnl_df.index.droplevel('date').unique()
signals = signal_strategy_pairs.get_level_values('_signal_id').unique().tolist()
strategies = signal_strategy_pairs.get_level_values('_strategy_id').unique().tolist()

print(f"\n{'─'*80}")
print(f"Signal-Strategy Pairs")
print(f"{'─'*80}\n")
print(f"Signals ({len(signals)}): {', '.join(signals)}")
print(f"Strategies ({len(strategies)}): {', '.join(strategies)}")
print(f"Total pairs: {len(signal_strategy_pairs)}")

print(f"\nDate range: {backtest_metadata['summary']['date_range']['start']} "
      f"to {backtest_metadata['summary']['date_range']['end']}")
print(f"Backtest execution: {backtest_metadata['timestamp']}")
print(f"\n✓ Data loaded successfully")

2025-11-09 18:21:05,049 - aponyx.persistence.parquet_io - INFO - Loading Parquet file: path=C:\Users\ROG3003\PythonProjects\aponyx\data\processed\backtest_results_pnl.parquet, columns=all
2025-11-09 18:21:05,058 - aponyx.persistence.parquet_io - INFO - Loaded 15472 rows, 4 columns from C:\Users\ROG3003\PythonProjects\aponyx\data\processed\backtest_results_pnl.parquet
2025-11-09 18:21:05,059 - aponyx.persistence.parquet_io - INFO - Loading Parquet file: path=C:\Users\ROG3003\PythonProjects\aponyx\data\processed\backtest_results_positions.parquet, columns=all
2025-11-09 18:21:05,068 - aponyx.persistence.parquet_io - INFO - Loaded 15472 rows, 4 columns from C:\Users\ROG3003\PythonProjects\aponyx\data\processed\backtest_results_positions.parquet
2025-11-09 18:21:05,069 - aponyx.persistence.json_io - INFO - Loading JSON from C:\Users\ROG3003\PythonProjects\aponyx\logs\backtest_metadata.json



BACKTEST RESULTS LOADED

P&L path: C:\Users\ROG3003\PythonProjects\aponyx\data\processed\backtest_results_pnl.parquet
  Shape: (15472, 4)
  Index levels: ['_signal_id', '_strategy_id', 'date']

Positions path: C:\Users\ROG3003\PythonProjects\aponyx\data\processed\backtest_results_positions.parquet
  Shape: (15472, 4)
  Index levels: ['_signal_id', '_strategy_id', 'date']

────────────────────────────────────────────────────────────────────────────────
Signal-Strategy Pairs
────────────────────────────────────────────────────────────────────────────────

Signals (3): cdx_etf_basis, cdx_vix_gap, spread_momentum
Strategies (4): conservative, balanced, aggressive, experimental
Total pairs: 12

Date range: 2020-11-09 00:00:00 to 2024-06-05 00:00:00
Backtest execution: 2025-11-09T15:25:40.820423

✓ Data loaded successfully


In [15]:
# Create performance evaluation configuration
config = PerformanceConfig(
    min_obs=252,  # Require at least 1 year of data
    n_subperiods=4,  # Quarterly stability analysis
    risk_free_rate=0.0,  # Zero risk-free rate
    rolling_window=63,  # 3-month rolling metrics
    report_format='markdown',  # Markdown reports
    attribution_quantiles=3,  # Terciles for signal strength
)

print(f"\n{'='*80}")
print(f"EVALUATION CONFIGURATION")
print(f"{'='*80}\n")
print(f"Minimum observations: {config.min_obs}")
print(f"Subperiods for stability: {config.n_subperiods} (quarterly)")
print(f"Rolling window: {config.rolling_window} days (3 months)")
print(f"Attribution quantiles: {config.attribution_quantiles} (terciles)")
print(f"Report format: {config.report_format}")
print(f"\n✓ Configuration set")


EVALUATION CONFIGURATION

Minimum observations: 252
Subperiods for stability: 4 (quarterly)
Rolling window: 63 days (3 months)
Attribution quantiles: 3 (terciles)
Report format: markdown

✓ Configuration set


In [16]:
def reconstruct_backtest_result(
    signal_name: str,
    strategy_name: str,
    pnl_df: pd.DataFrame,
    positions_df: pd.DataFrame,
    metadata: dict,
) -> BacktestResult:
    """
    Reconstruct BacktestResult from cached DataFrames.
    
    Parameters
    ----------
    signal_name : str
        Signal identifier.
    strategy_name : str
        Strategy identifier.
    pnl_df : pd.DataFrame
        MultiIndex DataFrame with P&L data.
    positions_df : pd.DataFrame
        MultiIndex DataFrame with position data.
    metadata : dict
        Backtest execution metadata.
    
    Returns
    -------
    BacktestResult
        Reconstructed backtest result object.
    """
    # Filter to specific signal-strategy pair
    pnl = pnl_df.loc[(signal_name, strategy_name)].copy()
    positions = positions_df.loc[(signal_name, strategy_name)].copy()
    
    # Drop the MultiIndex levels that were used for filtering
    pnl.index.name = 'date'
    positions.index.name = 'date'
    
    # The positions DataFrame now has the original columns: signal, position, days_held, spread
    # No need to drop anything - the MultiIndex columns were already removed by set_index
    
    # Create BacktestResult
    return BacktestResult(
        pnl=pnl,
        positions=positions,
        metadata={
            "signal_id": signal_name,
            "strategy_id": strategy_name,
            "config": metadata.get("configuration", {}),
            "backtest_timestamp": metadata.get("timestamp"),
        },
    )

print(f"\n{'='*80}")
print(f"RECONSTRUCTING BACKTEST RESULTS")
print(f"{'='*80}\n")

# Reconstruct BacktestResult for each signal-strategy pair
backtest_results = {}

for signal, strategy in signal_strategy_pairs:
    result = reconstruct_backtest_result(
        signal,
        strategy,
        pnl_df,
        positions_df,
        backtest_metadata,
    )
    backtest_results[(signal, strategy)] = result
    print(f"✓ Reconstructed: {signal} × {strategy}")

print(f"\n✓ Reconstructed {len(backtest_results)} BacktestResult objects")


RECONSTRUCTING BACKTEST RESULTS

✓ Reconstructed: cdx_etf_basis × conservative
✓ Reconstructed: cdx_etf_basis × balanced
✓ Reconstructed: cdx_etf_basis × aggressive
✓ Reconstructed: cdx_etf_basis × experimental
✓ Reconstructed: cdx_vix_gap × conservative
✓ Reconstructed: cdx_vix_gap × balanced
✓ Reconstructed: cdx_vix_gap × aggressive
✓ Reconstructed: cdx_vix_gap × experimental
✓ Reconstructed: spread_momentum × conservative
✓ Reconstructed: spread_momentum × balanced
✓ Reconstructed: spread_momentum × aggressive
✓ Reconstructed: spread_momentum × experimental

✓ Reconstructed 12 BacktestResult objects



indexing past lexsort depth may impact performance.


indexing past lexsort depth may impact performance.



In [17]:
print(f"\n{'='*80}")
print(f"RUNNING PERFORMANCE EVALUATIONS")
print(f"{'='*80}\n")

# Store performance results
performance_results = {}
evaluation_start = datetime.now()

# Run evaluations
total_evaluations = len(backtest_results)
current_eval = 0

for (signal_name, strategy_name), backtest_result in backtest_results.items():
    current_eval += 1
    print(f"[{current_eval}/{total_evaluations}] Evaluating: {signal_name} × {strategy_name}")
    
    # Run performance analysis
    perf_result = analyze_backtest_performance(backtest_result, config)
    performance_results[(signal_name, strategy_name)] = perf_result
    
    # Display summary
    print(f"  Stability score: {perf_result.stability_score:.3f}")
    print(f"  Profit factor: {perf_result.metrics['profit_factor']:.2f}")

evaluation_end = datetime.now()
evaluation_time = (evaluation_end - evaluation_start).total_seconds()

print(f"\n✓ Performance evaluations complete: {total_evaluations} pairs in {evaluation_time:.1f}s")
print(f"  Average: {evaluation_time/total_evaluations:.2f}s per evaluation")

2025-11-09 18:21:05,117 - aponyx.evaluation.performance.analyzer - INFO - Analyzing backtest performance: config=PerformanceConfig(min_obs=252, n_subperiods=4, risk_free_rate=0.0, rolling_window=63, report_format='markdown', attribution_quantiles=3)
2025-11-09 18:21:05,118 - aponyx.evaluation.performance.risk_metrics - INFO - Computing extended risk metrics: window=63 days
2025-11-09 18:21:05,132 - aponyx.evaluation.performance.risk_metrics - INFO - Extended metrics computed: profit_factor=0.85, tail_ratio=0.94, consistency=36.8%
2025-11-09 18:21:05,133 - aponyx.evaluation.performance.decomposition - INFO - Computing return attribution: n_quantiles=3


2025-11-09 18:21:05,138 - aponyx.evaluation.performance.decomposition - INFO - Attribution computed: long=1.9%, wins=435.3%
2025-11-09 18:21:05,140 - aponyx.evaluation.performance.analyzer - INFO - Performance evaluation complete: stability=0.50, profit_factor=0.85
2025-11-09 18:21:05,140 - aponyx.evaluation.performance.analyzer - INFO -


RUNNING PERFORMANCE EVALUATIONS

[1/12] Evaluating: cdx_etf_basis × conservative
  Stability score: 0.500
  Profit factor: 0.85
[2/12] Evaluating: cdx_etf_basis × balanced
  Stability score: 0.500
  Profit factor: 0.92
[3/12] Evaluating: cdx_etf_basis × aggressive
  Stability score: 0.500
  Profit factor: 0.76
[4/12] Evaluating: cdx_etf_basis × experimental
  Stability score: 0.500
  Profit factor: 1.29
[5/12] Evaluating: cdx_vix_gap × conservative


2025-11-09 18:21:05,226 - aponyx.evaluation.performance.analyzer - INFO - Performance evaluation complete: stability=0.75, profit_factor=1.54
2025-11-09 18:21:05,226 - aponyx.evaluation.performance.analyzer - INFO - Analyzing backtest performance: config=PerformanceConfig(min_obs=252, n_subperiods=4, risk_free_rate=0.0, rolling_window=63, report_format='markdown', attribution_quantiles=3)
2025-11-09 18:21:05,227 - aponyx.evaluation.performance.risk_metrics - INFO - Computing extended risk metrics: window=63 days
2025-11-09 18:21:05,244 - aponyx.evaluation.performance.risk_metrics - INFO - Extended metrics computed: profit_factor=0.49, tail_ratio=0.52, consistency=46.0%
2025-11-09 18:21:05,246 - aponyx.evaluation.performance.decomposition - INFO - Computing return attribution: n_quantiles=3


2025-11-09 18:21:05,250 - aponyx.evaluation.performance.decomposition - INFO - Attribution computed: long=54.0%, wins=35.2%
2025-11-09 18:21:05,251 - aponyx.evaluation.performance.analyzer - INFO -

  Stability score: 0.750
  Profit factor: 1.54
[6/12] Evaluating: cdx_vix_gap × balanced
  Stability score: 0.000
  Profit factor: 0.49
[7/12] Evaluating: cdx_vix_gap × aggressive


2025-11-09 18:21:05,266 - aponyx.evaluation.performance.decomposition - INFO - Computing return attribution: n_quantiles=3


2025-11-09 18:21:05,271 - aponyx.evaluation.performance.decomposition - INFO - Attribution computed: long=34.9%, wins=82.8%
2025-11-09 18:21:05,271 - aponyx.evaluation.performance.analyzer - INFO - Performance evaluation complete: stability=0.50, profit_factor=0.60
2025-11-09 18:21:05,273 - aponyx.evaluation.performance.analyzer - INFO - Analyzing backtest performance: config=PerformanceConfig(min_obs=252, n_subperiods=4, risk_free_rate=0.0, rolling_window=63, report_format='markdown', attribution_quantiles=3)


  Stability score: 0.500
  Profit factor: 0.60
[8/12] Evaluating: cdx_vix_gap × experimental


2025-11-09 18:21:05,273 - aponyx.evaluation.performance.risk_metrics - INFO - Computing extended risk metrics: window=63 days
2025-11-09 18:21:05,285 - aponyx.evaluation.performance.risk_metrics - INFO - Extended metrics computed: profit_factor=0.81, tail_ratio=0.94, consistency=44.3%
2025-11-09 18:21:05,286 - aponyx.evaluation.performance.decomposition - INFO - Computing return attribution: n_quantiles=3


2025-11-09 18:21:05,291 - aponyx.evaluation.performance.decomposition - INFO - Attribution computed: long=41.7%, wins=277.4%
2025-11-09 18:21:05,291 - aponyx.evaluation.performance.analyzer - INFO - Performance evaluation complete: stability=0.50, profit_factor=0.81
2025-11-09 18:21:05,291 - aponyx.evaluation.performance.analyzer - INFO - Analyzing backtest performance: config=PerformanceConfig(min_obs=252, n_subperiods=4, risk_free_rate=0.0, rolling_window=63, report_format='markdown', attribution_quantiles=3)
2025-11-09 18:21:05,291 - aponyx.evaluation.performance.risk_metrics - I

  Stability score: 0.500
  Profit factor: 0.81
[9/12] Evaluating: spread_momentum × conservative


2025-11-09 18:21:05,306 - aponyx.evaluation.performance.decomposition - INFO - Computing return attribution: n_quantiles=3


2025-11-09 18:21:05,310 - aponyx.evaluation.performance.decomposition - INFO - Attribution computed: long=39.1%, wins=260.8%
2025-11-09 18:21:05,310 - aponyx.evaluation.performance.analyzer - INFO - Performance evaluation complete: stability=0.75, profit_factor=1.16
2025-11-09 18:21:05,310 - aponyx.evaluation.performance.analyzer - INFO - Analyzing backtest performance: config=PerformanceConfig(min_obs=252, n_subperiods=4, risk_free_rate=0.0, rolling_window=63, report_format='markdown', attribution_quantiles=3)
2025-11-09 18:21:05,310 - aponyx.evaluation.performance.risk_metrics - INFO - Computing extended risk metrics: window=63 days
2025-11-09 18:21:05,322 - aponyx.evaluation.performance.risk_metrics - INFO - Extended metrics computed: profit_factor=0.75, tail_ratio=0.69, consistency=46.6%
2025-11-09 18:21:05,324 - aponyx.evaluation.performance.decomposition - 

  Stability score: 0.750
  Profit factor: 1.16
[10/12] Evaluating: spread_momentum × balanced
  Stability score: 0.250
  Profit factor: 0.75
[11/12] Evaluating: spread_momentum × aggressive
  Stability score: 0.500
  Profit factor: 0.97
[12/12] Evaluating: spread_momentum × experimental
  Stability score: 0.250
  Profit factor: 0.64

✓ Performance evaluations complete: 12 pairs in 0.2s
  Average: 0.02s per evaluation


In [18]:
print(f"\n{'='*80}")
print(f"EXTENDED METRICS SUMMARY")
print(f"{'='*80}\n")

# Create summary table
metrics_summary = []

for (signal_name, strategy_name), perf_result in performance_results.items():
    metrics = perf_result.metrics
    metrics_summary.append({
        "Signal": signal_name,
        "Strategy": strategy_name,
        "Stability": f"{perf_result.stability_score:.3f}",
        "Profit Factor": f"{metrics['profit_factor']:.2f}",
        "Tail Ratio": f"{metrics['tail_ratio']:.2f}",
        "Rolling Sharpe μ": f"{metrics['rolling_sharpe_mean']:.2f}",
        "Rolling Sharpe σ": f"{metrics['rolling_sharpe_std']:.2f}",
        "Consistency": f"{metrics['consistency_score']:.1%}",
        "Recovery Days": f"{metrics['avg_recovery_days']:.0f}",
        "DD Count": f"{int(metrics['n_drawdowns'])}",
    })

metrics_df = pd.DataFrame(metrics_summary)

# Sort by stability score (descending)
metrics_df_sorted = metrics_df.copy()
metrics_df_sorted['_stability'] = metrics_df_sorted['Stability'].astype(float)
metrics_df_sorted = metrics_df_sorted.sort_values('_stability', ascending=False)
metrics_df_sorted = metrics_df_sorted.drop(columns=['_stability'])

print(metrics_df_sorted.to_markdown(index=False))

print(f"\n{'─'*80}")
print(f"Metric Definitions:")
print(f"{'─'*80}")
print(f"  Stability: Overall consistency score (0-1)")
print(f"  Profit Factor: Gross wins / gross losses")
print(f"  Tail Ratio: 95th percentile / 5th percentile return")
print(f"  Rolling Sharpe: Mean/std of 3-month rolling Sharpe ratios")
print(f"  Consistency: % of 3-week windows with positive returns")
print(f"  Recovery Days: Average time to recover from drawdowns")
print(f"  DD Count: Number of distinct drawdown periods")


EXTENDED METRICS SUMMARY

| Signal          | Strategy     |   Stability |   Profit Factor |   Tail Ratio |   Rolling Sharpe μ |   Rolling Sharpe σ | Consistency   |   Recovery Days |   DD Count |
|:----------------|:-------------|------------:|----------------:|-------------:|-------------------:|-------------------:|:--------------|----------------:|-----------:|
| spread_momentum | conservative |        0.75 |            1.16 |         1.09 |               0.23 |               5.13 | 47.5%         |              93 |         13 |
| cdx_vix_gap     | conservative |        0.75 |            1.54 |         1.49 |               1.3  |               3.34 | 48.4%         |               6 |         32 |
| cdx_etf_basis   | conservative |        0.5  |            0.85 |         0.94 |              -0.38 |               4.43 | 36.8%         |              64 |          5 |
| cdx_etf_basis   | balanced     |        0.5  |            0.92 |         1.08 |              -0.68 |               4

In [19]:
import numpy as np

print(f"\n{'='*80}")
print(f"ROLLING SHARPE ANALYSIS — Top 5 by Stability")
print(f"{'='*80}\n")

# Get top 5 pairs by stability score
top_5_pairs = sorted(
    performance_results.items(),
    key=lambda x: x[1].stability_score,
    reverse=True
)[:5]

for (signal_name, strategy_name), perf_result in top_5_pairs:
    # Get backtest result for this pair
    backtest_result = backtest_results[(signal_name, strategy_name)]
    pnl_series = backtest_result.pnl['net_pnl']
    
    # Compute rolling Sharpe
    rolling_sharpe = (
        pnl_series.rolling(window=config.rolling_window, min_periods=config.rolling_window // 2)
        .mean() / 
        pnl_series.rolling(window=config.rolling_window, min_periods=config.rolling_window // 2)
        .std()
    ) * np.sqrt(252)
    
    # Create plot
    fig = go.Figure()
    
    fig.add_trace(go.Scatter(
        x=rolling_sharpe.index,
        y=rolling_sharpe,
        mode='lines',
        name='Rolling Sharpe',
        line=dict(color='blue', width=2),
    ))
    
    # Add mean line
    mean_sharpe = perf_result.metrics['rolling_sharpe_mean']
    fig.add_hline(
        y=mean_sharpe,
        line_dash="dash",
        line_color="green",
        annotation_text=f"Mean: {mean_sharpe:.2f}",
        annotation_position="right"
    )
    
    # Add zero line
    fig.add_hline(y=0, line_dash="dot", line_color="gray")
    
    # Update layout
    fig.update_layout(
        title=f"Rolling Sharpe Ratio: {signal_name} × {strategy_name} (Stability: {perf_result.stability_score:.3f})",
        xaxis_title="Date",
        yaxis_title=f"Rolling {config.rolling_window}-Day Sharpe Ratio",
        hovermode='x unified',
        height=400,
        showlegend=False,
    )
    
    fig.show()
    print(f"✓ Plotted: {signal_name} × {strategy_name}\n")


ROLLING SHARPE ANALYSIS — Top 5 by Stability



✓ Plotted: cdx_vix_gap × conservative



✓ Plotted: spread_momentum × conservative



✓ Plotted: cdx_etf_basis × conservative



✓ Plotted: cdx_etf_basis × balanced



✓ Plotted: cdx_etf_basis × aggressive



In [20]:
print(f"\n{'='*80}")
print(f"ATTRIBUTION ANALYSIS — Top 5 by Stability")
print(f"{'='*80}\n")

for (signal_name, strategy_name), perf_result in top_5_pairs:
    print(f"\n{'─'*80}")
    print(f"{signal_name} × {strategy_name}")
    print(f"{'─'*80}\n")
    
    attribution = perf_result.attribution
    
    # Directional attribution
    dir_attr = attribution['direction']
    print("Directional Attribution:")
    print(f"  Long P&L:  ${dir_attr['long_pnl']:>12,.0f} ({dir_attr['long_pct']:>6.1%})")
    print(f"  Short P&L: ${dir_attr['short_pnl']:>12,.0f} ({dir_attr['short_pct']:>6.1%})")
    
    # Signal strength attribution
    sig_attr = attribution['signal_strength']
    print(f"\nSignal Strength Attribution (Terciles):")
    print(f"  Q1 (Weak):   ${sig_attr['q1_pnl']:>12,.0f} ({sig_attr['q1_pct']:>6.1%})")
    print(f"  Q2 (Medium): ${sig_attr['q2_pnl']:>12,.0f} ({sig_attr['q2_pct']:>6.1%})")
    print(f"  Q3 (Strong): ${sig_attr['q3_pnl']:>12,.0f} ({sig_attr['q3_pct']:>6.1%})")
    
    # Win/loss decomposition
    wl_attr = attribution['win_loss']
    print(f"\nWin/Loss Decomposition:")
    print(f"  Wins:   ${wl_attr['gross_wins']:>12,.0f} ({wl_attr['win_contribution']:>6.1%})")
    print(f"  Losses: ${wl_attr['gross_losses']:>12,.0f} ({wl_attr['loss_contribution']:>6.1%})")


ATTRIBUTION ANALYSIS — Top 5 by Stability


────────────────────────────────────────────────────────────────────────────────
cdx_vix_gap × conservative
────────────────────────────────────────────────────────────────────────────────

Directional Attribution:
  Long P&L:  $  -1,982,803 ( 36.0%)
  Short P&L: $  -3,530,916 ( 64.0%)

Signal Strength Attribution (Terciles):
  Q1 (Weak):   $   8,876,939 (-161.0%)
  Q2 (Medium): $  -3,804,777 ( 69.0%)
  Q3 (Strong): $ -10,585,881 (192.0%)

Win/Loss Decomposition:
  Wins:   $  14,426,588 (261.6%)
  Losses: $ -19,940,308 (361.6%)

────────────────────────────────────────────────────────────────────────────────
spread_momentum × conservative
────────────────────────────────────────────────────────────────────────────────

Directional Attribution:
  Long P&L:  $  17,659,763 ( 39.1%)
  Short P&L: $  27,541,970 ( 60.9%)

Signal Strength Attribution (Terciles):
  Q1 (Weak):   $ -11,259,072 (-24.9%)
  Q2 (Medium): $   5,157,527 ( 11.4%)
  Q3 (Strong

In [21]:
print(f"\n{'='*80}")
print(f"DIRECTIONAL ATTRIBUTION VISUALIZATION")
print(f"{'='*80}\n")

# Prepare data for visualization
pairs_list = []
long_pnl_list = []
short_pnl_list = []

for (signal_name, strategy_name), perf_result in top_5_pairs:
    pair_label = f"{signal_name}<br>{strategy_name}"
    pairs_list.append(pair_label)
    
    dir_attr = perf_result.attribution['direction']
    long_pnl_list.append(dir_attr['long_pnl'])
    short_pnl_list.append(dir_attr['short_pnl'])

# Create stacked bar chart
fig = go.Figure()

fig.add_trace(go.Bar(
    name='Long P&L',
    x=pairs_list,
    y=long_pnl_list,
    marker_color='green',
))

fig.add_trace(go.Bar(
    name='Short P&L',
    x=pairs_list,
    y=short_pnl_list,
    marker_color='red',
))

fig.update_layout(
    title="Directional Attribution: Long vs Short P&L (Top 5 by Stability)",
    xaxis_title="Signal × Strategy",
    yaxis_title="P&L ($)",
    barmode='relative',
    height=500,
    hovermode='x unified',
)

fig.show()
print(f"\n✓ Directional attribution chart displayed")


DIRECTIONAL ATTRIBUTION VISUALIZATION




✓ Directional attribution chart displayed


In [22]:
print(f"\n{'='*80}")
print(f"GENERATING PERFORMANCE REPORTS")
print(f"{'='*80}\n")

# Ensure reports directory exists
PERFORMANCE_REPORTS_DIR.mkdir(parents=True, exist_ok=True)

# Store report paths
report_paths = {}

for (signal_name, strategy_name), perf_result in performance_results.items():
    print(f"Generating report: {signal_name} × {strategy_name}")
    
    # Generate markdown report
    report = generate_performance_report(
        perf_result,
        signal_name,
        strategy_name,
    )
    
    # Save report
    report_path = save_report(
        report,
        signal_name,
        strategy_name,
        PERFORMANCE_REPORTS_DIR,
    )
    
    report_paths[(signal_name, strategy_name)] = report_path
    print(f"  ✓ Saved: {report_path.name}")

print(f"\n✓ Generated {len(report_paths)} performance reports")
print(f"  Location: {PERFORMANCE_REPORTS_DIR}")

2025-11-09 18:21:05,567 - aponyx.evaluation.performance.report - INFO - Saved performance report to C:\Users\ROG3003\PythonProjects\aponyx\reports\performance\cdx_etf_basis_conservative_20251109_182105.md
2025-11-09 18:21:05,569 - aponyx.evaluation.performance.report - INFO - Saved performance report to C:\Users\ROG3003\PythonProjects\aponyx\reports\performance\cdx_etf_basis_balanced_20251109_182105.md



GENERATING PERFORMANCE REPORTS

Generating report: cdx_etf_basis × conservative
  ✓ Saved: cdx_etf_basis_conservative_20251109_182105.md
Generating report: cdx_etf_basis × balanced
  ✓ Saved: cdx_etf_basis_balanced_20251109_182105.md
Generating report: cdx_etf_basis × aggressive


2025-11-09 18:21:05,570 - aponyx.evaluation.performance.report - INFO - Saved performance report to C:\Users\ROG3003\PythonProjects\aponyx\reports\performance\cdx_etf_basis_aggressive_20251109_182105.md
2025-11-09 18:21:05,572 - aponyx.evaluation.performance.report - INFO - Saved performance report to C:\Users\ROG3003\PythonProjects\aponyx\reports\performance\cdx_etf_basis_experimental_20251109_182105.md
2025-11-09 18:21:05,573 - aponyx.evaluation.performance.report - INFO - Saved performance report to C:\Users\ROG3003\PythonProjects\aponyx\reports\performance\cdx_vix_gap_conservative_20251109_182105.md
2025-11-09 18:21:05,574 - aponyx.evaluation.performance.report - INFO - Saved performance report to C:\Users\ROG3003\PythonProjects\aponyx\reports\performance\cdx_vix_gap_balanced_20251109_182105.md
2025-11-09 18:21:05,575 - aponyx.evaluation.performance.report - INFO - Saved performance report to C:\Users\ROG3003\PythonProjects\aponyx\reports\performance\cdx_vix_gap_aggressive_20251109

  ✓ Saved: cdx_etf_basis_aggressive_20251109_182105.md
Generating report: cdx_etf_basis × experimental
  ✓ Saved: cdx_etf_basis_experimental_20251109_182105.md
Generating report: cdx_vix_gap × conservative
  ✓ Saved: cdx_vix_gap_conservative_20251109_182105.md
Generating report: cdx_vix_gap × balanced
  ✓ Saved: cdx_vix_gap_balanced_20251109_182105.md
Generating report: cdx_vix_gap × aggressive
  ✓ Saved: cdx_vix_gap_aggressive_20251109_182105.md
Generating report: cdx_vix_gap × experimental
  ✓ Saved: cdx_vix_gap_experimental_20251109_182105.md
Generating report: spread_momentum × conservative
  ✓ Saved: spread_momentum_conservative_20251109_182105.md
Generating report: spread_momentum × balanced
  ✓ Saved: spread_momentum_balanced_20251109_182105.md
Generating report: spread_momentum × aggressive
  ✓ Saved: spread_momentum_aggressive_20251109_182105.md
Generating report: spread_momentum × experimental
  ✓ Saved: spread_momentum_experimental_20251109_182105.md

✓ Generated 12 performa

In [23]:
print(f"\n{'='*80}")
print(f"REGISTERING EVALUATIONS")
print(f"{'='*80}\n")

# Initialize performance registry
registry = PerformanceRegistry(PERFORMANCE_REGISTRY_PATH)

# Register all evaluations
evaluation_ids = []

for (signal_name, strategy_name), perf_result in performance_results.items():
    report_path = report_paths[(signal_name, strategy_name)]
    
    # Register evaluation
    eval_id = registry.register_evaluation(
        perf_result,
        signal_name,
        strategy_name,
        report_path,
    )
    
    evaluation_ids.append(eval_id)
    print(f"✓ Registered: {eval_id}")

print(f"\n✓ Registered {len(evaluation_ids)} evaluations")
print(f"  Registry: {PERFORMANCE_REGISTRY_PATH}")

# Display registry summary
all_evaluations = registry.list_evaluations()
print(f"\nRegistry Summary:")
print(f"  Total evaluations: {len(all_evaluations)}")
print(f"  Unique signals: {len(set(e.split('_')[0] for e in all_evaluations))}")
print(f"  Unique strategies: {len(set('_'.join(e.split('_')[1:-1]) for e in all_evaluations))}")

2025-11-09 18:21:05,589 - aponyx.persistence.json_io - INFO - Loading JSON from C:\Users\ROG3003\PythonProjects\aponyx\src\aponyx\evaluation\performance\performance_registry.json
2025-11-09 18:21:05,589 - aponyx.evaluation.performance.registry - INFO - Loaded existing performance registry: 24 evaluations
2025-11-09 18:21:05,589 - aponyx.persistence.json_io - INFO - Saving JSON to C:\Users\ROG3003\PythonProjects\aponyx\src\aponyx\evaluation\performance\performance_registry.json (25 top-level keys)
2025-11-09 18:21:05,589 - aponyx.evaluation.performance.registry - INFO - Registered performance evaluation: cdx_etf_basis_conservative_20251109_182105 (stability=0.500, sharpe=0.00)
2025-11-09 18:21:05,589 - aponyx.persistence.json_io - INFO - Saving JSON to C:\Users\ROG3003\PythonProjects\aponyx\src\aponyx\evaluation\performance\performance_registry.json (26 top-level keys)



REGISTERING EVALUATIONS

✓ Registered: cdx_etf_basis_conservative_20251109_182105


2025-11-09 18:21:05,609 - aponyx.evaluation.performance.registry - INFO - Registered performance evaluation: cdx_etf_basis_balanced_20251109_182105 (stability=0.500, sharpe=0.00)
2025-11-09 18:21:05,610 - aponyx.persistence.json_io - INFO - Saving JSON to C:\Users\ROG3003\PythonProjects\aponyx\src\aponyx\evaluation\performance\performance_registry.json (27 top-level keys)
2025-11-09 18:21:05,615 - aponyx.evaluation.performance.registry - INFO - Registered performance evaluation: cdx_etf_basis_aggressive_20251109_182105 (stability=0.500, sharpe=0.00)
2025-11-09 18:21:05,616 - aponyx.persistence.json_io - INFO - Saving JSON to C:\Users\ROG3003\PythonProjects\aponyx\src\aponyx\evaluation\performance\performance_registry.json (28 top-level keys)
2025-11-09 18:21:05,622 - aponyx.evaluation.performance.registry - INFO - Registered performance evaluation: cdx_etf_basis_experimental_20251109_182105 (stability=0.500, sharpe=0.00)
2025-11-09 18:21:05,624 - aponyx.persistence.json_io - INFO - Sav

✓ Registered: cdx_etf_basis_balanced_20251109_182105
✓ Registered: cdx_etf_basis_aggressive_20251109_182105
✓ Registered: cdx_etf_basis_experimental_20251109_182105
✓ Registered: cdx_vix_gap_conservative_20251109_182105


2025-11-09 18:21:05,636 - aponyx.evaluation.performance.registry - INFO - Registered performance evaluation: cdx_vix_gap_balanced_20251109_182105 (stability=0.000, sharpe=0.00)
2025-11-09 18:21:05,637 - aponyx.persistence.json_io - INFO - Saving JSON to C:\Users\ROG3003\PythonProjects\aponyx\src\aponyx\evaluation\performance\performance_registry.json (31 top-level keys)
2025-11-09 18:21:05,644 - aponyx.evaluation.performance.registry - INFO - Registered performance evaluation: cdx_vix_gap_aggressive_20251109_182105 (stability=0.500, sharpe=0.00)
2025-11-09 18:21:05,645 - aponyx.persistence.json_io - INFO - Saving JSON to C:\Users\ROG3003\PythonProjects\aponyx\src\aponyx\evaluation\performance\performance_registry.json (32 top-level keys)


✓ Registered: cdx_vix_gap_balanced_20251109_182105
✓ Registered: cdx_vix_gap_aggressive_20251109_182105


2025-11-09 18:21:05,651 - aponyx.evaluation.performance.registry - INFO - Registered performance evaluation: cdx_vix_gap_experimental_20251109_182105 (stability=0.500, sharpe=0.00)
2025-11-09 18:21:05,653 - aponyx.persistence.json_io - INFO - Saving JSON to C:\Users\ROG3003\PythonProjects\aponyx\src\aponyx\evaluation\performance\performance_registry.json (33 top-level keys)
2025-11-09 18:21:05,659 - aponyx.evaluation.performance.registry - INFO - Registered performance evaluation: spread_momentum_conservative_20251109_182105 (stability=0.750, sharpe=0.00)
2025-11-09 18:21:05,661 - aponyx.persistence.json_io - INFO - Saving JSON to C:\Users\ROG3003\PythonProjects\aponyx\src\aponyx\evaluation\performance\performance_registry.json (34 top-level keys)
2025-11-09 18:21:05,667 - aponyx.evaluation.performance.registry - INFO - Registered performance evaluation: spread_momentum_balanced_20251109_182105 (stability=0.250, sharpe=0.00)
2025-11-09 18:21:05,668 - aponyx.persistence.json_io - INFO -

✓ Registered: cdx_vix_gap_experimental_20251109_182105
✓ Registered: spread_momentum_conservative_20251109_182105
✓ Registered: spread_momentum_balanced_20251109_182105
✓ Registered: spread_momentum_aggressive_20251109_182105


2025-11-09 18:21:05,679 - aponyx.evaluation.performance.registry - INFO - Registered performance evaluation: spread_momentum_experimental_20251109_182105 (stability=0.250, sharpe=0.00)


✓ Registered: spread_momentum_experimental_20251109_182105

✓ Registered 12 evaluations
  Registry: C:\Users\ROG3003\PythonProjects\aponyx\src\aponyx\evaluation\performance\performance_registry.json

Registry Summary:
  Total evaluations: 36
  Unique signals: 2
  Unique strategies: 12


In [24]:
print(f"\n{'='*80}")
print(f"PERSISTING EVALUATION METADATA")
print(f"{'='*80}\n")

# Create evaluation metadata
evaluation_metadata = {
    "timestamp": datetime.now().isoformat(),
    "execution_time_seconds": evaluation_time,
    "configuration": {
        "min_obs": config.min_obs,
        "n_subperiods": config.n_subperiods,
        "risk_free_rate": config.risk_free_rate,
        "rolling_window": config.rolling_window,
        "attribution_quantiles": config.attribution_quantiles,
    },
    "summary": {
        "total_evaluations": len(performance_results),
        "signals": signals,
        "strategies": strategies,
        "top_performer": {
            "signal": top_5_pairs[0][0][0],
            "strategy": top_5_pairs[0][0][1],
            "stability_score": float(top_5_pairs[0][1].stability_score),
            "profit_factor": float(top_5_pairs[0][1].metrics['profit_factor']),
        },
        "reports_directory": str(PERFORMANCE_REPORTS_DIR),
        "registry_path": str(PERFORMANCE_REGISTRY_PATH),
    },
    "backtest_reference": {
        "timestamp": backtest_metadata["timestamp"],
        "date_range": backtest_metadata["summary"]["date_range"],
    },
}

# Save metadata
metadata_path = LOGS_DIR / "performance_evaluation_metadata.json"
save_json(evaluation_metadata, metadata_path)

print(f"✓ Saved evaluation metadata: {metadata_path}")
print(f"\nMetadata summary:")
print(f"  Evaluations: {evaluation_metadata['summary']['total_evaluations']}")
print(f"  Top performer: {evaluation_metadata['summary']['top_performer']['signal']} × "
      f"{evaluation_metadata['summary']['top_performer']['strategy']}")
print(f"  Stability: {evaluation_metadata['summary']['top_performer']['stability_score']:.3f}")
print(f"  Profit factor: {evaluation_metadata['summary']['top_performer']['profit_factor']:.2f}")

2025-11-09 18:21:05,700 - aponyx.persistence.json_io - INFO - Saving JSON to C:\Users\ROG3003\PythonProjects\aponyx\logs\performance_evaluation_metadata.json (5 top-level keys)



PERSISTING EVALUATION METADATA

✓ Saved evaluation metadata: C:\Users\ROG3003\PythonProjects\aponyx\logs\performance_evaluation_metadata.json

Metadata summary:
  Evaluations: 12
  Top performer: cdx_vix_gap × conservative
  Stability: 0.750
  Profit factor: 1.54


---

## Workflow Complete

Performance analysis successful! Comprehensive post-backtest evaluation has been completed for all signal-strategy pairs.

### What Was Accomplished

✓ **Backtest Results Loaded** — P&L and positions from Step 4  
✓ **Evaluation Configured** — Performance analysis parameters set  
✓ **BacktestResult Objects Reconstructed** — Data prepared for evaluation layer  
✓ **Performance Evaluations Executed** — All pairs analyzed comprehensively  
✓ **Extended Metrics Displayed** — Stability, profit factor, tail ratio, consistency  
✓ **Rolling Sharpe Visualized** — Temporal stability analysis for top performers  
✓ **Attribution Analyzed** — Directional, signal strength, win/loss decomposition  
✓ **Attribution Visualized** — P&L breakdown charts generated  
✓ **Reports Generated** — Comprehensive markdown reports for all pairs  
✓ **Evaluations Registered** — Metadata tracked in PerformanceRegistry  
✓ **Metadata Persisted** — Execution summary saved

### Data Flow

```
Backtest Results (Step 4)
├─ P&L DataFrame (MultiIndex)
└─ Positions DataFrame (MultiIndex)
    ↓
Performance Analysis (this notebook)
├─ Reconstruct BacktestResult objects
├─ Run analyze_backtest_performance()
└─ Generate PerformanceResult objects
    ↓
Outputs
├─ Markdown reports (one per pair)
├─ Registry entries (performance_registry.json)
└─ Metadata (performance_evaluation_metadata.json)
```

### Re-Running This Notebook

- **Prerequisites:** Requires completed Step 4 (backtest execution)
- **Data Loading:** Loads cached P&L and positions automatically
- **Configuration:** Edit cell 3 to adjust evaluation parameters
- **Focus Visualization:** Top 5 by stability score (automatic)
- **Outputs:** Overwrites reports, registry, and metadata
- **Reports:** All signal-strategy pairs get full markdown reports

### Key Files Generated

```
reports/
└── performance/
    ├── {signal}_{strategy}_{timestamp}.md (one per pair)
    └── ... (typically 9-15 reports)

logs/
└── performance_evaluation_metadata.json

src/aponyx/evaluation/performance/
└── performance_registry.json (updated)
```

### Extended Metrics Explained

- **Stability Score (0-1):** Overall consistency across subperiods
- **Profit Factor:** Gross wins / gross losses (>1 is profitable)
- **Tail Ratio:** 95th percentile / 5th percentile return (asymmetry)
- **Rolling Sharpe:** Mean/std of 3-month rolling Sharpe ratios
- **Consistency:** Percentage of 3-week windows with positive returns
- **Recovery Days:** Average time to recover from drawdowns
- **Drawdown Count:** Number of distinct drawdown periods

### Attribution Components

**Directional:** Long vs short P&L contribution  
**Signal Strength:** Tercile breakdown (weak/medium/strong signals)  
**Win/Loss:** Positive vs negative day decomposition

### Troubleshooting

**Backtest results not found:**
- Run `04_backtest.ipynb` first
- Verify files exist: `data/processed/backtest_results_pnl.parquet` and `backtest_results_positions.parquet`
- Check DATA_DIR configuration

**Evaluation errors:**
- Ensure sufficient observations (min 252 days by default)
- Verify P&L and positions DataFrames have matching structure
- Check DatetimeIndex is properly formatted
- Review ERROR logs for specific validation failures

**Report generation issues:**
- Verify PERFORMANCE_REPORTS_DIR exists and is writable
- Check disk space for markdown file creation
- Ensure all signal/strategy names are valid file name components
- Review report content in generated .md files

**Visualization issues:**
- Ensure plotly installed: `uv sync --extra viz`
- Verify notebook can render Plotly figures
- Check rolling window size relative to data length
- Reduce number of top performers if too many plots

**Registry errors:**
- Check PERFORMANCE_REGISTRY_PATH exists and is writable
- Verify JSON format is valid (delete and recreate if corrupted)
- Ensure unique evaluation IDs (timestamp-based)
- Review registry file permissions