Statistical validation and diagnostics for quantitative trading strategies: signal analysis, backtest evaluation, and overfitting detection.
This library is one of five interconnected libraries supporting the machine learning for trading workflow described in Machine Learning for Trading.
Each library addresses a distinct stage: data infrastructure, feature engineering, signal evaluation, strategy backtesting, and live deployment.
Evaluating whether a signal or strategy has genuine predictive power requires statistical rigor. ml4t-diagnostic provides:
- Information coefficient (IC) analysis with HAC-adjusted standard errors
- Deflated Sharpe Ratio (DSR) and other multiple-testing corrections
- Combinatorial purged cross-validation (CPCV) for time series
- Feature importance analysis (MDI, PFI, MDA, SHAP)
- Trade-level diagnostics with SHAP-based error pattern discovery
- Portfolio performance metrics and tear sheets
The library implements methods from the academic finance literature, particularly those addressing backtest overfitting and false discovery in strategy research.
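To make the first item in the list above concrete, here is a minimal sketch of a per-date rank IC and a Newey-West (HAC) t-statistic for the mean of an IC series. It uses only the Python standard library and is not the ml4t-diagnostic implementation; `rank_ic`, `newey_west_tstat`, and the `lags` parameter are illustrative names, and the sample IC values are made up.

```python
import math
from statistics import mean


def _ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def _pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def rank_ic(factor, forward_returns):
    """Spearman rank correlation across assets for a single date."""
    return _pearson(_ranks(factor), _ranks(forward_returns))


def newey_west_tstat(series, lags):
    """t-statistic of the mean using a Bartlett-kernel long-run variance.

    Assumes a non-constant series; lags=0 reduces to the usual
    (population-variance) t-statistic.
    """
    n = len(series)
    mu = mean(series)
    dev = [x - mu for x in series]
    long_run_var = sum(d * d for d in dev) / n
    for lag in range(1, lags + 1):
        weight = 1 - lag / (lags + 1)  # Bartlett weights keep the variance non-negative
        gamma = sum(dev[t] * dev[t - lag] for t in range(lag, n)) / n
        long_run_var += 2 * weight * gamma
    return mu / math.sqrt(long_run_var / n)


# Hypothetical daily IC values, for illustration only.
daily_ic = [0.04, 0.06, 0.01, -0.02, 0.05, 0.03, 0.07, 0.00]
t_stat = newey_west_tstat(daily_ic, lags=2)
```

The HAC adjustment matters because ICs computed on overlapping forward-return windows (for example, 5-day or 21-day horizons sampled daily) are autocorrelated, so a naive t-statistic tends to overstate significance.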
```bash
pip install ml4t-diagnostic
```

Optional dependencies:

```bash
pip install ml4t-diagnostic[ml]   # SHAP, importance analysis
pip install ml4t-diagnostic[viz]  # Plotly visualizations
pip install ml4t-diagnostic[all]  # Everything
```

```python
from ml4t.diagnostic import analyze_signal

result = analyze_signal(
    factor=factor_data,  # date, asset, factor
    prices=price_data,   # date, asset, price
    periods=(1, 5, 21),
)

print(f"IC (1D): {result.ic['1D']:.4f}")
print(f"IC t-stat (1D): {result.ic_t_stat['1D']:.2f}")
print(f"Q5-Q1 spread (1D): {result.spread['1D']:.2%}")
```

```python
from ml4t.diagnostic.evaluation.stats import deflated_sharpe_ratio

# Accounts for multiple testing
dsr_result = deflated_sharpe_ratio(
    returns=strategy_returns,
    benchmark_sharpe=0.0,
    n_trials=100,
)

print(f"Sharpe: {dsr_result.sharpe_ratio:.2f}")
print(f"Deflated Sharpe: {dsr_result.deflated_sharpe:.2f}")
print(f"Significant: {dsr_result.is_significant}")
```

```python
from ml4t.diagnostic.evaluation import analyze_ml_importance

# Combines MDI, PFI, MDA, SHAP methods
results = analyze_ml_importance(model, X, y)
print(results.consensus_ranking)
```

```python
from ml4t.diagnostic.evaluation import TradeAnalysis, TradeShapAnalyzer

analyzer = TradeAnalysis(trade_records)
worst_trades = analyzer.worst_trades(n=20)

# SHAP-based error pattern discovery
shap_analyzer = TradeShapAnalyzer(model, features_df, shap_values)
result = shap_analyzer.explain_worst_trades(worst_trades)

for pattern in result.error_patterns:
    print(f"Pattern: {pattern.hypothesis}")
    print(f"Potential savings: ${pattern.potential_impact:,.2f}")
```

```text
Tier 1: Feature Analysis (Pre-Modeling)
├── Time series diagnostics (stationarity, ACF, volatility)
├── Distribution analysis (moments, normality, tails)
├── Feature importance (MDI, PFI, MDA, SHAP)
└── Feature interactions (conditional IC, H-stat)

Tier 2: Signal Analysis (Model Outputs)
├── IC analysis (time series, histogram, decay)
├── Quantile returns (spreads, monotonicity)
├── Turnover analysis
└── Multi-signal comparison

Tier 3: Backtest Analysis (Post-Modeling)
├── Trade analysis (win/loss, holding periods)
├── Statistical validity (DSR, RAS, PBO)
├── Trade-SHAP diagnostics
└── Excursion analysis (TP/SL optimization)

Tier 4: Portfolio Analysis (Production)
├── Performance metrics (Sharpe, Sortino, Calmar)
├── Drawdown analysis
├── Rolling metrics
└── Risk metrics (VaR, CVaR)
```
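As a rough illustration of the Tier 4 metrics, the sketch below computes an annualized Sharpe ratio, a Sortino ratio, maximum drawdown, and historical VaR/CVaR from a list of simple per-period returns. The function names, the `periods_per_year=252` default, and the sample returns are assumptions made for this example, not the library's API or conventions.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(returns, periods_per_year=252):
    """Annualized mean / sample standard deviation (risk-free rate assumed zero)."""
    return mean(returns) / stdev(returns) * math.sqrt(periods_per_year)


def sortino_ratio(returns, periods_per_year=252, target=0.0):
    """Like Sharpe, but penalizes only returns below the target."""
    downside_sq = [min(r - target, 0.0) ** 2 for r in returns]
    downside_dev = math.sqrt(sum(downside_sq) / len(returns))
    return (mean(returns) - target) / downside_dev * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve (<= 0)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1)
    return worst


def historical_var_cvar(returns, alpha=0.05):
    """Empirical VaR and CVaR at level alpha, reported as positive losses."""
    ordered = sorted(returns)
    k = max(1, int(alpha * len(ordered)))  # number of observations in the tail
    tail = ordered[:k]
    return -tail[-1], -mean(tail)


# Made-up returns, for illustration only.
sample = [-0.04, -0.02, 0.01, 0.03, 0.02, -0.01, 0.0, 0.015]
var_25, cvar_25 = historical_var_cvar(sample, alpha=0.25)
```

CVaR averages the losses in the tail beyond the VaR cutoff, so it is never smaller than VaR at the same level.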
| Method | Purpose |
|---|---|
| DSR (Deflated Sharpe) | Corrects for multiple testing bias |
| CPCV (Combinatorial Purged CV) | Leak-free time series validation |
| RAS (Rademacher Anti-Serum) | Backtest overfitting detection |
| PBO | Probability of backtest overfitting |
| HAC-adjusted IC | Autocorrelation-robust information coefficient |
| FDR Control | Multiple comparisons (Benjamini-Hochberg) |
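For intuition on the DSR row, the following is a minimal standard-library sketch of the Deflated Sharpe Ratio as defined by Bailey and Lopez de Prado: the observed per-period Sharpe ratio is compared against the maximum Sharpe expected from `n_trials` unskilled strategies, with an adjustment for skewness and kurtosis. The function names and the `sharpe_variance` argument (the variance of Sharpe estimates across trials) are assumptions for this example and do not mirror the signature of `deflated_sharpe_ratio` in ml4t-diagnostic.

```python
import math
from statistics import NormalDist, mean, pstdev

EULER_GAMMA = 0.5772156649015329
_NORMAL = NormalDist()


def expected_max_sharpe(n_trials, sharpe_variance):
    """Expected maximum Sharpe across n_trials (>= 2) strategies with zero true Sharpe."""
    z1 = _NORMAL.inv_cdf(1 - 1 / n_trials)
    z2 = _NORMAL.inv_cdf(1 - 1 / (n_trials * math.e))
    return math.sqrt(sharpe_variance) * ((1 - EULER_GAMMA) * z1 + EULER_GAMMA * z2)


def deflated_sharpe(returns, n_trials, sharpe_variance):
    """Probability that the true Sharpe exceeds the expected maximum under the null.

    Uses the per-period (non-annualized) Sharpe ratio.
    """
    n = len(returns)
    mu, sigma = mean(returns), pstdev(returns)
    sharpe = mu / sigma
    z = [(r - mu) / sigma for r in returns]
    skew = mean([v ** 3 for v in z])
    kurt = mean([v ** 4 for v in z])  # raw kurtosis, equal to 3 for a normal distribution
    benchmark = expected_max_sharpe(n_trials, sharpe_variance)
    denom = math.sqrt(1 - skew * sharpe + (kurt - 1) / 4 * sharpe ** 2)
    return _NORMAL.cdf((sharpe - benchmark) * math.sqrt(n - 1) / denom)


# Made-up return series, for illustration only.
returns = [0.01, -0.005, 0.012, 0.003, -0.002, 0.008, 0.004, -0.001] * 30
dsr_few = deflated_sharpe(returns, n_trials=10, sharpe_variance=0.04)
dsr_many = deflated_sharpe(returns, n_trials=1000, sharpe_variance=0.04)
```

Holding the returns fixed, the deflated value falls as the number of trials grows, which is the multiple-testing correction the table refers to.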
```python
from ml4t.diagnostic.splitters import WalkForwardCV, CombinatorialCV
from ml4t.diagnostic.visualization import plot_cv_folds

# Walk-forward with purging
cv = WalkForwardCV(n_splits=5, train_size=252, test_size=63, purge_days=21)

# Visualize fold structure
fig = plot_cv_folds(cv, dates)
fig.show()
```

- Polars-based: Native Polars DataFrames throughout
- HAC standard errors: Newey-West adjustment for autocorrelated data
- Time-aware validation: Purged and embargoed cross-validation splits
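The idea behind purged, time-aware splits can be sketched with plain index arithmetic: each training window ends `purge` observations before its test window starts, so labels that overlap the test period cannot leak into training. The generator below is a hypothetical helper written for this illustration (`purged_walk_forward` is not part of ml4t-diagnostic, and `WalkForwardCV` may place its windows differently). An embargo, which drops training samples immediately after a test window, only comes into play in schemes where training data can follow the test set, so it is omitted here.

```python
def purged_walk_forward(n_samples, n_splits, train_size, test_size, purge=0):
    """Yield (train_indices, test_indices) for rolling walk-forward folds.

    A gap of `purge` observations is left between the end of each training
    window and the start of its test window.
    """
    needed = train_size + purge + n_splits * test_size
    if needed > n_samples:
        raise ValueError(f"need at least {needed} samples, got {n_samples}")
    for k in range(n_splits):
        test_start = train_size + purge + k * test_size
        test_end = test_start + test_size
        train_end = test_start - purge
        train_start = train_end - train_size
        yield list(range(train_start, train_end)), list(range(test_start, test_end))


# Same window sizes as the WalkForwardCV example: 252 train, 63 test, 21 purged.
folds = list(purged_walk_forward(600, n_splits=5, train_size=252, test_size=63, purge=21))
```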
- ml4t-data: Market data acquisition and storage
- ml4t-engineer: Feature engineering and technical indicators
- ml4t-backtest: Event-driven backtesting
- ml4t-live: Live trading with broker integration
```bash
git clone https://github.com/applied-ai/ml4t-diagnostic.git
cd ml4t-diagnostic
uv sync
uv run pytest tests/ -q -n auto
uv run ty check
```

- Lopez de Prado, M. (2018). Advances in Financial Machine Learning. Wiley.
- Bailey, D., & Lopez de Prado, M. (2012). "The Sharpe Ratio Efficient Frontier."
- Bailey, D., et al. (2014). "The Deflated Sharpe Ratio."
MIT License - see LICENSE for details.

