# Backtest Overfitting: CSCV, PBO, PSR/DSR

## Overview

-   Problem: With many candidate strategies, the best in‑sample often
    overstates true performance (selection bias, data snooping).
-   Guardrails:
    -   Time‑aware model selection (CSCV) → quantify Probability of
        Backtest Overfitting (PBO) (Bailey et al. 2015)
    -   Performance significance with non‑normality: PSR; selection‑bias
        deflation: DSR (Bailey and López de Prado 2014)

## Key Definitions

-   CSCV: Combinatorially Symmetric Cross‑Validation. Split the return
    history into an even number of contiguous folds; for every way of
    assigning half the folds to in‑sample (IS) and the other half to
    out‑of‑sample (OOS), select the IS champion and score it OOS.
-   PBO: Fraction of CSCV splits where the in‑sample champion ranks
    below the OOS median (logit of its relative OOS rank \< 0).
-   PSR: Probabilistic Sharpe Ratio, the probability that the true SR
    exceeds a benchmark SR given the sample length, skewness, and
    kurtosis of returns.
-   DSR: Deflated Sharpe Ratio, a PSR whose benchmark SR is raised to
    account for multiple, correlated trials.
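
The CSCV and PBO definitions above can be sketched in a few lines. This is a minimal, standard-library illustration under stated assumptions, not the repo's `cscv_pbo`: the name `cscv_pbo_sketch`, the list-of-lists input layout, the per-period Sharpe ratio as the selection metric, and the `rank / (N + 1)` relative-rank convention are all choices made here for illustration.

```python
import itertools
import math
import random
import statistics


def sharpe(xs):
    """Per-period Sharpe ratio (mean / sample std); 0.0 if the series is flat."""
    sd = statistics.stdev(xs)
    return statistics.fmean(xs) / sd if sd > 0 else 0.0


def cscv_pbo_sketch(returns, n_folds=10):
    """returns: list of N strategy return series, each of length T.

    Returns (pbo, logits), with one logit per CSCV split.
    """
    if n_folds % 2:
        raise ValueError("n_folds must be even")
    n_strats, T = len(returns), len(returns[0])
    fold_len = T // n_folds  # any trailing remainder is dropped
    if fold_len < 2:
        raise ValueError("need at least 2 observations per fold")
    folds = [range(k * fold_len, (k + 1) * fold_len) for k in range(n_folds)]

    logits = []
    # Every choice of half the folds as in-sample; the complement is out-of-sample.
    for is_ids in itertools.combinations(range(n_folds), n_folds // 2):
        is_rows = [t for k in is_ids for t in folds[k]]
        oos_rows = [t for k in range(n_folds) if k not in is_ids for t in folds[k]]
        is_sr = [sharpe([r[t] for t in is_rows]) for r in returns]
        oos_sr = [sharpe([r[t] for t in oos_rows]) for r in returns]
        champ = max(range(n_strats), key=lambda i: is_sr[i])
        # Relative OOS rank of the IS champion, kept strictly inside (0, 1).
        rank = sum(s <= oos_sr[champ] for s in oos_sr)  # 1 = worst, N = best
        omega = rank / (n_strats + 1)
        logits.append(math.log(omega / (1 - omega)))

    pbo = sum(lam < 0 for lam in logits) / len(logits)
    return pbo, logits


# Demo on pure noise: no strategy has real skill, so the IS champion
# should have no systematic OOS advantage.
rng = random.Random(0)
noise = [[rng.gauss(0.0, 0.01) for _ in range(240)] for _ in range(8)]
pbo, logits = cscv_pbo_sketch(noise, n_folds=6)
print(f"PBO on pure noise: {pbo:.2f} over {len(logits)} splits")
```

The number of splits grows combinatorially with `n_folds` (252 splits for 10 folds), so a production version would normally vectorise the Sharpe computation rather than loop in pure Python.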

## Minimal Equations

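Let \(\hat{S}\) be the observed Sharpe ratio and \(S_{\star}\) the
benchmark Sharpe ratio. PSR (Bailey and López de Prado):

\[
\mathrm{PSR}(S_{\star}) = \Phi\!\left(\frac{(\hat{S} - S_{\star})\sqrt{n - 1}}{\sqrt{1 - \gamma_3 \hat{S} + \frac{\gamma_4 - 1}{4}\hat{S}^{2}}}\right)\,,
\]

where \(\Phi\) is the standard normal CDF, \(\gamma_3\) is skewness,
\(\gamma_4\) kurtosis (normal = 3), and \(n\) the sample size.

DSR increases \(S_{\star}\) to reflect the number of trials and their
dependence (see Bailey and López de Prado (2014)).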
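
As a rough numerical companion to this section, the sketch below evaluates PSR and one common form of the DSR benchmark using only the standard library. The function names (`psr_sketch`, `expected_max_sharpe`, `dsr_sketch`) are illustrative and are not the repo's utilities. The benchmark uses the expected-maximum-Sharpe approximation from Bailey and López de Prado (2014), which treats `n_trials` as the number of independent trials; for correlated trials, supplying an effective trial count is an assumption the user has to make. All Sharpe ratios are per-period (non-annualised), matching `n_obs`.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def psr_sketch(sr_hat, sr_star, n_obs, skew=0.0, kurtosis=3.0):
    """Probability that the true Sharpe exceeds sr_star (kurtosis: normal = 3)."""
    denom = math.sqrt(1.0 - skew * sr_hat + (kurtosis - 1.0) / 4.0 * sr_hat ** 2)
    z = (sr_hat - sr_star) * math.sqrt(n_obs - 1) / denom
    return NormalDist().cdf(z)


def expected_max_sharpe(n_trials, sr_variance):
    """Approximate expected maximum Sharpe across n_trials skill-less trials.

    sr_variance is the variance of the Sharpe estimates across trials.
    """
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    nd = NormalDist()
    return math.sqrt(sr_variance) * (
        (1.0 - EULER_GAMMA) * nd.inv_cdf(1.0 - 1.0 / n_trials)
        + EULER_GAMMA * nd.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    )


def dsr_sketch(sr_hat, n_obs, n_trials, sr_variance, skew=0.0, kurtosis=3.0):
    """PSR evaluated against the deflated (raised) benchmark Sharpe."""
    sr0 = expected_max_sharpe(n_trials, sr_variance)
    return psr_sketch(sr_hat, sr0, n_obs, skew, kurtosis)


# Example: a daily Sharpe of 0.1 over 101 observations, selected from 10 trials
# whose Sharpe estimates have variance 0.01 (all inputs are made up).
print(f"PSR vs 0:   {psr_sketch(0.1, 0.0, 101):.3f}")
print(f"Benchmark:  {expected_max_sharpe(10, 0.01):.3f}")
print(f"DSR:        {dsr_sketch(0.1, 101, n_trials=10, sr_variance=0.01):.3f}")
```

The same observed Sharpe that looks significant against a zero benchmark can fall below even odds once the benchmark is raised for the number of trials, which is the point of reporting DSR alongside PSR.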

## Reporting Template (copy/paste)

-   Data: sample period, asset universe, costs/slippage assumptions,
    data vintage/release timing.
-   Trials: number of candidate strategies or hyper‑parameter settings
    explored; correlation comment (e.g., similar families).
-   Selection: selection rule (e.g., in‑sample Sharpe), time‑aware
    protocol (CSCV/walk‑forward).
-   Robustness: CSCV PBO = X.XX (N splits); logit rank distribution
    shown.
-   Significance: PSR = X.XX vs benchmark SR\*; DSR = X.XX (assumptions
    noted).
-   Decision: promote/park; plan for live validation.

## Lightweight Code (repo utilities)

-   `scripts/overfit_metrics.py`:
    -   `cscv_pbo(returns, n_folds=10)` → PBO and diagnostics
    -   `probabilistic_sharpe_ratio(sr_hat, sr_star, n_obs, skew, kurtosis)`
    -   `generate_noise_strategies(T, N, rho)` for demos

See Lab 6B for full examples and figures. If `mlfinlab` is available,
compare DSR implementations.
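
For readers without the repo at hand, the following is a minimal stand-in for a noise-strategy generator. It is a sketch, not the code in `scripts/overfit_metrics.py`: it assumes `rho` means a common pairwise correlation between strategies, uses a one-factor Gaussian construction, and adds hypothetical `vol` and `seed` parameters that the repo function may not have.

```python
import math
import random


def noise_strategies_sketch(T, N, rho, vol=0.01, seed=0):
    """Return N zero-mean Gaussian return series of length T.

    Each series is sqrt(rho) * common + sqrt(1 - rho) * idiosyncratic noise,
    which gives pairwise correlation rho between any two series.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1] for a one-factor construction")
    rng = random.Random(seed)
    common = [rng.gauss(0.0, 1.0) for _ in range(T)]
    w_common, w_idio = math.sqrt(rho), math.sqrt(1.0 - rho)
    series = []
    for _ in range(N):
        series.append(
            [vol * (w_common * common[t] + w_idio * rng.gauss(0.0, 1.0)) for t in range(T)]
        )
    return series


# One year of daily returns for five skill-less, mildly correlated strategies.
demo = noise_strategies_sketch(T=252, N=5, rho=0.3)
best = max(sum(s) / len(s) for s in demo)
print(f"best in-sample mean return among noise strategies: {best:.5f}")
```

Because every series has zero true mean, any positive "best" performance printed here is selection bias by construction, which makes this kind of generator a convenient baseline for PBO and DSR demos.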

## References

-   Bailey et al. (2015): Probability of Backtest Overfitting (PBO) and
    CSCV  
-   Bailey and López de Prado (2014): Deflated Sharpe Ratio (DSR)  
-   White (2000): Reality Check  
-   Hansen (2005): SPA test

## Visual Workflows

![](attachment:../images/overfitting/dsr_workflow.png)

![](attachment:../images/overfitting/cpcv.png)

![](attachment:../images/overfitting/walkforward_vs_cpcv.png)

![](attachment:../images/overfitting/robust_workflow.png)

Bailey, David H., Jonathan M. Borwein, Marcos López de Prado, and Qiji
Jim Zhu. 2015. “The Probability of Backtest Overfitting.” *Journal of
Computational Finance*. <https://doi.org/10.2139/ssrn.2326253>.

Bailey, David H., and Marcos López de Prado. 2014. “The Deflated Sharpe
Ratio: Correcting for Selection Bias, Backtest Overfitting and
Non-Normality.” *Journal of Portfolio Management* 40 (5): 94–107.
<https://doi.org/10.2139/ssrn.2460551>.

Hansen, Peter R. 2005. “A Test for Superior Predictive Ability.”
*Journal of Business & Economic Statistics* 23 (4): 365–80.
<https://doi.org/10.1198/073500105000000063>.

White, Halbert. 2000. “A Reality Check for Data Snooping.”
*Econometrica* 68 (5): 1097–1126.
<https://doi.org/10.1111/1468-0262.00152>.