# **Dependencies**

The CSCV analysis depends on the following Python packages:
- matplotlib  
- statsmodels

# **Importing Libraries**

In [None]:
import pandas as pd 
import numpy as np
from nautilus_trader.backtest.cscv_analysis import CSCV
from nautilus_trader.analysis.statistics.sharpe_ratio import SharpeRatio

# **Load Data**

An EMACrossTrailingStop strategy is used as the example. The strategy was backtested with different combinations of `fast_ema_period` and `slow_ema_period`, and the results were saved to `backtest_returns.csv`.
The resulting pandas DataFrame has a DatetimeIndex, and each column holds the daily returns for one parameter combination.

In [None]:
df = pd.read_csv("backtest_returns.csv",index_col=0)
df.index = pd.to_datetime(df.index)

In [None]:
df.head()
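For readers who do not have `backtest_returns.csv` at hand, the layout described above can be mimicked with synthetic data. The sketch below is only an illustration: the column names, date range, and normally distributed returns are made up and are not the real backtest results.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for backtest_returns.csv: 250 business days of daily
# returns for four (fast_ema_period, slow_ema_period) combinations.
rng = np.random.default_rng(seed=0)
index = pd.date_range("2022-01-03", periods=250, freq="B")
columns = ["fast10_slow20", "fast10_slow50", "fast20_slow50", "fast20_slow100"]
example_df = pd.DataFrame(
    rng.normal(loc=0.0, scale=0.01, size=(len(index), len(columns))),
    index=index,
    columns=columns,
)

# Sanity checks that are also worth running on the real file.
assert isinstance(example_df.index, pd.DatetimeIndex)
assert example_df.index.is_monotonic_increasing
assert not example_df.isna().any().any()
```

The same three checks can be applied to `df` after loading the real CSV, since the analysis assumes one row per day and one column per parameter combination.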

## **CSCV Analysis**

Combinatorially Symmetric Cross-Validation (CSCV) is described in [1] and [2]. CSCV is a framework for estimating the probability of backtest overfitting (PBO) for different types of strategies.

### The Combinatorially Symmetric Cross-Validation procedure
1. Collect a matrix $J$ of return time series, with one column per strategy
2. Split $J$ row-wise into $S$ disjoint submatrices $J_s$, with $S$ even
    * for example, [$J_1$, $J_2$, $J_3$, $J_4$, ...]
3. Generate all combinations $C_S$ of the submatrices $J_s$, taken in groups of $S/2$
    * for example, with $S = 4$
        * training sets = [$J_1$ + $J_2$, $J_1$ + $J_3$, $J_1$ + $J_4$, ...] (in-sample)
        * testing sets = [$J_3$ + $J_4$, $J_2$ + $J_4$, $J_2$ + $J_3$, ...] (out-of-sample)
4. For each $c$ in $C_S$:
    1. Compute the in-sample and out-of-sample performance of each strategy
    2. Find the best in-sample strategy and its corresponding out-of-sample performance
    3. Determine the relative rank of the out-of-sample performance associated with the trial chosen in-sample, denoted by $\bar{w}_c$, where $\bar{w}_c \in (0,1)$
    4. Compute the logit $\lambda_c = \log\left(\frac{\bar{w}_c}{1-\bar{w}_c}\right)$
5. Compute the probability of backtest overfitting
    * $PBO = \int_{-\infty}^0 f(\lambda)\, d\lambda$, where $f(\lambda)$ is the relative frequency distribution of the logits $\lambda_c$ across all $c$ in $C_S$
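The steps above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the implementation behind the `CSCV` class: the function names `sharpe`, `cscv_logits`, and `pbo` are hypothetical, performance is measured with a non-annualised Sharpe ratio, the relative rank is divided by $N + 1$ so that it stays strictly inside $(0, 1)$, and the integral in step 5 is approximated by the share of logits at or below zero.

```python
from itertools import combinations

import numpy as np


def sharpe(returns):
    """Per-column Sharpe ratio (not annualised) of a 2-D returns array."""
    return returns.mean(axis=0) / returns.std(axis=0, ddof=1)


def cscv_logits(returns, n_splits):
    """Return the logit lambda_c for every symmetric train/test combination."""
    if n_splits % 2:
        raise ValueError("n_splits must be even")
    # Step 2: split the rows into S disjoint, contiguous blocks.
    blocks = np.array_split(np.asarray(returns, dtype=float), n_splits, axis=0)
    n_strategies = blocks[0].shape[1]
    logits = []
    # Step 3: each choice of S/2 blocks is a training set; the rest is the test set.
    for train_ids in combinations(range(n_splits), n_splits // 2):
        test_ids = [i for i in range(n_splits) if i not in train_ids]
        train = np.vstack([blocks[i] for i in train_ids])
        test = np.vstack([blocks[i] for i in test_ids])
        # Steps 4.1 and 4.2: score every strategy, pick the best one in-sample.
        best = np.argmax(sharpe(train))
        oos = sharpe(test)
        # Step 4.3: out-of-sample rank of that strategy (1 = worst, N = best),
        # scaled by N + 1 (an assumption) so that 0 < w < 1.
        w = np.sum(oos <= oos[best]) / (n_strategies + 1)
        # Step 4.4: logit of the relative rank.
        logits.append(np.log(w / (1.0 - w)))
    return np.array(logits)


def pbo(logits):
    """Step 5: share of combinations whose logit is at or below zero."""
    return float(np.mean(logits <= 0))


# Toy usage on pure-noise returns: 240 days, 8 strategies, S = 6.
rng = np.random.default_rng(seed=0)
toy_returns = rng.normal(0.0, 0.01, size=(240, 8))
toy_logits = cscv_logits(toy_returns, n_splits=6)
print(len(toy_logits), pbo(toy_logits))  # 20 combinations, since C(6, 3) = 20
```

The number of combinations grows quickly with $S$ (for example, $S = 16$ gives 12,870), so $S$ trades off the resolution of the logit distribution against run time.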

In [None]:
cv = CSCV(df,10)

In [None]:
cv.estimate()

In [None]:
cv.plot_pbo()

The figure shows that, for the current parameter range of the EMACrossTrailingStop strategy, the probability of backtest overfitting is 16.7%.

In [None]:
cv.plot_performance_degradation()

The last figure shows a linear regression of the out-of-sample performance on the performance of the best in-sample strategy.
The negative slope indicates that, for the current parameter range, the EMACrossTrailingStop strategy may suffer performance degradation in the future.
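The regression behind this kind of plot can be sketched with `np.polyfit`. The helper name `degradation_fit` and the Sharpe ratio pairs below are made up for illustration; how `plot_performance_degradation` computes its fit internally is not shown in this notebook.

```python
import numpy as np


def degradation_fit(is_perf, oos_perf):
    """Least-squares line oos = intercept + slope * is; returns (slope, intercept)."""
    slope, intercept = np.polyfit(
        np.asarray(is_perf, dtype=float),
        np.asarray(oos_perf, dtype=float),
        deg=1,
    )
    return float(slope), float(intercept)


# Made-up (in-sample, out-of-sample) Sharpe ratio pairs, one per combination c.
is_sharpe = np.array([0.8, 1.0, 1.2, 1.5, 1.9])
oos_sharpe = np.array([0.5, 0.45, 0.3, 0.2, 0.05])
slope, intercept = degradation_fit(is_sharpe, oos_sharpe)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

A negative slope means that the better a parameter set looks in-sample, the worse it tends to do out-of-sample, which is the pattern expected from overfitting.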


In [None]:
cv.plot_stochastic_dominance()

The last figure plots the cumulative distribution functions of the Sharpe ratio for the in-sample and out-of-sample sets.
In most cases SD2 is greater than 0, which indicates that selecting the EMACrossTrailingStop strategy
by its in-sample performance is preferable to choosing one at random.
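One way to compute a second-order stochastic dominance curve is sketched below. It assumes that SD2 is the running integral of the difference between two empirical CDFs: that of all out-of-sample performance values and that of the out-of-sample performance of the strategy selected in-sample. This is an assumption about the statistic, and `plot_stochastic_dominance` may compute it differently; the names `ecdf` and `sd2` are hypothetical.

```python
import numpy as np


def ecdf(sample, grid):
    """Empirical CDF of `sample` evaluated at each point of `grid`."""
    sample = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(sample, grid, side="right") / sample.size


def sd2(selected_oos, all_oos, n_points=200):
    """Running integral of CDF(all) - CDF(selected) over a common grid."""
    selected_oos = np.asarray(selected_oos, dtype=float)
    all_oos = np.asarray(all_oos, dtype=float)
    lo = min(selected_oos.min(), all_oos.min())
    hi = max(selected_oos.max(), all_oos.max())
    grid = np.linspace(lo, hi, n_points)
    diff = ecdf(all_oos, grid) - ecdf(selected_oos, grid)
    step = grid[1] - grid[0]
    # Rectangle-rule approximation of the integral from lo up to each grid point.
    return grid, np.cumsum(diff) * step


# Made-up example: the selected strategies are shifted up by 1.0, so they dominate.
grid, values = sd2(selected_oos=[1.0, 2.0, 3.0, 4.0], all_oos=[0.0, 1.0, 2.0, 3.0])
print(bool(np.all(values >= 0)))
```

Under this definition, a curve that stays at or above zero over the whole grid means the selected strategies dominate the full set in the second-order sense.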

## **How to use it**

CSCV can be used to compare the performance of different types of strategies: compute the PBO for each strategy with CSCV, then use 1.0/PBO as asset management weights.
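A minimal sketch of the weighting idea follows. The text above only says to use 1.0/PBO; normalising the weights so that they sum to 1 and flooring very small PBO values to avoid division by zero are added assumptions, and the function name `pbo_weights` and the PBO figures are hypothetical.

```python
def pbo_weights(pbo_by_strategy, floor=1e-6):
    """Weights proportional to 1 / PBO, normalised to sum to 1."""
    # The floor (an assumption) keeps a PBO of exactly 0 from dividing by zero.
    inverse = {name: 1.0 / max(p, floor) for name, p in pbo_by_strategy.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


# Made-up PBO estimates for two strategy families.
weights = pbo_weights({"strategy_a": 0.25, "strategy_b": 0.5})
print(weights)
```

With these inputs the strategy with half the PBO receives twice the weight.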

# **References**

[1]. M. López de Prado, Advances in Financial Machine Learning (John Wiley & Sons, 2018).  

[2]. D. Bailey, J. Borwein, M. López de Prado and J. Zhu, “The probability of backtest overfitting,” working paper, 2013, http://ssrn.com/abstract=2326253.  

[3]. D. Bailey, J. Borwein, M. López de Prado and J. Zhu, “Pseudo-Mathematics and Financial Charlatanism: The Effects of Backtest Overfitting on Out-Of-Sample Performance,” Notices of the American Mathematical Society, Vol. 61, No. 5 (2014), pp. 458-471, http://ssrn.com/abstract=2308659.