## Conducting Walk-Forward Optimization with VectorBT

In [1]:
import numpy as np
import scipy.stats as stats
import vectorbt as vbt
from IPython.display import Markdown, display

Define the start and end dates for data download

In [2]:
start = "2016-01-01 UTC"
end = "2020-01-01 UTC"

Download historical closing prices for the symbol "AAPL" from Yahoo Finance

In [3]:
prices = vbt.YFData.download("AAPL", start=start, end=end).get("Close")

Split the prices data into in-sample and out-of-sample sets

In [4]:
# Split the prices into training (in-sample) and test (out-of-sample) sets
# n=30: number of splits
# window_len=365 * 2: length of each window (730 rows)
# set_lens=(180,): length of the test set (180 rows)
# left_to_right=False: sets are measured from right to left
(in_price, in_indexes), (out_price, out_indexes) = prices.vbt.rolling_split(
    n=30,
    window_len=365 * 2,
    set_lens=(180,),
    left_to_right=False,
)
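
To make the split geometry concrete, here is a simplified sketch of how 30 evenly spaced windows of 730 rows could each be divided into a 550-row in-sample part and a 180-row out-of-sample part. It is not vectorbt's actual implementation: the function name rolling_split_indices, the even spacing of window starts, and the row count of 1006 are assumptions made for illustration.

```python
import numpy as np


def rolling_split_indices(n_rows, n, window_len, test_len):
    """Return (train, test) integer positions for n evenly spaced windows.

    Hypothetical helper for illustration only; vectorbt's rolling_split
    may choose window positions differently.
    """
    # Evenly spaced window start positions across the available rows
    starts = np.round(np.linspace(0, n_rows - window_len, n)).astype(int)
    splits = []
    for start in starts:
        split_point = start + window_len - test_len
        train = np.arange(start, split_point)              # in-sample rows
        test = np.arange(split_point, start + window_len)  # out-of-sample rows
        splits.append((train, test))
    return splits


# n_rows=1006 is an assumed row count for roughly four years of daily bars
splits = rolling_split_indices(n_rows=1006, n=30, window_len=365 * 2, test_len=180)
first_train, first_test = splits[0]
print(len(splits), len(first_train), len(first_test))
```

Each out-of-sample block sits immediately after its in-sample block, so parameters are always chosen on data that precedes the data they are tested on.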

Function to simulate all parameter combinations and calculate Sharpe ratios

In [5]:
def simulate_all_params(price, windows, **kwargs):
    # Run every possible combination of moving averages
    # price: price series
    # windows: array of window lengths
    # kwargs: additional arguments passed to Portfolio
    fast_ma, slow_ma = vbt.MA.run_combs(
        price, windows, r=2, short_names=["fast", "slow"]
    )
    # Generate entry signals when the fast MA crosses above the slow MA
    entries = fast_ma.ma_crossed_above(slow_ma)
    # Generate exit signals when the fast MA crosses below the slow MA
    exits = fast_ma.ma_crossed_below(slow_ma)
    # Build a portfolio from the generated signals
    pf = vbt.Portfolio.from_signals(price, entries, exits, **kwargs)
    # Return the Sharpe ratio of each portfolio column
    return pf.sharpe_ratio()

Function to get the best index based on performance

In [6]:
def get_best_index(performance):
    # Return the index labels of the best performance in each split
    # performance: Series of Sharpe ratios, one per parameter combination and split
    # Group by split_idx and take the label of the maximum value in each group
    return performance[performance.groupby("split_idx").idxmax()].index
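
The selection step can be tried on a toy Series with the same three index levels. This is a minimal sketch with made-up Sharpe values; it uses .loc with a list of labels, which should be equivalent to the indexing in get_best_index but is an assumption rather than the notebook's exact code.

```python
import pandas as pd

# Made-up Sharpe ratios for two parameter pairs across two splits
index = pd.MultiIndex.from_tuples(
    [(10, 11, 0), (10, 11, 1), (10, 12, 0), (10, 12, 1)],
    names=["fast_window", "slow_window", "split_idx"],
)
performance = pd.Series([0.5, 1.2, 0.9, 0.3], index=index, name="sharpe_ratio")

# For each split, find the full index label of the highest Sharpe ratio
best_labels = performance.groupby("split_idx").idxmax()

# Select those rows and keep only their index
best_index = performance.loc[best_labels.to_list()].index
print(list(best_index))
```

Split 0 picks the (10, 12) pair and split 1 picks the (10, 11) pair, because idxmax is evaluated independently within each split_idx group.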

Function to get the best parameters from the best index

In [7]:
def get_best_params(best_index, level_name):
    # Return the values of the given index level as a NumPy array
    # best_index: MultiIndex holding the best parameters
    # level_name: name of the index level to extract
    return best_index.get_level_values(level_name).to_numpy()

Function to simulate the best parameters and calculate Sharpe ratios

In [8]:
def simulate_best_params(price, best_fast_windows, best_slow_windows, **kwargs):
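    # Run one moving average per column so each split uses its own best window
    fast_ma = vbt.MA.run(price, window=best_fast_windows, per_column=True)
    slow_ma = vbt.MA.run(price, window=best_slow_windows, per_column=True)

    # Enter when the fast MA crosses above the slow MA; exit on the opposite cross
    entries = fast_ma.ma_crossed_above(slow_ma)
    exits = fast_ma.ma_crossed_below(slow_ma)

    # Build the portfolio and return the Sharpe ratio for each split
    pf = vbt.Portfolio.from_signals(price, entries, exits, **kwargs)
    return pf.sharpe_ratio()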
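
Both simulation functions rely on the same moving-average crossover rule. The sketch below reproduces that rule for a single price series using plain pandas. It is a simplification, not vectorbt's implementation: the helper name ma_crossover_signals is hypothetical, and vectorbt may handle ties and missing values differently.

```python
import pandas as pd


def ma_crossover_signals(price, fast, slow):
    """Hypothetical helper: boolean entry/exit signals from an MA crossover."""
    fast_ma = price.rolling(fast).mean()
    slow_ma = price.rolling(slow).mean()
    above = fast_ma > slow_ma  # False while either average is still NaN
    below = fast_ma < slow_ma
    # A cross happens when the relationship flips from one bar to the next
    entries = above & below.shift(1, fill_value=False)
    exits = below & above.shift(1, fill_value=False)
    return entries, exits


# Toy V-shaped price path: falls for four bars, then rises
price = pd.Series([5, 4, 3, 2, 1, 2, 3, 4, 5, 6], dtype=float)
entries, exits = ma_crossover_signals(price, fast=2, slow=3)
print(entries[entries].index.tolist(), int(exits.sum()))
```

On this toy path the fast average overtakes the slow one once, on the way back up, so there is a single entry and no exit.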

Define the range of windows for moving averages

In [9]:
windows = np.arange(10, 40)

Simulate all parameter combinations for in-sample data and calculate Sharpe ratios

In [10]:
in_sharpe = simulate_all_params(in_price, windows, direction="both", freq="d")
in_sharpe

fast_window  slow_window  split_idx
10           11           0            1.482997
                          1            1.315676
                          2            1.382605
                          3            1.266839
                          4            0.830667
                                         ...   
38           39           25          -1.096163
                          26          -1.041609
                          27          -0.875327
                          28          -0.800649
                          29          -0.695308
Name: sharpe_ratio, Length: 13050, dtype: float64
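
The length of 13050 in the output above follows from the size of the parameter grid. As a quick check, assuming run_combs with r=2 pairs the windows the same way itertools.combinations does, the count can be reproduced with the standard library:

```python
from itertools import combinations

import numpy as np

windows = np.arange(10, 40)  # 30 candidate window lengths: 10..39
n_splits = 30                # number of splits requested from rolling_split

# Every unordered pair of distinct windows, with the smaller window first
pairs = list(combinations(windows.tolist(), 2))
total_backtests = len(pairs) * n_splits

print(len(pairs), total_backtests)
print(pairs[0], pairs[-1])
```

The first and last pairs, (10, 11) and (38, 39), match the first and last parameter rows in the printed Series.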

Get the best index and parameters from the in-sample Sharpe ratios

In [11]:
in_best_index = get_best_index(in_sharpe)
in_best_index

MultiIndex([(10, 11,  0),
            (12, 13,  1),
            (12, 13,  2),
            (10, 11,  3),
            (12, 13,  4),
            (10, 11,  5),
            (10, 11,  6),
            (18, 23,  7),
            (18, 23,  8),
            (18, 23,  9),
            (18, 23, 10),
            (18, 23, 11),
            (18, 23, 12),
            (18, 23, 13),
            (18, 23, 14),
            (18, 23, 15),
            (23, 26, 16),
            (23, 26, 17),
            (23, 26, 18),
            (24, 25, 19),
            (24, 25, 20),
            (24, 25, 21),
            (24, 25, 22),
            (24, 25, 23),
            (24, 25, 24),
            (24, 25, 25),
            (24, 25, 26),
            (24, 25, 27),
            (24, 25, 28),
            (24, 25, 29)],
           names=['fast_window', 'slow_window', 'split_idx'])

In [12]:
in_best_fast_windows = get_best_params(in_best_index, "fast_window")
in_best_fast_windows

array([10, 12, 12, 10, 12, 10, 10, 18, 18, 18, 18, 18, 18, 18, 18, 18, 23,
       23, 23, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24])

In [13]:
in_best_slow_windows = get_best_params(in_best_index, "slow_window")
in_best_slow_windows

array([11, 13, 13, 11, 13, 11, 11, 23, 23, 23, 23, 23, 23, 23, 23, 23, 26,
       26, 26, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25])

In [14]:
# Combine the best fast and slow windows into pairs using zip()
# and convert the result to a NumPy array for later use
in_best_window_pairs = np.array(list(zip(in_best_fast_windows,in_best_slow_windows)))
in_best_window_pairs

array([[10, 11],
       [12, 13],
       [12, 13],
       [10, 11],
       [12, 13],
       [10, 11],
       [10, 11],
       [18, 23],
       [18, 23],
       [18, 23],
       [18, 23],
       [18, 23],
       [18, 23],
       [18, 23],
       [18, 23],
       [18, 23],
       [23, 26],
       [23, 26],
       [23, 26],
       [24, 25],
       [24, 25],
       [24, 25],
       [24, 25],
       [24, 25],
       [24, 25],
       [24, 25],
       [24, 25],
       [24, 25],
       [24, 25],
       [24, 25]])

Simulate the best parameters for out-of-sample data and calculate Sharpe ratios

In [15]:
out_test_sharpe = simulate_best_params(
    out_price, in_best_fast_windows, in_best_slow_windows, direction="both", freq="d"
)


In [16]:
display(out_test_sharpe)

ma_window  ma_window  split_idx
10         11         0            0.104944
12         13         1            0.318315
                      2            0.971220
10         11         3            1.386776
12         13         4            1.303271
10         11         5            2.133295
                      6            2.043524
18         23         7            1.756910
                      8            2.219371
                      9            2.283883
                      10           2.543995
                      11           2.724648
                      12           2.389899
                      13           2.838682
                      14           2.393307
                      15           1.116318
23         26         16           0.670553
                      17           0.594142
                      18           0.816458
24         25         19           0.276339
                      20          -0.052507
                      21          -0.363486


It’s common to overfit backtested strategies to market noise. The risk is especially acute when brute-force optimizing technical analysis strategies. To gather evidence of overfitting, we can use a one-sided, independent two-sample t-test to compare the mean Sharpe ratios of the in-sample and out-of-sample sets:

Perform a t-test to compare the in-sample and out-of-sample Sharpe ratios

In [17]:
in_sample_best = in_sharpe[in_best_index].values
in_sample_best


array([1.48299719, 1.6582792 , 1.65772091, 1.2668385 , 0.84853396,
       0.74690573, 0.76385134, 0.7590957 , 0.65572241, 0.72938928,
       0.80372209, 1.09831879, 0.99308813, 1.14479216, 1.01930212,
       1.0449866 , 1.31312069, 1.37597608, 1.43878032, 1.81287614,
       1.83892253, 2.14768268, 2.06634163, 1.401204  , 1.42425804,
       1.49925313, 1.54774418, 1.61024994, 1.6581574 , 1.75580951])

In [18]:
out_sample_test = out_test_sharpe.values
out_sample_test

array([ 0.10494383,  0.31831456,  0.97122039,  1.38677649,  1.30327091,
        2.13329489,  2.04352394,  1.75690973,  2.21937089,  2.28388334,
        2.54399532,  2.72464798,  2.38989915,  2.8386821 ,  2.39330699,
        1.11631758,  0.67055253,  0.59414186,  0.81645839,  0.27633857,
       -0.05250692, -0.36348638, -0.89563507, -0.6659686 , -0.14755552,
        0.36917689,  0.71068709,  0.46362204,  1.06484465,  1.4371619 ])

The ttest_ind function from the SciPy stats module takes the two independent samples, out_sample_test and in_sample_best, as its arguments. **The alternative="greater" parameter specifies that the test is one-sided**: it evaluates whether the mean Sharpe ratio of the out-of-sample set is statistically greater than that of the in-sample set. The function returns the calculated t-statistic and the p-value.

In [19]:
# Run an independent t-test to compare the mean Sharpe ratios
# a=out_sample_test: out-of-sample Sharpe ratios
# b=in_sample_best: in-sample Sharpe ratios
# alternative="greater": tests whether the mean of out_sample_test is greater than that of in_sample_best
t, p = stats.ttest_ind(a=out_sample_test, b=in_sample_best, alternative="greater")

In [20]:
display(t, p)

-1.0849232316848565

0.8587776392244969

The results give us a t-statistic of approximately -1.085 and a p-value of approximately 0.859. The negative t-statistic indicates that the mean of the out-of-sample Sharpe ratios is lower than the mean of the in-sample Sharpe ratios. The high p-value tells us there is not enough statistical evidence to conclude that the out-of-sample Sharpe ratios are greater than the in-sample Sharpe ratios. Together, the two values suggest that the strategy may not perform as well on new, unseen data as it does on the data on which it was optimized, which is a warning sign about its robustness and its ability to generalize. Ideally, you’d hope to see a positive t-statistic large enough to produce a p-value under 0.05.
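
To see where these numbers come from, the sketch below recomputes the test by hand from the two arrays printed earlier and compares the result with SciPy. It assumes ttest_ind is using its default pooled-variance (equal variance) form, and the variable names are chosen for illustration.

```python
import numpy as np
from scipy import stats

# Values copied from the in_sample_best and out_sample_test outputs above
in_sample = np.array([
    1.48299719, 1.6582792, 1.65772091, 1.2668385, 0.84853396,
    0.74690573, 0.76385134, 0.7590957, 0.65572241, 0.72938928,
    0.80372209, 1.09831879, 0.99308813, 1.14479216, 1.01930212,
    1.0449866, 1.31312069, 1.37597608, 1.43878032, 1.81287614,
    1.83892253, 2.14768268, 2.06634163, 1.401204, 1.42425804,
    1.49925313, 1.54774418, 1.61024994, 1.6581574, 1.75580951,
])
out_sample = np.array([
    0.10494383, 0.31831456, 0.97122039, 1.38677649, 1.30327091,
    2.13329489, 2.04352394, 1.75690973, 2.21937089, 2.28388334,
    2.54399532, 2.72464798, 2.38989915, 2.8386821, 2.39330699,
    1.11631758, 0.67055253, 0.59414186, 0.81645839, 0.27633857,
    -0.05250692, -0.36348638, -0.89563507, -0.6659686, -0.14755552,
    0.36917689, 0.71068709, 0.46362204, 1.06484465, 1.4371619,
])

n_out, n_in = len(out_sample), len(in_sample)
dof = n_out + n_in - 2

# Pooled variance of the two samples (equal-variance assumption)
pooled_var = (
    (n_out - 1) * out_sample.var(ddof=1) + (n_in - 1) * in_sample.var(ddof=1)
) / dof

# t-statistic: difference in means divided by its standard error
t_manual = (out_sample.mean() - in_sample.mean()) / np.sqrt(
    pooled_var * (1 / n_out + 1 / n_in)
)

# One-sided p-value for "out-of-sample mean is greater": upper-tail probability
p_manual = stats.t.sf(t_manual, df=dof)

t_scipy, p_scipy = stats.ttest_ind(out_sample, in_sample, alternative="greater")
print(out_sample.mean(), in_sample.mean())
print(t_manual, p_manual)
```

The out-of-sample mean is still positive. It is simply lower than the in-sample mean, which is why the t-statistic is negative and the upper-tail p-value is well above 0.5.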

The alternative parameter of ttest_ind defines the alternative hypothesis. The following options are available (default is ‘two-sided’):
*  ‘two-sided’: the means of the distributions underlying the samples are unequal.
*  ‘less’: the mean of the distribution underlying the first sample is less than the mean of the distribution underlying the second sample.
*  ‘greater’: the mean of the distribution underlying the first sample is greater than the mean of the distribution underlying the second sample.
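
The three options are closely related, which the following sketch illustrates on synthetic data. The samples, seed, and sample sizes are arbitrary choices for illustration and are unrelated to the Sharpe ratios above.

```python
import numpy as np
from scipy import stats

# Two synthetic samples; the second has a slightly higher true mean
rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=50)
b = rng.normal(loc=0.3, scale=1.0, size=50)

# Run the same test under each alternative hypothesis
p_values = {
    alt: stats.ttest_ind(a, b, alternative=alt).pvalue
    for alt in ("two-sided", "less", "greater")
}
print(p_values)
```

Because the t-distribution is symmetric, the two one-sided p-values sum to 1, and the two-sided p-value is twice the smaller of them. Choosing the direction after looking at the data therefore halves the reported p-value without adding evidence.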

**Jason Strimpel** is the founder of <a href='https://pyquantnews.com/'>PyQuant News</a> and co-founder of <a href='https://www.tradeblotter.io/'>Trade Blotter</a>. His career in algorithmic trading spans 20+ years. He previously traded for a Chicago-based hedge fund, was a risk manager at JPMorgan, and managed production risk technology for an energy derivatives trading firm in London. In Singapore, he served as APAC CIO for an agricultural trading firm and built the data science team for a global metals trading firm. Jason holds degrees in Finance and Economics and a Master's in Quantitative Finance from the Illinois Institute of Technology. His career spans America, Europe, and Asia. He shares his expertise through the <a href='https://pyquantnews.com/subscribe-to-the-pyquant-newsletter/'>PyQuant Newsletter</a> and social media, and has taught 1,000+ students algorithmic trading with Python in his popular course **<a href='https://gettingstartedwithpythonforquantfinance.com/'>Getting Started With Python for Quant Finance</a>**. All code is for educational purposes only. Nothing provided here is financial advice. Use at your own risk.