In [1]:
import warnings
import numpy as np
import pandas as pd
import scipy.stats as stats
import vectorbt as vbt
warnings.filterwarnings("ignore")

In [2]:
start = "2015"
end = "now"

In [3]:
prices = vbt.YFData.download(
    # ["META", "AAPL", "AMZN", "NFLX", "GOOG"], start=start, end=end
    "AAPL", start=start, end=end
)

In [4]:
prices = prices.get("Close")
prices.dropna(inplace=True)

In [5]:
prices

Date
2015-01-05 05:00:00+00:00     23.661270
2015-01-06 05:00:00+00:00     23.663502
2015-01-07 05:00:00+00:00     23.995314
2015-01-08 05:00:00+00:00     24.917269
2015-01-09 05:00:00+00:00     24.943985
                                ...    
2024-12-27 05:00:00+00:00    255.589996
2024-12-30 05:00:00+00:00    252.199997
2024-12-31 05:00:00+00:00    250.419998
2025-01-02 05:00:00+00:00    243.850006
2025-01-03 05:00:00+00:00    243.360001
Name: Close, Length: 2517, dtype: float64

Calculate 10-day and 30-day moving averages using VectorBT's built-in technical indicators

In [6]:
fast_ma = vbt.MA.run(prices, 10, short_name="fast")
slow_ma = vbt.MA.run(prices, 30, short_name="slow")
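As far as we know, `vbt.MA.run` computes a simple (not exponential) moving average unless its `ewm` option is turned on, i.e. a trailing arithmetic mean over the window. A minimal pandas sketch of the same calculation, on synthetic prices and assuming that default (`simple_ma` is a hypothetical helper):

```python
import numpy as np
import pandas as pd


def simple_ma(close: pd.Series, window: int) -> pd.Series:
    """Trailing simple moving average; the first window - 1 values are NaN."""
    return close.rolling(window=window).mean()


# Synthetic closes 1.0, 2.0, ..., 40.0 stand in for real price data
close = pd.Series(np.arange(1.0, 41.0))
fast = simple_ma(close, 10)  # 10-bar average
slow = simple_ma(close, 30)  # 30-bar average
print(fast.iloc[-1], slow.iloc[-1])  # 35.5 25.5
```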

Identify entry points where the fast moving average crosses above the slow moving average

In [7]:
entries = fast_ma.ma_crossed_above(slow_ma)
entries

Date
2015-01-05 05:00:00+00:00    False
2015-01-06 05:00:00+00:00    False
2015-01-07 05:00:00+00:00    False
2015-01-08 05:00:00+00:00    False
2015-01-09 05:00:00+00:00    False
                             ...  
2024-12-27 05:00:00+00:00    False
2024-12-30 05:00:00+00:00    False
2024-12-31 05:00:00+00:00    False
2025-01-02 05:00:00+00:00    False
2025-01-03 05:00:00+00:00    False
Length: 2517, dtype: bool

In [8]:
exits = fast_ma.ma_crossed_below(slow_ma)
exits

Date
2015-01-05 05:00:00+00:00    False
2015-01-06 05:00:00+00:00    False
2015-01-07 05:00:00+00:00    False
2015-01-08 05:00:00+00:00    False
2015-01-09 05:00:00+00:00    False
                             ...  
2024-12-27 05:00:00+00:00    False
2024-12-30 05:00:00+00:00    False
2024-12-31 05:00:00+00:00    False
2025-01-02 05:00:00+00:00    False
2025-01-03 05:00:00+00:00    False
Length: 2517, dtype: bool
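A crossover is a bar where the fast average moves from one side of the slow average to the other. The pandas sketch below is a simplified approximation of what `ma_crossed_above` and `ma_crossed_below` report; `crossed_above` and `crossed_below` are hypothetical helpers, and VectorBT's own implementation may treat edge cases such as NaN warm-up bars or exactly equal values differently.

```python
import pandas as pd


def crossed_above(a: pd.Series, b: pd.Series) -> pd.Series:
    """True on bars where a is above b after being at or below b on the prior bar."""
    was_at_or_below = a.shift(1) <= b.shift(1)  # NaN comparisons evaluate to False
    return (a > b) & was_at_or_below


def crossed_below(a: pd.Series, b: pd.Series) -> pd.Series:
    """True on bars where a is below b after being at or above b on the prior bar."""
    return crossed_above(b, a)


fast = pd.Series([1.0, 2.0, 3.0, 2.0, 1.0])
slow = pd.Series([2.0, 2.0, 2.0, 2.0, 2.0])
print(crossed_above(fast, slow).tolist())  # [False, False, True, False, False]
print(crossed_below(fast, slow).tolist())  # [False, False, False, False, True]
```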

Run the backtest using the identified entry and exit points and the price data

In [9]:
pf = vbt.Portfolio.from_signals(prices, entries, exits)
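Conceptually, `from_signals` turns the two boolean series into trades. The loop below is a deliberately simplified, hypothetical long-only model, not VectorBT's engine: it assumes fills at the signal bar's close, all-in position sizing, no fees or slippage, and a starting value of 100 (matching the `Start Value` in the report), and it ignores entries while a position is already open.

```python
import pandas as pd


def simulate_long_only(close, entries, exits, init_cash=100.0):
    """Hypothetical all-in, long-only backtest: no fees, fills at the bar's close."""
    cash, shares = init_cash, 0.0
    values = []
    for price, enter, leave in zip(close, entries, exits):
        if shares == 0.0 and enter:
            shares, cash = cash / price, 0.0  # buy with all available cash
        elif shares > 0.0 and leave:
            cash, shares = shares * price, 0.0  # sell the whole position
        values.append(cash + shares * price)  # mark-to-market portfolio value
    return pd.Series(values, index=close.index, name="value")


toy_close = pd.Series([10.0, 11.0, 12.0, 11.0, 13.0])
toy_entries = pd.Series([True, False, False, False, False])
toy_exits = pd.Series([False, False, True, False, False])
print(simulate_long_only(toy_close, toy_entries, toy_exits).tolist())
# [100.0, 110.0, 120.0, 120.0, 120.0]
```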

Display the performance statistics of the backtest

In [10]:
pf.stats()

Start                         2015-01-05 05:00:00+00:00
End                           2025-01-03 05:00:00+00:00
Period                                             2517
Start Value                                       100.0
End Value                                     504.07842
Total Return [%]                              404.07842
Benchmark Return [%]                         928.516217
Max Gross Exposure [%]                            100.0
Total Fees Paid                                     0.0
Max Drawdown [%]                              30.793083
Max Drawdown Duration                             353.0
Total Trades                                         44
Total Closed Trades                                  43
Total Open Trades                                     1
Open Trade PnL                                 17.19203
Win Rate [%]                                  48.837209
Best Trade [%]                                58.402831
Worst Trade [%]                              -12
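Two lines of the report are easy to check by hand. Assuming the benchmark is buy-and-hold from the first to the last close (the printed figures are consistent with this), the crossover strategy returned about 404% against roughly 929% for simply holding AAPL over the same period:

```python
# Values copied from the pf.stats() output and the price listing above
start_value, end_value = 100.0, 504.07842
first_close, last_close = 23.661270, 243.360001

total_return_pct = (end_value / start_value - 1) * 100
benchmark_return_pct = (last_close / first_close - 1) * 100  # buy-and-hold
print(round(total_return_pct, 2), round(benchmark_return_pct, 2))  # 404.08 928.52
```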

Time to optimize

Because VectorBT vectorizes its simulations, it can backtest very large numbers of parameter combinations quickly, which makes it well suited to walk-forward analysis. Walk-forward analysis, a form of cross-validation for time series, is a technique that aims to reduce overfitting. It splits the data into a series of training and testing windows, optimizes the chosen parameters on each training window, and measures how well the strategy performs on the testing window that follows it.
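The core loop of walk-forward analysis can be sketched independently of VectorBT: for each split, pick the parameter with the best in-sample score, then record how that single choice scores out of sample. The `walk_forward` helper and the toy scoring function below are hypothetical illustrations, not part of any library.

```python
from typing import Callable, Iterable, List, Sequence, Tuple, TypeVar

P = TypeVar("P")
D = TypeVar("D")


def walk_forward(
    splits: Iterable[Tuple[D, D]],
    params: Sequence[P],
    score: Callable[[D, P], float],
) -> List[Tuple[P, float]]:
    """For each (train, test) split, optimize on train and evaluate on test."""
    results = []
    for train, test in splits:
        best = max(params, key=lambda p: score(train, p))  # in-sample optimization
        results.append((best, score(test, best)))  # out-of-sample check
    return results


def toy_score(data: List[float], p: int) -> float:
    """Toy objective: how close p is to the mean of the data."""
    return -abs(sum(data) / len(data) - p)


splits = [([1.0, 2.0, 3.0], [3.0, 3.0]), ([5.0, 6.0, 7.0], [9.0, 9.0])]
print(walk_forward(splits, list(range(11)), toy_score))  # [(2, -1.0), (6, -3.0)]
```

The out-of-sample scores are worse than the in-sample optimum in both splits, which is exactly the gap walk-forward analysis is designed to expose.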


Download stock price data for a single ticker for walk-forward analysis

In [11]:
start = "2015"
end = "now"
prices = vbt.YFData.download("AAPL", start=start, end=end).get("Close")

Define moving average window combinations for testing

In [12]:
windows = np.arange(10, 50)  # window lengths 10 to 49

Perform rolling split to create in-sample and out-of-sample datasets for walk-forward analysis

In [13]:
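(in_price, in_indexes), (out_price, out_indexes) = prices.vbt.rolling_split(
    n=30,                 # number of windows
    window_len=365 * 2,   # rows (trading days) per window
    set_lens=(180,),      # the last 180 rows of each window form the test set
    left_to_right=False,  # make the first (training) set the variable-length one
    trace_names=["train", "test"],
)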

In [14]:
print(in_price.shape, len(in_indexes))
print(out_price.shape, len(out_indexes))

(550, 30) 30
(180, 30) 30
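The shapes follow from the arguments: each window has 730 rows, and with `left_to_right=False` the fixed 180-row set sits at the end of the window, leaving 730 - 180 = 550 rows for training. The sketch below reproduces that layout with plain NumPy; `rolling_split_indices` is a hypothetical helper, and the evenly spaced window starts are our assumption about how `rolling_split` distributes `n` windows.

```python
import numpy as np


def rolling_split_indices(n_obs: int, n: int, window_len: int, test_len: int):
    """Return (train, test) row-position arrays with one column per split."""
    # Assumption: window starts are spread evenly from 0 to n_obs - window_len
    starts = np.linspace(0, n_obs - window_len, n).astype(int)
    train_len = window_len - test_len
    train = np.stack([np.arange(s, s + train_len) for s in starts], axis=1)
    test = np.stack([np.arange(s + train_len, s + window_len) for s in starts], axis=1)
    return train, test


# 2517 rows matches the length of the AAPL series printed earlier
train_idx, test_idx = rolling_split_indices(2517, n=30, window_len=365 * 2, test_len=180)
print(train_idx.shape, test_idx.shape)  # (550, 30) (180, 30)
```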


Visualize the rolling split for walk-forward analysis

In [15]:
prices.vbt.rolling_split(
    n=30,
    window_len=365 * 2,
    set_lens=(180,),
    left_to_right=False,
    trace_names=["train", "test"],
    plot=True,
)

[Figure: rolling split of the AAPL price series into 30 windows, drawn as a heatmap with "train" and "test" traces; x-axis: date, 2015-01-05 to 2025-01-03; y-axis: split index]

Define a helper function to simulate all parameter combinations and return the Sharpe ratio

In [16]:
def simulate_all_params(price, windows, **kwargs):
    """
    Simulate all parameter combinations and return Sharpe ratios

    This function builds every (fast, slow) pair of moving-average
    windows with fast < slow, backtests the crossover strategy for
    each pair on every split, and calculates the Sharpe ratio of each run.

    Parameters
    ----------
    price : pd.DataFrame
        Price data for the asset, one column per split
    windows : np.ndarray
        Array of window sizes to combine
    kwargs : dict
        Additional arguments for the portfolio simulation

    Returns
    -------
    pd.Series
        Sharpe ratios indexed by (fast_window, slow_window, split_idx)
    """
    # r=2 produces every 2-element combination of the windows
    fast_ma, slow_ma = vbt.MA.run_combs(
        price, windows, r=2, short_names=["fast", "slow"]
    )

    entries = fast_ma.ma_crossed_above(slow_ma)
    exits = fast_ma.ma_crossed_below(slow_ma)

    pf = vbt.Portfolio.from_signals(price, entries, exits, **kwargs)
    return pf.sharpe_ratio()
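`pf.sharpe_ratio()` annualizes the ratio of mean return to return volatility. A NumPy sketch of the usual definition follows; the defaults here (zero risk-free rate, `ddof=1`, and 365 periods per year for `freq="d"`) are our assumptions about VectorBT's settings, so check the configuration of your installed version before comparing numbers. `annualized_sharpe` is a hypothetical helper.

```python
import numpy as np


def annualized_sharpe(returns, periods_per_year=365.0, risk_free=0.0, ddof=1):
    """Annualized Sharpe ratio of per-period returns (assumed VectorBT-like defaults)."""
    excess = np.asarray(returns, dtype=float) - risk_free  # per-period excess returns
    return float(excess.mean() / excess.std(ddof=ddof) * np.sqrt(periods_per_year))


daily_returns = np.array([0.01, -0.005, 0.002, 0.007, -0.003])  # made-up daily returns
print(round(annualized_sharpe(daily_returns), 2))
```

With a zero risk-free rate the ratio is scale-free: doubling every return leaves the Sharpe ratio unchanged, which is why it is a reasonable objective for comparing window pairs across splits with different price levels.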

Define a helper function to get the best value of a given parameter for each split

In [17]:
def get_best_params(performance, level_name):
    """
    Get the best parameters based on performance

    This function retrieves the value of one parameter level
    that maximizes performance for each split.

    Parameters
    ----------
    performance : pd.Series
        Performance metric indexed by parameter levels and "split_idx"
    level_name : str
        The level name to extract parameter values from

    Returns
    -------
    np.ndarray
        Array of best parameter values, one per split
    """
    # idxmax per split gives the index label of the best combination
    idx = performance[performance.groupby("split_idx").idxmax()].index
    return idx.get_level_values(level_name).to_numpy()

Define a helper function to get the indexes of the best parameter combinations

In [18]:
def get_best_index(performance):
    """
    Get indexes of best parameter combinations

    This function identifies the best parameter combination
    for each split based on performance metrics.

    Parameters
    ----------
    performance : pd.Series
        Performance metric indexed by parameter levels and "split_idx"

    Returns
    -------
    pd.MultiIndex
        Index containing the best parameter combination for each split
    """
    return performance[performance.groupby("split_idx").idxmax()].index
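`groupby("split_idx").idxmax()` returns, for each split, the full index tuple of the highest Sharpe ratio. The toy example below uses made-up Sharpe values to show that behaviour, plus an alternative that builds the `MultiIndex` of winners directly with `pd.MultiIndex.from_tuples` instead of re-indexing the Series:

```python
import pandas as pd

toy_index = pd.MultiIndex.from_tuples(
    [(10, 11, 0), (10, 12, 0), (11, 12, 0), (10, 11, 1), (10, 12, 1), (11, 12, 1)],
    names=["fast_window", "slow_window", "split_idx"],
)
toy_sharpe = pd.Series([0.5, 1.2, 0.3, 0.9, -0.1, 1.5], index=toy_index, name="sharpe_ratio")

# One full index tuple per split, pointing at that split's highest Sharpe ratio
winners = toy_sharpe.groupby("split_idx").idxmax()
best_index = pd.MultiIndex.from_tuples(winners.tolist(), names=toy_sharpe.index.names)

print(best_index.get_level_values("fast_window").tolist())  # [10, 11]
print(best_index.get_level_values("slow_window").tolist())  # [12, 12]
```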

Define a helper function to simulate the best parameters for each split

In [19]:
def simulate_best_params(price, best_fast_windows, best_slow_windows, **kwargs):
    """Simulate best parameters for each split

    This function runs the best parameter combination on each split
    and calculates the Sharpe ratio.

    Parameters
    ----------
    price : pd.DataFrame
        Price data for the asset, one column per split
    best_fast_windows : np.ndarray
        Array of best fast window sizes, one per split
    best_slow_windows : np.ndarray
        Array of best slow window sizes, one per split
    kwargs : dict
        Additional arguments for the portfolio simulation

    Returns
    -------
    pd.Series
        Sharpe ratios of the best parameters, one per split
    """
    # per_column=True applies the i-th window to the i-th column (split)
    fast_ma = vbt.MA.run(price, window=best_fast_windows, per_column=True)
    slow_ma = vbt.MA.run(price, window=best_slow_windows, per_column=True)

    entries = fast_ma.ma_crossed_above(slow_ma)
    exits = fast_ma.ma_crossed_below(slow_ma)

    pf = vbt.Portfolio.from_signals(price, entries, exits, **kwargs)
    return pf.sharpe_ratio()
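As we understand it, `per_column=True` pairs the i-th window with the i-th column instead of applying every window to every column, so each split is tested only with its own best parameters. A pandas sketch of that pairing on two hypothetical splits, assuming simple moving averages:

```python
import numpy as np
import pandas as pd

# Two hypothetical splits (columns) and one window per split
toy_prices = pd.DataFrame({0: np.arange(1.0, 7.0), 1: np.arange(10.0, 16.0)})
toy_windows = [2, 3]

# Per-column pairing: column i is smoothed with toy_windows[i] only
ma = pd.concat(
    [toy_prices[col].rolling(w).mean() for col, w in zip(toy_prices.columns, toy_windows)],
    axis=1,
)
print(ma.iloc[-1].tolist())  # [5.5, 14.0]
```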

Simulate all moving average window combinations for in-sample data to find the best Sharpe ratio

In [20]:
in_sharpe = simulate_all_params(
    in_price, 
    windows, 
    direction="both", 
    freq="d"
)
in_sharpe

fast_window  slow_window  split_idx
10           11           0            1.242212
                          1            1.036683
                          2            1.289575
                          3            2.051369
                          4            1.427594
                                         ...   
48           49           25          -0.277185
                          26           0.230947
                          27           0.292398
                          28           0.052446
                          29           0.120119
Name: sharpe_ratio, Length: 23400, dtype: float64
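The length of `in_sharpe` is a quick sanity check: `np.arange(10, 50)` gives 40 window lengths, `run_combs(..., r=2)` pairs them into every combination with fast < slow, and each pair is backtested on all 30 splits.

```python
from itertools import combinations
from math import comb

window_range = range(10, 50)                  # 40 window lengths, as in np.arange(10, 50)
pairs = list(combinations(window_range, 2))   # (fast, slow) with fast < slow
n_splits = 30

print(len(pairs), comb(40, 2), len(pairs) * n_splits)  # 780 780 23400
```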

Get the indexes of the best parameter combinations for in-sample data

In [21]:
in_best_index = get_best_index(in_sharpe)
in_best_index

MultiIndex([(10, 17,  0),
            (10, 17,  1),
            (10, 17,  2),
            (10, 11,  3),
            (10, 11,  4),
            (48, 49,  5),
            (48, 49,  6),
            (24, 25,  7),
            (24, 25,  8),
            (23, 25,  9),
            (23, 25, 10),
            (20, 26, 11),
            (18, 27, 12),
            (18, 27, 13),
            (18, 19, 14),
            (11, 21, 15),
            (20, 46, 16),
            (18, 20, 17),
            (18, 20, 18),
            (18, 20, 19),
            (18, 20, 20),
            (37, 39, 21),
            (20, 21, 22),
            (19, 21, 23),
            (19, 20, 24),
            (16, 18, 25),
            (10, 14, 26),
            (10, 14, 27),
            (10, 14, 28),
            (17, 38, 29)],
           names=['fast_window', 'slow_window', 'split_idx'])

Get the fast moving average windows that maximize the in-sample Sharpe ratio for each split

In [22]:
in_best_fast_windows = get_best_params(in_sharpe, "fast_window")
in_best_fast_windows

array([10, 10, 10, 10, 10, 48, 48, 24, 24, 23, 23, 20, 18, 18, 18, 11, 20,
       18, 18, 18, 18, 37, 20, 19, 19, 16, 10, 10, 10, 17])

Get the slow moving average windows that maximize the in-sample Sharpe ratio for each split

In [23]:
in_best_slow_windows = get_best_params(in_sharpe, "slow_window")
in_best_slow_windows

array([17, 17, 17, 11, 11, 49, 49, 25, 25, 25, 25, 26, 27, 27, 19, 21, 46,
       20, 20, 20, 20, 39, 21, 21, 20, 18, 14, 14, 14, 38])

Combine the best moving average window pairs

In [24]:
in_best_window_pairs = np.array(list(zip(in_best_fast_windows, in_best_slow_windows)))
in_best_window_pairs

array([[10, 17],
       [10, 17],
       [10, 17],
       [10, 11],
       [10, 11],
       [48, 49],
       [48, 49],
       [24, 25],
       [24, 25],
       [23, 25],
       [23, 25],
       [20, 26],
       [18, 27],
       [18, 27],
       [18, 19],
       [11, 21],
       [20, 46],
       [18, 20],
       [18, 20],
       [18, 20],
       [18, 20],
       [37, 39],
       [20, 21],
       [19, 21],
       [19, 20],
       [16, 18],
       [10, 14],
       [10, 14],
       [10, 14],
       [17, 38]])

Simulate all moving average window combinations for out-of-sample data to find the best Sharpe ratio

In [25]:
out_sharpe = simulate_all_params(
    out_price, 
    windows, 
    direction="both", 
    freq="d"
)
out_sharpe

fast_window  slow_window  split_idx
10           11           0            1.854179
                          1            0.483929
                          2           -1.344980
                          3           -0.844082
                          4           -0.475928
                                         ...   
48           49           25          -0.305775
                          26          -0.515798
                          27           1.649000
                          28           1.532408
                          29          -0.151780
Name: sharpe_ratio, Length: 23400, dtype: float64

Evaluate the performance of the best in-sample parameters on out-of-sample data

In [26]:
out_test_sharpe = simulate_best_params(
    out_price, in_best_fast_windows, in_best_slow_windows, direction="both", freq="d"
)
out_test_sharpe

ma_window  ma_window  split_idx
10         17         0            1.467409
                      1            1.044320
                      2           -2.216971
           11         3           -0.844082
                      4           -0.475928
48         49         5            0.663517
                      6            1.065501
24         25         7            0.312871
                      8            0.557717
23         25         9            3.576687
                      10           1.043349
20         26         11           2.038092
18         27         12           1.439517
                      13           0.851511
           19         14           0.455529
11         21         15           0.589951
20         46         16          -0.226141
18         20         17           1.032577
                      18          -0.273849
                      19          -0.672301
                      20           0.509168
37         39         21          -1.610192


For each split, calculate the median in-sample Sharpe ratio across all window combinations, and collect the out-of-sample Sharpe ratio of that split's best in-sample parameters

In [27]:
in_sample_median = in_sharpe.groupby("split_idx").median().values
out_sample_test = out_test_sharpe.values
print(len(in_sample_median), len(out_sample_test))

30 30


Run a one-sided two-sample t-test of whether the mean out-of-sample Sharpe ratio of the best in-sample parameters is greater than the mean of the in-sample medians

In [28]:
t, p = stats.ttest_ind(a=out_sample_test, b=in_sample_median, alternative="greater")
t, p

(-0.2581381623719491, 0.6013929647736866)
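With t ≈ -0.26 and p ≈ 0.60, the test gives no evidence that the out-of-sample Sharpe ratios of the selected parameters exceed the in-sample medians. For reference, `ttest_ind` with its default `equal_var=True` is Student's pooled-variance two-sample test. The sketch below recomputes the one-sided statistic by hand on synthetic data and checks it against SciPy; `pooled_ttest_greater` is a hypothetical helper.

```python
import numpy as np
from scipy import stats


def pooled_ttest_greater(a, b):
    """One-sided Student's t-test of H1: mean(a) > mean(b), pooled variance."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    t_stat = (a.mean() - b.mean()) / np.sqrt(pooled_var * (1 / na + 1 / nb))
    p_value = stats.t.sf(t_stat, df=na + nb - 2)  # upper-tail probability
    return t_stat, p_value


rng = np.random.default_rng(0)
sample_a = rng.normal(0.3, 1.0, size=30)  # synthetic stand-in for out-of-sample Sharpe ratios
sample_b = rng.normal(0.5, 0.5, size=30)  # synthetic stand-in for in-sample medians
t_manual, p_manual = pooled_ttest_greater(sample_a, sample_b)
t_scipy, p_scipy = stats.ttest_ind(sample_a, sample_b, alternative="greater")
print(np.isclose(t_manual, t_scipy), np.isclose(p_manual, p_scipy))  # True True
```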