---
### Challenge: Backtest on Other Datasets
---

#### I. Download data from `yfinance`

In [123]:
import yfinance as yf

In [124]:
ticker = 'META'
df = yf.download(ticker)
df.columns = df.columns.droplevel('Ticker')  # Flatten the columns
df.head(n=5)

[*********************100%***********************]  1 of 1 completed


| Date | Close | High | Low | Open | Volume |
|---|---|---|---|---|---|
| 2012-05-18 | 38.050671 | 44.788914 | 37.82175 | 41.852752 | 573576400 |
| 2012-05-21 | 33.870369 | 36.488033 | 32.845202 | 36.358642 | 168192700 |
| 2012-05-22 | 30.854584 | 33.432435 | 30.794866 | 32.457032 | 101786600 |
| 2012-05-23 | 31.849892 | 32.347546 | 31.212894 | 31.222848 | 73600000 |
| 2012-05-24 | 32.875057 | 33.054213 | 31.620969 | 32.795434 | 50237200 |
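
The `droplevel('Ticker')` call above assumes that `yf.download` returned two-level columns (`Price`, `Ticker`), as it did in this run. A minimal sketch of a more defensive version, assuming that other yfinance versions or settings may return flat columns instead. `flatten_columns` is a hypothetical helper, and the small frame below only stands in for downloaded data:

```python
import pandas as pd


def flatten_columns(df: pd.DataFrame, level: str = "Ticker") -> pd.DataFrame:
    """Return a copy of df with the given column level dropped, if present."""
    out = df.copy()
    if isinstance(out.columns, pd.MultiIndex) and level in out.columns.names:
        out.columns = out.columns.droplevel(level)
    return out


# Stand-in for a single-ticker download with (Price, Ticker) columns.
columns = pd.MultiIndex.from_product(
    [["Close", "High", "Low", "Open", "Volume"], ["META"]],
    names=["Price", "Ticker"],
)
raw = pd.DataFrame([[38.05, 44.79, 37.82, 41.85, 573576400]], columns=columns)

flat = flatten_columns(raw)
print(list(flat.columns))

# Calling the helper on already-flat columns leaves them unchanged.
assert list(flatten_columns(flat).columns) == list(flat.columns)
```

Because the helper checks for a `MultiIndex` first, it can be applied whether or not the download produced a `Ticker` level.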


---
#### II. Preprocess the data

Filter the date range: we keep only the rows from 2021-01-01 onward.

In [125]:
df = df.loc['2021-01-01':].copy()
df.head(n=5)

| Date | Close | High | Low | Open | Volume |
|---|---|---|---|---|---|
| 2021-01-04 | 267.678436 | 273.710007 | 263.95599 | 273.491038 | 15106100 |
| 2021-01-05 | 269.698914 | 271.122198 | 266.951851 | 267.031492 | 9871600 |
| 2021-01-06 | 262.074829 | 266.494004 | 258.790321 | 260.770977 | 24354100 |
| 2021-01-07 | 267.47937 | 270.335903 | 263.537954 | 264.652696 | 15789800 |
| 2021-01-08 | 266.31485 | 267.688381 | 261.945429 | 267.051369 | 18528300 |
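
The slice `'2021-01-01':` works even though 1 January is not a trading day: on a sorted `DatetimeIndex`, a string bound does not have to be present in the index. A small sketch with toy values (the dates and prices below are made up for illustration):

```python
import pandas as pd

# Toy frame around the year boundary; 2021-01-01 itself is not in the index.
idx = pd.to_datetime(["2020-12-30", "2020-12-31", "2021-01-04", "2021-01-05"])
prices = pd.DataFrame({"Close": [1.0, 2.0, 3.0, 4.0]}, index=idx)

# Keep every row dated on or after the bound.
recent = prices.loc["2021-01-01":].copy()
print(recent.index.min().date())
```

The first surviving row is the first trading day after the bound, which matches the table above starting on 2021-01-04.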


We create the target variable, `change_tomorrow`: the percentage change in the closing price between today and the next trading day.

In [126]:
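# pct_change(-1) compares each close with the NEXT row: (today - tomorrow) / tomorrow
df['change_tomorrow'] = df.Close.pct_change(-1)
# Flip the sign so that a price rise tomorrow is positive
df.change_tomorrow = df.change_tomorrow * -1
# Express the change as a percentage
df.change_tomorrow = df.change_tomorrow * 100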
df.head(n=5)

| Date | Close | High | Low | Open | Volume | change_tomorrow |
|---|---|---|---|---|---|---|
| 2021-01-04 | 267.678436 | 273.710007 | 263.95599 | 273.491038 | 15106100 | 0.74916 |
| 2021-01-05 | 269.698914 | 271.122198 | 266.951851 | 267.031492 | 9871600 | -2.909125 |
| 2021-01-06 | 262.074829 | 266.494004 | 258.790321 | 260.770977 | 24354100 | 2.020545 |
| 2021-01-07 | 267.47937 | 270.335903 | 263.537954 | 264.652696 | 15789800 | -0.437272 |
| 2021-01-08 | 266.31485 | 267.688381 | 261.945429 | 267.051369 | 18528300 | -4.177694 |
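
Note that `pct_change(-1)` divides by the next day's close, so the target above is measured relative to tomorrow's price rather than today's. The sketch below reproduces the first values of the table and compares them with the more conventional return relative to today's close. The names `rel_to_tomorrow` and `rel_to_today` are illustrative only:

```python
import pandas as pd

# First three closes from the table above.
close = pd.Series(
    [267.678436, 269.698914, 262.074829],
    index=pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-06"]),
)

# Notebook definition: (tomorrow - today) / tomorrow * 100
rel_to_tomorrow = close.pct_change(-1) * -1 * 100

# Conventional return: (tomorrow - today) / today * 100
rel_to_today = (close.shift(-1) - close) / close * 100

comparison = pd.DataFrame(
    {"rel_to_tomorrow": rel_to_tomorrow, "rel_to_today": rel_to_today}
)
print(comparison.round(4))
```

Both versions leave the last row empty because it has no next-day close. The two differ only slightly for small moves, but the choice affects how the thresholds in the strategy should be read.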


We remove rows with any missing data. In particular, the last row has no next-day close, so its `change_tomorrow` is `NaN`.

In [127]:
df = df.dropna().copy()
df.head(n=5)

| Date | Close | High | Low | Open | Volume | change_tomorrow |
|---|---|---|---|---|---|---|
| 2021-01-04 | 267.678436 | 273.710007 | 263.95599 | 273.491038 | 15106100 | 0.74916 |
| 2021-01-05 | 269.698914 | 271.122198 | 266.951851 | 267.031492 | 9871600 | -2.909125 |
| 2021-01-06 | 262.074829 | 266.494004 | 258.790321 | 260.770977 | 24354100 | 2.020545 |
| 2021-01-07 | 267.47937 | 270.335903 | 263.537954 | 264.652696 | 15789800 | -0.437272 |
| 2021-01-08 | 266.31485 | 267.688381 | 261.945429 | 267.051369 | 18528300 | -4.177694 |


---
#### III. Fit the Machine Learning model

Feature selection:

1. Target: which variable do you want to predict?
2. Explanatory: which variables will you use to calculate the prediction?

In [128]:
y = df.change_tomorrow
X = df.drop(columns='change_tomorrow')

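Train-test split. The data is a time series, so we split it chronologically: the first 70% of the days form the training set and the remaining 30% form the test set.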

In [129]:
n_days = len(df.index)
n_days

1073

In [130]:
n_days_split = int(n_days*0.7)
n_days_split

751

In [131]:
X_train, y_train = X.iloc[:n_days_split], y.iloc[:n_days_split]
X_test, y_test = X.iloc[n_days_split:], y.iloc[n_days_split:]
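
The split above can be wrapped in a small function. This is a sketch on a synthetic frame with the same number of rows as the notebook; `chronological_split` is a hypothetical helper, not part of pandas or scikit-learn:

```python
import pandas as pd


def chronological_split(df: pd.DataFrame, train_frac: float = 0.7):
    """Split df by row position: earliest rows train, latest rows test."""
    n_train = int(len(df) * train_frac)
    return df.iloc[:n_train], df.iloc[n_train:]


# Synthetic stand-in with the same number of rows as the notebook (1073).
idx = pd.bdate_range("2021-01-04", periods=1073)
toy = pd.DataFrame({"Close": range(1073)}, index=idx)

train, test = chronological_split(toy)
print(len(train), len(test))

# Every training date precedes every test date, so no future data leaks in.
assert train.index.max() < test.index.min()
```

Splitting by position instead of shuffling keeps the test period strictly after the training period, which is what a backtest needs.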

Fit the model on the training set.

In [132]:
from sklearn.tree import DecisionTreeRegressor

In [133]:
model_dt_split = DecisionTreeRegressor(max_depth=15, random_state=42)
model_dt_split.fit(X=X_train, y=y_train)

---
#### IV. Evaluate model

In [134]:
from sklearn.metrics import mean_squared_error

On the test set.

In [135]:
y_pred_test = model_dt_split.predict(X=X_test)
mean_squared_error(y_true=y_test, y_pred=y_pred_test)

np.float64(12.349249610441916)

On the training set.

In [136]:
y_pred_train = model_dt_split.predict(X=X_train)
mean_squared_error(y_true=y_train, y_pred=y_pred_train)

np.float64(1.731601991374963)
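
MSE is expressed in squared percentage points, which is hard to read next to daily price changes. A quick sketch that converts the two scores above to RMSE and compares them. The variable names are illustrative, and reading the gap as overfitting is an interpretation, not a result computed in the notebook:

```python
import math

mse_test = 12.349249610441916   # test-set MSE reported above
mse_train = 1.731601991374963   # train-set MSE reported above

rmse_test = math.sqrt(mse_test)    # typical error size on unseen days, in %
rmse_train = math.sqrt(mse_train)  # typical error size on seen days, in %
mse_ratio = mse_test / mse_train

print(f"RMSE test:  {rmse_test:.2f} percentage points")
print(f"RMSE train: {rmse_train:.2f} percentage points")
print(f"Test MSE is {mse_ratio:.1f}x the train MSE")
```

A test error roughly seven times the training error is consistent with a depth-15 tree memorizing much of the training period.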

---
#### V. Backtesting

In [137]:
from backtesting import Backtest, Strategy

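Create the `Strategy`. It buys when the forecast change for tomorrow is above `limit_buy` (in %) and sells when it is below `limit_sell`, using `already_bought` to track whether it has already bought.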

In [138]:
class Regression(Strategy):
    # Thresholds on the forecast daily change, in percent
    limit_buy = 1
    limit_sell = -5

    def init(self):
        # Fit the same tree as above once, before the backtest starts
        self.model = DecisionTreeRegressor(max_depth=15, random_state=42)
        self.already_bought = False

        self.model.fit(X=X_train, y=y_train)

    def next(self):
        # Latest bar as a one-row DataFrame, so column names match training
        explanatory_today = self.data.df.iloc[[-1], :]
        forecast_tomorrow = self.model.predict(explanatory_today)[0]

        if forecast_tomorrow > self.limit_buy and not self.already_bought:
            self.buy()
            self.already_bought = True
        elif forecast_tomorrow < self.limit_sell and self.already_bought:
            self.sell()
            self.already_bought = False
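
The branching in `next` can be checked in isolation, without running a backtest. The sketch below restates the rule as a pure function; `decide` and the forecast values are hypothetical and exist only for illustration:

```python
def decide(forecast: float, limit_buy: float, limit_sell: float,
           already_bought: bool) -> tuple[str, bool]:
    """Return (action, new_already_bought) for one bar."""
    if forecast > limit_buy and not already_bought:
        return "buy", True
    if forecast < limit_sell and already_bought:
        return "sell", False
    return "hold", already_bought


# Walk a few hypothetical forecasts (in %) through the rule.
state = False
actions = []
for forecast in [0.5, 1.8, 2.4, -3.0, -6.2, -7.0]:
    action, state = decide(forecast, limit_buy=1, limit_sell=-5,
                           already_bought=state)
    actions.append(action)

print(actions)
# → ['hold', 'buy', 'hold', 'hold', 'sell', 'hold']
```

With `limit_sell=-5`, a sell needs a forecast drop of more than 5%, so the strategy tends to stay in the market once it has bought. Also note that in backtesting.py `self.sell()` places a short order, so with `exclusive_orders=True` the sell branch closes the long position and opens a short one; `self.position.close()` would be the call to exit without going short.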

Run the backtest on `train` data.

In [139]:
bt_train = Backtest(X_train, Regression,
              cash=10000, commission=.002, exclusive_orders=True)

In [140]:
results = bt_train.run(limit_buy=1, limit_sell=-5)

df_results_train = results.to_frame(name='Values').loc[:'Return [%]']\
    .rename({'Values':'In Sample (Train)'}, axis=1)
df_results_train

|  | In Sample (Train) |
|---|---|
| Start | 2021-01-04 00:00:00 |
| End | 2023-12-27 00:00:00 |
| Duration | 1087 days 00:00:00 |
| Exposure Time [%] | 64.047936 |
| Equity Final [$] | 100494.874149 |
| Equity Peak [$] | 100494.874149 |
| Commissions [$] | 2012.713172 |
| Return [%] | 904.948741 |


Run the backtest on `test` data.

In [141]:
bt_test = Backtest(X_test, Regression,
              cash=10000, commission=.002, exclusive_orders=True)

In [142]:
results = bt_test.run(limit_buy=1, limit_sell=-5)

df_results_test = results.to_frame(name='Values').loc[:'Return [%]']\
    .rename({'Values':'Out of Sample (Test)'}, axis=1)
df_results_test

|  | Out of Sample (Test) |
|---|---|
| Start | 2023-12-28 00:00:00 |
| End | 2025-04-10 00:00:00 |
| Duration | 469 days 00:00:00 |
| Exposure Time [%] | 70.496894 |
| Equity Final [$] | 13404.941063 |
| Equity Peak [$] | 17958.346336 |
| Commissions [$] | 303.739543 |
| Return [%] | 34.049411 |


---
#### VI. Compare both backtests

In [143]:
import pandas as pd

In [144]:
df_results = pd.concat([df_results_train, df_results_test], axis=1)
df_results

|  | In Sample (Train) | Out of Sample (Test) |
|---|---|---|
| Start | 2021-01-04 00:00:00 | 2023-12-28 00:00:00 |
| End | 2023-12-27 00:00:00 | 2025-04-10 00:00:00 |
| Duration | 1087 days 00:00:00 | 469 days 00:00:00 |
| Exposure Time [%] | 64.047936 | 70.496894 |
| Equity Final [$] | 100494.874149 | 13404.941063 |
| Equity Peak [$] | 100494.874149 | 17958.346336 |
| Commissions [$] | 2012.713172 | 303.739543 |
| Return [%] | 904.948741 | 34.049411 |
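
The figures in the table can be cross-checked with a few lines of arithmetic. The sketch below recomputes the total return from the final equity and adds two derived numbers that are not in the report: a simple calendar-day annualization (an assumption, not necessarily how backtesting.py annualizes) and the distance of final equity from its peak:

```python
cash = 10_000  # starting cash passed to Backtest

reported = {
    "train": {"equity_final": 100494.874149, "equity_peak": 100494.874149, "days": 1087},
    "test": {"equity_final": 13404.941063, "equity_peak": 17958.346336, "days": 469},
}

summary = {}
for name, r in reported.items():
    total_return = (r["equity_final"] / cash - 1) * 100
    # Simple calendar-day annualization (an assumption, not the library's method).
    annualized = ((r["equity_final"] / cash) ** (365 / r["days"]) - 1) * 100
    # How far final equity sits below the peak reached during the run.
    below_peak = (r["equity_final"] / r["equity_peak"] - 1) * 100
    summary[name] = (total_return, annualized, below_peak)
    print(f"{name}: return {total_return:.2f}%, annualized {annualized:.1f}%, "
          f"vs. peak {below_peak:.1f}%")
```

The in-sample return (about 905%) is far above the out-of-sample return (about 34%), which is expected when the model is backtested on the same rows it was fitted on. The test column is the more realistic estimate, and it ends roughly a quarter below its own equity peak.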


Plot both backtest reports.

In [145]:
bt_train.plot(filename='backtests/regression_train_set.html')

<p align="center">
  <img src="screen/backtest_report_META_train.png" width="800"/>
</p>

In [122]:
bt_test.plot(filename='backtests/regression_test_set.html')

<p align="center">
  <img src="screen/backtest_report_META_test.png" width="800"/>
</p>

![](<src/10_Table_Validation Methods.png>)