# Challenge: Backtest on Other Datasets

## Download data from `yfinance`

In [13]:
import yfinance as yf
import numpy as np

In [14]:
ticker = 'IWDA.AS'
yf.download(tickers=ticker)

[*********************100%***********************]  1 of 1 completed


Price | Adj Close | Close | High | Low | Open | Volume
-|-|-|-|-|-|-
Ticker | IWDA.AS | IWDA.AS | IWDA.AS | IWDA.AS | IWDA.AS | IWDA.AS
Date | | | | | |
2009-09-25 | 16.990000 | 16.990000 | 16.990000 | 16.990000 | 16.990000 | 0
2009-09-28 | 16.990000 | 16.990000 | 16.990000 | 16.990000 | 16.990000 | 0
2009-09-29 | 16.990000 | 16.990000 | 16.990000 | 16.990000 | 16.990000 | 0
2009-09-30 | 16.990000 | 16.990000 | 16.990000 | 16.990000 | 16.990000 | 0
2009-10-01 | 16.990000 | 16.990000 | 16.990000 | 16.990000 | 16.990000 | 0
... | ... | ... | ... | ... | ... | ...
2024-11-28 | 104.434998 | 104.434998 | 104.544998 | 104.239998 | 104.495003 | 133272
2024-11-29 | 104.830002 | 104.830002 | 104.839996 | 104.209999 | 104.290001 | 89400
2024-12-02 | 105.699997 | 105.699997 | 105.794998 | 105.035004 | 105.169998 | 142930
2024-12-03 | 105.629997 | 105.629997 | 105.800003 | 105.430000 | 105.754997 | 167135


In [15]:
df = yf.download(tickers=ticker)

[*********************100%***********************]  1 of 1 completed


## Preprocess the data

### Filter the date range

- Keep at least the last year of data

In [16]:
df_last_year = df.loc['2023-12-08':,:].copy()

df_last_year

Price | Adj Close | Close | High | Low | Open | Volume
-|-|-|-|-|-|-
Ticker | IWDA.AS | IWDA.AS | IWDA.AS | IWDA.AS | IWDA.AS | IWDA.AS
Date | | | | | |
2023-12-08 | 80.875000 | 80.875000 | 80.974998 | 80.389999 | 80.434998 | 129960
2023-12-11 | 81.160004 | 81.160004 | 81.309998 | 80.809998 | 81.000000 | 207655
2023-12-12 | 81.139999 | 81.139999 | 81.300003 | 80.940002 | 81.254997 | 133125
2023-12-13 | 81.434998 | 81.434998 | 81.570000 | 81.385002 | 81.425003 | 125092
2023-12-14 | 81.404999 | 81.404999 | 82.285004 | 81.349998 | 82.165001 | 214888
... | ... | ... | ... | ... | ... | ...
2024-11-28 | 104.434998 | 104.434998 | 104.544998 | 104.239998 | 104.495003 | 133272
2024-11-29 | 104.830002 | 104.830002 | 104.839996 | 104.209999 | 104.290001 | 89400
2024-12-02 | 105.699997 | 105.699997 | 105.794998 | 105.035004 | 105.169998 | 142930
2024-12-03 | 105.629997 | 105.629997 | 105.800003 | 105.430000 | 105.754997 | 167135
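
The start date above is hard-coded, so it goes stale when the notebook is rerun later. A minimal sketch of deriving it from the data instead, assuming a sorted `DatetimeIndex`; the `prices` frame and its values are made up for illustration and stand in for the downloaded `df`:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded prices (business days, made-up values).
dates = pd.bdate_range("2022-01-03", "2024-12-03")
prices = pd.DataFrame({"Close": np.linspace(70.0, 105.0, len(dates))}, index=dates)

# Start one calendar year before the most recent row instead of hard-coding a date.
start = prices.index.max() - pd.DateOffset(years=1)

# Label-based slicing on a sorted DatetimeIndex keeps every row from `start` onwards.
prices_last_year = prices.loc[start:].copy()

print(start.date(), prices_last_year.index.min().date(), len(prices_last_year))
```

If `start` falls on a weekend or holiday, the slice simply begins on the next available trading day.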


### Create the target variable

#### Percentage change

- Percentage change in `Adj Close` from today to tomorrow

In [17]:
df_last_year['change_tomorrow'] = df_last_year['Adj Close'].pct_change(-1) * 100 * -1
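
One detail worth checking: `pct_change(-1) * -1` divides by tomorrow's price, not today's, so the values differ slightly from the conventional forward return. The sketch below compares both on the first three `Adj Close` values from the filtered table; the variable names are illustrative only. The sign is the same either way, which is what the classification target uses later.

```python
import pandas as pd

# First three `Adj Close` values from the filtered table.
adj_close = pd.Series(
    [80.875000, 81.160004, 81.139999],
    index=pd.to_datetime(["2023-12-08", "2023-12-11", "2023-12-12"]),
)

# Expression used in the notebook: (tomorrow - today) / tomorrow * 100
relative_to_tomorrow = adj_close.pct_change(-1) * 100 * -1

# Conventional forward return: (tomorrow - today) / today * 100
relative_to_today = adj_close.pct_change().shift(-1) * 100

comparison = pd.DataFrame({
    "relative_to_tomorrow": relative_to_tomorrow,
    "relative_to_today": relative_to_today,
})
print(comparison)
```

In both versions the last row is `NaN`, because there is no "tomorrow" for the most recent date.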

#### Drop rows with any missing data

In [18]:
df_last_year = df_last_year.dropna()

df_last_year

Price | Adj Close | Close | High | Low | Open | Volume | change_tomorrow
-|-|-|-|-|-|-|-
Ticker | IWDA.AS | IWDA.AS | IWDA.AS | IWDA.AS | IWDA.AS | IWDA.AS |
Date | | | | | | |
2023-12-08 | 80.875000 | 80.875000 | 80.974998 | 80.389999 | 80.434998 | 129960 | 0.351163
2023-12-11 | 81.160004 | 81.160004 | 81.309998 | 80.809998 | 81.000000 | 207655 | -0.024654
2023-12-12 | 81.139999 | 81.139999 | 81.300003 | 80.940002 | 81.254997 | 133125 | 0.362250
2023-12-13 | 81.434998 | 81.434998 | 81.570000 | 81.385002 | 81.425003 | 125092 | -0.036851
2023-12-14 | 81.404999 | 81.404999 | 82.285004 | 81.349998 | 82.165001 | 214888 | 0.695334
... | ... | ... | ... | ... | ... | ... | ...
2024-11-27 | 103.934998 | 103.934998 | 104.735001 | 103.820000 | 104.720001 | 138488 | 0.478767
2024-11-28 | 104.434998 | 104.434998 | 104.544998 | 104.239998 | 104.495003 | 133272 | 0.376805
2024-11-29 | 104.830002 | 104.830002 | 104.839996 | 104.209999 | 104.290001 | 89400 | 0.823080
2024-12-02 | 105.699997 | 105.699997 | 105.794998 | 105.035004 | 105.169998 | 142930 | -0.066269


#### Change sign

Did the stock go up or down?

In [19]:
df_last_year = df_last_year.copy()  # explicit copy, so the assignment below does not trigger SettingWithCopyWarning
df_last_year['change_tomorrow'] = np.where(df_last_year['change_tomorrow'] > 0, 1, -1)
df_last_year


Price | Adj Close | Close | High | Low | Open | Volume | change_tomorrow
-|-|-|-|-|-|-|-
Ticker | IWDA.AS | IWDA.AS | IWDA.AS | IWDA.AS | IWDA.AS | IWDA.AS |
Date | | | | | | |
2023-12-08 | 80.875000 | 80.875000 | 80.974998 | 80.389999 | 80.434998 | 129960 | 1
2023-12-11 | 81.160004 | 81.160004 | 81.309998 | 80.809998 | 81.000000 | 207655 | -1
2023-12-12 | 81.139999 | 81.139999 | 81.300003 | 80.940002 | 81.254997 | 133125 | 1
2023-12-13 | 81.434998 | 81.434998 | 81.570000 | 81.385002 | 81.425003 | 125092 | -1
2023-12-14 | 81.404999 | 81.404999 | 82.285004 | 81.349998 | 82.165001 | 214888 | 1
... | ... | ... | ... | ... | ... | ... | ...
2024-11-27 | 103.934998 | 103.934998 | 104.735001 | 103.820000 | 104.720001 | 138488 | 1
2024-11-28 | 104.434998 | 104.434998 | 104.544998 | 104.239998 | 104.495003 | 133272 | 1
2024-11-29 | 104.830002 | 104.830002 | 104.839996 | 104.209999 | 104.290001 | 89400 | 1
2024-12-02 | 105.699997 | 105.699997 | 105.794998 | 105.035004 | 105.169998 | 142930 | -1
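
Note that the `> 0` rule labels a day with exactly zero change as `-1` (down). A small sketch with made-up values shows the behaviour, alongside `np.sign` as one possible alternative if flat days should stay separate:

```python
import numpy as np

# Made-up daily changes in percent, including a flat day.
change_tomorrow = np.array([0.35, -0.02, 0.0, 0.70])

# Same rule as the notebook: strictly positive -> 1, everything else -> -1.
direction = np.where(change_tomorrow > 0, 1, -1)
print(direction.tolist())  # [1, -1, -1, 1]

# Alternative that keeps flat days separate: np.sign returns 0 for a zero change.
direction_with_flat = np.sign(change_tomorrow).astype(int)
print(direction_with_flat.tolist())  # [1, -1, 0, 1]
```

For a binary classifier the two-class version is usually the simpler choice; the point is only to be aware of where flat days end up.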


## Compute the Machine Learning model

Proposal: use a Random Forest classifier from the `ensemble` module of the `sklearn` library.

In [20]:
from sklearn.ensemble import RandomForestClassifier

In [22]:
model = RandomForestClassifier(max_depth=7, random_state=42)

y = df_last_year.change_tomorrow
X = df_last_year.drop(columns='change_tomorrow')

In [23]:
model.fit(X, y)

In [24]:
model.score(X, y)

0.9087301587301587
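
The score above is accuracy on the same rows the model was fitted on, so it says little about how the model behaves on unseen days. The following sketch illustrates the gap on synthetic data with no real signal; `X_demo`, `y_demo`, and the 70/30 split are assumptions for illustration, not part of the notebook:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: random features and random up/down labels (no real signal).
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(250, 6))
y_demo = rng.choice([-1, 1], size=250)

# Keep time order: fit on the first 70% of rows, evaluate on the remaining 30%.
split = int(len(X_demo) * 0.7)
X_train, X_test = X_demo[:split], X_demo[split:]
y_train, y_test = y_demo[:split], y_demo[split:]

model_demo = RandomForestClassifier(max_depth=7, random_state=42)
model_demo.fit(X_train, y_train)

in_sample = model_demo.score(X_train, y_train)    # accuracy on rows the model has seen
out_of_sample = model_demo.score(X_test, y_test)  # accuracy on rows it has not seen
print(f"in-sample: {in_sample:.2f}, out-of-sample: {out_of_sample:.2f}")
```

A large gap between the two numbers is a sign that the in-sample score mostly reflects memorisation.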

## Backtesting

### Create the Strategy

In [25]:
from backtesting import Backtest, Strategy  # Backtest is needed to run the backtest below

### Run the Backtest

In [None]:
bt = Backtest(
    ???, ???, cash=10000,
    commission=.002, exclusive_orders=True
)
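
A hint for the first placeholder: `Backtest` expects a DataFrame with plain columns named `Open`, `High`, `Low`, and `Close` (plus optionally `Volume`), while the data downloaded here has two column levels (`Price` and `Ticker`). One possible way to flatten them, sketched on a small stand-in frame; `demo` and `flat` are illustrative names:

```python
import pandas as pd

# Small stand-in with the same two-level column layout as the downloaded data
# (values copied from the first two rows of the filtered table).
columns = pd.MultiIndex.from_product(
    [["Adj Close", "Close", "High", "Low", "Open", "Volume"], ["IWDA.AS"]],
    names=["Price", "Ticker"],
)
demo = pd.DataFrame(
    [
        [80.875000, 80.875000, 80.974998, 80.389999, 80.434998, 129960],
        [81.160004, 81.160004, 81.309998, 80.809998, 81.000000, 207655],
    ],
    index=pd.to_datetime(["2023-12-08", "2023-12-11"]),
    columns=columns,
)

# Drop the `Ticker` level so the columns become plain strings.
flat = demo.droplevel("Ticker", axis=1)
print(list(flat.columns))
```

The same call should work on `df_last_year`, since the `change_tomorrow` column only has an empty label on the `Ticker` level.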

### Show the report in a DataFrame

## Plot the backtest report

> Don't worry about this new tool just yet; a future chapter explains how to interpret the following chart.

## How to invest based on the numerical increase?

> Instead of the direction (UP or DOWN)

Next chapter → [Backtesting with Regression Models]()

Classification Model | Regression Model
-|-
![](src/pred_classification.png) | ![](src/pred_regression.png)

Classification Strategy | Regression Strategy
-|-
![](src/res_classification.png) | ![](src/res_regression.png)