---
### Backtesting an ML Classification-Based Strategy
---

#### I. Load the model

In [1]:
import pickle

with open('models/model_dt_classification.pkl', 'rb') as f:
    model_dt = pickle.load(f)

model_dt

---
#### II. Load the data

In [2]:
import pandas as pd

df = pd.read_excel('data/Microsoft_LinkedIn_Processed.xlsx', index_col=0, parse_dates=['Date'])
df.head(n=5)

| Date | Close | High | Low | Open | Volume | change_tomorrow | change_tomorrow_direction |
|---|---|---|---|---|---|---|---|
| 2016-12-08 | 55.181126 | 55.696671 | 55.027369 | 55.44342 | 21220800 | 1.549151 | UP |
| 2016-12-09 | 56.049416 | 56.067505 | 55.289669 | 55.334891 | 27349400 | 0.321666 | UP |
| 2016-12-12 | 56.230289 | 56.34787 | 55.823285 | 55.91373 | 20198100 | 1.286169 | UP |
| 2016-12-13 | 56.962929 | 57.36089 | 56.29363 | 56.528788 | 35718900 | -0.478644 | DOWN |
| 2016-12-14 | 56.691578 | 57.388013 | 56.555907 | 56.981005 | 30352700 | -0.159789 | DOWN |


---
#### III. Backtesting.py Library

Create your Strategy Class.

In [4]:
from backtesting import Backtest, Strategy

In [5]:
df_explanatory = df[['Open', 'High', 'Low', 'Close', 'Volume']].copy()
model_dt.predict(X=df_explanatory)

array(['UP', 'UP', 'UP', ..., 'UP', 'UP', 'DOWN'], dtype=object)

Simulate the prediction for the last observation.

In [6]:
explanatory_today = df_explanatory.iloc[-1:,:]
explanatory_today

| Date | Open | High | Low | Close | Volume |
|---|---|---|---|---|---|
| 2025-04-03 | 374.790009 | 377.480011 | 369.350006 | 373.109985 | 30147000 |


In [7]:
forecast_tomorrow = model_dt.predict(explanatory_today)[0]
forecast_tomorrow

'DOWN'

Write the prediction process in the `Strategy` class.

In [8]:
class ClassificationUP(Strategy):
    def init(self):
        self.model = model_dt

    def next(self):
        # self.data.df holds only the bars seen so far, so its last row is "today"
        explanatory_today = self.data.df.iloc[-1:, :]
        forecast_tomorrow = self.model.predict(explanatory_today)[0]

        # Long/Short conditions

---
#### IV. Compute Purchase Recommendation

*Buy* if tomorrow's change is **up** and *sell* if it is **down**.

In [9]:
long_short = []

for direction_tomorrow in df.change_tomorrow_direction:
    if direction_tomorrow == 'UP':
        long_short.append(1)        # Long
    else:
        long_short.append(-1)       # Short

In [10]:
df['long_short'] = long_short
df.head(n=5)

| Date | Close | High | Low | Open | Volume | change_tomorrow | change_tomorrow_direction | long_short |
|---|---|---|---|---|---|---|---|---|
| 2016-12-08 | 55.181126 | 55.696671 | 55.027369 | 55.44342 | 21220800 | 1.549151 | UP | 1 |
| 2016-12-09 | 56.049416 | 56.067505 | 55.289669 | 55.334891 | 27349400 | 0.321666 | UP | 1 |
| 2016-12-12 | 56.230289 | 56.34787 | 55.823285 | 55.91373 | 20198100 | 1.286169 | UP | 1 |
| 2016-12-13 | 56.962929 | 57.36089 | 56.29363 | 56.528788 | 35718900 | -0.478644 | DOWN | -1 |
| 2016-12-14 | 56.691578 | 57.388013 | 56.555907 | 56.981005 | 30352700 | -0.159789 | DOWN | -1 |


However, you can only sell a stock you already hold, and you cannot buy it again while you still hold it.

In [11]:
real_long_short = []
already_bought = False

In [12]:
for direction_tomorrow in df.change_tomorrow_direction:
    if direction_tomorrow == 'UP' and not already_bought:
        real_long_short.append(1)              # Buy (open the long)
        already_bought = True
    elif direction_tomorrow == 'DOWN' and already_bought:
        real_long_short.append(-1)             # Sell (close the long)
        already_bought = False
    else:
        real_long_short.append(0)              # Taking no action

In [13]:
df['real_long_short'] = real_long_short
df[['change_tomorrow_direction', 'long_short', 'real_long_short']].head(n=10)

| Date | change_tomorrow_direction | long_short | real_long_short |
|---|---|---|---|
| 2016-12-08 | UP | 1 | 1 |
| 2016-12-09 | UP | 1 | 0 |
| 2016-12-12 | UP | 1 | 0 |
| 2016-12-13 | DOWN | -1 | -1 |
| 2016-12-14 | DOWN | -1 | 0 |
| 2016-12-15 | DOWN | -1 | 0 |
| 2016-12-16 | UP | 1 | 1 |
| 2016-12-19 | DOWN | -1 | -1 |
| 2016-12-20 | DOWN | -1 | 0 |
| 2016-12-21 | UP | 1 | 1 |
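
The buy/sell/hold rule above can also be wrapped in a small function, which makes it easier to check on a short list before applying it to a whole column. This is only a sketch: the function name `to_trade_signals` is hypothetical, and the ten labels are the first ten `change_tomorrow_direction` values from the table above.

```python
def to_trade_signals(directions):
    """Turn daily 'UP'/'DOWN' labels into buy (1), sell (-1) or hold (0) signals."""
    signals = []
    holding = False                      # True while we own the stock
    for direction in directions:
        if direction == 'UP' and not holding:
            signals.append(1)            # Buy: we expect a rise and hold nothing
            holding = True
        elif direction == 'DOWN' and holding:
            signals.append(-1)           # Sell: we expect a fall and hold the stock
            holding = False
        else:
            signals.append(0)            # Nothing to do
    return signals


# First ten labels from the table above
first_ten = ['UP', 'UP', 'UP', 'DOWN', 'DOWN', 'DOWN', 'UP', 'DOWN', 'DOWN', 'UP']
print(to_trade_signals(first_ten))       # [1, 0, 0, -1, 0, 0, 1, -1, 0, 1]
```

The printed list matches the `real_long_short` column for those ten dates.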


---
#### V. Implement the `Strategy` class

Add conditions to the strategy.

In [14]:
# Define a custom strategy class for backtesting using machine learning predictions
class ClassificationUP(Strategy):
    
    def init(self):
        """
        This method is called once at the beginning of the backtest.
        We store a reference to the trained model and initialize a flag to keep track of whether we already hold a position.
        """
        self.model = model_dt                 # Keep a reference to the pre-trained decision tree model
        self.already_bought = False           # Track whether a position is currently open

    def next(self):
        """
        This method is called at every step (i.e., for each new time point in the backtest).
        Here we make a prediction based on the most recent market data, and decide whether to buy or sell.
        """
        # Get the most recent row of features (last bar of data)
        explanatory_today = self.data.df.iloc[-1:, :]

        # Make a prediction for the next day
        forecast_tomorrow = self.model.predict(explanatory_today)[0]  # Will be either 'UP' or 'DOWN'
        
        # ---- Trading logic ----

        # If the model predicts "UP" and we don't already have a position open
        if forecast_tomorrow == 'UP' and not self.already_bought:
            self.buy()                        # Open a long position
            self.already_bought = True        # Update flag to indicate we hold a position

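        # If the model predicts "DOWN" and we currently hold a long position
        elif forecast_tomorrow == 'DOWN' and self.already_bought:
            # sell() places a sell order. With exclusive_orders=True it closes the long
            # and opens a short; use self.position.close() to exit without going short.
            self.sell()
            self.already_bought = False       # Update flag to indicate we no longer hold a long position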

        # If prediction is same as current position status, do nothing
        else:
            pass  # No action taken

Define *initial conditions*.

In [15]:
# Create a backtest instance using the provided data and strategy
bt = Backtest(
    data=df_explanatory,         # The historical market data to run the backtest on
    strategy=ClassificationUP,   # The trading strategy class to apply
    cash=10000,                  # Starting capital for the backtest
    commission=0.002,            # Commission fee per trade (0.2%)
    exclusive_orders=True        # Each new order automatically closes the previous trade/position
)

---
#### VI. Backtesting

Run backtesting.

In [16]:
results = bt.run()

Interpret backtesting results.

In [17]:
results.to_frame(name='Values').loc[:'Return [%]']

| Metric | Values |
|---|---|
| Start | 2016-12-08 00:00:00 |
| End | 2025-04-03 00:00:00 |
| Duration | 3038 days 00:00:00 |
| Exposure Time [%] | 98.852224 |
| Equity Final [$] | 31067509.684843 |
| Equity Peak [$] | 33391464.049773 |
| Commissions [$] | 12650616.257933 |
| Return [%] | 310575.096848 |


**Interpretation**

- The strategy held a position about 98.85% of the time, so it was almost always exposed to the market. With `exclusive_orders=True`, each `sell()` closes the long and opens a short, which keeps the strategy in the market between signals.
- The total return is extremely high (>310,000%), suggesting outstanding performance on paper.
- Even with over $12.6 million in commissions, the strategy still ended with a large final equity.
- The final equity ($31.1 million) is about 7% below the peak equity ($33.4 million), so the backtest ended fairly close to its high-water mark.
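
The figures in the results table can be cross-checked with a little arithmetic. The sketch below uses the `cash=10000` starting capital from the `Backtest` setup and the equity values copied from the table; the variable names are illustrative only.

```python
# Values copied from the Backtest setup and the results table
initial_cash = 10_000
equity_final = 31_067_509.684843
equity_peak = 33_391_464.049773

# Total return: growth of the final equity relative to the starting capital
total_return_pct = (equity_final / initial_cash - 1) * 100

# How far the final equity sits below the highest equity reached
below_peak_pct = (1 - equity_final / equity_peak) * 100

print(f"Return: {total_return_pct:,.2f}%")                  # Return: 310,575.10%
print(f"Final equity below peak: {below_peak_pct:.2f}%")    # Final equity below peak: 6.96%
```

The computed return agrees with the `Return [%]` row of the table.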

**Limitations – Why This Doesn't Reflect Reality**

This strategy has essentially no validity in a real-life trading context:

1. **No Train/Test Split**: The model was trained and evaluated on the same data. In reality, we never get to train a model using future market movements. This makes the whole evaluation invalid for real-world trading.

2. **Overfitting**: The model has likely memorized the historical data, including its noise. It didn’t learn generalizable patterns. Such models tend to fail immediately when deployed on unseen data.

3. **Unrealistically High Accuracy**: Because predictions were made on the same data the model was trained on, the backtest gives an inflated sense of accuracy and profitability.

4. **No Out-of-Sample Testing**: We haven’t tested the model’s performance on unseen data. All evaluation has been done in-sample, which is a major methodological flaw.
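
One common way to address the points above is a chronological split: fit the model only on the earlier part of the data and run the backtest only on the later part. The following is a minimal sketch under stated assumptions, not the method used in this notebook. The function name `chronological_split`, the cutoff date, and the toy DataFrame are all hypothetical stand-ins for the real price data.

```python
import pandas as pd


def chronological_split(frame, cutoff):
    """Split a date-indexed DataFrame into rows before the cutoff and rows from it onward."""
    frame = frame.sort_index()                    # Make sure rows are in time order
    cutoff = pd.Timestamp(cutoff)
    train = frame[frame.index < cutoff]           # Earlier data: used to fit the model
    test = frame[frame.index >= cutoff]           # Later data: used only for backtesting
    return train, test


# Toy data standing in for the real price DataFrame (assumption for illustration)
dates = pd.date_range('2024-01-01', periods=10, freq='D')
toy = pd.DataFrame({'Close': range(10)}, index=dates)

train, test = chronological_split(toy, cutoff='2024-01-08')
print(len(train), len(test))                      # 7 3
```

With a split like this, the model would be refit on `train` and the `Backtest` would receive only `test`, so every prediction in the backtest is made on data the model has not seen.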

**Why This Is Still Useful**

Even though the strategy is not realistic as-is, this step is important for learning:
- Successfully implemented and connected a machine learning model to a trading strategy.
- Understood how to extract predictions and translate them into trades.
- Learned how to analyze backtest outputs.