Below is a self-contained Python script that:

1. Downloads historical OHLCV data (using `yfinance`)
2. Engineers basic technical indicators as features
3. Trains a machine-learning model (Random Forest) to predict next-day returns
4. Generates entry/exit signals based on the model’s predictions
5. Backtests the strategy with `vectorbt` and prints out key performance metrics

### How it works

1. **Data download**
   We pull daily OHLCV for AAPL from 2015–2022 via `yfinance`.
2. **Feature engineering**

   * **EMA20 & EMA50** capture short- and medium-term trends
   * **RSI14** (Relative Strength Index) flags overbought/oversold conditions
   * **Momentum (5-day)** measures the percentage price change over the last five days
   * **Volume change** (day-over-day percentage change in volume) tracks shifts in trading activity
3. **Target**
   Next-day return (`ret1`) becomes our regression target.
4. **Model**
   A `RandomForestRegressor` learns to predict `ret1` from the features.
5. **Signals**
   We generate a **long** signal whenever the model’s predicted return > 0, flat otherwise.
6. **Backtest**
   Using `vectorbt`, we simulate trades at close prices over the test period with **0.1% fees** per trade, report total return, annualized return, Sharpe ratio and maximum drawdown, and plot the equity curve.
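
The feature formulas in step 2 can be checked in isolation before downloading anything. This is a minimal sketch assuming only pandas and NumPy; `make_features` is a hypothetical helper name and the random-walk prices are made up. It mirrors the script's own choices, including a 14-day RSI built from simple rolling means (Wilder's original RSI smooths gains and losses exponentially instead).

```python
import numpy as np
import pandas as pd


def make_features(df: pd.DataFrame) -> pd.DataFrame:
    """Compute the script's five features from a frame with 'Close' and 'Volume' columns."""
    out = pd.DataFrame(index=df.index)
    close = df["Close"]
    # Exponential moving averages in recursive form (adjust=False), as in the script
    out["ema20"] = close.ewm(span=20, adjust=False).mean()
    out["ema50"] = close.ewm(span=50, adjust=False).mean()
    # 14-day RSI from simple rolling means of gains and losses
    delta = close.diff()
    avg_gain = delta.clip(lower=0).rolling(14).mean()
    avg_loss = (-delta.clip(upper=0)).rolling(14).mean()
    out["rsi14"] = 100 - 100 / (1 + avg_gain / avg_loss)
    # 5-day momentum: Close_t / Close_{t-5} - 1
    out["mom5"] = close.pct_change(5)
    # Day-over-day percentage change in volume
    out["vol_chg"] = df["Volume"].pct_change()
    return out


# Hypothetical random-walk prices and volumes, so no download is needed
rng = np.random.default_rng(0)
idx = pd.bdate_range("2020-01-01", periods=120)
prices = pd.DataFrame(
    {
        "Close": 100 * np.exp(np.cumsum(rng.normal(0, 0.01, len(idx)))),
        "Volume": rng.integers(1_000_000, 2_000_000, len(idx)),
    },
    index=idx,
)
feats = make_features(prices).dropna()
print(feats.head())
```

The RSI needs 14 valid price differences, so it is the last feature to become available; `dropna()` therefore removes the first 14 rows.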
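
Steps 3 and 5 hinge on aligning the target with the signal: `ret1` at day t is the return from close t to close t+1, so a position decided at close t earns exactly `ret1` at t. The sketch below shows this on five hand-picked closes; the predictions are hypothetical numbers standing in for model output.

```python
import pandas as pd

close = pd.Series([100.0, 102.0, 99.0, 99.0, 103.95],
                  index=pd.bdate_range("2022-01-03", periods=5))
# Target as in the script: return from today's close to tomorrow's close
ret1 = close.pct_change().shift(-1)
# Hypothetical predictions made at each day's close
pred = pd.Series([0.01, -0.02, 0.03, 0.01, 0.00], index=close.index)
# Long-only rule: 1 = long, 0 = flat
position = (pred > 0).astype(int)
# Holding from close t to close t+1 earns ret1 at t; the last day has no target
strategy_ret = (position * ret1).dropna()
print(strategy_ret.round(4).tolist())
```

The final day has no next-day close, so its `ret1` is NaN. This is why the script's `dropna()` also removes the last row.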
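
The effect of the 0.1% fee in step 6 can be approximated without `vectorbt`. This is a simplified sketch: `long_flat_equity` is a hypothetical helper that charges the fee as a fraction of equity on every position change, assumes fractional shares and trades at the close. Its numbers will not match `vectorbt`'s order-level accounting exactly.

```python
import pandas as pd


def long_flat_equity(close: pd.Series, signal: pd.Series,
                     fee: float = 0.001, init_cash: float = 100_000.0) -> pd.Series:
    """Simplified long/flat backtest trading at the close.

    signal[t] is True to hold the asset from close t to close t+1.
    A proportional fee is charged on the equity traded whenever the position changes.
    """
    position = signal.astype(float)
    # Return over (t-1, t] earned by the position held since close t-1
    gross = position.shift(1, fill_value=0.0) * close.pct_change().fillna(0.0)
    # Position changes at close t (entering on the first day counts as a trade)
    turnover = position.diff().abs().fillna(position.iloc[0])
    net = (1 + gross) * (1 - fee * turnover) - 1
    return init_cash * (1 + net).cumprod()


close = pd.Series([100.0, 102.0, 99.0, 99.0, 103.95],
                  index=pd.bdate_range("2022-01-03", periods=5))
signal = pd.Series([True, False, True, True, False], index=close.index)
equity = long_flat_equity(close, signal, fee=0.001)
print(equity.round(2))
```

With an always-long signal and `fee=0.0`, the curve reduces to buy-and-hold, which is a quick sanity check for any backtest implementation.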

---

#### Next Steps

* **Hyperparameter tuning** (e.g. grid-search the Random Forest's `max_depth` and `n_estimators`)
* Add **stop-loss/take-profit** logic
* Use **walk-forward validation** rather than a single train/test split
* Try other models: XGBoost, LightGBM, LSTM
* Incorporate additional features: macro data, sentiment, advanced technicals
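
Walk-forward validation, mentioned above, can be sketched as an expanding-window splitter. `walk_forward_splits` is a hypothetical helper, not part of any library; scikit-learn's `TimeSeriesSplit` provides a similar expanding-window scheme. Each fold trains on everything before its test window, so the model never sees the future. The same folds can also drive a hyperparameter search.

```python
import numpy as np


def walk_forward_splits(n_samples: int, n_splits: int, min_train: int):
    """Yield (train_idx, test_idx) pairs for expanding-window walk-forward validation.

    The first `min_train` observations are always in-sample; the rest are cut into
    `n_splits` consecutive test folds, each trained on all observations before it.
    """
    if n_splits < 1 or not 0 < min_train <= n_samples - n_splits:
        raise ValueError("need n_splits >= 1 and 0 < min_train <= n_samples - n_splits")
    bounds = np.linspace(min_train, n_samples, n_splits + 1).astype(int)
    for start, stop in zip(bounds[:-1], bounds[1:]):
        yield np.arange(0, start), np.arange(start, stop)


for train_idx, test_idx in walk_forward_splits(100, n_splits=3, min_train=40):
    # Last training index, first and last test index
    print(train_idx[-1], test_idx[0], test_idx[-1])
# 39 40 59
# 59 60 79
# 79 80 99
```

In the script, each pair would select rows with `X.iloc[train_idx]` and `X.iloc[test_idx]`, refit the model per fold, and concatenate the out-of-sample predictions before backtesting.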

Feel free to adapt this template to your favorite universe of stocks, intervals (e.g. 5 min, 1 h), or models—and let me know if you hit any snags!
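
Two library version changes can break a script like this on current installs: recent `yfinance` releases can return `(Price, Ticker)` MultiIndex columns even for a single symbol, and scikit-learn 1.6 removed the `squared` argument of `mean_squared_error`. The helpers below are a hedged workaround. `flatten_ohlcv_columns` and `rmse` are hypothetical names, and the column layout is mimicked with made-up numbers rather than a real download.

```python
import numpy as np
import pandas as pd


def flatten_ohlcv_columns(data: pd.DataFrame) -> pd.DataFrame:
    """Drop the ticker level if a single-ticker download came back with MultiIndex columns."""
    if isinstance(data.columns, pd.MultiIndex):
        data = data.copy()
        data.columns = data.columns.get_level_values(0)
    return data


def rmse(y_true, y_pred) -> float:
    """Root-mean-squared error without scikit-learn's removed `squared=` flag."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Mimic the (Price, Ticker) column layout with made-up values
cols = pd.MultiIndex.from_product([["Close", "Volume"], ["AAPL"]], names=["Price", "Ticker"])
raw = pd.DataFrame([[150.0, 1e6], [151.5, 1.2e6]], columns=cols)
flat = flatten_ohlcv_columns(raw)
print(list(flat.columns))  # ['Close', 'Volume']
print(rmse([0.01, -0.02], [0.0, 0.0]))
```

On scikit-learn 1.4 or later, `sklearn.metrics.root_mean_squared_error` is the built-in alternative to the `rmse` helper.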


In [1]:
# -*- coding: utf-8 -*-
"""
Machine Learning–Driven Backtest Example
========================================

1. Fetch data with yfinance
2. Feature engineering: EMAs, RSI, momentum, volume change
3. Train RandomForestRegressor on past data
4. Predict next-day returns
5. Generate long-only signals
6. Backtest with vectorbt
"""

import numpy as np
import pandas as pd
import yfinance as yf
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import vectorbt as vbt

# 1. PARAMETERS
SYMBOL    = 'AAPL'
START     = '2015-01-01'
END       = '2022-12-31'
TEST_SIZE = 0.2
RND_STATE = 42

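# 2. FETCH HISTORICAL DATA
# auto_adjust=True is set explicitly: recent yfinance versions changed its default and
# warn when it is omitted. 'Close' is then the split/dividend-adjusted close.
data = yf.download(SYMBOL, start=START, end=END, auto_adjust=True, progress=False)
# Recent yfinance versions return (Price, Ticker) MultiIndex columns even for a single
# symbol; flatten them so df['Close'] is a Series, not a one-column DataFrame
if isinstance(data.columns, pd.MultiIndex):
    data.columns = data.columns.get_level_values(0)
# Keep only the columns we need
df = data[['Open','High','Low','Close','Volume']].copy()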

# 3. FEATURE ENGINEERING
# 3.1 Moving averages
df['ema20'] = df['Close'].ewm(span=20, adjust=False).mean()
df['ema50'] = df['Close'].ewm(span=50, adjust=False).mean()
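# 3.2 RSI (14-day) from simple rolling means of gains/losses
#     (Wilder's original RSI smooths them exponentially instead)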
delta       = df['Close'].diff()
gain        = delta.clip(lower=0)
loss        = -delta.clip(upper=0)
avg_gain    = gain.rolling(14).mean()
avg_loss    = loss.rolling(14).mean()
rs          = avg_gain / avg_loss
df['rsi14'] = 100 - (100 / (1 + rs))
# 3.3 Momentum: close / close.shift(5) - 1
df['mom5']  = df['Close'].pct_change(5)
# 3.4 Volume change
df['vol_chg'] = df['Volume'].pct_change()

# 4. TARGET: next-day return
df['ret1'] = df['Close'].pct_change().shift(-1)

# 5. DROP NA (and any ±inf produced by division by zero, e.g. on a zero-volume day)
df = df.replace([np.inf, -np.inf], np.nan).dropna()

# 6. SPLIT INTO FEATURES X AND TARGET y
features = ['ema20','ema50','rsi14','mom5','vol_chg']
X = df[features]
y = df['ret1']

# 7. TRAIN/TEST SPLIT
# shuffle=False keeps chronological order: the test set is the last TEST_SIZE fraction of days
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=TEST_SIZE, shuffle=False)

# 8. MODEL TRAINING
model = RandomForestRegressor(
    n_estimators=100,
    max_depth=5,
    random_state=RND_STATE,
    n_jobs=-1
)
model.fit(X_train, y_train)

# 9. PREDICTIONS & EVALUATION
y_pred_train = model.predict(X_train)
y_pred_test  = model.predict(X_test)
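# mean_squared_error(..., squared=False) was deprecated in scikit-learn 1.4 and removed
# in 1.6, where it raises a TypeError; the square root of the MSE works on all versions
rmse_train = np.sqrt(mean_squared_error(y_train, y_pred_train))
rmse_test  = np.sqrt(mean_squared_error(y_test,  y_pred_test))
print(f"Train RMSE: {rmse_train:.5f}")
print(f"Test  RMSE: {rmse_test:.5f}")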

# 10. GENERATE SIGNALS
# Go long when the predicted return > 0, stay flat otherwise.
df_test = df.loc[X_test.index].copy()
df_test['pred_ret'] = y_pred_test
entries = df_test['pred_ret'] > 0
# Exits are simply ~entries (passed to the backtest below): leave the market on any
# day the model no longer predicts a positive return

# 11. BACKTEST WITH VECTORBT
pf = vbt.Portfolio.from_signals(
    close=df_test['Close'],
    entries=entries,
    exits=~entries,
    init_cash=100_000,
    fees=0.001,
    freq='1D'
)

# 12. OUTPUT PERFORMANCE
print("\n=== Performance Summary ===")
print(f"Total Return      : {pf.total_return()*100:.2f}%")
print(f"Annualized Return : {pf.annualized_return()*100:.2f}%")
print(f"Sharpe Ratio      : {pf.sharpe_ratio():.2f}")
print(f"Max Drawdown      : {pf.max_drawdown()*100:.2f}%")

# 13. PLOT EQUITY CURVE
pf.plot().show()



