To gather data, we will use pandas_datareader. We also need pandas for data manipulation, matplotlib for plotting, sklearn for machine learning, numpy for numerical computations, and backtrader for backtesting.

We will first import all the necessary libraries after installing them.

In [88]:
import pandas as pd
from pandas_datareader import data as pdr
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler
import numpy as np
import backtrader as bt
import datetime
import yfinance as yf

Next, we will fetch the stock data using pandas_datareader. Let us use the 'AAPL' ticker for example, and we are fetching data for a year:

In [89]:
# Get 'AAPL' stock data for the last year
start_date = datetime.datetime.now() - datetime.timedelta(days=365)
end_date = datetime.date.today()

yf.pdr_override() # <== that's all it takes :-)

# download dataframe
df= pdr.get_data_yahoo("AAPL", start=start_date, end=end_date)

[*********************100%***********************]  1 of 1 completed


Now that we have the stock data, let's move on to generating the candlestick pattern dataset. We'll be working with the Open, High, Low, and Close (OHLC) data and creating features based on this data.

Let's create four features for our model:

Open/Close: If the open price is greater than the close price then we will mark it as 1, else we will mark it as 0

High/Low: If the difference between high and low prices of the day is greater than 1% then we will mark it as 1, else 0

Percentage change: Daily percentage change in prices

Volume Shock: If the volume difference is greater than 10% as compared to previous day then we will mark it as 1(Shock), else 0(No Shock)

In [90]:
# Define a function to calculate the four features
def generate_features(df):
    df['Open-Close'] = (df['Open'] - df['Close'])/df['Open']
    df['High-Low'] = (df['High'] - df['Low'])/df['Low']
    df['percent_change'] = df['Close'].pct_change()
    df['vol_shock'] = (df['Volume'].diff()/df['Volume']) > 0.10
    df['vol_shock'] = df['vol_shock'].astype(int)
    df.dropna(inplace=True)
    df.to_csv('AAPL.csv')
# Apply the function to our dataframe
generate_features(df)

Now that we have our features, let's split this data into a training and testing set. We'll use 70% of data for training and 30% for testing.

Also, we will introduce a target variable 'Signal' which would be a binary variable. If next day's closing price is greater than today's closing price then we will mark it as 1, else 0.


In [91]:
# Define a function to create the target variable
def generate_target(df):
    df['Signal'] = np.where(df['Close'].shift(-1) > df['Close'], 1, 0)
    return df

# Apply the function to our dataframe
df = generate_target(df)

# Drop the 'Adj Close' column
df = df.drop('Adj Close', axis=1)

# Split the data into training and test sets
train = df.iloc[:int(0.7*len(df))]
test = df.iloc[int(0.7*len(df)):]

X_train = train.drop(['Signal'], axis=1)
y_train = train['Signal']
X_test = test.drop(['Signal'], axis=1)
y_test = test['Signal']

Next, let's initialize our RandomForestClassifier and train it on our training data:

In [92]:
# Initialize and train the RandomForestClassifier
clf = RandomForestClassifier(n_estimators=100, random_state=1)
clf.fit(X_train, y_train)

Now, we will move forward to the backtesting part using backtrader. Here we will define our strategy which will use our trained random forest classifier to predict the signal (buy/sell) based on the test data.

In [93]:
class MLStrategy(bt.Strategy):
    params = (
        ('maperiod', 15),
        ('printlog', False),
    )

    def log(self, txt, dt=None, doprint=False):
        ''' Logging function for this strategy'''
        if self.params.printlog or doprint:
            dt = dt or self.datas[0].datetime.date(0)
            print('%s, %s' % (dt.isoformat(), txt))

    def __init__(self):
        # To keep track of pending orders and buy price/commission
        self.order = None
        self.buyprice = None
        self.buycomm = None

    def notify_order(self, order):
        if order.status in [order.Submitted, order.Accepted]:
            # Buy/Sell order submitted/accepted to/by broker - Nothing to do
            return

        # Check if an order has been completed
        if order.status in [order.Completed]:
            if order.isbuy():
                self.log(
                    'BUY EXECUTED, Price: %.2f, Cost: %.2f, Comm %.2f' %
                    (order.executed.price,
                     order.executed.value,
                     order.executed.comm))

                self.buyprice = order.executed.price
                self.buycomm = order.executed.comm
            else:  # Sell
                self.log('SELL EXECUTED, Price: %.2f, Cost: %.2f, Comm %.2f' %
                         (order.executed.price,
                          order.executed.value,
                          order.executed.comm))

            self.bar_executed = len(self)

        elif order.status in [order.Canceled, order.Margin, order.Rejected]:
            self.log('Order Canceled/Margin/Rejected')

        self.order = None

    def notify_trade(self, trade):
        if not trade.isclosed:
            return

        self.log('OPERATION PROFIT, GROSS %.2f, NET %.2f' %
                 (trade.pnl, trade.pnlcomm))

    def next(self):
        # Check if an order is pending ... if yes, we cannot send a 2nd one
        if self.order:
            return

        # Create a 1D array with values, excluding the volume shock feature
        params = {
            'Open': self.datas[0].open[0],
            'High': self.datas[0].high[0],
            'Low': self.datas[0].low[0],
            'Close': self.datas[0].close[0],
            'Volume': self.datas[0].volume[0]
        }
    
        # Calculate additional features just like during the training
        params['Open-Close'] = (params['Open'] - params['Close']) / params['Open']
        params['High-Low'] = (params['High'] - params['Low']) / params['Low']
        params['percent_change'] = params['Close'] / self.datas[0].close[-1] - 1
        params['vol_shock'] = (params['Volume'] - self.datas[0].volume[-1]) / self.datas[0].volume[-1] > 0.10

        # Convert to DataFrame as classifier expects 2D array
        params_df = pd.DataFrame([params])
        
        # Predict the class using the RandomForestClassifier
        prediction = clf.predict(params_df)
        # Check if we are in the market
        if not self.position:
            # Buy if the predicted class is 1
            if prediction[0] == 1:
                self.log('BUY CREATE, %.2f' % self.datas[0].close[0])
                self.order = self.buy()
        else:
            # Sell if the predicted class is 0
            if prediction[0] == 0:
                self.log('SELL CREATE, %.2f' % self.datas[0].close[0])
                self.order = self.sell()

    def stop(self):
        self.log('Ending Value %.2f' % self.broker.getvalue(), doprint=True)


We can proceed to run our backtesting using backtrader's Cerebro engine. Cerebro engine is where all the components we've prepared get executed.

This will run the backtesting for the 'AAPL' stock for the date range we previously set, starting with a portfolio value of $100,000. The script logs the details of each transaction and the final portfolio value.

In [94]:
# Create a cerebro entity
cerebro = bt.Cerebro()

# Add a strategy
cerebro.addstrategy(MLStrategy)

# Create a Data Feed
data = bt.feeds.YahooFinanceCSVData(
    dataname='AAPL.csv', 
    fromdate=start_date,
    todate=end_date,
    reverse=False
)

# Add the Data Feed to Cerebro
cerebro.adddata(data)

# Set our desired cash start
cerebro.broker.setcash(100000.0)

# Add a FixedSize sizer according to the stake
cerebro.addsizer(bt.sizers.FixedSize, stake=10)

# Set the commission
cerebro.broker.setcommission(commission=0.001)

# Print out the starting conditions
print('Starting Portfolio Value: %.2f' % cerebro.broker.getvalue())

# Run over everything
cerebro.run()

# Print out the final result
print('Final Portfolio Value: %.2f' % cerebro.broker.getvalue())


Starting Portfolio Value: 100000.00
2023-07-06, Ending Value 101240.08
Final Portfolio Value: 101240.08
