# 0. Problem statement

As a **quantitative analyst**, I am interested in improving our trading strategy from the previous iteration. Two reasons the previous strategy failed were: (1) stocks bought at a good time were sold too early, and (2) fixed RSI levels did not capture changes in the support and resistance levels of our stocks. This time, the strategy will use a more elaborate selling decision:
- Buy the stock when the RSI oscillator hits 30.
- Sell the stock when all of the following hold:
  - The RSI oscillator hits 70.
  - The 20d MA is above the 60d MA.
  - The 20d MA has a positive slope.
  - The stock's high for the day is above the upper Bollinger Band.
- In addition, we can set a maximum number of days a stock is held and optimise this variable to maximise returns.
- In the previous iteration, we also sold all our holdings at the end of the back-testing period. This may have hurt our results by not giving some stocks enough time to produce a "sell" signal. We can reduce this effect as follows:
  - As noted above, we will hold a security for at most a fixed number of days (`max_days_held`).
  - We can therefore stop buying stocks `max_days_held` days before the end date of the back-test, so that no position is cut short by the end of the back-testing period.
  - A useful metric for evaluating the strategy is the number of stocks we are forced to sell because they reached the holding-period limit.
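The exit rule and buy cutoff described above can be sketched in a few lines. The sketch makes two assumptions. First, each element of `sell_signals` represents one day after entry. Second, the cutoff is counted with `timedelta(days=max_days_held)`, as in `create_is_buy_df`. The helpers `last_buy_date` and `exit_day` are hypothetical and are not part of `src.backtesting`.

```python
from datetime import date, timedelta
from typing import List, Tuple


def last_buy_date(end_date: date, max_days_held: int) -> date:
    """Last day on which a new position may be opened (hypothetical helper)."""
    return end_date - timedelta(days=max_days_held)


def exit_day(sell_signals: List[bool], max_days_held: int) -> Tuple[int, bool]:
    """Return (days held, forced) for a position opened on day 0 (hypothetical helper).

    sell_signals[k] is the sell signal k days after entry. The position is sold on the
    first day with a signal; otherwise it is force-sold once max_days_held is reached.
    Assumes at least max_days_held days of data after entry, which the buy cutoff
    is meant to guarantee.
    """
    for day, is_sell in enumerate(sell_signals[1:max_days_held + 1], start=1):
        if is_sell:
            return day, False
    # No signal within the limit: this trade counts towards the "forced sells" metric
    return max_days_held, True


print(last_buy_date(date(2022, 3, 30), 30))        # 2022-02-28
print(exit_day([False, False, True, False], 30))   # (2, False)
print(exit_day([False] * 40, 30))                  # (30, True)
```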

A sharp dip can take a stock's RSI below 30. After that, the stock may rise consistently over a few days, recovering its losses and pushing the RSI up to 70. The day the RSI reaches 70 is not necessarily the peak of the rally. The price may keep rising, and the new selling rules aim to capitalise on this.
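The RSI values in this notebook are read pre-computed from the parquet files, and their lookback window isn't stated here. For reference, a common definition is RSI = 100 - 100 / (1 + RS). Here RS is the ratio of smoothed average gains to smoothed average losses, usually computed over 14 days.

The sketch below assumes n = 14 and smooths with pandas' `ewm(alpha=1/n, adjust=False)`. This approximates Wilder's method, which seeds the averages with a simple mean of the first n changes. `rsi_ewm` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def rsi_ewm(close: pd.Series, n: int = 14) -> pd.Series:
    """RSI with exponential smoothing, alpha = 1/n (Wilder-style smoothing)."""
    delta = close.diff()
    gain = delta.clip(lower=0)        # upward moves, 0 otherwise
    loss = (-delta).clip(lower=0)     # downward moves as positive numbers, 0 otherwise
    avg_gain = gain.ewm(alpha=1 / n, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1 / n, adjust=False).mean()
    rs = avg_gain / avg_loss          # inf when there were no losses, which gives RSI = 100
    return 100 - 100 / (1 + rs)


# Synthetic random-walk prices (made-up data)
rng = np.random.default_rng(42)
prices = pd.Series(100 + rng.normal(0, 1, 250).cumsum())
rsi = rsi_ewm(prices)
oversold = rsi <= 30     # days meeting the buy threshold from the problem statement
overbought = rsi >= 70   # days meeting the RSI part of the sell rule
```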

The 20d MA lags the price. After a sustained rise, its slope only falls to zero some time after the stock has started to decline. So if the RSI is above 70, the 20d MA still has a positive slope, and the day's high exceeds the upper Bollinger Band, the stock has likely reached its peak just before a decline.

The one exception to the MA analysis above is when the 20d MA is below the 60d MA. In that case, a positive slope on the 20d MA suggests it is about to cross the 60d MA from below, which would indicate a buy signal soon. This is why the sell rule also requires the 20d MA to be above the 60d MA.
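The selling rule described above combines four conditions. Below is a minimal pandas sketch of that rule. It assumes "positive slope" means the 20d MA is higher than it was five days earlier, mirroring the default `n=5` of `calculate_roc_of_ma`. `problem_statement_sell_signal` and `slope_window` are hypothetical names, and the test data is a made-up steady uptrend.

```python
import numpy as np
import pandas as pd


def problem_statement_sell_signal(close: pd.Series, high: pd.Series, rsi: pd.Series,
                                  slope_window: int = 5) -> pd.Series:
    """True on days where all four sell conditions hold (a sketch)."""
    ma_20d = close.rolling(20).mean()
    ma_60d = close.rolling(60).mean()
    upper_bollinger = ma_20d + 2 * close.rolling(20).std()
    # Positive slope: the 20d MA is higher than it was slope_window days ago
    slope_positive = ma_20d.diff(slope_window) > 0
    return (
        (rsi >= 70)
        & (ma_20d > ma_60d)          # excludes the "20d MA below 60d MA" exception
        & slope_positive
        & (high > upper_bollinger)
    )


# Synthetic steady uptrend where the daily high is 5 above the close
close = pd.Series(np.arange(1.0, 101.0))
high = close + 5
rsi = pd.Series(75.0, index=close.index)
signal = problem_statement_sell_signal(close, high, rsi)
# NaN warm-up values compare as False, so no signal appears before the 60d MA exists
```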

# I. Imports

In [1]:
cd ../

/home/murali/personal_projects/stock-price-forecasts


In [2]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.pyplot import figure
import seaborn as sns
import pandas_datareader.data as reader
import datetime as dt
from datetime import timedelta
from typing import Iterable, Union
import itertools
from plotly import graph_objects as go
from plotly.subplots import make_subplots
from plotly import express as px
import logging
import scipy

In [3]:
from src.backtesting import Backtesting
from src.utils import get_10y_treasury_yield_data, select_sample_for_backtesting

# II. Initialising key variables

In [4]:
common_path = "eda/data/"
earliest_data_fetch_date = dt.datetime(2019, 3, 1).date()
start_date = dt.datetime(2019, 4, 15).date()
end_date = dt.datetime(2022, 3, 30).date()
index_ticker = "^GSPC"    # S&P 500

In [5]:
sp500_full = pd.read_csv(common_path + "sp_500_stocks_processed.csv")

In [6]:
sp500 = pd.read_parquet(common_path + "sp500/sp500.parquet")
sp500_prices = sp500.set_index("date")["closing_price"]

In [7]:
num_stocks_per_sector = {
    "IT": 4,
    "ID": 4,
    "F": 4,
    "HC": 4,
    "CD": 3,
    "CS": 2,
    "RE": 2,
    "U": 2,
    "M": 2,
    "CM": 2,
    "E": 1
}

In [8]:
tickers_per_sector = {}

for sector in num_stocks_per_sector.keys():
    mask = sp500_full["sector"] == sector
    tickers_per_sector[sector] = np.array(sorted(sp500_full.loc[mask, "Symbol"]))
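`select_sample_for_backtesting` lives in `src.utils`, and its implementation isn't shown in this notebook. Judging by its arguments, it draws a fixed number of tickers from each sector. Below is a hypothetical stand-in that assumes uniform random sampling without replacement. It is not the `src.utils` code, and the demo tickers are made up.

```python
import numpy as np


def sample_per_sector(num_stocks_per_sector, tickers_per_sector, seed=None):
    """Draw n distinct tickers from each sector (hypothetical stand-in)."""
    rng = np.random.default_rng(seed)
    sample = []
    for sector, n in num_stocks_per_sector.items():
        # Sample without replacement so a ticker cannot appear twice
        picks = rng.choice(tickers_per_sector[sector], size=n, replace=False)
        sample.extend(str(t) for t in picks)
    return sample


demo_counts = {"IT": 2, "E": 1}
demo_tickers = {"IT": np.array(["AAPL", "MSFT", "NVDA"]), "E": np.array(["XOM", "CVX"])}
print(sample_per_sector(demo_counts, demo_tickers, seed=0))
```

Passing a fixed `seed` makes a back-test sample reproducible, which helps when comparing strategy variants on the same stocks.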

# III. Create: the input datasets for implementing the trading strategy

In [9]:
max_days_held = 30

In [10]:
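def calculate_roc_of_ma(ma_series: pd.Series, n: int = 5) -> pd.Series:
    """Average daily log rate of change of a moving average over the last n days."""
    upper = ma_series
    lower = ma_series.shift(n)
    diff = np.log(upper / lower)

    return diff / n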
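`calculate_roc_of_ma` returns ln(MA_t / MA_{t-n}) / n, the average daily log growth of the moving average over the last `n` days. Its sign therefore tells us whether the MA has a positive slope. The quick check below uses a synthetic MA growing by exactly 1% per day in log terms, so the result should be 0.01 once `n` days of history exist. The function body is copied from the cell above so the block runs on its own.

```python
import numpy as np
import pandas as pd


def calculate_roc_of_ma(ma_series: pd.Series, n: int = 5) -> pd.Series:
    # Same computation as the cell above: log change over n days, divided by n
    return np.log(ma_series / ma_series.shift(n)) / n


t = np.arange(30)
ma = pd.Series(np.exp(0.01 * t))   # synthetic MA with constant log growth of 0.01 per day
roc = calculate_roc_of_ma(ma)
print(roc.iloc[5:].round(6).unique())   # [0.01]
```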

In [11]:
def create_compiled_df(
    df_with_date: pd.DataFrame, stock_sample: Iterable, common_path: str) -> pd.DataFrame:
    
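    # Ticker columns are concatenated by position, so each ticker file is assumed
    # to contain exactly the same trading days, in the same order, as df_with_date
    df = pd.DataFrame(df_with_date["date"]).reset_index(drop=True)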
    
    for ticker in stock_sample:
        path = common_path + f"sp500/{ticker}.parquet"
        ticker_df = pd.read_parquet(path, columns=["closing_price", "RSI", "daily_return", "high", "Volume"]).reset_index(drop=True)
        
        closing_price = ticker_df["closing_price"].rename(f"closing_price_{ticker}")
        rsi = ticker_df["RSI"].rename(f"rsi_{ticker}")
        is_rsi_lt_30 = (ticker_df["RSI"] <= 30).rename(f"is_rsi_lt_30_{ticker}")
        is_rsi_gt_70 = (ticker_df["RSI"] >= 70).rename(f"is_rsi_gt_70_{ticker}")
        
        volume = ticker_df["Volume"].rename(f"volume_{ticker}")
        high = ticker_df["high"].rename(f"high_{ticker}")
        ma_20d = ticker_df["closing_price"].rolling(20).mean().rename(f"20d_MA_{ticker}")
        sd_20d = ticker_df["closing_price"].rolling(20).std()
        ma_60d = ticker_df["closing_price"].rolling(60).mean().rename(f"60d_MA_{ticker}")
        upper_bollinger = (ma_20d + 2*sd_20d).rename(f"upper_bollinger_band_{ticker}")
        lower_bollinger = (ma_20d - 2*sd_20d).rename(f"lower_bollinger_band_{ticker}")
        bollinger_width_norm = ((upper_bollinger - lower_bollinger)/(ma_20d)).rename(f"bollinger_width_norm_{ticker}")
        roc_20d_ma = calculate_roc_of_ma(ma_20d).rename(f"roc_20dMA_{ticker}")
        
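        # The upper Bollinger Band and the 20d MA rate of change are the inputs to the sell conditions in the problem statement
        df = pd.concat([df, closing_price, rsi, is_rsi_lt_30, is_rsi_gt_70, ma_20d, ma_60d, volume, high,
                        bollinger_width_norm, upper_bollinger, roc_20d_ma], axis=1)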
    
    df = df.dropna().reset_index(drop=True)

    df["month"] = df["date"].dt.month
    df["year"] = df["date"].dt.year
    
    # Flag the last trading day of each calendar month
    df["is_eom"] = df.index.isin(df.groupby(["year", "month"]).tail(1).index)

    return df
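`create_compiled_df` builds Bollinger Bands as the 20-day mean ± 2 rolling standard deviations. It then normalises the band width by the mean, so `bollinger_width_norm` simplifies to 4σ/MA.

The check below runs on a synthetic random-walk price series (made-up data). Note that pandas' `rolling().std()` uses the sample standard deviation (`ddof=1`) by default.

```python
import numpy as np
import pandas as pd

# Synthetic random-walk closing prices (made-up data)
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 120).cumsum())

ma_20d = close.rolling(20).mean()
sd_20d = close.rolling(20).std()          # sample standard deviation (ddof=1)
upper_bollinger = ma_20d + 2 * sd_20d
lower_bollinger = ma_20d - 2 * sd_20d
bollinger_width_norm = (upper_bollinger - lower_bollinger) / ma_20d

# The first 19 values are NaN because a full 20-day window is needed;
# afterwards the normalised width equals 4 * sd_20d / ma_20d
```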

In [12]:
def find_prev_rsi_low(df: pd.DataFrame) -> pd.DataFrame:
    new_cols = []
    
    # Note: iterates over the global stock_sample defined in section IV
    for ticker in stock_sample:
        df_copy = df.copy()
        col = df_copy[f"is_rsi_lt_30_{ticker}"]
        # A new run starts wherever the RSI <= 30 flag changes value;
        # label every row with the index of the row where its run began
        mask = (col != col.shift())
        df_copy.loc[mask, "index_transition"] = df_copy.loc[mask].index
        df_copy["index_transition"] = df_copy["index_transition"].ffill()
        # Minimum RSI of each run, keeping only the runs where RSI <= 30
        g = df_copy[["index_transition", f"is_rsi_lt_30_{ticker}", f"rsi_{ticker}"]].groupby(["index_transition", f"is_rsi_lt_30_{ticker}"]).min().reset_index()
        h = g[g[f"is_rsi_lt_30_{ticker}"]][["index_transition", f"rsi_{ticker}"]]

        # Start from an all-False selection so a ticker that never reaches RSI <= 30 does not raise a NameError
        sel = pd.Series(False, index=df_copy.index)
        for r in h.index:
            i = h.loc[r, "index_transition"]
            rsi = h.loc[r, f"rsi_{ticker}"]
            # Mark the day(s) on which each run's minimum RSI occurred
            sel |= ((df_copy[f"rsi_{ticker}"] == rsi) & (df_copy["index_transition"] == i))

        # Carry the most recent RSI low forward until the next one
        df_copy[f"prev_rsi_low_{ticker}"] = df_copy.loc[sel, f"rsi_{ticker}"]
        df_copy[f"prev_rsi_low_{ticker}"] = df_copy[f"prev_rsi_low_{ticker}"].ffill()
        new_cols += [df_copy[f"prev_rsi_low_{ticker}"]]
        
        df_copy = df_copy.drop(columns=["index_transition"])
        df = df.drop(columns=[f"is_rsi_lt_30_{ticker}"])
        
    df = pd.concat([df, pd.concat(new_cols, axis=1)], axis=1)
    
    return df

In [13]:
def find_prev_rsi_high(df: pd.DataFrame) -> pd.DataFrame:
    new_cols = []
    
    # Note: iterates over the global stock_sample defined in section IV
    for ticker in stock_sample:
        df_copy = df.copy()
        col = df_copy[f"is_rsi_gt_70_{ticker}"]
        # A new run starts wherever the RSI >= 70 flag changes value;
        # label every row with the index of the row where its run began
        mask = (col != col.shift())

        df_copy.loc[mask, "index_transition"] = df_copy.loc[mask].index
        df_copy["index_transition"] = df_copy["index_transition"].ffill()

        # Maximum RSI of each run, keeping only the runs where RSI >= 70
        g = df_copy[["index_transition", f"is_rsi_gt_70_{ticker}", f"rsi_{ticker}"]].groupby(["index_transition", f"is_rsi_gt_70_{ticker}"]).max().reset_index()
        h = g[g[f"is_rsi_gt_70_{ticker}"]][["index_transition", f"rsi_{ticker}"]]

        # Start from an all-False selection so a ticker that never reaches RSI >= 70 does not raise a NameError
        sel = pd.Series(False, index=df_copy.index)
        for r in h.index:
            i = h.loc[r, "index_transition"]
            rsi = h.loc[r, f"rsi_{ticker}"]
            # Mark the day(s) on which each run's maximum RSI occurred
            sel |= ((df_copy[f"rsi_{ticker}"] == rsi) & (df_copy["index_transition"] == i))

        # Carry the most recent RSI high forward until the next one
        df_copy[f"prev_rsi_high_{ticker}"] = df_copy.loc[sel, f"rsi_{ticker}"]
        df_copy[f"prev_rsi_high_{ticker}"] = df_copy[f"prev_rsi_high_{ticker}"].ffill()
        new_cols += [df_copy[f"prev_rsi_high_{ticker}"]]
        
        df_copy = df_copy.drop(columns=["index_transition"])
        df = df.drop(columns=[f"is_rsi_gt_70_{ticker}"])
    
    df = pd.concat([df, pd.concat(new_cols, axis=1)], axis=1)

    return df
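`find_prev_rsi_low` and `find_prev_rsi_high` work in three steps:

1. Split each ticker's history into runs of consecutive days on the same side of the threshold.
2. Take the extreme RSI of every run beyond the threshold.
3. Carry that value forward.

The same logic can be expressed with a run id and `groupby().transform()`. The helper `prev_run_extreme` below is hypothetical and is shown on made-up RSI values.

```python
import pandas as pd


def prev_run_extreme(rsi: pd.Series, flag: pd.Series, how: str) -> pd.Series:
    """Most recent min/max RSI of a run where `flag` is True, carried forward."""
    # Run id increments every time the flag switches value
    run_id = flag.astype(int).diff().ne(0).cumsum()
    run_extreme = rsi.groupby(run_id).transform(how)
    # Keep the extreme value only on flagged days where it occurs, then forward-fill
    return rsi.where(flag & (rsi == run_extreme)).ffill()


rsi = pd.Series([40, 28, 25, 29, 45, 50, 27, 35], dtype=float)
prev_low = prev_run_extreme(rsi, rsi <= 30, "min")
print(prev_low.tolist())   # [nan, nan, 25.0, 25.0, 25.0, 25.0, 27.0, 27.0]

rsi_hi = pd.Series([60, 72, 78, 71, 65, 75, 68], dtype=float)
prev_high = prev_run_extreme(rsi_hi, rsi_hi >= 70, "max")
print(prev_high.tolist())  # [nan, nan, 78.0, 78.0, 78.0, 75.0, 75.0]
```

Each step is a vectorised pandas operation, so there is no Python loop over runs. A ticker that never crosses the threshold simply gets an all-NaN column.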

In [14]:
def create_is_buy_df(compiled_df: pd.DataFrame) -> pd.DataFrame:
    df = pd.DataFrame(compiled_df["date"]).reset_index(drop=True)
    
    # Note: iterates over the global stock_sample defined in section IV
    for ticker in stock_sample:
        rsi = compiled_df[f"rsi_{ticker}"]
        vol = compiled_df[f"volume_{ticker}"]
        bollinger_width_norm = compiled_df[f"bollinger_width_norm_{ticker}"]
        prev_rsi_high = compiled_df[f"prev_rsi_high_{ticker}"]
        ma_20d = compiled_df[f"20d_MA_{ticker}"]
        ma_60d = compiled_df[f"60d_MA_{ticker}"]
        date = compiled_df["date"].dt.date

        # Quantiles are taken over the whole back-testing window, so these thresholds use future (look-ahead) information
        iqr_vol = np.quantile(vol, 0.75) - np.quantile(vol, 0.25)

        mask = (rsi <= prev_rsi_high - 40)                                        # RSI has fallen at least 40 points from its last high
        mask &= (ma_20d > ma_60d)                                                 # short-term trend above long-term trend
        mask &= (vol > np.quantile(vol, 0.75) + 1.5*iqr_vol)                      # unusually high volume (above Q3 + 1.5*IQR)
        mask &= (bollinger_width_norm < np.quantile(bollinger_width_norm, 0.75))  # bands not unusually wide
        mask &= (date <= (end_date - timedelta(days=max_days_held)))              # stop buying max_days_held days before end_date
        mask = mask.reset_index(drop=True)

        df[f"is_buy_{ticker}"] = mask
    
    return df.reset_index(drop=True)
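The volume filter in `create_is_buy_df` flags days whose volume exceeds the upper Tukey fence, Q3 + 1.5 × IQR. Because the quartiles are computed over the whole back-testing window, the filter uses information that wasn't available on the trade date.

The toy example below compares that full-sample fence with an expanding-window variant that only looks at earlier days. The variant and its `min_periods=5` warm-up are a suggested alternative, not part of the original strategy.

```python
import numpy as np
import pandas as pd

vol = pd.Series([100, 110, 90, 105, 95, 400], dtype=float)   # made-up daily volumes

# Full-sample fence, as in create_is_buy_df: Q3 + 1.5 * IQR over the whole series
q1, q3 = np.quantile(vol, 0.25), np.quantile(vol, 0.75)
full_sample_spike = vol > q3 + 1.5 * (q3 - q1)

# Look-ahead-free variant: each day's fence uses only earlier days,
# via an expanding window shifted by one (NaN fences compare as False)
past_q1 = vol.expanding(min_periods=5).quantile(0.25).shift(1)
past_q3 = vol.expanding(min_periods=5).quantile(0.75).shift(1)
expanding_spike = vol > past_q3 + 1.5 * (past_q3 - past_q1)
```

Here both versions flag only the last day. The full-sample fence is 127.5 and the expanding fence on that day is 120. On real data the two can diverge considerably, especially early in the back-test.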

In [15]:
def create_is_sell_df(compiled_df: pd.DataFrame) -> pd.DataFrame:
    df = pd.DataFrame(compiled_df["date"]).reset_index(drop=True)
    
    # Note: iterates over the global stock_sample defined in section IV
    for ticker in stock_sample:
        high = compiled_df[f"high_{ticker}"]
        rsi = compiled_df[f"rsi_{ticker}"]
        # ma_20d = compiled_df[f"20d_MA_{ticker}"]
        # ma_60d = compiled_df[f"60d_MA_{ticker}"]
        prev_rsi_low = compiled_df[f"prev_rsi_low_{ticker}"]
        # upper_bollinger = compiled_df[f"upper_bollinger_band_{ticker}"]
        vol = compiled_df[f"volume_{ticker}"]
        bollinger_width_norm = compiled_df[f"bollinger_width_norm_{ticker}"]
        # roc_20d_ma = compiled_df[f"roc_20dMA_{ticker}"]
        
        iqr_vol = np.quantile(vol, 0.75) - np.quantile(vol, 0.25)
        
        mask = (rsi >= prev_rsi_low + 40)    # RSI has risen at least 40 points above its previous low
        # mask &= (vol > np.quantile(vol, 0.75) + 1.5*iqr_vol)
        # mask &= (bollinger_width_norm >= np.quantile(bollinger_width_norm, 0.75))
        mask = mask.reset_index(drop=True)
        
        df[f"is_sell_{ticker}"] = mask
    
    return df.reset_index(drop=True)

# IV. Implement: the trading strategy

In [16]:
# Trial for one iteration
initial_cash_balance = 30000
# stock_sample = select_sample_for_backtesting(num_stocks_per_sector, tickers_per_sector)
stock_sample = [
    'EPAM', 'PAYX', 'GLW', 'SWKS',    # IT
    'HWM', 'SNA', 'FDX', 'ROK',       # ID
    'GS', 'SBNY', 'USB', 'BRK-B',     # F
    'MOH', 'BDX', 'CRL', 'ABC',       # HC
    'MHK', 'TSCO', 'TSLA',            # CD
    'GIS', 'SJM',                     # CS
    'VNO', 'SBAC',                    # RE
    'PNW', 'EIX',                     # U
    'AVY', 'IP',                      # M
    'FOXA', 'ATVI',                   # CM
    'WMB',                            # E
]

stock_to_beta_df = pd.read_parquet(common_path + "sp500/stock_to_beta.parquet").set_index("ticker")
compiled_df = create_compiled_df(sp500, stock_sample, common_path)
compiled_df = find_prev_rsi_low(compiled_df)
compiled_df = find_prev_rsi_high(compiled_df)
is_buy_df = create_is_buy_df(compiled_df)
is_sell_df = create_is_sell_df(compiled_df)
ten_yr_yield = get_10y_treasury_yield_data(sp500, common_path+"10-year-treasury-yield.csv")

b = Backtesting(initial_cash_balance, end_date, compiled_df, ten_yr_yield, stock_to_beta_df, sp500_prices)

transactions_df, capm_df = b.implement_trading_strategy(
    is_buy_df=is_buy_df, 
    is_sell_df=is_sell_df, 
    order_buy_trades_by="rsi",
    max_days_held=max_days_held
)

In [17]:
transactions_df.tail(30)

| transaction_id | date | stock | action (1 = buy, -1 = sell) | price | num_shares | cash_balance |
|---|---|---|---|---|---|---|
| 55.0 | 2021-05-27 | AVY | 1.0 | 215.790176 | 6.92 | 25383.854434 |
| 56.0 | 2021-06-01 | MHK | 1.0 | 204.649994 | 6.2 | 24114.664434 |
| 57.0 | 2021-06-07 | CRL | -1.0 | 340.23999 | 4.66 | 25700.182788 |
| 58.0 | 2021-06-07 | BDX | -1.0 | 237.121429 | 6.26 | 27184.562937 |
| 59.0 | 2021-06-18 | PNW | 1.0 | 78.279495 | 16.21 | 25915.372937 |
| 60.0 | 2021-06-18 | GS | 1.0 | 343.297241 | 3.7 | 24646.182937 |
| 61.0 | 2021-06-18 | BRK-B | 1.0 | 274.040009 | 4.63 | 23376.992937 |
| 62.0 | 2021-06-18 | USB | 1.0 | 53.488453 | 23.73 | 22107.802937 |
| 63.0 | 2021-06-28 | AVY | -1.0 | 207.30983 | 6.92 | 23542.386958 |
| 64.0 | 2021-07-01 | MHK | -1.0 | 197.300003 | 6.2 | 24765.646977 |


In [18]:
capm_df.reached_holding_period_limit.value_counts()

True     30
False    13
Name: reached_holding_period_limit, dtype: int64
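The problem statement proposes the number of forced sells as an evaluation metric. In the `capm_df` output, the row with `sell_transaction_id` 0.0 has no stock and zero returns, so it appears to be a placeholder rather than a trade. If so, it is one of the 13 `False` values above, and 30 of the 42 actual sells (about 71%) hit the holding-period limit.

The minimal sketch below excludes such rows, assuming the placeholder's `stock` value is missing (NaN/None). `forced_sell_rate` is a hypothetical helper, and the demo frame is made up.

```python
import pandas as pd


def forced_sell_rate(capm_df: pd.DataFrame) -> float:
    """Share of closed positions that hit the holding-period limit (hypothetical helper)."""
    # Drop placeholder rows that have no stock attached
    trades = capm_df.dropna(subset=["stock"])
    return float(trades["reached_holding_period_limit"].mean())


# Made-up frame with the same two columns as capm_df, including a placeholder row
demo_capm = pd.DataFrame({
    "stock": [None, "MHK", "WMB", "ROK"],
    "reached_holding_period_limit": [False, True, True, False],
})
print(forced_sell_rate(demo_capm))
```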

In [19]:
capm_df.tail(50)

| sell_transaction_id | stock | buy_date | buy_price | sell_date | sell_price | r_m | r_f | E_r | R_i | risk_adjusted_ri | reached_holding_period_limit |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | | | 0.0 | | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False |
| 4.0 | MHK | 2019-07-26 | 128.839996 | 2019-08-26 | 113.580002 | -0.049968 | 0.001469 | -0.065056 | -0.126064 | -0.061008 | True |
| 5.0 | WMB | 2019-07-26 | 21.038267 | 2019-08-26 | 19.19648 | -0.049968 | 0.001469 | -0.052343 | -0.091616 | -0.039273 | True |
| 6.0 | EPAM | 2019-08-08 | 189.979996 | 2019-09-09 | 193.479996 | 0.013637 | 0.001376 | 0.018205 | 0.018255 | 5e-05 | True |
| 9.0 | SBAC | 2019-09-10 | 238.160919 | 2019-10-11 | 238.536316 | -0.003066 | 0.001441 | -0.002003 | 0.001575 | 0.003578 | True |
| 10.0 | ROK | 2019-10-02 | 149.074127 | 2019-10-22 | 163.567291 | 0.036846 | 0.000919 | 0.040292 | 0.092781 | 0.052489 | False |
| 13.0 | AVY | 2019-11-26 | 125.575119 | 2019-12-20 | 127.281906 | 0.025372 | 0.001205 | 0.026393 | 0.0135 | -0.012893 | False |
| 14.0 | USB | 2019-12-18 | 55.351273 | 2020-01-21 | 50.483391 | 0.039825 | 0.001734 | 0.047643 | -0.092055 | -0.139698 | True |
| 21.0 | HWM | 2020-01-21 | 29.324753 | 2020-02-11 | 31.958494 | 0.011068 | 0.00094 | 0.015462 | 0.086006 | 0.070544 | False |
| 22.0 | MOH | 2020-01-31 | 122.970001 | 2020-02-14 | 144.990005 | 0.046829 | 0.000611 | 0.04584 | 0.164724 | 0.118885 | False |
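The column names suggest `capm_df` compares each trade with the CAPM expected return, E_r = r_f + β(r_m - r_f). Here β presumably comes from `stock_to_beta_df`. The numbers in the MHK row are consistent with two readings:

- `R_i` is the log return between buy and sell price.
- `risk_adjusted_ri` is `R_i - E_r`.

The check below tests those readings against that row. β is not printed in the notebook, so it is backed out from the table here.

```python
import math


def capm_expected_return(r_f: float, r_m: float, beta: float) -> float:
    """CAPM expected return: r_f + beta * (r_m - r_f)."""
    return r_f + beta * (r_m - r_f)


def log_return(buy_price: float, sell_price: float) -> float:
    return math.log(sell_price / buy_price)


# Values copied from the MHK row of capm_df (sell_transaction_id 4.0)
r_m, r_f, E_r = -0.049968, 0.001469, -0.065056
R_i = log_return(128.839996, 113.580002)     # about -0.126064, matching the R_i column
risk_adjusted_ri = R_i - E_r                  # about -0.061008, matching risk_adjusted_ri
implied_beta = (E_r - r_f) / (r_m - r_f)      # about 1.29
```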


In [None]:
# for iteration in range(0, 200):
#     initial_cash_balance = 30000
#     stock_sample = select_sample_for_backtesting(num_stocks_per_sector, tickers_per_sector)
#     stock_to_beta_df = pd.read_parquet(common_path + "sp500/stock_to_beta.parquet").set_index("ticker")
#     compiled_df = create_compiled_df(sp500, stock_sample, common_path)
#     compiled_df = find_prev_rsi_low(compiled_df)
#     compiled_df = find_prev_rsi_high(compiled_df)
#     is_buy_df = create_is_buy_df(compiled_df)
#     is_sell_df = create_is_sell_df(compiled_df)
#     ten_yr_yield = get_10y_treasury_yield_data(sp500, common_path + "10-year-treasury-yield.csv")
#
#     b = Backtesting(initial_cash_balance, end_date, compiled_df, ten_yr_yield, stock_to_beta_df, sp500_prices)
#     transactions_df, capm_df = b.implement_trading_strategy(
#         is_buy_df=is_buy_df,
#         is_sell_df=is_sell_df,
#         order_buy_trades_by="rsi",
#         max_days_held=max_days_held
#     )
#
#     results_path = "data/trading_strategy_rsi/"
#     path_transactions_df = results_path + f"transactions/transactions_{iteration}.parquet"
#     path_capm_df = results_path + f"capm/capm_{iteration}.parquet"
#
#     # write_df_to_local_directory is not defined or imported in this notebook
#     write_df_to_local_directory(path_transactions_df, transactions_df)
#     write_df_to_local_directory(path_capm_df, capm_df)

# V. Analyse: the results

# VI. Conclusion