# 0. Problem statement

As a **quantitative analyst**, I want to improve on the trading strategy from the previous iteration. That strategy failed for two main reasons: (1) the stock was sold too early after being bought at a good time, and (2) RSI levels alone did not capture changes in the support and resistance levels of our stocks. This time, the selling decision is more complex:
- Buy the stock when the RSI oscillator hits 30.
- Sell the stock when all of the following hold:
  - The RSI oscillator hits 70.
  - The 20d MA > 60d MA.
  - The 20d MA has a positive slope.
  - The stock price's high for the day is higher than the upper Bollinger band.

After a massive dip in a stock (enough to bring its RSI below 30), the stock may rise consistently over a few days, making up for those losses and bringing the RSI to 70. The point at which the stock reaches RSI = 70 may not be the highest price of its rally: the stock can go higher, and our strategy aims to capitalise on that additional upside.

The 20d MA is a lagging indicator of a decline in stock prices: after a consistent increase, its slope reaches zero only after the stock has begun to decline. So if, while RSI > 70, the 20d MA still has a positive slope and the day's high exceeds the upper Bollinger band, the stock has likely reached its peak before a decline.

The one exception to the MA analysis above is when the 20d MA < 60d MA. In this case, a positive slope in the 20d MA suggests that the stock will soon indicate a buy signal, since the 20d MA would be expected to cross the 60d MA from below.
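
The rules above can be written compactly. This is a sketch based on how the signals are implemented in section III, where the "positive slope" condition is a threshold of 0.0015 on a 5-day log rate of change of the 20d MA, and the upper Bollinger band is the 20d MA plus two 20-day standard deviations. The symbols ($H_t$ for the daily high, $\sigma^{20}_t$ for the 20-day rolling standard deviation of the closing price) are notation introduced here, not the author's.

$$
\text{buy}_t = \left[\text{RSI}_t \le 30\right] \land \left[\text{RSI}_{t-1} > 30\right]
$$

$$
\text{sell}_t = \left[\text{RSI}_t \ge 70\right] \land \left[\text{MA}^{20}_t > \text{MA}^{60}_t\right] \land \left[H_t \ge \text{MA}^{20}_t + 2\,\sigma^{20}_t\right] \land \left[\frac{1}{5}\ln\frac{\text{MA}^{20}_t}{\text{MA}^{20}_{t-5}} \ge 0.0015\right]
$$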

# I. Imports

In [1]:
cd ../

/home/murali/personal_projects/stock-price-forecasts


In [2]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.pyplot import figure
import seaborn as sns
import pandas_datareader.data as reader
import datetime as dt
from datetime import timedelta
from typing import Iterable, Union
import itertools
from plotly import graph_objects as go
from plotly.subplots import make_subplots
from plotly import express as px
import logging
import scipy

In [3]:
from src.backtesting import Backtesting
from src.utils import get_10y_treasury_yield_data, select_sample_for_backtesting

# II. Initialising key variables

In [4]:
common_path = "eda/data/"
earliest_data_fetch_date = dt.datetime(2019, 3, 1).date()
start_date = dt.datetime(2019, 4, 15).date()
end_date = dt.datetime(2022, 3, 30).date()
index_ticker = "^GSPC"    # S&P 500

In [5]:
sp500_full = pd.read_csv(common_path + "sp_500_stocks_processed.csv")

In [6]:
sp500 = pd.read_parquet(common_path + "sp500/sp500.parquet")
sp500_prices = sp500.set_index("date")["closing_price"]

In [7]:
num_stocks_per_sector = {
    "IT": 4,
    "ID": 4,
    "F": 4,
    "HC": 4,
    "CD": 3,
    "CS": 2,
    "RE": 2,
    "U": 2,
    "M": 2,
    "CM": 2,
    "E": 1
}

In [8]:
tickers_per_sector = {}

for sector in num_stocks_per_sector.keys():
    mask = sp500_full["sector"] == sector
    tickers_per_sector[sector] = np.array(sorted(sp500_full.loc[mask]["Symbol"]))
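
The sector counts above sum to 30 stocks per sample. `select_sample_for_backtesting` is defined in `src.utils` and is not shown in this notebook, so the following is only a hypothetical sketch of how a sector-stratified sample could be drawn. The function name `sample_tickers_by_sector`, the `seed` parameter, and the toy tickers are all assumptions made for illustration.

```python
import numpy as np


def sample_tickers_by_sector(num_per_sector, tickers_by_sector, seed=None):
    """Hypothetical stand-in for select_sample_for_backtesting (not the author's code)."""
    rng = np.random.default_rng(seed)
    sample = []
    for sector, n in num_per_sector.items():
        # Draw n distinct tickers from this sector, without replacement
        chosen = rng.choice(tickers_by_sector[sector], size=n, replace=False)
        sample.extend(chosen.tolist())
    return sample


# Toy inputs with made-up tickers, for illustration only
toy_counts = {"IT": 2, "E": 1}
toy_tickers = {
    "IT": np.array(["AAA", "BBB", "CCC"]),
    "E": np.array(["XXX", "YYY"]),
}
toy_sample = sample_tickers_by_sector(toy_counts, toy_tickers, seed=0)
print(toy_sample)
```

Sampling without replacement within each sector keeps the sector weights fixed across iterations while varying the individual stocks.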

# III. Create: the input datasets for implementing the trading strategy

In [9]:
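def calculate_roc_of_ma(ma_series: pd.Series, n: int = 5) -> pd.Series:
    """Average daily log rate of change of a moving average over the last n rows."""
    upper = ma_series
    lower = ma_series.shift(n)
    # Log ratio of the current value to the value n rows earlier
    diff = np.log(upper / lower)

    return diff / n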
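
To get a feel for the units of this rate of change, here is a small check on synthetic data. It restates the function so the block runs on its own; the series `toy_ma` is made up and grows at a constant log rate of 0.002 per row, so the computed rate of change should be approximately 0.002 wherever it is defined.

```python
import numpy as np
import pandas as pd


def calculate_roc_of_ma(ma_series: pd.Series, n: int = 5) -> pd.Series:
    # Same calculation as in the notebook cell above
    return np.log(ma_series / ma_series.shift(n)) / n


# Synthetic "moving average" growing at a constant log rate of 0.002 per row
t = np.arange(30)
toy_ma = pd.Series(100 * np.exp(0.002 * t))

toy_roc = calculate_roc_of_ma(toy_ma)
print(toy_roc.tail())
```

The first `n` values are `NaN` because there is no earlier value to compare against. On this scale, the 0.0015 threshold used in the sell rule corresponds to the 20d MA growing by roughly 0.15% per trading day.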

In [10]:
def create_compiled_df(
    df_with_date: pd.DataFrame, stock_sample: Iterable, common_path: str) -> pd.DataFrame:
    
    df = pd.DataFrame(df_with_date["date"]).reset_index(drop=True)
    
    for ticker in stock_sample:
        path = common_path + f"sp500/{ticker}.parquet"
        ticker_df = pd.read_parquet(path, columns=["closing_price", "RSI", "daily_return", "high"]).reset_index(drop=True)
        
        closing_price = ticker_df["closing_price"].rename(f"closing_price_{ticker}")
        rsi = ticker_df["RSI"].rename(f"rsi_{ticker}")
        high = ticker_df["high"].rename(f"high_{ticker}")
        ma_20d = ticker_df["closing_price"].rolling(20).mean().rename(f"20d_MA_{ticker}")
        sd_20d = ticker_df["closing_price"].rolling(20).std()
        ma_60d = ticker_df["closing_price"].rolling(60).mean().rename(f"60d_MA_{ticker}")
        upper_bollinger = (ma_20d + 2*sd_20d).rename(f"upper_bollinger_band_{ticker}")
        roc_20d_ma = calculate_roc_of_ma(ma_20d).rename(f"roc_20dMA_{ticker}")
        
        df = pd.concat([df, closing_price, rsi, high, ma_20d, ma_60d, upper_bollinger, roc_20d_ma], axis=1)
        
    df = df.dropna().reset_index(drop=True)

    df["month"] = df["date"].dt.month
    df["year"] = df["date"].dt.year
    
    eom_indices = df.reset_index().groupby(["month", "year"]).nth(-1)["index"]
    df["is_eom"] = df.index.isin(eom_indices)

    return df
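
The end-of-month flag marks the last row available for each (month, year) pair. As an illustration only, an equivalent flag can be derived on a toy frame by comparing each date with the latest date in its month. The toy dates are made up, and this variant assumes the rows are unique per date.

```python
import pandas as pd

# Toy trading calendar, for illustration only
toy = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-04-26", "2019-04-29", "2019-04-30",
        "2019-05-01", "2019-05-30", "2019-05-31",
    ])
})

# Group by calendar month and flag the row holding the latest date in each group
month_period = toy["date"].dt.to_period("M")
toy["is_eom"] = toy["date"] == toy.groupby(month_period)["date"].transform("max")
print(toy)
```

Note that "end of month" here means the last trading day present in the data, not the last calendar day.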

In [14]:
def create_is_buy_df(compiled_df: pd.DataFrame) -> pd.DataFrame:
    """Flag, per ticker, the days on which RSI crosses down to 30 or below.

    Relies on the notebook-level `stock_sample` and `end_date` variables.
    """
    df = pd.DataFrame(compiled_df["date"]).reset_index(drop=True)
    date = compiled_df["date"].dt.date

    for ticker in stock_sample:
        rsi = compiled_df[f"rsi_{ticker}"]
        # Buy only on the day RSI moves from above 30 to 30 or below
        mask = (rsi <= 30)
        mask &= (rsi.shift() > 30)
        # No new buys in the last 60 days of the backtesting window
        mask &= (date <= (end_date - timedelta(days=60)))
        mask = mask.reset_index(drop=True)

        df[f"is_buy_{ticker}"] = mask

    return df.reset_index(drop=True)
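
The buy mask fires only on the day RSI crosses the threshold, not on every day it stays below it. A minimal illustration on a made-up RSI series (the values are not from the dataset):

```python
import pandas as pd

# Made-up RSI values, for illustration only
toy_rsi = pd.Series([35.0, 29.0, 28.0, 31.0, 30.0])

# True only where RSI is at or below 30 today and was above 30 on the previous row
toy_is_buy = (toy_rsi <= 30) & (toy_rsi.shift() > 30)
print(toy_is_buy.tolist())  # [False, True, False, False, True]
```

The third value (28) is below 30 but is not flagged, because the previous value was already below 30. This avoids repeated buy signals during a prolonged oversold period.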

In [15]:
def create_is_sell_df(compiled_df: pd.DataFrame) -> pd.DataFrame:
    """Flag, per ticker, the days on which all four sell conditions hold.

    Relies on the notebook-level `stock_sample` variable.
    """
    df = pd.DataFrame(compiled_df["date"]).reset_index(drop=True)

    for ticker in stock_sample:
        high = compiled_df[f"high_{ticker}"]
        rsi = compiled_df[f"rsi_{ticker}"]
        ma_20d = compiled_df[f"20d_MA_{ticker}"]
        ma_60d = compiled_df[f"60d_MA_{ticker}"]
        upper_bollinger = compiled_df[f"upper_bollinger_band_{ticker}"]
        roc_20d_ma = compiled_df[f"roc_20dMA_{ticker}"]

        mask = (rsi >= 70)                   # overbought
        mask &= (ma_20d > ma_60d)            # short-term MA above long-term MA
        mask &= (high >= upper_bollinger)    # daily high reaches the upper band
        mask &= (roc_20d_ma >= 0.0015)       # 20d MA still rising
        mask = mask.reset_index(drop=True)

        df[f"is_sell_{ticker}"] = mask

    return df.reset_index(drop=True)
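
Because the four sell conditions are combined with a logical AND, failing any single one suppresses the signal. The toy frame below uses made-up values in which each of rows 1 to 4 violates exactly one condition, so only row 0 should be flagged.

```python
import pandas as pd

# Made-up indicator values; each of rows 1-4 fails exactly one condition
toy = pd.DataFrame({
    "rsi":             [72, 72, 72, 72, 65],
    "ma_20d":          [105, 95, 105, 105, 105],
    "ma_60d":          [100, 100, 100, 100, 100],
    "high":            [112, 112, 108, 112, 112],
    "upper_bollinger": [110, 110, 110, 110, 110],
    "roc_20d_ma":      [0.002, 0.002, 0.002, 0.001, 0.002],
})

toy_is_sell = (
    (toy["rsi"] >= 70)
    & (toy["ma_20d"] > toy["ma_60d"])
    & (toy["high"] >= toy["upper_bollinger"])
    & (toy["roc_20d_ma"] >= 0.0015)
)
print(toy_is_sell.tolist())  # [True, False, False, False, False]
```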

# IV. Implement: the trading strategy

In [16]:
# Trial for one iteration
initial_cash_balance = 30000
stock_sample = select_sample_for_backtesting(num_stocks_per_sector, tickers_per_sector)
stock_to_beta_df = pd.read_parquet(common_path + "sp500/stock_to_beta.parquet").set_index("ticker")
compiled_df = create_compiled_df(sp500, stock_sample, common_path)
is_buy_df = create_is_buy_df(compiled_df)
is_sell_df = create_is_sell_df(compiled_df)
ten_yr_yield = get_10y_treasury_yield_data(sp500, common_path+"10-year-treasury-yield.csv")

b = Backtesting(initial_cash_balance, end_date, compiled_df, ten_yr_yield, stock_to_beta_df, sp500_prices)

transactions_df, capm_df = b.implement_trading_strategy(
    is_buy_df=is_buy_df, 
    is_sell_df=is_sell_df, 
    order_buy_trades_by="rsi",
    max_days_held=120
)

In [17]:
transactions_df

transaction_id,date,stock,action,price,num_shares,cash_balance
0.0,2019-07-11,,0.0,0.000000,0.00,30000.000000
1.0,2019-07-17,APA,1.0,23.346004,64.25,28500.000000
2.0,2019-07-19,IEX,1.0,159.993362,9.38,27000.000000
3.0,2019-07-19,RCL,1.0,107.652290,13.93,25500.000000
4.0,2019-07-19,A,1.0,67.492493,22.22,24000.000000
...,...,...,...,...,...,...
356.0,2022-03-30,SHW,-1.0,253.009995,3.82,34670.823407
357.0,2022-03-30,GRMN,-1.0,120.290001,8.97,35749.824715
358.0,2022-03-30,BRO,-1.0,72.709999,18.23,37075.327998
359.0,2022-03-30,WDC,-1.0,50.619999,20.35,38105.444977


In [18]:
capm_df.tail(50)

sell_transaction_id,stock,buy_date,buy_price,sell_date,sell_price,r_m,r_f,E_r,R_i,risk_adjusted_ri
274.0,ALLE,2021-06-18,133.935333,2021-08-30,142.320587,0.08339,0.002667,0.085716,0.060725,-0.024991
276.0,DIS,2021-06-17,174.649994,2021-09-08,185.149994,0.066923,0.003038,0.067978,0.058382,-0.009595
283.0,RCL,2021-06-17,86.410004,2021-09-24,90.690002,0.053859,0.003627,0.092029,0.048344,-0.043685
287.0,MTB,2021-06-18,139.774841,2021-09-29,150.753738,0.045284,0.003788,0.053632,0.075615,0.021983
290.0,ANTM,2021-06-16,376.243561,2021-10-15,391.952393,0.056983,0.004563,0.061835,0.040904,-0.020931
291.0,MMM,2021-06-17,190.259201,2021-10-18,178.527344,0.060788,0.004639,0.048405,-0.063646,-0.112051
292.0,FCX,2021-06-17,34.944633,2021-10-18,38.475983,0.060788,0.004639,0.093211,0.096269,0.003058
293.0,IEX,2021-06-18,210.213135,2021-10-18,212.872253,0.074,0.004596,0.066238,0.01257,-0.053667
294.0,UHS,2021-06-18,146.078278,2021-10-18,132.148651,0.074,0.004596,0.089367,-0.100215,-0.189582
296.0,HRL,2021-06-23,46.670654,2021-10-22,41.789383,0.069009,0.00459,0.027256,-0.110473,-0.137729


In [19]:
# Average holding period per completed trade
diff = capm_df["sell_date"] - capm_df["buy_date"]
diff.mean()

Timedelta('80 days 22:16:00')

In [20]:
capm_df.risk_adjusted_ri.mean()

-0.002913314370200961
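
The columns of `capm_df` can be sanity-checked by hand. Using the ALLE row shown above, `R_i` appears to be the log return between the buy and sell prices, and `risk_adjusted_ri` equals `R_i - E_r`. How `Backtesting` computes `E_r` is not shown in this notebook; the sketch below assumes the standard CAPM relation `E_r = r_f + beta * (r_m - r_f)` purely to back out an implied beta, which is an assumption and not a confirmed detail of the implementation.

```python
import math

# Values copied from the ALLE row of capm_df above
buy_price, sell_price = 133.935333, 142.320587
r_m, r_f, e_r = 0.08339, 0.002667, 0.085716

# Realised log return over the holding period
r_i = math.log(sell_price / buy_price)

# Excess of the realised return over the CAPM expected return
risk_adjusted_ri = r_i - e_r

# Assumption: E_r = r_f + beta * (r_m - r_f), rearranged for beta
implied_beta = (e_r - r_f) / (r_m - r_f)

print(round(r_i, 6), round(risk_adjusted_ri, 6), round(implied_beta, 3))
```

A negative `risk_adjusted_ri` means the trade returned less than CAPM would predict for a stock with that beta over the same period, even when the trade itself was profitable.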

In [None]:
# for iteration in range(0, 200):
#     initial_cash_balance = 30000
#     stock_sample = select_sample_for_backtesting(num_stocks_per_sector, tickers_per_sector)
#     stock_to_beta_df = pd.read_parquet(common_path + "stock_to_beta.parquet").set_index("ticker")
#     compiled_df = create_compiled_df(sp500, stock_sample, common_path)
#     is_buy_df = create_is_buy_df(sp500, compiled_df)
#     is_sell_df = create_is_sell_df(sp500, compiled_df)
#     ten_yr_yield = get_10y_treasury_yield_data(sp500, start_date)
#     transactions_df, capm_df = implement_trading_strategy(initial_cash_balance, stock_sample, compiled_df, is_buy_df, is_sell_df, stock_to_beta_df, sp500_prices, ten_yr_yield)
    
#     results_path = "data/trading_strategy_rsi/"
#     path_transactions_df = results_path + f"transactions/transactions_{iteration}.parquet"
#     path_capm_df = results_path + f"capm/capm_{iteration}.parquet"
    
#     write_df_to_local_directory(path_transactions_df, transactions_df)
#     write_df_to_local_directory(path_capm_df, capm_df)

# V. Analyse: the results

# VI. Conclusion