In [1]:
from statsmodels.regression.rolling import RollingOLS # estimates relationship b/w dep & indep var using min square diff over a rolling window
import pandas_datareader.data as web # takes web info and stores as dataframe
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np # calculations
import datetime as dt
import yfinance as yf # stock data
import statsmodels.api as sm # stat models
import pandas_ta # technical indicators calculator
import warnings

warnings.filterwarnings('ignore') # annoying :(

# collect sp500 stock data over the past decade, leaving a time gap for testing
sp500 = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')[0]

sp500['Symbol'] = sp500['Symbol'].str.replace('.', '-')
symbols_list = sp500['Symbol'].unique().tolist()

In [5]:
end_date = '2024-02-01'
start_date = pd.to_datetime(end_date) - pd.DateOffset(years=1)

df = yf.download(tickers=symbols_list, start=start_date, end=end_date)
df = df.stack() # multi indexed

df.index.names = ['date', 'ticker']
df.columns = df.columns.str.lower()

[*********************100%%**********************]  503 of 503 completed

3 Failed downloads:
['SW', 'GEV', 'SOLV']: YFChartError("%ticker%: Data doesn't exist for startDate = 1675227600, endDate = 1706763600")


Price               adj close       close        high         low        open  \
date       ticker                                                               
2023-02-01 A       153.792938  155.449997  156.289993  151.720001  153.309998   
           AAL      16.690001   16.690001   16.719999   15.920000   16.040001   
           AAPL    144.241684  145.429993  146.610001  141.320007  143.970001   
           ABBV    138.326477  146.600006  147.440002  145.250000  146.630005   
           ABNB    113.989998  113.989998  114.889999  109.830002  111.110001   
...                       ...         ...         ...         ...         ...   
2024-01-31 XYL     112.440002  112.440002  114.300003  112.180000  113.919998   
           YUM     128.215134  129.490005  131.979996  129.259995  131.449997   
           ZBH     125.086235  125.599998  127.449997  124.019997  124.019997   
           ZBRA    239.550003  239.550003  250.000000  238.479996  250.000000   
           ZTS     186.83161

In [6]:
df['garman-klass_vol'] = ((np.log(df['high'])-np.log(df['low']))**2)/2 - (2*np.log(2) - 1)*((np.log(df['adj close'])) - np.log(df['open'])**2)
df['rsi'] = df.groupby(level=1)['adj close'].transform(lambda x : pandas_ta.rsi(close=x, length=20))

In [14]:
# df['bb_low'] = df.groupby(level=1)['adj close'].transform(lambda x : pandas_ta.bbands(close=np.log1p(x), length=20).iloc[:,0])
# df['bb_mid'] = df.groupby(level=1)['adj close'].transform(lambda x : pandas_ta.bbands(close=np.log1p(x), length=20).iloc[:,1])
# df['bb_high'] = df.groupby(level=1)['adj close'].transform(lambda x : pandas_ta.bbands(close=np.log1p(x), length=20).iloc[:,2])

# def compute_atr(stock_data):
#     atr = pandas_ta.atr(high=stock_data['high'], low=stock_data['low'], close=stock_data['close'], length=14)
#     return atr.sub(atr.mean()).div(atr.std()) # normalize it

# df['atr'] = df.groupby(level=1, group_keys=False).apply(compute_atr)

# def compute_macd(close):
#     macd = pandas_ta.macd(close=close, length=20).iloc[:,0]
#     return macd.sub(macd.mean()).div(macd.std()) # normalize it

# df['macd'] = df.groupby(level=1, group_keys=False)['adj close'].apply(compute_macd)

df['dollar_vol'] = (df['adj close']*df['volume'])/1e6 # divide by a million

df

Unnamed: 0_level_0,Price,adj close,close,high,low,open,volume,garman-klass_vol,rsi,bb_low,bb_mid,bb_high,atr,macd,dollar_vol
date,ticker,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
2023-02-01,A,153.792938,155.449997,156.289993,151.720001,153.309998,1499900.0,7.838379,,,,,,,230.674028
2023-02-01,AAL,16.690001,16.690001,16.719999,15.920000,16.040001,26925300.0,1.888748,,,,,,,449.383271
2023-02-01,AAPL,144.241684,145.429993,146.610001,141.320007,143.970001,77663600.0,7.620518,,,,,,,11202.328446
2023-02-01,ABBV,138.326477,146.600006,147.440002,145.250000,146.630005,5439900.0,7.706551,,,,,,,752.482203
2023-02-01,ABNB,113.989998,113.989998,114.889999,109.830002,111.110001,4053600.0,6.742969,,,,,,,462.069855
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2024-01-31,XYL,112.440002,112.440002,114.300003,112.180000,113.919998,1304800.0,6.838554,56.463860,4.712249,4.729261,4.746273,-1.290488,0.414453,146.711715
2024-01-31,YUM,128.215134,129.490005,131.979996,129.259995,131.449997,2154200.0,7.319447,51.308300,4.846518,4.863125,4.879731,-0.548845,0.323421,276.201041
2024-01-31,ZBH,125.086235,125.599998,127.449997,124.019997,124.019997,2460100.0,7.111150,65.258886,4.788301,4.809670,4.831039,-0.387016,0.557874,307.724647
2024-01-31,ZBRA,239.550003,239.550003,250.000000,238.479996,250.000000,482600.0,9.661473,43.835842,5.488361,5.532482,5.576602,-0.332214,0.214992,115.606831
