## Unsupervised Learning Trading

- Load and preprocess data
- Feature extraction per item
- Aggregate and filter for top items
- Calculate returns per month
- Load Fama-French factors and calculate rolling factor betas
- K-Means clustering per month, then Efficient Frontier optimization for maximum Sharpe ratio
- Visualize data

In [7]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
import pandas_datareader.data as web
import datetime as dt
import yfinance as yf
import sklearn as sk
import pandas_ta
import warnings
warnings.filterwarnings("ignore")
from statsmodels.regression.rolling import RollingOLS
from pypfopt.efficient_frontier import EfficientFrontier

In [12]:
sp500 = pd.read_html("https://en.wikipedia.org/wiki/List_of_S%26P_500_companies")[0]
sp500

| | Symbol | Security | GICS Sector | GICS Sub-Industry | Headquarters Location | Date added | CIK | Founded |
|---|---|---|---|---|---|---|---|---|
| 0 | MMM | 3M | Industrials | Industrial Conglomerates | Saint Paul, Minnesota | 1957-03-04 | 66740 | 1902 |
| 1 | AOS | A. O. Smith | Industrials | Building Products | Milwaukee, Wisconsin | 2017-07-26 | 91142 | 1916 |
| 2 | ABT | Abbott Laboratories | Health Care | Health Care Equipment | North Chicago, Illinois | 1957-03-04 | 1800 | 1888 |
| 3 | ABBV | AbbVie | Health Care | Biotechnology | North Chicago, Illinois | 2012-12-31 | 1551152 | 2013 (1888) |
| 4 | ACN | Accenture | Information Technology | IT Consulting & Other Services | Dublin, Ireland | 2011-07-06 | 1467373 | 1989 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 498 | XYL | Xylem Inc. | Industrials | Industrial Machinery & Supplies & Components | White Plains, New York | 2011-11-01 | 1524472 | 2011 |
| 499 | YUM | Yum! Brands | Consumer Discretionary | Restaurants | Louisville, Kentucky | 1997-10-06 | 1041061 | 1997 |
| 500 | ZBRA | Zebra Technologies | Information Technology | Electronic Equipment & Instruments | Lincolnshire, Illinois | 2019-12-23 | 877212 | 1969 |
| 501 | ZBH | Zimmer Biomet | Health Care | Health Care Equipment | Warsaw, Indiana | 2001-08-07 | 1136869 | 1927 |


In [14]:
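# Replace dots with dashes (e.g. BRK.B -> BRK-B) to match Yahoo Finance tickers;
# regex=False keeps '.' a literal character on every pandas version
sp500['Symbol'] = sp500['Symbol'].str.replace('.', '-', regex=False)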
symb_list = sp500['Symbol'].unique().tolist()
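
The ticker clean-up above can be checked on a small sample. This is a minimal sketch on a hypothetical three-symbol Series (`sample` is not part of the notebook): with `regex=False` the dot is replaced literally, whereas with a regular expression a bare `.` would match every character.

```python
import pandas as pd

# Hypothetical sample of symbols as they appear on the Wikipedia table.
sample = pd.Series(['BRK.B', 'BF.B', 'MMM'])

# Literal replacement: only the '.' characters are changed.
literal = sample.str.replace('.', '-', regex=False)

# Regex replacement: '.' matches any character, so every character is replaced.
as_regex = sample.str.replace('.', '-', regex=True)

print(literal.tolist())
print(as_regex.tolist())
```

Passing `regex=False` explicitly avoids depending on the default, which has differed between pandas releases.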

In [21]:
## Date range: fixed end date; start date is 365*8 days (roughly 8 years) earlier
end_date = '2025-06-27'
start_date = pd.to_datetime(end_date)-pd.DateOffset(365*8)
start_date, end_date

(Timestamp('2017-06-29 00:00:00'), '2025-06-27')
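
The start date printed above is two days later than the calendar anniversary because a fixed count of `365*8` days ignores the two leap days in the window. A small sketch of the difference, assuming one wants to compare the fixed-day offset with a calendar-year offset (`start_fixed_days` and `start_calendar` are illustrative names, not from the notebook):

```python
import pandas as pd

end_date = '2025-06-27'
end_ts = pd.to_datetime(end_date)

# Fixed number of days: 8 * 365 = 2920 days back (skips the 2020 and 2024 leap days).
start_fixed_days = end_ts - pd.Timedelta(days=365 * 8)

# Calendar offset: exactly 8 calendar years back.
start_calendar = end_ts - pd.DateOffset(years=8)

print(start_fixed_days.date(), start_calendar.date())
```

Either choice works for this pipeline; the calendar offset is simply easier to reason about when the window must line up with year boundaries.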

In [30]:
df = yf.download(tickers=symb_list, 
                 start=start_date, 
                 end=end_date).stack()

[*********************100%***********************]  503 of 503 completed


In [33]:
### stack() above moved the tickers from the columns into the row index,
### giving a (date, ticker) MultiIndex; name both levels and lowercase the column names
df.index.names = ['date', 'ticker']
df.columns = df.columns.str.lower()
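
To see what the reshaping does without downloading anything, here is a minimal sketch on a synthetic frame. It assumes the downloaded data has two column levels (price field, then ticker), which is what the `.stack()` call relies on; the tickers `AAA` and `BBB` and all values are made up.

```python
import numpy as np
import pandas as pd

# Synthetic wide frame: one row per date, columns are (price field, ticker) pairs.
dates = pd.date_range('2024-01-02', periods=3, freq='D')
columns = pd.MultiIndex.from_product(
    [['Close', 'Volume'], ['AAA', 'BBB']], names=['Price', 'Ticker']
)
wide = pd.DataFrame(
    np.arange(12, dtype=float).reshape(3, 4), index=dates, columns=columns
)

# stack() moves the innermost column level (the ticker) into the row index.
long_df = wide.stack()
long_df.index.names = ['date', 'ticker']
long_df.columns = long_df.columns.str.lower()

print(long_df)
```

The long layout has one row per (date, ticker) pair, which makes per-ticker operations a matter of `groupby(level='ticker')`.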

### Calculate features and technical indicators
- Garman-Klass Volatility
- RSI
- Bollinger Bands
- ATR
- MACD
- Dollar Volume
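
Two of the listed features can be computed with plain NumPy and pandas. The sketch below uses the standard per-bar Garman-Klass variance estimator, $0.5\,(\ln(H/L))^2 - (2\ln 2 - 1)\,(\ln(C/O))^2$, and defines dollar volume as close price times share volume. Both definitions are assumptions about what the notebook computes, and the bars, the helper `garman_klass_variance`, and the column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic OHLCV bars (made-up values, for illustration only).
bars = pd.DataFrame({
    'open':   [100.0, 101.0, 99.0],
    'high':   [110.0, 103.0, 102.0],
    'low':    [90.0, 100.0, 98.0],
    'close':  [100.0, 102.0, 101.0],
    'volume': [1000, 1500, 1200],
})

def garman_klass_variance(df):
    """Per-bar Garman-Klass variance estimate from open/high/low/close."""
    log_hl = np.log(df['high'] / df['low'])    # high-low range term
    log_co = np.log(df['close'] / df['open'])  # open-to-close term
    return 0.5 * log_hl**2 - (2 * np.log(2) - 1) * log_co**2

bars['garman_klass_vol'] = garman_klass_variance(bars)

# Dollar volume: traded value per bar, assumed here to be close * volume.
bars['dollar_volume'] = bars['close'] * bars['volume']

print(bars[['garman_klass_vol', 'dollar_volume']])
```

On the `(date, ticker)` frame built earlier, the same expressions apply column-wise without a `groupby`, since each row already holds one ticker's bar for one date.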