# Unsupervised Learning Trading Strategy

* Download S&P 500 stock price data.
* Calculate features and technical indicators for each stock.
* Aggregate to monthly level and filter the top 150 most liquid stocks.
* Calculate monthly returns for different time horizons.
* Download Fama-French factors and calculate rolling factor betas.
* For each month, fit a K-Means clustering algorithm to group similar assets based on their features.
* For each month, select assets based on the cluster and form a portfolio using Efficient Frontier max Sharpe ratio optimization.
* Visualize portfolio returns and compare them to S&P 500 returns.

In [None]:
from statsmodels.regression.rolling import RollingOLS
import pandas_datareader.data as web
import matplotlib.pyplot as plt
import statsmodels.api as sm
import pandas as pd
import numpy as np
import datetime as dt
import yfinance as yf
import pandas_ta
import warnings
warnings.filterwarnings('ignore')

sp500 = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')[0]

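# Yahoo Finance uses '-' where Wikipedia uses '.' (e.g. BRK.B -> BRK-B).
# regex=False makes '.' a literal dot rather than a regex wildcard.
sp500['Symbol'] = sp500['Symbol'].str.replace('.', '-', regex=False)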

symbols_list = sp500['Symbol'].unique().tolist()

# only symbols in sp500
symbols_list

# 2015-09-29 - 2023-09-26 sp500 stocks price data
end_date = '2023-09-27'

start_date = pd.to_datetime(end_date)-pd.DateOffset(days=365*8)

df = yf.download(tickers=symbols_list,
                 start=start_date,
                 end=end_date,
                 auto_adjust=False).stack()

df.index.names = ['date', 'ticker']

df.columns = df.columns.str.lower()

df

| date | ticker | adj close | close | high | low | open | volume |
|---|---|---|---|---|---|---|---|
| 2015-09-29 | A | 31.251011 | 33.740002 | 34.060001 | 33.240002 | 33.360001 | 2252400.0 |
| 2015-09-29 | AAPL | 24.568558 | 27.264999 | 28.377501 | 26.965000 | 28.207500 | 293461600.0 |
| 2015-09-29 | ABBV | 35.061218 | 52.790001 | 54.189999 | 51.880001 | 53.099998 | 12842800.0 |
| 2015-09-29 | ABT | 32.820751 | 39.500000 | 40.150002 | 39.029999 | 39.259998 | 12287500.0 |
| 2015-09-29 | ACGL | 23.217773 | 24.416668 | 24.456667 | 24.100000 | 24.170000 | 1888800.0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2023-09-26 | XYL | 87.981155 | 89.519997 | 90.849998 | 89.500000 | 90.379997 | 1322400.0 |
| 2023-09-26 | YUM | 120.448677 | 124.010002 | 124.739998 | 123.449997 | 124.239998 | 1500600.0 |
| 2023-09-26 | ZBH | 110.800163 | 112.459999 | 117.110001 | 112.419998 | 116.769997 | 3610500.0 |
| 2023-09-26 | ZBRA | 223.960007 | 223.960007 | 226.649994 | 222.580002 | 225.970001 | 355400.0 |
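
The reshaping done in the download cell can be illustrated without network access. The sketch below assumes that `yf.download` returns a wide frame whose columns are a two-level index of price field and ticker. It builds a small synthetic frame of that shape (the tickers and numbers are made up) and applies the same `stack` and renaming steps to get one row per (date, ticker) pair.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the wide frame: columns are (price field, ticker).
dates = pd.date_range("2023-09-25", periods=2, freq="B", name="Date")
columns = pd.MultiIndex.from_product(
    [["Adj Close", "Volume"], ["AAPL", "MSFT"]], names=["Price", "Ticker"]
)
wide = pd.DataFrame(
    np.arange(8, dtype=float).reshape(2, 4), index=dates, columns=columns
)

# stack() moves the innermost column level (the ticker) into the row index.
tidy = wide.stack()
tidy.index.names = ["date", "ticker"]
tidy.columns = tidy.columns.str.lower()

print(tidy)
```

Each price field is now a single column, which is what lets the later cells compute indicators per stock with `groupby(level=1)`.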


## 2. Calculate features and technical indicators for each stock.

* Garman-Klass Volatility
* RSI
* Bollinger Bands
* ATR
* MACD
* Dollar Volume

\begin{equation}
\text{Garman-Klass Volatility} = \frac{(\ln(\text{High}) - \ln(\text{Low}))^2}{2} - (2\ln(2) - 1)(\ln(\text{Adj Close}) - \ln(\text{Open}))^2
\end{equation}
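
As a quick sanity check, the formula can be evaluated by hand for a single bar. The sketch below uses only the standard library and the printed (rounded) prices for ticker A on 2015-09-29; `garman_klass` is a hypothetical helper name, not part of the notebook.

```python
import math


def garman_klass(high, low, adj_close, open_):
    """Evaluate the Garman-Klass expression above for a single bar."""
    range_term = (math.log(high) - math.log(low)) ** 2 / 2
    close_open_term = (2 * math.log(2) - 1) * (
        math.log(adj_close) - math.log(open_)
    ) ** 2
    return range_term - close_open_term


# Ticker A on 2015-09-29, using the rounded prices shown in the output above.
gk = garman_klass(high=34.060001, low=33.240002,
                  adj_close=31.251011, open_=33.360001)
print(gk)
```

The result is roughly -0.00135. It is negative because the adjusted close (31.25) sits well below the unadjusted open (33.36), so the second term outweighs the high-low range term.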

In [None]:
df['garman_klass_vol'] = ((np.log(df['high'])-np.log(df['low']))**2)/2-(2*np.log(2)-1)*((np.log(df['adj close'])-np.log(df['open']))**2)

# level=1 of the index is the ticker, so each indicator is computed per stock.
df['rsi'] = df.groupby(level=1)['adj close'].transform(lambda x: pandas_ta.rsi(close=x, length=20))

# bb_low, bb_mid, bb_high: lower, middle and upper Bollinger Bands of log1p(adj close).
df['bb_low'] = df.groupby(level=1)['adj close'].transform(lambda x: pandas_ta.bbands(close=np.log1p(x), length=20).iloc[:,0])

df['bb_mid'] = df.groupby(level=1)['adj close'].transform(lambda x: pandas_ta.bbands(close=np.log1p(x), length=20).iloc[:,1])

df['bb_high'] = df.groupby(level=1)['adj close'].transform(lambda x: pandas_ta.bbands(close=np.log1p(x), length=20).iloc[:,2])

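def compute_atr(stock_data):
    # 14-day Average True Range from the unadjusted high/low/close prices.
    atr = pandas_ta.atr(high=stock_data['high'],
                        low=stock_data['low'],
                        close=stock_data['close'],
                        length=14)
    # Standardize per stock (z-score) so ATR is comparable across tickers.
    return atr.sub(atr.mean()).div(atr.std())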

df['atr'] = df.groupby(level=1, group_keys=False).apply(compute_atr)

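def compute_macd(close):
    # Column 0 of the pandas_ta.macd output is the MACD line itself.
    # Note: macd is configured via fast/slow/signal rather than length, so
    # length=20 likely has no effect and the library defaults apply.
    macd = pandas_ta.macd(close=close, length=20).iloc[:,0]
    # Standardize per stock (z-score), as with ATR.
    return macd.sub(macd.mean()).div(macd.std())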

df['macd'] = df.groupby(level=1, group_keys=False)['adj close'].apply(compute_macd)

# Dollar volume in millions: adjusted close times share volume.
df['doller_volume'] = (df['adj close']*df['volume'])/1e6

df

| date | ticker | adj close | close | high | low | open | volume | garman_klass_vol | rsi | bb_low | bb_mid | bb_high | atr | macd | doller_volume |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2015-09-29 | A | 31.251011 | 33.740002 | 34.060001 | 33.240002 | 33.360001 | 2252400.0 | -0.001351 | NaN | NaN | NaN | NaN | NaN | NaN | 70.389777 |
| 2015-09-29 | AAPL | 24.568558 | 27.264999 | 28.377501 | 26.965000 | 28.207500 | 293461600.0 | -0.006066 | NaN | NaN | NaN | NaN | NaN | NaN | 7209.928264 |
| 2015-09-29 | ABBV | 35.061218 | 52.790001 | 54.189999 | 51.880001 | 53.099998 | 12842800.0 | -0.065607 | NaN | NaN | NaN | NaN | NaN | NaN | 450.284214 |
| 2015-09-29 | ABT | 32.820751 | 39.500000 | 40.150002 | 39.029999 | 39.259998 | 12287500.0 | -0.011997 | NaN | NaN | NaN | NaN | NaN | NaN | 403.284980 |
| 2015-09-29 | ACGL | 23.217773 | 24.416668 | 24.456667 | 24.100000 | 24.170000 | 1888800.0 | -0.000516 | NaN | NaN | NaN | NaN | NaN | NaN | 43.853730 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2023-09-26 | XYL | 87.981155 | 89.519997 | 90.849998 | 89.500000 | 90.379997 | 1322400.0 | -0.000167 | 26.146758 | 4.477311 | 4.559227 | 4.641143 | 0.033800 | -2.159188 | 116.346280 |
| 2023-09-26 | YUM | 120.448677 | 124.010002 | 124.739998 | 123.449997 | 124.239998 | 1500600.0 | -0.000317 | 36.057228 | 4.797300 | 4.827262 | 4.857224 | 0.142547 | -1.363695 | 180.745285 |
| 2023-09-26 | ZBH | 110.800163 | 112.459999 | 117.110001 | 112.419998 | 116.769997 | 3610500.0 | -0.000229 | 31.893237 | 4.739333 | 4.778997 | 4.818662 | -0.381708 | -0.881067 | 400.043989 |
| 2023-09-26 | ZBRA | 223.960007 | 223.960007 | 226.649994 | 222.580002 | 225.970001 | 355400.0 | 0.000133 | 29.494977 | 5.400991 | 5.539167 | 5.677342 | -0.057389 | -1.600791 | 79.595386 |
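
`compute_atr` and `compute_macd` both finish by standardizing the indicator within each ticker. The sketch below shows that step on its own with plain pandas, assuming a frame indexed by (date, ticker) like `df`; the tickers, values and the `zscore` helper are made up for illustration.

```python
import pandas as pd

# Toy frame indexed like df: one row per (date, ticker) pair.
idx = pd.MultiIndex.from_product(
    [pd.date_range("2023-01-02", periods=4, freq="B"), ["AAA", "BBB"]],
    names=["date", "ticker"],
)
toy = pd.DataFrame(
    {"raw": [1.0, 10.0, 2.0, 20.0, 3.0, 30.0, 4.0, 40.0]}, index=idx
)


def zscore(series):
    # Subtract the ticker's own mean and divide by its own standard deviation.
    return series.sub(series.mean()).div(series.std())


# transform() returns a result aligned with the original (date, ticker) index.
toy["z"] = toy.groupby(level="ticker")["raw"].transform(zscore)

print(toy)
```

In this toy data, BBB's values are ten times AAA's, yet both tickers end up with identical z-scores. This is why the standardized `atr` and `macd` columns can be compared across stocks that trade at very different price levels.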


## 3. Aggregate to monthly level and filter top 150 most liquid stocks for each month.

* To reduce training time and experiment with features and strategies, we convert the business-daily data to month-end frequency.