Downstream task as described in 'Stock Embeddings: Representation Learning for Financial Time Series' and
'Contrastive Learning of Asset Embeddings from Financial Time Series'

To reduce investment risk, portfolio managers rely on diversification and hedging, and they measure effectiveness by
the resulting reduction in volatility. Identifying stocks that behave differently from, or opposite to, a target stock
is therefore essential for traders who want to hedge that position and limit overall risk.

Hedges are typically chosen among negatively correlated assets, using various correlation metrics. We propose an
alternative: using generated embeddings to find maximally dissimilar stocks and inform hedging strategies. We evaluate
a scenario where an investor holds a position in a stock (the query stock) and seeks a single stock (the hedge stock)
that reduces risk, measured as volatility, as much as possible.
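
For reference, the standard two-asset portfolio variance formula makes the role of co-movement explicit. The symbols are notation introduced here for illustration, not taken from the cited papers: $w$ is the weight on the query stock $q$, $1-w$ the weight on the hedge stock $h$, $\sigma_q$ and $\sigma_h$ are the return volatilities, and $\rho_{qh}$ is the correlation of their returns.

$$
\sigma_p^2 = w^2 \sigma_q^2 + (1-w)^2 \sigma_h^2 + 2\,w\,(1-w)\,\rho_{qh}\,\sigma_q\,\sigma_h
$$

For a long portfolio with $0 < w < 1$, the last term shrinks as $\rho_{qh}$ falls and turns negative when $\rho_{qh} < 0$, which is why a hedge stock that moves against the query stock lowers portfolio volatility.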

We test the hedging approach with a two-asset long portfolio consisting of an anchor asset (the query stock) and the
asset with the lowest similarity to it in the latent space, measured by Hamming distance. Embeddings are computed on
the training horizon, and the portfolio is simulated on the out-of-sample horizon.
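
The selection and evaluation steps described above could look roughly like the sketch below. It is illustrative only: the `embeddings` DataFrame (one row per ticker), the sign-based binarization that makes Hamming distance applicable to real-valued vectors, and the equal 50/50 weighting are all assumptions made here, not details taken from this notebook or the cited papers.

```python
import pandas as pd


def pick_hedge_by_hamming(embeddings: pd.DataFrame, anchor: str) -> str:
    """Return the ticker whose embedding is farthest from the anchor's.

    `embeddings` is a hypothetical DataFrame: one row per ticker, one column
    per latent dimension, fitted on the training horizon only.
    """
    # Assumption: real-valued embeddings are binarized by sign so that
    # Hamming distance (share of differing positions) is well defined.
    bits = embeddings.to_numpy() > 0
    anchor_bits = bits[embeddings.index.get_loc(anchor)]
    distance = pd.Series((bits != anchor_bits).mean(axis=1), index=embeddings.index)
    # Lowest similarity = largest distance, excluding the anchor itself.
    return distance.drop(anchor).idxmax()


def two_asset_volatility(returns: pd.DataFrame, anchor: str, hedge: str,
                         weight: float = 0.5) -> float:
    """Volatility (std of returns) of a long portfolio: `weight` in anchor, rest in hedge."""
    portfolio = weight * returns[anchor] + (1.0 - weight) * returns[hedge]
    return float(portfolio.std())
```

In use, the hedge would be picked from embeddings fitted on the training horizon, and `two_asset_volatility` would then be evaluated on out-of-sample returns only, so that the selection never sees the evaluation period.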

Benchmark: Pearson correlation of returns
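
A minimal sketch of how this benchmark might select a hedge stock, assuming it picks the ticker whose training-period returns have the lowest Pearson correlation with the anchor. The function name and the `train_returns` argument are hypothetical and introduced only for illustration.

```python
import pandas as pd


def pick_hedge_by_correlation(train_returns: pd.DataFrame, anchor: str) -> str:
    """Return the ticker least correlated with `anchor` over the training returns."""
    # Pearson correlation of every ticker with the anchor, anchor itself excluded.
    corr_with_anchor = train_returns.corr(method="pearson")[anchor].drop(anchor)
    # The most negative (or least positive) correlation is the benchmark hedge.
    return corr_with_anchor.idxmin()
```

Both selection rules can then be compared on the same out-of-sample portfolio volatility.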

In [1]:
import yfinance as yf
import pandas as pd

In [22]:
df = pd.read_csv("../datasets/stocks/nasdaq_100.csv", encoding='unicode_escape')

tickers = list(df['Ticker'])
tickers

['ATVI',
 'ADBE',
 'ADP',
 'ABNB',
 'ALGN',
 'GOOGL',
 'GOOG',
 'AMZN',
 'AMD',
 'AEP',
 'AMGN',
 'ADI',
 'ANSS',
 'AAPL',
 'AMAT',
 'ASML',
 'AZN',
 'TEAM',
 'ADSK',
 'BKR',
 'BIIB',
 'BKNG',
 'AVGO',
 'CDNS',
 'CHTR',
 'CTAS',
 'CSCO',
 'CTSH',
 'CMCSA',
 'CEG',
 'CPRT',
 'CSGP',
 'COST',
 'CRWD',
 'CSX',
 'DDOG',
 'DXCM',
 'FANG',
 'DLTR',
 'EBAY',
 'EA',
 'ENPH',
 'EXC',
 'FAST',
 'FISV',
 'FTNT',
 'GILD',
 'GFS',
 'HON',
 'IDXX',
 'ILMN',
 'INTC',
 'INTU',
 'ISRG',
 'JD',
 'KDP',
 'KLAC',
 'KHC',
 'LRCX',
 'LCID',
 'LULU',
 'MAR',
 'MRVL',
 'MELI',
 'META',
 'MCHP',
 'MU',
 'MSFT',
 'MRNA',
 'MDLZ',
 'MNST',
 'NFLX',
 'NVDA',
 'NXPI',
 'ORLY',
 'ODFL',
 'PCAR',
 'PANW',
 'PAYX',
 'PYPL',
 'PDD',
 'PEP',
 'QCOM',
 'REGN',
 'RIVN',
 'ROST',
 'SGEN',
 'SIRI',
 'SBUX',
 'SNPS',
 'TMUS',
 'TSLA',
 'TXN',
 'VRSK',
 'VRTX',
 'WBA',
 'WBD',
 'WDAY',
 'XEL',
 'ZM',
 'ZS']

In [23]:
ohlc = yf.download(tickers, period="max")


[*********************100%***********************]  101 of 101 completed

2 Failed downloads:
['SGEN', 'ATVI']: YFTzMissingError('$%ticker%: possibly delisted; no timezone found')


In [28]:
# Keep adjusted closes from 2011 onward and drop any ticker with missing prices in that window
prices = ohlc["Adj Close"].loc["2011-01-01":].dropna(axis=1)
prices.tail()

Date,AAPL,ADBE,ADI,ADP,ADSK,AEP,ALGN,AMAT,AMD,AMGN,...,SIRI,SNPS,TMUS,TSLA,TXN,VRSK,VRTX,WBA,WBD,XEL
2024-09-12 00:00:00+00:00,222.770004,586.549988,221.529999,277.0,259.470001,102.989998,228.860001,183.210007,150.770004,330.059998,...,25.52,488.950012,202.410004,229.809998,195.979996,268.980011,478.649994,8.84,7.66,63.341999
2024-09-13 00:00:00+00:00,222.5,536.869995,225.419998,277.51001,263.959991,104.169998,243.729996,188.470001,152.309998,332.450012,...,24.51,490.070007,202.830002,230.289993,199.929993,268.790009,485.369995,9.21,8.49,63.84
2024-09-16 00:00:00+00:00,216.320007,521.5,223.279999,278.600006,267.730011,104.949997,249.559998,187.580002,152.080002,335.26001,...,23.639999,498.570007,205.850006,226.779999,198.470001,269.399994,489.429993,9.02,8.56,64.559998
2024-09-17 00:00:00+00:00,216.789993,515.030029,225.350006,279.410004,266.890015,104.209999,253.160004,188.589996,150.820007,332.799988,...,23.27,502.25,202.699997,227.869995,201.389999,267.019989,481.26001,9.06,8.45,64.5
2024-09-18 00:00:00+00:00,220.690002,508.130005,222.639999,275.910004,264.209991,103.220001,253.009995,186.139999,148.289993,332.920013,...,23.27,495.950012,196.679993,227.199997,200.710007,264.76001,474.160004,9.01,8.42,64.360001


In [32]:
# TODO: consider monthly returns instead of daily
# Daily simple returns; the first row has no previous price and is dropped
returns = prices.pct_change().dropna()
returns.head()

Date,AAPL,ADBE,ADI,ADP,ADSK,AEP,ALGN,AMAT,AMD,AMGN,...,SIRI,SNPS,TMUS,TSLA,TXN,VRSK,VRTX,WBA,WBD,XEL
2011-01-04 00:00:00+00:00,0.005219,0.007031,-0.011331,-0.003614,-0.018844,0.007173,-0.018621,-0.006401,0.035419,0.021602,...,-0.011834,-0.005558,0.065035,0.001878,-0.001833,-0.005802,-0.013893,0.008393,-0.013806,0.004244
2011-01-05 00:00:00+00:00,0.00818,0.022533,0.002133,0.015999,0.070335,-0.015065,0.016923,-0.015033,0.015963,-0.000353,...,-0.011976,-0.002235,0.024425,0.005999,0.003979,0.001459,0.038039,0.013871,-0.021241,-0.005917
2011-01-06 00:00:00+00:00,-0.000808,0.001552,0.005585,0.007768,0.000485,0.005006,-0.012103,0.009448,-0.024691,-0.003173,...,-0.006061,-0.003734,-0.067321,0.039135,0.013719,0.000583,-0.017372,0.002985,-0.005919,-0.00085
2011-01-07 00:00:00+00:00,0.007161,-0.007127,-0.005818,0.002292,-0.012118,-0.003874,0.020929,0.00504,0.016111,0.007603,...,-0.018293,0.002249,-0.026316,0.012912,-0.000902,-0.002912,-0.001105,-0.005952,0.000248,0.008936
2011-01-10 00:00:00+00:00,0.018833,0.028714,0.002926,0.000416,0.002699,-0.006667,0.01,-0.013611,0.04077,-0.009477,...,-0.018633,0.00187,0.017761,0.007436,0.004816,0.00146,0.003595,0.021706,-0.012401,-0.006327
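
Since embeddings are meant to be fitted on a training horizon and the portfolio simulated out of sample, the returns need a chronological split. The helper below is one possible sketch; the function name and the idea of a single cut-off date are assumptions, and no particular split date is specified in this notebook.

```python
import pandas as pd


def split_by_date(returns: pd.DataFrame, split_date: str):
    """Split returns chronologically into (train, out_of_sample) at `split_date`."""
    cutoff = pd.Timestamp(split_date)
    # The downloaded index is timezone-aware (UTC), so align a naive cutoff to it.
    if returns.index.tz is not None and cutoff.tzinfo is None:
        cutoff = cutoff.tz_localize(returns.index.tz)
    train = returns.loc[returns.index < cutoff]
    out_of_sample = returns.loc[returns.index >= cutoff]
    return train, out_of_sample
```

Splitting by date rather than shuffling rows keeps the out-of-sample period strictly after the training period, which avoids look-ahead when choosing the hedge stock.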
