In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from tabulate import tabulate
import os

import performance

## Eigen Portfolio Replication for a Basket of Securities
### Introduction
**The idea of using Principal Component Analysis (PCA) to extract factors from a basket of securities is not new.** In this project, we borrow the idea from the statistical arbitrage paper by Marco Avellaneda and Jeong-Hyun Lee (https://math.nyu.edu/~avellane/AvellanedaLeeStatArb071108.pdf) and track the performance of eigen portfolios. The approach takes the historical daily share prices of N stocks going back M days and applies PCA to the correlation matrix of their standardized returns. In this project, we use the top 100 stocks from the S&P 500 as our trading universe and a one-year lookback window to construct the correlation matrix. Note that we do not construct an eigen portfolio for the S&P 100 index itself; we construct one for this fixed set of stocks. The reason is that the index's holdings are not static: companies are added and removed frequently, and we do not want the backtest to face the resulting survivorship bias.
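
As a rough illustration of the approach described above, the sketch below runs PCA on the correlation matrix of standardized returns and turns each eigenvector into portfolio weights. It is a minimal sketch under stated assumptions, not the implementation used later in this notebook: the function name `eigen_portfolios` and the simulated returns are hypothetical, and dividing each eigenvector entry by the stock's volatility follows the weighting described in the referenced paper.

```python
import numpy as np

def eigen_portfolios(returns: np.ndarray):
    """Hypothetical helper: PCA on the correlation matrix of returns.

    returns: array of shape (M days, N stocks) holding daily returns.
    Yields eigenvalues (largest first) and an N x N weight matrix
    whose column j holds the weights of the j-th eigen portfolio.
    """
    sigma = returns.std(axis=0, ddof=1)            # per-stock volatility
    z = (returns - returns.mean(axis=0)) / sigma   # standardized returns
    corr = np.corrcoef(z, rowvar=False)            # N x N correlation matrix

    # eigh is meant for symmetric matrices and returns ascending eigenvalues
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]              # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Assumption: weight_i = eigenvector_i / sigma_i (volatility scaling)
    weights = eigvecs / sigma[:, None]
    return eigvals, weights

# Simulated data (assumption): 5 stocks driven by one common factor
rng = np.random.default_rng(0)
market = rng.normal(0.0, 0.01, size=(252, 1))
toy_returns = market + rng.normal(0.0, 0.005, size=(252, 5))

eigvals, weights = eigen_portfolios(toy_returns)
explained = eigvals / eigvals.sum()   # share of variance per component
```

Because the eigenvalues of a correlation matrix sum to N, `explained` gives the fraction of total variance captured by each eigen portfolio.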

### Historical Dataset
**In this project, we define our static trading universe as 99 selected stocks from the S&P 500, with price history from 2015-07-06 to 2022-12-30.** This is the 100-ticker list minus DOW, which is dropped below because of its short history. For the full list of tickers, see SP 100 tickers.csv under the yfinance data folder. We use dividend-adjusted prices for this exercise, and the dataset has no missing price values.

In [2]:
data = pd.read_csv('./data/SP 100 Daily Data.csv')

In [3]:
# Drop ticker "DOW" because of its relatively short history (listed in 2019).
# The remaining 99 stocks have full history from 2015-07-06 to 2022-12-30, which is our backtesting period.
price_df = data.loc[(data['Type']=='Adj Close')&(data['Ticker']!='DOW')].pivot_table(
    index='Date',columns='Ticker',values='Price').dropna()
price_df.head(5)

Date,AAPL,ABBV,ABT,ACN,ADBE,AIG,AMD,AMGN,AMT,AMZN,...,UNH,UNP,UPS,USB,V,VZ,WBA,WFC,WMT,XOM
2015-07-06,28.502361,48.501286,42.805702,85.402016,80.5,50.658955,2.47,124.00779,80.816475,21.802,...,108.606407,80.921692,75.608345,34.138191,64.487411,32.283218,67.156448,44.779606,61.275925,57.053539
2015-07-07,28.432241,48.742992,43.210178,86.577682,80.589996,50.953804,2.09,124.722221,81.369553,21.836,...,107.308365,82.182899,76.265411,34.036404,64.156059,32.427345,68.619278,44.628857,62.340412,57.288612
2015-07-08,27.726467,48.103153,42.384003,85.647705,79.989998,50.093792,2.01,122.250053,80.391037,21.485001,...,105.601418,80.593735,74.693192,33.410019,63.171535,32.344013,67.565399,43.835453,61.723682,56.666412
2015-07-09,27.16094,48.323536,42.332386,85.761765,80.470001,50.585224,1.98,121.447411,80.314445,21.7195,...,106.099274,80.930077,74.896545,33.637089,63.474442,32.121815,70.428146,44.033802,61.487141,56.410633
2015-07-10,27.887075,49.219345,42.969208,86.665413,80.589996,51.289619,1.96,123.686775,81.19088,22.175501,...,108.597527,82.250175,75.843002,34.01292,64.771408,32.635654,73.227982,44.493977,61.77438,56.839237


### Standardized Return Calculation
**We take a two-step approach to prepare the data for PCA:**
- Calculate daily returns from the adjusted close prices.
- Standardize the returns so that stocks with different return volatilities are on the same scale.
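
The two steps above can be sketched as follows. This is a minimal illustration rather than the notebook's own implementation: the function name `standardized_returns` and the toy tickers are hypothetical, and it assumes simple (not log) returns and a z-score over the full sample rather than a rolling window.

```python
import numpy as np
import pandas as pd

def standardized_returns(prices: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: daily simple returns, z-scored per ticker."""
    # Step 1: daily simple return, P_t / P_{t-1} - 1 (first row has no prior price)
    returns = (prices / prices.shift(1) - 1).dropna(how="all")
    # Step 2: subtract each ticker's mean and divide by its sample std (ddof=1)
    return (returns - returns.mean()) / returns.std()

# Toy prices (assumption): two tickers with very different volatilities
rng = np.random.default_rng(0)
log_moves = rng.normal(0.0, [0.01, 0.03], size=(250, 2))
toy_prices = pd.DataFrame(
    100 * np.exp(np.cumsum(log_moves, axis=0)),
    columns=["LOWVOL", "HIGHVOL"],
)

z = standardized_returns(toy_prices)
print(z.describe().loc[["mean", "std"]])
```

After standardization every column has zero mean and unit standard deviation, so a high-volatility stock no longer dominates the correlation structure simply because its returns are larger.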