# Reverse-Engineering the Fama-French Five Factor Model Using Principal Component Analysis

Miles Child
November 18, 2023

In this notebook, I will isolate the loadings of the S&P 100 constituents on five of the Fama-French factors and attempt to reconstruct them using PCA. I have run Fama-French regressions in the past: I was attracted by the model's qualitative explainability but put off by the idea that five intuitions can explain the majority of asset return variance. With PCA, I was attracted by the abstract, data-driven way of explaining asset return variance but put off by the inability to rationalize why these latent variables exist and why they should continue to exist.
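As a rough illustration of the two kinds of loadings being compared, here is a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the sizes `T`, `N`, `K`, the simulated `factors` and `true_betas`, and the use of plain OLS and a covariance eigendecomposition. It is not the code used later in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions): T days, N assets, K observable factors
T, N, K = 500, 20, 5
factors = rng.normal(0.0, 0.01, size=(T, K))      # hypothetical factor returns
true_betas = rng.normal(0.0, 1.0, size=(N, K))    # hypothetical exposures
noise = rng.normal(0.0, 0.005, size=(T, N))       # idiosyncratic returns
returns = factors @ true_betas.T + noise          # asset returns, shape (T, N)

# Regression-style loadings: OLS of every asset on the factors, with an intercept
X = np.column_stack([np.ones(T), factors])
coef, *_ = np.linalg.lstsq(X, returns, rcond=None)  # shape (K + 1, N)
ff_loadings = coef[1:].T                            # shape (N, K), intercept dropped

# PCA loadings: leading eigenvectors of the sample covariance of returns
cov = np.cov(returns, rowvar=False)               # shape (N, N)
eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1][:K]             # indices of the K largest
pca_loadings = eigvecs[:, order]                  # shape (N, K)
explained = eigvals[order] / eigvals.sum()        # variance share per component

print(ff_loadings.shape, pca_loadings.shape)
print(explained.round(3))
```

Both matrices have one row per asset, which is what makes a column-by-column comparison between regression loadings and principal components possible.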

Principal components will be "matched" with the Fama-French factors on the basis of statistical correlation between their factor loadings.

Components that cannot be matched will be assumed to reflect either (1) one of the other Fama-French factors, (2) some other known, non-Fama-French factor, or **(3) idiosyncratic factors**.
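The matching rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: `match_components`, the `threshold` of 0.5, and the synthetic loadings are all hypothetical. Absolute correlation is used because the sign of a principal component is arbitrary, and the greedy rule can assign the same factor to more than one component.

```python
import numpy as np

def match_components(ff_loadings, pca_loadings, threshold=0.5):
    """Pair each principal component with its most correlated factor.

    Both inputs have one row per asset and one column per factor/component.
    Returns a list of (component index, factor index or None, correlation).
    """
    k_ff = ff_loadings.shape[1]
    # Cross-correlation block: rows are factors, columns are components
    corr = np.corrcoef(ff_loadings, pca_loadings, rowvar=False)[:k_ff, k_ff:]
    matches = []
    for j in range(pca_loadings.shape[1]):
        i = int(np.argmax(np.abs(corr[:, j])))  # strongest factor for component j
        r = float(corr[i, j])
        matches.append((j, i if abs(r) >= threshold else None, r))
    return matches

# Synthetic loadings (assumption): 100 assets, 3 factors, 3 components
rng = np.random.default_rng(1)
N = 100
ff = rng.normal(size=(N, 3))
pcs = np.column_stack([
    -ff[:, 1] + 0.1 * rng.normal(size=N),  # sign-flipped, noisy copy of factor 1
    ff[:, 0] + 0.1 * rng.normal(size=N),   # noisy copy of factor 0
    rng.normal(size=N),                    # unrelated to every factor
])

for component, factor, r in match_components(ff, pcs):
    label = "unmatched" if factor is None else f"factor {factor}"
    print(f"component {component}: {label} (r = {r:+.2f})")
```

Components that come back unmatched are the ones that would fall into categories (1) to (3).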

Hopefully this reverse-engineering succeeds, so that a middle ground between the two approaches can be identified. This has probably been done a million times before, but I am doing it anyway to better understand both factor isolation methods.

**Note:**

See fama_french.ipynb and pca.ipynb in lib/ for independent examples of the two.

____________________

#### Step 1: Load Data

In [1]:
# imports

import pandas as pd
import numpy as np
import yfinance as yf
from scipy.optimize import minimize
from numpy.linalg import inv
import pandas_datareader.data as web
# note: `uniform` is numpy.random.uniform (a sampler), not scipy.stats.uniform
# (a distribution object); importing both under one name lets the later import win
from numpy.random import random, uniform, dirichlet, choice

In [3]:
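# Get the list of constituents in the S&P 100 index
# (the table index [2] depends on the current layout of the Wikipedia page)
sp100_constituents = pd.read_html("https://en.wikipedia.org/wiki/S%26P_100")[2]
# Yahoo Finance writes share classes with a dash (BRK-B), Wikipedia with a dot (BRK.B)
tickers = sp100_constituents['Symbol'].str.replace('.', '-', regex=False).tolist()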

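# Define the start and end dates for the historical data
start_date = "2021-01-01"
# June has 30 days; "2023-06-31" fails with ValueError('day is out of range for month').
# yfinance treats `end` as exclusive, so the last session fetched is 2023-06-29.
end_date = "2023-06-30"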

# Create a dictionary to store the historical data for each stock
stock_data_dict = {}

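# Fetch daily historical data for each stock
for ticker in tickers:
    try:
        stock_data = yf.download(ticker, start=start_date, end=end_date)
    except Exception as e:
        print(f"Error fetching data for {ticker}: {e}")
        continue
    # yfinance usually reports a failed download and returns an empty frame
    # instead of raising, so skip empty results explicitly
    if stock_data.empty:
        print(f"No data returned for {ticker}")
        continue
    stock_data_dict[ticker] = stock_data

# Preview the first ticker that downloaded successfully, if any
if stock_data_dict:
    sample_ticker = next(iter(stock_data_dict))
    print(stock_data_dict[sample_ticker].head())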

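Recorded output (truncated). Each download below failed because the run used "2023-06-31" as the end date, and June has only 30 days:

[*********************100%***********************]  1 of 1 completed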

1 Failed download:
- AAPL: ValueError('day is out of range for month')
[*********************100%***********************]  1 of 1 completed

1 Failed download:
- ABBV: ValueError('day is out of range for month')
[*********************100%***********************]  1 of 1 completed

1 Failed download:
- ABT: ValueError('day is out of range for month')
[*********************100%***********************]  1 of 1 completed

1 Failed download:
- ACN: ValueError('day is out of range for month')
[*********************100%***********************]  1 of 1 completed

1 Failed download:
- ADBE: ValueError('day is out of range for month')
[*********************100%***********************]  1 of 1 completed

1 Failed download:
- AIG: ValueError('day is out of range for month')
[*********************100%***********************]  1 of 1 completed

1 Failed download:
- AMD: ValueError('day is out of range for month')
[****************
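Since one invalid date string makes every download fail, it can help to validate the dates before looping over the tickers. Below is a minimal sketch using only the standard library; `parse_iso_date` is a hypothetical helper, not part of yfinance or of this notebook.

```python
from datetime import date

def parse_iso_date(text):
    """Return a datetime.date, or raise ValueError naming the offending string."""
    try:
        return date.fromisoformat(text)
    except ValueError as exc:
        raise ValueError(f"invalid date {text!r}: {exc}") from exc

# Valid dates parse cleanly and can be sanity-checked against each other
start = parse_iso_date("2021-01-01")
end = parse_iso_date("2023-06-30")
assert start < end

# An impossible calendar date is rejected before any network call is made
try:
    parse_iso_date("2023-06-31")
except ValueError as exc:
    print(exc)
```

Running this check once at the top of the cell surfaces the problem as a single error rather than one failure per ticker.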

#### Step 2: Fama French Factors