<a href="https://colab.research.google.com/github/letianzj/QuantResearch/blob/master/notebooks/fama_french.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Introduction

Factor models such as the APT claim that an asset's expected return comes from its exposures to factor risk premiums. In equilibrium, only factor risks are compensated because idiosyncratic risk can be diversified away. If the set of factors is complete, there is no market anomaly, and a manager's performance is attributed entirely to factor exposures; otherwise, a manager can find a new factor and earn extra returns. It therefore becomes a game of escape and catch-up: practitioners struggle to find new factors and enjoy excess returns, then others catch up and the alpha wears away.
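
In symbols, a generic linear factor model with $K$ factors can be written as below. The notation is an assumption made here for illustration: $\beta_{ik}$ is asset $i$'s exposure to factor $k$, $\lambda_k$ is that factor's risk premium, $r_f$ is the risk-free rate, and $\alpha_i$ is the part of the expected return the factors do not explain.

$$
E[r_i] - r_f = \alpha_i + \sum_{k=1}^{K} \beta_{ik} \lambda_k
$$

Under a complete set of factors, $\alpha_i = 0$ for every asset $i$.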

A good factor model leaves a smaller market anomaly, or alpha. In terms of variance, its factor covariance matrix explains more of the assets' risks. Factor models are cross-sectional in that they try to explain expected returns across assets, yet the covariance matrix is estimated across time. Practitioners also try to time the factors, taking advantage of factors that are working and avoiding those expected to have little effect in the next period. In sum, a manager's value lies in discovering new factors as well as in timing or allocating across existing ones.
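
To make the variance statement concrete, here is a minimal sketch on simulated data. It assumes the usual linear decomposition of the asset covariance matrix into a factor part and a diagonal idiosyncratic part; the names `B`, `Sigma_f`, `D` and all numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_assets, n_factors, n_periods = 5, 3, 2000

# Hypothetical exposures, factor returns and idiosyncratic returns
B = rng.normal(size=(n_assets, n_factors))
F = rng.normal(scale=0.04, size=(n_periods, n_factors))
E = rng.normal(scale=0.02, size=(n_periods, n_assets))

# Factor covariance (estimated across time) and idiosyncratic variances
Sigma_f = np.cov(F, rowvar=False)
D = np.diag(E.var(axis=0, ddof=1))

# Model-implied asset covariance: factor part plus idiosyncratic part
Sigma_model = B @ Sigma_f @ B.T + D

# Share of each asset's variance attributed to the factors
explained = np.diag(B @ Sigma_f @ B.T) / np.diag(Sigma_model)
print(explained.round(3))
```

In this setup, a better factor model is one whose `explained` shares are closer to one, leaving less variance in `D`.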

CAPM, as a special case, has only one factor, the market risk premium. The expected return of any asset, efficient or inefficient, should lie on the security market line (SML). This is in contrast to the capital market line (CML), the tangent line on which only efficient portfolios lie. An efficient portfolio is perfectly correlated with the market portfolio ($\rho=1$), so that

$$
\beta = \frac{cov(r, r_M)}{var(r_M)}=\rho \times \frac{\sigma_r}{\sigma_{r_M}}=\frac{\sigma_r}{\sigma_{r_M}}
$$

and hence total volatility is the relevant risk measure, which is why performance is assessed using the Sharpe ratio.
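
The identity above can be checked numerically. The sketch below uses simulated returns (all parameters are made up): it computes beta both as covariance over variance and as correlation times the volatility ratio, then builds a hypothetical efficient portfolio as a mix of the market and a constant risk-free return.

```python
import numpy as np

rng = np.random.default_rng(42)
r_m = rng.normal(0.01, 0.04, size=500)                     # simulated market returns
r = 0.002 + 1.3 * r_m + rng.normal(0, 0.02, size=500)      # simulated asset returns

# Beta as cov(r, r_M) / var(r_M)
cov = np.cov(r, r_m, ddof=1)
beta_cov = cov[0, 1] / cov[1, 1]

# Beta as rho * sigma_r / sigma_M
rho = np.corrcoef(r, r_m)[0, 1]
beta_rho = rho * r.std(ddof=1) / r_m.std(ddof=1)

# Hypothetical efficient portfolio: weight w in the market, the rest risk-free
w = 0.6
r_eff = w * r_m + (1 - w) * 0.001
rho_eff = np.corrcoef(r_eff, r_m)[0, 1]
beta_eff = r_eff.std(ddof=1) / r_m.std(ddof=1)

print(beta_cov, beta_rho, rho_eff, beta_eff)
```

The two beta estimates agree, and for the efficient portfolio the correlation is one, so its beta equals the ratio of volatilities (here, the market weight `w`).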

Factor model theory is hard to verify. Each observation of a stock at a point in time is subject to sampling error, and stocks' error terms are correlated (if AAPL outperforms, FB tends to outperform). One approach, taken by [Fama-MacBeth](https://en.wikipedia.org/wiki/Fama%E2%80%93MacBeth_regression), is to run two-step regressions (time-series, then cross-sectional) on panel data to alleviate cross-sectional correlation.
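
A minimal sketch of the two-step procedure on simulated panel data follows. The dimensions, true betas and true premiums are all assumptions made up for illustration; plain least squares via NumPy stands in for whatever regression tooling one prefers.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, K = 240, 25, 3                      # periods, assets, factors (hypothetical)

true_beta = rng.normal(1.0, 0.5, size=(N, K))
true_lambda = np.array([0.006, 0.002, 0.003])
factors = true_lambda + rng.normal(0, 0.03, size=(T, K))
returns = factors @ true_beta.T + rng.normal(0, 0.02, size=(T, N))

# Step 1: time-series regression of each asset on the factors -> exposures
X = np.column_stack([np.ones(T), factors])
coef, *_ = np.linalg.lstsq(X, returns, rcond=None)      # shape (K + 1, N)
beta_hat = coef[1:].T                                    # shape (N, K)

# Step 2: one cross-sectional regression per period of returns on exposures
Z = np.column_stack([np.ones(N), beta_hat])
gamma, *_ = np.linalg.lstsq(Z, returns.T, rcond=None)   # shape (K + 1, T)
lambda_t = gamma[1:].T                                   # shape (T, K)

# Premium estimate: time average of the period-by-period slopes;
# the standard error comes from their time-series variation
lambda_hat = lambda_t.mean(axis=0)
se = lambda_t.std(axis=0, ddof=1) / np.sqrt(T)
t_stat = lambda_hat / se
print(lambda_hat, t_stat)
```

Because the standard errors are taken from the time series of cross-sectional slopes, they are not distorted by correlation of the error terms across assets within a period.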

In the [Fama-French three-factor model](https://en.wikipedia.org/wiki/Fama%E2%80%93French_three-factor_model), the authors test the model on $25$ portfolios. Specifically,
1. To construct factors, first divide stocks into Big (B) and Small (S) according to market cap, and into Low (L), Middle (M), and High (H) according to the book-to-price (B/P) ratio. This leads to six groups or portfolios: B/L, B/M, B/H, S/L, S/M, S/H.
2. Calculate historical value-weighted returns of these six portfolios.
3. Define factor returns as long-short:
$$
\begin{aligned}
SMB&=\frac{S/L+S/M+S/H}{3}-\frac{B/L+B/M+B/H}{3} \\\\
HML&=\frac{S/H+B/H}{2}-\frac{S/L+B/L}{2}
\end{aligned}
$$
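
The two formulas above translate directly into a few lines of pandas. The returns below are made-up numbers (in percent) for the six portfolios, used only to show the arithmetic; the column names follow the size/value labels in the list.

```python
import pandas as pd

# Hypothetical monthly value-weighted returns (percent) of the six portfolios
ports = pd.DataFrame(
    {
        "S/L": [1.0, -2.0, 0.5],
        "S/M": [1.5, -1.0, 0.8],
        "S/H": [2.0, -0.5, 1.1],
        "B/L": [0.5, -1.5, 0.2],
        "B/M": [0.8, -1.2, 0.4],
        "B/H": [1.1, -0.9, 0.9],
    },
    index=pd.period_range("2020-01", periods=3, freq="M"),
)

# SMB: average of the three small portfolios minus average of the three big ones
smb = ports[["S/L", "S/M", "S/H"]].mean(axis=1) - ports[["B/L", "B/M", "B/H"]].mean(axis=1)

# HML: average of the two high-B/P portfolios minus average of the two low-B/P ones
hml = ports[["S/H", "B/H"]].mean(axis=1) - ports[["S/L", "B/L"]].mean(axis=1)

factors = pd.DataFrame({"SMB": smb, "HML": hml})
print(factors)
```

Note that the middle B/P group enters SMB but not HML, exactly as in the formulas.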

The $25$ portfolios to be explained are combinations of $5$ market cap groups and $5$ B/P groups.

Kenneth French kindly publishes the factors in his [data library](https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html). Below we examine the factor betas of the SPDR sector ETFs.

In [None]:
# !pip install yfinance



In [None]:
%matplotlib inline
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
import statsmodels.api as sm
import matplotlib.pyplot as plt
import pandas_datareader as pdr
import yfinance as yf

In [None]:
# pdr.famafrench.get_available_datasets()
df_ff = pdr.data.DataReader('F-F_Research_Data_Factors', 'famafrench')[0]
df_ff.head()

Date,Mkt-RF,SMB,HML,RF
2015-07,1.54,-4.15,-4.12,0.0
2015-08,-6.04,0.49,2.66,0.0
2015-09,-3.08,-2.64,0.53,0.0
2015-10,7.75,-1.97,-0.07,0.0
2015-11,0.56,3.64,-0.51,0.0


In [None]:
start_date = datetime(2015, 1, 1)
end_date = datetime.today()
sectors = ['XLB', 'XLC', 'XLF', 'XLI', 'XLK', 'XLP', 'XLRE', 'XLU', 'XLV', 'XLY', 'XLE']

df_sectors = pd.DataFrame()
for sym in sectors:
    print(sym)
    # df = pdr.DataReader(name=sym, data_source='yahoo', start=start_date, end=end_date)
    df = yf.download(sym, start=start_date, end=end_date)
    df = df[['Adj Close']]
    df.columns = [sym]
    df_sectors = pd.concat([df_sectors, df], axis=1, join='outer')



In [None]:
# month-end prices: take the last available price in each calendar month
df_sec_ret = df_sectors.resample('M').last()
df_sec_ret.index = df_sec_ret.index.to_period()
df_sec_ret = df_sec_ret.pct_change()
df_sec_ret.head()

Date,XLB,XLC,XLF,XLI,XLK,XLP,XLRE,XLU,XLV,XLY,XLE
2015-01,,,,,,,,,,,
2015-02,0.079681,,0.058236,0.053509,0.07995,0.041441,,-0.063948,0.042876,0.085441,0.04593
2015-03,-0.049048,,-0.00616,-0.025461,-0.034356,-0.019427,,-0.009953,0.006406,-0.00483,-0.011552
2015-04,0.03362,,0.00083,-0.00251,0.02751,-0.007591,,-0.004726,-0.010897,-0.000531,0.065739
2015-05,0.003768,,0.019478,0.003236,0.018553,0.008683,,0.006332,0.045043,0.013146,-0.051887


In [None]:
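# convert sector returns to excess returns; RF is quoted in percent
df_sec_ret = df_sec_ret.sub(df_ff['RF'] / 100.0, axis=0)
# keep only months where every ETF and the risk-free rate are available
df_sec_ret.dropna(axis=0, inplace=True)
df_Y = df_sec_ret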

df_X = df_ff[['Mkt-RF', 'SMB', 'HML']]/100.0
df_X = df_X.loc[df_Y.index]
print(f'{df_Y.shape[1]} stocks, {df_X.shape[1]} factors, {df_Y.shape[0]} time steps')

df_X = sm.add_constant(df_X, prepend=False)

11 stocks, 3 factors, 23 time steps


In [None]:
# fama_macbeth step one: time-series regression ==> factor exposures
beta = pd.DataFrame()             # factor exposures
for sym in df_Y.columns:
    model = sm.OLS(df_Y[sym], df_X)
    results = model.fit()
    beta = pd.concat([beta, pd.DataFrame([results.params[:3]], index=[sym])])

In [None]:
beta

Symbol,Mkt-RF,SMB,HML
XLB,0.960951,0.173734,0.140637
XLC,0.888132,0.14325,-0.024562
XLF,1.013096,-0.146569,0.590527
XLI,1.073558,0.032304,0.181941
XLK,1.094444,-0.237579,-0.308043
XLP,0.757556,-0.922759,0.061853
XLRE,0.640816,-0.046201,0.183027
XLU,0.587121,-0.715389,0.174943
XLV,0.760626,-0.016817,-0.236447
XLY,1.147642,-0.035691,-0.084228


It makes sense that utilities (XLU) have the lowest market beta. Given the historic collapse in oil prices, it is also not surprising to see a high market beta for the energy sector (XLE). XLE also has a large positive exposure to the HML factor, which reflects the high book values of its capital-intensive businesses.