# Fama-MacBeth two-step regression

The Fama-MacBeth two-step method is a regression procedure that estimates factor betas and risk premia from a multi-asset time series dataset, that is, a panel of many assets observed over many periods.

The method has two steps:

First, for each asset in the set of portfolios, regress its returns over time on the risk factors to estimate its betas.

Second, for each time period, regress the cross-section of asset returns on the estimated betas to estimate that period's risk premia.
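
In symbols, the two passes can be sketched as follows. The notation is assumed here rather than taken from the text: $R_{i,t}$ is the return of asset $i$ in period $t$, $R_{f,t}$ the risk-free rate, $f_t$ the vector of factor realizations, $\beta_i$ the loadings of asset $i$, and $\lambda_t$ the vector of risk premia in period $t$. The second pass is written without an intercept, which is one possible modelling choice; many treatments include one.

```latex
% First pass: one time-series regression per asset i = 1, ..., N
R_{i,t} - R_{f,t} = \alpha_i + \beta_i^{\top} f_t + \varepsilon_{i,t},
\qquad t = 1, \dots, T

% Second pass: one cross-sectional regression per period t = 1, ..., T
R_{i,t} - R_{f,t} = \hat{\beta}_i^{\top} \lambda_t + \eta_{i,t},
\qquad i = 1, \dots, N

% Risk premium estimate and its t-statistic from the series of \hat{\lambda}_t
\hat{\lambda} = \frac{1}{T} \sum_{t=1}^{T} \hat{\lambda}_t,
\qquad
t(\hat{\lambda}) = \frac{\hat{\lambda}}{s(\hat{\lambda}_t) / \sqrt{T}}
```

Here $s(\hat{\lambda}_t)$ denotes the sample standard deviation of the period-by-period estimates.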

## Prepare data

Download the data from Kenneth French's data library:

http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html

In [96]:
import warnings
import pandas as pd
import seaborn as sns
warnings.filterwarnings('ignore')
import pandas_datareader.data as web
import matplotlib.pyplot as plt


# para is the dataset name, i.e. the zip file name on the website without the _TXT.zip suffix.
# E.g. http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Portfolios_Formed_on_BETA_TXT.zip
# corresponds to 'Portfolios_Formed_on_BETA'
para = '10_Industry_Portfolios'

# DataReader returns a dict of tables; uncomment the two lines below to list its keys
# df = web.DataReader(para, 'famafrench', start='2010', end='2022-12')
# df.keys()

# The trailing [0] picks one table from the returned dict; the available tables
# are listed in the txt file downloaded from the website
df = web.DataReader(para, 'famafrench', start='2010', end='2022-12')[0]

df.head(10)


Date,NoDur,Durbl,Manuf,Enrgy,HiTec,Telcm,Shops,Hlth,Utils,Other
2010-01,-2.4,-0.43,-3.32,-4.84,-7.84,-6.72,-1.84,-0.02,-4.44,-0.54
2010-02,2.66,7.39,5.21,2.61,4.81,2.76,4.3,0.38,-0.42,3.64
2010-03,5.89,9.35,6.52,3.23,6.69,7.63,6.26,3.63,3.12,8.1
2010-04,-1.02,7.26,3.19,4.05,2.28,3.41,2.55,-2.18,2.85,1.83
2010-05,-5.73,-9.06,-8.42,-10.27,-7.78,-5.86,-5.36,-8.06,-6.29,-8.76
2010-06,-1.9,-10.67,-5.8,-6.36,-6.17,-3.84,-9.06,-1.67,-0.7,-6.52
2010-07,7.36,15.72,9.66,7.69,7.31,9.53,4.61,2.14,6.81,6.81
2010-08,-1.17,-9.67,-5.49,-3.43,-6.53,-1.99,-3.97,-1.66,0.37,-7.1
2010-09,6.14,13.52,10.09,9.21,12.56,7.87,12.15,9.01,3.65,8.51
2010-10,4.34,8.49,5.12,4.59,6.41,4.57,2.42,1.98,1.86,2.03


Because the Fama-MacBeth model estimates risk premia, the regressions should use excess returns. We therefore subtract the risk-free rate (RF) from each portfolio return before fitting anything.

In [36]:
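# Use pandas_datareader again to get the risk-free rate (RF) from the factor file
RF_t = web.DataReader('F-F_Research_Data_Factors', 'famafrench', start='2010', end='2022-12')[0]
RF_t.info()

# Subtract RF from every portfolio, month by month (axis=0 aligns on the date index)
df_up = df.sub(RF_t['RF'], axis=0)
df_up.head(10)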

<class 'pandas.core.frame.DataFrame'>
PeriodIndex: 156 entries, 2010-01 to 2022-12
Freq: M
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Mkt-RF  156 non-null    float64
 1   SMB     156 non-null    float64
 2   HML     156 non-null    float64
 3   RF      156 non-null    float64
dtypes: float64(4)
memory usage: 6.1 KB


Date,NoDur,Durbl,Manuf,Enrgy,HiTec,Telcm,Shops,Hlth,Utils,Other
2010-01,-2.4,-0.43,-3.32,-4.84,-7.84,-6.72,-1.84,-0.02,-4.44,-0.54
2010-02,2.66,7.39,5.21,2.61,4.81,2.76,4.3,0.38,-0.42,3.64
2010-03,5.88,9.34,6.51,3.22,6.68,7.62,6.25,3.62,3.11,8.09
2010-04,-1.03,7.25,3.18,4.04,2.27,3.4,2.54,-2.19,2.84,1.82
2010-05,-5.74,-9.07,-8.43,-10.28,-7.79,-5.87,-5.37,-8.07,-6.3,-8.77
2010-06,-1.91,-10.68,-5.81,-6.37,-6.18,-3.85,-9.07,-1.68,-0.71,-6.53
2010-07,7.35,15.71,9.65,7.68,7.3,9.52,4.6,2.13,6.8,6.8
2010-08,-1.18,-9.68,-5.5,-3.44,-6.54,-2.0,-3.98,-1.67,0.36,-7.11
2010-09,6.13,13.51,10.08,9.2,12.55,7.86,12.14,9.0,3.64,8.5
2010-10,4.33,8.48,5.11,4.58,6.4,4.56,2.41,1.97,1.85,2.02
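
The row-wise alignment behind `df.sub(..., axis=0)` can be checked on a toy example. The sketch below uses made-up numbers and hypothetical portfolio names `A` and `B`; only `DataFrame.sub` itself is taken from the cell above.

```python
import pandas as pd

# Toy monthly returns (percent) for two hypothetical portfolios
idx = pd.period_range("2020-01", periods=3, freq="M")
returns = pd.DataFrame({"A": [1.0, 2.0, -1.0], "B": [0.5, 0.0, 3.0]}, index=idx)

# Toy risk-free rate on the same monthly index
rf = pd.Series([0.1, 0.2, 0.1], index=idx, name="RF")

# axis=0 aligns rf with the row index, so each month's rate is
# subtracted from every column in that month
excess = returns.sub(rf, axis=0)
print(excess)
```

Without `axis=0`, pandas would try to match the index of the Series against the column names instead, which is not what is wanted here.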


In [91]:
# Step 1: regress each industry's excess return on the Fama-French factors
import statsmodels.api as sm

# Keep only the factor columns (Mkt-RF, SMB, HML); RF is not a regressor
factors = RF_t.drop('RF', axis=1)
X = sm.add_constant(factors)  # add an intercept (alpha) to each regression

betas = []
for i in df_up:
    mod = sm.OLS(df_up[i], X)
    results = mod.fit()
    # store the factor loadings only, dropping the intercept
    betas.append(results.params.drop('const'))

betas = pd.DataFrame(betas, index=df_up.columns)
betas

Portfolio,Mkt-RF,SMB,HML
NoDur,0.717089,-0.366835,0.103122
Durbl,1.558786,0.610941,-0.280055
Manuf,1.02903,0.056283,0.214019
Enrgy,1.117796,0.441106,1.160358
HiTec,1.132298,-0.12951,-0.320906
Telcm,0.842254,-0.162644,0.16235
Shops,0.943736,-0.048221,-0.254052
Hlth,0.746389,0.086715,-0.243809
Utils,0.536666,-0.329075,0.094283
Other,1.054875,0.097247,0.388098
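
As a sanity check on the first step, the same time-series regressions can be run for all assets at once with least squares on simulated data, where the true loadings are known. This is a sketch under stated assumptions: the sizes, the random data and all variable names are made up for illustration, and `np.linalg.lstsq` stands in for the per-asset `sm.OLS` loop.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, K = 500, 4, 2                        # periods, assets, factors (toy sizes)

factors = rng.normal(size=(T, K))          # simulated factor realizations
true_betas = rng.normal(size=(N, K))       # loadings we try to recover
noise = 0.1 * rng.normal(size=(T, N))
excess_returns = factors @ true_betas.T + noise

# Design matrix with a constant column; lstsq solves all N regressions at once
X = np.column_stack([np.ones(T), factors])
coefs, *_ = np.linalg.lstsq(X, excess_returns, rcond=None)

est_betas = coefs[1:].T                    # drop the intercept row, shape (N, K)
print(np.round(est_betas - true_betas, 3)) # estimation errors, should be small
```

Each column of `excess_returns` is treated as a separate dependent variable, so the result matches running one regression per asset.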


Now we move on to the second step: for each period, we regress the cross-section of portfolio excess returns on the factor loadings estimated in the first step.

In [92]:
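lambdas = []
for period in df_up.index:
    # cross-section of excess returns in this period, ordered like the beta table
    y = df_up.loc[period, betas.index]
    # regress on the betas; no constant is added, so the fit has no intercept
    mod = sm.OLS(y, betas)
    results = mod.fit()
    # the slope coefficients are this period's risk premia
    lambdas.append(results.params)
lambdas = pd.DataFrame(lambdas, index=df_up.index)
lambdas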

Date,Mkt-RF,SMB,HML
2010-01,-3.391150,6.452294,-2.124315
2010-02,3.705408,1.535388,-1.395521
2010-03,6.490897,-2.818789,-1.146500
2010-04,2.488796,2.860961,0.303521
2010-05,-7.485347,2.055054,-2.293479
...,...,...,...
2022-08,-3.486683,0.382535,5.194049
2022-09,-9.477782,12.769926,-4.078550
2022-10,6.641009,-11.374778,16.738386
2022-11,3.348409,-14.324088,3.377540
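
Since every period uses the same matrix of betas as regressors, the period-by-period loop can also be collapsed into a single least-squares call. The sketch below illustrates this on random stand-in data; the array names and sizes are assumptions for illustration and it is not the notebook's own code.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, K = 6, 5, 2                          # toy sizes
betas = rng.normal(size=(N, K))            # stand-in for first-pass loadings
excess_returns = rng.normal(size=(T, N))   # stand-in for excess returns

# Period-by-period OLS without intercept: one regression per row of returns
lambdas_loop = np.array([
    np.linalg.lstsq(betas, excess_returns[t], rcond=None)[0] for t in range(T)
])

# Same estimates in one call: each column of excess_returns.T is one period
lambdas_all, *_ = np.linalg.lstsq(betas, excess_returns.T, rcond=None)
lambdas_all = lambdas_all.T                # shape (T, K)

print(np.allclose(lambdas_loop, lambdas_all))
```

The loop is easier to read and to extend (for example with an intercept or time-varying betas), while the single call is convenient when the betas are fixed over the whole sample.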


In [99]:
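# Ratio of each mean risk premium to its time-series standard deviation.
# Note: this is not yet a t-statistic. The Fama-MacBeth t-statistic divides the mean
# by the standard error std / sqrt(T), i.e. multiplies this ratio by sqrt(len(lambdas)).
t = lambdas.mean().div(lambdas.std())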
t

Mkt-RF    0.236018
SMB      -0.076175
HML      -0.038717
dtype: float64
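
The usual Fama-MacBeth t-statistic divides the mean premium by its standard error, which is the time-series standard deviation scaled by the square root of the number of periods. A minimal sketch on simulated premia follows; the factor names `F1` and `F2` and the data are hypothetical. It assumes the period estimates are uncorrelated over time and ignores the estimation error in the first-step betas.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
T = 120
# Simulated period-by-period premia for two hypothetical factors
lambdas_demo = pd.DataFrame({
    "F1": 0.5 + rng.normal(size=T),
    "F2": rng.normal(size=T),
})

mean = lambdas_demo.mean()
std_err = lambdas_demo.std(ddof=1) / np.sqrt(T)   # standard error of the mean
t_stats = mean / std_err
print(t_stats)
```

Applied to a table of estimated premia such as `lambdas`, the same expression would be `lambdas.mean() / (lambdas.std() / np.sqrt(len(lambdas)))`.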