# Fama and French's 3-factor model

We've already been introduced to our friendly CAPM model:

$$R_{i,t} - Rf_t = \alpha + \beta(MKT_t) + \varepsilon_t$$

For reasons discussed in class and in the literature (see Graham-Dodd (1934), Banz (1981), Basu (1983) and Fama-French (1992)), we observe empirically that this model does not capture all the variation in stock returns, and thus may be misspecified as an asset pricing model. Specifically, two primary sources of variation have been identified:

1. The size effect - Smaller stocks tend to outperform larger stocks.
2. The value effect - Value stocks tend to outperform growth stocks. Value stocks are defined as those with a high book-to-market ratio (equivalently, a low price-to-book ratio), and growth stocks as those with a low book-to-market ratio (a high price-to-book ratio). Keep in mind that this ratio has nothing to do with the size or age of the company, only with how its book value compares to its market value. A market value much higher than the book value indicates that investors believe the company's value lies in the future, making it a "growth stock", whereas the reverse indicates that the company's value lies in the present, making it a "value stock".
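
To make the two ratios concrete, here is a minimal sketch using two made-up firms. The firm names and per-share figures are hypothetical and chosen only to show that book-to-market and price-to-book are reciprocals of each other.

```python
# Hypothetical per-share figures, chosen only for illustration.
firms = {
    "value_co": {"book_value": 40.0, "market_price": 32.0},
    "growth_co": {"book_value": 5.0, "market_price": 60.0},
}

ratios = {}
for name, f in firms.items():
    # Book-to-market and price-to-book are reciprocals of each other.
    book_to_market = f["book_value"] / f["market_price"]
    price_to_book = f["market_price"] / f["book_value"]
    ratios[name] = (book_to_market, price_to_book)
    print(f"{name}: B/M = {book_to_market:.2f}, P/B = {price_to_book:.2f}")
```

The firm with the higher book-to-market ratio is the one we would call the value stock; the one with the higher price-to-book ratio is the growth stock.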

Various explanations for these effects have been proposed, mostly centering on transaction costs when dealing with smaller stocks, so these effects don't necessarily mean markets are inefficient!

However, if these are sources of predictable variation in returns, then we should account for them in our models. To that end, Fama and French propose adding two factors to the original Fama-MacBeth (1973) exploration of how well asset returns can be explained in the cross-section. The result is the so-called Fama-French 3-factor model:

$$R_{i,t} - Rf_t = \alpha + \beta_M(MKT_t) + \beta_S (SMB_t) + \beta_V (HML_t) + \varepsilon_t$$

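where $SMB$ ("small minus big") is the return on small stocks minus the return on big stocks, capturing the stock's exposure to the size effect, and $HML$ ("high minus low") is the return on high book-to-market stocks minus the return on low book-to-market stocks, capturing its exposure to the value effect.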
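
As a rough illustration of what this equation says, the sketch below simulates factor returns, generates a stock's excess return from the 3-factor equation with known loadings, and then recovers those loadings by least squares. Everything here is an assumption made for the illustration: the factor means and volatilities, the "true" parameters, and the use of plain `numpy` instead of `statsmodels`.

```python
import numpy as np

rng = np.random.default_rng(0)
n_months = 600

# Simulated monthly factor realizations (made-up means and volatilities).
mkt = rng.normal(0.006, 0.045, n_months)
smb = rng.normal(0.002, 0.030, n_months)
hml = rng.normal(0.003, 0.030, n_months)

# Hypothetical "true" parameters for a large-cap growth stock.
alpha, beta_m, beta_s, beta_v = 0.001, 1.0, -0.5, -0.4
eps = rng.normal(0.0, 0.02, n_months)

# Excess return generated exactly as in the 3-factor equation.
excess_ret = alpha + beta_m * mkt + beta_s * smb + beta_v * hml + eps

# Estimate by ordinary least squares with an explicit intercept column.
X = np.column_stack([np.ones(n_months), mkt, smb, hml])
coef, *_ = np.linalg.lstsq(X, excess_ret, rcond=None)
print(dict(zip(["alpha", "beta_m", "beta_s", "beta_v"], coef.round(3))))
```

The estimated loadings should land close to the values used to generate the data, which is all the regression later in this notebook is doing with real returns.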

Let's explore this in the context of our previous example computing the beta for Microsoft. In this exercise, we'll do the same, but with the Fama-French 3-factor model instead of the original CAPM.

In [2]:
import pandas_datareader.data as reader
import pandas as pd
import datetime as dt
import statsmodels.api as sm

import yfinance as yf

## Getting the data
We'll set up the same way we did before. Let's define our time period (5 years/60 months) for the analysis, and download daily price data from Yahoo Finance for MSFT, along with a few other tickers.

Note: we'll use Ken French's data for the market factor, so we won't pull in the S&P 500 return here like we did last time.

In [3]:
end = dt.datetime.now()
start = dt.date(end.year - 5, end.month, end.day)
data_input = ['msft', 'x', 't', 'amzn', 'cat']

In [4]:
raw_df = yf.download(data_input, start=start, end=end)
raw_df.head()

[*********************100%%**********************]  5 of 5 completed


Price,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Close,Close,Close,Close,Close,...,Open,Open,Open,Open,Open,Volume,Volume,Volume,Volume,Volume
Ticker,AMZN,CAT,MSFT,T,X,AMZN,CAT,MSFT,T,X,...,AMZN,CAT,MSFT,T,X,AMZN,CAT,MSFT,T,X
Date
2019-05-06,97.527496,121.58551,121.984978,15.632223,16.02961,97.527496,136.759995,128.149994,23.104231,16.629999,...,95.899002,134.779999,126.389999,23.006042,16.549999,108356000,4866500,24239800,33340968,21709100
2019-05-07,96.050003,118.838364,119.481506,15.60156,15.817552,96.050003,133.669998,125.519997,23.058912,16.41,...,96.999496,135.199997,126.459999,23.051359,16.450001,118042000,5622900,36017700,34121069,16487700
2019-05-08,95.888496,117.309212,119.472015,15.484026,14.844015,95.888496,131.949997,125.510002,22.885197,15.4,...,95.943497,133.020004,125.440002,22.953173,15.52,81572000,4140500,28419000,33120125,20006600
2019-05-09,94.9935,116.642395,119.462479,15.524908,15.229576,94.9935,131.199997,125.5,22.94562,15.8,...,95.0,130.220001,124.290001,22.862537,15.27,106166000,5906200,27235800,35851802,14328700
2019-05-10,94.499001,116.766891,121.014069,15.64755,15.14255,94.499001,131.339996,127.129997,23.126888,15.66,...,94.900002,130.529999,124.910004,22.847431,15.8,114360000,5240600,30915100,29145609,11507300


In [5]:
df = raw_df['Adj Close']
df

Date,AMZN,CAT,MSFT,T,X
2019-05-06,97.527496,121.585510,121.984978,15.632223,16.029610
2019-05-07,96.050003,118.838364,119.481506,15.601560,15.817552
2019-05-08,95.888496,117.309212,119.472015,15.484026,14.844015
2019-05-09,94.993500,116.642395,119.462479,15.524908,15.229576
2019-05-10,94.499001,116.766891,121.014069,15.647550,15.142550
...,...,...,...,...,...
2024-04-30,175.000000,334.570007,389.329987,16.889999,36.500000
2024-05-01,179.000000,331.070007,394.940002,16.920000,36.980000
2024-05-02,184.720001,335.440002,397.839996,16.820000,37.049999
2024-05-03,186.210007,336.750000,406.660004,16.850000,36.470001


Next let's calculate monthly returns from daily price data like we did before.

In [6]:
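# Lowercase the ticker column names and save the raw prices.
df.columns = df.columns.str.lower()
df.to_csv('prices.csv')

# Monthly MSFT returns: last price of each month, then percent change.
# ('M' is spelled 'ME' in pandas 2.2 and later.)
m_ret = df['msft'].resample('M').ffill().pct_change().dropna()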
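
The daily-to-monthly conversion can be sketched on synthetic prices, so that it runs without a download. The price series below is randomly generated, not real market data. The grouping by `to_period("M")` is one of several ways to take month-end prices and is used here only because it behaves the same across pandas versions.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices on business days (not real market data).
dates = pd.bdate_range("2023-01-02", "2023-04-28")
rng = np.random.default_rng(1)
prices = pd.Series(100 * np.cumprod(1 + rng.normal(0, 0.01, len(dates))),
                   index=dates, name="price")

# Last available price in each calendar month.
month_end = prices.groupby(prices.index.to_period("M")).last()

# Simple monthly return: P_t / P_{t-1} - 1. The first month has no prior
# month-end price, so its return is NaN and is dropped.
monthly_ret = month_end.pct_change().dropna()
print(monthly_ret)
```

Four months of prices yield only three monthly returns, which is one reason the returns and the factor data need to be lined up by hand later on.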

## Getting the fama-french factors

Fortunately for us, Ken French's website calculates the factors for us so we don't have to do all the work necessary to construct these extra factors. This saves us a lot of time! Let's pull in the data. 

In [84]:
ff_data = reader.DataReader('F-F_Research_Data_Factors', 'famafrench', start, end)
ff_data[0].head()

Date,Mkt-RF,SMB,HML,RF
2019-04,3.97,-1.74,2.15,0.21
2019-05,-6.94,-1.32,-2.37,0.21
2019-06,6.93,0.29,-0.71,0.18
2019-07,1.19,-1.93,0.48,0.19
2019-08,-2.58,-2.38,-4.78,0.16


French's factor data is in percent, while we calculated our returns in decimal, so let's divide the Fama-French data by 100 to correct for this.

Note: The data comes in dictionary form. The key `0` contains the monthly data, which is what we want, and `1` contains the annual data which we can ignore. 

In [85]:
m_factors = ff_data[0]/100
m_factors.columns = m_factors.columns.str.lower()
m_factors.head()

Date,mkt-rf,smb,hml,rf
2019-04,0.0397,-0.0174,0.0215,0.0021
2019-05,-0.0694,-0.0132,-0.0237,0.0021
2019-06,0.0693,0.0029,-0.0071,0.0018
2019-07,0.0119,-0.0193,0.0048,0.0019
2019-08,-0.0258,-0.0238,-0.0478,0.0016


As before, the two datasets don't line up exactly, both because of how we calculated returns and because French's factor data is not as up to date as our returns. Let's start by removing the first row of the factor data.

In [70]:
#cut irrelevant data
m_factors = m_factors[1:]
m_factors.head()

Date,mkt-rf,smb,hml,rf
2019-05,-0.0694,-0.0132,-0.0237,0.0021
2019-06,0.0693,0.0029,-0.0071,0.0018
2019-07,0.0119,-0.0193,0.0048,0.0019
2019-08,-0.0258,-0.0238,-0.0478,0.0016
2019-09,0.0143,-0.0096,0.0675,0.0018


Then let's remove the last two rows of the returns data.

In [72]:
m_ret = m_ret[:-2]
m_ret

Date
2019-05-31   -0.049481
2019-06-30    0.083118
2019-07-31    0.017244
2019-08-31    0.015037
2019-09-30    0.008487
2019-10-31    0.031216
2019-11-30    0.059463
2019-12-31    0.041749
2020-01-31    0.079454
2020-02-29   -0.045688
2020-03-31   -0.026541
2020-04-30    0.136326
2020-05-31    0.025391
2020-06-30    0.110559
2020-07-31    0.007371
2020-08-31    0.102752
2020-09-30   -0.067397
2020-10-31   -0.037370
2020-11-30    0.060061
2020-12-31    0.039006
2021-01-31    0.042892
2021-02-28    0.004118
2021-03-31    0.014588
2021-04-30    0.069602
2021-05-31   -0.007627
2021-06-30    0.084989
2021-07-31    0.051716
2021-08-31    0.061591
2021-09-30   -0.066119
2021-10-31    0.176291
2021-11-30   -0.001282
2021-12-31    0.017333
2022-01-31   -0.075345
2022-02-28   -0.037212
2022-03-31    0.031862
2022-04-30   -0.099867
2022-05-31   -0.018077
2022-06-30   -0.055320
2022-07-31    0.093097
2022-08-31   -0.066663
2022-09-30   -0.109267
2022-10-31   -0.003306
2022-11-30    0.102223
2022-1

Check the sizes of these two objects to make sure we have the same number of rows.

In [73]:
print(m_ret.shape)
print(m_factors.shape)

(58,)
(58, 4)


Finally, let's add these to our dataframe and construct our dependent variable `msft-rf`.

In [77]:
m_factors['msft'] = m_ret.values
m_factors['msft-rf'] = m_factors['msft'] - m_factors['rf']
m_factors.head()

Date,mkt-rf,smb,hml,rf,msft,msft-rf
2019-05,-0.0694,-0.0132,-0.0237,0.0021,-0.049481,-0.051581
2019-06,0.0693,0.0029,-0.0071,0.0018,0.083118,0.081318
2019-07,0.0119,-0.0193,0.0048,0.0019,0.017244,0.015344
2019-08,-0.0258,-0.0238,-0.0478,0.0016,0.015037,0.013437
2019-09,0.0143,-0.0096,0.0675,0.0018,0.008487,0.006687
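
Trimming rows by position and assigning with `.values` works only if the two objects really do cover the same months in the same order. A possible alternative, sketched below on made-up numbers, is to align on the index instead. It assumes the factor table is indexed by monthly periods (as the `2019-05`-style labels above suggest) and that the returns carry month-end timestamps.

```python
import pandas as pd

# Synthetic monthly returns stamped at month end (as produced by resampling).
ret = pd.Series(
    [0.02, -0.01, 0.03, 0.01],
    index=pd.to_datetime(["2023-02-28", "2023-03-31", "2023-04-30", "2023-05-31"]),
    name="msft",
)

# Synthetic factor table indexed by monthly period, covering a different span.
factors = pd.DataFrame(
    {"mkt-rf": [0.010, 0.020, -0.010, 0.015], "rf": [0.003, 0.003, 0.004, 0.004]},
    index=pd.period_range("2023-01", periods=4, freq="M"),
)

# Put both on the same monthly index, then keep only months present in both.
ret.index = ret.index.to_period("M")
merged = factors.join(ret, how="inner")
merged["msft-rf"] = merged["msft"] - merged["rf"]
print(merged)
```

With an inner join, months that appear in only one of the two sources drop out automatically, so no manual row counting is needed.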


## Regression

Here, let's carry out the regression similar to how we did for our standard CAPM model. In this case we'll want to include all three factors as covariates. Remember to add your constant here since we're using statsmodels. 

In [78]:
#Regression setup

y = m_factors['msft-rf']
X = m_factors[['mkt-rf', 'smb', 'hml']]
X_sm = sm.add_constant(X)

In [79]:
ff_3factor_model = sm.OLS(y, X_sm)

In [80]:
res_ff = ff_3factor_model.fit()

In [81]:
res_ff.summary()

Dep. Variable:,msft-rf,R-squared:,0.737
Model:,OLS,Adj. R-squared:,0.722
Method:,Least Squares,F-statistic:,50.43
Date:,"Thu, 25 Apr 2024",Prob (F-statistic):,1.13e-15
Time:,14:31:08,Log-Likelihood:,117.06
No. Observations:,58,AIC:,-226.1
Df Residuals:,54,BIC:,-217.9
Df Model:,3,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
const,0.0106,0.004,2.359,0.022,0.002,0.020
mkt-rf,0.9424,0.083,11.286,0.000,0.775,1.110
smb,-0.5538,0.157,-3.526,0.001,-0.869,-0.239
hml,-0.4856,0.093,-5.231,0.000,-0.672,-0.300

Omnibus:,7.179,Durbin-Watson:,2.13
Prob(Omnibus):,0.028,Jarque-Bera (JB):,6.742
Skew:,0.603,Prob(JB):,0.0344
Kurtosis:,4.155,Cond. No.,36.6


In [82]:
res_ff.params

const     0.010569
mkt-rf    0.942367
smb      -0.553819
hml      -0.485627
dtype: float64
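
To see what these coefficients mean for a single month, the sketch below plugs the 2019-05 factor values from the table above into the fitted equation and compares the result with MSFT's actual excess return that month. The numbers are copied by hand from the outputs in this notebook, so treat it as a worked illustration rather than part of the analysis.

```python
# Coefficients copied from res_ff.params above.
params = {"const": 0.010569, "mkt-rf": 0.942367, "smb": -0.553819, "hml": -0.485627}

# Factor realizations and MSFT excess return for 2019-05, from the table above.
factors = {"mkt-rf": -0.0694, "smb": -0.0132, "hml": -0.0237}
actual_excess = -0.051581

# Each factor's contribution is its loading times the factor's realization.
contributions = {name: params[name] * value for name, value in factors.items()}
fitted = params["const"] + sum(contributions.values())
residual = actual_excess - fitted

for name, c in contributions.items():
    print(f"{name:>7}: {c:+.4f}")
print(f"  const: {params['const']:+.4f}")
print(f" fitted: {fitted:+.4f}")
print(f"  resid: {residual:+.4f}")
```

Because both `smb` and `hml` were negative that month and MSFT's loadings on them are negative, those two factors contribute positively to the fitted return, partly offsetting the negative market contribution.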

### Discussion

Great! Our beta from this regression (0.94) is very close to the original beta we calculated for this stock (0.96), so that should give us confidence that nothing has wildly changed. Taken at face value, a market beta just below 1 indicates that this stock is slightly less sensitive to market movements than the market as a whole.
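
One caveat on reading the beta this way: the summary table tests each coefficient against zero, not against one. A quick check against one, using only the figures reported in the summary, might look like the following sketch.

```python
# Values copied from the regression summary above.
beta_hat = 0.9424
std_err = 0.083
ci_low, ci_high = 0.775, 1.110

# t-statistic for the null hypothesis beta = 1 (rather than beta = 0).
t_vs_one = (beta_hat - 1.0) / std_err
print(f"t-statistic against beta = 1: {t_vs_one:.2f}")

# The reported 95% interval contains 1, so the same conclusion follows.
print("1 inside 95% CI:", ci_low <= 1.0 <= ci_high)
```

The t-statistic is well inside the usual critical values, so with 58 monthly observations we cannot statistically distinguish MSFT's market beta from 1.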

The other factor loadings are negative. The coefficient on `smb` is -0.55, indicating that MSFT has a negative exposure to the size factor: its returns move like those of large stocks rather than small stocks. This checks out - Microsoft is a very large-cap stock, so this shouldn't surprise us.

The coefficient on `hml` is also negative (-0.49), indicating that MSFT has a negative exposure to the value factor: its returns move like those of growth stocks rather than value stocks. As of April 23, 2024, MSFT had a P/B ratio over 12, while the average S&P 500 P/B is around 2.5. This suggests that despite its huge size, MSFT is still viewed by the market as a growth stock, so the negative coefficient on `hml` should make sense to us.