# Exercises: Fama-French Factor Models and Arbitrage Pricing Theory

This exercise investigates the Fama-French [three-factor](https://doi.org/10.1016/0304-405X(93)90023-5) and [five-factor](https://doi.org/10.1016/j.jfineco.2014.10.010) models and their ability to explain the (excess) returns of [five industry portfolios](https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/Data_Library/det_5_ind_port.html).

In [1]:
# Required libraries
import numpy as np
import scipy as sp
import pandas as pd

## Data Preparation

Data on the five industry returns and the Fama-French factors were retrieved from Kenneth R. French's [online data library](https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html#Research). The data library also provides explanatory notes on the construction of the factors and the industry portfolios. The raw data files used here contain the industry returns and the five Fama-French factors; the factor file also includes the time series of monthly risk-free rates.

To estimate the Fama-French factor models, we convert the industry returns to excess returns by deducting the risk-free rate. Furthermore, we hold out the final 12 months as an out-of-sample period for testing the goodness of fit of the factor models.
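The two preparation steps can be sketched on a tiny synthetic data set. The column names `IndA` and `IndB`, the dates, and all values below are made up for illustration and are not taken from the data library.

```python
import pandas as pd

# Hypothetical monthly returns (in percent) for two made-up portfolios
idx = [202301, 202302, 202303, 202304]
returns = pd.DataFrame({"IndA": [1.0, -0.5, 2.0, 0.3],
                        "IndB": [0.4, 0.1, -1.2, 0.8]}, index=idx)
rf_demo = pd.Series([0.10, 0.10, 0.20, 0.20], index=idx, name="RF")

# Subtract the risk-free rate row by row (aligned on the date index)
excess = returns.sub(rf_demo, axis=0)

# Hold out the last two months; keep the rest for estimation
holdout = excess.tail(2)
in_sample = excess.drop(index=holdout.index)

print(in_sample)
print(holdout)
```

Passing `axis=0` makes `sub` match the series to the rows by index label, so every column is reduced by the risk-free rate of the same month.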

In [2]:
# Import monthly industry returns and factor time series
ind_returns = pd.read_csv("FamaFrenchIndustryPortfolios.csv", index_col = 0)
ff_factors_raw = pd.read_csv("FamaFrench5Factors.csv", index_col = 0)

# Extract risk-free rate time series
rf = ff_factors_raw['RF']

In [3]:
# Deduct risk-free rate from industry returns
ind_exc_returns = ind_returns.sub(rf, axis = 0)

In [4]:
# Drop risk-free rate from the list of factors
ff_factors = ff_factors_raw.drop(columns = ['RF'])

# Define data frame for three factors
ff_3factors = ff_factors_raw.drop(columns = ['RMW', 'CMA', 'RF'])

In [5]:
# Extract the final 12 observations (equivalent to one year) for out-of-sample testing
oos_length = 12
ind_exc_returns_oos = ind_exc_returns.tail(oos_length)
ff_factors_oos = ff_factors.tail(oos_length)
ff_3factors_oos = ff_3factors.tail(oos_length)
rf_oos = rf.tail(oos_length)
oos_index = ind_exc_returns_oos.index

In [6]:
# Remove the out-of-sample test data; the remaining in-sample data are used for estimation
ind_exc_returns = ind_exc_returns.drop(index = oos_index)
ff_factors = ff_factors.drop(index = oos_index)
ff_3factors = ff_3factors.drop(index = oos_index)
rf = rf.drop(index = oos_index)

# Number of risky assets/portfolios and factors
n_assets = len(ind_exc_returns.columns)
n_factors = len(ff_factors.columns)

## Estimation of the Three- and Five-Factor Models

The Fama-French five-factor model is specified as follows (using the notation in the lecture slides):

$$
R(t) - r_0(t) \mathbf{1}_d = A + b_1 (R_\mu(t) - r_0(t)) + b_2 \mathrm{SMB}(t) + b_3 \mathrm{HML}(t) + b_4 \mathrm{RMW}(t) + b_5 \mathrm{CMA}(t) + \epsilon(t),
$$

where $R(t)$ is the vector of returns of $d$ risky assets (in this exercise, the returns of the $d = 5$ industry portfolios), $r_0(t)$ is the risk-free rate, and $R_\mu(t)$ is the return of the market portfolio. The definition of each factor is stated in the lecture slides and is discussed in detail in French's online data library (see also the paper on the five-factor model). The three-factor model contains only the market excess return, SMB, and HML as factors.

In the equation above, $A, b_1, \dots, b_5 \in \mathbb{R}^d$ are model parameters to be estimated using a time series regression approach.

More compactly, we have

$$
R^E(t) = A + B F(t) + \epsilon(t),
$$

where $R^E(t)$ is the vector of excess returns of the $d$ risky assets,

$$
F(t) = \left[\begin{array}{c}
    R_\mu(t) - r_0(t) \\ \mathrm{SMB}(t) \\ \mathrm{HML}(t) \\ \mathrm{RMW}(t) \\ \mathrm{CMA}(t) 
\end{array}\right]
$$

and $A \in \mathbb{R}^d$ and $B \in \mathbb{R}^{d\times 5}$ are model parameters to be estimated. These parameters are estimated using Proposition 2.6 in the lecture slides,

$$
\hat{B} = \mathsf{Cov}(R^E, F) \Sigma_F^{-1}, \quad \hat{A} = \mathsf{E}[R^E] - \hat{B} \mathsf{E}[F],
$$

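where $R^E$ and $F$ are random vectors, and the time series $\{R^E(t)\}_{t=1,\dots,T} = \{R(t) - r_0(t) \mathbf{1}_d\}_{t = 1,\dots,T}$ and $\{F(t)\}_{t=1,\dots,T}$ are assumed to be realizations of them. In the computations below, the expectations and covariances are replaced by their sample counterparts.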
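As a sanity check on these formulas, the sketch below simulates a small factor model and compares the moment-based estimates with an ordinary least squares regression that includes an intercept. With sample moments the two should agree up to floating-point error. The dimensions, seed, and parameter values are arbitrary assumptions, not the exercise data.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, k = 500, 3, 2                      # sample size, assets, factors (arbitrary)

# Simulate factors and excess returns from a known linear factor model
F = rng.normal(size=(T, k))
A_true = np.array([0.1, -0.2, 0.3])
B_true = rng.normal(size=(d, k))
R = A_true + F @ B_true.T + 0.5 * rng.normal(size=(T, d))

# Moment-based estimator: B = Cov(R, F) Sigma_F^{-1}, A = E[R] - B E[F]
cov_full = np.cov(np.hstack([R, F]), rowvar=False)
B_hat = cov_full[:d, d:] @ np.linalg.inv(cov_full[d:, d:])
A_hat = R.mean(axis=0) - B_hat @ F.mean(axis=0)

# OLS with an intercept: regress each asset's excess return on [1, F]
X = np.column_stack([np.ones(T), F])
coef, *_ = np.linalg.lstsq(X, R, rcond=None)   # shape (k + 1, d)

print(np.allclose(A_hat, coef[0]), np.allclose(B_hat, coef[1:].T))
```

The first row of `coef` holds the intercepts and the remaining rows hold the slopes, so the transpose of `coef[1:]` is laid out like $\hat{B}$.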

### Five-Factor Model

In [7]:
# Covariance of concatenated data frame of excess returns and factors
cov_full_5f = np.cov(pd.concat([ind_exc_returns, ff_factors], axis = 1), rowvar = False)

# Alternatively (output as a data frame):
# cov_full_5f = pd.concat([ind_exc_returns, ff_factors], axis = 1).cov()

# Covariance of excess returns and factors ("cross covariances")
cov_cross_5f = cov_full_5f[:n_assets, n_assets:]

In [8]:
# Compute estimate for B in 5-factor model
B_5f = cov_cross_5f @ np.linalg.inv(ff_factors.cov())

# Compute estimate for A in 5-factor model
A_5f = (ind_exc_returns.mean(axis = 0) - B_5f @ ff_factors.mean(axis = 0)).to_numpy()

In [9]:
print('Factor exposures of each risky asset:')
print(B_5f)

print(' ')

print('Factor model coefficients:')
print(A_5f)

Factor exposures of each risky asset:
[[ 0.9718423   0.11991798 -0.02258049  0.40947172  0.14779121]
 [ 0.97225521  0.0016812   0.14457329  0.24681754  0.23940974]
 [ 0.9948167  -0.06405975 -0.22335295 -0.38082206 -0.26576314]
 [ 0.86726962 -0.107509   -0.41178275  0.29186355  0.33958885]
 [ 1.12033669  0.0931903   0.47013451  0.11039794 -0.21269677]]
 
Factor model coefficients:
[-0.08378917 -0.169431    0.31722221  0.1578836  -0.18210989]


### Three-Factor Model

In [10]:
# Covariance of concatenated data frame of excess returns and factors
cov_full_3f = np.cov(pd.concat([ind_exc_returns, ff_3factors], axis = 1), rowvar = False)

# Covariance of excess returns and factors ("cross covariances")
cov_cross_3f = cov_full_3f[:n_assets, n_assets:]

# Compute estimate for B in 3-factor model
B_3f = cov_cross_3f @ np.linalg.inv(ff_3factors.cov())

# Compute estimate for A in 3-factor model
A_3f = (ind_exc_returns.mean(axis = 0) - B_3f @ ff_3factors.mean(axis = 0)).to_numpy()

print('Factor exposures of each risky asset:')
print(B_3f)

print(' ')

print('Factor model coefficients:')
print(A_3f)

Factor exposures of each risky asset:
[[ 0.94046746  0.01875815  0.06290734]
 [ 0.93782762 -0.06139923  0.2628625 ]
 [ 1.03724003  0.0318183  -0.35984307]
 [ 0.82072866 -0.18289274 -0.24681983]
 [ 1.13794375  0.06945162  0.38102851]]
 
Factor model coefficients:
[ 0.08835634 -0.03486155  0.13082707  0.32858805 -0.1874495 ]


## Out-of-Sample Performance

Using either the three- or five-factor model, we estimate the excess returns for the out-of-sample data using the parameter estimates we obtained above and the out-of-sample factor data. Let $\{R^{E}_{OS}(t)\}_{t=1,\dots,T_{OS}}$ and $\{F_{OS}(t)\}_{t=1,\dots,T_{OS}}$ denote the out-of-sample excess returns and factor time series, where $T_{OS}$ is the out-of-sample size. For each $t = 1,\dots,T_{OS}$, we compute the excess return estimated by the factor model

$$
\hat{R}^{E}_{OS}(t) = \hat{A} + \hat{B} F_{OS}(t).
$$

Model accuracy is then assessed with error metrics such as the root mean squared error (RMSE), computed separately for each portfolio $i = 1,\dots,d$:

$$
\mathrm{RMSE}_i = \sqrt{\frac{1}{T_{OS}} \sum_{t = 1}^{T_{OS}} \left(\hat{R}^{E}_{OS,i}(t) - R^{E}_{OS,i}(t)\right)^2}.
$$
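A small worked example may help fix the convention that the RMSE is averaged over time and reported once per portfolio. The numbers below are hypothetical and chosen so the result is easy to check by hand.

```python
import numpy as np

# Hypothetical predicted and realized excess returns: 3 periods x 2 assets
predicted = np.array([[1.0, 0.0],
                      [2.0, 1.0],
                      [0.0, -1.0]])
realized = np.array([[0.0, 0.0],
                     [2.0, 3.0],
                     [2.0, -1.0]])

# Average the squared errors over time (axis 0), giving one RMSE per asset
rmse = np.sqrt(np.mean((predicted - realized) ** 2, axis=0))
print(rmse)
```

For the first asset the errors are 1, 0, and -2, so the RMSE is $\sqrt{5/3}$; for the second asset the errors are 0, -2, and 0, giving $\sqrt{4/3}$.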

### Five-Factor Model

In [11]:
# Calculate estimated excess returns from the model
A_5f.shape = (len(A_5f), 1)                                         # Specify that A_5f is a column vector
ind_exc_returns_5f = B_5f @ ff_factors_oos.to_numpy().T + A_5f      # Industry excess returns under 5-factor model

In [12]:
# ind_exc_returns_5f.T

In [13]:
# ind_exc_returns_oos         # Industry excess returns from OOS data set

In [14]:
# B_5f

In [15]:
# B_5f @ ff_factors_oos.to_numpy().T

In [16]:
# Calculate RMSE for each portfolio
RMSE_5f = np.sqrt(np.mean((ind_exc_returns_5f.T - ind_exc_returns_oos.to_numpy()) ** 2, axis = 0))
RMSE_5f

array([2.42685554, 1.8528015 , 1.67322254, 3.06471593, 1.49201908])

### Three-Factor Model

In [17]:
# Calculate estimated excess returns from the model
A_3f.shape = (len(A_3f), 1)      # Specify that A_3f is a column vector
ind_exc_returns_3f = B_3f @ ff_3factors_oos.to_numpy().T + A_3f

In [18]:
# Calculate RMSE for each portfolio
RMSE_3f = np.sqrt(np.mean((ind_exc_returns_3f.T - ind_exc_returns_oos.to_numpy()) ** 2, axis = 0))

In [19]:
RMSE_5f, RMSE_3f

(array([2.42685554, 1.8528015 , 1.67322254, 3.06471593, 1.49201908]),
 array([2.11481667, 1.93239166, 1.44942409, 3.17819614, 1.43623083]))
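To read the two arrays side by side, one option is to take their difference per portfolio. The sketch below copies the RMSE values from the output above; the lowercase variable names are introduced here only for illustration.

```python
import numpy as np

# RMSE values copied from the output above (one entry per industry portfolio)
rmse_5f = np.array([2.42685554, 1.8528015, 1.67322254, 3.06471593, 1.49201908])
rmse_3f = np.array([2.11481667, 1.93239166, 1.44942409, 3.17819614, 1.43623083])

diff = rmse_5f - rmse_3f            # negative: five-factor model is more accurate
five_factor_better = diff < 0

print(np.round(diff, 4))
print(five_factor_better)
```

On these numbers the five-factor model has the lower out-of-sample RMSE for the second and fourth portfolios only. With just 12 out-of-sample observations, differences of this size should be read with caution.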

## Arbitrage Pricing Theory

Theorem 3.6 in the lecture slides states that a $k$-factor model $R = A + BF + \epsilon$ is an APT model if and only if there exist $\lambda_0 \in \mathbb{R}$ and $\lambda \in \mathbb{R}^k$ such that $$\mathsf{E}[R] = \lambda_0 \mathbf{1}_d + B \lambda.$$

In the case of the five-factor model, the above equation is a system of $d = 5$ equations in $k+1 = 6$ unknowns. As such, it is an underdetermined system of equations which we can solve using the method of least squares, $$\min_{\lambda_0 \in \mathbb{R}, \lambda \in \mathbb{R}^5} \left\|\mathsf{E}[R] - \lambda_0 \mathbf{1}_d - B \lambda \right\|^2,$$ where we use the empirical estimate of the mean return vector $\mathsf{E}[R]$ and the estimated $\hat{B}$ from the above analysis of the five-factor model. Using these empirical estimates, the system of equations can be written as

$$
\mathsf{E}[R] = \left[\begin{array}{cc} \mathbf{1}_d & \hat{B}\end{array}\right]
\left[\begin{array}{c} \lambda_0 \\ \lambda \end{array}\right]
$$

and, defining $C = \left[\begin{array}{cc} \mathbf{1}_d & \hat{B}\end{array}\right]$, a solution to the least squares problem (the one with minimum Euclidean norm, provided $CC^\top$ is invertible) can be calculated as

$$
\left[\begin{array}{c} \lambda_0 \\ \lambda \end{array}\right] = C^\top (CC^\top)^{-1} \mathsf{E}[R].
$$
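The closed form above can be cross-checked against NumPy's pseudoinverse and `np.linalg.lstsq`, both of which return the minimum-norm solution for an underdetermined system. The sketch uses random stand-ins (`B_demo`, `mean_demo`) with the same dimensions as the five-factor case; these are assumptions for illustration, not the estimates from this exercise.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 5, 5                                   # same dimensions as the five-factor case

# Hypothetical stand-ins for the estimated exposures and mean returns
B_demo = rng.normal(size=(d, k))
mean_demo = rng.normal(size=(d, 1))

C_demo = np.hstack([np.ones((d, 1)), B_demo])        # d x (k + 1)

# Closed form from the text, valid when C C^T is invertible
lam_closed = C_demo.T @ np.linalg.inv(C_demo @ C_demo.T) @ mean_demo

# Same minimum-norm solution via the pseudoinverse and via lstsq
lam_pinv = np.linalg.pinv(C_demo) @ mean_demo
lam_lstsq, *_ = np.linalg.lstsq(C_demo, mean_demo, rcond=None)

# Squared residual of the APT condition for the closed-form solution
residual = np.sum((mean_demo - C_demo @ lam_closed) ** 2)
print(np.allclose(lam_closed, lam_pinv), np.allclose(lam_closed, lam_lstsq), residual)
```

In this sketch the residual is numerically zero even though the inputs are random, since a full-row-rank system with more unknowns than equations can always be solved exactly.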

In [20]:
# Construct in-sample data for industry returns
ind_returns = ind_returns.drop(index = oos_index)

# Calculate mean returns
ind_returns_mean = ind_returns.mean(axis = 0)
ind_returns_mean = ind_returns_mean.to_numpy()
ind_returns_mean.shape = (len(ind_returns_mean), 1)

In [21]:
# Calculating the APT coefficients
Vec1 = np.ones((n_assets, 1))      # Column vector of ones
C = np.concatenate((Vec1, B_5f), axis = 1)
lambda_combined = C.T @ np.linalg.inv(C @ C.T) @ ind_returns_mean

In [22]:
# Verify if APT condition is met (value of objective function must be close to 0)
np.sum((ind_returns_mean - lambda_combined[0] * Vec1 - B_5f @ lambda_combined[1:]) ** 2)

2.2005336580898345e-27

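The estimated five-factor model, in this case, satisfies the APT condition up to numerical precision: the gap between expected returns and the right-hand side of the APT condition is of the order of floating-point error. Since the system has $k + 1 = 6$ unknowns and only $d = 5$ equations, an exact solution exists whenever $C$ has full row rank, so a residual of this size is expected.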

In [23]:
lambda_combined[0], rf.mean()

(array([0.45340101]), 0.3626344827586207)

The secondary comparison above shows that $\lambda_0$ estimated from the APT model does not necessarily coincide with the mean risk-free rate (observed over time). However, the value of $\lambda_0$ is consistent with the more recent values of the risk-free rate time series.

In [24]:
print(rf)

196307    0.27
196308    0.25
196309    0.27
196310    0.29
196311    0.27
          ... 
202307    0.45
202308    0.45
202309    0.43
202310    0.47
202311    0.44
Name: RF, Length: 725, dtype: float64
