# Module: Data Science for Asset Management {-}

# Session 5: Factor Models and Investing {-}

# Introduction

In this session we explore quantitative equity analysis through various factor models and benchmarks. 

1. **Capital Asset Pricing Model (CAPM)**: This fundamental model establishes the relationship between risk and expected return for an investment. We'll calculate the CAPM beta, a measure of an asset's sensitivity to market movements, and assess its expected return. (Note: students without prior knowledge of asset pricing or the CAPM can read this part quickly as background. **In this module, and particularly when preparing for the final exam, students should focus on real-world Python applications and coding implementations rather than on memorising detailed finance theory. The theory introduced in this notebook is sufficient; there is no need to go beyond it.**)

2. **Fama-French Model**: Building upon CAPM, the Fama-French Model incorporates additional factors besides market risk to explain return variations. We'll explore these factors and their influence on asset performance.

3. **Factor Benchmarks**: We'll construct factor benchmarks to compare an asset's performance against a basket of securities representing specific risk factors. This allows us to isolate an investment's performance relative to these factors.

# Import Python Libraries

In [2]:
# students should be familiar with these libraries and discuss some of them during the final exam, 
# for example, how to install and import them in different ways
import pandas as pd
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")

In [3]:
# reference: https://ipython.org/ipython-doc/3/config/extensions/autoreload.html
%load_ext autoreload
%autoreload 2
%matplotlib inline

In [4]:
# Students should save the edhec_risk_kit_session_5.py script in the same folder as this notebook, so it can be imported directly
# Please note: the edhec_risk_kit_session_5.py script is modified specifically for this session
# The script contains the paths of the input data, so students need to save all relevant .csv files in the same folder, such as:
# data_brka_d_ret.csv
# data_F-F_Research_Data_Factors_m.csv
# More detailed descriptions about these datasets can be found as the link below: 
# https://www.kaggle.com/datasets/yousefsaeedian/edhec-investment-management-datasets?resource=download
import edhec_risk_kit_session_5 as erk

In [5]:
import seaborn as sns
sns.set_style("darkgrid")          # adds seaborn style to charts, eg. grid
#plt.style.use("dark_background")  # inverts colors to dark theme
plt.style.use('default')           # use default white theme

# Factor Investing

A **factor** is a variable that influences the **returns of assets**. It represents a **commonality** 
in the returns, i.e., something outside the individual asset, and normally, 
**an exposure to some factor risks** over the long run **yields a reward** (the risk premium).

There are three types of factors:
1. **Macro factors**: industrial growth, inflation, $\dots$
2. **Statistical factors**: something extracted from the data, which may or may not be well identified
3. **Style (Intrinsic) factors**: value-growth, momentum, low-volatility, $\dots$

For example, some investors consider **oil prices** to be an important factor in determining equity returns (they are likely to affect stock returns). This would be an example of a **macro factor**,
as it is neither a statistical artifact nor intrinsic to the stock itself.

## Factor models

A **factor model** simply decomposes asset returns $r$ into a set of **factors (premia or risks)** and [an idiosyncratic risk](https://www.northstarrisk.com/idiosyncraticrisk) $\varepsilon$:
$$
r = \beta_1 f_1 + \beta_2 f_2 + \dots + \beta_n f_n + \alpha + \varepsilon,
$$
where $\{\beta_i\}_{i=1,\dots,n}$ are some real coefficients and 
$\{f_i\}_{i=1,\dots,n}$ are the **factor premia which are** nothing but **the returns that we get in exchange for exposing ourselves to these factors**. 

In other words, a factor model is **a decomposition of an asset return into a set of returns from other assets**.
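
To make the decomposition concrete, here is a minimal sketch on simulated data. The two factors, the loadings and the noise level are all made-up values chosen for illustration; the point is only that, given returns generated by a factor model, least squares recovers the $\beta_i$ and $\alpha$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs = 5000

# Hypothetical factor premia (two factors) and true loadings, chosen for illustration
factors = rng.normal(loc=0.005, scale=0.04, size=(n_obs, 2))
true_betas = np.array([0.8, -0.3])
true_alpha = 0.001
eps = rng.normal(scale=0.01, size=n_obs)   # idiosyncratic risk

# r = beta_1 f_1 + beta_2 f_2 + alpha + eps
r = factors @ true_betas + true_alpha + eps

# Recover the loadings by least squares: a column of ones plays the role of alpha
X = np.column_stack([factors, np.ones(n_obs)])
coefs, *_ = np.linalg.lstsq(X, r, rcond=None)
est_betas, est_alpha = coefs[:2], coefs[2]
print(est_betas, est_alpha)
```

With real data the factors are observed return series rather than simulated ones, but the regression step is the same.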

## The Capital Asset Pricing Model (CAPM)

The **CAPM** is an example of a factor model. In particular, it is used to determine a theoretically appropriate return of an asset, in order to decide whether to add this asset to a well-diversified portfolio.

The model takes into account the asset's sensitivity to non-diversifiable risk (also known as systematic risk or market risk), often represented by the quantity beta ($\beta$) in the financial industry, as well as the expected return of the market and the expected return of a theoretical risk-free asset. 

The model can be used for pricing an individual security. The model defines a 
**Security Market Line (SML)** which enables us to compute the reward-to-risk ratio for a security in relation to that of the **overall market**:
$$
\text{SML}: \;\mathbb{E}[r_i] = \beta_i(\mathbb{E}[r_m] - r_f) + r_f, 
$$
which means that the **excess return** of the asset $i$ is given by the **excess return of the market** 
(here, **$r_m$ is the return of the market** and $r_f$ is the risk-free rate) 
times a coefficient $\beta_i$ defined as:
$$
\mathbb{E}[r_i] - r_f = \beta_i(\mathbb{E}[r_m] - r_f), 
\qquad  
\beta_i := \frac{\text{Cov}(r_i,r_m)}{\text{Var}(r_m)}, 
$$
which is the **sensitivity of the asset with respect to the market**.

Note that if $\beta_i$ is large, asset $i$ is highly sensitive to the market: when the market goes up, the asset tends to go up by more (and, of course, when the market goes down, the asset tends to fall by more). On the other hand, if $\beta_i$ is close to zero, the asset's returns are almost uncorrelated with the market, hence changes in the market have little effect on the returns of the asset.

In addition, note that **the expected market return** is usually estimated by measuring **the arithmetic average of the historical returns of market components**.

By the definition of a factor model, <span style="color:red">**the CAPM is a one-factor model**</span> since the excess return of the security depends **only** on the excess return of the market. 
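
The definition of $\beta_i$ and the Security Market Line can be checked numerically. The sketch below uses synthetic returns (the true beta of $1.2$, the volatilities and the risk-free rate are assumed values, not real data). It also shows that beta equals the correlation scaled by the ratio of volatilities, so beta and correlation are related but not the same thing.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs = 10_000

# Synthetic monthly returns (illustrative, not real data)
r_m = rng.normal(loc=0.008, scale=0.045, size=n_obs)      # market
r_i = 1.2 * r_m + rng.normal(scale=0.02, size=n_obs)      # asset with true beta 1.2

# beta_i = Cov(r_i, r_m) / Var(r_m)
cov = np.cov(r_i, r_m, ddof=1)
beta_i = cov[0, 1] / cov[1, 1]

# Equivalent form: beta = correlation * (sigma_i / sigma_m)
corr = np.corrcoef(r_i, r_m)[0, 1]
beta_from_corr = corr * r_i.std(ddof=1) / r_m.std(ddof=1)

# Security Market Line: E[r_i] = beta_i (E[r_m] - r_f) + r_f
r_f = 0.002                  # assumed monthly risk-free rate
exp_r_m = r_m.mean()         # arithmetic average of historical market returns
exp_r_i = beta_i * (exp_r_m - r_f) + r_f
print(beta_i, beta_from_corr, exp_r_i)
```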

## Price-to-book (P/B) or Market-to-book ratio

In accounting, the **book value is the value of an asset according to its balance sheet account balance**. 
For assets, the value is based on **the original cost of the asset less any depreciation, amortization or impairment costs** 
made against the asset. Traditionally, a company's book value is its total assets minus intangible assets and liabilities.

The **price-to-book (P/B) ratio** is used to compare a company's current market price to its book value. 
The calculation can be performed in two ways. The first is to divide the company's market capitalization 
by the company's total book value from its balance sheet. The second is to divide the company's current share price 
by the book value per share (i.e., its book value divided by the number of outstanding shares).

This ratio is also called the **market-to-book ratio**.
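
The two ways of computing the ratio give the same number, as the following arithmetic sketch shows. The company figures are hypothetical and chosen only to keep the numbers round.

```python
# Hypothetical company figures, for illustration only
share_price = 50.0               # current market price per share
shares_outstanding = 2_000_000
total_book_value = 40_000_000.0  # from the balance sheet

# Way 1: market capitalization / total book value
market_cap = share_price * shares_outstanding
pb_from_totals = market_cap / total_book_value

# Way 2: share price / book value per share
book_value_per_share = total_book_value / shares_outstanding
pb_per_share = share_price / book_value_per_share

print(pb_from_totals, pb_per_share)  # 2.5 2.5
```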

## Value stocks and Growth stocks

The inverse of the P/B ratio is called the **book-to-price (B/P)** (or **book-to-market**) **ratio**. 

Assets (e.g., stocks) with high **book-to-market** ratios are called **Value stocks**. A high ratio means that the book value is large relative to the current market price, which suggests that the asset is cheap relative to its book value (it may be **undervalued** by the market).

On the other hand, assets with low **book-to-market** ratios are called **Growth stocks**. In this case, the book value is much smaller than the current market price, which suggests that the asset may be **overvalued** by the market. The reason is that such stocks are typically associated with high-quality, successful companies whose earnings are expected to continue 
growing at an above-average rate relative to the market, so growth stock investors may be willing to pay more to own the shares. 
Analysts tend to classify a stock as a Growth stock if its **ROE** (**Return on Equity**: the company's net income divided by its average common equity) is larger than or equal to $15\%$.

## Fama-French Model

The <span style="color:red">**Fama-French model** is a **three-factor model**</span> which enhances the CAPM (one-factor model). The three factors are:
- the market risk (i.e., as in the CAPM),
- the outperformance of **small versus big** companies,
- the outperformance of **high book/market versus low book/market** companies.

What Fama and French did was to take the entire universe of stocks and put them into ten buckets (**deciles**). They sorted such deciles in two ways. 

A first sorting was done according to the **size**, i.e., the **market capitalization**, and then they compared the performance of the bottom $10\%$ companies versus the top $10\%$ companies according to the size.

The second sorting was done according to the **book-to-price ratios** (B/P ratio), and then they did the same, i.e., they looked at the performance of the bottom $10\%$ companies (Growth Stocks) versus the top $10\%$ companies (Value Stocks). 

Fama and French observed that the classes of stocks that have tended to do better than the market as a whole have been (i) the **small caps** (bottom decile w.r.t sizes) and (ii) the **Value Stocks** (top decile w.r.t. B/P ratios). 

Hence, they introduced the **size factor** and the **value factor** in addition to the **market factor** of simple CAPM and enhanced the factor model: 
$$
\mathbb{E}[r_i] - r_f = 
\beta_{i,\text{MKT}}\mathbb{E}[r_m - r_f] + \beta_{i,\text{SMB}}\mathbb{E}[\text{SMB}] + \beta_{i,\text{HML}}\mathbb{E}[\text{HML}]  
$$
where:
- $\beta_{i,\text{MKT}}$ is the same $\beta$ of the CAPM (we stress the dependence on the **market**), 
- $\text{SMB}$ means **Small (size) Minus Big (size)** stocks, 
- $\text{HML}$ means **High (B/P ratio) Minus Low (B/P ratio)** stocks.

There could be in principle more factors to add. For example, the <span style="color:red">**Carhart four-factor model** </span> enhances the Fama-French model by adding the **momentum factor**. The **Momentum** in a stock is the tendency for the stock price to continue rising if it is going up and to continue declining if it is going down. It can be computed by subtracting the equal weighted average of the lowest performing firms from the equal weighted average of the highest performing firms, lagged one month.
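
The decile sorts described earlier in this section can be sketched with `pd.qcut`. The cross-section below is synthetic (random market caps, book-to-price ratios and returns, all assumed for illustration), and the sketch follows the simplified description given here rather than the exact construction of the published factors.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n_stocks = 1000

# Synthetic cross-section (illustrative): market cap, book-to-price and one-period return
stocks = pd.DataFrame({
    "market_cap": rng.lognormal(mean=7.0, sigma=1.5, size=n_stocks),
    "book_to_price": rng.lognormal(mean=-0.5, sigma=0.6, size=n_stocks),
    "ret": rng.normal(loc=0.01, scale=0.06, size=n_stocks),
})

# Decile labels 1 (lowest) to 10 (highest) for each sorting variable
stocks["size_decile"] = pd.qcut(stocks["market_cap"], 10, labels=False) + 1
stocks["value_decile"] = pd.qcut(stocks["book_to_price"], 10, labels=False) + 1

# Equal-weighted average return per decile
size_ret = stocks.groupby("size_decile")["ret"].mean()
value_ret = stocks.groupby("value_decile")["ret"].mean()

# Spreads between extreme deciles, in the spirit of SMB and HML
small_minus_big = size_ret.loc[1] - size_ret.loc[10]
high_minus_low = value_ret.loc[10] - value_ret.loc[1]
print(small_minus_big, high_minus_low)
```

Because the returns here are random, the two spreads carry no economic meaning; the sketch only illustrates the sorting mechanics.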

## Factor Benchmark

Any factor model can be re-interpreted as a benchmark. For example, consider the case of the single-factor CAPM. We can rewrite it in the following way:
$$
\mathbb{E}[r_i - r_f] = \beta\, \mathbb{E}[r_m - r_f] + \alpha
\quad\Longrightarrow\quad
\mathbb{E}[r_i] = \beta\, \mathbb{E}[r_m] - (\beta-1) \mathbb{E}[r_f]   + \alpha.
$$
This means that if we have $1$ dollar, we can **borrow $\beta-1$ dollars** and **invest $\beta$ dollars in the market**. For instance, let $\beta=1.3$: we borrow $\beta-1=0.3$ dollars and put $1.3$ dollars in the market. We can always do this, and we would get an expected return equal to $1.3\,\mathbb{E}[r_m] - 0.3\,\mathbb{E}[r_f]$.

Now, if an asset with the same $\beta$ has a return higher than what we got, the difference is the **value added by the manager**, i.e., the manager of that asset did particularly well.

In other words, given this model and a manager (i.e., an asset $i$), **we run the regression** to find $\beta$ **and we look for alpha**. **If we do not get any alpha**, it means that **this asset is not going to add any value for us**, because we could get the same return without this asset (by doing the short-long trade above). If we find a positive $\alpha$, the manager (asset) adds value for us. If $\alpha$ is negative, we should avoid investing in that asset, as it destroys value.

In this sense we say that, in this case, the **factor benchmark is a short position of $\beta-1$ dollars in cash (risk-free T-bills)** and a **leveraged position of $\beta$ dollars in the market portfolio**. 
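
The benchmark arithmetic for $\beta = 1.3$ can be written out in a few lines. The expected market return, the risk-free rate and the asset's return below are assumed values used only to illustrate the calculation.

```python
beta = 1.3
exp_r_m = 0.08     # assumed expected market return (annual), illustrative
r_f = 0.02         # assumed risk-free rate (annual), illustrative

# Start with 1 dollar, borrow (beta - 1) dollars at r_f, invest beta dollars in the market
borrowed = beta - 1
benchmark_return = beta * exp_r_m - borrowed * r_f

# Alpha is whatever the asset earns on top of this replicable benchmark
asset_return = 0.11   # hypothetical expected return of asset i
alpha = asset_return - benchmark_return
print(round(benchmark_return, 4), round(alpha, 4))  # 0.098 0.012
```

Here the asset earns $1.2\%$ more than a benchmark that anyone could build from the market and T-bills, which is the value added by the manager.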

# Factor Analysis of Warren Buffett's Berkshire Hathaway

Let us load the returns of the Berkshire Hathaway holding company. Notice that the dataset contains daily returns. Here, we compound 
daily returns over months and load the monthly returns. Data are available from 1990-01 up to 2018-12.

In [6]:
# The dataset is in the file data_brka_d_ret.csv; the following two lines of code read it directly
# The details of how this .csv file is read are in the edhec_risk_kit_session_5.py script
brka_rets = erk.get_brka_rets(monthly=True)
brka_rets.head()

DATE,BRKA
1990-01,-0.140634
1990-02,-0.030852
1990-03,-0.069204
1990-04,-0.003717
1990-05,0.067164
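
How `erk.get_brka_rets(monthly=True)` compounds the daily returns is not shown in this notebook, so the following is only a sketch of one way to do it, under the assumption that the monthly return is $\prod_d (1 + r_d) - 1$ within each calendar month. The daily series is synthetic, not the BRKA data.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns over two months (illustrative, not the BRKA data)
dates = pd.date_range("1990-01-01", "1990-02-28", freq="B")
rng = np.random.default_rng(3)
daily = pd.Series(rng.normal(scale=0.01, size=len(dates)), index=dates, name="ret")

# Compound within each calendar month: prod(1 + r_d) - 1
monthly = (1 + daily).groupby(daily.index.to_period("M")).prod() - 1
print(monthly)
```

Grouping by a monthly period gives an index of the form `1990-01`, which matches the labels shown in the output above.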


Now, we load the **factors** (the explanatory variables) composing the Fama-French model:

In [7]:
# The dataset is in the file data_F-F_Research_Data_Factors_m.csv;
# the following two lines of code read it directly
# The details of how this .csv file is read are in the edhec_risk_kit_session_5.py script
fff = erk.get_fff_returns()
fff.head()

,Mkt-RF,SMB,HML,RF
1926-07,0.0296,-0.023,-0.0287,0.0022
1926-08,0.0264,-0.014,0.0419,0.0025
1926-09,0.0036,-0.0132,0.0001,0.0023
1926-10,-0.0324,0.0004,0.0051,0.0032
1926-11,0.0253,-0.002,-0.0035,0.0031


The columns are the **market return minus the risk-free rate**, 
the **Small Minus Big** (the **Size**), 
the **High Minus Low** (the **Value**), and the pure risk-free rate (assuming this is the return of T-Bills).  

Next, consider a common time period, say from **1990-01 to 2015-05**. 

Firstly, we do a factor analysis using the **CAPM model**. 
That is, we want to decompose the observed return of the Berkshire Hathaway into a portion which is due to the market 
and the rest which is not due to the market:
$$
R_{\text{brka}, t} - R_{f,t} = \beta(R_{mkt, t} - R_{f,t}) + \alpha + \varepsilon_t.
$$

In [8]:
# compute the excess return of Berkshire Hathaway 
brka_excess_rets = brka_rets["1990":"2015-05"] - fff.loc["1990":"2015-05"][["RF"]].values

# save the excess return of the market 
mkt_excess_rets  = fff.loc["1990":"2015-05"][["Mkt-RF"]]

In [9]:
factors = mkt_excess_rets.copy()
# sm.OLS does not add an intercept automatically, so we add a constant column to estimate alpha
factors["alpha"] = 1

lm = sm.OLS(brka_excess_rets, factors).fit()
lm.summary()

Dep. Variable:,BRKA,R-squared:,0.163
Model:,OLS,Adj. R-squared:,0.16
Method:,Least Squares,F-statistic:,59.05
Date:,"Mon, 03 Nov 2025",Prob (F-statistic):,2.15e-13
Time:,15:20:06,Log-Likelihood:,454.77
No. Observations:,305,AIC:,-905.5
Df Residuals:,303,BIC:,-898.1
Df Model:,1,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
Mkt-RF,0.5554,0.072,7.685,0.000,0.413,0.698
alpha,0.0063,0.003,2.005,0.046,0.000,0.013

Omnibus:,54.924,Durbin-Watson:,2.07
Prob(Omnibus):,0.0,Jarque-Bera (JB):,141.859
Skew:,0.84,Prob(JB):,1.57e-31
Kurtosis:,5.888,Cond. No.,23.1


The coefficients of the regression are:

In [10]:
lm.params

Mkt-RF    0.555427
alpha     0.006340
dtype: float64

that is, we have a $\beta$ of about $0.56$ and an $\alpha$ of about $0.006$. This means that the CAPM benchmark consists of about $0.44$ dollars in Treasury bills and about $0.56$ dollars in the market, i.e., **each dollar in the Berkshire Hathaway portfolio is equivalent to $44$ cents in Treasury bills and $56$ cents in the market**. Relative to this, **the company is adding (i.e., it has an $\alpha$ of) about $0.6\%$ per month**, although this is only marginally significant at the $5\%$ level ($p \approx 0.046$).

Let us now use the complete **Fama-French model**. We add the remaining factors:

In [11]:
factors = mkt_excess_rets.copy()
factors["Size"]  = fff.loc["1990":"2015-05", "SMB"]
factors["Value"] = fff.loc["1990":"2015-05", "HML"]
factors["alpha"] = 1

In [12]:
lm_ff = sm.OLS(brka_excess_rets, factors).fit()
lm_ff.summary()

Dep. Variable:,BRKA,R-squared:,0.298
Model:,OLS,Adj. R-squared:,0.291
Method:,Least Squares,F-statistic:,42.52
Date:,"Mon, 03 Nov 2025",Prob (F-statistic):,6.12e-23
Time:,15:20:07,Log-Likelihood:,481.5
No. Observations:,305,AIC:,-955.0
Df Residuals:,301,BIC:,-940.1
Df Model:,3,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
Mkt-RF,0.6877,0.069,10.013,0.000,0.553,0.823
Size,-0.4889,0.094,-5.202,0.000,-0.674,-0.304
Value,0.3921,0.101,3.887,0.000,0.194,0.591
alpha,0.0054,0.003,1.853,0.065,-0.000,0.011

Omnibus:,51.981,Durbin-Watson:,2.149
Prob(Omnibus):,0.0,Jarque-Bera (JB):,95.007
Skew:,0.929,Prob(JB):,2.34e-21
Kurtosis:,5.005,Cond. No.,38.4


With the Fama-French model, $\alpha$ has fallen from $0.63\%$ to about $0.54\%$ per month, and it is no longer significant at the $5\%$ level ($p \approx 0.065$). The loading on the market has moved up from about $0.56$ to $0.69$, so adding the new explanatory factors has materially changed the picture. 

We can interpret the positive loading on Value (i.e., the **positive beta coefficient of HML**) as saying that **Berkshire Hathaway has a significant Value tilt**, which should not be a shock to anyone who follows Buffett. Additionally, the negative loading on Size (i.e., the **negative beta coefficient of SMB**) suggests that **Berkshire Hathaway tends to invest in large companies rather than small companies**.

In other words, **Berkshire Hathaway appears to be a Large Value investor**. Of course, we may already know this if we were following the company.

The new way to interpret each dollar invested in Berkshire Hathaway is as follows (recall that SMB and HML are long-short portfolios, so each loading corresponds to equal long and short legs): 
- long **$69$ cents in the market and $31$ cents in T-Bills**; 
- long **$39$ cents in Value stocks and short $39$ cents in Growth stocks**; 
- short **$49$ cents in SmallCap stocks and long $49$ cents in LargeCap stocks**. 

Finally, if we were to do all of this, we would still end up underperforming Berkshire Hathaway by about $0.54\%$ (**54 basis points**) per month.
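
As a small worked example, the fitted loadings can be turned into a benchmark return for a single month. The loadings and $\alpha$ below are copied from the regression output above; the factor returns for the month are hypothetical values chosen for illustration.

```python
# Loadings copied from the Fama-French regression output above
betas = {"Mkt-RF": 0.6877, "SMB": -0.4889, "HML": 0.3921}
alpha = 0.0054

# Hypothetical factor returns for a single month (illustrative values)
month = {"Mkt-RF": 0.02, "SMB": 0.01, "HML": -0.005}

# Excess return of the replicating factor benchmark in that month
benchmark_excess = sum(betas[k] * month[k] for k in betas)

# The model's expected excess return for BRKA adds alpha on top of the benchmark
expected_brka_excess = benchmark_excess + alpha
print(round(benchmark_excess, 6), round(expected_brka_excess, 6))
```

The gap between the two printed numbers is exactly $\alpha$, i.e., the $54$ basis points per month that the benchmark does not capture.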

Note that the **edhec_risk_kit_session_5.py** script provides the **erk.linear_regression(dep_var, expl_vars, alpha=True)** method, which runs such a linear regression without calling `sm.OLS` directly.

# Conclusion

We began with the **Capital Asset Pricing Model (CAPM)**, a foundational model that relates expected return to **beta**, a measure of systematic risk. We then examined the **Fama-French Model**, a three-factor extension of CAPM that incorporates size and value factors to explain return variations. Factor benchmarks were introduced as a way to compare portfolio performance against relevant risk factors. 


In [13]:
# Reference: https://www.kaggle.com/code/yousefsaeedian/introduction-to-factor-investing

In [14]:
# Data inputs: https://www.kaggle.com/code/yousefsaeedian/introduction-to-factor-investing/input

In [15]:
# Data download: https://www.kaggle.com/datasets/yousefsaeedian/edhec-investment-management-datasets?resource=download