# **Estimating Fama–French Five-Factor Exposures Using Daily Equity Returns**

```python
import pandas as pd
import statsmodels.api as sm
from pathlib import Path

BUCKET_NAME = "market-data-jyang130"

TICKER_CONFIG_PATH = Path("../config/nasdaq.json")

# pandas reads s3:// URLs through the optional s3fs package.
EQUITY_PATH = f"s3://{BUCKET_NAME}/cleaned/equities"
FACTOR_PATH = f"s3://{BUCKET_NAME}/raw/factors/FF5_daily.csv"
```

## **Model Assumptions**

1.  Linear model: the excess return of asset *i* over the risk-free rate at time *t* equals an intercept $\alpha_i$ plus a linear combination of the realized factor returns at time *t*, plus an idiosyncratic error $\varepsilon_{i,t}$. The slope coefficients $\beta_{i,k}$ measure the asset's exposure to each systematic risk factor; under assumption 3 below, the expected excess return given the factors is the linear part alone.
$$
R_{i,t} - R_{f,t} 
=
\alpha_i 
+ 
\beta_{i,1} (MKT\!-\!RF)_t 
+ 
\beta_{i,2} SMB_t 
+
\beta_{i,3} HML_t 
+ 
\beta_{i,4} RMW_t 
+ 
\beta_{i,5} CMA_t
+
\varepsilon_{i,t}
$$

*  $(MKT\!-\!RF):$ Market excess return, the return on the market portfolio minus the risk-free rate; its coefficient $\beta_{i,1}$ is the asset's market beta.
*  $SMB:$ Small Minus Big, the return spread between small-cap and large-cap stocks.
*  $HML:$ High Minus Low, the return spread between high and low book-to-market (value vs. growth) stocks.
*  $RMW:$ Robust Minus Weak, the return spread between firms with robust (high) and weak (low) profitability.
*  $CMA:$ Conservative Minus Aggressive, the return spread between firms that invest conservatively and firms that invest aggressively.


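Let $X_t = \big((MKT\!-\!RF)_t, SMB_t, HML_t, RMW_t, CMA_t\big)$ denote the vector of factor returns at time $t$. Stacking these rows over $t = 1, \dots, T$ and prepending a column of ones for $\alpha_i$ gives the design matrix $X$ used below.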

2.  Full rank: the design matrix $X$ (factor returns stacked over time, plus a column of ones) has full column rank, i.e. there is no perfect multicollinearity among the regressors.

3.  Exogeneity: the error term has conditional mean zero given the factor returns:
$$\mathbb{E}[\varepsilon_{i,t} \mid X_t] = 0$$

4.  Errors may be heteroskedastic and serially correlated, so for inference we use heteroskedasticity- and autocorrelation-consistent (HAC) standard errors with 5 lags, roughly one trading week.
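Under assumptions 1 to 3 the OLS estimator $\hat\beta = (X^\top X)^{-1} X^\top y$ exists (full rank makes $X^\top X$ invertible) and is unbiased (exogeneity). A minimal NumPy sketch on simulated data checks that OLS recovers the loadings; the "true" parameters, the Gaussian factor and error draws, and the sample size are all made-up assumptions, not real Fama–French data:

```python
import numpy as np

rng = np.random.default_rng(42)
T = 2_000  # simulated trading days

# Synthetic daily factor returns in decimals (columns: Mkt-RF, SMB, HML, RMW, CMA).
# These are random draws for illustration, not real Fama-French data.
factors = rng.normal(0.0, 0.01, size=(T, 5))

# Design matrix: a column of ones (for alpha) followed by the five factors.
X = np.column_stack([np.ones(T), factors])

# Made-up "true" parameters: alpha first, then the five loadings.
true_beta = np.array([0.0002, 1.1, -0.2, -0.3, 0.4, -0.1])

# Noise drawn independently of X, so E[eps | X] = 0 holds by construction.
eps = rng.normal(0.0, 0.01, size=T)
y = X @ true_beta + eps  # simulated excess returns

# Assumption 2: full column rank makes X'X invertible.
assert np.linalg.matrix_rank(X) == X.shape[1]

# OLS normal equations (X'X) beta_hat = X'y, solved without an explicit inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

print("true     :", np.round(true_beta, 3))
print("estimated:", np.round(beta_hat, 3))
```

Solving the normal equations with `np.linalg.solve` avoids forming $(X^\top X)^{-1}$ explicitly, which is numerically safer; the residuals are orthogonal to every column of $X$ by construction.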

**Model**:
$$
y = X\beta + \varepsilon,
\qquad
\mathbb{E}[\varepsilon \mid X] = 0,
\qquad
\mathrm{Var}(\varepsilon \mid X) = \Omega.
$$
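Because $\Omega$ is left unrestricted, the coefficient covariance is estimated with a sandwich $(X^\top X)^{-1} \hat S \,(X^\top X)^{-1}$, where the Newey–West $\hat S$ adds Bartlett-weighted autocovariances of the scores $u_t = x_t \hat\varepsilon_t$:
$\hat S = \sum_t u_t u_t^\top + \sum_{l=1}^{L} \left(1 - \frac{l}{L+1}\right) \sum_{t=l+1}^{T} \left(u_t u_{t-l}^\top + u_{t-l} u_t^\top\right)$.
The sketch below is a from-scratch NumPy illustration using a hypothetical helper `newey_west_cov`. To our understanding statsmodels' `cov_type="HAC"` also uses a Bartlett kernel by default, but its reported standard errors may differ slightly because of optional small-sample corrections, so treat this as illustrative rather than a reproduction of statsmodels' output:

```python
import numpy as np


def newey_west_cov(X, resid, lags):
    """Newey-West (Bartlett kernel) HAC covariance of OLS coefficients.

    X     : (T, k) design matrix, including the column of ones.
    resid : (T,) OLS residuals.
    lags  : maximum lag L; the lag-l term is weighted by 1 - l / (L + 1).
    No small-sample degrees-of-freedom correction is applied.
    """
    X = np.asarray(X, dtype=float)
    resid = np.asarray(resid, dtype=float)
    # Score contributions u_t = x_t * e_t, one row per observation.
    u = X * resid[:, None]
    # Lag-0 term: the heteroskedasticity-robust (White) "meat".
    S = u.T @ u
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1.0)  # Bartlett weight
        gamma = u[lag:].T @ u[:-lag]       # sum over t of u_t u_{t-lag}'
        S += weight * (gamma + gamma.T)
    bread = np.linalg.inv(X.T @ X)
    # Sandwich: (X'X)^{-1} S (X'X)^{-1}
    return bread @ S @ bread


# Usage on simulated data with AR(1) errors (illustrative numbers only).
rng = np.random.default_rng(0)
T = 500
X_sim = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = 0.5 * eps[t - 1] + rng.normal()  # serially correlated errors
y_sim = X_sim @ np.array([0.1, 1.0, -0.5]) + eps

beta_hat, *_ = np.linalg.lstsq(X_sim, y_sim, rcond=None)
resid = y_sim - X_sim @ beta_hat
se_white = np.sqrt(np.diag(newey_west_cov(X_sim, resid, lags=0)))
se_hac = np.sqrt(np.diag(newey_west_cov(X_sim, resid, lags=5)))
print("White SE:", np.round(se_white, 4))
print("HAC SE  :", np.round(se_hac, 4))
```

Setting `lags=0` reduces the estimator to White's heteroskedasticity-robust (HC0) covariance. Because the simulated errors are positively autocorrelated, the intercept's HAC standard error should come out noticeably larger than its White counterpart; the slopes, whose regressors are drawn independently over time, are much less affected.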

## **Loading our Data**

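```python
def load_ff5_factors(path=FACTOR_PATH):
    """Load daily Fama-French five-factor returns and convert percent to decimal."""
    ff5 = pd.read_csv(path)
    ff5["Date"] = pd.to_datetime(ff5["Date"])

    # The factor file is in percent; equity returns computed below are decimals.
    factor_cols = ["Mkt-RF", "SMB", "HML", "RMW", "CMA", "RF"]
    ff5[factor_cols] = ff5[factor_cols] / 100.0

    return ff5

def load_equity_data(ticker):
    """Load one ticker's cleaned daily price history from S3."""
    path = f"{EQUITY_PATH}/{ticker}.csv"
    df = pd.read_csv(path)
    df["Date"] = pd.to_datetime(df["Date"])

    return df

def compute_returns(df):
    """Simple daily returns from closing prices."""
    # pct_change compares each row with the previous one, so sort by date first.
    df = df.sort_values("Date").copy()
    df["Return"] = df["Close"].pct_change()
    # The first observation has no prior close and therefore no return.
    df = df.dropna(subset=["Return"])
    return df[["Date", "Return"]]
```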
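The loaders above assume the factor file is in percent and that price rows can be aligned with factor rows by date. A toy, in-memory sketch of the same steps (hypothetical prices and factor values, no S3 access) shows the percent-to-decimal conversion, sorting before `pct_change`, an inner join on `Date`, and the excess-return column:

```python
import pandas as pd

# Hypothetical closing prices, deliberately listed out of date order.
prices = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-03", "2024-01-02", "2024-01-04"]),
    "Close": [110.0, 100.0, 99.0],
})

# Hypothetical factor rows in percent, as in the raw factor file.
ff_raw = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-03", "2024-01-04", "2024-01-05"]),
    "Mkt-RF": [1.0, -0.5, 0.2],
    "SMB": [0.1, 0.0, -0.1],
    "HML": [0.0, 0.2, 0.1],
    "RMW": [0.1, 0.1, 0.0],
    "CMA": [-0.1, 0.0, 0.1],
    "RF": [0.02, 0.02, 0.02],
})

# Step 1: percent -> decimal, mirroring load_ff5_factors.
factor_cols = ["Mkt-RF", "SMB", "HML", "RMW", "CMA", "RF"]
ff_toy = ff_raw.copy()
ff_toy[factor_cols] = ff_toy[factor_cols] / 100.0

# Step 2: sort by date before pct_change, then drop the first (NaN) return.
toy = prices.sort_values("Date").copy()
toy["Return"] = toy["Close"].pct_change()
toy = toy.dropna(subset=["Return"])[["Date", "Return"]]

# Step 3: inner join keeps only dates present in both frames (2024-01-05 drops out).
merged = pd.merge(toy, ff_toy, on="Date", how="inner")
merged["ExcessReturn"] = merged["Return"] - merged["RF"]
print(merged[["Date", "Return", "RF", "ExcessReturn"]])
```

Without the sort, `pct_change` would compute a return of about -9.1% (from 110 to 100) and attach it to 2024-01-02, silently misaligning returns with factor dates.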

## **Fitting FF5 for one ticker**

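```python
def fit_ff5(ticker, ff, hac_lags=5):
    """Regress a ticker's daily excess returns on the five FF5 factors.

    Returns a statsmodels results object with HAC standard errors
    computed using `hac_lags` lags.
    """
    equity_df = load_equity_data(ticker)
    returns_df = compute_returns(equity_df)

    # Keep only dates present in both the equity and the factor data.
    merged_df = pd.merge(returns_df, ff, on="Date", how="inner")
    merged_df["ExcessReturn"] = merged_df["Return"] - merged_df["RF"]

    X = merged_df[["Mkt-RF", "SMB", "HML", "RMW", "CMA"]]
    y = merged_df["ExcessReturn"]

    # Add the intercept column; its coefficient is the asset's alpha.
    X = sm.add_constant(X)

    model = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": hac_lags})

    return model
```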
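To compare loadings across several tickers, it can help to flatten each fitted result into one row. A minimal sketch with hypothetical helpers `summarize_fit` and `summarize_fits`, assuming the result exposes statsmodels' `params`, `tvalues`, `rsquared`, `rsquared_adj` and `nobs` attributes (for a model fit with `cov_type="HAC"`, the t-values use the HAC standard errors):

```python
import pandas as pd

FACTOR_NAMES = ["Mkt-RF", "SMB", "HML", "RMW", "CMA"]


def summarize_fit(ticker, result):
    """Flatten one fitted FF5 regression into a dict (hypothetical helper).

    Assumes statsmodels-style attributes: `params` and `tvalues` are Series
    indexed by "const" plus the factor names; `rsquared`, `rsquared_adj`
    and `nobs` are scalars.
    """
    row = {
        "ticker": ticker,
        "alpha": result.params["const"],
        "t_alpha": result.tvalues["const"],
    }
    for name in FACTOR_NAMES:
        row[f"beta_{name}"] = result.params[name]
        row[f"t_{name}"] = result.tvalues[name]
    row["r2"] = result.rsquared
    row["r2_adj"] = result.rsquared_adj
    row["nobs"] = int(result.nobs)
    return row


def summarize_fits(fits):
    """Stack summaries for a {ticker: result} dict into a DataFrame indexed by ticker."""
    rows = [summarize_fit(ticker, result) for ticker, result in fits.items()]
    return pd.DataFrame(rows).set_index("ticker")
```

With the notebook's AAPL model, `summarize_fits({"AAPL": fit_aapl})` would produce a single-row table; further tickers can be added to the dict once they are fitted with `fit_ff5`.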

```python
ff = load_ff5_factors()
fit_aapl = fit_ff5("AAPL", ff, 5)
print(fit_aapl.summary())
```

```text
                            OLS Regression Results                            
Dep. Variable:           ExcessReturn   R-squared:                       0.613
Model:                            OLS   Adj. R-squared:                  0.612
Method:                 Least Squares   F-statistic:                     541.2
Date:                Wed, 31 Dec 2025   Prob (F-statistic):               0.00
Time:                        01:22:32   Log-Likelihood:                 8394.9
No. Observations:                2743   AIC:                        -1.678e+04
Df Residuals:                    2737   BIC:                        -1.674e+04
Df Model:                           5                                         
Covariance Type:                  HAC                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0003      0.000      1.219      0.2
```

## **Interpretation**

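An R-squared of 0.613 (adjusted 0.612) means the five daily factors jointly explain about 61% of the variance of Apple's daily excess returns over the 2,743 trading days in the sample. For a single stock at daily frequency, where firm-specific news adds substantial idiosyncratic noise, this is a high share of variance explained.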
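The reported figures can be checked by hand: $R^2 = 1 - \mathrm{SSR}/\mathrm{SST}$, and adjusted $R^2 = 1 - (1 - R^2)\frac{n - 1}{n - k - 1}$ with $k = 5$ slope regressors plus an intercept. A small sketch with hypothetical helper names reproduces the adjusted value from the summary's rounded $R^2 = 0.613$ and $n = 2743$:

```python
import numpy as np


def r_squared(y, fitted):
    """Centered R^2 = 1 - SSR / SST (appropriate when the model has an intercept)."""
    y = np.asarray(y, dtype=float)
    resid = y - np.asarray(fitted, dtype=float)
    ssr = np.sum(resid ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - ssr / sst


def adjusted_r_squared(r2, n_obs, n_slopes):
    """Adjusted R^2 for a regression with n_slopes regressors plus an intercept."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_slopes - 1)


# Values from the AAPL summary: R^2 = 0.613 (rounded), n = 2743, k = 5.
print(round(adjusted_r_squared(0.613, 2743, 5), 3))  # prints 0.612
```

With 2,743 observations the penalty for five regressors is tiny, which is why the two figures differ only in the third decimal.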