# **Factor Models**


## **Part II: Macroeconomic Factor Model**

**Objective**: The objective of this section is to construct and estimate a macroeconomic factor model that links asset returns to key sources of systematic risk. Specifically, the model aims to identify how market, interest rate, credit, currency, and commodity factors explain the cross-sectional and time-series variation in individual stock returns.

### **1. Import and clean the data**
load the monthly return data, make sure returns are numeric, and remove missing or invalid entries.

In [1]:
import pandas as pd
import numpy as np

df = pd.read_csv("data/sample_monthly_returns.csv", parse_dates=["date"], low_memory=False)
df["return"] = pd.to_numeric(df["RET"], errors="coerce")
df = df[["date", "TICKER", "return"]].dropna(subset=["return"])

display(df)

Unnamed: 0,date,TICKER,return
0,2004-01-30,ORCL,0.047619
1,2004-02-27,ORCL,-0.071429
2,2004-03-31,ORCL,-0.067599
3,2004-04-30,ORCL,-0.062500
4,2004-05-28,ORCL,0.013333
...,...,...,...
112079,2024-08-30,TSLA,-0.077390
112080,2024-09-30,TSLA,0.221942
112081,2024-10-31,TSLA,-0.045025
112082,2024-11-29,TSLA,0.381469


### **2. Construct the return matrix**

Transform the data so that each column represents a stock (TICKER), each row represents a month (date), remove duplicates, and keep only stocks with sufficiently long histories (e.g., at least 250 months).

In [2]:
series_by_ticker = {}
for t, g in df.groupby("TICKER", sort=True):
    s = g.set_index("date")["return"].sort_index()
    if not s.index.is_unique:
        s = s.groupby(level=0).last()  # handle duplicate dates
    series_by_ticker[t] = s

rets = pd.concat(series_by_ticker, axis=1)
rets.columns.name = None
rets.index.name = "date"

rets = rets.replace([np.inf, -np.inf], np.nan)
rets = rets.loc[:, rets.count() >= 250]  # keep tickers with ≥250 months
rets = rets.dropna(how="any")

display(rets)

Unnamed: 0_level_0,A,AAP,AAPL,ABT,ACGL,ADBE,ADI,ADM,ADP,ADSK,...,WMT,WST,WY,WYNN,XEL,XOM,XRAY,YUM,ZBRA,ZION
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2004-03-31,-0.074876,0.030925,0.130435,-0.039720,-0.007310,0.055369,-0.037876,-0.019186,-0.007303,0.097010,...,0.004365,0.014100,0.003831,-0.036874,0.030195,-0.013754,0.010991,0.025925,-0.029519,-0.020079
2004-04-30,-0.146064,0.060979,-0.046598,0.077372,-0.046081,0.055980,-0.112685,0.040901,0.043095,0.063134,...,-0.045066,0.043048,-0.090076,0.140857,-0.060640,0.023082,0.093165,0.021058,0.056509,-0.008932
2004-05-28,-0.048501,-0.006489,0.088441,-0.004019,-0.042331,0.075422,0.155164,-0.048690,0.014152,0.070128,...,-0.020000,-0.020361,0.021622,-0.031806,0.015541,0.022797,0.019604,-0.033256,0.103561,0.088885
2004-06-30,0.139300,0.030558,0.159658,-0.010920,0.036922,0.042180,-0.042116,0.009020,-0.054243,0.194646,...,-0.053293,0.112865,0.043651,-0.000776,-0.004267,0.026821,0.055505,-0.007467,0.075668,0.002447
2004-07-30,-0.186817,-0.159801,-0.006146,-0.028214,-0.034604,-0.092903,-0.156754,-0.080453,0.002388,-0.060967,...,0.004738,-0.094563,-0.011407,-0.073777,0.023339,0.042558,-0.066603,0.034121,-0.050230,-0.015460
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2024-06-28,-0.005981,-0.103482,0.095553,0.016831,-0.016954,0.249078,-0.022645,-0.031871,-0.019721,0.227431,...,0.029653,-0.006095,-0.054612,-0.056703,-0.026916,-0.018250,-0.104963,-0.036164,-0.010918,0.004168
2024-07-31,0.092617,0.003948,0.054411,0.024829,-0.050649,-0.007002,0.013669,0.025806,0.100256,0.000283,...,0.013735,-0.069887,0.118704,-0.074637,0.091181,0.030142,0.089522,0.002793,0.136795,0.191377
2024-08-30,0.010750,-0.284541,0.032286,0.069190,0.180727,0.041258,0.014954,-0.008386,0.050605,0.043956,...,0.128169,0.024366,-0.033690,-0.068703,0.050618,0.002530,-0.068165,0.020778,-0.016544,-0.032901
2024-09-30,0.038903,-0.139484,0.017467,0.006533,-0.010699,-0.098588,-0.015968,-0.020495,0.008046,0.066099,...,0.045578,-0.042949,0.110528,0.247138,0.075412,-0.006105,0.076315,0.035503,0.072210,-0.047215


### **3. Import and construct macroeconomic factors**

The following set of macroeconomic variables is incorporated into the asset–pricing model to capture broad sources of systematic risk that affect financial markets and the real economy. 

- Market Factor (`MKT_RF`): Represented by the excess return on the U.S. equity market (`SPY`) over a risk-free proxy (`BIL`). This variable captures the overall compensation investors require for bearing market-wide equity risk and serves as the central component in most factor models.
- Term Spread (`TERM`): Computed as the difference between the 10-year and 3-month Treasury yields (`^TNX` and `^IRX`). The term spread reflects expectations of future economic growth and inflation as well as duration risk in bond markets.
- Default Spread (`DEF`): Measured as the return difference between high-yield corporate bonds (`HYG`) and Treasury securities of similar duration (`IEF`). It captures variation in credit risk premia and financial conditions, which tend to widen in periods of stress.
- Dollar Index (`DOLLAR`): Based on the ICE U.S. Dollar Index (`DX-Y.NYB`), this factor proxies global risk sentiment and international capital flows. A stronger dollar generally signals tighter global liquidity and affects commodity prices and emerging markets.
- Commodities Factor (`COMMODS`): Represented by returns on crude oil futures (`CL=F`), this variable reflects changes in inflationary pressures, input costs, and global demand conditions. It serves as a broad measure of commodity and real–asset exposure.

**Table: Macroeconomic Factors: Tickers, Indices, and Computations**

| **Factor** | **Tickers** | **Index / Asset** | **Computation** |
|:------------|:------------|:------------------|:----------------|
| Market (MKT_RF) | SPY, BIL | S&P 500 ETF; 1–3M T-Bill ETF |  $ MKT-RF_t = r^{SPY}_t - r^{BIL}_t $ |
| Term (TERM) | ^TNX, ^IRX | 10Y Treasury yield; 3M T-bill yield | $ TERM_t = y^{10Y}_t - y^{3M}_t $ |
| Credit (DEF) | HYG, IEF | HY Corporate ETF; 7–10Y Treasury ETF | $ DEF_t = r^{HYG}_t - r^{IEF}_t $|
| Dollar (DOLLAR) | DX-Y.NYB | ICE U.S. Dollar Index (DXY) | $ DOLLAR_t = r^{DXY}_t $ |
| Commodities (COMMODS) | CL=F | WTI Crude Oil Futures | $ COMMODS_t = r^{CL=F}_t  $|


In [3]:
import yfinance as yf

start = "2010-01-01"
tickers = ["SPY","BIL","HYG","IEF","DX-Y.NYB","CL=F","^TNX","^IRX"]

lvl = (yf.download(tickers, start=start, auto_adjust=True,
                    actions=False, progress=False, threads=True)
         ["Close"].dropna(how="all").resample("M").last())
ret = lvl.pct_change()

factors = pd.DataFrame({
    "RF":      ret["BIL"],
    "MKT_RF":  ret["SPY"] - ret["BIL"],
    "TERM":    lvl["^TNX"].div(1000) - lvl["^IRX"].div(100),
    "DEF":     ret["HYG"] - ret["IEF"],
    "DOLLAR":  ret["DX-Y.NYB"],
    "COMMODS": ret["CL=F"],
}).dropna()

  ["Close"].dropna(how="all").resample("M").last())


In [4]:
display(factors)

Unnamed: 0_level_0,RF,MKT_RF,TERM,DEF,DOLLAR,COMMODS
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2010-02-28,-0.000218,0.031412,0.002445,0.015417,0.011326,0.092880
2010-03-31,0.000000,0.060879,0.002333,0.031903,0.008835,0.051469
2010-04-30,-0.000436,0.015906,0.002113,0.002717,0.009868,0.028534
2010-05-31,0.000436,-0.079891,0.001801,-0.076087,0.056309,-0.141381
2010-06-30,0.000218,-0.051960,0.001251,-0.019371,-0.005319,0.022441
...,...,...,...,...,...,...
2025-06-30,0.003347,0.048039,-0.037670,0.002366,-0.024665,0.071064
2025-07-31,0.003566,0.019466,-0.037990,0.007297,0.032514,0.063738
2025-08-31,0.003708,0.016812,-0.036203,-0.005428,-0.022593,-0.075801
2025-09-30,0.003302,0.032318,-0.034302,0.002349,0.000000,-0.025621


### **4. From 2015 to 2025, estimate monthly betas (factor loadings).**

Run monthly regressions using the previous 5 years (60 months) of observations in order to estimate monthly betas. 

The core loop runs a separate rolling regression for each asset (column) in the return matrix.

For each ticker:
- $y$ = that asset’s time series of returns
- $X$ = the same matrix of factor returns (for all assets)
- `RollingOLS` fits a regression over a moving window (e.g., 36 months)
- `res.params` stores the rolling betas for that asset through time

Example of the resulting betas table:


| **date**     | **TICKER** | **MKT_RF** | **TERM** | **DEF** | **DOLLAR** | **COMMODS** |
|:-------------|:-----------|------------:|----------:|----------:|------------:|-------------:|
| 2010-12-31   | AAPL       | 1.12        | 0.05      | 0.20      | 0.01        | 0.03         |
| 2011-01-31   | AAPL       | 1.10        | 0.04      | 0.22      | 0.00        | 0.02         |
| ⋮            | ⋮          | ⋮           | ⋮         | ⋮         | ⋮           | ⋮            |


In [5]:
from statsmodels.regression.rolling import RollingOLS
import statsmodels.api as sm

# Align returns and factor data
common_dates = rets.index.intersection(factors.index)
rets = rets.loc[common_dates]
factors = factors.loc[common_dates]

X = sm.add_constant(factors)[["MKT_RF", "TERM", "DEF", "DOLLAR", "COMMODS"]]

betas_list = [] # Container for all betas
window = 80 # Rolling regression (80-month window)
for ticker in rets.columns:
    y = rets[ticker]
    model = RollingOLS(y, X, window=window, min_nobs=window)
    res = model.fit()
    
    df_betas = res.params.copy()
    df_betas["TICKER"] = ticker
    df_betas["exret"] = y - factors["RF"] # Compute Excess Returns
    df_betas.index.name = "date"
    betas_list.append(df_betas)

betas = pd.concat(betas_list).reset_index()

betas = betas.dropna()
display(betas)


Unnamed: 0,date,MKT_RF,TERM,DEF,DOLLAR,COMMODS,TICKER,exret
79,2019-07-31,1.627245,0.359716,-0.063823,-0.376037,-0.115889,A,-0.070051
80,2019-09-30,1.618214,0.198311,-0.024148,-0.367434,-0.121815,A,0.078314
81,2019-10-31,1.583393,0.295130,0.031420,-0.348579,-0.125671,A,-0.013091
82,2019-12-31,1.548097,0.231179,0.038948,-0.368178,-0.115622,A,0.057322
83,2020-01-31,1.510463,0.294170,0.121657,-0.363081,-0.110147,A,-0.033438
...,...,...,...,...,...,...,...,...
38435,2024-04-30,0.652180,0.414522,1.190096,0.501633,0.183891,ZION,-0.064691
38436,2024-05-31,0.666950,0.302928,1.123933,0.445895,0.195965,ZION,0.064579
38437,2024-07-31,0.629242,0.033829,1.123712,0.245546,0.189532,ZION,0.186869
38438,2024-09-30,0.626853,0.067985,1.103929,0.249549,0.199157,ZION,-0.051437


Format Dataframe

In [6]:
betas["date"] = pd.to_datetime(betas["date"])
betas = betas.set_index(["TICKER", "date"]).sort_index()

display(betas)


Unnamed: 0_level_0,Unnamed: 1_level_0,MKT_RF,TERM,DEF,DOLLAR,COMMODS,exret
TICKER,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
A,2019-07-31,1.627245,0.359716,-0.063823,-0.376037,-0.115889,-0.070051
A,2019-09-30,1.618214,0.198311,-0.024148,-0.367434,-0.121815,0.078314
A,2019-10-31,1.583393,0.295130,0.031420,-0.348579,-0.125671,-0.013091
A,2019-12-31,1.548097,0.231179,0.038948,-0.368178,-0.115622,0.057322
A,2020-01-31,1.510463,0.294170,0.121657,-0.363081,-0.110147,-0.033438
...,...,...,...,...,...,...,...
ZION,2024-04-30,0.652180,0.414522,1.190096,0.501633,0.183891,-0.064691
ZION,2024-05-31,0.666950,0.302928,1.123933,0.445895,0.195965,0.064579
ZION,2024-07-31,0.629242,0.033829,1.123712,0.245546,0.189532,0.186869
ZION,2024-09-30,0.626853,0.067985,1.103929,0.249549,0.199157,-0.051437


### **5. Fama–MacBeth Regression**

Run a Fama–MacBeth regression with the return as the dependent variable and the lagged betas as regressors. <br>
Are the average factor risk premia from the Fama–MacBeth regression statistically significant?


In [7]:
from linearmodels.panel import FamaMacBeth

data = betas.copy()

factors = ["MKT_RF", "TERM", "DEF", "DOLLAR", "COMMODS"]

data[factors] = data.groupby(level=0)[factors].shift(1)

# Keep rows where we have both lagged betas and realized excess returns
data = data.dropna(subset=factors + ["exret"])

y = data["exret"]*100
X = sm.add_constant(data[factors])

mod = FamaMacBeth(y, X)
fe_res = mod.fit(cov_type="kernel", kernel="bartlett", bandwidth=12)

display(fe_res.summary)


0,1,2,3
Dep. Variable:,exret,R-squared:,0.0025
Estimator:,FamaMacBeth,R-squared (Between):,0.1793
No. Observations:,13640,R-squared (Within):,0.0001
Date:,"Mon, Oct 20 2025",R-squared (Overall):,0.0025
Time:,09:09:10,Log-likelihood,-5.05e+04
Cov. Estimator:,Fama-MacBeth Kernel Cov,,
,,F-statistic:,6.8651
Entities:,310,P-value,0.0000
Avg Obs:,44.000,Distribution:,"F(5,13634)"
Min Obs:,44.000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
const,0.4306,0.5725,0.7521,0.4520,-0.6916,1.5528
MKT_RF,0.6270,0.6311,0.9935,0.3205,-0.6100,1.8641
TERM,-0.2338,0.3866,-0.6048,0.5453,-0.9916,0.5240
DEF,0.3716,0.2157,1.7229,0.0849,-0.0512,0.7943
DOLLAR,-0.1269,0.2418,-0.5250,0.5996,-0.6009,0.3470
COMMODS,3.5059,1.5349,2.2842,0.0224,0.4973,6.5144


Report Results

In [8]:
sig = (fe_res.pvalues < 0.05)
report = (
    pd.DataFrame({
        "Mean Premium": fe_res.params,
        "Std. Error": fe_res.std_errors,
        "t-Stat": fe_res.tstats,
        "p-Value": fe_res.pvalues,
        "Significant (5%)": sig
    })
    .loc[["const"] + factors]  # order nicely
)
display(report.round(3))



Unnamed: 0,Mean Premium,Std. Error,t-Stat,p-Value,Significant (5%)
const,0.431,0.573,0.752,0.452,False
MKT_RF,0.627,0.631,0.994,0.32,False
TERM,-0.234,0.387,-0.605,0.545,False
DEF,0.372,0.216,1.723,0.085,False
DOLLAR,-0.127,0.242,-0.525,0.6,False
COMMODS,3.506,1.535,2.284,0.022,True
