# Finance Lab: CAPM, Fama-French 3-Factor, and Carhart 4-Factor Models

In this Jupyter Notebook, we will:
1. Load and explore the given datasets (`factors` and `SP500`).
2. Split the data into training (80%) and testing (20%) sets.
3. Evaluate the following models:
   - CAPM
   - Fama-French 3-Factor
   - Carhart 4-Factor
4. Compare their out-of-sample Mean Squared Error (MSE).
5. Provide a brief explanation of each model.

1. CAPM Model (Capital Asset Pricing Model)
   $$ R_i - R_f = \alpha + \beta (R_m - R_f) + \epsilon $$
   **Explanation:**
    - $R_i$: Return of asset $i$ (e.g., stock)
    - $R_f$: Risk-free rate (e.g., Treasury bill return)
    - $R_m$: Market return (e.g., S&P 500 return)
    - $\beta$: Sensitivity of the stock return to the market return
    - $\alpha$: Intercept, representing abnormal returns
    - $\epsilon$: Error term
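
As an illustration of the definitions above, with a single regressor the OLS slope $\beta$ equals $\mathrm{Cov}(R_i - R_f, R_m - R_f) / \mathrm{Var}(R_m - R_f)$. The sketch below checks this on synthetic data; the series, the seed, and the values `true_alpha` and `true_beta` are assumptions made for the example, not taken from the lab datasets.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical daily excess returns (synthetic, not the notebook's data)
true_alpha, true_beta = 0.0001, 1.2
mkt_excess = rng.normal(0.0005, 0.01, size=1000)
asset_excess = true_alpha + true_beta * mkt_excess + rng.normal(0, 0.005, size=1000)

# Closed-form OLS estimates for a regression with one regressor
beta_hat = np.cov(asset_excess, mkt_excess, ddof=1)[0, 1] / np.var(mkt_excess, ddof=1)
alpha_hat = asset_excess.mean() - beta_hat * mkt_excess.mean()

# Cross-check against a generic least-squares solve with an intercept column
X = np.column_stack([np.ones_like(mkt_excess), mkt_excess])
alpha_ls, beta_ls = np.linalg.lstsq(X, asset_excess, rcond=None)[0]

print(f"beta  (cov/var): {beta_hat:.4f}   beta  (lstsq): {beta_ls:.4f}")
print(f"alpha (closed):  {alpha_hat:.6f} alpha (lstsq): {alpha_ls:.6f}")
```

The two routes should agree up to floating-point error, and both should land near the slope used to generate the data.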

2. Fama-French 3-Factor Model
    $$ R_i - R_f = \alpha + \beta_m MKT + \beta_s SMB + \beta_h HML + \epsilon $$
   **Explanation:**
    - **Market risk premium (MKT)** = $R_m - R_f$
    - **Size factor (SMB - Small Minus Big)**: Return of small-cap stocks minus return of large-cap stocks
    - **Value factor (HML - High Minus Low)**: Return of high book-to-market (value) stocks minus return of low book-to-market (growth) stocks

3. Carhart 4-Factor Model
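    $$ R_i - R_f = \alpha + \beta_m MKT + \beta_s SMB + \beta_h HML + \beta_{mom} MOM + \epsilon $$
   **Explanation:**
    - Extends the Fama-French 3-Factor model by adding a **Momentum Factor (MOM)** with its own loading $\beta_{mom}$, distinct from the market loading $\beta_m$.
    - MOM is the return of recent winners minus the return of recent losers (the `umd` column, "Up Minus Down", in the data used here). It captures the tendency of stocks with positive momentum to keep performing well in the short term.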
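
Because the three models are nested, each added factor can only lower (or leave unchanged) the in-sample sum of squared residuals. That is one reason the comparison in this notebook relies on out-of-sample MSE. The sketch below shows the nesting property on synthetic factors; the helper `in_sample_ssr` and all data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Hypothetical factor columns in the order MKT, SMB, HML, MOM (synthetic values)
factors = rng.normal(0, 0.01, size=(n, 4))
# Synthetic excess return that truly loads only on MKT and SMB
y = 0.9 * factors[:, 0] + 0.2 * factors[:, 1] + rng.normal(0, 0.005, size=n)


def in_sample_ssr(y, X):
    """Fit OLS with an intercept and return the sum of squared residuals."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return float(resid @ resid)


ssr_capm = in_sample_ssr(y, factors[:, :1])     # MKT only
ssr_ff3 = in_sample_ssr(y, factors[:, :3])      # MKT, SMB, HML
ssr_carhart = in_sample_ssr(y, factors[:, :4])  # MKT, SMB, HML, MOM

print(ssr_capm, ssr_ff3, ssr_carhart)
```

The in-sample fit improves weakly with every added column even when the extra factor is pure noise, so a lower training error alone does not show that the larger model is better.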

In [1]:
import pandas as pd
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

In [26]:
factors_df = pd.read_csv('factors.csv')
sp500_df = pd.read_csv('sp500.csv')

In [28]:
factors_df 

Unnamed: 0,date,mktrf,smb,hml,umd
0,20110103,0.0118,0.0050,0.0077,-0.0001
1,20110104,-0.0026,-0.0138,0.0007,-0.0058
2,20110105,0.0059,0.0059,0.0013,0.0012
3,20110106,-0.0015,-0.0007,-0.0034,-0.0051
4,20110107,-0.0021,-0.0025,-0.0027,0.0035
...,...,...,...,...,...
2764,20211227,0.0122,-0.0011,0.0030,0.0193
2765,20211228,-0.0027,-0.0060,0.0081,-0.0053
2766,20211229,0.0006,-0.0007,0.0017,0.0042
2767,20211230,-0.0015,0.0010,-0.0040,-0.0105


In [30]:
sp500_df # vwretd: Value-Weighted Return

Unnamed: 0,caldt,vwretd
0,20110103,0.011325
1,20110104,-0.001236
2,20110105,0.005164
3,20110106,-0.001715
4,20110107,-0.001755
...,...,...
2764,20211227,0.013645
2765,20211228,-0.000978
2766,20211229,0.001275
2767,20211230,-0.002951


In [34]:
# Rename columns in factors_df for merging consistency
factors_df.rename(columns={'date': 'caldt'}, inplace=True)

# Convert date columns to datetime format
factors_df['caldt'] = pd.to_datetime(factors_df['caldt'], format='%Y%m%d', errors='coerce')
sp500_df['caldt'] = pd.to_datetime(sp500_df['caldt'], format='%Y%m%d', errors='coerce')

# Merge datasets on 'caldt'
merged_df = pd.merge(sp500_df, factors_df, on='caldt', how='inner')
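
With `errors='coerce'`, unparseable dates silently become `NaT`, and an inner merge silently drops dates present in only one file. A minimal sketch of how both could be audited is shown here on toy frames; `sp500_toy`, `factors_toy`, and their values are assumptions for illustration, not the lab data.

```python
import pandas as pd

# Toy stand-ins for sp500_df and factors_df (hypothetical rows)
sp500_toy = pd.DataFrame({
    'caldt': ['20110103', '20110104', '20110105', 'not-a-date'],
    'vwretd': [0.011325, -0.001236, 0.005164, 0.0],
})
factors_toy = pd.DataFrame({
    'caldt': ['20110103', '20110104', '20110106'],
    'mktrf': [0.0118, -0.0026, -0.0015],
})

sp500_toy['caldt'] = pd.to_datetime(sp500_toy['caldt'], format='%Y%m%d', errors='coerce')
factors_toy['caldt'] = pd.to_datetime(factors_toy['caldt'], format='%Y%m%d', errors='coerce')

# Count dates that failed to parse, then drop them before merging
n_unparsed = int(sp500_toy['caldt'].isna().sum())
sp500_toy = sp500_toy.dropna(subset=['caldt'])

# An outer merge with indicator=True shows which dates an inner merge would drop;
# validate='one_to_one' raises if either frame has duplicate dates
audit = pd.merge(sp500_toy, factors_toy, on='caldt', how='outer',
                 validate='one_to_one', indicator=True)
merge_counts = audit['_merge'].value_counts().to_dict()

print('unparsed dates:', n_unparsed)
print(merge_counts)
```

Rows marked `left_only` or `right_only` are the ones an inner merge would discard.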

In [36]:
merged_df.describe()

Unnamed: 0,caldt,vwretd,mktrf,smb,hml,umd
count,2769,2769.0,2769.0,2769.0,2769.0,2769.0
mean,2016-07-03 07:18:23.791982592,0.000618,0.000597,-1.7e-05,-0.000115,8.5e-05
min,2011-01-03 00:00:00,-0.11897,-0.12,-0.0357,-0.05,-0.1437
25%,2013-10-03 00:00:00,-0.003295,-0.0036,-0.0035,-0.0036,-0.004
50%,2016-07-05 00:00:00,0.000778,0.0009,-0.0001,-0.0003,0.0005
75%,2019-04-04 00:00:00,0.005421,0.0057,0.0035,0.0031,0.0048
max,2021-12-31 00:00:00,0.093205,0.0934,0.055,0.0674,0.0593
std,,0.010744,0.011011,0.005928,0.007497,0.009558


In [41]:
# Define independent variables (factors)
X_CAPM = merged_df[['mktrf']]
X_FF3 = merged_df[['mktrf', 'smb', 'hml']]
X_Carhart4 = merged_df[['mktrf', 'smb', 'hml', 'umd']]

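# Define dependent variable: the S&P 500 value-weighted return.
# Note: vwretd is a raw return, not an excess return; the factors file shown
# above has no risk-free rate column, so R_f is not subtracted here.
y = merged_df['vwretd']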

# Chronological split: shuffle=False keeps the earliest 80% of dates for training
# and the latest 20% for testing, preserving the time-series order
X_CAPM_train, X_CAPM_test, y_train, y_test = train_test_split(X_CAPM, y, test_size=0.2, shuffle=False)
X_FF3_train, X_FF3_test, _, _ = train_test_split(X_FF3, y, test_size=0.2, shuffle=False)
X_Carhart4_train, X_Carhart4_test, _, _ = train_test_split(X_Carhart4, y, test_size=0.2, shuffle=False)

print('Training set size (CAPM):', X_CAPM_train.shape[0])
print('Testing  set size (CAPM):', X_CAPM_test.shape[0])

Training set size (CAPM): 2215
Testing  set size (CAPM): 554
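
The 2215/554 split sizes can be reproduced by hand, and because the split is unshuffled, all three feature sets share the same row boundary. The sketch below assumes the fractional test size is rounded up to a whole number of rows, which is consistent with the sizes printed above. The frame `toy` is a placeholder with the notebook's column names, not the real data.

```python
import math

import numpy as np
import pandas as pd

n_obs = 2769      # rows in merged_df, per the describe() output
test_size = 0.2

# Assumption: the test fraction is rounded up to a whole number of rows
n_test = math.ceil(test_size * n_obs)
n_train = n_obs - n_test

# Placeholder frame with the notebook's factor column names (synthetic values)
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(0, 0.01, size=(n_obs, 4)),
                   columns=['mktrf', 'smb', 'hml', 'umd'])

# One chronological cut: earliest rows for training, latest rows for testing
train, test = toy.iloc[:n_train], toy.iloc[n_train:]

# Each model's design matrix is a column subset of the same rows
X_capm_train = train[['mktrf']]
X_ff3_train = train[['mktrf', 'smb', 'hml']]
X_carhart_train = train[['mktrf', 'smb', 'hml', 'umd']]

print(n_train, n_test)  # 2215 554
```

Splitting once and selecting columns afterwards is an alternative to calling the splitter three times. It makes it explicit that every model is trained and tested on identical dates.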


In [43]:
# Add constant term for intercept in OLS
X_CAPM_train = sm.add_constant(X_CAPM_train)
X_CAPM_test = sm.add_constant(X_CAPM_test)

X_FF3_train = sm.add_constant(X_FF3_train)
X_FF3_test = sm.add_constant(X_FF3_test)

X_Carhart4_train = sm.add_constant(X_Carhart4_train)
X_Carhart4_test = sm.add_constant(X_Carhart4_test)

# Fit models
capm_model = sm.OLS(y_train, X_CAPM_train).fit()
ff3_model = sm.OLS(y_train, X_FF3_train).fit()
carhart4_model = sm.OLS(y_train, X_Carhart4_train).fit()

print('CAPM Model Summary:')
print(capm_model.summary())
print('\nFama-French 3-Factor Model Summary:')
print(ff3_model.summary())
print('\nCarhart 4-Factor Model Summary:')
print(carhart4_model.summary())

CAPM Model Summary:
                            OLS Regression Results                            
Dep. Variable:                 vwretd   R-squared:                       0.993
Model:                            OLS   Adj. R-squared:                  0.993
Method:                 Least Squares   F-statistic:                 3.084e+05
Date:                Thu, 20 Feb 2025   Prob (F-statistic):               0.00
Time:                        22:35:16   Log-Likelihood:                 12738.
No. Observations:                2215   AIC:                        -2.547e+04
Df Residuals:                    2213   BIC:                        -2.546e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const       4.464e-05   1.64e-05

In [45]:
# Evaluate models using Mean Squared Error (MSE)
capm_pred = capm_model.predict(X_CAPM_test)
ff3_pred = ff3_model.predict(X_FF3_test)
carhart4_pred = carhart4_model.predict(X_Carhart4_test)

capm_mse = mean_squared_error(y_test, capm_pred)
ff3_mse = mean_squared_error(y_test, ff3_pred)
carhart4_mse = mean_squared_error(y_test, carhart4_pred)

mse_results = pd.DataFrame({
    'Model': ['CAPM', 'Fama-French 3-Factor', 'Carhart 4-Factor'],
    'Out-of-Sample MSE': [capm_mse, ff3_mse, carhart4_mse]
})

print(mse_results)

                  Model  Out-of-Sample MSE
0                  CAPM           0.000002
1  Fama-French 3-Factor           0.000001
2      Carhart 4-Factor           0.000001
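
Daily returns are small, so the MSE values are in squared-return units and the 3-factor and 4-factor results are indistinguishable at six decimals. One possible way to make the comparison easier to read is to report RMSE and an out-of-sample $R^2$ against a constant benchmark such as the training-sample mean return. The helper `oos_report` and the toy arrays below are hypothetical, not part of the original notebook.

```python
import numpy as np


def oos_report(y_true, y_pred, benchmark):
    """Return MSE, RMSE, and out-of-sample R^2 relative to a constant benchmark forecast."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sse_model = np.sum((y_true - y_pred) ** 2)
    sse_bench = np.sum((y_true - benchmark) ** 2)
    mse = sse_model / len(y_true)
    return {
        'mse': mse,
        'rmse': np.sqrt(mse),                   # same units as the returns
        'oos_r2': 1.0 - sse_model / sse_bench,  # > 0 means the model beats the benchmark
    }


# Toy test-set returns and predictions (hypothetical values)
y_test_toy = [0.010, -0.020, 0.005]
pred_toy = [0.008, -0.018, 0.004]

report = oos_report(y_test_toy, pred_toy, benchmark=0.0)
print({name: f'{value:.3e}' for name, value in report.items()})
```

In the notebook's setting, the same function could be called with `y_test`, each model's predictions, and `y_train.mean()` as the benchmark. Printing in scientific notation avoids the rounding that hides the difference between models.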
