In [8]:
import pandas as pd
import numpy as np
import statsmodels.api as sm

In [11]:
df_test = pd.read_excel('factor_pricing_data_monthly.xlsx', sheet_name='portfolios (excess returns)')
df_test.set_index('Date', inplace=True)
df_test.head()

Date,Agric,Food,Soda,Beer,Smoke,Toys,Fun,Books,Hshld,Clths,...,Boxes,Trans,Whlsl,Rtail,Meals,Banks,Insur,RlEst,Fin,Other
1980-01-31,-0.0073,0.0285,0.0084,0.1009,-0.0143,0.0995,0.0348,0.0323,0.0048,0.0059,...,0.0158,0.0851,0.0466,-0.0125,0.043,-0.0284,0.0254,0.077,0.0306,0.0666
1980-02-29,0.0125,-0.0609,-0.0967,-0.0323,-0.0575,-0.0316,-0.0492,-0.0803,-0.0556,-0.0169,...,-0.0083,-0.0543,-0.0345,-0.0641,-0.0653,-0.0824,-0.096,-0.0352,-0.0283,-0.0273
1980-03-31,-0.222,-0.1119,-0.0158,-0.1535,-0.0188,-0.1272,-0.0827,-0.1238,-0.0567,-0.067,...,-0.0819,-0.1512,-0.1602,-0.0905,-0.145,-0.0559,-0.0877,-0.2449,-0.1261,-0.1737
1980-04-30,0.0449,0.0767,0.0232,0.0289,0.083,-0.0529,0.0785,0.0154,0.0305,0.0115,...,0.0422,-0.0102,0.0268,0.0355,0.0539,0.0736,0.0528,0.0964,0.0458,0.0784
1980-05-31,0.0635,0.0797,0.0458,0.0866,0.0822,0.051,0.0325,0.0888,0.056,0.0098,...,0.0564,0.1065,0.1142,0.0877,0.1104,0.057,0.056,0.0889,0.0846,0.0663


In [12]:
df_factors = pd.read_excel('factor_pricing_data_monthly.xlsx', sheet_name='factors (excess returns)')
df_factors.set_index('Date', inplace=True)
df_factors.head()

Date,MKT,SMB,HML,RMW,CMA,UMD
1980-01-31,0.055,0.0188,0.0185,-0.0184,0.0189,0.0745
1980-02-29,-0.0123,-0.0162,0.0059,-0.0095,0.0292,0.0789
1980-03-31,-0.1289,-0.0697,-0.0096,0.0182,-0.0105,-0.0958
1980-04-30,0.0396,0.0105,0.0103,-0.0218,0.0034,-0.0048
1980-05-31,0.0526,0.02,0.0038,0.0043,-0.0063,-0.0118


4.
This does not matter for pricing, but report the average (across $n$ estimations) of the time-series regression r-squared statistics.
- Do this for each of the three models you tested. (NOTE: I did 4 — CAPM, FF3, FF5, and AQR)
- Do these models lead to high time-series r-squared stats? That is, would these factors be good in a Linear Factor Decomposition of the assets?

In [25]:
models = {
    'CAPM': ['MKT'],
    'FF3':  ['MKT', 'SMB', 'HML'],
    'FF5':  ['MKT', 'SMB', 'HML', 'RMW', 'CMA'],
    'AQR':  ['MKT', 'HML', 'RMW', 'UMD']
}

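avg_r2 = {}
for name, facs in models.items():
    r2s = []
    for asset in df_test.columns:
        # Regress each portfolio's excess return on the model's factors (with intercept)
        y = df_test[asset].dropna()
        X = sm.add_constant(df_factors.loc[y.index, facs])
        model = sm.OLS(y, X, missing='drop').fit()
        r2s.append(model.rsquared)
    # Average the n time-series R-squared values for this model
    avg_r2[name] = np.mean(r2s)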

pd.Series(avg_r2)

CAPM    0.522622
FF3     0.567874
FF5     0.591768
AQR     0.571935
dtype: float64

These average R-squared values are moderate, ranging from roughly $0.52$ (CAPM) to $0.59$ (FF5). On average, the factors explain a little over half of the time-series variation in the portfolio returns, and adding factors beyond MKT raises the average only modestly. These factors would therefore give only a partial linear factor decomposition of the assets: the fit is reasonable, but roughly 40% to 50% of the return variation is left unexplained.
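An average can hide wide dispersion across portfolios, so it may help to look at the per-asset R-squared values as well. The following is a minimal sketch on simulated data (not the assignment data); the helper name `ts_r2` and the three simulated assets are hypothetical, and the fit uses `np.linalg.lstsq` in place of statsmodels purely to keep the example self-contained.

```python
import numpy as np
import pandas as pd

def ts_r2(returns: pd.DataFrame, factors: pd.DataFrame) -> pd.Series:
    """R-squared of each asset's OLS regression on the factors (with intercept)."""
    out = {}
    for asset in returns.columns:
        y = returns[asset].dropna()
        F = factors.loc[y.index].to_numpy()
        X = np.column_stack([np.ones(len(y)), F])      # intercept + factors
        coef, *_ = np.linalg.lstsq(X, y.to_numpy(), rcond=None)
        resid = y.to_numpy() - X @ coef
        # With an intercept, residuals have mean zero, so this is 1 - SSR/SST
        out[asset] = 1.0 - resid.var() / y.to_numpy().var()
    return pd.Series(out, name="r2")

# Simulated example: three assets with unit beta and different noise levels
rng = np.random.default_rng(0)
T = 240
fac = pd.DataFrame({"MKT": rng.normal(0.007, 0.045, T)})
noise_sd = {"low_noise": 0.01, "mid_noise": 0.045, "high_noise": 0.15}
rets = pd.DataFrame({k: fac["MKT"] + rng.normal(0, sd, T) for k, sd in noise_sd.items()})

r2 = ts_r2(rets, fac)
print(r2.round(2))                              # per-asset values
print(r2.describe()[["mean", "min", "max"]])    # dispersion around the average
```

On the assignment data, the same summary could be obtained by keeping the `r2s` list from the loop above as a Series indexed by `df_test.columns`.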

5.
We tested three models using the time-series tests (focusing on the time-series alphas). Re-test these models, but this time use the cross-sectional test.
- Report the time-series premia of the factors (just their sample averages) and compare to the cross-sectionally estimated premia of the factors. Do they differ substantially?
- Report the MAE of the cross-sectional regression residuals for each of the four models. How do they compare to the MAE of the time-series alphas?
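For reference, the two stages can be written as follows. The notation is an assumption introduced here, not taken from the assignment: $\tilde{r}^i_t$ is the excess return of portfolio $i$, $\tilde{f}_t$ the vector of factor excess returns, and the cross-sectional stage is shown without an intercept, matching the code below.

$$
\tilde{r}^i_t = \alpha^i + (\beta^i)'\,\tilde{f}_t + \epsilon^i_t
\qquad \text{(time-series regression, one per portfolio)}
$$

$$
\mathbb{E}\left[\tilde{r}^i\right] = (\beta^i)'\,\lambda + \upsilon^i
\qquad \text{(cross-sectional regression across portfolios)}
$$

Under this notation, the time-series estimate of the premia is the sample mean $\hat{\lambda}_{TS} = \frac{1}{T}\sum_t \tilde{f}_t$, the time-series MAE is $\frac{1}{n}\sum_i |\hat{\alpha}^i|$, and the cross-sectional MAE is $\frac{1}{n}\sum_i |\hat{\upsilon}^i|$.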

In [31]:
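def regression(df_test, df_factors, facs):
    """Time-series regressions: return each asset's factor betas (intercept dropped)."""
    betas = []
    for asset in df_test.columns:
        y = df_test[asset].dropna()
        X = sm.add_constant(df_factors.loc[y.index, facs])
        model = sm.OLS(y, X, missing='drop').fit()
        betas.append(model.params[facs])  # keep the slopes, skip the constant
    return pd.DataFrame(betas, index=df_test.columns, columns=facs)

def cross_sectional(df_test, facs, betas):
    """Month-by-month cross-sectional regressions of returns on betas (no intercept).

    Returns the annualized average premia and the annualized mean absolute
    residual, pooled over every asset and month.
    """
    lambdas = []
    abs_errors = []
    for t in df_test.index:
        r_t = df_test.loc[t].dropna()
        B_t = betas.loc[r_t.index, facs]
        lam = np.linalg.lstsq(B_t, r_t, rcond=None)[0]  # premia estimate for month t
        lambdas.append(lam)
        abs_errors.extend(np.abs(r_t - B_t @ lam))
    lambdas = pd.DataFrame(lambdas, columns=facs, index=df_test.index)
    return lambdas.mean() * 12, np.mean(abs_errors) * 12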

time_series_premia = {}
cross_section_premia = {}
mae_resids = {}

for name, facs in models.items():
    cs_prem, mae_cs = cross_sectional(df_test, facs, regression(df_test, df_factors, facs))
    time_series_premia[name] = df_factors[facs].mean() * 12
    cross_section_premia[name] = cs_prem
    mae_resids[name] = mae_cs

rows = []
for model, facs in models.items():
    for f in facs:
        rows.append({
            'Model': model,
            'Factor': f,
            'Time-Series Premia (Ann)': time_series_premia[model][f],
            'Cross-Sectional Premia (Ann)': cross_section_premia[model][f]
        })

display(pd.DataFrame(rows).set_index('Model'))


Model,Factor,Time-Series Premia (Ann),Cross-Sectional Premia (Ann)
CAPM,MKT,0.087552,0.085638
FF3,MKT,0.087552,0.101587
FF3,SMB,0.00612,-0.062027
FF3,HML,0.026039,-0.01595
FF5,MKT,0.087552,0.096517
FF5,SMB,0.00612,-0.053949
FF5,HML,0.026039,-0.029889
FF5,RMW,0.044047,0.028818
FF5,CMA,0.028288,-0.00862
AQR,MKT,0.087552,0.089023


These results show that the time-series premia (factor means) and the cross-sectionally estimated premia are broadly similar in magnitude for the main market factor, but they diverge more for the size (SMB) and value (HML) factors.

More specifically, the MKT premia are similar across all four models. For the FF3 model, however, the SMB and HML time-series premia are positive, while the cross-sectional premia are negative. The cross-sectional SMB premium is also about ten times larger in magnitude than the time-series premium (roughly $-0.062$ versus $0.006$). The HML premia are of roughly similar magnitude.

The FF5 model shows the same pattern: SMB and HML have positive time-series premia and negative cross-sectional premia, with the same gap in magnitude for SMB. The RMW premia agree in sign and are of similar size ($0.044$ versus $0.029$). The CMA premia differ in both sign and magnitude: the time-series premium is positive ($0.028$), while the cross-sectional premium is small and negative ($-0.009$).

For the AQR 4F model, the HML premia behave as in the FF3 and FF5 models: roughly the same magnitude, positive in the time series and negative in the cross section. The RMW and UMD premia, however, are similar in both magnitude and sign across the two estimates.
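To make the comparison concrete, here is a minimal sketch on simulated data (not the assignment data) of the two premia estimates. All names and parameter values are hypothetical. It also checks one property of the approach used above: with a balanced panel and fixed betas, averaging the month-by-month estimates gives the same premia as a single regression of mean returns on betas, because least squares is linear in the dependent variable.

```python
import numpy as np

rng = np.random.default_rng(2)
T, n, k = 360, 25, 2
# Simulated monthly factor returns and asset returns (hypothetical parameters)
factors = rng.normal([0.006, 0.002], [0.045, 0.03], size=(T, k))
true_betas = rng.uniform(0.2, 1.4, size=(n, k))
returns = factors @ true_betas.T + rng.normal(0, 0.04, size=(T, n))

# Step 1: time-series betas (with intercept), all assets in one call
X = np.column_stack([np.ones(T), factors])
coef, *_ = np.linalg.lstsq(X, returns, rcond=None)    # shape (k + 1, n)
betas = coef[1:].T                                     # shape (n, k)

# Step 2a: month-by-month cross-sectional regressions, then average
lam_t = np.array([np.linalg.lstsq(betas, returns[t], rcond=None)[0] for t in range(T)])
lam_fm = lam_t.mean(axis=0)

# Step 2b: a single regression of mean returns on betas
lam_cs, *_ = np.linalg.lstsq(betas, returns.mean(axis=0), rcond=None)

# Time-series premia: the factor sample means
ts_premia = factors.mean(axis=0)

print("time-series premia (ann):     ", ts_premia * 12)
print("cross-sectional premia (ann): ", lam_cs * 12)
print("averaged monthly premia (ann):", lam_fm * 12)
```

The cross-sectional estimate is free to move away from the factor mean in order to fit the test assets, which is one reason the two columns in the table can disagree in sign.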

In [32]:
display(pd.DataFrame({'MAE of Residuals (Cross-Sectional)': mae_resids}))

Model,MAE of Residuals (Cross-Sectional)
CAPM,0.386441
FF3,0.356903
FF5,0.33507
AQR,0.346618


The MAEs of the cross-sectional residuals (roughly $0.33$ to $0.38$, annualized) are larger than the MAEs of the time-series alphas. Note that `cross_sectional` pools the absolute residual from every monthly regression before annualizing, so these figures include month-to-month noise as well as average pricing error, and the comparison with the alpha MAEs should be read with that in mind. Taken at face value, the time-series alphas are small, but the factor models are less successful at explaining differences in average returns across portfolios. In other words, the models have limited power in cross-sectional pricing, and a substantial amount of error (the residuals) remains unexplained when the factor betas are fit to returns.
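The size of these figures depends on how the residuals are aggregated. The following minimal sketch uses simulated data in which the factor model holds exactly (zero true alpha); all names and parameter values are hypothetical. It contrasts the mean absolute residual pooled over every month, as computed above, with the mean absolute value of each asset's time-averaged residual, which is the cross-sectional analogue of an alpha.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n = 480, 30
betas = rng.uniform(0.5, 1.5, size=(n, 1))         # hypothetical betas, treated as known
factor = rng.normal(0.007, 0.045, size=(T, 1))     # simulated monthly factor returns
returns = factor @ betas.T + rng.normal(0, 0.05, size=(T, n))  # zero true alpha

# Month-by-month cross-sectional regressions without an intercept
resid = np.empty((T, n))
for t in range(T):
    lam, *_ = np.linalg.lstsq(betas, returns[t], rcond=None)
    resid[t] = returns[t] - betas @ lam

# (a) mean absolute monthly residual, pooled over assets and months, annualized
mae_per_period = np.abs(resid).mean() * 12
# (b) mean absolute time-averaged residual per asset, annualized
mae_pricing = np.abs(resid.mean(axis=0)).mean() * 12

print("pooled monthly MAE (ann):       ", mae_per_period)
print("time-averaged pricing MAE (ann):", mae_pricing)
```

Measure (b) can never exceed measure (a), since the absolute value of an average is at most the average of the absolute values. In this simulation (a) is large even though the model prices every asset correctly, so a quantity like (b) is the one comparable to the MAE of the time-series alphas.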