# Lab 5: JKP Factor Starter

<figure>
<a
href="https://colab.research.google.com/github/quinfer/fin510-colab-notebooks/blob/main/labs/lab05_jkp_starter.ipynb"><img
src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
<figcaption>Open in Colab</figcaption>
</figure>

## Before You Code: The Big Picture

The **Jensen, Kelly, and Pedersen (JKP) factor dataset** contains
returns for more than 150 documented investment factors from academic
research. It is a widely used reference for factor replication studies,
and it’s what you’ll use for Coursework 2.

> **What Are Factors?**
>
> **Factors** are characteristics that explain cross-sectional stock
> returns:
>
> -   **Momentum (MOM)**: Past winners outperform past losers
> -   **Value (HML)**: Cheap stocks outperform expensive stocks (High
>     Minus Low book-to-market)
> -   **Size (SMB)**: Small stocks outperform large stocks (Small Minus
>     Big)
> -   **Quality, Profitability, Investment**: Other documented anomalies
>
> **Why They Matter:**
>
> -   If factors work, you can build systematic strategies around them
> -   If factors are spurious (data mining), they won’t persist
>     out-of-sample
> -   Factor replication tests whether published findings are real or
>     lucky
>
> **Your Task in Coursework 2:** Pick a factor, replicate it using JKP
> data, evaluate performance with rigorous backtesting, and critically
> assess whether it’s exploitable.

### What You’ll Learn Today

This is a **quick starter lab** to familiarize you with:

-   ✅ Loading JKP factor data (monthly returns)
-   ✅ Computing CAPM alpha (does the factor earn a return beyond its
    market exposure?)
-   ✅ Basic prediction setup (OLS vs Ridge for next-month market
    return)
-   ✅ Understanding what you’ll extend for Coursework 2

**Time estimate:** 20-30 minutes (this is intentionally brief; the real
work happens in your coursework)

> **Coursework 2 Preview**
>
> This lab shows the **minimal skeleton**. For Coursework 2, you’ll:
>
> -   Use the full JKP dataset (not just this sample)
> -   Implement walk-forward validation (not a simple train/test split)
> -   Add HAC standard errors (Newey-West) for proper inference
> -   Calculate out-of-sample R², Sharpe ratios, CER gains
> -   Write a critical analysis of the replication results
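
Two of the evaluation metrics named above, the Sharpe ratio and the certainty-equivalent return (CER) gain, can be sketched in a few lines. This is a minimal illustration on simulated monthly returns, not the coursework's required implementation. The function names `annualized_sharpe` and `cer`, the risk-aversion value `gamma=3`, and the simulated series are all assumptions made for the example.

```python
import numpy as np

def annualized_sharpe(excess_ret, periods_per_year=12):
    """Annualized Sharpe ratio from periodic excess returns."""
    r = np.asarray(excess_ret, dtype=float)
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)

def cer(port_ret, gamma=3.0, periods_per_year=12):
    """Annualized certainty-equivalent return for a mean-variance investor:
    mean - (gamma / 2) * variance, where gamma is an assumed risk aversion."""
    r = np.asarray(port_ret, dtype=float)
    return periods_per_year * (r.mean() - 0.5 * gamma * r.var(ddof=1))

# Simulated monthly excess returns (illustrative only, not JKP data)
rng = np.random.default_rng(0)
benchmark = rng.normal(0.005, 0.04, size=240)
strategy = benchmark + rng.normal(0.001, 0.01, size=240)

sharpe_strategy = annualized_sharpe(strategy)
# CER gain: difference in certainty-equivalent return between strategy and benchmark
cer_gain = cer(strategy) - cer(benchmark)
print(round(sharpe_strategy, 2), round(cer_gain, 4))
```

A positive `cer_gain` means an investor with the assumed risk aversion would prefer the strategy to the benchmark; the sign and size depend heavily on `gamma`, so report the value you use.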

## Objective

-   Load a small JKP factor slice (CSV) and compute a factor's CAPM
    alpha vs the market (full-sample here; rolling windows are a natural
    extension).
-   Run a tiny prediction baseline (OLS vs ridge) for next‑month `MKT`.
-   Understand the scaffolding you’ll extend for Coursework 2.

## 1) Load data (JKP CSV or course sample)

``` python
import os
import pandas as pd

# Option A: point to your downloaded JKP CSV
# jkp = pd.read_csv('/content/JKP_region_monthly.csv')

# Option B: use course sample (small demo)
sample_url = 'https://raw.githubusercontent.com/quinfer/fin510-colab-notebooks/main/resources/jkp-sample.csv'
jkp = pd.read_csv(sample_url)

# Expect columns: date, MKT, SMB, HML, MOM (monthly)
jkp['date'] = pd.to_datetime(jkp['date'])
jkp = jkp.set_index('date').sort_index()
jkp.head()
```
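
Before moving on, it can help to confirm that the loaded frame has the shape the rest of the lab expects. The sketch below is one possible check; the helper name `check_factor_frame` is hypothetical, and the small `demo` frame uses made-up values in place of the real file.

```python
import pandas as pd

def check_factor_frame(df, required=("MKT", "SMB", "HML", "MOM")):
    """Return a list of human-readable problems; an empty list means the frame looks usable."""
    problems = []
    missing = [c for c in required if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    if not isinstance(df.index, pd.DatetimeIndex):
        problems.append("index is not a DatetimeIndex")
    else:
        if not df.index.is_monotonic_increasing:
            problems.append("dates are not sorted")
        if df.index.has_duplicates:
            problems.append("duplicate dates found")
    return problems

# Tiny hand-made frame standing in for the real data (values are made up)
demo = pd.DataFrame(
    {"MKT": [0.01, -0.02, 0.03], "SMB": [0.0, 0.01, -0.01],
     "HML": [0.02, 0.0, 0.01], "MOM": [0.01, 0.02, -0.03]},
    index=pd.to_datetime(["2020-01-31", "2020-02-29", "2020-03-31"]),
)
issues = check_factor_frame(demo)
print(issues or "looks OK")
```

In the notebook you would call `check_factor_frame(jkp)` right after the loading cell. If your own JKP download uses different column names, pass them through `required`.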

## 2) CAPM alpha vs market (HAC deferred to the project)

``` python
import statsmodels.api as sm
import numpy as np

ls_ret = jkp['MOM'].dropna()  # example: long-short momentum
mkt    = jkp['MKT'].reindex(ls_ret.index)

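# Full-sample CAPM alpha: one fit over all dates (no HAC here to keep the lab minimal)
# missing='drop' skips any months where MKT is NaN after reindexing
capm = sm.OLS(ls_ret, sm.add_constant(mkt), missing='drop').fit()
alpha = capm.params['const']     # monthly alpha, in the same units as the returns
alpha_t = capm.tvalues['const']  # conventional (non-HAC) t-statistic
alpha, alpha_t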
```
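
The cell above fits one regression over the whole sample. A rolling version repeats the same fit over a moving window, which shows whether alpha is stable through time. The sketch below is one way to do that with plain pandas, using the one-regressor OLS identities (beta = cov/var, alpha = mean(factor) - beta × mean(market)). The function name `rolling_capm_alpha`, the 36-month window, and the simulated series are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def rolling_capm_alpha(factor, market, window=36):
    """Rolling one-regressor OLS of factor on market.
    beta = cov(f, m) / var(m); alpha = mean(f) - beta * mean(m), per window."""
    df = pd.concat({"f": factor, "m": market}, axis=1).dropna()
    beta = df["f"].rolling(window).cov(df["m"]) / df["m"].rolling(window).var()
    alpha = df["f"].rolling(window).mean() - beta * df["m"].rolling(window).mean()
    return pd.DataFrame({"alpha": alpha, "beta": beta}).dropna()

# Simulated monthly data (illustrative only, not JKP data)
rng = np.random.default_rng(1)
idx = pd.date_range("2000-01-01", periods=120, freq="MS")
mkt_sim = pd.Series(rng.normal(0.005, 0.04, 120), index=idx)
mom_sim = 0.004 + 0.2 * mkt_sim + pd.Series(rng.normal(0, 0.02, 120), index=idx)

roll = rolling_capm_alpha(mom_sim, mkt_sim, window=36)
print(roll.tail())
```

With the lab data the call would be `rolling_capm_alpha(jkp['MOM'], jkp['MKT'])`, provided the sample is longer than the window. Each row is dated at the end of its window, so the estimate uses only data up to and including that month.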

## 3) Tiny prediction baseline (OLS vs ridge)

``` python
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error

# Predict next-month MKT using lagged features
X = pd.concat({
    'mkt_l1': jkp['MKT'].shift(1),
    'mom_l1': jkp['MOM'].shift(1),
}, axis=1).dropna()
y = jkp['MKT'].reindex(X.index)

split = int(len(X)*0.7)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

ols = LinearRegression().fit(X_train, y_train)
ridge = Ridge(alpha=5.0).fit(X_train, y_train)

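# RMSE as sqrt(MSE); the `squared=False` argument was deprecated and later
# removed in newer scikit-learn releases, so avoid relying on it
ols_rmse  = mean_squared_error(y_test, ols.predict(X_test)) ** 0.5
rid_rmse  = mean_squared_error(y_test, ridge.predict(X_test)) ** 0.5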
{'ols_rmse': ols_rmse, 'ridge_rmse': rid_rmse}
```
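
The single 70/30 split above gives one train set and one test set. A walk-forward scheme instead refits at every date using only earlier data, and out-of-sample R² compares the model's squared errors with those of the historical-mean forecast. The sketch below is a minimal NumPy version under stated assumptions: the function name `walk_forward_r2_oos`, the expanding window, `min_train=60`, and the simulated data are all illustrative, and the ridge step is a closed-form solve on centered data (so the intercept is not penalized) rather than a call to scikit-learn.

```python
import numpy as np

def walk_forward_r2_oos(X, y, min_train=60, ridge_lambda=0.0):
    """Expanding-window one-step forecasts.
    R2_oos = 1 - SSE(model) / SSE(historical mean), both out of sample."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    preds, bench, actual = [], [], []
    for t in range(min_train, len(y)):
        X_tr, y_tr = X[:t], y[:t]            # only rows before date t
        x_mu, y_mu = X_tr.mean(axis=0), y_tr.mean()
        Xc = X_tr - x_mu                     # centering keeps the intercept unpenalized
        A = Xc.T @ Xc + ridge_lambda * np.eye(X.shape[1])
        coef = np.linalg.solve(A, Xc.T @ (y_tr - y_mu))
        preds.append(y_mu + (X[t] - x_mu) @ coef)
        bench.append(y_mu)                   # historical-mean benchmark forecast
        actual.append(y[t])
    preds, bench, actual = map(np.array, (preds, bench, actual))
    sse_model = np.sum((actual - preds) ** 2)
    sse_bench = np.sum((actual - bench) ** 2)
    return 1.0 - sse_model / sse_bench

# Simulated predictors and target (illustrative only; the signal is far
# stronger than anything you should expect in real return data)
rng = np.random.default_rng(2)
X_sim = rng.normal(size=(300, 2))
y_sim = 0.5 * X_sim[:, 0] + rng.normal(size=300)

r2_ols = walk_forward_r2_oos(X_sim, y_sim, min_train=60)
r2_ridge = walk_forward_r2_oos(X_sim, y_sim, min_train=60, ridge_lambda=5.0)
print(round(r2_ols, 3), round(r2_ridge, 3))
```

Row `t` of `X` must contain only information known before `y[t]` is realized, which is what the `.shift(1)` in the lab cell arranges. A negative `R2_oos` means the model forecast worse than the historical mean, which is a common outcome for return prediction.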

## 4) Notes

-   For the assessment: extend the windowing to a walk‑forward scheme
    and add evaluation (`R²_oos`, CER gain, etc.).  
-   Replace the sample with your JKP CSV; document your exact
    filters/version/date.  
-   Use HAC standard errors for regression inference in the project.