# Week 7 Homework: Deep Cross-Sectional Model

**Quantitative Finance ML Course**

**Total: 100 points**

---

## Overview

In this homework, you will:
1. Implement the Gu-Kelly-Xiu feedforward architecture in PyTorch
2. Train with expanding-window cross-validation
3. Compare against your best tree-based model from Week 5
4. Build a model ensemble (NN + XGBoost + LightGBM)
5. Analyze where neural nets win vs trees

### Grading

| Part | Points | Topic |
|------|--------|-------|
| 1 | 20 | Gu-Kelly-Xiu architecture |
| 2 | 25 | Expanding-window CV with MPS |
| 3 | 25 | Compare against Week 5 best model |
| 4 | 15 | Ensemble (NN + XGBoost + LightGBM) |
| 5 | 15 | Analysis: where NN wins vs trees |

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from scipy.stats import spearmanr
import warnings
warnings.filterwarnings('ignore')

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 5)

# Device selection
if torch.backends.mps.is_available():
    device = torch.device('mps')
elif torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')
print(f'Using device: {device}')

In [None]:
# --- Data generation (same synthetic panel as lecture/seminar) ---

np.random.seed(42)
n_stocks = 500
n_months = 240  # 20 years for expanding window

records = []
for t in range(n_months):
    for i in range(n_stocks):
        mom_1m = np.random.randn() * 0.08
        mom_12m = np.random.randn() * 0.20
        vol_20d = np.abs(np.random.randn()) * 0.02 + 0.01
        size = np.random.randn() * 2 + 15
        bm = np.random.randn() * 0.5
        turnover = np.abs(np.random.randn()) * 0.01
        rev_1m = np.random.randn() * 0.05

        ret_next = (
            -0.002 * mom_1m + 0.003 * mom_12m - 0.005 * vol_20d
            + 0.001 * bm + 0.002 * np.sin(mom_12m * size)
            + np.random.randn() * 0.08
        )

        records.append({
            'date_idx': t, 'stock_id': i,
            'mom_1m': mom_1m, 'mom_12m': mom_12m, 'vol_20d': vol_20d,
            'size': size, 'bm': bm, 'turnover': turnover, 'rev_1m': rev_1m,
            'ret_next': ret_next
        })

df = pd.DataFrame(records)
feature_cols = ['mom_1m', 'mom_12m', 'vol_20d', 'size', 'bm', 'turnover', 'rev_1m']

# Cross-sectional rank normalization
for col in feature_cols:
    df[col] = df.groupby('date_idx')[col].transform(
        lambda x: (x.rank() - 1) / (len(x) - 1) - 0.5
    )

print(f'Panel: {df.shape[0]:,} obs, {n_months} months, {n_stocks} stocks')

---

## Part 1: Gu-Kelly-Xiu Architecture (20 pts)

Implement the NN3 architecture from Gu, Kelly, and Xiu (2020):
- 3 hidden layers: 32 -> 16 -> 8
- BatchNorm + ReLU + Dropout after each hidden layer
- Single linear output

Also implement:
- `CrossSectionalDataset` class
- `EarlyStopping` class

**Grading**: Architecture correct (10), Dataset/EarlyStopping (5), forward pass works (5)
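
The patience logic behind `EarlyStopping` can be sketched without PyTorch. The class below is a hypothetical illustration (`PatienceTracker` and its attributes are not part of the assignment's API): it tracks the best validation loss seen so far and signals a stop once `patience` consecutive epochs fail to improve on it. In the real class, the point marked in the comment is where you would typically snapshot the weights, for example with `copy.deepcopy(model.state_dict())`.

```python
class PatienceTracker:
    """Hypothetical sketch of the patience counter used for early stopping."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float('inf')
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, val_loss, epoch):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss:
            # Improvement: reset the counter. A real implementation would
            # also save a copy of the model weights at this point.
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Toy validation curve: improves for three epochs, then drifts upward.
val_losses = [1.00, 0.80, 0.70, 0.71, 0.72, 0.73, 0.74]
tracker = PatienceTracker(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if tracker.step(loss, epoch):
        stopped_at = epoch
        break

print(f'Stopped at epoch {stopped_at}, best epoch was {tracker.best_epoch}')
```

Restoring the best weights after the stop matters here: the model at the stopping epoch has already trained `patience` epochs past its best validation loss.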

In [None]:
class CrossSectionalDataset(Dataset):
    def __init__(self, features, targets):
        # TODO
        pass

    def __len__(self):
        # TODO
        pass

    def __getitem__(self, idx):
        # TODO
        pass


class EarlyStopping:
    def __init__(self, patience=10):
        # TODO
        pass

    def step(self, val_loss, model):
        # TODO: return True if should stop
        pass

    def restore_best(self, model):
        # TODO
        pass


class GuKellyXiuNet(nn.Module):
    def __init__(self, input_dim, hidden_sizes=(32, 16, 8), dropout=0.5):
        super().__init__()
        # TODO: Build 3-layer network with BN, ReLU, Dropout
        pass

    def forward(self, x):
        # TODO
        pass


# Verify
model = GuKellyXiuNet(input_dim=len(feature_cols))
print(model)
x_test = torch.randn(32, len(feature_cols))
print(f'Output shape: {model(x_test).shape}')  # expected: torch.Size([32])

---

## Part 2: Expanding-Window CV with MPS (25 pts)

Implement expanding-window cross-validation:

```
Fold 1: Train [0, 120)   Val [120, 144)  Test [144, 168)
Fold 2: Train [0, 144)   Val [144, 168)  Test [168, 192)
Fold 3: Train [0, 168)   Val [168, 192)  Test [192, 216)
Fold 4: Train [0, 192)   Val [192, 216)  Test [216, 240)
```
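
The fold table above follows a simple rule, sketched below under one assumption: the training boundary advances by `test_size` months per fold, so that test windows tile the tail of the sample without overlapping. The function name `sketch_folds` is hypothetical and only illustrates the arithmetic.

```python
def sketch_folds(n_months, initial_train=120, val_size=24, test_size=24):
    """Return (train_end, val_end, test_end) tuples for expanding-window CV.

    Assumption: each fold moves the training boundary forward by test_size.
    """
    folds = []
    train_end = initial_train
    # Keep adding folds while a full validation and test window still fits.
    while train_end + val_size + test_size <= n_months:
        val_end = train_end + val_size
        test_end = val_end + test_size
        folds.append((train_end, val_end, test_end))
        train_end += test_size
    return folds


folds = sketch_folds(240)
for i, (tr, va, te) in enumerate(folds, start=1):
    print(f'Fold {i}: Train [0, {tr})  Val [{tr}, {va})  Test [{va}, {te})')
```

With the defaults and 240 months this reproduces the four folds listed in the table.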

For each fold:
1. Train an ensemble of 3 models (different seeds) with early stopping
2. Predict on the test period
3. Compute monthly IC

Train on MPS or CUDA if available; otherwise fall back to CPU. The `device` variable from the setup cell already makes this choice.

**Grading**: Correct splitting (10), training loop with early stopping (10), MPS usage (5)
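
Monthly IC here means the Spearman rank correlation between predictions and realized next-month returns, computed separately within each month. The sketch below uses only NumPy and assumes there are no ties in either series, which is reasonable for continuous predictions; `scipy.stats.spearmanr` (already imported in this notebook) handles ties by average ranking. The names `ordinal_rank` and `monthly_ic` are hypothetical.

```python
import numpy as np


def ordinal_rank(x):
    """Ranks 0..n-1 of x (assumes no ties)."""
    return np.argsort(np.argsort(x))


def monthly_ic(date_idx, preds, rets):
    """Spearman correlation of preds vs. rets within each month."""
    date_idx, preds, rets = map(np.asarray, (date_idx, preds, rets))
    ic = {}
    for t in np.unique(date_idx):
        mask = date_idx == t
        # Pearson correlation of the ranks equals the Spearman correlation.
        ic[int(t)] = float(
            np.corrcoef(ordinal_rank(preds[mask]), ordinal_rank(rets[mask]))[0, 1]
        )
    return ic


# Toy panel: month 0 has perfectly ordered predictions, month 1 perfectly reversed.
rng = np.random.default_rng(0)
dates = np.repeat([0, 1], 50)
rets = rng.normal(size=100)
preds = np.concatenate([2.0 * rets[:50], -rets[50:]])
ic = monthly_ic(dates, preds, rets)
print(ic)
```

Because the correlation is computed on ranks, rescaling the predictions (the factor of 2 in month 0) has no effect on the IC.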

In [None]:
def expanding_window_folds(n_months, train_start=0, initial_train=120,
                            val_size=24, test_size=24):
    """
    TODO: Generate expanding-window fold definitions.
    Returns list of (train_end, val_end, test_end) tuples.
    """
    pass


# folds = expanding_window_folds(n_months)
# for i, (tr, va, te) in enumerate(folds):
#     print(f'Fold {i+1}: Train [0, {tr})  Val [{tr}, {va})  Test [{va}, {te})')

In [None]:
def train_single_model(model, train_loader, val_loader, n_epochs=100,
                        lr=1e-3, device='cpu'):
    """
    TODO: Train a single model with early stopping.
    Return the trained model (with best weights restored).
    """
    pass


def train_fold(df, feature_cols, train_end, val_end, test_end,
               n_seeds=3, device='cpu'):
    """
    TODO: Train an ensemble for one fold.
    1. Split data into train/val/test by date_idx
    2. Create DataLoaders
    3. Train n_seeds models
    4. Predict on test, compute IC
    Return: test_df with predictions, IC series
    """
    pass

In [None]:
# TODO: Run expanding-window CV
# Collect all test predictions and IC values

# all_test_preds = []
# all_ic = []
# for fold_idx, (tr, va, te) in enumerate(folds):
#     print(f'\nFold {fold_idx+1}...')
#     test_df, ic = train_fold(df, feature_cols, tr, va, te, device=str(device))
#     all_test_preds.append(test_df)
#     all_ic.append(ic)

print('Expanding-window CV: implement above')

---

## Part 3: Compare Against Week 5 Best Model (25 pts)

Train an XGBoost model using the same expanding-window splits and compare:

1. Mean IC
2. IC information ratio (mean(IC) / std(IC))
3. Long-short portfolio Sharpe ratio
4. Cumulative return plot

**Grading**: XGBoost implementation (10), fair comparison (10), plots and analysis (5)
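
The comparison metrics above can be sketched in NumPy as follows. Several choices here are assumptions rather than requirements of the assignment: the long-short portfolio is equal-weighted top decile minus bottom decile, standard deviations use `ddof=1`, and the Sharpe ratio is annualized with a factor of sqrt(12) for monthly returns. All function names are hypothetical.

```python
import numpy as np


def summarize_ic(ic):
    """Mean IC, IC std, IC information ratio, and hit rate (share of months with IC > 0)."""
    ic = np.asarray(ic, dtype=float)
    mean_ic = ic.mean()
    std_ic = ic.std(ddof=1)
    return {
        'mean_ic': mean_ic,
        'ic_std': std_ic,
        'ic_ir': mean_ic / std_ic,
        'hit_rate': float((ic > 0).mean()),
    }


def long_short_return(preds, rets, quantile=0.1):
    """Equal-weighted top-minus-bottom return for ONE month's cross-section."""
    preds, rets = np.asarray(preds), np.asarray(rets)
    k = max(1, int(len(preds) * quantile))
    order = np.argsort(preds)  # ascending: worst predictions first
    return rets[order[-k:]].mean() - rets[order[:k]].mean()


def annualized_sharpe(monthly_returns):
    """Sharpe ratio of monthly returns, annualized by sqrt(12)."""
    r = np.asarray(monthly_returns, dtype=float)
    return np.sqrt(12) * r.mean() / r.std(ddof=1)


# Toy inputs, for illustration only.
stats = summarize_ic([0.02, 0.04, -0.01, 0.03])
ls = long_short_return(np.arange(10), np.arange(10) * 0.01)
sharpe = annualized_sharpe([0.01, 0.03, 0.02, 0.02])
print(stats, ls, sharpe)
```

For a fair comparison, apply exactly the same functions, quantile, and test months to both the NN ensemble and XGBoost.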

In [None]:
# TODO: Import and train XGBoost with same expanding-window splits
# from xgboost import XGBRegressor

# Use the same folds as Part 2
# For each fold:
#   1. Train XGBoost on train set
#   2. Predict on test set
#   3. Compute IC

print('XGBoost comparison: implement above')

In [None]:
# TODO: Create comparison table
# Columns: Model, Mean IC, IC Std, IC IR, Sharpe, Hit Rate (IC > 0)
# Rows: NN Ensemble, XGBoost

print('Comparison table: implement above')

In [None]:
# TODO: Plot cumulative long-short returns for both models
# Side by side: cumulative returns, monthly IC

print('Comparison plots: implement above')

---

## Part 4: Ensemble (NN + XGBoost + LightGBM) (15 pts)

Build a meta-ensemble that combines predictions from:
1. Neural net ensemble (from Part 2)
2. XGBoost (from Part 3)
3. LightGBM (new)

Combination method: simple average of rank-normalized predictions.

**Grading**: LightGBM implementation (5), ensemble logic (5), improvement over individual models (5)
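
A minimal sketch of the combination step, assuming all three prediction vectors are aligned row by row with a shared `date_idx` array and contain no ties. It uses the same rank scaling as the feature normalization in the data-generation cell. The names `rank_normalize` and `ensemble_by_month` are hypothetical.

```python
import numpy as np


def rank_normalize(x):
    """Map values to ranks scaled into [-0.5, 0.5] (assumes no ties)."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x))
    return ranks / (len(x) - 1) - 0.5


def ensemble_by_month(date_idx, model_preds):
    """Average rank-normalized predictions, ranking within each month."""
    date_idx = np.asarray(date_idx)
    combined = np.zeros(len(date_idx), dtype=float)
    for t in np.unique(date_idx):
        mask = date_idx == t
        month_ranks = [rank_normalize(np.asarray(p)[mask]) for p in model_preds]
        combined[mask] = np.mean(month_ranks, axis=0)
    return combined


# Toy example: two months, three stocks each, models on very different scales.
dates = np.array([0, 0, 0, 1, 1, 1])
nn_preds = np.array([0.1, 0.2, 0.3, 5.0, 4.0, 3.0])
xgb_preds = np.array([10.0, 20.0, 30.0, 0.3, 0.2, 0.1])
lgb_preds = np.array([1.0, 3.0, 2.0, 9.0, 8.0, 7.0])
combined = ensemble_by_month(dates, [nn_preds, xgb_preds, lgb_preds])
print(combined)
```

Ranking within each month keeps one model's output scale, or a month with unusually dispersed predictions, from dominating the average.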

In [None]:
# TODO: Train LightGBM with same expanding-window splits
# import lightgbm as lgb

print('LightGBM: implement above')

In [None]:
# TODO: Combine predictions
# 1. Rank-normalize each model's predictions within each month
# 2. Average the ranks
# 3. Compute IC for the ensemble

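# def ensemble_predictions(nn_preds, xgb_preds, lgb_preds):
#     """Average rank-normalized predictions for ONE month's cross-section."""
#     # Call this once per month (e.g. inside a groupby on date_idx);
#     # ranking the pooled panel would mix months together.
#     from scipy.stats import rankdata
#     ranks = [rankdata(p) for p in [nn_preds, xgb_preds, lgb_preds]]
#     return np.mean(ranks, axis=0)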

print('Ensemble: implement above')

In [None]:
# TODO: Final comparison table with all 4 models
# NN, XGBoost, LightGBM, Ensemble

print('Final comparison: implement above')

---

## Part 5: Analysis: Where NN Wins vs Trees (15 pts)

Analyze when and why the neural net outperforms or underperforms tree-based models.

**Required analysis** (write 2-3 paragraphs for each):

1. **By time period**: Are there periods where NN beats trees? Do these correspond to market regimes?
2. **By stock characteristic**: Does NN do better for large-cap vs small-cap? High-vol vs low-vol?
3. **Feature importance**: Use SHAP or permutation importance to understand what NN and XGBoost focus on.

**Grading**: Temporal analysis (5), cross-sectional analysis (5), depth of insight (5)
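
For the feature-importance question, permutation importance works with any model that exposes a prediction function, so the same code can be applied to both the NN and XGBoost. The sketch below is a simplified, hypothetical version: it measures the drop in pooled Spearman IC when one feature column is shuffled, whereas a fuller analysis would shuffle within each month and average monthly ICs. For scikit-learn-compatible estimators, `sklearn.inspection.permutation_importance` offers a ready-made alternative.

```python
import numpy as np


def spearman(a, b):
    """Spearman correlation via ranks (assumes no ties)."""
    ra, rb = np.argsort(np.argsort(a)), np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])


def permutation_importance_ic(predict_fn, X, y, n_repeats=5, seed=0):
    """Average drop in IC when each feature column is shuffled in turn."""
    rng = np.random.default_rng(seed)
    base_ic = spearman(predict_fn(X), y)
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops[j] += base_ic - spearman(predict_fn(X_perm), y)
    return drops / n_repeats


def toy_model(X):
    """Stand-in for a trained model: uses only the first feature."""
    return 2.0 * X[:, 0]


rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=500)
importance = permutation_importance_ic(toy_model, X, y)
print(importance)
```

Features the model ignores show an importance of zero, because shuffling them leaves the predictions unchanged.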

In [None]:
# TODO: Temporal analysis
# Plot rolling IC difference (NN IC minus XGBoost IC) over time
# Identify periods where NN dominates vs underperforms

print('Temporal analysis: implement above')

In [None]:
# TODO: Cross-sectional analysis
# Split stocks into groups by size, vol, etc.
# Compare IC within each group for NN vs XGBoost

print('Cross-sectional analysis: implement above')

### Your Analysis

*Write your analysis here (2-3 paragraphs per question):*

**1. By time period:**

...

**2. By stock characteristic:**

...

**3. Feature importance:**

...

---

## Submission Checklist

- [ ] Part 1: `GuKellyXiuNet` class works, forward pass produces correct shape
- [ ] Part 2: Expanding-window CV produces IC for each fold
- [ ] Part 3: Fair comparison with XGBoost, comparison table and plots
- [ ] Part 4: Ensemble predictions and IC
- [ ] Part 5: Written analysis (at least 2 paragraphs per question)
- [ ] All cells run without errors
- [ ] Notebook is clean and well-organized