# Homework 6: GNN for Emerging Market Alpha Prediction


## Assignment Overview

In this assignment, you will build a **Graph Neural Network (GNN)** to predict **alpha (beta-adjusted returns)** for 10 emerging market equity indices. You will learn how to:

1. Compute beta and alpha (CAPM-style risk adjustment)
2. Engineer financial features for prediction
3. Construct a graph from market correlations
4. Implement a Graph Convolutional Network (GCN)
5. Interpret model results in a financial context

### Learning Objectives:
- Understand beta-adjusted returns (alpha) and their financial interpretation
- Implement rolling statistical calculations for financial features
- Build adjacency matrices from correlation/covariance data
- Implement GCN message passing layers
- Analyze walk-forward backtesting results

### Instructions:
1. Complete all code sections marked with `# TODO: ...`
2. Run all cells to generate results
3. Answer the discussion questions in markdown cells
4. Submit the completed notebook with all outputs

### Grading:
- Part 1 (Beta & Alpha): 15 points
- Part 2 (Feature Engineering): 25 points
- Part 3 (Graph Construction): 15 points
- Part 4 (GNN Implementation): 25 points
- Part 5 (Results Interpretation): 20 points
- **Total: 100 points**
- Bonus: 10 points

---

## Part 0: Setup and Data Loading (Provided)

Run these cells to import libraries and load the data. **Do not modify.**

In [None]:
# Standard libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from typing import Dict, List, Tuple
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score, mean_squared_error
import warnings
warnings.filterwarnings('ignore')

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import Adam

# Set random seeds for reproducibility
SEED = 42
np.random.seed(SEED)
torch.manual_seed(SEED)

print(f"PyTorch: {torch.__version__}")
print("Setup complete!")

In [None]:
# Load EM data
df = pd.read_csv('equity_indicies.csv', low_memory=False)

# Load MSCI World data
df_world = pd.read_csv('msci_world.csv', low_memory=False)

# Select our 10 EM indices
EM_TICKERS = [
    'I2CHN003',  # China - Shanghai Composite
    'I2IND007',  # India - Nifty
    'I4BRA002',  # Brazil - Bovespa
    'I2KOR003',  # South Korea - KOSPI
    'I2TWN002',  # Taiwan - TAIEX
    'I4MEX005',  # Mexico - IPC
    'I2IDN003',  # Indonesia - Jakarta
    'I2THA002',  # Thailand - SET
    'I3TUR002',  # Turkey - ISE 30
    'I3RUS002',  # Russia - RTS
]

EM_NAMES = [
    'China', 'India', 'Brazil', 'S.Korea', 'Taiwan',
    'Mexico', 'Indonesia', 'Thailand', 'Turkey', 'Russia'
]

# Filter to our indices
df_em = df[df['tic'].isin(EM_TICKERS)].copy()

# Parse dates for EM
df_em['date'] = pd.to_datetime(df_em['datadate'], format='%m/%d/%y', errors='coerce')
# Two-digit years can be parsed into the wrong century; treat any year after 2050 as 19xx
em_future = df_em['date'].dt.year > 2050
df_em.loc[em_future, 'date'] = df_em.loc[em_future, 'date'] - pd.DateOffset(years=100)
df_em = df_em[['tic', 'date', 'prccd']].dropna()
df_em['prccd'] = pd.to_numeric(df_em['prccd'], errors='coerce')
df_em = df_em.dropna()

# Parse dates for MSCI World
df_world['date'] = pd.to_datetime(df_world['datadate'], format='%m/%d/%y', errors='coerce')
# Same century correction as for the EM data
world_future = df_world['date'].dt.year > 2050
df_world.loc[world_future, 'date'] = df_world.loc[world_future, 'date'] - pd.DateOffset(years=100)
df_world = df_world[['date', 'prccd']].dropna()
df_world['prccd'] = pd.to_numeric(df_world['prccd'], errors='coerce')
df_world = df_world.dropna()
df_world = df_world.set_index('date').sort_index()
df_world.columns = ['MSCI_World']

# Pivot EM data to wide format
df_pivot = df_em.pivot(index='date', columns='tic', values='prccd')
df_pivot = df_pivot[EM_TICKERS]
df_pivot.columns = EM_NAMES

# Align EM and MSCI World on common dates
common_dates = df_pivot.index.intersection(df_world.index)
df_pivot = df_pivot.loc[common_dates]
df_world = df_world.loc[common_dates]

# Drop rows with missing values
valid_rows = df_pivot.dropna().index.intersection(df_world.dropna().index)
df_pivot = df_pivot.loc[valid_rows].sort_index()
df_world = df_world.loc[valid_rows].sort_index()

# Compute daily returns
returns = df_pivot.pct_change().dropna()
world_returns = df_world.pct_change().dropna()

# Align returns
common_ret_dates = returns.index.intersection(world_returns.index)
returns = returns.loc[common_ret_dates]
world_returns = world_returns.loc[common_ret_dates]

# Get world return as Series
world_ret = world_returns.iloc[:, 0]

print(f"EM Price data shape: {df_pivot.shape}")
print(f"MSCI World data shape: {df_world.shape}")
print(f"Returns shape: {returns.shape}")
print(f"Date range: {returns.index.min()} to {returns.index.max()}")
print(f"\nCountries: {EM_NAMES}")

In [None]:
# Visualize the data
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Normalized prices
ax = axes[0]
normalized = df_pivot / df_pivot.iloc[0] * 100
for col in normalized.columns:
    ax.plot(normalized.index, normalized[col], label=col, alpha=0.8)
ax.set_xlabel('Date')
ax.set_ylabel('Normalized Price (base=100)')
ax.set_title('Emerging Market Indices (Normalized)')
ax.legend(loc='upper left', ncol=2, fontsize=8)
ax.grid(True, alpha=0.3)

# Correlation heatmap
ax = axes[1]
corr = returns.corr()
im = ax.imshow(corr.values, cmap='RdBu_r', vmin=-1, vmax=1)
ax.set_xticks(range(len(EM_NAMES)))
ax.set_yticks(range(len(EM_NAMES)))
ax.set_xticklabels(EM_NAMES, rotation=45, ha='right')
ax.set_yticklabels(EM_NAMES)
ax.set_title('Return Correlation Matrix')
plt.colorbar(im, ax=ax)

plt.tight_layout()
plt.show()

---

## Part 1: Understanding Beta and Alpha (15 points)

### Background

In the CAPM framework:
- **Beta (β)** measures a country's sensitivity to the global market (MSCI World)
- **Alpha (α)** is the return unexplained by market exposure: `α = r_country - β × r_world`

We compute **rolling beta** using:
$$\beta = \frac{\operatorname{Cov}(r_{\text{country}}, r_{\text{world}})}{\operatorname{Var}(r_{\text{world}})}$$

Subtracting the beta-scaled world return isolates country-specific performance from global market movements.
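
As a quick illustration of the two formulas, here is a minimal sketch on synthetic data. It uses a single full-sample beta rather than the rolling beta the tasks ask for, and every number in it (the seed, the true beta of 1.3, the volatilities) is a hypothetical choice, not a property of the assignment's dataset.

```python
import numpy as np

# Synthetic daily returns (hypothetical data, not the assignment's dataset).
rng = np.random.default_rng(0)
r_world = rng.normal(0.0, 0.01, size=500)
true_beta = 1.3
idiosyncratic = rng.normal(0.0, 0.005, size=500)
r_country = true_beta * r_world + idiosyncratic

# Beta = Cov(r_country, r_world) / Var(r_world), using sample (ddof=1) estimates.
cov = np.cov(r_country, r_world, ddof=1)[0, 1]
var = np.var(r_world, ddof=1)
beta_hat = cov / var

# Alpha = r_country - beta * r_world: the part of the return not explained by the market.
alpha = r_country - beta_hat * r_world

print(f"estimated beta:      {beta_hat:.3f}")
print(f"std of total return: {r_country.std():.4f}")
print(f"std of alpha:        {alpha.std():.4f}")
```

With a full-sample beta, the resulting alpha is uncorrelated with the world return by construction, which is why its standard deviation is smaller than that of the total return.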

### Task 1.1: Implement Rolling Beta Computation (5 points)

Implement a function to compute rolling beta for a country vs the world index.

In [None]:
def compute_rolling_beta(country_returns: pd.Series, 
                         world_returns: pd.Series, 
                         window: int = 63) -> pd.Series:
    """
    Compute rolling beta of a country vs the world index.
    
    Beta = Cov(r_country, r_world) / Var(r_world)
    
    Args:
        country_returns: Daily returns of a country index
        world_returns: Daily returns of MSCI World index
        window: Rolling window size (default 63 = ~3 months)
    
    Returns:
        Series of rolling beta values
    """
    # TODO: Compute rolling covariance between country and world returns
    # Hint: Use .rolling(window).cov(other_series)
    rolling_cov = None  # YOUR CODE HERE
    
    # TODO: Compute rolling variance of world returns
    # Hint: Use .rolling(window).var()
    rolling_var = None  # YOUR CODE HERE
    
    # TODO: Compute beta = cov / var (add small epsilon to avoid division by zero)
    beta = None  # YOUR CODE HERE
    
    return beta


# Test your implementation
test_beta = compute_rolling_beta(returns['Brazil'], world_ret, window=63)
test_beta_clean = test_beta.dropna()

assert len(test_beta_clean) > 0, "Beta should have non-NaN values after window period"
assert test_beta_clean.mean() > 0, "Brazil beta should be positive on average"
assert 0.5 < test_beta_clean.mean() < 2.0, f"Brazil beta mean ({test_beta_clean.mean():.2f}) seems unreasonable"

print(f"✓ Rolling beta tests passed!")
print(f"Brazil beta: mean={test_beta_clean.mean():.3f}, std={test_beta_clean.std():.3f}")

### Task 1.2: Implement Alpha Computation (5 points)

Implement a function to compute alpha (beta-adjusted return).

In [None]:
def compute_alpha(country_forward_returns: pd.Series,
                  world_forward_returns: pd.Series,
                  beta: pd.Series) -> pd.Series:
    """
    Compute alpha (beta-adjusted return).
    
    Alpha = r_country - beta * r_world
    
    Args:
        country_forward_returns: Forward returns of country
        world_forward_returns: Forward returns of world index
        beta: Rolling beta of country vs world
    
    Returns:
        Series of alpha values
    """
    # TODO: Compute alpha = country_return - beta * world_return
    alpha = None  # YOUR CODE HERE
    
    return alpha


# Test your implementation
# Compute 21-day forward returns
FORWARD_DAYS = 21
forward_returns = df_pivot.pct_change(periods=FORWARD_DAYS).shift(-FORWARD_DAYS)
world_forward = df_world.pct_change(periods=FORWARD_DAYS).shift(-FORWARD_DAYS).iloc[:, 0]

# Compute beta for Brazil
brazil_beta = compute_rolling_beta(returns['Brazil'], world_ret, window=63)

# Compute alpha
brazil_alpha = compute_alpha(forward_returns['Brazil'], world_forward, brazil_beta)
brazil_alpha_clean = brazil_alpha.dropna()

assert len(brazil_alpha_clean) > 0, "Alpha should have non-NaN values"
# Compare alpha std with total-return std (removing market risk typically lowers it)
total_return_std = forward_returns['Brazil'].dropna().std()
alpha_std = brazil_alpha_clean.std()

print(f"✓ Alpha computation tests passed!")
print(f"Brazil 21-day total return std: {total_return_std:.4f}")
print(f"Brazil 21-day alpha std: {alpha_std:.4f}")
print(f"Alpha std is {alpha_std/total_return_std:.1%} of total return std")

### Task 1.3: Discussion Question (5 points)

**Why might predicting alpha be more useful than predicting total returns for a portfolio manager?**

Consider:
- What does alpha capture vs total return?
- How would a long-short strategy use alpha predictions?
- Why might alpha be "harder" to predict than total return?

**YOUR ANSWER HERE:**

*Double-click to edit this cell*

### Compute Betas and Alphas for All Countries (Provided)

In [None]:
# Compute rolling betas for all countries
BETA_WINDOW = 63
rolling_betas = pd.DataFrame(index=returns.index, columns=EM_NAMES)

for col in EM_NAMES:
    rolling_betas[col] = compute_rolling_beta(returns[col], world_ret, BETA_WINDOW)

# Compute forward alphas for all countries
forward_alpha = pd.DataFrame(index=forward_returns.index, columns=EM_NAMES)

for col in EM_NAMES:
    forward_alpha[col] = compute_alpha(forward_returns[col], world_forward, rolling_betas[col])

print("Rolling Beta Statistics:")
print(rolling_betas.dropna().describe().T[['mean', 'std', 'min', 'max']])

# Visualize betas
fig, ax = plt.subplots(figsize=(12, 5))
for col in EM_NAMES:
    ax.plot(rolling_betas[col].dropna(), label=col, alpha=0.7)
ax.axhline(y=1, color='black', linestyle='--', label='Beta=1')
ax.set_xlabel('Date')
ax.set_ylabel('Beta to MSCI World')
ax.set_title('Rolling Beta (63-day) for Each Country')
ax.legend(loc='upper right', ncol=2, fontsize=8)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

---

## Part 2: Feature Engineering (25 points)

We will create features to predict 21-day forward alpha. Features include:
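- **Momentum features**: past returns over 21 and 63 days, plus 21-day realized volatility
- **World features**: MSCI World momentum and volatility, and each country's rolling beta to MSCI World
- **Technical indicators**: 14-day RSI and the ratio of price to its 50-day moving average (provided)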
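
To make the definitions of momentum and rolling volatility concrete, the sketch below computes both by hand with NumPy on a toy price path. The helper names `momentum` and `rolling_vol` and the 1%-per-day price path are hypothetical; the tasks below use pandas on the real price data instead.

```python
import numpy as np

# Toy price path (hypothetical values): 1% growth per day for 30 days.
prices = 100.0 * 1.01 ** np.arange(30)


def momentum(p: np.ndarray, k: int) -> np.ndarray:
    """k-day momentum: p[t] / p[t-k] - 1, NaN for the first k entries."""
    out = np.full(p.shape, np.nan)
    out[k:] = p[k:] / p[:-k] - 1.0
    return out


def rolling_vol(p: np.ndarray, k: int) -> np.ndarray:
    """Sample std of the last k daily returns, NaN until k returns exist."""
    r = p[1:] / p[:-1] - 1.0          # r[i] is the return realized on day i + 1
    out = np.full(p.shape, np.nan)
    for t in range(k, len(p)):
        out[t] = r[t - k:t].std(ddof=1)
    return out


mom_5 = momentum(prices, 5)
vol_5 = rolling_vol(prices, 5)
print(mom_5[:8])
print(vol_5[:8])
```

Both features at date `t` use only prices up to and including `t`, so they can be paired with a forward-looking target without look-ahead.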

### Task 2.1: Implement Momentum Features (10 points)

Implement momentum features for each country.

In [None]:
def compute_momentum_features(prices: pd.DataFrame) -> Dict[str, pd.DataFrame]:
    """
    Compute momentum features for each country.
    
    Features to compute:
    - mom_21: 21-day momentum (return over past 21 days)
    - mom_63: 63-day momentum (return over past quarter)
    - vol_21: 21-day volatility (std of daily returns)
    
    Args:
        prices: DataFrame of prices (columns = countries)
    
    Returns:
        Dict mapping feature_name -> DataFrame with same columns as prices
    """
    features = {}
    daily_returns = prices.pct_change()
    
    # TODO: Compute 21-day momentum for each country
    # Hint: Use prices.pct_change(periods=21)
    features['mom_21'] = None  # YOUR CODE HERE
    
    # TODO: Compute 63-day momentum for each country
    features['mom_63'] = None  # YOUR CODE HERE
    
    # TODO: Compute 21-day rolling volatility
    # Hint: Use daily_returns.rolling(21).std()
    features['vol_21'] = None  # YOUR CODE HERE
    
    return features


# Test your implementation
momentum_features = compute_momentum_features(df_pivot)

assert 'mom_21' in momentum_features, "Missing mom_21 feature"
assert 'mom_63' in momentum_features, "Missing mom_63 feature"
assert 'vol_21' in momentum_features, "Missing vol_21 feature"
assert momentum_features['mom_21'].shape == df_pivot.shape, "mom_21 wrong shape"
assert momentum_features['vol_21'].dropna().min().min() >= 0, "Volatility should be non-negative"

print("✓ Momentum features tests passed!")
print(f"\nSample mom_21 statistics:")
print(momentum_features['mom_21'].dropna().describe().T[['mean', 'std']])

### Task 2.2: Implement World Features (10 points)

Implement features based on MSCI World index.

In [None]:
def compute_world_features(world_prices: pd.DataFrame,
                           country_returns: pd.DataFrame,
                           world_returns: pd.Series) -> Dict[str, pd.DataFrame]:
    """
    Compute features based on MSCI World index.
    
    Features to compute:
    - world_mom_21: 21-day MSCI World momentum (same for all countries)
    - world_vol_21: 21-day MSCI World volatility (same for all countries)
    - beta_to_world: Rolling 63-day beta to MSCI World (different per country)
    
    Args:
        world_prices: DataFrame with MSCI World prices
        country_returns: DataFrame of daily returns (columns = countries)
        world_returns: Series of MSCI World daily returns
    
    Returns:
        Dict mapping feature_name -> DataFrame with same columns as country_returns
    """
    features = {}
    countries = country_returns.columns
    world_ret_series = world_returns if isinstance(world_returns, pd.Series) else world_returns.iloc[:, 0]
    world_price_series = world_prices.iloc[:, 0] if isinstance(world_prices, pd.DataFrame) else world_prices
    
    # TODO: Compute 21-day MSCI World momentum
    # This is the same value for all countries on each date
    # Hint: Compute for world, then broadcast to all countries
    world_mom = None  # YOUR CODE HERE - compute pct_change(21) for world
    
    # Create DataFrame with same value for all countries
    features['world_mom_21'] = pd.DataFrame(
        {col: world_mom for col in countries},
        index=country_returns.index
    )
    
    # TODO: Compute 21-day MSCI World volatility
    world_vol = None  # YOUR CODE HERE - compute rolling(21).std() for world returns
    
    features['world_vol_21'] = pd.DataFrame(
        {col: world_vol for col in countries},
        index=country_returns.index
    )
    
    # TODO: Compute rolling beta to world for each country
    # Use the compute_rolling_beta function from Part 1
    beta_df = pd.DataFrame(index=country_returns.index, columns=countries)
    for col in countries:
        # YOUR CODE HERE - compute rolling beta for each country
        beta_df[col] = None  # Use compute_rolling_beta
    
    features['beta_to_world'] = beta_df
    
    return features


# Test your implementation
world_features = compute_world_features(df_world, returns, world_ret)

assert 'world_mom_21' in world_features, "Missing world_mom_21 feature"
assert 'world_vol_21' in world_features, "Missing world_vol_21 feature"
assert 'beta_to_world' in world_features, "Missing beta_to_world feature"

# World features should be same across countries
world_mom_std = world_features['world_mom_21'].std(axis=1).dropna()
assert (world_mom_std < 1e-10).all(), "world_mom_21 should be same for all countries"

# Beta should differ by country
beta_var = world_features['beta_to_world'].var(axis=1).dropna()
assert beta_var.mean() > 0.01, "beta_to_world should differ across countries"

print("✓ World features tests passed!")
print(f"\nSample beta_to_world statistics:")
print(world_features['beta_to_world'].dropna().describe().T[['mean', 'std']])

### Task 2.3: Feature Interpretation Question (5 points)

Look at the beta statistics above.

**Questions:**
1. Which countries have the highest average beta to MSCI World?
2. Which countries have the lowest average beta?
3. What does high beta mean for a country's risk profile?

**YOUR ANSWER HERE:**

*Double-click to edit this cell*

### Combine All Features (Provided)

In [None]:
def compute_rsi(prices: pd.Series, window: int = 14) -> pd.Series:
    """Compute Relative Strength Index (provided)."""
    delta = prices.diff()
    gain = delta.where(delta > 0, 0).rolling(window=window).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(window=window).mean()
    rs = gain / (loss + 1e-10)
    return 100 - (100 / (1 + rs))


def compute_all_features(prices: pd.DataFrame, 
                         returns: pd.DataFrame,
                         world_prices: pd.DataFrame,
                         world_returns: pd.Series) -> pd.DataFrame:
    """
    Compute all features for all countries.
    Returns DataFrame with MultiIndex columns: (country, feature)
    """
    # Get momentum and world features from your implementations
    mom_features = compute_momentum_features(prices)
    world_feats = compute_world_features(world_prices, returns, world_returns)
    
    features_list = []
    
    for col in EM_NAMES:
        feat = pd.DataFrame(index=prices.index)
        
        # Momentum features
        feat['mom_21'] = mom_features['mom_21'][col]
        feat['mom_63'] = mom_features['mom_63'][col]
        feat['vol_21'] = mom_features['vol_21'][col]
        
        # World features
        feat['world_mom_21'] = world_feats['world_mom_21'][col]
        feat['world_vol_21'] = world_feats['world_vol_21'][col]
        feat['beta_to_world'] = world_feats['beta_to_world'][col]
        
        # Additional technical features
        feat['rsi_14'] = compute_rsi(prices[col], 14)
        feat['ma_ratio'] = prices[col] / prices[col].rolling(50).mean()
        
        # Add column prefix
        feat.columns = pd.MultiIndex.from_product([[col], feat.columns])
        features_list.append(feat)
    
    all_features = pd.concat(features_list, axis=1)
    return all_features.dropna()


# Compute all features
features = compute_all_features(df_pivot, returns, df_world, world_ret)

# Get feature names
feature_names = features.columns.get_level_values(1).unique().tolist()
N_FEATURES = len(feature_names)

print(f"Features shape: {features.shape}")
print(f"Number of features per country: {N_FEATURES}")
print(f"Feature names: {feature_names}")
print(f"Date range: {features.index.min()} to {features.index.max()}")

---

## Part 3: Graph Construction (15 points)

We construct a graph where:
- **Nodes** = Countries (10 nodes)
- **Edge weights** = Correlation between country returns

The intuition: correlated markets might share information for prediction.
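
The sketch below illustrates this construction on three synthetic markets, using NumPy arrays rather than the DataFrame interface used in the task. The data are hypothetical: markets A and B load positively on a shared factor and market C loads negatively, so C is strongly linked to the others even though its correlation is negative.

```python
import numpy as np

rng = np.random.default_rng(1)
common = rng.normal(size=250)

# Three hypothetical markets: A and B follow the common factor, C moves against it.
r = np.column_stack([
    common + 0.5 * rng.normal(size=250),
    common + 0.5 * rng.normal(size=250),
    -common + 0.5 * rng.normal(size=250),
])

corr = np.corrcoef(r, rowvar=False)   # (3, 3) signed correlations
adj = np.abs(corr)                    # edge weight = strength of co-movement
np.fill_diagonal(adj, 0.0)            # no self-loops

print(np.round(corr, 2))
print(np.round(adj, 2))
```

Taking the absolute value keeps the A-C and B-C edges as strong as the A-B edge. The sign of the relationship is discarded, which is a modeling choice rather than a requirement of graph networks.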

### Task 3.1: Implement Correlation-Based Adjacency Matrix (10 points)

In [None]:
def compute_correlation_adjacency(returns_window: pd.DataFrame) -> np.ndarray:
    """
    Compute adjacency matrix from return correlations.
    
    Steps:
    1. Compute correlation matrix of returns
    2. Take absolute value (both positive and negative correlation = linkage)
    3. Set diagonal to 0 (no self-loops)
    
    Args:
        returns_window: DataFrame of returns for a time window
    
    Returns:
        Adjacency matrix (n_countries x n_countries)
    """
    # TODO: Compute correlation matrix
    # Hint: Use returns_window.corr()
    corr = None  # YOUR CODE HERE
    
    # TODO: Take absolute value (both positive and negative correlations matter)
    adj = None  # YOUR CODE HERE
    
    # TODO: Remove self-loops (set diagonal to 0)
    # Hint: Use np.fill_diagonal(adj, 0)
    # YOUR CODE HERE
    
    return adj


# Test your implementation
test_window = returns.iloc[-63:]  # Last 63 days
test_adj = compute_correlation_adjacency(test_window)

assert test_adj.shape == (10, 10), f"Adjacency shape should be (10,10), got {test_adj.shape}"
assert np.allclose(test_adj, test_adj.T), "Adjacency should be symmetric"
assert np.allclose(np.diag(test_adj), 0), "Diagonal should be zero (no self-loops)"
assert test_adj.min() >= 0, "Adjacency values should be non-negative"
assert test_adj.max() <= 1, "Adjacency values should be <= 1"

print("✓ Adjacency matrix tests passed!")
print(f"Average edge weight: {test_adj.mean():.3f}")
print(f"Max edge weight: {test_adj.max():.3f}")

In [None]:
# Visualize the adjacency matrix
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Correlation matrix (with sign)
ax = axes[0]
corr_matrix = test_window.corr().values
im = ax.imshow(corr_matrix, cmap='RdBu_r', vmin=-1, vmax=1)
ax.set_xticks(range(len(EM_NAMES)))
ax.set_yticks(range(len(EM_NAMES)))
ax.set_xticklabels(EM_NAMES, rotation=45, ha='right')
ax.set_yticklabels(EM_NAMES)
ax.set_title('Correlation Matrix (with sign)')
plt.colorbar(im, ax=ax)

# Adjacency matrix (absolute, no diagonal)
ax = axes[1]
im = ax.imshow(test_adj, cmap='Blues', vmin=0, vmax=1)
ax.set_xticks(range(len(EM_NAMES)))
ax.set_yticks(range(len(EM_NAMES)))
ax.set_xticklabels(EM_NAMES, rotation=45, ha='right')
ax.set_yticklabels(EM_NAMES)
ax.set_title('Adjacency Matrix (|correlation|, no self-loops)')
plt.colorbar(im, ax=ax)

plt.tight_layout()
plt.show()

### Task 3.2: Discussion Question (5 points)

Look at the adjacency matrix visualization above.

**Questions:**
1. Which pairs of countries have the strongest connections (highest correlation)?
2. Are there regional patterns (e.g., Asian markets correlating with each other)?
3. How might the GNN use these connections for prediction?

**YOUR ANSWER HERE:**

*Double-click to edit this cell*

---

## Part 4: GNN Implementation (25 points)

We implement a Graph Convolutional Network (GCN) that aggregates information from connected countries to improve predictions.

### Task 4.1: Implement GCN Layer (15 points)

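The GCN layer performs message passing with row-normalized aggregation, so each node receives a weighted average of its own features and its neighbors' features: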
$$H' = \tilde{D}^{-1} \tilde{A} H W$$

Where:
- $\tilde{A} = A + I$ (adjacency with self-loops)
- $\tilde{D}$ = diagonal degree matrix of $\tilde{A}$, with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$
- $H$ = node features
- $W$ = learnable weights
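
A small worked example may help before implementing the layer. The sketch below applies the propagation rule to a hypothetical 3-node graph with NumPy, without a batch dimension and with fixed weights standing in for learned ones; the values of `A`, `H`, and `W` are made up for illustration.

```python
import numpy as np

# Hypothetical 3-node graph: node 0 is strongly linked to node 1, weakly to node 2.
A = np.array([[0.0, 0.8, 0.2],
              [0.8, 0.0, 0.0],
              [0.2, 0.0, 0.0]])
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 2.0]])            # 2 features per node
W = np.array([[1.0], [-1.0]])         # fixed weights standing in for learned ones

A_tilde = A + np.eye(3)                     # add self-loops
deg = A_tilde.sum(axis=1, keepdims=True)    # row sums = diagonal of D~
A_norm = A_tilde / deg                      # D~^{-1} A~, each row sums to 1

H_agg = A_norm @ H                          # weighted average over self and neighbors
H_next = H_agg @ W                          # linear transform to 1 output feature

print(A_norm.round(3))
print(H_next.round(3))
```

Because each row of `A_norm` sums to 1, the aggregated features stay on the same scale as the inputs regardless of how many neighbors a node has.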

In [None]:
class GCNLayer(nn.Module):
    """
    Graph Convolutional Layer.
    
    Performs: H' = D^{-1} * A_hat * H * W
    where A_hat = A + I (adjacency with self-loops) and D is its degree matrix.
    The nonlinearity is applied by the caller (see the GCN model), not here.
    """
    
    def __init__(self, in_features: int, out_features: int):
        super(GCNLayer, self).__init__()
        self.linear = nn.Linear(in_features, out_features)
    
    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        """
        Forward pass of GCN layer.
        
        Args:
            x: Node features (batch_size, n_nodes, in_features)
            adj: Adjacency matrix (batch_size, n_nodes, n_nodes)
        
        Returns:
            Updated node features (batch_size, n_nodes, out_features)
        """
        # TODO: Step 1 - Add self-loops to adjacency matrix
        # Hint: adj_with_self = adj + torch.eye(n_nodes)
        # Note: adj.shape[-1] gives n_nodes; use .to(adj.device) for GPU compatibility
        n_nodes = adj.shape[-1]
        adj_with_self = None  # YOUR CODE HERE
        
        # TODO: Step 2 - Compute degree and normalize
        # degree = sum of each row, then D^{-1} * A
        # Hint: deg = adj_with_self.sum(dim=-1, keepdim=True)
        #       adj_norm = adj_with_self / (deg + 1e-10)
        deg = None  # YOUR CODE HERE
        adj_norm = None  # YOUR CODE HERE
        
        # TODO: Step 3 - Message passing: aggregate neighbor features
        # Hint: Use torch.bmm(adj_norm, x) for batch matrix multiplication
        x_agg = None  # YOUR CODE HERE
        
        # Step 4 - Apply linear transformation (provided)
        out = self.linear(x_agg)
        
        return out


# Test your implementation
test_layer = GCNLayer(in_features=8, out_features=16)
test_x = torch.randn(2, 10, 8)  # batch=2, nodes=10, features=8
test_adj = torch.rand(2, 10, 10)  # random adjacency
test_adj = (test_adj + test_adj.transpose(-1, -2)) / 2  # make symmetric

test_out = test_layer(test_x, test_adj)

assert test_out.shape == (2, 10, 16), f"Output shape should be (2,10,16), got {test_out.shape}"
assert not torch.isnan(test_out).any(), "Output contains NaN"

print("✓ GCNLayer tests passed!")
print(f"Input shape: {test_x.shape}")
print(f"Output shape: {test_out.shape}")

### Task 4.2: Complete the GCN Model (10 points)

In [None]:
class GCN(nn.Module):
    """
    Graph Convolutional Network for node-level regression.
    
    Architecture:
    - Input: (batch, n_nodes, n_features)
    - GCN Layer 1: n_features -> hidden_dim
    - ReLU + Dropout
    - GCN Layer 2: hidden_dim -> hidden_dim
    - ReLU + Dropout  
    - Linear: hidden_dim -> 1 (output per node)
    """
    
    def __init__(self, input_dim: int, hidden_dim: int = 32, dropout: float = 0.3):
        super(GCN, self).__init__()
        
        # TODO: Create two GCN layers
        self.gcn1 = None  # YOUR CODE HERE - GCNLayer(input_dim, hidden_dim)
        self.gcn2 = None  # YOUR CODE HERE - GCNLayer(hidden_dim, hidden_dim)
        
        # Output layer (provided)
        self.fc_out = nn.Linear(hidden_dim, 1)
        self.dropout = nn.Dropout(dropout)
    
    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        """
        Forward pass.
        
        Args:
            x: Node features (batch, n_nodes, input_dim)
            adj: Adjacency matrix (batch, n_nodes, n_nodes)
        
        Returns:
            Predictions (batch, n_nodes)
        """
        # TODO: Apply first GCN layer + ReLU + dropout
        x = None  # YOUR CODE HERE
        
        # TODO: Apply second GCN layer + ReLU + dropout
        x = None  # YOUR CODE HERE
        
        # Output layer (provided)
        out = self.fc_out(x).squeeze(-1)  # (batch, n_nodes)
        
        return out
    
    def count_parameters(self):
        return sum(p.numel() for p in self.parameters() if p.requires_grad)


# Test your implementation
test_model = GCN(input_dim=8, hidden_dim=32, dropout=0.3)
test_x = torch.randn(4, 10, 8)  # batch=4, nodes=10, features=8
test_adj = torch.rand(4, 10, 10)
test_adj = (test_adj + test_adj.transpose(-1, -2)) / 2

test_model.eval()
with torch.no_grad():
    test_out = test_model(test_x, test_adj)

assert test_out.shape == (4, 10), f"Output shape should be (4,10), got {test_out.shape}"
assert not torch.isnan(test_out).any(), "Output contains NaN"

print("✓ GCN model tests passed!")
print(f"Model parameters: {test_model.count_parameters()}")
print(f"Input shape: {test_x.shape}")
print(f"Output shape: {test_out.shape}")

### MLP Baseline (Provided)

In [None]:
class MLP(nn.Module):
    """MLP baseline - processes each node independently (no graph structure)."""
    
    def __init__(self, input_dim: int, hidden_dim: int = 32, dropout: float = 0.3):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        self.fc3 = nn.Linear(hidden_dim, 1)
        self.dropout = nn.Dropout(dropout)
    
    def forward(self, x, adj=None):
        # adj is ignored - MLP doesn't use graph structure
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = F.relu(self.fc2(x))
        x = self.dropout(x)
        return self.fc3(x).squeeze(-1)
    
    def count_parameters(self):
        return sum(p.numel() for p in self.parameters() if p.requires_grad)

print(f"MLP parameters: {MLP(N_FEATURES, 32).count_parameters()}")
print(f"GCN parameters: {GCN(N_FEATURES, 32).count_parameters()}")

---

## Part 5: Training and Results (Provided + 20 points for interpretation)

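We use **walk-forward validation**: each fold trains on a trailing window of past data, skips a buffer period, and predicts alpha over the test window that follows; the whole scheme then rolls forward in time.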
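
The fold layout can be sketched on toy sizes as follows. The function name `walk_forward_folds` and the sizes in the usage loop are hypothetical; the notebook's own fold generation uses the constants defined in the configuration cell.

```python
from typing import Dict, List


def walk_forward_folds(n_samples: int, train: int, buffer: int,
                       test: int, step: int) -> List[Dict[str, int]]:
    """Return half-open index ranges [start, end) for each fold, oldest first."""
    folds = []
    train_end = train
    # Stop once a full buffer + test window no longer fits in the data.
    while train_end + buffer + test <= n_samples:
        folds.append({
            "train_start": train_end - train,
            "train_end": train_end,
            "test_start": train_end + buffer,
            "test_end": train_end + buffer + test,
        })
        train_end += step
    return folds


# Toy sizes (hypothetical): 20 samples, train on 8, skip 2, test on 3, step by 3.
toy_folds = walk_forward_folds(n_samples=20, train=8, buffer=2, test=3, step=3)
for f in toy_folds:
    print(f)
```

The buffer matters here because the target is a forward-looking return: labels near the end of the training window are computed from prices after the training window ends, and the gap keeps that span from overlapping the test period.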

In [None]:
# Prepare data for walk-forward evaluation

# Filter to 2010-2019 period
START_DATE = '2010-01-01'
END_DATE = '2019-12-31'

# Align features and target
common_dates = features.index.intersection(forward_alpha.dropna().index)
common_dates = common_dates[(common_dates >= START_DATE) & (common_dates <= END_DATE)]

features_aligned = features.loc[common_dates]
target_alpha = forward_alpha.loc[common_dates]

print(f"Data period: {common_dates.min()} to {common_dates.max()}")
print(f"Number of samples: {len(common_dates)}")
print(f"Features per country: {N_FEATURES}")

In [None]:
# Walk-forward configuration
TRAIN_WINDOW = 504  # ~2 years of trading days
TEST_WINDOW = 21    # ~1 month
BUFFER = 21         # Gap between train and test; matches the 21-day forward target
STEP = 21           # Retrain monthly
COV_WINDOW = 63     # Lookback for the correlation adjacency matrix

# Training settings
HIDDEN_DIM = 32
N_EPOCHS = 30
BATCH_SIZE = 32
LR = 0.001
DROPOUT = 0.3

def prepare_fold_data(features, returns, target, train_start, train_end, test_start, test_end):
    """Prepare data for one walk-forward fold."""
    all_dates = features.index.tolist()
    train_dates = all_dates[train_start:train_end]
    test_dates = all_dates[test_start:test_end]
    
    n_train, n_test = len(train_dates), len(test_dates)
    n_nodes, n_features = len(EM_NAMES), N_FEATURES
    
    # Prepare arrays
    X_train = np.zeros((n_train, n_nodes, n_features))
    y_train = np.zeros((n_train, n_nodes))
    adj_train = np.zeros((n_train, n_nodes, n_nodes))
    
    X_test = np.zeros((n_test, n_nodes, n_features))
    y_test = np.zeros((n_test, n_nodes))
    adj_test = np.zeros((n_test, n_nodes, n_nodes))
    
    for i, date in enumerate(train_dates):
        for j, name in enumerate(EM_NAMES):
            X_train[i, j, :] = features.loc[date, name].values
        y_train[i, :] = target.loc[date].values
        date_loc = returns.index.get_loc(date)
        if date_loc >= COV_WINDOW:
            adj_train[i] = compute_correlation_adjacency(returns.iloc[date_loc-COV_WINDOW:date_loc])
    
    for i, date in enumerate(test_dates):
        for j, name in enumerate(EM_NAMES):
            X_test[i, j, :] = features.loc[date, name].values
        y_test[i, :] = target.loc[date].values
        date_loc = returns.index.get_loc(date)
        if date_loc >= COV_WINDOW:
            adj_test[i] = compute_correlation_adjacency(returns.iloc[date_loc-COV_WINDOW:date_loc])
    
    return {
        'X_train': X_train, 'y_train': y_train, 'adj_train': adj_train,
        'X_test': X_test, 'y_test': y_test, 'adj_test': adj_test,
        'train_dates': train_dates, 'test_dates': test_dates
    }


def train_model(model, X, y, adj, n_epochs=30, lr=0.001, batch_size=32):
    """Train a model."""
    optimizer = Adam(model.parameters(), lr=lr, weight_decay=1e-3)
    criterion = nn.MSELoss()
    
    X_t = torch.tensor(X, dtype=torch.float32)
    y_t = torch.tensor(y, dtype=torch.float32)
    adj_t = torch.tensor(adj, dtype=torch.float32)
    
    # Winsorize targets at the 2nd and 98th percentiles to limit outlier influence
    y_flat = y_t.flatten()
    low, high = torch.quantile(y_flat, 0.02), torch.quantile(y_flat, 0.98)
    y_t = torch.clamp(y_t, low, high)
    
    model.train()
    for epoch in range(n_epochs):
        perm = torch.randperm(len(X))
        for i in range(0, len(X), batch_size):
            idx = perm[i:i+batch_size]
            optimizer.zero_grad()
            pred = model(X_t[idx], adj_t[idx])
            loss = criterion(pred, y_t[idx])
            loss.backward()
            optimizer.step()


def evaluate_model(model, X, y, adj):
    """Evaluate model."""
    model.eval()
    with torch.no_grad():
        X_t = torch.tensor(X, dtype=torch.float32)
        adj_t = torch.tensor(adj, dtype=torch.float32)
        pred = model(X_t, adj_t).numpy()
    return pred, y

print("Training utilities defined.")

In [None]:
# Generate walk-forward folds
n_total = len(features_aligned)
all_dates = features_aligned.index.tolist()

folds = []
fold_idx = TRAIN_WINDOW

while fold_idx + BUFFER + TEST_WINDOW <= n_total:
    folds.append({
        'train_start': fold_idx - TRAIN_WINDOW,
        'train_end': fold_idx,
        'test_start': fold_idx + BUFFER,
        'test_end': fold_idx + BUFFER + TEST_WINDOW  # always <= n_total by the loop condition
    })
    fold_idx += STEP

print(f"Number of walk-forward folds: {len(folds)}")
print(f"First fold: train {all_dates[folds[0]['train_start']]} to {all_dates[folds[0]['train_end']-1]}")
print(f"           test  {all_dates[folds[0]['test_start']]} to {all_dates[folds[0]['test_end']-1]}")
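
To see how the fold indices move, here is a minimal sketch of the same loop wrapped in a hypothetical helper, `make_folds`, and run with small made-up sizes (these are not the notebook's `TRAIN_WINDOW`, `BUFFER`, `TEST_WINDOW`, or `STEP`). End indices are exclusive, and the buffer leaves a gap between the last training row and the first test row, presumably so that forward-looking training targets do not overlap the test period.

```python
def make_folds(n_total, train_window, buffer, test_window, step):
    """Return index-based walk-forward folds (end indices are exclusive)."""
    folds = []
    fold_idx = train_window
    while fold_idx + buffer + test_window <= n_total:
        folds.append({
            'train_start': fold_idx - train_window,
            'train_end': fold_idx,
            'test_start': fold_idx + buffer,
            'test_end': fold_idx + buffer + test_window,
        })
        fold_idx += step
    return folds


# Illustrative sizes only.
toy_folds = make_folds(n_total=20, train_window=10, buffer=2, test_window=3, step=3)
for f in toy_folds:
    print(f)
```

With these toy sizes the loop yields two folds: the training window slides forward by `step` rows each time, and the loop stops as soon as a full test window no longer fits.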

In [None]:
# Run walk-forward evaluation
model_names = ['Ridge', 'MLP', 'GCN']
results = {name: {'y_true': [], 'y_pred': []} for name in model_names}

print("Running Walk-Forward Evaluation...")
print("="*60)

# Limit to first 20 folds for speed (can increase for full evaluation)
MAX_FOLDS = 20

for fold_num, fold in enumerate(folds[:MAX_FOLDS]):
    if fold_num % 5 == 0:
        print(f"Fold {fold_num + 1}/{min(len(folds), MAX_FOLDS)}...")
    
    # Prepare data
    data = prepare_fold_data(
        features_aligned, returns, target_alpha,
        fold['train_start'], fold['train_end'],
        fold['test_start'], fold['test_end']
    )
    
    # Standardize features
    scaler = StandardScaler()
    X_train_flat = data['X_train'].reshape(-1, N_FEATURES)
    X_test_flat = data['X_test'].reshape(-1, N_FEATURES)
    X_train_scaled = scaler.fit_transform(X_train_flat).reshape(data['X_train'].shape)
    X_test_scaled = scaler.transform(X_test_flat).reshape(data['X_test'].shape)
    
    # Ridge baseline
    ridge = Ridge(alpha=10.0)
    ridge.fit(X_train_scaled.reshape(-1, N_FEATURES), data['y_train'].flatten())
    ridge_pred = ridge.predict(X_test_scaled.reshape(-1, N_FEATURES)).reshape(data['y_test'].shape)
    results['Ridge']['y_pred'].append(ridge_pred)
    results['Ridge']['y_true'].append(data['y_test'])
    
    # MLP
    torch.manual_seed(SEED)
    mlp = MLP(N_FEATURES, HIDDEN_DIM, DROPOUT)
    train_model(mlp, X_train_scaled, data['y_train'], data['adj_train'], N_EPOCHS, LR, BATCH_SIZE)
    mlp_pred, _ = evaluate_model(mlp, X_test_scaled, data['y_test'], data['adj_test'])
    results['MLP']['y_pred'].append(mlp_pred)
    results['MLP']['y_true'].append(data['y_test'])
    
    # GCN
    torch.manual_seed(SEED)
    gcn = GCN(N_FEATURES, HIDDEN_DIM, DROPOUT)
    train_model(gcn, X_train_scaled, data['y_train'], data['adj_train'], N_EPOCHS, LR, BATCH_SIZE)
    gcn_pred, _ = evaluate_model(gcn, X_test_scaled, data['y_test'], data['adj_test'])
    results['GCN']['y_pred'].append(gcn_pred)
    results['GCN']['y_true'].append(data['y_test'])

# Concatenate results
for name in model_names:
    results[name]['y_true'] = np.vstack(results[name]['y_true'])
    results[name]['y_pred'] = np.vstack(results[name]['y_pred'])

print("\n" + "="*60)
print("Walk-forward evaluation complete!")
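
The standardization step in the loop reshapes a 3-D array of shape (dates, countries, features) into 2-D, scales it, and reshapes it back. The sketch below reproduces that pattern with plain NumPy on random data, assuming a hypothetical `n_features = 4`. The statistics come from the training rows only and are then reused on the test rows, which mirrors `fit_transform` on train followed by `transform` on test.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 4  # hypothetical; the notebook uses N_FEATURES

# Shapes are (dates, countries, features); values are random placeholders.
X_train = rng.normal(loc=5.0, scale=2.0, size=(50, 10, n_features))
X_test = rng.normal(loc=5.0, scale=2.0, size=(8, 10, n_features))

# Flatten dates and countries so that each feature becomes one column.
train_flat = X_train.reshape(-1, n_features)
mean = train_flat.mean(axis=0)
std = train_flat.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# Apply the training statistics to both sets, then restore the 3-D shape.
X_train_scaled = ((train_flat - mean) / std).reshape(X_train.shape)
X_test_scaled = ((X_test.reshape(-1, n_features) - mean) / std).reshape(X_test.shape)

print(X_train_scaled.shape, X_test_scaled.shape)
```

Because all countries share one scaler, each feature is standardized across the pooled country-date rows rather than per country.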

In [None]:
# Compute and display results
print("\n" + "="*70)
print("RESULTS SUMMARY: Predicting 21-day Alpha")
print("="*70)
print(f"{'Model':<10} {'Test R2':>12} {'Test Corr':>12} {'Test RMSE':>12}")
print("-"*70)

metrics = {}
for name in model_names:
    y_true = results[name]['y_true'].flatten()
    y_pred = results[name]['y_pred'].flatten()
    
    r2 = r2_score(y_true, y_pred)
    corr = np.corrcoef(y_true, y_pred)[0, 1]
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    
    metrics[name] = {'r2': r2, 'corr': corr, 'rmse': rmse}
    print(f"{name:<10} {r2:>12.4f} {corr:>12.4f} {rmse:>12.4f}")

print("="*70)

In [None]:
# Per-country correlation
print("\nPer-Country Test Correlation:")
print("-"*60)
print(f"{'Country':<12}", end="")
for name in model_names:
    print(f"{name:>12}", end="")
print()
print("-"*60)

per_country_corr = {name: [] for name in model_names}
for i, country in enumerate(EM_NAMES):
    print(f"{country:<12}", end="")
    for name in model_names:
        y_true = results[name]['y_true'][:, i]
        y_pred = results[name]['y_pred'][:, i]
        corr = np.corrcoef(y_true, y_pred)[0, 1]
        per_country_corr[name].append(corr)
        print(f"{corr:>12.4f}", end="")
    print()

print("-"*60)
print(f"{'Average':<12}", end="")
for name in model_names:
    print(f"{np.mean(per_country_corr[name]):>12.4f}", end="")
print()

In [None]:
# Visualize results
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

colors = {'Ridge': 'gray', 'MLP': 'blue', 'GCN': 'green'}

for idx, name in enumerate(model_names):
    ax = axes[idx]
    y_true = results[name]['y_true'].flatten()
    y_pred = results[name]['y_pred'].flatten()
    
    ax.scatter(y_pred, y_true, alpha=0.2, s=10, c=colors[name])
    
    # Add diagonal
    lims = [min(y_pred.min(), y_true.min()), max(y_pred.max(), y_true.max())]
    ax.plot(lims, lims, 'k--', alpha=0.5)
    
    ax.set_xlabel('Predicted Alpha')
    ax.set_ylabel('Actual Alpha')
    ax.set_title(f'{name}\nCorr={metrics[name]["corr"]:.4f}')
    ax.grid(True, alpha=0.3)

plt.suptitle('Predicted vs Actual Alpha', fontsize=14)
plt.tight_layout()
plt.show()

### Task 5.1: Model Comparison (7 points)

Based on the results above:

1. Which model performed best in terms of correlation with actual alpha?
2. Did the GCN outperform the MLP? Why might this be the case?
3. Are any of the R² values negative? If so, what does a negative R² mean?

**YOUR ANSWER HERE:**

*Double-click to edit this cell*

### Task 5.2: Per-Country Analysis (7 points)

Look at the per-country correlation table.

1. Which countries were easiest to predict (highest correlation)?
2. Which countries were hardest to predict?
3. Can you hypothesize why some countries might be more predictable than others?

**YOUR ANSWER HERE:**

*Double-click to edit this cell*

### Task 5.3: Financial Interpretation (6 points)

Consider using these predictions in a real trading strategy.

1. How might you use alpha predictions to construct a portfolio?
2. What are the limitations of this approach?
3. What additional data or features might improve predictions?

**YOUR ANSWER HERE:**

*Double-click to edit this cell*

---

## Bonus: Graph Attention Network (10 points)

Implement a simple graph attention layer that learns the weights placed on neighbors instead of using fixed adjacency weights.

In [None]:
class GraphAttentionLayer(nn.Module):
    """
    BONUS: Implement a Graph Attention Layer.
    
    Instead of using fixed adjacency weights, learn attention weights.
    """
    
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.W = nn.Linear(in_features, out_features)
        self.a = nn.Linear(2 * out_features, 1)
        self.leaky_relu = nn.LeakyReLU(0.2)
    
    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        """
        TODO (BONUS): Implement attention-based message passing.
        
        Steps:
        1. Transform features: h = self.W(x)
        2. Compute attention scores for each pair (i,j)
        3. Apply softmax over neighbors (masked by adjacency)
        4. Aggregate: weighted sum of neighbor features
        """
        # YOUR CODE HERE (BONUS)
        pass


# If you implement the bonus, test it here
# test_gat_layer = GraphAttentionLayer(8, 16)
# test_out = test_gat_layer(test_x, test_adj)
# print(f"GAT layer output shape: {test_out.shape}")
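
As a hint for step 3 only, here is a minimal sketch of a masked softmax on a toy score matrix. It is not a full attention layer. It assumes that zero entries in the adjacency mean "no edge" and that every node has at least one neighbor (for example, a self-loop); a row with no neighbors would produce NaNs. The score and adjacency values are made up.

```python
import torch

# Toy raw attention scores for 3 nodes (illustrative values).
scores = torch.tensor([[1.0, 2.0, 3.0],
                       [0.5, 0.5, 0.5],
                       [2.0, 0.0, 1.0]])

# Toy adjacency with self-loops; 0 means "no edge" (an assumption of this sketch).
adj = torch.tensor([[1.0, 1.0, 0.0],
                    [1.0, 1.0, 1.0],
                    [0.0, 1.0, 1.0]])

# Non-neighbors get -inf so that softmax assigns them exactly zero weight.
masked = scores.masked_fill(adj == 0, float('-inf'))
attn = torch.softmax(masked, dim=-1)

print(attn)
```

Each row of `attn` sums to 1 over that node's neighbors. The same calls should work on batched inputs as long as the softmax runs over the last (neighbor) dimension.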

---

## Submission Checklist

Before submitting, make sure:

- [ ] All code cells run without errors
- [ ] All TODO sections are completed
- [ ] All test assertions pass (you see ✓ messages)
- [ ] All discussion questions are answered
- [ ] Your name and student ID are at the top
- [ ] The notebook is saved with all outputs visible

**Submit**: Upload this completed notebook (.ipynb file) to the course portal.