# Session 4: Multi-Output GPs and Case Studies

**Duration:** 2-3 hours  
**Prerequisites:** Sessions 1-3

## Learning Objectives

1. Understand multi-output GP models for correlated outputs
2. Handle multidimensional inputs with ARD lengthscales
3. Build coregionalized models using Hadamard product kernels
4. Execute a comprehensive case study: Soccer player skill modeling
5. Integrate hierarchical structure and non-Gaussian likelihoods
6. Interpret factor models that decompose skill from context

In [None]:
# Core scientific computing
import numpy as np
import scipy.stats as stats
import polars as pl

# PyMC ecosystem
import pymc as pm
import pytensor.tensor as pt
import arviz as az

# Visualization
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.io as pio

# Reproducibility
RANDOM_SEED = 20090425
RNG = np.random.default_rng(RANDOM_SEED)
pio.renderers.default = "plotly_mimetype+notebook_connected"

print(f"PyMC: {pm.__version__}, NumPy: {np.__version__}")
print(f"Polars: {pl.__version__}, ArviZ: {az.__version__}")

## Part A: Multi-Output Gaussian Processes

### Why Model Multiple Outputs Together?

Imagine analyzing 27 elite soccer players. You could fit 27 separate GPs, but this misses that **players operate in a shared context**.

Multi-output GPs offer:
1. **Information sharing** between related outputs
2. **Partial pooling** for data-scarce outputs
3. **Learned correlation structure**  
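4. **Shared hyperparameters**: one input kernel serves all outputs. When outputs share the same inputs, the covariance has Kronecker structure that efficient implementations can exploit.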

Let's start with multidimensional inputs.

### Automatic Relevance Determination (ARD)

ARD assigns independent lengthscales to each input dimension:

$$k(\mathbf{x}, \mathbf{x}') = \sigma^2 \exp\left(-\frac{1}{2}\sum_{i=1}^d \frac{(x_i - x'_i)^2}{\ell_i^2}\right)$$

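A large $\ell_i$ means the function changes slowly along dimension $i$, so that dimension has little influence on the output (it is effectively irrelevant).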
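A minimal NumPy sketch of the ARD kernel above shows what each $\ell_i$ does. The helper `ard_expquad` is our own illustration. In the model below, `eta**2 * pm.gp.cov.ExpQuad(input_dim=2, ls=ls)` plays the same role, with `eta` as $\sigma$.

```python
import numpy as np

def ard_expquad(X, Xp, ls, sigma=1.0):
    """ARD squared-exponential kernel: one lengthscale per input column."""
    X = np.asarray(X, dtype=float) / ls     # rescale each dimension by its lengthscale
    Xp = np.asarray(Xp, dtype=float) / ls
    # Squared Euclidean distances between all pairs of rescaled rows
    sq = (X**2).sum(1)[:, None] + (Xp**2).sum(1)[None, :] - 2.0 * X @ Xp.T
    return sigma**2 * np.exp(-0.5 * np.maximum(sq, 0.0))  # clamp tiny negatives from round-off

# Two points that differ only in the second coordinate
a = np.array([[0.0, 0.0]])
b = np.array([[0.0, 2.0]])

# Short lengthscale on dim 2: the points look dissimilar, exp(-8) ≈ 3.4e-4
print(ard_expquad(a, b, ls=np.array([1.0, 0.5])))
# Long lengthscale on dim 2: the second coordinate barely matters, ≈ 0.9998
print(ard_expquad(a, b, ls=np.array([1.0, 100.0])))
```

When the posterior pushes $\ell_2$ up, the kernel treats points as neighbours regardless of their `x2` value. In practice this switches that feature off.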

### Generating Synthetic Data

Create 2D data where only x1 matters.

In [None]:
n_obs = 150
x1 = RNG.uniform(-3, 3, n_obs)  # Relevant
x2 = RNG.uniform(-3, 3, n_obs)  # Irrelevant
y_obs = np.sin(2 * x1) + 0.5 * x1 + RNG.normal(0, 0.2, n_obs)
X_train = np.column_stack([x1, x2])

print(f"{X_train.shape[0]} observations, {X_train.shape[1]} features")

### Fitting ARD Model

Watch what lengthscales the model learns.

In [None]:
with pm.Model() as ard_model:
    ls = pm.Gamma("ls", alpha=2, beta=1, shape=2)
    eta = pm.HalfNormal("eta", sigma=2)
    cov_func = eta**2 * pm.gp.cov.ExpQuad(input_dim=2, ls=ls)
    gp = pm.gp.Marginal(cov_func=cov_func)
    sigma = pm.HalfNormal("sigma", sigma=0.5)
    y_ = gp.marginal_likelihood("y", X=X_train, y=y_obs, sigma=sigma)
    trace_ard = pm.sample(1000, tune=1000, random_seed=RANDOM_SEED, chains=2)

### Visualizing Learned Lengthscales

In [None]:
ls_post = az.extract(trace_ard, var_names=["ls"])
ls_means = ls_post.mean(dim="sample").values

fig = go.Figure()
for i in range(2):
    fig.add_trace(go.Violin(y=ls_post.sel(ls_dim_0=i).values,
                            name=f"Feature {i+1}", box_visible=True))
fig.update_layout(title="Learned Lengthscales", yaxis_title="Lengthscale")
fig.show()

print(f"F1: {ls_means[0]:.2f}, F2: {ls_means[1]:.2f}")
print(f"Ratio: {ls_means[1]/ls_means[0]:.1f}x")

### Interpretation

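Feature 1: small lengthscale → relevant  
Feature 2: large lengthscale → irrelevant

Only `x1` enters the function that generated `y_obs`, so ARD should recover this from the data alone. The posterior for the second lengthscale should sit well above the first.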

## Part B: Coregionalization

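The intrinsic coregionalization model (ICM) multiplies an input kernel by a kernel over output indices. Evaluated on stacked inputs, this is a Hadamard (elementwise) product of the two covariance matrices:

$$k([\mathbf{x}, i], [\mathbf{x}', j]) = k_{\text{input}}(\mathbf{x}, \mathbf{x}') \times k_{\text{coreg}}(i, j), \qquad k_{\text{coreg}}(i, j) = B_{ij}, \quad B = WW^\top + \operatorname{diag}(\boldsymbol{\kappa})$$

Here $W$ is an $n_{\text{outputs}} \times r$ matrix that encodes rank-$r$ correlations between outputs, and $\boldsymbol{\kappa}$ adds output-specific variance. This is the parameterization of `pm.gp.cov.Coregion` used below.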
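Suppose every output is observed at the same inputs and the rows are stacked output by output, which is how `X_mogp` is built below. In that case the Hadamard-product covariance equals the Kronecker product $B \otimes K_{\text{input}}$, where $B = WW^\top + \operatorname{diag}(\boldsymbol{\kappa})$. The NumPy sketch below checks this. The `W`, `kappa`, and time grid are arbitrary toy values, not fitted ones.

```python
import numpy as np

def expquad(t, tp, ls):
    """1D squared-exponential kernel (unit amplitude)."""
    return np.exp(-0.5 * (t[:, None] - tp[None, :]) ** 2 / ls**2)

t = np.linspace(0, 10, 5)                  # shared time grid
n_outputs = 3
W = np.array([[1.0, 0.2], [0.9, 0.4], [-0.1, 1.0]])   # toy mixing weights (rank 2)
kappa = np.array([0.1, 0.1, 0.1])                      # toy output-specific variances
B = W @ W.T + np.diag(kappa)               # output covariance, Coregion-style

# Stacked inputs: times tiled, output index repeated (output-major order)
t_all = np.tile(t, n_outputs)
idx = np.repeat(np.arange(n_outputs), len(t))

# ICM covariance: elementwise (Hadamard) product of input and output kernels
K_icm = expquad(t_all, t_all, ls=1.5) * B[idx][:, idx]

# Same matrix as a Kronecker product of B and the small n-by-n input kernel
K_kron = np.kron(B, expquad(t, t, ls=1.5))
print(np.allclose(K_icm, K_kron))          # True
```

The Kronecker form is what makes shared-input multi-output GPs cheaper than their size suggests. The eigendecomposition of $B \otimes K$ follows from those of the two small factors.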

### Generate 3 Related Time Series

In [None]:
n_times, n_outputs = 40, 3
t = np.linspace(0, 10, n_times)
f1 = np.sin(t) + RNG.normal(0, 0.2, n_times)
f2 = np.sin(t) + 0.5*np.cos(2*t) + RNG.normal(0, 0.2, n_times)
f3 = -np.cos(t) + RNG.normal(0, 0.2, n_times)

X_mogp = np.column_stack([np.tile(t, n_outputs), np.repeat([0,1,2], n_times)])
y_mogp = np.concatenate([f1, f2, f3])

print(f"Multi-output: {X_mogp.shape[0]} obs")

### Fit Coregionalized GP

In [None]:
with pm.Model() as mogp_model:
    ls_time = pm.Gamma("ls_time", alpha=2, beta=1)
    eta = pm.HalfNormal("eta", sigma=2)
    cov_time = eta**2 * pm.gp.cov.ExpQuad(input_dim=2, ls=ls_time, active_dims=[0])
    
    W = pm.Normal("W", mu=0, sigma=1, shape=(n_outputs, 2))
    kappa = pm.Gamma("kappa", alpha=1.5, beta=1, shape=n_outputs)
    B = pm.Deterministic("B", pt.dot(W, W.T) + pt.diag(kappa))
    cov_out = pm.gp.cov.Coregion(input_dim=2, W=W, kappa=kappa, active_dims=[1])
    
    cov_total = cov_time * cov_out
    gp = pm.gp.Marginal(cov_func=cov_total)
    sigma = pm.HalfNormal("sigma", sigma=0.5)
    y_ = gp.marginal_likelihood("y", X=X_mogp, y=y_mogp, sigma=sigma)
    trace_mogp = pm.sample(1000, tune=1000, random_seed=RANDOM_SEED, chains=2)

### Learned Output Covariance Matrix

In [None]:
B_post = az.extract(trace_mogp, var_names=["B"]).mean(dim="sample").values
fig = go.Figure(data=go.Heatmap(z=B_post, colorscale='RdBu', zmid=0))
fig.update_layout(title="Output Covariance B", height=400)
fig.show()

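Outputs 1 and 2 both contain a $\sin(t)$ component, so their off-diagonal entry in $B$ should be large. Output 3 is a phase-shifted cosine, so its covariance with the other two should be smaller. The ICM sees only weak linear coupling there, which is not the same as independence. Note that $B$ is a covariance, not a correlation, so its entries also reflect each output's scale.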
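Because $B$ is a covariance, its off-diagonal entries mix correlation with each output's scale. Normalizing by the diagonal gives a correlation matrix that is easier to read. Below is a minimal NumPy sketch. The helper name `cov_to_corr` is ours, and the toy matrix stands in for `B_post`.

```python
import numpy as np

def cov_to_corr(B):
    """Rescale a covariance matrix to a correlation matrix: D^{-1/2} B D^{-1/2}."""
    sd = np.sqrt(np.diag(B))
    return B / np.outer(sd, sd)

# Toy stand-in for the posterior-mean output covariance
B_toy = np.array([[2.0, 1.8, 0.1],
                  [1.8, 2.5, 0.2],
                  [0.1, 0.2, 1.0]])
R = cov_to_corr(B_toy)
print(np.round(R, 2))
```

In the notebook, `cov_to_corr(B_post)` could feed the same heatmap with `zmin=-1, zmax=1`. Strictly, averaging the correlation over posterior draws is preferable to converting the posterior mean, but the two are usually close.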

## Part C: Soccer Player Skill Modeling

### Challenge: Identify True Skill

Estimating a player's true scoring skill means accounting for several things: team strength, opponent quality, match context, and very different sample sizes across players.

### Hierarchical Logistic Regression

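$$P(\text{goal}_{ij} = 1) = \text{logit}^{-1}(\alpha_i + \mathbf{X}_{ij}^T\boldsymbol{\beta}), \qquad \alpha_i \sim \mathcal{N}(\mu_\alpha, \sigma_\alpha^2)$$

Here $i$ indexes players, $j$ indexes that player's observations, and $\mathbf{X}_{ij}$ holds the context factors. The shared prior on $\alpha_i$ is what makes the model hierarchical.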
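A minimal NumPy sketch of the generative process shows how the player intercept and the shared context coefficients combine on the logit scale. All sizes and parameter values are made up; nothing here comes from the soccer data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_players, n_obs, n_factors = 5, 2000, 3

# Hypothetical population-level parameters
mu_alpha, sigma_alpha = -2.0, 0.5
beta = np.array([0.3, 0.1, 0.5])

# Player intercepts drawn from the shared population distribution
alpha = rng.normal(mu_alpha, sigma_alpha, size=n_players)

# Each observation: a player index and a vector of context factors
player_idx = rng.integers(0, n_players, size=n_obs)
X = rng.normal(size=(n_obs, n_factors))

# Linear predictor on the logit scale, then inverse logit to a probability
eta = alpha[player_idx] + X @ beta
p = 1.0 / (1.0 + np.exp(-eta))
goals = rng.binomial(1, p)                 # one Bernoulli draw per observation

print(f"Simulated goal rate: {goals.mean():.1%}")
```

Fitting the PyMC model to data like this and checking that it recovers `mu_alpha` and `beta` is a cheap sanity check before moving to real data.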

### Load Data

In [None]:
df = pl.read_csv("../data/SFM_data_byPlayer_clean.csv")
n_players = df.select(pl.col("name_player")).unique().height
goal_rate = df.select(pl.col("goal").mean()).item()

print(f"{df.shape[0]} observations, {n_players} players")
print(f"Goal rate: {goal_rate:.1%}")

### Factors: Context Variables

1. home_pitch: Home advantage
2. points_diff: Recent form
3. goal_balance_diff: Team vs opponent strength

### Visualize Factor Effects

In [None]:
factors = ["home_pitch", "points_diff", "goal_balance_diff"]
fig = make_subplots(rows=1, cols=3, subplot_titles=factors)

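bin_labels = ["Low", "Mid-Low", "Mid-High", "High"]
for i, factor in enumerate(factors, 1):
    # 3 interior breaks give 4 bins: (-inf,-1], (-1,0], (0,1], (1,inf)
    binned = (df.with_columns([
        pl.col(factor).cut(breaks=[-1, 0, 1], labels=bin_labels)
        .cast(pl.Enum(bin_labels))  # Enum sorts in label order, not alphabetically
        .alias("bin")
    ]).group_by("bin").agg([
        pl.col("goal").mean().alias("rate")
    ]).sort("bin"))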
    
    fig.add_trace(go.Bar(x=binned["bin"].to_list(), 
                        y=binned["rate"].to_list()), row=1, col=i)

fig.update_layout(title="Goal Rate by Factor", showlegend=False, height=350)
fig.show()

### Prepare Data for Modeling

In [None]:
factor_cols = ["home_pitch", "points_diff", "goal_balance_diff"]
X_factors = df.select(factor_cols).to_numpy().astype(np.float64)
y_goals = df.select("goal").to_numpy().flatten().astype(int)

player_names = df.select("name_player").unique().sort("name_player")["name_player"].to_list()
player_idx_map = {name: i for i, name in enumerate(player_names)}
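# replace_strict maps each name to its integer index and changes the dtype to Int64
player_idx = df.select(
    pl.col("name_player").replace_strict(player_idx_map, return_dtype=pl.Int64)
).to_numpy().flatten()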

n_players, n_factors = len(player_names), len(factor_cols)
print(f"{len(y_goals)} obs, {n_players} players, {n_factors} factors")

### Fit Hierarchical Model

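Partial pooling: estimates for players with few observations are pulled toward the population mean $\mu_\alpha$. Estimates for players with many observations are dominated by their own data.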
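The mechanics of partial pooling are easiest to see in the Gaussian case. Suppose a player's raw average $\bar y_i$ comes from $n_i$ observations with noise sd $\sigma$, and the prior is $\alpha_i \sim \mathcal{N}(\mu, \tau^2)$. Then the posterior mean is a precision-weighted average of $\bar y_i$ and $\mu$. The logistic model has no closed form, so this NumPy sketch (made-up numbers, hypothetical helper `pooled_estimate`) only illustrates the direction and size of the effect.

```python
import numpy as np

def pooled_estimate(ybar, n, mu, tau, sigma):
    """Posterior mean of a group effect under a normal-normal model
    with known population mean mu, population sd tau, and noise sd sigma."""
    prec_data = n / sigma**2          # precision of the group's own average
    prec_prior = 1.0 / tau**2         # precision of the population prior
    w = prec_data / (prec_data + prec_prior)
    return w * ybar + (1.0 - w) * mu, w

mu, tau, sigma = 0.0, 0.5, 2.0
for n in [2, 20, 200]:
    est, w = pooled_estimate(ybar=1.5, n=n, mu=mu, tau=tau, sigma=sigma)
    print(f"n={n:>3}: weight on own data={w:.2f}, pooled estimate={est:.2f}")
```

The same raw average of 1.5 is shrunk heavily toward 0 when it rests on 2 observations and barely at all when it rests on 200.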

In [None]:
def logit(p): 
    return np.log(p / (1 - p))

with pm.Model() as sfm_model:
    mu_alpha = pm.Normal("mu_alpha", mu=logit(goal_rate), sigma=1)
    sigma_alpha = pm.HalfNormal("sigma_alpha", sigma=0.5)
    alpha = pm.Normal("alpha", mu=mu_alpha, sigma=sigma_alpha, shape=n_players)
    
    beta = pm.Normal("beta", mu=0, sigma=1, shape=n_factors)
    
    eta = alpha[player_idx] + pm.math.dot(X_factors, beta)
    
    # New Python name so the synthetic y_obs array from Part A is not overwritten
    likelihood = pm.Bernoulli("y_obs", logit_p=eta, observed=y_goals)
    
    trace_sfm = pm.sample(1000, tune=1000, random_seed=RANDOM_SEED, 
                         target_accept=0.9, chains=2)

### Convergence Diagnostics

In [None]:
summary = az.summary(trace_sfm, var_names=["alpha","beta","mu_alpha","sigma_alpha"])
print(f"R-hat: [{summary['r_hat'].min():.4f}, {summary['r_hat'].max():.4f}]")
print(f"ESS: [{summary['ess_bulk'].min():.0f}, {summary['ess_bulk'].max():.0f}]")
print("\nFactor coefficients:")
print(summary.filter(like="beta", axis=0)[["mean","sd","hdi_3%","hdi_97%"]])

### Extract Player Skills

In [None]:
alpha_post = az.extract(trace_sfm, var_names=["alpha"])
alpha_means = alpha_post.mean(dim="sample").values

results = pl.DataFrame({
    "player": player_names,
    "skill_mean": alpha_means,
    "skill_lower": np.percentile(alpha_post.values, 2.5, axis=1),
    "skill_upper": np.percentile(alpha_post.values, 97.5, axis=1)
}).sort("skill_mean", descending=True)

print("Top 10:")
print(results.head(10))

### Visualize with Uncertainty

In [None]:
results_sorted = results.sort("skill_mean", descending=False)

fig = go.Figure()
fig.add_trace(go.Scatter(
    y=results_sorted["player"].to_list(),
    x=results_sorted["skill_mean"].to_list(),
    error_x=dict(type='data', symmetric=False,
                array=(results_sorted["skill_upper"]-results_sorted["skill_mean"]).to_list(),
                arrayminus=(results_sorted["skill_mean"]-results_sorted["skill_lower"]).to_list()),
    mode='markers', marker=dict(size=8, color='steelblue')
))
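# Reference line at the posterior mean of the population intercept mu_alpha
mu_alpha_mean = float(trace_sfm.posterior["mu_alpha"].mean())
fig.add_vline(x=mu_alpha_mean, line_dash="dash", annotation_text="Population mean")
fig.update_layout(title="Player Skills (α)", xaxis_title="Skill (logit scale)", height=700)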
fig.show()

### Interpretation

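- Skill hierarchy: the top-ranked players stand out from the rest of the field
- Uncertainty varies: players with more observations have narrower intervals
- Overlap matters: when two players' intervals overlap substantially, the data cannot confidently rank them
- Context-adjusted: because the factors are in the model, $\alpha_i$ compares players on a common baseline across situations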
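Interval overlap is only a rough guide. A more direct quantity is the posterior probability that one player's skill exceeds another's, computed draw by draw. The sketch below uses simulated draws. In the notebook, two rows of `alpha_post.values` (ordered as `player_names`) would take their place. The helper name `prob_greater` is ours.

```python
import numpy as np

def prob_greater(draws_a, draws_b):
    """Fraction of posterior draws in which player A's skill exceeds player B's.
    Draws must be matched (same sample index) so the joint posterior is respected."""
    return float(np.mean(np.asarray(draws_a) > np.asarray(draws_b)))

rng = np.random.default_rng(1)
# Simulated stand-ins for two players' posterior draws of alpha
draws_a = rng.normal(-1.8, 0.3, size=4000)
draws_b = rng.normal(-2.0, 0.3, size=4000)
print(f"P(alpha_A > alpha_B) = {prob_greater(draws_a, draws_b):.2f}")
```

A value near 0.5 means the ranking is a coin flip, even if the posterior means differ.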

### Factor Coefficients

In [None]:
beta_post = az.extract(trace_sfm, var_names=["beta"])
factor_res = pl.DataFrame({
    "factor": factor_cols,
    "coef": beta_post.mean(dim="sample").values,
    "lower": np.percentile(beta_post.values, 2.5, axis=1),
    "upper": np.percentile(beta_post.values, 97.5, axis=1)
})

fig = go.Figure()
fig.add_trace(go.Scatter(x=factor_res["coef"].to_list(), 
                        y=factor_res["factor"].to_list(),
                        error_x=dict(type='data', symmetric=False,
                                    array=(factor_res["upper"]-factor_res["coef"]).to_list(),
                                    arrayminus=(factor_res["coef"]-factor_res["lower"]).to_list()),
                        mode='markers', marker=dict(size=12, color='coral')))
fig.add_vline(x=0, line_dash="dash")
fig.update_layout(title="Factor Effects (β)", xaxis_title="Coefficient", height=300)
fig.show()

print(factor_res)

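goal_balance_diff: largest coefficient (team quality matters). The factors are on different scales, though, so compare magnitudes with care.  
home_pitch: positive coefficient, i.e. a home advantage  
points_diff: positive coefficient, so recent form correlates with scoring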
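Because the model is on the logit scale, a coefficient $\beta_k$ multiplies the odds of scoring by $e^{\beta_k}$ per unit increase in factor $k$, holding the player and the other factors fixed. Below is a sketch of that conversion. It uses made-up coefficient draws with the same layout as `beta_post.values` (factors × draws), not the fitted posterior.

```python
import numpy as np

rng = np.random.default_rng(2)
factor_names = ["home_pitch", "points_diff", "goal_balance_diff"]
# Hypothetical posterior draws, shape (n_factors, n_draws)
beta_draws = rng.normal(loc=[[0.2], [0.05], [0.4]], scale=0.05, size=(3, 4000))

# Transform the draws first: medians and percentiles carry over through the
# monotone exp(), but posterior means do not
odds_ratios = np.exp(beta_draws)
for name, orr in zip(factor_names, odds_ratios):
    lo, hi = np.percentile(orr, [2.5, 97.5])
    print(f"{name:>18}: median OR={np.median(orr):.2f} (95% interval {lo:.2f} to {hi:.2f})")
```

For a binary factor such as `home_pitch`, the odds ratio compares home to away directly. For continuous factors it is per unit of that factor's own scale.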

### 🤖 LLM Exercise

Extend the model with temporal dynamics.

In [None]:
# 🤖 EXERCISE: Add time-varying skills

def extend_with_hsgp():
    """
    Add HSGP for player skills over seasons.
    
    Prompt: "I have hierarchical logistic regression (Bernoulli, 
    alpha player effects, beta factors). Make alpha_i vary over 
    seasons with HSGP. Help me: 1) Define HSGP over seasons,  
    2) Integrate with model, 3) Update predictor. PyMC code."
    """
    pass

print("🎯 Extend SFM with time-varying HSGP skills")
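As a starting point for the exercise, here is a NumPy sketch of the reduced-rank Hilbert-space approximation that HSGP is built on. On a domain $[-L, L]$, a squared-exponential kernel is approximated by $m$ sine basis functions weighted by the kernel's spectral density. The function names and the choices `L=5`, `m=30` are illustrative assumptions, not tuned values, and this is not PyMC's `pm.gp.HSGP` implementation itself.

```python
import numpy as np

def hsgp_basis(x, L, m):
    """First m Laplacian eigenfunctions on [-L, L] with Dirichlet boundaries."""
    j = np.arange(1, m + 1)
    sqrt_eig = j * np.pi / (2 * L)                  # square roots of the eigenvalues
    phi = np.sin(sqrt_eig[None, :] * (x[:, None] + L)) / np.sqrt(L)
    return phi, sqrt_eig

def expquad_spectral_density(omega, ls, eta):
    """1D spectral density of eta^2 * exp(-r^2 / (2 ls^2))."""
    return eta**2 * np.sqrt(2 * np.pi) * ls * np.exp(-0.5 * (ls * omega) ** 2)

x = np.linspace(-2, 2, 50)        # e.g. a centered season index (hypothetical)
ls, eta, L, m = 1.0, 1.0, 5.0, 30

phi, sqrt_eig = hsgp_basis(x, L, m)
S = expquad_spectral_density(sqrt_eig, ls, eta)

# Low-rank covariance approximation: Phi diag(S) Phi^T
K_approx = (phi * S) @ phi.T
K_exact = eta**2 * np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / ls**2)
print(f"max abs error: {np.abs(K_approx - K_exact).max():.1e}")

# A GP draw becomes a linear model in the basis: f = Phi @ (sqrt(S) * z), z ~ N(0, I)
z = np.random.default_rng(3).standard_normal(m)
f = phi @ (np.sqrt(S) * z)
```

The design choice this enables is that the GP becomes a linear model with $m$ coefficients, so the cost no longer grows cubically with the number of time points. One possible route for the exercise is to give each player their own coefficient vector `z`.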

### Extensions

1. Temporal dynamics: Aging curves, form  
2. More factors: Defensive rating, rest, injuries
3. Multi-level: Group by position
4. Predictive checks: Simulate vs holdout
5. Decision-making: Transfer value

## Workshop Summary

### Sessions 1-4 Journey

**Session 1**: Foundations (Bayesian inference, MVN→GP, kernels)  
**Session 2**: Model Building (kernel composition, likelihoods)  
**Session 3**: Scaling (O(n³), sparse, HSGP)  
**Session 4**: Applications (multi-output, hierarchical, real case study)

### GP Mindset

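1. Flexibility: the function adapts to the data instead of a fixed parametric form
2. Uncertainty: full posterior distributions, not point estimates
3. Interpretability: hyperparameters such as lengthscales and output covariances have direct meanings
4. Composability: complex kernels built from simple ones
5. Scalability: sparse and HSGP approximations for larger datasets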

### Next Steps

- PyMC docs: https://www.pymc.io/
- Rasmussen & Williams, *Gaussian Processes for Machine Learning* (free online)
- PyMC examples and Discourse
- Apply to your data!

**Final thought**: GPs let you encode smoothness assumptions explicitly and then let the data speak. You're now equipped to apply them to real-world problems!

### Acknowledgments

PyMC team, Alex Andorra & Max Goebel (soccer case), Danh Phan, Bill Engels, Chris Fonnesbeck (multi-output GPs).

Materials for educational use under open-source licenses.