[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/fonnesbeck/instats_gp/blob/main/sessions/Session_4.ipynb)

# Session 4: Multi-Output GPs and Case Study

## Learning Objectives

1. Understand multi-output GP models for correlated outputs
2. Handle multidimensional inputs with ARD lengthscales
3. Build coregionalized models using Hadamard product kernels
4. Execute a comprehensive case study: Soccer player skill modeling
5. Integrate hierarchical structure and non-Gaussian likelihoods
6. Interpret factor models that decompose skill from context

In [None]:
# Core scientific computing
import numpy as np
import scipy.stats as stats
import polars as pl

# PyMC ecosystem
import pymc as pm
import pytensor.tensor as pt
import arviz as az

# Visualization
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.io as pio

# Reproducibility
RANDOM_SEED = 20090425
RNG = np.random.default_rng(RANDOM_SEED)
pio.renderers.default = "plotly_mimetype+notebook_connected"

print(f"PyMC: {pm.__version__}, NumPy: {np.__version__}")
print(f"Polars: {pl.__version__}, ArviZ: {az.__version__}")

## Part A: Multi-Output Gaussian Processes

### Why Model Multiple Outputs Together?

Imagine analyzing 27 elite soccer players. You could fit 27 separate GPs, but this misses that **players operate in a shared context**.

Multi-output GPs offer:
1. **Information sharing** between related outputs
2. **Partial pooling** for data-scarce outputs
3. **Learned correlation structure**  
4. **Computational efficiency**

Let's start with multidimensional inputs.

### Automatic Relevance Determination (ARD)

ARD assigns independent lengthscales to each input dimension:

$$k(\mathbf{x}, \mathbf{x}') = \sigma^2 \exp\left(-\frac{1}{2}\sum_{i=1}^d \frac{(x_i - x'_i)^2}{\ell_i^2}\right)$$

Large $\ell_i$ → dimension $i$ is irrelevant.
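
To make the formula concrete, here is a minimal NumPy sketch of the ARD kernel. The function name `ard_expquad` and the example lengthscales are illustrative assumptions, not part of the PyMC API. It shows that moving along a dimension with a very long lengthscale barely changes the covariance.

```python
import numpy as np

def ard_expquad(X1, X2, ls, sigma=1.0):
    """ARD squared-exponential kernel with one lengthscale per input column."""
    ls = np.asarray(ls, dtype=float)
    # Scale each dimension by its own lengthscale, then take squared distances
    diff = X1[:, None, :] / ls - X2[None, :, :] / ls
    sq_dist = np.sum(diff**2, axis=-1)
    return sigma**2 * np.exp(-0.5 * sq_dist)

a = np.array([[0.0, 0.0]])
b_dim1 = np.array([[1.0, 0.0]])  # differs from a only in dimension 1
b_dim2 = np.array([[0.0, 1.0]])  # differs from a only in dimension 2

ls = [0.5, 50.0]  # hypothetical values: short for dim 1, very long for dim 2
k_dim1 = ard_expquad(a, b_dim1, ls)[0, 0]
k_dim2 = ard_expquad(a, b_dim2, ls)[0, 0]
print(f"move along dim 1: k = {k_dim1:.4f}")
print(f"move along dim 2: k = {k_dim2:.4f}")
```

A unit step along dimension 1 drops the covariance sharply, while the same step along dimension 2 leaves it close to $\sigma^2$, which is what "irrelevant" means here.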

### Generating Synthetic Data

Create 2D data where only x1 matters.

In [None]:
n_obs = 150
x1 = RNG.uniform(-3, 3, n_obs)  # Relevant
x2 = RNG.uniform(-3, 3, n_obs)  # Irrelevant
y_obs = np.sin(2 * x1) + 0.5 * x1 + RNG.normal(0, 0.2, n_obs)
X_train = np.column_stack([x1, x2])

print(f"{X_train.shape[0]} observations, {X_train.shape[1]} features")

### Fitting ARD Model

Watch what lengthscales the model learns.

In [None]:
with pm.Model() as ard_model:
    ls = pm.Gamma("ls", alpha=2, beta=1, shape=2)
    eta = pm.HalfNormal("eta", sigma=2)
    cov_func = eta**2 * pm.gp.cov.ExpQuad(input_dim=2, ls=ls)
    gp = pm.gp.Marginal(cov_func=cov_func)
    sigma = pm.HalfNormal("sigma", sigma=0.5)
    y_ = gp.marginal_likelihood("y", X=X_train, y=y_obs, sigma=sigma)
    trace_ard = pm.sample(1000, tune=1000, random_seed=RANDOM_SEED, chains=2)

### Visualizing Learned Lengthscales

In [None]:
ls_post = az.extract(trace_ard, var_names=["ls"])
ls_means = ls_post.mean(dim="sample").values

fig = go.Figure()
for i in range(2):
    fig.add_trace(go.Violin(y=ls_post.sel(ls_dim_0=i).values,
                            name=f"Feature {i+1}", box_visible=True))
fig.update_layout(title="Learned Lengthscales", yaxis_title="Lengthscale")
fig.show()

print(f"F1: {ls_means[0]:.2f}, F2: {ls_means[1]:.2f}")
print(f"Ratio: {ls_means[1]/ls_means[0]:.1f}x")

### Interpretation

Feature 1: small ls → relevant  
Feature 2: large ls → irrelevant

ARD discovered which dimension matters!

### Diving Deeper into ARD

We just saw that ARD successfully identified which input dimension matters. The ratio of lengthscales tells us how strongly the model believes dimension 2 is irrelevant compared to dimension 1.

In practice, ARD enables **automatic feature selection**: irrelevant dimensions get large lengthscales (model becomes insensitive to them), while important dimensions get small lengthscales (model pays close attention).

Now let's move from multidimensional **inputs** to multiple **outputs**.

## Intrinsic Coregionalization Model (ICM)

Imagine you're tracking the fastball spin rates of 5 elite pitchers across a baseball season. You could fit 5 separate GPs, one per pitcher. But what if some pitchers have similar mechanics, and their spin rates fluctuate together?

**Multi-output Gaussian Processes** let us model related outputs jointly, learning:
1. How each output evolves over inputs (e.g., time)
2. How outputs correlate with each other

The key insight: if outputs are related, we should **share statistical strength** across them.

### The Mathematics of Sharing Structure

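ICM combines two covariance structures. When every output is observed at the same inputs, the full covariance matrix is the Kronecker product (⊗) of $B$ and $K_{input}$; entry by entry, it is a plain product: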

$$K_{ICM}([\mathbf{x}_i, o_i], [\mathbf{x}_j, o_j]) = K_{input}(\mathbf{x}_i, \mathbf{x}_j) \times B(o_i, o_j)$$

Where:
- $K_{input}(\mathbf{x}_i, \mathbf{x}_j)$: Covariance over inputs (e.g., time)
- $B(o_i, o_j)$: **Coregionalization matrix**, the covariance between outputs  
- $o_i, o_j$: Output indices (e.g., pitcher 0, pitcher 1, ...)

Think of it as: *"How similar are these inputs?"* multiplied by *"How correlated are these outputs?"*

The coregionalization matrix has a special structure that ensures it's positive semi-definite:

$$B = WW^T + \text{diag}(\kappa)$$

This separates:
- $WW^T$: Shared variations across outputs (low-rank structure)
- $\text{diag}(\kappa)$: Output-specific independent noise
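
As a quick check of this construction, the NumPy sketch below builds $B$ from randomly drawn $W$ and $\kappa$. The draws are placeholders, not fitted values. It confirms that the low-rank part alone is rank-deficient while the sum is positive definite.

```python
import numpy as np

rng = np.random.default_rng(0)
n_outputs, rank = 5, 2

W = rng.normal(size=(n_outputs, rank))                    # shared low-rank loadings
kappa = rng.gamma(shape=1.5, scale=1.0, size=n_outputs)   # output-specific variances

B = W @ W.T + np.diag(kappa)

# W W^T alone has rank <= 2, so it is only positive semi-definite;
# adding a positive diagonal makes B strictly positive definite.
low_rank = np.linalg.matrix_rank(W @ W.T)
eig_B = np.linalg.eigvalsh(B)
print("rank of W W^T:", low_rank)
print("smallest eigenvalue of B:", eig_B.min())
```

The number of columns of $W$ controls how many shared patterns the outputs can follow; $\kappa$ keeps each output free to vary on its own.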

### Real Data: Baseball Pitcher Spin Rates

Let's see ICM in action with real data. We'll model fastball spin rates of 5 elite pitchers across the 2021 MLB season.

**Why spin rate matters:** Higher spin rates make fastballs harder to hit. Spin rate fluctuates game-to-game due to:
- Fatigue accumulation  
- Mechanics adjustments  
- Measurement noise  
- Potentially shared factors (weather, ball characteristics)

**Hypothesis:** Some pitchers' spin rates may be correlated if they have similar mechanics or respond similarly to external factors.

In [None]:
# Load baseball spin rate data
df_spin = pl.read_csv("../data/fastball_spin_rates.csv")

# Standardize spin rates to z-scores
mean_spin = df_spin["avg_spin_rate"].mean()
std_spin = df_spin["avg_spin_rate"].std()
df_spin = df_spin.with_columns([
    ((pl.col("avg_spin_rate") - mean_spin) / std_spin).alias("avg_spin_rate_std")
])

print(f"Total: {df_spin.height} observations, {df_spin['pitcher_name'].n_unique()} pitchers")
print(f"Date range: {df_spin['game_date'].min()} to {df_spin['game_date'].max()}")
df_spin.head()

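We standardized spin rates with the overall mean and standard deviation, so all pitchers are expressed in the same z-score units. This makes the coregionalization matrix easier to read, but note that its entries are covariances: to read them as correlations, divide by the square roots of the corresponding diagonal entries.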

In [None]:
# Get top 5 pitchers by number of games
top_pitchers_df = (df_spin
    .group_by("pitcher_name")
    .agg(pl.count("game_date").alias("n_games"))
    .sort("n_games", descending=True)
    .head(5)
)

top_pitchers = top_pitchers_df["pitcher_name"].to_list()
print("Top 5 pitchers:")
print(top_pitchers_df)

# Filter to these pitchers
df_train = df_spin.filter(pl.col("pitcher_name").is_in(top_pitchers))
print(f"\nTraining data: {df_train.height} observations")

Now we create two key index variables:

1. **`game_date_idx`**: Integer days since season start (April 1, 2021 = day 0)  
2. **`output_idx`**: Pitcher number (0 to 4)

Our input matrix $X$ will be $(n, 2)$ where each row is `[game_date_idx, output_idx]`.

In [None]:
# Convert to datetime
df_train = df_train.with_columns([
    pl.col("game_date").str.strptime(pl.Date, format="%Y-%m-%d").alias("game_date_dt")
])

# Create game date index (days since season start)
min_date = df_train["game_date_dt"].min()
df_train = df_train.with_columns([
    (pl.col("game_date_dt") - min_date).dt.total_days().alias("game_date_idx")
])

# Create output index (cast so the column is integer regardless of polars version)
pitcher_to_idx = {name: idx for idx, name in enumerate(top_pitchers)}
df_train = df_train.with_columns([
    pl.col("pitcher_name").replace(pitcher_to_idx).cast(pl.Int64).alias("output_idx")
])

# Sort by output then time
df_train = df_train.sort(["output_idx", "game_date_idx"])

print("Data structure:")
print(df_train.select(["pitcher_name", "game_date_idx", "output_idx", "avg_spin_rate_std"]).head(10))

### Visualizing the Raw Data

Before modeling, let's examine the raw time series. This helps us understand trends, volatility, and potential correlations.

In [None]:
# Interactive time series plot
fig = go.Figure()
colors = px.colors.qualitative.Set2

for i, pitcher in enumerate(top_pitchers):
    pitcher_data = df_train.filter(pl.col("pitcher_name") == pitcher)
    fig.add_trace(go.Scatter(
        x=pitcher_data["game_date_idx"].to_list(),
        y=pitcher_data["avg_spin_rate_std"].to_list(),
        mode='markers',
        name=pitcher,
        marker=dict(size=5, color=colors[i]),
        opacity=0.7
    ))

fig.update_layout(
    title="Fastball Spin Rates: 2021 Season (Standardized)",
    xaxis_title="Days Since Season Start",
    yaxis_title="Standardized Spin Rate",
    height=450,
    hovermode='closest'
)
fig.show()

Each pitcher shows noisy variation. Some appear to have trends (gradual changes over the season), while others look more stationary. The ICM model will:

1. **Smooth** trajectories to separate signal from noise  
2. **Learn correlations** between pitchers  
3. **Quantify uncertainty** with credible intervals

Now let's build the model.

### Building the ICM: Helper Function

We define `get_icm()` to construct an ICM kernel. This combines an input kernel with a `Coregion` kernel using the Hadamard product (`*`).

The `active_dims` parameter tells each kernel which columns to operate on:
- Input kernel uses `active_dims=[0]` (time)  
- Coregion kernel uses `active_dims=[1]` (pitcher index)

In [None]:
def get_icm(input_dim, kernel, W=None, kappa=None, B=None, active_dims=None):
    """
    Construct Intrinsic Coregionalization Model kernel.
    
    Combines input kernel with output coregionalization via Hadamard product.
    
    Parameters
    ----------
    input_dim : int
        Total input dimensions (including output index)
    kernel : pm.gp.cov.Covariance
        Base kernel for inputs (e.g., ExpQuad over time)
    W, kappa, B : tensors, optional
        Coregionalization parameters
    active_dims : list, optional
        Dimensions for coregion kernel (typically [1] for output index)
    
    Returns
    -------
    pm.gp.cov.Covariance
        ICM kernel
    """
    coreg = pm.gp.cov.Coregion(
        input_dim=input_dim,
        W=W,
        kappa=kappa,
        B=B,
        active_dims=active_dims
    )
    # Hadamard product: kernel * coreg
    icm_cov = kernel * coreg
    return icm_cov

print("✓ Helper function defined")

**Critical distinction:**
- Kernel **addition** (`+`): combines multiple processes  
- Kernel **multiplication** (`*`): Hadamard product for ICM

The `*` operator creates a covariance where the input kernel and coregion kernel operate **independently** on their designated dimensions, then multiply.
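
The following NumPy sketch mimics what the two `active_dims` settings do, under the assumption that the data are stacked by output and then by time as in this session. The helper `expquad` and all numeric values are illustrative. It also checks that, on a complete grid, the elementwise product equals the Kronecker form.

```python
import numpy as np

def expquad(t1, t2, ls):
    return np.exp(-0.5 * (t1[:, None] - t2[None, :]) ** 2 / ls**2)

t = np.linspace(0.0, 10.0, 4)          # shared time points
n_out = 3
rng = np.random.default_rng(1)
W = rng.normal(size=(n_out, 2))
B = W @ W.T + np.diag([0.5, 0.5, 0.5])

# Stacked layout: column 0 = time, column 1 = output index,
# sorted by output and then by time.
X = np.column_stack([np.tile(t, n_out), np.repeat(np.arange(n_out), t.size)])

K_time = expquad(X[:, 0], X[:, 0], ls=2.0)   # plays the role of active_dims=[0]
idx = X[:, 1].astype(int)
K_out = B[np.ix_(idx, idx)]                  # plays the role of active_dims=[1]: B[o_i, o_j]
K_icm = K_time * K_out                       # elementwise (Hadamard) product

# On a complete grid this matches the Kronecker form B ⊗ K_t.
K_kron = np.kron(B, expquad(t, t, ls=2.0))
print(np.allclose(K_icm, K_kron))
```

The Hadamard form is the more general one: it still works when pitchers are observed on different days, where no Kronecker structure exists.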

### Preparing Data for PyMC

Convert polars DataFrame to numpy arrays:

In [None]:
# Extract training arrays
X_train = df_train.select(["game_date_idx", "output_idx"]).to_numpy().astype(np.float64)
y_train = df_train.select("avg_spin_rate_std").to_numpy().flatten()

n_outputs = len(top_pitchers)

print(f"X shape: {X_train.shape} (rows x [time, output_idx])")
print(f"y shape: {y_train.shape}")
print(f"n_outputs: {n_outputs}")

### Specifying Priors

We need priors for:
- **Lengthscale** (`ell`): How quickly spin rate changes. `Gamma(2, 0.5)` gives mean ≈ 4 days.
- **Amplitude** (`eta`): Overall temporal variation. `Gamma(3, 1)` gives mean = 3.
- **W**: Weight matrix $(5 \times 2)$. Rank 2 assumes pitchers share ≤ 2 latent patterns.
- **kappa**: Output-specific variances.
- **sigma**: Observation noise.
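
The prior means quoted above follow from the Gamma mean $\alpha/\beta$ in the shape/rate parameterization. A small Monte Carlo sketch can confirm this and show how wide each prior is; the helper `gamma_summary` is a hypothetical name, and NumPy's `scale` argument is the reciprocal of the rate.

```python
import numpy as np

def gamma_summary(alpha, beta, n_draws=100_000, seed=0):
    """Exact mean, Monte Carlo mean, and central 94% interval of Gamma(alpha, rate=beta)."""
    rng = np.random.default_rng(seed)
    # NumPy uses a scale parameter, so scale = 1 / rate
    draws = rng.gamma(shape=alpha, scale=1.0 / beta, size=n_draws)
    lo, hi = np.percentile(draws, [3, 97])
    return alpha / beta, draws.mean(), lo, hi

for name, (alpha, beta) in {"ell": (2, 0.5), "eta": (3, 1)}.items():
    exact, mc_mean, lo, hi = gamma_summary(alpha, beta)
    print(f"{name}: mean={exact:.1f} (MC {mc_mean:.2f}), 94% interval=({lo:.2f}, {hi:.2f})")
```

Checking the interval as well as the mean helps judge whether a prior rules out lengthscales you consider plausible.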

In [None]:
with pm.Model() as icm_model:
    # Temporal kernel parameters
    ell = pm.Gamma("ell", alpha=2, beta=0.5)
    eta = pm.Gamma("eta", alpha=3, beta=1)
    
    # Base kernel on time (active_dims=[0])
    kernel_time = eta**2 * pm.gp.cov.ExpQuad(input_dim=2, ls=ell, active_dims=[0])
    
    # Coregionalization parameters
    W = pm.Normal("W", mu=0, sigma=3, shape=(n_outputs, 2),
                  initval=RNG.standard_normal((n_outputs, 2)))
    kappa = pm.Gamma("kappa", alpha=1.5, beta=1, shape=n_outputs)
    
    # Track B matrix
    B = pm.Deterministic("B", pt.dot(W, W.T) + pt.diag(kappa))
    
    # ICM kernel
    cov_icm = get_icm(input_dim=2, kernel=kernel_time, W=W, kappa=kappa, active_dims=[1])
    
    # GP
    gp = pm.gp.Marginal(cov_func=cov_icm)
    
    # Noise
    sigma = pm.HalfNormal("sigma", sigma=3)
    
    # Likelihood
    y_obs = gp.marginal_likelihood("y", X=X_train, y=y_train, sigma=sigma)

pm.model_to_graphviz(icm_model)

### Sampling the Posterior

This may take a few minutes...

In [None]:
with icm_model:
    trace_icm = pm.sample(
        1000,
        tune=1000,
        random_seed=RANDOM_SEED,
        chains=2,
        target_accept=0.95
    )

Let's check convergence:

In [None]:
summary = az.summary(trace_icm, var_names=["ell", "eta", "sigma", "kappa"])
print(f"R-hat: [{summary['r_hat'].min():.4f}, {summary['r_hat'].max():.4f}]")
print(f"ESS: [{summary['ess_bulk'].min():.0f}, {summary['ess_bulk'].max():.0f}]")
print("\nEstimates:")
print(summary[["mean", "sd", "hdi_3%", "hdi_97%"]])

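If the R-hat values are close to 1 and the effective sample sizes are in the hundreds or more, the chains have converged well. The model has then learned temporal dynamics and output correlations simultaneously.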

Now let's make predictions.

### Posterior Predictions

Create test grid for all 5 pitchers:

In [None]:
# Test data: 200 time points
n_test = 200
time_test = np.linspace(0, 199, n_test)

# Stack for all outputs
X_test_list = []
for out_idx in range(n_outputs):
    X_out = np.column_stack([time_test, np.full(n_test, out_idx)])
    X_test_list.append(X_out)

X_test = np.vstack(X_test_list)
print(f"Test shape: {X_test.shape}")

In [None]:
with icm_model:
    f_pred = gp.conditional("f_pred", X_test)
    ppc_icm = pm.sample_posterior_predictive(
        trace_icm,
        var_names=["f_pred"],
        random_seed=RANDOM_SEED
    )

### Visualizing Multi-Output Predictions

Plot posterior for each pitcher:

In [None]:
fig = make_subplots(
    rows=5, cols=1,
    subplot_titles=top_pitchers,
    vertical_spacing=0.05,
    shared_xaxes=True
)

for i, pitcher in enumerate(top_pitchers):
    # Extract predictions for this output
    f_pred_i = ppc_icm.posterior_predictive["f_pred"].isel(
        f_pred_dim_0=slice(i*n_test, (i+1)*n_test)
    )
    
    mean = f_pred_i.mean(dim=["chain", "draw"]).values
    lower = np.percentile(f_pred_i.values, 2.5, axis=(0,1))
    upper = np.percentile(f_pred_i.values, 97.5, axis=(0,1))
    
    # 95% credible band (equal-tailed percentiles, not a true HDI)
    fig.add_trace(go.Scatter(
        x=time_test, y=upper, line=dict(width=0),
        showlegend=False, hoverinfo='skip'
    ), row=i+1, col=1)
    
    fig.add_trace(go.Scatter(
        x=time_test, y=lower, fill='tonexty',
        line=dict(width=0), showlegend=False,
        fillcolor='rgba(135,206,250,0.3)'
    ), row=i+1, col=1)
    
    # Mean
    fig.add_trace(go.Scatter(
        x=time_test, y=mean, mode='lines',
        line=dict(color='steelblue', width=2),
        showlegend=False
    ), row=i+1, col=1)
    
    # Training data
    pitcher_data = df_train.filter(pl.col("pitcher_name") == pitcher)
    fig.add_trace(go.Scatter(
        x=pitcher_data["game_date_idx"].to_list(),
        y=pitcher_data["avg_spin_rate_std"].to_list(),
        mode='markers',
        marker=dict(color='red', size=3),
        showlegend=False
    ), row=i+1, col=1)

fig.update_xaxes(title_text="Days Since Season Start", row=5, col=1)
fig.update_layout(height=1000, title_text="ICM Posterior Predictions")
fig.show()

Notice how the model smooths noisy observations while respecting each pitcher's pattern. Uncertainty is wider where data is sparse, narrower with more observations.

Now let's examine the learned correlations.

### The Coregionalization Matrix: Who's Correlated?

Extract learned $B$ matrix:

In [None]:
# Posterior mean of B
B_post = az.extract(trace_icm, var_names=["B"]).mean(dim="sample").values

# Heatmap
fig = go.Figure(data=go.Heatmap(
    z=B_post,
    x=top_pitchers,
    y=top_pitchers,
    colorscale='RdBu',
    zmid=0,
    text=np.round(B_post, 2),
    texttemplate='%{text}',
    textfont={"size": 10}
))

fig.update_layout(
    title="Learned Coregionalization Matrix B",
    xaxis_title="Pitcher",
    yaxis_title="Pitcher",
    height=450
)
fig.show()

# Convert to correlation
std_devs = np.sqrt(np.diag(B_post))
corr = B_post / np.outer(std_devs, std_devs)
print("\nCorrelations:")
print(corr)

Diagonal elements are variances. Off-diagonal elements show shared variation. High positive covariance means those pitchers' spin rates fluctuate together.
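
Converting the posterior mean of $B$ to a correlation matrix gives a single point estimate. An alternative is to convert each posterior draw and then summarize, which also yields intervals. The sketch below uses synthetic draws as a stand-in for the posterior of $B$; the array layout `(n_draws, p, p)` and the helper `cov_to_corr` are assumptions made for illustration.

```python
import numpy as np

def cov_to_corr(B):
    """Convert covariance matrices of shape (..., p, p) to correlation matrices."""
    sd = np.sqrt(np.diagonal(B, axis1=-2, axis2=-1))
    return B / (sd[..., :, None] * sd[..., None, :])

# Synthetic stand-in for posterior draws of B with shape (n_draws, p, p)
rng = np.random.default_rng(2)
W_draws = rng.normal(size=(500, 3, 2))
kappa_draws = rng.gamma(shape=1.5, scale=1.0, size=(500, 3))
B_draws = W_draws @ np.swapaxes(W_draws, -1, -2) + kappa_draws[:, :, None] * np.eye(3)

corr_draws = cov_to_corr(B_draws)           # one correlation matrix per draw
corr_mean = corr_draws.mean(axis=0)
corr_lo, corr_hi = np.percentile(corr_draws, [2.5, 97.5], axis=0)
print(np.round(corr_mean, 2))
```

If the interval for a pair of outputs spans zero, the data do not pin down the sign of their correlation.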


## Linear Coregionalization Model (LCM)

ICM assumes that all outputs share a **single input kernel**, so one timescale drives every output correlation. But what if there are **multiple independent sources** of variation?

Examples:
- Long-term trends (aging, mechanics changes)
- Short-term fluctuations (fatigue, noise)

LCM extends ICM by **summing multiple ICM kernels**:

$$K_{LCM} = B_1 \otimes K_1 + B_2 \otimes K_2 + \dots$$

This allows different output correlations at different timescales.
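
A minimal NumPy sketch of this sum on a complete grid is shown below. The kernel helpers, lengthscales, and the two coregionalization matrices are illustrative assumptions. It uses a separate $B_q$ per kernel, as in the formula, and verifies that the result is a valid covariance matrix.

```python
import numpy as np

def expquad(t, ls):
    d = t[:, None] - t[None, :]
    return np.exp(-0.5 * d**2 / ls**2)

def matern32(t, ls):
    r = np.abs(t[:, None] - t[None, :]) / ls
    return (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)

def coregion(W, kappa):
    return W @ W.T + np.diag(kappa)

rng = np.random.default_rng(3)
t = np.linspace(0.0, 100.0, 20)
n_out = 3

B1 = coregion(rng.normal(size=(n_out, 1)), np.full(n_out, 0.1))  # long-term coupling
B2 = coregion(rng.normal(size=(n_out, 1)), np.full(n_out, 0.1))  # short-term coupling

# Sum of two ICM terms: smooth slow trend plus rougher fast fluctuations
K_lcm = np.kron(B1, expquad(t, ls=30.0)) + np.kron(B2, matern32(t, ls=3.0))

min_eig = np.linalg.eigvalsh(K_lcm).min()
print(K_lcm.shape, min_eig > 0)
```

Because a sum of valid covariance matrices is itself valid, each term can carry its own timescale and its own pattern of output coupling.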

### When to Use LCM Over ICM

**Use ICM when:**
- Single timescale dominates
- Simplicity is priority
- Limited data

**Use LCM when:**
- Multiple timescales evident
- Different correlations at different scales
- Sufficient data for complexity

For baseball, we might combine:
- ExpQuad: smooth long-term trends
- Matern32: short-term wiggles

In [None]:
def get_lcm(input_dim, active_dims, num_outputs, kernels, W=None, kappa=None, B=None, name="LCM"):
    """
    Construct Linear Coregionalization Model kernel.
    
    Sums multiple ICM kernels.
    """
    if B is None:
        if kappa is None:
            kappa = pm.Gamma(f"{name}_kappa", alpha=5, beta=1, shape=num_outputs)
        if W is None:
            W = pm.Normal(f"{name}_W", mu=0, sigma=5, shape=(num_outputs, 1),
                         initval=RNG.standard_normal((num_outputs, 1)))
    else:
        kappa = None
    
    # Sum ICMs
    cov_lcm = 0
    for kernel in kernels:
        icm = get_icm(input_dim, kernel, W, kappa, B, active_dims)
        cov_lcm += icm
    
    return cov_lcm

print("✓ LCM helper defined")

This reuses the same $W$ and $\kappa$ across kernels (shared coregionalization), but each kernel contributes its own input covariance shape.

In [None]:
with pm.Model() as lcm_model:
    # Two lengthscales
    ell = pm.Gamma("ell", alpha=2, beta=0.5, shape=2)
    eta = pm.Gamma("eta", alpha=3, beta=1, shape=2)
    
    # Two kernels
    kernel_list = [
        eta[0]**2 * pm.gp.cov.ExpQuad(input_dim=2, ls=ell[0], active_dims=[0]),
        eta[1]**2 * pm.gp.cov.Matern32(input_dim=2, ls=ell[1], active_dims=[0])
    ]
    
    # LCM kernel
    cov_lcm = get_lcm(input_dim=2, active_dims=[1], num_outputs=n_outputs,
                      kernels=kernel_list, name="LCM")
    
    # GP
    gp = pm.gp.Marginal(cov_func=cov_lcm)
    sigma = pm.HalfNormal("sigma", sigma=3)
    y_obs = gp.marginal_likelihood("y", X=X_train, y=y_train, sigma=sigma)

pm.model_to_graphviz(lcm_model)

In [None]:
with lcm_model:
    trace_lcm = pm.sample(
        1000,
        tune=1000,
        random_seed=RANDOM_SEED,
        chains=2,
        target_accept=0.95
    )

LCM has more parameters (2 lengthscales, 2 amplitudes) but shared coregionalization. This flexibility captures both smooth trends and short-term wiggles.

In [None]:
with lcm_model:
    f_pred_lcm = gp.conditional("f_pred", X_test)
    ppc_lcm = pm.sample_posterior_predictive(
        trace_lcm,
        var_names=["f_pred"],
        random_seed=RANDOM_SEED
    )

LCM predictions (similar visualization to ICM):

In [None]:
# Similar plotting code - abbreviated for space
print("LCM posterior predictive samples obtained")
print(f"Shape: {ppc_lcm.posterior_predictive['f_pred'].shape}")

### Model Comparison: ICM vs LCM

Quantitative comparison with LOO:

In [None]:
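# az.loo needs stored log-likelihoods; pm.sample does not keep them by default
with icm_model:
    pm.compute_log_likelihood(trace_icm)
with lcm_model:
    pm.compute_log_likelihood(trace_lcm)
# Caveat: gp.Marginal has one joint MvNormal likelihood; check the log_likelihood shape
loo_icm = az.loo(trace_icm, pointwise=True)
loo_lcm = az.loo(trace_lcm, pointwise=True)

comparison = az.compare({"ICM": trace_icm, "LCM": trace_lcm}, ic="loo")
print(comparison)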

The `elpd_diff` column shows the difference in expected predictive accuracy relative to the top-ranked model. If it is substantially larger than its standard error (`dse`), the better model is a meaningful improvement.

**Practical takeaway:** If the two models are similar, prefer ICM for simplicity. If LCM is substantially better, the added complexity is justified.

### Computational Cost

Summarize trade-offs:

In [None]:
# PyMC stores wall-clock sampling time as an attribute of sample_stats
icm_time = trace_icm.sample_stats.attrs["sampling_time"]
lcm_time = trace_lcm.sample_stats.attrs["sampling_time"]

comp_df = pl.DataFrame({
    "Model": ["ICM", "LCM"],
    "Sampling (s)": [icm_time, lcm_time],
    "LOO": [float(loo_icm.elpd_loo), float(loo_lcm.elpd_loo)]
})
print(comp_df)

### Summary: Multi-Output GPs

We've covered:

1. **ARD**: Different lengthscales per input dimension for feature selection
2. **ICM**: Model related outputs jointly with coregionalization matrix $B = WW^T + \text{diag}(\kappa)$
3. **LCM**: Extend to multiple timescales by summing ICM kernels

**When to use:**
- **ARD**: Always for multidimensional inputs
- **ICM**: Related outputs needing information sharing
- **LCM**: Domain knowledge suggests multiple processes

Next, we integrate these patterns with HSGP and hierarchical structure in our soccer case study.

### From Finance to Fútbol: Factor Models

In finance, the Fama-French model asks: Is a fund manager skilled, or just lucky with market exposure?

Similarly for soccer: Is a player elite, or do they benefit from strong teammates and weak opponents?

Our factor model:
$$p_i = \sigma(\alpha_i + \mathbf{X}_i \boldsymbol{\beta})$$

Where:
- $\alpha_i$: Player skill (our main interest)
- $\mathbf{X}_i \boldsymbol{\beta}$: Team context effects
- $\sigma$: Sigmoid function
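
To see how the factor model separates skill from context, the sketch below evaluates the formula for one player in three match situations. The skill value, coefficients, and factor values are all made up for illustration; they are not estimates from the data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

alpha_i = -1.5                         # hypothetical player skill on the log-odds scale
beta = np.array([0.3, 0.2, 0.4])       # hypothetical coefficients for three factors

# Entries: [home_pitch, points_diff, goal_balance_diff]; values are made up
X_favourable = np.array([1.0, 1.0, 1.0])
X_neutral = np.array([0.0, 0.0, 0.0])
X_hostile = np.array([0.0, -1.0, -1.0])

for label, x in [("favourable", X_favourable), ("neutral", X_neutral), ("hostile", X_hostile)]:
    p = sigmoid(alpha_i + x @ beta)
    print(f"{label:>10}: P(goal) = {p:.3f}")
```

The same $\alpha_i$ produces different scoring probabilities depending on context, which is why raw goal rates alone cannot identify skill.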

### Factor Engineering

We engineer 8 factors from pre-match data:
- `goalsscored_diff`: Current goals difference
- `points_diff`: Recent form
- `goal_balance_diff`: Overall strength disparity
- ... and 5 more

For this demo, we focus on 3 that capture key dimensions:
1. **`home_pitch`**: Home advantage
2. **`points_diff`**: Recent form/momentum
3. **`goal_balance_diff`**: Team strength disparity

## Part B: Coregionalization

ICM uses Hadamard product:

$$k([\mathbf{x}, i], [\mathbf{x}', j]) = k_{\text{input}}(\mathbf{x}, \mathbf{x}') \times k_{\text{coreg}}(i, j)$$

### Generate 3 Related Time Series

In [None]:
n_times, n_outputs = 40, 3
t = np.linspace(0, 10, n_times)
f1 = np.sin(t) + RNG.normal(0, 0.2, n_times)
f2 = np.sin(t) + 0.5*np.cos(2*t) + RNG.normal(0, 0.2, n_times)
f3 = -np.cos(t) + RNG.normal(0, 0.2, n_times)

X_mogp = np.column_stack([np.tile(t, n_outputs), np.repeat([0,1,2], n_times)])
y_mogp = np.concatenate([f1, f2, f3])

print(f"Multi-output: {X_mogp.shape[0]} obs")

### Fit Coregionalized GP

In [None]:
with pm.Model() as mogp_model:
    ls_time = pm.Gamma("ls_time", alpha=2, beta=1)
    eta = pm.HalfNormal("eta", sigma=2)
    cov_time = eta**2 * pm.gp.cov.ExpQuad(input_dim=2, ls=ls_time, active_dims=[0])
    
    W = pm.Normal("W", mu=0, sigma=1, shape=(n_outputs, 2))
    kappa = pm.Gamma("kappa", alpha=1.5, beta=1, shape=n_outputs)
    B = pm.Deterministic("B", pt.dot(W, W.T) + pt.diag(kappa))
    cov_out = pm.gp.cov.Coregion(input_dim=2, W=W, kappa=kappa, active_dims=[1])
    
    cov_total = cov_time * cov_out
    gp = pm.gp.Marginal(cov_func=cov_total)
    sigma = pm.HalfNormal("sigma", sigma=0.5)
    y_ = gp.marginal_likelihood("y", X=X_mogp, y=y_mogp, sigma=sigma)
    trace_mogp = pm.sample(1000, tune=1000, random_seed=RANDOM_SEED, chains=2)

### Learned Covariance Matrix

In [None]:
B_post = az.extract(trace_mogp, var_names=["B"]).mean(dim="sample").values
fig = go.Figure(data=go.Heatmap(z=B_post, colorscale='RdBu', zmid=0))
fig.update_layout(title="Output Covariance B", height=400)
fig.show()

Outputs 1-2: high covariance → correlated  
Output 3: lower covariance → independent

## Part C: Soccer Player Skill Modeling

### Challenge: Identify True Skill

Account for team strength, opponent quality, context, varying sample sizes.

### Hierarchical Logistic Regression

$$P(\text{goal}_{ij} = 1) = \text{logit}^{-1}(\alpha_i + \mathbf{X}_{ij}^T\boldsymbol{\beta})$$

### Load Data

In [None]:
df = pl.read_csv("../data/SFM_data_byPlayer_clean.csv")
n_players = df.select(pl.col("name_player")).unique().height
goal_rate = df.select(pl.col("goal").mean()).item()

print(f"{df.shape[0]} observations, {n_players} players")
print(f"Goal rate: {goal_rate:.1%}")

### Factors: Context Variables

1. home_pitch: Home advantage
2. points_diff: Recent form
3. goal_balance_diff: Team vs opponent strength

### Visualize Factor Effects

In [None]:
factors = ["home_pitch", "points_diff", "goal_balance_diff"]
fig = make_subplots(rows=1, cols=3, subplot_titles=factors)

for i, factor in enumerate(factors, 1):
    binned = (df.with_columns([
        pl.col(factor).cut(breaks=[-1, 0, 1],
                          labels=["Low","Mid-Low","Mid-High","High"]).alias("bin")
    ]).group_by("bin").agg([
        pl.col("goal").mean().alias("rate")
    ]).sort("bin"))
    
    fig.add_trace(go.Bar(x=binned["bin"].to_list(), 
                        y=binned["rate"].to_list()), row=1, col=i)

fig.update_layout(title="Goal Rate by Factor", showlegend=False, height=350)
fig.show()

### Prepare Data for Modeling

In [None]:
factor_cols = ["home_pitch", "points_diff", "goal_balance_diff"]
X_factors = df.select(factor_cols).to_numpy().astype(np.float64)
y_goals = df.select("goal").to_numpy().flatten().astype(int)

player_names = df.select("name_player").unique().sort("name_player")["name_player"].to_list()
player_idx_map = {name: i for i, name in enumerate(player_names)}
player_idx = df.select(pl.col("name_player").replace(player_idx_map).cast(pl.Int64)).to_numpy().flatten()

n_players, n_factors = len(player_names), len(factor_cols)
print(f"{len(y_goals)} obs, {n_players} players, {n_factors} factors")

### Fit Hierarchical Model

Partial pooling: data-scarce players regularized toward population mean.

In [None]:
def logit(p): 
    return np.log(p / (1 - p))

with pm.Model() as sfm_model:
    mu_alpha = pm.Normal("mu_alpha", mu=logit(goal_rate), sigma=1)
    sigma_alpha = pm.HalfNormal("sigma_alpha", sigma=0.5)
    alpha = pm.Normal("alpha", mu=mu_alpha, sigma=sigma_alpha, shape=n_players)
    
    beta = pm.Normal("beta", mu=0, sigma=1, shape=n_factors)
    
    eta = alpha[player_idx] + pm.math.dot(X_factors, beta)
    
    y_obs = pm.Bernoulli("y_obs", logit_p=eta, observed=y_goals)
    
    trace_sfm = pm.sample(1000, tune=1000, random_seed=RANDOM_SEED, 
                         target_accept=0.9, chains=2)

### Convergence Diagnostics

In [None]:
summary = az.summary(trace_sfm, var_names=["alpha","beta","mu_alpha","sigma_alpha"])
print(f"R-hat: [{summary['r_hat'].min():.4f}, {summary['r_hat'].max():.4f}]")
print(f"ESS: [{summary['ess_bulk'].min():.0f}, {summary['ess_bulk'].max():.0f}]")
print("\nFactor coefficients:")
print(summary.filter(like="beta", axis=0)[["mean","sd","hdi_3%","hdi_97%"]])

### Extract Player Skills

In [None]:
alpha_post = az.extract(trace_sfm, var_names=["alpha"])
alpha_means = alpha_post.mean(dim="sample").values

results = pl.DataFrame({
    "player": player_names,
    "skill_mean": alpha_means,
    "skill_lower": np.percentile(alpha_post.values, 2.5, axis=1),
    "skill_upper": np.percentile(alpha_post.values, 97.5, axis=1)
}).sort("skill_mean", descending=True)

print("Top 10:")
print(results.head(10))

### Visualize with Uncertainty

In [None]:
results_sorted = results.sort("skill_mean", descending=False)

fig = go.Figure()
fig.add_trace(go.Scatter(
    y=results_sorted["player"].to_list(),
    x=results_sorted["skill_mean"].to_list(),
    error_x=dict(type='data', symmetric=False,
                array=(results_sorted["skill_upper"]-results_sorted["skill_mean"]).to_list(),
                arrayminus=(results_sorted["skill_mean"]-results_sorted["skill_lower"]).to_list()),
    mode='markers', marker=dict(size=8, color='steelblue')
))
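# Skills are on the log-odds scale, so the population average is mu_alpha, not 0
avg_skill = float(trace_sfm.posterior["mu_alpha"].mean())
fig.add_vline(x=avg_skill, line_dash="dash", annotation_text="Avg")
fig.update_layout(title="Player Skills (α)", xaxis_title="Skill (log-odds)", height=700)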
fig.show()

### Interpretation

- Skill hierarchy: Top players consistently better
- Uncertainty varies: More data → narrower intervals
- Overlap matters: Can't confidently rank when intervals overlap
- Context-adjusted: Fair comparison across situations
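
When intervals overlap, a more direct question is how often one player's skill exceeds another's across posterior draws. The sketch below uses synthetic draws as stand-ins for two players' posteriors (the means and spreads are invented). With a real trace, you would compare the two players' `alpha` values within the same draw.

```python
import numpy as np

# Synthetic stand-ins for posterior draws of two players' skills (log-odds scale)
rng = np.random.default_rng(4)
alpha_a = rng.normal(loc=-1.2, scale=0.15, size=4000)   # many matches: tight posterior
alpha_b = rng.normal(loc=-1.3, scale=0.40, size=4000)   # few matches: wide posterior

def interval(draws, mass=0.95):
    """Equal-tailed credible interval containing the given probability mass."""
    tail = 100 * (1 - mass) / 2
    return np.percentile(draws, [tail, 100 - tail])

lo_a, hi_a = interval(alpha_a)
lo_b, hi_b = interval(alpha_b)
overlap = (lo_a <= hi_b) and (lo_b <= hi_a)

# Direct comparison: share of draws in which A's skill exceeds B's
p_a_better = np.mean(alpha_a > alpha_b)
print(f"intervals overlap: {overlap}, P(A > B) = {p_a_better:.2f}")
```

A probability near 0.5 means the ranking is essentially undetermined, even if the posterior means differ.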

### Factor Coefficients

In [None]:
beta_post = az.extract(trace_sfm, var_names=["beta"])
factor_res = pl.DataFrame({
    "factor": factor_cols,
    "coef": beta_post.mean(dim="sample").values,
    "lower": np.percentile(beta_post.values, 2.5, axis=1),
    "upper": np.percentile(beta_post.values, 97.5, axis=1)
})

fig = go.Figure()
fig.add_trace(go.Scatter(x=factor_res["coef"].to_list(), 
                        y=factor_res["factor"].to_list(),
                        error_x=dict(type='data', symmetric=False,
                                    array=(factor_res["upper"]-factor_res["coef"]).to_list(),
                                    arrayminus=(factor_res["coef"]-factor_res["lower"]).to_list()),
                        mode='markers', marker=dict(size=12, color='coral')))
fig.add_vline(x=0, line_dash="dash")
fig.update_layout(title="Factor Effects (β)", xaxis_title="Coefficient", height=300)
fig.show()

print(factor_res)

goal_balance_diff: strongest (team quality matters!)  
home_pitch: positive home advantage  
points_diff: form correlates with scoring

### 🤖 LLM Exercise

Extend the model with temporal dynamics.

In [None]:
# 🤖 EXERCISE: Add time-varying skills

def extend_with_hsgp():
    """
    Add HSGP for player skills over seasons.
    
    Prompt: "I have hierarchical logistic regression (Bernoulli, 
    alpha player effects, beta factors). Make alpha_i vary over 
    seasons with HSGP. Help me: 1) Define HSGP over seasons,  
    2) Integrate with model, 3) Update predictor. PyMC code."
    """
    pass

print("🎯 Extend SFM with time-varying HSGP skills")

### Extensions

1. Temporal dynamics: Aging curves, form  
2. More factors: Defensive rating, rest, injuries
3. Multi-level: Group by position
4. Predictive checks: Simulate vs holdout
5. Decision-making: Transfer value

## Workshop Summary

### Sessions 1-4 Journey

**Session 1**: Foundations (Bayesian inference, MVN→GP, kernels)  
**Session 2**: Model Building (kernel composition, likelihoods)  
**Session 3**: Scaling (O(n³), sparse, HSGP)  
**Session 4**: Applications (multi-output, hierarchical, real case study)

### GP Mindset

1. Flexibility: Adapt to data
2. Uncertainty: Full posteriors
3. Interpretability: Clear meanings
4. Composability: Complex from simple
5. Scalability: Modern approximations

### Next Steps

- PyMC docs: https://www.pymc.io/
- Rasmussen & Williams book (free online)
- PyMC examples and Discourse
- Apply to your data!

**Final thought**: GPs encode smoothness assumptions and let the data speak. You're equipped for real-world problems!

### Acknowledgments

PyMC team, Alex Andorra & Max Goebel (soccer case), Danh Phan, Bill Engels, Chris Fonnesbeck (multi-output GPs).

Materials for educational use under open-source licenses.

### Temporal Complexity: Three Timescales

Player scoring ability varies across multiple timescales:

1. **Match-to-match (2-5 matchdays)**: Injury recovery, tactical changes, streaks
2. **Within-season form (15-25 matchdays)**: Confidence, fitness, team chemistry
3. **Career trajectory (2-6 seasons)**: Aging curve, experience, skill development/decline

We model all three simultaneously using **additive HSGP components**.

### HSGP Hyperparameter Selection

For each timescale, we choose `m` (basis functions) and `c` (boundary extension).

Key logic:
- Smaller lengthscales require larger `m` (more basis functions)
- Longer input ranges require larger `c` (avoid boundary artifacts)
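
The effect of `m` can be checked numerically. The sketch below is a from-scratch NumPy illustration of the Hilbert-space approximation for a one-dimensional, unit-variance ExpQuad kernel, not PyMC's `HSGP` implementation; the function `hsgp_expquad` and the chosen values of `ls`, `m`, and `c` are assumptions for demonstration.

```python
import numpy as np

def expquad(x, ls):
    return np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / ls**2)

def hsgp_expquad(x, ls, m, c):
    """Approximate a unit-variance 1D ExpQuad kernel with m Laplacian basis functions."""
    x = x - (x.min() + x.max()) / 2            # center the inputs
    L = c * np.max(np.abs(x))                  # boundary: c times the half-range
    j = np.arange(1, m + 1)
    sqrt_lam = j * np.pi / (2 * L)             # square roots of the eigenvalues
    phi = np.sqrt(1 / L) * np.sin(sqrt_lam * (x[:, None] + L))   # (n, m) basis
    # Spectral density of the ExpQuad kernel evaluated at each frequency
    spd = np.sqrt(2 * np.pi) * ls * np.exp(-0.5 * (ls * sqrt_lam) ** 2)
    return (phi * spd) @ phi.T

x = np.linspace(-1.0, 1.0, 50)
K_exact = expquad(x, ls=0.3)
for m in (5, 40):
    err = np.max(np.abs(hsgp_expquad(x, ls=0.3, m=m, c=1.5) - K_exact))
    print(f"m={m:2d}: max abs error = {err:.4f}")
```

With a short lengthscale relative to the input range, a handful of basis functions cannot represent the kernel, while a larger `m` brings the approximation close to the exact covariance. Rerunning with a longer `ls` shows that far fewer basis functions are then needed.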

In [None]:
# Check preliz availability for maximum entropy priors
try:
    import preliz as pz
    HAVE_PRELIZ = True
    print("✓ preliz available")
except ImportError:
    HAVE_PRELIZ = False
    print("preliz not available - will use manual priors")

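We use **maximum entropy priors** to encode domain knowledge about lengthscales. We specify lower and upper bounds, and preliz finds the parameters of the chosen family (here InverseGamma) that maximize entropy while placing most of the probability mass between those bounds.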

In [None]:
if HAVE_PRELIZ:
    # Short-term: 2-5 matchdays
    ls_short_dist, _ = pz.maxent(pz.InverseGamma(), 2, 5)
    print(f"Short: InverseGamma(α={ls_short_dist.alpha:.2f}, β={ls_short_dist.beta:.2f})")
    
    # Medium-term: 15-25 matchdays
    ls_medium_dist, _ = pz.maxent(pz.InverseGamma(), 15, 25)
    print(f"Medium: InverseGamma(α={ls_medium_dist.alpha:.2f}, β={ls_medium_dist.beta:.2f})")
    
    # Long-term: 2-6 seasons
    ls_long_dist, _ = pz.maxent(pz.InverseGamma(), 2, 6)
    print(f"Long: InverseGamma(α={ls_long_dist.alpha:.2f}, β={ls_long_dist.beta:.2f})")
    
    # Visualize
    fig = make_subplots(rows=1, cols=3, subplot_titles=["Short", "Medium", "Long"])
    x_short = np.linspace(0.1, 10, 100)
    x_medium = np.linspace(5, 35, 100)
    x_long = np.linspace(0.5, 10, 100)
    
    fig.add_trace(go.Scatter(x=x_short, y=stats.invgamma.pdf(x_short, ls_short_dist.alpha, scale=ls_short_dist.beta), mode='lines'), row=1, col=1)
    fig.add_trace(go.Scatter(x=x_medium, y=stats.invgamma.pdf(x_medium, ls_medium_dist.alpha, scale=ls_medium_dist.beta), mode='lines'), row=1, col=2)
    fig.add_trace(go.Scatter(x=x_long, y=stats.invgamma.pdf(x_long, ls_long_dist.alpha, scale=ls_long_dist.beta), mode='lines'), row=1, col=3)
    
    fig.update_xaxes(title_text="Lengthscale", row=1, col=1)
    fig.update_xaxes(title_text="Lengthscale", row=1, col=2)
    fig.update_xaxes(title_text="Lengthscale", row=1, col=3)
    fig.update_layout(title="Lengthscale Priors for Three Timescales", showlegend=False, height=300)
    fig.show()
else:
    print("Using manual priors")
    ls_short_alpha, ls_short_beta = 3.0, 9.0
    ls_medium_alpha, ls_medium_beta = 3.0, 60.0
    ls_long_alpha, ls_long_beta = 3.0, 12.0
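
The manual fallback values can be sanity-checked against the ranges they are meant to cover. The sketch below uses SciPy to compute how much prior mass each fallback InverseGamma places inside its target interval. The dictionary layout and variable names are introduced here for illustration only.

```python
from scipy import stats

# Fallback InverseGamma(alpha, beta) priors and the ranges they are meant to cover
manual_priors = {
    "short": (3.0, 9.0, (2, 5)),
    "medium": (3.0, 60.0, (15, 25)),
    "long": (3.0, 12.0, (2, 6)),
}

masses = {}
for name, (alpha, beta, (lower, upper)) in manual_priors.items():
    # SciPy's invgamma takes the shape first and beta as the scale
    dist = stats.invgamma(alpha, scale=beta)
    masses[name] = dist.cdf(upper) - dist.cdf(lower)
    print(f"{name}: P({lower} <= ls <= {upper}) = {masses[name]:.2f}")
```

These fallback priors leave a substantial share of their mass outside the target ranges, so they are more diffuse than the maximum entropy fits and should be read as weakly informative defaults.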

Now use PyMC's helper to determine `m` and `c`:

In [None]:
# Within-season (matchdays 1-38)
m_within, c_within = pm.gp.hsgp_approx.approx_hsgp_hyperparams(
    x_range=[0, 38],
    lengthscale_range=[5, 25],  # smallest and largest lengthscale to resolve
    cov_func="matern52"
)
print(f"Within-season: m={m_within}, c={c_within:.2f}")

# Across-season
max_season = df.select(pl.col("season_id").max()).item()
m_long, c_long = pm.gp.hsgp_approx.approx_hsgp_hyperparams(
    x_range=[0, max_season],
    lengthscale_range=[2, 6],
    cov_func="matern52"
)
print(f"Across-season: m={m_long}, c={c_long:.2f}")

A larger `m` accommodates smaller lengthscales (more flexibility). A larger `c` extends the boundary, which longer lengthscales need in order to reduce edge effects.
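
To see both effects numerically, the sketch below builds the kernel implied by the standard HSGP sine basis on `[-L, L]` (with `L = c * S`, where `S` is the half-range of the centered inputs) and compares it with the exact Matérn 5/2 kernel. This is a hand-rolled NumPy illustration, not PyMC's implementation, and the `m`, `c`, and lengthscale values are assumptions chosen to make the contrast visible.

```python
import numpy as np

def matern52(r, ls):
    """Exact Matern 5/2 kernel (unit amplitude) at distance r."""
    a = np.sqrt(5.0) * np.abs(r) / ls
    return (1.0 + a + a**2 / 3.0) * np.exp(-a)

def matern52_psd(omega, ls):
    """Power spectral density of the 1-D Matern 5/2 kernel (unit amplitude)."""
    return 16.0 * 5.0**2.5 / (3.0 * ls**5) * (5.0 / ls**2 + omega**2) ** -3

def hsgp_kernel(x, m, c, ls):
    """Kernel matrix implied by m sine basis functions on [-L, L], L = c * S."""
    xc = x - (x.min() + x.max()) / 2.0   # center the inputs
    S = np.abs(xc).max()                 # half-range of the data
    L = c * S
    sqrt_lam = np.arange(1, m + 1) * np.pi / (2.0 * L)
    phi = np.sin(sqrt_lam * (xc[:, None] + L)) / np.sqrt(L)
    return (phi * matern52_psd(sqrt_lam, ls)) @ phi.T

def max_error(x, m, c, ls):
    """Largest absolute gap between the approximate and exact kernel."""
    exact = matern52(x[:, None] - x[None, :], ls)
    return np.abs(hsgp_kernel(x, m, c, ls) - exact).max()

x = np.arange(1, 39, dtype=float)

# Short lengthscale: too few basis functions cannot resolve it
err_few_basis = max_error(x, m=5, c=1.5, ls=5.0)
err_many_basis = max_error(x, m=40, c=1.5, ls=5.0)

# Long lengthscale: a tight boundary distorts the kernel near the edges
err_tight_boundary = max_error(x, m=40, c=1.2, ls=25.0)
err_wide_boundary = max_error(x, m=40, c=4.0, ls=25.0)

print(f"ls=5:  m=5  -> {err_few_basis:.3f},  m=40 -> {err_many_basis:.3f}")
print(f"ls=25: c=1.2 -> {err_tight_boundary:.3f}, c=4.0 -> {err_wide_boundary:.3f}")
```

Note that raising `c` spreads the same `m` basis functions over a wider interval, so in practice the two settings are chosen together.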

### Building the Full Hierarchical Model

Components:
1. Hierarchical player intercepts (partial pooling)
2. Within-season GP (short + medium timescales)
3. Across-season GP (long-term aging curve)
4. Factor regression (team context)
5. Bernoulli likelihood
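
Written out, these components combine additively on the logit scale. The notation is introduced here for illustration ($a_j$ for player intercepts, $\boldsymbol{\beta}$ for factor slopes, $\mu_a$ and $\sigma_a$ for the population mean and spread of intercepts). It sketches the structure listed above and leaves out the GP hyperpriors.

$$
\begin{aligned}
y_i &\sim \text{Bernoulli}(p_i) \\
\text{logit}(p_i) &= a_{j[i]} + f_{\text{within}}(t_i) + f_{\text{long}}(s_i) + \mathbf{x}_i^\top \boldsymbol{\beta} \\
a_j &\sim \mathcal{N}(\mu_a, \sigma_a^2)
\end{aligned}
$$

Here $j[i]$ is the player for observation $i$, $t_i$ its matchday, $s_i$ its season, and $\mathbf{x}_i$ the standardized team-context factors.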

In [None]:
# Prepare coordinates
players_ordered = df.select("name_player").unique().sort("name_player")["name_player"].to_list()
unique_seasons = sorted(df.select("season_id").unique()["season_id"].to_list())
unique_gamedays = list(range(1, 39))

coords = {
    "player": players_ordered,
    "season": unique_seasons,
    "gameday": unique_gamedays,
    "factor": factors,
    "timescale": ["short", "medium", "long"],
    "obs_id": df.select("index")["index"].to_list()
}

print(f"{len(coords['player'])} players, {len(coords['season'])} seasons, {len(coords['gameday'])} matchdays")

Coordinates enable:
1. Semantic naming (easier to understand)
2. ArviZ plotting and slicing

### Complete PyMC Model with HSGPs

Now we assemble all components. This is complex, so we build step-by-step:

In [None]:
# Prepare data (pandas is used only to build integer category codes)
import pandas as pd

player_idx = pd.Categorical(df["name_player"].to_list(), categories=players_ordered).codes
gameday_idx = pd.Categorical(df["matchday"].to_list(), categories=unique_gamedays).codes

with pm.Model(coords=coords) as enhanced_sfm:
    # Data containers
    factor_data = pm.Data("factor_data", factors_sdz.to_numpy(), dims=("obs_id", "factor"))
    gameday_id = pm.Data("gameday_id", gameday_idx, dims="obs_id")
    player_id = pm.Data("player_id", player_idx, dims="obs_id")
    season_id = pm.Data("season_id", df["season_id"].to_numpy(), dims="obs_id")
    goals_obs = pm.Data("goals_obs", df["goal"].to_numpy(), dims="obs_id")

print("✓ Data containers defined")

In [None]:
with enhanced_sfm:
    # Hierarchical player effects
    if HAVE_PRELIZ:
        player_diversity_dist, _ = pz.maxent(pz.Exponential(), 0.1, 2)
        sigma_player = player_diversity_dist.to_pymc(name="player_diversity")
    else:
        sigma_player = pm.Exponential("player_diversity", lam=1.0)
    
    from scipy.special import logit
    player_effect = pm.Normal(
        "player_effect",
        mu=logit(df["goal"].mean()),
        sigma=sigma_player,
        dims="player"
    )

print("✓ Player effects defined")

In [None]:
with enhanced_sfm:
    # Register 1-D data so it matches the single dim, then add the column axis HSGP expects
    X_gamedays = pm.Data("X_gamedays", np.array(unique_gamedays), dims="gameday")[:, None]
    X_seasons = pm.Data("X_seasons", np.array(unique_seasons), dims="season")[:, None]
    
    # PC prior on amplitude
    alpha_scale, upper_scale = 0.01, 1.1
    amplitude = pm.Exponential(
        "amplitude",
        lam=-np.log(alpha_scale) / upper_scale,
        dims="timescale"
    )
    
    if HAVE_PRELIZ:
        ls = pm.InverseGamma(
            "ls",
            alpha=np.array([ls_short_dist.alpha, ls_medium_dist.alpha, ls_long_dist.alpha]),
            beta=np.array([ls_short_dist.beta, ls_medium_dist.beta, ls_long_dist.beta]),
            dims="timescale"
        )
    else:
        ls = pm.InverseGamma(
            "ls",
            alpha=np.array([ls_short_alpha, ls_medium_alpha, ls_long_alpha]),
            beta=np.array([ls_short_beta, ls_medium_beta, ls_long_beta]),
            dims="timescale"
        )
    
    # Covariances
    cov_short = amplitude[0]**2 * pm.gp.cov.Matern52(input_dim=1, ls=ls[0])
    cov_medium = amplitude[1]**2 * pm.gp.cov.Matern52(input_dim=1, ls=ls[1])
    cov_within = cov_short + cov_medium
    cov_long = amplitude[2]**2 * pm.gp.cov.Matern52(input_dim=1, ls=ls[2])
    
    # Within-season GP
    gp_within = pm.gp.HSGP(m=[m_within], c=c_within, cov_func=cov_within, drop_first=True)
    f_within = gp_within.prior("f_within", X=X_gamedays, dims="gameday")
    
    # Across-season GP
    gp_long = pm.gp.HSGP(m=[m_long], c=c_long, cov_func=cov_long, drop_first=True)
    f_long = gp_long.prior("f_long", X=X_seasons, dims="season")

print("✓ HSGPs defined")
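
The `amplitude` prior above sets the Exponential rate so that only a small tail probability lies beyond an upper bound. The short check below, a sketch using SciPy, confirms that `lam = -log(alpha_scale) / upper_scale` gives `P(amplitude > upper_scale) = alpha_scale`.

```python
import numpy as np
from scipy import stats

alpha_scale, upper_scale = 0.01, 1.1

# Exponential tail: P(amplitude > U) = exp(-lam * U) = alpha  =>  lam = -log(alpha) / U
lam = -np.log(alpha_scale) / upper_scale

# SciPy parameterizes the exponential by scale = 1 / rate
tail_prob = stats.expon(scale=1.0 / lam).sf(upper_scale)
print(f"lam = {lam:.3f}, P(amplitude > {upper_scale}) = {tail_prob:.4f}")
```

Because the same rate is shared across the `timescale` dimension, all three components get the same prior belief that their amplitude rarely exceeds 1.1 on the logit scale.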

In [None]:
with enhanced_sfm:
    # Combine effects
    alpha = pm.Deterministic(
        "alpha",
        player_effect[player_id] + f_within[gameday_id] + f_long[season_id],
        dims="obs_id"
    )
    
    # Factor slopes
    slope = pm.Normal("slope", sigma=0.25, dims="factor")
    
    # Probability
    p = pm.Deterministic(
        "p",
        pm.math.sigmoid(alpha + pm.math.dot(factor_data, slope)),
        dims="obs_id"
    )
    
    # Likelihood
    pm.Bernoulli("goals_scored", p=p, observed=goals_obs, dims="obs_id")

print("✓ Complete model defined")
pm.model_to_graphviz(enhanced_sfm)

### Prior Predictive Checks

Sample from the prior to verify it's reasonable:

In [None]:
with enhanced_sfm:
    idata_enhanced = pm.sample_prior_predictive(random_seed=RNG)

In [None]:
fig = go.Figure()
prior_p = idata_enhanced.prior.p.values.flatten()
fig.add_trace(go.Histogram(x=prior_p, nbinsx=50, name="Prior"))
fig.update_layout(
    title="Prior Scoring Rate Distribution",
    xaxis_title="Probability",
    yaxis_title="Count",
    height=350
)
fig.show()
print(f"Prior mean: {prior_p.mean():.3f}")

### Sampling the Posterior

This will take several minutes despite using Nutpie for faster sampling:

In [None]:
with enhanced_sfm:
    idata_enhanced.extend(
        pm.sample(
            nuts_sampler="nutpie",
            random_seed=RNG,
            target_accept=0.95
        )
    )

### Convergence Diagnostics

In [None]:
# ESS quantiles
ess = az.ess(idata_enhanced.posterior)
ess_summary = ess.quantile([0.01, 0.5, 0.99]).to_dataframe().astype(int)
print("ESS quantiles:")
print(ess_summary)

# R-hat
rhat = az.rhat(idata_enhanced.posterior)
rhat_summary = rhat.quantile([0.01, 0.5, 0.99]).to_dataframe()
print("\nR-hat quantiles:")
print(rhat_summary)
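
When reading the quantile tables, R-hat values close to 1 indicate that chains agree, and larger values signal poor mixing. As a rough illustration of what the statistic measures, here is the classic split R-hat written in NumPy. ArviZ's default is a rank-normalized variant, so its numbers can differ slightly, and the simulated chains below are assumptions for demonstration only.

```python
import numpy as np

def split_rhat(draws):
    """Classic split R-hat for an array of shape (chain, draw)."""
    n_chain, n_draw = draws.shape
    half = n_draw // 2
    # Split each chain in two so within-chain drift shows up as between-chain variance
    split = np.concatenate([draws[:, :half], draws[:, half:2 * half]], axis=0)
    n = split.shape[1]
    within = split.var(axis=1, ddof=1).mean()
    between = n * split.mean(axis=1).var(ddof=1)
    var_hat = (n - 1) / n * within + between / n
    return np.sqrt(var_hat / within)

rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 1000))       # four chains sampling the same target
stuck = mixed + np.arange(4)[:, None]    # four chains stuck at different locations

rhat_mixed = split_rhat(mixed)
rhat_stuck = split_rhat(stuck)
print(f"Well-mixed chains: {rhat_mixed:.3f}")
print(f"Disagreeing chains: {rhat_stuck:.3f}")
```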

In [None]:
# Energy plot (ArviZ draws this with matplotlib rather than plotly)
import matplotlib.pyplot as plt
az.plot_energy(idata_enhanced)
plt.tight_layout()
plt.show()

In [None]:
with enhanced_sfm:
    idata_enhanced.extend(
        pm.sample_posterior_predictive(
            idata_enhanced,
            random_seed=RNG
        )
    )

### Posterior Predictive Checks

In [None]:
# Overall goal rate
fig = go.Figure()

# Observed
obs_rate = df["goal"].mean()
fig.add_vline(x=obs_rate, line_dash="dash", line_color="red",
              annotation_text="Observed", annotation_position="top")

# Posterior predictive
ppc_goals = idata_enhanced.posterior_predictive["goals_scored"].values
ppc_rate = ppc_goals.mean(axis=(0,1,2))
fig.add_trace(go.Histogram(x=ppc_goals.mean(axis=2).flatten(), nbinsx=50,
                           name="Posterior Predictive"))

fig.update_layout(
    title="Goal Rate: Observed vs Posterior Predictive",
    xaxis_title="Mean Goal Rate",
    height=350
)
fig.show()
print(f"Observed: {obs_rate:.3f}")
print(f"Posterior predictive: {ppc_rate:.3f}")

### Posterior GP Curves

Visualize learned temporal patterns:

In [None]:
# Extract GP posteriors
f_within_post = idata_enhanced.posterior["f_within"]
f_long_post = idata_enhanced.posterior["f_long"]

# Plot
fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=["Within-Season Variation", "Across-Season Variation"]
)

# Within-season
f_within_mean = f_within_post.mean(dim=["chain", "draw"]).values
f_within_lower = np.percentile(f_within_post.values, 2.5, axis=(0,1))
f_within_upper = np.percentile(f_within_post.values, 97.5, axis=(0,1))

fig.add_trace(go.Scatter(
    x=unique_gamedays, y=f_within_upper,
    line=dict(width=0), showlegend=False
), row=1, col=1)
fig.add_trace(go.Scatter(
    x=unique_gamedays, y=f_within_lower,
    fill='tonexty', line=dict(width=0),
    fillcolor='rgba(0,100,250,0.2)', showlegend=False
), row=1, col=1)
fig.add_trace(go.Scatter(
    x=unique_gamedays, y=f_within_mean,
    mode='lines', line=dict(color='steelblue', width=2),
    showlegend=False
), row=1, col=1)

# Across-season
f_long_mean = f_long_post.mean(dim=["chain", "draw"]).values
f_long_lower = np.percentile(f_long_post.values, 2.5, axis=(0,1))
f_long_upper = np.percentile(f_long_post.values, 97.5, axis=(0,1))

fig.add_trace(go.Scatter(
    x=unique_seasons, y=f_long_upper,
    line=dict(width=0), showlegend=False
), row=1, col=2)
fig.add_trace(go.Scatter(
    x=unique_seasons, y=f_long_lower,
    fill='tonexty', line=dict(width=0),
    fillcolor='rgba(0,100,250,0.2)', showlegend=False
), row=1, col=2)
fig.add_trace(go.Scatter(
    x=unique_seasons, y=f_long_mean,
    mode='lines', line=dict(color='steelblue', width=2),
    showlegend=False
), row=1, col=2)

fig.update_xaxes(title_text="Matchday", row=1, col=1)
fig.update_xaxes(title_text="Season", row=1, col=2)
fig.update_yaxes(title_text="Goal Effect", row=1, col=1)
fig.update_layout(title="Posterior GP Effects", height=400)
fig.show()

The within-season GP shows how form fluctuates across matchdays. The across-season GP shows the long-term trend across seasons, which is where an aging curve (young players improving, veterans declining) would appear.

### Enhanced Results

Factor effects with the full temporal model:

In [None]:
# Factor slopes
slope_post = az.extract(idata_enhanced.posterior, var_names=["slope"])
slope_summary = slope_post.mean(dim="sample").values

factor_results = pl.DataFrame({
    "factor": factors,
    "mean": slope_summary,
    "lower": np.percentile(slope_post.values, 2.5, axis=1),
    "upper": np.percentile(slope_post.values, 97.5, axis=1)
})

# Plot
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=factor_results["mean"].to_list(),
    y=factor_results["factor"].to_list(),
    error_x=dict(
        type='data',
        symmetric=False,
        array=(factor_results["upper"] - factor_results["mean"]).to_list(),
        arrayminus=(factor_results["mean"] - factor_results["lower"]).to_list()
    ),
    mode='markers',
    marker=dict(size=12, color='coral')
))
fig.add_vline(x=0, line_dash="dash")
fig.update_layout(
    title="Factor Effects (Enhanced Model)",
    xaxis_title="Coefficient",
    height=300
)
fig.show()

print(factor_results)

### Summary: Complete Session 4

We've completed a comprehensive journey:

**Part A: Multi-Output GPs**
- ARD for automatic feature selection
- ICM for modeling correlated outputs
- LCM for multiple timescales
- Real baseball data with 5 pitchers

**Part B: Advanced Case Study**
- Hierarchical logistic regression
- Factor model for skill attribution
- HSGP for scalability
- Three timescales (short/medium/long)
- Maximum entropy priors
- Comprehensive diagnostics

**Key Takeaways:**
1. Multi-output GPs enable information sharing across related outputs
2. HSGP makes GPs practical for real datasets
3. Hierarchical structure provides partial pooling
4. Temporal modeling captures evolution of player skills
5. Factor models decompose skill from context

You now have the tools to apply sophisticated GP models to your own problems!