[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/rpjena/random_matrix/blob/main/orthogonal_basket_attribution.ipynb)

# Portfolio PnL Attribution & Risk Decomposition Along Orthogonalized Stock Baskets

## Motivation

Given a **long/short portfolio**, we want to understand its performance, PnL attribution, and risk exposure along **thousands of long-only stock baskets** (e.g., sector baskets, thematic baskets, style baskets).

### The Problem with Raw Baskets

Raw baskets are correlated with the broad market (e.g., S&P 500). A portfolio that appears to have large exposure to "Tech" may simply have market beta. To isolate *idiosyncratic* basket exposures, we **orthogonalize** each basket's returns against its corresponding market index.

### Approach

1. **Construct basket returns** from constituent weights
2. **Orthogonalize** each basket against its market index via OLS regression, keeping the residual
3. **Regress** the portfolio's daily returns on the orthogonalized basket returns to get exposures
4. **Attribute PnL** = exposure × orthogonalized basket return (daily, cumulative)
5. **Decompose risk** via covariance-based marginal/absolute contribution to risk
6. **Present results** with clean formatting

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import seaborn as sns
from scipy import linalg as sla
from IPython.display import display, HTML

sns.set_theme(style='whitegrid', font_scale=1.1)
pd.options.display.float_format = '{:,.4f}'.format
np.random.seed(42)

## 1. Simulated Data

We simulate:
- A universe of **200 stocks** over **504 trading days** (~2 years)
- **50 long-only baskets** (each containing a random subset of stocks)
- Each basket is mapped to a **market index** (here we use a single S&P 500 proxy for simplicity; extend to multiple markets as needed)
- A **long/short portfolio** with positions across these stocks

Replace this section with your own data to use real baskets.
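
If your basket compositions arrive as a long-format table (one row per basket/ticker pair), they can be reshaped into the `basket_weights` dict of Series used below. This is a minimal sketch: the table, its column names (`basket`, `ticker`, `weight`), and the tickers are all hypothetical, and every basket is mapped to a single `'SP500'` index as in the simulation.

```python
import pandas as pd

# Hypothetical long-format constituents table; column names are assumptions.
constituents = pd.DataFrame({
    "basket": ["TECH", "TECH", "TECH", "ENERGY", "ENERGY"],
    "ticker": ["AAA", "BBB", "CCC", "DDD", "EEE"],
    "weight": [2.0, 1.0, 1.0, 3.0, 1.0],
})

basket_weights = {}
for name, grp in constituents.groupby("basket"):
    w = grp.set_index("ticker")["weight"]
    # Normalize so each long-only basket's weights sum to 1
    basket_weights[name] = (w / w.sum()).rename(name)

# One market index per basket; a single index is assumed here
basket_market_map = {name: "SP500" for name in basket_weights}

for name, w in basket_weights.items():
    print(name, w.round(3).to_dict())
```

Normalizing inside the loop keeps each basket long-only with unit gross weight, which is what the return construction in this section assumes.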

In [None]:
# --- Configuration ---
N_STOCKS = 200
T_DAYS = 504
N_BASKETS = 50
BASKET_SIZE_RANGE = (10, 40)  # basket size is drawn from [10, 40): np.random.randint excludes the upper bound
ANNUAL_FACTOR = 252

dates = pd.bdate_range('2023-01-02', periods=T_DAYS, freq='B')
stock_names = [f'STOCK_{i:03d}' for i in range(N_STOCKS)]
basket_names = [f'BASKET_{i:03d}' for i in range(N_BASKETS)]

In [None]:
# --- Simulate market index (S&P 500 proxy) ---
market_daily_mu = 0.08 / ANNUAL_FACTOR
market_daily_vol = 0.16 / np.sqrt(ANNUAL_FACTOR)
market_returns = pd.Series(
    np.random.normal(market_daily_mu, market_daily_vol, T_DAYS),
    index=dates, name='SP500'
)

# --- Simulate individual stock returns ---
# Each stock has a beta to the market plus idiosyncratic noise
betas = np.random.uniform(0.5, 1.5, N_STOCKS)
idio_vols = np.random.uniform(0.15, 0.45, N_STOCKS) / np.sqrt(ANNUAL_FACTOR)
alphas = np.random.normal(0, 0.02 / ANNUAL_FACTOR, N_STOCKS)

stock_returns = pd.DataFrame(index=dates, columns=stock_names, dtype=float)
for i, s in enumerate(stock_names):
    stock_returns[s] = (
        alphas[i]
        + betas[i] * market_returns.values
        + np.random.normal(0, idio_vols[i], T_DAYS)
    )

print(f'Stock returns: {stock_returns.shape}')
stock_returns.head()

In [None]:
# --- Construct long-only baskets with equal weights ---
basket_weights = {}  # dict: basket_name -> Series(stock_name -> weight)
basket_market_map = {}  # dict: basket_name -> market_index_name

for b in basket_names:
    size = np.random.randint(*BASKET_SIZE_RANGE)
    members = np.random.choice(stock_names, size=size, replace=False)
    w = pd.Series(1.0 / size, index=members, name=b)
    basket_weights[b] = w
    basket_market_map[b] = 'SP500'  # all US-based in this example

# Compute basket returns
basket_returns = pd.DataFrame(index=dates, columns=basket_names, dtype=float)
for b in basket_names:
    w = basket_weights[b]
    basket_returns[b] = stock_returns[w.index].values @ w.values

print(f'Basket returns: {basket_returns.shape}')
basket_returns.head()

In [None]:
# --- Simulate a long/short portfolio ---
# Random positions: some long, some short, roughly dollar-neutral
raw_positions = np.random.randn(N_STOCKS)
raw_positions -= raw_positions.mean()  # roughly dollar-neutral
portfolio_weights = pd.Series(raw_positions / np.abs(raw_positions).sum(),
                              index=stock_names, name='portfolio')

portfolio_returns = (stock_returns * portfolio_weights).sum(axis=1)
portfolio_returns.name = 'Portfolio'

print(f'Portfolio gross leverage: {portfolio_weights.abs().sum():.2f}')
print(f'Portfolio net exposure:   {portfolio_weights.sum():.4f}')
print(f'Portfolio ann. return:    {portfolio_returns.mean() * ANNUAL_FACTOR:.2%}')
print(f'Portfolio ann. vol:       {portfolio_returns.std() * np.sqrt(ANNUAL_FACTOR):.2%}')

## 2. Orthogonalize Baskets Against Their Market Index

For each basket $b$ with market index $m_b$, we run:

$$R^{\text{basket}}_b(t) = \alpha_b + \beta_b \cdot R^{\text{market}}_{m_b}(t) + \varepsilon_b(t)$$

The **orthogonalized basket return** is the residual $\varepsilon_b(t)$, which captures the basket's return *after removing market exposure*.

We store the regression statistics (alpha, beta, R-squared) for diagnostics.
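
When every basket maps to the same index, the regression above has a closed form, $\beta_b = \operatorname{cov}(R_b, R_m) / \operatorname{var}(R_m)$, and all baskets can be residualized at once. The sketch below assumes a single market series and synthetic data; `orthogonalize_single_market` is a hypothetical helper, not part of the notebook's code.

```python
import numpy as np

def orthogonalize_single_market(basket_ret, market_ret):
    """Residualize every column of basket_ret (T x B) against one market series (T,)."""
    m = market_ret - market_ret.mean()            # demeaned market returns
    y = basket_ret - basket_ret.mean(axis=0)      # demeaned basket returns
    beta = (m @ y) / (m @ m)                      # OLS slope per basket, shape (B,)
    alpha = basket_ret.mean(axis=0) - beta * market_ret.mean()
    resid = basket_ret - alpha - np.outer(market_ret, beta)
    return resid, alpha, beta

rng = np.random.default_rng(0)
T, B = 250, 5
market = rng.normal(0.0003, 0.01, T)
true_beta = np.array([0.6, 0.8, 1.0, 1.2, 1.4])
baskets = np.outer(market, true_beta) + rng.normal(0, 0.005, (T, B))

resid, alpha, beta = orthogonalize_single_market(baskets, market)
max_abs_corr = max(abs(np.corrcoef(resid[:, j], market)[0, 1]) for j in range(B))
print("estimated betas:", np.round(beta, 2))
print("max |corr(residual, market)|:", f"{max_abs_corr:.2e}")
```

Because the regression includes an intercept, the residuals have zero sample mean and zero sample correlation with the market by construction, which is the property the verification cell below checks.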

In [None]:
def orthogonalize_baskets(basket_ret, market_ret_series, basket_market_map):
    """
    Orthogonalize each basket's returns against its corresponding market index.

    Parameters:
        basket_ret (pd.DataFrame): T x B basket returns
        market_ret_series (pd.Series, pd.DataFrame, or dict): market returns keyed by
            market name. A Series or DataFrame is converted to a dict of Series.
        basket_market_map (dict): basket_name -> market_index_name

    Returns:
        ortho_ret (pd.DataFrame): T x B orthogonalized basket returns (residuals)
        reg_stats (pd.DataFrame): B x 3 DataFrame with alpha, beta, R2 per basket
    """
    if isinstance(market_ret_series, pd.Series):
        market_ret_series = {market_ret_series.name: market_ret_series}
    elif isinstance(market_ret_series, pd.DataFrame):
        market_ret_series = {c: market_ret_series[c] for c in market_ret_series.columns}

    ortho_ret = pd.DataFrame(index=basket_ret.index, columns=basket_ret.columns, dtype=float)
    stats = []

    for b in basket_ret.columns:
        mkt_name = basket_market_map[b]
        mkt = market_ret_series[mkt_name].reindex(basket_ret.index)
        y = basket_ret[b].values
        X = np.column_stack([np.ones(len(mkt)), mkt.values])

        # OLS: y = X @ [alpha, beta] + eps
        coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)  # only the coefficients are used
        alpha_b, beta_b = coeffs
        fitted = X @ coeffs
        eps = y - fitted

        ss_res = np.sum(eps ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0

        ortho_ret[b] = eps
        stats.append({'basket': b, 'market': mkt_name,
                      'alpha_ann': alpha_b * ANNUAL_FACTOR,
                      'beta': beta_b, 'R2': r2})

    reg_stats = pd.DataFrame(stats).set_index('basket')
    return ortho_ret, reg_stats


ortho_basket_returns, regression_stats = orthogonalize_baskets(
    basket_returns, market_returns, basket_market_map
)

print('=== Orthogonalization Regression Stats (first 10 baskets) ===')
display(regression_stats.head(10).style
    .format({'alpha_ann': '{:.4f}', 'beta': '{:.3f}', 'R2': '{:.3f}'})
    .set_caption('Market Regression: alpha (annualized), beta, R²')
    .background_gradient(subset=['R2'], cmap='YlOrRd')
)

In [None]:
# Verify orthogonality: correlation of residuals with market should be ~0
corr_with_market = ortho_basket_returns.corrwith(market_returns)
print(f'Correlation of orthogonalized baskets with market:')
print(f'  Mean:   {corr_with_market.mean():.6f}')
print(f'  Max:    {corr_with_market.abs().max():.6f}')
print(f'  Stdev:  {corr_with_market.std():.6f}')
print('=> All near zero confirms successful orthogonalization.')

## 3. Portfolio Exposure to Orthogonalized Baskets

We regress the portfolio's returns on the orthogonalized basket returns using **ridge regression** (L2 regularization) since the number of baskets can be large and they may still be correlated with each other:

$$R^{\text{port}}(t) = \sum_b \delta_b \cdot \varepsilon_b(t) + \eta(t)$$

The coefficients $\delta_b$ are the portfolio's exposures to each orthogonalized basket.
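
To see what the L2 penalty does, the sketch below fits the closed-form ridge solution on two nearly collinear synthetic regressors for several values of the penalty. The data and the grid of penalties are made up for illustration; `ridge_coeffs` is a hypothetical helper that mirrors the no-intercept formula used in this section.

```python
import numpy as np

def ridge_coeffs(X, y, lam):
    """Closed-form ridge without intercept: solve (X'X + lam * I) d = X'y."""
    B = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(B), X.T @ y)

rng = np.random.default_rng(1)
T = 500
x1 = rng.normal(0, 0.01, T)
x2 = x1 + rng.normal(0, 0.001, T)        # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 0.5 * x1 + 0.5 * x2 + rng.normal(0, 0.002, T)

lambdas = [0.0, 1e-4, 1e-2, 1.0]
norms = []
for lam in lambdas:
    d = ridge_coeffs(X, y, lam)
    norms.append(np.linalg.norm(d))
    print(f"lambda={lam:<7g} coeffs={np.round(d, 3)}  norm={norms[-1]:.3f}")
```

The coefficient norm falls as the penalty grows. Since the returns are in decimal units, the entries of $X^T X$ are small, so a penalty that looks modest in absolute terms can dominate the fit; the right magnitude depends on the return units and the sample length.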

In [None]:
def estimate_basket_exposures(port_ret, ortho_ret, ridge_lambda=0.01):
    """
    Estimate portfolio exposures to orthogonalized baskets via ridge regression.

    Parameters:
        port_ret (pd.Series): T portfolio returns
        ortho_ret (pd.DataFrame): T x B orthogonalized basket returns
        ridge_lambda (float): L2 regularization parameter

    Returns:
        exposures (pd.Series): B exposures (regression coefficients)
        r2 (float): in-sample R-squared
        residual (pd.Series): T unexplained returns
    """
    X = ortho_ret.values  # T x B
    y = port_ret.values   # T
    B = X.shape[1]

    # Ridge: (X'X + lambda*I)^-1 X'y
    XtX = X.T @ X
    Xty = X.T @ y
    coeffs = np.linalg.solve(XtX + ridge_lambda * np.eye(B), Xty)

    fitted = X @ coeffs
    resid = y - fitted
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot

    exposures = pd.Series(coeffs, index=ortho_ret.columns, name='exposure')
    residual = pd.Series(resid, index=port_ret.index, name='unexplained')
    return exposures, r2, residual


exposures, model_r2, unexplained = estimate_basket_exposures(
    portfolio_returns, ortho_basket_returns, ridge_lambda=0.01
)

print(f'Model R²: {model_r2:.4f}')
print(f'\nTop 10 basket exposures (by magnitude):')
top_exp = exposures.abs().nlargest(10)
display(exposures.loc[top_exp.index].to_frame('Exposure').style
    .format('{:.4f}')
    .bar(color=['#d65f5f', '#5fba7d'], align='mid')
    .set_caption('Portfolio Exposure to Orthogonalized Baskets')
)

## 4. PnL Attribution

Daily PnL attributed to basket $b$:

$$\text{PnL}_b(t) = \delta_b \cdot \varepsilon_b(t)$$

We compute daily and cumulative attribution, plus summary statistics.
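
A small example, on made-up numbers, of how the attribution adds up: each basket column is $\delta_b \cdot \varepsilon_b(t)$, and an `Unexplained` column absorbs whatever the baskets do not account for, so the columns sum to the portfolio return on every day. The exposures and returns here are illustrative only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
dates = pd.bdate_range("2024-01-01", periods=5)
eps = pd.DataFrame(rng.normal(0, 0.01, (5, 2)), index=dates, columns=["A", "B"])
delta = pd.Series({"A": 0.8, "B": -0.3})            # illustrative exposures
port = pd.Series(rng.normal(0, 0.01, 5), index=dates)

attr = eps.mul(delta, axis=1)                       # delta_b * eps_b(t)
attr["Unexplained"] = port - attr.sum(axis=1)       # remainder not explained by baskets
cum_attr = attr.cumsum()                            # arithmetic cumulation, no compounding

print(cum_attr.round(4))
print("columns sum to portfolio return:", np.allclose(attr.sum(axis=1), port))
```

Cumulative attribution here is a running sum of daily returns, so it is additive across baskets but ignores compounding.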

In [None]:
def compute_pnl_attribution(exposures, ortho_ret, port_ret):
    """
    Compute PnL attribution of portfolio returns to orthogonalized baskets.

    Parameters:
        exposures (pd.Series): B exposures
        ortho_ret (pd.DataFrame): T x B orthogonalized basket returns
        port_ret (pd.Series): T portfolio returns

    Returns:
        daily_attr (pd.DataFrame): T x (B+1) daily PnL attribution (baskets + unexplained)
        cum_attr (pd.DataFrame): T x (B+1) cumulative PnL attribution
        summary (pd.DataFrame): B+1 summary stats
    """
    # Daily attribution
    daily_attr = ortho_ret.mul(exposures, axis=1)
    daily_attr['Unexplained'] = port_ret.values - daily_attr.sum(axis=1).values

    # Cumulative
    cum_attr = daily_attr.cumsum()

    # Summary statistics
    total_pnl = daily_attr.sum()
    ann_return = daily_attr.mean() * ANNUAL_FACTOR
    ann_vol = daily_attr.std() * np.sqrt(ANNUAL_FACTOR)
    sharpe = ann_return / ann_vol.replace(0, np.nan)

    summary = pd.DataFrame({
        'Total PnL (cum ret)': total_pnl,
        'Ann. Return': ann_return,
        'Ann. Vol': ann_vol,
        'Sharpe': sharpe,
        'Pct of Total PnL': total_pnl / port_ret.sum() * 100
    })

    return daily_attr, cum_attr, summary


daily_attr, cum_attr, attr_summary = compute_pnl_attribution(
    exposures, ortho_basket_returns, portfolio_returns
)

# Show top contributors
top_n = 15
top_contributors = attr_summary['Total PnL (cum ret)'].abs().nlargest(top_n).index  # may include 'Unexplained'
print(f'=== Top {top_n} PnL Contributors (by absolute contribution) ===')
display(attr_summary.loc[top_contributors].style
    .format({
        'Total PnL (cum ret)': '{:.4f}',
        'Ann. Return': '{:.4f}',
        'Ann. Vol': '{:.4f}',
        'Sharpe': '{:.2f}',
        'Pct of Total PnL': '{:.1f}%'
    })
    .background_gradient(subset=['Pct of Total PnL'], cmap='RdYlGn')
    .bar(subset=['Total PnL (cum ret)'], color=['#d65f5f', '#5fba7d'], align='mid')
    .set_caption('PnL Attribution Summary')
)

In [None]:
# --- Cumulative PnL Attribution Plot ---
fig, axes = plt.subplots(2, 1, figsize=(14, 10), gridspec_kw={'height_ratios': [2, 1]})

# Top chart: cumulative PnL of top contributors
ax = axes[0]
top_baskets = attr_summary.drop('Unexplained', errors='ignore')[
    'Total PnL (cum ret)'].abs().nlargest(8).index
cum_attr[list(top_baskets)].plot(ax=ax, linewidth=1.5)
cum_attr['Unexplained'].plot(ax=ax, linewidth=1.5, linestyle='--', color='grey', label='Unexplained')
portfolio_returns.cumsum().plot(ax=ax, linewidth=2.5, color='black', label='Total Portfolio')
ax.set_title('Cumulative PnL Attribution (Top 8 Baskets)', fontsize=14, fontweight='bold')
ax.set_ylabel('Cumulative Return')
ax.legend(loc='upper left', fontsize=8, ncol=2)
ax.axhline(0, color='black', linewidth=0.5)
ax.yaxis.set_major_formatter(mtick.PercentFormatter(1.0))

# Bottom chart: stacked area of attribution
ax2 = axes[1]
# Use rolling 21-day sum for smoother visualization
rolling_attr = daily_attr[list(top_baskets)].rolling(21).sum()
rolling_attr.plot.area(ax=ax2, linewidth=0, alpha=0.7, stacked=True)
ax2.set_title('Rolling 21-Day PnL Attribution (Top 8 Baskets)', fontsize=12)
ax2.set_ylabel('21-Day Return')
ax2.legend(loc='upper left', fontsize=7, ncol=2)
ax2.axhline(0, color='black', linewidth=0.5)
ax2.yaxis.set_major_formatter(mtick.PercentFormatter(1.0))

plt.tight_layout()
plt.show()

## 5. Risk Decomposition

We decompose portfolio risk along orthogonalized baskets using **Marginal Contribution to Total Risk (MCTR)** and **Absolute Contribution to Total Risk (ACTR)**.

Given exposures $\delta$ and the covariance matrix $\Sigma$ of orthogonalized basket returns:

$$\sigma_p = \sqrt{\delta^T \Sigma \delta}$$

$$\text{MCTR}_b = \frac{(\Sigma \delta)_b}{\sigma_p}, \qquad \text{ACTR}_b = \delta_b \cdot \text{MCTR}_b$$

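ACTR sums to total portfolio volatility (not variance): $\sum_b \text{ACTR}_b = \delta^T \Sigma \delta / \sigma_p = \sigma_p^2 / \sigma_p = \sigma_p$. Here $\sigma_p$ is the volatility implied by the basket model, so it excludes the unexplained return $\eta(t)$.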
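
A two-basket worked example of the formulas above, with made-up exposures and an assumed annualized covariance matrix, checks that the ACTR terms add up to $\sigma_p$.

```python
import numpy as np

delta = np.array([0.5, -0.2])             # exposures to two baskets (made up)
cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])            # annualized covariance (made up)

sigma_p = np.sqrt(delta @ cov @ delta)    # model-implied portfolio volatility
mctr = cov @ delta / sigma_p              # marginal contribution per basket
actr = delta * mctr                       # absolute contribution per basket
pct = actr / sigma_p                      # shares of total risk, sum to 1

print(f"sigma_p   = {sigma_p:.4f}")
print(f"sum(ACTR) = {actr.sum():.4f}")
print("risk shares:", np.round(pct, 3))
```

The second basket has a negative exposure and a negative MCTR, so its ACTR is positive: a short position in a basket that the rest of the portfolio is effectively short adds to risk.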

In [None]:
def compute_risk_decomposition(exposures, ortho_ret, annual_factor=252):
    """
    Decompose portfolio risk along orthogonalized baskets.

    Parameters:
        exposures (pd.Series): B exposures
        ortho_ret (pd.DataFrame): T x B orthogonalized basket returns
        annual_factor (int): annualization factor

    Returns:
        risk_df (pd.DataFrame): risk decomposition per basket
        portfolio_vol (float): annualized portfolio vol from this model
    """
    cov_matrix = ortho_ret.cov() * annual_factor  # annualized
    delta = exposures.values
    cov = cov_matrix.values

    port_var = delta @ cov @ delta
    port_vol = np.sqrt(port_var)

    mctr = (cov @ delta) / port_vol  # marginal contribution
    actr = delta * mctr              # absolute contribution
    pct_risk = actr / port_vol * 100 # percentage of total risk

    risk_df = pd.DataFrame({
        'Exposure': exposures.values,
        'MCTR (ann)': mctr,
        'ACTR (ann)': actr,
        'Pct of Risk': pct_risk
    }, index=exposures.index)

    return risk_df, port_vol


risk_decomp, model_port_vol = compute_risk_decomposition(
    exposures, ortho_basket_returns
)

print(f'Annualized portfolio vol (basket model): {model_port_vol:.4f} ({model_port_vol:.2%})')
print(f'Sum of ACTR (should equal port vol):     {risk_decomp["ACTR (ann)"].sum():.4f}')
print(f'\n=== Top 15 Risk Contributors ===')

top_risk = risk_decomp['ACTR (ann)'].abs().nlargest(15).index
display(risk_decomp.loc[top_risk].style
    .format({
        'Exposure': '{:.4f}',
        'MCTR (ann)': '{:.4f}',
        'ACTR (ann)': '{:.4f}',
        'Pct of Risk': '{:.1f}%'
    })
    .background_gradient(subset=['Pct of Risk'], cmap='RdYlGn_r')
    .bar(subset=['ACTR (ann)'], color=['#d65f5f', '#5fba7d'], align='mid')
    .set_caption('Risk Decomposition by Orthogonalized Basket')
)

In [None]:
# --- Risk Decomposition Visualization ---
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Bar chart of top risk contributors
ax = axes[0]
top20_risk = risk_decomp['ACTR (ann)'].abs().nlargest(20)
colors = ['#d65f5f' if v < 0 else '#5fba7d'
          for v in risk_decomp.loc[top20_risk.index, 'ACTR (ann)']]
risk_decomp.loc[top20_risk.index, 'ACTR (ann)'].plot.barh(ax=ax, color=colors)
ax.set_title('Top 20 ACTR Contributors', fontsize=13, fontweight='bold')
ax.set_xlabel('Annualized ACTR')
ax.axvline(0, color='black', linewidth=0.5)
ax.invert_yaxis()

# Pie chart of absolute risk contribution
ax2 = axes[1]
top10_pct = risk_decomp['ACTR (ann)'].abs().nlargest(10)
other_pct = risk_decomp['ACTR (ann)'].abs().sum() - top10_pct.sum()
pie_data = pd.concat([top10_pct, pd.Series({'Other': other_pct})])
pie_data.plot.pie(ax=ax2, autopct='%1.1f%%', startangle=90, fontsize=8)
ax2.set_title('Risk Share (|ACTR|)', fontsize=13, fontweight='bold')
ax2.set_ylabel('')

plt.tight_layout()
plt.show()

## 6. Combined Dashboard

A single summary table combining PnL attribution and risk decomposition for the most important baskets.

In [None]:
# Merge attribution and risk into a single dashboard
dashboard = attr_summary.join(risk_decomp[['MCTR (ann)', 'ACTR (ann)', 'Pct of Risk']])
dashboard = dashboard.drop('Unexplained', errors='ignore')

# Add basket beta from orthogonalization
dashboard = dashboard.join(regression_stats[['beta', 'R2']])
dashboard.columns = [
    'Cum PnL', 'Ann Return', 'Ann Vol', 'Sharpe',
    'PnL %', 'MCTR', 'ACTR', 'Risk %',
    'Mkt Beta', 'Mkt R²'
]

# Sort by absolute PnL contribution
dashboard = dashboard.reindex(
    dashboard['Cum PnL'].abs().sort_values(ascending=False).index
)

print('=== Portfolio Attribution & Risk Dashboard (Top 20 Baskets) ===')
display(dashboard.head(20).style
    .format({
        'Cum PnL': '{:.4f}',
        'Ann Return': '{:.4f}',
        'Ann Vol': '{:.4f}',
        'Sharpe': '{:.2f}',
        'PnL %': '{:.1f}%',
        'MCTR': '{:.4f}',
        'ACTR': '{:.4f}',
        'Risk %': '{:.1f}%',
        'Mkt Beta': '{:.2f}',
        'Mkt R²': '{:.2f}'
    })
    .background_gradient(subset=['Sharpe'], cmap='RdYlGn', vmin=-2, vmax=2)
    .background_gradient(subset=['Mkt R²'], cmap='YlOrRd')
    .bar(subset=['Cum PnL'], color=['#d65f5f', '#5fba7d'], align='mid')
    .bar(subset=['ACTR'], color=['#d65f5f', '#5fba7d'], align='mid')
    .set_caption('Combined Attribution & Risk Dashboard')
    .set_table_styles([{
        'selector': 'th',
        'props': [('background-color', '#2c3e50'), ('color', 'white'),
                  ('font-size', '11px'), ('text-align', 'center')]
    }])
)

In [None]:
# --- Correlation heatmap of top orthogonalized baskets ---
top_baskets_for_corr = dashboard.head(15).index
corr_matrix = ortho_basket_returns[top_baskets_for_corr].corr()

fig, ax = plt.subplots(figsize=(10, 8))
mask = np.triu(np.ones_like(corr_matrix, dtype=bool), k=1)
sns.heatmap(corr_matrix, mask=mask, annot=True, fmt='.2f', cmap='RdBu_r',
            center=0, vmin=-1, vmax=1, square=True, linewidths=0.5,
            ax=ax, cbar_kws={'shrink': 0.8})
ax.set_title('Correlation of Orthogonalized Basket Returns (Top 15)', fontsize=13, fontweight='bold')
plt.tight_layout()
plt.show()

## 7. Scaling to 1000s of Baskets

The code above works for any number of baskets. Key considerations when scaling:

1. **Orthogonalization** is per-basket, so it scales linearly — O(B * T)
2. **Ridge regression** mitigates multicollinearity among correlated baskets and keeps the system solvable when B > T, where $X^T X$ is singular. Tune `ridge_lambda` via cross-validation if needed
3. **Risk decomposition** requires a B x B covariance matrix. For B > 1000, consider:
   - Shrinkage estimators (Ledoit-Wolf)
   - Factor-based covariance estimation
   - Sparse covariance (only keep significant correlations)
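
As a rough illustration of the first option, the sketch below shrinks a sample covariance toward its own diagonal with a fixed intensity. This is a simplification, not the Ledoit-Wolf estimator, which chooses the intensity from the data (scikit-learn's `sklearn.covariance.LedoitWolf` is one implementation). The helper name, the data, and the intensity of 0.2 are all assumptions made for the example.

```python
import numpy as np

def shrink_to_diagonal(sample_cov, intensity):
    """Convex combination of the sample covariance and its own diagonal."""
    target = np.diag(np.diag(sample_cov))
    return (1.0 - intensity) * sample_cov + intensity * target

rng = np.random.default_rng(3)
T, B = 60, 100                               # fewer observations than baskets
X = rng.normal(0, 0.01, (T, B))
sample_cov = np.cov(X, rowvar=False)         # B x B, rank at most T - 1

# The intensity is an arbitrary choice here, not a data-driven estimate
shrunk = shrink_to_diagonal(sample_cov, intensity=0.2)

min_eig_sample = np.linalg.eigvalsh(sample_cov).min()
min_eig_shrunk = np.linalg.eigvalsh(shrunk).min()
print(f"smallest eigenvalue, sample: {min_eig_sample:.2e}")
print(f"smallest eigenvalue, shrunk: {min_eig_shrunk:.2e}")
```

With more baskets than observations the sample covariance is singular, so some exposure vectors appear to carry zero risk; the shrunk matrix is positive definite and assigns every nonzero exposure vector a strictly positive variance.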

Below we demonstrate with a larger basket count.

In [None]:
# --- Demonstrate scaling: 1000 baskets ---
N_BASKETS_LARGE = 1000
basket_names_lg = [f'B_{i:04d}' for i in range(N_BASKETS_LARGE)]
basket_weights_lg = {}
basket_market_map_lg = {}

for b in basket_names_lg:
    size = np.random.randint(*BASKET_SIZE_RANGE)  # same size range as the 50-basket example
    members = np.random.choice(stock_names, size=size, replace=False)
    basket_weights_lg[b] = pd.Series(1.0 / size, index=members)
    basket_market_map_lg[b] = 'SP500'

basket_ret_lg = pd.DataFrame(index=dates, columns=basket_names_lg, dtype=float)
for b in basket_names_lg:
    w = basket_weights_lg[b]
    basket_ret_lg[b] = stock_returns[w.index].values @ w.values

print(f'Large basket returns shape: {basket_ret_lg.shape}')

ortho_lg, stats_lg = orthogonalize_baskets(basket_ret_lg, market_returns, basket_market_map_lg)
exp_lg, r2_lg, resid_lg = estimate_basket_exposures(portfolio_returns, ortho_lg, ridge_lambda=1.0)

print(f'Orthogonalization complete. R² of model: {r2_lg:.4f}')
print(f'Non-trivial exposures (|exp| > 0.01): {(exp_lg.abs() > 0.01).sum()}')
print('Top 10 exposures (by magnitude, signs preserved):')
print(exp_lg.loc[exp_lg.abs().nlargest(10).index])

In [None]:
# --- Distribution of basket betas and R² ---
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

axes[0].hist(stats_lg['beta'], bins=40, edgecolor='white', alpha=0.8, color='steelblue')
axes[0].axvline(stats_lg['beta'].mean(), color='red', linestyle='--',
                label=f'Mean={stats_lg["beta"].mean():.2f}')
axes[0].set_title('Distribution of Basket Market Betas', fontweight='bold')
axes[0].set_xlabel('Beta to Market')
axes[0].legend()

axes[1].hist(stats_lg['R2'], bins=40, edgecolor='white', alpha=0.8, color='darkorange')
axes[1].axvline(stats_lg['R2'].mean(), color='red', linestyle='--',
                label=f'Mean={stats_lg["R2"].mean():.2f}')
axes[1].set_title('Distribution of Market R² per Basket', fontweight='bold')
axes[1].set_xlabel('R²')
axes[1].legend()

plt.tight_layout()
plt.show()

## 8. Summary

### What we built

| Step | Method | Output |
|------|--------|--------|
| Orthogonalize baskets | OLS regression vs market index | Market-neutral basket returns |
| Estimate exposures | Ridge regression | Portfolio loading on each basket |
| PnL attribution | Exposure × orthogonalized return | Daily/cumulative PnL per basket |
| Risk decomposition | Covariance-based MCTR/ACTR | Risk contribution per basket |

### Key design choices

- **Orthogonalization** removes market beta from each basket, isolating the idiosyncratic component
- **Ridge regression** handles multicollinearity when the number of baskets is large
- **MCTR/ACTR** provide an additive risk decomposition: the ACTR terms sum to the portfolio volatility implied by the basket model

### To use with real data

1. Replace `stock_returns` with actual daily stock returns (DataFrame: dates x tickers)
2. Replace `basket_weights` with actual basket compositions (dict of Series)
3. Replace `market_returns` with actual index returns (handle multiple markets via `basket_market_map`)
4. Replace `portfolio_weights` / `portfolio_returns` with actual portfolio data
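5. Tune `ridge_lambda`: larger values shrink exposures toward zero. The penalty is not scale-invariant, so a suitable magnitude depends on the return units (decimal vs. percent) and the sample length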
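
One simple way to tune the penalty is a time-ordered hold-out: fit on the earlier part of the sample and score on the later part, which avoids look-ahead. The sketch below uses synthetic data; the helper names, the 70/30 split, and the grid of penalties are assumptions, and a rolling or k-fold scheme may suit real data better.

```python
import numpy as np

def ridge_coeffs(X, y, lam):
    """Closed-form ridge without intercept: solve (X'X + lam * I) d = X'y."""
    B = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(B), X.T @ y)

def pick_lambda_holdout(X, y, lambdas, train_frac=0.7):
    """Fit on the first train_frac of the sample, score MSE on the remainder."""
    split = int(len(y) * train_frac)
    X_tr, y_tr, X_te, y_te = X[:split], y[:split], X[split:], y[split:]
    mse = {}
    for lam in lambdas:
        d = ridge_coeffs(X_tr, y_tr, lam)
        mse[lam] = float(np.mean((y_te - X_te @ d) ** 2))
    best = min(mse, key=mse.get)          # penalty with the lowest hold-out error
    return best, mse

rng = np.random.default_rng(4)
T, B = 300, 40
X = rng.normal(0, 0.01, (T, B))           # stand-in for orthogonalized basket returns
true_d = np.zeros(B)
true_d[:3] = [0.5, -0.4, 0.3]             # only three baskets truly matter
y = X @ true_d + rng.normal(0, 0.005, T)

lambdas = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
best_lam, mse = pick_lambda_holdout(X, y, lambdas)
for lam in lambdas:
    print(f"lambda={lam:<8g} hold-out MSE={mse[lam]:.3e}")
print("selected lambda:", best_lam)
```

The grid spans several orders of magnitude because the useful range depends on the scale of $X^T X$, which in turn depends on return units and sample length.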