# Panel Regression Basics: Random Intercepts and ICC

## Introduction to Panel Regression

In this notebook, **panel regression** refers to **mixed effects models** (also called **multilevel models** or **hierarchical linear models**). These models are designed for data with a grouped/hierarchical structure in which observations within the same group are correlated.

### When to Use Panel Regression

Use `panel_reg()` instead of `linear_reg()` when:
- Your data has multiple groups/entities (stores, patients, countries, etc.)
- Observations within groups are likely correlated
- Groups have different baseline levels but similar slopes
- You want to borrow strength across groups (partial pooling)

### Key Concepts

**Random Intercepts**: Each group has its own baseline level
- Premium stores might have higher baseline sales than budget stores
- But price sensitivity (slope) is similar across stores
- Estimated with a single variance parameter instead of one dummy variable per group

**Intraclass Correlation Coefficient (ICC)**: Proportion of total variance due to group differences. Rough rules of thumb:
- ICC < 0.3: Low (most variance within groups) - consider linear_reg()
- ICC 0.3-0.7: Medium (mixed) - panel_reg() beneficial
- ICC > 0.7: High (most variance between groups) - panel_reg() highly beneficial
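For a random-intercept model, the ICC is the between-group variance divided by the total (between + within) variance. The sketch below computes it from two variance components and maps the result onto the rule-of-thumb bands listed above. The helper names `icc_from_components` and `icc_band` are illustrative and are not part of py-tidymodels.

```python
def icc_from_components(between_var: float, within_var: float) -> float:
    """ICC = between-group variance / (between-group + within-group variance)."""
    if between_var < 0 or within_var < 0:
        raise ValueError("variance components must be non-negative")
    total = between_var + within_var
    if total == 0:
        raise ValueError("total variance is zero; ICC is undefined")
    return between_var / total


def icc_band(icc: float) -> str:
    """Map an ICC to the rough bands used in this notebook (same cut-offs as its code)."""
    if icc < 0.3:
        return "low"
    if icc < 0.7:
        return "medium"
    return "high"


# Example: between-store variance 225, within-store variance 100
example_icc = icc_from_components(225.0, 100.0)
print(f"ICC = {example_icc:.3f} ({icc_band(example_icc)})")  # ICC = 0.692 (medium)
```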

**Variance Components**: Panel regression partitions variance into:
- **Between-group variance**: How much groups differ in their baselines
- **Within-group variance**: How much observations vary within each group
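One way to see this partition concretely is the classical one-way ANOVA (method-of-moments) estimator for a balanced design. For a balanced design, the expected between-group mean square is σ²_within + n·σ²_between and the expected within-group mean square is σ²_within, so the two components can be solved for directly. This is a teaching sketch on simulated data. It is not necessarily how panel_reg() estimates the components; mixed-model software typically uses (restricted) maximum likelihood.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Simulate a balanced design: 10 groups x 52 observations each
n_grp, n_per_group = 10, 52
sigma_between, sigma_within = 15.0, 10.0  # true SDs (variances 225 and 100)

group_effects = rng.normal(0.0, sigma_between, n_grp)
df = pd.DataFrame({
    "group": np.repeat(np.arange(n_grp), n_per_group),
    "y": (100.0
          + np.repeat(group_effects, n_per_group)
          + rng.normal(0.0, sigma_within, n_grp * n_per_group)),
})

# One-way ANOVA mean squares
group_means = df.groupby("group")["y"].mean()
ms_between = n_per_group * group_means.var(ddof=1)
ms_within = df.groupby("group")["y"].var(ddof=1).mean()  # pooled within-group variance

# Method-of-moments estimates (truncate a negative between-variance at zero)
within_var = ms_within
between_var = max((ms_between - ms_within) / n_per_group, 0.0)
icc_hat = between_var / (between_var + within_var)

print(f"between = {between_var:.1f}, within = {within_var:.1f}, ICC = {icc_hat:.3f}")
```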

---

## Learning Objectives

In this notebook, you will:
1. Generate realistic multi-store sales data with group structure
2. Fit a basic panel regression model with random intercepts
3. Extract and interpret the three-DataFrame outputs
4. Calculate and interpret the ICC
5. Compare panel_reg() with linear_reg()
6. Make predictions for training stores and new stores
7. Visualize group-specific effects

## Setup and Imports

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# py-tidymodels imports
from py_parsnip import panel_reg, linear_reg
from py_workflows import workflow

# Set random seed for reproducibility
np.random.seed(42)

# Plotting style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 11

## 1. Generate Multi-Store Sales Data

We'll create a realistic dataset with:
- **10 stores**: 5 premium stores (higher baseline sales) + 5 budget stores (lower baseline)
- **52 weeks** of data per store = 520 total observations
- **Variables**:
  - `sales`: Weekly sales (outcome)
  - `price`: Product price (predictor)
  - `advertising`: Advertising spend (predictor)
  - `store_id`: Store identifier (group)

**Data Generating Process**:
- Premium stores: Baseline drawn from N(150, 15²) units/week
- Budget stores: Baseline drawn from N(100, 15²) units/week
- Price effect: -2.5 (same for all stores)
- Advertising effect: +1.5 (same for all stores)
- Within-store noise: σ = 10
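These settings imply a rough target for the ICC. Store baselines come from an equal mix of N(150, 15²) and N(100, 15²). Their variance is therefore 15² plus the variance of the two type means (25² = 625), which gives 850. The within-store noise variance is 10² = 100. `store_type` is not in the model formula, so the premium/budget gap is absorbed into the random-intercept variance. With only 10 stores, the fitted ICC can land noticeably away from this population value.

```python
# Implied (population) variance components for the data-generating process above
premium_mean, budget_mean = 150.0, 100.0
store_sd = 15.0   # SD of baselines within each store type (np.random.normal(..., 15))
noise_sd = 10.0   # within-store noise SD

grand_mean = (premium_mean + budget_mean) / 2
# Variance of an equal-weight mixture = within-type variance + variance of the type means
type_mean_var = ((premium_mean - grand_mean) ** 2 + (budget_mean - grand_mean) ** 2) / 2
between_var = store_sd ** 2 + type_mean_var
within_var = noise_sd ** 2

implied_icc = between_var / (between_var + within_var)
print(f"between = {between_var:.0f}, within = {within_var:.0f}, implied ICC = {implied_icc:.3f}")
# between = 850, within = 100, implied ICC = 0.895
```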

In [None]:
# Parameters
n_stores = 10
n_weeks = 52
n_total = n_stores * n_weeks

# Store types
store_types = ['Premium'] * 5 + ['Budget'] * 5
store_ids = [f'Store_{i+1:02d}' for i in range(n_stores)]

# Random intercepts (baseline sales by store)
# Premium stores: mean 150, Budget stores: mean 100
random_intercepts = np.array(
    [np.random.normal(150, 15) for _ in range(5)] +  # Premium
    [np.random.normal(100, 15) for _ in range(5)]    # Budget
)

# Fixed effects
beta_price = -2.5
beta_advertising = 1.5

# Generate data
data_list = []

for i, store_id in enumerate(store_ids):
    # Predictors
    price = np.random.uniform(15, 25, n_weeks)
    advertising = np.random.uniform(5, 15, n_weeks)
    week = np.arange(1, n_weeks + 1)
    
    # Sales = intercept + price*beta + advertising*beta + noise
    sales = (
        random_intercepts[i] + 
        beta_price * price + 
        beta_advertising * advertising + 
        np.random.normal(0, 10, n_weeks)
    )
    
    store_data = pd.DataFrame({
        'store_id': store_id,
        'store_type': store_types[i],
        'week': week,
        'price': price,
        'advertising': advertising,
        'sales': sales
    })
    
    data_list.append(store_data)

# Combine all stores
sales_data = pd.concat(data_list, ignore_index=True)

print(f"Dataset shape: {sales_data.shape}")
print(f"\nFirst few rows:")
print(sales_data.head(10))
print(f"\nSummary statistics:")
print(sales_data.describe())

In [None]:
# Visualize store-level variation
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Mean sales by store
mean_sales_by_store = sales_data.groupby(['store_id', 'store_type'])['sales'].mean().reset_index()
colors = ['steelblue' if t == 'Premium' else 'coral' for t in mean_sales_by_store['store_type']]

axes[0].bar(range(len(mean_sales_by_store)), mean_sales_by_store['sales'], color=colors)
axes[0].set_xlabel('Store ID')
axes[0].set_ylabel('Mean Sales')
axes[0].set_title('Mean Sales by Store (Premium vs Budget)')
axes[0].set_xticks(range(len(mean_sales_by_store)))
axes[0].set_xticklabels(mean_sales_by_store['store_id'], rotation=45)
axes[0].axhline(y=mean_sales_by_store['sales'].mean(), color='black', linestyle='--', label='Overall Mean')
axes[0].legend()

# Plot 2: Distribution by store type
sales_data.boxplot(column='sales', by='store_type', ax=axes[1])
axes[1].set_xlabel('Store Type')
axes[1].set_ylabel('Sales')
axes[1].set_title('Sales Distribution by Store Type')
plt.suptitle('')  # Remove auto title

plt.tight_layout()
plt.show()

print("\nüìä Key Observation: Premium stores have consistently higher sales than budget stores.")

## 2. Fit Basic Panel Model - Random Intercepts

We'll fit a panel regression model with:
- **Random intercepts**: Each store has its own baseline sales level
- **Fixed effects**: Price and advertising effects are the same across all stores

The model is: `sales ~ price + advertising` with random intercepts for `store_id`.
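Written out in notation, with j indexing stores and i indexing weeks, the model is as follows. The symbols β, u_j, σ_u² and σ_ε² are introduced here for exposition only.

$$
\text{sales}_{ij} = \beta_0 + \beta_{\text{price}}\,\text{price}_{ij} + \beta_{\text{adv}}\,\text{advertising}_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
$$

Under this model the ICC is $\sigma_u^2 / (\sigma_u^2 + \sigma_\varepsilon^2)$. In the simulated data, $\beta_{\text{price}} = -2.5$, $\beta_{\text{adv}} = 1.5$ and $\sigma_\varepsilon = 10$.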

In [None]:
# Create panel regression specification
spec = panel_reg(random_effects="intercept")  # Random intercepts only (default)

# Create workflow
wf = workflow().add_formula("sales ~ price + advertising").add_model(spec)

# Fit using fit_global() to specify group column
fit = wf.fit_global(sales_data, group_col='store_id')

print("✅ Panel regression model fitted successfully!")
print(f"\nModel type: {fit.spec.model_type}")
print(f"Random effects: {fit.spec.args['random_effects']}")

## 3. Extract and Interpret Three-DataFrame Outputs

Panel regression returns the standard three DataFrames:
1. **Outputs**: Observation-level predictions, actuals, residuals (with `group` column)
2. **Coefficients**: Fixed effects + variance components for random effects
3. **Stats**: Model-level metrics including ICC
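The cells below repeatedly pull a single value with `stats[stats['metric'] == ...]['value'].values[0]`, which silently picks the first row if a metric appears more than once (for example, for several splits). A small helper can make that lookup fail loudly instead. `get_metric` is hypothetical and is not part of py-tidymodels. The sketch assumes only the `metric`, `value` and `split` columns used in this notebook. The toy values below are made up.

```python
from typing import Optional

import pandas as pd


def get_metric(stats_df: pd.DataFrame, name: str, split: Optional[str] = None) -> float:
    """Return one metric value from a long-format stats DataFrame.

    Assumes columns named 'metric', 'value' and (optionally) 'split'.
    Raises KeyError unless exactly one row matches.
    """
    mask = stats_df["metric"] == name
    if split is not None:
        mask &= stats_df["split"] == split
    matches = stats_df.loc[mask, "value"]
    if len(matches) != 1:
        raise KeyError(f"expected one row for metric={name!r}, split={split!r}; found {len(matches)}")
    return float(matches.iloc[0])


# Usage with a toy stats table (values are illustrative only)
toy_stats = pd.DataFrame({
    "metric": ["rmse", "icc", "n_groups"],
    "value": [9.8, 0.85, 10],
    "split": ["train", "train", "train"],
})
print(get_metric(toy_stats, "icc"))
```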

### 3.1 Outputs DataFrame

In [None]:
outputs, coefficients, stats = fit.extract_outputs()

print("Outputs DataFrame (first 10 rows):")
print(outputs.head(10))
print(f"\nShape: {outputs.shape}")
print(f"Columns: {list(outputs.columns)}")
print(f"\nNote: 'group' column shows which store each observation belongs to.")

In [None]:
# Visualize fitted values by store
fig, axes = plt.subplots(2, 5, figsize=(16, 8), sharex=True, sharey=True)
axes = axes.flatten()

for i, store_id in enumerate(sorted(outputs['group'].unique())):
    store_data = outputs[outputs['group'] == store_id]
    
    axes[i].scatter(store_data['actuals'], store_data['fitted'], alpha=0.6, s=30)
    axes[i].plot([store_data['actuals'].min(), store_data['actuals'].max()],
                 [store_data['actuals'].min(), store_data['actuals'].max()],
                 'r--', lw=1)
    axes[i].set_title(store_id, fontsize=10)
    axes[i].set_xlabel('Actual Sales')
    axes[i].set_ylabel('Fitted Sales')
    axes[i].grid(True, alpha=0.3)

plt.suptitle('Actual vs Fitted Sales by Store', fontsize=14, y=1.00)
plt.tight_layout()
plt.show()

print("\nüìà Each store's predictions follow the 45¬∞ line closely, indicating good fit.")

### 3.2 Coefficients DataFrame

In [None]:
print("Coefficients DataFrame:")
print(coefficients)
print(f"\nFixed effects (type='fixed'):")
print(coefficients[coefficients['type'] == 'fixed'][['variable', 'coefficient', 'std_error', 'p_value']])
print(f"\nRandom effects variance components (type='random'):")
print(coefficients[coefficients['type'] == 'random'][['variable', 'coefficient', 'type']])
print(f"\nResidual variance (type='residual'):")
print(coefficients[coefficients['type'] == 'residual'][['variable', 'coefficient', 'type']])

In [None]:
# Interpret fixed effects
fixed_effects = coefficients[coefficients['type'] == 'fixed']
price_coef = fixed_effects[fixed_effects['variable'] == 'price']['coefficient'].values[0]
price_pval = fixed_effects[fixed_effects['variable'] == 'price']['p_value'].values[0]
ad_coef = fixed_effects[fixed_effects['variable'] == 'advertising']['coefficient'].values[0]
ad_pval = fixed_effects[fixed_effects['variable'] == 'advertising']['p_value'].values[0]

print("\n" + "="*60)
print("FIXED EFFECTS INTERPRETATION")
print("="*60)
print(f"\nüí∞ Price Effect: {price_coef:.3f}")
print(f"   ‚Üí For every $1 increase in price, sales decrease by {abs(price_coef):.2f} units")
print(f"   ‚Üí p-value: {price_pval:.4f} {'(significant)' if price_pval < 0.05 else '(not significant)'}")
print(f"\nüì¢ Advertising Effect: {ad_coef:.3f}")
print(f"   ‚Üí For every $1 increase in advertising, sales increase by {ad_coef:.2f} units")
print(f"   ‚Üí p-value: {ad_pval:.4f} {'(significant)' if ad_pval < 0.05 else '(not significant)'}")

# Interpret variance components
re_intercept_var = coefficients[coefficients['variable'] == 'RE: Intercept Variance']['coefficient'].values[0]
residual_var = coefficients[coefficients['variable'] == 'Residual Variance']['coefficient'].values[0]

print("\n" + "="*60)
print("VARIANCE COMPONENTS")
print("="*60)
print(f"\nüè™ Between-Store Variance (Random Intercept): {re_intercept_var:.3f}")
print(f"   ‚Üí Variance in baseline sales levels across stores")
print(f"\nüìä Within-Store Variance (Residual): {residual_var:.3f}")
print(f"   ‚Üí Variance in sales within each store (unexplained by model)")

### 3.3 Stats DataFrame and ICC

In [None]:
print("Stats DataFrame (key metrics):")
key_metrics = ['rmse', 'mae', 'r_squared', 'adj_r_squared', 'icc', 'n_groups', 'aic', 'bic']
stats_subset = stats[stats['metric'].isin(key_metrics)][['metric', 'value', 'split']]
print(stats_subset.to_string(index=False))

In [None]:
# Extract and interpret ICC
icc = stats[stats['metric'] == 'icc']['value'].values[0]
rmse = stats[stats['metric'] == 'rmse']['value'].values[0]
r_squared = stats[stats['metric'] == 'r_squared']['value'].values[0]
n_groups = int(stats[stats['metric'] == 'n_groups']['value'].values[0])

print("\n" + "="*60)
print("INTRACLASS CORRELATION COEFFICIENT (ICC)")
print("="*60)
print(f"\nICC = {icc:.4f} ({icc*100:.2f}%)")
print(f"\nInterpretation:")
print(f"  → {icc*100:.1f}% of total variance is due to differences BETWEEN stores")
print(f"  → {(1-icc)*100:.1f}% of total variance is due to variation WITHIN stores")

if icc < 0.3:
    print(f"\n📊 ICC < 0.3: LOW group effect")
    print(f"   Most variance is within stores. linear_reg() might be sufficient.")
elif icc < 0.7:
    print(f"\n📊 ICC 0.3-0.7: MEDIUM group effect")
    print(f"   Substantial between-store variation. panel_reg() is beneficial.")
else:
    print(f"\n📊 ICC > 0.7: HIGH group effect")
    print(f"   Most variance is between stores. panel_reg() is highly beneficial.")

print(f"\n" + "="*60)
print("MODEL PERFORMANCE")
print("="*60)
print(f"\nRMSE: {rmse:.3f} units")
print(f"R²: {r_squared:.4f}")
print(f"Number of stores: {n_groups}")

In [None]:
# Visualize ICC
fig, ax = plt.subplots(figsize=(8, 6))

# Create pie chart
sizes = [icc, 1-icc]
labels = [f'Between-Store\nVariance\n({icc*100:.1f}%)', 
          f'Within-Store\nVariance\n({(1-icc)*100:.1f}%)']
colors = ['steelblue', 'lightcoral']
explode = (0.05, 0)

ax.pie(sizes, explode=explode, labels=labels, colors=colors, autopct='',
       shadow=True, startangle=90, textprops={'fontsize': 12, 'weight': 'bold'})
ax.set_title(f'Variance Decomposition (ICC = {icc:.4f})', fontsize=14, weight='bold')
plt.tight_layout()
plt.show()

print(f"\nüîë Key Takeaway: The ICC of {icc:.4f} indicates that {icc*100:.1f}% of sales variance")
print(f"   is explained by which store the observation comes from, justifying the use of panel_reg().")

## 4. Compare with Linear Regression

Let's compare panel_reg() with linear_reg() to see the differences:
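- **Linear regression**: Needs 9 dummy variables (one per store, minus a reference store absorbed into the intercept) to capture store effects
- **Panel regression**: Uses 1 variance component to capture store effects

This is called **partial pooling**: panel_reg() borrows strength across groups. Treat the AIC comparison below as approximate; for example, if the mixed model is fit by REML, its likelihood is not directly comparable to the OLS likelihood of a model with different fixed effects.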
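Partial pooling is easiest to see in the simplest case: a random-intercept model with no predictors and known variance components. There, the predicted group effect is the group's raw deviation from the grand mean multiplied by a shrinkage factor λ_j = σ_u² / (σ_u² + σ_ε²/n_j). Groups with few observations are therefore pulled harder toward the overall mean. The numbers below are illustrative. panel_reg() estimates the variance components from data and also adjusts for price and advertising, so its values will differ.

```python
def shrinkage_factor(between_var: float, within_var: float, n_obs: int) -> float:
    """Weight on a group's own mean: sigma_u^2 / (sigma_u^2 + sigma_e^2 / n_j)."""
    return between_var / (between_var + within_var / n_obs)


def partially_pooled_mean(group_mean: float, grand_mean: float,
                          between_var: float, within_var: float, n_obs: int) -> float:
    """Shrink a group's raw mean toward the grand mean (intercept-only random effects model)."""
    lam = shrinkage_factor(between_var, within_var, n_obs)
    return grand_mean + lam * (group_mean - grand_mean)


grand_mean = 125.0
between_var, within_var = 225.0, 100.0

# Same raw store mean (160), different amounts of data per store
for n in (2, 10, 52):
    lam = shrinkage_factor(between_var, within_var, n)
    est = partially_pooled_mean(160.0, grand_mean, between_var, within_var, n)
    print(f"n={n:>2}: lambda={lam:.3f}, estimate={est:.2f}")
```

A dummy-variable regression corresponds to λ_j = 1 (no pooling). Ignoring the groups entirely corresponds to λ_j = 0 (complete pooling). With 52 weeks per store, λ_j is close to 1 here, which is consistent with both models giving similar store-level fits.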

In [None]:
# Fit linear regression with store dummies
spec_linear = linear_reg()
wf_linear = workflow().add_formula("sales ~ price + advertising + C(store_id)").add_model(spec_linear)
fit_linear = wf_linear.fit(sales_data)

print("✅ Linear regression model fitted successfully!")

# Extract outputs
outputs_linear, coefficients_linear, stats_linear = fit_linear.extract_outputs()

# Compare number of parameters
n_params_panel = len(coefficients[coefficients['type'] == 'fixed'])
n_params_linear = len(coefficients_linear[coefficients_linear['type'] == 'fixed'])

print(f"\n" + "="*60)
print("MODEL COMPARISON")
print("="*60)
print(f"\nPanel Regression (Random Intercepts):")
print(f"  → Fixed effects: {n_params_panel} parameters (Intercept, price, advertising)")
print(f"  → Random effects: 1 variance component (store-level variation)")
print(f"  → Total parameters: {n_params_panel + 1}")

print(f"\nLinear Regression (Fixed Effects):")
print(f"  → Fixed effects: {n_params_linear} parameters (including 9 store dummies)")
print(f"  → No random effects")
print(f"  → Total parameters: {n_params_linear}")

print(f"\n💡 Parsimony: Panel regression estimates {n_params_linear - (n_params_panel + 1)} fewer parameters!")

In [None]:
# Compare performance metrics
rmse_panel = stats[stats['metric'] == 'rmse']['value'].values[0]
rmse_linear = stats_linear[stats_linear['metric'] == 'rmse']['value'].values[0]
r2_panel = stats[stats['metric'] == 'r_squared']['value'].values[0]
r2_linear = stats_linear[stats_linear['metric'] == 'r_squared']['value'].values[0]
aic_panel = stats[stats['metric'] == 'aic']['value'].values[0]
aic_linear = stats_linear[stats_linear['metric'] == 'aic']['value'].values[0]

comparison_df = pd.DataFrame({
    'Metric': ['RMSE', 'R²', 'AIC', 'Parameters'],
    'Panel Regression': [f"{rmse_panel:.3f}", f"{r2_panel:.4f}", f"{aic_panel:.1f}", n_params_panel + 1],
    'Linear Regression': [f"{rmse_linear:.3f}", f"{r2_linear:.4f}", f"{aic_linear:.1f}", n_params_linear]
})

print("\nPerformance Comparison:")
print(comparison_df.to_string(index=False))

print(f"\nüéØ Both models have similar predictive performance, but panel_reg() is more")
print(f"   parsimonious (fewer parameters) and provides interpretable variance components.")

In [None]:
# Visualize coefficient comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Panel regression coefficients
fixed_coefs = coefficients[coefficients['type'] == 'fixed'].copy()
fixed_coefs = fixed_coefs[fixed_coefs['variable'] != 'Intercept']  # Exclude intercept for clarity
axes[0].barh(fixed_coefs['variable'], fixed_coefs['coefficient'], color='steelblue')
axes[0].set_xlabel('Coefficient')
axes[0].set_title('Panel Regression: Fixed Effects')
axes[0].axvline(x=0, color='black', linestyle='--', linewidth=1)
axes[0].grid(True, alpha=0.3)

# Linear regression coefficients (exclude store dummies for clarity)
linear_main_coefs = coefficients_linear[
    (~coefficients_linear['variable'].str.contains('store_id', case=False, na=False)) &
    (coefficients_linear['variable'] != 'Intercept')
].copy()
axes[1].barh(linear_main_coefs['variable'], linear_main_coefs['coefficient'], color='coral')
axes[1].set_xlabel('Coefficient')
axes[1].set_title('Linear Regression: Main Effects')
axes[1].axvline(x=0, color='black', linestyle='--', linewidth=1)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nüìä Both models estimate similar price and advertising effects.")
print("   Panel regression captures store effects via variance components instead of dummies.")

## 5. Predictions: Training Stores vs New Stores

Panel regression handles predictions differently depending on whether the store was in the training data:

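- **Training stores**: Fixed effects + the store's estimated random intercept
- **New stores**: Fixed effects only (random effect set to 0, i.e. the population average)

The estimated store intercepts are themselves **shrunk** toward the population mean (**partial pooling**). A store's estimate is pulled toward the overall average more strongly when the store has few observations or when the between-store variance is small relative to the within-store variance. A brand-new store has no data, so it receives the population average.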
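The prediction rule can be sketched in a few lines. First compute the fixed-effects part for every row. Then add the store's estimated random intercept when the store was seen in training, and 0 otherwise. The function `predict_random_intercept` and the coefficient values are hypothetical. The sketch shows the logic only, not panel_reg()'s internals.

```python
import numpy as np
import pandas as pd


def predict_random_intercept(new_data: pd.DataFrame, fixed: dict, random_effects: dict,
                             group_col: str = "store_id") -> np.ndarray:
    """Fixed-effects prediction plus the group's random intercept (0 for unseen groups)."""
    pred = np.full(len(new_data), fixed["Intercept"], dtype=float)
    for name, beta in fixed.items():
        if name != "Intercept":
            pred += beta * new_data[name].to_numpy(dtype=float)
    # Unseen groups map to NaN, which is replaced by 0 (population average)
    offsets = new_data[group_col].map(random_effects).fillna(0.0).to_numpy(dtype=float)
    return pred + offsets


example_fixed = {"Intercept": 125.0, "price": -2.5, "advertising": 1.5}  # illustrative values
example_random_effects = {"Store_01": 28.0, "Store_06": -22.0}           # illustrative values

new_rows = pd.DataFrame({
    "store_id": ["Store_01", "Store_NEW"],
    "price": [20.0, 20.0],
    "advertising": [10.0, 10.0],
})
print(predict_random_intercept(new_rows, example_fixed, example_random_effects))
```

With random intercepts only, the gap between a training store and a new store is the same at every price, and it equals that store's random intercept.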

In [None]:
# Create test data for Store_01 (training store)
test_store_01 = pd.DataFrame({
    'store_id': ['Store_01'] * 10,
    'price': np.linspace(15, 25, 10),
    'advertising': [10.0] * 10
})

# Create test data for Store_NEW (new store)
test_store_new = pd.DataFrame({
    'store_id': ['Store_NEW'] * 10,
    'price': np.linspace(15, 25, 10),
    'advertising': [10.0] * 10
})

# Make predictions
preds_store_01 = fit.predict(test_store_01)
preds_store_new = fit.predict(test_store_new)

print("Predictions for Store_01 (training store):")
print(preds_store_01.head())
print(f"\nPredictions for Store_NEW (new store):")
print(preds_store_new.head())

In [None]:
# Visualize prediction differences
fig, ax = plt.subplots(figsize=(10, 6))

prices = test_store_01['price'].values
ax.plot(prices, preds_store_01['.pred'].values, 'o-', label='Store_01 (training)', linewidth=2, markersize=8)
ax.plot(prices, preds_store_new['.pred'].values, 's-', label='Store_NEW (new)', linewidth=2, markersize=8)

ax.set_xlabel('Price ($)', fontsize=12)
ax.set_ylabel('Predicted Sales', fontsize=12)
ax.set_title('Predictions: Training Store vs New Store', fontsize=14, weight='bold')
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

mean_diff = (preds_store_01['.pred'].values - preds_store_new['.pred'].values).mean()
direction = 'higher' if mean_diff > 0 else 'lower'
print(f"\n📍 Prediction Difference:")
print(f"   Training store (Store_01) predictions are {abs(mean_diff):.2f} units {direction} than a new store's.")
print(f"   With random intercepts only, this gap is the same at every price: it is Store_01's")
print(f"   estimated random intercept ({'above' if mean_diff > 0 else 'below'}-average baseline).")
print(f"\n   New stores get the 'population average' prediction (no store-specific adjustment).")

## 6. Visualize Group-Specific Effects

Let's extract and visualize the random effects (group-specific intercepts) for each store.

In [None]:
# Extract random effects from the fitted model
random_effects_dict = fit.fit_data['random_effects']
group_col = fit.fit_data['group_col']

# Convert to DataFrame
random_effects_df = pd.DataFrame([
    {'store_id': group, 'random_intercept': effects['Group']}
    for group, effects in random_effects_dict.items()
]).sort_values('random_intercept', ascending=False).reset_index(drop=True)

# Add store type
random_effects_df = random_effects_df.merge(
    sales_data[['store_id', 'store_type']].drop_duplicates(),
    on='store_id'
)

print("Random Effects (Store-Specific Intercepts):")
print(random_effects_df.to_string(index=False))
print(f"\nPositive values: Store has higher baseline sales than average")
print(f"Negative values: Store has lower baseline sales than average")

In [None]:
# Visualize random effects
fig, ax = plt.subplots(figsize=(10, 6))

colors = ['steelblue' if t == 'Premium' else 'coral' for t in random_effects_df['store_type']]
bars = ax.barh(random_effects_df['store_id'], random_effects_df['random_intercept'], color=colors)

ax.axvline(x=0, color='black', linestyle='--', linewidth=2)
ax.set_xlabel('Random Intercept (Deviation from Mean)', fontsize=12)
ax.set_ylabel('Store ID', fontsize=12)
ax.set_title('Store-Specific Random Intercepts (Panel Regression)', fontsize=14, weight='bold')
ax.grid(True, alpha=0.3, axis='x')

# One legend for store types and the zero reference line
# (a second ax.legend() call would replace the first one)
from matplotlib.lines import Line2D
from matplotlib.patches import Patch
legend_elements = [
    Patch(facecolor='steelblue', label='Premium'),
    Patch(facecolor='coral', label='Budget'),
    Line2D([0], [0], color='black', linestyle='--', linewidth=2, label='Population Average'),
]
ax.legend(handles=legend_elements, loc='lower right', fontsize=10)

plt.tight_layout()
plt.show()

print("\nüèÜ Premium stores (blue) have positive random intercepts (higher baseline sales).")
print("üìâ Budget stores (coral) have negative random intercepts (lower baseline sales).")

## 7. Residual Diagnostics

In [None]:
# Create comprehensive residual diagnostic plots
# `stats` was reassigned to the stats DataFrame by extract_outputs() in Section 3,
# shadowing the scipy.stats import, so import scipy.stats under an alias here
from scipy import stats as scipy_stats

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

residuals = outputs['residuals'].values
fitted = outputs['fitted'].values

# Plot 1: Residuals vs Fitted
axes[0, 0].scatter(fitted, residuals, alpha=0.5, s=20)
axes[0, 0].axhline(y=0, color='red', linestyle='--', linewidth=2)
axes[0, 0].set_xlabel('Fitted Values')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].set_title('Residuals vs Fitted')
axes[0, 0].grid(True, alpha=0.3)

# Plot 2: Q-Q Plot
scipy_stats.probplot(residuals, dist="norm", plot=axes[0, 1])
axes[0, 1].set_title('Normal Q-Q Plot')
axes[0, 1].grid(True, alpha=0.3)

# Plot 3: Histogram of residuals
axes[1, 0].hist(residuals, bins=30, edgecolor='black', alpha=0.7)
axes[1, 0].set_xlabel('Residuals')
axes[1, 0].set_ylabel('Frequency')
axes[1, 0].set_title('Distribution of Residuals')
axes[1, 0].axvline(x=0, color='red', linestyle='--', linewidth=2)
axes[1, 0].grid(True, alpha=0.3)

# Plot 4: Residuals by group (boxplot)
outputs.boxplot(column='residuals', by='group', ax=axes[1, 1])
axes[1, 1].set_xlabel('Store ID')
axes[1, 1].set_ylabel('Residuals')
axes[1, 1].set_title('Residuals by Store')
axes[1, 1].set_xticklabels(axes[1, 1].get_xticklabels(), rotation=45)
plt.suptitle('')  # Remove auto title

plt.tight_layout()
plt.show()

print("\n✅ Residual diagnostics look good:")
print("   - Residuals vs Fitted: No clear patterns (homoskedasticity)")
print("   - Q-Q Plot: Residuals approximately normal")
print("   - Histogram: Centered at zero, roughly symmetric")
print("   - By Store: Similar residual distributions across stores")

## Summary and Key Takeaways

### What We Learned

1. **Panel regression** is designed for grouped/hierarchical data where observations within groups are correlated.

2. **Random intercepts** allow each group to have its own baseline level while sharing common slopes.

3. **ICC (Intraclass Correlation Coefficient)** quantifies the proportion of variance due to group differences:
   - ICC < 0.3: Consider linear_reg()
   - ICC 0.3-0.7: panel_reg() beneficial
   - ICC > 0.7: panel_reg() highly beneficial

4. **Variance components** partition total variance into:
   - Between-group variance (random intercept variance)
   - Within-group variance (residual variance)

5. **Partial pooling**: Panel regression borrows strength across groups, using fewer parameters than fixed effects models.

6. **Predictions** differ for training vs new groups:
   - Training groups: Fixed + random effects
   - New groups: Fixed effects only (population average)

### When to Use panel_reg()

✅ **Use panel_reg() when:**
- Data has multiple groups/entities
- Observations within groups are correlated
- Groups have different baselines but similar slopes
- You want to estimate group-level variation
- You need predictions for new groups

❌ **Don't use panel_reg() when:**
- No clear group structure
- Very few groups (< 5)
- Groups are too small (< 3 observations per group)
- ICC is very low (< 0.1)

### Next Steps

In the next notebook, we'll explore:
- **Random slopes**: Allowing groups to have different slopes (not just intercepts)
- **Variance components**: Interpreting covariances between random effects
- **Model comparison**: When to use random slopes vs intercepts only