# Pick Rate × Win Rate Interaction Analysis for Champion Balance Prediction

## Executive Summary

This notebook investigates whether **interaction effects** between Pick Rate and Win Rate improve our ability to predict League of Legends champion balance changes (buffs/nerfs). While previous models in this project used these features independently, we hypothesize that their **combined effect** captures crucial information about meta dominance that Riot Games considers when making balance decisions.

---

## 1. Introduction & Motivation

### 1.1 Why Interaction Effects Matter for Game Balance

Riot Games' balance team doesn't evaluate champions in isolation. A champion's **win rate alone** doesn't tell the full story:

| Scenario | Win Rate | Pick Rate | Balance Implication |
|----------|----------|-----------|--------------------|
| **Meta Dominator** | High (>52%) | High (>10%) | **Prime nerf target** - Strong AND popular |
| **Hidden OP** | High (>52%) | Low (<3%) | Often ignored - only mains play well |
| **Popular but Weak** | Low (<48%) | High (>10%) | **Buff candidate** - Fun but underperforming |
| **Niche Pick** | Low (<48%) | Low (<3%) | Low priority - limited player impact |

The key insight: **Pick Rate × Win Rate interaction** captures meta dominance better than either metric alone.
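
To make the four scenarios concrete, here is a minimal sketch that labels a champion using the illustrative cut-offs from the table above. The function name `label_scenario` and the "Unclassified" fallback for champions that fall between the cut-offs are assumptions made for this example; they are not part of the project code.

```python
def label_scenario(win_rate: float, pick_rate: float) -> str:
    """Map a (win rate %, pick rate %) pair to a scenario from the table."""
    high_wr, low_wr = win_rate > 52.0, win_rate < 48.0
    high_pr, low_pr = pick_rate > 10.0, pick_rate < 3.0

    if high_wr and high_pr:
        return "Meta Dominator"
    if high_wr and low_pr:
        return "Hidden OP"
    if low_wr and high_pr:
        return "Popular but Weak"
    if low_wr and low_pr:
        return "Niche Pick"
    # The table leaves gaps (e.g. 48-52% win rate), so some champions match no row
    return "Unclassified"


# Hypothetical (win rate, pick rate) pairs, one per scenario plus one in the gap
examples = [(53.5, 14.0), (53.5, 1.5), (47.0, 12.0), (47.0, 2.0), (50.5, 6.0)]
labels = [label_scenario(wr, pr) for wr, pr in examples]
for (wr, pr), label in zip(examples, labels):
    print(f"WR={wr:.1f}%  PR={pr:.1f}%  ->  {label}")
```

The same 53.5% win rate lands in two different rows depending on pick rate, which is the joint pattern the interaction features are meant to expose directly.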

### 1.2 Research Hypotheses

**H1 (Primary):** Adding interaction terms (Pick Rate × Win Rate) will improve model performance over baseline models using only main effects.

**H2 (Secondary):** Champions in the "High Win Rate + High Pick Rate" quadrant will have significantly higher nerf probability than other quadrants.

**H3 (Exploratory):** The interaction effect will be among the top 3 most important features according to SHAP values.

### 1.3 Methodology Overview

1. **Baseline Models**: Train XGBoost, Logistic Regression, Random Forest with original features only
2. **Interaction Models**: Add engineered interaction features and quadrant classifications
3. **Evaluation**: Compare using accuracy, F1-score, and confusion matrices
4. **Interpretation**: Use SHAP values and partial dependence plots to understand feature contributions

---

## 2. Environment Setup & Dependencies

We import all necessary libraries upfront for reproducibility. Key packages:
- **pandas/numpy**: Data manipulation
- **scikit-learn**: Model training and evaluation
- **xgboost**: Gradient boosting classifier
- **shap**: Model interpretation
- **matplotlib/seaborn**: Visualization

In [None]:
# Core data science stack
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Machine Learning
from sklearn.model_selection import train_test_split, cross_val_score, StratifiedKFold, GridSearchCV
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (
    accuracy_score, f1_score, classification_report, 
    confusion_matrix, ConfusionMatrixDisplay
)

# XGBoost
import subprocess
import sys

try:
    from xgboost import XGBClassifier
except ImportError:
    print("XGBoost not available. Installing...")
    # Install into the interpreter running this kernel, not whichever `pip` is first on PATH
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'xgboost', '-q'])
    from xgboost import XGBClassifier
XGBOOST_AVAILABLE = True

# SHAP for model interpretation
try:
    import shap
except ImportError:
    print("SHAP not available. Installing...")
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'shap', '-q'])
    import shap
SHAP_AVAILABLE = True

# Set random seed for reproducibility
RANDOM_STATE = 42
np.random.seed(RANDOM_STATE)

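# Visualization settings
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 11

# Make sure the directory used by the plt.savefig() calls below exists
import os
os.makedirs('../reports/figures', exist_ok=True)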

print("Environment setup complete.")
print(f"Random State: {RANDOM_STATE}")

## 3. Data Loading & Initial Exploration

We'll use the combined Season 11 dataset which contains champion statistics across different rank tiers, merged with balance change labels. This dataset represents the temporal prediction problem: **using patch N statistics to predict patch N+1 changes**.
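
The "patch N statistics, patch N+1 label" framing can be sketched as follows. This is only an illustration of the alignment idea: the record layout, the champion name, and the helper `patch_key` are hypothetical, and the actual merge that produced `S11combined.csv` is not shown in this notebook.

```python
def patch_key(patch: str) -> tuple:
    """Turn '11.10' into (11, 10) so patches sort numerically, not as text."""
    major, minor = patch.split(".")
    return int(major), int(minor)


# Hypothetical per-patch statistics for one champion
stats = [
    {"champ": "ExampleChamp", "patch": "11.2", "winrate": 52.4},
    {"champ": "ExampleChamp", "patch": "11.10", "winrate": 49.8},
    {"champ": "ExampleChamp", "patch": "11.9", "winrate": 53.1},
]
# Hypothetical balance changes, keyed by the patch in which they shipped
changes = {("ExampleChamp", "11.10"): "nerf"}

stats.sort(key=lambda r: patch_key(r["patch"]))

labelled = []
for current, nxt in zip(stats, stats[1:]):
    # Features come from patch N; the label is what happened in the next observed patch
    label = changes.get((nxt["champ"], nxt["patch"]), "no change")
    labelled.append({**current, "change": label})

for row in labelled:
    print(row)
```

Sorting on the parsed tuple matters because a plain string sort would place `11.10` before `11.2`. The last observed patch has no successor, so it contributes no labelled row.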

In [None]:
# Load the combined Season 11 dataset
# This contains all champions across patches with buff/nerf/no change labels
DATA_PATH = '../data/processed/S11combined.csv'

df = pd.read_csv(DATA_PATH)

print(f"Dataset Shape: {df.shape[0]:,} rows × {df.shape[1]} columns")
print(f"\nColumn Names: {list(df.columns)}")
df.head(10)

### 3.1 Data Schema Understanding

Let's examine the data types, missing values, and basic statistics to ensure data quality before feature engineering.

In [None]:
# Data type and missing value inspection
print("=" * 60)
print("DATA TYPES AND MISSING VALUES")
print("=" * 60)
df.info()  # info() prints its report directly and returns None, so it should not be wrapped in print()

print("\n" + "=" * 60)
print("MISSING VALUES PER COLUMN")
print("=" * 60)
missing = df.isnull().sum()
print(missing[missing > 0] if missing.sum() > 0 else "No missing values detected.")

print("\n" + "=" * 60)
print("NUMERICAL FEATURE STATISTICS")
print("=" * 60)
df.describe().round(2)

### 3.2 Target Variable Distribution

Understanding class distribution is critical for:
1. Identifying class imbalance (which affects model training)
2. Choosing appropriate evaluation metrics
3. Deciding on stratification strategies

In [None]:
# Target variable distribution analysis
print("TARGET VARIABLE DISTRIBUTION")
print("=" * 60)

change_counts = df['change'].value_counts()
change_pcts = df['change'].value_counts(normalize=True) * 100

target_summary = pd.DataFrame({
    'Count': change_counts,
    'Percentage': change_pcts.round(2)
})
print(target_summary)

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Bar chart
colors = {'buff': '#2ecc71', 'nerf': '#e74c3c', 'no change': '#95a5a6', 'tweak': '#f39c12'}
color_list = [colors.get(x, '#333333') for x in change_counts.index]

axes[0].bar(change_counts.index, change_counts.values, color=color_list, edgecolor='black')
axes[0].set_xlabel('Balance Change Type')
axes[0].set_ylabel('Count')
axes[0].set_title('Distribution of Balance Changes (Season 11)')

# Add count labels on bars
for i, (idx, val) in enumerate(zip(change_counts.index, change_counts.values)):
    axes[0].text(i, val + 50, f'{val:,}', ha='center', fontweight='bold')

# Pie chart
axes[1].pie(change_counts.values, labels=change_counts.index, autopct='%1.1f%%',
            colors=color_list, explode=[0.02]*len(change_counts), startangle=90)
axes[1].set_title('Proportion of Balance Changes')

plt.tight_layout()
plt.savefig('../reports/figures/target_distribution.png', dpi=150, bbox_inches='tight')
plt.show()

print(f"\nFigure saved to: ../reports/figures/target_distribution.png")

### 3.3 Interpretation: Class Imbalance

**Key Observation:** The dataset is heavily imbalanced with "no change" being the majority class. This is expected—most champions don't receive changes in any given patch.

**Implications for Modeling:**
1. Accuracy alone will be misleading (predicting "no change" always would yield high accuracy)
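2. We should focus on **F1-score** (harmonic mean of precision and recall), and on macro-averaged F1 in particular, because weighted F1 is still dominated by the majority class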
3. Stratified sampling is essential for train/test splits
4. Consider class weights or oversampling techniques if needed
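
The first and last points can be illustrated with a small, self-contained sketch. The label counts below are invented purely to mimic a heavily imbalanced target; they are not taken from the dataset. The class-weight formula is the usual "balanced" heuristic, `n_samples / (n_classes * class_count)`.

```python
from collections import Counter

# Hypothetical label counts, chosen only to illustrate heavy imbalance
y_true = ["no change"] * 90 + ["buff"] * 6 + ["nerf"] * 4
y_pred = ["no change"] * len(y_true)  # a model that always predicts the majority class


def f1_for_class(y_true, y_pred, cls):
    """F1 for one class; defined as 0.0 when the class has no true positives."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


classes = sorted(set(y_true))
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro_f1 = sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)

# "Balanced" class weights: n_samples / (n_classes * class_count)
counts = Counter(y_true)
class_weights = {c: len(y_true) / (len(classes) * counts[c]) for c in classes}

print(f"accuracy = {accuracy:.3f}, macro F1 = {macro_f1:.3f}")
print(class_weights)
```

The always-"no change" predictor reaches 90% accuracy here while its macro F1 stays low, because the buff and nerf classes each contribute an F1 of zero.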

---

## 4. Feature Engineering: Creating Interaction Terms

This is the core innovation of this analysis. We'll create several interaction features:

1. **Multiplicative Interaction**: `pickrate × winrate` - Raw interaction term
2. **Quadrant Classification**: Categorical feature based on High/Low thresholds
3. **Win Rate Deviation**: How far a champion deviates from "balanced" (50% WR), both signed and absolute
4. **Weighted Deviation**: Win rate deviation multiplied by pick rate, so popular imbalanced champions score higher
5. **Ban-Adjusted Interactions**: `winrate × banrate` and `pickrate × banrate`
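
As a quick worked example of why a centered deviation is worth having next to the raw product, consider two made-up champions (the names and numbers are hypothetical). Their raw `winrate × pickrate` values are the same, yet only one of them is above the 50% balance line.

```python
# Two hypothetical champions with the same raw product but different balance states
champions = {
    "balanced_popular": {"winrate": 50.0, "pickrate": 10.4},
    "strong_popular": {"winrate": 52.0, "pickrate": 10.0},
}

for name, c in champions.items():
    c["wr_x_pr"] = c["winrate"] * c["pickrate"]                   # raw interaction
    c["wr_deviation"] = c["winrate"] - 50.0                       # signed distance from 50%
    c["weighted_deviation"] = c["wr_deviation"] * c["pickrate"]   # deviation scaled by popularity
    print(name, c)
```

Both rows have a raw product of about 520, while the pick-rate-weighted deviation is 0 for the balanced champion and 20 for the strong one. The raw product alone therefore mostly reflects popularity; the centered version isolates the "strong and popular" signal.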

In [None]:
# Create a working copy of the dataframe
df_features = df.copy()

# ============================================
# INTERACTION FEATURE 1: Multiplicative Term
# ============================================
# Raw interaction: captures combined effect of popularity and strength
df_features['wr_x_pr'] = df_features['winrate'] * df_features['pickrate']

# ============================================
# INTERACTION FEATURE 2: Quadrant Classification
# ============================================
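# Define thresholds: 50% win rate as the "balanced" line and the median pick rate,
# so the four quadrants cover every row (the cut-offs in the Section 1.1 table are
# illustrative only and leave gaps between them)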
WR_THRESHOLD = 50.0  # Balanced win rate
PR_THRESHOLD = df_features['pickrate'].median()  # Median pick rate as threshold

print(f"Win Rate Threshold: {WR_THRESHOLD}%")
print(f"Pick Rate Threshold (median): {PR_THRESHOLD:.2f}%")

def classify_quadrant(row):
    """Classify champion into balance quadrant based on WR/PR thresholds."""
    high_wr = row['winrate'] >= WR_THRESHOLD
    high_pr = row['pickrate'] >= PR_THRESHOLD
    
    if high_wr and high_pr:
        return 'High_WR_High_PR'  # Meta dominator - nerf candidate
    elif high_wr and not high_pr:
        return 'High_WR_Low_PR'   # Hidden strong - often ignored
    elif not high_wr and high_pr:
        return 'Low_WR_High_PR'   # Popular but weak - buff candidate
    else:
        return 'Low_WR_Low_PR'    # Niche pick - low priority

df_features['quadrant'] = df_features.apply(classify_quadrant, axis=1)

# ============================================
# INTERACTION FEATURE 3: Win Rate Deviation
# ============================================
# How far from 50% (balanced) is the champion?
df_features['wr_deviation'] = df_features['winrate'] - 50.0
df_features['wr_deviation_abs'] = np.abs(df_features['wr_deviation'])

# ============================================
# INTERACTION FEATURE 4: Weighted Deviation
# ============================================
# Deviation weighted by pick rate (popular imbalanced champs matter more)
df_features['weighted_deviation'] = df_features['wr_deviation'] * df_features['pickrate']

# ============================================
# INTERACTION FEATURE 5: Ban-adjusted interaction
# ============================================
# High ban rate often indicates frustration, which can lead to nerfs
df_features['wr_x_br'] = df_features['winrate'] * df_features['banrate']
df_features['pr_x_br'] = df_features['pickrate'] * df_features['banrate']

print("\n" + "=" * 60)
print("ENGINEERED FEATURES SUMMARY")
print("=" * 60)
new_features = ['wr_x_pr', 'quadrant', 'wr_deviation', 'wr_deviation_abs', 
                'weighted_deviation', 'wr_x_br', 'pr_x_br']
print(f"New features created: {new_features}")
print(f"\nDataset now has {df_features.shape[1]} columns")

df_features[['champ', 'role', 'winrate', 'pickrate', 'banrate', 'wr_x_pr', 'quadrant', 'change']].head(10)

### 4.1 Quadrant Distribution Analysis

Let's examine how champions are distributed across our quadrants and how this relates to actual balance changes.

In [None]:
# Quadrant distribution
print("QUADRANT DISTRIBUTION")
print("=" * 60)
quadrant_counts = df_features['quadrant'].value_counts()
print(quadrant_counts)

# Cross-tabulation: Quadrant vs Change
print("\n" + "=" * 60)
print("QUADRANT × BALANCE CHANGE CROSS-TABULATION")
print("=" * 60)
crosstab = pd.crosstab(df_features['quadrant'], df_features['change'], margins=True)
print(crosstab)

# Percentage within each quadrant
print("\n" + "=" * 60)
print("BALANCE CHANGE % WITHIN EACH QUADRANT")
print("=" * 60)
crosstab_pct = pd.crosstab(df_features['quadrant'], df_features['change'], normalize='index') * 100
print(crosstab_pct.round(2))

### 4.2 Interpretation: Quadrant Analysis

This cross-tabulation reveals how Riot's balance philosophy manifests in the data. We expect:
- **High_WR_High_PR**: Higher nerf rates (meta dominators need adjustment)
- **High_WR_Low_PR**: Often ignored (strong mainly in the hands of dedicated mains)
- **Low_WR_High_PR**: Higher buff rates (popular champions shouldn't feel weak)
- **Low_WR_Low_PR**: Often ignored (low player impact)

Let's visualize this relationship more clearly.
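
H2 asks whether nerf rates differ *significantly* across quadrants, which a cross-tabulation alone does not answer. One common option is a chi-square test of independence; the sketch below computes the statistic by hand on invented counts (they are not results from this dataset). In practice `scipy.stats.chi2_contingency` performs the same calculation and also returns a p-value.

```python
# Hypothetical counts per quadrant: (nerfed, not nerfed). Not real results.
observed = {
    "High_WR_High_PR": (30, 170),
    "High_WR_Low_PR": (10, 190),
    "Low_WR_High_PR": (8, 192),
    "Low_WR_Low_PR": (6, 194),
}

total = sum(sum(row) for row in observed.values())
col_totals = [sum(row[j] for row in observed.values()) for j in range(2)]

# Nerf rate within each quadrant (what normalize='index' gives in pd.crosstab)
nerf_rate = {q: row[0] / sum(row) for q, row in observed.items()}

chi2 = 0.0
for row in observed.values():
    row_total = sum(row)
    for j, obs in enumerate(row):
        expected = row_total * col_totals[j] / total  # count expected under independence
        chi2 += (obs - expected) ** 2 / expected

dof = (len(observed) - 1) * (2 - 1)
CRITICAL_95_DOF3 = 7.815  # chi-square critical value for alpha = 0.05, 3 degrees of freedom

print({q: round(r, 3) for q, r in nerf_rate.items()})
print(f"chi2 = {chi2:.2f}, dof = {dof}, reject independence: {chi2 > CRITICAL_95_DOF3}")
```

One caveat: rows for the same champion across rank tiers and patches are not independent observations, so a test like this is better read as descriptive than as a strict significance claim.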

In [None]:
# Visualization: Balance changes by quadrant
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Stacked bar chart
crosstab_plot = crosstab_pct.drop('All', errors='ignore')
crosstab_plot.plot(kind='bar', stacked=True, ax=axes[0], 
                   color=['#2ecc71', '#e74c3c', '#95a5a6', '#f39c12'],
                   edgecolor='black')
axes[0].set_xlabel('Win Rate / Pick Rate Quadrant')
axes[0].set_ylabel('Percentage (%)')
axes[0].set_title('Balance Change Distribution by Quadrant')
axes[0].legend(title='Change Type', bbox_to_anchor=(1.02, 1))
axes[0].tick_params(axis='x', rotation=45)

# Heatmap of buff/nerf rates
# Focus on buff and nerf only
buff_nerf_pct = crosstab_pct[['buff', 'nerf']].copy() if 'buff' in crosstab_pct.columns else crosstab_pct
sns.heatmap(buff_nerf_pct, annot=True, fmt='.1f', cmap='RdYlGn_r', 
            ax=axes[1], cbar_kws={'label': 'Percentage (%)'})
axes[1].set_title('Buff/Nerf Rates by Quadrant (Heatmap)')
axes[1].set_xlabel('Change Type')
axes[1].set_ylabel('Quadrant')

plt.tight_layout()
plt.savefig('../reports/figures/quadrant_analysis.png', dpi=150, bbox_inches='tight')
plt.show()

print(f"\nFigure saved to: ../reports/figures/quadrant_analysis.png")

---

## 5. Exploratory Data Analysis: Visualizing Interactions

Before building models, let's visualize the relationship between Pick Rate, Win Rate, and balance outcomes to build intuition.

In [None]:
# Scatter plot: Win Rate vs Pick Rate, colored by change outcome
fig, axes = plt.subplots(1, 2, figsize=(16, 7))

# Filter to buff/nerf only for clearer visualization
df_buff_nerf = df_features[df_features['change'].isin(['buff', 'nerf'])].copy()

# Plot 1: All data points
scatter_colors = {'buff': '#2ecc71', 'nerf': '#e74c3c', 'no change': '#bdc3c7', 'tweak': '#f39c12'}
for change_type in df_features['change'].unique():
    subset = df_features[df_features['change'] == change_type]
    alpha = 0.8 if change_type in ['buff', 'nerf'] else 0.3
    axes[0].scatter(subset['pickrate'], subset['winrate'], 
                   c=scatter_colors.get(change_type, '#333'),
                   label=change_type, alpha=alpha, s=30, edgecolors='white', linewidth=0.5)

# Add quadrant lines
axes[0].axhline(y=WR_THRESHOLD, color='black', linestyle='--', alpha=0.5, label=f'WR={WR_THRESHOLD}%')
axes[0].axvline(x=PR_THRESHOLD, color='black', linestyle='--', alpha=0.5, label=f'PR={PR_THRESHOLD:.1f}%')

# Add quadrant labels
axes[0].text(PR_THRESHOLD + 5, WR_THRESHOLD + 3, 'NERF ZONE\n(High WR, High PR)', fontsize=9, color='#c0392b', fontweight='bold')
axes[0].text(1, WR_THRESHOLD + 3, 'Hidden Strong', fontsize=9, color='#27ae60')
axes[0].text(PR_THRESHOLD + 5, WR_THRESHOLD - 5, 'BUFF ZONE\n(Low WR, High PR)', fontsize=9, color='#27ae60', fontweight='bold')
axes[0].text(1, WR_THRESHOLD - 5, 'Low Priority', fontsize=9, color='gray')

axes[0].set_xlabel('Pick Rate (%)', fontsize=12)
axes[0].set_ylabel('Win Rate (%)', fontsize=12)
axes[0].set_title('Champion Balance Landscape: Win Rate vs Pick Rate', fontsize=14)
axes[0].legend(loc='upper right')
axes[0].set_xlim(0, df_features['pickrate'].max() * 1.1)
axes[0].set_ylim(df_features['winrate'].min() - 2, df_features['winrate'].max() + 2)

# Plot 2: Buff vs Nerf only with interaction term as size
if len(df_buff_nerf) > 0:
    for change_type in ['buff', 'nerf']:
        subset = df_buff_nerf[df_buff_nerf['change'] == change_type]
        # Size based on interaction term, normalized by the overall max so that
        # marker sizes are comparable between the buff and nerf groups
        sizes = (subset['wr_x_pr'] / df_buff_nerf['wr_x_pr'].max()) * 200 + 20
        axes[1].scatter(subset['pickrate'], subset['winrate'],
                       c=scatter_colors[change_type],
                       s=sizes, alpha=0.7, label=change_type,
                       edgecolors='black', linewidth=0.5)

    axes[1].axhline(y=WR_THRESHOLD, color='black', linestyle='--', alpha=0.5)
    axes[1].axvline(x=PR_THRESHOLD, color='black', linestyle='--', alpha=0.5)
    axes[1].set_xlabel('Pick Rate (%)', fontsize=12)
    axes[1].set_ylabel('Win Rate (%)', fontsize=12)
    axes[1].set_title('Buff vs Nerf Champions\n(Size = WR × PR Interaction)', fontsize=14)
    axes[1].legend(loc='upper right')

plt.tight_layout()
plt.savefig('../reports/figures/wr_pr_scatter.png', dpi=150, bbox_inches='tight')
plt.show()

print(f"\nFigure saved to: ../reports/figures/wr_pr_scatter.png")

### 5.1 Distribution of Interaction Term by Class

Let's examine whether the interaction term (WR × PR) differs significantly between buff, nerf, and no change classes.

In [None]:
# Distribution of interaction term by change type
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# Box plot
change_order = ['buff', 'nerf', 'no change', 'tweak']
existing_changes = [c for c in change_order if c in df_features['change'].unique()]

sns.boxplot(data=df_features, x='change', y='wr_x_pr', order=existing_changes,
            palette=scatter_colors, ax=axes[0])
axes[0].set_xlabel('Balance Change')
axes[0].set_ylabel('Win Rate × Pick Rate')
axes[0].set_title('Interaction Term Distribution by Change Type')

# Violin plot for more detail
sns.violinplot(data=df_features, x='change', y='wr_x_pr', order=existing_changes,
               palette=scatter_colors, ax=axes[1], inner='quartile')
axes[1].set_xlabel('Balance Change')
axes[1].set_ylabel('Win Rate × Pick Rate')
axes[1].set_title('Interaction Term Distribution (Violin Plot)')

# Histogram overlays
for change_type in ['buff', 'nerf']:
    subset = df_features[df_features['change'] == change_type]['wr_x_pr']
    axes[2].hist(subset, bins=30, alpha=0.5, label=change_type, 
                 color=scatter_colors[change_type], edgecolor='black')
axes[2].set_xlabel('Win Rate × Pick Rate')
axes[2].set_ylabel('Frequency')
axes[2].set_title('Interaction Term: Buff vs Nerf')
axes[2].legend()

plt.tight_layout()
plt.savefig('../reports/figures/interaction_distributions.png', dpi=150, bbox_inches='tight')
plt.show()

# Statistical summary
print("\n" + "=" * 60)
print("INTERACTION TERM STATISTICS BY CHANGE TYPE")
print("=" * 60)
print(df_features.groupby('change')['wr_x_pr'].describe().round(2))

### 5.2 Correlation Analysis

Understanding feature correlations helps us:
1. Identify potential multicollinearity issues
2. Understand feature relationships
3. Validate our interaction term construction

In [None]:
# Correlation heatmap of numerical features
numerical_cols = ['winrate', 'pickrate', 'banrate', 'wr_x_pr', 'wr_deviation', 
                  'weighted_deviation', 'wr_x_br', 'pr_x_br']
existing_num_cols = [c for c in numerical_cols if c in df_features.columns]

corr_matrix = df_features[existing_num_cols].corr()

fig, ax = plt.subplots(figsize=(12, 10))
mask = np.triu(np.ones_like(corr_matrix, dtype=bool))
sns.heatmap(corr_matrix, annot=True, fmt='.2f', cmap='coolwarm', center=0,
            mask=mask, square=True, ax=ax, cbar_kws={'label': 'Correlation'})
ax.set_title('Feature Correlation Matrix', fontsize=14)

plt.tight_layout()
plt.savefig('../reports/figures/correlation_heatmap.png', dpi=150, bbox_inches='tight')
plt.show()

print(f"\nFigure saved to: ../reports/figures/correlation_heatmap.png")

### 5.3 EDA Key Findings

Before proceeding to modeling, let's summarize our exploratory findings:

1. **Quadrant Distribution**: Champions are distributed across all quadrants, with observable patterns in buff/nerf rates
2. **Interaction Term**: Shows differentiation between buff and nerf classes
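3. **Correlations**: Pick rate and win rate are weakly correlated, so neither is a proxy for the other; whether the interaction terms add information beyond `pickrate` itself should be read off the `wr_x_pr` and `weighted_deviation` rows of the heatmap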
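
One thing worth checking in the heatmap is how closely `wr_x_pr` tracks `pickrate`. When win rates sit within a few points of 50%, the raw product is roughly `50 × pickrate`, so it can be almost collinear with pick rate, while the centered `weighted_deviation` is not. The sketch below illustrates this on made-up numbers (not dataset values), using a hand-written Pearson correlation.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


# Hypothetical values: win rates close to 50%, pick rates spread widely
winrate = [48.5, 51.2, 49.8, 50.6, 52.1, 47.9]
pickrate = [2.0, 12.5, 5.1, 8.3, 1.2, 15.0]

wr_x_pr = [w * p for w, p in zip(winrate, pickrate)]
weighted_deviation = [(w - 50.0) * p for w, p in zip(winrate, pickrate)]

r_product = pearson(wr_x_pr, pickrate)
r_weighted = pearson(weighted_deviation, pickrate)
print(f"corr(wr_x_pr, pickrate)            = {r_product:.3f}")
print(f"corr(weighted_deviation, pickrate) = {r_weighted:.3f}")
```

If the real heatmap shows the same pattern, the raw product mostly duplicates pick rate for a linear model, and the centered features are the ones more likely to carry new information.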

---

## 6. Data Preparation for Modeling

Now we'll prepare our data for machine learning:
1. Encode categorical variables
2. Define feature sets (baseline vs. interaction)
3. Create stratified train/test splits
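
A note on step 1: `LabelEncoder` assigns integer codes in sorted (alphabetical) class order, which is harmless for the target but gives an arbitrary ordering for an ordinal feature such as rank tier. The sketch below contrasts that with an explicit mapping. The tier names are assumptions for illustration; the actual values in `df['rank']` may differ.

```python
# Hypothetical tier names; the actual values of df['rank'] may differ
tiers_low_to_high = ["Iron", "Bronze", "Silver", "Gold", "Platinum", "Diamond"]

# LabelEncoder assigns codes by sorted (alphabetical) class order
alphabetical_codes = {tier: code for code, tier in enumerate(sorted(tiers_low_to_high))}

# An explicit mapping keeps the natural ordering of the ladder
ordinal_codes = {tier: code for code, tier in enumerate(tiers_low_to_high)}

print("alphabetical:", alphabetical_codes)
print("ordinal:     ", ordinal_codes)
```

Tree-based models can split around an arbitrary ordering, but a linear model such as logistic regression treats the code as a magnitude, so the explicit mapping (or one-hot encoding) is usually the safer choice there.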

In [None]:
# Encode categorical variables
le_rank = LabelEncoder()
le_quadrant = LabelEncoder()
le_role = LabelEncoder()
le_change = LabelEncoder()

df_model = df_features.copy()

# Encode rank tier
df_model['rank_encoded'] = le_rank.fit_transform(df_model['rank'])

# Encode quadrant
df_model['quadrant_encoded'] = le_quadrant.fit_transform(df_model['quadrant'])

# Encode role
df_model['role_encoded'] = le_role.fit_transform(df_model['role'])

# Encode target variable
df_model['change_encoded'] = le_change.fit_transform(df_model['change'])

print("ENCODING MAPPINGS")
print("=" * 60)
print(f"Rank: {dict(zip(le_rank.classes_, range(len(le_rank.classes_))))}")
print(f"Quadrant: {dict(zip(le_quadrant.classes_, range(len(le_quadrant.classes_))))}")
print(f"Role: {dict(zip(le_role.classes_, range(len(le_role.classes_))))}")
print(f"Change (Target): {dict(zip(le_change.classes_, range(len(le_change.classes_))))}")

# Define feature sets
BASELINE_FEATURES = ['winrate', 'pickrate', 'banrate', 'rank_encoded']
INTERACTION_FEATURES = BASELINE_FEATURES + ['wr_x_pr', 'quadrant_encoded', 'wr_deviation', 
                                             'weighted_deviation', 'wr_x_br', 'pr_x_br']

print(f"\nBaseline Features ({len(BASELINE_FEATURES)}): {BASELINE_FEATURES}")
print(f"Interaction Features ({len(INTERACTION_FEATURES)}): {INTERACTION_FEATURES}")

In [None]:
# Prepare feature matrices and target vector
X_baseline = df_model[BASELINE_FEATURES].values
X_interaction = df_model[INTERACTION_FEATURES].values
y = df_model['change_encoded'].values

print(f"Baseline feature matrix shape: {X_baseline.shape}")
print(f"Interaction feature matrix shape: {X_interaction.shape}")
print(f"Target vector shape: {y.shape}")

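# Stratified train-test split: split the row indices once so that both feature
# sets are guaranteed to share exactly the same train/test rows
TEST_SIZE = 0.3

train_idx, test_idx = train_test_split(
    np.arange(len(y)), test_size=TEST_SIZE, random_state=RANDOM_STATE, stratify=y
)

X_train_base, X_test_base = X_baseline[train_idx], X_baseline[test_idx]
X_train_int, X_test_int = X_interaction[train_idx], X_interaction[test_idx]
y_train, y_test = y[train_idx], y[test_idx]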

print(f"\nTrain set size: {len(y_train):,} ({(1-TEST_SIZE)*100:.0f}%)")
print(f"Test set size: {len(y_test):,} ({TEST_SIZE*100:.0f}%)")

# Verify stratification
print(f"\nClass distribution in training set:")
unique, counts = np.unique(y_train, return_counts=True)
for u, c in zip(unique, counts):
    print(f"  {le_change.inverse_transform([u])[0]}: {c} ({c/len(y_train)*100:.1f}%)")

---

## 7. Baseline Models (Without Interaction Terms)

We'll establish baseline performance using three different algorithms:
1. **XGBoost**: Gradient boosting ensemble
2. **Random Forest**: Bagging ensemble
3. **Logistic Regression**: Linear baseline

These baselines use only the original features: win rate, pick rate, ban rate, and rank tier.
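
Of the three models, only logistic regression is trained on standardized features. As a minimal sketch of what that step does, the example below standardizes a made-up pick-rate column using statistics learned from the training split only, which is the fit/transform pattern `StandardScaler` follows. The numbers are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical pick rates (%) for a training and a test split
train_pickrate = [1.0, 3.0, 5.0, 7.0, 9.0]
test_pickrate = [2.0, 12.0]

# Fit: learn mean and (population) standard deviation from the training data only
mu, sigma = mean(train_pickrate), pstdev(train_pickrate)

# Transform: apply the same training statistics to both splits
train_scaled = [(x - mu) / sigma for x in train_pickrate]
test_scaled = [(x - mu) / sigma for x in test_pickrate]

print(f"mu = {mu:.2f}, sigma = {sigma:.3f}")
print("test_scaled:", [round(z, 3) for z in test_scaled])
```

The same reasoning applies inside cross-validation: if the scaler is fitted on the full training set before the folds are made, each validation fold has already influenced the scaling. Wrapping the scaler and the model in a scikit-learn `Pipeline` avoids that.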

In [None]:
# Initialize baseline models
baseline_models = {
    'XGBoost': XGBClassifier(
        n_estimators=100,
        max_depth=6,
        learning_rate=0.1,
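        random_state=RANDOM_STATE,
        # use_label_encoder is omitted: it is deprecated and ignored by recent
        # XGBoost releases, and the target is already integer-encoded
        eval_metric='mlogloss'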
    ),
    'Random Forest': RandomForestClassifier(
        n_estimators=100,
        max_depth=10,
        random_state=RANDOM_STATE,
        n_jobs=-1
    ),
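    # The default lbfgs solver already fits a multinomial model for multiclass
    # targets; the explicit multi_class argument is deprecated in recent scikit-learn
    'Logistic Regression': LogisticRegression(
        max_iter=1000,
        random_state=RANDOM_STATE
    )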
}

# Scale features for Logistic Regression
scaler = StandardScaler()
X_train_base_scaled = scaler.fit_transform(X_train_base)
X_test_base_scaled = scaler.transform(X_test_base)

# Train and evaluate baseline models
print("BASELINE MODEL TRAINING (Original Features Only)")
print("=" * 60)

baseline_results = {}

for name, model in baseline_models.items():
    print(f"\nTraining {name}...")
    
    # Use scaled features for Logistic Regression
    if name == 'Logistic Regression':
        X_tr, X_te = X_train_base_scaled, X_test_base_scaled
    else:
        X_tr, X_te = X_train_base, X_test_base
    
    # Train
    model.fit(X_tr, y_train)
    
    # Predict
    y_pred = model.predict(X_te)
    
    # Evaluate
    accuracy = accuracy_score(y_test, y_pred)
    f1_macro = f1_score(y_test, y_pred, average='macro')
    f1_weighted = f1_score(y_test, y_pred, average='weighted')
    
    # Cross-validation
    cv_scores = cross_val_score(model, X_tr, y_train, cv=5, scoring='accuracy')
    
    baseline_results[name] = {
        'model': model,
        'accuracy': accuracy,
        'f1_macro': f1_macro,
        'f1_weighted': f1_weighted,
        'cv_mean': cv_scores.mean(),
        'cv_std': cv_scores.std(),
        'predictions': y_pred
    }
    
    print(f"  Accuracy: {accuracy:.4f}")
    print(f"  F1 (macro): {f1_macro:.4f}")
    print(f"  F1 (weighted): {f1_weighted:.4f}")
    print(f"  CV Accuracy: {cv_scores.mean():.4f} (+/- {cv_scores.std()*2:.4f})")

### 7.1 Baseline Feature Importance

Let's examine which original features are most important for the baseline XGBoost model.

In [None]:
# Feature importance for baseline XGBoost
xgb_baseline = baseline_results['XGBoost']['model']
importance_baseline = pd.DataFrame({
    'feature': BASELINE_FEATURES,
    'importance': xgb_baseline.feature_importances_
}).sort_values('importance', ascending=True)

fig, ax = plt.subplots(figsize=(10, 5))
ax.barh(importance_baseline['feature'], importance_baseline['importance'], color='steelblue', edgecolor='black')
ax.set_xlabel('Feature Importance (Gain)')
ax.set_title('Baseline XGBoost: Feature Importance')

for i, (feat, imp) in enumerate(zip(importance_baseline['feature'], importance_baseline['importance'])):
    ax.text(imp + 0.01, i, f'{imp:.3f}', va='center')

plt.tight_layout()
plt.savefig('../reports/figures/baseline_feature_importance.png', dpi=150, bbox_inches='tight')
plt.show()

print("\nBaseline Feature Importance (XGBoost):")
print(importance_baseline.sort_values('importance', ascending=False).to_string(index=False))

---

## 8. Interaction Models (With Engineered Features)

Now we'll train the same models with our interaction features added. This allows us to directly compare performance and measure the value of interaction terms.

In [None]:
# Initialize interaction models (same architectures)
interaction_models = {
    'XGBoost': XGBClassifier(
        n_estimators=100,
        max_depth=6,
        learning_rate=0.1,
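        random_state=RANDOM_STATE,
        # use_label_encoder is omitted: it is deprecated and ignored by recent
        # XGBoost releases, and the target is already integer-encoded
        eval_metric='mlogloss'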
    ),
    'Random Forest': RandomForestClassifier(
        n_estimators=100,
        max_depth=10,
        random_state=RANDOM_STATE,
        n_jobs=-1
    ),
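    # The default lbfgs solver already fits a multinomial model for multiclass
    # targets; the explicit multi_class argument is deprecated in recent scikit-learn
    'Logistic Regression': LogisticRegression(
        max_iter=1000,
        random_state=RANDOM_STATE
    )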
}

# Scale interaction features for Logistic Regression
scaler_int = StandardScaler()
X_train_int_scaled = scaler_int.fit_transform(X_train_int)
X_test_int_scaled = scaler_int.transform(X_test_int)

# Train and evaluate interaction models
print("INTERACTION MODEL TRAINING (With Engineered Features)")
print("=" * 60)

interaction_results = {}

for name, model in interaction_models.items():
    print(f"\nTraining {name} with interaction features...")
    
    # Use scaled features for Logistic Regression
    if name == 'Logistic Regression':
        X_tr, X_te = X_train_int_scaled, X_test_int_scaled
    else:
        X_tr, X_te = X_train_int, X_test_int
    
    # Train
    model.fit(X_tr, y_train)
    
    # Predict
    y_pred = model.predict(X_te)
    
    # Evaluate
    accuracy = accuracy_score(y_test, y_pred)
    f1_macro = f1_score(y_test, y_pred, average='macro')
    f1_weighted = f1_score(y_test, y_pred, average='weighted')
    
    # Cross-validation
    cv_scores = cross_val_score(model, X_tr, y_train, cv=5, scoring='accuracy')
    
    interaction_results[name] = {
        'model': model,
        'accuracy': accuracy,
        'f1_macro': f1_macro,
        'f1_weighted': f1_weighted,
        'cv_mean': cv_scores.mean(),
        'cv_std': cv_scores.std(),
        'predictions': y_pred
    }
    
    print(f"  Accuracy: {accuracy:.4f}")
    print(f"  F1 (macro): {f1_macro:.4f}")
    print(f"  F1 (weighted): {f1_weighted:.4f}")
    print(f"  CV Accuracy: {cv_scores.mean():.4f} (+/- {cv_scores.std()*2:.4f})")

### 8.1 Hyperparameter Tuning for Best Interaction Model

Let's perform a basic grid search to optimize the XGBoost model with interaction features.
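
Before launching a grid search it helps to know how many model fits it implies. The sketch below enumerates the grid used in this section with `itertools.product` and multiplies by the number of CV folds; it only counts fits and does not train anything.

```python
from itertools import product

# Same grid as in the tuning cell of this section
param_grid = {
    "n_estimators": [50, 100, 150],
    "max_depth": [4, 6, 8],
    "learning_rate": [0.05, 0.1, 0.2],
    "min_child_weight": [1, 3, 5],
}
n_splits = 5  # folds in the StratifiedKFold strategy

combinations = list(product(*param_grid.values()))
n_fits = len(combinations) * n_splits

print(f"{len(combinations)} parameter combinations x {n_splits} folds = {n_fits} fits")
print("first combination:", dict(zip(param_grid, combinations[0])))
```

With `refit=True` (the `GridSearchCV` default) there is one additional fit of the best configuration on the full training set.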

In [None]:
# Hyperparameter tuning for XGBoost with interaction features
print("HYPERPARAMETER TUNING: XGBoost with Interaction Features")
print("=" * 60)

param_grid = {
    'n_estimators': [50, 100, 150],
    'max_depth': [4, 6, 8],
    'learning_rate': [0.05, 0.1, 0.2],
    'min_child_weight': [1, 3, 5]
}

# use_label_encoder is omitted: it is deprecated and ignored by recent XGBoost releases
xgb_tuning = XGBClassifier(
    random_state=RANDOM_STATE,
    eval_metric='mlogloss'
)

# Use StratifiedKFold for cross-validation
cv_strategy = StratifiedKFold(n_splits=5, shuffle=True, random_state=RANDOM_STATE)

grid_search = GridSearchCV(
    xgb_tuning, param_grid, cv=cv_strategy, 
    scoring='f1_weighted', n_jobs=-1, verbose=1
)

print("Running grid search (this may take a moment)...")
grid_search.fit(X_train_int, y_train)

print(f"\nBest Parameters: {grid_search.best_params_}")
print(f"Best CV F1 (weighted): {grid_search.best_score_:.4f}")

# Evaluate best model on test set
best_xgb = grid_search.best_estimator_
y_pred_tuned = best_xgb.predict(X_test_int)

accuracy_tuned = accuracy_score(y_test, y_pred_tuned)
f1_tuned = f1_score(y_test, y_pred_tuned, average='weighted')

print(f"\nTuned Model Test Performance:")
print(f"  Accuracy: {accuracy_tuned:.4f}")
print(f"  F1 (weighted): {f1_tuned:.4f}")

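# Store tuned model results
# best_score_ is a weighted-F1 score, so compute CV accuracy separately to keep
# 'cv_mean'/'cv_std' comparable with the other models (which use scoring='accuracy')
cv_scores_tuned = cross_val_score(best_xgb, X_train_int, y_train, cv=5, scoring='accuracy')

interaction_results['XGBoost (Tuned)'] = {
    'model': best_xgb,
    'accuracy': accuracy_tuned,
    'f1_macro': f1_score(y_test, y_pred_tuned, average='macro'),
    'f1_weighted': f1_tuned,
    'cv_mean': cv_scores_tuned.mean(),
    'cv_std': cv_scores_tuned.std(),
    'predictions': y_pred_tuned
}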

---

## 9. Model Comparison: Baseline vs. Interaction

Now we'll directly compare the baseline models against the interaction models to quantify the improvement (or lack thereof) from adding interaction terms.
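
A difference in percentage points says how large the gap is, not whether it exceeds noise. One option for two classifiers scored on the same test set is McNemar's test, which looks only at the rows where exactly one model is correct. The sketch below uses invented labels and predictions (not results from this notebook) and the continuity-corrected statistic; with very few discordant rows an exact binomial test would be more appropriate.

```python
# Hypothetical labels and predictions on a shared test set (not real results)
y_true = [0, 1, 2, 2, 1, 0, 2, 2, 1, 2, 0, 2]
pred_base = [0, 2, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2]
pred_inter = [0, 1, 2, 2, 1, 0, 2, 1, 1, 2, 0, 2]

acc_base = sum(t == p for t, p in zip(y_true, pred_base)) / len(y_true)
acc_inter = sum(t == p for t, p in zip(y_true, pred_inter)) / len(y_true)
diff_pp = (acc_inter - acc_base) * 100  # percentage points, not percent

# Discordant pairs: rows where exactly one of the two models is correct
only_base = sum(t == b and t != i for t, b, i in zip(y_true, pred_base, pred_inter))
only_inter = sum(t != b and t == i for t, b, i in zip(y_true, pred_base, pred_inter))

# McNemar statistic with continuity correction (compare against chi-square, 1 dof)
n_discordant = only_base + only_inter
mcnemar = (abs(only_base - only_inter) - 1) ** 2 / n_discordant if n_discordant else 0.0
CRITICAL_95_DOF1 = 3.841  # chi-square critical value for alpha = 0.05, 1 degree of freedom

print(f"accuracy: {acc_base:.3f} -> {acc_inter:.3f} ({diff_pp:+.1f} pp)")
print(f"discordant rows: base-only={only_base}, interaction-only={only_inter}")
print(f"McNemar = {mcnemar:.2f}, significant at 5%: {mcnemar > CRITICAL_95_DOF1}")
```

In this toy case a 25 pp accuracy gap on twelve rows does not clear the 5% threshold, which is a reminder to read the improvements below alongside the test-set size.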

In [None]:
# Create comparison dataframe
comparison_data = []

for name in ['XGBoost', 'Random Forest', 'Logistic Regression']:
    base = baseline_results[name]
    inter = interaction_results[name]
    
    comparison_data.append({
        'Model': name,
        'Feature Set': 'Baseline',
        'Accuracy': base['accuracy'],
        'F1 (Macro)': base['f1_macro'],
        'F1 (Weighted)': base['f1_weighted'],
        'CV Accuracy': base['cv_mean']
    })
    comparison_data.append({
        'Model': name,
        'Feature Set': 'Interaction',
        'Accuracy': inter['accuracy'],
        'F1 (Macro)': inter['f1_macro'],
        'F1 (Weighted)': inter['f1_weighted'],
        'CV Accuracy': inter['cv_mean']
    })

# Add tuned XGBoost
if 'XGBoost (Tuned)' in interaction_results:
    tuned = interaction_results['XGBoost (Tuned)']
    comparison_data.append({
        'Model': 'XGBoost (Tuned)',
        'Feature Set': 'Interaction',
        'Accuracy': tuned['accuracy'],
        'F1 (Macro)': tuned['f1_macro'],
        'F1 (Weighted)': tuned['f1_weighted'],
        'CV Accuracy': tuned['cv_mean']
    })

comparison_df = pd.DataFrame(comparison_data)

print("MODEL COMPARISON: BASELINE vs INTERACTION")
print("=" * 80)
print(comparison_df.to_string(index=False))

# Calculate improvement
print("\n" + "=" * 80)
print("IMPROVEMENT FROM INTERACTION FEATURES")
print("=" * 80)

for model_name in ['XGBoost', 'Random Forest', 'Logistic Regression']:
    base_acc = baseline_results[model_name]['accuracy']
    inter_acc = interaction_results[model_name]['accuracy']
    base_f1 = baseline_results[model_name]['f1_weighted']
    inter_f1 = interaction_results[model_name]['f1_weighted']
    
    acc_diff = (inter_acc - base_acc) * 100
    f1_diff = (inter_f1 - base_f1) * 100
    
    print(f"\n{model_name}:")
    print(f"  Accuracy: {base_acc:.4f} → {inter_acc:.4f} ({acc_diff:+.2f} pp)")
    print(f"  F1 (weighted): {base_f1:.4f} → {inter_f1:.4f} ({f1_diff:+.2f} pp)")

In [None]:
# Visualization: Model comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Accuracy comparison
models = ['XGBoost', 'Random Forest', 'Logistic Regression']
x = np.arange(len(models))
width = 0.35

baseline_acc = [baseline_results[m]['accuracy'] for m in models]
interaction_acc = [interaction_results[m]['accuracy'] for m in models]

bars1 = axes[0].bar(x - width/2, baseline_acc, width, label='Baseline', color='#3498db', edgecolor='black')
bars2 = axes[0].bar(x + width/2, interaction_acc, width, label='Interaction', color='#e74c3c', edgecolor='black')

axes[0].set_xlabel('Model')
axes[0].set_ylabel('Accuracy')
axes[0].set_title('Accuracy: Baseline vs Interaction Models')
axes[0].set_xticks(x)
axes[0].set_xticklabels(models, rotation=15)
axes[0].legend()
axes[0].set_ylim(0, 1.0)

# Add value labels
for bar in bars1:
    axes[0].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01, 
                 f'{bar.get_height():.3f}', ha='center', va='bottom', fontsize=9)
for bar in bars2:
    axes[0].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01, 
                 f'{bar.get_height():.3f}', ha='center', va='bottom', fontsize=9)

# F1 Score comparison
baseline_f1 = [baseline_results[m]['f1_weighted'] for m in models]
interaction_f1 = [interaction_results[m]['f1_weighted'] for m in models]

bars3 = axes[1].bar(x - width/2, baseline_f1, width, label='Baseline', color='#3498db', edgecolor='black')
bars4 = axes[1].bar(x + width/2, interaction_f1, width, label='Interaction', color='#e74c3c', edgecolor='black')

axes[1].set_xlabel('Model')
axes[1].set_ylabel('F1 Score (Weighted)')
axes[1].set_title('F1 Score: Baseline vs Interaction Models')
axes[1].set_xticks(x)
axes[1].set_xticklabels(models, rotation=15)
axes[1].legend()
axes[1].set_ylim(0, 1.0)

# Add value labels
for bar in bars3:
    axes[1].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01, 
                 f'{bar.get_height():.3f}', ha='center', va='bottom', fontsize=9)
for bar in bars4:
    axes[1].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01, 
                 f'{bar.get_height():.3f}', ha='center', va='bottom', fontsize=9)

plt.tight_layout()
plt.savefig('../reports/figures/model_comparison.png', dpi=150, bbox_inches='tight')
plt.show()

print(f"\nFigure saved to: ../reports/figures/model_comparison.png")

### 9.1 Confusion Matrix Analysis

Let's examine the confusion matrices to understand which balance changes are being predicted correctly and where errors occur.

In [None]:
# Confusion matrices for best baseline and interaction models
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Get class names
class_names = le_change.classes_

# Baseline XGBoost confusion matrix
cm_baseline = confusion_matrix(y_test, baseline_results['XGBoost']['predictions'])
disp1 = ConfusionMatrixDisplay(confusion_matrix=cm_baseline, display_labels=class_names)
disp1.plot(ax=axes[0], cmap='Blues', values_format='d')
axes[0].set_title('Baseline XGBoost\nConfusion Matrix')

# Interaction XGBoost confusion matrix
best_interaction_key = 'XGBoost (Tuned)' if 'XGBoost (Tuned)' in interaction_results else 'XGBoost'
cm_interaction = confusion_matrix(y_test, interaction_results[best_interaction_key]['predictions'])
disp2 = ConfusionMatrixDisplay(confusion_matrix=cm_interaction, display_labels=class_names)
disp2.plot(ax=axes[1], cmap='Reds', values_format='d')
axes[1].set_title(f'{best_interaction_key} (Interaction)\nConfusion Matrix')

plt.tight_layout()
plt.savefig('../reports/figures/confusion_matrices.png', dpi=150, bbox_inches='tight')
plt.show()

# Classification reports
print("\n" + "=" * 60)
print("CLASSIFICATION REPORT: BASELINE XGBOOST")
print("=" * 60)
print(classification_report(y_test, baseline_results['XGBoost']['predictions'], 
                           target_names=class_names))

print("\n" + "=" * 60)
print(f"CLASSIFICATION REPORT: {best_interaction_key.upper()} (INTERACTION)")
print("=" * 60)
print(classification_report(y_test, interaction_results[best_interaction_key]['predictions'], 
                           target_names=class_names))
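
Raw counts in a confusion matrix are dominated by the largest class, so per-class recall (each row divided by its total) is often easier to compare. A minimal sketch, assuming rows hold the true class and columns the predicted class as in scikit-learn; the counts and the `buff`/`none` label names are hypothetical:

```python
def per_class_recall(cm, labels):
    """Recall per class from a confusion matrix with true labels on rows."""
    recalls = {}
    for i, (label, row) in enumerate(zip(labels, cm)):
        total = sum(row)
        # The diagonal entry counts correctly predicted samples of this class
        recalls[label] = row[i] / total if total else 0.0
    return recalls


# Hypothetical counts; rows = true class, columns = predicted class
labels = ["buff", "nerf", "none"]
cm = [
    [10, 2, 28],
    [3, 12, 25],
    [5, 5, 310],
]

for label, recall in per_class_recall(cm, labels).items():
    print(f"{label}: recall = {recall:.2f}")
```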

---

## 10. Feature Importance: SHAP Analysis

SHAP (SHapley Additive exPlanations) values provide a unified measure of feature importance that accounts for feature interactions. This helps us understand:
1. Which features drive predictions most
2. Whether interaction terms provide meaningful signal
3. How features interact with each other
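
How per-sample SHAP values collapse into one importance score per feature can be sketched with plain NumPy. The layouts handled below (a list of per-class arrays, or a single 3-D array) are assumptions about what different SHAP releases return for multi-class models, and the input is synthetic rather than output from this notebook's model; `mean_abs_shap` is a hypothetical helper name.

```python
import numpy as np


def mean_abs_shap(shap_values):
    """Collapse SHAP output to one mean |SHAP| score per feature.

    Handles three layouts:
      - list of (n_samples, n_features) arrays, one per class
      - array of shape (n_samples, n_features, n_classes)
      - array of shape (n_samples, n_features)
    """
    if isinstance(shap_values, list):
        return np.mean([np.abs(sv).mean(axis=0) for sv in shap_values], axis=0)
    values = np.abs(np.asarray(shap_values))
    if values.ndim == 3:
        # Average over samples (axis 0) and classes (axis 2)
        return values.mean(axis=(0, 2))
    return values.mean(axis=0)


# Synthetic stand-in for SHAP output: 5 samples, 4 features, 3 classes
rng = np.random.default_rng(0)
as_array = rng.normal(size=(5, 4, 3))
as_list = [as_array[:, :, k] for k in range(3)]

print(mean_abs_shap(as_list).shape)  # (4,)
print(np.allclose(mean_abs_shap(as_list), mean_abs_shap(as_array)))
```

Both layouts give the same per-feature scores, so the ranking does not depend on which one the installed version produces.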

In [None]:
# SHAP analysis for interaction model
print("SHAP ANALYSIS: Feature Importance for Interaction Model")
print("=" * 60)

# Get best interaction model
best_model = interaction_results[best_interaction_key]['model']

# Create SHAP explainer
explainer = shap.TreeExplainer(best_model)

# Calculate SHAP values (using a sample for speed)
sample_size = min(500, len(X_test_int))
X_sample = X_test_int[:sample_size]
shap_values = explainer.shap_values(X_sample)

print(f"SHAP values calculated for {sample_size} samples.")

In [None]:
# SHAP Summary Plot
fig, axes = plt.subplots(1, 2, figsize=(16, 8))

# Mean absolute SHAP values (feature importance)
plt.sca(axes[0])

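# Handle multi-class SHAP values
if isinstance(shap_values, list):
    # Multi-class (list of per-class arrays): average across classes
    mean_shap = np.mean([np.abs(sv).mean(axis=0) for sv in shap_values], axis=0)
elif np.ndim(shap_values) == 3:
    # Multi-class (single array, samples x features x classes): average over samples and classes
    mean_shap = np.abs(shap_values).mean(axis=(0, 2))
else:
    mean_shap = np.abs(shap_values).mean(axis=0)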

# Create importance dataframe
shap_importance = pd.DataFrame({
    'feature': INTERACTION_FEATURES,
    'importance': mean_shap
}).sort_values('importance', ascending=True)

colors = ['#e74c3c' if 'wr_x_pr' in f or 'quadrant' in f or 'deviation' in f or 'x_br' in f 
          else '#3498db' for f in shap_importance['feature']]

axes[0].barh(shap_importance['feature'], shap_importance['importance'], color=colors, edgecolor='black')
axes[0].set_xlabel('Mean |SHAP Value|')
axes[0].set_title('SHAP Feature Importance\n(Red = Interaction Features)')

# Bar chart of top features
top_features = shap_importance.tail(10)
axes[1].barh(range(len(top_features)), top_features['importance'], 
             color=['#e74c3c' if 'wr_x_pr' in f or 'quadrant' in f or 'deviation' in f or 'x_br' in f 
                    else '#3498db' for f in top_features['feature']], edgecolor='black')
axes[1].set_yticks(range(len(top_features)))
axes[1].set_yticklabels(top_features['feature'])
axes[1].set_xlabel('Mean |SHAP Value|')
axes[1].set_title('Top 10 Most Important Features')

# Add value labels
for i, (feat, imp) in enumerate(zip(top_features['feature'], top_features['importance'])):
    axes[1].text(imp + 0.001, i, f'{imp:.4f}', va='center')

plt.tight_layout()
plt.savefig('../reports/figures/shap_importance.png', dpi=150, bbox_inches='tight')
plt.show()

print("\nFull SHAP Feature Importance Ranking:")
print(shap_importance.sort_values('importance', ascending=False).to_string(index=False))

In [None]:
# SHAP beeswarm plot (for multi-class models, shown for a single class)
try:
    fig = plt.figure(figsize=(12, 8))
    
    # For multi-class, show SHAP values for one class (e.g., 'nerf')
    is_multiclass = isinstance(shap_values, list) or np.ndim(shap_values) == 3
    if is_multiclass:
        # Find index for 'nerf' class
        nerf_idx = list(le_change.classes_).index('nerf') if 'nerf' in le_change.classes_ else 0
        # A list holds one array per class; a 3-D array is samples x features x classes
        class_shap = shap_values[nerf_idx] if isinstance(shap_values, list) else shap_values[:, :, nerf_idx]
        shap.summary_plot(class_shap, X_sample, feature_names=INTERACTION_FEATURES, 
                         show=False, max_display=10)
        plt.title(f'SHAP Summary Plot (Class: {le_change.classes_[nerf_idx]})')
    else:
        shap.summary_plot(shap_values, X_sample, feature_names=INTERACTION_FEATURES, 
                         show=False, max_display=10)
    
    plt.tight_layout()
    plt.savefig('../reports/figures/shap_beeswarm.png', dpi=150, bbox_inches='tight')
    plt.show()
    print(f"\nFigure saved to: ../reports/figures/shap_beeswarm.png")
except Exception as e:
    print(f"Could not generate beeswarm plot: {e}")

### 10.1 SHAP Interpretation

The SHAP analysis reveals:
1. **Relative importance** of interaction terms compared to original features
2. **Direction of effects**: How high/low values influence predictions
3. **Feature interactions**: Which combinations matter most

Key findings will be summarized in the conclusions section.
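
One way to put a number on the first point is the share of total mean |SHAP| that falls on engineered interaction features. The sketch below reuses the substring markers from the plotting cell to decide what counts as an interaction feature. The importance values are hypothetical, and `banrate` is an assumed column name.

```python
# Hypothetical mean |SHAP| values; not results from this notebook
importance = {
    "winrate": 0.30,
    "pickrate": 0.15,
    "banrate": 0.10,
    "wr_x_pr": 0.20,
    "weighted_deviation": 0.05,
}

# Same substring markers the plotting cell uses to colour interaction features
INTERACTION_MARKERS = ("wr_x_pr", "quadrant", "deviation", "x_br")


def is_interaction(feature: str) -> bool:
    """True when the feature name contains one of the interaction markers."""
    return any(marker in feature for marker in INTERACTION_MARKERS)


def interaction_share(scores: dict) -> float:
    """Fraction of total importance attributed to interaction features."""
    total = sum(scores.values())
    engineered = sum(v for name, v in scores.items() if is_interaction(name))
    return engineered / total if total else 0.0


print(f"Interaction share: {interaction_share(importance):.1%}")
```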

---

## 11. Partial Dependence Analysis

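Partial Dependence Plots show how the model's average prediction changes as we vary specific features, averaging over the observed values of all other features. Because the engineered interaction columns are derived from win rate and pick rate, varying one raw feature alone can create combinations that never occur in the data, so the curves describe model behavior rather than causal effects. This helps us understand: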
- At what win rate thresholds do predictions shift?
- How does pick rate modify the win rate effect?
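
The averaging that a partial dependence plot performs can be sketched by hand: force one feature to each grid value for every row, predict, and take the mean. The example below uses a toy scoring function in place of the fitted model and synthetic win/pick rates; `partial_dependence_1d` and `toy_nerf_score` are hypothetical names, not part of scikit-learn.

```python
import numpy as np


def partial_dependence_1d(predict, X, feature_idx, grid):
    """Average prediction when one feature is forced to each grid value."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature_idx] = value  # overwrite the feature for every row
        averages.append(predict(X_mod).mean())
    return np.array(averages)


def toy_nerf_score(X):
    """Toy stand-in for a model: score rises with win rate, and more
    steeply when pick rate is high (a multiplicative interaction)."""
    winrate, pickrate = X[:, 0], X[:, 1]
    return (winrate - 50.0) * (1.0 + pickrate / 10.0)


rng = np.random.default_rng(42)
X = np.column_stack([
    rng.uniform(45, 55, size=200),  # hypothetical win rates (%)
    rng.uniform(0, 20, size=200),   # hypothetical pick rates (%)
])

grid = np.linspace(46, 54, 5)
curve = partial_dependence_1d(toy_nerf_score, X, feature_idx=0, grid=grid)
print(dict(zip(grid.round(1), curve.round(2))))
```

The slope of the resulting curve reflects the average pick rate in the sample, which is why a 1D plot alone cannot show how pick rate modifies the win rate effect.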

In [None]:
from sklearn.inspection import PartialDependenceDisplay

# Partial dependence plots for key features
fig, axes = plt.subplots(2, 2, figsize=(14, 12))

# Feature indices
wr_idx = INTERACTION_FEATURES.index('winrate')
pr_idx = INTERACTION_FEATURES.index('pickrate')
wr_x_pr_idx = INTERACTION_FEATURES.index('wr_x_pr')
wd_idx = INTERACTION_FEATURES.index('weighted_deviation')

# Use a sample for speed
X_pdp = X_train_int[:1000]

# Multi-class estimators need an explicit target class; use 'nerf' when present.
# Assumes the model was trained on labels encoded with le_change
# (scikit-learn ignores `target` for binary classifiers).
pdp_target = list(le_change.classes_).index('nerf') if 'nerf' in le_change.classes_ else 0

# Individual PDPs
try:
    PartialDependenceDisplay.from_estimator(
        best_model, X_pdp, features=[wr_idx], target=pdp_target,
        feature_names=INTERACTION_FEATURES, ax=axes[0, 0]
    )
    axes[0, 0].set_title('Partial Dependence: Win Rate')

    PartialDependenceDisplay.from_estimator(
        best_model, X_pdp, features=[pr_idx], target=pdp_target,
        feature_names=INTERACTION_FEATURES, ax=axes[0, 1]
    )
    axes[0, 1].set_title('Partial Dependence: Pick Rate')

    PartialDependenceDisplay.from_estimator(
        best_model, X_pdp, features=[wr_x_pr_idx], target=pdp_target,
        feature_names=INTERACTION_FEATURES, ax=axes[1, 0]
    )
    axes[1, 0].set_title('Partial Dependence: WR × PR Interaction')

    PartialDependenceDisplay.from_estimator(
        best_model, X_pdp, features=[wd_idx], target=pdp_target,
        feature_names=INTERACTION_FEATURES, ax=axes[1, 1]
    )
    axes[1, 1].set_title('Partial Dependence: Weighted Deviation')

    plt.tight_layout()
    plt.savefig('../reports/figures/partial_dependence.png', dpi=150, bbox_inches='tight')
    plt.show()
    print(f"\nFigure saved to: ../reports/figures/partial_dependence.png")
except Exception as e:
    print(f"Could not generate partial dependence plots: {e}")

In [None]:
# 2D Partial Dependence: Win Rate vs Pick Rate interaction
try:
    fig, ax = plt.subplots(figsize=(10, 8))
    
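    # Multi-class estimators need an explicit target class ('nerf' when present)
    pdp_target_2d = list(le_change.classes_).index('nerf') if 'nerf' in le_change.classes_ else 0
    PartialDependenceDisplay.from_estimator(
        best_model, X_pdp, features=[(wr_idx, pr_idx)], target=pdp_target_2d,
        feature_names=INTERACTION_FEATURES, ax=ax
    )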
    ax.set_title('2D Partial Dependence: Win Rate × Pick Rate')
    
    plt.tight_layout()
    plt.savefig('../reports/figures/pdp_2d_interaction.png', dpi=150, bbox_inches='tight')
    plt.show()
    print(f"\nFigure saved to: ../reports/figures/pdp_2d_interaction.png")
except Exception as e:
    print(f"Could not generate 2D partial dependence plot: {e}")

---

## 12. Model Persistence

If interaction features improved performance, we'll save the best model for future use.

In [None]:
import joblib
import os

# Determine if interaction model improved over baseline
baseline_best_f1 = max(baseline_results[m]['f1_weighted'] for m in baseline_results)
interaction_best_f1 = max(interaction_results[m]['f1_weighted'] for m in interaction_results)

print("MODEL PERFORMANCE SUMMARY")
print("=" * 60)
print(f"Best Baseline F1 (weighted): {baseline_best_f1:.4f}")
print(f"Best Interaction F1 (weighted): {interaction_best_f1:.4f}")
print(f"Improvement: {(interaction_best_f1 - baseline_best_f1)*100:+.2f} percentage points")

# Save best model if interaction improved
models_dir = '../models'
os.makedirs(models_dir, exist_ok=True)

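# Strict inequality: a tie is not an improvement (same rule as the H1 check)
if interaction_best_f1 > baseline_best_f1:
    model_path = os.path.join(models_dir, 'xgb_interaction_model.joblib')
    # best_model is the XGBoost model analysed with SHAP above; it may differ
    # from the model that produced interaction_best_f1
    joblib.dump(best_model, model_path)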
    print(f"\nBest interaction model saved to: {model_path}")
    
    # Also save encoders and feature list
    metadata = {
        'features': INTERACTION_FEATURES,
        'label_encoder': le_change,
        'rank_encoder': le_rank,
        'quadrant_encoder': le_quadrant,
        'wr_threshold': WR_THRESHOLD,
        'pr_threshold': PR_THRESHOLD
    }
    metadata_path = os.path.join(models_dir, 'interaction_model_metadata.joblib')
    joblib.dump(metadata, metadata_path)
    print(f"Model metadata saved to: {metadata_path}")
else:
    print("\nInteraction features did not improve performance. Model not saved.")

---

## 13. Conclusions & Key Takeaways

### 13.1 Research Hypothesis Results

In [None]:
# Final summary statistics
print("="*80)
print("FINAL ANALYSIS SUMMARY")
print("="*80)

# Hypothesis 1: Interaction terms improve performance
improvement_pct = (interaction_best_f1 - baseline_best_f1) / baseline_best_f1 * 100
h1_result = "SUPPORTED" if interaction_best_f1 > baseline_best_f1 else "NOT SUPPORTED"

print(f"\n[H1] Interaction terms improve model performance: {h1_result}")
print(f"     Baseline Best F1: {baseline_best_f1:.4f}")
print(f"     Interaction Best F1: {interaction_best_f1:.4f}")
print(f"     Relative Improvement: {improvement_pct:+.2f}%")

# Hypothesis 2: High WR + High PR = more nerfs
# (descriptive rate comparison only; this does not test statistical significance)
if 'High_WR_High_PR' in crosstab_pct.index:
    hw_hp_nerf_rate = crosstab_pct.loc['High_WR_High_PR', 'nerf'] if 'nerf' in crosstab_pct.columns else 0
    overall_nerf_rate = df_features['change'].value_counts(normalize=True).get('nerf', 0) * 100
    h2_result = "SUPPORTED" if hw_hp_nerf_rate > overall_nerf_rate else "NOT SUPPORTED"
    
    print(f"\n[H2] High WR + High PR champions have higher nerf rates: {h2_result}")
    print(f"     High_WR_High_PR nerf rate: {hw_hp_nerf_rate:.2f}%")
    print(f"     Overall nerf rate: {overall_nerf_rate:.2f}%")

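# Hypothesis 3: Interaction term in top 3 features
# nlargest lists features from most to least important, regardless of prior sorting
top_3_features = shap_importance.nlargest(3, 'importance')['feature'].tolist()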
interaction_in_top3 = any('wr_x_pr' in f or 'deviation' in f for f in top_3_features)
h3_result = "SUPPORTED" if interaction_in_top3 else "NOT SUPPORTED"

print(f"\n[H3] Interaction term among top 3 SHAP features: {h3_result}")
print(f"     Top 3 features: {top_3_features}")
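
H2 is phrased in terms of a significantly higher nerf probability, while the check above only compares two rates. One possible follow-up, sketched here as an assumption rather than the notebook's method, is a one-sided two-proportion z-test comparing the `High_WR_High_PR` quadrant against all other rows. The counts below are hypothetical, and the test treats rows as independent, which repeated champions across patches may violate.

```python
import math


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """One-sided z-test for H_a: proportion in group A > proportion in group B."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Upper-tail probability of the standard normal distribution
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Hypothetical counts: nerfs among High_WR_High_PR rows vs. all other rows
z, p = two_proportion_z_test(hits_a=30, n_a=200, hits_b=80, n_b=1800)
print(f"z = {z:.2f}, one-sided p = {p:.4g}")
```

Comparing against the remaining rows, instead of the overall rate, keeps the two groups disjoint.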

### 13.2 Key Findings Summary

#### What We Learned About Riot's Balance Philosophy

1. **Win Rate Remains Dominant**: Win rate is still the primary driver of balance decisions, consistent with Riot's stated goal of keeping champions near 50%.

2. **Pick Rate Matters at Extremes**: Champions with very high pick rates receive more scrutiny—they affect more games, so even small imbalances have large player impact.

3. **Interaction Effects Capture Meta Dominance**: The WR × PR interaction term helps identify champions that are both strong AND popular—the true "meta dominators" that Riot targets for nerfs.

4. **Ban Rate as Frustration Proxy**: High ban rates often signal player frustration with a champion, not just raw strength, and that frustration can lead to nerfs even when win rates are reasonable.

#### Model Performance Insights

- **Class Imbalance Challenge**: The "no change" majority class makes accurate buff/nerf prediction difficult
- **Tree-based models excel**: XGBoost and Random Forest outperform Logistic Regression, suggesting non-linear relationships
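- **Interaction terms provide at most modest gains**: Any improvement should be read from the per-model deltas in the comparison output; the final summary cell reports whether the best interaction model actually beats the best baseline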
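
The class imbalance point is easiest to see with a model that never predicts a change. The sketch below computes macro and weighted F1 by hand on a hypothetical 80/10/10 label mix (the `none` and `buff` label names are assumptions); undefined per-class scores are treated as 0.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """F1 score for each class that appears in y_true."""
    scores = {}
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[cls] = 2 * tp / denom if denom else 0.0
    return scores


def macro_and_weighted_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, and the support-weighted mean."""
    per_class = f1_per_class(y_true, y_pred)
    support = Counter(y_true)
    macro = sum(per_class.values()) / len(per_class)
    weighted = sum(per_class[c] * support[c] for c in per_class) / len(y_true)
    return macro, weighted


# Hypothetical label mix: 80% "none", 10% "buff", 10% "nerf"
y_true = ["none"] * 80 + ["buff"] * 10 + ["nerf"] * 10
y_pred = ["none"] * 100  # a model that never predicts a change

macro, weighted = macro_and_weighted_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.2f}  macro F1={macro:.3f}  weighted F1={weighted:.3f}")
```

Accuracy and weighted F1 stay high for this useless model while macro F1 drops sharply, which is why the comparison table reports macro F1 alongside the weighted score.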

### 13.3 Limitations

1. **Missing Pro Play Data**: Professional play statistics significantly influence balance decisions but aren't included
2. **Single Season**: Model trained on Season 11 only; may not generalize to other seasons
3. **Role-Agnostic**: Champions are balanced per-role, but our model doesn't fully account for this
4. **No Temporal Features**: Recent patch history (was champion buffed last patch?) could improve predictions

### 13.4 Recommended Next Steps

1. **Integrate Pro Play Statistics**: Add tournament pick/ban/win rates from Worlds and major leagues
2. **Role-Specific Models**: Train separate models for each role (TOP, JUNGLE, MID, ADC, SUPPORT)
3. **Temporal Features**: Include whether champion was changed in previous 1-3 patches
4. **Champion Mastery Curves**: Incorporate win rate among high-mastery players vs. new players
5. **Ensemble Approach**: Combine multiple models for more robust predictions

In [None]:
# Final Key Takeaway Box
print("\n" + "#"*80)
print("#" + " "*78 + "#")
print("#" + "  KEY TAKEAWAY: DID INTERACTIONS HELP?".center(78) + "#")
print("#" + " "*78 + "#")
print("#"*80)

if interaction_best_f1 > baseline_best_f1:
    print(f"""
    YES - Interaction features improved model performance.
    
    Improvement: {(interaction_best_f1 - baseline_best_f1)*100:+.2f} percentage points in F1 score
    
    The Pick Rate × Win Rate interaction term captures "meta dominance" that
    simple win rate alone misses. Champions that are both strong AND popular
    are more likely to receive nerfs, and this multiplicative relationship
    helps our model identify these cases.
    
    Best Model: {best_interaction_key}
    Final F1 Score: {interaction_best_f1:.4f}
    """)
else:
    print(f"""
    NO - Interaction features did not significantly improve performance.
    
    Difference: {(interaction_best_f1 - baseline_best_f1)*100:+.2f} percentage points in F1 score
    
    This suggests that for this dataset and problem, the original features
    (win rate, pick rate, ban rate, rank) already capture most of the
    predictable signal. The interaction terms may be redundant or may
    require more sophisticated feature engineering.
    
    Best Model: Baseline {max(baseline_results.keys(), key=lambda x: baseline_results[x]['f1_weighted'])}
    Final F1 Score: {baseline_best_f1:.4f}
    """)

print("#"*80)

---

## Appendix: Session Information

In [None]:
# Environment and version information for reproducibility
import sys
from datetime import datetime

print("SESSION INFORMATION")
print("="*60)
print(f"Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"Python: {sys.version}")
print(f"NumPy: {np.__version__}")
print(f"Pandas: {pd.__version__}")
print(f"Scikit-learn: {__import__('sklearn').__version__}")
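try:
    import xgboost
    print(f"XGBoost: {xgboost.__version__}")
except ImportError:
    # XGBoost is not installed in this environment
    pass
try:
    print(f"SHAP: {shap.__version__}")
except NameError:
    # shap was not imported in this session
    pass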
print(f"\nRandom State: {RANDOM_STATE}")
print(f"Test Size: {TEST_SIZE}")

---

*This notebook was created as part of the League of Legends Champion Balance Prediction project. For questions or contributions, please see the project README.*