# 🎯 SHAP and XAI: Hands-on Tutorial\n\n## Model Explainability - From Theory to Practice\n\n**Based on Lecture 20: Model Explainability - SHAP and Deep Learning XAI**\n\n---\n\n### 📚 Learning Objectives\n\n1. ✅ Understand Shapley values from game theory\n2. ✅ Implement different SHAP explainers\n3. ✅ Create and interpret SHAP visualizations\n4. ✅ Apply SHAP to real-world problems\n\n**Duration**: ~2 hours\n\n**Instructor**: Ho-min Park (homin.park@ghent.ac.kr)\n

---\n# 📦 Part 0: Setup\n

In [None]:
# Install (uncomment if needed)\n# !pip install shap scikit-learn matplotlib seaborn pandas numpy\nprint('✅ Ready!')\n

In [None]:
import numpy as np\nimport pandas as pd\nimport matplotlib.pyplot as plt\nimport seaborn as sns\nimport warnings\nwarnings.filterwarnings('ignore')\n\nfrom sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.metrics import accuracy_score\nfrom sklearn.datasets import load_iris, load_breast_cancer\n\nimport shap\n\nplt.style.use('seaborn-v0_8-darkgrid')\nRANDOM_STATE = 42\nnp.random.seed(RANDOM_STATE)\n\nprint(f'✅ Libraries imported! SHAP v{shap.__version__}')\n

In [None]:
# Load datasets\niris = load_iris(as_frame=True)\niris_df = pd.DataFrame(iris.data, columns=iris.feature_names)\niris_df['target'] = iris.target\n\ncancer = load_breast_cancer(as_frame=True)\ncancer_df = pd.DataFrame(cancer.data, columns=cancer.feature_names)\ncancer_df['target'] = cancer.target\n\n# Create a synthetic credit dataset\nn = 1000\ncredit_df = pd.DataFrame({\n    'Age': np.random.randint(22, 70, n),\n    'Income': np.random.randint(20000, 150000, n),\n    'Credit_Score': np.random.randint(300, 850, n),\n    'Debt_Ratio': np.random.uniform(0, 0.6, n),\n    'Employment_Years': np.random.randint(0, 40, n)\n})\n\n# Weighted score; each feature is scaled by its upper bound so every term stays in [0, 1].\n# (Dividing Income by 50000 pushes almost every score above the 0.5 threshold,\n# which leaves nearly all applicants approved.)\nscore = ((credit_df['Income'] / 150000) * 0.3 +\n         (credit_df['Credit_Score'] / 850) * 0.4 +\n         (1 - credit_df['Debt_Ratio']) * 0.2 +\n         (credit_df['Employment_Years'] / 40) * 0.1)\ncredit_df['Approved'] = (score + np.random.normal(0, 0.1, n) > 0.5).astype(int)\n\nprint('📊 Datasets loaded:')\nprint(f'  Iris: {iris_df.shape}')\nprint(f'  Cancer: {cancer_df.shape}')\nprint(f'  Credit: {credit_df.shape} (Approval: {credit_df["Approved"].mean():.1%})')\n

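---\n# 📊 Part 1: SHAP Fundamentals\n\nSHAP (SHapley Additive exPlanations) is based on **cooperative game theory**: features are treated as players, and the prediction is the payout they share.\n\n### Core Formula:\n```\nf(x) = φ₀ + φ₁ + φ₂ + ... + φₙ\n```\n\nwhere:\n- `f(x)` = model prediction for the instance being explained\n- `φ₀` = base value (the average model prediction over the background data)\n- `φᵢ` = SHAP value for feature i (how far that feature moves the prediction away from the base value)\n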
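As a quick sanity check on the additive formula, the sketch below uses a hypothetical linear model (the weights, bias, and data are made up for illustration). For a linear model with independent features, the SHAP value of feature i has the closed form `w_i * (x_i - mean(x_i))`, so the decomposition can be verified with NumPy alone, without calling the `shap` library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear model f(x) = w.x + b (illustrative values, not tutorial data)
w = np.array([2.0, -1.0, 0.5])
b = 3.0
X_background = rng.normal(size=(500, 3))  # background data that defines the base value


def f(X):
    """Linear model prediction for one row or a batch of rows."""
    return X @ w + b


# phi_0: average prediction over the background data
phi_0 = f(X_background).mean()

# phi_i for one instance: closed form for linear models with independent features
x = np.array([1.0, 2.0, -1.0])
phi = w * (x - X_background.mean(axis=0))

# Additivity: base value plus all contributions reconstructs the prediction
reconstructed = phi_0 + phi.sum()
print(f'f(x) = {f(x):.4f}, phi_0 + sum(phi) = {reconstructed:.4f}')
```

The two printed numbers agree up to floating-point error. For non-linear models no such closed form exists, which is why the different SHAP explainers are needed.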

## Exercise 1: Understanding Shapley Values\n\n### 📖 Concept\n\nShapley values answer: **How should we fairly distribute credit among features?**\n\nThe Shapley value of a feature is its marginal contribution to the prediction, averaged over ALL possible coalitions (subsets) of the other features.\n
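To make "all possible coalitions" concrete, here is a minimal sketch of the exact computation. For each player i it enumerates every subset S of the other players and weights the marginal contribution v(S ∪ {i}) − v(S) by |S|!(n − |S| − 1)!/n!. The function name `shapley_values` and the three-player payoff table are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a payoff."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1
            # Fraction of orderings in which exactly `size` players precede i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                S = frozenset(subset)
                total += weight * (value[S | {i}] - value[S])
        phi[i] = total
    return phi


# Hypothetical three-player game: payoff of every coalition
payoffs = {
    frozenset(): 0,
    frozenset('A'): 10,
    frozenset('B'): 20,
    frozenset('C'): 30,
    frozenset('AB'): 40,
    frozenset('AC'): 50,
    frozenset('BC'): 60,
    frozenset('ABC'): 90,
}

phi = shapley_values(['A', 'B', 'C'], payoffs)
print(phi)
print('Sum of contributions:', sum(phi.values()))
```

The contributions sum to the payoff of the full coalition minus the payoff of the empty one, which is the efficiency property behind the additive formula. Each player requires 2^(n-1) subsets, so exact enumeration is only practical for a handful of features.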

In [None]:
# Manual calculation example: two features, one prediction per coalition\npredictions = {\n    'empty': 250,\n    'sqft': 300,\n    'location': 280,\n    'sqft+location': 340\n}\n\nprint('🏠 House Price Predictions ($K):')\nfor k, v in predictions.items():\n    print(f'  {k:20} → ${v}')\n\n# Shapley for Square Feet: average its marginal contribution over both orderings\nshap_sqft = (\n    (1/2) * (predictions['sqft'] - predictions['empty']) +\n    (1/2) * (predictions['sqft+location'] - predictions['location'])\n)\n\nprint(f'\\n✅ Shapley(Square Feet) = ${shap_sqft:.1f}K')\n\n# Location's share follows from additivity (verify it by hand in the exercise below)\nbase = predictions['empty']\nfinal = predictions['sqft+location']\nshap_location = final - base - shap_sqft\n\n# Visualize\nfig, ax = plt.subplots(figsize=(10, 5))\nfeatures = ['Base', 'Sq Feet', 'Location', 'Final']\nvalues = [base, shap_sqft, shap_location, final]\ncolors = ['lightblue', 'green', 'green', 'darkblue']\n\nax.bar(features, values, color=colors, alpha=0.7, edgecolor='black')\nfor i, v in enumerate(values):\n    ax.text(i, v, f'${v:.0f}K', ha='center', va='bottom', fontweight='bold')\n\nax.set_ylabel('Value ($K)', fontweight='bold')\nax.set_title('SHAP Decomposition', fontweight='bold')\nplt.tight_layout()\nplt.show()\n\nprint(f'💡 Additive: {base} + {shap_sqft:.0f} + {shap_location:.0f} = {final} ✓')\n

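### ✏️ Your Turn\n\n**Task**: Calculate the Shapley value for the Location feature from the `predictions` dictionary, using the same two-ordering average as for Square Feet. Then check that the base value plus both Shapley values equals the full prediction.\n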

In [None]:
# Your code here\n# TODO: Calculate Shapley for Location\n\n