# 🔍 Explainable AI (XAI) - Hands-on Learning Notebook

## Course: Lecture 19 - Model Explainability

### Learning Objectives

- Understand the complexity-interpretability trade-off
- Implement intrinsically interpretable models
- Apply feature importance methods
- Use model-agnostic explanations (LIME, Surrogate Models)

### Structure

1. Setup & Data
2. Exercise 1: Complexity vs Interpretability
3. Exercise 2: Linear Model Interpretation
4. Exercise 3: Decision Trees
5. Exercise 4: Permutation Importance
6. Exercise 5: Partial Dependence Plots
7. Exercise 6: ICE Plots
8. Exercise 7: Surrogate Models
9. Exercise 8: LIME Explanations

**Instructor:** Ho-min Park | homin.park@ghent.ac.kr

## 🔧 Setup

In [None]:
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.inspection import permutation_importance, PartialDependenceDisplay

plt.style.use('seaborn-v0_8-whitegrid')
np.random.seed(42)
print("✅ Setup complete!")

## 📊 Data: Credit Risk Dataset

In [None]:
# Create synthetic credit risk data
np.random.seed(42)
n = 1000
data = pd.DataFrame({
    'Age': np.random.randint(20, 71, n),
    'Income': np.random.gamma(3, 15, n) + 20,
    'Credit_Score': np.random.normal(650, 80, n).clip(300, 850),
    'Debt_Ratio': np.random.beta(2, 5, n),
    'Employment_Years': np.random.exponential(8, n).clip(0, 40),
    'Education': np.random.choice([0,1,2,3], n, p=[0.3,0.4,0.2,0.1])
})

# Target: Loan approval (based on risk score)
risk = (0.3*(data['Credit_Score']-300)/550 +
        0.2*data['Income']/150 +
        0.2*(1-data['Debt_Ratio']) +
        0.15*data['Employment_Years']/40 +
        0.15*data['Education']/3)
prob = 1/(1 + np.exp(-5*(risk-0.5)))
data['Approved'] = (prob + np.random.normal(0, 0.1, n) > 0.5).astype(int)

print(f"Dataset: {data.shape}")
print(f"Approval rate: {data['Approved'].mean():.1%}")
data.head()

In [None]:
# Split data
X = data.drop('Approved', axis=1)
y = data['Approved']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler()
X_train_scaled = pd.DataFrame(scaler.fit_transform(X_train), columns=X.columns)
X_test_scaled = pd.DataFrame(scaler.transform(X_test), columns=X.columns)

print(f"Train: {len(X_train)}, Test: {len(X_test)}")

---

## Exercise 1: Complexity vs Interpretability Trade-off

### Concept

- **Simple models**: High interpretability, may underfit
- **Complex models**: High accuracy, "black boxes"

### Task

Train 4 models and visualize the trade-off.

In [None]:
# Train models
models = {
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'Decision Tree (d=3)': DecisionTreeClassifier(max_depth=3),
    'Random Forest': RandomForestClassifier(n_estimators=100),
    'Neural Network': MLPClassifier(hidden_layer_sizes=(50,))
}

# Interpretability scores are hand-assigned, illustrative values (not measured)
interp_scores = {'Logistic Regression': 0.95, 'Decision Tree (d=3)': 0.90,
                 'Random Forest': 0.40, 'Neural Network': 0.20}

results = []
for name, model in models.items():
    model.fit(X_train_scaled, y_train)
    acc = accuracy_score(y_test, model.predict(X_test_scaled))
    interp = interp_scores[name]
    results.append({'Model': name, 'Accuracy': acc, 'Interpretability': interp})
    print(f"{name}: Acc={acc:.3f}, Interp={interp:.2f}")

results_df = pd.DataFrame(results)

In [None]:
# Visualize trade-off
plt.figure(figsize=(10, 6))
for i, row in results_df.iterrows():
    plt.scatter(row['Interpretability'], row['Accuracy'], s=200, alpha=0.6)
    plt.annotate(row['Model'], (row['Interpretability'], row['Accuracy']),
                 xytext=(5, 5), textcoords='offset points')

plt.xlabel('Interpretability', fontsize=12, fontweight='bold')
plt.ylabel('Accuracy', fontsize=12, fontweight='bold')
plt.title('Model Complexity vs Interpretability Trade-off', fontsize=14, fontweight='bold')
plt.grid(alpha=0.3)
plt.show()

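---

## Exercise 2: Linear Model Interpretation

### Concept

Linear models: y = β₀ + β₁x₁ + β₂x₂ + ...

For logistic regression, the linear combination is the log-odds of the positive class rather than y itself.

**Interpretation**:

- **Coefficient magnitude**: Feature importance (comparable across features only when they share a scale, as with the standardized inputs used here)
- **Sign**: Positive/negative relationship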
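
As a minimal sketch of reading logistic-regression coefficients, the example below fits a model on a small synthetic dataset and converts each coefficient into an odds ratio via `exp(β)`. The data come from `make_classification`, not the credit dataset, and all names (`X_demo`, `demo_lr`, `odds_ratios`) are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: not the notebook's credit dataset)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
X_demo = StandardScaler().fit_transform(X_demo)  # put all features on one scale

demo_lr = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# Logistic regression is linear in the log-odds:
#   log(p / (1 - p)) = b0 + b1*x1 + b2*x2 + ...
# so exp(b_j) is the multiplicative change in the odds
# for a one-standard-deviation increase in feature j.
coefs = demo_lr.coef_[0]
odds_ratios = np.exp(coefs)

for j, (b, o) in enumerate(zip(coefs, odds_ratios)):
    print(f"feature_{j}: coef={b:+.3f}, odds ratio={o:.3f}")
```

An odds ratio above 1 corresponds to a positive coefficient (the feature pushes toward the positive class), and one below 1 to a negative coefficient.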

In [None]:
# Train logistic regression
lr = LogisticRegression(max_iter=1000)
lr.fit(X_train_scaled, y_train)

# Extract coefficients
coef_df = pd.DataFrame({
    'Feature': X.columns,
    'Coefficient': lr.coef_[0]
}).sort_values('Coefficient')

print(coef_df)

In [None]:
# Visualize coefficients
plt.figure(figsize=(10, 6))
colors = ['red' if c < 0 else 'green' for c in coef_df['Coefficient']]
plt.barh(coef_df['Feature'], coef_df['Coefficient'], color=colors, alpha=0.7)
plt.xlabel('Coefficient Value', fontweight='bold')
plt.title('Feature Coefficients (Linear Model)', fontsize=14, fontweight='bold')
plt.axvline(0, color='black', linestyle='--')
plt.grid(alpha=0.3, axis='x')
plt.show()

---

## Exercise 3: Decision Tree Transparency

### Concept

Decision trees create **IF-THEN rules**:

- Easy to follow logic path
- Visualize entire tree structure
- Extract feature importance
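
To make the IF-THEN reading concrete, the sketch below prints a shallow tree as nested text rules using scikit-learn's `export_text`. It uses synthetic data and placeholder feature names (`f0`, `f1`, `f2`), all of which are assumptions for illustration rather than part of the credit example.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data with hypothetical feature names
X_demo, y_demo = make_classification(
    n_samples=300, n_features=3, n_informative=2, n_redundant=1, random_state=0
)
names = ["f0", "f1", "f2"]

demo_tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_demo, y_demo)

# Each indented line is one condition; following a branch from the root
# down to a "class:" line gives one IF ... AND ... THEN class rule.
rules = export_text(demo_tree, feature_names=names)
print(rules)
```

The text form is handy when the plotted tree is too wide to read, or when the rules need to be pasted into a report.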

In [None]:
# Train decision tree
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)
acc = accuracy_score(y_test, tree.predict(X_test))
print(f"Accuracy: {acc:.3f}")

# Feature importance
imp_df = pd.DataFrame({
    'Feature': X.columns,
    'Importance': tree.feature_importances_
}).sort_values('Importance', ascending=False)

print(imp_df)

In [None]:
# Visualize tree
plt.figure(figsize=(20, 10))
plot_tree(tree, feature_names=X.columns, class_names=['Reject', 'Approve'],
          filled=True, rounded=True, fontsize=10)
plt.title('Decision Tree (depth=3)', fontsize=16, fontweight='bold')
plt.show()

---

## Exercise 4: Permutation Importance

### Concept

Measure feature importance by:

1. Train model
2. Shuffle one feature
3. Measure performance drop
4. Importance = baseline score - shuffled score
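
The four steps above can be sketched by hand as follows. This is a simplified illustration on synthetic data, not a replacement for `sklearn.inspection.permutation_importance`; the dataset, the choice of accuracy as the score, and names such as `importances` and `n_repeats` are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: not the notebook's credit dataset)
X_demo, y_demo = make_classification(
    n_samples=600, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=0)

# Step 1: train the model and record the baseline score on held-out data
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
baseline = accuracy_score(y_te, model.predict(X_te))

rng = np.random.default_rng(0)
n_repeats = 10
importances = np.zeros(X_te.shape[1])

for j in range(X_te.shape[1]):
    drops = []
    for _ in range(n_repeats):
        X_perm = X_te.copy()
        # Step 2: shuffle one column, breaking its link to the target
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        # Step 3: score the model on the shuffled data
        shuffled = accuracy_score(y_te, model.predict(X_perm))
        # Step 4: importance = baseline score - shuffled score
        drops.append(baseline - shuffled)
    importances[j] = np.mean(drops)

print(f"baseline accuracy: {baseline:.3f}")
for j, imp in enumerate(importances):
    print(f"feature_{j}: {imp:+.4f}")
```

Averaging over several shuffles reduces the noise from any single permutation; a value near zero (or slightly negative) suggests the model barely relies on that feature.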

In [None]:
# Train Random Forest
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Compute permutation importance
perm_imp = permutation_importance(rf, X_test, y_test, n_repeats=30, random_state=42)

perm_df = pd.DataFrame({
    'Feature': X.columns,
    'Importance': perm_imp.importances_mean,
    'Std': perm_imp.importances_std
}).sort_values('Importance', ascending=False)

print(perm_df)

In [None]:
# Visualize with error bars
plt.figure(figsize=(10, 6))
plt.barh(perm_df['Feature'], perm_df['Importance'],
         xerr=perm_df['Std'], color='steelblue', alpha=0.7)
plt.xlabel('Importance (Performance Drop)', fontweight='bold')
plt.title('Permutation Importance', fontsize=14, fontweight='bold')
plt.grid(alpha=0.3, axis='x')
plt.show()

---

## Exercise 5: Partial Dependence Plots (PDP)

### Concept

PDP shows the **marginal effect** of a feature:

- How does the prediction change as we vary the feature?
- Reveals non-linear relationships
- Model-agnostic
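
A partial dependence curve can be computed by brute force: force the feature to a grid value for every row, predict, and average. The sketch below does this on synthetic data; `manual_pdp` is a hypothetical helper written for illustration, not a scikit-learn function.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption: not the notebook's credit dataset)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_demo, y_demo)


def manual_pdp(model, X, feature, grid):
    """Average predicted P(class 1) with column `feature` forced to each grid value."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value  # overwrite the feature for every row
        averages.append(model.predict_proba(X_mod)[:, 1].mean())
    return np.array(averages)


# Evaluate the curve on 20 evenly spaced values across the observed range
grid = np.linspace(X_demo[:, 0].min(), X_demo[:, 0].max(), 20)
pdp = manual_pdp(model, X_demo, feature=0, grid=grid)

for value, avg in zip(grid[::5], pdp[::5]):
    print(f"x0={value:+.2f} -> mean P(class 1)={avg:.3f}")
```

Because the procedure only calls `predict_proba`, it works for any classifier, which is what makes PDP model-agnostic.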

In [None]:
# Create PDP for key features
fig, ax = plt.subplots(figsize=(14, 10))
PartialDependenceDisplay.from_estimator(
    rf, X_train,
    features=['Credit_Score', 'Income', 'Debt_Ratio', 'Age'],
    kind='average',
    n_cols=2,
    ax=ax
)
plt.suptitle('Partial Dependence Plots', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

---

## Exercise 6: ICE (Individual Conditional Expectation) Plots

### Concept

ICE = PDP for **each instance separately**

- Shows heterogeneity
- Reveals subgroups
- PDP = Average of ICE curves
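
The relationship "PDP = average of ICE curves" can be checked directly. In this sketch, `ice_curves` is a hypothetical helper that returns one curve per instance; the synthetic data and all variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption: not the notebook's credit dataset)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_demo, y_demo)


def ice_curves(model, X, feature, grid):
    """Return an (n_instances, n_grid) array of predicted P(class 1)."""
    curves = np.empty((X.shape[0], len(grid)))
    for k, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value  # same grid value for every instance
        curves[:, k] = model.predict_proba(X_mod)[:, 1]
    return curves


X_sub = X_demo[:50]  # a small subset keeps the plot (and runtime) manageable
grid = np.linspace(X_demo[:, 0].min(), X_demo[:, 0].max(), 15)

ice = ice_curves(model, X_sub, feature=0, grid=grid)
pdp_from_ice = ice.mean(axis=0)   # averaging over instances gives the PDP
spread = ice.std(axis=0)          # large spread hints at heterogeneous effects

print("ICE matrix shape:", ice.shape)
print("max spread across instances:", round(float(spread.max()), 3))
```

When the spread is large at some grid values, the averaged PDP may hide subgroups that respond differently to the feature.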

In [None]:
# Create ICE plots
n_samples = 50
ice_idx = np.random.choice(X_train.index, size=n_samples, replace=False)
X_ice = X_train.loc[ice_idx]

fig, ax = plt.subplots(figsize=(14, 10))
PartialDependenceDisplay.from_estimator(
    rf, X_ice,
    features=['Credit_Score', 'Income', 'Debt_Ratio', 'Age'],
    kind='both',  # Shows both ICE and PDP
    n_cols=2,
    ax=ax,
    line_kw={'alpha': 0.3},
    pd_line_kw={'color': 'red', 'linewidth': 3}
)
plt.suptitle(f'ICE Plots ({n_samples} instances) + PDP (red)', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

---

## Exercise 7: Surrogate Models

### Concept

Approximate a **black-box** with an **interpretable model**.

Process:

1. Train complex model (RF)
2. Get predictions
3. Train simple model (DT) on those predictions
4. Interpret surrogate
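
One practical question with surrogates is how simple the interpretable model can be before it stops mimicking the black box. The sketch below, on synthetic data, sweeps the surrogate tree's depth and records fidelity (agreement with the black-box predictions on held-out data). The depth values and names such as `fidelity_by_depth` are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the notebook's credit dataset)
X_demo, y_demo = make_classification(
    n_samples=800, n_features=6, n_informative=4, n_redundant=2, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)

# Steps 1-2: train the black box and collect its predictions
black_box = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
bb_train_pred = black_box.predict(X_tr)
bb_test_pred = black_box.predict(X_te)

# Step 3: fit surrogates of increasing depth on the black-box outputs (not on y)
fidelity_by_depth = {}
for depth in (1, 2, 3, 5, 8):
    surrogate = DecisionTreeClassifier(max_depth=depth, random_state=0)
    surrogate.fit(X_tr, bb_train_pred)
    # Fidelity: how often the surrogate agrees with the black box on unseen data
    fidelity_by_depth[depth] = accuracy_score(bb_test_pred, surrogate.predict(X_te))

for depth, fid in fidelity_by_depth.items():
    print(f"depth={depth}: fidelity={fid:.3f}")
```

A surrogate is only worth interpreting (step 4) if its fidelity is high; a shallow tree with low fidelity explains the tree, not the black box.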

In [None]:
# Train complex model
complex_model = RandomForestClassifier(n_estimators=200, max_depth=None)
complex_model.fit(X_train, y_train)
complex_acc = accuracy_score(y_test, complex_model.predict(X_test))

# Generate predictions
y_train_pred = complex_model.predict(X_train)
y_test_pred = complex_model.predict(X_test)

# Train surrogate
surrogate = DecisionTreeClassifier(max_depth=5, random_state=42)
surrogate.fit(X_train, y_train_pred)  # Train on complex model predictions!

# Measure fidelity (agreement with the black box) and accuracy (agreement with true labels)
fidelity = accuracy_score(y_test_pred, surrogate.predict(X_test))
surr_acc = accuracy_score(y_test, surrogate.predict(X_test))

print(f"Complex model accuracy: {complex_acc:.3f}")
print(f"Surrogate fidelity: {fidelity:.3f} (matches black-box {fidelity*100:.0f}% of time)")
print(f"Surrogate accuracy: {surr_acc:.3f}")

In [None]:
# Visualize surrogate
plt.figure(figsize=(20, 12))
plot_tree(surrogate, feature_names=X.columns, class_names=['Reject', 'Approve'],
          filled=True, rounded=True, fontsize=10)
plt.title(f'Surrogate Model (depth=5, fidelity={fidelity:.2f})', fontsize=16, fontweight='bold')
plt.show()

---

## Exercise 8: LIME (Local Interpretable Model-agnostic Explanations)

### Concept

Explain **individual predictions**:

1. Perturb instance
2. Get black-box predictions
3. Fit local linear model
4. Interpret coefficients

**Key**: Explains this specific instance (not global behavior)
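
The four steps above can be sketched without the `lime` package to show the underlying idea. This is a deliberately simplified illustration: the real library discretizes features by default and uses its own sampling scheme, whereas the Gaussian perturbation, exponential proximity kernel, kernel width, and Ridge regression below are assumptions chosen for brevity.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

# Synthetic stand-in data (assumption: not the notebook's credit dataset)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
black_box = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_demo, y_demo)

rng = np.random.default_rng(0)
x0 = X_demo[0]                 # the instance to explain
scale = X_demo.std(axis=0)     # per-feature scale for perturbation and distance

# Step 1: perturb the instance with Gaussian noise around x0
Z = x0 + rng.normal(size=(1000, X_demo.shape[1])) * scale

# Step 2: query the black box on the perturbed points
p = black_box.predict_proba(Z)[:, 1]

# Weight each perturbed point by its proximity to x0 (closer = more influence)
Z_std = (Z - x0) / scale
dist = np.sqrt((Z_std ** 2).sum(axis=1))
kernel_width = 0.75 * np.sqrt(X_demo.shape[1])  # assumed width for this sketch
weights = np.exp(-(dist ** 2) / kernel_width ** 2)

# Step 3: fit a weighted linear model to the black-box outputs near x0
local_model = Ridge(alpha=1.0).fit(Z_std, p, sample_weight=weights)

# Step 4: the coefficients describe the local effect of each feature
local_coefs = local_model.coef_
for j, c in enumerate(local_coefs):
    print(f"feature_{j}: {c:+.4f}")
```

The coefficients are only valid near `x0`; repeating the procedure for a different instance can give a different ranking of features.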

In [None]:
# Install LIME (only if it is not already available)
try:
    import lime
    import lime.lime_tabular
except ImportError:
    import sys
    !{sys.executable} -m pip install lime --quiet
    import lime
    import lime.lime_tabular

print("✅ LIME ready")

In [None]:
# Create LIME explainer
explainer = lime.lime_tabular.LimeTabularExplainer(
    np.array(X_train),
    feature_names=X.columns.tolist(),
    class_names=['Rejected', 'Approved'],
    mode='classification'
)

# Select instance to explain
idx = 0
instance = X_test.iloc[idx].values
pred = rf.predict([instance])[0]
prob = rf.predict_proba([instance])[0]

print(f"Instance {idx}:")
print(f"Prediction: {'Approved' if pred==1 else 'Rejected'} ({prob[pred]:.1%} confidence)")

In [None]:
# Generate LIME explanation
explanation = explainer.explain_instance(
    instance,
    rf.predict_proba,
    num_features=6,
    num_samples=5000
)

# Extract explanation as (feature condition, weight) pairs
exp_list = explanation.as_list()

# Visualize
fig, ax = plt.subplots(figsize=(10, 6))
# Use the full condition string as the label: taking only the first token
# would return a number for interval conditions such as "600.00 < Credit_Score <= 700.00"
features = [e[0] for e in exp_list]
contrib = [e[1] for e in exp_list]
colors = ['green' if c > 0 else 'red' for c in contrib]
plt.barh(features, contrib, color=colors, alpha=0.7)
plt.xlabel('Feature Contribution', fontweight='bold')
plt.title(f'LIME Explanation: Instance {idx}', fontsize=14, fontweight='bold')
plt.axvline(0, color='black', linestyle='--')
plt.grid(alpha=0.3, axis='x')
plt.show()

print("\nFeature Contributions:")
for feat, cont in exp_list:
    print(f"  {feat}: {cont:+.3f}")

---

## 📚 Summary

### What We Learned

1. ✅ **Trade-off**: Complexity vs Interpretability
2. ✅ **Linear Models**: Coefficient interpretation
3. ✅ **Decision Trees**: IF-THEN rules, visualization
4. ✅ **Permutation Importance**: Model-agnostic feature ranking
5. ✅ **PDP**: Marginal feature effects
6. ✅ **ICE**: Instance-level heterogeneity
7. ✅ **Surrogate Models**: Approximate black-box
8. ✅ **LIME**: Local instance explanations

### Method Selection Guide

- **Global explanation** → Permutation Importance, PDP
- **Local explanation** → LIME, ICE
- **Interpretable model** → Linear, Decision Tree
- **Black-box approximation** → Surrogate Model

### Best Practices

- Use **multiple methods**
- Validate with **domain experts**
- Document **limitations**
- Match method to **audience**

---

## 🎓 Congratulations!

You completed the XAI Hands-on Notebook!

**Next**: Apply to your own datasets

**Contact**: homin.park@ghent.ac.kr