# Model Explainability and Interpretation

## 1. Notebook overview

This notebook explains the predictions made by our final logistic regression model.

We focus on model transparency and interpretability using:
- **SHAP** (SHapley Additive exPlanations): Global and local explanations
- **LIME** (Local Interpretable Model-agnostic Explanations): Per-instance feature influence

Objectives:
- Understand which features most influence attrition risk
- Generate interpretable explanations for key test cases
- Provide both global and individual-level insights to support business decision-making

## 2. Load final model and test data

We load:
- The trained pipeline (`logreg_final_model.joblib`)
- The optimized threshold (`logreg_threshold.joblib`)
- The untouched test set (`X_test.csv`, `y_test.csv`)

In [None]:
import joblib
import pandas as pd
import shap
import os

# Paths
model_path = "../models/logreg_final_model.joblib"
threshold_path = "../models/logreg_threshold.joblib"
data_path = "../data/processed"

# Load final model pipeline and threshold
model = joblib.load(model_path)
threshold = joblib.load(threshold_path)

# Load test data
X_test = pd.read_csv(os.path.join(data_path, "X_test.csv"))
# squeeze("columns") turns a single-column CSV into a 1-D Series of labels
y_test = pd.read_csv(os.path.join(data_path, "y_test.csv")).squeeze("columns")

print("Model, threshold, and test data loaded.")
print(f"Custom threshold: {threshold:.3f}")
print(f"X_test shape: {X_test.shape}")

# Extract logistic regression model from pipeline
logreg = model.named_steps["logreg"]
scaler = model.named_steps["scaler"]

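# Scale X_test: the logistic regression step expects the scaler's output,
# so the explainers that call it directly must receive scaled features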
X_test_scaled = scaler.transform(X_test)

## 3. Global explanation using SHAP

We use SHAP (SHapley Additive exPlanations) to interpret global model behavior.

Key visualizations:
- **Bar Plot**: Mean absolute SHAP value per feature (global importance)
- **Beeswarm Plot**: Shows both magnitude and direction of SHAP values
- **Summary Plot**: The legacy `shap.summary_plot` API, which draws the same beeswarm layout, here colored by the original (unscaled) feature values

These help us answer:
- Which features most affect attrition predictions?
- Do they push predictions up (higher risk) or down (lower risk)?
- Are effects monotonic or conditional?
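
For a linear model such as logistic regression, SHAP values in log-odds space have a simple closed form when features are treated as independent: each feature contributes its coefficient times the feature's deviation from the background mean. The sketch below illustrates that idea with made-up coefficients and random data. All names and numbers are hypothetical; it is a toy illustration, not the explainer call used in this notebook.

```python
import numpy as np

# Hypothetical scaled background data (rows = employees, columns = features)
rng = np.random.default_rng(0)
background = rng.normal(size=(200, 3))

# Hypothetical logistic regression parameters (illustrative only)
coef = np.array([1.2, -0.8, 0.3])
intercept = -1.0

def log_odds(X):
    """Linear part of a logistic regression: the model output in log-odds."""
    return X @ coef + intercept

def linear_shap(X, background):
    """Per-feature contributions in log-odds, assuming independent features."""
    return coef * (X - background.mean(axis=0))

x = background[:5]
phi = linear_shap(x, background)
base_value = log_odds(background).mean()

# Additivity: base value + sum of contributions reproduces the model output
reconstructed = base_value + phi.sum(axis=1)
print(np.allclose(reconstructed, log_odds(x)))  # prints True

# Global importance as shown in a bar plot: mean absolute contribution per feature
global_importance = np.abs(phi).mean(axis=0)
print(global_importance)
```

Because every contribution is a fixed coefficient times a deviation, each feature's effect on the log-odds is monotonic by construction. In this simplified setting, the beeswarm plot mainly shows how the spread of feature values translates into a spread of contributions.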

In [None]:
import shap
import matplotlib.pyplot as plt

# Given a plain callable, shap.Explainer selects a model-agnostic algorithm automatically
explainer = shap.Explainer(
    logreg.predict_proba, X_test_scaled, feature_names=X_test.columns.tolist()
)
# predict_proba returns one column per class; keep the attrition class (index 1)
shap_values = explainer(X_test_scaled)[:, :, 1]

# Plot 1: Global feature importance (bar)
# show=False defers rendering so the title is applied to the SHAP figure
shap.plots.bar(shap_values, max_display=15, show=False)
plt.title("SHAP Global Feature Importance")
plt.show()

# Plot 2: Beeswarm plot
shap.plots.beeswarm(shap_values, max_display=15, show=False)
plt.title("SHAP Beeswarm Plot")
plt.show()

# Plot 3: Summary plot (colored by original, unscaled feature values)
shap.summary_plot(shap_values.values, features=X_test, feature_names=X_test.columns)

## Identify high-risk predictions

We rank test set predictions by predicted attrition probability and match them to their actual outcomes:

- True Positive (TP): Model correctly predicted attrition
- False Positive (FP): Model wrongly predicted attrition
- False Negative (FN): Model missed an attrition case
- True Negative (TN): Model correctly predicted retention

We'll use these to select interesting examples for SHAP force/waterfall plots.

In [None]:
import numpy as np

# Predict probability and apply threshold
y_proba = model.predict_proba(X_test)[:, 1]
y_pred = (y_proba >= threshold).astype(int)

# Create result DataFrame
results = X_test.copy()
results['actual'] = y_test.values
results['predicted'] = y_pred
results['proba'] = y_proba

# Classification category
conditions = [
    (results['actual'] == 1) & (results['predicted'] == 1),
    (results['actual'] == 0) & (results['predicted'] == 1),
    (results['actual'] == 1) & (results['predicted'] == 0),
    (results['actual'] == 0) & (results['predicted'] == 0),
]
choices = ['TP', 'FP', 'FN', 'TN']
# A string default keeps the dtype consistent with the string choices
results['category'] = np.select(conditions, choices, default='unknown')

# Sort by risk
high_risk = results.sort_values(by='proba', ascending=False)

# Show top 10 highest risk predictions
high_risk[['proba', 'actual', 'predicted', 'category']].head(10)

## 4. Local explanation (SHAP)

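We select one test instance per outcome category (true positive, false positive, false negative, true negative) and visualize each with a SHAP waterfall plot. A force plot conveys the same per-feature contributions in a horizontal layout.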

Goal: Understand how the model combines features to arrive at each prediction.

In [None]:
# Select one example from each case type for SHAP interpretation

# Highest-risk correct prediction (TP)
sample_tp = results[results['category'] == 'TP'].sort_values(by='proba', ascending=False).head(1)

# Highest-risk incorrect prediction (FP)
sample_fp = results[results['category'] == 'FP'].sort_values(by='proba', ascending=False).head(1)

# Highest-risk missed attrition (FN)
sample_fn = results[results['category'] == 'FN'].sort_values(by='proba', ascending=False).head(1)

# Lowest-risk correct non-attrition (TN)
sample_tn = results[results['category'] == 'TN'].sort_values(by='proba', ascending=True).head(1)

# Combine for inspection
samples = pd.concat([sample_tp, sample_fp, sample_fn, sample_tn])
samples['id'] = ['TP (Correct Risk)',
                 'FP (Overpredicted)',
                 'FN (Missed Risk)',
                 'TN (Correct Low Risk)']
samples.set_index('id', inplace=True)

# Show table for reference
samples[['proba', 'actual', 'predicted', 'category']]

### SHAP waterfall plots (4 cases)

We visualize SHAP waterfall plots for:
- **True Positive** (TP): correctly predicted attrition
- **False Positive** (FP): overpredicted attrition
- **False Negative** (FN): missed attrition
- **True Negative** (TN): correctly predicted no attrition

These plots show how each feature influences the model’s decision relative to the baseline.
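
The arithmetic behind a waterfall plot can be sketched in a few lines: start from the baseline (the average model output over the background data), then add each feature's contribution in turn until the row's prediction is reached. The numbers below are hypothetical and chosen only for illustration. Because the explainer in this notebook wraps `predict_proba`, the units are probabilities rather than log-odds.

```python
# Hypothetical baseline and per-feature contributions (probability units)
base_value = 0.20
contributions = {"OverTime": 0.25, "Age": 0.05, "MonthlyIncome": -0.10}

def waterfall_steps(base_value, contributions):
    """Return (feature, contribution, running total), largest effects first."""
    running = base_value
    steps = []
    ordered = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    for name, phi in ordered:
        running += phi
        steps.append((name, phi, running))
    return steps

steps = waterfall_steps(base_value, contributions)
for name, phi, running in steps:
    print(f"{name:>15}: {phi:+.2f} -> {running:.2f}")

# The last running total is the model output for this row
prediction = steps[-1][2]
print(f"Prediction: {prediction:.2f}")
```

Reading a plot this way makes it clear why a false negative can occur: several small negative contributions can outweigh one large positive driver and leave the final value just under the decision threshold.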

In [None]:
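# Recalculate SHAP values for the entire test set (if needed),
# keeping only the attrition class (index 1) of predict_proba
explainer = shap.Explainer(
    logreg.predict_proba, X_test_scaled, feature_names=X_test.columns.tolist()
)
shap_values = explainer(X_test_scaled)[:, :, 1]

# set_index('id') replaced the original row index, so recover each sample's
# row position by matching its category and probability in `results`
sample_indices = samples.index.to_list()
raw_indices = [
    int(np.flatnonzero(((results['category'] == cat) & (results['proba'] == p)).to_numpy())[0])
    for cat, p in zip(samples['category'], samples['proba'])
]

# Display SHAP waterfall plots for each sample
for idx, label in zip(raw_indices, sample_indices):
    print(f"\n🔍 {label}")
    shap.plots.waterfall(shap_values[idx], max_display=15)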

## LIME explanations (4 cases)

We use LIME (Local Interpretable Model-agnostic Explanations) to explain individual predictions.

- LIME fits a local linear model near the selected instance.
- It tells us which features were most responsible for the model's decision.
- These are especially useful for presenting to non-technical stakeholders.

We apply LIME to the same four cases:
- True Positive
- False Positive
- False Negative
- True Negative
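
The core idea of fitting a local linear model can be sketched with NumPy alone. The example below perturbs one instance with Gaussian noise, weights the perturbed samples by their proximity to it, and fits a weighted least-squares line to a stand-in prediction function. It is a simplified illustration under stated assumptions: `black_box`, `local_surrogate`, and all parameter values are hypothetical, and the real `lime` library additionally discretizes features, samples from training statistics, and selects a subset of features.

```python
import numpy as np

def black_box(X):
    """Hypothetical stand-in for a model's probability output."""
    return 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - 3.0 * X[:, 1])))

def local_surrogate(predict_fn, instance, n_samples=2000, scale=0.5,
                    kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear model to predict_fn around one instance."""
    rng = np.random.default_rng(seed)
    # 1. Perturb the instance with Gaussian noise
    samples = instance + rng.normal(scale=scale, size=(n_samples, instance.size))
    preds = predict_fn(samples)
    # 2. Weight each sample by its proximity to the instance (exponential kernel)
    dist = np.linalg.norm(samples - instance, axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 3. Weighted least squares: scale rows by sqrt(weight), then solve
    design = np.column_stack([np.ones(n_samples), samples - instance])
    sw = np.sqrt(weights)[:, None]
    coefs, *_ = np.linalg.lstsq(design * sw, preds[:, None] * sw, rcond=None)
    return coefs.ravel()  # [local intercept, one slope per feature]

instance = np.array([0.2, -0.1])
intercept, *slopes = local_surrogate(black_box, instance)
print(f"local intercept: {intercept:.3f}")
print("local slopes:", [round(float(s), 3) for s in slopes])
```

The signs of the slopes play the role of the colored bars in a LIME chart: a positive slope pushes the local prediction up, a negative one pushes it down. Fixing the seed matters because the surrogate is fit on random perturbations, so unseeded runs can produce slightly different weights.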

In [None]:
from lime.lime_tabular import LimeTabularExplainer

# LIME expects a NumPy array; scaler.transform normally returns one already
X_test_array = np.asarray(X_test_scaled)

# Set up LIME explainer
lime_explainer = LimeTabularExplainer(
    training_data=X_test_array,
    feature_names=X_test.columns.tolist(),
    class_names=['No Attrition', 'Attrition'],
    mode='classification',
    discretize_continuous=True
)

In [None]:
# Loop over the same 4 sample indices
for idx, label in zip(raw_indices, sample_indices):
    print(f"\n🔍 {label}")
    
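    # Explain instance. The rows are already scaled, so call the logistic
    # regression step directly; the full pipeline would scale them a second time.
    explanation = lime_explainer.explain_instance(
        X_test_array[idx],
        logreg.predict_proba,
        num_features=10,
        top_labels=1
    )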
    
    # Show in notebook
    explanation.show_in_notebook(show_table=True, show_all=False)

In [None]:
# Save each LIME explanation as a standalone HTML file
for idx, label in zip(raw_indices, sample_indices):
    explanation = lime_explainer.explain_instance(
        X_test_array[idx],
        logreg.predict_proba,  # rows are already scaled, so bypass the pipeline's scaler
        num_features=10,
        top_labels=1
    )
    html_path = f"lime_explanation_{label.replace(' ', '_').lower()}.html"
    explanation.save_to_file(html_path)
    print(f"Saved: {html_path}")

## SHAP vs LIME – Explanation Comparison

We compare SHAP and LIME explanations for four key test cases:

| Case                  | SHAP (Top 3 features)                                | LIME (Top 3 features)                                |
|-----------------------|------------------------------------------------------|------------------------------------------------------|
| **TP – Correct Risk** | - OverTime = Yes ↑<br>- Age = low ↑<br>- JobLevel = low ↑ | - OverTime = Yes ↑<br>- Age = low ↑<br>- JobLevel = low ↑ |
| **FP – Overpredicted**| - MonthlyIncome = high ↓<br>- EnvironmentSatisfaction = low ↑<br>- DistanceFromHome = high ↑ | - MonthlyIncome = high ↓<br>- DistanceFromHome = high ↑<br>- JobInvolvement = low ↑ |
| **FN – Missed Risk**  | - OverTime = No ↓<br>- DistanceFromHome = high ↑<br>- JobSatisfaction = low ↑ | - OverTime = No ↓<br>- Age = high ↓<br>- TrainingTimesLastYear = low ↑ |
| **TN – Correct Low Risk** | - Age = high ↓<br>- OverTime = No ↓<br>- MonthlyIncome = high ↓ | - Age = high ↓<br>- OverTime = No ↓<br>- JobLevel = high ↓ |

↑ = pushes prediction toward "Attrition"  
↓ = pushes prediction toward "No Attrition"

---

**Observations:**
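- SHAP and LIME agree fully on the top three features and their direction for the TP case, and share two of three for the FP and TN cases. They diverge most on the FN case, where only `OverTime` appears in both lists.
- SHAP attributions are additive: for each row they sum to the difference between the prediction and the baseline, so they can be compared across rows and aggregated into global importance. LIME fits a separate surrogate model for each row.
- LIME's surrogate depends on random perturbation sampling and feature discretization, so its rankings can differ from SHAP's on borderline cases (as in the FN row) and can vary between runs unless a random seed is fixed.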
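
Agreement between the two methods can be quantified rather than judged by eye. The sketch below computes the Jaccard overlap of the top-3 feature sets per case, using the feature names transcribed from the comparison table above. The helper name `jaccard` and the choice of metric are illustrative assumptions, and the metric ignores both direction and rank order.

```python
# Top-3 features per case, transcribed from the comparison table
shap_top = {
    "TP": {"OverTime", "Age", "JobLevel"},
    "FP": {"MonthlyIncome", "EnvironmentSatisfaction", "DistanceFromHome"},
    "FN": {"OverTime", "DistanceFromHome", "JobSatisfaction"},
    "TN": {"Age", "OverTime", "MonthlyIncome"},
}
lime_top = {
    "TP": {"OverTime", "Age", "JobLevel"},
    "FP": {"MonthlyIncome", "DistanceFromHome", "JobInvolvement"},
    "FN": {"OverTime", "Age", "TrainingTimesLastYear"},
    "TN": {"Age", "OverTime", "JobLevel"},
}

def jaccard(a, b):
    """Share of features that appear in both sets, out of all features in either."""
    return len(a & b) / len(a | b)

overlap = {case: jaccard(shap_top[case], lime_top[case]) for case in shap_top}
for case, score in overlap.items():
    print(f"{case}: {score:.2f}")
# TP: 1.00
# FP: 0.50
# FN: 0.20
# TN: 0.50
```

The low score for the FN case matches the pattern in the table: the two methods disagree most on the prediction the model got wrong by missing attrition.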


## Final Summary and Takeaways

In this notebook, we explored the interpretability of our final Logistic Regression model using two complementary techniques: **SHAP** and **LIME**.

### 🔍 Key Findings

- **Global importance**:
  - Features like `OverTime`, `JobLevel`, `MonthlyIncome`, and `Age` consistently influenced attrition risk.
  - `OverTime = Yes` was the most dominant driver of predicted attrition.
  
- **Local explanations**:
  - For individual employees, both SHAP and LIME identified intuitive patterns that align with real-world HR expectations.
  - False negatives (missed attrition) often involved employees with less obvious risk indicators (e.g., no overtime, moderate income).
  - False positives (incorrectly flagged) tended to involve high earners with some stress-related factors (e.g., high commute distance).

- **SHAP vs LIME**:
  - SHAP provided **consistent, mathematically grounded** insights tied to the overall model behavior.
  - LIME offered **simplified, locally faithful** explanations ideal for stakeholder presentation.
  - Both methods agreed on most key contributors, but offered complementary perspectives.

### 📦 Next Steps

- Integrate these interpretability tools into a dashboard or reporting pipeline.
- Use SHAP and LIME to support **HR decision-making**, especially for proactive retention strategies.
- Explore model retraining on newer employee data to adapt to evolving patterns.

This completes the modeling and interpretability pipeline. We now have a robust, explainable model that is both **technically sound** and **business-ready**.