# Interpretability Validation and Bias

Trust, Stability, and Fairness of Model Explanations

## Objective

This notebook focuses on validating interpretability outputs, ensuring that explanations are:

- Stable across samples and folds

- Consistent across models

- Fair across sensitive groups

- Not masking bias or leakage

It answers:

Can we trust the explanations we are presenting, and are they equitable?

## Why Interpretability Validation Matters

Having explanations is not enough.

Without validation:

- Explanations may be unstable

- Bias may go undetected

- Stakeholders may be misled

- Regulatory exposure increases

📌 Interpretability itself must be audited.

## Key Risks Addressed


| Risk                    | Description                                  |
| ----------------------- | -------------------------------------------- |
| Explanation instability | Different explanations for similar samples   |
| Proxy bias              | Sensitive attributes inferred indirectly     |
| Group disparity         | Features behave differently across groups    |
| False trust             | Explanations appear reasonable but are wrong |


## Imports and Dataset

In [2]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

import shap

from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier


In [4]:
DATA_PATH = '''D:/GitHub/Data-Science-Techniques/datasets/Supervised-classification/synthetic_credit_default_classification.csv'''

df = pd.read_csv(DATA_PATH)

X = df.drop(columns=["default", "customer_id"])
y = df["default"]

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.3,
    stratify=y,
    random_state=2010
)


## Train Final Model

In [7]:
model = RandomForestClassifier(
    n_estimators=300,
    max_depth=6,
    random_state=2010,
    class_weight="balanced"
)

model.fit(X_train, y_train)


## Global SHAP Explanation

In [10]:
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)


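## Global SHAP Summary

In [13]:
# Older SHAP versions return a list with one (n_samples, n_features) array per class;
# newer versions return a single (n_samples, n_features, n_classes) array.
# Select the positive (default) class in either case.
shap_pos = shap_values[1] if isinstance(shap_values, list) else shap_values[..., 1]

shap.summary_plot(
    shap_pos,
    X_test,
    feature_names=X_test.columns
)


This plot establishes the baseline global behavior against which the fold-level and subgroup explanations are compared.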

## Explanation Stability Across Folds

Explanations should be consistent across resampling.

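In [16]:
from sklearn.base import clone

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

feature_importances = []

for train_idx, val_idx in cv.split(X, y):
    X_fold = X.iloc[val_idx]

    # Fit a fresh copy per fold so the final model trained above is not overwritten
    fold_model = clone(model).fit(X.iloc[train_idx], y.iloc[train_idx])
    fold_explainer = shap.TreeExplainer(fold_model)

    # Handle both SHAP output formats: list of per-class arrays (older versions)
    # or one (n_samples, n_features, n_classes) array (newer versions)
    fold_shap = fold_explainer.shap_values(X_fold)
    shap_vals = fold_shap[1] if isinstance(fold_shap, list) else fold_shap[..., 1]

    # Mean absolute SHAP value per feature = global importance for this fold
    mean_abs = np.abs(shap_vals).mean(axis=0)
    feature_importances.append(mean_abs)

# One row per fold, one column per feature
importance_df = pd.DataFrame(
    feature_importances,
    columns=X.columns
)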

## Stability Visualization

In [19]:
importance_df.boxplot(figsize=(12,6))
plt.title("SHAP Importance Stability Across Folds")
plt.xticks(rotation=90)
plt.show()


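The boxplot gives a visual impression of stability; a numeric summary can make the judgment repeatable. The sketch below assumes a fold-by-feature table of mean absolute SHAP values shaped like `importance_df`. The helper name `stability_report`, the `cv_threshold` of 0.25, the feature names, and the demo values are all illustrative assumptions, not results from the credit dataset.

```python
import numpy as np
import pandas as pd


def stability_report(importance_df: pd.DataFrame, cv_threshold: float = 0.25) -> pd.DataFrame:
    """Summarize how much each feature's importance varies across folds.

    importance_df: one row per fold, one column per feature (mean |SHAP|).
    cv_threshold: arbitrary illustrative cut-off for flagging a feature as unstable.
    """
    mean = importance_df.mean(axis=0)
    std = importance_df.std(axis=0, ddof=1)
    # Coefficient of variation: spread across folds relative to the mean importance
    cv = std / mean.replace(0, np.nan)
    report = pd.DataFrame({"mean_importance": mean, "std_importance": std, "cv": cv})
    report["unstable"] = report["cv"] > cv_threshold
    return report.sort_values("mean_importance", ascending=False)


# Illustrative values only: three folds, three hypothetical features
demo = pd.DataFrame(
    [[0.30, 0.10, 0.02],
     [0.28, 0.12, 0.05],
     [0.31, 0.09, 0.01]],
    columns=["feature_a", "feature_b", "feature_c"],
)

report = stability_report(demo)
print(report)
```

In the notebook, the same function could be called as `stability_report(importance_df)`. Low-importance features tend to show a high coefficient of variation simply because their mean is close to zero, so the flag is most informative for features near the top of the ranking.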

## Subgroup Explanation Analysis
### Example: Gender

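In [22]:
sensitive_col = "gender"  # placeholder: the dataset loaded above has no "gender" column

# Select positive-class SHAP values for either SHAP output format (list or 3-D array)
shap_pos = shap_values[1] if isinstance(shap_values, list) else shap_values[..., 1]

if sensitive_col in X_test.columns:
    for group in X_test[sensitive_col].unique():
        idx = (X_test[sensitive_col] == group).to_numpy()

        shap.summary_plot(
            shap_pos[idx],
            X_test.loc[idx],
            show=False
        )
        plt.title(f"SHAP Summary – {sensitive_col}: {group}")
        plt.show()
else:
    print(f"'{sensitive_col}' not found; set sensitive_col to one of {list(X_test.columns)}")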

Compare:

- Feature dominance

- Directionality

- Magnitude differences
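The three comparisons above can also be tabulated rather than read off plots. The following is a minimal sketch, assuming a 2-D array of positive-class SHAP values aligned row by row with the feature frame and a series of group labels. The helper `compare_groups`, the feature names, and the synthetic data are hypothetical and exist only to make the example runnable.

```python
import numpy as np
import pandas as pd


def compare_groups(shap_pos: np.ndarray, features: pd.DataFrame, groups: pd.Series) -> pd.DataFrame:
    """Per-group mean |SHAP| (magnitude) and mean signed SHAP (direction) per feature."""
    shap_df = pd.DataFrame(shap_pos, columns=features.columns, index=features.index)
    magnitude = shap_df.abs().groupby(groups).mean().T.add_prefix("mean_abs_")
    direction = shap_df.groupby(groups).mean().T.add_prefix("mean_signed_")
    table = pd.concat([magnitude, direction], axis=1)
    # Gap between the largest and smallest group-level magnitude for each feature
    table["abs_gap"] = magnitude.max(axis=1) - magnitude.min(axis=1)
    return table.sort_values("abs_gap", ascending=False)


# Synthetic data: two hypothetical features and two groups of equal size
rng = np.random.default_rng(0)
n = 200
features = pd.DataFrame({"income": rng.normal(size=n), "debt_ratio": rng.normal(size=n)})
groups = pd.Series(np.where(np.arange(n) % 2 == 0, "A", "B"), index=features.index, name="group")

# Synthetic SHAP values: "income" is made twice as influential for group B
shap_pos = np.column_stack([
    features["income"] * np.where(groups == "B", 0.2, 0.1),
    features["debt_ratio"] * 0.05,
])

table = compare_groups(shap_pos, features, groups)
print(table)
```

Features at the top of the table are the ones whose influence differs most between groups. The `mean_signed_` columns show whether a feature pushes predictions in the same direction for each group.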

##  Detecting Proxy Bias

Check if sensitive features or close proxies dominate explanations.

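In [25]:
sensitive_features = ["gender", "region", "zip_code"]  # placeholder names; adjust to your schema
present = [c for c in sensitive_features if c in importance_df.columns]
proxy_candidates = importance_df[present].mean()
proxy_candidates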

High SHAP importance on:

- gender

- region

- zip_code-like variables

→ Red flag
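A sensitive attribute can be excluded from the model and still be recoverable from the remaining features. One quick screen, sketched below, correlates each numeric feature with a binary sensitive attribute. It only catches linear, single-feature proxies, so a clean result does not rule out proxy bias. The helper `screen_proxies`, the threshold of 0.4, the column names, and the data are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def screen_proxies(features: pd.DataFrame, sensitive: pd.Series, threshold: float = 0.4) -> pd.DataFrame:
    """Rank numeric features by absolute Pearson correlation with a binary sensitive attribute."""
    # Encode the two group labels as 0/1 so a correlation can be computed
    codes = pd.Series(pd.factorize(sensitive)[0], index=sensitive.index, dtype=float)
    corr = features.corrwith(codes).abs().sort_values(ascending=False)
    return pd.DataFrame({"abs_corr": corr, "flagged": corr > threshold})


# Synthetic data: one feature built to leak the group, one independent of it
rng = np.random.default_rng(1)
n = 500
sensitive = pd.Series(rng.choice(["group_0", "group_1"], size=n), name="sensitive")
is_g1 = (sensitive == "group_1").to_numpy().astype(float)
features = pd.DataFrame({
    "region_code": is_g1 * 2.0 + rng.normal(scale=0.5, size=n),
    "income": rng.normal(size=n),
})

result = screen_proxies(features, sensitive)
print(result)
```

A flagged feature that also ranks high in SHAP importance deserves the same scrutiny as the sensitive attribute itself.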

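## Outcome vs Explanation Disparity

Compare prediction rates and explanation magnitudes.

In [28]:
sensitive_col = "gender"  # placeholder: the dataset loaded above has no "gender" column

group_analysis = X_test.copy()
group_analysis["prediction"] = model.predict(X_test)
group_analysis["default_prob"] = model.predict_proba(X_test)[:, 1]

if sensitive_col in group_analysis.columns:
    print(group_analysis.groupby(sensitive_col)[["prediction", "default_prob"]].mean())
else:
    print(f"'{sensitive_col}' not found; set sensitive_col to a column that exists in X_test")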

Differences require:

- Business justification

- Policy review

- Possible mitigation
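Group-level differences are easier to review when reduced to one or two numbers. The sketch below computes the positive-prediction rate per group, the gap between the highest and lowest rate, and their ratio. The helper `disparity_summary` and the toy predictions are hypothetical and are not output from the model above.

```python
import pandas as pd


def disparity_summary(predictions: pd.Series, groups: pd.Series) -> dict:
    """Positive-prediction rate per group, plus the gap and ratio between the extremes."""
    rates = predictions.groupby(groups).mean()
    max_rate, min_rate = rates.max(), rates.min()
    return {
        "rates": rates.to_dict(),
        "parity_difference": float(max_rate - min_rate),
        "impact_ratio": float(min_rate / max_rate) if max_rate > 0 else float("nan"),
    }


# Toy predictions (1 = predicted default) for two groups of four customers each
predictions = pd.Series([1, 0, 0, 0, 1, 1, 0, 1])
groups = pd.Series(["A", "A", "A", "A", "B", "B", "B", "B"])

summary = disparity_summary(predictions, groups)
print(summary)
```

A ratio below 0.8 is a commonly cited warning level (the "four-fifths rule"), but it is a heuristic rather than a verdict. Which direction counts as adverse depends on whether the predicted outcome is favorable or unfavorable to the customer.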

## Interpretability Bias Checklist

| Question                       | Pass? |
| ------------------------------ | ----- |
| Stable across folds            | ⬜     |
| Consistent across models       | ⬜     |
| No sensitive feature dominance | ⬜     |
| Subgroup behavior explained    | ⬜     |
| Business-justifiable patterns  | ⬜     |
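The "Consistent across models" item is not exercised by the cells in this notebook. One possible check, sketched here, compares the global importance rankings produced by two different models on the same features. The helper `importance_agreement`, the feature names, and the importance values are illustrative assumptions. In practice each series would hold mean absolute SHAP values from one model.

```python
import pandas as pd


def importance_agreement(imp_a: pd.Series, imp_b: pd.Series, top_k: int = 3) -> dict:
    """Compare two global-importance vectors indexed by feature name."""
    # Keep only the features present in both vectors, in the same order
    imp_a, imp_b = imp_a.align(imp_b, join="inner")
    # Spearman correlation computed as the Pearson correlation of the ranks
    spearman = imp_a.rank().corr(imp_b.rank())
    top_a = set(imp_a.nlargest(top_k).index)
    top_b = set(imp_b.nlargest(top_k).index)
    return {"spearman": float(spearman), "top_k_overlap": len(top_a & top_b) / top_k}


# Hypothetical importances from two models trained on the same features
imp_model_1 = pd.Series({"income": 0.30, "debt_ratio": 0.20, "age": 0.10, "tenure": 0.05})
imp_model_2 = pd.Series({"income": 0.25, "debt_ratio": 0.28, "age": 0.08, "tenure": 0.02})

agreement = importance_agreement(imp_model_1, imp_model_2)
print(agreement)
```

Here the two models swap the top two features but otherwise agree. Strong disagreement between models of similar accuracy suggests the explanation reflects model idiosyncrasies more than the data.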


## Mitigation Strategies

If bias or instability is found:

- ✔ Remove or constrain sensitive proxies
- ✔ Re-engineer features
- ✔ Add monotonic constraints
- ✔ Segment models by population
- ✔ Document limitations

📌 Interpretability findings must feed back into model design.

## Common Mistakes (Avoided)

- ❌ Assuming explanations are always correct
- ❌ Ignoring subgroup behavior
- ❌ Treating SHAP as causal
- ❌ Explaining unstable models
- ❌ Skipping documentation

## Key Takeaways

- Interpretability must be validated

- Stability is a trust requirement

- Bias can hide inside explanations

- Subgroup analysis is mandatory

- Explanations are part of model risk

🟦 End of Interpretability & Explainability Module

You now have a complete, production-grade interpretability framework:

08_Interpretability_and_Explainability/

├── Global explanations

├── Local explanations

├── SHAP & LIME

└── Interpretability validation & bias
