[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AIPI-590-XAI/Duke-AI-XAI/blob/main/assignments/machine_learning_court.ipynb)

# 👩‍⚖️ ⚖️ Machine Learning Court



## ⚖️ Case 1: Loan Denial Dispute – UCI Adult Income Dataset
#### 🔍 Scenario

Jane Dow, a 37-year-old professional woman with a Bachelor's degree and full-time executive role, applied for a premium credit product. The bank’s model—trained to predict income level as a proxy for eligibility—classified her as earning ≤$50K, resulting in denial. She disputes the fairness of the decision.

#### 🟥 Prosecution
Evaluate whether the model’s decision may have been influenced by inappropriate or unfair reasoning. Explore whether the explanation aligns with what should be expected in a fair credit decision.

#### 🟦 Defense
Justify the decision based on the model’s learned patterns. Consider how well the explanation supports the classification and whether similar profiles are treated consistently.

In [14]:
# 📦 Case 1: Loan Approval Prediction (Adult Income Dataset)

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Load Data
from sklearn.datasets import fetch_openml

adult = fetch_openml(name="adult", version=2, as_frame=True)
df = adult.frame

# Clean and preprocess
df = df.dropna()
df = df.copy()
encoders = {}
label_cols = df.select_dtypes(include="category").columns.tolist()

for col in label_cols:
    le = LabelEncoder()
    df[col] = df[col].astype(str)
    df[col] = le.fit_transform(df[col])
    encoders[col] = le

X = df.drop(["class", "fnlwgt"], axis=1)
y = df["class"].apply(lambda x: 1 if x == ">50K" else 0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train simple RF model
model_adult = RandomForestClassifier(random_state=42)
model_adult.fit(X_train, y_train)
print(classification_report(y_test, model_adult.predict(X_test)))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00      9045

    accuracy                           1.00      9045
   macro avg       1.00      1.00      1.00      9045
weighted avg       1.00      1.00      1.00      9045



In [15]:
# 🎯 Focus Instance: Loan Rejection Case (Jane Dow)

jane_encoded = {
    "age": 37,
    "workclass": encoders["workclass"].transform(["Private"])[0],
    "education": encoders["education"].transform(["Bachelors"])[0],
    "marital-status": encoders["marital-status"].transform(["Never-married"])[0],
    "occupation": encoders["occupation"].transform(["Exec-managerial"])[0],
    "relationship": encoders["relationship"].transform(["Not-in-family"])[0],
    "race": encoders["race"].transform(["White"])[0],
    "sex": encoders["sex"].transform(["Female"])[0],
    "hours-per-week": 50,
    "native-country": encoders["native-country"].transform(["United-States"])[0],
    "capital-gain": 0,
    "capital-loss": 0,
    "education-num": 13,
}

jane_df = pd.DataFrame([jane_encoded])
jane_df = jane_df[X_train.columns]
pred = model_adult.predict(jane_df)
print(
    "Prediction for Jane Dow (Loan Eligibility):",
    "Approved" if pred[0] == 1 else "Denied",
)

Prediction for Jane Dow (Loan Eligibility): Denied


### Defense
I was assigned the role of defense for this case. I think for defense we need to explore local explanations, as opposed to global, as we discussed in class as we are interested in why this specific instance (Jane) received the label that it did (denied)

Dr. Bent has a notebook on her GitHub that I use as a guide for implementing these techniques  
(https://github.com/AIPI-590-XAI/Duke-AI-XAI/blob/main/explainable-ml-example-notebooks/local_explanations.ipynb)

In [16]:
import lime
from lime.lime_tabular import LimeTabularExplainer

# LIME 🍋‍🟩
LIME Process: (From Dr. Bent's Notes)

* Select instance of interest
* Perturb your dataset and get black box predictions for perturbed samples
* Generate a new dataset consisting of perturbed samples (variations of your data) and the corresponding predictions
* Train an interpretable model, weighted by the proximity of sampled instances to the instance of interest
* Interpret the local model to explain prediction

In [17]:
feature_names = list(X_train.columns)
print(f"Feature Names:{feature_names}")
print(f"Number of Features: {len(feature_names)}")

class_names = [0, 1]  # 0 for denied, 1 for approved

Feature Names:['age', 'workclass', 'education', 'education-num', 'marital-status', 'occupation', 'relationship', 'race', 'sex', 'capital-gain', 'capital-loss', 'hours-per-week', 'native-country']
Number of Features: 13


In [18]:
# Define categorical features (indices in the feature list)
categorical_feature_indices = []
categorical_feature_names = []

for i, col in enumerate(feature_names):
    if col in [
        "workclass",
        "education",
        "marital-status",
        "occupation",
        "relationship",
        "race",
        "sex",
        "native-country",
    ]:
        categorical_feature_indices.append(i)
        categorical_feature_names.append(col)

print(f"Categorical feature indices: {categorical_feature_indices}")
print(f"Categorical feature names: {categorical_feature_names}")

# Class names
class_names = ["<=50K", ">50K"]

# Create categorical names mapping (index -> list of category names)
categorical_names = {}


# Helper function to get original category names from encoders
def get_categories_from_encoder(encoder):
    """Get the original category names from a LabelEncoder"""
    return list(encoder.classes_)


# Map each categorical feature index to its categories
for i, col_name in enumerate(feature_names):
    if col_name in encoders:
        categorical_names[i] = get_categories_from_encoder(encoders[col_name])

print(f"\nCategorical names mapping:")
for idx, names in categorical_names.items():
    print(f"  Feature {idx} ({feature_names[idx]}): {names}")

Categorical feature indices: [1, 2, 4, 5, 6, 7, 8, 12]
Categorical feature names: ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race', 'sex', 'native-country']

Categorical names mapping:
  Feature 1 (workclass): ['Federal-gov', 'Local-gov', 'Private', 'Self-emp-inc', 'Self-emp-not-inc', 'State-gov', 'Without-pay']
  Feature 2 (education): ['10th', '11th', '12th', '1st-4th', '5th-6th', '7th-8th', '9th', 'Assoc-acdm', 'Assoc-voc', 'Bachelors', 'Doctorate', 'HS-grad', 'Masters', 'Preschool', 'Prof-school', 'Some-college']
  Feature 4 (marital-status): ['Divorced', 'Married-AF-spouse', 'Married-civ-spouse', 'Married-spouse-absent', 'Never-married', 'Separated', 'Widowed']
  Feature 5 (occupation): ['Adm-clerical', 'Armed-Forces', 'Craft-repair', 'Exec-managerial', 'Farming-fishing', 'Handlers-cleaners', 'Machine-op-inspct', 'Other-service', 'Priv-house-serv', 'Prof-specialty', 'Protective-serv', 'Sales', 'Tech-support', 'Transport-moving']
  Feature 6 (relati

In [19]:
# Define kernel_width
kernel_width = 3

# Initialize LIME explainer
explainer = lime.lime_tabular.LimeTabularExplainer(
    X_train.values,
    class_names=class_names,
    feature_names=feature_names,
    categorical_features=categorical_feature_indices,
    categorical_names=categorical_names,
    kernel_width=kernel_width,
)

In [20]:
# looking at Jane Dow's case

# uses the explainer on Jane's case
jane_exp = explainer.explain_instance(
    jane_df.values[0], model_adult.predict_proba, num_features=len(feature_names)
)

jane_exp.show_in_notebook()



IndexError: index 1 is out of bounds for axis 1 with size 1

In [26]:
print(f"\nLIME Explanation for Jane Dow:")
print(
    f"Prediction: {'Approved (>50K)' if jane_exp.predict_proba[1] > 0.5 else 'Denied (<=50K)'}"
)
print(f"Prediction probability: {jane_exp.predict_proba[1]:.3f} for >50K")

print(f"\nFeature contributions (sorted by importance):")
for feature, weight in jane_exp.as_list():
    direction = "SUPPORTS" if weight > 0 else "OPPOSES"
    print(f"  {feature}: {weight:.4f} ({direction} >50K prediction)")


LIME Explanation for Jane Dow:


NameError: name 'jane_exp' is not defined

### For the purposes of defense in this court case, it could be interesting to see if the model behaves similarly if Jane was a man instead of a woman

In [25]:
# 1. Jane if she were male
jane_male = jane_encoded.copy()
jane_male["sex"] = encoders["sex"].transform(["Male"])[0]
jane_male_df = pd.DataFrame([jane_male])
jane_male_df = jane_male_df[X_train.columns]

# Get prediction and explanation for male version
male_pred = model_adult.predict(jane_male_df)[0]
male_explanation = explainer.explain_instance(
    jane_male_df.values[0], model_adult.predict_proba, num_features=len(feature_names)
)

male_explanation.show_in_notebook()

print(f"Original Jane: {'Denied' if pred[0] == 0 else 'Approved'}")
print(f"If Jane were male: {'Denied' if male_pred == 0 else 'Approved'}")



TypeError: AnchorBaseBeam.anchor_beam() got an unexpected keyword argument 'num_features'

# Anchors ⚓️

In [21]:
from anchor import anchor_tabular

In [22]:
# Initialize Anchors explainer
explainer = anchor_tabular.AnchorTabularExplainer(
    class_names, feature_names, X_train.values, categorical_names
)

In [23]:
# Explain the prediction using Anchors
exp = explainer.explain_instance(jane_df.values[0], model_adult.predict, threshold=0.80)



In [24]:
# Print the prediction, precision, and coverage
print("Anchor: %s" % (" AND ".join(exp.names())))
print("Precision: %.2f" % exp.precision())
print("Coverage: %.2f" % exp.coverage())

Anchor: 
Precision: 1.00
Coverage: 1.00
