# Adult Income Drift Analysis: Detecting Distribution Shifts

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/GlassAlpha/glassalpha/blob/main/examples/notebooks/adult_income_drift.ipynb)

**Detect demographic shifts and their impact on model fairness**

**Dataset**: Adult Income (48K samples) | **Protected Attributes**: Race, Sex, Age

**Use Case**: Monitor how changes in population demographics affect model performance and fairness over time. Critical for production ML systems that must remain fair as populations evolve.

**API Reference**: [`from_model()` documentation](https://glassalpha.com/reference/api/api-audit/) | [Drift Detection Guide](https://glassalpha.com/guides/drift-detection/)



## Step 1: Installation


In [None]:
%pip install -q glassalpha[explain,xgboost]

In [None]:
"""Environment verification for reproducibility"""
import platform
import random
import sys

import numpy as np
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

import glassalpha as ga

SEED = 42
random.seed(SEED)
np.random.seed(SEED)

print(
    {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "glassalpha": getattr(ga, "__version__", "dev"),
        "seed": SEED,
    }
)

## Step 2: Load Adult Income Dataset

We'll use GlassAlpha's built-in Adult Income dataset loader.


In [None]:
# Load Adult Income dataset
df = ga.datasets.load_adult_income()

print(f"Dataset: {df.shape[0]} samples, {df.shape[1]} features")
print(f"\nIncome >50K rate: {(df['income_over_50k'] == 1).mean():.1%}")
print("\nAge distribution:")
print(df["age"].describe())
print("\nSex distribution:")
print(df["sex"].value_counts())
print("\nRace distribution:")
print(df["race"].value_counts())
df.head()

## Step 3: Prepare Data

Select features and encode categorical variables.
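
To make the encoding step concrete: scikit-learn's `LabelEncoder` assigns integer codes in sorted order of the distinct labels. The sketch below reproduces that mapping with `np.unique` on made-up values (the toy labels are illustrative and not drawn from the dataset). It also shows why a group mask should be built from raw labels or from codes, but never by comparing one against the other.

```python
import numpy as np

# Toy labels for illustration only (not drawn from the Adult Income data).
sex = np.array(["Male", "Female", "Female", "Male", "Female"])

# np.unique returns the sorted distinct labels plus each row's index into them,
# which matches the integer codes LabelEncoder.fit_transform would assign.
classes, codes = np.unique(sex, return_inverse=True)

print(dict(zip(classes.tolist(), range(len(classes)))))  # label -> code
print(codes.tolist())

# Build group masks from raw labels or from codes, but do not mix the two:
# comparing a string array against an integer code never matches.
mask_from_labels = sex == "Female"
female_code = int(np.where(classes == "Female")[0][0])
mask_from_codes = codes == female_code
```

Both masks select the same rows, so either representation works as long as it is used consistently.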


In [None]:
from sklearn.preprocessing import LabelEncoder

# Encode categorical variables
le_sex = LabelEncoder()
le_race = LabelEncoder()
le_workclass = LabelEncoder()
le_education = LabelEncoder()
le_marital = LabelEncoder()
le_occupation = LabelEncoder()

df["sex_encoded"] = le_sex.fit_transform(df["sex"])
df["race_encoded"] = le_race.fit_transform(df["race"])
df["workclass_encoded"] = le_workclass.fit_transform(df["workclass"])
df["education_encoded"] = le_education.fit_transform(df["education_level"])
df["marital_encoded"] = le_marital.fit_transform(df["marital_status"])
df["occupation_encoded"] = le_occupation.fit_transform(df["occupation"])

# Select features for modeling
feature_cols = [
    "age",
    "workclass_encoded",
    "education_encoded",
    "education_num",
    "marital_encoded",
    "occupation_encoded",
    "hours_per_week",
    "capital_gain",
    "capital_loss",
]

X = df[feature_cols]
y = (df["income_over_50k"] == 1).astype(int)

# Protected attributes (encoded copies; the audits below pass the raw labels instead)
protected = {
    "race": df["race_encoded"].values,
    "sex": df["sex_encoded"].values,
    "age": df["age"].values,
}

print(f"Features: {feature_cols}")
print("Target: Income >50K (binary)")
print(f"Protected attributes: {list(protected.keys())}")

## Step 4: Train/Test Split


In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=SEED, stratify=y)

# Split protected attributes
# Use original dataframe values for protected attributes (not encoded)
protected_test = {
    "race": df.loc[X_test.index, "race"].values,
    "sex": df.loc[X_test.index, "sex"].values,
    "age": df.loc[X_test.index, "age"].values,
}

print(f"Train: {len(X_train)} | Test: {len(X_test)}")
print(f"Test income >50K rate: {y_test.mean():.1%}")

## Step 5: Train Model

We'll use XGBoost for strong performance on this dataset.


In [None]:
model = XGBClassifier(n_estimators=100, max_depth=5, learning_rate=0.1, random_state=SEED, eval_metric="logloss")
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)

print(f"Train accuracy: {train_acc:.3f}")
print(f"Test accuracy: {test_acc:.3f}")
print("\n✓ Model trained")

## Step 6: Baseline Audit (Current Population)

Generate audit on the current test set to establish baseline metrics.
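
For intuition about the fairness number reported in the baseline, here is a minimal sketch of a demographic parity difference: the gap between the highest and lowest positive-prediction rate across groups. The helper `demographic_parity_difference` is hypothetical and written for illustration; GlassAlpha may define or aggregate the metric differently.

```python
import numpy as np


def demographic_parity_difference(y_pred, group, sample_weight=None):
    """Return (gap, per-group rates) of the weighted positive-prediction rate.

    Hypothetical helper for illustration; not part of GlassAlpha.
    """
    y_pred = np.asarray(y_pred, dtype=float)
    group = np.asarray(group)
    if sample_weight is None:
        sample_weight = np.ones(len(y_pred))
    sample_weight = np.asarray(sample_weight, dtype=float)

    rates = {}
    for g in np.unique(group):
        members = group == g
        # Weighted share of positive predictions within this group
        rates[str(g)] = float(np.average(y_pred[members], weights=sample_weight[members]))
    return max(rates.values()) - min(rates.values()), rates


# Toy predictions: 3 of 4 men and 1 of 4 women receive the positive label.
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = ["Male"] * 4 + ["Female"] * 4

gap, rates = demographic_parity_difference(y_pred, group)
print(rates, gap)
```

A gap of zero would mean every group receives positive predictions at the same rate.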


In [None]:
baseline_result = ga.audit.from_model(
    model=model,
    X=X_test,
    y=y_test,
    protected_attributes={
        "race": df.loc[X_test.index, "race"].values,
        "sex": df.loc[X_test.index, "sex"].values,
        "age": df.loc[X_test.index, "age"].values,
    },
    random_seed=SEED,
    explain=True,
    calibration=True,
)

print("=== BASELINE METRICS ===")
print(f"Accuracy: {baseline_result.performance['accuracy']:.3f}")
print(f"AUC-ROC: {baseline_result.performance['roc_auc']:.3f}")
print("\nFairness (Demographic Parity):")
if hasattr(baseline_result.fairness, "demographic_parity"):
    print(f"  {baseline_result.fairness.demographic_parity:.3f}")
print("\nCalibration (ECE):")
if hasattr(baseline_result.calibration, "expected_calibration_error"):
    print(f"  {baseline_result.calibration.expected_calibration_error:.4f}")
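
The calibration figure printed above is an expected calibration error (ECE). As a rough sketch of what such a number measures, the hypothetical function below bins predicted probabilities into equal-width bins and averages the gap between mean predicted probability and observed positive rate, weighted by bin size. The bin count and binning scheme are assumptions; GlassAlpha's implementation may differ.

```python
import numpy as np


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Equal-width-bin ECE for binary labels (illustrative sketch)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)

    # Map each probability to a bin index; p == 1.0 falls into the last bin.
    bin_ids = np.minimum((y_prob * n_bins).astype(int), n_bins - 1)

    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if not in_bin.any():
            continue
        # Gap between observed positive rate and mean predicted probability
        gap = abs(y_true[in_bin].mean() - y_prob[in_bin].mean())
        ece += in_bin.mean() * gap  # weight by the share of samples in the bin
    return float(ece)


# Toy data: predictions are directionally right but off by 0.15 in each bin.
y_true = [0, 0, 1, 1]
y_prob = [0.15, 0.15, 0.85, 0.85]
ece = expected_calibration_error(y_true, y_prob)
print(f"ECE: {ece:.2f}")
```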

## Step 7: Simulate Demographic Shift

Simulate a population shift where the proportion of female workers increases by 10 percentage points.
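
The reweighting idea can be checked on toy data first. The sketch below uses a hypothetical helper, `reweight_to_target`, that scales one group's weight by target share over current share and the other group's by the complementary ratio. With these factors the weighted group share equals the target and the weights still sum to the sample count.

```python
import numpy as np


def reweight_to_target(group_mask, target_rate):
    """Per-sample weights that move the masked group's share to target_rate.

    Hypothetical helper for illustration; not part of GlassAlpha.
    """
    group_mask = np.asarray(group_mask, dtype=bool)
    current_rate = group_mask.mean()
    if not 0 < current_rate < 1 or not 0 < target_rate < 1:
        raise ValueError("current and target rates must lie strictly between 0 and 1")
    in_group_weight = target_rate / current_rate
    out_group_weight = (1 - target_rate) / (1 - current_rate)
    return np.where(group_mask, in_group_weight, out_group_weight)


# Toy population: 3 of 10 samples are in the group (30%); shift it to 40%.
is_female = np.array([True] * 3 + [False] * 7)
weights = reweight_to_target(is_female, target_rate=0.40)

weighted_share = np.average(is_female.astype(float), weights=weights)
print(f"Weighted female share: {weighted_share:.1%}")
print(f"Sum of weights: {weights.sum():.1f}")
```

The guard against rates of 0 or 1 matters in practice: an empty group makes the ratio undefined, which is a useful early signal that the group mask was built incorrectly.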


In [None]:
# protected_test holds the raw (unencoded) labels, so compare against the
# label itself rather than its LabelEncoder code
female_mask = protected_test["sex"] == "Female"
male_mask = ~female_mask

# Current sex distribution
current_female_rate = female_mask.mean()
print(f"Current female rate in test set: {current_female_rate:.1%}")

# Simulate shift: increase female proportion by 10 percentage points
target_female_rate = current_female_rate + 0.10
print(f"Target female rate after shift: {target_female_rate:.1%}")

# Calculate reweighting factors
female_weight = target_female_rate / current_female_rate
male_weight = (1 - target_female_rate) / (1 - current_female_rate)

# Create sample weights
sample_weights = np.ones(len(X_test))
sample_weights[female_mask] = female_weight
sample_weights[male_mask] = male_weight

print("\nReweighting factors:")
print(f"  Female: {female_weight:.2f}x")
print(f"  Male: {male_weight:.2f}x")

## Step 8: Audit After Shift

Generate audit with reweighted population to see impact of demographic shift.
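
To see why sample weights can move a metric without changing a single prediction, consider weighted accuracy on toy data. This sketch assumes the audit applies `sample_weight` as a weighted average over per-sample correctness, which is the usual convention but is not confirmed here for GlassAlpha's internals.

```python
import numpy as np

# Toy labels and predictions: the model is right for 1 of 3 women and 3 of 3 men.
y_true = np.array([1, 0, 1, 0, 1, 0])
y_pred = np.array([0, 1, 1, 0, 1, 0])
is_female = np.array([True, True, True, False, False, False])

correct = (y_true == y_pred).astype(float)

# Unweighted accuracy treats every sample equally.
unweighted_acc = correct.mean()

# Up-weighting the group with lower accuracy pulls the overall figure down.
weights = np.where(is_female, 2.0, 1.0)
weighted_acc = np.average(correct, weights=weights)

print(f"Unweighted accuracy: {unweighted_acc:.3f}")
print(f"Weighted accuracy:   {weighted_acc:.3f}")
```

The same mechanism applies to any metric that averages over samples: a shift toward a group the model serves less well lowers the aggregate even though the model is unchanged.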


In [None]:
shifted_result = ga.audit.from_model(
    model=model,
    X=X_test,
    y=y_test,
    protected_attributes={
        "race": df.loc[X_test.index, "race"].values,
        "sex": df.loc[X_test.index, "sex"].values,
        "age": df.loc[X_test.index, "age"].values,
    },
    sample_weight=sample_weights,
    random_seed=SEED,
    explain=True,
    calibration=True,
)

print("=== SHIFTED POPULATION METRICS ===")
print(f"Accuracy: {shifted_result.performance['accuracy']:.3f}")
print(f"AUC-ROC: {shifted_result.performance['roc_auc']:.3f}")
print("\nFairness (Demographic Parity):")
if hasattr(shifted_result.fairness, "demographic_parity"):
    print(f"  {shifted_result.fairness.demographic_parity:.3f}")
print("\nCalibration (ECE):")
if hasattr(shifted_result.calibration, "expected_calibration_error"):
    print(f"  {shifted_result.calibration.expected_calibration_error:.4f}")

## Step 9: Compare Baseline vs Shifted

Quantify the impact of demographic shift on model performance and fairness.
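
The comparison logic boils down to one rule per metric: compute the change, compare its magnitude to a threshold, and draw no conclusion when the metric is missing. Below is a small sketch with a hypothetical helper, `compare_metric`. The numbers passed in are made up for illustration, and the 0.02 and 0.05 thresholds mirror the ones used in this notebook.

```python
def compare_metric(name, baseline, shifted, threshold):
    """Return a one-line verdict; None means the metric was unavailable.

    Hypothetical helper for illustration; not part of GlassAlpha.
    """
    if baseline is None or shifted is None:
        return f"{name}: not available, no conclusion drawn"
    change = shifted - baseline
    status = "changed beyond threshold" if abs(change) > threshold else "stable"
    return f"{name}: {baseline:.3f} -> {shifted:.3f} ({change:+.3f}), {status}"


# Made-up values, used only to exercise each branch.
print(compare_metric("accuracy", 0.860, 0.855, threshold=0.02))
print(compare_metric("demographic parity", 0.10, 0.20, threshold=0.05))
print(compare_metric("demographic parity", None, None, threshold=0.05))
```

Treating a missing metric as "no conclusion" rather than "stable" avoids reporting fairness as maintained when it was never measured.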


In [None]:
print("=" * 60)
print("DRIFT IMPACT ANALYSIS")
print("=" * 60)

# Performance changes
acc_change = shifted_result.performance["accuracy"] - baseline_result.performance["accuracy"]
auc_change = shifted_result.performance["roc_auc"] - baseline_result.performance["roc_auc"]

print("\n1. Performance Changes:")
print(
    f"   Accuracy: {baseline_result.performance['accuracy']:.3f} → {shifted_result.performance['accuracy']:.3f} ({acc_change:+.3f})"
)
print(
    f"   AUC-ROC: {baseline_result.performance['roc_auc']:.3f} → {shifted_result.performance['roc_auc']:.3f} ({auc_change:+.3f})"
)

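# Fairness changes
dp_change = None  # Stays None when demographic parity is unavailable
if hasattr(baseline_result.fairness, "demographic_parity") and hasattr(shifted_result.fairness, "demographic_parity"):
    dp_change = shifted_result.fairness.demographic_parity - baseline_result.fairness.demographic_parity
    print("\n2. Fairness Changes:")
    print(
        f"   Demographic parity: {baseline_result.fairness.demographic_parity:.3f} → {shifted_result.fairness.demographic_parity:.3f} ({dp_change:+.3f})"
    )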

# Calibration changes
if hasattr(baseline_result.calibration, "expected_calibration_error") and hasattr(
    shifted_result.calibration, "expected_calibration_error"
):
    ece_change = (
        shifted_result.calibration.expected_calibration_error - baseline_result.calibration.expected_calibration_error
    )
    print("\n3. Calibration Changes:")
    print(
        f"   ECE: {baseline_result.calibration.expected_calibration_error:.4f} → {shifted_result.calibration.expected_calibration_error:.4f} ({ece_change:+.4f})"
    )

print("\n4. Interpretation:")
if abs(acc_change) > 0.02:
    print(f"   ⚠️ Significant accuracy change detected ({acc_change:+.1%})")
else:
    print("   ✓ Accuracy stable under demographic shift")

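if dp_change is None:
    print("   ? Demographic parity unavailable; fairness impact not assessed")
elif abs(dp_change) > 0.05:
    print(f"   ⚠️ Demographic parity changed under demographic shift ({dp_change:+.3f})")
else:
    print("   ✓ Fairness maintained under demographic shift")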

print("\n" + "=" * 60)

In [None]:
# Export both audits
# TODO: Uncomment when to_pdf() is implemented in Phase 3
# baseline_result.to_pdf("adult_income_baseline.pdf")
# shifted_result.to_pdf("adult_income_shifted.pdf")

print("PDF export is not enabled yet. Once to_pdf() is available, the reports will be:")
print("  • adult_income_baseline.pdf (current population)")
print("  • adult_income_shifted.pdf (after demographic shift)")
print("\nCompare these reports to assess drift impact on:")
print("  • Model performance (accuracy, AUC)")
print("  • Fairness metrics (demographic parity, equal opportunity)")
print("  • Calibration quality (ECE, Brier score)")
print("  • Feature importance stability")

## Key Takeaways

**Why Drift Detection Matters:**

1. **Population changes over time**: Demographics shift due to policy changes, economic trends, and social movements
2. **Model performance degrades**: A model trained on one population may perform poorly on another
3. **Fairness can deteriorate**: Shifts can amplify existing biases or introduce new ones
4. **Proactive monitoring is critical**: Detect drift before it causes harm

**What We Demonstrated:**

- Simulated a 10 percentage point increase in the female share of the test population via sample reweighting
- Measured impact on accuracy, fairness, and calibration
- Compared baseline vs shifted population metrics
- Produced baseline and shifted audit results for side-by-side comparison

**Production Recommendations:**

1. **Monitor demographic distributions** in production data
2. **Set drift thresholds** for key protected attributes (e.g., ±5 percentage points in group share)
3. **Trigger re-audits** when thresholds are exceeded
4. **Retrain or recalibrate** models when drift degrades fairness
5. **Document drift incidents** for regulatory compliance
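
As a starting point for the first three recommendations, the sketch below compares group shares in a production sample against a baseline and flags any group whose share moved by more than a threshold. The helpers `group_shares` and `drifted_groups` are hypothetical, the 5 percentage point default is only an example, and the toy lists stand in for real protected-attribute columns.

```python
from collections import Counter


def group_shares(values):
    """Share of each distinct group label in a sequence."""
    counts = Counter(values)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def drifted_groups(baseline_values, production_values, threshold=0.05):
    """Groups whose share changed by more than `threshold` (absolute)."""
    base = group_shares(baseline_values)
    prod = group_shares(production_values)
    flagged = {}
    for group in sorted(set(base) | set(prod)):
        # A group missing from one sample counts as a share of zero there.
        change = prod.get(group, 0.0) - base.get(group, 0.0)
        if abs(change) > threshold:
            flagged[group] = change
    return flagged


# Toy data: the female share rises from 33% to 43%.
baseline = ["Female"] * 33 + ["Male"] * 67
production = ["Female"] * 43 + ["Male"] * 57

flagged = drifted_groups(baseline, production)
if flagged:
    print("Re-audit triggered:", {g: f"{c:+.1%}" for g, c in flagged.items()})
```

In a pipeline, a non-empty result would be the trigger for rerunning the audit on the new population.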

**Further Reading:**

- [Concept Drift in Machine Learning](https://en.wikipedia.org/wiki/Concept_drift)
- [Monitoring ML Models in Production](https://christophergs.com/machine%20learning/2020/03/14/how-to-monitor-machine-learning-models/)
- [Fairness Under Distribution Shift](https://arxiv.org/abs/1911.03347)

**Next Steps:**

- Try different shift scenarios (age, race, multiple attributes)
- Implement automated drift detection in your ML pipeline
- Set up alerts for fairness metric degradation
- Establish retraining triggers based on drift severity
