# COMPAS Bias Detection: Criminal Justice Algorithm Audit

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/GlassAlpha/glassalpha/blob/main/examples/notebooks/compas_bias_detection.ipynb)

**Audit the controversial COMPAS recidivism prediction algorithm** used in US criminal courts

**Dataset**: COMPAS (7,214 defendants) | **Protected Attributes**: Race, Sex, Age

**Background**: ProPublica's 2016 investigation found that COMPAS scores were biased against Black defendants: those who did not go on to reoffend were labeled high risk far more often than white defendants, i.e. they had a higher false positive rate. This notebook demonstrates how to detect and quantify such bias.

**API Reference**: [`from_model()` documentation](https://glassalpha.com/reference/api/api-audit/) | [Fairness Metrics Guide](https://glassalpha.com/guides/fairness-metrics/)


## Step 1: Installation


In [None]:
%pip install -q glassalpha[explain]

In [None]:
"""Environment verification for reproducibility"""
import platform
import random
import sys

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

import glassalpha as ga

SEED = 42
random.seed(SEED)
np.random.seed(SEED)

print(
    {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "glassalpha": getattr(ga, "__version__", "dev"),
        "seed": SEED,
    }
)

## Step 2: Load COMPAS Dataset

We'll download the ProPublica COMPAS dataset directly from their GitHub repository.


In [None]:
# Download COMPAS dataset from ProPublica
url = "https://raw.githubusercontent.com/propublica/compas-analysis/master/compas-scores-two-years.csv"
df = pd.read_csv(url)

print(f"Dataset: {df.shape[0]} defendants, {df.shape[1]} columns")
print(f"\nRecidivism rate: {df['two_year_recid'].mean():.1%}")
print("\nRace distribution:")
print(df["race"].value_counts())
df.head()
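When comparing numbers with ProPublica's, it helps to know that their published analysis filters this file before computing error rates. The sketch below applies those filters as we understand them from their analysis notebook (screening within 30 days of arrest, a known recidivism flag, no ordinary traffic offenses, a valid score label). The helper name `propublica_filter` and the toy frame are made up for illustration, and the rest of this notebook does not apply the filter.

```python
import pandas as pd


def propublica_filter(frame: pd.DataFrame) -> pd.DataFrame:
    """Row filters as described in ProPublica's analysis (an assumption, not verified here)."""
    mask = (
        frame["days_b_screening_arrest"].between(-30, 30)
        & (frame["is_recid"] != -1)
        & (frame["c_charge_degree"] != "O")
        # pd.read_csv parses "N/A" as missing by default, so check both forms
        & frame["score_text"].notna()
        & (frame["score_text"] != "N/A")
    )
    return frame[mask].copy()


# Made-up rows: only the first one passes every filter
toy = pd.DataFrame(
    {
        "days_b_screening_arrest": [0, 45, -1, None],
        "is_recid": [1, 0, -1, 0],
        "c_charge_degree": ["F", "M", "F", "M"],
        "score_text": ["Low", "High", "Medium", "Low"],
    }
)
filtered = propublica_filter(toy)
print(f"{len(toy)} rows -> {len(filtered)} rows")
```

On the real data this would be called as `propublica_filter(df)` right after loading.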

## Step 3: Data Preprocessing

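Select relevant features and prepare data for modeling. We exclude COMPAS's own predictions to build an independent model. Race is encoded for the fairness analysis but left out of the model inputs; sex and age are used both as model inputs and as protected attributes.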


In [None]:
# Select features (exclude COMPAS's own predictions)
feature_cols = [
    "age",
    "sex",
    "race",
    "juv_fel_count",
    "juv_misd_count",
    "juv_other_count",
    "priors_count",
    "c_charge_degree",
]

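# Filter to complete cases (copy so the encoded columns below are added to an
# independent frame rather than a view of df)
df_clean = df[feature_cols + ["two_year_recid"]].dropna().copy()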

# Encode categorical variables
le_sex = LabelEncoder()
le_race = LabelEncoder()
le_charge = LabelEncoder()

df_clean["sex_encoded"] = le_sex.fit_transform(df_clean["sex"])
df_clean["race_encoded"] = le_race.fit_transform(df_clean["race"])
df_clean["charge_encoded"] = le_charge.fit_transform(df_clean["c_charge_degree"])

# Separate features, target, and protected attributes
X = df_clean[
    ["age", "sex_encoded", "juv_fel_count", "juv_misd_count", "juv_other_count", "priors_count", "charge_encoded"]
]
y = df_clean["two_year_recid"]

# Protected attributes, kept separate from X for fairness analysis
# (race and sex are integer-encoded; age is left as-is)
protected = {
    "race": df_clean["race_encoded"].values,
    "sex": df_clean["sex_encoded"].values,
    "age": df_clean["age"].values,
}

print(f"\nCleaned dataset: {len(df_clean)} samples")
print(f"Features: {X.columns.tolist()}")
print(f"Protected attributes: {list(protected.keys())}")
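Because the protected attributes are passed around as integer codes, it is useful to be able to map them back to the original labels. A minimal sketch on made-up values: `LabelEncoder` stores the sorted labels in `classes_`, so each label's position is its code.

```python
from sklearn.preprocessing import LabelEncoder

# Made-up sample values, not rows from the dataset
races = ["Caucasian", "African-American", "Hispanic", "Caucasian"]

encoder = LabelEncoder()
codes = encoder.fit_transform(races)

# classes_ is sorted, so enumerate() yields the code -> label mapping
code_to_label = dict(enumerate(encoder.classes_))
print(code_to_label)

# inverse_transform recovers the original labels from the codes
print(encoder.inverse_transform(codes))
```

In this notebook the same lookup would use `le_race.classes_` and `le_sex.classes_`.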

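## Step 4: Train/Test Split

Hold out 30% of the data for evaluation, stratified on the outcome so both splits have a similar recidivism rate.


In [None]: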
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=SEED, stratify=y)

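# Split protected attributes, aligning by index label rather than by position
# (stays correct even if dropna() left gaps in the index)
protected_test = {
    "race": df_clean.loc[X_test.index, "race_encoded"].values,
    "sex": df_clean.loc[X_test.index, "sex_encoded"].values,
    "age": df_clean.loc[X_test.index, "age"].values,
}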

print(f"Train: {len(X_train)} | Test: {len(X_test)}")
print(f"Test recidivism rate: {y_test.mean():.1%}")
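One detail worth keeping in mind when slicing protected attributes for the test rows: indexing a NumPy array with a pandas index treats the labels as positions, which only matches when the index is an unbroken 0..n-1 range. The sketch below, on made-up data with a gapped index, shows two ways to align safely.

```python
import pandas as pd

# Toy frame whose index has gaps, as it would after rows are dropped
frame = pd.DataFrame({"group": [0, 1, 1, 0]}, index=[0, 2, 5, 7])
test_index = pd.Index([5, 7])

group_array = frame["group"].to_numpy()

# Option 1: label-based lookup on the frame itself
by_label = frame.loc[test_index, "group"].to_numpy()

# Option 2: translate labels to positions, then index the array
positions = frame.index.get_indexer(test_index)
by_position = group_array[positions]

print(by_label, by_position)
# group_array[test_index] would raise IndexError here,
# because 5 and 7 are labels, not valid positions in a length-4 array
```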

## Step 5: Train Model

We use logistic regression because its coefficients can be read directly, and interpretability is critical in criminal justice applications.


In [None]:
model = LogisticRegression(random_state=SEED, max_iter=1000, C=1.0)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)

print(f"Train accuracy: {train_acc:.3f}")
print(f"Test accuracy: {test_acc:.3f}")
print("\n✓ Model trained (for reference, COMPAS accuracy is reported at ~65%)")
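To make the interpretability point concrete, here is a rough sketch of how fitted coefficients can be read as odds ratios. It uses synthetic data with a made-up relationship (the names `toy_X`, `toy_y`, and `toy_model` are illustrative), so the numbers say nothing about COMPAS.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
toy_X = pd.DataFrame(
    {
        "priors_count": rng.poisson(2, size=500),
        "age": rng.integers(18, 70, size=500),
    }
)
# Synthetic outcome: more priors and lower age raise the log-odds (made up)
logits = 0.4 * toy_X["priors_count"] - 0.05 * (toy_X["age"] - 40)
toy_y = (rng.random(500) < 1 / (1 + np.exp(-logits))).astype(int)

toy_model = LogisticRegression(max_iter=1000).fit(toy_X, toy_y)

# exp(coefficient) is the multiplicative change in odds per one-unit increase
summary = pd.DataFrame(
    {
        "feature": toy_X.columns,
        "coefficient": toy_model.coef_[0],
        "odds_ratio": np.exp(toy_model.coef_[0]),
    }
).sort_values("coefficient", key=np.abs, ascending=False)
print(summary)
```

For the notebook's model, the same table would be built from `X_train.columns` and `model.coef_[0]`. Coefficients are on the scale of each feature, so compare them with care when features have different units.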

## Step 6: Generate Bias Detection Audit

Use GlassAlpha to detect and quantify bias across protected attributes.


In [None]:
result = ga.audit.from_model(
    model=model,
    X=X_test,
    y=y_test,
    protected_attributes=protected_test,
    random_seed=SEED,
    explain=True,
    calibration=True,
)

# Display inline
result
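As a sanity check on any audit output, group-wise error rates can also be computed by hand. The following is a minimal sketch using pandas on made-up labels; `error_rates_by_group` is a hypothetical helper and not part of GlassAlpha.

```python
import numpy as np
import pandas as pd


def error_rates_by_group(y_true, y_pred, group):
    """Return false positive rate, false negative rate, and size per group."""
    frame = pd.DataFrame({"y_true": y_true, "y_pred": y_pred, "group": group})
    rows = {}
    for value, part in frame.groupby("group"):
        negatives = part[part["y_true"] == 0]  # did not reoffend
        positives = part[part["y_true"] == 1]  # did reoffend
        rows[value] = {
            "fpr": (negatives["y_pred"] == 1).mean(),
            "fnr": (positives["y_pred"] == 0).mean(),
            "n": len(part),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Made-up example: six people in each of two groups
y_true = np.array([0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1])
y_pred = np.array([1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0])
group = np.array(["a"] * 6 + ["b"] * 6)

rates = error_rates_by_group(y_true, y_pred, group)
print(rates)
```

On the notebook's data this would be called with `y_test`, `model.predict(X_test)`, and `protected_test["race"]`.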

## Step 7: Export Professional PDF Report


In [None]:
# Export to PDF
# TODO: Uncomment when to_pdf() is implemented in Phase 3
# result.to_pdf("compas_bias_audit.pdf")
print("PDF export is not enabled yet; no file was written.")
print("\nOnce enabled, the report will include:")
print("  • Performance metrics (accuracy, precision, recall)")
print("  • Fairness analysis across race, sex, and age")
print("  • Feature importance and model explanations")
print("  • Calibration analysis")
print("  • Complete reproducibility manifest")

## Key Findings & Interpretation

**Expected Results** (based on ProPublica's 2016 investigation):

1. **Accuracy**: ~65-70% (similar to original COMPAS)
2. **Racial Bias**: Disparities between African-American and Caucasian defendants are likely to be detected
3. **False Positive Rate**: Higher for African-American defendants (more often labeled high risk but did not reoffend)
4. **False Negative Rate**: Higher for Caucasian defendants (more often labeled low risk but did reoffend)

**Ethical Implications**:

- These disparities can lead to unjust bail, sentencing, and parole decisions
- ML models can perpetuate and amplify systemic bias in criminal justice
- Fairness metrics help quantify bias but don't solve underlying societal issues

**Further Reading**:

- [ProPublica's Machine Bias Investigation](https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing)
- [Dressel & Farid (2018): The accuracy, fairness, and limits of predicting recidivism](https://www.science.org/doi/10.1126/sciadv.aao5580)
- [Washington & Kuo (2020): Whose Side Are Ethics Codes On?](https://dl.acm.org/doi/10.1145/3351095.3372844)

**Next Steps**:

- Compare your results with ProPublica's findings
- Try different fairness metrics and thresholds
- Explore bias mitigation techniques
- Consider alternatives to predictive risk assessment
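
As a starting point for experimenting with thresholds, the sketch below recomputes the false positive rate per group at several decision thresholds. The scores and groups are made up, and `fpr_at_threshold` is a hypothetical helper, so treat this as an outline rather than a finished analysis.

```python
import numpy as np


def fpr_at_threshold(y_true, scores, threshold):
    """False positive rate when scores >= threshold are labeled positive."""
    y_true = np.asarray(y_true)
    predicted = np.asarray(scores) >= threshold
    negatives = y_true == 0
    return predicted[negatives].mean()


# Made-up outcomes, predicted probabilities, and group labels
y_true = np.array([0, 0, 0, 1, 0, 0, 0, 1])
scores = np.array([0.2, 0.55, 0.7, 0.9, 0.1, 0.3, 0.45, 0.8])
group = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

for threshold in (0.4, 0.5, 0.6):
    fpr_by_group = {
        g: fpr_at_threshold(y_true[group == g], scores[group == g], threshold)
        for g in ("a", "b")
    }
    print(threshold, fpr_by_group)
```

With the notebook's model, the scores would come from `model.predict_proba(X_test)[:, 1]` and the groups from `protected_test["race"]`.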
