# Model Facts Example: COMPAS
This notebook provides an example of how to generate a Model Facts label from the results of ProPublica's COMPAS analysis, which exposed bias in the COMPAS recidivism risk assessment model.

We will focus on the violent recidivism model and dataset.

The results are published in this news article: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

The data used is available here: https://github.com/propublica/compas-analysis/tree/master

As an example of how undefined and non-transparent definitions of fairness and bias play out, Northpointe (the creators of COMPAS) published a rebuttal, available here: https://go.volarisgroup.com/rs/430-MBX-989/images/ProPublica_Commentary_Final_070616.pdf

Contents are as follows:

0. Setup
1. Data Preprocessing
2. *Create Model Facts Label*
3. Compare the impact of other scoring measures

## 0. Setup
- Download the `cox-violent-parsed.csv` dataset from https://github.com/propublica/compas-analysis/tree/master
- Load packages and data

Columns of interest are:
- `v_score_text`: The COMPAS model score label (Low, Medium, High) from their violent recidivism prediction model
- `v_decile_score`: The COMPAS quantitative score (1-10)
- `is_violent_recid`: Whether or not they committed a new violent crime within two years after release (0 = not a recidivist)
- `priors_count`: The number of prior crimes. For the sake of this example, we will use this as a baseline for comparison
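The text label and the decile score encode the same information. ProPublica's methodology describes deciles 1-4 as "Low", 5-7 as "Medium" and 8-10 as "High", so a quick consistency check between `v_score_text` and `v_decile_score`, plus the two binarization choices discussed in the preprocessing step, can be sketched as below. This assumes that banding also holds in `cox-violent-parsed.csv`; the helper names and toy rows are hypothetical.

```python
import pandas as pd

# Decile bands as described in ProPublica's methodology (assumed to hold here):
# 1-4 -> Low, 5-7 -> Medium, 8-10 -> High
BINS = [0, 4, 7, 10]
LABELS = ['Low', 'Medium', 'High']


def decile_to_label(deciles: pd.Series) -> pd.Series:
    """Map v_decile_score values (1-10) to COMPAS text labels (hypothetical helper)."""
    return pd.cut(deciles, bins=BINS, labels=LABELS).astype(str)


def label_mismatches(df: pd.DataFrame) -> pd.DataFrame:
    """Rows whose v_score_text disagrees with their decile band.

    Deciles outside 1-10 fall outside every bin, so they never match a label.
    """
    expected = decile_to_label(df['v_decile_score'])
    return df[expected != df['v_score_text']]


def binarize(deciles: pd.Series, min_decile: int) -> pd.Series:
    """1 if the decile is at or above min_decile, else 0.

    min_decile=8 -> only "High" is positive (this notebook)
    min_decile=5 -> "Medium" and "High" are positive (ProPublica)
    """
    return (deciles >= min_decile).astype(int)


# hypothetical toy rows standing in for the real dataset
toy = pd.DataFrame({
    'v_decile_score': [1, 4, 5, 7, 8, 10],
    'v_score_text': ['Low', 'Low', 'Medium', 'Medium', 'High', 'High'],
})
print(decile_to_label(toy['v_decile_score']).tolist())
print(len(label_mismatches(toy)))  # 0: every label agrees with its band
print(binarize(toy['v_decile_score'], 8).tolist())
print(binarize(toy['v_decile_score'], 5).tolist())
```

On the real data, any rows returned by `label_mismatches` would be worth inspecting before trusting either column.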

In [1]:
import pandas as pd
import datetime as dt
from sklearn import metrics

from modelfacts import ModelFacts

In [2]:
data = pd.read_csv('../data/compas/cox-violent-parsed.csv')
data.shape

(18316, 52)

## 1. Data Preprocessing
- Filter out bad data:
    - Some rows are missing scores or have end dates that do not fall after their start dates, which points to errors in the data. Removing them is in line with the ProPublica team's method in https://github.com/propublica/compas-analysis/blob/master/Compas%20Analysis.ipynb
- Create binary predictions (for accuracy calculations): we treat every "High" COMPAS label as a prediction of violent recidivism (1). "Medium" and "Low" are set to 0.
    - The ProPublica team labeled both High and Medium as 1; however, the Northpointe team took issue with this in their rebuttal.
- Create a baseline prediction: since no baseline model is available, we create our own baseline under the assumption that people with a greater criminal history (`priors_count`) are more likely to be violent recidivists.
    - We set the threshold so that the share of people labeled "High" mirrors the percentage of violent recidivists in the data.
    - This baseline should not be used in real life. It propagates the systemic biases caused by racism and overpolicing of Black and Brown neighborhoods.
- Format data for easy Model Facts label generation 
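The three-way baseline split can also be written more explicitly with `np.select`. The sketch below is an illustrative rewrite, not the notebook's own code: `baseline_buckets` is a hypothetical helper and the toy rows are made up. It should agree with the `where` chain in the baseline cell below whenever the cutoff is non-negative. Because the cutoff is the (1 - p) quantile of `priors_count`, roughly a share p of people land above it, although ties in `priors_count` can shift that share noticeably on real data.

```python
import numpy as np
import pandas as pd


def baseline_buckets(priors: pd.Series, recid: pd.Series) -> pd.Series:
    """Bucket priors_count into Low / Medium / High (illustrative sketch).

    Low: no priors; High: above the (1 - p) quantile of priors_count, where
    p is the share of violent recidivists; Medium: everything in between.
    """
    p = recid.mean()                   # share of violent recidivists
    cutoff = priors.quantile(q=1 - p)  # pandas default: linear interpolation
    # np.select picks the first condition that is true for each row
    labels = np.select([priors <= 0, priors <= cutoff], ['Low', 'Medium'],
                       default='High')
    return pd.Series(labels, index=priors.index)


# hypothetical toy data: 2 of 10 people are violent recidivists (p = 0.2)
toy = pd.DataFrame({
    'priors_count':     [0, 0, 1, 2, 3, 5, 8, 12, 0, 1],
    'is_violent_recid': [0, 0, 0, 0, 0, 0, 1, 1, 0, 0],
})
buckets = baseline_buckets(toy['priors_count'], toy['is_violent_recid'])
print(buckets.tolist())
# binary baseline prediction: only "High" counts as predicted recidivism
print(buckets.map({'High': 1, 'Medium': 0, 'Low': 0}).tolist())
```

Here the cutoff interpolates to 5.6, so the two people with 8 and 12 priors (20% of the toy rows) form the "High" bucket, matching p.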

In [3]:
# filter out unreliable data
data = data[(~data['score_text'].isna()) & (data['end']>data['start'])].copy()
# new data size
len(data)

18178

In [4]:
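# map COMPAS text labels to binary predictions: only "High" counts as predicted violent recidivism
split = {'High': 1, 'Medium': 0, 'Low': 0}
# create baseline based on priors count
# the cutoff is the linear quantile at (1 - share of violent recidivists),
# so roughly that share of people ends up in the "High" bucket
perc_recid = data['is_violent_recid'].mean()
cutoff = data['priors_count'].quantile(q=1-perc_recid)
# priors_count <= 0 -> 'Low'; 0 < priors_count <= cutoff -> 'Medium'; priors_count > cutoff -> 'High'
data['baseline_text'] = data['priors_count'].where(data['priors_count']>0, 'Low')\
    .where((data['priors_count']<=0) | (data['priors_count']>cutoff), 'Medium')\
    .where(data['priors_count']<=cutoff, 'High')

# the raw priors count doubles as the baseline's continuous score (used for AUC)
data['baseline_proba'] = data['priors_count']
data['baseline'] = data['baseline_text'].replace(split)
data['pred'] = data['v_score_text'].replace(split)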



## 2. Create Model Facts Label
Generate the label using this data
- The Standard Score is the F1-score, because this is an imbalanced classification problem (violent recidivists are a minority of the data) where plain accuracy would be misleading
- The Training Score is AUC, as noted in Northpointe's paper: https://journals.sagepub.com/doi/abs/10.1177/0093854808326545
- There is a large discrepancy between African-American and Caucasian defendants' scores, with the F1-score for African-Americans almost 2x higher. A higher score usually looks better, but in a real-life context this discrepancy has resulted in African-American defendants being more negatively impacted than Caucasian defendants, continuing the cycle of systemic racism in the criminal justice system.
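The demographic breakdown in the label amounts to scoring each group separately. Below is a minimal sketch of that idea with scikit-learn, assuming ModelFacts does something comparable internally (its implementation is not shown here); `scores_by_group` and the toy rows are hypothetical. F1 is computed on the binary `pred` column and AUC on the decile scores, mirroring the Standard and Training Scores above.

```python
import pandas as pd
from sklearn.metrics import f1_score, roc_auc_score


def scores_by_group(df, group_col, true_col='is_violent_recid',
                    pred_col='pred', proba_col='v_decile_score'):
    """F1 on binary predictions and AUC on scores, per group (sketch)."""
    rows = {}
    for group, sub in df.groupby(group_col):
        y_true = sub[true_col]
        rows[group] = {
            'n': len(sub),
            # zero_division=0: return 0 rather than warn on empty denominators
            'f1': f1_score(y_true, sub[pred_col], zero_division=0),
            # AUC is undefined when a group contains only one class
            'auc': (roc_auc_score(y_true, sub[proba_col])
                    if y_true.nunique() == 2 else float('nan')),
        }
    return pd.DataFrame.from_dict(rows, orient='index')


# hypothetical toy data for two made-up groups
toy = pd.DataFrame({
    'race': ['A'] * 4 + ['B'] * 4,
    'is_violent_recid': [1, 1, 0, 0, 1, 0, 0, 0],
    'pred':             [1, 0, 1, 0, 0, 0, 0, 1],
    'v_decile_score':   [9, 5, 8, 2, 6, 3, 1, 8],
})
print(scores_by_group(toy, 'race'))
```

With the real `data`, `scores_by_group(data, 'race')` would give one row per value of the `race` column.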

In [5]:
# set admin details
application = ("Predicting risk of violent recidivism using COMPAS. "
               "The target class is predicting violent recidivism")
model_type = "classification"
warnings = ("ProPublica has demonstrated that this model propagates biases."
            " Its creators claim the model is unbiased under the accuracy equity (AUC)"
            " and predictive parity definitions of fairness."
            " Without a clear definition of fairness, it should not be used in decision making.")
source = ("Data from Broward County, Florida https://github.com/propublica/compas-analysis/tree/master."
          " Model created by Northpointe")
# latest date the COMPAS model was used in this data, standing in for the train date
train_date = pd.to_datetime(data['compas_screening_date']).max()
test_data_date = dt.datetime(2016, 5, 23)  # ProPublica article publication date
data_split = "NA/100"  # no train split is available, so all data is treated as test data
data_size = len(data)
true_col = "is_violent_recid"
pred_col = "pred"
baseline_col = "baseline"
pred_proba = "v_decile_score"
baseline_proba = "baseline_proba"
age_col = "age"  # which column contains age information
demo_cols = ['race', 'sex', 'age']  # which columns contain demographic information
# define scoring functions
st_score = 'f1_score'
st_kwargs = {}  # e.g. {'average': 'macro'}: any keyword arguments for the score of interest
t_score = 'roc_auc_score'
t_kwargs = {}  # same usage as st_kwargs
t_score_func = getattr(metrics, t_score)
st_score_func = getattr(metrics, st_score)


In [6]:
# create model facts object
model_facts = ModelFacts(data, true_col, pred_col, baseline_col, 
                         st_score_func, t_score_func, classification = True, 
                         pred_proba = pred_proba, baseline_proba = baseline_proba,
                         t_proba = True)
# calculate various stats along demographics
mf_compas = model_facts(demo_cols, age_col = age_col,
    train_date = train_date, test_data_date = test_data_date,
    data_size = data_size, data_split = data_split,
    st_kwargs = st_kwargs, t_kwargs = t_kwargs)
# create the Model Facts label from the data
table = model_facts.make_label(mf_compas, application, warnings, source, show = True)

| Model Facts | | | |
|---|---|---|---|
| Application | Predicting risk of violent recidivism using COMPAS. The target class is predicting violent recidivism | | |
| Model Type | classification | | |
| Model Train Date | 31 December 2014 | | |
| Test Data Date | 23 May 2016 | | |
| Dataset Size | 18178 | | |
| %Train/%Test | NA/100 | | |
| Accuracy | Name | Raw Score | % Over Baseline |
| Standard Score | f1_score | 0.172 | 95.4 |


In [7]:
table.save('model_facts_compas.png', web_driver = "firefox", window_size = (640,360));

## 3. Comparison
As Northpointe's rebuttal demonstrates, the choice of scoring metric can be quite effective at masking biases.

We recreate the Model Facts label using AUC as the Standard Score for the demographic statistics, and then again using precision (positive predictive value).
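To see why the choice of metric matters, here is a toy illustration with made-up data for two hypothetical groups with different base rates. It scores the same predictions with F1, precision (positive predictive value, the quantity behind Northpointe's predictive parity argument) and AUC, then reports the max-min gap across groups for each metric. The numbers are invented; `metric_gaps` and `METRICS` are hypothetical names, and the sketch only shows that one set of predictions can look fairly even under one metric and uneven under another.

```python
import pandas as pd
from sklearn.metrics import f1_score, precision_score, roc_auc_score

# made-up scores for two hypothetical groups with different base rates
toy = pd.DataFrame({
    'group': ['A'] * 6 + ['B'] * 8,
    'is_violent_recid': [1, 1, 1, 0, 0, 0] + [1, 0, 0, 0, 0, 0, 0, 0],
    'v_decile_score':   [9, 8, 3, 7, 2, 1] + [8, 9, 10, 2, 3, 1, 4, 2],
})
# "High" (decile 8-10) is the positive prediction, as in this notebook
toy['pred'] = (toy['v_decile_score'] >= 8).astype(int)

# each metric paired with the column it scores: binary labels or raw deciles
METRICS = {
    'f1': (f1_score, 'pred'),
    'precision': (precision_score, 'pred'),
    'auc': (roc_auc_score, 'v_decile_score'),
}


def metric_gaps(df, group_col='group', true_col='is_violent_recid'):
    """Per-group scores plus the max-min gap for each metric (sketch)."""
    rows = {}
    for group, sub in df.groupby(group_col):
        rows[group] = {name: func(sub[true_col], sub[col])
                       for name, (func, col) in METRICS.items()}
    per_group = pd.DataFrame.from_dict(rows, orient='index')
    return per_group, per_group.max() - per_group.min()


per_group, gaps = metric_gaps(toy)
print(per_group.round(3))
print(gaps.round(3))
```

On these toy rows the precision gap is about 0.67, the F1 gap 0.30 and the AUC gap about 0.17, so an AUC-only summary would make the two groups look the most alike.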

In [8]:
# swap the metrics: AUC on the decile scores becomes the Standard Score and F1 the Training Score; all other variables stay the same
t_score = 'f1_score'
st_score = 'roc_auc_score'
st_proba = True
t_proba = False
t_score_func = getattr(metrics, t_score)
st_score_func = getattr(metrics, st_score)

# create model facts object
model_facts = ModelFacts(data, true_col, pred_col, baseline_col, 
                         st_score_func, t_score_func, classification = True, 
                         pred_proba = pred_proba, baseline_proba = baseline_proba,
                         st_proba = st_proba, t_proba = t_proba)
# calculate various stats along demographics
mf_compas_auc = model_facts(demo_cols, age_col = age_col,
    train_date = train_date, test_data_date = test_data_date,
    data_size = data_size, data_split = data_split,
    st_kwargs = st_kwargs, t_kwargs = t_kwargs)
# create the Model Facts label from the data
table = model_facts.make_label(mf_compas_auc, application, warnings, source, show = True)

| Model Facts | | | |
|---|---|---|---|
| Application | Predicting risk of violent recidivism using COMPAS. The target class is predicting violent recidivism | | |
| Model Type | classification | | |
| Model Train Date | 31 December 2014 | | |
| Test Data Date | 23 May 2016 | | |
| Dataset Size | 18178 | | |
| %Train/%Test | NA/100 | | |
| Accuracy | Name | Raw Score | % Over Baseline |
| Standard Score | roc_auc_score | 0.648 | 6.96 |


In [9]:
table.save('model_facts_compas_auc.png', web_driver = "firefox", window_size = (640,360));

In [10]:
# all other variables stay the same
t_score = 'f1_score'
st_score = 'precision_score'
st_proba = False
t_proba = False
t_score_func = getattr(metrics, t_score)
st_score_func = getattr(metrics, st_score)

# create model facts object
model_facts = ModelFacts(data, true_col, pred_col, baseline_col, 
                         st_score_func, t_score_func, classification = True, 
                         pred_proba = pred_proba, baseline_proba = baseline_proba,
                         st_proba = st_proba, t_proba = t_proba)
# calculate various stats along demographics
mf_compas_ppv = model_facts(demo_cols, age_col = age_col,
    train_date = train_date, test_data_date = test_data_date,
    data_size = data_size, data_split = data_split,
    st_kwargs = st_kwargs, t_kwargs = t_kwargs)
# create the Model Facts label from the data
table = model_facts.make_label(mf_compas_ppv, application, warnings, source, show = True)



| Model Facts | | | |
|---|---|---|---|
| Application | Predicting risk of violent recidivism using COMPAS. The target class is predicting violent recidivism | | |
| Model Type | classification | | |
| Model Train Date | 31 December 2014 | | |
| Test Data Date | 23 May 2016 | | |
| Dataset Size | 18178 | | |
| %Train/%Test | NA/100 | | |
| Accuracy | Name | Raw Score | % Over Baseline |
| Standard Score | precision_score | 0.135 | 43.0 |


In [11]:
table.save('model_facts_compas_ppv.png', web_driver = "firefox", window_size = (640,360));