# Agenda

* **Dataset**
* **Metrics**
* **Debiasing methods**

## Dataset

We'll work on the recidivism prediction model COMPAS (Correctional Offender Management Profiling for Alternative Sanctions).

The dataset was taken from the [ProPublica compas-analysis GitHub project](https://github.com/propublica/compas-analysis).

COMPAS is used to predict whether a defendant will reoffend in the future, and that prediction in turn affects the defendant's future.

The dataset contains variables describing:
* previous offenses
* current crime
* age
* race
* sex

In [4]:
import pandas as pd

data = pd.read_csv('../data/raw/compas-scores-two-years.csv')
data.head(10)

,id,name,first,last,compas_screening_date,sex,dob,age,age_cat,race,...,v_decile_score,v_score_text,v_screening_date,in_custody,out_custody,priors_count.1,start,end,event,two_year_recid
0,1,miguel hernandez,miguel,hernandez,2013-08-14,Male,1947-04-18,69,Greater than 45,Other,...,1,Low,2013-08-14,2014-07-07,2014-07-14,0,0,327,0,0
1,3,kevon dixon,kevon,dixon,2013-01-27,Male,1982-01-22,34,25 - 45,African-American,...,1,Low,2013-01-27,2013-01-26,2013-02-05,0,9,159,1,1
2,4,ed philo,ed,philo,2013-04-14,Male,1991-05-14,24,Less than 25,African-American,...,3,Low,2013-04-14,2013-06-16,2013-06-16,4,0,63,0,1
3,5,marcu brown,marcu,brown,2013-01-13,Male,1993-01-21,23,Less than 25,African-American,...,6,Medium,2013-01-13,,,1,0,1174,0,0
4,6,bouthy pierrelouis,bouthy,pierrelouis,2013-03-26,Male,1973-01-22,43,25 - 45,Other,...,1,Low,2013-03-26,,,2,0,1102,0,0
5,7,marsha miles,marsha,miles,2013-11-30,Male,1971-08-22,44,25 - 45,Other,...,1,Low,2013-11-30,2013-11-30,2013-12-01,0,1,853,0,0
6,8,edward riddle,edward,riddle,2014-02-19,Male,1974-07-23,41,25 - 45,Caucasian,...,2,Low,2014-02-19,2014-03-31,2014-04-18,14,5,40,1,1
7,9,steven stewart,steven,stewart,2013-08-30,Male,1973-02-25,43,25 - 45,Other,...,3,Low,2013-08-30,2014-05-22,2014-06-03,3,0,265,0,0
8,10,elizabeth thieme,elizabeth,thieme,2014-03-16,Female,1976-06-03,39,25 - 45,Caucasian,...,1,Low,2014-03-16,2014-03-15,2014-03-18,0,2,747,0,0
9,13,bo bradac,bo,bradac,2013-11-04,Male,1994-06-10,21,Less than 25,Caucasian,...,5,Medium,2013-11-04,2015-01-06,2015-01-07,1,0,428,1,1


In [2]:
print(data.columns)

Index(['id', 'name', 'first', 'last', 'compas_screening_date', 'sex', 'dob',
       'age', 'age_cat', 'race', 'juv_fel_count', 'decile_score',
       'juv_misd_count', 'juv_other_count', 'priors_count',
       'days_b_screening_arrest', 'c_jail_in', 'c_jail_out', 'c_case_number',
       'c_offense_date', 'c_arrest_date', 'c_days_from_compas',
       'c_charge_degree', 'c_charge_desc', 'is_recid', 'r_case_number',
       'r_charge_degree', 'r_days_from_arrest', 'r_offense_date',
       'r_charge_desc', 'r_jail_in', 'r_jail_out', 'violent_recid',
       'is_violent_recid', 'vr_case_number', 'vr_charge_degree',
       'vr_offense_date', 'vr_charge_desc', 'type_of_assessment',
       'decile_score.1', 'score_text', 'screening_date',
       'v_type_of_assessment', 'v_decile_score', 'v_score_text',
       'v_screening_date', 'in_custody', 'out_custody', 'priors_count.1',
       'start', 'end', 'event', 'two_year_recid'],
      dtype='object')


## Problem
Defendants were treated unfairly

<img src="../images/propublica_low_high.png" alt="Drawing" style="width: 500px;"/>

The scores were systematically wrong

<img src="../images/propublica.jpeg" alt="Drawing" style="width: 1000px;"/>
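
One way to make "systematically wrong" concrete is to compare error rates between groups: a score is unfair in this sense if people who did not reoffend are flagged as high risk more often in one group than in another. The sketch below illustrates that comparison on a tiny made-up table, not on the COMPAS data. The helper `error_rates_by_group` and the column names `group`, `reoffended` and `high_risk` are hypothetical and exist only for this example.

```python
import pandas as pd

# Toy, made-up records (NOT the COMPAS data): a protected attribute,
# the observed outcome, and a binary high-risk prediction.
toy = pd.DataFrame({
    "group":      ["a", "a", "a", "a", "b", "b", "b", "b"],
    "reoffended": [0, 0, 1, 1, 0, 0, 1, 1],
    "high_risk":  [1, 0, 1, 1, 0, 0, 1, 0],
})


def error_rates_by_group(df, group_col, y_true_col, y_pred_col):
    """Return the false positive and false negative rate of each group."""
    rows = {}
    for name, part in df.groupby(group_col):
        negatives = part[part[y_true_col] == 0]  # did not reoffend
        positives = part[part[y_true_col] == 1]  # did reoffend
        rows[name] = {
            # share of non-reoffenders flagged as high risk
            "fpr": (negatives[y_pred_col] == 1).mean(),
            # share of reoffenders flagged as low risk
            "fnr": (positives[y_pred_col] == 0).mean(),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


rates = error_rates_by_group(toy, "group", "reoffended", "high_risk")
print(rates)
```

In this toy table both groups have the same share of reoffenders, yet group `a` carries all the false positives and group `b` all the false negatives. That is the kind of asymmetry the charts above point at.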

## References:

* [ProPublica article](https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing)

## Simplifications/Transformations

**We will work on a subset of the columns**

In [4]:
TARGET_COLS = ['two_year_recid']
COMPAS_SCORES_COLS = ['decile_score']
TIME_COLS = ['c_jail_in', 'c_jail_out']
NUMERICAL_FEATURE_COLS = ['age',
                          'juv_fel_count','juv_misd_count','juv_other_count',
                          'priors_count']
CATAGORICAL_FEATURE_COLS = ['sex','race',
                            'c_charge_degree']
PROTECTED_COLS = ['sex','race']

data[CATAGORICAL_FEATURE_COLS+NUMERICAL_FEATURE_COLS+TIME_COLS+TARGET_COLS]

,sex,race,c_charge_degree,age,juv_fel_count,juv_misd_count,juv_other_count,priors_count,c_jail_in,c_jail_out,two_year_recid
0,Male,Other,F,69,0,0,0,0,2013-08-13 06:03:42,2013-08-14 05:41:20,0
1,Male,African-American,F,34,0,0,0,0,2013-01-26 03:45:27,2013-02-05 05:36:53,1
2,Male,African-American,F,24,0,0,1,4,2013-04-13 04:58:34,2013-04-14 07:02:04,1
3,Male,African-American,F,23,0,1,0,1,,,0
4,Male,Other,F,43,0,0,0,2,,,0
...,...,...,...,...,...,...,...,...,...,...,...
7209,Male,African-American,F,23,0,0,0,0,2013-11-22 05:18:27,2013-11-24 02:59:20,0
7210,Male,African-American,F,23,0,0,0,0,2014-01-31 07:13:54,2014-02-02 04:03:52,0
7211,Male,Other,F,57,0,0,0,0,2014-01-13 05:48:01,2014-01-14 07:49:46,0
7212,Female,African-American,M,33,0,0,0,3,2014-03-08 08:06:02,2014-03-09 12:18:04,0


In [5]:
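# Parse the jail booking timestamps; missing values become NaT
data['c_jail_in'] = pd.to_datetime(data['c_jail_in'])
data['c_jail_out'] = pd.to_datetime(data['c_jail_out'])

# Whole days spent in jail; rows with missing dates get 0
data['jail_time'] = (data['c_jail_out'] - data['c_jail_in']).dt.days.fillna(0)

# Register the new feature only once, even if this cell is re-run
if 'jail_time' not in NUMERICAL_FEATURE_COLS:
    NUMERICAL_FEATURE_COLS.append('jail_time')

data[['c_jail_in','c_jail_out','jail_time']]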
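
To see what the `jail_time` feature does at the edges, here is a minimal sketch on a three-row toy frame. The first two rows reuse timestamps shown in the table above; the third row stands in for a record with no jail dates. The vectorised subtraction is assumed to be equivalent to the row-wise computation used in the notebook.

```python
import pandas as pd

toy = pd.DataFrame({
    "c_jail_in":  ["2013-01-26 03:45:27", "2013-08-13 06:03:42", None],
    "c_jail_out": ["2013-02-05 05:36:53", "2013-08-14 05:41:20", None],
})

# Missing timestamps are parsed to NaT
toy["c_jail_in"] = pd.to_datetime(toy["c_jail_in"])
toy["c_jail_out"] = pd.to_datetime(toy["c_jail_out"])

# .dt.days keeps only whole days; NaT differences become NaN, then 0
toy["jail_time"] = (toy["c_jail_out"] - toy["c_jail_in"]).dt.days.fillna(0)
print(toy["jail_time"].tolist())
```

Note that a stay shorter than 24 hours and a missing jail record both end up as `0`, so the feature cannot tell the two cases apart.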

**We will encode categoricals**

In [6]:
from category_encoders import OrdinalEncoder

data_preprocessed = data[NUMERICAL_FEATURE_COLS+CATAGORICAL_FEATURE_COLS+COMPAS_SCORES_COLS+TARGET_COLS]
encoder = OrdinalEncoder(cols=CATAGORICAL_FEATURE_COLS)
data_preprocessed = encoder.fit_transform(data_preprocessed)
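
The codes in the processed table further down (`Male` is 1, `Female` is 2, `Other` is 1, `African-American` is 2, `Caucasian` is 3) are consistent with integers assigned in order of first appearance, starting at 1. The sketch below reproduces that mapping with plain pandas (`pd.factorize`) on a toy frame, so the mapping can be inspected without `category_encoders`. It is a stand-in for illustration, not the encoder used in the notebook, and the `mappings` dictionary is a hypothetical helper.

```python
import pandas as pd

toy = pd.DataFrame({
    "sex":  ["Male", "Male", "Female"],
    "race": ["Other", "African-American", "Caucasian"],
})

encoded = toy.copy()
mappings = {}
for col in ["sex", "race"]:
    # factorize assigns 0, 1, 2, ... in order of first appearance
    codes, uniques = pd.factorize(toy[col])
    encoded[col] = codes + 1  # shift so that codes start at 1
    mappings[col] = {value: i + 1 for i, value in enumerate(uniques)}

print(encoded)
print(mappings)
```

Keeping the mapping around makes it possible to translate the integer codes back to the original categories when results are reported per group.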

**We will rescale the decile score to [0, 1] and derive a binary high-risk class**

In [7]:
data_preprocessed['compas_score'] = data_preprocessed['decile_score']/10.
data_preprocessed['compas_class'] = (data_preprocessed['decile_score']>6).astype(int)
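
As a quick check of the transformation in this cell, the sketch below applies the same two expressions to every possible decile from 1 to 10. The toy frame is built only for this illustration.

```python
import pandas as pd

deciles = pd.Series(range(1, 11), name="decile_score")

toy = pd.DataFrame({
    "decile_score": deciles,
    "compas_score": deciles / 10.0,             # rescaled to 0.1 ... 1.0
    "compas_class": (deciles > 6).astype(int),  # 1 for deciles 7 to 10
})
print(toy)
```

With the `> 6` threshold, deciles 1 to 6 map to class 0 and deciles 7 to 10 map to class 1.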

In [8]:
data_preprocessed.head(10)

,age,juv_fel_count,juv_misd_count,juv_other_count,priors_count,jail_time,sex,race,c_charge_degree,decile_score,two_year_recid,compas_score,compas_class
0,69,0,0,0,0,0.0,1,1,1,1,0,0.1,0
1,34,0,0,0,0,10.0,1,2,1,3,1,0.3,0
2,24,0,0,1,4,1.0,1,2,1,4,1,0.4,0
3,23,0,1,0,1,0.0,1,2,1,8,0,0.8,1
4,43,0,0,0,2,0.0,1,1,1,1,0,0.1,0
5,44,0,0,0,0,1.0,1,1,2,1,0,0.1,0
6,41,0,0,0,14,6.0,1,3,1,6,1,0.6,0
7,43,0,0,0,3,0.0,1,1,1,4,0,0.4,0
8,39,0,0,0,0,2.0,2,3,2,1,0,0.1,0
9,21,0,0,0,1,0.0,1,3,1,3,1,0.3,0


In [9]:
data_preprocessed.to_csv('../data/processed/compas-scores-two-years-processed.csv', index=False)

## Question

### **Can we build something better?**

<img src="../images/lets_build.jpg" alt="Drawing" style="width: 600px;"/>
