In this notebook we will train several classifiers to predict recidivism and analyze how they perform.

In [44]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

In [45]:
X = np.loadtxt("X.csv")
y = np.loadtxt("y.csv")
X_pandas = pd.read_csv("X_pandas.csv")

First we will partition our data into three sets: training, evaluation, and holdout. We will use the training set to build our models and the evaluation set to measure how well they perform. The holdout set is reserved for the end of the project: once we have finished tinkering with the models, we will see how well they perform on this untouched data. Final conclusions should be based on analysis of the holdout set.

In [13]:
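# Set a random seed for reproducibility
seed = 20
# First split: 60% training, 40% set aside
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=seed)
# Second split: divide the 40% into evaluation and holdout sets.
# random_state is passed here too; otherwise this split changes on every run.
X_eval, X_hold, y_eval, y_hold = train_test_split(X_test, y_test, test_size=0.45, random_state=seed)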
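
As a rough check on the sizes this produces, the two `test_size` values above imply about 60% training, 22% evaluation, and 18% holdout. Here is a minimal sketch of that arithmetic. The variable names are illustrative only, and actual row counts may be off by one because of rounding.

```python
# Fractions taken from the two train_test_split calls above
first_test_size = 0.4    # share split off from the full data set
second_test_size = 0.45  # share of that split reserved for holdout

train_frac = 1 - first_test_size
eval_frac = first_test_size * (1 - second_test_size)
hold_frac = first_test_size * second_test_size

print("train {:.0%}, evaluation {:.0%}, holdout {:.0%}".format(
    train_frac, eval_frac, hold_frac))
```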

Let's first train a logistic regression classifier.

In [36]:
logreg = LogisticRegression(C=100000, max_iter=10000)
logreg.fit(X_train,y_train)

LogisticRegression(C=100000, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=10000,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)

In [42]:
print("The accuracy is {0:.4f}".format(logreg.score(X_test, y_test)))

The accuracy is 0.6914


Let's create a vector of predicted labels. Each predicted label is either 0 (if the algorithm predicts the person will not reoffend) or 1 (if the algorithm predicts the person will reoffend). We can imagine a decision-making process where people who get a predicted label of 1 are denied pre-trial release and people who get a predicted label of 0 are released pre-trial.

In [43]:
y_pred = logreg.predict(X_test)

We are interested in how our algorithm assigns decisions across attributes like race or gender, since we want to analyze whether the algorithm is unfair.

Create a mapping to find the column location of each variable. This is needed since NumPy doesn't maintain the column labels as strings (only as numbers). 

In [None]:
feat_map = list(X_pandas.columns)[1:]
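
To illustrate how such a name-to-position lookup works, here is a small sketch on a toy array. The column names and values below are made up for illustration and are not the notebook's data.

```python
import numpy as np

# Hypothetical column names and a toy feature matrix (not the real data)
toy_columns = ["age", "sex_Male", "priors_count"]
toy_X = np.array([
    [25, 1, 0],
    [40, 0, 3],
    [31, 1, 1],
])

# list.index returns the position of the first matching name
col = toy_columns.index("sex_Male")

# Boolean mask: True for rows where the indicator column equals 1
toy_males = toy_X[:, col] == 1
print(col, toy_males)
```

Note that `list.index` raises a `ValueError` when the name is not in the list, so a misspelled column name fails loudly rather than silently selecting the wrong column.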

Create boolean vectors indicating membership in each race or gender group. 

**TODO** Finish for race_Caucasian, race_Hispanic, and race_Other.

In [95]:
males = X_test[:,feat_map.index('sex_Male')] == 1
afr_am = X_test[:,feat_map.index('race_African-American')] == 1

# Fairness Metrics
## Demographic parity

Demographic parity is a fairness concept that requires that the probability of an outcome be the same for two demographic groups. As an example, consider an algorithm determining loan approvals and consider that we are concerned about gender discrimination. The algorithm would not satisfy demographic parity if the probability of loan approval for women is 0.3 but the probability of loan approval for men is 0.8. 
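
One way to make this concrete is to compare the rate of positive outcomes in each group and look at the gap between them. The sketch below does this for the loan example on a tiny, made-up data set. The arrays and the `parity_gap` name are hypothetical and only meant to show the computation.

```python
import numpy as np

# Toy data (hypothetical): 1 = loan approved, 0 = loan denied
approved = np.array([1, 0, 0, 1, 1, 1, 0, 1, 0, 0])
# Group membership for the same ten applicants
is_woman = np.array([1, 1, 1, 0, 0, 0, 1, 0, 0, 1], dtype=bool)

# The mean of a 0/1 vector is the share of 1s, i.e. the approval rate
rate_women = approved[is_woman].mean()
rate_men = approved[~is_woman].mean()

# Demographic parity holds exactly when this gap is zero
parity_gap = abs(rate_women - rate_men)
print("women: {:.1f}, men: {:.1f}, gap: {:.1f}".format(
    rate_women, rate_men, parity_gap))
```

In practice the gap is rarely exactly zero, so some tolerance has to be chosen. How large that tolerance should be is a judgment call rather than something the definition settles.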

Does our logistic regression classifier satisfy demographic parity with respect to race or gender? 
**TODO** Compute the means for the other race and gender groups and compare the differences. How large a difference do you think we should accept under the notion of demographic parity?

In [96]:
afr_am_mean = y_pred[afr_am].mean()
print("Logistic regression denies pre-trial release to {0:.1f}% of African-Americans".format(afr_am_mean*100))

Logistic regression denies pre-trial release to 51.6% of African-Americans


In [101]:
y_pred[X_test[:,feat_map.index('sex_Female')] == 1].mean()

0.12200435729847495

In [79]:
X_pandas.sex_Male.value_counts()

1    4997
0    1175
Name: sex_Male, dtype: int64