# Grid Search with the COMPAS Dataset

This notebook demonstrates the use of the grid search algorithm from `fairlearn` on the [COMPAS dataset from ProPublica](https://raw.githubusercontent.com/propublica/compas-analysis/master/compas-scores-two-years.csv). This dataset comes from the criminal justice system. The labels (0 or 1) record two-year recidivism: whether or not a given offender was re-arrested within two years (with 0 representing no re-arrest). Models based on this dataset are used in bail decisions.

## Loading and Examining the Data

We start by loading the dataset using the `tempeh` package (you may see some warnings if `pytorch`, `keras` or `tensorflow` is not installed in your environment; these can be ignored). The data are already split into training and test sets:

In [None]:
import pandas as pd
import numpy as np
from tempeh.configurations import datasets

compas_dataset = datasets['compas']()
X_train = pd.DataFrame(compas_dataset.X_train, columns=compas_dataset.features)
y_train = pd.Series(compas_dataset.y_train.reshape(-1).astype(int), name="two_year_recid")
X_test = pd.DataFrame(compas_dataset.X_test, columns=compas_dataset.features)
y_test = pd.Series(compas_dataset.y_test.reshape(-1).astype(int), name="two_year_recid")
sensitive_features_train = pd.Series(compas_dataset.race_train)
sensitive_features_test = pd.Series(compas_dataset.race_test)

We can examine the features:

In [None]:
X_train

In this example, we treat Race as the sensitive attribute. The dataset has already been reduced to only have two values, "African-American" and "Caucasian", with approximately two thirds of the samples being African-American:

In [None]:
np.unique(sensitive_features_train, return_counts=True)

Note that race does not appear among the columns of the feature data itself.

## Training an unmitigated model

Before attempting to mitigate any disparity, we should first train a model without regard to fairness. For simplicity, we will use a logistic regression model, as implemented by `scikit-learn`:

In [None]:
from sklearn.linear_model import LogisticRegression

unconstrained_predictor = LogisticRegression(solver='liblinear')
unconstrained_predictor.fit(X_train, y_train)

With the model trained, we can examine it in the Fairness Dashboard. There are a number of sections which we can examine.

First is the Accuracy: the fraction of cases where the model gave the right answer. The overall accuracy is a little over 66%, but this number hides some complexity. While both subgroups have a similar overall accuracy, African-Americans have a much higher overestimation error (i.e. the model predicts that they will be rearrested when they were not), while Caucasians have a much higher underestimation error (i.e. the model predicts that they will not be rearrested, but they were).

If we instead look at the Recall (which measures the model's ability to find all of the positive samples), we can see a much lower score for Caucasians than for African-Americans. The story for the Specificity (which measures the ability of a model to find all of the negative samples; in this case, those where there was no rearrest) is reversed, with Caucasians having a specificity of nearly 80%, but African-Americans only showing a specificity of about 65%.

In [None]:
from fairlearn.widget import FairlearnDashboard

predicted_ys = [unconstrained_predictor.predict(X_test).tolist()]
sensitive_features_mapped = list(map(lambda x: [x], sensitive_features_test.values))

FairlearnDashboard(sensitive_features=sensitive_features_mapped,
                   true_y=y_test.values,
                   predicted_ys=predicted_ys,
                   class_names=None,
                   feature_names=["Race"],
                   is_classifier=True)
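
The per-group figures discussed above can also be computed directly, without the dashboard. The following is a minimal sketch on made-up toy arrays. The helper name `group_metrics` is hypothetical, and it assumes that the over- and underestimation errors are the shares of false positives and false negatives among all cases in a group:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix


def group_metrics(y_true, y_pred, groups):
    """Per-group accuracy, error types, recall and specificity (hypothetical helper)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    groups = np.asarray(groups)
    rows = {}
    for g in np.unique(groups):
        mask = groups == g
        # With labels=[0, 1], ravel() yields tn, fp, fn, tp in that order.
        tn, fp, fn, tp = confusion_matrix(
            y_true[mask], y_pred[mask], labels=[0, 1]
        ).ravel()
        n = tn + fp + fn + tp
        rows[g] = {
            "accuracy": (tp + tn) / n,
            # Assumed definitions: share of all cases in the group.
            "overestimation": fp / n,
            "underestimation": fn / n,
            "recall": tp / (tp + fn) if (tp + fn) else np.nan,
            "specificity": tn / (tn + fp) if (tn + fp) else np.nan,
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Toy data, invented purely for illustration.
toy_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])
toy_pred = np.array([1, 1, 1, 0, 0, 0, 1, 0])
toy_group = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

toy_metrics = group_metrics(toy_true, toy_pred, toy_group)
print(toy_metrics)
```

In the toy data both groups have the same accuracy, but group `a` only has overestimation errors and group `b` only has underestimation errors. In this notebook, the same helper could be called as `group_metrics(y_test, unconstrained_predictor.predict(X_test), sensitive_features_test)`.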

## Selecting the Disparity Constraint

Before we can try to reduce disparity, we must first ask what the relevant constraint on the disparity should be. There are currently two options in `fairlearn`: Demographic Parity and Equalized Odds. While `fairlearn` produces models which reduce violation of the specified constraint, that does not mean that the models are *fairer* in the broader societal context.

In the following, we use $A$ for the sensitive attribute, $Y$ for the true values and $\hat{Y}$ for the predicted values. Since we have a binary classification problem, $Y , \hat{Y} \in \{ 0, 1 \}$.

Demographic Parity requires that $P( \hat{Y} | A ) = P(\hat{Y})$. That is, each subgroup (African-Americans and Caucasians in this case) should be equally likely to get a positive prediction (which in this example means "rearrested").

Equalized Odds requires that $P( \hat{Y} | A, Y ) = P( \hat{Y} | Y)$, which corresponds to two separate equations for the two possible values of $Y$. For the case $Y=1$, this is equivalent to equalizing the true positive rates (also known as "Recall") across groups. In the $Y=0$ case, this is equivalent to equalizing the false positive rates (also known as "Fall-Out") across groups.
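
Both definitions are statements about conditional probabilities, which can be estimated as means of $\hat{Y}$ over the relevant rows. The sketch below does this on made-up toy arrays. The helper names `conditional_rate` and `parity_gaps` are hypothetical, and reporting the largest between-group difference is just one possible way to summarise a violation:

```python
import numpy as np


def conditional_rate(y_pred, condition):
    """Estimate P(Y_hat = 1 | condition) as a mean over the selected rows."""
    # Note: an empty selection would yield nan; the toy data avoids this.
    return float(np.mean(y_pred[condition]))


def parity_gaps(y_true, y_pred, a):
    """Largest between-group differences for the quantities in the definitions."""
    y_true, y_pred, a = map(np.asarray, (y_true, y_pred, a))
    groups = np.unique(a)
    # Demographic Parity compares P(Y_hat = 1 | A = g) across groups.
    selection = [conditional_rate(y_pred, a == g) for g in groups]
    # Equalized Odds compares P(Y_hat = 1 | A = g, Y = y) for y = 1 and y = 0.
    tpr = [conditional_rate(y_pred, (a == g) & (y_true == 1)) for g in groups]
    fpr = [conditional_rate(y_pred, (a == g) & (y_true == 0)) for g in groups]
    return {
        "selection_rate_gap": max(selection) - min(selection),
        "true_positive_rate_gap": max(tpr) - min(tpr),
        "false_positive_rate_gap": max(fpr) - min(fpr),
    }


# Toy data, invented purely for illustration.
example_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])
example_pred = np.array([1, 1, 1, 0, 0, 0, 1, 0])
example_a = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

gaps = parity_gaps(example_true, example_pred, example_a)
print(gaps)
```

A predictor satisfies Demographic Parity exactly when the first gap is zero, and Equalized Odds exactly when the second and third gaps are both zero.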

If we are using our model to make bail decisions, we want to minimise the number of offences committed while out on bail. We use the rearrest label as a proxy for this (note that there are a number of issues with doing so). Demographic Parity does not make much sense in this case, since all it would do is equalise the chances of predicting a rearrest. In contrast, Equalized Odds is a reasonable criterion to use: we will be aiming to predict rearrests correctly at equal rates for African-Americans and Caucasians, and also to predict rearrests which would not actually have occurred at equal rates.

## Reducing Disparity with Grid Search

The `GridSearch` class in `fairlearn` implements a simplified version of the exponentiated gradient reduction of [Agarwal et al. 2018](https://arxiv.org/abs/1803.02453). The user supplies a standard ML estimator, which is treated as a black box. `GridSearch` works by generating a sequence of relabellings and reweightings of the training data, and training a predictor for each.
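
To make "relabellings and reweightings" more concrete, here is a toy sketch for a grid of values of a single multiplier. Everything in it is an assumption made for illustration: the helper name `relabel_and_reweight`, the form of the constraint term (a demographic-parity-style push scaled by inverse group frequency), and the synthetic data. It is not `fairlearn`'s implementation, only the general idea of turning signed per-sample costs into labels and non-negative sample weights for a black-box estimator:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def relabel_and_reweight(y, group, lam):
    """Toy reduction step for one multiplier value `lam` (hypothetical helper)."""
    y = np.asarray(y)
    group = np.asarray(group)
    # Signed weight from the error objective: +1 for positives, -1 for negatives.
    signed = 2.0 * y - 1.0
    # Illustrative constraint term (an assumption): for lam > 0, discourage
    # positive predictions in group 1 and encourage them in group 0.
    p1 = group.mean()
    signed = signed - lam * np.where(group == 1, 1.0 / p1, -1.0 / (1.0 - p1))
    # The sign becomes the new label, the magnitude the new sample weight.
    new_labels = (signed > 0).astype(int)
    new_weights = np.abs(signed)
    return new_labels, new_weights


# Synthetic data, invented purely for illustration.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 3))
group_toy = rng.integers(0, 2, size=200)
y_toy = (X_toy[:, 0] + 0.5 * group_toy
         + rng.normal(scale=0.5, size=200) > 0).astype(int)

# One black-box model per grid point.
toy_models = []
for lam in np.linspace(-1.0, 1.0, 5):
    labels, weights = relabel_and_reweight(y_toy, group_toy, lam)
    model = LogisticRegression(solver="liblinear")
    model.fit(X_toy, labels, sample_weight=weights)
    toy_models.append((lam, model))
```

With `lam = 0` the data are unchanged (original labels, unit weights), so that grid point corresponds to the unconstrained model. The only requirement on the estimator in this sketch is that its `fit` method accepts `sample_weight`.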

We start by constructing the `GridSearch` estimator:

In [None]:
from fairlearn.reductions import GridSearch
from fairlearn.reductions import EqualizedOdds

sweep = GridSearch(LogisticRegression(solver='liblinear', fit_intercept=True),
                   constraints=EqualizedOdds(),
                   grid_size=51)

Next, we run this on our dataset:

In [None]:
sweep.fit(X_train, y_train, sensitive_features=sensitive_features_train)

Although `GridSearch` itself behaves like an estimator (it implements a `fit` method), in this case we want to extract all of the models which were trained as part of the search:

In [None]:
predictors = [z.predictor for z in sweep.all_results]

We can generate predictions from all of these, and show a new Fairness Dashboard. One thing to note about all of these is that the range in disparity is generally larger than the range in accuracy. This means that one usually only needs to sacrifice a small amount of accuracy in order to gain a substantial reduction in disparity.

In [None]:
sweep_predicted_ys = [p.predict(X_test).tolist() for p in predictors]

FairlearnDashboard(sensitive_features=sensitive_features_mapped,
                   true_y=y_test.values,
                   predicted_ys=sweep_predicted_ys,
                   class_names=None,
                   feature_names=["Race"],
                   is_classifier=True)
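
One way to act on the accuracy/disparity trade-off described above is to keep only the models that are not dominated, i.e. those for which no other model is at least as good on both error and disparity and strictly better on one. The sketch below uses made-up numbers. The helper name `non_dominated` is hypothetical, and which disparity measure to feed it is left open:

```python
import numpy as np


def non_dominated(errors, disparities):
    """Indices of models not dominated in (error, disparity); lower is better."""
    errors = np.asarray(errors, dtype=float)
    disparities = np.asarray(disparities, dtype=float)
    keep = []
    for i in range(len(errors)):
        # Model j dominates model i if it is no worse on both criteria
        # and strictly better on at least one of them.
        dominated = np.any(
            (errors <= errors[i])
            & (disparities <= disparities[i])
            & ((errors < errors[i]) | (disparities < disparities[i]))
        )
        if not dominated:
            keep.append(i)
    return keep


# Made-up (error, disparity) pairs for five hypothetical models.
toy_errors = [0.34, 0.35, 0.36, 0.40, 0.37]
toy_disparities = [0.25, 0.10, 0.12, 0.02, 0.30]

front = non_dominated(toy_errors, toy_disparities)
print(front)
```

In the toy numbers, moving from the first model to the second costs one point of error but removes most of the disparity, which is the kind of exchange the dashboard makes visible.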