# Fairness and Bias in Machine Learning

Today, we will reproduce part of the analysis performed by ProPublica about the COMPAS algorithm, which is used to predict the probability of recidivism (i.e. committing a crime in the future after the current one).

# COMPAS Algorithm

The COMPAS algorithm, which stands for **Correctional Offender Management Profiling for Alternative Sanctions**, is essentially a tool used to predict the likelihood of a criminal defendant reoffending - that is, committing another crime in the future.

Here's an analogy to help explain: Imagine you're trying to predict the weather. You might look at things like the current temperature, humidity, wind speed, and so on. Then, based on that information, you make a prediction: Is it going to rain later today, or not?

The COMPAS algorithm works in a similar way, but instead of predicting the weather, it's predicting a person's behavior. It looks at various factors about a person, such as **their age, their criminal history, and their responses to a questionnaire**. Based on these factors, the algorithm makes a prediction: Is this person likely to commit another crime in the future, or not?

These predictions are then used to **help judges make decisions in criminal cases**. For example, a judge might use the prediction to help decide whether a defendant should be released on bail before their trial, or to help decide what sentence to give a defendant who's been found guilty.

However, the COMPAS algorithm has been **controversial**. A 2016 investigation by ProPublica, a non-profit investigative journalism organization, found that the **algorithm was biased against black defendants**. Black defendants who did not go on to reoffend were more likely to be labeled as high risk, while white defendants who did reoffend were more likely to be labeled as low risk. This has sparked a lot of debate about the use of algorithms in the criminal justice system, and how to ensure that these algorithms are fair.

# Loading the Data

Let's import the libraries we will need for this tutorial:

In [None]:
import pandas as pd
import numpy as np
from datetime import datetime
import seaborn as sns
import matplotlib.pyplot as plt

plt.style.use("ggplot")
plt.rcParams["figure.figsize"] = [11, 4]

Then we can load the data:

In [None]:
df_raw = pd.read_csv("../data/compas-scores-two-years.csv")

In [None]:
df_raw.info()

## Column selection

We will limit our analysis to the following columns:

- **id**: defendant identifier
- **sex**: The defendant's sex (Male or Female)
- **age**: The defendant's age
- **age_cat**: A categorical representation of the defendant's age, often binned into age groups for easier analysis.
- **race**: The defendant's race.
- **days_b_screening_arrest**: The number of days between the COMPAS screening date and the date of the defendant's arrest.
- **c_charge_degree**: The degree of the crime the defendant is charged with (F : felony, M : misdemeanor)
- **priors_count**: number of previous offenses
- **decile_score**: This is a score, ranging from 1 to 10, given by the COMPAS tool. The score represents the perceived risk that a defendant will re-offend. Higher scores indicate a higher perceived risk.
- **score_text**: A textual description of the risk score. This can be 'Low', 'Medium', or 'High'.
- **v_score_text**: The same kind of textual description, but for the risk of violent recidivism.
- **c_jail_in**: The date and time the defendant entered jail
- **c_jail_out**: The date and time the defendant left jail
- **is_recid**: This is a binary field that indicates whether the defendant is a recidivist, i.e., whether they re-offended after the COMPAS assessment. A value of 1 indicates that they did re-offend, while a value of 0 indicates that they did not.
- **two_year_recid**: A binary field that indicates whether the defendant re-offended within two years after the COMPAS assessment. A value of 1 indicates that they did re-offend, while a value of 0 indicates that they did not.

In [None]:
columns_of_interest = ["id", "sex", "age", "age_cat", "race", "days_b_screening_arrest",
                       "c_charge_degree", "priors_count", "decile_score", "score_text", "v_score_text",
                       "c_jail_in", "c_jail_out", "is_recid", "two_year_recid"]
df = df_raw[columns_of_interest].copy()

### Before starting the analysis, ProPublica performed a cleaning of some rows because of missing data:

> There are a number of reasons remove rows because of missing data:
> - If the charge date of a defendants COMPAS scored crime was not within 30 days from when the person was arrested, we assume that because of data quality reasons, that we do not have the right offense.
> - We coded the recidivist flag -- is_recid -- to be -1 if we could not find a COMPAS case at all.
> - In a similar vein, ordinary traffic offenses -- those with a c_charge_degree of 'O' -- will not result in Jail time are removed (only two of them).
> - We filtered the underlying data from Broward county to include only those rows representing people who had either recidivated in two years, or had at least two years outside of a correctional facility.

In [None]:
# Applying the filter used by ProPublica
# creating a filter (or mask) to select certain rows from the dataframe df based on specific conditions.
selection = ((df["days_b_screening_arrest"] <= 30) & (df["days_b_screening_arrest"] >= -30)
            & (df["is_recid"] != -1)
            & (df["c_charge_degree"] != "O")
            & (df["score_text"] != "N/A"))

'''
df["days_b_screening_arrest"] <= 30 and df["days_b_screening_arrest"] >= -30: This selects the records where the number of days between the COMPAS screening date and the date of arrest is within 30 days before or after the screening.

df["is_recid"] != -1: This selects the records where is_recid (which indicates whether the person is a recidivist, i.e., a repeat offender) is not equal to -1. A value of -1 here may indicate missing or erroneous data.

df["c_charge_degree"] != "O": This selects the records where the charge degree is not 'O'.

df["score_text"] != "N/A": This selects the records where the score_text (the text description of the risk score) is not 'N/A'. 'N/A' usually means 'Not Available', indicating missing data.
'''

df = df[selection].copy()
df.info()

In [None]:
df.head()
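To make the filter easier to follow, here is a minimal sketch on a hypothetical toy dataframe (the rows are made up, not taken from the COMPAS file). It assumes the same column names as above and uses `between`, which is inclusive on both ends, in place of the two comparisons. It also illustrates a subtlety worth checking on the real data: by default `pd.read_csv` parses the string `N/A` as a missing value, so a comparison against `"N/A"` does not exclude those rows, whereas `notna()` does.

```python
import pandas as pd

# Hypothetical toy rows (not COMPAS data), one per filter condition
toy = pd.DataFrame({
    "days_b_screening_arrest": [0, 45, -10, 5, None, 3],
    "is_recid": [0, 1, -1, 1, 0, 0],
    "c_charge_degree": ["F", "M", "F", "O", "M", "F"],
    "score_text": ["Low", "High", "Medium", "Low", "Low", None],
})

# between() is inclusive, so it matches the <= 30 and >= -30 pair;
# rows with a missing day count compare as False and are dropped
toy_mask = (toy["days_b_screening_arrest"].between(-30, 30)
            & (toy["is_recid"] != -1)
            & (toy["c_charge_degree"] != "O")
            & (toy["score_text"] != "N/A"))
toy_kept = toy[toy_mask]

# A missing score_text is "not equal" to the string "N/A", so row 5 survives above;
# adding notna() removes it
toy_strict = toy[toy_mask & toy["score_text"].notna()]

print(toy_kept.index.tolist(), toy_strict.index.tolist())
```

Whether the stricter mask changes anything on the real file depends on whether any row with a missing `score_text` survives the other conditions, which can be checked with `df["score_text"].isna().sum()`.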

# Exploratory Data Analysis

Distribution by age:

In [None]:
df.age_cat.value_counts()

We can also get the normalised values:

In [None]:
df.age_cat.value_counts(normalize=True)

Of course, we can also visualise these values:

In [None]:
sns.countplot(x=df["age_cat"], color="green")
plt.show()

We can also specify a `hue` parameter to split each bar into several bars, one per value of another feature:

In [None]:
# sns.countplot(x="age_cat", hue="score_text", data=df)
# plt.show()

Distributions by race

In [None]:
display(df["race"].value_counts())
plt.rcParams['figure.figsize'] = [11, 4]
sns.countplot(x=df["race"], color="green")
plt.show()

In [None]:
# sns.countplot(x="race", hue="score_text", data=df)
# plt.show()

Let's see now the text-based score values

In [None]:
df.score_text.value_counts()

COMPAS also tries to predict whether recidivism will be violent:

In [None]:
df.v_score_text.value_counts()

Lastly, the dataset is mostly composed of men:

In [None]:
df.sex.value_counts(normalize=True)

### Correlation between COMPAS score and jail duration

From the article:

> In 2008, the sheriff’s office decided that instead of building another jail, it would begin using Northpointe’s risk scores (e.g. COMPAS) to help identify which defendants were low risk enough to be released on bail pending trial. Since then, nearly everyone arrested in Broward has been scored soon after being booked. (People charged with murder and other capital crimes are not scored because they are not eligible for pretrial release.)

Based on this, it would be expected to find a correlation between jail time and the score provided by COMPAS.

Let's calculate the correlation between the jail time and the decile score of a defendant:

In [None]:
# Convert jail date columns to the appropriate type for operating with them
for col in ["c_jail_in", "c_jail_out"]:
    df[col] = pd.to_datetime(df[col])

df[["c_jail_in", "c_jail_out"]]

In [None]:
# Subtract the two datetime columns directly; .dt.days keeps the whole days of each difference
df["jail_duration_days"] = (df["c_jail_out"] - df["c_jail_in"]).dt.days
df["jail_duration_days"].corr(df["decile_score"])

# Above code can also be written this way:
# print(df["decile_score"].corr(df["jail_duration_days"]))

In statistical terms, a correlation coefficient (often denoted by "r") ranges from -1 to +1. The closer the value is to +1 or -1, the stronger the relationship between the variables. A correlation of +1 indicates a perfect positive relationship, a correlation of -1 indicates a perfect negative relationship, and a correlation of 0 indicates no relationship.

Here's a general guide to interpreting correlation coefficient values:

1. Very strong relationship (±0.8 to ±1.0)
2. Strong relationship (±0.6 to ±0.8)
3. Moderate relationship (±0.4 to ±0.6)
4. Weak relationship (±0.2 to ±0.4)
5. Very weak or no relationship (0 to ±0.2)

In this case, a correlation coefficient of 0.2075 would suggest a weak positive relationship between decile_score and jail_duration_days. This means that as the decile_score increases, the jail_duration_days tends to slightly increase as well, but the relationship is not very strong. This suggests that other factors apart from the decile_score also significantly influence the jail_duration_days. It also indicates that the decile_score alone cannot be used to predict the jail_duration_days reliably.
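As a sanity check on what `corr` computes, the sketch below derives Pearson's r by hand on hypothetical toy values (not COMPAS data) and compares it with the pandas result. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy values, not taken from the COMPAS data
scores = pd.Series([1, 2, 4, 7, 9, 10], dtype=float)
days = pd.Series([0, 3, 1, 10, 4, 30], dtype=float)

# Pearson's r: sum of products of centred values,
# divided by the square root of the product of the sums of squares
x_centered = scores - scores.mean()
y_centered = days - days.mean()
r_manual = (x_centered * y_centered).sum() / np.sqrt(
    (x_centered ** 2).sum() * (y_centered ** 2).sum()
)

# pandas uses Pearson by default
r_pandas = scores.corr(days)

# Correlating the ranks gives Spearman's coefficient, an alternative
# that only assumes a monotonic (not linear) relationship
r_rank = scores.rank().corr(days.rank())

print(round(r_manual, 4), round(r_pandas, 4), round(r_rank, 4))
```

Since the decile score is an ordinal value, the rank-based coefficient may be a reasonable complement to Pearson's r here.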

We can also use seaborn's `regplot`, which you learned about in the previous lectures:

In [None]:
sns.regplot(x=df["decile_score"], y=df["jail_duration_days"])
plt.show()

**Question**

Can we extrapolate anything from this relation? Can we conclude that COMPAS is increasing jail time?

**Answer**

**What can we conclude then?**

There is a slight positive correlation between the total jail duration and the COMPAS recidivism score. Nothing more, nothing less: a correlation on its own does not tell us that COMPAS is the cause of longer jail time.

### Distribution of recidivism numeric scores depending on race:

In [None]:
df_black = df[df.race == "African-American"]
sns.countplot(x=df_black["decile_score"], color="green")
plt.ylim(0, 650)
plt.title("African-American Defendants' Decile Scores")
plt.show()

**Exercise**: Create the same count plot visualisation for the `Caucasian` rows

In [None]:
# Your turn: create a countplot for Caucasian
df_white = df[df.race == "Caucasian"]
sns.countplot(x=df_white["decile_score"], color="green")
plt.ylim(0, 650)
plt.title("Caucasian Defendants' Decile Scores")
plt.show()

What can we say about the difference between these plots?

# Analysis of accuracy for the COMPAS classification

The accuracy is similar for the different races (~60%), but the groups differ in the kind of errors the algorithm makes:

- African-American defendants are more prone to **false positives**, i.e., being incorrectly classified as "high risk" (and not reoffending later).
- In contrast, Caucasian defendants are more prone to **false negatives**: they are more likely to be wrongly labeled as "low risk" (and reoffending later).

We will create a `high_risk` binary column: this value will be 0 if `score_text` is "Low", and 1 for the "Medium" and "High" values. We will use this column as the predicted outcome of the COMPAS algorithm.

In [None]:
df["score_text"].unique()

In [None]:
def get_high_risk(score_text_value):
  if score_text_value == "Low":
    return 0
  else:
    return 1

df["high_risk"] = df["score_text"].apply(get_high_risk)

In [None]:
df[["decile_score", "score_text", "high_risk"]].drop_duplicates().sort_values(by=["decile_score"])

Our truth column is `is_recid`, which indicates whether a defendant committed a new crime after the first one.

As we have both the COMPAS prediction and the real outcome, we can check how well the score performs:

In [None]:
from sklearn.metrics import classification_report

real = df["is_recid"]
predicted = df["high_risk"]

print(classification_report(real, predicted))

Do these metrics change for race subsets? [Relevant image](https://en.wikipedia.org/wiki/File:Precisionrecall.svg)

In [None]:
df_white = df[df.race == "Caucasian"]
print(classification_report(df_white["is_recid"], df_white["high_risk"]))

**Exercise**: do the same for `African-American` defendants

In [None]:
# Your turn: obtain a classification_report for the subset of African-American defendants
df_black = df[df.race == "African-American"]
print(classification_report(df_black["is_recid"], df_black["high_risk"]))
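The classification report does not show false positive and false negative rates directly. A minimal sketch of how they can be computed by hand is given below, on hypothetical toy labels; `error_rates` is an illustrative helper, not part of any library.

```python
import pandas as pd

def error_rates(y_true, y_pred):
    """Return (false positive rate, false negative rate) for binary 0/1 labels."""
    # Reset the index so that both series align by position
    y_true = pd.Series(y_true).reset_index(drop=True)
    y_pred = pd.Series(y_pred).reset_index(drop=True)
    fp = ((y_true == 0) & (y_pred == 1)).sum()
    tn = ((y_true == 0) & (y_pred == 0)).sum()
    fn = ((y_true == 1) & (y_pred == 0)).sum()
    tp = ((y_true == 1) & (y_pred == 1)).sum()
    # FPR: share of actual negatives predicted positive
    # FNR: share of actual positives predicted negative
    return fp / (fp + tn), fn / (fn + tp)

# Hypothetical toy labels (not COMPAS data)
toy_true = [0, 0, 0, 0, 1, 1, 1, 1]
toy_pred = [1, 0, 0, 0, 1, 1, 0, 0]
fpr, fnr = error_rates(toy_true, toy_pred)
print(fpr, fnr)  # 0.25 0.5
```

On the real data this could be called as `error_rates(df_white["is_recid"], df_white["high_risk"])`, and likewise for `df_black`. The results relate to the report above: the FNR is 1 minus the recall of class 1, and the FPR is 1 minus the recall of class 0.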

This and (many many) other metrics can be calculated with existing fairness tools, such as the one explained next. 

# Analysing fairness and bias with an existing library: Fairlearn

[Fairlearn](https://fairlearn.org/) is one of the available tools (more at the end of this notebook) to evaluate the fairness of a classifier. It also includes methods both to train a model while trying to avoid biases, and to mitigate the issues of an unfair model a posteriori (i.e. after training).

Using Fairlearn involves the following steps:

- Determining the metrics to evaluate (e.g. accuracy, false positive/negative rates)
- Providing both the predictions and the true values of the outcome (e.g. `high_risk` and `is_recid`)
- Indicating the sensitive features (e.g. race)

As metrics, we will use the support (i.e. the number of rows in the subgroup), the selection rate (i.e. the fraction of "positive" predictions, in our case high-risk predictions), accuracy, false positive rate and false negative rate.
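To build some intuition for these per-group metrics, here is a rough pandas-only sketch on hypothetical toy data (two made-up groups, not COMPAS). It is not how Fairlearn is implemented; it only shows that support, selection rate and accuracy are simple per-group aggregates.

```python
import pandas as pd

# Hypothetical toy data: group label, true outcome and prediction
toy = pd.DataFrame({
    "group":  ["A", "A", "A", "A", "B", "B", "B", "B"],
    "y_true": [0, 1, 1, 0, 0, 0, 1, 1],
    "y_pred": [1, 1, 0, 1, 0, 0, 1, 0],
})
# 1 when the prediction matches the true outcome, 0 otherwise
toy["correct"] = (toy["y_true"] == toy["y_pred"]).astype(int)

per_group = toy.groupby("group").agg(
    support=("y_true", "size"),         # number of rows in the group
    selection_rate=("y_pred", "mean"),  # fraction of positive predictions
    accuracy=("correct", "mean"),       # fraction of correct predictions
)
print(per_group)
```

In this toy example both groups have the same support, but group A is selected three times as often as group B and is classified less accurately.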

In [None]:
!pip install fairlearn

from fairlearn.metrics import MetricFrame
from fairlearn.metrics import selection_rate, false_negative_rate, false_positive_rate
from sklearn.metrics import accuracy_score

# We can define our own metrics if not present in a library.
# Functions need to take true and predicted outcomes as parameters
def support(y_true, y_score):
    return len(y_true)

metrics = {'support': support, 
           'selection_rate': selection_rate,
           'accuracy': accuracy_score,
           'FNR': false_negative_rate, 
           'FPR': false_positive_rate}

Now we create a `MetricFrame` instance using `race` as the sensitive feature:

In [None]:
mf_by_race = MetricFrame(metrics=metrics,
                         y_true=df["is_recid"],   # the real outcome
                         y_pred=df["high_risk"],  # the predicted outcome
                         sensitive_features=df["race"])

We can get a look at the general metrics values with the `overall` attribute. You can check whether the following values match the ones we calculated some cells above (they should):

In [None]:
mf_by_race.overall

Now, we can get the metric values for every group:

In [None]:
mf_by_race.by_group

The library also includes easy ways to plot those results:

Is there anything in those plots that catches your attention?

In [None]:
mf_by_race.by_group.plot.bar(subplots=True, figsize=(10, 14))
plt.show()

In the previous calculations, the fact that some of the race groups are very poorly represented in the dataset altered the results (e.g. when calculating maximum differences between groups). We will re-create the metric frame but only for `Caucasian`, `African-American` and `Hispanic` defendants: 

In [None]:
df_filtered = df[df["race"].isin(["Caucasian", "African-American", "Hispanic"])]
mf_by_race = MetricFrame(metrics=metrics,
                         y_true=df_filtered["is_recid"],   # the real outcome
                         y_pred=df_filtered["high_risk"],  # the predicted outcome
                         sensitive_features=df_filtered["race"])

In [None]:
mf_by_race.by_group.plot.bar(subplots=True, figsize=(10, 14))
plt.show()

It is also possible to easily **operate with these metrics** through some built-in functions.

For instance, let's calculate the maximum metrics difference between groups:

Do these results for FNR and FPR match the values from our earlier calculations?

In [None]:
mf_by_race.difference(method='between_groups')
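For intuition, the "between groups" difference of a metric is the gap between its largest and smallest value across groups. The sketch below reproduces that idea with plain pandas on a hypothetical table shaped like `by_group`; the numbers and group names are made up.

```python
import pandas as pd

# Hypothetical per-group metric table, shaped like MetricFrame.by_group
by_group = pd.DataFrame(
    {"FPR": [0.42, 0.22, 0.19], "FNR": [0.28, 0.50, 0.56]},
    index=["group_a", "group_b", "group_c"],
)

# Largest gap between any two groups, per metric: max minus min
max_gap = by_group.max() - by_group.min()

# Which groups sit at each extreme
highest = by_group.idxmax()
lowest = by_group.idxmin()

print(max_gap)
print(highest["FPR"], lowest["FPR"])
```

Looking at which groups sit at the extremes helps explain why very small groups can dominate this number.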

It's also possible to define **control features** for the analysis. A controlled feature has its value fixed, so that its variability cannot affect the calculated metrics for the sensitive groups.

Let's use `age_cat` as the control feature, again over the race-filtered dataframe `df_filtered` defined above:

In [None]:
mf_race_agecontrolled = MetricFrame(metrics=metrics,
                                    y_true=df_filtered["is_recid"],   # the real outcome
                                    y_pred=df_filtered["high_risk"],  # the predicted outcome
                                    sensitive_features=df_filtered["race"],
                                    control_features=df_filtered["age_cat"])

Now, the `overall` results are broken down by the control feature:

In [None]:
mf_race_agecontrolled.overall

**Exercise**: Spend a couple of minutes analysing the following metric values by group. Is there any value that seems way off?

In [None]:
mf_race_agecontrolled.by_group

In [None]:
mf_race_agecontrolled.difference(method='between_groups')

# (Optional) Another tool to analyse fairness of different metrics: Aequitas

[Aequitas](http://aequitas.dssg.io/) is an open-source bias and fairness auditing toolkit to search for discrimination and bias in machine learning models. You can check its documentation [here](https://dssg.github.io/aequitas/). They have a [notebook example](https://colab.research.google.com/github/dssg/aequitas/blob/update_compas_notebook/docs/source/examples/compas_demo.ipynb) where they demonstrate their tool over the COMPAS dataset.

In [None]:
!pip install aequitas
from aequitas.group import Group
from aequitas.bias import Bias
from aequitas.fairness import Fairness
import aequitas.plot as ap

In [None]:
# Same COMPAS dataset
# Some columns need to have specific names, so for brevity we just use the provided version
df_aeq = pd.read_csv("https://github.com/dssg/aequitas/raw/master/examples/data/compas_for_aequitas.csv")
df_aeq.head()

`label_value` is the true outcome (i.e. `is_recid`), while `score` is the predicted value (`high_risk`).

Aequitas includes facilities to do the same metrics calculations as we have done above with fairlearn. These metrics are obtained through an instance of the `Group` class:

In [None]:
g = Group()
xtab, _ = g.get_crosstabs(df_aeq) # you can think of aequitas' crosstabs as fairlearn's MetricFrame

# Some of the calculated counts:
absolute_metrics = g.list_absolute_metrics(xtab)
xtab[[col for col in xtab.columns if col not in absolute_metrics]]

In [None]:
# and calculated metrics
xtab[['attribute_name', 'attribute_value'] + absolute_metrics].round(2)

Bias and disparities are calculated with an instance of the `Bias` class. Disparity calculations are done with respect to a reference group. In the following cell, we use `Caucasian`, `Male` and `25 - 45` as the reference for each categorical feature:

In [None]:
b = Bias()
bdf = b.get_disparity_predefined_groups(xtab, original_df=df_aeq, 
                                        ref_groups_dict={'race':'Caucasian', 'sex':'Male', 'age_cat':'25 - 45'})

In [None]:
# View disparity metrics added to dataframe
bdf[['attribute_name', 'attribute_value'] +
     b.list_disparities(bdf)].style

As looking at the disparities of the table above can be a bit tedious, Aequitas offers some beautiful and interactive visualisations:

In [None]:
metrics = ['fpr','fnr']
disparity_tolerance = 1.25 # threshold to determine if there is a disparity of the metrics

ap.summary(bdf, metrics, fairness_threshold = disparity_tolerance)

In [None]:
ap.disparity(bdf, metrics, 'race', fairness_threshold = disparity_tolerance)
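A disparity here is the ratio between a group's metric and the reference group's metric. The sketch below computes such ratios with plain pandas on hypothetical FPR values (made-up numbers and group names). It assumes the tolerance is applied symmetrically, so a ratio counts as acceptable when it lies between 1/1.25 = 0.8 and 1.25.

```python
import pandas as pd

# Hypothetical FPR values per group (not real COMPAS results)
fpr = pd.Series({"ref_group": 0.20, "group_x": 0.23, "group_y": 0.40})

tolerance = 1.25

# Ratio of each group's FPR to the reference group's FPR
disparity = fpr / fpr["ref_group"]

# Assumption: the tolerance is symmetric, i.e. ratios in [1/tolerance, tolerance] pass
within_tolerance = disparity.between(1 / tolerance, tolerance)

print(pd.DataFrame({"disparity": disparity, "within_tolerance": within_tolerance}))
```

In this toy example, `group_y` has twice the false positive rate of the reference group and falls outside the tolerance.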

**Exercise**: create visualisations to search for disparities in the age categories

In [None]:
# Your turn: repeat the above plots around the age_cat feature
ap.disparity(bdf, metrics, 'age_cat', fairness_threshold = disparity_tolerance)

**Optional Exercise**: try other metrics to check for disparate impact. For instance, the Aequitas notebook uses FDR (False Discovery Rate)

FDR = False Positives / Predicted Positives, where Predicted Positives = False Positives + True Positives
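As a quick illustration of the formula, the following sketch computes the FDR on hypothetical toy labels (not COMPAS data). Note that the FDR equals 1 minus the precision of the positive class.

```python
import pandas as pd

# Hypothetical toy labels: true outcome and prediction
toy_true = pd.Series([0, 1, 1, 0, 1, 0])
toy_pred = pd.Series([1, 1, 1, 1, 0, 0])

# Among the predicted positives, count the wrong and the right ones
false_positives = ((toy_pred == 1) & (toy_true == 0)).sum()
true_positives = ((toy_pred == 1) & (toy_true == 1)).sum()

fdr = false_positives / (false_positives + true_positives)
precision = true_positives / (false_positives + true_positives)

print(fdr, precision)  # 0.5 0.5
```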

In [None]:
# Your turn: try out other metrics that might be interesting!



# Additional resources: Tools for fairness & bias auditing

In this notebook we have seen how to audit a machine learning model on some metrics of interest to search for discrimination or bias. Once the metrics to analyse are clear, the procedure to check for biases is fairly similar across tools.

Below are some fairness-oriented tools that can be used to check for biases in a dataset or model (and more! Some of these tools offer ways to "fix the unfairness" - you can check that for Fairlearn in [this notebook](https://github.com/fairlearn/fairlearn/blob/main/notebooks/Binary%20Classification%20with%20the%20UCI%20Credit-card%20Default%20Dataset.ipynb)):

- IBM's AI Fairness 360: [webpage](https://aif360.mybluemix.net/), [Python documentation](https://aif360.readthedocs.io/en/latest/index.html)
- Microsoft's fairlearn: [webpage](https://fairlearn.org/), [publication](https://www.microsoft.com/en-us/research/uploads/prod/2020/05/Fairlearn_WhitePaper-2020-09-22.pdf)
- Google's What-If tool: [webpage](https://pair-code.github.io/what-if-tool/)
- Amazon's Sagemaker Clarify: [webpage](https://aws.amazon.com/sagemaker/clarify/), [article](https://aws.amazon.com/blogs/aws/new-amazon-sagemaker-clarify-detects-bias-and-increases-the-transparency-of-machine-learning-models/)
- University of Chicago's Aequitas: [webpage](http://aequitas.dssg.io/)