# Fairness modeling: a hands-on introduction
This notebook introduces fairlearn, a fairness modelling toolkit developed and maintained by Microsoft. The tool lets the user assess whether a machine learning model is fair, and it also implements methods to mitigate unfairness. We use fairlearn rather than another tool for several reasons:
- The tool is free to use: it is an open-source Python package;
- The tool is complete: it can be used both to assess unfairness and to mitigate it;
- The tool has a visual dashboard for Jupyter notebooks, which makes it easy to learn;
- The tool is used in industry, for example by EY;
- The tool has excellent documentation compared to the other available tools.

## Data
To demonstrate fairlearn, we use the [adult income](https://www.kaggle.com/wenruliu/adult-income-dataset) dataset, which is very widely used. Its usual target variable is income, which denotes whether a person earns more than 50K per year. We will, however, use a different target variable: whether or not a person paid back a loan. The first rows of the original dataset are shown below.

In [1]:
import warnings
warnings.filterwarnings('ignore')

import pandas as pd

df = pd.read_csv("adult.csv")
df.head()

| | age | workclass | fnlwgt | education | educational-num | marital-status | occupation | relationship | race | gender | capital-gain | capital-loss | hours-per-week | native-country | income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25 | Private | 226802 | 11th | 7 | Never-married | Machine-op-inspct | Own-child | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 1 | 38 | Private | 89814 | HS-grad | 9 | Married-civ-spouse | Farming-fishing | Husband | White | Male | 0 | 0 | 50 | United-States | <=50K |
| 2 | 28 | Local-gov | 336951 | Assoc-acdm | 12 | Married-civ-spouse | Protective-serv | Husband | White | Male | 0 | 0 | 40 | United-States | >50K |
| 3 | 44 | Private | 160323 | Some-college | 10 | Married-civ-spouse | Machine-op-inspct | Husband | Black | Male | 7688 | 0 | 40 | United-States | >50K |
| 4 | 18 | ? | 103497 | Some-college | 10 | Never-married | ? | Own-child | White | Female | 0 | 0 | 30 | United-States | <=50K |


We create the new target variable in a very simple way: if the person earned more than 50K, we assume they paid back the loan; otherwise, we assume they did not. We also drop the income variable from the dataset, because it perfectly predicts the new target variable.

In [2]:
df.loc[:, "target"] = df.income.apply(lambda x: int(x == ">50K")) # int(): Fairlearn dashboard requires integer target
df = df.drop("income", axis = 1)

## Building a simple model
The dataframe contains over 48 000 observations. For the sake of simplicity, we will not do any variable selection and will simply include all the variables in the dataframe. **We will also not split the data into a train and test set, since this is only a demonstration.** There is, however, one step that we need to take. Some of the variables in our data (i.e. **race** and **gender**) can be considered sensitive or protected attributes. It is common practice not to include these variables as model inputs, because you do not want the model to use them directly. This does not guarantee that the model ignores them, since other variables (for example **relationship**) can act as proxies. We do save them in separate variables, because they will be needed later.

In [3]:
race = df.pop("race") # pop() removes the column from df and returns it
gender = df.pop("gender")
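
A quick way to check whether a remaining column acts as a proxy for a popped sensitive feature is to cross-tabulate the two; on the full data, `pd.crosstab(df["relationship"], gender)` would do this. The sketch below shows the same idea in plain Python on the five rows printed by `df.head()` above. The helper name `share_by_value` is hypothetical, and five rows are far too few to draw any conclusion from; the point is only the mechanics.

```python
from collections import Counter

# (relationship, gender) pairs taken from the five rows shown by df.head() above
rows = [
    ("Own-child", "Male"),
    ("Husband", "Male"),
    ("Husband", "Male"),
    ("Husband", "Male"),
    ("Own-child", "Female"),
]

def share_by_value(pairs, feature_value, sensitive_value):
    """Hypothetical helper: share of rows with feature_value that have sensitive_value."""
    counts = Counter(pairs)
    total = sum(n for (feature, _), n in counts.items() if feature == feature_value)
    return counts[(feature_value, sensitive_value)] / total if total else float("nan")

husband_male = share_by_value(rows, "Husband", "Male")       # 3 of 3 rows
own_child_male = share_by_value(rows, "Own-child", "Male")   # 1 of 2 rows
print(husband_male, own_child_male)
```

In these rows, a relationship value of "Husband" pins down gender completely, so a tree that splits on the `relationship_Husband` dummy can partly reproduce a gender split even though the gender column itself was removed.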

To use the variables in a model, we need to transform the categorical columns into dummy variables. Before doing that, we separate out the target variable.

In [4]:
target = df.pop("target")
df = pd.get_dummies(df, drop_first = False) # decision tree will be used as the model -> don't drop first
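
To see what `drop_first` changes, here is a sketch on a tiny frame built from three of the `age`/`workclass` values visible in `df.head()` above (the frame itself is only for illustration). With `drop_first=False`, every category gets its own indicator column; with `drop_first=True`, the first category in sorted order is dropped and is implied whenever all the other indicators are 0.

```python
import pandas as pd

# Small frame using age/workclass values that appear in df.head() above
toy = pd.DataFrame({"age": [25, 28, 18], "workclass": ["Private", "Local-gov", "?"]})

full = pd.get_dummies(toy, drop_first=False)    # one indicator column per category
reduced = pd.get_dummies(toy, drop_first=True)  # first category ("?") dropped

print(list(full.columns))
print(list(reduced.columns))
```

Dropping a level mainly matters for linear models, where a full set of indicators is perfectly collinear with the intercept. A decision tree is not affected by that collinearity, and keeping every level makes its splits easier to read, which is the reasoning behind the comment in the cell above.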

For this demonstration, we train a simple decision tree classifier and use it to make predictions.

In [5]:
from sklearn.tree import DecisionTreeClassifier

classifier = DecisionTreeClassifier(min_samples_leaf=10, max_depth=4) # parameters have not been tuned
classifier.fit(df, target)

# Note that we are predicting using the same data as we used for training, this is just for the sake of example
# Never do this in real life
prediction = classifier.predict(df)

To be able to properly assess the performance of the model, we need to know the class imbalance.

In [6]:
target.value_counts(normalize = True)

0    0.760718
1    0.239282
Name: target, dtype: float64
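
The class shares matter because they give a baseline: a trivial model that always predicts the majority class is right exactly as often as that class occurs. The sketch below computes that baseline from the shares printed above and compares it with the overall accuracy of the decision tree that `group_summary` reports further down (0.8443552680070431).

```python
# Class shares printed by target.value_counts(normalize=True) above
class_shares = {0: 0.760718, 1: 0.239282}

# Always predicting the majority class yields an accuracy equal to its share
majority_class = max(class_shares, key=class_shares.get)
baseline_accuracy = class_shares[majority_class]

# Overall accuracy of the decision tree, as reported by group_summary later on
model_accuracy = 0.8443552680070431
improvement = model_accuracy - baseline_accuracy
print(majority_class, baseline_accuracy, round(improvement, 3))
```

So the tree improves on the "always predict 0" baseline by roughly 8 percentage points of accuracy, which is a more honest reading of its 84% accuracy than the raw number alone.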

## Using fairlearn to assess the unfairness
To use fairlearn, we first have to install the fairlearn and ipywidgets libraries. You can install them from the command prompt with the commands below (run them one at a time).

In [7]:
# pip install fairlearn
# pip install ipywidgets

There are two ways we can make use of the fairlearn package. The first way is by using functions that are defined in the package. The second way is by using the visual interface. First, we will focus on using the functions.

### Identifying fairness
The functions in the fairlearn package that can be used to identify unfairness mostly make use of the sklearn metrics. The most commonly used metrics are:
- **accuracy_score**: the fraction of predictions that exactly match the corresponding labels in y_true;
- **roc_auc_score**: the area under the ROC curve (AUC), computed from predicted scores rather than hard labels;
- **balanced_accuracy_score**: the average of the recall obtained on each class;
- **f1_score**: the harmonic mean of precision and recall;
- **recall_score**: intuitively, the ability of the classifier to find all the positive samples;
- **precision_score**: intuitively, the ability of the classifier not to label a negative sample as positive.
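
The definitions in the list above can be written out directly from the four confusion-matrix counts. The sketch below does this in plain Python for binary 0/1 labels; it is meant to show what the sklearn functions compute, not to replace them, and it leaves out the AUC because that metric needs predicted scores rather than hard labels. The labels are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def binary_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # recall of the positive class
    specificity = tn / (tn + fp) if tn + fp else 0.0     # recall of the negative class
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "balanced_accuracy": (recall + specificity) / 2,  # average recall over both classes
    }

# Hypothetical labels, for illustration only
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```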

We can calculate these metrics for every group of a protected variable. For example, we can compute the accuracy for males and females separately. In fairlearn, we do this by calling the *group_summary* function.

In [8]:
from fairlearn.metrics import group_summary
from sklearn.metrics import accuracy_score

group_summary(accuracy_score, target, prediction, sensitive_features = gender)

{'overall': 0.8443552680070431,
 'by_group': {'Female': 0.9251482213438735, 'Male': 0.8042879019908117}}
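
Conceptually, `group_summary` just evaluates the metric once on all rows and once on the rows of each group. In newer fairlearn releases the same information comes from the `MetricFrame` class, for example `MetricFrame(metrics=accuracy_score, y_true=target, y_pred=prediction, sensitive_features=gender)`, whose `.overall` and `.by_group` attributes hold the two parts. The plain-Python sketch below mimics the output shape above; the helper name and the data are hypothetical.

```python
def group_accuracy(y_true, y_pred, groups):
    """Overall and per-group accuracy, shaped like the group_summary output above."""
    def accuracy(pairs):
        return sum(t == p for t, p in pairs) / len(pairs)

    pairs = list(zip(y_true, y_pred))
    by_group = {}
    for g in sorted(set(groups)):
        by_group[g] = accuracy([pair for pair, grp in zip(pairs, groups) if grp == g])
    return {"overall": accuracy(pairs), "by_group": by_group}

# Hypothetical labels and groups, for illustration only
summary = group_accuracy(
    y_true=[1, 0, 1, 0, 1, 0],
    y_pred=[1, 0, 1, 0, 1, 1],
    groups=["Female", "Female", "Female", "Male", "Male", "Male"],
)
print(summary)
```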

Besides the sklearn metrics, fairlearn also provides specialized fairness metrics such as demographic parity and equalized odds. The following metrics are available as part of *fairlearn.metrics*:
- **demographic_parity_difference**: the difference between the largest and the smallest group-level selection rate (the fraction of samples predicted positive) across all values of the sensitive feature. A value of 0 means that all groups have the same selection rate;
- **demographic_parity_ratio**: the ratio between the smallest and the largest group-level selection rate. A value of 1 means that all groups have the same selection rate;
- **difference_from_summary**: the difference between the maximum and the minimum metric value across groups in a group summary (computed with an sklearn metric);
- **equalized_odds_difference**: the greater of two metrics: the difference in true positive rate and the difference in false positive rate between groups. An equalized odds difference of 0 means that all groups have the same true positive, true negative, false positive and false negative rates;
- **equalized_odds_ratio**: the smaller of two metrics: the true positive rate ratio and the false positive rate ratio;
- **false_positive_rate**: the fraction of actual negatives that are predicted positive;
- **false_negative_rate**: the fraction of actual positives that are predicted negative;
- **true_positive_rate**: the fraction of actual positives that are predicted positive (the recall);
- **true_negative_rate**: the fraction of actual negatives that are predicted negative.
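
The three fairness metrics used below follow directly from per-group rates. The sketch computes them by hand in plain Python so the definitions in the list are concrete; fairlearn's own functions support more options (for example sample weights), and the helper names and data here are hypothetical. It assumes every group contains both true labels, so that every rate is defined.

```python
def rate(values):
    """Share of 1s in a list of 0/1 values."""
    return sum(values) / len(values)

def split_by_group(values, groups):
    """Collect the values belonging to each group label."""
    out = {}
    for v, g in zip(values, groups):
        out.setdefault(g, []).append(v)
    return out

def demographic_parity(y_pred, groups):
    # selection rate = share of samples predicted positive in each group
    rates = [rate(v) for v in split_by_group(y_pred, groups).values()]
    ratio = min(rates) / max(rates) if max(rates) > 0 else float("nan")
    return max(rates) - min(rates), ratio

def equalized_odds_difference(y_true, y_pred, groups):
    diffs = []
    for label in (1, 0):  # label 1 -> true positive rate, label 0 -> false positive rate
        preds = [p for t, p in zip(y_true, y_pred) if t == label]
        grps = [g for t, g in zip(y_true, groups) if t == label]
        rates = [rate(v) for v in split_by_group(preds, grps).values()]
        diffs.append(max(rates) - min(rates))
    return max(diffs)  # the greater of the TPR difference and the FPR difference

# Hypothetical labels and groups, for illustration only
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

dpd, dpr = demographic_parity(y_pred, groups)
eod = equalized_odds_difference(y_true, y_pred, groups)
print(dpd, dpr, eod)
```

Here group A is selected at a rate of 0.5 and group B at 0.25, and the true positive rates are 1.0 and 0.5 while both false positive rates are 0, so the equalized odds difference is driven entirely by the TPR gap.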

A complete list can be found [here](https://fairlearn.github.io/api_reference/fairlearn.metrics.html). To decide whether a model is fair, we need cut-off values for these metrics. Microsoft has not specified such cut-off values in fairlearn. Luckily, IBM did specify them in its AI Fairness 360 toolkit.

The cut-off values for the different metrics are:
- Demographic parity difference: the model can be considered fair if the absolute value is smaller than 0.1;
- Equalized odds difference: the model can be considered fair if the absolute value is smaller than 0.1;
- Equal opportunity difference: the model can be considered fair if the absolute value is smaller than 0.1;
- Demographic parity ratio: the model can be considered fair if the ratio is between 0.8 and 1.25.

As an example, let's calculate the demographic parity difference, the equalized odds difference and the demographic parity ratio for gender.

In [9]:
from fairlearn.metrics import demographic_parity_difference, equalized_odds_difference, demographic_parity_ratio

dpd = demographic_parity_difference(target, prediction, sensitive_features = gender)
eod = equalized_odds_difference(target, prediction, sensitive_features = gender)
dpr = demographic_parity_ratio(target, prediction, sensitive_features = gender)

print("Demographic parity difference: {}".format(round(dpd, 2)))
print("Equalized odds difference: {}".format(round(eod, 2)))
print("Demographic parity ratio: {}".format(round(dpr, 2)))

Demographic parity difference: 0.15
Equalized odds difference: 0.08
Demographic parity ratio: 0.3
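
Applying the cut-off values quoted from IBM above to these three numbers takes one comparison per metric. The helper below is hypothetical; it only encodes those thresholds and evaluates them on the rounded values printed by the cell above.

```python
def within_cutoffs(dpd, eod, dpr):
    """Hypothetical helper applying the cut-off values listed above."""
    return {
        "demographic_parity_difference": abs(dpd) < 0.1,
        "equalized_odds_difference": abs(eod) < 0.1,
        "demographic_parity_ratio": 0.8 <= dpr <= 1.25,
    }

# Rounded values printed by the cell above
verdict = within_cutoffs(dpd=0.15, eod=0.08, dpr=0.3)
print(verdict)
```

By these criteria, only the equalized odds difference is within its range; both demographic parity metrics indicate that the model is unfair with respect to gender.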


We can also calculate between-group differences for the sklearn metrics with the difference_from_summary function. Let's illustrate this with accuracy.

In [10]:
from fairlearn.metrics import difference_from_summary

sum_acc = group_summary(accuracy_score, target, prediction, sensitive_features = gender)
accd = difference_from_summary(sum_acc)
print("Difference in accuracy: {}".format(round(accd, 2)))

Difference in accuracy: 0.12
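
`difference_from_summary` takes the largest group value minus the smallest one. Recomputing it from the `by_group` accuracies printed by `group_summary` earlier reproduces the 0.12 above; the ratio of the smallest to the largest value is shown alongside it. In newer fairlearn releases, the same calculations are available as `MetricFrame.difference()` and `MetricFrame.ratio()`.

```python
# by_group accuracies printed by group_summary earlier
by_group = {"Female": 0.9251482213438735, "Male": 0.8042879019908117}

difference = max(by_group.values()) - min(by_group.values())
ratio = min(by_group.values()) / max(by_group.values())
print(round(difference, 2), round(ratio, 2))
```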


Of course, doing this for every metric is not user-friendly. Luckily, fairlearn has an **interactive dashboard** that you can use to assess fairness. The dashboard is part of *fairlearn.widget* and is created with the *FairlearnDashboard* class (newer fairlearn releases no longer include this dashboard). Let's illustrate this:

In [11]:
from fairlearn.widget import FairlearnDashboard

FairlearnDashboard(y_true = target,
                   y_pred = prediction,
                   sensitive_features = gender,
                   sensitive_feature_names = ["gender"])


We can do the same for the protected variable race.

In [12]:
FairlearnDashboard(y_true = target,
                   y_pred = prediction,
                   sensitive_features = race,
                   sensitive_feature_names = ["race"])


### Mitigating unfairness
In the previous section, we identified some problems with the model. Let's now focus on solving the gender unfairness. Regarding gender, we have identified that:
- Men who should get a loan are disadvantaged compared to women who should get a loan;
- Women in general are disadvantaged when it comes to getting a loan.

We can try to solve these issues with fairlearn's built-in mitigation methods. In general, we use one of the *reduction* methods. These reduction methods require three inputs:
1. estimator: the base estimator that was used (usually from sklearn). In our case, we used a DecisionTreeClassifier.
2. constraints: the fairness constraints that the model should satisfy. For binary classification, fairlearn provides, among others, the *DemographicParity*, *TruePositiveRateParity*, *EqualizedOdds* and *ErrorRateParity* constraints.
3. sensitive_features: the sensitive feature you want to take into account (passed to the fit method).

Fairlearn provides two reduction methods: ExponentiatedGradient and GridSearch.

Let's try to implement this on the gender unfairness. Since we are interested in getting the disparity in predictions right, we will start by implementing the demographic parity constraint. Then, we will try to solve the other unfairness by implementing the equalized odds constraint.
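
Roughly speaking, the DemographicParity constraint asks each group's selection rate to match the overall selection rate, and EqualizedOdds asks the same separately among people whose true label is 1 and among those whose true label is 0. ExponentiatedGradient then searches for a randomized combination of classifiers that keeps these gaps within a small bound while minimizing error. The sketch below only measures the gaps; it is a simplified illustration under the assumption that the constraints are framed this way, not fairlearn's implementation, and the helper names and data are hypothetical.

```python
def selection_rate(preds):
    return sum(preds) / len(preds)

def dp_violation(y_pred, groups):
    """Largest gap between a group's selection rate and the overall selection rate."""
    overall = selection_rate(y_pred)
    gaps = []
    for g in set(groups):
        group_preds = [p for p, grp in zip(y_pred, groups) if grp == g]
        gaps.append(abs(selection_rate(group_preds) - overall))
    return max(gaps)

def eo_violation(y_true, y_pred, groups):
    """The same comparison, made separately among true negatives and true positives."""
    violations = []
    for label in (0, 1):
        idx = [i for i, t in enumerate(y_true) if t == label]
        violations.append(dp_violation([y_pred[i] for i in idx], [groups[i] for i in idx]))
    return max(violations)

# Hypothetical labels and groups, for illustration only
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

print(dp_violation(y_pred, groups), eo_violation(y_true, y_pred, groups))
```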

##### DemographicParity

In [13]:
from fairlearn.reductions import ExponentiatedGradient, DemographicParity, EqualizedOdds

classifier = DecisionTreeClassifier(min_samples_leaf=10, max_depth=4)
dp = DemographicParity()
reduction = ExponentiatedGradient(classifier, dp)

reduction.fit(df, target, sensitive_features=gender)
prediction_dp = reduction.predict(df)

##### EqualizedOdds

In [14]:
eo = EqualizedOdds()
reduction = ExponentiatedGradient(classifier, eo)

reduction.fit(df, target, sensitive_features=gender)
prediction_eo = reduction.predict(df)

In [15]:
FairlearnDashboard(y_true = target,
                   y_pred = {"prediction_original" : prediction,
                             "prediction_dp": prediction_dp,
                             "prediction_eo": prediction_eo},
                   sensitive_features = gender,
                   sensitive_feature_names = ["gender"])


**Notice how little accuracy we are sacrificing in order to get rid of the disparity in predictions (almost completely gone)!**

How do we implement this in practice? Instead of using the original classifier to make predictions, we use the reduction trained with the demographic parity constraint. In the end, this means that we only need a couple of extra lines of code to create a fairer ML model. Of course, we did have to spend more time on the exploration phase.

### Postprocessing technique

We can also use fairlearn's postprocessing module, more specifically the *ThresholdOptimizer*. This optimizer determines a decision threshold for each group of the sensitive feature, using a trade-off between an objective (e.g. accuracy) and a constraint (e.g. demographic parity).

The ThresholdOptimizer only requires an estimator. By default, its objective is accuracy and its constraint is demographic parity. How to change these parameters is described [here](https://fairlearn.github.io/api_reference/fairlearn.postprocessing.html). Let's implement this:

In [16]:
from fairlearn.postprocessing import ThresholdOptimizer

postprocessor = ThresholdOptimizer(estimator = classifier)
postprocessor.fit(df, target, sensitive_features = gender)
# ThresholdOptimizer also needs the sensitive features at prediction time
prediction_threshold = postprocessor.predict(df, sensitive_features = gender)
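
The core idea behind group-specific thresholds can be sketched in a few lines: score everyone once, then pick a separate cut-off per group so that every group ends up with the same selection rate. This is a deliberately simplified, hypothetical version: fairlearn's ThresholdOptimizer additionally chooses the thresholds to maximize its objective (accuracy by default) and may randomize between two thresholds, neither of which this sketch does. The scores and the target rate of 0.5 are made up.

```python
def per_group_thresholds(scores, groups, target_rate):
    """Pick, per group, the score cut-off that selects about target_rate of that group."""
    thresholds = {}
    for g in sorted(set(groups)):
        group_scores = sorted((s for s, grp in zip(scores, groups) if grp == g), reverse=True)
        k = round(target_rate * len(group_scores))  # number of people to select in this group
        # Select the top k scores; with k == 0 nobody in the group is selected.
        # Tied scores at the cut-off would all be selected, so the rate could overshoot.
        thresholds[g] = group_scores[k - 1] if k > 0 else float("inf")
    return thresholds

def predict_with_thresholds(scores, groups, thresholds):
    return [int(s >= thresholds[g]) for s, g in zip(scores, groups)]

# Hypothetical scores, e.g. predicted probabilities of paying back the loan
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.35, 0.2, 0.1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

single = [int(s >= 0.5) for s in scores]  # one global threshold for everyone
thresholds = per_group_thresholds(scores, groups, target_rate=0.5)
adjusted = predict_with_thresholds(scores, groups, thresholds)
print(thresholds)
print(single)
print(adjusted)
```

With the single threshold, group A has a selection rate of 0.5 and group B only 0.25; with the per-group thresholds, both are 0.5. Because the cut-off depends on the group, the sensitive feature must be known at prediction time, which is why `predict` receives `sensitive_features` in the cell above.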

Let's see how well this performs:

In [17]:
FairlearnDashboard(y_true = target,
                   y_pred = {"prediction_original" : prediction,
                             "prediction_dp": prediction_dp,
                             "prediction_eo": prediction_eo,
                             "prediction_threshold": prediction_threshold},
                   sensitive_features = gender,
                   sensitive_feature_names = ["gender"])
