## Introduction to Fairlearn: Performing a GridSearch with Census Data

This notebook shows how to use Fairlearn and the Fairness dashboard to
generate predictors for the Census dataset. The task is binary
classification: given a range of data about roughly 48,800 individuals,
predict whether each person's annual income is above or below fifty
thousand dollars.

For the purposes of this notebook, we shall treat this as a loan
decision problem. We will pretend that the label indicates whether or
not each individual repaid a loan in the past. We will use the data to
train a predictor to predict whether previously unseen individuals will
repay a loan or not. The assumption is that the model predictions are
used to decide whether an individual should be offered a loan.

We will first train a fairness-unaware predictor and show that it leads
to unfair decisions under a specific notion of fairness called
*demographic parity*. We then mitigate unfairness by applying the
`GridSearch` algorithm from the Fairlearn package.


### Import the Required Libraries

In [1]:
import pandas as pd

from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression

from fairlearn.reductions import GridSearch
from fairlearn.reductions import DemographicParity, ErrorRate
from fairlearn.widget import FairlearnDashboard

%matplotlib inline

### Import the Dataset

In [2]:
data = fetch_openml(data_id=1590, as_frame=True)

X_raw = data.data
y = (data.target == '>50K') * 1

# Take a quick look at the data
print("{0} Observations x {1} Features".format(len(X_raw), len(X_raw.columns)))
X_raw.head()

48842 Observations x 14 Features


| | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25.0 | Private | 226802.0 | 11th | 7.0 | Never-married | Machine-op-inspct | Own-child | Black | Male | 0.0 | 0.0 | 40.0 | United-States |
| 1 | 38.0 | Private | 89814.0 | HS-grad | 9.0 | Married-civ-spouse | Farming-fishing | Husband | White | Male | 0.0 | 0.0 | 50.0 | United-States |
| 2 | 28.0 | Local-gov | 336951.0 | Assoc-acdm | 12.0 | Married-civ-spouse | Protective-serv | Husband | White | Male | 0.0 | 0.0 | 40.0 | United-States |
| 3 | 44.0 | Private | 160323.0 | Some-college | 10.0 | Married-civ-spouse | Machine-op-inspct | Husband | Black | Male | 7688.0 | 0.0 | 40.0 | United-States |
| 4 | 18.0 | | 103497.0 | Some-college | 10.0 | Never-married | | Own-child | White | Female | 0.0 | 0.0 | 30.0 | United-States |


### Separate the Sensitive Feature(s)
We treat **sex** as the sensitive feature (i.e., a feature that could lead to biased predictions). We store it separately as `A` and drop it from the training features.

In [3]:
A = X_raw["sex"]
X = X_raw.drop(labels=['sex'], axis=1)

### Perform Data Preprocessing

In [4]:
# One-Hot Encode the Categorical Features
X = pd.get_dummies(X)

# Scale the Numerical Features
sc = StandardScaler()
X_scaled = sc.fit_transform(X)

# Use the Preprocessed Independent Features (X) to Create a Pandas DataFrame
X_scaled = pd.DataFrame(X_scaled, columns=X.columns)

# Label Encode the Dependent (Target) Feature (y)
le = LabelEncoder()
y = le.fit_transform(y)

### Perform a Train/Test Split

In [5]:
X_train, X_test, y_train, y_test, A_train, A_test = train_test_split(X_scaled, y, A,
                                                                     test_size=0.2,
                                                                     random_state=0,
                                                                     stratify=y)

# Reset to a fresh 0..n-1 index after the shuffled split (works around an indexing bug)
X_train = X_train.reset_index(drop=True)
A_train = A_train.reset_index(drop=True)
X_test = X_test.reset_index(drop=True)
A_test = A_test.reset_index(drop=True)

### First, Train an Unmitigated (Fairness-Unaware) Predictor as a Baseline:
To demonstrate the effect of Fairlearn, we first:
- Train an ML predictor that does not mitigate the effects of bias
- Load the fairness-unaware predictor into the Fairness dashboard to assess its fairness

In [6]:
unmitigated_predictor = LogisticRegression(solver='liblinear', fit_intercept=True)
unmitigated_predictor.fit(X_train, y_train)

LogisticRegression(solver='liblinear')

In [7]:
FairlearnDashboard(sensitive_features=A_test, sensitive_feature_names=['sex'],
                   y_true=y_test,
                   y_pred={"unmitigated": unmitigated_predictor.predict(X_test)})

[Interactive Fairness dashboard widget for the unmitigated predictor. The cell also emits a warning that `FairlearnDashboard` will move out of the Fairlearn package.]

Looking at the disparity in accuracy, we see that the error rate for
males is about three times that for females. More interesting is the
disparity in opportunity: males are offered loans at three times the
rate of females.
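
The two quantities read off the dashboard here, the per-group selection rate (the fraction offered a loan) and the per-group error rate, can also be computed by hand. The following is a minimal, dependency-free sketch; the helper name `group_rates` and the toy data are made up for illustration and are not taken from the Census dataset.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Return {group: (selection_rate, error_rate)} for binary 0/1 labels."""
    counts = defaultdict(lambda: [0, 0, 0])  # [n, n_selected, n_errors]
    for truth, pred, group in zip(y_true, y_pred, groups):
        stats = counts[group]
        stats[0] += 1
        stats[1] += pred                # prediction 1 means "offer the loan"
        stats[2] += int(truth != pred)  # misclassified example
    return {g: (sel / n, err / n) for g, (n, sel, err) in counts.items()}


# Toy data (hypothetical, for illustration only)
y_true = [1, 0, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0]
groups = ["Male", "Male", "Male", "Male",
          "Female", "Female", "Female", "Female"]

rates = group_rates(y_true, y_pred, groups)
for group, (selection_rate, error_rate) in rates.items():
    print(f"{group}: selection rate {selection_rate:.2f}, "
          f"error rate {error_rate:.2f}")
```

In the notebook itself, the same function could be applied to `y_test`, the unmitigated predictions on `X_test`, and `A_test`.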

Although we removed the sensitive feature from the training data, our
predictor still discriminates based on sex. This demonstrates that
simply ignoring a sensitive feature when fitting a predictor rarely
eliminates unfairness. There will generally be enough other features
correlated with the removed feature to lead to disparate impact.
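
A toy example may make the proxy effect concrete. In the sketch below, the decision rule never sees the sensitive feature, yet it selects the two groups at different rates because it relies on a correlated feature. The records and the `proxy` column are invented for illustration; `proxy` stands in for any feature that correlates with sex.

```python
# Toy records (hypothetical): each is (sex, proxy).
records = [
    ("Male", 1), ("Male", 1), ("Male", 1), ("Male", 0),
    ("Female", 1), ("Female", 0), ("Female", 0), ("Female", 0),
]


def sex_blind_rule(proxy):
    """A decision rule that only looks at the proxy, never at sex."""
    return proxy  # approve exactly when the proxy equals 1


selection_rate = {}
for sex in ("Male", "Female"):
    decisions = [sex_blind_rule(proxy) for s, proxy in records if s == sex]
    selection_rate[sex] = sum(decisions) / len(decisions)

print(selection_rate)  # {'Male': 0.75, 'Female': 0.25}
```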


### Then, Mitigate Bias using Fairlearn GridSearch
We supply a standard ML estimator, which is treated as a black box. `GridSearch` generates a sequence of relabellings and reweightings of the training data and trains one predictor for each.
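
One way to picture a single "relabelling and reweighting" step is sketched below. It assumes that each training example receives a signed weight made of an accuracy term, `2 * y - 1`, plus a per-group fairness term; the sign then becomes the new label and the magnitude becomes the sample weight. The function name and the `adjustment` values are hypothetical stand-ins for what one point of the grid might contribute. This illustrates the idea only and is not Fairlearn's actual implementation.

```python
def relabel_and_reweight(y, groups, adjustment):
    """Turn signed weights into (new_label, sample_weight) pairs.

    adjustment: hypothetical per-group fairness term (an assumption of
    this sketch, not a value computed by Fairlearn).
    """
    pairs = []
    for label, group in zip(y, groups):
        # Accuracy term (+1 for y=1, -1 for y=0) plus the fairness term
        signed = (2 * label - 1) + adjustment[group]
        # Sign gives the new label, magnitude gives the sample weight
        pairs.append((int(signed > 0), abs(signed)))
    return pairs


y = [1, 0, 1, 0]
groups = ["Male", "Male", "Female", "Female"]
adjustment = {"Male": -1.5, "Female": 1.5}  # made-up values

pairs = relabel_and_reweight(y, groups, adjustment)
print(pairs)  # [(0, 0.5), (0, 2.5), (1, 2.5), (1, 0.5)]
```

With all adjustments set to zero, the labels are unchanged and every weight equals 1, which corresponds to ordinary training. Any estimator whose `fit()` accepts sample weights could then be trained on the relabelled data.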

For this example, demographic parity (on the sensitive feature of sex) is specified as the fairness metric. Demographic parity requires that individuals are offered the opportunity (are approved for a loan in this example) independent of membership in the sensitive class (i.e., females and males should be offered loans at the same rate).
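
Written as a formula, and using notation that is assumed here rather than taken from the notebook ($h$ for the predictor, $X$ for the features, $A$ for the sensitive feature), demographic parity requires

$$
P[h(X) = 1 \mid A = a] = P[h(X) = 1] \quad \text{for every group } a .
$$

One common way to summarize how far a predictor is from satisfying this is the largest gap between a group's selection rate and the overall selection rate:

$$
\Delta(h) = \max_{a} \, \bigl| \, P[h(X) = 1 \mid A = a] - P[h(X) = 1] \, \bigr| .
$$

In this example, $a$ ranges over the two values of `sex`, and $\Delta(h) = 0$ means that females and males are offered loans at the same rate.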


In [8]:
sweep = GridSearch(LogisticRegression(solver='liblinear', fit_intercept=True),
                   constraints=DemographicParity(), grid_size=71)

Fairlearn's algorithms provide `fit()` and `predict()` methods, so they
behave in a similar manner to estimators from other ML packages in
Python. We do, however, have to pass one extra argument to `fit()`: the
column of sensitive feature labels. The number of predictors to generate
in the sweep was already set through `grid_size` when we constructed the
`GridSearch` object.

After `fit()` completes, we extract the full set of predictors from the
`fairlearn.reductions.GridSearch` object.


In [9]:
sweep.fit(X_train, y_train, sensitive_features=A_train)
predictors = sweep.predictors_

We could load these predictors into the Fairness dashboard now. However,
the plot would be somewhat confusing due to their number. In this case,
we are going to remove the predictors which are dominated in the
error-disparity space by others from the sweep (note that the disparity
will only be calculated for the sensitive feature; other potentially
sensitive features will not be mitigated). In general, one might not
want to do this, since there may be other considerations beyond the
strict optimization of error and disparity (of the given sensitive
feature).


In [10]:
errors, disparities = [], []
for m in predictors:
    # Wrap the predictor in a callable, which is what gamma() expects
    def classifier(X): return m.predict(X)

    # Error rate of this predictor on the training data
    error = ErrorRate()
    error.load_data(X_train, pd.Series(y_train), sensitive_features=A_train)
    # Demographic parity violation of this predictor on the training data
    disparity = DemographicParity()
    disparity.load_data(X_train, pd.Series(y_train), sensitive_features=A_train)

    errors.append(error.gamma(classifier)[0])
    # Keep the largest violation across the groups
    disparities.append(disparity.gamma(classifier).max())

all_results = pd.DataFrame({"predictor": predictors, "error": errors, "disparity": disparities})

# Keep a predictor only if no predictor with lower-or-equal disparity has a lower error
non_dominated = []
for row in all_results.itertuples():
    errors_for_lower_or_eq_disparity = all_results["error"][all_results["disparity"] <= row.disparity]
    if row.error <= errors_for_lower_or_eq_disparity.min():
        non_dominated.append(row.predictor)
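
The filtering rule in the cell above can be tried out in isolation. The sketch below applies the same rule to a handful of made-up (error, disparity) pairs, without pandas; the function name `non_dominated_points` and the numbers are illustrative only.

```python
def non_dominated_points(points):
    """Keep (error, disparity) pairs whose error is minimal among all
    points with lower-or-equal disparity."""
    kept = []
    for error, disparity in points:
        best_error = min(e for e, d in points if d <= disparity)
        if error <= best_error:
            kept.append((error, disparity))
    return kept


# Hypothetical (error, disparity) pairs, not results from the sweep
points = [(0.15, 0.20), (0.16, 0.10), (0.18, 0.12), (0.20, 0.02), (0.17, 0.20)]

front = non_dominated_points(points)
print(front)  # [(0.15, 0.2), (0.16, 0.1), (0.2, 0.02)]
```

The point (0.18, 0.12) is dropped because (0.16, 0.10) has both lower error and lower disparity, and (0.17, 0.20) is dropped because (0.15, 0.20) has lower error at the same disparity.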

Finally, we can put the non-dominated models into the Fairness
dashboard, along with the unmitigated model. They appear there under the
keys `dominant_model_0`, `dominant_model_1`, and so on.


In [11]:
dashboard_predicted = {"unmitigated": unmitigated_predictor.predict(X_test)}
for i, predictor in enumerate(non_dominated):
    key = "dominant_model_{0}".format(i)
    dashboard_predicted[key] = predictor.predict(X_test)


FairlearnDashboard(sensitive_features=A_test, sensitive_feature_names=['sex'],
                   y_true=y_test, y_pred=dashboard_predicted)

[Interactive Fairness dashboard widget comparing the unmitigated predictor with the non-dominated predictors. The cell again emits a warning that `FairlearnDashboard` will move out of the Fairlearn package.]

We see a Pareto front forming: the set of predictors which represent
optimal tradeoffs between accuracy and disparity in predictions. In the
ideal case, we would have a predictor at (1,0), that is, accuracy 1 and
disparity 0: perfectly accurate and without any unfairness under
demographic parity (with respect to the sensitive feature "sex"). The
Pareto front represents the closest we can come to this ideal based on
our data and choice of estimator. Note the range of the axes: the
disparity axis spans a wider range of values than the accuracy axis, so
we can reduce disparity substantially for a small loss in accuracy.

By clicking on individual models on the plot, we can inspect their
metrics for disparity and accuracy in greater detail. In a real example,
we would then pick the model that represents the best trade-off between
accuracy and disparity given the relevant business constraints.
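
As one possible way to make such a choice explicit, the sketch below picks the lowest-error model whose disparity stays under a chosen cap. The cap, the helper `pick_model`, and the error and disparity numbers are all hypothetical; a real decision would depend on the business context and might weigh other criteria as well.

```python
def pick_model(candidates, max_disparity):
    """Return the lowest-error candidate whose disparity is within the
    cap, or None if no candidate qualifies."""
    eligible = [c for c in candidates if c["disparity"] <= max_disparity]
    return min(eligible, key=lambda c: c["error"], default=None)


# Made-up metrics, for illustration only
candidates = [
    {"name": "unmitigated", "error": 0.15, "disparity": 0.20},
    {"name": "dominant_model_0", "error": 0.16, "disparity": 0.10},
    {"name": "dominant_model_1", "error": 0.20, "disparity": 0.02},
]

chosen = pick_model(candidates, max_disparity=0.10)
print(chosen["name"])  # dominant_model_0
```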
