# Fairlearn Demonstration

<img src="https://fairlearn.github.io/_static/images/header-image.png" width="400">

> https://fairlearn.github.io/

The Fairlearn package has two components:

- A dashboard for assessing which groups are negatively impacted by a model, and for comparing multiple models in terms of various fairness and accuracy metrics.
- Algorithms for mitigating unfairness in a variety of AI tasks and along a variety of fairness definitions.
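
To make the first component concrete, here is a minimal pure-Python sketch of one kind of quantity a fairness assessment compares across groups: the selection rate (the share of positive predictions) per group, and the gap between the highest and lowest rate. The helper names `selection_rates` and `selection_rate_gap` and the toy data are hypothetical, written for illustration only; they are not Fairlearn functions.

```python
from collections import defaultdict


def selection_rates(y_pred, groups):
    """Return the fraction of positive (== 1) predictions within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {group: positives[group] / totals[group] for group in totals}


def selection_rate_gap(y_pred, groups):
    """Return the difference between the highest and lowest group selection rate."""
    rates = selection_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical toy predictions and group memberships (not taken from the census data)
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["male", "male", "male", "male", "female", "female", "female", "female"]

rates = selection_rates(y_pred, groups)
gap = selection_rate_gap(y_pred, groups)
print(rates)
print(gap)
```

A gap of zero would mean every group receives positive predictions at the same rate; larger gaps point to groups that the model selects less often.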

In [1]:
import sys
sys.version

'3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) \n[GCC 7.3.0]'

In [2]:
import datetime
now = datetime.datetime.now()
print(now)

2020-07-08 08:25:31.084447


In [3]:
#!pip install azureml-contrib-fairness

In [4]:
#!pip install fairlearn==0.4.6

In [5]:
from sklearn.model_selection import train_test_split
from fairlearn.widget import FairlearnDashboard
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder, StandardScaler
import pandas as pd
import shap

# Load the census (Adult) dataset
X_raw, Y = shap.datasets.adult()

# Inspect how many rows fall under each "Race" code
# (in a notebook this is only displayed when it is the last expression in the cell)
X_raw["Race"].value_counts().to_dict()

# (Optional) Separate the sensitive features "Sex" and "Race" and drop them from the data used to train the model
A = X_raw[['Sex', 'Race']]
X = X_raw.drop(labels=['Sex', 'Race'], axis=1)

# One-hot encode any categorical columns, then standardize the features
X = pd.get_dummies(X)
sc = StandardScaler()
X_scaled = sc.fit_transform(X)
X_scaled = pd.DataFrame(X_scaled, columns=X.columns)

# Encode the target labels as integers
le = LabelEncoder()
Y = le.fit_transform(Y)

# Split features, labels, and sensitive features into train and test sets,
# stratified on the label so both sets keep the same class balance
X_train, X_test, Y_train, Y_test, A_train, A_test = train_test_split(
    X_scaled, Y, A, test_size=0.2, random_state=0, stratify=Y)

In [6]:
# Work around indexing issue
X_train = X_train.reset_index(drop=True)
A_train = A_train.reset_index(drop=True)
X_test = X_test.reset_index(drop=True)
A_test = A_test.reset_index(drop=True)

# Replace the integer codes with readable labels.
# DataFrame.assign returns a new frame, which avoids the SettingWithCopyWarning
# raised by chained assignment such as A_test.Sex.loc[mask] = value.
A_test = A_test.assign(
    Sex=A_test['Sex'].map({0: 'female', 1: 'male'}),
    Race=A_test['Race'].map({
        0: 'Amer-Indian-Eskimo',
        1: 'Asian-Pac-Islander',
        2: 'Black',
        3: 'Other',
        4: 'White',
    }),
)

# Train a classification model
lr_predictor = LogisticRegression(solver='liblinear', fit_intercept=True)
lr_predictor.fit(X_train, Y_train)


LogisticRegression(solver='liblinear')

In [7]:
# (Optional) View this model in Fairlearn's fairness dashboard and inspect the disparities between groups
# (FairlearnDashboard was already imported in the first code cell)
FairlearnDashboard(sensitive_features=A_test, 
                   sensitive_feature_names=['Sex', 'Race'],
                   y_true=Y_test,
                   y_pred={"lr_model": lr_predictor.predict(X_test)})

FairlearnWidget(value={'true_y': [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0…

<fairlearn.widget._fairlearn_dashboard.FairlearnDashboard at 0x7fb98b77e278>
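
The dashboard is interactive, so its results do not survive in a static export of the notebook. As a rough, non-interactive counterpart, the sketch below tabulates accuracy per group with plain pandas. The data frame is hypothetical toy data, and the column names `y_true`, `y_pred`, and `correct` are assumptions made for this illustration rather than anything the dashboard exposes.

```python
import pandas as pd

# Hypothetical toy data standing in for the test labels, predictions, and one sensitive feature
df = pd.DataFrame({
    "Sex":    ["female", "female", "female", "male", "male", "male", "male", "female"],
    "y_true": [0, 1, 1, 0, 1, 1, 0, 0],
    "y_pred": [0, 0, 1, 0, 1, 1, 0, 0],
})

# Mark each row as correctly (1) or incorrectly (0) classified
df["correct"] = (df["y_true"] == df["y_pred"]).astype(int)

# Group size and accuracy for each value of the sensitive feature
summary = df.groupby("Sex").agg(
    n=("y_true", "size"),
    accuracy=("correct", "mean"),
)

# Disparity in accuracy: best-served group minus worst-served group
accuracy_gap = summary["accuracy"].max() - summary["accuracy"].min()

print(summary)
print(accuracy_gap)
```

In this notebook the same table could presumably be built from `A_test["Sex"]`, `Y_test`, and `lr_predictor.predict(X_test)`, assuming the three are aligned row by row as they are after the index reset above.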