# Tutorial on Building Fair AI Models

## Presenter: Moninder Singh, IBM Research AI

## PyData New York, 2018

<a id="toc"></a>

## Table of Contents

[1. Summary](#summary)<br>
[2. Data Used](#data_used)<br>
[3. Learning model from Adult dataset to predict income](#2015-data)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[3.1. Load data & create splits for learning/validating/testing model](#2015-data-load)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[3.2. Learning model from original data](#original-2015-data)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[3.2.1. Metrics for original data](#original-2015-metrics)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[3.2.2. Learning Logistic Regression (LR) classifier from original data](#lr_orig)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[3.2.2.1. Training LR model from original data](#lr-train)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[3.2.2.2. Validating LR model from original data](#lr-validate)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[3.2.2.3. Testing LR model from original data](#lr-test)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[3.3. Bias Mitigation using Post-Processing technique - Reject Option Classifier](#reweighing-2015)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[3.3.1. Estimate optimal parameters for the ROC method](#reweighing-2015-transform)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[3.3.2. Transforming/Evaluating model](#lr_transf)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[3.3.2.1. Transforming and Computing Predictions from Validation Set](#lr-rw-validate)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[3.3.2.2. Transforming and Computing Predictions from Test Set](#lr-rw-test)<br>


<a id="summary"></a>

## 1. Summary

[Back to TOC](#toc)<br>

This notebook demonstrates how a variety of fairness metrics and bias mitigation algorithms can be used to detect and reduce bias when learning classifiers.

Additional information is available in the accompanying [presentation](https://github.com/monindersingh/pydata2018_fairAI_models_tutorial/blob/master/pydata_nyc_2018_building_fair_AI_models.pdf).

An open source Python toolkit, [AI Fairness 360 (AIF360)](https://github.com/IBM/AIF360), is used for this demonstration. This toolkit provides additional [examples](https://github.com/IBM/AIF360/tree/master/examples) of bias detection/mitigation as well as [guidance](http://aif360.mybluemix.net/resources#guidance) on figuring out which metrics and algorithms are most appropriate for a given problem.

Bias detection is demonstrated using several metrics, including disparate impact, average odds difference, statistical parity difference, equal opportunity difference, and Theil index.
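
As a rough illustration of what the group metrics listed above measure, the following is a minimal pure-Python sketch on made-up toy data. The function names (`rate`, `group_rates`, `fairness_metrics`) are hypothetical helpers written for this example and are not part of AIF360; the notebook itself obtains these values from `ClassificationMetric`. Each difference is taken as unprivileged minus privileged, so negative values indicate a disadvantage for the unprivileged group.

```python
def rate(flags):
    """Fraction of True entries in a list of booleans (0.0 if the list is empty)."""
    return sum(flags) / len(flags) if flags else 0.0


def group_rates(y_true, y_pred, group, g):
    """Selection rate, true positive rate and false positive rate for group g."""
    idx = [i for i, a in enumerate(group) if a == g]
    selection = rate([y_pred[i] == 1 for i in idx])
    tpr = rate([y_pred[i] == 1 for i in idx if y_true[i] == 1])
    fpr = rate([y_pred[i] == 1 for i in idx if y_true[i] == 0])
    return selection, tpr, fpr


def fairness_metrics(y_true, y_pred, group, unpriv=0, priv=1):
    """Group fairness metrics; assumes the privileged selection rate is nonzero."""
    sel_u, tpr_u, fpr_u = group_rates(y_true, y_pred, group, unpriv)
    sel_p, tpr_p, fpr_p = group_rates(y_true, y_pred, group, priv)
    return {
        "statistical_parity_difference": sel_u - sel_p,
        "disparate_impact": sel_u / sel_p,
        "equal_opportunity_difference": tpr_u - tpr_p,
        "average_odds_difference": 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p)),
    }


# Toy data (illustrative only): 4 unprivileged (0) and 4 privileged (1) individuals
group = [0, 0, 0, 0, 1, 1, 1, 1]
y_true = [1, 0, 1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]

toy_metrics = fairness_metrics(y_true, y_pred, group)
for name, value in toy_metrics.items():
    print("%s = %.4f" % (name, value))
```

A value of 0 for the difference metrics, or 1 for disparate impact, would indicate parity between the two groups on this toy data.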

Bias mitigation is explored via reject option classification, a post-processing technique that adjusts a trained classifier's predictions rather than the training data or the model itself.


<a id="use_case"></a>

<a id="data_used"></a>

## 2. Data used

[Back to TOC](#toc)<br>

#### Adult Dataset

Source: [https://archive.ics.uci.edu/ml/datasets/adult](https://archive.ics.uci.edu/ml/datasets/adult).

This dataset is used to predict whether income exceeds $50K/yr. 

The dataset contains 48842 instances with a mix of continuous and discrete
attributes. `adult.data` contains the first 32561 instances, which comprise the
original training set, while `adult.test` contains an additional 16281 instances,
which comprise the original test set.



In [1]:
# Load all necessary packages
import sys
sys.path.append("../")
import numpy as np

#Datasets
from aif360.datasets import BinaryLabelDataset
from aif360.datasets import AdultDataset



#Fairness Metrics
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.metrics import ClassificationMetric

#Scalers
from sklearn.preprocessing import StandardScaler

from sklearn.metrics import accuracy_score

#Classifiers
from sklearn.linear_model import LogisticRegression

#Bias Mitigation Techniques
from aif360.algorithms.postprocessing.reject_option_classification\
        import RejectOptionClassification

#Utils
from aif360.metrics.utils import compute_boolean_conditioning_vector
from aif360.algorithms.preprocessing.optim_preproc_helpers.data_preproc_functions\
        import load_preproc_data_adult, load_preproc_data_german, load_preproc_data_compas

from IPython.display import Markdown, display
%matplotlib inline

import matplotlib.pyplot as plt
from ipywidgets import interactive, FloatSlider

#LIME
from aif360.datasets.lime_encoder import LimeEncoder
import lime
import lime.lime_tabular

from tqdm import tqdm
from warnings import warn

<a id="2015-data"></a>

## 3. Learning model from Adult dataset to predict income

[Back to TOC](#toc)<br>

<a id="2015-data-load"></a>

### 3.1. Load data & create splits for learning/validating/testing model

[Back to TOC](#toc)<br>

In [2]:
# Get the dataset and split into train, validate, and test

np.random.seed(1)

privileged_groups = [{'sex': 1}]
unprivileged_groups = [{'sex': 0}]
adult_orig = load_preproc_data_adult(['sex'])

metric_name = "Statistical parity difference"

# Upper and lower bound on the fairness metric used
metric_ub = 0.05
metric_lb = -0.05


adult_orig_train, adult_orig_validate, adult_orig_test = \
                                    adult_orig.split([0.7,0.85], shuffle=True)

sens_attr = adult_orig_train.protected_attribute_names[0]


**Show Adult dataset details**

In [3]:
# print out some labels, names, etc.
display(Markdown("#### Training Dataset shape"))
print(adult_orig_train.features.shape)
display(Markdown("#### Validation Dataset shape"))
print(adult_orig_validate.features.shape)
display(Markdown("#### Test Dataset shape"))
print(adult_orig_test.features.shape)
display(Markdown("#### Favorable and unfavorable labels"))
print(adult_orig_train.favorable_label, adult_orig_train.unfavorable_label)
print(adult_orig_train.metadata['label_maps'][0][adult_orig_train.favorable_label],\
      adult_orig_train.metadata['label_maps'][0][adult_orig_train.unfavorable_label])
display(Markdown("#### Protected attribute names"))
print(adult_orig_train.protected_attribute_names)
display(Markdown("#### Privileged and unprivileged protected attribute values"))
print(adult_orig_train.privileged_protected_attributes, 
      adult_orig_train.unprivileged_protected_attributes)
print(adult_orig_train.metadata['protected_attribute_maps'][0][adult_orig_train.privileged_protected_attributes[0][0]],
      adult_orig_train.metadata['protected_attribute_maps'][0][adult_orig_train.unprivileged_protected_attributes[0][0]])


display(Markdown("#### Dataset feature names"))
print(adult_orig_train.feature_names)

#### Training Dataset shape

(34189, 18)


#### Validation Dataset shape

(7326, 18)


#### Test Dataset shape

(7327, 18)
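
The three shapes above are consistent with reading `[0.7, 0.85]` as cumulative cut points over the 48842 instances. The sketch below reproduces the row counts under the assumption that each fraction is converted to a cut index by truncation; it only checks the arithmetic and does not call AIF360.

```python
n_total = 48842              # instances in the Adult dataset
split_points = [0.7, 0.85]   # cumulative fractions passed to split()

# Assumption: each fraction becomes a cut index by truncating toward zero
cuts = [int(p * n_total) for p in split_points]
bounds = [0] + cuts + [n_total]

# Size of each consecutive slice: train, validation, test
sizes = [hi - lo for lo, hi in zip(bounds, bounds[1:])]
print(sizes)  # [34189, 7326, 7327]
```

So the first 70% of the shuffled rows go to training, the next 15% to validation, and the remaining 15% to testing.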


#### Favorable and unfavorable labels

1.0 0.0
>50K <=50K


#### Protected attribute names

['sex']


#### Privileged and unprivileged protected attribute values

[array([1.])] [array([0.])]
Male Female


#### Dataset feature names

['race', 'sex', 'Age (decade)=10', 'Age (decade)=20', 'Age (decade)=30', 'Age (decade)=40', 'Age (decade)=50', 'Age (decade)=60', 'Age (decade)=>=70', 'Education Years=6', 'Education Years=7', 'Education Years=8', 'Education Years=9', 'Education Years=10', 'Education Years=11', 'Education Years=12', 'Education Years=<6', 'Education Years=>12']


<a id="original-2015-data"></a>

### 3.2. Learning model from original data

[Back to TOC](#toc)<br>

<a id="original-2015-metrics"></a>

#### 3.2.1. Metrics for original data

In [4]:
# Metric for the original dataset
sens_idx = adult_orig_train.protected_attribute_names.index(sens_attr)
privileged_groups =  [{sens_attr:adult_orig_train.privileged_protected_attributes[sens_idx][0]}]
unprivileged_groups = [{sens_attr:adult_orig_train.unprivileged_protected_attributes[sens_idx][0]}]
metric_adult_orig_train = BinaryLabelDatasetMetric(adult_orig_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
display(Markdown("#### Original training dataset"))
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % \
                                                  metric_adult_orig_train.mean_difference())
print("Disparate impact (ratio of unprivileged favorable mean to privileged favorable mean) = %f" % \
                                                  metric_adult_orig_train.disparate_impact())


#### Original training dataset

Difference in mean outcomes between unprivileged and privileged groups = -0.190698
Disparate impact (ratio of unprivileged favorable mean to privileged favorable mean) = 0.366580
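
The two numbers above are a difference and a ratio of the same pair of base rates, so the base rates themselves can be recovered from them. The sketch below does this arithmetic, assuming the mean difference is `p_unpriv - p_priv` and the disparate impact is `p_unpriv / p_priv`, as the printed descriptions state. The variable names are chosen for this example only.

```python
mean_difference = -0.190698    # p_unpriv - p_priv (printed above)
disparate_impact = 0.366580    # p_unpriv / p_priv (printed above)

# Substituting p_unpriv = disparate_impact * p_priv into the difference gives
#   p_priv * (disparate_impact - 1) = mean_difference
p_priv = mean_difference / (disparate_impact - 1.0)
p_unpriv = disparate_impact * p_priv

print("Favorable rate, privileged group (Male)     = %.3f" % p_priv)
print("Favorable rate, unprivileged group (Female) = %.3f" % p_unpriv)
```

Up to rounding of the printed values, this corresponds to roughly 30% of the privileged group and 11% of the unprivileged group carrying the favorable label in the training split.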


<a id="lr_orig"></a>

#### 3.2.2. Learning Logistic Regression (LR) classifier from original data

<a id="lr-train"></a>

#### 3.2.2.1. Training LR model from original data

In [5]:
#Train model on given dataset

dataset = adult_orig_train  # data to train on

scale = StandardScaler().fit(dataset.features)   # remember the scale

model = LogisticRegression()       # model to learn

X_train = scale.transform(dataset.features)      #apply the scale
y_train = dataset.labels.ravel()


model.fit(X_train, y_train,
        sample_weight=dataset.instance_weights)
y_train_pred = model.predict(X_train)

#save model
adult_orig_lr = model
adult_orig_lr_scale = scale

# positive class index
pos_ind = np.where(model.classes_ == adult_orig_train.favorable_label)[0][0]

adult_orig_train_pred = adult_orig_train.copy(deepcopy=True)
adult_orig_train_pred.labels = y_train_pred



#### Obtain scores for validation and test sets

In [6]:
adult_orig_validate_pred = adult_orig_validate.copy(deepcopy=True)
X_valid = adult_orig_lr_scale.transform(adult_orig_validate_pred.features)
y_valid = adult_orig_validate_pred.labels
adult_orig_validate_pred.scores = adult_orig_lr.predict_proba(X_valid)[:,pos_ind].reshape(-1,1)

adult_orig_test_pred = adult_orig_test.copy(deepcopy=True)
X_test = adult_orig_lr_scale.transform(adult_orig_test_pred.features)
y_test = adult_orig_test_pred.labels
adult_orig_test_pred.scores = adult_orig_lr.predict_proba(X_test)[:,pos_ind].reshape(-1,1)

### Find the optimal parameters from the validation set

#### Best threshold for classification only (no fairness)

In [7]:
num_thresh = 100
ba_arr = np.zeros(num_thresh)
class_thresh_arr = np.linspace(0.01, 0.99, num_thresh)
for idx, class_thresh in enumerate(class_thresh_arr):
    
    fav_inds = adult_orig_validate_pred.scores > class_thresh
    adult_orig_validate_pred.labels[fav_inds] = adult_orig_validate_pred.favorable_label
    adult_orig_validate_pred.labels[~fav_inds] = adult_orig_validate_pred.unfavorable_label
    
    classified_metric_orig_valid = ClassificationMetric(adult_orig_validate,
                                             adult_orig_validate_pred, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
    
    ba_arr[idx] = 0.5*(classified_metric_orig_valid.true_positive_rate()\
                       +classified_metric_orig_valid.true_negative_rate())

best_ind = np.where(ba_arr == np.max(ba_arr))[0][0]
best_class_thresh = class_thresh_arr[best_ind]

print("Best balanced accuracy (no fairness constraints) = %.4f" % np.max(ba_arr))
print("Optimal classification threshold (no fairness constraints) = %.4f" % best_class_thresh)

Best balanced accuracy (no fairness constraints) = 0.7500
Optimal classification threshold (no fairness constraints) = 0.2377
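
The threshold search above can be illustrated without AIF360. The following is a minimal sketch on made-up scores: it sweeps the same grid of 100 thresholds between 0.01 and 0.99 and keeps the first threshold that maximizes balanced accuracy, the mean of the true positive and true negative rates. The helper names `balanced_accuracy` and `best_threshold` are hypothetical and exist only in this example.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of true positive rate and true negative rate (both classes must be present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def best_threshold(y_true, scores, thresholds):
    """Return (threshold, balanced accuracy) for the first threshold with the highest score."""
    best_th, best_ba = None, -1.0
    for th in thresholds:
        y_pred = [1 if s > th else 0 for s in scores]
        ba = balanced_accuracy(y_true, y_pred)
        if ba > best_ba:  # strict comparison keeps the first maximum
            best_th, best_ba = th, ba
    return best_th, best_ba


# Toy scores and labels (illustrative only)
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.60, 0.80]
y_true = [0, 0, 0, 0, 1, 0, 1, 1]

# Same grid as np.linspace(0.01, 0.99, 100)
num_thresh = 100
thresholds = [0.01 + i * (0.99 - 0.01) / (num_thresh - 1) for i in range(num_thresh)]

toy_thresh, toy_ba = best_threshold(y_true, scores, thresholds)
print("Best balanced accuracy = %.4f at threshold %.4f" % (toy_ba, toy_thresh))
```

On this toy data the best threshold falls between the scores 0.20 and 0.30, where all positives are recovered at the cost of one false positive.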


<a id="lr-validate"></a>

#### 3.2.2.2. Validating LR model from original data

In [8]:
# Metrics function
from collections import OrderedDict
from aif360.metrics import ClassificationMetric

def compute_metrics(dataset_true, dataset_pred, 
                    unprivileged_groups, privileged_groups,
                    disp = True):
    """ Compute the key metrics """
    classified_metric_pred = ClassificationMetric(dataset_true,
                                                 dataset_pred, 
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)
    metrics = OrderedDict()
    metrics["Balanced accuracy"] = 0.5*(classified_metric_pred.true_positive_rate()+
                                             classified_metric_pred.true_negative_rate())
    metrics["Statistical parity difference"] = classified_metric_pred.statistical_parity_difference()
    metrics["Disparate impact"] = classified_metric_pred.disparate_impact()
    metrics["Average odds difference"] = classified_metric_pred.average_odds_difference()
    metrics["Equal opportunity difference"] = classified_metric_pred.equal_opportunity_difference()
    metrics["Theil index"] = classified_metric_pred.theil_index()
    
    if disp:
        for k in metrics:
            print("%s = %.4f" % (k, metrics[k]))
    
    return metrics

In [9]:
# Metrics for the validation set
fav_inds = adult_orig_validate_pred.scores > best_class_thresh
adult_orig_validate_pred.labels[fav_inds] = adult_orig_validate_pred.favorable_label
adult_orig_validate_pred.labels[~fav_inds] = adult_orig_validate_pred.unfavorable_label

display(Markdown("#### Validation set"))
display(Markdown("##### Raw predictions - No fairness constraints, only maximizing balanced accuracy"))

metric_valid_bef = compute_metrics(adult_orig_validate, adult_orig_validate_pred, 
                unprivileged_groups, privileged_groups)

#### Validation set

##### Raw predictions - No fairness constraints, only maximizing balanced accuracy

Balanced accuracy = 0.7500
Statistical parity difference = -0.3842
Disparate impact = 0.2590
Average odds difference = -0.3279
Equal opportunity difference = -0.3746
Theil index = 0.1074
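
Unlike the other metrics, the Theil index measures inequality across individuals rather than between the two groups. The sketch below assumes it is the generalized entropy index with alpha = 1 applied to per-individual benefits `b_i = y_pred_i - y_true_i + 1`, which is my understanding of how AIF360 defines it; treat that definition, and the helper name `theil_index`, as assumptions of this example.

```python
import math


def theil_index(y_true, y_pred):
    """Theil index over benefits b_i = y_pred_i - y_true_i + 1 (assumed definition).

    b_i is 2 for a false positive, 1 for a correct prediction and 0 for a
    false negative. Terms with b_i == 0 contribute 0, following the usual
    convention 0 * log(0) = 0. Assumes at least one benefit is nonzero.
    """
    benefits = [1 + p - t for t, p in zip(y_true, y_pred)]
    mean_benefit = sum(benefits) / len(benefits)
    total = 0.0
    for b in benefits:
        if b > 0:
            ratio = b / mean_benefit
            total += ratio * math.log(ratio)
    return total / len(benefits)


# Toy labels (illustrative only): one false positive and one false negative
y_true = [1, 0, 1, 0]
y_pred = [1, 1, 0, 0]

toy_theil = theil_index(y_true, y_pred)
perfect_theil = theil_index(y_true, y_true)
print("Theil index (toy predictions)     = %.4f" % toy_theil)
print("Theil index (perfect predictions) = %.4f" % perfect_theil)
```

Under this definition, a value of 0 means every individual receives the same benefit, and larger values mean the errors are distributed more unevenly.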


<a id="lr-test"></a>

#### 3.2.2.3. Testing LR model from original data

In [10]:
# Metrics for the test set
fav_inds = adult_orig_test_pred.scores > best_class_thresh
adult_orig_test_pred.labels[fav_inds] = adult_orig_test_pred.favorable_label
adult_orig_test_pred.labels[~fav_inds] = adult_orig_test_pred.unfavorable_label

display(Markdown("#### Test set"))
display(Markdown("##### Raw predictions - No fairness constraints, only maximizing balanced accuracy"))

metric_test_bef = compute_metrics(adult_orig_test, adult_orig_test_pred, 
                unprivileged_groups, privileged_groups)

#### Test set

##### Raw predictions - No fairness constraints, only maximizing balanced accuracy

Balanced accuracy = 0.7422
Statistical parity difference = -0.3908
Disparate impact = 0.2689
Average odds difference = -0.3386
Equal opportunity difference = -0.3780
Theil index = 0.1101


<a id="rf_orig"></a>

<a id="rf-train"></a>

<a id="reweighing-2015"></a>

### 3.3. Bias Mitigation using Post-Processing technique - Reject Option Classifier

[Back to TOC](#toc)<br>

<a id="reweighing-2015-transform"></a>

#### 3.3.1. Estimate optimal parameters for the ROC method

In [11]:
ROC = RejectOptionClassification(unprivileged_groups=unprivileged_groups, 
                                 privileged_groups=privileged_groups, 
                                 low_class_thresh=0.01, high_class_thresh=0.99,
                                  num_class_thresh=100, num_ROC_margin=50,
                                  metric_name=metric_name,
                                  metric_ub=metric_ub, metric_lb=metric_lb)
ROC = ROC.fit(adult_orig_validate, adult_orig_validate_pred)

In [12]:
print("Optimal classification threshold (with fairness constraints) = %.4f" % ROC.classification_threshold)
print("Optimal ROC margin = %.4f" % ROC.ROC_margin)

Optimal classification threshold (with fairness constraints) = 0.5049
Optimal ROC margin = 0.1819
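
With the fitted values above, the critical region around the threshold is roughly 0.5049 ± 0.1819, that is, scores between about 0.323 and 0.687. The sketch below illustrates the reject option decision rule on made-up scores: outside the critical region the plain threshold decides, while inside it the unprivileged group receives the favorable label and the privileged group the unfavorable one. The function `reject_option_predict` is a hypothetical stand-in for `ROC.predict`, and the exact handling of scores on the region boundaries is an assumption.

```python
def reject_option_predict(scores, groups, threshold, margin, unpriv=0, priv=1):
    """Apply the reject option rule to a list of scores (illustrative sketch)."""
    labels = []
    for score, group in zip(scores, groups):
        # Default decision: plain thresholding
        label = 1 if score > threshold else 0
        # Inside the critical region, override the decision by group membership
        # (boundary inclusivity is an assumption of this sketch)
        if threshold - margin < score <= threshold + margin:
            if group == unpriv:
                label = 1
            elif group == priv:
                label = 0
        labels.append(label)
    return labels


threshold, margin = 0.5049, 0.1819   # values printed above

# Toy scores (illustrative only): the same four scores for each group
scores = [0.20, 0.40, 0.60, 0.80, 0.20, 0.40, 0.60, 0.80]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

toy_labels = reject_option_predict(scores, groups, threshold, margin)
print(toy_labels)  # [0, 1, 1, 1, 0, 0, 0, 1]
```

Only the two scores inside the critical region (0.40 and 0.60) are treated differently across groups; confident predictions at 0.20 and 0.80 are left unchanged.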


<a id="reweighing-2015-metrics"></a>

<a id="lr_transf"></a>

#### 3.3.2. Transforming/Evaluating model

<a id="lr-rw-validate"></a>

#### 3.3.2.1. Transforming and Computing Predictions from Validation Set

In [13]:
adult_transf_validate_pred = ROC.predict(adult_orig_validate_pred)

display(Markdown("#### Validation set"))
display(Markdown("##### Transformed predictions - With fairness constraints"))
metric_validate_aft = compute_metrics(adult_orig_validate, adult_transf_validate_pred, 
                unprivileged_groups, privileged_groups)

#### Validation set

##### Transformed predictions - With fairness constraints

Balanced accuracy = 0.5989
Statistical parity difference = -0.0340
Disparate impact = 0.6887
Average odds difference = -0.0042
Equal opportunity difference = -0.0222
Theil index = 0.2134


<a id="lr-rw-test"></a>

#### 3.3.2.2. Transforming and Computing Predictions from Test Set

In [14]:
adult_transf_test_pred = ROC.predict(adult_orig_test_pred)

display(Markdown("#### Test set"))
display(Markdown("##### Transformed predictions - With fairness constraints"))
metric_test_aft = compute_metrics(adult_orig_test, adult_transf_test_pred, 
                unprivileged_groups, privileged_groups)

#### Test set

##### Transformed predictions - With fairness constraints

Balanced accuracy = 0.6030
Statistical parity difference = -0.0436
Disparate impact = 0.6164
Average odds difference = -0.0162
Equal opportunity difference = -0.0337
Theil index = 0.2183
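
To summarize the trade-off, the sketch below takes the balanced accuracy and statistical parity difference values printed in this notebook, checks whether the mitigated predictions fall inside the `[metric_lb, metric_ub]` band of ±0.05 used when fitting, and reports how much balanced accuracy was given up. The numbers are copied from the outputs above; the dictionary layout is just for this example.

```python
metric_lb, metric_ub = -0.05, 0.05   # bounds used when fitting the ROC method

# (balanced accuracy, statistical parity difference) copied from the outputs above
results = {
    "validation": {"before": (0.7500, -0.3842), "after": (0.5989, -0.0340)},
    "test":       {"before": (0.7422, -0.3908), "after": (0.6030, -0.0436)},
}

summary = {}
for split_name, r in results.items():
    ba_before, spd_before = r["before"]
    ba_after, spd_after = r["after"]
    summary[split_name] = {
        "within_bounds": metric_lb <= spd_after <= metric_ub,
        "ba_drop": ba_before - ba_after,
    }
    print("%s: SPD %.4f -> %.4f (within bounds: %s), balanced accuracy drop = %.4f"
          % (split_name, spd_before, spd_after,
             summary[split_name]["within_bounds"], summary[split_name]["ba_drop"]))
```

On both splits the statistical parity difference moves inside the target band, at a cost of roughly 0.14 to 0.15 in balanced accuracy.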


<a id="deployment-2015-2015"></a>