# Identifying and Mitigating Bias: An Example

The following notebook demonstrates how one might identify and mitigate bias in a neural network model that predicts whether a person earns more than $50,000 per year from census data. You can read about the data set here:

https://archive.ics.uci.edu/ml/datasets/adult

## Installing Necessary Packages

See explanations of these packages below.

In [None]:
!pip install lime
!pip install --no-dependencies aif360

## Building a Simple Neural Network
The following neural network is merely for the sake of illustration; I have not optimized it for performance at all. It has three hidden layers (256, 128, and 64 units), each followed by dropout with a rate of 0.5.
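Dropout with a rate of 0.5 zeroes each hidden activation with probability 0.5 during training. Keras scales the surviving activations by 1 / (1 - rate) so the expected value of each unit is unchanged, and dropout does nothing at inference time. A minimal NumPy sketch of that idea (`inverted_dropout` is an illustrative name, not a Keras API):

```python
import numpy as np

def inverted_dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate`, then rescale the survivors."""
    if not training or rate == 0.0:
        # At inference time dropout is the identity.
        return activations
    keep_mask = rng.random(activations.shape) >= rate
    # Scaling by 1 / (1 - rate) keeps the expected value of each unit unchanged.
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
h = np.ones((1000, 256))
dropped = inverted_dropout(h, rate=0.5, rng=rng)
print(dropped.mean())  # close to 1.0, the mean of the undropped activations
```

With rate 0.5 every surviving value is doubled, which is why the mean stays near that of the original activations.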

In [None]:
import numpy as np

from aif360.datasets import AdultDataset
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential

### The Adult Dataset
Predict whether income exceeds $50K/yr based on census data. Also known as the "Census Income" dataset.

In [None]:
census_dataset = AdultDataset(instance_weights_name='fnlwgt',
                              features_to_drop=[])

In [None]:
census_dataset.protected_attribute_names

In [None]:
train, valid = census_dataset.split(num_or_size_splits=[0.8], shuffle=True, seed=1000)

In [None]:
x_train_raw, y_train = train.features, train.labels
x_valid_raw, y_valid = valid.features, valid.labels

In [None]:
scaler = MinMaxScaler()
scaler.fit(x_train_raw)  # fit on the training split only, so validation data does not leak into preprocessing
x_train = scaler.transform(x_train_raw)
x_valid = scaler.transform(x_valid_raw)
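`MinMaxScaler` maps each column to [0, 1] via (x - min) / (max - min), using the per-column min and max learned in `fit`. Because the statistics come from the training split, validation values outside the training range land slightly outside [0, 1], which is expected. A NumPy sketch of the same transform on small made-up arrays (the helper names are illustrative):

```python
import numpy as np

def fit_min_max(x_train):
    """Learn per-column minimum and range from the training data only."""
    col_min = x_train.min(axis=0)
    col_range = x_train.max(axis=0) - col_min
    # Constant columns would divide by zero; treat their range as 1 (as scikit-learn does).
    col_range[col_range == 0] = 1.0
    return col_min, col_range

def transform_min_max(x, col_min, col_range):
    """Apply the training-set scaling to any split."""
    return (x - col_min) / col_range

x_tr = np.array([[20.0, 0.0], [40.0, 1.0], [60.0, 1.0]])  # e.g. an age column and a binary flag
x_va = np.array([[30.0, 0.0], [70.0, 1.0]])
col_min, col_range = fit_min_max(x_tr)
print(transform_min_max(x_tr, col_min, col_range))
print(transform_min_max(x_va, col_min, col_range))  # 70 maps to 1.25: outside the training range
```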

### The Neural Network Model

In [None]:
model = Sequential()
model.add(Dense(256, activation='relu', input_shape=(x_train.shape[1],)))
model.add(Dropout(0.5))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

In [None]:
model.summary()
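Each `Dense` layer has n_in * n_out weights plus n_out biases, and `Dropout` layers add no parameters, so the totals printed by `model.summary()` can be checked by hand. A small helper for that check (hypothetical, not part of Keras); passing `x_train.shape[1]` as `n_inputs` gives the count for the model above:

```python
def dense_param_counts(n_inputs, layer_sizes):
    """Trainable parameters per Dense layer: weights (n_in * n_out) plus biases (n_out)."""
    counts = []
    n_in = n_inputs
    for n_out in layer_sizes:
        counts.append(n_in * n_out + n_out)
        n_in = n_out  # the next layer's input width is this layer's output width
    return counts

# Layer sizes of the model above, with a small illustrative input width of 10.
counts = dense_param_counts(n_inputs=10, layer_sizes=[256, 128, 64, 1])
print(counts, sum(counts))
```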

In [None]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
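`binary_crossentropy` is the mean negative log-likelihood of the labels under the predicted probabilities, -[y log p + (1 - y) log(1 - p)]. A NumPy sketch, assuming Keras's default clipping epsilon of 1e-7 to keep the logarithms finite:

```python
import numpy as np

def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped to avoid log(0)."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))

y = np.array([1.0, 0.0, 1.0])
p = np.array([0.9, 0.2, 0.5])
loss_value = binary_crossentropy(y, p)
print(loss_value)
```

Confident correct predictions (0.9 for a positive, 0.2 for a negative) contribute little; the uncertain 0.5 contributes log 2.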

In [None]:
model.fit(x_train,
          y_train,
          batch_size=32,
          epochs=25,
          validation_data=(x_valid, y_valid),
          callbacks=[EarlyStopping(monitor='val_acc',  # this key is 'val_accuracy' in TensorFlow 2.x
                                   patience=5,
                                   restore_best_weights=True)])
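With `patience=5` and `restore_best_weights=True`, training stops once validation accuracy has failed to improve for 5 consecutive epochs, and the weights from the best epoch are restored. A simplified pure-Python sketch of that bookkeeping (it ignores `min_delta` and `baseline`; `early_stopping` is an illustrative name and the accuracy history is made up):

```python
def early_stopping(val_scores, patience):
    """Return (stop_epoch, best_epoch) for a metric where higher is better.

    stop_epoch is the 0-based epoch after which training halts; best_epoch is the
    epoch whose weights restore_best_weights would bring back.
    """
    best_score = float("-inf")
    best_epoch = 0
    wait = 0
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            # Strict improvement resets the patience counter.
            best_score, best_epoch, wait = score, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran out of epochs without triggering the stop.
    return len(val_scores) - 1, best_epoch

history = [0.80, 0.82, 0.84, 0.83, 0.84, 0.83, 0.82, 0.84, 0.81]  # hypothetical val accuracy
print(early_stopping(history, patience=5))
```

Note that matching the best score (0.84 at epochs 4 and 7) does not count as an improvement, so training halts after epoch 7 and epoch 2's weights are kept.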

In [None]:
loss, accuracy = model.evaluate(x=x_valid, y=y_valid)

In [None]:
model.predict(x_valid[3:4])  # probability that validation example 3 earns more than $50K

## Building Trust: Explain a Prediction Using LIME
LIME stands for "Local Interpretable Model-Agnostic Explanations". LIME "explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction." See the following paper:

https://arxiv.org/pdf/1602.04938v1.pdf

We will use this package:

https://github.com/marcotcr/lime
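LIME's core idea can be sketched in a few lines of NumPy: perturb the instance, query the black-box model on the perturbed points, weight those points by proximity, and fit a weighted linear model whose coefficients serve as the explanation. This is a simplified sketch, not the `lime` package's implementation: it samples Gaussian perturbations directly in feature space and fits plain weighted least squares, whereas `LimeTabularExplainer` samples using training-data statistics, discretizes continuous features by default, and fits a ridge regression. `local_linear_explanation` and `toy_model` are illustrative names.

```python
import numpy as np

def local_linear_explanation(predict_fn, x, n_samples=5000, scale=1.0, kernel_width=None, seed=0):
    """Fit a proximity-weighted linear model to predict_fn around the point x.

    Returns (intercept, coefficients); each coefficient is the local effect of one feature.
    """
    rng = np.random.default_rng(seed)
    n_features = x.shape[0]
    if kernel_width is None:
        kernel_width = 0.75 * np.sqrt(n_features)  # LIME's default width for tabular data
    # 1. Perturb: sample points in a neighbourhood of x.
    z = x + rng.normal(0.0, scale, size=(n_samples, n_features))
    # 2. Query the black-box model on the perturbed points.
    y = predict_fn(z)
    # 3. Weight each sample by its proximity to x (exponential kernel on distance).
    dist = np.linalg.norm(z - x, axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 4. Weighted least squares: scale each row by sqrt(weight), then solve.
    design = np.hstack([np.ones((n_samples, 1)), z - x])
    root_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
    return coef[0], coef[1:]

# A toy "black box" that is secretly linear, so the local fit should recover it exactly.
def toy_model(z):
    return 1.0 + 2.0 * z[:, 0] - 3.0 * z[:, 1]

intercept, coefs = local_linear_explanation(toy_model, np.array([0.5, -1.0]))
print(intercept, coefs)
```

For a real neural network the surrogate is only a local approximation, which is why LIME's explanations hold near the explained instance rather than globally.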

In [None]:
from lime import lime_tabular

In [None]:
categorical_indices = [2, 3, 5] + list(range(7, 99))
tabular_explainer = lime_tabular.LimeTabularExplainer(x_train,
                                                      feature_names=train.feature_names,
                                                      class_names=['<=50K', '>50K'],  # one name per class: label 0, label 1
                                                      categorical_features=categorical_indices,
                                                      verbose=True,
                                                      mode='classification')
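The hard-coded `categorical_indices` list depends on the exact column order of the dataset. A sketch that derives the indices from the feature names instead, assuming one-hot columns are named `attribute=value` (the convention aif360's `StandardDataset` uses when it one-hot encodes) and that `race` and `sex` are the binary columns. The sample names below are illustrative, not the dataset's actual feature list, so compare the result against the hand-written list before swapping it in, e.g. `categorical_indices_from_names(train.feature_names)`:

```python
def categorical_indices_from_names(feature_names, binary_columns=("race", "sex")):
    """Indices of one-hot columns (named 'attribute=value') plus known binary columns."""
    return [i for i, name in enumerate(feature_names)
            if "=" in name or name in binary_columns]

names = ["age", "education-num", "race", "sex", "capital-gain",
         "workclass=Private", "workclass=State-gov"]
print(categorical_indices_from_names(names))
```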

In [None]:
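# LIME expects an (n_samples, n_classes) array of class probabilities, but the
# sigmoid model returns only P(income > $50K); stack [P(<=50K), P(>50K)] side by side
def predict(numpy_array):
    p = model.predict(numpy_array).reshape(-1, 1)
    return np.hstack((1 - p, p))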

In [None]:
x_valid_index = 33

In [None]:
explainer = tabular_explainer.explain_instance(x_valid[x_valid_index],
                                               predict,
                                               num_features=99)

In [None]:
# what is the true y?
y_valid[x_valid_index]

In [None]:
explainer.as_list()

In [None]:
explainer.show_in_notebook(show_table=True)  # race == 0: non-white; sex == 0: female

## Check for Bias Across the Entire Dataset and Mitigate If Necessary

In [None]:
from aif360.metrics import BinaryLabelDatasetMetric

In [None]:
dataset_metric = BinaryLabelDatasetMetric(train,
                                          unprivileged_groups=[{'race': 0, 'sex': 0}],
                                          privileged_groups=[{'race': 1, 'sex': 1}])
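In aif360, each dict in `unprivileged_groups` / `privileged_groups` is a conjunction (all of its key/value pairs must hold), and multiple dicts in the list are combined with OR. So `[{'race': 0, 'sex': 0}]` selects non-white women only and `[{'race': 1, 'sex': 1}]` selects white men; white women and non-white men fall in neither group for these metrics. A NumPy sketch of that selection logic (`group_mask` is an illustrative name):

```python
import numpy as np

def group_mask(attributes, attribute_names, groups):
    """True for rows matching ANY dict in `groups`; a dict matches if ALL its pairs hold."""
    mask = np.zeros(attributes.shape[0], dtype=bool)
    for group in groups:
        group_cond = np.ones(attributes.shape[0], dtype=bool)
        for name, value in group.items():
            group_cond &= attributes[:, attribute_names.index(name)] == value
        mask |= group_cond
    return mask

# Columns: race, sex (1 = white / male, 0 = non-white / female)
prot = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
names = ["race", "sex"]
print(group_mask(prot, names, [{"race": 0, "sex": 0}]))    # only the first row
print(group_mask(prot, names, [{"race": 0}, {"sex": 0}]))  # rows 0, 1 and 2
```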

#### Discover How Fair the Training Data Is

- **Disparate Impact** (ideal = 1)
    - unprivileged favorable outcome rate / privileged favorable outcome rate
- **Statistical Parity Difference** (ideal = 0)
    - unprivileged favorable outcome rate - privileged favorable outcome rate
- **Equal Opportunity Difference** (ideal = 0)
    - unprivileged true positive rate - privileged true positive rate
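The three definitions above can be computed directly from label arrays and group masks. In this sketch the favorable outcome is `y == 1`, and optional instance weights stand in for the weighting aif360 applies (here the `fnlwgt` column). For the dataset metrics the "predictions" are just the labels; Equal Opportunity Difference needs both true labels and model predictions, which is why aif360 exposes it on `ClassificationMetric` rather than `BinaryLabelDatasetMetric`. The function names and toy arrays are illustrative:

```python
import numpy as np

def favorable_rate(y, mask, w):
    """Weighted share of favorable (y == 1) outcomes within the rows selected by mask."""
    return np.sum(w[mask] * (y[mask] == 1)) / np.sum(w[mask])

def fairness_metrics(y_true, y_pred, unpriv, priv, weights=None):
    w = np.ones_like(y_true, dtype=float) if weights is None else weights
    rate_u = favorable_rate(y_pred, unpriv, w)
    rate_p = favorable_rate(y_pred, priv, w)
    # True positive rate: favorable predictions among rows whose true label is favorable.
    tpr_u = favorable_rate(y_pred, unpriv & (y_true == 1), w)
    tpr_p = favorable_rate(y_pred, priv & (y_true == 1), w)
    return {
        "disparate_impact": rate_u / rate_p,
        "statistical_parity_difference": rate_u - rate_p,
        "equal_opportunity_difference": tpr_u - tpr_p,
    }

y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 0, 1, 1, 1, 0])
unpriv = np.array([True] * 4 + [False] * 4)
priv = ~unpriv
print(fairness_metrics(y_true, y_pred, unpriv, priv))
```

Here the unprivileged group receives the favorable outcome 1/4 of the time versus 3/4 for the privileged group, giving a disparate impact of 1/3 and a statistical parity difference of -0.5.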

In [None]:
dataset_metric.disparate_impact()  # ideal == 1

In [None]:
dataset_metric.statistical_parity_difference()  # ideal == 0

#### Try Mitigating Bias In-Process

In [None]:
from aif360.algorithms.inprocessing.adversarial_debiasing import AdversarialDebiasing
from tensorflow.keras.backend import get_session

In [None]:
adversarial_model = AdversarialDebiasing(unprivileged_groups=[{'race': 0, 'sex': 0}],
                                         privileged_groups=[{'race': 1, 'sex': 1}],
                                         scope_name='main',
                                         sess=get_session(),
                                         seed=1000)
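`AdversarialDebiasing` trains the classifier jointly with an adversary that tries to recover the protected attribute from the classifier's predictions. In the method of Zhang, Lemoine and Mitchell (2018), which aif360's implementation follows, the classifier's gradient is modified so that it (a) loses any component that would help the adversary and (b) actively pushes against the adversary, scaled by a weight alpha (aif360's `adversary_loss_weight` parameter, 0.1 by default). A NumPy sketch of that update direction for flattened gradient vectors (`debiased_gradient` is an illustrative name):

```python
import numpy as np

def debiased_gradient(grad_pred, grad_adv, alpha=0.1):
    """Predictor update direction: grad_pred - proj(grad_pred onto grad_adv) - alpha * grad_adv.

    Removing the projection drops the part of the step that would also help the adversary;
    subtracting alpha * grad_adv moves the predictor toward higher adversary loss.
    """
    unit_adv = grad_adv / (np.linalg.norm(grad_adv) + np.finfo(float).eps)
    projection = np.dot(grad_pred, unit_adv) * unit_adv
    return grad_pred - projection - alpha * grad_adv

g_pred = np.array([1.0, 2.0])
g_adv = np.array([0.0, 4.0])
print(debiased_gradient(g_pred, g_adv, alpha=0.1))
```

In this toy case the second component of the predictor gradient, which points along the adversary's gradient, is removed and replaced by a small step in the opposite direction.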

In [None]:
adversarial_model.fit(train)

In [None]:
new_dataset = adversarial_model.predict(train)

In [None]:
new_dataset_metric = BinaryLabelDatasetMetric(new_dataset,
                                              unprivileged_groups=[{'race': 0, 'sex': 0}],
                                              privileged_groups=[{'race': 1, 'sex': 1}])

In [None]:
old_di = dataset_metric.disparate_impact()
new_di = new_dataset_metric.disparate_impact()
print('Disparate Impact:', old_di, '=>', new_di, '(ideal == 1)')

In [None]:
old_sp = dataset_metric.statistical_parity_difference()  # ideal == 0
new_sp = new_dataset_metric.statistical_parity_difference()
print('Statistical Parity:', old_sp, '=>', new_sp, '(ideal == 0)')

In [None]:
# compare on the validation split, the same data the Keras model's accuracy was measured on
new_accuracy = np.mean(valid.labels == adversarial_model.predict(valid).labels)
print('Validation accuracy:', accuracy, '=>', new_accuracy)