### **Part One**

**AIF 360 Summary**

Machine learning models are increasingly used to make high-stakes decisions about people. Such models discriminate statistically by design, which becomes objectionable when it places certain privileged groups at a systematic advantage and certain unprivileged groups at a systematic disadvantage, resulting in unwanted bias.

AI Fairness 360 (AIF360) is an extensible open source toolkit for detecting, understanding, and mitigating algorithmic biases. The goals of AIF360 are to promote a deeper understanding of fairness metrics and mitigation techniques; to enable an open common platform for fairness researchers and industry practitioners to share and benchmark their algorithms; and to help facilitate the transition of fairness research algorithms to use in an industrial setting.

AIF360 brings bias metrics, bias mitigation algorithms, bias metric explanations, and industrial usability together. By integrating these aspects, AIF360 can enable stronger collaboration between AI fairness researchers and practitioners, helping to translate the collective research results to practicing data scientists, data engineers, and developers deploying solutions in a variety of industries. 

Its pipeline consists of pre-processing, in-processing, and post-processing techniques to mitigate bias. Pre-processing techniques modify the original dataset to mitigate bias, while in-processing techniques pursue the same goal during model training. Post-processing bias mitigation modifies the predictions of the model to promote fairness.

Binary Label Dataset Metrics shed light on bias between unprivileged and privileged groups and offer many measures of bias. Classification Metrics go one step further and compare two binary label datasets, such as the true labels and a model's predictions.



### **Part Two**

**Detecting and mitigating age bias on credit decisions**
The goal of this tutorial is to introduce the basic functionality of AI Fairness 360 to an interested developer who may not have a background in bias detection and mitigation.

**Biases and Machine Learning**

A machine learning model makes predictions of an outcome for a particular instance. (Given an instance of a loan application, predict if the applicant will repay the loan.) The model makes these predictions based on a training dataset, where many other instances (other loan applications) and actual outcomes (whether they repaid) are provided. Thus, a machine learning algorithm will attempt to find patterns, or generalizations, in the training dataset to use when a prediction for a new instance is needed. (For example, one pattern it might discover is "if a person has salary > USD 40K and has outstanding debt < USD 5, they will repay the loan".) In many domains this technique, called supervised machine learning, has worked very well.

However, sometimes the patterns that are found may not be desirable or may even be illegal. For example, a loan repayment model may determine that age plays a significant role in the prediction of repayment because the training dataset happened to have better repayment for one age group than for another. This raises two problems: 1) the training dataset may not be representative of the true population of people of all age groups, and 2) even if it is representative, it is illegal to base any decision on an applicant's age, regardless of whether this is a good prediction based on historical data.

AI Fairness 360 is designed to help address this problem with fairness metrics and bias mitigators. Fairness metrics can be used to check for bias in machine learning workflows. Bias mitigators can be used to overcome bias in the workflow to produce a more fair outcome.

The loan scenario describes an intuitive example of illegal bias. However, not all undesirable bias in machine learning is illegal; it may also exist in more subtle ways. For example, a loan company may want a diverse portfolio of customers across all income levels, and thus will deem it undesirable if it is making more loans to high-income applicants than to low-income applicants. Although this is not illegal or unethical, it is undesirable for the company's strategy.

As these two examples illustrate, a bias detection and/or mitigation toolkit needs to be tailored to the particular bias of interest. More specifically, it needs to know the attribute or attributes, called protected attributes, that are of interest: race is one example of a protected attribute and age is a second.

**The Machine Learning Workflow**

To understand how bias can enter a machine learning model, we first review the basics of how a model is created in a supervised machine learning process.

![alt text](https://nbviewer.jupyter.org/github/IBM/AIF360/blob/master/examples/images/Complex_NoProc_V3.jpg)

First, the process starts with a training dataset, which contains a sequence of instances, where each instance has two components: the features and the correct prediction for those features. Next, a machine learning algorithm is trained on this training dataset to produce a machine learning model. This generated model can be used to make a prediction when given a new instance. A second dataset with features and correct predictions, called a test dataset, is used to assess the accuracy of the model. Since this test dataset is the same format as the training dataset, a set of instances of features and prediction pairs, often these two datasets derive from the same initial dataset. A random partitioning algorithm is used to split the initial dataset into training and test datasets.
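The random partitioning step can be sketched in a few lines of plain Python. This is only an illustration: `split_dataset`, its parameters, and the toy instances are hypothetical names and values, not part of aif360, whose own `split` method is used later in this tutorial.

```python
import random


def split_dataset(instances, train_fraction=0.7, seed=0):
    """Randomly partition instances into a training list and a test list."""
    rng = random.Random(seed)      # local generator keeps the split reproducible
    shuffled = list(instances)     # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy (features, correct prediction) pairs; the values are made up.
toy_instances = [((i, i % 3), i % 2) for i in range(10)]
train_instances, test_instances = split_dataset(toy_instances)
print(len(train_instances), len(test_instances))
```

Because the two parts come from one shuffled list, every instance ends up in exactly one of them and both share the same format.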

Bias can enter the system in any of the three steps above. The training data set may be biased in that its outcomes may be biased towards particular kinds of instances. The algorithm that creates the model may be biased in that it may generate models that are weighted towards particular features in the input. The test data set may be biased in that it has expectations on correct answers that may be biased. These three points in the machine learning process represent points for testing and mitigating bias. In the AI Fairness 360 codebase, we call these points pre-processing, in-processing, and post-processing.

**AI Fairness 360**

We are now ready to utilize AI Fairness 360 (aif360) to detect and mitigate bias. We will use the German credit dataset, splitting it into a training and test dataset. We will look for bias in the creation of a machine learning model to predict if an applicant should be given credit based on various features from a typical credit application. The protected attribute will be "Age", with "1" (older than or equal to 25) and "0" (younger than 25) being the values for the privileged and unprivileged groups, respectively. For this first tutorial, we will check for bias in the initial training data, mitigate the bias, and recheck. More sophisticated machine learning workflows are given in the author tutorials and demo notebooks in the codebase.

Here are the steps involved

**Step 1: Write import statements**
**Step 2: Set bias detection options, load dataset, and split between train and test**
**Step 3: Compute fairness metric on original training dataset**
**Step 4: Mitigate bias by transforming the original dataset**
**Step 5: Compute fairness metric on transformed training dataset**

**Step 1 Import Statements**

As with any Python program, the first step will be to import the necessary packages. Below we import several components from the aif360 package. We import the GermanDataset, metrics to check for bias, and classes related to the algorithms we will use to mitigate bias.

In [0]:
pip install aif360



In [0]:
pip install -I tensorflow==1.13.1


In [0]:
# Load all necessary packages
import sys
sys.path.insert(1, "../")  

import urllib.request  # urlretrieve lives in the urllib.request submodule
import numpy as np
np.random.seed(0)

from aif360.datasets import GermanDataset, CompasDataset
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric
from aif360.metrics.utils import compute_boolean_conditioning_vector
from aif360.algorithms.preprocessing import Reweighing, OptimPreproc
from aif360.algorithms.preprocessing.optim_preproc_helpers.opt_tools import OptTools
from aif360.algorithms.preprocessing.optim_preproc_helpers.distortion_functions import get_distortion_german
from aif360.algorithms.preprocessing.optim_preproc_helpers.data_preproc_functions import load_preproc_data_compas
from aif360.algorithms.postprocessing.reject_option_classification import RejectOptionClassification
from aif360.algorithms.inprocessing.adversarial_debiasing import AdversarialDebiasing

from sklearn.metrics import accuracy_score
import tensorflow as tf

from IPython.display import Markdown, display



**Step 2 Load dataset, specifying protected attribute, and split dataset into train and test**

In Step 2 we load the initial dataset, setting the protected attribute to be age. We then split the original dataset into training and testing datasets. Although we will use only the training dataset in this tutorial, a normal workflow would also use a test dataset for assessing the efficacy (accuracy, fairness, etc.) during the development of a machine learning model. Finally, we set two variables (to be used in Step 3) for the privileged (1) and unprivileged (0) values for the age attribute. These are key inputs for detecting and mitigating bias, which will be Step 3 and Step 4.

In [0]:
urllib.request.urlretrieve ("https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data", "/usr/local/lib/python3.6/dist-packages/aif360/data/raw/german/german.data")

dataset_orig = GermanDataset(
    protected_attribute_names=['age'],           # this dataset also contains protected
                                                 # attribute for "sex" which we do not
                                                 # consider in this evaluation
    privileged_classes=[lambda x: x >= 25],      # age >=25 is considered privileged
    features_to_drop=['personal_status', 'sex'] # ignore sex-related attributes
)
dataset_orig_train, dataset_orig_test = dataset_orig.split([0.7], shuffle=True)
privileged_groups = [{'age': 1}]
unprivileged_groups = [{'age': 0}]
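Conceptually, the `privileged_classes` lambda turns the numeric age column into a binary protected attribute, which is why the group dictionaries above use the values 1 and 0. A minimal sketch of that mapping in plain Python follows; `binarize_age` and the toy ages are hypothetical and only illustrate the idea, not how aif360 implements it internally.

```python
def binarize_age(ages, threshold=25):
    """Map raw ages to 1 (privileged, age >= threshold) or 0 (unprivileged)."""
    return [1 if age >= threshold else 0 for age in ages]


toy_ages = [19, 24, 25, 40, 67]   # made-up values
age_groups = binarize_age(toy_ages)
print(age_groups)
```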


**Step 3 Compute fairness metric on original training dataset**

Now that we've identified the protected attribute 'age' and defined privileged and unprivileged values, we can use aif360 to detect bias in the dataset. One simple test is to compare the percentage of favorable results for the privileged and unprivileged groups, subtracting the former percentage from the latter. A negative value indicates less favorable outcomes for the unprivileged groups. This is implemented in the method called mean_difference on the BinaryLabelDatasetMetric class. The code below performs this check and displays the output, showing that the difference is -0.169905.

In [0]:
metric_orig_train = BinaryLabelDatasetMetric(dataset_orig_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
display(Markdown("#### Original training dataset"))
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train.mean_difference())

metric_orig_test = BinaryLabelDatasetMetric(dataset_orig_test, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
display(Markdown("#### Original testing dataset"))
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_test.mean_difference())

#### Original training dataset

Difference in mean outcomes between unprivileged and privileged groups = -0.169905


#### Original testing dataset

Difference in mean outcomes between unprivileged and privileged groups = -0.006313
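To make the metric concrete, the mean difference can be computed by hand on a toy dataset. The sketch below is a plain-Python illustration under the assumption that labels are coded 1 (favorable) and 0 (unfavorable) and groups 1 (privileged) and 0 (unprivileged); `favorable_rate`, `mean_difference`, and the data are made up for this example and are not the aif360 implementation.

```python
def favorable_rate(labels, groups, group_value, favorable_label=1):
    """Fraction of instances in one group that received the favorable label."""
    selected = [y for y, g in zip(labels, groups) if g == group_value]
    return sum(1 for y in selected if y == favorable_label) / len(selected)


def mean_difference(labels, groups):
    """Unprivileged favorable rate minus privileged favorable rate."""
    return favorable_rate(labels, groups, 0) - favorable_rate(labels, groups, 1)


# Toy data: five privileged (1) and five unprivileged (0) applicants.
toy_groups = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
toy_labels = [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]

toy_mean_difference = mean_difference(toy_labels, toy_groups)
print("Mean difference = %f" % toy_mean_difference)
```

On this toy data the unprivileged group has a favorable rate of 0.6 and the privileged group 0.8, so the mean difference is about -0.2: negative, as in the training set above, because the unprivileged group fares worse.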


**Step 4 Mitigate bias by transforming the original dataset**

The previous step showed that the privileged group was getting favorable outcomes at a rate 17 percentage points higher than the unprivileged group in the training dataset. Since this is not desirable, we are going to try to mitigate this bias in the training dataset. As stated above, this is called pre-processing mitigation because it happens before the creation of the model.

AI Fairness 360 implements several pre-processing mitigation algorithms. We will choose the Reweighing algorithm [1], which is implemented in the Reweighing class in the aif360.algorithms.preprocessing package. This algorithm will transform the dataset to have more equity in positive outcomes on the protected attribute for the privileged and unprivileged groups.

We then call the fit and transform methods to perform the transformation, producing a newly transformed training dataset (dataset_transf_train).

[1] F. Kamiran and T. Calders, "Data Preprocessing Techniques for Classification without Discrimination," Knowledge and Information Systems, 2012.

In [0]:
RW = Reweighing(unprivileged_groups=unprivileged_groups,
                privileged_groups=privileged_groups)
dataset_transf_train = RW.fit_transform(dataset_orig_train)
dataset_transf_test = RW.fit_transform(dataset_orig_test)
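The idea behind reweighing in [1] is to give each (group, label) combination the weight it would need for group membership and outcome to look statistically independent: the expected count divided by the observed count. The sketch below illustrates that formula in plain Python on toy data; `reweighing_weights` and `weighted_favorable_rate` are hypothetical helper names, and the aif360 `Reweighing` class may differ in its details.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight per instance: (n_group * n_label) / (n * n_group_and_label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        group_counts[g] * label_counts[y] / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_favorable_rate(groups, labels, weights, group_value):
    """Weighted share of favorable (1) labels within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group_value)
    favorable = sum(
        w for g, y, w in zip(groups, labels, weights)
        if g == group_value and y == 1
    )
    return favorable / total


# Toy data: 1 = privileged / favorable, 0 = unprivileged / unfavorable.
toy_groups = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
toy_labels = [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]

toy_weights = reweighing_weights(toy_groups, toy_labels)
rate_privileged = weighted_favorable_rate(toy_groups, toy_labels, toy_weights, 1)
rate_unprivileged = weighted_favorable_rate(toy_groups, toy_labels, toy_weights, 0)
print(rate_unprivileged - rate_privileged)
```

With these weights, both groups end up with a weighted favorable rate equal to the overall rate (0.7 in the toy data), so the weighted mean difference is zero up to floating point error. Note that no label or feature is changed; only the instance weights are.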

**Step 5 Compute fairness metric on transformed dataset**

Now that we have a transformed dataset, we can check how effective it was in removing bias by using the same metric we used for the original training dataset in Step 3. Once again, we use the function mean_difference in the BinaryLabelDatasetMetric class. We see the mitigation step was very effective: the difference in mean outcomes is now 0.0. So we went from a 17 percentage point advantage for the privileged group to equality in terms of mean outcome.

In [0]:
metric_transf_train = BinaryLabelDatasetMetric(dataset_transf_train, 
                                               unprivileged_groups=unprivileged_groups,
                                               privileged_groups=privileged_groups)
display(Markdown("#### Transformed training dataset"))
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_transf_train.mean_difference())

#### Transformed training dataset

Difference in mean outcomes between unprivileged and privileged groups = 0.000000


**Summary**

The purpose of this tutorial is to give a user who is new to bias detection and mitigation a gentle introduction to some of the functionality of AI Fairness 360. A more complete use case would take the next step and see how the transformed dataset impacts the accuracy and fairness of a trained model. This is implemented in the demo notebook in the examples directory of the toolkit, called demo_reweighing_preproc.ipynb. I highly encourage readers to view that notebook as it is a generalization and extension of this simple tutorial.

There are many metrics one can use to detect the presence of bias. AI Fairness 360 provides many of them for your use. Since it is not clear which of these metrics to use, we also provide some guidance. Likewise, there are many different bias mitigation algorithms one can employ, many of which are in AI Fairness 360. Other tutorials will demonstrate the use of some of these metrics and mitigations algorithms.

As mentioned earlier, both fairness metrics and mitigation algorithms can be applied at various stages of the machine learning pipeline. We recommend checking for bias as often as possible, using as many metrics as are relevant for the application domain. We also recommend incorporating bias detection in an automated continuous integration pipeline to ensure bias awareness as a software project evolves.

### **Part Three**

Above, we imported the German dataset and split it into training and testing data. We also reweighted the training and testing data to try to mitigate bias. On top of this pre-processing bias mitigation, AIF360's pipeline includes in-processing and post-processing techniques. We use adversarial debiasing to train a model that improves fairness, and reject option classification to modify that model's predictions to improve fairness even further.

### Adversarial Debiasing
This in-processing technique limits the influence of the protected attribute on the decision by making it difficult to determine the protected attribute from the final prediction. This creates a model that reduces bias and promotes fairness for both groups.

### Reject Option Classification
This post-processing technique looks at points near the decision boundary and switches predictions for the unprivileged group to favorable and predictions for the privileged group to unfavorable to further reduce bias from the model. It selects a region around the boundary with a pre-set margin, designed to catch the points of highest uncertainty. This technique doesn't just give equality for privileged and unprivileged groups; it goes a step further by giving preferential treatment to the unprivileged group.
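The switching rule can be sketched on a handful of toy scores. This is a simplified illustration, assuming scores in [0, 1], labels coded 1 (favorable) and 0 (unfavorable), and groups coded 1 (privileged) and 0 (unprivileged); `reject_option_predict`, the fixed threshold, and the margin are hypothetical choices made here, whereas the aif360 class selects its own threshold during `fit`.

```python
def reject_option_predict(scores, groups, threshold=0.5, margin=0.1):
    """Return 1 (favorable) or 0 (unfavorable) for each score.

    Inside the critical region |score - threshold| <= margin, unprivileged
    instances (group 0) get the favorable label and privileged instances
    (group 1) get the unfavorable label. Outside it, the plain threshold
    rule applies.
    """
    predictions = []
    for score, group in zip(scores, groups):
        if abs(score - threshold) <= margin:
            predictions.append(1 if group == 0 else 0)
        else:
            predictions.append(1 if score > threshold else 0)
    return predictions


toy_scores = [0.95, 0.55, 0.45, 0.55, 0.45, 0.05]   # made-up model scores
toy_groups = [1, 1, 1, 0, 0, 0]
roc_predictions = reject_option_predict(toy_scores, toy_groups)
print(roc_predictions)
```

Only the four uncertain scores (0.55 and 0.45) are affected; the confident scores 0.95 and 0.05 keep their ordinary predictions regardless of group.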

In [0]:
def print_binary_label_metrics(training_data, test_data):
    metric_train = BinaryLabelDatasetMetric(training_data, unprivileged_groups=unprivileged_groups, privileged_groups=privileged_groups)
    print("Training set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_train.mean_difference())

    metric_test = BinaryLabelDatasetMetric(test_data, unprivileged_groups=unprivileged_groups, privileged_groups=privileged_groups)
    print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_test.mean_difference())



def print_classification_metrics(features, labels):
    classified_metric = ClassificationMetric(features, labels, unprivileged_groups=unprivileged_groups, privileged_groups=privileged_groups)
    print("Classification accuracy = %f" % classified_metric.accuracy())
    TPR = classified_metric.true_positive_rate()
    TNR = classified_metric.true_negative_rate()
    bal_acc_debiasing_test = 0.5*(TPR+TNR)
    print("Balanced classification accuracy = %f" % bal_acc_debiasing_test)
    print("Statistical parity difference = %f" % classified_metric.statistical_parity_difference())
    print("Disparate impact = %f" % classified_metric.disparate_impact())
    print("Equal opportunity difference = %f" % classified_metric.equal_opportunity_difference())
    print("Average odds difference = %f" % classified_metric.average_odds_difference())
    print("Theil_index = %f" % classified_metric.theil_index())


Above, we define two functions for printing performance metrics. We will use these functions many times throughout our pipeline.

### Binary Label Dataset Metric
Class for computing metrics on a single AIF360 dataset with binary outcomes. We use its mean_difference method to compare the mean outcomes of the privileged and unprivileged groups. This sample statistic gives a very telling glimpse into the amount of bias in favorable outcomes.

### Classification Metric
This class is used for comparing two binary label datasets. In our case, we use it to compare our original datasets to the predictions made by our model. From this, we learn the following:

Classification accuracy - Fraction of predictions that are correct, as a decimal value.

Balanced classification accuracy - Mean of the true positive rate and the true negative rate.

Statistical parity difference - The difference between the rate of favorable outcomes for the unprivileged group and the rate for the privileged group.

Disparate impact - The ratio of the rate of favorable outcomes for the unprivileged group to the rate for the privileged group.

Equal opportunity difference - The difference of true positive rates between the unprivileged and the privileged groups.

Average odds difference - Average of the differences in false positive rate and true positive rate between unprivileged and privileged groups.

Theil_index - Generalized entropy index with alpha = 1, which measures inequality in how benefit is distributed across all individuals, privileged and unprivileged.
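The group-based definitions above can be checked by hand on a small example. The following sketch computes them in plain Python from true and predicted labels; it assumes labels coded 1 (favorable) and 0 (unfavorable) and groups coded 1 (privileged) and 0 (unprivileged), and the helper names and toy data are hypothetical rather than part of aif360.

```python
def group_rates(y_true, y_pred, groups, group_value):
    """Selection rate, true positive rate and false positive rate of one group."""
    tp = fp = tn = fn = 0
    for t, p, g in zip(y_true, y_pred, groups):
        if g != group_value:
            continue
        if p == 1:
            tp += t == 1
            fp += t == 0
        else:
            fn += t == 1
            tn += t == 0
    return {
        "selection": (tp + fp) / (tp + fp + tn + fn),
        "tpr": tp / (tp + fn),
        "fpr": fp / (fp + tn),
    }


def fairness_report(y_true, y_pred, groups):
    """Group fairness metrics, always unprivileged relative to privileged."""
    unpriv = group_rates(y_true, y_pred, groups, 0)
    priv = group_rates(y_true, y_pred, groups, 1)
    tpr_diff = unpriv["tpr"] - priv["tpr"]
    fpr_diff = unpriv["fpr"] - priv["fpr"]
    return {
        "statistical_parity_difference": unpriv["selection"] - priv["selection"],
        "disparate_impact": unpriv["selection"] / priv["selection"],
        "equal_opportunity_difference": tpr_diff,
        "average_odds_difference": 0.5 * (fpr_diff + tpr_diff),
    }


# Toy data: four privileged instances followed by four unprivileged ones.
toy_true = [1, 1, 0, 0, 1, 1, 0, 0]
toy_pred = [1, 1, 1, 0, 1, 0, 0, 0]
toy_groups = [1, 1, 1, 1, 0, 0, 0, 0]

report = fairness_report(toy_true, toy_pred, toy_groups)
for name, value in report.items():
    print("%s = %f" % (name, value))
```

A value of 0 for the three difference metrics and 1 for disparate impact indicates parity between the groups. In this toy example every metric points to a disadvantage for the unprivileged group.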



In [0]:
display(Markdown("#### Mean differences on original dataset"))
print_binary_label_metrics(dataset_orig_train, dataset_orig_test)

display(Markdown("#### Mean differences on reweighted dataset"))
print_binary_label_metrics(dataset_transf_train, dataset_transf_test)


#### Mean differences on original dataset

Training set: Difference in mean outcomes between unprivileged and privileged groups = -0.169905
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.006313


#### Mean differences on reweighted dataset

Training set: Difference in mean outcomes between unprivileged and privileged groups = 0.000000
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.000000


As a baseline, we compare the mean difference in favorable outcomes between the privileged and unprivileged groups in both the original and the reweighted datasets. Looking at this, we can draw two conclusions.

First, the original training data shows a gap of nearly 17 percentage points in favorable outcomes, while the test data shows a gap of only 0.6 percentage points. This is strange: our split size and shuffle must have placed most of the unfavorable outcomes for the unprivileged group in the training data rather than the test data. Such skewed data is likely to lead to unusual model behavior.

Second, as we noticed earlier, reweighing during pre-processing mitigates the bias in outcomes, so the mean difference is zero in both the training and the test data.

Next, we will use adversarial debiasing in our model. To compare performance, we will use two models: one with "debias" set to False and one with "debias" set to True. Ultimately, we should hope to notice a significant improvement in the second model.

In [0]:
sess = tf.Session()
undebiased_model = AdversarialDebiasing(privileged_groups = privileged_groups, unprivileged_groups = unprivileged_groups, scope_name='plain_classifier', debias=False, sess=sess)
undebiased_model.fit(dataset_transf_train)
undebiased_train_pred = undebiased_model.predict(dataset_transf_train) 
undebiased_test_pred = undebiased_model.predict(dataset_transf_test)
display(Markdown("#### Mean differences on undebiased dataset"))
print_binary_label_metrics(undebiased_train_pred, undebiased_test_pred)


display(Markdown("#### Undebiased training classification metrics"))
print_classification_metrics(dataset_transf_train, undebiased_train_pred)

display(Markdown("#### Undebiased test classification metrics"))
print_classification_metrics(dataset_transf_test, undebiased_test_pred)



epoch 0; iter: 0; batch classifier loss: 45.500259
epoch 1; iter: 0; batch classifier loss: 56.475246
epoch 2; iter: 0; batch classifier loss: 50.392849
epoch 3; iter: 0; batch classifier loss: 52.382019
epoch 4; iter: 0; batch classifier loss: 46.202183
epoch 5; iter: 0; batch classifier loss: 37.198967
epoch 6; iter: 0; batch classifier loss: 41.929314
epoch 7; iter: 0; batch classifier loss: 24.895828
epoch 8; iter: 0; batch classifier loss: 32.772141
epoch 9; iter: 0; batch classifier loss: 31.921862
epoch 10; iter: 0; batch classifier loss: 33.727707
epoch 11; iter:

#### Mean differences on undebiased dataset

Training set: Difference in mean outcomes between unprivileged and privileged groups = 0.000000
Test set: Difference in mean outcomes between unprivileged and privileged groups = 0.000000


#### Undebiased training classification metrics

Classification accuracy = 0.700000
Balanced classification accuracy = 0.500000
Statistical parity difference = 0.000000
Disparate impact = 1.000000
Equal opportunity difference = 0.000000
Average odds difference = 0.000000
Theil_index = 0.057550


#### Undebiased test classification metrics

Classification accuracy = 0.700000
Balanced classification accuracy = 0.500000
Statistical parity difference = 0.000000
Disparate impact = 1.000000
Equal opportunity difference = 0.000000
Average odds difference = 0.000000
Theil_index = 0.057550


These results are very strange: we had the "debias" option set to False, yet all of our bias metrics indicate perfect fairness. Moreover, the metrics are identical for the training and test datasets. Let's move on to our debiased model and see what the results look like.

In [0]:
sess.close()
tf.reset_default_graph()
sess = tf.Session()

debiased_model = AdversarialDebiasing(privileged_groups = privileged_groups, unprivileged_groups = unprivileged_groups, scope_name='debiased_classifier', debias=True, sess=sess)
debiased_model.fit(dataset_transf_train)
debiased_train_pred = debiased_model.predict(dataset_transf_train) 
debiased_test_pred = debiased_model.predict(dataset_transf_test)
display(Markdown("#### Mean differences on debiased dataset"))
print_binary_label_metrics(debiased_train_pred, debiased_test_pred)


display(Markdown("#### Debiasing training classification metrics"))
print_classification_metrics(dataset_transf_train, debiased_train_pred)

display(Markdown("#### Debiasing test classification metrics"))
print_classification_metrics(dataset_transf_test, debiased_test_pred)


epoch 0; iter: 0; batch classifier loss: 110.634178; batch adversarial loss: 0.495368
epoch 1; iter: 0; batch classifier loss: 77.749153; batch adversarial loss: 0.476633
epoch 2; iter: 0; batch classifier loss: 46.263870; batch adversarial loss: 0.520275
epoch 3; iter: 0; batch classifier loss: 56.376030; batch adversarial loss: 0.588132
epoch 4; iter: 0; batch classifier loss: 31.395973; batch adversarial loss: 0.571279
epoch 5; iter: 0; batch classifier loss: 55.675900; batch adversarial loss: 0.583158
epoch 6; iter: 0; batch classifier loss: 52.526287; batch adversarial loss: 0.555581
epoch 7; iter: 0; batch classifier loss: 37.205658; batch adversarial loss: 0.544708
epoch 8; iter: 0; batch classifier loss: 40.441658; batch adversarial loss: 0.561097
epoch 9; iter: 0; batch classifier loss: 52.100716; batch adversarial loss: 0.560176
epoch 10; iter: 0; batch classifier loss: 56.181961; batch adversarial loss: 0.564439
epoch 11; iter: 0; batch classifier loss: 52.194016; batch adve

#### Mean differences on debiased dataset

Training set: Difference in mean outcomes between unprivileged and privileged groups = 0.000000
Test set: Difference in mean outcomes between unprivileged and privileged groups = 0.000000


#### Debiasing training classification metrics

Classification accuracy = 0.700000
Balanced classification accuracy = 0.500000
Statistical parity difference = 0.000000
Disparate impact = 1.000000
Equal opportunity difference = 0.000000
Average odds difference = 0.000000
Theil_index = 0.057550


#### Debiasing test classification metrics

Classification accuracy = 0.700000
Balanced classification accuracy = 0.500000
Statistical parity difference = 0.000000
Disparate impact = 1.000000
Equal opportunity difference = 0.000000
Average odds difference = 0.000000
Theil_index = 0.057550


Once again, the debiased model produces exactly the same results for both the training and testing datasets. This is extremely unexpected and requires some digging.

In [0]:
print("Training original dataset labels mean = ", dataset_orig_train.labels.mean())
print("Test original dataset labels mean = ", dataset_orig_test.labels.mean())


print("Training undebiased labels mean = ", undebiased_train_pred.labels.mean())
print("Test undebiased labels mean = ", undebiased_test_pred.labels.mean())

print("Training debiased labels mean = ", debiased_train_pred.labels.mean())
print("Test debiased labels mean = ", debiased_test_pred.labels.mean())

Training original dataset labels mean =  1.3
Test original dataset labels mean =  1.3
Training undebiased labels mean =  1.0
Test undebiased labels mean =  1.0
Training debiased labels mean =  1.0
Test debiased labels mean =  1.0


We can now see our problem. The German dataset encodes a favorable outcome as 1 and an unfavorable outcome as 2, so a label mean of 1.3 corresponds to a 70% rate of favorable outcomes, and our shuffled split happens to give exactly that rate in both the training and test data. A predicted label mean of 1.0 means that both models default to predicting the favorable outcome for every row, which is why we get these seemingly perfectly unbiased predictions: accuracy equals the 70% base rate and balanced accuracy is 0.5.

Moving on, let's try Reject Option Classification on these debiased outcomes to see if it brings any improvement on this dataset.
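The effect of a classifier that always predicts the favorable outcome can be reproduced on a toy example. The sketch below uses plain Python with labels coded 1 (favorable) and 0 (unfavorable) for simplicity; the data and variable names are made up, with a 70% favorable base rate chosen to mirror the situation above.

```python
# Toy ground truth with a 70% favorable base rate, and two groups.
toy_true = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
toy_groups = [1, 1, 1, 1, 0, 0, 0, 1, 1, 0]

# A degenerate model that predicts "favorable" for every instance.
toy_pred = [1] * len(toy_true)

n = len(toy_true)
accuracy = sum(t == p for t, p in zip(toy_true, toy_pred)) / n

positives = sum(t == 1 for t in toy_true)
negatives = n - positives
tpr = sum(t == 1 and p == 1 for t, p in zip(toy_true, toy_pred)) / positives
tnr = sum(t == 0 and p == 0 for t, p in zip(toy_true, toy_pred)) / negatives
balanced_accuracy = 0.5 * (tpr + tnr)


def selection_rate(group_value):
    """Share of one group that is predicted favorable."""
    preds = [p for p, g in zip(toy_pred, toy_groups) if g == group_value]
    return sum(preds) / len(preds)


parity_difference = selection_rate(0) - selection_rate(1)
impact_ratio = selection_rate(0) / selection_rate(1)

print(accuracy, balanced_accuracy, parity_difference, impact_ratio)
```

Every group is selected at a rate of 100%, so the group fairness metrics look perfect even though the model has learned nothing; the balanced accuracy of 0.5 is the signal that gives this away.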

In [0]:
ROC = RejectOptionClassification(unprivileged_groups=unprivileged_groups, privileged_groups=privileged_groups)
ROC = ROC.fit(dataset_transf_train, debiased_train_pred)

print("Train: Optimal classification threshold = %.4f" % ROC.classification_threshold)

roc_train_pred = ROC.predict(debiased_train_pred)
roc_test_pred = ROC.predict(debiased_test_pred)

display(Markdown("#### ROC training classification metrics"))
print_classification_metrics(debiased_train_pred, roc_train_pred)

display(Markdown("#### ROC testing classification metrics"))
print_classification_metrics(debiased_test_pred, roc_test_pred)

Train: Optimal classification threshold = 0.9900


#### ROC training classification metrics

Classification accuracy = 0.972669
Balanced classification accuracy = nan
Statistical parity difference = -0.008218
Disparate impact = 0.991562
Equal opportunity difference = -0.008218
Average odds difference = nan
Theil_index = 0.027518




#### ROC testing classification metrics

Classification accuracy = 0.983297
Balanced classification accuracy = nan
Statistical parity difference = -0.012837
Disparate impact = 0.986965
Equal opportunity difference = -0.012837
Average odds difference = nan
Theil_index = 0.016807


Ironically, we are now actually less fair: the statistical parity and equal opportunity differences are now slightly skewed in favor of the privileged group. We also notice that two fields print "nan". Here the reference dataset is the debiased model's predictions, which contain only favorable labels, so there are no true negatives or false positives; metrics that rely on the true negative or false positive rate divide zero by zero and are undefined.

Just for fun, we decided to also try this pipeline on the Compas dataset, which is used to study predictions of a defendant's likelihood of reoffending. Given the recent controversy surrounding racism in America, we thought it would be very interesting to observe the effect of race on bias in our pipeline and to try to improve fairness.
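The "nan" values can be reproduced with a reference dataset that has no unfavorable labels. In the sketch below, `safe_ratio` is a hypothetical helper that imitates NumPy-style floating point division, where 0/0 yields NaN instead of raising an error; the labels are made up and coded 1 (favorable) and 0 (unfavorable).

```python
import math


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


reference = [1, 1, 1, 1, 1]   # every reference label is favorable
predicted = [1, 1, 0, 1, 1]   # post-processed predictions

tp = sum(r == 1 and p == 1 for r, p in zip(reference, predicted))
fn = sum(r == 1 and p == 0 for r, p in zip(reference, predicted))
tn = sum(r == 0 and p == 0 for r, p in zip(reference, predicted))
fp = sum(r == 0 and p == 1 for r, p in zip(reference, predicted))

tpr = safe_ratio(tp, tp + fn)   # well defined: there are positives
tnr = safe_ratio(tn, tn + fp)   # 0 / 0: there are no negatives at all
balanced_accuracy = 0.5 * (tpr + tnr)

print(tpr, tnr, balanced_accuracy, math.isnan(balanced_accuracy))
```

Any metric built on the true negative or false positive rate inherits the NaN, which matches the two undefined fields in the output above.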

In [0]:
urllib.request.urlretrieve ("https://raw.githubusercontent.com/propublica/compas-analysis/master/compas-scores-two-years.csv", "/usr/local/lib/python3.6/dist-packages/aif360/data/raw/compas/compas-scores-two-years.csv")
dataset_orig = load_preproc_data_compas(['race'])
dataset_orig_train, dataset_orig_test = dataset_orig.split([0.7], shuffle=True)
privileged_groups = [{'race': 1}]
unprivileged_groups = [{'race': 0}]

metric_orig_train = BinaryLabelDatasetMetric(dataset_orig_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
display(Markdown("#### Original training dataset"))
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train.mean_difference())

metric_orig_test = BinaryLabelDatasetMetric(dataset_orig_test, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
display(Markdown("#### Original testing dataset"))
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_test.mean_difference())

RW = Reweighing(unprivileged_groups=unprivileged_groups,
                privileged_groups=privileged_groups)
dataset_transf_train = RW.fit_transform(dataset_orig_train)
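# Note: fit_transform refits the weights on the test set itself;
# RW.transform(dataset_orig_test) would instead reuse the weights learned from the training set.
dataset_transf_test = RW.fit_transform(dataset_orig_test)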

metric_transf_train = BinaryLabelDatasetMetric(dataset_transf_train, 
                                               unprivileged_groups=unprivileged_groups,
                                               privileged_groups=privileged_groups)
display(Markdown("#### Transformed training dataset"))
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_transf_train.mean_difference())

display(Markdown("#### Mean differences on original dataset"))
print_binary_label_metrics(dataset_orig_train, dataset_orig_test)

display(Markdown("#### Mean differences on reweighted dataset"))
print_binary_label_metrics(dataset_transf_train, dataset_transf_test)

print_classification_metrics(dataset_orig_train, dataset_transf_train)
print_classification_metrics(dataset_orig_test, dataset_transf_test)



#### Original training dataset

Difference in mean outcomes between unprivileged and privileged groups = -0.147152


#### Original testing dataset

Difference in mean outcomes between unprivileged and privileged groups = -0.098170


#### Transformed training dataset

Difference in mean outcomes between unprivileged and privileged groups = 0.000000


#### Mean differences on original dataset

Training set: Difference in mean outcomes between unprivileged and privileged groups = -0.147152
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.098170


#### Mean differences on reweighted dataset

Training set: Difference in mean outcomes between unprivileged and privileged groups = 0.000000
Test set: Difference in mean outcomes between unprivileged and privileged groups = 0.000000


Here we see how reweighing changes the bias in our training and test datasets. Before reweighing, the mean outcome differs by about 15% on the training set and about 10% on the test set, in favor of the privileged group. After reweighing, the difference in mean outcomes between the two groups is zero on both sets. Now, let's see how a plain classifier, with adversarial debiasing turned off, performs on our transformed datasets.
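
The zero difference after reweighing is not a coincidence. As a rough illustration, the sketch below uses the standard reweighing formula, weight(g, y) = N_g * N_y / (N * N_gy), on a tiny made-up dataset. The helper names and the toy data are assumptions for illustration only, and the code is not taken from AIF360.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """One weight per row: expected count of (group, label) over observed count."""
    n = len(labels)
    n_g = Counter(groups)
    n_y = Counter(labels)
    n_gy = Counter(zip(groups, labels))
    return [n_g[g] * n_y[y] / (n * n_gy[(g, y)]) for g, y in zip(groups, labels)]

def weighted_favorable_rate(groups, labels, weights, group, favorable=1):
    """Weighted share of favorable labels inside one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    fav = sum(w for g, y, w in zip(groups, labels, weights)
              if g == group and y == favorable)
    return fav / total

# Hypothetical toy data: group 1 is privileged, group 0 is unprivileged.
groups = [1, 1, 1, 1, 0, 0, 0, 0]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

uniform = [1.0] * len(labels)
weights = reweighing_weights(groups, labels)

before = (weighted_favorable_rate(groups, labels, uniform, 0)
          - weighted_favorable_rate(groups, labels, uniform, 1))
after = (weighted_favorable_rate(groups, labels, weights, 0)
         - weighted_favorable_rate(groups, labels, weights, 1))

print("mean difference before reweighing:", before)  # -0.5
print("mean difference after reweighing: %.6f" % abs(after))
```

With these weights, each group's weighted favorable rate equals the overall favorable rate, so the weighted mean difference collapses to zero (up to floating-point error) on whatever dataset the weights were fitted to.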

In [0]:
sess = tf.Session()
undebiased_model = AdversarialDebiasing(privileged_groups = privileged_groups, unprivileged_groups = unprivileged_groups, scope_name='plain_classifier', debias=False, sess=sess)
undebiased_model.fit(dataset_transf_train)
undebiased_train_pred = undebiased_model.predict(dataset_transf_train) 
undebiased_test_pred = undebiased_model.predict(dataset_transf_test)
display(Markdown("#### Mean differences on undebiased dataset"))
print_binary_label_metrics(undebiased_train_pred, undebiased_test_pred)


display(Markdown("#### Undebiased training classification metrics"))
print_classification_metrics(dataset_transf_train, undebiased_train_pred)

display(Markdown("#### Undebiased test classification metrics"))
print_classification_metrics(dataset_transf_test, undebiased_test_pred)


epoch 0; iter: 0; batch classifier loss: 0.709799
epoch 1; iter: 0; batch classifier loss: 0.661800
epoch 2; iter: 0; batch classifier loss: 0.632821
epoch 3; iter: 0; batch classifier loss: 0.605321
epoch 4; iter: 0; batch classifier loss: 0.686599
epoch 5; iter: 0; batch classifier loss: 0.614292
epoch 6; iter: 0; batch classifier loss: 0.600114
epoch 7; iter: 0; batch classifier loss: 0.605269
epoch 8; iter: 0; batch classifier loss: 0.567148
epoch 9; iter: 0; batch classifier loss: 0.649088
epoch 10; iter: 0; batch classifier loss: 0.612801
epoch 11; iter: 0; batch classifier loss: 0.624373
epoch 12; iter: 0; batch classifier loss: 0.629660
epoch 13; iter: 0; batch classifier loss: 0.651005
epoch 14; iter: 0; batch classifier loss: 0.605420
epoch 15; iter: 0; batch classifier loss: 0.571757
epoch 16; iter: 0; batch classifier loss: 0.662565
epoch 17; iter: 0; batch classifier loss: 0.641205
epoch 18; iter: 0; batch classifier loss: 0.601421
epoch 19; iter: 0; batch classifier loss:

#### Mean differences on undebiased dataset

Training set: Difference in mean outcomes between unprivileged and privileged groups = -0.286749
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.263062


#### Undebiased training classification metrics

Classification accuracy = 0.652080
Balanced classification accuracy = 0.649831
Statistical parity difference = -0.286749
Disparate impact = 0.605566
Equal opportunity difference = -0.268766
Average odds difference = -0.287616
Theil_index = 0.210318


#### Undebiased test classification metrics

Classification accuracy = 0.647825
Balanced classification accuracy = 0.644367
Statistical parity difference = -0.263062
Disparate impact = 0.630762
Equal opportunity difference = -0.206288
Average odds difference = -0.268653
Theil_index = 0.233372


From a bias standpoint, these metrics are pretty bad and indicate high bias in favor of the privileged group. The similarly negative values for statistical parity difference, equal opportunity difference, and average odds difference all point to a much higher likelihood of a favorable outcome for Caucasians in our model, and the low disparate impact supports this. Unlike our last dataset, the training and test metrics differ here, which suggests that we have a more randomized split. Now, let's turn on debiasing in our adversarial model.
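
Statistical parity difference and disparate impact are two views of the same pair of numbers: the favorable-outcome rates of the two groups. The short sketch below assumes the usual definitions (difference and ratio of the unprivileged rate to the privileged rate) and recovers both rates from the training values printed above. The variable names are ours, not AIF360's.

```python
# Values copied from the undebiased training output above.
spd = -0.286749  # statistical parity difference = p_unpriv - p_priv
di = 0.605566    # disparate impact              = p_unpriv / p_priv

# Solve the two equations for the two unknown rates.
p_priv = spd / (di - 1.0)
p_unpriv = di * p_priv

print("favorable rate, privileged group:   %.3f" % p_priv)    # 0.727
print("favorable rate, unprivileged group: %.3f" % p_unpriv)  # 0.440
```

Read this way, the undebiased model predicts the favorable outcome for roughly 73% of the privileged group but only about 44% of the unprivileged group on the training set.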

In [0]:
sess.close()
tf.reset_default_graph()
sess = tf.Session()

debiased_model = AdversarialDebiasing(privileged_groups = privileged_groups, unprivileged_groups = unprivileged_groups, scope_name='debiased_classifier', debias=True, sess=sess)
debiased_model.fit(dataset_transf_train)
debiased_train_pred = debiased_model.predict(dataset_transf_train) 
debiased_test_pred = debiased_model.predict(dataset_transf_test)
display(Markdown("#### Mean differences on debiased dataset"))
print_binary_label_metrics(debiased_train_pred, debiased_test_pred)


display(Markdown("#### Debiasing training classification metrics"))
print_classification_metrics(dataset_transf_train, debiased_train_pred)

display(Markdown("#### Debiasing test classification metrics"))
print_classification_metrics(dataset_transf_test, debiased_test_pred)


epoch 0; iter: 0; batch classifier loss: 0.677774; batch adversarial loss: 0.716305
epoch 1; iter: 0; batch classifier loss: 0.628508; batch adversarial loss: 0.715566
epoch 2; iter: 0; batch classifier loss: 0.625697; batch adversarial loss: 0.690304
epoch 3; iter: 0; batch classifier loss: 0.615126; batch adversarial loss: 0.676155
epoch 4; iter: 0; batch classifier loss: 0.574680; batch adversarial loss: 0.634857
epoch 5; iter: 0; batch classifier loss: 0.598757; batch adversarial loss: 0.678938
epoch 6; iter: 0; batch classifier loss: 0.665726; batch adversarial loss: 0.705665
epoch 7; iter: 0; batch classifier loss: 0.583390; batch adversarial loss: 0.686738
epoch 8; iter: 0; batch classifier loss: 0.631026; batch adversarial loss: 0.667654
epoch 9; iter: 0; batch classifier loss: 0.604111; batch adversarial loss: 0.659266
epoch 10; iter: 0; batch classifier loss: 0.640399; batch adversarial loss: 0.668405
epoch 11; iter: 0; batch classifier loss: 0.611143; batch adversarial loss:

#### Mean differences on debiased dataset

Training set: Difference in mean outcomes between unprivileged and privileged groups = 0.079449
Test set: Difference in mean outcomes between unprivileged and privileged groups = 0.096882


#### Debiasing training classification metrics

Classification accuracy = 0.656641
Balanced classification accuracy = 0.656085
Statistical parity difference = 0.079449
Disparate impact = 1.168391
Equal opportunity difference = 0.090928
Average odds difference = 0.078896
Theil_index = 0.244396


#### Debiasing test classification metrics

Classification accuracy = 0.658337
Balanced classification accuracy = 0.656715
Statistical parity difference = 0.096882
Disparate impact = 1.204878
Equal opportunity difference = 0.145139
Average odds difference = 0.092130
Theil_index = 0.248094


These are great results! A disparate impact above 1 and positive values for all other key bias metrics indicate slightly more favorable outcomes for non-Caucasians in our model. The magnitudes are also much smaller than in the undebiased model (a statistical parity difference of about 0.08 to 0.10, versus about -0.26 to -0.29), so we can see that debiasing is working effectively. Also worth noting: we are improving fairness without trading off accuracy, which stays at about 65%. Next, let's train the debiased model on the original dataset to see whether our reweighing pre-processing step is actually helping or is unnecessary.

In [0]:
sess.close()
tf.reset_default_graph()
sess = tf.Session()

debiased_model = AdversarialDebiasing(privileged_groups = privileged_groups, unprivileged_groups = unprivileged_groups, scope_name='debiased_classifier', debias=True, sess=sess)
debiased_model.fit(dataset_orig_train)
debiased_train_pred = debiased_model.predict(dataset_orig_train) 
debiased_test_pred = debiased_model.predict(dataset_orig_test)
display(Markdown("#### Mean differences on debiased dataset WITHOUT pre-processing"))
print_binary_label_metrics(debiased_train_pred, debiased_test_pred)


display(Markdown("#### Debiasing training classification metrics WITHOUT pre-processing"))
print_classification_metrics(dataset_orig_train, debiased_train_pred)

display(Markdown("#### Debiasing test classification metrics WITHOUT pre-processing"))
print_classification_metrics(dataset_orig_test, debiased_test_pred)


epoch 0; iter: 0; batch classifier loss: 0.718103; batch adversarial loss: 0.835899
epoch 1; iter: 0; batch classifier loss: 0.665195; batch adversarial loss: 0.830785
epoch 2; iter: 0; batch classifier loss: 0.697499; batch adversarial loss: 0.840313
epoch 3; iter: 0; batch classifier loss: 0.654141; batch adversarial loss: 0.766697
epoch 4; iter: 0; batch classifier loss: 0.613985; batch adversarial loss: 0.827360
epoch 5; iter: 0; batch classifier loss: 0.752280; batch adversarial loss: 0.732054
epoch 6; iter: 0; batch classifier loss: 0.661658; batch adversarial loss: 0.754647
epoch 7; iter: 0; batch classifier loss: 0.699477; batch adversarial loss: 0.747449
epoch 8; iter: 0; batch classifier loss: 0.579388; batch adversarial loss: 0.772329
epoch 9; iter: 0; batch classifier loss: 0.660209; batch adversarial loss: 0.729917
epoch 10; iter: 0; batch classifier loss: 0.617608; batch adversarial loss: 0.710395
epoch 11; iter: 0; batch classifier loss: 0.626631; batch adversarial loss:

#### Mean differences on debiased dataset WITHOUT pre-processing

Training set: Difference in mean outcomes between unprivileged and privileged groups = -0.226713
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.187631


#### Debiasing training classification metrics WITHOUT pre-processing

Classification accuracy = 0.671630
Balanced classification accuracy = 0.667824
Statistical parity difference = -0.226713
Disparate impact = 0.687640
Equal opportunity difference = -0.159519
Average odds difference = -0.183673
Theil_index = 0.191776


#### Debiasing test classification metrics WITHOUT pre-processing

Classification accuracy = 0.667298
Balanced classification accuracy = 0.660438
Statistical parity difference = -0.187631
Disparate impact = 0.734117
Equal opportunity difference = -0.117409
Average odds difference = -0.163888
Theil_index = 0.205933


These results are a clear step in the wrong direction. The negative values for the difference statistics show a lack of parity between the privileged and unprivileged groups. Adversarial debiasing on the original dataset offers a slight improvement in fairness over the undebiased model, but it is apparent that pre-processing is key to achieving desirable results. While reweighing combined with adversarial debiasing gave us much fairer results, our accuracy hovers around 65%, which isn't great. Next, let's look at reject option classification to try to maintain fairness while also improving accuracy.
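
To compare the three runs side by side, the sketch below collects the test-set numbers printed above and folds disparate impact into the range (0, 1], so that bias in either direction is judged the same way. The 0.8 cutoff is the common "four-fifths" rule of thumb, which is our assumption and not something the notebook uses. The run labels are ours.

```python
# Test-set metrics copied from the three runs above.
runs = {
    "reweighed, no debiasing": {"spd": -0.263062, "di": 0.630762, "acc": 0.647825},
    "reweighed + debiasing": {"spd": 0.096882, "di": 1.204878, "acc": 0.658337},
    "original + debiasing": {"spd": -0.187631, "di": 0.734117, "acc": 0.667298},
}

def symmetric_di(di):
    """Fold disparate impact into (0, 1] so 1.25 and 0.8 are treated alike."""
    return min(di, 1.0 / di)

FOUR_FIFTHS = 0.8  # assumed rule-of-thumb threshold, not from the notebook

summary = {}
for name, m in runs.items():
    folded = symmetric_di(m["di"])
    passes = folded >= FOUR_FIFTHS
    summary[name] = (abs(m["spd"]), folded, passes)
    print(f"{name:24s} |SPD|={abs(m['spd']):.3f}  folded DI={folded:.3f}  "
          f"passes 4/5: {passes}  accuracy={m['acc']:.3f}")

# The run with the smallest absolute statistical parity difference.
fairest = min(summary, key=lambda k: summary[k][0])
print("Smallest |SPD|:", fairest)
```

Under these assumptions only the reweighed and debiased run clears the four-fifths threshold, while accuracy is nearly the same across all three runs.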

In [0]:
ROC = RejectOptionClassification(unprivileged_groups=unprivileged_groups, privileged_groups=privileged_groups)
ROC = ROC.fit(dataset_transf_train, debiased_train_pred)

print("Train: Optimal classification threshold = %.4f" % ROC.classification_threshold)

roc_train_pred = ROC.predict(debiased_train_pred)
roc_test_pred = ROC.predict(debiased_test_pred)

display(Markdown("#### ROC training classification metrics"))
print_classification_metrics(debiased_train_pred, roc_train_pred)

display(Markdown("#### ROC testing classification metrics"))
print_classification_metrics(debiased_test_pred, roc_test_pred)

Train: Optimal classification threshold = 0.4951


#### ROC training classification metrics

Classification accuracy = 0.983360
Balanced classification accuracy = 0.982694
Statistical parity difference = 0.038141
Disparate impact = 1.074330
Equal opportunity difference = 0.000000
Average odds difference = -0.039104
Theil_index = 0.006142


#### ROC testing classification metrics

Classification accuracy = 0.985904
Balanced classification accuracy = 0.984935
Statistical parity difference = 0.060576
Disparate impact = 1.118966
Equal opportunity difference = 0.000000
Average odds difference = -0.034438
Theil_index = 0.005197


Reject option classifiers are supposed to swap predictions in favor of the unprivileged group when predictions are near the decision boundary. Ironically, doing this actually reduced the bias in favor of the unprivileged group compared to the model we built earlier. Does that mean our results are worse? No, I don't think so, because there is a marked improvement in the reported accuracy. One caveat: the metrics cell compares `roc_train_pred` and `roc_test_pred` against `debiased_train_pred` and `debiased_test_pred`, so the accuracy above 0.98 measures agreement with the debiased model's predictions rather than with the true labels. With that caveat, the classifier shows little evidence of bias per the key bias metrics (an equal opportunity difference of 0 and a disparate impact close to 1). In summary, our pipeline aims for the best of both worlds: fair models that mitigate bias without sacrificing accuracy.
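
The swapping rule described above can be sketched in a few lines of NumPy. This is a simplified illustration, not AIF360's `RejectOptionClassification`: the function name, the fixed `threshold` and `margin` values, and the toy scores are all assumptions, whereas the library searches for the threshold and margin during `fit`.

```python
import numpy as np

def reject_option_predict(scores, is_privileged, threshold=0.5, margin=0.1):
    """Label 1 is favorable. Inside the critical region around the threshold,
    unprivileged rows get the favorable label and privileged rows do not."""
    scores = np.asarray(scores, dtype=float)
    is_privileged = np.asarray(is_privileged, dtype=bool)

    # Ordinary thresholding outside the critical region.
    preds = (scores > threshold).astype(int)

    # Rows whose score is within `margin` of the decision boundary.
    critical = np.abs(scores - threshold) <= margin
    preds[critical & ~is_privileged] = 1
    preds[critical & is_privileged] = 0
    return preds

# Hypothetical scores: first three rows privileged, last three unprivileged.
scores = np.array([0.95, 0.55, 0.45, 0.10, 0.55, 0.45])
is_priv = np.array([True, True, True, False, False, False])

roc_preds = reject_option_predict(scores, is_priv)
print(roc_preds)  # [1 0 0 0 1 1]
```

Only the four scores within 0.1 of the threshold change hands; confident predictions (0.95 and 0.10) are left alone, which is why this post-processing step usually alters relatively few predictions.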