# Assignment: Fairness
## Objective
In this assignment, we walk through the steps for identifying and mitigating bias with the tools available in the AI Fairness 360 (aif360) library.

First, we will install the aif360 library.

In [1]:
# Install aif360
!pip install aif360

Collecting aif360
  Downloading https://files.pythonhosted.org/packages/99/54/09e0674fc1370072385d64e0282eff0857e3d78c3abd7d6471200cf7a00d/aif360-0.2.2-py2.py3-none-any.whl (56.4MB)
Installing collected packages: aif360
Successfully installed aif360-0.2.2


As the next step, we will import the required dataset and classes from the aif360 library.

In [2]:
# import Classes
import numpy as np

from aif360.datasets import GermanDataset

from aif360.metrics import BinaryLabelDatasetMetric

from aif360.algorithms.preprocessing import Reweighing

from IPython.display import Markdown, display

## Age based identification and mitigation

In this step, we will define the attribute used for identifying the privileged class. In this instance, we hypothesise that loan applicants aged 25 and older are privileged when applying for loans, and that younger applicants are biased against. We will also drop the gender and personal status features so that they do not influence the bias calculations.

In [3]:
# Set criteria for privileged class
dataset_orig = GermanDataset(

    protected_attribute_names=['age'],                           

    privileged_classes=[lambda x: x >= 25],     

    features_to_drop=['personal_status', 'sex'] 

   )
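The effect of the privileged_classes rule can be illustrated with a minimal sketch in plain Python. It assumes, as the group definitions later in this notebook suggest, that the rule turns the age column into a binary attribute where 1 marks the privileged group and 0 the unprivileged group. The ages below are hypothetical and are not taken from the German credit data.

```python
# Hypothetical ages, used only to illustrate the rule.
ages = [19, 24, 25, 40, 67]

# Same rule as the privileged_classes lambda in the cell above.
is_privileged = lambda x: x >= 25

# 1.0 marks the privileged group, 0.0 the unprivileged group.
age_binary = [1.0 if is_privileged(a) else 0.0 for a in ages]
print(age_binary)  # [0.0, 0.0, 1.0, 1.0, 1.0]
```

Note that an applicant aged exactly 25 falls into the privileged group, because the rule uses >= rather than >.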

For the purpose of training and testing, the dataset is split on a 70:30 basis, as follows:

In [4]:
dataset_orig_train, dataset_orig_test = dataset_orig.split([0.7], shuffle=True)
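What a shuffled 70:30 split does can be sketched in plain Python as follows. This is an illustration under assumptions, not the aif360 implementation: the helper split_70_30 and its seed argument are hypothetical names, and the list of integers stands in for the 1000 records of the German credit dataset.

```python
import random

def split_70_30(rows, seed=0):
    """Shuffle a copy of rows and cut it at the 70% mark (hypothetical helper)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # fixed seed makes the split repeatable
    cut = (7 * len(shuffled)) // 10         # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]

rows = list(range(1000))  # stand-in for the dataset records
train, test = split_70_30(rows)
print(len(train), len(test))  # 700 300
```

Because the cell above shuffles the data, the metric values reported later in this notebook are likely to differ slightly from run to run unless the shuffle is made repeatable.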

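We will now define the privileged and unprivileged groups based on the age criterion we set earlier and assign them to variables. In the preprocessed dataset, an age value of 1 denotes applicants aged 25 and older, and 0 denotes younger applicants.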

In [6]:
privileged_groups = [{'age': 1}]

unprivileged_groups = [{'age': 0}]
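Each group is written as a list of dictionaries. The sketch below shows one way to read that format, under the assumption that a record belongs to a group list when it matches every key/value pair of at least one dictionary in the list. The helper in_groups and the sample records are hypothetical and are not part of aif360.

```python
def in_groups(record, groups):
    """True if record matches all key/value pairs of at least one group dict."""
    return any(
        all(record[key] == value for key, value in group.items())
        for group in groups
    )

privileged_groups = [{'age': 1}]
unprivileged_groups = [{'age': 0}]

# Hypothetical records holding only the protected attribute.
records = [{'age': 1}, {'age': 0}, {'age': 1}]

print([in_groups(r, privileged_groups) for r in records])    # [True, False, True]
print([in_groups(r, unprivileged_groups) for r in records])  # [False, True, False]
```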

In this step, we will instantiate the BinaryLabelDatasetMetric class and call its mean_difference method to compute the difference in the rate of favorable outcomes between the two age groups, where the favorable outcome is being classified as a low/no risk rather than a high risk for defaulting on payments. The value is the rate for the unprivileged group minus the rate for the privileged group, so a negative number means the privileged group fares better. We notice that applicants aged 25 and older receive a favorable classification about 13 percentage points more often than younger applicants, indicating a bias in the training data.

In [7]:
metric_orig_train = BinaryLabelDatasetMetric(dataset_orig_train, 

                                             unprivileged_groups=unprivileged_groups,

                                             privileged_groups=privileged_groups)

display(Markdown("#### Original training dataset"))

print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train.mean_difference())

#### Original training dataset

Difference in mean outcomes between unprivileged and privileged groups = -0.130527
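The metric printed above can be reproduced by hand on a small example. The following is a minimal sketch, assuming the mean difference is the favorable-outcome rate of the unprivileged group minus that of the privileged group. The function is a hypothetical stand-in rather than the library code, and the toy labels use 1 for the favorable outcome, which may not match the label encoding of the dataset.

```python
def mean_difference(labels, groups, favorable=1):
    """P(favorable | unprivileged) - P(favorable | privileged)."""
    def rate(group_value):
        members = [y for y, g in zip(labels, groups) if g == group_value]
        return sum(1 for y in members if y == favorable) / len(members)
    return rate(0) - rate(1)

# Hypothetical toy data: label 1 = favorable, group 1 = privileged.
labels = [1, 0, 0, 1, 1, 1, 0, 1]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

print(mean_difference(labels, groups))  # -0.25
```

In the toy data, 2 of 4 unprivileged records and 3 of 4 privileged records have the favorable label, so the result is 0.5 - 0.75 = -0.25. A value of zero would mean both groups receive the favorable outcome at the same rate.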


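We will now mitigate the age bias using the Reweighing class. Reweighing leaves the features and labels unchanged; it assigns a weight to each training instance so that, in the weighted data, the outcome is independent of the protected attribute. We will then transform the original training dataset by using the fit_transform method.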

In [8]:
RW = Reweighing(unprivileged_groups=unprivileged_groups,

                privileged_groups=privileged_groups)

dataset_transf_train = RW.fit_transform(dataset_orig_train)  

In order to check the efficacy of the aif360 algorithm, we will check for bias in the transformed dataset. As is evident, the weighted mean difference between the two age groups is now zero.

In [9]:
metric_transf_train = BinaryLabelDatasetMetric(dataset_transf_train, 

                                               unprivileged_groups=unprivileged_groups,

                                               privileged_groups=privileged_groups)

display(Markdown("#### Transformed training dataset"))

print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_transf_train.mean_difference())

#### Transformed training dataset

Difference in mean outcomes between unprivileged and privileged groups = -0.000000
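Why the difference drops to zero can be seen from the reweighing scheme itself. The sketch below assumes the standard formulation, in which each instance with group g and label y gets the weight P(g) * P(y) / P(g, y). It is an illustration on hypothetical toy data, not the aif360 implementation, and both function names are made up for this example.

```python
from collections import Counter

def reweighing_weights(labels, groups):
    """Weight per instance: expected joint frequency / observed joint frequency."""
    n = len(labels)
    n_group = Counter(groups)
    n_label = Counter(labels)
    n_joint = Counter(zip(groups, labels))
    return [n_group[g] * n_label[y] / (n * n_joint[(g, y)])
            for y, g in zip(labels, groups)]

def weighted_mean_difference(labels, groups, weights, favorable=1):
    """Weighted favorable rate of group 0 minus that of group 1."""
    def rate(group_value):
        total = sum(w for g, w in zip(groups, weights) if g == group_value)
        fav = sum(w for y, g, w in zip(labels, groups, weights)
                  if g == group_value and y == favorable)
        return fav / total
    return rate(0) - rate(1)

# Hypothetical toy data: label 1 = favorable, group 1 = privileged.
labels = [1, 0, 0, 1, 1, 1, 0, 1]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

weights = reweighing_weights(labels, groups)
print("%f" % weighted_mean_difference(labels, groups, [1.0] * 8))  # unweighted: -0.250000
print(abs(weighted_mean_difference(labels, groups, weights)) < 1e-12)  # True
```

Favorable records in the unprivileged group are weighted up and those in the privileged group are weighted down, so both groups end with the same weighted favorable rate. The result is zero only up to floating-point rounding, which is a plausible reason for the minus sign in the -0.000000 printed above.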


## Gender based identification and mitigation

In this section, we will identify and mitigate bias based on gender in the same way, with the hypothesis that female applicants are biased against.

In [24]:
dataset_orig_sex = GermanDataset(

    protected_attribute_names=['sex'],                           

    privileged_classes=[lambda x: x == 'male'],     

    features_to_drop=['personal_status', 'age'] 

   )

In [25]:
dataset_orig_train_sex, dataset_orig_test_sex = dataset_orig_sex.split([0.7], shuffle=True)

This time, we will define the privileged and unprivileged groups based on gender and assign them to variables.

In [26]:
privileged_groups = [{'sex': 1}]

unprivileged_groups = [{'sex': 0}]

Sure enough, we notice that male applicants receive a favorable classification about 5.5 percentage points more often than female applicants, indicating a bias in the training data.

In [28]:
metric_orig_train_sex = BinaryLabelDatasetMetric(dataset_orig_train_sex, 

                                             unprivileged_groups=unprivileged_groups,

                                             privileged_groups=privileged_groups)

display(Markdown("#### Original training dataset"))

print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train_sex.mean_difference())

#### Original training dataset

Difference in mean outcomes between unprivileged and privileged groups = -0.055217


We will now mitigate the gender bias using the Reweighing class, which assigns a weight to each training instance rather than changing the features. We will then transform the original training dataset by using the fit_transform method.

In [30]:
RW = Reweighing(unprivileged_groups=unprivileged_groups,

                privileged_groups=privileged_groups)

dataset_transf_train_sex = RW.fit_transform(dataset_orig_train_sex)  

On checking the bias metric, we now notice that the weighted mean difference between the two gender groups is zero.

In [31]:
metric_transf_train_sex = BinaryLabelDatasetMetric(dataset_transf_train_sex, 

                                               unprivileged_groups=unprivileged_groups,

                                               privileged_groups=privileged_groups)

display(Markdown("#### Transformed training dataset"))

print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_transf_train_sex.mean_difference())

#### Transformed training dataset

Difference in mean outcomes between unprivileged and privileged groups = 0.000000


## Report

This module of the course provided a revelatory insight into the influence of bias in machine learning and data science analysis. Bias can affect individuals or groups, and it can creep into the process through the training data, the algorithm, or both. To identify and mitigate such biases, AI Fairness 360 provides a set of tools that are effective and user friendly.
While evolving a bias mitigation strategy, data scientists and data stewards need to decide on the objective: whether the goal is to remove bias at the individual level or at the group level. Considerable thought also has to be given to the options available for mitigation (pre-processing, in-processing or post-processing), depending on the stage in the workflow.
The aif360 tool is extremely versatile and simple to deploy. In this assignment, a single pre-processing step with Reweighing brought the mean difference on the training data to zero for both age and gender.