### Example notebook: fairness analysis with the aif360 library

In [66]:
# Import necessary modules
import pandas as pd
from aif360.datasets import StandardDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.explainers import MetricTextExplainer


### Read and view the dataset

In [67]:
df = pd.read_csv('student-mat.csv', delimiter=';')
df.head()

,school,sex,age,address,famsize,Pstatus,Medu,Fedu,Mjob,Fjob,...,famrel,freetime,goout,Dalc,Walc,health,absences,G1,G2,G3
0,GP,F,18,U,GT3,A,4,4,at_home,teacher,...,4,3,4,1,1,3,6,5,6,6
1,GP,F,17,U,GT3,T,1,1,at_home,other,...,5,3,3,1,1,3,4,5,5,6
2,GP,F,15,U,LE3,T,1,1,at_home,other,...,4,3,2,2,3,3,10,7,8,10
3,GP,F,15,U,GT3,T,4,2,health,services,...,3,2,2,1,1,5,2,15,14,15
4,GP,F,16,U,GT3,T,3,3,other,other,...,4,3,2,1,2,5,4,6,10,10


### Drop any rows with NaNs

In [68]:
df.dropna(axis=0,inplace=True)
df.shape

(395, 33)

### Calculate the average grade across the three exams (grades range from 0 to 20)

In [69]:
df['G_avg'] = (df.G1.astype(float)+df.G2.astype(float)+df.G3.astype(float))/3

### Create a result column: 1 (good result) if G_avg >= 14, otherwise 0 (bad result)

In [70]:
df['result'] = df['G_avg'].apply(lambda x: 1 if x >= 14.0 else 0)
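
The thresholding step above can also be written as a vectorized comparison instead of a row-wise `apply`. The following is a minimal sketch on toy values (the `toy` frame and its numbers are illustrative assumptions, not rows from `student-mat.csv`):

```python
import pandas as pd

# Toy averages chosen to sit on both sides of the 14.0 threshold (hypothetical data).
toy = pd.DataFrame({"G_avg": [5.67, 14.0, 13.99, 15.33]})

# The comparison yields a boolean Series; casting to int gives the 0/1 label.
toy["result"] = (toy["G_avg"] >= 14.0).astype(int)

print(toy["result"].tolist())  # [0, 1, 0, 1]
```

Both forms should produce the same labels; the vectorized one avoids calling a Python function once per row.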

In [71]:
df[['result','G_avg']]

,result,G_avg
0,0,5.666667
1,0,5.333333
2,0,8.333333
3,1,14.666667
4,0,8.666667
...,...,...
390,0,9.000000
391,1,15.333333
392,0,8.333333
393,0,11.000000


In [72]:
df.result.value_counts()

0    314
1     81
Name: result, dtype: int64

### Select the internet access column to analyze fairness with respect to the target column (result)

In [73]:
# Take an explicit copy so that later modifications do not act on a view of df
sub_df = df[['internet','result']].copy()

### Convert the internet column to numeric

In [74]:
# Map 'yes'/'no' directly to floats and assign the result back to the column,
# rather than calling replace(..., inplace=True) on an attribute-accessed column
sub_df['internet'] = sub_df['internet'].map({'yes': 1.0, 'no': 0.0})
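
When converting a yes/no column to numbers, it can help to check that no unexpected values slipped through. The sketch below assumes a toy column (the value `'maybe'` is a made-up example of bad input): values missing from the mapping dictionary become NaN, which makes them easy to list.

```python
import pandas as pd

# Hypothetical input containing one value that is neither 'yes' nor 'no'.
toy = pd.DataFrame({"internet": ["yes", "no", "yes", "maybe"]})

# Series.map returns NaN for any value that is not a key of the dictionary.
mapped = toy["internet"].map({"yes": 1.0, "no": 0.0})

# Collect the original values that could not be converted.
unmapped = toy.loc[mapped.isna(), "internet"].unique().tolist()
print(unmapped)  # ['maybe']
```

An empty `unmapped` list indicates that every value was converted.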

Here, we create an AIF360 dataset by specifying the target column (result), the favorable class label (1, i.e. a good result) and the protected attribute (the internet column). As the privileged class, we choose internet == 1, which corresponds to the students who have internet access at home.

In [75]:
aif_dataset = StandardDataset(
    sub_df,
    label_name='result',
    favorable_classes=[1],
    protected_attribute_names=['internet'],
    privileged_classes=[lambda x: x == 1],
)

In [76]:
aif_dataset

               instance weights            features labels
                                protected attribute       
                                           internet       
instance names                                            
0                           1.0                 0.0    0.0
1                           1.0                 1.0    0.0
2                           1.0                 1.0    0.0
3                           1.0                 1.0    1.0
4                           1.0                 0.0    0.0
...                         ...                 ...    ...
390                         1.0                 0.0    0.0
391                         1.0                 1.0    1.0
392                         1.0                 0.0    0.0
393                         1.0                 1.0    0.0
394                         1.0                 1.0    0.0

[395 rows x 3 columns]

### Analyze the fairness of the dataset with respect to the internet column using the Statistical Parity Difference and Disparate Impact metrics

In [77]:
privileged_groups = [{'internet': 1}]
unprivileged_groups = [{'internet': 0}]

metric = BinaryLabelDatasetMetric(aif_dataset,
                                  unprivileged_groups=unprivileged_groups,
                                  privileged_groups=privileged_groups)
# mean_difference() is an alias of statistical_parity_difference()
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric.mean_difference())

Difference in mean outcomes between unprivileged and privileged groups = -0.118863
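
To make the number above easier to interpret, statistical parity difference can be reproduced by hand as the favorable-outcome rate of the unprivileged group minus that of the privileged group. This is a minimal sketch on a toy frame (hypothetical data, not the student dataset), not a substitute for the library's implementation:

```python
import pandas as pd

# Hypothetical data: 4 students with internet (1) and 4 without (0).
toy = pd.DataFrame({
    "internet": [1, 1, 1, 1, 0, 0, 0, 0],
    "result":   [1, 1, 0, 0, 1, 0, 0, 0],
})

# With 0/1 labels, the mean per group is the favorable-outcome rate.
rates = toy.groupby("internet")["result"].mean()

# Unprivileged rate (internet == 0) minus privileged rate (internet == 1).
spd = rates.loc[0] - rates.loc[1]
print(spd)  # -0.25
```

A negative value means the unprivileged group receives the favorable outcome less often; 0 would indicate parity.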


In [78]:
explainer_train = MetricTextExplainer(metric)

print(explainer_train.disparate_impact())

Disparate impact (probability of favorable outcome for unprivileged instances / probability of favorable outcome for privileged instances): 0.47153972153972157
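
Disparate impact is the ratio of the same two rates rather than their difference. The helper below is a hand-rolled sketch on toy data; the function name `disparate_impact` and its parameters are illustrative assumptions and not part of the aif360 API.

```python
import pandas as pd

def disparate_impact(frame, group_col, label_col,
                     unprivileged=0, privileged=1, favorable=1):
    """Ratio of favorable-outcome rates: unprivileged / privileged (illustrative helper)."""
    is_favorable = frame[label_col] == favorable
    rate_unpriv = is_favorable[frame[group_col] == unprivileged].mean()
    rate_priv = is_favorable[frame[group_col] == privileged].mean()
    return rate_unpriv / rate_priv

# Hypothetical data: 4 students with internet (1) and 4 without (0).
toy = pd.DataFrame({
    "internet": [1, 1, 1, 1, 0, 0, 0, 0],
    "result":   [1, 1, 0, 0, 1, 0, 0, 0],
})

di = disparate_impact(toy, "internet", "result")
print(di)  # 0.5
```

A value of 1 indicates equal rates. A ratio below 0.8 is often treated as a warning sign (the "four-fifths" rule of thumb), although the appropriate threshold depends on the application.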


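### Both metrics show that the privileged group (students who have internet access at home) receives a good result more often than the unprivileged group: the favorable-outcome rate of the unprivileged group is about 0.12 lower, and less than half (about 0.47 times) that of the privileged group.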