#### This notebook demonstrates the use of the adversarial debiasing algorithm to learn a fair classifier.
Adversarial debiasing [1] is an in-processing technique that trains a classifier to maximize prediction accuracy while simultaneously reducing an adversary's ability to determine the protected attribute from the predictions. This encourages a fair classifier, because the predictions carry less group information that the adversary can exploit. We will see how to use this algorithm to learn models with and without fairness constraints and apply them to the Adult dataset.
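
To make the idea concrete, here is a minimal sketch of the classifier-side gradient modification described in [1], written in plain Python rather than TensorFlow. The function name `debiased_gradient` and the weight `alpha` are illustrative assumptions and are not part of the AIF360 API; the library's actual training loop may differ in detail.

```python
import math


def debiased_gradient(grad_clf, grad_adv, alpha=0.1):
    """Combine classifier and adversary gradients (illustrative sketch).

    grad_clf: gradient of the classifier loss w.r.t. the classifier weights.
    grad_adv: gradient of the adversary loss w.r.t. the same weights.
    alpha:    hypothetical weight on the adversarial term.
    """
    norm = math.sqrt(sum(g * g for g in grad_adv))
    if norm == 0.0:
        # No adversarial signal: fall back to the plain classifier gradient.
        return list(grad_clf)
    unit = [g / norm for g in grad_adv]
    # Component of the classifier gradient that points along the adversary gradient.
    dot = sum(c * u for c, u in zip(grad_clf, unit))
    # Remove that component, then additionally step against the adversary gradient.
    return [c - dot * u - alpha * a for c, u, a in zip(grad_clf, unit, grad_adv)]


update = debiased_gradient([1.0, 1.0], [2.0, 0.0], alpha=0.1)
print(update)
```

Removing the projection keeps the classifier from moving in a direction that would help the adversary, and the `alpha` term pushes it to actively hurt the adversary.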

In [1]:
!which python

/home/lalor/miniconda3/envs/reddit/bin/python


In [2]:
import aif360
print(aif360.__version__)

0.6.0


In [3]:
%matplotlib inline
# Load all necessary packages
import sys
sys.path.append("../")
from aif360.datasets import BinaryLabelDataset
from aif360.datasets import AdultDataset, GermanDataset, CompasDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.metrics import ClassificationMetric
from aif360.metrics.utils import compute_boolean_conditioning_vector

from aif360.algorithms.preprocessing.optim_preproc_helpers.data_preproc_functions import load_preproc_data_adult, load_preproc_data_compas, load_preproc_data_german

from aif360.algorithms.inprocessing.adversarial_error_debiasing import AdversarialErrorDebiasing
from aif360.algorithms.inprocessing.adversarial_debiasing import AdversarialDebiasing

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler, MaxAbsScaler
from sklearn.metrics import accuracy_score

from IPython.display import Markdown, display
import matplotlib.pyplot as plt

import tensorflow.compat.v1 as tf
tf.disable_eager_execution()
tf.random.set_random_seed(42)


#### Load dataset and set options

In [4]:
# Get the dataset and split into train and test
dataset_orig = load_preproc_data_adult()

#privileged_groups = [{'race': 1}]
#unprivileged_groups = [{'race': 0}]

privileged_groups = [{'sex': 1}]
unprivileged_groups = [{'sex': 0}]

dataset_orig_train, dataset_orig_test = dataset_orig.split([0.7], shuffle=True)

In [5]:
# print out some labels, names, etc.
display(Markdown("#### Training Dataset shape"))
print(dataset_orig_train.features.shape)
display(Markdown("#### Favorable and unfavorable labels"))
print(dataset_orig_train.favorable_label, dataset_orig_train.unfavorable_label)
display(Markdown("#### Protected attribute names"))
print(dataset_orig_train.protected_attribute_names)
display(Markdown("#### Privileged and unprivileged protected attribute values"))
print(dataset_orig_train.privileged_protected_attributes, 
      dataset_orig_train.unprivileged_protected_attributes)
display(Markdown("#### Dataset feature names"))
print(dataset_orig_train.feature_names)

#### Training Dataset shape

(34189, 18)


#### Favorable and unfavorable labels

1.0 0.0


#### Protected attribute names

['sex', 'race']


#### Privileged and unprivileged protected attribute values

[array([1.]), array([1.])] [array([0.]), array([0.])]


#### Dataset feature names

['race', 'sex', 'Age (decade)=10', 'Age (decade)=20', 'Age (decade)=30', 'Age (decade)=40', 'Age (decade)=50', 'Age (decade)=60', 'Age (decade)=>=70', 'Education Years=6', 'Education Years=7', 'Education Years=8', 'Education Years=9', 'Education Years=10', 'Education Years=11', 'Education Years=12', 'Education Years=<6', 'Education Years=>12']


#### Metric for original training data

In [6]:
# Metric for the original dataset
metric_orig_train = BinaryLabelDatasetMetric(dataset_orig_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
display(Markdown("#### Original training dataset"))
print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train.mean_difference())
metric_orig_test = BinaryLabelDatasetMetric(dataset_orig_test, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_test.mean_difference())

#### Original training dataset

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.194507
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.194577
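
The mean difference printed above is the rate of favorable labels in the unprivileged group minus the rate in the privileged group, so a negative value means the unprivileged group receives the favorable label less often. A minimal pure-Python sketch on toy data (the helper names and the toy lists are illustrative, not AIF360 code):

```python
def favorable_rate(labels, protected, group, favorable=1.0):
    """Fraction of members of `group` whose label is the favorable one."""
    members = [y for y, a in zip(labels, protected) if a == group]
    return sum(1 for y in members if y == favorable) / len(members)


def mean_difference(labels, protected, privileged=1.0, unprivileged=0.0,
                    favorable=1.0):
    """Favorable rate of the unprivileged group minus that of the privileged group."""
    return (favorable_rate(labels, protected, unprivileged, favorable)
            - favorable_rate(labels, protected, privileged, favorable))


# Toy data: four unprivileged rows (sex=0) followed by four privileged rows (sex=1).
toy_labels = [1, 0, 0, 0, 1, 1, 0, 1]
toy_sex = [0, 0, 0, 0, 1, 1, 1, 1]
print(mean_difference(toy_labels, toy_sex))  # 0.25 - 0.75 = -0.5
```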


In [7]:
# MaxAbsScaler divides each feature by its maximum absolute value
scaler = MaxAbsScaler()
dataset_orig_train.features = scaler.fit_transform(dataset_orig_train.features)
dataset_orig_test.features = scaler.transform(dataset_orig_test.features)
metric_scaled_train = BinaryLabelDatasetMetric(dataset_orig_train, 
                             unprivileged_groups=unprivileged_groups,
                             privileged_groups=privileged_groups)
display(Markdown("#### Scaled dataset - Verify that the scaling does not affect the group label statistics"))
print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_scaled_train.mean_difference())
metric_scaled_test = BinaryLabelDatasetMetric(dataset_orig_test, 
                             unprivileged_groups=unprivileged_groups,
                             privileged_groups=privileged_groups)
print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_scaled_test.mean_difference())


#### Scaled dataset - Verify that the scaling does not affect the group label statistics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.194507
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.194577


### Learn plain classifier without debiasing

In [8]:
# Set up the in-processing adversarial debiasing algorithm
# with debias set to False, so only the plain classifier is trained
sess = tf.Session()
plain_model = AdversarialDebiasing(privileged_groups = privileged_groups,
                          unprivileged_groups = unprivileged_groups,
                          scope_name='plain_classifier',
                          debias=False,
                          sess=sess)

In [9]:
plain_model.fit(dataset_orig_train)

Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
epoch 0; iter: 0; batch classifier loss: 0.701448
epoch 0; iter: 200; batch classifier loss: 0.364368
epoch 1; iter: 0; batch classifier loss: 0.334042
epoch 1; iter: 200; batch classifier loss: 0.450820
epoch 2; iter: 0; batch classifier loss: 0.493291
epoch 2; iter: 200; batch classifier loss: 0.382364
epoch 3; iter: 0; batch classifier loss: 0.402964
epoch 3; iter: 200; batch classifier loss: 0.388148
epoch 4; iter: 0; batch classifier loss: 0.517890
epoch 4; iter: 200; batch classifier loss: 0.443150
epoch 5; iter: 0; batch classifier loss: 0.392776
epoch 5; iter: 200; batch classifier loss: 0.419714
epoch 6; iter: 0; batch classifier loss: 0.501296
epoch 6; iter: 200; batch classifier loss: 0.359420
epoch 7; iter: 0; batch classifier loss: 0.404994
epoch 7; iter: 200; batch classifier loss: 0.450537
epoch 8; iter: 0; batch classifier loss: 0.454901
epoch 8; iter: 200;

<aif360.algorithms.inprocessing.adversarial_debiasing.AdversarialDebiasing at 0x7f64bd771550>

In [10]:
# Apply the plain model to test data
dataset_nodebiasing_train = plain_model.predict(dataset_orig_train)
dataset_nodebiasing_test = plain_model.predict(dataset_orig_test)

In [11]:
# Metrics for the dataset from plain model (without debiasing)
display(Markdown("#### Plain model - without debiasing - dataset metrics"))
metric_dataset_nodebiasing_train = BinaryLabelDatasetMetric(dataset_nodebiasing_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_train.mean_difference())

metric_dataset_nodebiasing_test = BinaryLabelDatasetMetric(dataset_nodebiasing_test, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_test.mean_difference())

display(Markdown("#### Plain model - without debiasing - classification metrics"))
classified_metric_nodebiasing_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_nodebiasing_test,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)
print("Test set: Classification accuracy = %f" % classified_metric_nodebiasing_test.accuracy())
TPR = classified_metric_nodebiasing_test.true_positive_rate()
TNR = classified_metric_nodebiasing_test.true_negative_rate()
bal_acc_nodebiasing_test = 0.5*(TPR+TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_nodebiasing_test)
print("Test set: Disparate impact = %f" % classified_metric_nodebiasing_test.disparate_impact())
print("Test set: Equal opportunity difference = %f" % classified_metric_nodebiasing_test.equal_opportunity_difference())
print("Test set: Average odds difference = %f" % classified_metric_nodebiasing_test.average_odds_difference())
print("Test set: Theil_index = %f" % classified_metric_nodebiasing_test.theil_index())

#### Plain model - without debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.210455
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.206837


#### Plain model - without debiasing - classification metrics

Test set: Classification accuracy = 0.801474
Test set: Balanced classification accuracy = 0.656691
Test set: Disparate impact = 0.000000
Test set: Equal opportunity difference = -0.444112
Test set: Average odds difference = -0.272758
Test set: Theil_index = 0.183319
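
The group fairness numbers above can be read through their standard definitions: disparate impact is the ratio of favorable prediction rates (unprivileged over privileged), equal opportunity difference is the gap in true positive rates, and average odds difference is the mean of the gaps in false positive and true positive rates. The sketch below computes them in plain Python on toy data; the helper names and the toy lists are illustrative assumptions, not AIF360 code.

```python
def group_rates(y_true, y_pred, protected, group, favorable=1.0):
    """Selection rate, TPR and FPR of the predictions within one group."""
    rows = [(t, p) for t, p, a in zip(y_true, y_pred, protected) if a == group]
    pos = [p for t, p in rows if t == favorable]   # truly favorable rows
    neg = [p for t, p in rows if t != favorable]   # truly unfavorable rows
    return {
        "selection": sum(p == favorable for _, p in rows) / len(rows),
        "tpr": sum(p == favorable for p in pos) / len(pos),
        "fpr": sum(p == favorable for p in neg) / len(neg),
    }


def fairness_metrics(y_true, y_pred, protected, privileged=1.0, unprivileged=0.0):
    """Unprivileged-vs-privileged comparisons of the per-group rates."""
    u = group_rates(y_true, y_pred, protected, unprivileged)
    p = group_rates(y_true, y_pred, protected, privileged)
    return {
        "disparate_impact": u["selection"] / p["selection"],
        "equal_opportunity_difference": u["tpr"] - p["tpr"],
        "average_odds_difference": 0.5 * ((u["fpr"] - p["fpr"]) + (u["tpr"] - p["tpr"])),
    }


toy_sex = [0, 0, 0, 0, 1, 1, 1, 1]
toy_true = [1, 1, 0, 0, 1, 1, 0, 0]
toy_pred = [1, 0, 0, 0, 1, 1, 1, 0]
print(fairness_metrics(toy_true, toy_pred, toy_sex))
```

Under these definitions, a disparate impact of exactly 0 arises when no member of the unprivileged group receives a favorable prediction, which would be consistent with the plain model's output above.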


### Apply in-processing algorithm based on adversarial learning

In [12]:
sess.close()
tf.reset_default_graph()
sess = tf.Session()

In [13]:
# Learn parameters with debias set to True
debiased_model = AdversarialDebiasing(privileged_groups = privileged_groups,
                          unprivileged_groups = unprivileged_groups,
                          scope_name='debiased_classifier',
                          debias=True,
                          sess=sess)

In [14]:
debiased_model.fit(dataset_orig_train)

epoch 0; iter: 0; batch classifier loss: 0.718750; batch adversarial loss: 0.693419
epoch 0; iter: 200; batch classifier loss: 0.483406; batch adversarial loss: 0.650898
epoch 1; iter: 0; batch classifier loss: 0.571576; batch adversarial loss: 0.686218
epoch 1; iter: 200; batch classifier loss: 0.555408; batch adversarial loss: 0.679587
epoch 2; iter: 0; batch classifier loss: 0.528648; batch adversarial loss: 0.624486
epoch 2; iter: 200; batch classifier loss: 0.533744; batch adversarial loss: 0.646680
epoch 3; iter: 0; batch classifier loss: 0.416740; batch adversarial loss: 0.587934
epoch 3; iter: 200; batch classifier loss: 0.410672; batch adversarial loss: 0.602441
epoch 4; iter: 0; batch classifier loss: 0.413156; batch adversarial loss: 0.637226
epoch 4; iter: 200; batch classifier loss: 0.440639; batch adversarial loss: 0.653229
epoch 5; iter: 0; batch classifier loss: 0.471307; batch adversarial loss: 0.651639
epoch 5; iter: 200; batch classifier loss: 0.498305; batch adversa

<aif360.algorithms.inprocessing.adversarial_debiasing.AdversarialDebiasing at 0x7f63c3789400>

In [15]:
# Apply the debiased model to train and test data
dataset_debiasing_train = debiased_model.predict(dataset_orig_train)
dataset_debiasing_test = debiased_model.predict(dataset_orig_test)

In [16]:
# Metrics for the dataset from plain model (without debiasing)
display(Markdown("#### Plain model - without debiasing - dataset metrics"))
print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_train.mean_difference())
print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_test.mean_difference())

# Metrics for the dataset from model with debiasing
display(Markdown("#### Model - with debiasing - dataset metrics"))
metric_dataset_debiasing_train = BinaryLabelDatasetMetric(dataset_debiasing_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_train.mean_difference())

metric_dataset_debiasing_test = BinaryLabelDatasetMetric(dataset_debiasing_test, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_test.mean_difference())



display(Markdown("#### Plain model - without debiasing - classification metrics"))
print("Test set: Classification accuracy = %f" % classified_metric_nodebiasing_test.accuracy())
TPR = classified_metric_nodebiasing_test.true_positive_rate()
TNR = classified_metric_nodebiasing_test.true_negative_rate()
bal_acc_nodebiasing_test = 0.5*(TPR+TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_nodebiasing_test)
print("Test set: Disparate impact = %f" % classified_metric_nodebiasing_test.disparate_impact())
print("Test set: Equal opportunity difference = %f" % classified_metric_nodebiasing_test.equal_opportunity_difference())
print("Test set: Average odds difference = %f" % classified_metric_nodebiasing_test.average_odds_difference())
print("Test set: Theil_index = %f" % classified_metric_nodebiasing_test.theil_index())



display(Markdown("#### Model - with debiasing - classification metrics"))
classified_metric_debiasing_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_debiasing_test,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)
print("Test set: Classification accuracy = %f" % classified_metric_debiasing_test.accuracy())
TPR = classified_metric_debiasing_test.true_positive_rate()
TNR = classified_metric_debiasing_test.true_negative_rate()
bal_acc_debiasing_test = 0.5*(TPR+TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_debiasing_test)
print("Test set: Disparate impact = %f" % classified_metric_debiasing_test.disparate_impact())
print("Test set: Equal opportunity difference = %f" % classified_metric_debiasing_test.equal_opportunity_difference())
print("Test set: Average odds difference = %f" % classified_metric_debiasing_test.average_odds_difference())
print("Test set: Theil_index = %f" % classified_metric_debiasing_test.theil_index())

#### Plain model - without debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.210455
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.206837


#### Model - with debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.086405
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.088343


#### Plain model - without debiasing - classification metrics

Test set: Classification accuracy = 0.801474
Test set: Balanced classification accuracy = 0.656691
Test set: Disparate impact = 0.000000
Test set: Equal opportunity difference = -0.444112
Test set: Average odds difference = -0.272758
Test set: Theil_index = 0.183319


#### Model - with debiasing - classification metrics

Test set: Classification accuracy = 0.792602
Test set: Balanced classification accuracy = 0.669734
Test set: Disparate impact = 0.565360
Test set: Equal opportunity difference = -0.040387
Test set: Average odds difference = -0.030369
Test set: Theil_index = 0.174534
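
The reporting code in this notebook repeats the same block of `print` calls for each model. One possible way to factor it, shown only as a sketch: the helper names below are hypothetical, and the functions rely solely on the `ClassificationMetric` methods already called in the cells above.

```python
def summarize_classification_metric(metric):
    """Collect the test-set metrics printed in this notebook into a dict.

    `metric` is expected to expose the same methods used above:
    accuracy, true_positive_rate, true_negative_rate, disparate_impact,
    equal_opportunity_difference, average_odds_difference, theil_index.
    """
    tpr = metric.true_positive_rate()
    tnr = metric.true_negative_rate()
    return {
        "Classification accuracy": metric.accuracy(),
        "Balanced classification accuracy": 0.5 * (tpr + tnr),
        "Disparate impact": metric.disparate_impact(),
        "Equal opportunity difference": metric.equal_opportunity_difference(),
        "Average odds difference": metric.average_odds_difference(),
        "Theil_index": metric.theil_index(),
    }


def print_summary(title, metric):
    """Print one model's metrics in the same format as the cells above."""
    print(title)
    for name, value in summarize_classification_metric(metric).items():
        print("Test set: %s = %f" % (name, value))
```

With such a helper, each model's report would reduce to a call like `print_summary("Model - with debiasing", classified_metric_debiasing_test)`.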



    References:
    [1] B. H. Zhang, B. Lemoine, and M. Mitchell, "Mitigating Unwanted Biases with Adversarial Learning," 
    AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society, 2018.

In [17]:
sess.close()
tf.reset_default_graph()
sess = tf.Session()

In [18]:
# Error debiasing
# Learn parameters with debias set to True
debiased_model2 = AdversarialErrorDebiasing(privileged_groups = privileged_groups,
                          unprivileged_groups = unprivileged_groups,
                          scope_name='debiasede_classifier',
                          debias=True,
                          sess=sess)
debiased_model2.fit(dataset_orig_train)

# Apply the error-debiased model to train and test data
dataset_debiasing_train2 = debiased_model2.predict(dataset_orig_train)
dataset_debiasing_test2 = debiased_model2.predict(dataset_orig_test)

# Metrics for the dataset from plain model (without debiasing)
display(Markdown("#### Plain model - without debiasing - dataset metrics"))
print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_train.mean_difference())
print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_test.mean_difference())

# Metrics for the dataset from model with debiasing
display(Markdown("#### Model - with debiasing - dataset metrics"))
metric_dataset_debiasing_train = BinaryLabelDatasetMetric(dataset_debiasing_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_train.mean_difference())

metric_dataset_debiasing_test = BinaryLabelDatasetMetric(dataset_debiasing_test, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_test.mean_difference())

# Metrics for the dataset from model with error debiasing
display(Markdown("#### Model - with error debiasing - dataset metrics"))
metric_dataset_debiasing_train2 = BinaryLabelDatasetMetric(dataset_debiasing_train2, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_train2.mean_difference())

metric_dataset_debiasing_test2 = BinaryLabelDatasetMetric(dataset_debiasing_test2, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_test2.mean_difference())



display(Markdown("#### Plain model - without debiasing - classification metrics"))
print("Test set: Classification accuracy = %f" % classified_metric_nodebiasing_test.accuracy())
TPR = classified_metric_nodebiasing_test.true_positive_rate()
TNR = classified_metric_nodebiasing_test.true_negative_rate()
bal_acc_nodebiasing_test = 0.5*(TPR+TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_nodebiasing_test)
print("Test set: Disparate impact = %f" % classified_metric_nodebiasing_test.disparate_impact())
print("Test set: Equal opportunity difference = %f" % classified_metric_nodebiasing_test.equal_opportunity_difference())
print("Test set: Average odds difference = %f" % classified_metric_nodebiasing_test.average_odds_difference())
print("Test set: Theil_index = %f" % classified_metric_nodebiasing_test.theil_index())



display(Markdown("#### Model - with debiasing - classification metrics"))
classified_metric_debiasing_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_debiasing_test,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)
print("Test set: Classification accuracy = %f" % classified_metric_debiasing_test.accuracy())
TPR = classified_metric_debiasing_test.true_positive_rate()
TNR = classified_metric_debiasing_test.true_negative_rate()
bal_acc_debiasing_test = 0.5*(TPR+TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_debiasing_test)
print("Test set: Disparate impact = %f" % classified_metric_debiasing_test.disparate_impact())
print("Test set: Equal opportunity difference = %f" % classified_metric_debiasing_test.equal_opportunity_difference())
print("Test set: Average odds difference = %f" % classified_metric_debiasing_test.average_odds_difference())
print("Test set: Theil_index = %f" % classified_metric_debiasing_test.theil_index())



display(Markdown("#### Model - with error debiasing - classification metrics"))
classified_metric_debiasing_test2 = ClassificationMetric(dataset_orig_test, 
                                                 dataset_debiasing_test2,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)
print("Test set: Classification accuracy = %f" % classified_metric_debiasing_test2.accuracy())
TPR = classified_metric_debiasing_test2.true_positive_rate()
TNR = classified_metric_debiasing_test2.true_negative_rate()
bal_acc_debiasing_test2 = 0.5*(TPR+TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_debiasing_test2)
print("Test set: Disparate impact = %f" % classified_metric_debiasing_test2.disparate_impact())
print("Test set: Equal opportunity difference = %f" % classified_metric_debiasing_test2.equal_opportunity_difference())
print("Test set: Average odds difference = %f" % classified_metric_debiasing_test2.average_odds_difference())
print("Test set: Theil_index = %f" % classified_metric_debiasing_test2.theil_index())

epoch 0; iter: 0; batch classifier loss: 0.683797; batch adversarial loss: 0.755281
epoch 0; iter: 200; batch classifier loss: 0.550037; batch adversarial loss: 0.727945
epoch 1; iter: 0; batch classifier loss: 0.688435; batch adversarial loss: 0.682828
epoch 1; iter: 200; batch classifier loss: 0.482157; batch adversarial loss: 0.652845
epoch 2; iter: 0; batch classifier loss: 0.515063; batch adversarial loss: 0.658760
epoch 2; iter: 200; batch classifier loss: 0.324849; batch adversarial loss: 0.642040
epoch 3; iter: 0; batch classifier loss: 0.477608; batch adversarial loss: 0.625772
epoch 3; iter: 200; batch classifier loss: 0.404170; batch adversarial loss: 0.648867
epoch 4; iter: 0; batch classifier loss: 0.462056; batch adversarial loss: 0.662486
epoch 4; iter: 200; batch classifier loss: 0.386695; batch adversarial loss: 0.604904
epoch 5; iter: 0; batch classifier loss: 0.352023; batch adversarial loss: 0.619014
epoch 5; iter: 200; batch classifier loss: 0.419654; batch adversa

#### Plain model - without debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.210455
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.206837


#### Model - with debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.086405
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.088343


#### Model - with error debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.078092
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.076155


#### Plain model - without debiasing - classification metrics

Test set: Classification accuracy = 0.801474
Test set: Balanced classification accuracy = 0.656691
Test set: Disparate impact = 0.000000
Test set: Equal opportunity difference = -0.444112
Test set: Average odds difference = -0.272758
Test set: Theil_index = 0.183319


#### Model - with debiasing - classification metrics

Test set: Classification accuracy = 0.792602
Test set: Balanced classification accuracy = 0.669734
Test set: Disparate impact = 0.565360
Test set: Equal opportunity difference = -0.040387
Test set: Average odds difference = -0.030369
Test set: Theil_index = 0.174534


#### Model - with error debiasing - classification metrics

Test set: Classification accuracy = 0.790760
Test set: Balanced classification accuracy = 0.670140
Test set: Disparate impact = 0.625511
Test set: Equal opportunity difference = -0.009590
Test set: Average odds difference = -0.010079
Test set: Theil_index = 0.174065


In [19]:
sess.close()
tf.reset_default_graph()
sess = tf.Session()

In [20]:
# Error debiasing
# Learn parameters with debias set to True
debiased_model3 = AdversarialErrorDebiasing(privileged_groups = privileged_groups,
                          unprivileged_groups = unprivileged_groups,
                          scope_name='debiasedae_classifier',
                          debias=True,
                          sess=sess,
                          absolute=True)
debiased_model3.fit(dataset_orig_train)

# Apply the absolute-error-debiased model to train and test data
dataset_debiasing_train3 = debiased_model3.predict(dataset_orig_train)
dataset_debiasing_test3 = debiased_model3.predict(dataset_orig_test)


epoch 0; iter: 0; batch classifier loss: 0.708291; batch adversarial loss: 0.937322
epoch 0; iter: 200; batch classifier loss: 0.569984; batch adversarial loss: 0.857559
epoch 1; iter: 0; batch classifier loss: 0.541225; batch adversarial loss: 0.801067
epoch 1; iter: 200; batch classifier loss: 0.451444; batch adversarial loss: 0.712054
epoch 2; iter: 0; batch classifier loss: 0.494566; batch adversarial loss: 0.711163
epoch 2; iter: 200; batch classifier loss: 0.415045; batch adversarial loss: 0.678254
epoch 3; iter: 0; batch classifier loss: 0.415971; batch adversarial loss: 0.664181
epoch 3; iter: 200; batch classifier loss: 0.392761; batch adversarial loss: 0.663756
epoch 4; iter: 0; batch classifier loss: 0.417671; batch adversarial loss: 0.651717
epoch 4; iter: 200; batch classifier loss: 0.568655; batch adversarial loss: 0.639279
epoch 5; iter: 0; batch classifier loss: 0.467180; batch adversarial loss: 0.666735
epoch 5; iter: 200; batch classifier loss: 0.377802; batch adversa

In [21]:

# Metrics for the dataset from plain model (without debiasing)
display(Markdown("#### Plain model - without debiasing - dataset metrics"))
print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_train.mean_difference())
print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_test.mean_difference())

# Metrics for the dataset from model with debiasing
display(Markdown("#### Model - with debiasing - dataset metrics"))
metric_dataset_debiasing_train = BinaryLabelDatasetMetric(dataset_debiasing_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_train.mean_difference())

metric_dataset_debiasing_test = BinaryLabelDatasetMetric(dataset_debiasing_test, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_test.mean_difference())

# Metrics for the dataset from model with error debiasing
display(Markdown("#### Model - with error debiasing - dataset metrics"))
metric_dataset_debiasing_train2 = BinaryLabelDatasetMetric(dataset_debiasing_train2, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_train2.mean_difference())

metric_dataset_debiasing_test2 = BinaryLabelDatasetMetric(dataset_debiasing_test2, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_test2.mean_difference())



display(Markdown("#### Plain model - without debiasing - classification metrics"))
print("Test set: Classification accuracy = %f" % classified_metric_nodebiasing_test.accuracy())
TPR = classified_metric_nodebiasing_test.true_positive_rate()
TNR = classified_metric_nodebiasing_test.true_negative_rate()
bal_acc_nodebiasing_test = 0.5*(TPR+TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_nodebiasing_test)
print("Test set: Disparate impact = %f" % classified_metric_nodebiasing_test.disparate_impact())
print("Test set: Equal opportunity difference = %f" % classified_metric_nodebiasing_test.equal_opportunity_difference())
print("Test set: Average odds difference = %f" % classified_metric_nodebiasing_test.average_odds_difference())
print("Test set: Theil_index = %f" % classified_metric_nodebiasing_test.theil_index())



display(Markdown("#### Model - with debiasing - classification metrics"))
classified_metric_debiasing_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_debiasing_test,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)
print("Test set: Classification accuracy = %f" % classified_metric_debiasing_test.accuracy())
TPR = classified_metric_debiasing_test.true_positive_rate()
TNR = classified_metric_debiasing_test.true_negative_rate()
bal_acc_debiasing_test = 0.5*(TPR+TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_debiasing_test)
print("Test set: Disparate impact = %f" % classified_metric_debiasing_test.disparate_impact())
print("Test set: Equal opportunity difference = %f" % classified_metric_debiasing_test.equal_opportunity_difference())
print("Test set: Average odds difference = %f" % classified_metric_debiasing_test.average_odds_difference())
print("Test set: Theil_index = %f" % classified_metric_debiasing_test.theil_index())



display(Markdown("#### Model - with error debiasing - classification metrics"))
classified_metric_debiasing_test2 = ClassificationMetric(dataset_orig_test, 
                                                 dataset_debiasing_test2,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)
print("Test set: Classification accuracy = %f" % classified_metric_debiasing_test2.accuracy())
TPR = classified_metric_debiasing_test2.true_positive_rate()
TNR = classified_metric_debiasing_test2.true_negative_rate()
bal_acc_debiasing_test2 = 0.5*(TPR+TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_debiasing_test2)
print("Test set: Disparate impact = %f" % classified_metric_debiasing_test2.disparate_impact())
print("Test set: Equal opportunity difference = %f" % classified_metric_debiasing_test2.equal_opportunity_difference())
print("Test set: Average odds difference = %f" % classified_metric_debiasing_test2.average_odds_difference())
print("Test set: Theil_index = %f" % classified_metric_debiasing_test2.theil_index())

display(Markdown("#### Model - with absolute error debiasing - classification metrics"))
classified_metric_debiasing_test3 = ClassificationMetric(dataset_orig_test, 
                                                 dataset_debiasing_test3,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)
print("Test set: Classification accuracy = %f" % classified_metric_debiasing_test3.accuracy())
TPR = classified_metric_debiasing_test3.true_positive_rate()
TNR = classified_metric_debiasing_test3.true_negative_rate()
bal_acc_debiasing_test3 = 0.5*(TPR+TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_debiasing_test3)
print("Test set: Disparate impact = %f" % classified_metric_debiasing_test3.disparate_impact())
print("Test set: Equal opportunity difference = %f" % classified_metric_debiasing_test3.equal_opportunity_difference())
print("Test set: Average odds difference = %f" % classified_metric_debiasing_test3.average_odds_difference())
print("Test set: Theil_index = %f" % classified_metric_debiasing_test3.theil_index())

#### Plain model - without debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.210455
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.206837


#### Model - with debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.086405
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.088343


#### Model - with error debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.078092
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.076155


#### Plain model - without debiasing - classification metrics

Test set: Classification accuracy = 0.801474
Test set: Balanced classification accuracy = 0.656691
Test set: Disparate impact = 0.000000
Test set: Equal opportunity difference = -0.444112
Test set: Average odds difference = -0.272758
Test set: Theil_index = 0.183319


#### Model - with debiasing - classification metrics

Test set: Classification accuracy = 0.792602
Test set: Balanced classification accuracy = 0.669734
Test set: Disparate impact = 0.565360
Test set: Equal opportunity difference = -0.040387
Test set: Average odds difference = -0.030369
Test set: Theil_index = 0.174534


#### Model - with error debiasing - classification metrics

Test set: Classification accuracy = 0.790760
Test set: Balanced classification accuracy = 0.670140
Test set: Disparate impact = 0.625511
Test set: Equal opportunity difference = -0.009590
Test set: Average odds difference = -0.010079
Test set: Theil_index = 0.174065


#### Model - with absolute error debiasing - classification metrics

Test set: Classification accuracy = 0.800996
Test set: Balanced classification accuracy = 0.654370
Test set: Disparate impact = 0.000000
Test set: Equal opportunity difference = -0.437126
Test set: Average odds difference = -0.268230
Test set: Theil_index = 0.184701
