# FAIR IN-PROCESSING

This notebook implements the Adversarial Debiasing in-processor [(Zhang et al. 2018)](https://dl.acm.org/doi/abs/10.1145/3278721.3278779).

The modeling is performed separately for each combination of training folds. This is controlled with the `use_fold` variable. To fit adversarial debiasing on a different combination of training folds, set `use_fold` to the desired value and restart the kernel.

A further analysis of the processor outputs is performed in `code_05_inprocess3.R`.

The notebook loads the data exported in `code_00_partitioning.ipynb` and applies the in-processor. The processor predictions are exported as CSV files.
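The naming pattern of the exported CSV files can be sketched as a small helper. This is only an illustration: `prediction_path` is a hypothetical function that does not exist in the notebook, the `'results'` directory is a placeholder, and the pattern is copied from the `to_csv` calls in the modeling cell (which concatenate `res_path` directly rather than joining path components).

```python
import os

def prediction_path(res_path, data, fold, adversary_loss_weight, split):
    """Build the path of an exported prediction file (hypothetical helper)."""
    if split not in ('valid', 'test'):
        raise ValueError("split must be 'valid' or 'test'")
    # same pattern as in the modeling cell: <data>_<fold>_AD_<weight>_predictions_<split>.csv
    file_name = '{}_{}_AD_{}_predictions_{}.csv'.format(
        data, fold, adversary_loss_weight, split)
    return os.path.join(res_path, 'intermediate', file_name)

# placeholder results directory; the notebook takes res_path from code_00_working_paths.py
example = prediction_path('results', 'taiwan', 4, 0.1, 'valid')
print(os.path.basename(example))
```

With the default parameters, the validation file would therefore carry the data set name, the fold index, and the adversary loss weight in its name, so runs with different settings do not overwrite each other.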

## 1. Parameters and preparations

In [1]:
##### PARAMETERS

# working paths
%run 'code_00_working_paths.py'

# specify data set
# one of ['bene', 'german', 'uk', 'taiwan', 'pkdd', 'gmsc', 'homecredit']
data = 'taiwan'

# partitioning
num_folds = 5
use_fold  = 4 # one of [0, 1, ..., num_folds-1]
seed      = 1

In [2]:
##### IN-PROCESSOR PARAMS

adversary_loss_weight = 0.1 # one of [0.1, 0.01, 0.001]
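
For orientation, the role of this parameter can be summarized by the predictor update described in Zhang et al. (2018). The notation below follows the paper, not the notebook: $W$ are the predictor weights, $L_P$ is the predictor (classifier) loss, $L_A$ is the adversary loss, and $\alpha$ is assumed to correspond to `adversary_loss_weight`. The predictor weights are updated along

$$
\nabla_W L_P \;-\; \operatorname{proj}_{\nabla_W L_A} \nabla_W L_P \;-\; \alpha \, \nabla_W L_A
$$

The projection term removes the part of the predictor gradient that would help the adversary, and the last term pushes the predictor to increase the adversary loss. Under this reading, a larger $\alpha$ puts more emphasis on debiasing relative to predictive accuracy.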

In [None]:
##### PACKAGES

import sys
sys.path.append(func_path)

import pickle
import numpy as np
import pandas as pd
import time

from load_data import *

import tensorflow as tf

from aif360.metrics import BinaryLabelDatasetMetric
from aif360.metrics import ClassificationMetric
from aif360.metrics.utils import compute_boolean_conditioning_vector
from aif360.algorithms.inprocessing.adversarial_debiasing import AdversarialDebiasing

from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import MaxAbsScaler

import matplotlib.pyplot as plt

## 2. Data import

In [4]:
##### RANDOM SEED

np.random.seed(seed)

In [5]:
##### LOAD PARTITIONING

dataset_orig_test = pickle.load(open(data_path + 'prepared/' + data + '_orig_test.pkl', 'rb'))
te                = dataset_orig_test.convert_to_dataframe()[0]

print(te.shape)

(15000, 186)


In [6]:
##### DATA PREP

# protected attribute
protected           = 'AGE'
privileged_groups   = [{'AGE': 1}] 
unprivileged_groups = [{'AGE': 0}]
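
To illustrate what the group definitions above mean, the following minimal sketch computes the difference in positive prediction rates between the unprivileged group (`AGE == 0`) and the privileged group (`AGE == 1`) on toy data. The function name and the toy values are assumptions made for illustration; the notebook itself relies on the AIF360 classes imported above, and which label counts as favorable depends on the data set.

```python
def statistical_parity_difference(predictions, protected):
    """Rate of predictions equal to 1 in the unprivileged group (0)
    minus the same rate in the privileged group (1)."""
    unpriv = [p for p, a in zip(predictions, protected) if a == 0]
    priv   = [p for p, a in zip(predictions, protected) if a == 1]
    if not unpriv or not priv:
        raise ValueError('both groups must be non-empty')
    return sum(unpriv) / len(unpriv) - sum(priv) / len(priv)

# toy binary predictions and protected attribute values (hypothetical data)
toy_predictions = [1, 0, 1, 1, 0, 0]
toy_age         = [1, 1, 1, 0, 0, 0]

spd = statistical_parity_difference(toy_predictions, toy_age)
print(round(spd, 4))
```

A value of zero would indicate equal rates in both groups; negative values indicate that the unprivileged group receives the label 1 less often.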

## 3. Fair processing

In [7]:
##### MODELING

# timer
cv_start = time.time()

# loop through training folds
for fold in range(num_folds):
    
    ##### LOAD DATA
    
    # select fold combination
    if fold != use_fold:
        continue
    
    # feedback
    print('-'*30)
    print('- FOLD ' + str(fold) + '...')
    print('-'*30)

    # import data
    data_train = pickle.load(open(data_path + 'prepared/' + data + '_scaled_' + str(fold) + '_train.pkl', 'rb'))
    data_valid = pickle.load(open(data_path + 'prepared/' + data + '_scaled_' + str(fold) + '_valid.pkl', 'rb'))
    data_test  = pickle.load(open(data_path + 'prepared/' + data + '_scaled_' + str(fold) + '_test.pkl',  'rb'))
    

    ##### MODELING

    # start tensorflow session
    sess = tf.Session()

    # fit adversarial debiasing
    debiased_model = AdversarialDebiasing(privileged_groups     = privileged_groups,
                                          unprivileged_groups   = unprivileged_groups,
                                          debias                = True,
                                          adversary_loss_weight = adversary_loss_weight,
                                          scope_name            = 'debiased_classifier',
                                          sess                  = sess)
    debiased_model.fit(data_train)
    
    # apply the model to valid data
    scores_valid = debiased_model.predict(data_valid).scores
    advdebias_predictions = pd.DataFrame()
    advdebias_predictions['scores']  = scores_valid
    advdebias_predictions['targets'] = data_valid.labels.flatten()
    advdebias_predictions.to_csv(res_path + 'intermediate/' + data + '_' + str(fold) + '_AD_' + str(adversary_loss_weight) + '_predictions_valid.csv', 
                                 index  = None, 
                                 header = True)
    
    # apply the model to test data
    scores_test = debiased_model.predict(data_test).scores
    advdebias_predictions = pd.DataFrame()
    advdebias_predictions['scores'] = scores_test
    advdebias_predictions.to_csv(res_path + 'intermediate/' + data + '_' + str(fold) + '_AD_' + str(adversary_loss_weight) + '_predictions_test.csv', 
                                 index  = None, 
                                 header = True)
    
    # print runtime
    print('')
    print('Finished in {:.2f} minutes'.format((time.time() - cv_start) / 60))

------------------------------
- FOLD 4...
------------------------------



The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where





epoch 0; iter: 0; batch classifier loss: 0.692659; batch adversarial loss: 0.923862
epoch 0; iter: 200; batch classifier loss: 0.604467; batch adversarial loss: 0.841103
epoch 1; iter: 0; batch classifier loss: 0.480069; batch adversarial loss: 0.873279
epoch 1; iter: 200; batch classifier loss: 0.540533; batch adversarial loss: 0.658383
epoch