# *fairlib*: A Unified Framework for Assessing and Improving Fairness

Xudong Han$^{1}$, &nbsp; Aili Shen$^{1,2, a}$, &nbsp; Yitong Li$^{3}$, &nbsp; Lea Frermann$^{1}$, &nbsp; Timothy Baldwin$^{1,4}$, &nbsp; and &nbsp; Trevor Cohn$^{1}$  

$^{1}$ The University of Melbourne

$^{2}$ Alexa AI, Amazon

$^{3}$ Huawei Technologies Co., Ltd.

$^{4}$ MBZUAI

<img src="https://upload.wikimedia.org/wikipedia/en/thumb/e/ed/Logo_of_the_University_of_Melbourne.svg/330px-Logo_of_the_University_of_Melbourne.svg.png" height="100"/> &nbsp; &nbsp; &nbsp; &nbsp;
<img src="https://2019.emnlp.org/assets/images/logos/huawei-logo.png" height="100"/> &nbsp; &nbsp; &nbsp; &nbsp;
<img src="https://upload.wikimedia.org/wikipedia/en/5/55/Mohamed_bin_Zayed_University_of_Artificial_Intelligence_logo.png" height="100"/>

---
- $^a$ Work carried out at The University of Melbourne
- *fairlib* is licensed under the **Apache License 2.0**

[GitHub](https://github.com/HanXudong/fairlib), [Docs](https://hanxudong.github.io/fairlib/), [PyPI](https://pypi.org/project/fairlib/)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HanXudong/fairlib/blob/main/tutorial/fairlib_demo.ipynb)

In this video, we will demonstrate how to:
1.   Install *fairlib*
2.   Access fairness benchmark datasets
3.   Train a vanilla model without debiasing, and measure fairness
4.   Improve fairness with recent debiasing methods
5.   Analyze the results, for example by creating tables and figures

In [1]:
import fairlib

In [1]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

import logging

from fairlib.src.base_options import BaseOptions
from fairlib.src import networks

args = {
    # Name of the dataset; the corresponding dataloader will be used.
    "dataset":  "Rob_gender",

    # Specify the path to the input data
    "data_dir": "../../../../datasets/RoB/sex",

    # Device for computing, -1 is the cpu; non-negative numbers indicate GPU id.
    "device_id":    0,
    
    "emb_size": 768,
    
    "encoder_architecture": "BERT",
    
    "epochs": 1,

    # The default path for saving experimental results
    "results_dir":  r"./fairlib/results",

    # Will be used for saving experimental results
    "project_dir":  r"dev",

    # We focus on the TPR GAP, which corresponds to Equalized Odds for binary classification.
    "GAP_metric_name":  "TPR_GAP",

    # The overall performance will be measured as accuracy
    "Performance_metric_name":  "accuracy",
    # Criterion for model selection. The paper (Section 4) selects models by distance to optimum;
    # this run selects by accuracy instead.
    "selection_criterion":  "accuracy",

    # Default dirs for saving checkpoints
    "checkpoint_dir":   "models",
    "checkpoint_name":  "BEST_checkpoint",

    # Batch sizes for evaluation and training
    "test_batch_size":  32,
    "batch_size":   32,

    # Loading experimental results
    "n_jobs":   1,
}

options = BaseOptions()
state = options.get_state(args=args)

# Init the model
#model = networks.get_main_model(state)

#model.train_self()

INFO:root:Unexpected args: ['-f', '/beegfs/home/artem.vazhentsev/.local/share/jupyter/runtime/kernel-fb2be37b-b00e-4e3d-8924-24703227c52c.json']
INFO:root:Logging to ./fairlib/results/dev/Rob_gender/test/output.log


2023-03-16 18:10:58 [INFO ]  Base directory is ./fairlib/results/dev/Rob_gender/test
2023-03-16 18:10:58 [INFO ]  Options: 
2023-03-16 18:10:58 [INFO ]  	BT: null
2023-03-16 18:10:58 [INFO ]  	BTObj: null
2023-03-16 18:10:58 [INFO ]  	DyBT: null
2023-03-16 18:10:58 [INFO ]  	DyBTObj: null
2023-03-16 18:10:58 [INFO ]  	DyBTalpha: 0.1
2023-03-16 18:10:58 [INFO ]  	DyBTinit: original
2023-03-16 18:10:58 [INFO ]  	FCL: false
2023-03-16 18:10:58 [INFO ]  	FCLObj: g
2023-03-16 18:10:58 [INFO ]  	GAP_metric_name: TPR_GAP
2023-03-16 18:10:58 [INFO ]  	GBT: false
2023-03-16 18:10:58 [INFO ]  	GBTObj: null
2023-03-16 18:10:58 [INFO ]  	GBT_N: null
2023-03-16 18:10:58 [INFO ]  	GBT_alpha: 1
2023-03-16 18:10:58 [INFO ]  	INLP: false
2023-03-16 18:10:58 [INFO ]  	INLP_by_class: false
2023-03-16 18:10:58 [INFO ]  	INLP_discriminator_reweighting: null
2023-03-16 18:10:58 [INFO ]  	INLP_min_acc: 0.0
2023-03-16 18:10:58 [INFO ]  	INLP_n: 190
2023-03-16 18:10:58 [INFO ]  	Performance_metric_name: accura
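
The configuration above selects `TPR_GAP` as the fairness metric and mentions distance to optimum as a selection criterion. The following is a minimal sketch of both ideas on toy data, not *fairlib*'s implementation: the function names are hypothetical, the gap is aggregated as the max-min difference across groups (the library may aggregate differently), and distance to optimum is assumed to be the Euclidean distance to the ideal point (1, 1) with fairness taken as 1 minus the gap. Computing the gap for both classes of a binary task covers the true positive and true negative rates, which is why the gap relates to Equalized Odds.

```python
import numpy as np


def tpr_by_group(y_true, y_pred, group, target_class):
    """Recall of `target_class` computed separately for each protected group."""
    y_true, y_pred, group = (np.asarray(a) for a in (y_true, y_pred, group))
    rates = {}
    for g in np.unique(group):
        # Instances of group g whose gold label is the target class.
        members = (group == g) & (y_true == target_class)
        if members.any():
            rates[int(g)] = float((y_pred[members] == target_class).mean())
    return rates


def tpr_gap(y_true, y_pred, group, target_class):
    """Difference between the highest and lowest per-group TPR (assumed aggregation)."""
    rates = tpr_by_group(y_true, y_pred, group, target_class).values()
    return max(rates) - min(rates)


def distance_to_optimum(performance, fairness, optimum=(1.0, 1.0)):
    """Euclidean distance to an ideal point; lower is better (assumed definition)."""
    return float(np.hypot(optimum[0] - performance, optimum[1] - fairness))


# Toy predictions for eight instances in two protected groups.
y_true = [1, 1, 1, 1, 0, 0, 1, 1]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]
group = [0, 0, 0, 0, 0, 1, 1, 1]

gap_pos = tpr_gap(y_true, y_pred, group, target_class=1)
gap_neg = tpr_gap(y_true, y_pred, group, target_class=0)
accuracy = float(np.mean(np.array(y_true) == np.array(y_pred)))

print(tpr_by_group(y_true, y_pred, group, target_class=1))  # {0: 0.75, 1: 0.5}
print(gap_pos, gap_neg)  # 0.25 1.0
print(distance_to_optimum(accuracy, 1.0 - gap_pos))
```

A model that is both accurate and fair sits close to (1, 1), so a smaller distance indicates a better trade-off under this assumed definition.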

In [2]:
state.opt.train_generator.dataset.X.shape

(13825, 128)

In [3]:
state.opt.dev_generator.dataset.X.shape

(1536, 128)

In [4]:
import numpy as np
from sklearn.model_selection import KFold

# Pool the train and dev splits, then keep only the first 100 instances so the demo runs quickly.
X = np.concatenate([state.opt.train_generator.dataset.X, state.opt.dev_generator.dataset.X])[:100]
y = np.concatenate([state.opt.train_generator.dataset.y, state.opt.dev_generator.dataset.y])[:100]
protected_label = np.concatenate([state.opt.train_generator.dataset.protected_label, state.opt.dev_generator.dataset.protected_label])[:100]

# BERT inputs also carry token type ids and masks, which are pooled and sliced the same way.
if state.encoder_architecture=="BERT":
    token_type_ids = np.concatenate([state.opt.train_generator.dataset.token_type_ids, state.opt.dev_generator.dataset.token_type_ids])[:100]
    mask = np.concatenate([state.opt.train_generator.dataset.mask, state.opt.dev_generator.dataset.mask])[:100]

In [5]:
X.shape, mask.shape

((100, 128), (100, 128))
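
Because every pooled array was sliced to the same 100 instances, one set of indices can split all of them consistently. The sketch below illustrates this with NumPy only, on synthetic stand-ins with the same shapes as above. The data and variable names are made up for illustration, and the manual fold construction only mimics what a shuffled K-fold split does; it is not the splitter used later in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
n, seq_len, n_splits = 100, 128, 5  # shapes mirror the pooled arrays above

# Synthetic stand-ins (not the real data): row i of X is filled with i, and
# y[i] = i % 2, so any misalignment after indexing is easy to detect.
X = np.repeat(np.arange(n)[:, None], seq_len, axis=1)
y = np.arange(n) % 2
protected_label = rng.integers(0, 2, size=n)

# Shuffle the indices once and cut them into equal folds; each fold serves as
# the dev set exactly once, which is what a shuffled K-fold split does.
folds = np.array_split(rng.permutation(n), n_splits)

fold_sizes = []
for k, dev_index in enumerate(folds):
    train_index = np.concatenate([f for j, f in enumerate(folds) if j != k])

    # The same index arrays are applied to every per-instance array.
    X_train, X_dev = X[train_index], X[dev_index]
    y_train, y_dev = y[train_index], y[dev_index]
    g_train, g_dev = protected_label[train_index], protected_label[dev_index]

    # Rows and labels still belong together after the split.
    assert np.array_equal(X_dev[:, 0] % 2, y_dev)
    assert np.array_equal(X_train[:, 0] % 2, y_train)
    fold_sizes.append((X_train.shape, X_dev.shape))

# Every instance lands in a dev fold exactly once.
covered = np.sort(np.concatenate(folds))
assert np.array_equal(covered, np.arange(n))
print(fold_sizes[0])  # ((80, 128), (20, 128))
```

With 100 instances and five folds, each iteration trains on 80 instances and evaluates on the remaining 20.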

In [9]:
def gen_data(a=(1, 2, 3, 4, 5)):
    """Yield the items of `a` one at a time (a tuple default avoids a shared mutable default)."""
    for x in a:
        yield x


for x in gen_data():
    print(x)

1
2
3
4
5


In [6]:
from sklearn.model_selection import KFold

# 5-fold cross-validation over the pooled instances, seeded for reproducibility.
kf = KFold(n_splits=5, random_state=state.base_seed, shuffle=True)

for i, (train_index, dev_index) in enumerate(kf.split(X)):
    # Overwrite the train and dev datasets in place with the current fold.
    state.opt.train_generator.dataset.X, state.opt.dev_generator.dataset.X = X[train_index], X[dev_index]
    state.opt.train_generator.dataset.y, state.opt.dev_generator.dataset.y = y[train_index], y[dev_index]
    state.opt.train_generator.dataset.protected_label, state.opt.dev_generator.dataset.protected_label = protected_label[train_index], protected_label[dev_index]

    if state.encoder_architecture=="BERT":
        state.opt.train_generator.dataset.token_type_ids, state.opt.dev_generator.dataset.token_type_ids = token_type_ids[train_index], token_type_ids[dev_index]
        state.opt.train_generator.dataset.mask, state.opt.dev_generator.dataset.mask = mask[train_index], mask[dev_index]

    # Build and train a fresh model on this fold.
    model = networks.get_main_model(state)
    model.train_self()

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


2023-03-16 18:11:44 [INFO ]  MLP( 
2023-03-16 18:11:44 [INFO ]    (output_layer): Linear(in_features=300, out_features=2, bias=True)
2023-03-16 18:11:44 [INFO ]    (AF): Tanh()
2023-03-16 18:11:44 [INFO ]    (dropout): Dropout(p=0, inplace=False)
2023-03-16 18:11:44 [INFO ]    (hidden_layers): ModuleList(
2023-03-16 18:11:44 [INFO ]      (0): Linear(in_features=768, out_features=300, bias=True)
2023-03-16 18:11:44 [INFO ]      (1): Dropout(p=0, inplace=False)
2023-03-16 18:11:44 [INFO ]      (2): Tanh()
2023-03-16 18:11:44 [INFO ]      (3): Linear(in_features=300, out_features=300, bias=True)
2023-03-16 18:11:44 [INFO ]      (4): Dropout(p=0, inplace=False)
2023-03-16 18:11:44 [INFO ]      (5): Tanh()
2023-03-16 18:11:44 [INFO ]    )
2023-03-16 18:11:44 [INFO ]    (criterion): CrossEntropyLoss()
2023-03-16 18:11:44 [INFO ]  )
2023-03-16 18:11:44 [INFO ]  Total number of parameters: 321602 

2023-03-16 18:11:44 [INFO ]  BERTClassifier( 
2023-03-16 18:11:44 [INFO ]    (bert): BertModel(



2023-03-16 18:12:03 [INFO ]  MLP( 
2023-03-16 18:12:03 [INFO ]    (output_layer): Linear(in_features=300, out_features=2, bias=True)
2023-03-16 18:12:03 [INFO ]    (AF): Tanh()
2023-03-16 18:12:03 [INFO ]    (dropout): Dropout(p=0, inplace=False)
2023-03-16 18:12:03 [INFO ]    (hidden_layers): ModuleList(
2023-03-16 18:12:03 [INFO ]      (0): Linear(in_features=768, out_features=300, bias=True)
2023-03-16 18:12:03 [INFO ]      (1): Dropout(p=0, inplace=False)
2023-03-16 18:12:03 [INFO ]      (2): Tanh()
2023-03-16 18:12:03 [INFO ]      (3): Linear(in_features=300, out_features=300, bias=True)
2023-03-16 18:12:03 [INFO ]      (4): Dropout(p=0, inplace=False)
2023-03-16 18:12:03 [INFO ]      (5): Tanh()
2023-03-16 18:12:03 [INFO ]    )
2023-03-16 18:12:03 [INFO ]    (criterion): CrossEntropyLoss()
2023-03-16 18:12:03 [INFO ]  )
2023-03-16 18:12:03 [INFO ]  Total number of parameters: 321602 

2023-03-16 18:12:03 [INFO ]  BERTClassifier( 
2023-03-16 18:12:03 [INFO ]    (bert): BertModel(



2023-03-16 18:12:19 [INFO ]  MLP( 
2023-03-16 18:12:19 [INFO ]    (output_layer): Linear(in_features=300, out_features=2, bias=True)
2023-03-16 18:12:19 [INFO ]    (AF): Tanh()
2023-03-16 18:12:19 [INFO ]    (dropout): Dropout(p=0, inplace=False)
2023-03-16 18:12:19 [INFO ]    (hidden_layers): ModuleList(
2023-03-16 18:12:19 [INFO ]      (0): Linear(in_features=768, out_features=300, bias=True)
2023-03-16 18:12:19 [INFO ]      (1): Dropout(p=0, inplace=False)
2023-03-16 18:12:19 [INFO ]      (2): Tanh()
2023-03-16 18:12:19 [INFO ]      (3): Linear(in_features=300, out_features=300, bias=True)
2023-03-16 18:12:19 [INFO ]      (4): Dropout(p=0, inplace=False)
2023-03-16 18:12:19 [INFO ]      (5): Tanh()
2023-03-16 18:12:19 [INFO ]    )
2023-03-16 18:12:19 [INFO ]    (criterion): CrossEntropyLoss()
2023-03-16 18:12:19 [INFO ]  )
2023-03-16 18:12:19 [INFO ]  Total number of parameters: 321602 

2023-03-16 18:12:19 [INFO ]  BERTClassifier( 
2023-03-16 18:12:19 [INFO ]    (bert): BertModel(



2023-03-16 18:12:36 [INFO ]  MLP( 
2023-03-16 18:12:36 [INFO ]    (output_layer): Linear(in_features=300, out_features=2, bias=True)
2023-03-16 18:12:36 [INFO ]    (AF): Tanh()
2023-03-16 18:12:36 [INFO ]    (dropout): Dropout(p=0, inplace=False)
2023-03-16 18:12:36 [INFO ]    (hidden_layers): ModuleList(
2023-03-16 18:12:36 [INFO ]      (0): Linear(in_features=768, out_features=300, bias=True)
2023-03-16 18:12:36 [INFO ]      (1): Dropout(p=0, inplace=False)
2023-03-16 18:12:36 [INFO ]      (2): Tanh()
2023-03-16 18:12:36 [INFO ]      (3): Linear(in_features=300, out_features=300, bias=True)
2023-03-16 18:12:36 [INFO ]      (4): Dropout(p=0, inplace=False)
2023-03-16 18:12:36 [INFO ]      (5): Tanh()
2023-03-16 18:12:36 [INFO ]    )
2023-03-16 18:12:36 [INFO ]    (criterion): CrossEntropyLoss()
2023-03-16 18:12:36 [INFO ]  )
2023-03-16 18:12:36 [INFO ]  Total number of parameters: 321602 

2023-03-16 18:12:36 [INFO ]  BERTClassifier( 
2023-03-16 18:12:36 [INFO ]    (bert): BertModel(



2023-03-16 18:12:52 [INFO ]  MLP( 
2023-03-16 18:12:52 [INFO ]    (output_layer): Linear(in_features=300, out_features=2, bias=True)
2023-03-16 18:12:52 [INFO ]    (AF): Tanh()
2023-03-16 18:12:52 [INFO ]    (dropout): Dropout(p=0, inplace=False)
2023-03-16 18:12:52 [INFO ]    (hidden_layers): ModuleList(
2023-03-16 18:12:52 [INFO ]      (0): Linear(in_features=768, out_features=300, bias=True)
2023-03-16 18:12:52 [INFO ]      (1): Dropout(p=0, inplace=False)
2023-03-16 18:12:52 [INFO ]      (2): Tanh()
2023-03-16 18:12:52 [INFO ]      (3): Linear(in_features=300, out_features=300, bias=True)
2023-03-16 18:12:52 [INFO ]      (4): Dropout(p=0, inplace=False)
2023-03-16 18:12:52 [INFO ]      (5): Tanh()
2023-03-16 18:12:52 [INFO ]    )
2023-03-16 18:12:52 [INFO ]    (criterion): CrossEntropyLoss()
2023-03-16 18:12:52 [INFO ]  )
2023-03-16 18:12:52 [INFO ]  Total number of parameters: 321602 

2023-03-16 18:12:52 [INFO ]  BERTClassifier( 
2023-03-16 18:12:52 [INFO ]    (bert): BertModel(
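
The parameter total reported for the MLP head in the log can be checked by hand from the three `Linear` layers it lists; `Tanh`, `Dropout`, and `CrossEntropyLoss` have no trainable parameters. The helper below is a hypothetical name introduced only for this arithmetic check.

```python
def linear_params(in_features, out_features, bias=True):
    """Weights plus optional biases of one fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


# (in_features, out_features) of the Linear layers printed in the log:
# two hidden layers followed by the output layer.
layers = [(768, 300), (300, 300), (300, 2)]

per_layer = [linear_params(i, o) for i, o in layers]
total = sum(per_layer)

print(per_layer)  # [230700, 90300, 602]
print(total)      # 321602
```

The result matches the `Total number of parameters: 321602` line, so that figure covers the classification head only and not the BERT encoder.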


In [8]:
!nvidia-smi

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Thu Mar 16 18:10:07 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ...  Off  | 00000000:02:00.0 Off |                  N/A |
| 21%   37C    P2    58W / 250W |  10903MiB / 11178MiB |      0%      Default |
|                               |            