## GAN Data-Debiasing

### Table of contents
1. [Introduction](#1.-Introduction)
2. [Train Attribute Classifier](#2.-Train-Attribute-Classifier)
	* [Compute Fairness](#Compute-Attribute-classifier-fairness)
3. [Mitigate the bias using GAN's Latent Space De-biasing](#3.-Mitigate-the-bias-using-GAN's-Latent-Space-De-biasing)
4. [Train Attribute classifier with debiased dataset](#4.-Train-Attribute-classifier-with-debiased-dataset)
	* [Compute Fairness](#Compute-Fairness)
5. [Summary](#5.-Summary)

### 1. Introduction


Welcome!


Applications of deep learning to vision problems have proliferated in recent years, and this widespread use makes fairness an important concern. Deep learning models trained with labels as their only guidance can learn spurious correlations between the labels and certain features (or attributes).


For example, if we train a model to classify "people wearing hats", the classifier may inadvertently associate sunglasses with hats if the dataset has a narrow distribution of mostly outdoor images (people often wear hats and sunglasses together outdoors and take them off indoors). Because of this correlation in the training data, a classifier trained to recognize hats may rely on the presence of sunglasses and, as a result, fail to recognize a hat when no sunglasses are present. Capturing a perfectly balanced dataset is not feasible in many cases. In such cases, as shown in this notebook, one can train classifiers while mitigating the biases that stem from these correlations by using [Fair Attribute Classification through Latent Space De-biasing](https://arxiv.org/abs/2012.01469).

### Preparation
Let's start by installing nnabla and cloning the [nnabla-examples repository](https://github.com/sony/nnabla-examples). If you're running on Colab, make sure that your runtime type is set to GPU (top menu: Runtime → Change runtime type), and click **Connect** at the top right of the screen before you start.

Before we go into a detailed explanation, here is a sneak peek at the steps involved in the process:


In [None]:
# Preparation
# Running this in Colab's default Python environment may show warnings for newly imported packages.
# If so, click `RESTART RUNTIME` so that the following cells run correctly.
# Dependency-conflict error messages can be ignored.
!git clone https://github.com/sony/nnabla-examples.git
!pip install albumentations
!pip install nnabla-ext-cuda116

In [None]:
%cd nnabla-examples/responsible_ai/gan_data_debiased

In [None]:
import cv2
from google.colab.patches import cv2_imshow
img = cv2.imread('images/gan_data_debiasing_workflow.png')
cv2_imshow(img)

As illustrated in the figure, we use a [GAN](https://arxiv.org/abs/1406.2661) to generate realistic-looking images and perturb them in the underlying latent space to produce training data that is balanced with respect to each protected attribute. We then augment the original dataset with this generated data and empirically demonstrate that target classifiers trained on the augmented dataset exhibit both quantitative and qualitative benefits.


Let's first train a baseline attribute classifier on the original [celebA](https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html) dataset and analyze its fairness. If the fairness metrics are not satisfactory, we then train the target attribute classifier on the original dataset together with a balanced synthetic dataset to make the model fairer.


In [None]:
import os
import glob
import pickle
import shutil
import numpy as np
from PIL import Image
import albumentations as A
import nnabla as nn
from nnabla.ext_utils import get_extension_context
from nnabla.utils.data_iterator import data_iterator_simple
import classifier as clf
from utils import utils

Let us train an `Arched_Eyebrows` classifier that does not depend on gender expression. For this, we need a dataset that has both target labels (`Arched_Eyebrows`) and gender expression labels. [celebA](https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html) is a dataset of 202,599 celebrity face images, each annotated with 40 binary attribute labels. We assume that the `Male` attribute corresponds to gender expression and that the target attribute is `Arched_Eyebrows`.
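
The label handling performed by the data iterator below can be sketched in plain Python. The helper names `attribute_indices`, `to_binary`, and `parse_label_line` are hypothetical (they are not part of this notebook's `utils`), and the sketch assumes the standard layout of `list_attr_celeba.txt`: a first line with the image count, a second line with the 40 attribute names, then one line per image with the file name followed by 40 values of -1 or 1. Looking indices up by name is one alternative to hard-coding `1` and `20`.

```python
def attribute_indices(header_line, names):
    """Map attribute names to their 0-based column index in list_attr_celeba.txt."""
    columns = header_line.split()
    return {name: columns.index(name) for name in names}


def to_binary(value):
    """CelebA stores attributes as -1/1; map them to 0/1."""
    return (int(value) + 1) // 2


def parse_label_line(line, target_idx, protected_idx):
    """Return (file_name, target_label, protected_label) for one annotation line."""
    fields = line.split()
    # fields[0] is the image file name, so attribute k is stored in fields[k + 1]
    return fields[0], to_binary(fields[target_idx + 1]), to_binary(fields[protected_idx + 1])


# Usage with a shortened, made-up header and annotation line
idx = attribute_indices("5_o_Clock_Shadow Arched_Eyebrows Attractive", ["Arched_Eyebrows"])
print(idx)                                                               # {'Arched_Eyebrows': 1}
print(parse_label_line("000001.jpg -1 1 1", idx["Arched_Eyebrows"], 0))  # ('000001.jpg', 1, 0)
```

On the full attribute file, looking up `Arched_Eyebrows` and `Male` in the second line should give 1 and 20, the defaults used by `data_iterator_celeba`.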


NNabla provides various utilities for loading data for training and validation. Here, we use NNabla's `data_iterator_simple`.


In [None]:
def data_iterator_celeba(img_path,attr_path, batch_size,
                         target_attribute=1,protected_attribute=20,
                         num_samples=-1,augment=False, shuffle=False, rng=None):
    """
    create celebA data iterator
    Args:
        img_path (str) : image path directory 
        attr_path (str) : celebA attribute file path (ex: list_attr_celeba.txt)
        batch_size (int) :  number of samples contained in each generated batch
        target_attribute (int) : target attribute (ex: Arched EyeBrows (1), Bushy Eyebrows(12), smilling (31),etc..)
        protected_attribute (int): protected attribute (ex: Male (20), Pale_Skin (26))
        num_samples (int) : number of samples taken in data loader
                            (if num_samples=-1, it will take all the images in the dataset)
        augment (bool) : data augmentation (True for training)
        shuffle (bool) : shuffle the data (True /False)
        rng : None
    Returns:
        simple data iterator
    """
    
    imgs = glob.glob("{}/*.jpg".format(img_path))
    label_file = open(attr_path, 'r')
    label_file = label_file.readlines()
    labels = {}
    # skipping first two rows(headers)
    for i in range(2, len(label_file)):
        temp = label_file[i].strip().split()

        labels[os.path.join(img_path, temp[0])] = np.array(
            [int((int(temp[target_attribute + 1]) + 1) / 2), int((int(temp[protected_attribute + 1]) + 1) / 2)])
    
    # following the original paper, images are resized to 64×64, then to 256×256,
    # and finally cropped to 224×224
    pre_process = [(64, 64), (256, 256), (224, 224)]
    mean_normalize = (0.485, 0.456, 0.406)
    std_normalize = (0.229, 0.224, 0.225)
    
    if augment:
        transform = A.Compose([
            A.Resize(pre_process[0][0], pre_process[0][1]),
            A.Resize(pre_process[1][0], pre_process[1][1]),
            A.RandomCrop(width=pre_process[2][0], height=pre_process[2][1]),
            A.HorizontalFlip(p=0.5),
            A.Normalize(mean=mean_normalize, std=std_normalize)
        ])
    
    else:
        transform = A.Compose([
            A.Resize(pre_process[0][0], pre_process[0][1]),
            A.Resize(pre_process[1][0], pre_process[1][1]),
            A.CenterCrop(width=pre_process[2][0], height=pre_process[2][1]),
            A.Normalize(mean=mean_normalize, std=std_normalize)
        ])
    if num_samples == -1:
        num_samples = len(imgs)
    else:
        print("Num. of data ({}) is used for debugging".format(num_samples))
        
    def load_func(i):
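        # crop the aligned 178×218 (width×height) images to a central 128×128 region;
        # image arrays are (height, width, channels), so cx is the row (vertical) center
        # and cy is the column (horizontal) center
        cx = 121
        cy = 89
        c_pixels = 64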
        img = Image.open(imgs[i])
        img = np.array(img.convert('RGB'))
        img = img[cx - c_pixels:cx+c_pixels, cy-c_pixels:cy+c_pixels]
        # transform
        transformed_image = transform(image=img)['image'].transpose(2, 0, 1)
        return transformed_image, labels[imgs[i]]
    return data_iterator_simple(load_func, num_samples, batch_size, shuffle=shuffle, rng=rng, with_file_cache=False)
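
The crop in `load_func` can be checked on a dummy array. NumPy stores an image as (height, width, channels), so for an aligned 178×218 (width×height) CelebA image, `cx` selects rows and `cy` selects columns. This standalone sketch uses a zero-filled placeholder image rather than real data.

```python
import numpy as np

# placeholder for an aligned CelebA image: (height, width, channels) = (218, 178, 3)
dummy = np.zeros((218, 178, 3), dtype=np.uint8)

cx, cy, c_pixels = 121, 89, 64  # same constants as load_func
# cx indexes rows (vertical axis), cy indexes columns (horizontal axis)
crop = dummy[cx - c_pixels:cx + c_pixels, cy - c_pixels:cy + c_pixels]

print(crop.shape)                                                       # (128, 128, 3)
print((cx - c_pixels, cx + c_pixels), (cy - c_pixels, cy + c_pixels))  # (57, 185) (25, 153)
```

Both ranges stay inside the 218-row, 178-column image, so the crop never needs padding.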

In [None]:
def split_celeba_dataset(img_path, attr_path, out_dir, split="test"):
    
    """
    split the celebA dataset
    Args:
        img_path (str) : image path directory 
        attr_path (str) : celebA attribute file path (ex: list_attr_celeba.txt)
        out_dir (str) : Path where the split data to be saved
        split (string) : split the dataset depends on the split attribute(train, valid, and test)
    """
    # as per the author's citation, we are splitting the dataset
    train_beg = 0  # train starts from
    valid_beg = 162770  # valid starts from
    test_beg = 182610  # test starts from
    
    label_file = open(attr_path, 'r')
    label_file = label_file.readlines()
    
    # skipping the first two rows for header
    total_samples = len(label_file) - 2
    if split == 'train':
        number_samples = valid_beg - train_beg
        beg = train_beg
    
    elif split == 'valid':
        number_samples = test_beg - valid_beg
        beg = valid_beg
    
    elif split == 'test':
        number_samples = total_samples - test_beg
        beg = test_beg
    else:
        print('Error')
        return
    
    if not os.path.exists(out_dir):
        os.makedirs(out_dir)
    
    for i in range(beg + 2, beg + number_samples + 2):
        temp = label_file[i].strip().split()
        src_dir = os.path.join(img_path,temp[0])
        dst_dir = os.path.join(out_dir,temp[0])
        shutil.copy(src_dir, dst_dir)
    print("splitting completed")
        

### 2. Train Attribute Classifier

First, set NNabla's default context to cuDNN so that computation runs on the GPU.

In [None]:
from nnabla.ext_utils import get_extension_context
ctx = get_extension_context('cudnn')
nn.set_default_context(ctx)

#### Now let's run the Attribute Classifier

For the Attribute Classifier, we use [ResNet-50](https://nnabla.org/pretrained-models/nnp_models/imagenet/Resnet-50/Resnet-50.nnp) pretrained on [ImageNet](https://image-net.org/) as the base architecture. We replace the final linear layer of ResNet-50 with two linear layers with a hidden layer of size 2048, with dropout and ReLU applied between them. All models are trained with binary cross-entropy loss for 20 epochs with a batch size of 32, using the [Adam](https://arxiv.org/abs/1412.6980) optimizer with a learning rate of 1e-3.
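
The replacement head described above can be sketched in NumPy to show the shapes and the loss. This is an illustration only, not the notebook's NNabla implementation in `classifier.py`: the weights and input features are random placeholders, and the dropout probability of 0.5 and the ReLU-then-dropout order are assumptions. The input size 2048 corresponds to ResNet-50's pooled feature vector.

```python
import numpy as np

rng = np.random.default_rng(0)


def head_forward(features, w1, b1, w2, b2, drop_p=0.5, train=True):
    """Two linear layers with ReLU and (inverted) dropout in between; one logit per sample."""
    h = np.maximum(features @ w1 + b1, 0.0)  # linear -> ReLU
    if train and drop_p > 0:
        keep = rng.random(h.shape) >= drop_p  # drop each unit with probability drop_p
        h = h * keep / (1.0 - drop_p)         # rescale so the expected activation is unchanged
    return (h @ w2 + b2).ravel()              # linear -> single logit


def bce_with_logits(logits, targets):
    """Numerically stable binary cross-entropy, averaged over the batch."""
    return np.mean(np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits))))


feat_dim, hidden = 2048, 2048  # pooled ResNet-50 features, hidden size from the text
w1 = rng.normal(0, 0.01, (feat_dim, hidden))
b1 = np.zeros(hidden)
w2 = rng.normal(0, 0.01, (hidden, 1))
b2 = np.zeros(1)

features = rng.normal(size=(32, feat_dim))            # a batch of 32 placeholder features
targets = rng.integers(0, 2, size=32).astype(float)   # placeholder binary labels
logits = head_forward(features, w1, b1, w2, b2)
loss = bce_with_logits(logits, targets)
print(logits.shape, loss)
```

At inference time `head_forward(..., train=False)` skips dropout, and a sigmoid over the logit gives the score that is compared against the calibrated threshold.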


We have trained the Attribute Classifier and saved the model with the best accuracy on the validation set. If you want to train the baseline Attribute Classifier from scratch, please refer to our GitHub page and follow the steps.


Now let us get the pre-trained weights for the classifier and load the model. Then we shall check model fairness.


In [None]:
# download the celeba dataset and unzip
URL = "https://www.dropbox.com/s/d1kjpkqklf0uw77/celeba.zip?dl=0"
ZIP_FILE= "./data/celeba.zip"
!mkdir -p ./data/
!wget -N $URL -O $ZIP_FILE
!unzip $ZIP_FILE -d ./data/
!rm $ZIP_FILE

In [None]:
# download the pre-trained weights
!wget https://nnabla.org/pretrained-models/nnabla-examples/responsible_ai/gan_data_debiased/baseline.h5
!wget https://nnabla.org/pretrained-models/nnabla-examples/responsible_ai/gan_data_debiased/val_baseline.pkl

In [None]:
nn.clear_parameters()

attribute_classifier_model = clf.attribute_classifier(model_load_path=r'baseline.h5')
# copy the test split into ./test
split_celeba_dataset(r'./data/celeba/images', r'./data/celeba/list_attr_celeba.txt', r'./test', split="test")
# create the test data iterator
test = data_iterator_celeba(img_path=r'./test',
                            attr_path=r'./data/celeba/list_attr_celeba.txt',
                            batch_size=32, target_attribute=1, protected_attribute=20)
# load the classification threshold calibrated on the validation set
with open(r'val_baseline.pkl', 'rb') as f:
    cal_thresh = pickle.load(f)['cal_thresh']
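
The notebook loads `cal_thresh` from a pickle computed on the validation set without showing how it was chosen. One plausible rule, sketched below purely as an assumption, is to pick the threshold so that the number of predicted positives on the validation set matches the number of ground-truth positives. `calibrated_threshold` is a hypothetical helper, and the stored value may have been computed differently.

```python
import numpy as np


def calibrated_threshold(val_targets, val_scores):
    """Hypothetical: threshold such that (val_scores > t).sum() == number of positives (absent ties)."""
    num_pos = int(np.sum(val_targets))
    desc = np.sort(np.asarray(val_scores, dtype=float))[::-1]  # highest score first
    if num_pos == 0:
        return float(desc[0])                      # no score is strictly above the maximum
    if num_pos >= len(desc):
        return float(np.nextafter(desc[-1], -np.inf))  # every score is above this value
    # midpoint between the num_pos-th and (num_pos + 1)-th highest scores
    return float((desc[num_pos - 1] + desc[num_pos]) / 2)


scores = np.array([0.9, 0.2, 0.7, 0.4, 0.1])
t = calibrated_threshold(np.array([1, 0, 1, 0, 0]), scores)
print(round(t, 2), int((scores > t).sum()))  # 0.55 2
```

With a validation iterator built like `test` from the `valid` split (not created in this notebook), it could be called as `calibrated_threshold(val_targets[:, 0], val_scores)` on the output of `get_scores`.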

### Compute Attribute classifier fairness


Let's start our investigation of classifier fairness by analyzing the predictions made on the test set. In this tutorial, we use the average precision (AP) metric to measure classifier accuracy and three metrics to measure model fairness. First, we measure the [difference in equality of opportunity](https://arxiv.org/abs/2004.01355) (DEO), i.e. the absolute difference in False Negative Rate (FNR) between the two protected-attribute groups. As our second fairness metric, we use the [bias amplification](https://arxiv.org/abs/2102.12594) (BA) metric proposed by Wang and Russakovsky. Intuitively, BA measures how much more often a target attribute is predicted together with a protected attribute than it co-occurs with it in the ground truth. Both DEO and BA depend on the chosen classification threshold. Therefore, as our final fairness metric, we use a threshold-invariant metric that measures the divergence between the classifier's score distributions for the two protected groups [(KL)](https://arxiv.org/abs/2006.10667).
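
To make the three fairness metrics concrete, here is a NumPy sketch of one way to compute each for a single binary target and a single binary protected attribute. It is not the code behind `utils.get_difference_equality_opportunity`, `utils.get_bias_amplification`, or `utils.get_kl_divergence`, and details may differ. BA here follows the protected-to-target direction of Wang and Russakovsky's metric, and the KL sketch assumes scores in [0, 1], 20 histogram bins, and a small smoothing constant, summing the symmetric KL divergence over both ground-truth labels.

```python
import numpy as np


def difference_equality_opportunity(protected, targets, preds):
    """|FNR(group 0) - FNR(group 1)|, with FNR measured over ground-truth positives."""
    fnr = []
    for g in (0, 1):
        pos = (protected == g) & (targets == 1)
        fnr.append(np.mean(preds[pos] == 0))
    return float(abs(fnr[0] - fnr[1]))


def bias_amplification(protected, targets, preds):
    """Directional BA (protected -> target) for one binary target attribute."""
    p_t = np.mean(targets == 1)
    terms = []
    for g in (0, 1):
        in_g = protected == g
        # y_g is True when target and group co-occur more often than under independence
        y_g = np.mean(in_g & (targets == 1)) > p_t * np.mean(in_g)
        # change in P(target = 1 | group) from ground truth to predictions
        delta = np.mean(preds[in_g] == 1) - np.mean(targets[in_g] == 1)
        terms.append(delta if y_g else -delta)
    return float(np.mean(terms))


def kl_score_divergence(protected, targets, scores, bins=20, eps=1e-6):
    """Symmetric KL between the two groups' score histograms, summed over both target labels."""
    def hist(mask):
        h, _ = np.histogram(scores[mask], bins=bins, range=(0.0, 1.0))
        h = h.astype(float) + eps  # smoothing so empty bins do not produce log(0)
        return h / h.sum()

    def kl(p, q):
        return float(np.sum(p * np.log(p / q)))

    total = 0.0
    for y in (0, 1):
        p = hist((protected == 0) & (targets == y))
        q = hist((protected == 1) & (targets == y))
        total += kl(p, q) + kl(q, p)
    return total


protected = np.array([0, 0, 0, 0, 1, 1, 1, 1])
targets = np.array([1, 1, 0, 0, 1, 0, 0, 0])
preds = np.array([1, 1, 1, 0, 0, 0, 0, 0])
print(difference_equality_opportunity(protected, targets, preds))  # 1.0
print(bias_amplification(protected, targets, preds))               # 0.25
```

In this toy example the classifier misses every positive in group 1 (DEO of 1.0) and over-predicts the target for group 0, which is already correlated with it, so BA is positive. For all three metrics, lower values indicate a fairer classifier.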


In [None]:
test_targets, test_scores = attribute_classifier_model.get_scores(test)
test_pred = np.where(test_scores > cal_thresh, 1, 0)

ap = utils.get_average_precision(test_targets[:, 0], test_scores)
deo = utils.get_difference_equality_opportunity(test_targets[:, 1],
                                                     test_targets[:, 0], test_pred)
ba = utils.get_bias_amplification(test_targets[:, 1],
                                     test_targets[:, 0], test_pred)
kl = utils.get_kl_divergence(test_targets[:, 1], test_targets[:, 0], test_scores)

print('Test results: ')
print('AP : {:.1f}'.format(100 * ap))
print('DEO : {:.1f}'.format(100 * deo))
print('BA : {:.1f}'.format(100 * ba))
print('KL : {:.1f}'.format(kl))

### 3. Mitigate the bias using GAN's Latent Space De-biasing


As mentioned earlier in this notebook, to debias the dataset we need to generate a balanced synthetic dataset.


To generate images, we use a Progressive GAN (PGGAN) from nnabla-examples with a 512-D latent space, trained on the CelebA dataset. We sample 10,000 synthetic images, label them with the baseline attribute classifiers, and learn hyperplanes in the latent space (h_t for the target attribute and h_g for the protected attribute) with scikit-learn's linear SVM implementation.
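
The core of the de-biasing step can be sketched in NumPy: given a latent vector z and the two hyperplanes, find z' whose signed distance to h_t is unchanged while its signed distance to h_g is negated, so that G(z) and G(z') should share the target attribute but differ in the protected one. The closed form below satisfies exactly those two constraints. The function name `perturb_latent` is hypothetical, and the hyperplane parameters are random placeholders standing in for the `coef_[0]` and `intercept_[0]` of a fitted scikit-learn `LinearSVC`.

```python
import numpy as np


def perturb_latent(z, w_t, b_t, w_g, b_g):
    """For a batch z of shape (N, D), return z' with the same target score and negated protected score.

    Hyperplanes: h_t is w_t . z + b_t = 0, h_g is w_g . z + b_g = 0.
    """
    # rescale each hyperplane so that its normal has unit length
    n_t, n_g = np.linalg.norm(w_t), np.linalg.norm(w_g)
    w_t, b_t = w_t / n_t, b_t / n_t
    w_g, b_g = w_g / n_g, b_g / n_g
    alpha = w_g @ w_t
    # direction orthogonal to w_t: moving along it leaves the target score unchanged
    direction = w_g - alpha * w_t
    # step length chosen so that w_g . z' + b_g == -(w_g . z + b_g)
    step = 2.0 * (z @ w_g + b_g) / (1.0 - alpha ** 2)
    return z - step[:, None] * direction[None, :]


rng = np.random.default_rng(0)
dim = 512                                             # PGGAN latent size from the text
w_t, w_g = rng.normal(size=dim), rng.normal(size=dim)  # placeholder hyperplane normals
b_t, b_g = 0.3, -0.1                                  # placeholder offsets
z = rng.normal(size=(4, dim))
z_prime = perturb_latent(z, w_t, b_t, w_g, b_g)
print(np.allclose(z_prime @ w_t + b_t, z @ w_t + b_t))     # True
print(np.allclose(z_prime @ w_g + b_g, -(z @ w_g + b_g)))  # True
```

Each pair (z, z') yields one image on each side of h_g at the same position relative to h_t, which is what balances the protected attribute within each target label in the synthetic data.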


In this tutorial, we intentionally skip Progressive GAN training. If you are interested in training PGGAN, please refer to our [GitHub page](https://github.com/sony/nnabla-examples/tree/master/GANs/pggan) for more information.




### 4. Train Attribute classifier with debiased dataset
Well!
Now we can train the Attribute Classifier on both the balanced synthetic dataset and the original dataset. If you want to train the GAN-debiased model from scratch, please refer to our GitHub page and follow the steps.


Let's get the pre-trained weights of the GAN-debiased model and check its fairness.

In [None]:
# download the pre-trained weights
!wget https://nnabla.org/pretrained-models/nnabla-examples/responsible_ai/gan_data_debiased/gan_data_debised.h5
!wget https://nnabla.org/pretrained-models/nnabla-examples/responsible_ai/gan_data_debiased/val_gan_data_debised.pkl

In [None]:
nn.clear_parameters()
attribute_classifier_debiased_model = clf.attribute_classifier(model_load_path=r'gan_data_debised.h5')
with open(r'val_gan_data_debised.pkl', 'rb') as f:
    cal_thresh = pickle.load(f)['cal_thresh']

### Compute Fairness 

In [None]:
test_targets, test_scores = attribute_classifier_debiased_model.get_scores(test)
test_pred = np.where(test_scores > cal_thresh, 1, 0)

ap = utils.get_average_precision(test_targets[:, 0], test_scores)
deo = utils.get_difference_equality_opportunity(test_targets[:, 1],
                                                     test_targets[:, 0], test_pred)
ba = utils.get_bias_amplification(test_targets[:, 1],
                                     test_targets[:, 0], test_pred)
kl = utils.get_kl_divergence(test_targets[:, 1], test_targets[:, 0], test_scores)

print('Test results: ')
print('AP : {:.1f}'.format(100 * ap))
print('DEO : {:.1f}'.format(100 * deo))
print('BA : {:.1f}'.format(100 * ba))
print('KL : {:.1f}'.format(kl))

Comparing the four metrics above, the GAN-debiased model performs better (lower values) on all three fairness metrics, DEO, BA, and KL divergence, while maintaining AP comparable to the baseline `Arched_Eyebrows` classifier.

### 5. Summary

In this tutorial, we have shown how a GAN-based data augmentation method can be used to train a fairer Attribute Classifier when the target label (Arched eyebrows) is correlated with a protected attribute (gender expression). You can try the same approach with other attribute classifiers: if the baseline Attribute Classifier is biased, try balancing the training data with an approach such as GAN-based data augmentation. To train different attribute classifiers, please refer to our GitHub page and follow the steps.

### References
1. Ramaswamy, Vikram V., Sunnie SY Kim, and Olga Russakovsky. "Fair attribute classification through latent space de-biasing." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
2. Goodfellow, Ian, et al. "Generative adversarial nets." Advances in neural information processing systems 27 (2014).
3. Liu, Ziwei, et al. "Large-scale CelebFaces Attributes (CelebA) Dataset." Retrieved August 15, 2018.
4. Kingma, Diederik P., and Jimmy Ba. "Adam: A method for stochastic optimization." arXiv preprint arXiv:1412.6980 (2014).
5. Lokhande, Vishnu Suresh, et al. "Fairalm: Augmented lagrangian method for training fair models with little regret." European Conference on Computer Vision. Springer, Cham, 2020.
6. Wang, Angelina, and Olga Russakovsky. "Directional bias amplification." International Conference on Machine Learning. PMLR, 2021.
7. Chen, Mingliang, and Min Wu. "Towards threshold invariant fair classification." Conference on Uncertainty in Artificial Intelligence. PMLR, 2020.