In [1]:
import sys
sys.path.append('../../')

## Template - Bias Mitigation Benchmark ([Holistic AI](https://research.holisticai.com))

**Task:** Regression

**Type:** Postprocessing


This notebook is a template for the Bias Mitigation Benchmark. It can be used to mitigate bias in datasets and models. The notebook is based on the [Holistic AI open source library](https://github.com/holistic-ai/holisticai) and follows the bias mitigation benchmark outlined in [Holistic AI](https://research.holisticai.com).

### Template Structure

The template has the following steps:

1. Setup definition: 
    - select a task: `binary_classification`, `multiclass_classification`, `regression`, `clustering`, `recommender`
    - select a type: `inprocessing`, `preprocessing`, `postprocessing`
2. Mitigator class
    - create a class for your custom mitigator
3. Evaluation
    - evaluate your mitigator and compare it with other mitigators
4. Submission
    - do you have good results? Then submit your mitigator to the Bias Mitigation Benchmark


### Step 1: Setup Definition

In [2]:
from holisticai.benchmark.tasks import task_name, get_task

print(task_name)

['binary_classification', 'multiclass_classification', 'regression', 'clustering', 'recommender']


In [3]:
# load a task
task = get_task("regression")

In [4]:
# benchmark for the task by type
data = task.benchmark(type='postprocessing')
data

Mitigator,Average RFS,crime
PluginEstimationAndCalibration,1.194964,1.194964
WassersteinBarycenter,1.075849,1.075849


### Step 2: Mitigator Class

In [12]:
from typing import Optional

import numpy as np

from holisticai.utils.transformers.bias import BMPostprocessing as BMPost

from holisticai.bias.mitigation.postprocessing.plugin_estimator_and_recalibration.algorithm import PluginEstimationAndCalibrationAlgorithm

class MyPostprocessingMitigator(BMPost):
    """
    Plugin Estimation and Calibration postprocessing optimizes over calibrated regressor outputs via a
    smooth optimization. The rates of convergence of the proposed estimator were derived in terms of
    the risk and fairness constraint.

    References:
        Chzhen, Evgenii, et al. "Fair regression via plug-in estimator and recalibration with statistical
        guarantees." Advances in Neural Information Processing Systems 33 (2020): 19137-19148.
    """

    def __init__(self, L: Optional[int] = 25, beta: Optional[float] = 0.1):
        """Create a Plugin Estimation and Calibration Post-processing instance."""
        self.algorithm = PluginEstimationAndCalibrationAlgorithm(L=L, beta=beta)

    def fit(self, y_pred: np.ndarray, group_a: np.ndarray, group_b: np.ndarray):
        """
        Compute a fair regression function by minimizing the squared error subject to a fairness constraint.

        Description
        ----------
        Compute a fair predictor by estimating a regression function with standard methods and then
        estimating the thresholds that solve the minimization problem.

        Parameters
        ----------
        y_pred : array-like
            Predicted vector (num_examples, ).
        group_a : array-like
            Group membership vector (binary)
        group_b : array-like
            Group membership vector (binary)
        Returns
        -------
        Self
        """
        params = self._load_data(y_pred=y_pred, group_a=group_a, group_b=group_b)

        group_a = params["group_a"] == 1
        group_b = params["group_b"] == 1
        y_pred = params["y_pred"]

        sensitive_features = np.stack([group_a, group_b], axis=1)
        self.algorithm.fit(y_pred, sensitive_features)
        return self

    def transform(
        self,
        y_pred: np.ndarray,
        group_a: np.ndarray,
        group_b: np.ndarray,
    ):
        """
        Apply the fitted postprocessing to the predictions.

        Description
        ----------
        Use the fitted algorithm to map the input predictions to fair predictions.

        Parameters
        ----------
        y_pred : array-like
            Predicted vector (num_examples,).
        group_a : array-like
            Group membership vector (binary)
        group_b : array-like
            Group membership vector (binary)

        Returns
        -------
        Dictionary with the new predictions under the key "y_pred".
        """
        params = self._load_data(y_pred=y_pred, group_a=group_a, group_b=group_b)

        group_a = params["group_a"] == 1
        group_b = params["group_b"] == 1
        y_pred = params["y_pred"]
        sensitive_features = np.stack([group_a, group_b], axis=1)

        new_y_pred = self.algorithm.transform(y_pred, sensitive_features)

        return {"y_pred": new_y_pred}

    def fit_transform(
        self, y_pred: np.ndarray, group_a: np.ndarray, group_b: np.ndarray
    ):
        """
        Fit and transform

        Description
        ----------
        Fit the mitigator on the predictions and return the transformed predictions.

        Parameters
        ----------
        y_pred : array-like
            Predicted vector (num_examples,).
        group_a : array-like
            Group membership vector (binary)
        group_b : array-like
            Group membership vector (binary)

        Returns
        -------
        Dictionary with the new predictions under the key "y_pred".
        """
        return self.fit(
            y_pred,
            group_a,
            group_b,
        ).transform(y_pred, group_a, group_b)
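
The class above wraps an existing library algorithm. To show the `fit` / `transform` / `fit_transform` contract on its own, here is a minimal, dependency-free sketch. `MeanMatchingMitigator` is a hypothetical name and the technique is a deliberately simple stand-in (shifting each group's predictions so both group means equal the overall mean), not the plug-in estimator used in the template. It assumes every example belongs to exactly one of the two groups. To run it in the benchmark, it would presumably need to subclass `BMPost` and call `self._load_data` as the template class does.

```python
from statistics import fmean
from typing import Dict, List, Sequence


class MeanMatchingMitigator:
    """Hypothetical stand-in mitigator (not part of holisticai).

    Shifts each group's predictions by a constant so that both group
    means coincide with the mean of all predictions.
    """

    def fit(self, y_pred: Sequence[float], group_a: Sequence[int],
            group_b: Sequence[int]) -> "MeanMatchingMitigator":
        # Split predictions by binary group membership.
        preds_a = [y for y, g in zip(y_pred, group_a) if g == 1]
        preds_b = [y for y, g in zip(y_pred, group_b) if g == 1]
        # The common target is the mean over both groups together.
        target = fmean(preds_a + preds_b)
        # Learned state: one additive shift per group.
        self.shift_a_ = target - fmean(preds_a)
        self.shift_b_ = target - fmean(preds_b)
        return self

    def transform(self, y_pred: Sequence[float], group_a: Sequence[int],
                  group_b: Sequence[int]) -> Dict[str, List[float]]:
        new_y_pred = []
        for y, in_a, in_b in zip(y_pred, group_a, group_b):
            if in_a == 1:
                new_y_pred.append(y + self.shift_a_)
            elif in_b == 1:
                new_y_pred.append(y + self.shift_b_)
            else:
                # Examples outside both groups are left unchanged.
                new_y_pred.append(y)
        # Same return shape as the template: a dict keyed by "y_pred".
        return {"y_pred": new_y_pred}

    def fit_transform(self, y_pred: Sequence[float], group_a: Sequence[int],
                      group_b: Sequence[int]) -> Dict[str, List[float]]:
        return self.fit(y_pred, group_a, group_b).transform(y_pred, group_a, group_b)


# Toy data: group_a has mean 1.5, group_b has mean 4.5, overall mean 3.0.
result = MeanMatchingMitigator().fit_transform(
    y_pred=[1.0, 2.0, 3.0, 6.0],
    group_a=[1, 1, 0, 0],
    group_b=[0, 0, 1, 1],
)
print(result["y_pred"])  # [2.5, 3.5, 1.5, 4.5]
```

After the transform both groups have mean 3.0. Matching means is a much weaker notion of fairness than the distributional constraint targeted by the cited plug-in method, so treat this only as an illustration of the interface.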

### Step 3: Evaluation

In [13]:
my_mitigator = MyPostprocessingMitigator()

task.run_benchmark(mitigator = my_mitigator, type = 'postprocessing')

Regression Benchmark initialized for MyPostprocessingMitigator


100%|██████████| 1/1 [00:11<00:00, 11.33s/it]


In [15]:
task.evaluate_table()

Mitigator,Average RFS,crime
MyPostprocessingMitigator,1.194964,1.194964
PluginEstimationAndCalibration,1.194964,1.194964
WassersteinBarycenter,1.075849,1.075849
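
In the table above, the `Average RFS` column equals the `crime` column because `crime` is the only dataset in this benchmark. The sketch below reproduces that relationship, assuming the average is a plain unweighted mean over the per-dataset scores. The notebook does not state how the library aggregates, so the `average_score` helper and the `scores` dictionary layout are illustrative only. The numbers are copied from the table.

```python
from statistics import fmean
from typing import Dict

# Per-dataset scores copied from the evaluation table above.
scores: Dict[str, Dict[str, float]] = {
    "MyPostprocessingMitigator": {"crime": 1.194964},
    "PluginEstimationAndCalibration": {"crime": 1.194964},
    "WassersteinBarycenter": {"crime": 1.075849},
}


def average_score(per_dataset: Dict[str, float]) -> float:
    """Unweighted mean over datasets (an assumption, not the library's documented rule)."""
    return fmean(per_dataset.values())


averages = {name: average_score(per_dataset) for name, per_dataset in scores.items()}

for name, value in averages.items():
    print(f"{name}: {value:.6f}")
```

With a single dataset the mean is the dataset score itself. The custom mitigator also matches `PluginEstimationAndCalibration` exactly, as expected for a class that wraps the same algorithm with the same default parameters.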


### Step 4: Submission

In [None]:
task.submit()

MyMitigator benchmark submitted
https://holistic-ai.com/benchmark/binary_classification
