# Mitigating Algorithmic Bias

This chapter demonstrates a few representative algorithms for mitigating algorithmic bias. As discussed in the Chapter {doc}`1-1-intro`, algorithmic bias can arise from (i) pre-existing bias in data, (ii) bias introduced during model training, and (iii) bias introduced when making predictions / decisions. Accordingly, to mitigate these biases, there are at least three types of approaches:

- **Pre-processing Approaches**: pre-process training data to remove existing bias, before training models;
- **In-processing Approaches**: modify how models are trained to impose fairness as a learning objective or constraint;
- **Post-processing Approaches**: post-process model outputs (e.g., predictions or predicted probabilities) to satisfy certain fairness objective.

We will use the [LSAC Bar Passage Data](https://eric.ed.gov/?id=ED469370) for illustration. This data, originally collected by {cite:t}`wightman1998lsac`, contains the bar passage outcomes and demographic information of over 20,000 individuals.

```{admonition} Data: Compas Recidivism Dataset
:class: note
- Location: "data/bar_pass_prediction.csv"
- Shape: (22407, 5)
- Note: original dataset has a few more columns. They have been removed for cleaner demonstration.
```

We will use ```pass_bar``` as the outcome variable of interest, and treat ```race``` as the sensitive feature. We will focus only on white and black races, and remove any rows with missing data.

In [1]:
import pandas as pd
bar = pd.read_csv('../data/bar_pass_prediction.csv')
bar = bar[bar['race'].isin(['white', 'black'])]
bar.dropna(inplace = True)
bar.reset_index(drop = True, inplace = True)
bar.head()

Unnamed: 0,lsat,ugpa,gender,race,pass_bar
0,44.0,3.5,female,white,1
1,29.0,3.5,female,white,1
2,36.0,3.5,male,white,1
3,39.0,3.5,male,white,1
4,48.0,3.5,male,white,1


In [2]:
# we will use lsat, upga, gender, and race as the features
X = bar[['lsat', 'ugpa', 'gender', 'race']]
Y = bar['pass_bar']
# many ML algorithms take numerical input, so let's convert the categorical variables to numerical
X = pd.get_dummies(X, columns = ['gender', 'race'], drop_first = True, dtype=int)
X.columns

Index(['lsat', 'ugpa', 'gender_male', 'race_white'], dtype='object')

In [3]:
# Let's first build a baseline classifier for demonstration
# using random forest here, please feel free to try other techniques
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
# we will use 70% of the data for training and 30% for testing
# setting random_state for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.3, random_state = 42)

# train the random forest classifier
# this dataset is quite imbalanced, so we will set class_weight to balanced
rf_clf = RandomForestClassifier(n_estimators = 100, random_state = 42, class_weight = 'balanced')
rf_clf.fit(X_train, y_train)

# make predictions on the testing data
y_pred = rf_clf.predict(X_test)

In [4]:
# Throughout this chapter, we will evaluate multiple models in terms of both predictive performance and fairness
# For predictive performance: we will report the accuracy, precision, recall, and F1 score
# note that we set pos_label = 0 because class 0 (not passing the bar) is the minority class in this imbalanced dataset
# For fairness: we will report demographic disparity and equalized odds disparity
# let's create a function so that we don't need to repeat the same code multiple times
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from fairlearn import metrics
def evaluate_model(y_test, y_pred):
    accuracy = accuracy_score(y_test, y_pred)
    precision, recall, f1, _ = precision_recall_fscore_support(y_test, y_pred, pos_label = 0, average = 'binary')
    DD = metrics.demographic_parity_difference(y_test, y_pred, sensitive_features = X_test['race_white'])
    EOD = metrics.equalized_odds_difference(y_test, y_pred, sensitive_features = X_test['race_white'])
    # print all metrics
    print('Accuracy:', accuracy)
    print('Precision:', precision)
    print('Recall:', recall)
    print('F1 Score:', f1)
    print('Demographic Disparity:', DD)
    print('Equalized Odds Disparity:', EOD)

In [5]:
# evaluate model
evaluate_model(y_test, y_pred)

Accuracy: 0.844939338540801
Precision: 0.07529722589167767
Recall: 0.19655172413793104
F1 Score: 0.10888252148997135
Demographic Disparity: 0.16762272165217595
Equalized Odds Disparity: 0.21173469387755106


## Pre-Processing Approaches

### Naive Approach: Remove Sensitive Feature

The idea of pre-processing is to modify the data used for model training to remove the existing bias. Perhaps a seemingly obvious pre-processing approach is to simply drop the sensitive group attribute (```race``` in this case). After all, if the model is "blind" to race, it cannot have racial bias, right? Well, let's try it out.

In [19]:
# now build another classifier without the race column
X_norace_train = X_train.drop(columns = ['race_white'])
X_norace_test = X_test.drop(columns = ['race_white'])
# train the random forest classifier
rf_clf = RandomForestClassifier(n_estimators = 100, random_state = 42, class_weight = 'balanced')
rf_clf.fit(X_norace_train, y_train)
# make predictions on the testing data
y_pred_norace = rf_clf.predict(X_norace_test)
# evaluate the model
evaluate_model(y_test, y_pred_norace)

Accuracy: 0.8311450889147416
Precision: 0.07790697674418605
Recall: 0.23103448275862068
F1 Score: 0.11652173913043479
Demographic Disparity: 0.21030078689322695
Equalized Odds Disparity: 0.21967584303597842


We can see that, both demographic disparity and equalized odds disparity actually become greater. In general, removing sensitive feature from data has limited effectiveness. This is because other legitimate features in the data can be correlated with the sensitive feature. Indeed, as shown below, LSAT score and undergraduate GPA are both correlated with race to some degree.

In [20]:
X.corrwith(X['race_white'])

lsat           0.376971
ugpa           0.222320
gender_male    0.099201
race_white     1.000000
dtype: float64

```{warning}
One could argue that the correlations between LSAT score / undergraduate GPA with race are themselves manifestations of existing racial bias in the education systme (e.g., perhaps black students systematically received less support in schools, leading to lower LSAT scores and undergraduate GPAs). In generally, what counts as "legitimate" or "non-sensitive" features can be a point of debate. 
```

### Correlation Remover

To deal with this issue, we need to systematically remove the correlations between each non-sensitive feature and the sensitive feature. This can be done via the ```CorrelationRemover``` function in the ```fairlearn``` package. Under the hood, it removes correlations by running linear regressions of non-sensitive features on the sensitive feature and obtaining the residuals.

In [21]:
from fairlearn.preprocessing import CorrelationRemover
cr = CorrelationRemover(sensitive_feature_ids=["race_white"])
X_cr_train = cr.fit_transform(X_train)
# transformation returns a numpy array, let's convert it back to a pandas dataframe
X_cr_train = pd.DataFrame(X_cr_train, columns = ['lsat', 'ugpa', 'gender_male'])
# check correlations again - they are very close to 0 now
race_train = X_train['race_white']
race_train.reset_index(drop = True, inplace = True)
X_cr_train.corrwith(race_train)

lsat           2.069709e-14
ugpa           1.209489e-14
gender_male    5.147216e-15
dtype: float64

In [22]:
# now build another classifier with the transformed data
rf_clf = RandomForestClassifier(n_estimators = 100, random_state = 42, class_weight = 'balanced')
rf_clf.fit(X_cr_train, y_train)
# make predictions on the testing data
y_pred_cr = rf_clf.predict(X_norace_test)
# evaluate the model
evaluate_model(y_test, y_pred_cr)

Accuracy: 0.9017782948313113
Precision: 0.09433962264150944
Recall: 0.1206896551724138
F1 Score: 0.1059001512859304
Demographic Disparity: 0.061266896331535925
Equalized Odds Disparity: 0.07971938775510201


We can see that both disparities are further reduced, especially disparity in terms of equalized odds.

```{admonition} Why does a linear regression remove correlation? (optional, toggle to show)
:class: dropdown
Given a sensitive attribute $V$ and a non-sensitive attribute $X$ that is correlated with $V$, the correlation remover runs a regression of $X$ on $V$ and collect the residual $X_r$ as the transformed non-sensitive attribute that will be uncorrelated with $V$; that is $Cov(X_r, V) = 0$. This is a general procedure that is effective regardless of the distribution of $X$ or $V$.

Intuitively, it works because a linear regression of $X$ on $V$ is trying to use $V$ to account for variations in $X$, and the residual is the variation in $X$ that cannot be accounted for by $V$ (hence independent from $V$).

It can be proved more formally. See [this article](https://statproofbook.github.io/P/slr-rescorr.html) for more details if you are interested.
```

## In-Processing Approaches

Compared to data pre-processing, in-processing approaches aim to mitigate bias by modifying how the model is trained. Many modern machine learning models are trained as an _optimization problem_, i.e., by minimizing a certain loss function (computed over training data). Therefore, some natural ways to mitigate bias include (1) modifying the objective function to have fairness as a part, such as fair regularization approaches ({cite:t}`kamishima2011fairness`); and (2) add fairness as constraints in the optimization problem ({cite:t}`zafar2017fairnessconstraints,cotter2018training,komiyama2018nonconvex,celis2019classification`).

```{admonition} Example: fairness regularization
:class: tip
Consider a resume screening task where a classification model predicts whether candidate $i$ is a good fit for a position or not. Let $Y_i$ denote the ground truth, $\widehat{Y}_i$ denote the classifier's prediction. Let sets $N_F$ and $N_M$ respectively contain the indices of female and male candidates.

The regular classification task typically uses a cross-entropy loss $\sum_{i=1}^N L(Y_i, \widehat{Y}_i)$. With the additional consideration of demographic parity, we want the following difference to be small

$$
\left | \frac{1}{|N_F|}\sum_{i \in N_F} \widehat{Y}_i - \frac{1}{|N_M|}\sum_{i \in N_M} \widehat{Y}_i\right |
$$

The new fairness-aware learning task can be formulated as minimizing the following fairness regularized loss function:

$$
\sum_{i=1}^N L(Y_i, \widehat{Y}_i) + \lambda \left | \frac{1}{|N_F|}\sum_{i \in N_F} \widehat{Y}_i - \frac{1}{|N_M|}\sum_{i \in N_M} \widehat{Y}_i\right |
$$

where parameter $\lambda$ controls the relative importance of achiving demographic parity as compared to achieving greater predictive accuracy.
```

For demonstration, we will use the ```ExponentiatedGradient``` method, proposed by {cite:t}`agarwal2018reductions` and available within the ```fairlearn``` package. This method reframes a binary classification problem as a constrained optimization problem, with fairness objective(s) set as constraints.

In [16]:
from fairlearn.reductions import ExponentiatedGradient, DemographicParity, EqualizedOdds
rf_clf = RandomForestClassifier(n_estimators = 100, random_state = 42, class_weight = 'balanced')
fair_obj = DemographicParity()
EG_Demo = ExponentiatedGradient(rf_clf, constraints = fair_obj)
EG_Demo.fit(X_train, y_train, sensitive_features = X_train['race_white'])
y_pred_eg = EG_Demo.predict(X_test, random_state = 42)
evaluate_model(y_test, y_pred_eg)

You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  self.pos_basis[i]["+", e, g] = 1
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series

Accuracy: 0.815855077281037
Precision: 0.05735930735930736
Recall: 0.18275862068965518
F1 Score: 0.08731466227347612
Demographic Disparity: 0.026640772040859906
Equalized Odds Disparity: 0.0398360044995677


In [28]:
rf_clf = RandomForestClassifier(n_estimators = 100, random_state = 42, class_weight = 'balanced')
fair_obj = EqualizedOdds()
EG_EqualOdds = ExponentiatedGradient(rf_clf, constraints = fair_obj)
EG_EqualOdds.fit(X_train, y_train, sensitive_features = X_train['race_white'])
y_pred_eg = EG_EqualOdds.predict(X_test)
evaluate_model(y_test, y_pred_eg)

You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  self.pos_basis[i]["+", e, g] = 1
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series

Accuracy: 0.7915904936014625
Precision: 0.05941499085923217
Recall: 0.22413793103448276
F1 Score: 0.09393063583815028
Demographic Disparity: 0.043901797983112
Equalized Odds Disparity: 0.06218112244897955


## Post-Processing Approaches

Finally, post-processing approaches mitigate bias by changing how a model's predictions are used. For binary classification, one of the most common approaches here is to adjust the prediction threshold. For example, a threshold optimizing approach ({e.g., cite:t}`hardt2016equality`) searches for group-specific thresholds that achieves certain fairness goals.

In [25]:
from fairlearn.postprocessing import ThresholdOptimizer
# initialize the threshold optimizer
# "constraint" specifies what kind of fairness goal you want to achieve
# "objective" specifies the learning objective
rf_clf = RandomForestClassifier(n_estimators = 100, random_state = 42, class_weight = 'balanced')
TO_Demo = ThresholdOptimizer(estimator = rf_clf, constraints = 'demographic_parity', objective = "accuracy_score", predict_method = 'auto')
TO_Demo.fit(X_train, y_train, sensitive_features = X_train['race_white'])
y_pred_to = TO_Demo.predict(X_test, sensitive_features = X_test['race_white'], random_state = 42)
evaluate_model(y_test, y_pred_to)

Accuracy: 0.9511384410835965
Precision: 0.16666666666666666
Recall: 0.0034482758620689655
F1 Score: 0.006756756756756757
Demographic Disparity: 0.001068947087119243
Equalized Odds Disparity: 0.00520833333333337


In [26]:
rf_clf = RandomForestClassifier(n_estimators = 100, random_state = 42, class_weight = 'balanced')
TO_EqualOdds = ThresholdOptimizer(estimator = rf_clf, constraints = 'equalized_odds', objective = "accuracy_score", predict_method = 'auto')
TO_EqualOdds.fit(X_train, y_train, sensitive_features = X_train['race_white'])
y_pred_to = TO_EqualOdds.predict(X_test, sensitive_features = X_test['race_white'], random_state = 42)
evaluate_model(y_test, y_pred_to)

Accuracy: 0.9511384410835965
Precision: 0.16666666666666666
Recall: 0.0034482758620689655
F1 Score: 0.006756756756756757
Demographic Disparity: 0.001068947087119243
Equalized Odds Disparity: 0.00520833333333337


We see that in this case, the threshold optimizer clearly reduces demographic disparity and disparity in equalized odds. Coincidentally, the results are identical under two constraints. However, this is not always the case (e.g., try replacing the random forest classifier with a gradient boosting classifier).

## Fairness-Performance Tradeoff

Let's start by tabulating the performance, both predictive and fairness-related, of all the above mitigation approaches. The following table summarizes the (1) accuracy score (as an overall predictive performance measure), (2) F-1 of minority class (as a class-specific performance measure), and (3) demographic disparity and equalized odds disparity (as fairness measures). "Baseline" refers to classifier's performance without considering fairness at all; "DD" denotes demographic disparity, and "EOD" denotes equalized odds disparity.

|                              | Accuracy | F-1   | DD    | EOD   |
| ---------------------------- | -------- | ----- | ----- | ----- |
| Baseline                     | 0.845    | 0.109 | 0.168 | 0.212 |
| Drop Race                    | 0.831    | 0.117 | 0.210 | 0.220 |
| Correlation Remover          | 0.902    | 0.121 | 0.061 | 0.080 |
| Exponentiated Gradient (DD)  | 0.816    | 0.087 | 0.027 | 0.040 |
| Exponentiated Gradient (EOD) | 0.792    | 0.094 | 0.044 | 0.062 |
| Threshold Optimizer (DD)     | 0.951    | 0.007 | 0.001 | 0.005 |
| Threshold Optimizer (EOD)    | 0.951    | 0.007 | 0.001 | 0.005 |

Several observations are worth noting:
- There is a tradeoff between predictive performance and fairness performance. In-processing approaches produce more fair predictions than pre-processing approach (correlation remover to be specific) at the cost of having lower accuracy and F-1 scores.
- Post-processing approach (i.e., threshold optimizer) results in extremely small fairness disparities, but the F-1 scores are also very low and the accuracy is very high. Considering the imbalanced nature of the dataset, this is not good news. It indicates that the post-processed predictions are almost always class 1 (e.g., if you predict everyone passes the bar, of course there is not demographic disparity). Therefore, just because disparity metrics have lower values do not automatically mean the classifier is more "useful".
- So which model is the best? Well, this depends on user's tolerance for predictive performance and fairness. If predictive performance takes priority, then correlation remover model may be the best because it produces highest F-1 score while also clearly reduces disparities compared to the baseline. However, if fairness takes priority, then exponentiated gradient models may be prefered. 

```{warning}
Is the fairness-performance tradeoff always a problematic phenomenon? If achiving certain notion of fairness necessarily sacrifices predictive accuracy, is that always a bad thing? Answer to this question can be quite nuanced and depend on the context.

For sake of illustration, imagine an "extreme case" of direct discrimination, where an employer only hires male candidates and rejects all female candidates. Any fairness-aware classification model that predicts a positive hiring decision for a female candidate would have made a "mistake" from the accuracy perspective. However, these "mistakes" often represent a corrective force against historical bias and injustice (direct discrimination in this example). This is a type of **corrective justice**.

When thinking about the potential tradeoff between fairness and predictive performance, it is important to reflect on how predictive performance is being measured. If predictive performance is measured based on (biased) historical labels, then high performance can be a signal of perpetuating existing biases.
```