# Mitigating Algorithmic Bias

This chapter demonstrates a few representative algorithms for mitigating algorithmic bias. As discussed in Chapter {doc}`1-1-intro.ipynb`, algorithmic bias can arise from (i) pre-existing bias in data, (ii) bias introduced during model training, and (iii) bias introduced when making predictions or decisions. Accordingly, there are at least three types of mitigation approaches:

- **Pre-processing Approaches**: pre-process training data to remove existing bias, before training models;
- **In-processing Approaches**: modify how models are trained to impose fairness as a learning objective or constraint;
- **Post-processing Approaches**: post-process model outputs (e.g., predictions or predicted probabilities) to satisfy a chosen fairness objective.

We again use the [COMPAS Recidivism Dataset](https://www.propublica.org/datastore/dataset/compas-recidivism-risk-score-data-and-analysis) for demonstration.

```{admonition} Data: COMPAS Recidivism Dataset
:class: note
- Location: "data/compas-scores-two-years.csv"
- Shape: (7214, 53)
- Source: ProPublica
```

Unlike in Chapter {doc}`1-2-measure.ipynb`, we will not use the predicted risk scores from COMPAS (including the ```decile_score``` and ```score_text``` columns). Instead, we will rely only on the ```two_year_recid``` column as the ground truth and build classifiers ourselves.

In [69]:
# import and clean data
import pandas as pd
compas = pd.read_csv('../data/compas-scores-two-years.csv')
compas = compas[(compas['days_b_screening_arrest'] <= 30) & 
                (compas['days_b_screening_arrest'] >= -30) &  
                (compas['is_recid'] != -1) &
                (compas['c_charge_degree'] != 'O') & 
                (compas['score_text'] != 'N/A')]
# again focus only on African-American vs. Caucasian as the protected group of interest
compas = compas[compas['race'].isin(['African-American', 'Caucasian'])]
compas.reset_index(drop = True, inplace = True)
compas.shape

(5278, 53)

In [71]:
# we will use two_year_recid as the label, and age_cat, c_charge_degree (F for felony, M for misdemeanor), race, sex, and priors_count as the features
X = compas[['age_cat', 'c_charge_degree', 'race', 'sex', 'priors_count']]
Y = compas['two_year_recid']
# many ML algorithms take numerical input, so let's convert the categorical variables to numerical
X = pd.get_dummies(X, columns = ['age_cat', 'c_charge_degree', 'race', 'sex'], drop_first = True, dtype=int)
X.columns

Index(['priors_count', 'age_cat_Greater than 45', 'age_cat_Less than 25',
       'c_charge_degree_M', 'race_Caucasian', 'sex_Male'],
      dtype='object')

In [116]:
# Throughout this chapter, we will evaluate multiple models in terms of both predictive performance and fairness
# For predictive performance: we will report the accuracy, precision, recall, and F1 score
# For fairness: we will evaluate conditional demographic disparity and equalized odds
# let's create a function so that we don't need to repeat the same code multiple times
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
import statsmodels.formula.api as smf
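def evaluate_model(X_test, y_test, y_pred):
    """Print predictive metrics and regression-based fairness metrics for a set of test predictions."""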
    df = X_test.join(pd.DataFrame({'y_true': y_test, 'y_pred': y_pred}))
    accuracy = accuracy_score(df['y_true'], df['y_pred'])
    precision, recall, f1, _ = precision_recall_fscore_support(df['y_true'], df['y_pred'], pos_label = 1, average = 'binary')
    # rename columns of df for regression formula
    df.rename(columns = {'priors_count': 'priors_count', 'age_cat_Greater than 45': 'age45', 'age_cat_Less than 25': 'age25', 'c_charge_degree_M': 'degree', 'race_Caucasian': 'race', 'sex_Male': 'sex'}, inplace = True)
    model_demo = smf.logit(formula = "y_pred ~ priors_count + age45 + age25 + degree + race + sex", data = df).fit(disp = 0)
    pval_demo = model_demo.pvalues['race']
    coef_demo = model_demo.params['race']
    model_equalodds = smf.logit(formula = "y_pred ~ priors_count + age45 + age25 + degree + race + sex + y_true", data = df).fit(disp = 0)
    pval_equalodds = model_equalodds.pvalues['race']
    coef_equalodds = model_equalodds.params['race']
    # print all metrics
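    # For disparity metrics, the race coefficient is the log-odds of a positive prediction
    # for Caucasian (race = 1) relative to African-American (race = 0), holding the other covariates fixed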
    print('Accuracy:', accuracy)
    print('Precision:', precision)
    print('Recall:', recall)
    print('F1 Score:', f1)
    print('Conditional Demographic Disparity:', coef_demo, 'with p-value:', pval_demo)
    print('Conditional Equalized Odds:', coef_equalodds, 'with p-value:', pval_equalodds)
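
The two fairness checks inside `evaluate_model` can be read as logistic regressions of the predicted label on the features. The block below is only a restatement of the code above, in notation introduced here for illustration: $\hat{y}$ stands for `y_pred`, $y$ for `y_true`, $r$ for the `race` dummy, and $x$ for the remaining features.

```{math}
\begin{aligned}
\text{demographic disparity:}\quad & \operatorname{logit} P(\hat{y} = 1 \mid r, x) = \beta_0 + \beta_r\, r + \beta_x^\top x \\
\text{equalized odds:}\quad & \operatorname{logit} P(\hat{y} = 1 \mid r, x, y) = \gamma_0 + \gamma_r\, r + \gamma_x^\top x + \gamma_y\, y
\end{aligned}
```

The function reports the estimates of $\beta_r$ and $\gamma_r$ together with their p-values. Because $r = 1$ encodes Caucasian, a negative coefficient means that Caucasian defendants have lower odds of a positive (recidivism) prediction than otherwise comparable African-American defendants. A coefficient that is statistically indistinguishable from zero gives no evidence of disparity under this specification.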

In [117]:
# Let's first build a baseline classifier for demonstration
# using random forest here, please feel free to try other techniques
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
# we will use 70% of the data for training and 30% for testing
# setting random_state for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.3, random_state = 42)

# train the random forest classifier
rf_clf = RandomForestClassifier(n_estimators = 100, random_state = 42)
rf_clf.fit(X_train, y_train)

# make predictions on the testing data
y_pred = rf_clf.predict(X_test)

# evaluate the model
evaluate_model(X_test, y_test, y_pred)

Accuracy: 0.6521464646464646
Precision: 0.671003717472119
Recall: 0.49115646258503404
F1 Score: 0.5671641791044776
Conditional Demographic Disparity: -0.6519755776198811 with p-value: 5.975548237556959e-05
Conditional Equalized Odds: -0.6477458316017153 with p-value: 7.15534907457676e-05


## Pre-Processing Approaches

### Naive Approach: Remove Sensitive Feature

The idea of pre-processing is to modify the data used for model training to remove the existing bias. A seemingly obvious pre-processing approach is to simply drop the sensitive group attribute (```race``` in this case). After all, if the model is "blind" to race, it cannot have racial bias, right? Well, let's try it out.

In [118]:
# now build another classifier without the race column
X_norace_train = X_train.drop(columns = ['race_Caucasian'])
X_norace_test = X_test.drop(columns = ['race_Caucasian'])
# train the random forest classifier
rf_clf = RandomForestClassifier(n_estimators = 100, random_state = 42)
rf_clf.fit(X_norace_train, y_train)
# make predictions on the testing data
y_pred_norace = rf_clf.predict(X_norace_test)
# evaluate the model
# note that we still use X_test here because we need the race column for evaluation
evaluate_model(X_test, y_test, y_pred_norace)

Accuracy: 0.6515151515151515
Precision: 0.65587734241908
Recall: 0.5238095238095238
F1 Score: 0.5824508320726173
Conditional Demographic Disparity: 0.10215799646963158 with p-value: 0.5216770297491757
Conditional Equalized Odds: 0.10990842987855276 with p-value: 0.49247397346427646


We can see that removing race shrinks the estimated disparities: on this test split, the race coefficients are small and not statistically significant (p ≈ 0.52 and p ≈ 0.49). This result should not be over-generalized, however. In general, removing the sensitive feature from the data has limited effectiveness, because other legitimate features in the data can be correlated with the sensitive feature and act as proxies for it. Indeed, as shown below, the number of prior offenses and age are both correlated with race to some degree.

In [76]:
X.corrwith(X['race_Caucasian'])

priors_count              -0.195713
age_cat_Greater than 45    0.182516
age_cat_Less than 25      -0.106301
c_charge_degree_M          0.102885
race_Caucasian             1.000000
sex_Male                  -0.069502
dtype: float64
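
Why correlated features matter can be illustrated with a small synthetic sketch. Everything in it is hypothetical and unrelated to the COMPAS data: `group` is a made-up binary sensitive attribute, `proxy` is a made-up feature whose distribution differs by group, and the "model" is a simple threshold rule that never sees `group`. It also measures a raw (unconditional) gap in positive prediction rates, which is simpler than the regression-based metrics used in `evaluate_model`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical binary sensitive attribute (0 or 1)
group = rng.integers(0, 2, size=n)

# Hypothetical "legitimate" feature whose mean is shifted by group membership
proxy = rng.normal(loc=group * 1.0, scale=1.0, size=n)

# A decision rule that is "blind" to group: it only looks at the proxy
y_pred = (proxy > 0.5).astype(int)

# Positive prediction rate within each group, and the gap between them
rate_0 = y_pred[group == 0].mean()
rate_1 = y_pred[group == 1].mean()
gap = rate_1 - rate_0

print("corr(proxy, group):", round(np.corrcoef(proxy, group)[0, 1], 3))
print("positive rate, group 0:", round(rate_0, 3))
print("positive rate, group 1:", round(rate_1, 3))
print("gap:", round(gap, 3))
```

Even though the rule never uses `group`, the two groups receive positive predictions at clearly different rates, because the proxy carries group information.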

### Correlation Remover

To address this issue, we need to systematically remove the correlations between each non-sensitive feature and the sensitive feature. This can be done with the ```CorrelationRemover``` transformer in the ```fairlearn``` package. Under the hood, it removes linear correlations by regressing each non-sensitive feature on the sensitive feature and keeping the residuals.
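
The residual idea can be sketched in a few lines of NumPy. This is a simplified illustration on synthetic data, not the library's actual implementation; the variables `s` (a made-up binary sensitive feature) and `X_other` (two made-up features related to it) are assumptions introduced for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Hypothetical binary sensitive feature and two features correlated with it
s = rng.integers(0, 2, size=n).astype(float)
X_other = np.column_stack([
    2.0 * s + rng.normal(size=n),   # strongly related to s
    -0.5 * s + rng.normal(size=n),  # weakly related to s
])

# Center the sensitive feature, then regress every other feature on it
s_centered = (s - s.mean()).reshape(-1, 1)
beta, *_ = np.linalg.lstsq(s_centered, X_other, rcond=None)

# Keep the residuals: the part of each feature not linearly explained by s
X_resid = X_other - s_centered @ beta

corr_before = [np.corrcoef(X_other[:, j], s)[0, 1] for j in range(2)]
corr_after = [np.corrcoef(X_resid[:, j], s)[0, 1] for j in range(2)]
print("correlation with s before:", np.round(corr_before, 3))
print("correlation with s after: ", np.round(corr_after, 3))
```

After the transformation, the linear correlation between each feature and `s` is zero up to floating-point error. Note that this only removes linear dependence; nonlinear relationships between the features and the sensitive attribute can remain.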

In [119]:
from fairlearn.preprocessing import CorrelationRemover
cr = CorrelationRemover(sensitive_feature_ids=["race_Caucasian"])
X_cr = cr.fit_transform(X, Y)
# transformation returns a numpy array, let's convert it back to a pandas dataframe
X_cr = pd.DataFrame(X_cr, columns = ['priors_count', 'age_cat_Greater than 45', 'age_cat_Less than 25', 'c_charge_degree_M', 'sex_Male']).reset_index(drop = True)
# check correlations again - they are very close to 0 now
X_cr.corrwith(X['race_Caucasian'])

priors_count               1.835814e-15
age_cat_Greater than 45   -1.833611e-15
age_cat_Less than 25       9.469844e-16
c_charge_degree_M         -1.044949e-15
sex_Male                   9.270897e-16
dtype: float64

In [120]:
# now build another classifier with the transformed data
X_cr_train, X_cr_test, y_train, y_test = train_test_split(X_cr, Y, test_size = 0.3, random_state = 42)
# train the random forest classifier
rf_clf = RandomForestClassifier(n_estimators = 100, random_state = 42)
rf_clf.fit(X_cr_train, y_train)
# make predictions on the testing data
y_pred_cr = rf_clf.predict(X_cr_test)
# evaluate the model
# X_cr_test no longer contains race, so attach the race column from X_test
# (same split, so the row indices match) because we need it for evaluation
X_cr_test = X_cr_test.assign(race_Caucasian = X_test['race_Caucasian'])
evaluate_model(X_cr_test, y_test, y_pred_cr)

Accuracy: 0.6540404040404041
Precision: 0.671559633027523
Recall: 0.49795918367346936
F1 Score: 0.571875
Conditional Demographic Disparity: -2.3788645096335537 with p-value: 1.1074149476639748e-34
Conditional Equalized Odds: -2.3145107088101136 with p-value: 1.0568928986026908e-32


In [56]:
X_train

      priors_count  age_cat_Greater than 45  age_cat_Less than 25  c_charge_degree_M  race_Caucasian  sex_Male
1054             2                        1                     0                  0               0         1
731              4                        0                     0                  0               0         0
5827             3                        1                     0                  0               1         0
44               0                        0                     0                  0               0         1
4475             2                        0                     0                  0               1         1
...            ...                      ...                   ...                ...             ...       ...
4231             3                        0                     1                  0               0         1
5145             1                        1                     0                  0               1         1
7102             3                        0                     0                  0               0         0
7150            11                        0                     0                  0               0         1
