[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://github.com/Trusted-AI/AIF360/blob/master/examples/sklearn/demo_grid_search_reduction_classification_sklearn.ipynb)


# Sklearn compatible Grid Search for classification

Grid search is an in-processing technique that can be used for fair classification or fair regression. For classification, it reduces fair classification to a sequence of cost-sensitive classification problems and returns the deterministic classifier with the lowest empirical error, subject to fair classification constraints, among the candidates searched. The code for grid search wraps the source class `fairlearn.reductions.GridSearch` available in the [fairlearn](https://github.com/fairlearn/fairlearn) library, licensed under the MIT License, Copyright Microsoft Corporation.
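
The selection step described above can be sketched in a few lines. This is only an illustration of "lowest empirical error subject to the constraints, among the candidates searched", not fairlearn's implementation: the `Candidate` tuple, its fields, the numbers, and the `tolerance` parameter are all hypothetical, and `fairlearn.reductions.GridSearch` may apply a different selection rule.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    """Hypothetical summary of one trained candidate classifier."""
    name: str
    error: float      # empirical error on the training data
    violation: float  # largest fairness-constraint violation


def select_candidate(candidates: Sequence[Candidate],
                     tolerance: float) -> Optional[Candidate]:
    """Return the lowest-error candidate whose violation is within tolerance."""
    feasible = [c for c in candidates if c.violation <= tolerance]
    if not feasible:
        return None  # no candidate satisfies the constraint
    return min(feasible, key=lambda c: c.error)


# Made-up numbers, one candidate per grid point.
candidates = [
    Candidate("grid_point_0", error=0.150, violation=0.09),
    Candidate("grid_point_1", error=0.155, violation=0.04),
    Candidate("grid_point_2", error=0.170, violation=0.01),
]
best = select_candidate(candidates, tolerance=0.05)
print(best.name)
```

Here `grid_point_0` has the lowest error but violates the (assumed) tolerance, so the sketch falls back to the most accurate candidate that stays within it.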

In [1]:
#Install aif360
#Install Reductions from Fairlearn
!pip install aif360[Reductions]



In [2]:
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

In [3]:
import numpy as np
import pandas as pd

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from aif360.sklearn.inprocessing import GridSearchReduction

from aif360.sklearn.datasets import fetch_adult
from aif360.sklearn.metrics import average_odds_error

### Loading data

Datasets are formatted as separate `X` (# samples x # features) and `y` (# samples x # labels) DataFrames. The index of each DataFrame contains protected attribute values per sample. Datasets may also load a `sample_weight` object to be used with certain algorithms/metrics. This format keeps aif360 compatible with scikit-learn objects.

For example, we can easily load the Adult dataset from UCI with the following line:

In [4]:
X, y, sample_weight = fetch_adult()
X.head()

| race (index) | sex (index) | age | workclass | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Non-white | Male | 25.0 | Private | 11th | 7.0 | Never-married | Machine-op-inspct | Own-child | Black | Male | 0.0 | 0.0 | 40.0 | United-States |
| White | Male | 38.0 | Private | HS-grad | 9.0 | Married-civ-spouse | Farming-fishing | Husband | White | Male | 0.0 | 0.0 | 50.0 | United-States |
| White | Male | 28.0 | Local-gov | Assoc-acdm | 12.0 | Married-civ-spouse | Protective-serv | Husband | White | Male | 0.0 | 0.0 | 40.0 | United-States |
| Non-white | Male | 44.0 | Private | Some-college | 10.0 | Married-civ-spouse | Machine-op-inspct | Husband | Black | Male | 7688.0 | 0.0 | 40.0 | United-States |
| White | Male | 34.0 | Private | 10th | 6.0 | Never-married | Other-service | Not-in-family | White | Male | 0.0 | 0.0 | 30.0 | United-States |


We can then map the protected attributes to integers,

In [5]:
X.index = pd.MultiIndex.from_arrays(X.index.codes, names=X.index.names)
y.index = pd.MultiIndex.from_arrays(y.index.codes, names=y.index.names)

and the target classes to 0/1,

In [6]:
y = pd.Series(y.factorize(sort=True)[0], index=y.index)

then split the dataset into training (70%) and test (30%) sets:

In [7]:
(X_train, X_test,
 y_train, y_test) = train_test_split(X, y, train_size=0.7, random_state=1234567)
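
The index and label mappings above can be tried on a toy example. This is a minimal sketch with made-up rows; in it, each index level's codes follow the sorted order of that level's values, which may differ from the category order used by the real dataset.

```python
import pandas as pd

# Toy index with string-valued protected attributes (made-up rows).
index = pd.MultiIndex.from_arrays(
    [["White", "Non-white", "White"], ["Male", "Female", "Male"]],
    names=["race", "sex"],
)
labels = pd.Series([">50K", "<=50K", "<=50K"], index=index)

# Replace each level's values with its integer codes, keeping the level names.
int_index = pd.MultiIndex.from_arrays(index.codes, names=index.names)

# factorize(sort=True) numbers the unique labels in sorted order.
codes, uniques = labels.factorize(sort=True)
int_labels = pd.Series(codes, index=int_index)

print(int_index.tolist())
print(list(uniques), int_labels.tolist())
```

The level names (`race`, `sex`) survive the conversion, which is what lets metrics later look up a protected attribute by name.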

We use pandas for one-hot encoding so that the columns associated with protected attributes are easy to reference by name, which grid search reduction requires.

In [8]:
X_train, X_test = pd.get_dummies(X_train), pd.get_dummies(X_test)
X_train = X_train.drop(columns=['sex_Female'])
X_test = X_test.drop(columns=['sex_Female'])
X_train.head()

| race (index) | sex (index) | age | education-num | capital-gain | capital-loss | hours-per-week | workclass_Private | workclass_Self-emp-not-inc | workclass_Self-emp-inc | workclass_Federal-gov | workclass_Local-gov | ... | native-country_Guatemala | native-country_Nicaragua | native-country_Scotland | native-country_Thailand | native-country_Yugoslavia | native-country_El-Salvador | native-country_Trinadad&Tobago | native-country_Peru | native-country_Hong | native-country_Holand-Netherlands |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 58.0 | 11.0 | 0.0 | 0.0 | 42.0 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 51.0 | 12.0 | 0.0 | 0.0 | 30.0 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 26.0 | 14.0 | 0.0 | 1887.0 | 40.0 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 44.0 | 3.0 | 0.0 | 0.0 | 40.0 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 33.0 | 6.0 | 0.0 | 0.0 | 40.0 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
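
The encoding step can be illustrated on a toy frame. This sketch assumes the categorical columns use pandas' `category` dtype (an assumption about the loaded data, not something stated above); under that assumption, encoding the train and test splits separately still yields the same columns.

```python
import pandas as pd

# Made-up rows; "sex" is stored as a categorical with both categories declared.
toy = pd.DataFrame({
    "age": [25.0, 38.0, 28.0],
    "sex": pd.Categorical(["Male", "Female", "Male"],
                          categories=["Female", "Male"]),
})

encoded = pd.get_dummies(toy)
print(list(encoded.columns))

# A split with no "Female" rows still gets a sex_Female column,
# because the category dtype remembers every declared category.
subset = pd.get_dummies(toy.iloc[[0, 2]])
print(list(subset.columns))

# Keep a single binary column for the protected attribute.
encoded = encoded.drop(columns=["sex_Female"])
print(list(encoded.columns))
```

With plain string columns instead of categoricals, a split that lacks some value would be missing the corresponding dummy column, so the two splits would need to be aligned explicitly.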


The protected attribute information is also replicated in the labels:

In [9]:
y_train.head()

race  sex
1     1      0
      0      1
      1      1
      1      0
      1      0
dtype: int64

### Running metrics

With the data in this format, we can easily train a scikit-learn model and get predictions for the test data:

In [10]:
y_pred = LogisticRegression(solver='liblinear', random_state=1234).fit(X_train, y_train).predict(X_test)
lr_acc = accuracy_score(y_test, y_pred)
print(lr_acc)

0.8453600648632712


We can assess how close the predictions are to equality of odds.

`average_odds_error()` computes the (unweighted) average of the absolute values of the true positive rate (TPR) difference and false positive rate (FPR) difference, i.e.:

$$ \tfrac{1}{2}\left(|FPR_{D = \text{unprivileged}} - FPR_{D = \text{privileged}}| + |TPR_{D = \text{unprivileged}} - TPR_{D = \text{privileged}}|\right) $$

In [11]:
lr_aoe = average_odds_error(y_test, y_pred, prot_attr='sex')
print(lr_aoe)

0.09356509680536546
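
To make the formula concrete, here is a hand-rolled version on a tiny made-up example. It is a sketch of the formula above, not the aif360 implementation; the function names are hypothetical, and the encoding (1 = privileged, 0 = unprivileged) is an assumption made for the example.

```python
def rates(y_true, y_pred):
    """Return (TPR, FPR) for binary 0/1 labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)


def average_odds_error_sketch(y_true, y_pred, group):
    """Average of |FPR gap| and |TPR gap| between the two groups."""
    per_group = {}
    for g in (0, 1):  # assumed encoding: 0 = unprivileged, 1 = privileged
        rows = [i for i, v in enumerate(group) if v == g]
        per_group[g] = rates([y_true[i] for i in rows],
                             [y_pred[i] for i in rows])
    (tpr_u, fpr_u), (tpr_p, fpr_p) = per_group[0], per_group[1]
    return 0.5 * (abs(fpr_u - fpr_p) + abs(tpr_u - tpr_p))


group  = [1, 1, 1, 1, 0, 0, 0, 0]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

# Privileged: TPR = 1.0, FPR = 0.5. Unprivileged: TPR = 0.5, FPR = 0.0.
toy_aoe = average_odds_error_sketch(y_true, y_pred, group)
print(toy_aoe)  # 0.5 * (0.5 + 0.5) = 0.5
```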


### Grid Search

Choose a base model for the candidate classifiers. Base models should implement a `fit` method that accepts a `sample_weight` argument. For details, refer to the `GridSearchReduction` documentation.

In [12]:
estimator = LogisticRegression(solver='liblinear', random_state=1234)
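
One way to check whether a candidate base model meets the sample-weight requirement is to inspect the signature of its `fit` method. The helper and the two toy classes below are hypothetical and exist only for illustration.

```python
import inspect


def accepts_sample_weight(estimator) -> bool:
    """Return True if estimator.fit has a parameter named sample_weight."""
    params = inspect.signature(estimator.fit).parameters
    return "sample_weight" in params


class WeightedModel:
    """Toy model whose fit accepts per-sample weights."""
    def fit(self, X, y, sample_weight=None):
        return self


class UnweightedModel:
    """Toy model whose fit ignores weights entirely."""
    def fit(self, X, y):
        return self


print(accepts_sample_weight(WeightedModel()))
print(accepts_sample_weight(UnweightedModel()))
```

This check only looks for an explicitly named parameter, so a `fit` that takes weights through `**kwargs` would be reported as unsupported. scikit-learn ships a similar helper, `sklearn.utils.validation.has_fit_parameter`.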

Determine the columns associated with the protected attribute(s). Grid search can handle more than one attribute, but it is computationally expensive. A similar method with less computational overhead is exponentiated gradient reduction, detailed in [examples/sklearn/demo_exponentiated_gradient_reduction_sklearn.ipynb](sklearn/demo_exponentiated_gradient_reduction_sklearn.ipynb).

In [13]:
prot_attr = 'sex_Male'

Search for the best classifier and observe test accuracy. Other options for `constraints` include "DemographicParity", "TruePositiveRateParity", "FalsePositiveRateParity", and "ErrorRateParity".

In [14]:
np.random.seed(0)  # needed for reproducibility
grid_search_red = GridSearchReduction(prot_attr=prot_attr, 
                                      estimator=estimator, 
                                      constraints="EqualizedOdds",
                                      grid_size=20,
                                      drop_prot_attr=False)
grid_search_red.fit(X_train, y_train)
gs_acc = grid_search_red.score(X_test, y_test)
print(gs_acc)

# Check that accuracy stays within 0.03 of the logistic regression baseline
assert abs(lr_acc - gs_acc) < 0.03

0.8458760227021449


In [15]:
gs_aoe = average_odds_error(y_test, grid_search_red.predict(X_test), prot_attr='sex')
print(gs_aoe)

# Check that average odds error is lower than the logistic regression baseline
assert gs_aoe < lr_aoe

0.05787745779072595
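
The constraint names listed earlier differ in which per-group quantity they try to equalize. The sketch below computes those quantities by hand on made-up data; the name-to-quantity mapping is an informal summary rather than fairlearn's formal definitions, and all function and variable names are hypothetical.

```python
def group_stats(y_true, y_pred):
    """Per-group rates for binary 0/1 labels and predictions."""
    n = len(y_true)
    preds_on_pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    preds_on_neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    return {
        "selection_rate": sum(y_pred) / n,
        "tpr": sum(preds_on_pos) / len(preds_on_pos),
        "fpr": sum(preds_on_neg) / len(preds_on_neg),
        "error_rate": sum(t != p for t, p in zip(y_true, y_pred)) / n,
    }


# Informal summary: which quantities each constraint asks to match across groups.
EQUALIZED_QUANTITIES = {
    "DemographicParity": ["selection_rate"],
    "TruePositiveRateParity": ["tpr"],
    "FalsePositiveRateParity": ["fpr"],
    "EqualizedOdds": ["tpr", "fpr"],
    "ErrorRateParity": ["error_rate"],
}


def constraint_gaps(y_true, y_pred, group):
    """Largest between-group gap in the quantities tied to each constraint."""
    stats = {}
    for g in (0, 1):
        rows = [i for i, v in enumerate(group) if v == g]
        stats[g] = group_stats([y_true[i] for i in rows],
                               [y_pred[i] for i in rows])
    return {
        name: max(abs(stats[0][q] - stats[1][q]) for q in quantities)
        for name, quantities in EQUALIZED_QUANTITIES.items()
    }


group  = [0, 0, 0, 0, 1, 1, 1, 1]
y_true = [1, 0, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
gaps = constraint_gaps(y_true, y_pred, group)
for name, gap in gaps.items():
    print(f"{name}: {gap:.2f}")
```

In this toy data the true positive rates match across groups while the false positive rates do not, so the predictions would look fair under a TPR-only view but not under equalized odds.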


Instead of passing a string for `constraints`, we can also pass a `fairlearn.reductions.Moment` object. You could use a predefined moment, as we do below, or create a custom moment using the fairlearn library.

In [16]:
import fairlearn.reductions as red


np.random.seed(0)  # needed for reproducibility
grid_search_red = GridSearchReduction(prot_attr=prot_attr, 
                                      estimator=estimator, 
                                      constraints=red.EqualizedOdds(),
                                      grid_size=20,
                                      drop_prot_attr=False)
grid_search_red.fit(X_train, y_train)
grid_search_red.score(X_test, y_test)

0.8458760227021449

In [17]:
average_odds_error(y_test, grid_search_red.predict(X_test), prot_attr='sex')

0.05787745779072595