# ComputeFest 2019
# Model Agnostic Methods for Interpretability and Fairness

## Fairness Section

In [1]:
# Imports
import pandas as pd
from IPython.display import Markdown, display
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# AIF360 imports
from aif360.datasets import AdultDataset
from aif360.algorithms.preprocessing.optim_preproc import OptimPreproc
from aif360.algorithms.preprocessing.optim_preproc_helpers.data_preproc_functions import load_preproc_data_adult
from aif360.algorithms.preprocessing.optim_preproc_helpers.distortion_functions import get_distortion_adult
from aif360.algorithms.preprocessing.optim_preproc_helpers.opt_tools import OptTools

### Census Income dataset

The previous dataset was thorough and complex enough to demonstrate interpretability techniques, but as it is an anonymized dataset, it has little to no information on sensitive features. We will switch to another dataset for this part that is more suited to analyzing fairness techniques, as it possesses information on gender and race. 

This dataset is called the **Census Income dataset**, and it associates features of working adults to **whether or not they make more than $50k/yr**. It is extracted from the 1994 Census database, and contains **48842 observations** with a mix of continuous and categorical features (14 in total).  

List of features:
- **age:** continuous. 
- **workclass:** categorical. 
- **education:** categorical. 
- **education-num:** continuous. 
- **marital-status:** categorical. 
- **occupation:** categorical. 
- **relationship:** categorical. 
- **race:** categorical. 
- **sex:** categorical. 
- **capital-gain:** continuous. 
- **capital-loss:** continuous. 
- **hours-per-week:** continuous. 
- **fnlwgt:** (final weight) continuous. 
- **native-country:** categorical.

Response: binary, corresponding to >50K (1) or <=50K (0). 


#### Reference:
Ron Kohavi, "Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid", Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 1996


### How do we import it?

To import this dataset in an easy way, we will use a very convenient module (that will be presented in detail later this afternoon!) called AIF360, created by IBM. It centralizes multiple fairness metrics and tools for training fair models, as well as easy ways to import useful datasets. We will be using it to import our dataset and ultimately create a fair model, but we will implement our own fairness metrics.

In [7]:
# Load Census Income Dataset from AIF360
privileged_groups = [{'sex': 1}]
unprivileged_groups = [{'sex': 0}]
dataset_orig = load_preproc_data_adult(['sex'])
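# Options for the Optimized Preprocessing step used later in the notebook:
# "epsilon" bounds the allowed discrimination; "clist" and "dlist" parametrize the distortion constraints
optim_options = {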
    "distortion_fun": get_distortion_adult,
    "epsilon": 0.05,
    "clist": [0.99, 1.99, 2.99],
    "dlist": [.1, 0.05, 0]
}

We first split the data into train and test:

In [15]:
# Split into train (70%) and test (30%)
dataset_orig_train, dataset_orig_test = dataset_orig.split([0.7], shuffle=True)

Let's now print some characteristics of the training set to see what we just loaded.

In [14]:
print('Training Dataset shape:',dataset_orig_train.features.shape,'\n')
# print('Favorable and unfavorable labels:',dataset_orig_train.favorable_label, dataset_orig_train.unfavorable_label)
print('Protected attribute names:',dataset_orig_train.protected_attribute_names,'\n')
print('Privileged and unprivileged protected attribute values:' ,dataset_orig_train.privileged_protected_attributes, 
      dataset_orig_train.unprivileged_protected_attributes,'\n')
print('Dataset feature names:',dataset_orig_train.feature_names,'\n')

Training Dataset shape: (34189, 18) 

Protected attribute names: ['sex'] 

Privileged and unprivileged protected attribute values: [array([1.])] [array([0.])] 

Dataset feature names: ['race', 'sex', 'Age (decade)=10', 'Age (decade)=20', 'Age (decade)=30', 'Age (decade)=40', 'Age (decade)=50', 'Age (decade)=60', 'Age (decade)=>=70', 'Education Years=6', 'Education Years=7', 'Education Years=8', 'Education Years=9', 'Education Years=10', 'Education Years=11', 'Education Years=12', 'Education Years=<6', 'Education Years=>12'] 



As we can see, we didn't import the exact dataset described above: we imported a slightly modified version in which race and sex are binary, and age and years of education are binned and one-hot encoded, so that every feature is binary and easier to understand and work with.

And we then extract numpy arrays from the dataset objects given by AIF360, standardizing the features in the process:

In [17]:
# Get numpy arrays for x_train, x_test, y_train, y_test by extracting data from AIF360 dataset object

# We define a scaler to normalize our data
scale_orig = StandardScaler()

# We get our training numpy arrays
x_train = scale_orig.fit_transform(dataset_orig_train.features) # This fit_transform from the scaler subtracts the mean and divides by the std for each feature.
y_train = dataset_orig_train.labels.ravel()

# And our testing arrays
x_test = scale_orig.transform(dataset_orig_test.features) # Here, we only transform, as we can't use the testing set to define the scaling factors.
y_test = dataset_orig_test.labels.ravel()
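The fit/transform distinction in the cell above can be illustrated without scikit-learn. The sketch below is a minimal pure-Python stand-in, assuming standardization as `(x - mean) / std` with the population standard deviation and a scale of 1 for constant features (our understanding of `StandardScaler`'s default behavior); `fit_scaler` and `apply_scaler` are hypothetical helper names, not part of scikit-learn.

```python
from statistics import mean, pstdev


def fit_scaler(columns):
    """Learn (mean, std) for each feature column of the TRAINING data only."""
    params = []
    for col in columns:
        sigma = pstdev(col)  # population standard deviation (ddof=0)
        params.append((mean(col), sigma if sigma > 0 else 1.0))  # avoid dividing by 0
    return params


def apply_scaler(columns, params):
    """Standardize columns with previously learned parameters (no re-fitting)."""
    return [[(v - mu) / sigma for v in col] for col, (mu, sigma) in zip(columns, params)]


# Hypothetical binary feature, stored column-wise
train_cols = [[0, 0, 1, 1]]
test_cols = [[1, 1, 1]]

params = fit_scaler(train_cols)          # learned from train only
print(apply_scaler(train_cols, params))  # [[-1.0, -1.0, 1.0, 1.0]]
print(apply_scaler(test_cols, params))   # [[1.0, 1.0, 1.0]]: test reuses the training mean and std
```

Re-fitting on the (constant) test column instead would map every test value to 0, hiding the fact that these test rows sit one standard deviation above the training mean.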

### Training our classifier (Random Forest)

It is now time to train the classifier that we are going to audit. We chose a random forest here for its ease of use, but any model with an sklearn interface can be used.

In [18]:
# Train classifier on original data
rf_model = RandomForestClassifier(n_estimators=25, 
                               max_depth=None,
                               random_state=42).fit(x_train, y_train)

rf_model  # display the fitted estimator (it was already fitted on the line above)

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=25, n_jobs=None,
            oob_score=False, random_state=42, verbose=0, warm_start=False)

### Getting initial accuracy

In [19]:
# Get predictions

preds_test = rf_model.predict(x_test)

acc_train = rf_model.score(x_train, y_train)
acc_test = rf_model.score(x_test, y_test)

print('Training accuracy of our classifier:',acc_train)
print('Testing accuracy of our classifier:',acc_test)

Training accuracy of our classifier: 0.8052005030857878
Testing accuracy of our classifier: 0.8024295366136628


The accuracy on test should be around 80%. As we will see later, this value will decrease when we try to make our model fair.

## Statistical Parity

We will first test our model's predictions with statistical parity, a simple fairness measure that is easy to compute.

### What is statistical parity?

This metric measures the difference between the probability of a positive decision for the protected group and the probability of a positive decision for the unprotected group. Here $G=0$ denotes the protected (unprivileged) group, sex 0, and $G=1$ the unprotected (privileged) group, sex 1. Mathematically:
$$Sp = P(d=1|G=0) - P(d=1|G=1)$$

This can be easily approximated with our data by calculating the proportion of positive decisions amongst people from gender "0" and subtracting the proportion of positive decisions amongst people from gender "1":

$$Sp = \frac{\#\,\text{people with positive decision and gender 0}}{\#\,\text{people from gender 0}} - \frac{\#\,\text{people with positive decision and gender 1}}{\#\,\text{people from gender 1}}$$
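To make the counting formula concrete, here is a tiny worked example in plain Python on made-up decisions for eight people. The helper `statistical_parity` is hypothetical; it returns the signed value of $Sp$, whereas `evaluate_statistical_parity` below reports its absolute value.

```python
def statistical_parity(decisions, groups):
    """Signed statistical parity: Sp = P(d=1 | G=0) - P(d=1 | G=1)."""

    def positive_rate(g):
        # Proportion of positive decisions among the people of group g
        members = [d for d, grp in zip(decisions, groups) if grp == g]
        return sum(members) / len(members)

    return positive_rate(0) - positive_rate(1)


# Toy data (made up): 4 people of gender 0, then 4 people of gender 1
decisions = [1, 0, 0, 0, 1, 1, 1, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

sp = statistical_parity(decisions, groups)
print(sp)       # 1/4 - 3/4 = -0.5
print(abs(sp))  # 0.5, the magnitude that evaluate_statistical_parity reports
```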

Let's code a simple function that will calculate this for our dataset. In the next cell, complete the function `evaluate_statistical_parity` to perform the calculation above. The function definition and docstring will guide you.

In [24]:
# Statistical parity function
def evaluate_statistical_parity(predictions, protected_class_array):
    """Function to calculate statistical parity.

    Parameters
    ----------
    predictions (numpy array): binary decision labels outputted by our trained model.
    protected_class_array (numpy array): boolean mask where protected rows are marked True.

    Returns
    -------
    bias (float): statistical parity bias, i.e. the absolute difference between the
    proportion of positive decisions in the protected class and in the unprotected class
    """

    # --------------
    # --------------
    # Your code here        
    # --------------
    # --------------

    prop_protected = np.sum(predictions & protected_class_array) / np.sum(protected_class_array)
    prop_not_protected = np.sum(predictions & ~protected_class_array) / np.sum(~protected_class_array)
    bias = np.abs(prop_protected - prop_not_protected)

    return bias

### Testing statistical parity

We can now apply this first statistical measure to the test set and observe whether our model's predictions are fair with respect to gender.

In [25]:
# Observing statistical parity on test set

GENDER_COLUMN = 1

predictions = rf_model.predict(x_test)>0.5
protected_class_array = dataset_orig_test.features[:,GENDER_COLUMN]==0 # Here, we're taking the column corresponding to 'sex' and we are transforming it into a boolean array

statistical_parity_orig = evaluate_statistical_parity(predictions, protected_class_array)

print('Statistical Parity on Test set:', statistical_parity_orig)

Statistical Parity on Test set: 0.22109782275375653


We should get a value of around 0.2. Statistical parity indicates perfect fairness when it equals 0. In this case, we do have a certain amount of unfairness: the proportion of positive classifications is not the same in both groups.

This is interesting, but we're observing the protected group as a whole. What happens if we zoom in on a part of the dataset fulfilling a certain condition? Are we more or less fair?

## Conditional Parity

Statistical parity is a simple measure, and it gives a quick overview of our model's fairness. However, it disregards important aspects of our dataset, such as the feature values of each row. Consider the loan application example again. We could have a situation where statistical parity tells us that we are giving loans to 20% of people from gender 0 and 20% of people from gender 1, which would look fair. However, the 20% from gender 0 may be chosen at random, while the 20% from gender 1 all come from developed countries. Our model would then be hiding another layer of unfairness: among people from gender 1, those from developing countries never receive a loan.

We can use conditional parity to detect these types of imbalances. Conditional parity tests for unfairness in the same way as statistical parity, but conditioned on another feature (for example, country of origin). The equation is:

$$Cp = P(d=1|G=0, C=c) - P(d=1|G=1, C=c)$$

Again, this can be easily calculated by counting the number of positive outcome cases in both protected groups, but this time only looking at the people that fulfill our conditional constraint (C=c). In simpler terms, the equation is:

$$Cp = \frac{\#\,\text{people with positive decision, gender 0 and } C=c}{\#\,\text{people from gender 0 and } C=c} - \frac{\#\,\text{people with positive decision, gender 1 and } C=c}{\#\,\text{people from gender 1 and } C=c}$$
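The loan scenario above can be reproduced on a toy scale. In the plain-Python sketch below (the helper `conditional_parity` and all data are made up for illustration), both genders receive a positive decision 2 times out of 4, so statistical parity is 0, yet conditioning on a hypothetical feature $C \in \{A, B\}$ reveals a gap of 0.5 in each subgroup.

```python
def conditional_parity(decisions, groups, conditions, c):
    """Signed conditional parity: Cp = P(d=1 | G=0, C=c) - P(d=1 | G=1, C=c)."""

    def positive_rate(g):
        # Proportion of positive decisions among people of group g who satisfy C=c
        members = [d for d, grp, cond in zip(decisions, groups, conditions)
                   if grp == g and cond == c]
        if not members:
            raise ValueError(f"no rows with G={g} and C={c!r}")
        return sum(members) / len(members)

    return positive_rate(0) - positive_rate(1)


# Toy data (made up): overall, each gender gets 2 positive decisions out of 4
decisions = [1, 0, 1, 0, 1, 1, 0, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
conditions = ["A", "A", "B", "B", "A", "A", "B", "B"]

for c in ("A", "B"):
    print(c, conditional_parity(decisions, groups, conditions, c))  # A -0.5, then B 0.5
```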

We want to code this in the following function, `evaluate_conditional_parity`.

In [42]:
# Conditional parity function
def evaluate_conditional_parity(predictions, protected_class_array, condition_array):
    """Function to calculate conditional statistical parity.

    Parameters
    ----------
    predictions (numpy array): binary (decision) labels for X
    protected_class_array (numpy array): boolean array where protected rows are marked True
    condition_array (numpy array): boolean array that marks rows fulfilling the condition C=c

    Returns
    -------
    bias (float): absolute conditional parity bias
    """

    # --------------
    # --------------
    # Your code here
    # --------------
    # --------------

    prop_protected = np.sum(predictions & condition_array & protected_class_array) / np.sum(condition_array & protected_class_array)
    prop_not_protected = np.sum(predictions & condition_array & ~protected_class_array) / np.sum(condition_array & ~protected_class_array)
    bias = np.abs(prop_protected - prop_not_protected)
    return bias

### Testing Conditional Parity
We want to see what happens when we look at the subgroup of people in their 40s (the one-hot column `Age (decade)=40`, which is column 5 of our feature array). Are we being fair, in terms of statistical parity, within that subgroup? Our condition vector is very simple in this case: it just corresponds to that column. Remember, however, that we're still measuring fairness with respect to gender; we're only restricting the comparison to one age subgroup.

In [51]:
# Observing conditional parity on test set

GENDER_COLUMN = 1
AGE_40_COLUMN = 5  # one-hot column 'Age (decade)=40'

predictions = rf_model.predict(x_test)>0.5
protected_class_array = dataset_orig_test.features[:,GENDER_COLUMN]==0 
condition_array = dataset_orig_test.features[:,AGE_40_COLUMN]==1
conditional_parity_orig = evaluate_conditional_parity(predictions, protected_class_array, condition_array)

print('Conditional Parity on Test set, conditioned on people in their 40s:', conditional_parity_orig)

Conditional Parity on Test set, conditioned on people in their 40s: 0.36339044183949504


We should get a conditional parity value of about 0.36. As we can see, within the subgroup of people in their 40s the distribution of positive outcomes between genders 0 and 1 is more unfair than in the test set as a whole (about 0.22). This analysis can be repeated with other subgroups of interest to assess the fairness of our algorithm in different cases.
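Repeating the analysis over several subgroups can be automated by looping over the one-hot columns of a feature group. The sketch below is plain Python: `conditional_parity_by_column` is a hypothetical helper, the column names mimic the `Age (decade)=...` names printed earlier, and the rows and decisions are made-up toy values rather than census data. Subgroups with nobody from one of the genders are reported as `None` instead of dividing by zero.

```python
def conditional_parity_by_column(rows, feature_names, decisions, gender_col, prefix):
    """|Cp| between gender 0 and gender 1 for every one-hot column whose name starts
    with `prefix`, restricted to rows where that column equals 1 (None if a cell is empty)."""
    results = {}
    for j, name in enumerate(feature_names):
        if not name.startswith(prefix):
            continue
        rates = []
        for g in (0, 1):
            # Decisions of people in subgroup `name` who belong to gender g
            cell = [d for row, d in zip(rows, decisions) if row[j] == 1 and row[gender_col] == g]
            rates.append(sum(cell) / len(cell) if cell else None)
        results[name] = None if None in rates else abs(rates[0] - rates[1])
    return results


# Toy data (made up), laid out like the notebook's binary features
feature_names = ["race", "sex", "Age (decade)=30", "Age (decade)=40", "Age (decade)=50"]
rows = [
    [1, 0, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
]
decisions = [0, 1, 1, 0, 1, 1, 1]

gaps = conditional_parity_by_column(rows, feature_names, decisions, gender_col=1, prefix="Age (decade)=")
for name, gap in gaps.items():
    print(name, "n/a (one gender is absent)" if gap is None else gap)
```

In principle the same helper could be given `dataset_orig_test.features`, `dataset_orig_test.feature_names` and `predictions` from the notebook with `gender_col=1`; this has not been checked against the census data.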

Up until now, we haven't looked at the validity of our predictions. What happens if we're interested in balancing the amount of *errors* that we make?

The next metric will help us on that front.

## False Positive (Negative) Error Rate Balance

The previous measures don't take into account the true labels of each observation; they only consider the predictions. The fairness measure proposed here requires equal proportions of false positives (false negatives) in the protected and unprotected classes, that is, $$P(d=1|Y=0, G=0) = P(d=1|Y=0, G=1)$$ for false positive rate balance and $$P(d=0|Y=1, G=0) = P(d=0|Y=1, G=1)$$ for false negative rate balance. This measure is particularly suited to cases where making mistakes disproportionately for different protected groups can bring negative outcomes.

We will again code these measures as they are rather easy to understand. The function definition below will guide you through the process.

In [52]:
# False positive and false negative rates
def evaluate_false_negative_rate(predictions, protected, y):
    """evaluate fnr

    Parameters
    ----------
    predictions (numpy array): binary (decision) labels for X predicted by our model
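    protected (numpy array): boolean mask where protected rows are marked True
    y (numpy array): array of ground-truth labels (0 or 1)

    Note:
        FNR: FN / CP where FN = (predictions==0) & (y==1) and CP = (y==1)

    Returns
    -------
    bias (float): absolute difference in FNR between protected and unprotected rows
        (a message string is returned instead if a group has no condition positives)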
    """

    # --------------
    # --------------
    # Your code here
    # --------------
    # --------------
    
    cond_pos_protected = np.sum((y==1) & protected)
    cond_pos_not_protected = np.sum((y==1) & ~protected)
    
    if cond_pos_protected == 0:
        return 'No Condition Positive in Protected'
    if cond_pos_not_protected == 0:
        return 'No Condition Positive in Not Protected'

    false_neg_protected = np.sum((y==1) & (predictions==0) & protected)
    false_neg_not_protected = np.sum((y==1) & (predictions==0) & ~protected)

    fnr_g = false_neg_protected / cond_pos_protected
    fnr_not_g = false_neg_not_protected / cond_pos_not_protected
    bias = np.abs(fnr_g - fnr_not_g)
    
    return bias


def evaluate_false_positive_rate(predictions, protected, y):
    """evaluate fpr

    Parameters
    ----------
    predictions (numpy array): binary (decision) labels for X predicted by our model
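    protected (numpy array): boolean mask where protected rows are marked True
    y (numpy array): array of ground-truth labels (0 or 1)

    Note:
        FPR: FP / CN where FP = (predictions==1) & (y==0) and CN = (y==0)

    Returns
    -------
    bias (float): absolute difference in FPR between protected and unprotected rows
        (a message string is returned instead if a group has no condition negatives)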
    """

    # --------------
    # --------------
    # Your code here
    # --------------
    # --------------

    cond_neg_protected = np.sum((y==0) & protected)
    cond_neg_not_protected = np.sum((y==0) & ~protected)
    
    if cond_neg_protected == 0:
        return 'No Condition Negative in Protected'
    if cond_neg_not_protected == 0:
        return 'No Condition Negative in Not Protected'

    false_pos_protected = np.sum((y==0) & predictions & protected)
    false_pos_not_protected = np.sum((y==0) & predictions & ~protected)

    fpr_g = false_pos_protected / cond_neg_protected
    fpr_not_g = false_pos_not_protected / cond_neg_not_protected
    bias = np.abs(fpr_g - fpr_not_g)
    return bias


### Testing FPR and FNR

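Again, we want these measures to be as close to zero as possible. Note that, like the parity functions, they return the absolute gap in error rates between the two groups rather than the rates themselves.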

In [54]:
# Test FPR and FNR
fpr = evaluate_false_positive_rate(predictions, protected_class_array, y_test)
fnr = evaluate_false_negative_rate(predictions, protected_class_array, y_test)

print('False positive rate: ',fpr)
print('False negative rate:',fnr)

False positive rate:  0.11588675370397536
False negative rate: 0.46291301416048547


As we can see, the FPR and FNR gaps are not as close to zero as we would want: the false positive rates of the protected and unprotected groups differ by about 12 percentage points, and their false negative rates by about 46 percentage points.
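A gap alone does not say which group bears more errors. The plain-Python sketch below (hypothetical helper `error_rates`, made-up toy data) computes the per-group false positive and false negative rates with the same definitions as the functions above, so the two sides of each gap can be inspected.

```python
def error_rates(decisions, labels, groups, g):
    """Return (FPR, FNR) for the rows of group g.

    FPR = FP / #(y == 0) and FNR = FN / #(y == 1), as in the functions above.
    Raises ZeroDivisionError if group g has no negatives or no positives.
    """
    fp = fn = neg = pos = 0
    for d, y, grp in zip(decisions, labels, groups):
        if grp != g:
            continue
        if y == 0:
            neg += 1
            fp += int(d == 1)  # predicted positive, actually negative
        else:
            pos += 1
            fn += int(d == 0)  # predicted negative, actually positive
    return fp / neg, fn / pos


# Toy data (made up): group 0 gets one false positive and one false negative, group 1 none
decisions = [1, 0, 0, 1, 0, 0, 1, 1]
labels = [0, 0, 1, 1, 0, 0, 1, 1]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

for g in (0, 1):
    fpr, fnr = error_rates(decisions, labels, groups, g)
    print(f"gender {g}: FPR={fpr:.2f}, FNR={fnr:.2f}")
```

Here both gaps equal 0.5, and the per-group view shows that all of the errors fall on gender 0.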

## Other Fairness metrics 

We have coded and tested some basic Fairness metrics, but there are multiple other metrics that can be used, depending on the situation. Some of them are:

**Predictive parity:**
Among people who receive a positive decision, the fraction that is truly positive (the precision) should be the same for protected and unprotected groups.
$$P(Y=1|d=1, G=0) = P(Y=1|d=1, G=1)$$


**Equalized odds:**
Applicants with a good actual credit score and applicants with a bad actual credit score should have a similar classification, regardless of the value of the protected class; in other words, both the true positive rate and the false positive rate should be equal across groups.
$$P(d=1|Y=i, G=0) = P(d=1|Y=i, G=1), i\in \{0,1\}$$


**Overall accuracy equality:**
Both protected and unprotected groups have equal prediction accuracy.
$$P(d=Y|G=0) = P(d=Y|G=1)$$


**Treatment Equality:**
Looks at the ratio of errors a classifier makes instead of its accuracy. It is satisfied if the protected and unprotected groups have an equal ratio of false negatives to false positives:
$$\frac{FN_{G=0}}{FP_{G=0}} = \frac{FN_{G=1}}{FP_{G=1}}$$
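As a compact reference outside the notebook's workflow, all four metrics above can be computed from per-group confusion-matrix counts. The plain-Python sketch below uses hypothetical helpers and made-up toy data; reporting equalized odds as the larger of the TPR and FPR gaps is our own choice of summary, not a standard definition. The ratio and precision terms raise `ZeroDivisionError` if a group has no false positives or no positive decisions.

```python
def group_counts(decisions, labels, groups, g):
    """Confusion-matrix counts (TP, FP, TN, FN) for group g (binary decisions and labels)."""
    tp = fp = tn = fn = 0
    for d, y, grp in zip(decisions, labels, groups):
        if grp != g:
            continue
        if d == 1 and y == 1:
            tp += 1
        elif d == 1 and y == 0:
            fp += 1
        elif d == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def other_metric_gaps(decisions, labels, groups):
    """Absolute gaps between group 0 and group 1 for the metrics listed above."""
    per_group = {}
    for g in (0, 1):
        tp, fp, tn, fn = group_counts(decisions, labels, groups, g)
        per_group[g] = {
            "predictive_parity": tp / (tp + fp),         # P(Y=1 | d=1, G=g)
            "tpr": tp / (tp + fn),                       # P(d=1 | Y=1, G=g)
            "fpr": fp / (fp + tn),                       # P(d=1 | Y=0, G=g)
            "accuracy": (tp + tn) / (tp + fp + tn + fn), # P(d=Y | G=g)
            "fn_fp_ratio": fn / fp,                      # treatment equality
        }
    gaps = {k: abs(per_group[0][k] - per_group[1][k]) for k in per_group[0]}
    # Equalized odds needs BOTH the TPR gap and the FPR gap to be zero
    gaps["equalized_odds"] = max(gaps["tpr"], gaps["fpr"])
    return gaps


# Toy data (made up): 4 people in group 0, 6 people in group 1
decisions = [1, 1, 0, 0, 1, 1, 1, 0, 0, 1]
labels = [1, 0, 0, 1, 1, 1, 0, 0, 1, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

for name, gap in other_metric_gaps(decisions, labels, groups).items():
    print(f"{name}: {gap:.3f}")
```

On this toy data, predictive parity and overall accuracy equality both hold (gaps of 0) while equalized odds and treatment equality do not, which illustrates that these definitions can disagree.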


We will not go through all of them in code to save time. We will now try to generate a fair model.

# Creating a Fair Model

Once we have characterized and measured the fairness of the model, we might want to build a model that avoids discrimination given a protected class. As there are multiple ways to define fairness, there are also multiple ways to build a fair classifier, depending on what notion we want to emphasize.

Some options are:
- Preprocessing the data to remove biases, and training normal classifiers on that data
- Training the classifier and post-processing the predictions to accommodate our measures of fairness
- Training a modified classifier with clear constraints that enforce fairness

We will exemplify the Optimized Preprocessing technique, published by our very own Flavio Calmon.

![Optimized Preprocessing](../optimized.PNG)

### Recap

Let's take a look at the values we have so far.

In [55]:
# Getting accuracy and fairness metrics on test

print('Training accuracy of our classifier:',acc_train)
print('Testing accuracy of our classifier:',acc_test)
print('Statistical Parity on Test set:', statistical_parity_orig)
print('Conditional Parity on Test set, conditioned on people in their 40s:', conditional_parity_orig)
print('False positive rate on Test set: ',fpr)
print('False negative rate on Test set:',fnr)

Training accuracy of our classifier: 0.8052005030857878
Testing accuracy of our classifier: 0.8024295366136628
Statistical Parity on Test set: 0.22109782275375653
Conditional Parity on Test set, conditioned on people in their 40s: 0.36339044183949504
False positive rate on Test set:  0.11588675370397536
False negative rate on Test set: 0.46291301416048547


### Let's apply a dataset transformation to increase fairness!

Using AIF360's OptimPreproc module, we can transform our dataset into a new representation that should improve the metrics above.

In [56]:
# Instantiate OptimizedDataPreprocessing module from AIF360    
OP = OptimPreproc(OptTools, optim_options,
                  unprivileged_groups = unprivileged_groups,
                  privileged_groups = privileged_groups)

# Fit the module to the training data, effectively creating the mapping from original data to transformed, fair data
OP = OP.fit(dataset_orig_train)



Optimized Preprocessing: Objective converged to 0.000000


In [57]:
# Transform training data and align features
dataset_transf_train = OP.transform(dataset_orig_train, transform_Y=True)
dataset_transf_train = dataset_orig_train.align_datasets(dataset_transf_train)

# Same with test data
dataset_transf_test = OP.transform(dataset_orig_test, transform_Y = True)
dataset_transf_test = dataset_orig_test.align_datasets(dataset_transf_test)

In [58]:
# Again, we get our training numpy arrays, this time from the TRANSFORMED training data
# (the scaler is re-fitted on the transformed training features)
x_train_transf = scale_orig.fit_transform(dataset_transf_train.features)
y_train_transf = dataset_transf_train.labels.ravel()

# And our testing arrays, from the TRANSFORMED test data
x_test_transf = scale_orig.transform(dataset_transf_test.features) # Here, we only transform, as we can't use the testing set to define the scaling factors.
y_test_transf = dataset_transf_test.labels.ravel()

In [59]:
# Train classifier on TRANSFORMED data
rf_model_transf = RandomForestClassifier(n_estimators=25, 
                               max_depth=None,
                               random_state=42).fit(x_train_transf, y_train_transf)
rf_model_transf  # display the fitted estimator (it was already fitted on the line above)

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=25, n_jobs=None,
            oob_score=False, random_state=42, verbose=0, warm_start=False)

In [63]:
# Getting accuracy on the TRANSFORMED training and test sets
acc_transf_train = rf_model_transf.score(x_train_transf, y_train_transf)
acc_transf_test = rf_model_transf.score(x_test_transf, y_test_transf)
print('Accuracy on the transformed test set (we should expect a bit less than before):', acc_transf_test)


Accuracy on the transformed test set (we should expect a bit less than before): 0.756159148297277


In [64]:
# Getting NEW fairness metrics

predictions_transf = rf_model_transf.predict(x_test_transf)>0.5
protected_class_array_transf = dataset_orig_test.features[:,GENDER_COLUMN]==0 

statistical_parity_transf = evaluate_statistical_parity(predictions_transf, protected_class_array_transf)
conditional_parity_transf = evaluate_conditional_parity(predictions_transf, protected_class_array_transf, condition_array)
fpr_transf = evaluate_false_positive_rate(predictions_transf, protected_class_array_transf, y_test_transf)
fnr_transf = evaluate_false_negative_rate(predictions_transf, protected_class_array_transf, y_test_transf)

In [69]:
# Comparing metrics
print('ORIGINAL DATA')
print('Training accuracy of our classifier:',acc_train)
print('Testing accuracy of our classifier:',acc_test)
print('Statistical Parity on Test set:', statistical_parity_orig)
print('Conditional Parity on Test set, conditioned on people in their 40s:', conditional_parity_orig)
print('False positive rate on Test set: ',fpr)
print('False negative rate on Test set:',fnr)


print('\n\nTRANSFORMED DATA')
print('Training accuracy of our classifier:',acc_transf_train)
print('Testing accuracy of our classifier:',acc_transf_test)
print('Statistical Parity on Test set:', statistical_parity_transf)
print('Conditional Parity on Test set, conditioned on people in their 40s:', conditional_parity_transf)
print('False positive rate on Test set: ',fpr_transf)
print('False negative rate on Test set:',fnr_transf)

ORIGINAL DATA
Training accuracy of our classifier: 0.8052005030857878
Testing accuracy of our classifier: 0.8024295366136628
Statistical Parity on Test set: 0.22109782275375653
Conditional Parity on Test set, conditioned on people in their 40s: 0.36339044183949504
False positive rate on Test set:  0.11588675370397536
False negative rate on Test set: 0.46291301416048547


TRANSFORMED DATA
Training accuracy of our classifier: 0.7580508350639095
Testing accuracy of our classifier: 0.756159148297277
Statistical Parity on Test set: 0.06649123768108822
Conditional Parity on Test set, conditioned on people in their 40s: 0.2892247671857212
False positive rate on Test set:  0.01984584939953457
False negative rate on Test set: 0.1610829049999274


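As we can see, all four fairness gaps decreased, while test accuracy dropped from about 0.80 to about 0.76, as expected. Note that for the transformed model the error-rate gaps are measured against the transformed test labels (`transform_Y=True`), not the original ones.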
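To quantify the trade-off, the relative change of each metric can be computed directly from the values printed above. The plain-Python sketch below simply copies those numbers; `relative_change` is a hypothetical helper.

```python
# Metrics copied from the printed comparison above (original vs transformed model)
original = {
    "test accuracy": 0.8024295366136628,
    "statistical parity": 0.22109782275375653,
    "conditional parity": 0.36339044183949504,
    "FPR gap": 0.11588675370397536,
    "FNR gap": 0.46291301416048547,
}
transformed = {
    "test accuracy": 0.756159148297277,
    "statistical parity": 0.06649123768108822,
    "conditional parity": 0.2892247671857212,
    "FPR gap": 0.01984584939953457,
    "FNR gap": 0.1610829049999274,
}


def relative_change(before, after):
    """Relative change in percent: 100 * (after - before) / before."""
    return 100.0 * (after - before) / before


for name in original:
    change = relative_change(original[name], transformed[name])
    print(f"{name:>18}: {original[name]:.3f} -> {transformed[name]:.3f} ({change:+.1f}%)")
```

For instance, the statistical parity gap shrinks by roughly 70%, while test accuracy falls by about 4.6 percentage points (0.802 to 0.756).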

# Conclusion

We have analyzed particular fairness metrics and observed their behavior on a real-world census dataset. It is important to remember that fairness has multiple definitions, each one appropriate for analyzing a specific situation. Statistical notions of fairness such as the ones described above are easy to measure. However, statistical definitions are insufficient in some cases (for example, when similarity between individuals has to be taken into account). Moreover, most useful statistical metrics assume the availability of actual, verified outcomes. While such outcomes are available for the training data, it is unclear whether the data classified in deployment always follows the same distribution.

# Appendix: extra resources

## Interesting Fairness analysis tools
- Pymetrics audit-ai (https://github.com/pymetrics/audit-ai)
- fairness metrics github (https://github.com/megantosh/fairness_measures_code)
- fairness-comparison github (https://github.com/algofairness/fairness-comparison)
- IBM AIF360 (https://github.com/IBM/AIF360, https://arxiv.org/pdf/1810.01943.pdf)
- Themis ML (https://themis-ml.readthedocs.io/en/latest/)
- FairML (https://github.com/adebayoj/fairml)
- BlackBoxAuditing (https://github.com/algofairness/BlackBoxAuditing)

## Interesting papers
- Learning Fair Representations (seminal paper) http://proceedings.mlr.press/v28/zemel13.pdf
- Optimized Data Pre-Processing for Discrimination Prevention (by Flavio Calmon) https://arxiv.org/pdf/1704.03354.pdf
- Fairness Definitions Explained http://fairware.cs.umass.edu/papers/Verma.pdf
- From parity to Preference-based notions of fairness https://arxiv.org/abs/1707.00010
- Certifying and removing disparate impact https://arxiv.org/pdf/1412.3756.pdf
- Learning Classification without Disparate Mistreatment https://arxiv.org/pdf/1610.08452.pdf
- Fairness Constraints: Mechanisms for Fair Classification https://arxiv.org/abs/1507.05259
- Fairness GAN https://arxiv.org/pdf/1805.09910.pdf
- Adversarial Debiasing https://arxiv.org/pdf/1801.07593.pdf
- Classification with Fairness Constraints: A Meta-Algorithm with Provable Guarantees https://arxiv.org/pdf/1806.06055.pdf


## Fairness concepts
- **Fairness through unawareness:**
No sensitive attributes used in the decision making process.
- **Fairness through awareness:**
Similar individuals should have similar classification.
- **Disparate impact:**
Exists when decision outcomes disproportionately benefit or hurt individuals of a certain group.
- **Disparate treatment:**
Decision changes when the protected feature changes.
- **Disparate mistreatment:**
Misclassification rates are different for people of different protected groups.
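Disparate impact is often quantified as the ratio of positive-decision rates, $P(d=1|G=0) / P(d=1|G=1)$, and the "Certifying and removing disparate impact" paper listed above uses the "80% rule", which flags ratios below 0.8. The sketch below computes that ratio in plain Python on made-up decisions; `disparate_impact_ratio` and `passes_four_fifths` are hypothetical helper names.

```python
def disparate_impact_ratio(decisions, groups):
    """P(d=1 | G=0) / P(d=1 | G=1): positive-decision rate of group 0 over that of group 1."""
    rates = {}
    for g in (0, 1):
        members = [d for d, grp in zip(decisions, groups) if grp == g]
        rates[g] = sum(members) / len(members)
    return rates[0] / rates[1]


def passes_four_fifths(ratio, threshold=0.8):
    """True if the ratio meets the 80% rule threshold."""
    return ratio >= threshold


# Toy data (made up): 1 of 5 people in group 0 and 2 of 5 in group 1 get a positive decision
decisions = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]
groups = [0] * 5 + [1] * 5

ratio = disparate_impact_ratio(decisions, groups)
print(ratio, passes_four_fifths(ratio))  # 0.5 False
```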


We refer the reader to http://fairware.cs.umass.edu/papers/Verma.pdf for more information.