## Motivation
This branch explores the use of concepts from Causal Inference for fairness penalties. 
This was inspired by work on 
[debiasing word embeddings by Bolukbasi et al.](https://papers.nips.cc/paper/6228-man-is-to-computer-programmer-as-woman-is-to-homemaker-debiasing-word-embeddings.pdf).
The basic idea of that paper is as follows. Each word is embedded according to some algorithm.
The resulting embeddings may be biased, however, for example with respect to gender.
An intuitive way to debias the word embeddings is to first identify the gender subspace 
and then project embeddings onto the orthogonal complement of the gender subspace.
The Bolukbasi methodology depends heavily on the availability of pairs of words that differ only in gender.
Suppose such a list exists, for example:
* {grandmother, grandfather}
* {guy, gal}
* {he, she}
* {mother, father}

Let $h_w$ denote the embedding of a word $w$. Then if we form the differences
* $h_{grandmother} - h_{grandfather}$
* $h_{guy} - h_{gal}$
* $h_{he} - h_{she}$
* $h_{mother} - h_{father}$

and conduct PCA on these differences, we can identify a gender subspace $B$, 
spanned by $k$ orthonormal vectors $b_1,\ldots,b_k$. 
Let $h_B$ denote the projection of a vector $h$ onto the subspace $B$, i.e. $h_B = \sum_{j=1}^k (h \cdot b_j) b_j$.
Then $h - h_B$ is the projection of $h$ onto the orthogonal complement of $B$.
Bolukbasi et al. propose to debias word embeddings by projecting them onto the orthogonal complement of the identified gender subspace $B$. 
In other words, $h^{debiased} = h - h_B$.  
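
The projection step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used in this branch: the function names `gender_subspace` and `debias` are hypothetical, the embeddings are synthetic with a planted direction `g`, and "PCA on the differences" is read here as an uncentered SVD of the difference vectors (an assumption; centering would subtract the mean difference, which is the very direction we want to find).

```python
import numpy as np

def gender_subspace(pairs, emb, k=1):
    """Estimate a k-dimensional bias subspace from word pairs.

    Returns an array of shape (k, d) whose rows are orthonormal vectors b_1..b_k.
    """
    diffs = np.array([emb[a] - emb[b] for a, b in pairs])
    # Uncentered PCA: right singular vectors of the difference matrix,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:k]

def debias(h, B):
    """Project h onto the orthogonal complement of the span of the rows of B."""
    h_B = B.T @ (B @ h)  # sum_j (h . b_j) b_j
    return h - h_B

# Toy embeddings (synthetic, for illustration only): each pair shares a
# common vector and differs along a planted direction g plus small noise.
rng = np.random.default_rng(0)
d = 8
g = np.zeros(d)
g[0] = 1.0
pairs = [("grandmother", "grandfather"), ("guy", "gal"),
         ("he", "she"), ("mother", "father")]
emb = {}
for a, b in pairs:
    shared = rng.normal(size=d)
    emb[a] = shared + 0.5 * g + 0.01 * rng.normal(size=d)
    emb[b] = shared - 0.5 * g + 0.01 * rng.normal(size=d)

B = gender_subspace(pairs, emb, k=1)
h_debiased = debias(emb["mother"], B)
print(abs(B[0] @ g))   # close to 1: planted direction recovered up to sign
print(B @ h_debiased)  # close to 0: no component left in the subspace
```

Because the SVD is taken without centering, the orientation of each pair (for example `guy - gal` versus `gal - guy`) does not affect the recovered subspace.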

## Causal Inference
Returning to our project of debiasing neural network predictions, there is a parallel between debiasing word embeddings and
debiasing the internal representations of a neural network. Let $h_i$ denote the vector of hidden-node activations for sample $i$ (ignore which layer this is, for now).
Let $Z$ denote the sensitive attribute, and assume for simplicity that it is binary. 
Each sample also has a covariate vector $X$ associated with it. 
These covariates are the variables used as input to the neural network predictor/classifier. 
Using the terminology and notation from the causal inference literature, 
let $h_i(1) := h_i(Z_i = 1)$ be the potential representation for sample $i$ if its sensitive attribute were $Z_i = 1$.
Similarly, let $h_i(0) := h_i(Z_i = 0)$ be the potential representation if $Z_i = 0$. Both $h_i(1)$ and $h_i(0)$ can be 
thought of as random variables. Note that only one of them can ever be observed for a given sample: 
the one matching the actual value of $Z_i$. The other potential outcome is a counterfactual and is never observed.
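
To make the missing-data structure concrete, here is a small synthetic sketch. Everything in it is an assumption for illustration: the arrays `h0` and `h1` play the role of the two potential representations, and the constant shift between them is arbitrary. In real data only `Z` and `h_obs` would be available.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3

Z = rng.integers(0, 2, size=n)   # binary sensitive attribute per sample
h0 = rng.normal(size=(n, d))     # potential representation under Z = 0
h1 = h0 + 1.0                    # potential representation under Z = 1 (arbitrary shift)

# Only the potential representation matching the actual Z_i is observed.
h_obs = np.where(Z[:, None] == 1, h1, h0)

# The pairwise differences h1 - h0 are exactly what the word-pair approach
# would need, but they cannot be formed from (Z, h_obs) alone.
```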

In an ideal situation, we would observe both elements of each pair $(h_i(1),h_i(0))$ and so could apply the Bolukbasi methodology above directly to the differences $h_i(1) - h_i(0)$. 
Since we never do, this branch tries to find a way around the difficulty, i.e. the absence of pairs of observations that differ only in gender (or some other protected variable).   


## Load dataset
To demonstrate the idea, we need a dataset that has a binary sensitive attribute. We'll use the ProPublica COMPAS data.

In [None]:
debiasing.debiasing_sweep(desired_seed=desired_seed, debias_type='causal_inference')