# Learning Adversarially Fair and Transferable Representations
This paper:
* Presents a method for learning representations that, when used in a downstream prediction task, are fair with respect to chosen sensitive attributes.

## Introduction
* Scenario: Company A provides learned representations of some data, and company B trains a model to make predictions from these representations. The two companies may have different goals. If company B does not care about fairness, it would be useful if the representations provided by A made unfair predictions impossible.
* Idea and intuition: Train the representations against a strong model that tries to act unfairly. How unfair this model manages to be should act as an upper bound on the unfairness of external actors using the learned representations.

## Fairness
* Notation
    * Data $X \in \mathbb{R}^n$
    * Labels $Y \in \{0, 1 \}$
    * Sensitive attributes $A \in \{ 0, 1 \}$
* Goal: Predict $Y$ accurately without being biased toward any particular value of $A$.
* *Demographic Parity*
    * Positive outcomes should be given at the same rate to both groups: $P(\hat{Y} = 1 | A = 0) = P(\hat{Y} = 1 | A = 1)$
    * Less useful if the base rates differ between groups, i.e. $P(Y = 1 | A = 0) \neq P(Y = 1 | A = 1)$, since even a perfect predictor would then violate it.
* *Equalized Odds*
    * Predictions should give equal false positive and false negative rates for both groups.
    * $P(\hat{Y}=1-Y | A=0, Y=y) = P(\hat{Y}=1-Y | A=1, Y=y) \quad \forall y \in \{ 0, 1 \}$
* *Equal Opportunity*
    * A relaxation of equalized odds that only requires one of the two error rates to match between groups. In the usual formulation this is the false negative rate, i.e. the equalized odds condition for $y = 1$ only.
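
The three criteria above can be measured on a set of predictions as gaps between the two groups. The following is a minimal NumPy sketch, not code from the paper: the function names and the toy arrays are made up for illustration, and equal opportunity is assumed to mean the false negative rate gap (the $y = 1$ case).

```python
import numpy as np

def demographic_parity_gap(y_hat, a):
    """|P(Y_hat=1 | A=0) - P(Y_hat=1 | A=1)| estimated from samples."""
    y_hat, a = np.asarray(y_hat), np.asarray(a)
    return abs(y_hat[a == 0].mean() - y_hat[a == 1].mean())

def error_rate_gap(y_hat, y, a, label):
    """Gap between groups in P(Y_hat != Y | A=group, Y=label)."""
    y_hat, y, a = np.asarray(y_hat), np.asarray(y), np.asarray(a)
    rates = []
    for group in (0, 1):
        mask = (a == group) & (y == label)
        rates.append((y_hat[mask] != y[mask]).mean())
    return abs(rates[0] - rates[1])

def equalized_odds_gap(y_hat, y, a):
    """False positive gap (label=0) plus false negative gap (label=1)."""
    return error_rate_gap(y_hat, y, a, 0) + error_rate_gap(y_hat, y, a, 1)

def equal_opportunity_gap(y_hat, y, a):
    """Assumed convention: only the false negative gap (label=1)."""
    return error_rate_gap(y_hat, y, a, 1)

# Hypothetical toy data: 4 examples per group, balanced labels.
a     = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y     = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_hat = np.array([1, 1, 1, 0, 1, 0, 0, 0])

dp_gap = demographic_parity_gap(y_hat, a)
eo_gap = equalized_odds_gap(y_hat, y, a)
eopp_gap = equal_opportunity_gap(y_hat, y, a)
print(dp_gap, eo_gap, eopp_gap)
```

A gap of 0 means the criterion holds exactly on this sample; larger values mean the two groups are treated more differently.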
    
## Model
<img src="figs/learning-adv-fair-repr.png" width="45%">

* Parts
    * Encoder $f$ computes a representation $z$ from $x$
    * (Optional, depending on task) decoder $k$ computes a reconstruction from the representation $z$ and sensitive attributes $a$.
    * Classifier $g$ predicts labels $y$ from representation $z$
    * Adversary $h$ predicts sensitive attributes $a$ from representation $z$
* Training
    * Intuitively: adversary $h$ wants to correctly guess sensitive attributes $a$
        * The objective $\mathcal{R}_A$ to be maximized depends on the type of fairness we're trying to achieve.
    * Encoder $f$, (decoder $k$), and classifier $g$ are trained jointly to
        * minimize classification loss
        * (minimize reconstruction error)
        * minimize adversary's objective (wrt parameters of $f,g,k$)
* Losses
    * For classification loss they use average absolute difference $\frac{1}{|\mathcal{D}|}\sum | g(f(x)) - y |$
    * For the adversary objective $\mathcal{R}_A$
        * For demographic parity they use one minus the sum of the average absolute differences per *group* (with and without the sensitive attribute), so that a more accurate adversary scores higher. $1 - \sum_{i \in \{0,1\}} \frac{1}{|\mathcal{D}_i|} \sum_{(x,a) \in \mathcal{D}_i} | h(f(x)) - a |$, where $\mathcal{D}_i$ contains the examples with $a = i$.
        * For equalized odds they do the same on each combination of *group* (with and without sensitive attribute) and *label*. $2 - \sum_{(i,j) \in \{0,1\}^2} \frac{1}{|\mathcal{D}^j_i|} \sum_{(x,a) \in \mathcal{D}^j_i} | h(f(x)) - a |$, where $\mathcal{D}^j_i$ contains the examples with $a = i$ and $y = j$.
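
The losses above can be sketched in NumPy as follows. This is an illustration rather than the authors' implementation: the function names are invented, and the adversary objectives are written in a "constant minus total error" form (constants 1 and 2 assumed) so that a larger value means a more successful adversary.

```python
import numpy as np

def classification_loss(pred, y):
    """Average absolute difference between predictions g(f(x)) and labels y."""
    return np.abs(pred - y).mean()

def adversary_objective_dp(adv_pred, a):
    """1 - sum over groups i of the mean |h(f(x)) - a| within group a == i."""
    total_error = sum(
        np.abs(adv_pred[a == i] - a[a == i]).mean() for i in (0, 1)
    )
    return 1.0 - total_error

def adversary_objective_eo(adv_pred, a, y):
    """2 - sum over (group i, label j) cells of the mean |h(f(x)) - a|."""
    total_error = 0.0
    for i in (0, 1):
        for j in (0, 1):
            cell = (a == i) & (y == j)
            total_error += np.abs(adv_pred[cell] - a[cell]).mean()
    return 2.0 - total_error

# Hypothetical toy batch with one example per (group, label) cell.
a = np.array([0, 0, 1, 1])
y = np.array([0, 1, 0, 1])

perfect_adv = a.astype(float)    # adversary recovers a exactly
clueless_adv = np.full(4, 0.5)   # adversary always outputs 0.5

dp_perfect = adversary_objective_dp(perfect_adv, a)
dp_clueless = adversary_objective_dp(clueless_adv, a)
eo_perfect = adversary_objective_eo(perfect_adv, a, y)
eo_clueless = adversary_objective_eo(clueless_adv, a, y)
print(dp_perfect, dp_clueless, eo_perfect, eo_clueless)
```

Averaging within each group (or each group and label cell) rather than over the whole dataset keeps a small group from being ignored by the adversary.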
        
## Theory
* They show that the adversarial objectives give upper bounds on *unfairness* for the different fairness criteria (demographic parity, equalized odds)
* They use *statistical distance* (total variation) over some test $\mu: \Omega_{\mathcal{D}} \rightarrow \{0,1\}$ (in this case $\mu$ is a function of classifier $g$?) $\Delta^*(\mathcal{D}_0, \mathcal{D}_1) = \sup_\mu \left| \mathbb{E}_{x \sim \mathcal{D}_0} [ \mu(x) ] - \mathbb{E}_{x \sim \mathcal{D}_1} [ \mu(x) ] \right|$
    * Interpretation: The maximum difference over the set of all tests (*classifiers*?) we get for the two different groups (with or without sensitive variable).
* For demographic parity
    * $\mu$ is the classifier $g$, i.e. for demographic parity the classifier should give the same outcome at the same rate for both groups.
    * They show that $\mathcal{R}_{DP}$, the objective the adversary wants to maximize, is an upper bound on the test discrepancy, i.e. on the demographic parity unfairness.
    * Thus training the encoder (and classifier) to minimize this objective gives a fairer representation.
* For equalized odds and equal opportunity the thinking is similar.
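
One way to see why the demographic parity bound is plausible: suppose the adversary objective is written as one minus the summed per-group errors (an assumption about normalization), and let the adversary simply copy a binary classifier $g$. In group 0 the error $|h - 0|$ is $h$, in group 1 the error $|h - 1|$ is $1 - h$, so the objective becomes $\mathbb{E}_{\mathcal{D}_1}[g] - \mathbb{E}_{\mathcal{D}_0}[g]$, the signed demographic parity gap. The best adversary can do at least as well as this copy (or its flipped version), so its objective is at least the gap of any classifier. The sketch below checks this numerically on synthetic data; all names and probabilities are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
a = rng.integers(0, 2, size=n)  # sensitive attribute

# Hypothetical binary classifier that favors group 1 (70% vs 40% positives).
g = (rng.random(n) < np.where(a == 1, 0.7, 0.4)).astype(float)

def adversary_objective_dp(h, a):
    """1 - sum over groups of the mean |h - a| within each group."""
    return 1.0 - sum(np.abs(h[a == i] - a[a == i]).mean() for i in (0, 1))

def dp_gap(g, a):
    """|P(g=1 | A=1) - P(g=1 | A=0)| estimated from samples."""
    return abs(g[a == 1].mean() - g[a == 0].mean())

# Adversaries that copy the classifier, or its negation.
score_copy = adversary_objective_dp(g, a)
score_flipped = adversary_objective_dp(1.0 - g, a)
best_of_two = max(score_copy, score_flipped)

gap = dp_gap(g, a)
print(best_of_two, gap)
```

The two printed numbers agree, so any adversary at least as strong as these two copies scores at least the classifier's demographic parity gap.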
    
## Experiments
* *Fair classification*. They evaluate on a few different datasets and note that the method gives a model with slightly worse accuracy than an MLP classifier trained without concern for fairness, but much better fairness metrics.
* *Transfer learning*. They also show that the framework can learn representations that still give fair classification when used with a separate, independently trained classifier.
    * They say that in this case it was best to remove the classification loss and use only the reconstruction loss, i.e. basically train an autoencoder with the adversarial regularization on the latent representation. Like Adversarial Autoencoder?

## Discussion
* Sometimes it might be hard to know what the sensitive attributes are?
* The reason we can't just remove the sensitive attributes (if we know them) is that correlated features might still carry the same information. E.g. the model learns that the *long hair* feature is a good predictor of lower income because *female* correlates with *long hair*, and this leads to unfair predictions. A fair representation should not let the sensitive attribute leak through such proxies.
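
The proxy problem can be illustrated with a small simulation. Everything here is synthetic and hypothetical (the 90% agreement rate, the variable names, the decision rule); it only shows that a predictor which never sees the sensitive attribute can still be far from demographic parity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Sensitive attribute A (hypothetical: 1 = female), never given to the model.
female = rng.integers(0, 2, size=n)

# Proxy feature that agrees with A about 90% of the time (assumed rate).
agree = rng.random(n) < 0.9
long_hair = np.where(agree, female, 1 - female)

# A predictor that only looks at the proxy:
# predict the positive outcome when long_hair == 0.
y_hat = 1 - long_hair

# How well the proxy alone recovers the removed attribute.
proxy_accuracy = (long_hair == female).mean()

# Demographic parity gap of the "attribute-blind" predictor.
dp_gap = abs(y_hat[female == 0].mean() - y_hat[female == 1].mean())
print(proxy_accuracy, dp_gap)
```

Dropping the `female` column changes nothing here, because the remaining feature still encodes it almost perfectly.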