# What's wrong with Fair Representations of input data?

## Data

Throughout, we're going to be using synthetic data. Everything here could just as easily be run on real-world datasets: launch any of the following files as a notebook using the rocket icon at the top of the page and replace `synthetic(*args, **kwargs)` with any of `adult()`, `compas()`, `credit()`, `crime()`, `health()` or `sqf()`.

The synthetic data covers 4 scenarios. Each scenario is run over the next few pages.

The data consists of 4 input variables, $X_1$, $X_2$, $N_1$ and $N_2$, where the $X$ variables are drawn as described below and the $N$ variables are random noise, independent of all other variables. There are 3 outcome variables, $Y_1$, $Y_2$ and $Y_3$: $Y_1$ and $Y_2$ are each based solely on one of the inputs, and $Y_3$ is based on a combination of the inputs. Lastly, we have $S$, a sensitive attribute. How $S$ relates to $X$ and $Y$ is altered in each scenario.

### Scenario 1
![Scenario 1](./assets/scenario_1.png)

In this scenario, there are two input variables, $X_1$ and $X_2$.
There are three outcome variables, $Y_1$, $Y_2$ & $Y_3$.
There is one sensitive attribute, $S$, which is independent of all $X$ & $Y$.

### Scenario 2
![Scenario 2](./assets/scenario_2.png)

In this scenario, there are two input variables, $X_1$ and $X_2$.
There are three outcome variables, $Y_1$, $Y_2$ & $Y_3$.
There is one sensitive attribute, $S$, which is independent of $X_2$ & $Y_2$, but not independent of $X_1$, $Y_1$, or $Y_3$.
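
To make this dependence structure concrete, here is a minimal NumPy sketch of one generator that satisfies the statements above. It is not the `synthetic` function used in these pages: the binary $S$, the Gaussian draws, the `1.5` shift and the thresholding at zero are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Assumed forms: binary S, Gaussian inputs, thresholded outcomes.
s = rng.integers(0, 2, size=n)            # sensitive attribute
x1 = rng.normal(loc=1.5 * s, scale=1.0)   # X_1 is shifted by S
x2 = rng.normal(size=n)                   # X_2 ignores S
n1 = rng.normal(size=n)                   # pure noise inputs
n2 = rng.normal(size=n)

y1 = (x1 > 0).astype(int)                 # outcome based on X_1 only
y2 = (x2 > 0).astype(int)                 # outcome based on X_2 only
y3 = (x1 + x2 > 0).astype(int)            # outcome based on both inputs


def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


# Correlation of S with every other variable.
variables = {"x1": x1, "x2": x2, "n1": n1, "n2": n2, "y1": y1, "y2": y2, "y3": y3}
corr_with_s = {name: corr(s, values) for name, values in variables.items()}

for name, value in corr_with_s.items():
    print(f"corr(S, {name}) = {value:+.3f}")
```

In this sketch the correlations of $S$ with $X_1$, $Y_1$ and $Y_3$ are clearly non-zero, while those with $X_2$, $Y_2$ and the noise variables are close to zero. Correlation only captures linear dependence, so it is a rough check rather than a test of independence.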

### Scenario 3
![Scenario 3](./assets/scenario_3.png)

In this scenario, there are two input variables, $X_1$ and $X_2$.
There are three outcome variables, $Y_1$, $Y_2$ & $Y_3$.
There is one sensitive attribute, $S$, which is independent of $X_1$, $X_2$, $Y_1$ and $Y_2$, but not independent of $Y_3$.

### Scenario 4
![Scenario 4](./assets/scenario_4.png)

In this scenario, there are two input variables, $X_1$ and $X_2$.
There are three outcome variables, $Y_1$, $Y_2$ & $Y_3$.
There is one sensitive attribute, $S$, which is independent of $X_2$ & $Y_2$, but not independent of $X_1$, $Y_1$, or $Y_3$.

## Strategic Approaches

### Strategy 1

We have some input data $x$. We want a function that produces a version of this data ($z_x$), such that $z_x$ is independent of some protected characteristic $s$. In other words we want to find $e: X \rightarrow Z_x ~~\mathrm{s.t.}~ Z_x \perp S$.

Let's look at that. 

![setup1](assets/setup1.png)

The red line indicates that you cannot learn $S$ from $Z_x$; there is no mutual information between these two variables. They are independent.

The problem here is that the easiest way for a network to achieve this is to just learn nothing. Make $Z_x$ all $0$'s and your job is done. But that's a claim... let's demonstrate that.
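
As a quick sanity check of that claim (a sketch, not the model used in these pages), we can compute the empirical mutual information between a constant representation and $S$. The helper `mutual_information` and the toy labels are hypothetical names and data introduced for this example, and the helper only handles discrete values.

```python
import numpy as np


def mutual_information(a, b):
    """Empirical mutual information (in nats) between two discrete 1-D arrays."""
    _, a_idx = np.unique(np.asarray(a), return_inverse=True)
    _, b_idx = np.unique(np.asarray(b), return_inverse=True)
    a_idx, b_idx = a_idx.ravel(), b_idx.ravel()

    # Joint distribution as a normalised contingency table.
    joint = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(joint, (a_idx, b_idx), 1)
    joint /= joint.sum()

    # Product of the marginals, same shape as the joint.
    independent = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)

    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / independent[mask])))


rng = np.random.default_rng(0)
s = rng.integers(0, 2, size=1_000)   # toy sensitive attribute
y = rng.integers(0, 2, size=1_000)   # toy outcome

z_trivial = np.zeros_like(s)         # the "learn nothing" representation

mi_z_s = mutual_information(z_trivial, s)   # exactly zero: Z is independent of S
mi_z_y = mutual_information(z_trivial, y)   # also zero: Z says nothing about Y
mi_s_s = mutual_information(s, s)           # entropy of S, for comparison

print(mi_z_s, mi_z_y, mi_s_s)
```

The constant representation satisfies $Z_x \perp S$ perfectly, but it is equally uninformative about everything else.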

### Strategy 2

So the problem is that our representation doesn't have any direction. Its goal is to make $S$ unrecognizable from $Z$, which it does; it's just that you can't tell anything else from $Z$ either.

So let's give $Z$ some direction.

![setup2](assets/setup2.png)

In this case we want $Z$ to have no information about $S$, but also be representative of $Y$.

We can re-use most of the parts from before, but we need a predictor.

Well, this is certainly more accurate than before, but although we're a bit more equal than in the original data, we're not really doing a great job. The reason is that there is a tension between removing information that is relevant to $S$ and keeping information that is relevant to $Y$. To demonstrate this, let's tweak the above model to remove this tension.
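
The tension can be illustrated with a toy objective. The sketch below assumes an adversarial formulation, in which the encoder is rewarded for a low prediction loss on $Y$ and a high loss for an adversary guessing $S$; the model in these pages may use a different independence penalty. The function names and the weight `lam` are hypothetical.

```python
import numpy as np


def binary_cross_entropy(labels, probs, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    labels = np.asarray(labels, dtype=float)
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1 - eps)
    return float(-np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs)))


def encoder_objective(y, y_prob, s, s_prob, lam=1.0):
    """Lower is better for the encoder: predict Y well, make the adversary bad at S."""
    return binary_cross_entropy(y, y_prob) - lam * binary_cross_entropy(s, s_prob)


# Extreme case: Y and S coincide, so anything in Z that helps predict Y
# helps the adversary predict S by exactly the same amount.
y = s = np.array([0, 1, 0, 1])
informative = np.array([0.1, 0.9, 0.1, 0.9])   # Z keeps the signal
uninformative = np.full(4, 0.5)                # Z throws the signal away

for lam in (0.1, 2.0):
    keep = encoder_objective(y, informative, s, informative, lam)
    drop = encoder_objective(y, uninformative, s, uninformative, lam)
    print(f"lam={lam}: keep signal={keep:.3f}, drop signal={drop:.3f}")

keep_low = encoder_objective(y, informative, s, informative, lam=0.1)
drop_low = encoder_objective(y, uninformative, s, uninformative, lam=0.1)
keep_high = encoder_objective(y, informative, s, informative, lam=2.0)
drop_high = encoder_objective(y, uninformative, s, uninformative, lam=2.0)
```

With a small `lam` the encoder prefers to keep the signal (accurate but unfair); with a large `lam` it prefers to drop it (fair but inaccurate). When $Y$ and $S$ share information, no setting of `lam` achieves both.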

### Strategy 3

![setup3](assets/setup3.png)

In this setup we remove the tension. $Z$ can freely remove $S$, and $Y$ can get all the information it needs about $S$ directly.
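
A minimal linear sketch of this idea, under assumed toy data: the "encoder" below simply subtracts the per-group mean from $X_1$, and the "predictor" is an ordinary least-squares fit. Neither is the network used in these pages, and the variable names, the shift of `2.0` and the continuous target are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

s = rng.integers(0, 2, size=n)
x1 = 2.0 * s + rng.normal(size=n)   # assumed: S shifts X_1
target = x1                         # assumed continuous outcome driven by X_1

# Toy encoder: remove the per-group mean, so Z has the same mean for both groups.
group_means = np.array([x1[s == 0].mean(), x1[s == 1].mean()])
z = x1 - group_means[s]


def r_squared(features, target):
    """R^2 of an ordinary least-squares fit (with intercept) of target on features."""
    design = np.column_stack([features, np.ones(len(target))])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residual = target - design @ coef
    return float(1.0 - residual.var() / target.var())


r2_z_only = r_squared(z, target)                          # predictor sees Z only
r2_z_and_s = r_squared(np.column_stack([z, s]), target)   # predictor sees Z and S

print(f"R^2 from Z alone:  {r2_z_only:.3f}")
print(f"R^2 from Z and S:  {r2_z_and_s:.3f}")
```

Here $Z$ no longer carries the group shift, so a predictor that sees only $Z$ loses accuracy, while a predictor that also receives $S$ recovers the target fully. The representation and the predictor are no longer competing over the same information.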