## Causal Models

Judea Pearl defines a causal model as an ordered triple ${\displaystyle \langle U,V,E\rangle }$, where U is a set of exogenous variables whose values are determined by factors outside the model; V is a set of endogenous variables whose values are determined by factors within the model; and E is a set of structural equations that express the value of each endogenous variable as a function of the values of the other variables in U and V.

[Reference](https://en.wikipedia.org/wiki/Causal_model)
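As a rough illustration of the triple, the sketch below encodes a toy model in plain Python. The variable names (`u_x`, `u_y`, `x`, `y`), the Gaussian noise, and the coefficient 2 are all assumptions made up for this example; they do not come from the reference.

```python
import random

# U: exogenous variables, each with a sampler for its value (set outside the model).
# All names and distributions here are hypothetical.
U = {
    "u_x": lambda rng: rng.gauss(0.0, 1.0),
    "u_y": lambda rng: rng.gauss(0.0, 1.0),
}

# E: one structural equation per endogenous variable, listed in causal order.
E = {
    "x": lambda values: values["u_x"],
    "y": lambda values: 2.0 * values["x"] + values["u_y"],
}

# V: the endogenous variables are exactly those that have an equation in E.
V = list(E)


def solve(u_values):
    """Compute every endogenous variable from given exogenous values."""
    values = dict(u_values)
    for name in V:
        values[name] = E[name](values)
    return {name: values[name] for name in V}


def sample(rng):
    """Draw the exogenous variables, then solve the structural equations."""
    return solve({name: draw(rng) for name, draw in U.items()})


rng = random.Random(0)
print(sample(rng))
print(solve({"u_x": 1.0, "u_y": 0.5}))
```

Keeping the equations in a dictionary ordered by causal dependence makes it easy to swap one equation for a constant later, which is what an intervention does.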

## Ladder of Causation

![image](http://lgmoneda.github.io/images/book-why/ladder-table.png)

## Mediator

In a chain, B is a mediator: it transmits the effect that A has on C.

${\displaystyle A\rightarrow B\rightarrow C}$

## Fork

In forks, one cause has multiple effects. The two effects have a common cause. There exists a (non-causal) spurious correlation between A and C that can be eliminated by conditioning on B (for a specific value of B).[5]:114

${\displaystyle A\leftarrow B\rightarrow C}$

"Conditioning on B" means "given B" (i.e., given a value of B).

## Collider

In colliders, multiple causes affect one outcome. Conditioning on B (for a specific value of B) often reveals a non-causal correlation between A and C. 

${\displaystyle A\rightarrow B\leftarrow C}$

### Exercise:
Show that $P(A,C)=P(A)P(C)$ using the graphical model of the collider.
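One possible solution sketch, assuming the usual factorization implied by the collider graph (each variable conditioned on its parents, so A and C have no parents and B has parents A and C):

$$
\begin{align}
P(A,B,C) &= P(A)\,P(C)\,P(B|A,C)\\
P(A,C) &= \sum_{b} P(A)\,P(C)\,P(B=b|A,C)\\
&= P(A)\,P(C)\sum_{b} P(B=b|A,C)\\
&= P(A)\,P(C)
\end{align}
$$

The last step uses the fact that a conditional distribution sums to one. The argument relies on summing over all values of B, so it no longer applies once B is fixed to an observed value.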




## Independence conditions
Independence conditions are rules for deciding whether two variables are independent of each other. Two variables are independent if knowing the value of one provides no information about the value of the other. Multiple causal models can share independence conditions. For example, the models

${\displaystyle A\rightarrow B\rightarrow C}$
and

${\displaystyle A\leftarrow B\rightarrow C}$
have the same independence conditions, because in both models conditioning on B leaves A and C independent.
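The shared independence condition can be checked exactly on small discrete models. The sketch below builds the joint distribution of a binary chain and a binary fork and measures how far A and C are from independence, marginally and given B. All probabilities are hypothetical values chosen for illustration.

```python
from itertools import product


def bern(p_one, value):
    """Probability that a binary variable with P(1) = p_one takes `value`."""
    return p_one if value == 1 else 1.0 - p_one


def chain_joint():
    """Joint P(a, b, c) for the chain A -> B -> C (hypothetical numbers)."""
    p_b1_given_a = {0: 0.1, 1: 0.8}
    p_c1_given_b = {0: 0.4, 1: 0.9}
    return {
        (a, b, c): bern(0.3, a) * bern(p_b1_given_a[a], b) * bern(p_c1_given_b[b], c)
        for a, b, c in product((0, 1), repeat=3)
    }


def fork_joint():
    """Joint P(a, b, c) for the fork A <- B -> C (hypothetical numbers)."""
    p_a1_given_b = {0: 0.2, 1: 0.7}
    p_c1_given_b = {0: 0.4, 1: 0.9}
    return {
        (a, b, c): bern(0.6, b) * bern(p_a1_given_b[b], a) * bern(p_c1_given_b[b], c)
        for a, b, c in product((0, 1), repeat=3)
    }


def marginal_gap(joint):
    """Largest |P(a, c) - P(a) P(c)| over all values of a and c."""
    p_ac = {(a, c): joint[(a, 0, c)] + joint[(a, 1, c)]
            for a, c in product((0, 1), repeat=2)}
    p_a = {a: p_ac[(a, 0)] + p_ac[(a, 1)] for a in (0, 1)}
    p_c = {c: p_ac[(0, c)] + p_ac[(1, c)] for c in (0, 1)}
    return max(abs(p_ac[(a, c)] - p_a[a] * p_c[c]) for a, c in p_ac)


def conditional_gap(joint):
    """Largest |P(a, c | b) - P(a | b) P(c | b)| over all values."""
    gaps = []
    for b in (0, 1):
        p_b = sum(p for (_, jb, _), p in joint.items() if jb == b)
        for a, c in product((0, 1), repeat=2):
            p_a = (joint[(a, b, 0)] + joint[(a, b, 1)]) / p_b
            p_c = (joint[(0, b, c)] + joint[(1, b, c)]) / p_b
            gaps.append(abs(joint[(a, b, c)] / p_b - p_a * p_c))
    return max(gaps)


for name, joint in (("chain", chain_joint()), ("fork", fork_joint())):
    print(name, "marginal gap:", marginal_gap(joint),
          "conditional gap:", conditional_gap(joint))
```

In both models the marginal gap is clearly non-zero while the conditional gap vanishes up to floating-point error, so observational data alone cannot separate the two structures.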

## Intervention


${\displaystyle P(Y|X)\neq P(Y|do(X))}$

If you intervene on a variable V, i.e., apply do(V), you remove all incoming arrows to V and work with the resulting mutilated graph.
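A small simulation can make the gap between seeing and doing concrete. The sketch below assumes a hypothetical binary model with a confounder (Z → X, Z → Y, X → Y); the probabilities are invented for illustration. Passing `do_x` cuts the arrow into X, which is the mutilated graph described above.

```python
import random


def sample(rng, do_x=None):
    """Draw (z, x, y) from a hypothetical model Z -> X, Z -> Y, X -> Y.

    If do_x is given, the arrow Z -> X is removed and X is set by hand.
    """
    z = rng.random() < 0.5
    if do_x is None:
        x = rng.random() < (0.8 if z else 0.2)  # X listens to Z
    else:
        x = bool(do_x)                          # X ignores Z
    y = rng.random() < 0.1 + 0.3 * x + 0.5 * z
    return z, x, y


rng = random.Random(0)
n = 200_000

# Seeing: filter observational samples down to those where X happened to be 1.
observed = [sample(rng) for _ in range(n)]
y_when_x1 = [y for _, x, y in observed if x]
p_seeing = sum(y_when_x1) / len(y_when_x1)

# Doing: force X = 1 for everyone and look at Y.
intervened = [sample(rng, do_x=1) for _ in range(n)]
p_doing = sum(y for _, _, y in intervened) / n

print("P(Y=1 | X=1)     ~", round(p_seeing, 3))
print("P(Y=1 | do(X=1)) ~", round(p_doing, 3))
```

Under these assumed probabilities the analytic values are 0.80 for conditioning and 0.65 for intervening; the difference arises because observing X = 1 also makes Z = 1 more likely.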

![image](https://www.inference.vc/content/images/2019/01/Screen-Shot-2019-01-18-at-10.34.15-AM.png)

![image](https://fabiandablander.com/assets/img/Seeing-vs-Doing-II.png)

## Backdoor Adjustment

To analyse the causal effect of X on Y in a causal model, we need to adjust for all confounding variables (deconfounding).

Definition: a backdoor path from variable X to Y is any path from X to Y that starts with an arrow pointing into X.

Definition: Given an ordered pair of variables (X,Y) in a model, a set of variables Z satisfies the backdoor criterion if (1) no variable in Z is a descendant of X and (2) Z blocks all backdoor paths between X and Y.

When Z satisfies the backdoor criterion, the causal effect of X on Y can be computed from observational quantities with the adjustment formula:

${\displaystyle P(Y|do(X))=\textstyle \sum _{z}\displaystyle P(Y|X,Z=z)P(Z=z)}$
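The adjustment formula can be evaluated exactly on a small discrete model. The sketch below assumes a hypothetical binary model Z → X, Z → Y, X → Y with invented probabilities, builds the observational joint distribution, and compares the naive conditional with the backdoor-adjusted value.

```python
from itertools import product

# Hypothetical conditional probabilities for Z -> X, Z -> Y, X -> Y.
P_Z1 = 0.5
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}


def p_y1(x, z):
    """P(Y=1 | X=x, Z=z) in the assumed model."""
    return 0.1 + 0.3 * x + 0.5 * z


def bern(p_one, value):
    """Probability that a binary variable with P(1) = p_one takes `value`."""
    return p_one if value == 1 else 1.0 - p_one


# Observational joint distribution P(z, x, y).
joint = {
    (z, x, y): bern(P_Z1, z) * bern(P_X1_GIVEN_Z[z], x) * bern(p_y1(x, z), y)
    for z, x, y in product((0, 1), repeat=3)
}


def prob(z=None, x=None, y=None):
    """Probability of the given values; None means the variable is summed out."""
    return sum(
        p for (jz, jx, jy), p in joint.items()
        if z in (None, jz) and x in (None, jx) and y in (None, jy)
    )


def backdoor(x, y=1):
    """sum_z P(Y=y | X=x, Z=z) P(Z=z), using only the observational joint."""
    return sum(
        prob(z=z, x=x, y=y) / prob(z=z, x=x) * prob(z=z)
        for z in (0, 1)
    )


naive = prob(x=1, y=1) / prob(x=1)
adjusted = backdoor(1)
print("P(Y=1 | X=1)     =", round(naive, 4))
print("P(Y=1 | do(X=1)) =", round(adjusted, 4))
```

With these assumed numbers the naive conditional is 0.80, while the adjusted value is 0.65, which matches what the model gives when X is set to 1 regardless of Z.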

## Frontdoor adjustment

Definition: a set of variables ${\displaystyle Z}$ satisfies the frontdoor criterion relative to ${\displaystyle (X,Y)}$ if data is available for all ${\displaystyle z\in Z}$, ${\displaystyle Z}$ intercepts all directed paths from ${\displaystyle X}$ to ${\displaystyle Y}$, there are no unblocked backdoor paths from ${\displaystyle X}$ to ${\displaystyle Z}$, and all backdoor paths from ${\displaystyle Z}$ to ${\displaystyle Y}$ are blocked by ${\displaystyle X}$.

The following converts a do expression into a do-free expression by conditioning on the variables along the front-door path.[5]:226

${\displaystyle P(Y|do(X))=\textstyle \sum _{z}\left[\displaystyle P(Z=z|X)\textstyle \sum _{x}\displaystyle P(Y|X=x,Z=z)P(X=x)\right]}$

Presuming data for these observable probabilities is available, the ultimate probability can be computed without an experiment, regardless of the existence of other confounding paths and without backdoor adjustment.
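The front-door formula can also be checked by exact enumeration. The sketch below assumes a hypothetical binary model with an unobserved confounder U (U → X, U → Y) and a mediator Z (X → Z → Y); all probabilities are invented. The formula only ever touches the observed joint over (X, Z, Y), and the result is compared with the value obtained by intervening in the full model.

```python
from itertools import product

# Hypothetical model: U -> X, U -> Y (U unobserved), X -> Z -> Y.
P_U1 = 0.5
P_X1_GIVEN_U = {0: 0.2, 1: 0.8}
P_Z1_GIVEN_X = {0: 0.1, 1: 0.9}


def p_y1(z, u):
    """P(Y=1 | Z=z, U=u) in the assumed model."""
    return 0.1 + 0.4 * z + 0.4 * u


def bern(p_one, value):
    """Probability that a binary variable with P(1) = p_one takes `value`."""
    return p_one if value == 1 else 1.0 - p_one


# Observed joint P(x, z, y): U is summed out, as it would be in real data.
observed = {}
for u, x, z, y in product((0, 1), repeat=4):
    p = (bern(P_U1, u) * bern(P_X1_GIVEN_U[u], x)
         * bern(P_Z1_GIVEN_X[x], z) * bern(p_y1(z, u), y))
    observed[(x, z, y)] = observed.get((x, z, y), 0.0) + p


def prob(x=None, z=None, y=None):
    """Probability of the given values; None means the variable is summed out."""
    return sum(
        p for (jx, jz, jy), p in observed.items()
        if x in (None, jx) and z in (None, jz) and y in (None, jy)
    )


def front_door(x, y=1):
    """sum_z P(z | x) * sum_x' P(y | x', z) P(x'), from observed data only."""
    total = 0.0
    for z in (0, 1):
        p_z_given_x = prob(x=x, z=z) / prob(x=x)
        inner = sum(
            prob(x=xp, z=z, y=y) / prob(x=xp, z=z) * prob(x=xp)
            for xp in (0, 1)
        )
        total += p_z_given_x * inner
    return total


def truth(x, y=1):
    """P(Y=y | do(X=x)) computed in the full model, where U is known."""
    return sum(
        bern(P_U1, u) * bern(P_Z1_GIVEN_X[x], z) * bern(p_y1(z, u), y)
        for u, z in product((0, 1), repeat=2)
    )


naive = prob(x=1, y=1) / prob(x=1)
print("P(Y=1 | X=1)           =", round(naive, 4))
print("front-door estimate    =", round(front_door(1), 4))
print("true P(Y=1 | do(X=1))  =", round(truth(1), 4))
```

Under these assumptions the naive conditional is 0.78, whereas the front-door estimate and the true interventional value agree at 0.66, even though U never enters the estimate.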

![image](https://i.stack.imgur.com/k4Jhi.png)

## Do calculus
The do calculus is the set of manipulations that are available to transform one expression into another, with the general goal of transforming expressions that contain the do operator into expressions that do not.

The set of rules is complete: every identifiable causal effect can be derived with them. An algorithm can determine in polynomial time whether, for a given model, a solution is computable.

### Rule 1
Rule 1 permits the addition or deletion of observations.

${\displaystyle P(Y|do(X),Z,W)=P(Y|do(X),Z)}$
in the case that the variable set Z blocks all paths from W to Y in the graph from which all arrows leading into X have been deleted.

### Rule 2
Rule 2 permits the replacement of an intervention with an observation or vice versa.

${\displaystyle P(Y|do(X),Z)=P(Y|X,Z)}$
in the case that Z satisfies the back-door criterion.

### Rule 3
Rule 3 permits the deletion or addition of interventions.

${\displaystyle P(Y|do(X))=P(Y)}$
in the case where no causal paths connect X and Y.

## Application of the Do Calculus

![image](https://i.stack.imgur.com/jW6Nb.png)

## Counterfactuals

[Reference](https://github.com/altdeep/causalML/blob/master/tutorials/3-counterfactual/counterfactuals_in_pyro.ipynb)

This is an implementation of an example from Peters et al. 2017.

Consider a treatment study, where a company introduced a new medicine for eyes.

Suppose this is the true underlying model for the causal effect of treatment $T$ ($T=1$ if the treatment was given) on the result $B$ ($B=1$ if the person goes blind).

$$
\begin{align}
N_T &\sim Ber(.5)\\
N_B &\sim Ber(.01)\\
T &:= N_T\\
B &:= T * N_B + (1-T)*(1-N_B)
\end{align}
$$

Suppose a patient with poor eyesight comes to the hospital and goes blind ($B=1$) after the doctor gives the treatment ($T=1$).

We can ask "what would have happened had the doctor administered treatment $T = 0$?"

Here are the steps we follow to answer this counterfactual question.

First, retrieve the noise variables given the observation.

We observed $B=T=1$. Plugging these values into the equations above gives

$$\begin{align}
1 = N_T\\
1 = 1*N_B + (1-1)*(1-N_B)
\end{align}$$

So, $N_T=1$ and $N_B = 1$.

Next, intervene on $T$: set $T=0$ and solve for $B$, keeping the recovered value of $N_B$.

$$
\begin{align}
 T = 0\\
 B = 0 * 1 + (1-0)*(1-1) = 0 \\
\end{align}
$$
Thus, under this model, the person would not have gone blind had the treatment not been given.
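The same three steps (recover the noise, intervene, predict) can be sketched in plain Python by enumerating the binary noise values. This is only an illustration of the calculation above, not the Pyro implementation from the referenced notebook; the function names are made up for this example.

```python
def f_T(n_t):
    """Structural equation T := N_T."""
    return n_t


def f_B(t, n_b):
    """Structural equation B := T * N_B + (1 - T) * (1 - N_B)."""
    return t * n_b + (1 - t) * (1 - n_b)


def counterfactual_outcomes(observed_t, observed_b, new_t):
    """Return the set of values B could take had T been new_t."""
    # Step 1 (abduction): keep the noise values consistent with what was observed.
    consistent = [
        (n_t, n_b)
        for n_t in (0, 1)
        for n_b in (0, 1)
        if f_T(n_t) == observed_t and f_B(f_T(n_t), n_b) == observed_b
    ]
    # Steps 2 and 3 (action, prediction): override T and recompute B
    # with the recovered noise N_B.
    return {f_B(new_t, n_b) for _, n_b in consistent}


# Patient received treatment (T=1) and went blind (B=1); what if T had been 0?
print(counterfactual_outcomes(observed_t=1, observed_b=1, new_t=0))  # {0}
```

Because the observation pins down both noise variables, the result is a single value; in models where several noise settings fit the data, the noise would have to be weighted by its posterior probability instead.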

## Mediation

How can we disentangle the Direct effect, Indirect effect, and Total effect?

![image](https://lh3.googleusercontent.com/proxy/acghFDtZzq-gR7kkzlMUA8CIFogSL0aDJ8Pg9Et51R-m6cdHEUaw3S1yw-YLf-NaGBds34Ke7Rv5k5A)

[Reference](https://en.wikipedia.org/wiki/Mediation_(statistics))

In particular, four types of effects have been defined for the transition from X = 0 to X = 1:

(a) Total effect:

${\displaystyle TE=E[Y(1)-Y(0)]}$

(b) Controlled direct effect:

${\displaystyle CDE(m)=E[Y(1,m)-Y(0,m)]}$

(c) Natural direct effect:

${\displaystyle NDE=E[Y(1,M(0))-Y(0,M(0))]}$

(d) Natural indirect effect:

${\displaystyle NIE=E[Y(0,M(1))-Y(0,M(0))]}$

Where E[ ] stands for expectation taken over the error terms.

These effects have the following interpretations:

- TE measures the expected increase in the outcome Y as X changes from X = 0 to X = 1, while the mediator is allowed to track the change in X as dictated by the function M = g(X, ε2).

- CDE measures the expected increase in the outcome Y as X changes from X = 0 to X = 1, while the mediator is fixed at a pre-specified level M = m uniformly over the entire population.

- NDE measures the expected increase in Y as X changes from X = 0 to X = 1, while setting the mediator variable to whatever value it would have obtained under X = 0, i.e., before the change.

- NIE measures the expected increase in Y when X is held constant at X = 0 and M changes to whatever value it would have attained (for each individual) under X = 1.

The difference TE-NDE measures the extent to which mediation is necessary for explaining the effect, while the NIE measures the extent to which mediation is sufficient for sustaining it.

A controlled version of the indirect effect does not exist because there is no way of disabling the direct effect by fixing a variable to a constant.

According to these definitions, the total effect can be decomposed as

${\displaystyle TE=NDE-NIE_{r}}$

where $NIE_r$ stands for the natural indirect effect of the reverse transition, from X = 1 to X = 0; the decomposition becomes additive (TE = NDE + NIE) in linear systems, where reversal of transitions entails sign reversal.

The power of these definitions lies in their generality; they are applicable to models with arbitrary nonlinear interactions, arbitrary dependencies among the disturbances, and both continuous and categorical variables.
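To see the definitions at work, the sketch below evaluates them on a hypothetical linear model, M = 2X + ε and Y = 5X + 3M. The functions `g` and `f`, the coefficients, and the Gaussian noise are assumptions chosen for illustration. Each effect is computed by evaluating the potential outcomes for every simulated individual and averaging.

```python
import random
from statistics import fmean


def g(x, eps):
    """Mediator equation M = g(X, eps); the coefficient 2 is an assumption."""
    return 2.0 * x + eps


def f(x, m):
    """Outcome equation Y = f(X, M); the coefficients 5 and 3 are assumptions."""
    return 5.0 * x + 3.0 * m


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(10_000)]  # one error term per individual

# Y(x) is f(x, M(x)); Y(x, m) is f(x, m) with the mediator set by hand.
TE = fmean(f(1, g(1, e)) - f(0, g(0, e)) for e in noise)
NDE = fmean(f(1, g(0, e)) - f(0, g(0, e)) for e in noise)
NIE = fmean(f(0, g(1, e)) - f(0, g(0, e)) for e in noise)
NIE_r = fmean(f(1, g(0, e)) - f(1, g(1, e)) for e in noise)  # reverse transition


def CDE(m):
    """Controlled direct effect with the mediator fixed at m for everyone."""
    return f(1, m) - f(0, m)


print("TE   ", round(TE, 6))
print("NDE  ", round(NDE, 6))
print("NIE  ", round(NIE, 6))
print("NIE_r", round(NIE_r, 6))
print("CDE(0)", CDE(0.0))
```

In this linear sketch the direct effect is 5, the indirect effect is 3 × 2 = 6, and the total is 11, so TE = NDE − NIE_r and TE = NDE + NIE coincide; with nonlinear equations the two indirect effects would generally differ in magnitude.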

## Linear Regression


When the causal model is a plausible representation of reality and the backdoor criterion is satisfied, partial regression coefficients can be used as (causal) path coefficients (for linear relationships). [Reference](https://en.wikipedia.org/wiki/Causal_model)

The regression coefficient tells us the associational relationship in observational data. Consider y = a + bx + cz. For a fixed value of z, a unit increase in x is associated with an average increase of b in y. But we cannot say, in general, that intervening to increase x by one unit causes y to increase by b. That is the difference between observation ("seeing") and intervention ("doing"); only the latter is a causal relationship.

How do we know whether a coefficient in a linear regression represents a causal impact? It depends on what z is. If z is a confounder, then b is the causal impact of x on y. If z is a mediator, b captures only the direct effect and misses the part that flows through z. If z is a collider, b is biased and is not a causal impact.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas

from statsmodels.formula.api import ols

import warnings
warnings.filterwarnings('ignore')



## Confounder
Let us simulate a data-generating process. The first case is a common cause z of the two variables x and y. If we control for the confounding variable z, then x and y become independent: in the output below, the coefficient on x drops from about 1.2 to about 0.

Note that if z were instead a mediator between x and y (x → z → y) and we controlled for z, x and y would also become independent. Statistics alone cannot tell us which one is the right model.

In [2]:
# Fork (Confounder): x<-z->y
z=np.random.normal(size=10000)
x=2*z + np.random.normal(size=10000)
y=3*z + np.random.normal(size=10000)
data = pandas.DataFrame({'x': x, 'y': y, 'z': z})

mod7 = ols("y~x", data).fit()
print(mod7.params)
mod8 = ols("y~x+z", data).fit()
print(mod8.params)

Intercept   -0.004767
x            1.214064
dtype: float64
Intercept   -0.010239
x            0.004289
z            2.988945
dtype: float64


## Mediator
The second case is a mediator z between x and y, with an additional direct path from x to y.
If we control for z, we measure the direct causal impact of x on y (5 here). Otherwise we measure the total impact (direct + indirect, 5 + 3·2 = 11 here) of x on y.

In [3]:
#Mediator (direct and indirect causal impact): x->z->y and x->y
x=np.random.normal(size=10000)
z = 2*x + np.random.normal(size=10000)
y = 3*z + 5*x
data = pandas.DataFrame({'x': x, 'y': y, 'z': z})
mod5 = ols("y~x", data).fit()
print(mod5.params)
mod6 = ols("y~x+z", data).fit()
print(mod6.params)

Intercept     0.033468
x            11.014335
dtype: float64
Intercept   -9.020562e-17
x            5.000000e+00
z            3.000000e+00
dtype: float64


## Collider

The third case is a common effect z of the two variables x and y. The variables x and y are independent. If we control for the variable z, then x and y become dependent.
This shows that controlling for more variables can introduce bias.

In [4]:
#Collider:  x->z<-y. Controlling for Z introduces bias
x=np.random.normal(size=10000)
y=np.random.normal(size=10000)
z=x+y
data = pandas.DataFrame({'x': x, 'y': y, 'z': z})
mod1 = ols("y~x", data).fit()
print(mod1.params)
mod2 = ols("y~x+z", data).fit()
print(mod2.params)
mod3 = ols("z~x+y", data).fit() # The regression coefficient of z with respect to x does not change if we add/remove y,
                                # since x and y are independent
print(mod3.params)
mod4 = ols("z~x", data).fit() # the x coefficient is essentially the same as in mod3
print(mod4.params)
print(mod4.params)

Intercept   -0.018117
x           -0.004741
dtype: float64
Intercept    9.497611e-17
x           -1.000000e+00
z            1.000000e+00
dtype: float64
Intercept   -6.071532e-18
x            1.000000e+00
y            1.000000e+00
dtype: float64
Intercept   -0.018117
x            0.995259
dtype: float64
