# Causal Inference in Statistics 
(Pearl, Glymour, Jewell)



## **Chapter 2 - Graphical Models and Their Applications**

**Rule 1 (Conditional Independence in Chains)**: Two variables, $X$ and $Y$, are conditionally independent given $Z$ if there is only one unidirectional path between $X$ and $Y$ and $Z$ is any set of variables that intercepts that path.

**Rule 2 (Conditional Independence in Forks)**: If a variable $X$ is a common cause of variables $Y$ and $Z$, and there is only one path between $Y$ and $Z$, then $Y$ and $Z$ are independent conditional on $X$.

**Rule 3 (Conditional Independence in Colliders)**: If a variable $Z$ is the collision node between two variables $X$ and $Y$, and there is only one path between $X$ and $Y$, then $X$ and $Y$ are unconditionally independent but are dependent conditional on $Z$ and any descendants of $Z$.

Two nodes $X$ and $Y$ are d-separated if every path between them is _blocked_.  If even one path between $X$ and $Y$ is unblocked, $X$ and $Y$ are d-connected.

When we say that a pair of nodes are d-separated, we mean that the variables they represent are definitely independent. 

**(_d_-separation)** A path $p$ is blocked by a set of nodes $Z$ iff

1. $p$ contains a chain of nodes $A \to B \to C$ or a fork $A \leftarrow B \to C$ such that the middle node $B$ is in $Z$ (i.e. $B$ is conditioned on), or
2. $p$ contains a collider $A \to B \leftarrow C$ such that the collision node $B$ is not in $Z$, and no descendant of $B$ is in $Z$.
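
The two blocking conditions can be turned into a small checker. The sketch below is only an illustration for small graphs: it enumerates every simple path and applies conditions 1 and 2 to each interior node. The function names (`descendants`, `simple_paths`, `is_blocked`, `d_separated`) and the toy collider graph are assumptions made for this example, not something taken from the book.

```python
def descendants(edges, node):
    """Return every node reachable from `node` along directed edges."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def simple_paths(edges, start, goal):
    """Yield every simple path from start to goal, ignoring edge direction."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    def extend(path):
        if path[-1] == goal:
            yield path
            return
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in path:
                yield from extend(path + [nxt])

    yield from extend([start])


def is_blocked(edges, path, z):
    """Check the two blocking conditions on each interior node of `path`."""
    edge_set = set(edges)
    for a, b, c in zip(path, path[1:], path[2:]):
        if (a, b) in edge_set and (c, b) in edge_set:
            # Condition 2: collider with neither itself nor a descendant in z.
            if b not in z and not (descendants(edges, b) & z):
                return True
        elif b in z:
            # Condition 1: chain or fork whose middle node is in z.
            return True
    return False


def d_separated(edges, x, y, z=()):
    """True if every path between x and y is blocked by the set z."""
    z = set(z)
    return all(is_blocked(edges, p, z) for p in simple_paths(edges, x, y))


# Toy graph (assumed for illustration): A -> B <- C, with B -> D.
collider = [("A", "B"), ("C", "B"), ("B", "D")]
print(d_separated(collider, "A", "C"))         # True
print(d_separated(collider, "A", "C", {"D"}))  # False
```

Path enumeration is exponential in the worst case, so this is only meant for graphs of the size used in these exercises.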



### **2.1 Connecting Models to Data**







### **2.2 Chains and Forks**


### **2.3 Colliders**


#### **Study question 2.3.1**
_(a) List all pairs of variables in Figure 2.5 that are independent conditional on the set $Z=\{R,V\}$._


<img src="https://github.com/gmonce/datascience/blob/master/causalidad/img/figure_2_5.png?raw=1" alt="Drawing" width="300"/>


$\langle X,S \rangle$: the chain $X \to R \to S$ is the only path between $X$ and $S$, and $R \in Z$ (Rule 1)

$\langle X,T \rangle$: same case as before

$\langle U,Y \rangle$: there is only one path between $U$ and $Y$, $V$ is a common cause of $U$ and $Y$, and $V \in Z$ (Rule 2)

$\langle T,Y \rangle$: same case as before

$\langle S,U \rangle$: there is only one path between $S$ and $U$, $T$ is a collider on it, and neither $T$ nor a descendant of $T$ is in $Z$ (Rule 3)

The remaining independent pairs are those whose only connecting path is blocked by $R$, by $V$, or by the unconditioned collider $T$: $\langle X,U \rangle$, $\langle X,Y \rangle$ and $\langle S,Y \rangle$.
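
The pairs above can be cross-checked mechanically. The sketch below tests d-separation with the moralized ancestral graph: keep only the ancestors of the nodes involved, connect parents that share a child, drop edge directions, and check whether the conditioning set cuts every route. The edge list `FIG_2_5` is an assumption: it is read off the paths named in the answers ($X \to R \to S \to T \leftarrow U \leftarrow V \to Y$), since the figure itself is an image. All function names are made up for this example.

```python
from itertools import combinations

# Assumed edge list for Figure 2.5, reconstructed from the paths in the answers.
FIG_2_5 = [("X", "R"), ("R", "S"), ("S", "T"), ("U", "T"), ("V", "U"), ("V", "Y")]


def ancestors_of(edges, nodes):
    """Return `nodes` together with all of their ancestors."""
    result, stack = set(nodes), list(nodes)
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if child == current and parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def d_separated(edges, x, y, z):
    """Test d-separation of x and y given z via the moralized ancestral graph."""
    z = set(z)
    keep = ancestors_of(edges, {x, y} | z)
    sub = [(a, b) for a, b in edges if a in keep and b in keep]
    # Undirected skeleton plus links between parents of a common child.
    links = {frozenset(e) for e in sub}
    for child in keep:
        parents = [a for a, b in sub if b == child]
        links |= {frozenset(pair) for pair in combinations(parents, 2)}
    # Look for an undirected route from x to y that avoids every node in z.
    seen, stack = {x}, [x]
    while stack:
        current = stack.pop()
        for link in links:
            if current in link:
                (other,) = link - {current}
                if other not in seen and other not in z:
                    seen.add(other)
                    stack.append(other)
    return y not in seen


def independent_pairs(edges, z):
    """List all pairs of non-conditioned nodes that are d-separated given z."""
    nodes = sorted({n for e in edges for n in e} - set(z))
    return [(a, b) for a, b in combinations(nodes, 2) if d_separated(edges, a, b, z)]


print(independent_pairs(FIG_2_5, {"R", "V"}))
```

Under the assumed edge list, the printed list contains every pair named in the answer to (a); pairs are printed in alphabetical order, so $\langle X,S \rangle$ appears as `('S', 'X')`.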
___

_(b) For each pair of nonadjacent variables in Figure 2.5, give a set of variables that, when conditioned on, renders that pair independent._

| Pair of Variables | Conditioned on | Cause |
| :---- | :-------: | :------|
| $\langle X,S \rangle$ | $\{R\}$ | Chain Rule |
| $\langle X,T \rangle$ | $\{R\}$ | Chain Rule |
| $\langle X,U \rangle$ | $∅$ | Collider Rule |
| $\langle X,V \rangle$ | $∅$ | Collider Rule |
| $\langle X,Y \rangle$ | $∅$ | Collider Rule |
| $\langle R,T \rangle$ | $\{S\}$ | Chain Rule |
| $\langle R,U \rangle$ | $∅$ | Collider Rule |
| $\langle R,V \rangle$ | $∅$ | Collider Rule |
| $\langle R,Y \rangle$ | $∅$ | Collider Rule |
| $\langle S,U \rangle$ | $∅$ | Collider Rule |
| $\langle S,V \rangle$ | $∅$ | Collider Rule |
| $\langle S,Y \rangle$ | $∅$ | Collider Rule |
| $\langle T,V \rangle$ | $\{U\}$ | Chain Rule |
| $\langle T,Y \rangle$ | $\{V\}$ | Fork Rule |
| $\langle U,Y \rangle$ | $\{V\}$ | Fork Rule |

___

_(c) List all pairs of variables in Figure 2.6 that are independent conditional on the set $Z= \{R,P\}$._

<img src="https://github.com/gmonce/datascience/blob/master/causalidad/img/figure_2_6.png?raw=1" alt="Drawing" width="200"/>


$\langle X,S \rangle$: the chain $X \to R \to S$ is the only path between $X$ and $S$, and $R \in Z$ (Rule 1)

$\langle X,T \rangle$: same case as before

$\langle X,U \rangle$, $\langle X,V \rangle$, $\langle X,Y \rangle$: every path out of $X$ starts with the chain $X \to R \to S$, which is blocked because $R \in Z$ (Rule 1)

No other pair is independent. $P \in Z$ is a descendant of the collider $T$, so conditioning on it unblocks $S \to T \leftarrow U$ and connects $S$ with $U$, $V$ and $Y$ (Rule 3). Since $V \notin Z$, the fork at $V$ also leaves $\langle U,Y \rangle$ and $\langle T,Y \rangle$ dependent.
___

_(d) For each pair of nonadjacent variables in Figure 2.6, give a set of variables that, when conditioned on, renders that pair independent._

| Pair of Variables | Conditioned on | Cause |
| :---- | :-------: | :------|
| $\langle X,S \rangle$ | $\{R\}$ | Chain Rule |
| $\langle X,T \rangle$ | $\{R\}$ | Chain Rule |
| $\langle X,U \rangle$ | $∅$ | Collider Rule |
| $\langle X,V \rangle$ | $∅$ | Collider Rule |
| $\langle X,Y \rangle$ | $∅$ | Collider Rule |
| $\langle X,P \rangle$ | $\{R\}$ | Chain Rule |
| $\langle R,T \rangle$ | $\{S\}$ | Chain Rule |
| $\langle R,P \rangle$ | $\{S\}$ | Chain Rule |
| $\langle R,U \rangle$ | $∅$ | Collider Rule |
| $\langle R,V \rangle$ | $∅$ | Collider Rule |
| $\langle R,Y \rangle$ | $∅$ | Collider Rule |
| $\langle S,U \rangle$ | $∅$ | Collider Rule |
| $\langle S,P \rangle$ | $\{T\}$ | Chain Rule |
| $\langle S,V \rangle$ | $∅$ | Collider Rule |
| $\langle S,Y \rangle$ | $∅$ | Collider Rule |
| $\langle P,U \rangle$ | $\{T\}$ | Chain Rule |
| $\langle P,V \rangle$ | $\{T\}$ | Chain Rule |
| $\langle P,Y \rangle$ | $\{V\}$ | Fork Rule (the path $P \leftarrow T \leftarrow U \leftarrow V \to Y$ contains no collider) |
| $\langle T,V \rangle$ | $\{U\}$ | Chain Rule |
| $\langle T,Y \rangle$ | $\{V\}$ | Fork Rule |
| $\langle U,Y \rangle$ | $\{V\}$ | Fork Rule |

___
_(e) Suppose we generate data by the model described in Figure 2.6 and we fit them with the linear equation $Y = a + bX +cZ$. Which of the variables in the model may be chosen for $Z$ so as to guarantee that the slope of $b$ would be equal to zero? [Hint: Recall, a non zero slope implies that $Y$ and $X$ are dependent given $Z$.]_ 

The slope $b$ is guaranteed to be zero when $Y$ is independent of $X$ given $Z$. So, we have

| Z | Y indep of X?       | Cause |
| :---- | :-------: | :------|
| $R$ | Yes | Chain Rule |
| $S$ | Yes | Chain Rule |
| $T$ | No | Cond. on collider |
| $P$ | No | Cond. on desc. of collider |
| $U$ | Yes | Chain Rule |
| $V$ | Yes | Fork rule |
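
The table can be illustrated with a small simulation. The sketch below assumes a linear-Gaussian model with the structure $X \to R \to S \to T \leftarrow U \leftarrow V \to Y$ and $T \to P$, with every coefficient set to 1 and unit-variance noise; these numbers are arbitrary choices for the example, not part of the exercise. The slope $b$ is computed from sample covariances with the closed-form least-squares solution for two regressors, and the helper names are hypothetical.

```python
import random


def simulate(n, seed=0):
    """Draw n samples from an assumed linear model with the Figure 2.6 structure."""
    rng = random.Random(seed)
    names = "XRSTUVYP"
    data = {name: [] for name in names}
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        r = x + rng.gauss(0.0, 1.0)
        s = r + rng.gauss(0.0, 1.0)
        v = rng.gauss(0.0, 1.0)
        u = v + rng.gauss(0.0, 1.0)
        y = v + rng.gauss(0.0, 1.0)
        t = s + u + rng.gauss(0.0, 1.0)   # T is the collider
        p = t + rng.gauss(0.0, 1.0)       # P is a descendant of the collider
        for name, value in zip(names, (x, r, s, t, u, v, y, p)):
            data[name].append(value)
    return data


def cov(a, b):
    """Sample covariance of two equally long lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / len(a)


def slope_of_x(data, z_name):
    """Coefficient b in the least-squares fit Y = a + b*X + c*Z."""
    x, y, z = data["X"], data["Y"], data[z_name]
    numerator = cov(x, y) * cov(z, z) - cov(x, z) * cov(z, y)
    denominator = cov(x, x) * cov(z, z) - cov(x, z) ** 2
    return numerator / denominator


data = simulate(50_000)
slopes = {z: slope_of_x(data, z) for z in "RSTPUV"}
for z, b in slopes.items():
    print(z, round(b, 3))
```

With these assumed coefficients, the population slope is $0$ for $R$, $S$, $U$ and $V$, $-1/5$ for $T$ and $-1/6$ for $P$, so the estimates should land near those values up to sampling noise.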

___
_(f) Continuing question (e), suppose we fit the data with the equation:_ 

$$ Y = a + bX + cR + dS + eT + fP $$

_which of the coefficients would be zero?_


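A coefficient is guaranteed to be zero when $Y$ is independent of the corresponding regressor given all the other regressors.

$b = 0$ requires $Y \perp \!\!\! \perp X$ given $R,S,T,P$
(Yes: $R$ blocks the chain out of $X$)

$c = 0$ requires $Y \perp \!\!\! \perp R$ given $X,S,T,P$
(Yes: $S$ blocks the chain out of $R$)

$d = 0$ requires $Y \perp \!\!\! \perp S$ given $X,R,T,P$
(No: conditioning on the collider $T$ unblocks $S \to T \leftarrow U \leftarrow V \to Y$)

$e = 0$ requires $Y \perp \!\!\! \perp T$ given $X,R,S,P$
(No: the path $T \leftarrow U \leftarrow V \to Y$ is unblocked)

$f = 0$ requires $Y \perp \!\!\! \perp P$ given $X,R,S,T$
(Yes: $T$ blocks every path out of $P$)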





### **2.4 d-Separation**


#### **Study question 2.4.1**

_Figure 2.9 below represents a causal graph from which the error terms have been deleted. Assume that all those errors are mutually independent._

_(a) For each pair of nonadjacent nodes in this graph, find a set of variables that d-separates that pair. What does this list tell us about independencies in the data?_

<img src="https://github.com/gmonce/datascience/blob/master/causalidad/img/figure_2_9.png?raw=1" alt="Drawing" width="200"/>


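| Pair of Variables | Conditioned on | Explanation |
| :---- | :-------: | :------------------|
| $\langle Z_1,Z_2 \rangle$ | $∅$ | Every path contains a collider |
| $\langle Z_1,W \rangle$ | $\{X\}$ | We need to block $Z_1 \to X \to W$ and $Z_1 \to Z_3 \to X \to W$; every other path contains a collider that is not conditioned on |
| $\langle Z_1,Y \rangle$ | $\{X,Z_3,Z_2\}$ | Block $Z_1 \to X \to W \to Y$ by conditioning on $X$, block $Z_1 \to Z_3 \to Y$ and $Z_1 \to X \leftarrow Z_3 \to Y$ by conditioning on $Z_3$, and block $Z_1 \to Z_3 \leftarrow Z_2 \to Y$ by conditioning on $Z_2$ |
| $\langle Z_2,X \rangle$ | $\{Z_3,Z_1\}$ | $Z_3$ blocks $Z_2 \to Z_3 \to X$ but opens $Z_2 \to Z_3 \leftarrow Z_1 \to X$, which $Z_1$ blocks |
| $\langle Z_2,W \rangle$ | $\{X\}$ | $X$ blocks every path through $X \to W$; $Z_2 \to Y \leftarrow W$ is blocked by the collider $Y$ |
| $\langle Z_3,W \rangle$ | $\{X\}$ | $X$ blocks every path through $X \to W$; $Z_3 \to Y \leftarrow W$ is blocked by the collider $Y$ |
| $\langle X,Y \rangle$ | $\{W,Z_3,Z_2\}$ | $W$ blocks $X \to W \to Y$, $Z_3$ blocks $X \leftarrow Z_3 \to Y$, and $Z_2$ blocks $X \leftarrow Z_1 \to Z_3 \leftarrow Z_2 \to Y$, which conditioning on $Z_3$ opens |

Each d-separation in this list implies that the corresponding pair of variables is independent, conditional on the listed set, in any data generated by the model.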

___
_(b) Repeat question (a) assuming that only variables in the set $\{Z_3,W,X,Z_1\}$ can be measured_ 

| Pair of Variables | Conditioned on |
| :---- | :------- |
| $\langle Z_1,Z_2 \rangle$ | $∅$ |
| $\langle Z_1,W \rangle$ | $\{X\}$ |
| $\langle Z_1,Y \rangle$ | Not possible. |
| $\langle Z_2,X \rangle$ | $\{Z_3,Z_1\}$ |
| $\langle Z_2,W \rangle$ | $\{X\}$ |
| $\langle Z_3,W \rangle$ | $\{X\}$ |
| $\langle X,Y \rangle$ | $\{W,Z_1,Z_3\}$ |

____

_(c) For each pair of nonadjacent nodes in the graph, determine whether they are independent conditional on all other variables in the graph_ 

| Pair of Variables | Independent? | Unblocked path |
| :---------------- | :------- | :-------------------- |
| $\langle Z_1,Z_2 \rangle$ | No|   $Z_1 \to Z_3 \leftarrow Z_2$ |
| $\langle Z_1,W \rangle$ |   Yes |   |
| $\langle Z_1,Y \rangle$ |   Yes |   |
| $\langle Z_2,X \rangle$ |   Yes |   |
| $\langle Z_2,W \rangle$ |   No | $Z_2 \to Y \leftarrow W$ |    
| $\langle Z_3,W \rangle$ |   No | $Z_3 \to Y \leftarrow W$ |    
| $\langle X,Y \rangle$   |   Yes |  |  

____

_(d) For every variable $V$ in the graph, find a minimal set of nodes that renders $V$ independent of all the other variables in the graph_

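| Variable | Minimal set that renders $V$ independent of all other variables |
| :---------------- | :---------------------: |
| $Z_1$ | $\{X,Z_2,Z_3\}$ |
| $Z_2$ | $\{Z_1,Z_3,W,Y\}$ |
| $Z_3$ | $\{Z_1,Z_2,X,W,Y\}$ |
| $X$ | $\{Z_1,Z_3,W\}$ |
| $W$ | $\{X,Z_2,Z_3,Y\}$ |
| $Y$ | $\{W,Z_2,Z_3\}$ |

Each set consists of the variable's parents, its children, and the other parents of its children. Parents and children are adjacent to $V$, so they must be in the set, and conditioning on a child opens the collider at that child, so the child's other parents must be included too. For example, for $Z_2$, conditioning on $Z_3$ blocks $Z_2 \to Z_3 \to X$ but opens $Z_2 \to Z_3 \leftarrow Z_1$, so $Z_1$ has to be added.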

____
_(e) Suppose we wish to estimate the value of $Y$ from measurements taken on all other variables in the model. Find the smallest set of variables that would yield as good an estimate of $Y$ as when we measured all variables._

From _(d)_ we know that the set $\{W,Z_2,Z_3\}$ renders $Y$ independent of all the other variables in the graph. Therefore, estimating $Y$ from this set yields as good an estimate as measuring all the variables in the graph.
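
The set used here is what is usually called the Markov blanket of a node: its parents, its children, and the other parents of its children. A minimal sketch for computing it follows. The edge list `FIG_2_9` is an assumption reconstructed from the paths named in the answers to this study question, and `markov_blanket` is a name chosen for this example.

```python
# Assumed edge list for Figure 2.9, reconstructed from the paths in the answers.
FIG_2_9 = [
    ("Z1", "X"), ("Z1", "Z3"), ("Z2", "Z3"), ("Z2", "Y"),
    ("Z3", "X"), ("Z3", "Y"), ("X", "W"), ("W", "Y"),
]


def markov_blanket(edges, node):
    """Parents, children, and the children's other parents of `node`."""
    parents = {a for a, b in edges if b == node}
    children = {b for a, b in edges if a == node}
    co_parents = {a for a, b in edges if b in children} - {node}
    return parents | children | co_parents


print(sorted(markov_blanket(FIG_2_9, "Y")))  # ['W', 'Z2', 'Z3']
```

$Y$ has no children in the assumed graph, so its blanket reduces to its three parents.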
___

_(f) Repeat question (e) assuming that we wish to estimate the value of $Z_2$._

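The smallest set is $\{Z_1,Z_3,W,Y\}$: the children of $Z_2$ ($Z_3$ and $Y$) together with the other parents of those children ($Z_1$ and $W$). Given this set, $Z_2$ is d-separated from $X$, the only remaining variable, so measuring $X$ adds nothing.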

____
_(g) Suppose we wish to predict the value of $Z_2$ from measurements of $Z_3$. Would the quality of our prediction improve if we add measurement of $W$? Explain._

Suppose we fit $Z_2 = aZ_3 + cW + b$. Adding $W$ improves the prediction only if $c \neq 0$, i.e. if $Z_2$ depends on $W$ given $Z_3$, which requires $Z_2$ and $W$ to be d-connected given $Z_3$.

Conditioning on $Z_3$ blocks the path $Z_2 \to Z_3 \to X \to W$, and the path $Z_2 \to Y \leftarrow W$ is blocked by the unconditioned collider $Y$. The path $Z_2 \to Z_3 \leftarrow Z_1 \to X \to W$, however, is not blocked, because $Z_3$ is a collider on it and we are conditioning on $Z_3$. So $Z_2$ and $W$ are d-connected, and they are most likely dependent given $Z_3$. The quality of our prediction will therefore improve if we add the measurement of $W$.
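
A quick numerical check of this argument is to estimate the partial correlation between $Z_2$ and $W$ given $Z_3$ on simulated data. The sketch below assumes a linear-Gaussian version of the model with all coefficients equal to 1 and unit-variance noise (an arbitrary choice for illustration) and uses only the variables on the relevant paths. Function names are hypothetical.

```python
import math
import random


def simulate(n, seed=1):
    """Sample (Z2, Z3, W) from an assumed linear model with the Figure 2.9 structure."""
    rng = random.Random(seed)
    z2s, z3s, ws = [], [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        z3 = z1 + z2 + rng.gauss(0.0, 1.0)   # collider Z1 -> Z3 <- Z2
        x = z1 + z3 + rng.gauss(0.0, 1.0)
        w = x + rng.gauss(0.0, 1.0)
        z2s.append(z2)
        z3s.append(z3)
        ws.append(w)
    return z2s, z3s, ws


def corr(a, b):
    """Pearson correlation of two equally long lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov_ab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov_ab / math.sqrt(var_a * var_b)


def partial_corr(a, b, given):
    """Correlation of a and b after linearly adjusting both for `given`."""
    r_ab, r_ag, r_bg = corr(a, b), corr(a, given), corr(b, given)
    return (r_ab - r_ag * r_bg) / math.sqrt((1 - r_ag ** 2) * (1 - r_bg ** 2))


z2, z3, w = simulate(20_000)
pc = partial_corr(z2, w, z3)
print(round(pc, 3))
```

Under these assumed coefficients the population partial correlation is $-0.25$, so the estimate should be clearly away from zero, in line with the d-connection found above.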









### **2.5 Model Testing and Causal Search**
