# Causal Inference in Statistics 
(Pearl, Glymour, Jewell)



## **Chapter 2 - Graphical Models and Their Applications**

**Rule 1 (Conditional Independence in Chains)**: Two variables, $X$ and $Y$, are conditionally independent given $Z$, if there is only one unidirectional path between $X$ and $Y$ and $Z$ is any set of variables that intercepts that path.

**Rule 2 (Conditional Independence in Forks)**: If a variable $X$ is a common cause of variables $Y$ and $Z$, and there is only one path between $Y$ and $Z$, then $Y$ and $Z$ are independent conditional on $X$.

**Rule 3 (Conditional Independence in Colliders)**: If a variable $Z$ is the collision node between two variables $X$ and $Y$, and there is only one path between $X$ and $Y$, then $X$ and $Y$ are unconditionally independent but are dependent conditional on $Z$ or on any descendant of $Z$.
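
Rule 3 can be checked numerically. The sketch below assumes a hypothetical linear Gaussian model for the collider $X \to Z \leftarrow Y$ (the unit coefficients, the sample size, and the helper `partial_corr` are illustrative choices, not taken from the book). It uses partial correlation as a stand-in for conditional dependence, which is only justified in the linear Gaussian case.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical collider X -> Z <- Y; coefficients are arbitrary illustrative values.
x = rng.normal(size=n)
y = rng.normal(size=n)
z = x + y + rng.normal(size=n)


def partial_corr(a, b, c):
    """Correlation between a and b after regressing each on c (with an intercept)."""
    design = np.column_stack([np.ones_like(c), c])
    res_a = a - design @ np.linalg.lstsq(design, a, rcond=None)[0]
    res_b = b - design @ np.linalg.lstsq(design, b, rcond=None)[0]
    return np.corrcoef(res_a, res_b)[0, 1]


marginal = np.corrcoef(x, y)[0, 1]      # X and Y without conditioning
conditional = partial_corr(x, y, z)     # X and Y after conditioning on the collider Z
print(f"corr(X, Y)     = {marginal:.3f}")
print(f"corr(X, Y | Z) = {conditional:.3f}")
```

Under these assumptions the marginal correlation should be close to zero, while the partial correlation given $Z$ should be clearly negative: learning $Z$ and $X$ tells us something about $Y$.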

### **2.1 Connecting Models to Data**







### **2.2 Chains and Forks**


### **2.3 Colliders**


#### **(*) Study question 2.3.1**
_(a) List all pairs of variables in Figure 2.5 that are independent conditional on the set $Z=\{R,V\}$._


<img src="https://github.com/gmonce/datascience/blob/master/causalidad/img/figure_2_5.png?raw=1" alt="Drawing" width="300"/>


$\langle X,S \rangle$  : The $X \to R \to S$ chain is the only unidirectional path between  $X$ and  $S$, and  $R \in Z$ (Rule 1)

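$\langle X,T \rangle$: the chain $X \to R \to S \to T$ is the only path between $X$ and $T$, and $R \in Z$ (Rule 1)

$\langle U,Y \rangle$: there is only one path between $U$ and $Y$, $V$ is a common cause of $U$ and $Y$, and $V \in Z$ (Rule 2)

$\langle T,Y \rangle$: the only path between $T$ and $Y$, $T \leftarrow U \leftarrow V \to Y$, passes through the fork at $V$, and $V \in Z$ (Rule 2)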

$\langle S,U \rangle$: there is only one path between $S$ and $U$, $T$ is a collider on that path, and neither $T$ nor any descendant of $T$ is in $Z$ (Rule 3)

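The remaining pairs are independent because every path between them is blocked, either by a conditioned chain or fork node ($R$ or $V$) or by the collider $T$, which is not in $Z$ and has no descendant in $Z$: $\langle X,U \rangle$, $\langle X,Y \rangle$, and $\langle S,Y \rangle$.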
___

_(b) For each pair of nonadjacent variables in Figure 2.5, give a set of variables that, when conditioned on, renders that pair independent._

| Pair of Variables | Conditioned on | Rule |
| :---- | :-------: | :------|
| $\langle X,S \rangle$ | $\{R\}$ | Chain Rule |
| $\langle X,T \rangle$ | $\{R\}$ | Chain Rule |
| $\langle X,U \rangle$ | $\emptyset$ | Collider Rule |
| $\langle X,V \rangle$ | $\emptyset$ | Collider Rule |
| $\langle X,Y \rangle$ | $\emptyset$ | Collider Rule |
| $\langle R,T \rangle$ | $\{S\}$ | Chain Rule |
| $\langle R,U \rangle$ | $\emptyset$ | Collider Rule |
| $\langle R,V \rangle$ | $\emptyset$ | Collider Rule |
| $\langle R,Y \rangle$ | $\emptyset$ | Collider Rule |
| $\langle S,U \rangle$ | $\emptyset$ | Collider Rule |
| $\langle S,V \rangle$ | $\emptyset$ | Collider Rule |
| $\langle S,Y \rangle$ | $\emptyset$ | Collider Rule |
| $\langle T,V \rangle$ | $\{U\}$ | Chain Rule |
| $\langle T,Y \rangle$ | $\{V\}$ | Fork Rule |
| $\langle U,Y \rangle$ | $\{V\}$ | Fork Rule |
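
The answers to (a) and (b) can be cross-checked mechanically. The sketch below is a minimal path-based check, not a general-purpose library: it assumes the edge list of Figure 2.5 is $X \to R \to S \to T \leftarrow U \leftarrow V \to Y$ (read off the answers above, since the image is not reproduced here), enumerates every path between two nodes, and applies Rules 1 to 3 to each one. The function names are hypothetical.

```python
from itertools import combinations

# Assumed edges (parent, child) of Figure 2.5; this is an assumption, not read from the image.
EDGES = [("X", "R"), ("R", "S"), ("S", "T"), ("U", "T"), ("V", "U"), ("V", "Y")]


def descendants(node, edges):
    """All nodes reachable from `node` along directed edges."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def simple_paths(start, goal, edges, path=None):
    """Yield every path from start to goal that ignores edge direction and revisits no node."""
    path = path or [start]
    if start == goal:
        yield path
        return
    for a, b in edges:
        if start in (a, b):
            nxt = b if a == start else a
            if nxt not in path:
                yield from simple_paths(nxt, goal, edges, path + [nxt])


def is_blocked(path, given, edges):
    """Blocked if a middle node is a conditioned non-collider, or a collider
    with neither itself nor any of its descendants in `given`."""
    edge_set = set(edges)
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, node) in edge_set and (nxt, node) in edge_set
        if collider:
            if node not in given and not (descendants(node, edges) & given):
                return True
        elif node in given:
            return True
    return False


def d_separated(a, b, given, edges):
    """True when every path between a and b is blocked by the set `given`."""
    return all(is_blocked(p, given, edges) for p in simple_paths(a, b, edges))


nodes = sorted({n for edge in EDGES for n in edge})
given = {"R", "V"}
independent = [
    (a, b)
    for a, b in combinations(nodes, 2)
    if a not in given and b not in given and d_separated(a, b, given, EDGES)
]
print(independent)
```

The same function can spot-check single rows of the table, for example `d_separated("X", "Y", set(), EDGES)` for the empty conditioning set, or `d_separated("X", "Y", {"T"}, EDGES)` to see the effect of conditioning on the collider.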

___

_(c) List all pairs of variables in Figure 2.6 that are independent conditional on the set $Z= \{R,P\}$._

<img src="https://github.com/gmonce/datascience/blob/master/causalidad/img/figure_2_6.png?raw=1" alt="Drawing" width="300"/>


$\langle X,S \rangle$  : The $X \to R \to S$ chain is the only unidirectional path between  $X$ and  $S$, and  $R \in Z$ (Rule 1)

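$\langle X,T \rangle$: the chain $X \to R \to S \to T$ is the only path between $X$ and $T$, and $R \in Z$ (Rule 1)

$\langle X,U \rangle$, $\langle X,V \rangle$, $\langle X,Y \rangle$: every path that starts at $X$ passes through $R \in Z$, so it stays blocked even though conditioning on $P$, a descendant of the collider $T$, opens that collider.

No other pair is independent given $Z$. Here $V \notin Z$, so the fork $U \leftarrow V \to Y$ stays open, and $\langle U,Y \rangle$ and $\langle T,Y \rangle$ are dependent. Because $P \in Z$ is a descendant of the collider $T$, the pairs $\langle S,U \rangle$, $\langle S,V \rangle$, and $\langle S,Y \rangle$ are dependent as well (Rule 3). Pairs that include $R$ or $P$ are not listed, since those variables are the ones being conditioned on.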
___

_(d) For each pair of nonadjacent variables in Figure 2.6, give a set of variables that, when conditioned on, renders that pair independent._

| Pair of Variables | Conditioned on | Rule |
| :---- | :-------: | :------|
| $\langle X,S \rangle$ | $\{R\}$ | Chain Rule |
| $\langle X,T \rangle$ | $\{R\}$ | Chain Rule |
| $\langle X,U \rangle$ | $\emptyset$ | Collider Rule |
| $\langle X,V \rangle$ | $\emptyset$ | Collider Rule |
| $\langle X,Y \rangle$ | $\emptyset$ | Collider Rule |
| $\langle X,P \rangle$ | $\{R\}$ | Chain Rule |
| $\langle R,T \rangle$ | $\{S\}$ | Chain Rule |
| $\langle R,P \rangle$ | $\{S\}$ | Chain Rule |
| $\langle R,U \rangle$ | $\emptyset$ | Collider Rule |
| $\langle R,V \rangle$ | $\emptyset$ | Collider Rule |
| $\langle R,Y \rangle$ | $\emptyset$ | Collider Rule |
| $\langle S,U \rangle$ | $\emptyset$ | Collider Rule |
| $\langle S,P \rangle$ | $\{T\}$ | Chain Rule |
| $\langle S,V \rangle$ | $\emptyset$ | Collider Rule |
| $\langle S,Y \rangle$ | $\emptyset$ | Collider Rule |
| $\langle P,U \rangle$ | $\{T\}$ | Chain Rule |
| $\langle P,V \rangle$ | $\{T\}$ | Chain Rule |
| $\langle P,Y \rangle$ | $\{T\}$ | Chain Rule |
| $\langle T,V \rangle$ | $\{U\}$ | Chain Rule |
| $\langle T,Y \rangle$ | $\{V\}$ | Fork Rule |
| $\langle U,Y \rangle$ | $\{V\}$ | Fork Rule |

Unlike the other pairs that involve $Y$, $\langle P,Y \rangle$ is not independent unconditionally: $T$ is not a collider on the path $P \leftarrow T \leftarrow U \leftarrow V \to Y$, so that path has to be blocked by conditioning on $T$, $U$, or $V$.
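
A brute-force search can look for a smallest separating set for every pair in Figure 2.6. The sketch below uses the moralized ancestral graph criterion as an alternative to tracing paths by hand: $A$ and $B$ are d-separated by $Z$ exactly when they are disconnected, once $Z$ is removed, in the moral graph built from the ancestors of $A$, $B$, and $Z$. The edge list ($X \to R \to S \to T \leftarrow U \leftarrow V \to Y$ plus $T \to P$) is an assumption inferred from the text, and the function names are hypothetical.

```python
from itertools import combinations

# Assumed edges (parent, child) of Figure 2.6; this is an assumption, not read from the image.
EDGES = [("X", "R"), ("R", "S"), ("S", "T"), ("U", "T"),
         ("V", "U"), ("V", "Y"), ("T", "P")]
NODES = sorted({n for edge in EDGES for n in edge})


def ancestors_of(nodes, edges):
    """Return `nodes` together with all of their ancestors."""
    result, stack = set(nodes), list(nodes)
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if child == current and parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def d_separated(a, b, given, edges):
    """d-separation test via the moral graph of the ancestral subgraph."""
    keep = ancestors_of({a, b} | set(given), edges)
    sub = [(p, c) for p, c in edges if p in keep and c in keep]
    # Moralize: keep every edge as undirected and "marry" parents that share a child.
    links = {frozenset(edge) for edge in sub}
    for (p1, c1), (p2, c2) in combinations(sub, 2):
        if c1 == c2:
            links.add(frozenset((p1, p2)))
    # Search outward from a without ever stepping onto a conditioned node.
    seen, stack = {a}, [a]
    while stack:
        current = stack.pop()
        for link in links:
            if current in link:
                (other,) = link - {current}
                if other not in seen and other not in given:
                    seen.add(other)
                    stack.append(other)
    return b not in seen


def smallest_separator(a, b, nodes, edges):
    """Try conditioning sets of increasing size; return the first that separates a and b."""
    others = [n for n in nodes if n not in (a, b)]
    for size in range(len(others) + 1):
        for candidate in combinations(others, size):
            if d_separated(a, b, set(candidate), edges):
                return set(candidate)
    return None  # adjacent nodes cannot be separated


for a, b in combinations(NODES, 2):
    separator = smallest_separator(a, b, NODES, EDGES)
    if separator is not None:
        print(a, b, sorted(separator))
```

When several sets of the same size work, the search reports only the first one it finds, so its output may differ from a hand-built table while still being a valid answer.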

___
**(TODO)** _(e) Suppose we generate data by the model described in Figure 2.6 and we fit them with the linear equation $Y = a + bX + cZ$. Which of the variables in the model may be chosen for $Z$ so as to guarantee that the slope $b$ would be equal to zero? [Hint: Recall, a nonzero slope implies that $Y$ and $X$ are dependent given $Z$.]_

___
**(TODO)**  _(f) Continuing question (e), suppose we fit the data with the equation:_ 

$$ Y = a + bX + cR + dS + eT + fP $$

_which of the coefficients would be zero?_
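
One way to explore questions (e) and (f) empirically is to simulate data and fit the regressions. The sketch below assumes the structure $X \to R \to S \to T \leftarrow U \leftarrow V \to Y$ with $T \to P$ for Figure 2.6, and a hypothetical linear Gaussian model in which every coefficient equals 1; neither the coefficients nor the helper `fit` come from the book. A fitted slope near zero in one simulation is suggestive rather than a proof, and the d-separation argument is still needed for the guarantee the question asks about.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000


def noise():
    """Independent standard normal disturbance for each variable."""
    return rng.normal(size=n)


# Assumed structure of Figure 2.6; unit coefficients are an arbitrary illustrative choice.
x = noise()
r = x + noise()
s = r + noise()
v = noise()
u = v + noise()
y = v + noise()
t = s + u + noise()
p = t + noise()


def fit(target, *regressors):
    """Ordinary least squares with an intercept; returns the slopes of the regressors."""
    design = np.column_stack([np.ones(n), *regressors])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef[1:]


# Question (e): slope b on X in Y = a + bX + cZ, for each candidate Z.
candidates = {"R": r, "S": s, "T": t, "P": p, "U": u, "V": v}
b_given = {name: fit(y, x, z)[0] for name, z in candidates.items()}
for name, b in b_given.items():
    print(f"Z = {name}: b = {b:+.3f}")

# Question (f): all slopes in Y = a + bX + cR + dS + eT + fP.
slopes = dict(zip(["X", "R", "S", "T", "P"], fit(y, x, r, s, t, p)))
for name, slope in slopes.items():
    print(f"{name}: {slope:+.3f}")
```

In this setup the slope on $X$ should stay near zero unless the regression conditions on $T$ or $P$ without also blocking the chain from $X$, which matches the collider rule: conditioning on a collider or its descendant opens the path between $X$ and $Y$.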





### **2.4 d-Separation**



### **2.5 Model Testing and Causal Search**
