## Week 2: Confounding and Directed Acyclic Graphs

Confounders are often defined as variables that affect both the treatment and the outcome:
- If I assign treatment based on a coin flip, the coin flip affects treatment but should not affect the outcome, so it is not a confounder
- If people with a family history of cancer are more likely to develop cancer (the outcome), but family history was not a factor in the treatment decision, then family history is not a confounder
- If older people are at higher risk of cardiovascular disease (the outcome) and are more likely to receive statins (the treatment), then __age is a confounder__

Confounder Control involves:
- Identifying a set of variables X that will make the ignorability assumption hold (“sufficient to control for confounding”)
- Using statistical methods (covered later in the course) to control for these variables and estimate causal effects.

Causal Graphs
- Helpful for identifying which variables to control for
- Make assumptions explicit

A directed acyclic graph (DAG) will tell us:
- Which variables are independent from each other
- Which variables are conditionally independent from each other
- I.e. ways that we can factor and simplify the joint distribution
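
As a small worked case of such a factorization (a sketch: the three-node DAG $A \to B \to C$ and the notation $\mathrm{pa}(X_i)$ for the parents of $X_i$ are assumptions introduced here for illustration), a distribution that is compatible with a DAG factors into one term per node, conditional on that node's parents:

$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{pa}(X_i)\big)$$

For $A \to B \to C$ this gives

$$P(A, B, C) = P(A)\, P(B \mid A)\, P(C \mid B)$$

which encodes that $C$ is conditionally independent of $A$ given $B$.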


Compatibility between DAGs and Distributions
- DAGs that are compatible with a particular probability function are not necessarily unique


For example: 
- DAG 1: A -> B
- DAG 2: A <- B

Both DAGs convey that A and B are dependent, i.e.
- $P(A,B) \ne P(A) P(B)$
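
To make the non-uniqueness concrete, the sketch below takes a hypothetical joint distribution over binary A and B (the probabilities are made up for illustration) and checks that it can be rebuilt from either factorization: $P(A)P(B \mid A)$, matching A -> B, or $P(B)P(A \mid B)$, matching A <- B.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution P(A, B); keys are (a, b). Values are illustrative only.
joint = {
    (0, 0): Fraction(2, 5), (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10), (1, 1): Fraction(2, 5),
}

# Marginals P(A) and P(B)
p_a = {a: sum(joint[(a, b)] for b in (0, 1)) for a in (0, 1)}
p_b = {b: sum(joint[(a, b)] for a in (0, 1)) for b in (0, 1)}

# Factorization for A -> B: P(A) * P(B | A)
dag1 = {(a, b): p_a[a] * (joint[(a, b)] / p_a[a]) for a, b in product((0, 1), repeat=2)}
# Factorization for A <- B: P(B) * P(A | B)
dag2 = {(a, b): p_b[b] * (joint[(a, b)] / p_b[b]) for a, b in product((0, 1), repeat=2)}

# Both factorizations reproduce the same joint distribution
both_match = dag1 == joint and dag2 == joint
# A and B are dependent: P(A, B) differs from P(A) P(B) for at least one cell
dependent = any(joint[(a, b)] != p_a[a] * p_b[b] for a, b in joint)
```

The data alone cannot distinguish the two graphs here; choosing between them requires outside knowledge about the direction of the effect.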

### Types of Paths
Paths that induce association
- Forks (D <- E -> F)
- Chain (D -> E -> F)

Paths that do not induce association
- Inverted fork / Collider (D -> E <- F)


### Conditional Independence (d-separation)

Blocking: Paths can be blocked by conditioning on nodes in the path
- Fork (A <- G -> B)
- Chain (A -> G -> B)

In both cases:
- A and B are associated (marginally)
- Conditioning on G blocks the path, so A and B are conditionally independent given G
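
The blocking behaviour can be checked by exact enumeration. The sketch below builds a hypothetical fork A <- G -> B over binary variables (all probabilities are made-up assumptions) and compares marginal and conditional probabilities.

```python
from fractions import Fraction
from itertools import product

# Hypothetical fork A <- G -> B; the numbers are assumptions for illustration.
p_g1 = Fraction(1, 2)                                   # P(G = 1)
p_one_given_g = {0: Fraction(1, 4), 1: Fraction(3, 4)}  # P(A = 1 | G) and P(B = 1 | G)

def bernoulli(p_one, value):
    """Probability that a binary variable with P(1) = p_one takes `value`."""
    return p_one if value == 1 else 1 - p_one

# Joint distribution implied by the DAG: P(G) P(A | G) P(B | G); keys are (g, a, b)
joint = {
    (g, a, b): bernoulli(p_g1, g)
    * bernoulli(p_one_given_g[g], a)
    * bernoulli(p_one_given_g[g], b)
    for g, a, b in product((0, 1), repeat=3)
}

def prob(event, given=lambda g, a, b: True):
    """P(event | given), computed by summing over the joint table."""
    den = sum(p for k, p in joint.items() if given(*k))
    num = sum(p for k, p in joint.items() if given(*k) and event(*k))
    return num / den

p_b = prob(lambda g, a, b: b == 1)
p_b_given_a = prob(lambda g, a, b: b == 1, lambda g, a, b: a == 1)
p_b_given_g = prob(lambda g, a, b: b == 1, lambda g, a, b: g == 1)
p_b_given_a_g = prob(lambda g, a, b: b == 1, lambda g, a, b: a == 1 and g == 1)
```

Here `p_b_given_a` differs from `p_b` (A is informative about B marginally), while `p_b_given_a_g` equals `p_b_given_g` (once G is known, A adds nothing).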

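Conditioning on a collider has the opposite effect: it induces dependence between the collider's parents.
- Inverted fork / Collider (D -> E <- F): D and F are marginally independent, but become dependent once we condition on E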
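
A minimal sketch of this effect, assuming a hypothetical collider in which D and F are independent fair coin flips and E is 1 whenever D or F is 1 (the mechanism for E is an assumption chosen for illustration):

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)

# Joint over (d, f, e): D and F are independent fair coins, E = D or F (hypothetical mechanism)
joint = {(d, f, int(d or f)): half * half for d, f in product((0, 1), repeat=2)}

def prob(event, given=lambda d, f, e: True):
    """P(event | given), computed by summing over the joint table."""
    den = sum(p for k, p in joint.items() if given(*k))
    num = sum(p for k, p in joint.items() if given(*k) and event(*k))
    return num / den

# Without conditioning on E, F tells us nothing about D
p_d = prob(lambda d, f, e: d == 1)
p_d_given_f = prob(lambda d, f, e: d == 1, lambda d, f, e: f == 1)

# After conditioning on E = 1, learning F = 1 changes the probability of D = 1
p_d_given_e = prob(lambda d, f, e: d == 1, lambda d, f, e: e == 1)
p_d_given_e_f = prob(lambda d, f, e: d == 1, lambda d, f, e: e == 1 and f == 1)
```

Marginally `p_d` and `p_d_given_f` agree, but within the stratum E = 1 the values `p_d_given_e` and `p_d_given_e_f` differ, so D and F are dependent given E.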

A path is blocked (d-separated) by a set of nodes C if:
- It contains a chain (D -> E -> F) and the middle node E is in C, OR
- It contains a fork (D <- E -> F) and the middle node E is in C, OR
- It contains an inverted fork (D -> E <- F) and neither the middle node E nor any of its descendants is in C
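
The three rules can be sketched as a small path checker. This is an illustrative implementation under assumptions, not a library API: the function names are hypothetical, a DAG is represented as a set of `(parent, child)` pairs, and a path is a list of node names.

```python
def descendants(node, edges):
    """All nodes reachable from `node` by following directed edges."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found

def path_is_blocked(path, edges, conditioning):
    """True if `conditioning` blocks `path` according to the three rules above."""
    conditioning = set(conditioning)
    # Look at every interior node together with its two neighbours on the path
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        is_collider = (prev, mid) in edges and (nxt, mid) in edges
        if is_collider:
            # A collider blocks unless it, or one of its descendants, is conditioned on
            if mid not in conditioning and not (descendants(mid, edges) & conditioning):
                return True
        elif mid in conditioning:
            # Conditioning on the middle of a chain or fork blocks the path
            return True
    return False

# Hypothetical example graphs
chain = {("D", "E"), ("E", "F")}
fork = {("E", "D"), ("E", "F")}
collider = {("D", "E"), ("F", "E"), ("E", "H")}  # H is a descendant of the collider E

chain_blocked_by_e = path_is_blocked(["D", "E", "F"], chain, {"E"})
fork_open_unconditioned = not path_is_blocked(["D", "E", "F"], fork, set())
collider_blocked_unconditioned = path_is_blocked(["D", "E", "F"], collider, set())
collider_opened_by_descendant = not path_is_blocked(["D", "E", "F"], collider, {"H"})
```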


### Confounders in DAGs

<img src="./img/dags_confounding.png" >

What matters is not identifying specific confounders, but identifying a set of variables that are sufficient to control for confounding. We need to block backdoor paths from treatment to outcome.



### Frontdoor paths

A frontdoor path from A to Y is one that begins with an arrow emanating out of A. Frontdoor paths capture effects of treatment.


<img src="./img/dags_frontdoor.png" >

### Backdoor paths

Backdoor paths from treatment A to outcome Y are paths from A to Y that begin with an arrow going into A:


<img src="./img/dags_backdoor.png" >

Here, “A <- X -> Y” is a backdoor path from A to Y.

__Backdoor paths confound the relationship between treatment A and outcome Y.__

A set of variables X is sufficient to control for confounding if: 
- It blocks all backdoor paths from treatment to the outcome
- It does not include any descendants of treatment

This is the __backdoor path criterion__. A set of variables that satisfies it is __not necessarily unique__.
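
A minimal sketch of checking the criterion on a small graph follows. It is an assumption-laden illustration, not a reference implementation: all function names and the example DAG are hypothetical, the graph is a set of `(parent, child)` pairs, and paths are enumerated by brute force, which is only practical for small graphs.

```python
def descendants(node, edges):
    """All nodes reachable from `node` by following directed edges."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found

def path_is_blocked(path, edges, conditioning):
    """True if `conditioning` blocks `path` under the d-separation rules."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        if (prev, mid) in edges and (nxt, mid) in edges:  # collider
            if mid not in conditioning and not (descendants(mid, edges) & conditioning):
                return True
        elif mid in conditioning:  # middle of a chain or fork
            return True
    return False

def simple_paths(start, end, edges):
    """Yield every path from `start` to `end` without repeated nodes, ignoring direction."""
    neighbours = {}
    for parent, child in edges:
        neighbours.setdefault(parent, set()).add(child)
        neighbours.setdefault(child, set()).add(parent)
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            yield path
            continue
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in path:
                stack.append(path + [nxt])

def satisfies_backdoor(treatment, outcome, adjustment, edges):
    """Check both conditions of the backdoor path criterion for `adjustment`."""
    adjustment = set(adjustment)
    # Condition 2: no descendants of treatment in the adjustment set
    if adjustment & descendants(treatment, edges):
        return False
    # Condition 1: every path that starts with an arrow into treatment must be blocked
    for path in simple_paths(treatment, outcome, edges):
        is_backdoor = (path[1], path[0]) in edges
        if is_backdoor and not path_is_blocked(path, edges, adjustment):
            return False
    return True

# Hypothetical DAG: backdoor path A <- X -> W -> Y, frontdoor path A -> M -> Y
edges = {("X", "A"), ("X", "W"), ("W", "Y"), ("A", "M"), ("M", "Y")}

nothing_ok = satisfies_backdoor("A", "Y", set(), edges)
x_ok = satisfies_backdoor("A", "Y", {"X"}, edges)
w_ok = satisfies_backdoor("A", "Y", {"W"}, edges)
x_and_m_ok = satisfies_backdoor("A", "Y", {"X", "M"}, edges)
```

In this example both `{"X"}` and `{"W"}` satisfy the criterion, which illustrates the non-uniqueness, while `{"X", "M"}` fails because M is a descendant of treatment.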


__Disjunctive cause criterion__: Control for all observed causes of the exposure, the outcome or both. 
- Does not always select the smallest set of variables to control for
- It is conceptually simpler
- Guaranteed to select a set of variables that is sufficient to control for confounding IF both:
 - Such a set exists among the observed variables
 - We correctly identify all of the observed causes of A and Y
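
As a rough sketch of how the selection rule works, the code below picks every observed variable that is a cause (ancestor) of treatment or outcome. Several things here are assumptions for illustration: the function names, the example DAG with unobserved U1 and U2, and the premise that `observed` contains only pre-treatment covariates.

```python
def ancestors(node, edges):
    """All nodes from which `node` can be reached along directed edges."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if child == current and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

def disjunctive_cause_set(treatment, outcome, observed, edges):
    """Observed variables that cause the treatment, the outcome, or both.

    Assumes `observed` holds pre-treatment covariates only.
    """
    causes = ancestors(treatment, edges) | ancestors(outcome, edges)
    return (set(observed) & causes) - {treatment, outcome}

# Hypothetical DAG: W causes A, V causes Y, and M is a collider between
# the unobserved variables U1 (a cause of A) and U2 (a cause of Y).
edges = {
    ("W", "A"), ("V", "Y"), ("A", "Y"),
    ("U1", "A"), ("U1", "M"), ("U2", "M"), ("U2", "Y"),
}
observed = {"W", "V", "M"}

selected = disjunctive_cause_set("A", "Y", observed, edges)
```

In this graph `selected` contains W and V but not M, since M is not a cause of either A or Y.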
