## Week 5: Instrumental Variables:

Instrumental variables (IVs) are used to estimate causal effects when there is unmeasured confounding.

Notation:
- A is Treatment
- Y is outcome
- X is measured confounder(s)
- U is unmeasured confounder(s)


### Confounding diagram

<img src="./img/IV_confounding.png" >

- If X is observed, we can analyze data using:
  - Matching
  - Propensity score matching
  - Inverse probability of treatment weighting (IPTW)
 
However, if there are unmeasured confounding variables that we cannot control for, this poses a problem.

### Unmeasured Confounding Diagram:

<img src="./img/IV_unmeasured_confounding.png" >

- Implications: the ignorability assumption is violated. Even if we condition on X, treatment would still not be effectively randomised.
  - The potential outcomes are not independent of treatment given X: $Y^0, Y^1 \not\perp A \mid X$


### Instrumental Variables Diagram:

<img src="./img/IV_instrumental_variables.png" >

- Z is an IV: It affects treatment but does not directly affect the outcome
- It can be thought of as __an encouragement which is randomized__.
  - For higher values of Z, subjects are more likely to get the treatment.
- Implications: some part of treatment is being explained by something that is random


### Terminology:

Intention-to-treat (ITT) analysis focuses on the __causal effect of encouragement Z on outcome Y__:
- $E(Y^{Z=1}) - E(Y^{Z=0})$   

Note that this is not the same as the __causal effect of treatment A on outcome Y__: 
- $E(Y^{A=1}) - E(Y^{A=0})$

Setup (Randomized trial)
- Z: randomization to treatment (1 if randomized to treatment, 0 otherwise)
- A: treatment received (1 if received treatment, 0 otherwise)
- Y: outcome

Potential Treatment:
- Observed data: (Z, A, Y)
- Each subject has two potential values of treatment:
  - Value of treatment if randomized to Z = 1: $A^{Z=1} = A^1$
  - Value of treatment if randomized to Z = 0: $A^{Z=0} = A^0$


### Compliance & Non Compliance:
Non-compliance can come from confounding variables, and thus it is essential to keep track of these variables (as typically represented by X).

### Causal Effect of Assignment on Receipt
Average causal effect of treatment assignment on treatment received is $E(A^1 - A^0)$. 

This represents the proportion treated if everyone had been assigned to receive treatment minus the proportion treated if no one had been assigned to receive the treatment. 
- If there is perfect compliance (everybody did what they were told), this would be equal to 1.

To calculate it, we can estimate it from observed data (by __randomization__ and __consistency__):
- $E(A^1) = E(A|Z=1)$
- $E(A^0) = E(A|Z=0)$ 

Randomization lets us replace the causal quantities (defined with potential outcomes) with statistical associations (conditional expectations of observed data): because Z is randomized, the subjects with Z = 1 (or with Z = 0) are representative of the general population.
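As a minimal sketch of this estimate, the snippet below computes the sample analogues of $E(A|Z=1)$ and $E(A|Z=0)$ and takes their difference. The `records` list and the helper name `mean_by_assignment` are made up for illustration and are not part of the course material.

```python
# Hypothetical observed records (Z, A): assignment and treatment received.
records = [(1, 1), (1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1), (0, 0)]

def mean_by_assignment(pairs, z):
    """Sample analogue of E(value | Z = z) for a list of (Z, value) pairs."""
    values = [v for zi, v in pairs if zi == z]
    return sum(values) / len(values)

p_treated_z1 = mean_by_assignment(records, 1)  # estimates E(A^1) = E(A | Z=1)
p_treated_z0 = mean_by_assignment(records, 0)  # estimates E(A^0) = E(A | Z=0)

# Estimated causal effect of assignment on treatment received.
effect_on_receipt = p_treated_z1 - p_treated_z0
print(effect_on_receipt)
```

In this toy data 3 of 4 subjects assigned to treatment took it and 1 of 4 assigned to control took it, so the estimated effect of assignment on receipt is well below the value of 1 that perfect compliance would give.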

### Causal Effect of Assignment on Outcome
The average causal effect of treatment assignment on the outcome is $E(Y^{Z=1} - Y^{Z=0})$, also known as the ITT effect.

This represents the average values of the outcomes if everyone had been assigned to receive treatment minus the average outcome if no one had been assigned to receive treatment.
- If there is perfect compliance, this would be equal to the causal effect of treatment $E(Y^{A=1} - Y^{A=0})$.

To calculate it, we can estimate it from observed data (by __randomization__ and __consistency__):
- $E(Y^{Z=1}) = E(Y|Z=1)$
- $E(Y^{Z=0}) = E(Y|Z=0)$
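The ITT estimate can be sketched the same way: compare mean outcomes by assignment Z and ignore treatment received A entirely. The `(Z, A, Y)` values below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Hypothetical observed records (Z, A, Y).
records = [
    (1, 1, 5.0), (1, 1, 7.0), (1, 0, 2.0), (1, 0, 2.0),
    (0, 0, 3.0), (0, 0, 1.0), (0, 0, 2.0), (0, 1, 6.0),
]

def mean_outcome(rows, z):
    """Sample analogue of E(Y | Z = z); treatment received A is not used."""
    outcomes = [y for zi, _, y in rows if zi == z]
    return sum(outcomes) / len(outcomes)

# ITT: effect of being assigned to treatment, regardless of compliance.
itt = mean_outcome(records, 1) - mean_outcome(records, 0)
print(itt)
```

Because the comparison is by Z only, randomization is preserved even though some subjects did not follow their assignment.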


### Compliance classes
We can think of the compliance classes as subpopulations of people, as illustrated by the following table:


|$A^0$|$A^1$|Label|Treatment Assignment Implications|
|-|-|-|-|
|0|0|Never-takers|No variation|
|0|1|Compliers|Randomised assignment|
|1|0|Defiers|Randomised assignment but opposite|
|1|1|Always-takers|No variation|

The key motivation for using IV methods is to tackle possible unmeasured confounding. If there is unmeasured confounding, we cannot condition on or marginalize over all confounders (via matching, IPTW, etc.) since some of them are not measured.

However, IV methods do not focus on the average causal effect on the population, but on a __local average treatment effect__ (namely on the compliers subpopulation).


### Local Average Treatment Effect (LATE) and its derivation
Target of inference: 

$E(Y^{Z=1}|A^0 = 0, A^1 = 1) - E(Y^{Z=0}|A^0 = 0, A^1 = 1)$

This is a valid causal effect since it is __a contrast of potential outcomes__ (represented by $Y^{Z=1}, Y^{Z=0}$) __on the same subpopulation__ (represented by “$A^0 = 0, A^1 = 1$”).

$E(Y^{Z=1}|A^0 = 0, A^1 = 1) - E(Y^{Z=0}|A^0 = 0, A^1 = 1) = E(Y^{Z=1} - Y^{Z=0}| compliers)$

“Local” implies it is restricted to a subpopulation of the general population, and in this case, we are concerned with the “compliers” subpopulation. By restricting it to compliers, we can rewrite the potential outcomes notation as:

$E(Y^{Z=1} - Y^{Z=0}| compliers) = E(Y^{A=1} - Y^{A=0}| compliers)$

This implies that the LATE can be reduced from the causal effect of treatment assignment on outcomes to the causal effect of treatment received on outcomes.

This is also known as the __Complier Average Causal Effect (CACE)__. 

In summary, 
- this is a causal effect in the "compliers" subpopulation (which gives the term “local”), and 
- there is no inference about the other compliance classes (ie defiers, always-takers, or never-takers).




### Compliance Classes with Observed Data

For each person/treatment unit, we observe only one treatment assignment Z and one treatment received A. By consistency, A equals $A^1$ if Z = 1 and $A^0$ if Z = 0, so we observe exactly one of ($A^0, A^1$) and never both. Thus, we cannot know for certain which compliance class a subject belongs to.



|Z|A|$A^0$|$A^1$|Class|
|-|-|-|-|-|
|0|0|0|<span style="color:blue">?</span>|Never-takers or Compliers|
|0|1|1|<span style="color:blue">?</span>|Always-takers or Defiers|
|1|0|<span style="color:blue">?</span>|0|Never-takers or Defiers|
|1|1|<span style="color:blue">?</span>|1|Always-takers or Compliers|

### Identifiability
Compliance classes are also known as principal strata that we can stratify the population on. However, these classes are latent (not directly observed).


### Assumptions of IVs
A variable is an instrumental variable (IV) if:
- It is associated with the treatment  
<img src="./img/IV_assumptions_1.png" >  
- It affects the outcome only through its effect on treatment. Also known as the exclusion restriction. It cannot affect outcome directly, or indirectly through unmeasured confounders.  
<img src="./img/IV_assumptions_2.png" >  

### Identification Challenge
Based on the observed data (and only one observed outcome for each treatment unit), we cannot identify exactly who the compliers are.

|Z|A|$A^0$|$A^1$|Class|
|-|-|-|-|-|
|0|0|0|<span style="color:blue">?</span>|Never-takers or Compliers|
|0|1|1|<span style="color:red">1</span>|Always-takers <s style="color:red"> or Defiers  </s>|
|1|0|<span style="color:red">0</span>|0|Never-takers <s style="color:red"> or Defiers  </s>|
|1|1|<span style="color:blue">?</span>|1|Always-takers or Compliers|

To tackle that, we must assume __monotonicity__ (which means that there are no defiers). 
- Key essence is that no one consistently does the opposite of what they are told. 
- The term 'monotonicity' is given because of the assumption that the probability of treatment received should increase with more encouragement. 
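The two tables above can be reproduced with a short enumeration: for an observed (Z, A) pair, list every compliance class whose potential treatments are consistent with it, optionally ruling out defiers. This is only an illustrative sketch; the names `LABELS` and `possible_classes` are hypothetical.

```python
# Compliance classes keyed by their potential treatments (A^0, A^1).
LABELS = {
    (0, 0): "never-taker",
    (0, 1): "complier",
    (1, 0): "defier",
    (1, 1): "always-taker",
}

def possible_classes(z, a, monotonicity=False):
    """Compliance classes consistent with one observed (Z, A) pair."""
    classes = []
    for (a0, a1), label in LABELS.items():
        observed = a1 if z == 1 else a0  # consistency: A = A^Z
        if observed != a:
            continue
        if monotonicity and label == "defier":
            continue  # monotonicity assumes there are no defiers
        classes.append(label)
    return classes

for z, a in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(z, a, possible_classes(z, a), possible_classes(z, a, monotonicity=True))
```

With `monotonicity=True`, the pairs (Z=0, A=1) and (Z=1, A=0) each map to a single class, while the other two pairs remain ambiguous, which matches the second table.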


### Identification of Causal Effect:

Our goal is to estimate Compliers Average Causal Effect: 

$E(Y^{A=1} - Y^{A=0} | compliers)$

To do that, we have to identify the ITT effect:

$E(Y^{Z=1} - Y^{Z=0}) = E(Y | Z = 1) - E(Y | Z = 0)$

Breaking it down into the 3 subpopulations that remain under monotonicity (no defiers),  
$E(Y|Z = 1) =$ 

$E(Y|Z=1, always\ takers) P(always\ takers) +$  
$E(Y|Z=1, never\ takers) P(never\ takers) +$  
$E(Y|Z=1, compliers) P(compliers)$  

__Key assumption:__ Among always-takers and never-takers, treatment assignment Z does not change the treatment received: always-takers have A = 1 and never-takers have A = 0 whatever the value of Z.

By the exclusion restriction, Z affects Y only through A, so within these two classes the outcome does not depend on Z either. Thus, for Z = 1 (and likewise for Z = 0):
- <span style="color:blue">$E(Y|Z=1, always\ takers) = E(Y|always\ takers)$ </span>
- <span style="color:orange">$E(Y|Z=1, never\ takers) = E(Y|never\ takers)$ </span>

Therefore,

$E(Y|Z = 1) =$  
<span style="color:blue">$E(Y|always\ takers) P(always\ takers) + $  </span>  
<span style="color:orange">$E(Y|never\ takers) P(never\ takers) +$  </span>  
<span style="color:green">$E(Y|Z=1, compliers) P(compliers)$  </span>

$E(Y|Z = 0) = $  
<span style="color:blue">$E(Y|always\ takers) P(always\ takers) + $  </span>  
<span style="color:orange">$E(Y|never\ takers) P(never\ takers) +$  </span>  
<span style="color:green">$E(Y|Z=0, compliers) P(compliers)$  </span>



Therefore:

$E(Y|Z = 1) - E(Y|Z = 0) = $  
<span style="color:green">$E(Y|Z=1, compliers) P(compliers)$</span> - <span style="color:green">$E(Y|Z=0, compliers) P(compliers)$</span>

Rewriting the above equation with algebraic manipulation,

$\frac{E(Y|Z = 1) - E(Y|Z = 0)}{P(compliers)} = E(Y|Z=1, compliers) - E(Y|Z=0, compliers)$


Looking at the RHS, we see that we can rewrite it in terms of potential outcomes Y based on treatment received A (since we are dealing with the compliers subpopulation). 
- In this context, we are directly randomizing treatment received A via randomizing treatment assignment Z due to the “compliers” behaviour.
- For compliers, if encouragement Z = 1 then treatment A = 1, so the observed outcome is the potential outcome $Y^{A=1}$; likewise, Z = 0 gives A = 0 and $Y^{A=0}$.

$E(Y|Z=1, compliers) - E(Y|Z=0, compliers) =  E(Y^{A=1} - Y^{A=0}| compliers)$

Thus, the RHS expression is equivalent to the CACE.

CACE = $\frac{E(Y|Z = 1) - E(Y|Z = 0)}{P(compliers)}$

Note that the denominator $P(compliers) = E(A|Z=1) - E(A|Z=0)$
- $E(A|Z=1)$: proportion of people who are always-takers or compliers
- $E(A|Z=0)$: proportion of people who are always-takers (since there are no defiers).

CACE = $\frac{E(Y|Z = 1) - E(Y|Z = 0)}{E(A|Z=1) - E(A|Z=0)}$

Essentially, the CACE is the ratio of the __"causal effect of treatment assignment Z on the outcome Y" (the ITT)__ to the __"causal effect of treatment assignment Z on the treatment received A"__.
- Under monotonicity, the denominator is always between 0 and 1. Thus, the CACE is at least as large in magnitude as the ITT.
- The ITT underestimates the magnitude of the CACE since some people assigned to treatment will not receive treatment.
- If there is perfect compliance (denominator = 1), CACE = ITT
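To see the ratio at work, here is a small simulation sketch in which the compliance classes are known by construction. Everything in it is an assumption made for illustration: the class proportions (50% compliers, 25% always-takers, 25% never-takers, no defiers), a constant treatment effect of 2.0, and class-specific baselines that play the role of unmeasured confounding.

```python
import random

random.seed(0)
n = 20000
true_cace = 2.0  # assumed effect of treatment received on the outcome

rows = []
for _ in range(n):
    u = random.random()
    cls = "complier" if u < 0.5 else ("always" if u < 0.75 else "never")
    z = random.randint(0, 1)  # randomized encouragement
    # Treatment received: compliers follow Z, the other classes ignore it.
    a = 1 if cls == "always" else 0 if cls == "never" else z
    # Baseline outcome differs by class (acts like unmeasured confounding).
    baseline = {"complier": 0.0, "always": 1.0, "never": -1.0}[cls]
    y = baseline + true_cace * a + random.gauss(0, 1)
    rows.append((z, a, y))

def arm_mean(index, z):
    """Mean of column `index` (1 = A, 2 = Y) among subjects with Z = z."""
    values = [r[index] for r in rows if r[0] == z]
    return sum(values) / len(values)

itt = arm_mean(2, 1) - arm_mean(2, 0)          # E(Y|Z=1) - E(Y|Z=0)
p_compliers = arm_mean(1, 1) - arm_mean(1, 0)  # E(A|Z=1) - E(A|Z=0)
cace = itt / p_compliers
print(f"ITT = {itt:.2f}, P(compliers) = {p_compliers:.2f}, CACE = {cace:.2f}")
```

Under these assumptions the ITT should come out at roughly half of the assumed effect, and dividing by the estimated complier proportion should bring the estimate back close to 2.0, up to sampling noise.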


### Two-Stage Least Squares (2SLS)
2SLS is typically adopted for IV analysis.
- Regress treatment received A on the treatment assignment Z ($A \sim \ Z$)
- Regress outcome Y on predicted treatment received A_hat ($Y \sim \hat{A}$)


<u>Single stage least squares fails:</u>

In the context of OLS with a simple model (Y is outcome, A is treatment):  

$Y_i = β_0 + A_iβ_1 + ε_i$

The model assumes that the error term $ε_i$ and the covariate $A_i$ are independent. However, if there is confounding involved, $A_i$ and $ε_i$ are correlated, so the OLS estimate of $β_1$ is biased.
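The following simulation sketch illustrates the problem. It assumes a single unmeasured confounder U that raises both the chance of treatment and the outcome, with a true treatment effect of 1.0; these numbers and the helper `ols_slope` are hypothetical choices for illustration.

```python
import random

random.seed(1)
n = 20000
true_effect = 1.0  # assumed causal effect of A on Y

a_vals, y_vals = [], []
for _ in range(n):
    u = random.gauss(0, 1)                      # unmeasured confounder
    a = 1 if u + random.gauss(0, 1) > 0 else 0  # U raises the chance of treatment
    y = true_effect * a + 2.0 * u + random.gauss(0, 1)  # U also raises Y
    a_vals.append(a)
    y_vals.append(y)

def ols_slope(x, y):
    """Slope of the simple regression of y on x: cov(x, y) / var(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

naive = ols_slope(a_vals, y_vals)
print(f"true effect = {true_effect}, naive OLS slope = {naive:.2f}")
```

Because U is absorbed into the error term and is correlated with A, the naive slope overstates the assumed effect in this setup.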

With 2SLS, we assume Z is a valid IV, that is, it is associated with the treatment and satisfies the exclusion restriction.

<u>Stage 1:</u>  
Regress treatment received A on the instrumental variable Z, where the error term has mean 0 and constant variance. Also, by randomization, $Z_i$ and $ε_i$ are independent.  

$A_i = α_0 + Z_iα_1 + ε_i$

Obtain predicted value of A ($\hat{A}$) given Z for each subject _i_.

<u>Stage 2:</u>  
Regress the outcome Y on the fitted values from Stage 1 $\hat{A}$. 

$Y_i = β_0 +  \hat{A}_iβ_1 + ε_i$

By exclusion restriction, Z is independent of Y given $\hat{A}$. Thus, $ε_i$ has mean 0 and constant variance.

Estimate of $β_1$ is the estimate of the local average treatment effect.

$β_1$ = CACE = $\frac{E(Y|Z = 1) - E(Y|Z = 0)}{E(A|Z=1) - E(A|Z=0)} = \frac{E(Y|Z = 1) - E(Y|Z = 0)}{α_1}$,

where $E(A|Z=1) - E(A|Z=0)$ is the slope of the stage 1 model ($α_1$).


Therefore, ITT = $E(Y|Z = 1) - E(Y|Z = 0) = β_1 α_1$
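The two stages can be sketched by hand with simple least squares on a small hypothetical dataset, to check that the stage 2 slope matches the ratio above. The data and the helper `fit_line` are illustrative assumptions. This gives the point estimate only: standard errors taken from a manually run second stage are not valid, so a dedicated IV routine is the better choice for inference.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical data: assignment Z, treatment received A, outcome Y.
z = [1, 1, 1, 1, 0, 0, 0, 0]
a = [1, 1, 1, 0, 0, 0, 1, 0]
y = [5.0, 7.0, 6.0, 2.0, 3.0, 1.0, 6.0, 2.0]

# Stage 1: regress A on Z and compute fitted values A_hat.
alpha0, alpha1 = fit_line(z, a)
a_hat = [alpha0 + alpha1 * zi for zi in z]

# Stage 2: regress Y on A_hat; the slope estimates the CACE.
beta0, beta1 = fit_line(a_hat, y)

# ITT computed directly as a difference in mean outcomes by assignment.
y1 = [yi for zi, yi in zip(z, y) if zi == 1]
y0 = [yi for zi, yi in zip(z, y) if zi == 0]
itt = sum(y1) / len(y1) - sum(y0) / len(y0)

print(alpha1, beta1, itt)
```

Here the stage 1 slope is 0.5, the ITT is 2.0, and the stage 2 slope is 4.0, so ITT = $β_1 α_1$ holds in this toy example.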

### Strength of IVs
A strong instrument is highly predictive of treatment received; a weak instrument is only weakly predictive of it.

We can estimate the strength by estimating the proportion of compliers $E(A|Z=1) - E(A|Z=0)$.
- If the complier proportion is close to 1, it is a strong instrument.
- If the complier proportion is close to 0, it is a weak instrument.

Weak instruments (when the denominator is small) lead to very large variance in the estimate of the CACE, so the estimate of the causal effect can be unstable. Mathematically, the proportion of compliers is the denominator of the CACE and the ITT is the numerator, so dividing by a number close to 0 magnifies any sampling error in the ITT.
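A rough simulation sketch of this instability: repeat a small trial many times with a strong instrument (80% compliers) and with a weak one (5% compliers), then compare the spread of the resulting CACE estimates. The sample size, complier proportions, effect size, and the function name `wald_estimate` are all assumptions chosen for illustration; non-compliers are modelled as never-takers only.

```python
import random
import statistics

def wald_estimate(n, p_complier, true_cace, rng):
    """One simulated trial: compliers with prob. p_complier, the rest never-takers."""
    # Per arm: [sum of Y, sum of A, count].
    sums = {0: [0.0, 0.0, 0], 1: [0.0, 0.0, 0]}
    for _ in range(n):
        z = rng.randint(0, 1)                    # randomized assignment
        a = z if rng.random() < p_complier else 0
        y = true_cace * a + rng.gauss(0, 1)
        arm = sums[z]
        arm[0] += y
        arm[1] += a
        arm[2] += 1
    itt = sums[1][0] / sums[1][2] - sums[0][0] / sums[0][2]
    first_stage = sums[1][1] / sums[1][2] - sums[0][1] / sums[0][2]
    return itt / first_stage  # ratio (Wald) estimate of the CACE

rng = random.Random(0)
strong = [wald_estimate(1000, 0.80, 1.0, rng) for _ in range(200)]
weak = [wald_estimate(1000, 0.05, 1.0, rng) for _ in range(200)]

spread_strong = statistics.pstdev(strong)
spread_weak = statistics.pstdev(weak)
print(f"sd of estimates: strong = {spread_strong:.2f}, weak = {spread_weak:.2f}")
```

The ITT is estimated with similar precision in both settings, but the weak-instrument estimates are divided by a much smaller number, so their spread is far larger.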
