# Instrumental variables

### What? 

- A method to identify the causal effect of one variable (the treatment, x) on another (the outcome, y) 
- Example: Effect of education (x) on salary (y)
$$y= a+b x$$

### Why?

- Why can't we just run a simple regression? 
    - A simple regression may give biased answers because of unobserved variables that affect both the treatment and the outcome but are not included in the regression (omitted variable bias)
    - In other words: This method is most commonly used when you strongly suspect one or more such missing variables
- Example: We do not have information about IQ, and IQ is likely to affect both education and salary
    - Would like to run $$\text{salary} = a + b_1\, \text{iq} + b_2\, \text{education}$$
    - But without information about IQ we have to leave it out, and then the coefficient on education ($b_2$) will be biased, i.e. it captures not only the effect of education but also the effect of IQ
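
To see the bias concretely, here is a small simulation sketch in Python with NumPy. The data are made up: the coefficients (IQ raising both education and salary, a true education effect of 2.0) are assumptions chosen for illustration, and `ols` is a hypothetical helper, not a library function.

```python
import numpy as np

def ols(y, *regressors):
    """Least-squares coefficients [a, b_1, b_2, ...] for y on an intercept plus regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated data (assumed numbers): IQ raises both education and salary
rng = np.random.default_rng(0)
n = 100_000
iq = rng.normal(100, 15, n)
education = 12 + 0.1 * (iq - 100) + rng.normal(0, 2, n)         # years of schooling
salary = 20 + 2.0 * education + 0.5 * iq + rng.normal(0, 5, n)  # true b_2 = 2.0

b_with_iq = ols(salary, iq, education)[2]  # salary = a + b_1 iq + b_2 education
b_without_iq = ols(salary, education)[1]   # IQ omitted: education picks up part of IQ's effect
print(f"with IQ: {b_with_iq:.2f}, without IQ: {b_without_iq:.2f}")
```

With these assumed numbers the regression with IQ comes out close to the true 2.0, while the regression without IQ lands around 3.8, because education partly stands in for IQ.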

### Solution

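- Find a variable that is correlated with the treatment, but that affects the outcome only through the treatment (no direct effect on the outcome, and unrelated to the unobserved variables such as IQ)
- This variable is called 'the instrument' and is often labelled z.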
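- First step: Regress the treatment variable on the instrument
$$x = a + b z$$
- Second step: Regress the outcome on the predicted value of the treatment variable ($\hat{x}$) from step one
$$y = a + b\hat{x}$$
- The coefficient from this second step is the IV estimate of the effect of education on salary
    - The procedure is called two-stage least squares (2SLS). The estimate is consistent (it gets close to the true effect in large samples), but it is not exactly unbiased in small samples
    - If you run the two steps by hand, the coefficient is right but the standard errors reported by the second regression are not; dedicated IV software corrects them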
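
A minimal sketch of the two steps, again on simulated data with NumPy. The instrument `z`, the effect sizes and the `ols` helper are assumptions for illustration; IQ is generated but deliberately left out of the regressions to mimic an unobserved confounder.

```python
import numpy as np

def ols(y, *regressors):
    """Least-squares coefficients [a, b, ...] for y on an intercept plus regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(1)
n = 100_000
iq = rng.normal(100, 15, n)    # unobserved confounder (never used in a regression)
z = rng.normal(0, 1, n)        # hypothetical instrument: moves education only
education = 12 + 1.0 * z + 0.1 * (iq - 100) + rng.normal(0, 1, n)
salary = 20 + 2.0 * education + 0.5 * iq + rng.normal(0, 5, n)  # true effect 2.0

# Naive regression: biased because IQ is omitted
b_ols = ols(salary, education)[1]

# First step: x = a + b z, keep the predicted values x_hat
a1, b1 = ols(education, z)
education_hat = a1 + b1 * z

# Second step: y = a + b x_hat
b_2sls = ols(salary, education_hat)[1]
print(f"OLS: {b_ols:.2f}, 2SLS: {b_2sls:.2f}")
```

The naive regression overshoots (around 4.6 with these numbers) while the two-step estimate lands near 2.0. In practice you would use a dedicated IV routine (for example `IV2SLS` in the `linearmodels` Python package) rather than two manual regressions.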
    

### Example

- The month you are born may affect how many months of education you get (you cannot quit school before you turn 16, and people turn 16 in different months, but they all start school in the same month), but the month of birth itself is unlikely to affect the salary you get.
- Step one: Regress education on birth month
- Step two: Regress salary on the predicted education based on month of birth
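
With a single instrument, the two steps collapse to a simple ratio: the effect of the instrument on the outcome divided by its effect on the treatment, $b = \operatorname{cov}(z, y) / \operatorname{cov}(z, x)$. When the instrument is binary this is called the Wald estimator. The sketch below assumes a made-up binary instrument `born_early` (1 if born early in the year) instead of twelve birth months, and invented effect sizes.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
ability = rng.normal(0, 1, n)           # unobserved confounder
born_early = rng.integers(0, 2, n)      # hypothetical binary instrument z
# Assumption: being born early in the year cuts schooling by 0.5 years on average
education = 12 - 0.5 * born_early + 1.0 * ability + rng.normal(0, 1, n)
salary = 20 + 2.0 * education + 3.0 * ability + rng.normal(0, 5, n)  # true effect 2.0

# Wald estimator: difference in mean outcome over difference in mean treatment
z1, z0 = born_early == 1, born_early == 0
wald = (salary[z1].mean() - salary[z0].mean()) / (education[z1].mean() - education[z0].mean())

# Same thing written as cov(z, y) / cov(z, x)
cov_ratio = np.cov(born_early, salary)[0, 1] / np.cov(born_early, education)[0, 1]
print(f"Wald: {wald:.3f}, cov ratio: {cov_ratio:.3f}")
```

Both numbers agree up to floating-point rounding and land close to the assumed effect of 2.0.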

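### Assumptions

- The treatment variable and the instrument have to be correlated (correlation between x and z). This is called relevance.
- The instrument affects the outcome only through the treatment: no direct effect of z on y, and no correlation between z and the unobserved variables (like IQ). This is called the exclusion restriction. Note that z and y will still be correlated, but only because z moves x.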
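
Relevance can be checked in the data. A common check is the F-statistic of the instrument in the first-step regression; a widely used rule of thumb treats values below about 10 as a sign of a weak instrument. The sketch below computes it by hand with NumPy for a single instrument (where F equals the squared t-statistic) on simulated data; `first_stage_f` is a hypothetical helper.

```python
import numpy as np

def first_stage_f(x, z):
    """F-statistic for z in the first-step regression x = a + b z (single instrument only)."""
    n = len(x)
    Z = np.column_stack([np.ones(n), z])
    coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
    resid = x - Z @ coef
    sigma2 = resid @ resid / (n - 2)                      # residual variance
    se_b = np.sqrt(sigma2 / np.sum((z - z.mean()) ** 2))  # standard error of b
    return (coef[1] / se_b) ** 2                          # one instrument: F = t^2

# Simulated data (assumed numbers): one strong and one weak instrument
rng = np.random.default_rng(3)
n = 1_000
z = rng.normal(0, 1, n)
x_strong = 0.5 * z + rng.normal(0, 1, n)
x_weak = 0.01 * z + rng.normal(0, 1, n)

f_strong = first_stage_f(x_strong, z)
f_weak = first_stage_f(x_weak, z)
print(f"strong: F = {f_strong:.1f}, weak: F = {f_weak:.1f}")
```

With these assumed numbers the strong instrument gives an F far above 10 and the weak one an F well below it.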
   

### Advantage

- Requires very little information to identify a causal effect. 
- We can ignore all potential confounding variables (like IQ) as long as we have a valid instrument. 
- Almost magic!
    

### Extensions

- **Case 1: Have information about some confounders, but not all. Solution: Use what you have!**
    - First step: Regress the treatment variable on the instrument and the confounders you have information about ($x_1$, $x_2$)
    $$x = a + b_0 z + b_1 x_1 + b_2 x_2 $$
    
    - Second step: Regress the outcome on the predicted value of the treatment variable from step one and the same confounders
    
    $$y = a + b\hat{x} + b_1 x_1 + b_2 x_2$$
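
A sketch of Case 1 on simulated data with NumPy. The observed confounder `parent_income` (playing the role of $x_1$), the effect sizes and the helpers are all assumptions. In this setup the instrument is only valid once parental income is controlled for, so the observed confounder has to enter both steps.

```python
import numpy as np

def ols(y, *regressors):
    """Least-squares coefficients [a, b, ...] for y on an intercept plus regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def fitted(y, *regressors):
    """Predicted values from the regression of y on an intercept plus regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return X @ ols(y, *regressors)

rng = np.random.default_rng(4)
n = 100_000
parent_income = rng.normal(0, 1, n)              # observed confounder (hypothetical x_1)
iq = rng.normal(0, 1, n)                         # unobserved confounder
z = 0.5 * parent_income + rng.normal(0, 1, n)    # instrument, related to income
education = 12 + 1.0 * z + 1.0 * parent_income + 1.0 * iq + rng.normal(0, 1, n)
salary = 20 + 2.0 * education + 3.0 * parent_income + 3.0 * iq + rng.normal(0, 5, n)

# First step: treatment on instrument AND observed confounder
education_hat = fitted(education, z, parent_income)
# Second step: outcome on predicted treatment AND the same confounder
b_with_control = ols(salary, education_hat, parent_income)[1]

# For comparison: leave the observed confounder out of both steps
b_without_control = ols(salary, fitted(education, z))[1]
print(f"with control: {b_with_control:.2f}, without: {b_without_control:.2f}")
```

With these numbers the estimate with the control is close to the true 2.0 and the one without is around 2.86.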
 
 
- **Case 2: Have more than one instrument ($z_1$ and $z_2$). Solution: Use all!**
    - First step: Regress the treatment variable on both instruments: 
    $$x = a + b_0 z_1 + b_1 z_2 $$
    - Second step: Regress the outcome on the predicted value of the treatment variable from step one: 
    $$y = a + b\hat{x} $$
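
Case 2 in the same style: the first step simply has both instruments on the right-hand side. The instruments `z1`, `z2`, the `ols` helper and all effect sizes are simulated assumptions.

```python
import numpy as np

def ols(y, *regressors):
    """Least-squares coefficients [a, b, ...] for y on an intercept plus regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(5)
n = 100_000
iq = rng.normal(0, 1, n)      # unobserved confounder
z1 = rng.normal(0, 1, n)      # hypothetical instrument 1
z2 = rng.normal(0, 1, n)      # hypothetical instrument 2
education = 12 + 0.6 * z1 + 0.4 * z2 + 1.0 * iq + rng.normal(0, 1, n)
salary = 20 + 2.0 * education + 3.0 * iq + rng.normal(0, 5, n)  # true effect 2.0

# First step: x = a + b_0 z_1 + b_1 z_2, keep the predicted values
a, b0, b1 = ols(education, z1, z2)
education_hat = a + b0 * z1 + b1 * z2

# Second step: y = a + b x_hat
b_2sls = ols(salary, education_hat)[1]
print(f"2SLS with two instruments: {b_2sls:.2f}")
```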

### Problems

**1. Often difficult to find a good instrument**
   - But some common instruments are:
        - Situations where a lottery is used (to select who gets health insurance, who has to serve in the army and so on)
        - More and more common: use genetic information as an instrument (Mendelian randomization)
            - Example: Effect of obesity on some outcome
            - Assume: We have information about genes that are likely to affect obesity, but not (directly) the outcome
        - Institutional rules and structures 
            - Example: The rule that you can quit school when you turn 16
            - Interesting note: This means that knowledge of history, politics and institutions is an advantage when trying to find instruments! A creative process.

**2. Cannot test the second assumption that there is no direct relationship between the instrument and the outcome**
   - Maybe month of birth really affects salary?
   - Lesson: A credible instrument requires strong theoretical justification and knowledge of causal mechanisms
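
Why the data can't settle this with one instrument: the IV estimate is chosen exactly so that the instrument is uncorrelated with the residuals. The NumPy simulation below (invented numbers) builds in a direct effect of `z` on salary, yet the residual correlation still comes out as zero while the estimate is off.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
iq = rng.normal(0, 1, n)      # unobserved confounder
z = rng.normal(0, 1, n)       # instrument
education = 12 + 1.0 * z + 1.0 * iq + rng.normal(0, 1, n)
# Assumed violation: z also affects salary directly (0.5), which the analyst does not know
salary = 20 + 2.0 * education + 0.5 * z + 3.0 * iq + rng.normal(0, 5, n)

# Just-identified IV estimate: cov(z, y) / cov(z, x)
b_iv = np.cov(z, salary)[0, 1] / np.cov(z, education)[0, 1]
a_iv = salary.mean() - b_iv * education.mean()

# Residuals of y = a + b x, using the actual treatment
resid = salary - a_iv - b_iv * education
corr_z_resid = np.corrcoef(z, resid)[0, 1]
print(f"IV estimate: {b_iv:.2f} (true 2.0), corr(z, residual): {corr_z_resid:.1e}")
```

The residual correlation is zero up to rounding whatever the true direct effect is, so it carries no information about the violation; the estimate (about 2.5 here instead of 2.0) is biased anyway.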
    
**3. Small errors can have large effects on the estimate**
   - Small departures from each assumption can have a large effect when combined
        - Weak instrument (small correlation between the instrument and the treatment)
        - Some direct effect of the instrument on the outcome
        - Together: potential for large bias, because the bias is roughly the direct effect of the instrument on the outcome divided by its effect on the treatment
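
A quick NumPy simulation of how the two departures combine. With a direct effect $\gamma$ of the instrument on the outcome and a first-step coefficient $\pi$ (symbols introduced here for illustration), the IV estimate converges to the true effect plus $\gamma / \pi$. So the same small direct effect does ten times more damage when the instrument is ten times weaker. The function `iv_estimate` and all numbers are assumptions.

```python
import numpy as np

def iv_estimate(first_stage, direct_effect, n=200_000, seed=7):
    """Simulate one dataset and return cov(z, y) / cov(z, x); true effect of x is 2.0."""
    rng = np.random.default_rng(seed)
    iq = rng.normal(0, 1, n)                                   # unobserved confounder
    z = rng.normal(0, 1, n)                                    # instrument
    x = first_stage * z + iq + rng.normal(0, 1, n)             # pi = first_stage
    y = 2.0 * x + direct_effect * z + 3.0 * iq + rng.normal(0, 1, n)  # gamma = direct_effect
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

strong = iv_estimate(first_stage=1.0, direct_effect=0.1)  # expect about 2.0 + 0.1 / 1.0
weak = iv_estimate(first_stage=0.1, direct_effect=0.1)    # expect about 2.0 + 0.1 / 0.1
print(f"strong instrument: {strong:.2f}, weak instrument: {weak:.2f}")
```

With a direct effect of 0.1, the strong instrument gives about 2.1 and the weak one about 3.0.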

**4. The estimate is not really the average effect of treatment, but a local average treatment effect (LATE)**
   - The estimated effect is for those whose treatment is changed by the instrument (the "compliers"); this interpretation also assumes the instrument does not push anyone's treatment in the opposite direction
   - Effect of education on salary for those who are likely to quit when they turn 16, not for all youths
   - Even more difficult to interpret the estimated effect when there are many instruments
       - A weighted average of several local effects (for those affected by the different instruments)? 
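
A NumPy sketch of why the estimate is local: a simulated lottery `z` where some people always take the treatment, some never do, and only the compliers follow the lottery. The group shares and effect sizes are invented. With a binary instrument, the IV estimate is the difference in mean outcomes between lottery winners and losers divided by the difference in treatment rates.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200_000
z = rng.integers(0, 2, n)                       # lottery: 1 = won
kind = rng.choice(["complier", "always", "never"], size=n, p=[0.6, 0.2, 0.2])

# Treatment actually taken: compliers follow the lottery, the others ignore it
d = np.where(kind == "always", 1, np.where(kind == "never", 0, z))

# Assumed individual treatment effects differ by group
effect = np.where(kind == "complier", 1.0, np.where(kind == "always", 3.0, 5.0))
y = rng.normal(10, 2, n) + effect * d           # outcome

ate = effect.mean()                             # average effect over everyone
wald = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())
print(f"ATE: {ate:.2f}, IV estimate: {wald:.2f}")
```

The average effect across everyone is about 2.2, but the IV estimate lands near 1.0, the compliers' effect. Always-takers and never-takers are treated the same whatever the lottery says, so their effects never enter the estimate.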

### More examples

- Effect of serving in the military on salary?
    - Use lottery number as instrument 
    - A lottery was used to draft men into the military during the Vietnam War; the lottery number should not be directly related to salary, but it affected the probability of serving in the army
    
- Effect of having medical insurance on health?
    - Sometimes a lottery has been used to give some people health insurance (Oregon experiment)

- Effect of access to special educational resources
    - Sometimes these (e.g. school vouchers) are distributed using a lottery

- Effect of obesity (and other conditions) on different outcomes
    - Genetic information may serve as an instrument, since some genes increase the probability of having the condition without being directly related to the outcome we are interested in