# Propensity Score Analysis

### What? 

- A method to identify the causal effect of one variable (the treatment, d) on another (the outcome, y) 
- In one sentence: Compare the outcome for individuals (or units) who are as similar as possible.
- In three sentences: Use all the confounding variables in a logistic regression to estimate the probability of receiving the treatment (the propensity score) for each individual. Next, compare the outcome for each individual who receives treatment with one or more untreated individuals who have the same (or a very similar) propensity score. The treatment effect is the average of all those comparisons.
- Example: Effect of moderate consumption of alcohol (d) on mortality (y)
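
The three sentences above can be sketched in plain Python. This is a minimal illustration on simulated data, not a production implementation: the helper names (`fit_logistic`, `propensity_scores`, `matched_effect`), the single confounder, and the true effect of 2.0 are all assumptions made for the example.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_logistic(X, d, steps=300, lr=1.0):
    """Fit a logistic regression of d on X by plain gradient ascent.
    Returns [intercept, coef_1, ...]."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for row, di in zip(X, d):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)))
            grad[0] += di - p
            for j, xj in enumerate(row):
                grad[j + 1] += (di - p) * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w

def propensity_scores(X, w):
    """Estimated probability of treatment for every individual."""
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))) for row in X]

def matched_effect(scores, d, y):
    """Average of (treated outcome - outcome of the untreated individual
    with the closest propensity score), matching with replacement."""
    controls = [(s, yi) for s, di, yi in zip(scores, d, y) if di == 0]
    diffs = []
    for s, di, yi in zip(scores, d, y):
        if di == 1:
            _, y_control = min(controls, key=lambda c: abs(c[0] - s))
            diffs.append(yi - y_control)
    return sum(diffs) / len(diffs)

# Simulated data (assumption): one confounder x drives both treatment and outcome.
random.seed(0)
X, d, y = [], [], []
for _ in range(1000):
    x = random.gauss(0, 1)
    di = 1 if random.random() < sigmoid(x) else 0
    X.append([x])
    d.append(di)
    y.append(2.0 * di + 3.0 * x + random.gauss(0, 1))  # true effect is 2.0

treated_y = [yi for yi, di in zip(y, d) if di == 1]
control_y = [yi for yi, di in zip(y, d) if di == 0]
naive = sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)

w = fit_logistic(X, d)
psm = matched_effect(propensity_scores(X, w), d, y)
print("naive difference:", round(naive, 2), "matched estimate:", round(psm, 2))
```

In this simulation the naive difference in means is pushed away from the true effect by the confounder, while the matched estimate should land much closer to it.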

### Why can't we just run a simple regression?

- We could, and you would get a similar result under certain assumptions (the causal effect is constant, all confounding variables are included, and all standard regression assumptions hold).
- In practice there are some differences, because in many cases not all of the assumptions are fulfilled.
- These differences do not imply that one method is always better, but that OLS and PSA have different strengths and weaknesses.
- The choice will then depend on the data you have and the assumptions you are willing to make.
- In general:
    - Standard applied OLS is usually parametric, i.e. you have to make assumptions about the **functional form** of the relationship between the cause and the effect (e.g. linearity). You do not need this in PSA.
    - PSA often requires **more observations** than OLS to work well (since OLS assumes a functional form, it can extrapolate between points).
    - PSA **explicitly warns the user** about, and drops, observations for which it cannot find a good control to compare against.
    - OLS has well-established formulas and algorithms for calculating **the standard error** of the estimates. Estimating correct standard errors in PSA is more difficult.
    - PSA is most intuitive and easiest to apply if the **treatment is dichotomous** (treated vs. untreated), while OLS also works fine for continuous treatment variables.
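
The point about dropping observations without a good control can be illustrated with a small sketch. It assumes the propensity scores are already estimated, and the function name `match_with_caliper` and the tolerance of 0.05 are hypothetical choices for the example, not a recommended default.

```python
def match_with_caliper(treated_scores, control_scores, caliper=0.05):
    """Pair each treated unit with the control whose score is closest.
    Treated units with no control within `caliper` are dropped and reported."""
    pairs, dropped = [], []
    for i, s in enumerate(treated_scores):
        j = min(range(len(control_scores)), key=lambda k: abs(control_scores[k] - s))
        if abs(control_scores[j] - s) <= caliper:
            pairs.append((i, j))
        else:
            dropped.append(i)
    return pairs, dropped

# Made-up scores: the last treated unit has no comparable control.
treated = [0.30, 0.55, 0.95]
controls = [0.10, 0.28, 0.52, 0.60]
pairs, dropped = match_with_caliper(treated, controls)
print("matched pairs:", pairs)
print("treated units dropped for lack of a good control:", dropped)
```

A regression would silently extrapolate to the third treated unit, whereas the matching step makes the missing comparison visible.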

    


### Why can't we just use matching?

- We could, but matching often requires many more observations than you have available
- Matching: Compare each person who receives treatment with another person who has the same age, gender, severity of disease (and so on), but who has not received the treatment.
- **Problem of dimensionality**: Assume you have n variables (age, gender, etc) and you want the individuals you compare to have the same value for each variable: Same age, same gender, same severity etc. How many individuals do you need to make this method work?
    - Simplification: 
        - Each variable can only have two values (Gender: Male vs. Female, Age: Young vs. Old, Severity: Low vs. High).
        - Example, one possible subgroup is: old females with low severity
    - How many subgroups (s) are there?
        - Answer: $s = 2^n$
    - With 8 variables, we have 256 possible subgroups, and we would like to have many treated and untreated people in each subgroup in order to do matching
    - In this case we would need at the very least 512 individuals (one treated and one untreated in each subgroup), but probably and preferably a lot more (comparing only two individuals in each subgroup does not inspire confidence!).
- Solution: Instead of matching on every single variable, we use all the variables to create a single number (the probability of receiving treatment, also called the propensity score) and we compare individuals who have the same propensity score. This eliminates the problem of dimensionality, since we now only need to match individuals on one variable (the propensity score) and not on all the variables.
- Rosenbaum and Rubin (1983) proved that matching on the propensity score works and gives the correct answer (the same as matching on each variable).
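
The subgroup arithmetic above is easy to check in code. The function names below are made up for this sketch, and the "one treated plus one untreated per subgroup" minimum is the bare lower bound discussed in the text, not a sample size recommendation.

```python
from itertools import product

def count_subgroups(n_variables, levels=2):
    """Number of distinct subgroups when each variable has `levels` values."""
    return levels ** n_variables

def min_individuals(n_variables, levels=2):
    """Bare minimum for exact matching: one treated and one untreated per subgroup."""
    return 2 * count_subgroups(n_variables, levels)

# Enumerate the subgroups for the three example variables.
subgroups = list(product(["Male", "Female"], ["Young", "Old"], ["Low", "High"]))
print(len(subgroups), "subgroups, e.g.", subgroups[-1])

for n in (3, 8, 20):
    print(n, "variables:", count_subgroups(n), "subgroups,",
          min_individuals(n), "individuals at the very least")
```

The count doubles with every extra binary variable, which is why exact matching quickly becomes infeasible while matching on a single score does not.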

    

### Example

- The month you are born in may affect the number of months of education you get (everyone starts school in the same month, but you cannot quit before you turn 16, and people turn 16 in different months), while the month of birth itself presumably does not affect the salary you get directly. Month of birth can therefore serve as an instrument (z) for education (x) when estimating its effect on salary (y).
- Step one: Regress education on birth month
- Step two: Regress salary on the education predicted from month of birth in step one
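
The two steps can be sketched on simulated data. Everything numeric here is invented for illustration: the unobserved "ability" confounder, the coefficients, and the true effect of 0.5 are assumptions, and `simple_ols` is a small helper written for this example.

```python
import random

def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

random.seed(1)
true_effect = 0.5
z, x, y = [], [], []
for _ in range(5000):
    month = random.randint(1, 12)        # instrument: month of birth
    ability = random.gauss(0, 1)         # confounder we pretend not to observe
    education = 10 + 0.3 * month + ability + random.gauss(0, 1)
    salary = 1 + true_effect * education + 2 * ability + random.gauss(0, 1)
    z.append(month)
    x.append(education)
    y.append(salary)

# Ordinary regression of salary on education: biased by the hidden confounder.
_, naive = simple_ols(x, y)

# Step one: regress education on birth month and keep the predictions.
a1, b1 = simple_ols(z, x)
x_hat = [a1 + b1 * zi for zi in z]

# Step two: regress salary on predicted education.
_, iv = simple_ols(x_hat, y)
print("naive slope:", round(naive, 2), "two-step slope:", round(iv, 2))
```

In this setup the naive slope overstates the effect, while the two-step slope should be close to the value built into the simulation.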

### Assumes

- The treatment variable and the instrument have to be correlated (correlation between x and z)
- No direct relationship between the instrument and the outcome (z is related to y only through x)
   

### Advantage

- Requires very little information to identify a causal effect. 
- We can ignore all potential confounding variables (like IQ) as long as we have an instrument. 
- Almost magic!
    

### Extensions

- **Case 1: Have information about some confounders, but not all. Solution: Use what you have!**
    - First step: Regress the treatment variable on the instrument and the confounders you have information about
    $$x = a + b_0 z + b_1 x_1 + b_2 x_2 $$
    
    - Second step: Run a regression with the outcome, the confounders and the predicted value of the treatment variable from step one
    
    $$y = a + b\hat{x} + b_1 x_1 + b_2 x_2$$
 
 
- **Case 2: Have more than one instrument ($z_1$ and $z_2$). Solution: Use all!**
     - First step: Regress the treatment variable on all the instruments: 
     $$x = a + b_0 z_1 + b_1 z_2 $$
     - Second step: Regress the outcome on the predicted value of the treatment variable from step one: 
     $$y = a + b\hat{x} $$
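
Both extensions can be combined in one sketch: two instruments ($z_1$, $z_2$) plus one observed confounder ($x_1$) that enters both steps. The data are simulated, the coefficients and the unobserved confounder `u` are assumptions, and `ols` is a small helper written for this example that solves the normal equations directly.

```python
import random

def ols(columns, y):
    """Least-squares coefficients [intercept, b_1, ...] of y on the given columns,
    found by solving the normal equations with Gauss-Jordan elimination."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[p] * r[q] for r in rows) for q in range(k)]
         + [sum(r[p] * yi for r, yi in zip(rows, y))] for p in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[pivot] = A[pivot], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for i in range(k):
            if i != c:
                factor = A[i][c]
                A[i] = [vi - factor * vc for vi, vc in zip(A[i], A[c])]
    return [A[p][k] for p in range(k)]

random.seed(2)
n = 3000
z1 = [random.gauss(0, 1) for _ in range(n)]   # first instrument
z2 = [random.gauss(0, 1) for _ in range(n)]   # second instrument
x1 = [random.gauss(0, 1) for _ in range(n)]   # confounder we have data on
u = [random.gauss(0, 1) for _ in range(n)]    # confounder we do not observe
x = [1 + 0.8 * a + 0.6 * b + 0.5 * c + d + random.gauss(0, 1)
     for a, b, c, d in zip(z1, z2, x1, u)]
y = [2 + 1.5 * xi + 1.0 * c + 2 * d + random.gauss(0, 1)   # true effect is 1.5
     for xi, c, d in zip(x, x1, u)]

# First step: treatment on all instruments and the known confounder.
f = ols([z1, z2, x1], x)
x_hat = [f[0] + f[1] * a + f[2] * b + f[3] * c for a, b, c in zip(z1, z2, x1)]

# Second step: outcome on predicted treatment and the known confounder.
second = ols([x_hat, x1], y)
naive = ols([x, x1], y)
print("naive:", round(naive[1], 2), "two-step:", round(second[1], 2))
```

Note that the standard errors from a manually run second step are not correct, so in practice a dedicated two-stage least squares routine is the safer choice for inference.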

### Problems

**1. Often difficult to find a good instrument**
   - But some common instruments are:
        - Situations where a lottery is used (to select who gets health insurance, who has to serve in the army and so on)
        - More and more common: use genetic information as an instrument
            - Example: Effect of obesity on some outcome
            - Assume: Have information about genes that are likely to affect obesity, but not the outcome
        - Institutional rules and structures 
            - Example: The rule that you can quit school when you turn 16
            - Interesting note: This means that knowledge of history, politics and institutions is an advantage when trying to find instruments! A creative process.

**2. Cannot test the second assumption that there is no direct relationship between the instrument and the outcome**
   - Maybe month of birth really affects salary?
   - Lesson: Instrument requires strong theoretical justification, knowledge of causal mechanisms
    
**3. Small errors can have large effects on the outcome**
   - Small departures from each assumption can have a large effect when combined:
        - Weak instrument (small correlation between the instrument and the treatment)
        - Some correlation between instrument and outcome
   - Together these create a potential for large bias
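
For the single-instrument case, the reason can be read off the large-sample limit of the IV estimate. Here $e$ denotes the error term in the outcome equation $y = a + bx + e$ (a symbol assumed for this illustration, not used elsewhere in these notes):

$$\hat{b}_{IV} \rightarrow b + \frac{Cov(z, e)}{Cov(z, x)}$$

A small numerator (some correlation between the instrument and the outcome error) divided by a small denominator (a weak instrument) can still give a large bias.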

**4. The estimate is not really an average treatment effect, but a local average treatment effect (LATE)**
   - The estimated effect is for those who are affected by the instrument
   - Effect of education on salary for those who are likely to quit when they turn 16, not for all youths
   - Even more difficult to interpret the estimated effect when there are many instruments
       - Average effect of several local effects (those affected by the different instruments)? 