## Notebook for following along "Propensity Score-Matching Methods for Nonexperimental Causal Studies"

paper: https://users.nber.org/~rdehejia/papers/matching.pdf

Dehejia and Wahba consider methods for causal inference under sample selection bias in nonexperimental settings where:
<li> few units in the comparison group are comparable to the treated units.</li>
<li> selecting a subset of comparison units similar to the treated units is difficult because there are many pre-treatment covariates to match on.</li>

<p>The paper estimates treatment effects by pairing experimental treated units with nonexperimental comparison units from the CPS and PSID datasets. The authors then compare these estimates with the experimental benchmark and show that the matching methods succeed in alleviating the bias due to systematic differences between the treated units and the original comparison units.

Some notes:
- PSM methods are especially useful in cases of high dimensionality
- Dehejia and Wahba's methods discard unmatched comparison units, so those units aren't used in estimating the treatment impact (ATE in the original LaLonde paper)
- ATE calculates the effect over all comparison units and extrapolates/smooths over all of them, but it is useful to know how many of the comparison units are actually comparable and how much smoothing one's estimator is performing
- The paper discusses stratification (binning the covariates to condition on), but that requires each bin to contain a comparison unit, which may not always be the case.
- The more covariates you have, the more difficult it is to find an exact match for each of the treated units
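The point about empty bins can be checked directly. The following is a minimal sketch, assuming the strata are equal-width bins of an already-estimated propensity score; the function name `count_units_per_stratum`, the number of bins, and the scores are hypothetical and not taken from the paper.

```python
import numpy as np

def count_units_per_stratum(ps, treated, n_bins=5):
    """Count treated and comparison units in equal-width propensity-score bins."""
    ps = np.asarray(ps, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Digitizing against the interior edges yields bin ids 0..n_bins-1,
    # so a score of exactly 1.0 falls into the last bin.
    bin_ids = np.digitize(ps, edges[1:-1])
    n_treated = np.bincount(bin_ids[treated], minlength=n_bins)
    n_comparison = np.bincount(bin_ids[~treated], minlength=n_bins)
    return edges, n_treated, n_comparison

# Hypothetical propensity scores and treatment flags (illustration only)
ps = np.array([0.05, 0.10, 0.15, 0.30, 0.55, 0.62, 0.70, 0.85, 0.90, 0.95])
treated = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1], dtype=bool)

edges, n_t, n_c = count_units_per_stratum(ps, treated)
# Strata that contain treated units but no comparison unit to compare them with
strata_without_comparison = np.flatnonzero((n_t > 0) & (n_c == 0))
print(n_t, n_c, strata_without_comparison)
```

Any stratum listed in `strata_without_comparison` has no within-bin comparison, which is the situation the note above warns about.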

When matching, the following issues arise:
- whether or not to match w/ replacement
    - matching with replacement minimizes the propensity-score distance between the matched comparison and treatment units
    - each treatment unit can be matched to the nearest comparison unit, even if the comparison unit is matched more than once.
    - beneficial in terms of bias reduction
    - when matching w/out replacement with few comparison units available, treatment units might be matched to comparison units that are not actually close in terms of PS (increases bias)
    - matching w/out replacement also means the results are potentially sensitive to the order in which the matching occurs.
- how many comparison units to match to each treated unit
- which matching method to choose
    - NN method - select the $m$ comparison units whose PS are closest to the treatment unit in question
    - caliper matching - uses all comparison units w/in a pre-defined PS radius ("caliper") --> only uses
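The matching choices listed above can be sketched on a toy example. This is an illustrative sketch on made-up propensity scores, using single nearest-neighbor matching ($m = 1$); the function names and the greedy order-dependent scheme for matching without replacement are assumptions, not the paper's implementation.

```python
import numpy as np

def nn_match_with_replacement(ps_treated, ps_comparison):
    """For each treated unit, index of the comparison unit with the closest score."""
    dist = np.abs(ps_treated[:, None] - ps_comparison[None, :])
    return dist.argmin(axis=1)

def nn_match_without_replacement(ps_treated, ps_comparison):
    """Greedy matching in the given treated order; each comparison unit is used once."""
    if len(ps_comparison) < len(ps_treated):
        raise ValueError("need at least as many comparison units as treated units")
    available = np.ones(len(ps_comparison), dtype=bool)
    matches = []
    for p in ps_treated:
        dist = np.abs(ps_comparison - p)
        dist[~available] = np.inf  # already-used comparison units are excluded
        j = int(dist.argmin())
        available[j] = False
        matches.append(j)
    return np.array(matches)

def caliper_match(ps_treated, ps_comparison, delta):
    """For each treated unit, indices of all comparison units within delta."""
    return [np.flatnonzero(np.abs(ps_comparison - p) <= delta) for p in ps_treated]

# Hypothetical scores (illustration only)
ps_treated = np.array([0.80, 0.82, 0.60])
ps_comparison = np.array([0.81, 0.55, 0.30])

with_repl = nn_match_with_replacement(ps_treated, ps_comparison)
without_repl = nn_match_without_replacement(ps_treated, ps_comparison)
# Same treated units processed in reverse order, to expose order sensitivity
without_repl_reversed = nn_match_without_replacement(ps_treated[::-1], ps_comparison)
within_caliper = caliper_match(ps_treated, ps_comparison, delta=0.02)
print(with_repl, without_repl, without_repl_reversed, within_caliper)
```

With replacement, comparison unit 0 is reused for the first two treated units. Without replacement, the last treated unit processed is left with the comparison unit at 0.30, and which treated unit that is depends on the processing order. With a caliper of 0.02, the treated unit at 0.60 gets no match at all.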

The goal of the paper is to demonstrate that PSM can identify like cohorts among nonexperimental units and replicate the estimates from the original experimental results.

## Questions/Observations
- Why are the regression-based treatment effects so different from the original NSW treatment effect?
- Does a larger caliper value $\delta$ give a better approximation of (a number closer to) the treatment effect?
- Even with the use of calipers/NN matching w/ replacement, the comparison dataset means are still quite different than the treatment means in the original NSW data
- PSM highlights the weakness of these "synthetic" control groups by showcasing how few comparable control units there are in each bin
- Are they estimating ATE in the original NSW experiment, but ATT when looking for comparison groups in the CPS and PSID datasets?
- The SEs of the effects estimated with PS methods are quite large compared to those of the original experiment.

## Data
The data set comes from the National Supported Work (NSW) Demonstration, a labor market experiment in which participants were randomized between treatment (on-the-job training lasting between nine months and a year) and control groups.

data link: https://users.nber.org/~rdehejia/data/.nswdata2.html

## Paper methodology discussion

In [19]:
import requests
import pandas as pd
import numpy as np
import statsmodels.api as sm
import csv

In [3]:
# nsw_treated_url_ll = 'http://www.nber.org/~rdehejia/data/nsw_treated.txt'
# nsw_control_url_ll = 'http://www.nber.org/~rdehejia/data/nsw_control.txt'
# nsw_control_url_dehjia = 'http://www.nber.org/~rdehejia/data/nswre74_control.txt'
# nsw_treated_url_dehjia = 'http://www.nber.org/~rdehejia/data/nswre74_treated.txt'

# PSID_control_url = 'http://www.nber.org/~rdehejia/data/psid_controls.txt'
# PSID2_control_url = 'http://www.nber.org/~rdehejia/data/psid2_controls.txt'
# PSID3_control_url = 'http://www.nber.org/~rdehejia/data/psid3_controls.txt'

# CPS_control_url = 'http://www.nber.org/~rdehejia/data/cps_controls.txt'
# CPS2_control_url = 'http://www.nber.org/~rdehejia/data/cps2_controls.txt'
# CPS3_control_url = 'http://www.nber.org/~rdehejia/data/cps3_controls.txt'

In [104]:
nsw_treat_headers = ['treatment_indicator', 'age', 'education', 'Black', 'Hispanic', 'married', 'nodegree', 'RE75', 'RE78']

In [115]:
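# The files contain no commas, so each line is read as one string in a single column;
# process_datafile splits those strings into fields afterwards.
nsw_treated = pd.read_csv('nsw_treated.txt', header=None)
nsw_control = pd.read_csv('nsw_control.txt', header=None)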

In [106]:
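def process_datafile(df_txt):
    """Split each raw text line of a single-column DataFrame into its fields."""
    datalist = []

    data = df_txt.values
    for row in data:
        # row[0] holds the whole line; split() handles any run of whitespace
        datalist.append(row[0].split())
    return datalist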
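As an alternative to splitting the lines by hand, `pd.read_csv` can parse whitespace-delimited text directly. A minimal sketch, assuming the files are whitespace-delimited with no header row; the two rows in `sample_text` are made-up placeholders and not values from the NSW files.

```python
import io
import pandas as pd

columns = ['treatment_indicator', 'age', 'education', 'Black', 'Hispanic',
           'married', 'nodegree', 'RE75', 'RE78']

# Placeholder rows in the same layout (leading spaces, scientific notation)
sample_text = (
    "  1.0000000e+00  2.5000000e+01  1.0000000e+01  1.0000000e+00  0.0000000e+00"
    "  0.0000000e+00  1.0000000e+00  0.0000000e+00  4.5000000e+03\n"
    "  0.0000000e+00  3.1000000e+01  1.2000000e+01  0.0000000e+00  1.0000000e+00"
    "  1.0000000e+00  0.0000000e+00  1.2000000e+03  0.0000000e+00\n"
)

# sep=r"\s+" treats any run of whitespace as one delimiter and parses numbers as floats
sample_df = pd.read_csv(io.StringIO(sample_text), sep=r"\s+", header=None, names=columns)
print(sample_df)
```

Replacing `io.StringIO(sample_text)` with a file path such as `'nsw_treated.txt'` would yield a numeric DataFrame in one step, without a separate `astype(float)` pass.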

In [116]:
nsw_treated_df = process_datafile(nsw_treated)
nsw_control_df = process_datafile(nsw_control)


nsw_treated_df.extend(nsw_control_df)

In [117]:
nsw_t_df = pd.DataFrame(nsw_treated_df, columns=nsw_treat_headers)
# All fields were parsed as strings; convert every column to float
nsw_t_df = nsw_t_df.astype(float)

In [118]:
nsw_t_df.describe()

| | treatment_indicator | age | education | Black | Hispanic | married | nodegree | RE75 | RE78 |
|---|---|---|---|---|---|---|---|---|---|
| count | 722.0 | 722.0 | 722.0 | 722.0 | 722.0 | 722.0 | 722.0 | 722.0 | 722.0 |
| mean | 0.411357 | 24.520776 | 10.267313 | 0.800554 | 0.105263 | 0.16205 | 0.779778 | 3042.896575 | 5454.635848 |
| std | 0.492421 | 6.625947 | 1.704774 | 0.399861 | 0.307105 | 0.368752 | 0.414683 | 5066.143366 | 6252.943422 |
| min | 0.0 | 17.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.0 | 19.0 | 9.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 50% | 0.0 | 23.0 | 10.0 | 1.0 | 0.0 | 0.0 | 1.0 | 936.30795 | 3951.889 |
| 75% | 1.0 | 27.0 | 11.0 | 1.0 | 0.0 | 0.0 | 1.0 | 3993.207 | 8772.00425 |
| max | 1.0 | 55.0 | 16.0 | 1.0 | 1.0 | 1.0 | 1.0 | 37431.66 | 60307.93 |


In [120]:
nsw_t_df[nsw_t_df['treatment_indicator'] == 1].describe()

| | treatment_indicator | age | education | Black | Hispanic | married | nodegree | RE75 | RE78 |
|---|---|---|---|---|---|---|---|---|---|
| count | 297.0 | 297.0 | 297.0 | 297.0 | 297.0 | 297.0 | 297.0 | 297.0 | 297.0 |
| mean | 1.0 | 24.626263 | 10.380471 | 0.801347 | 0.094276 | 0.16835 | 0.73064 | 3066.098187 | 5976.352033 |
| std | 0.0 | 6.686391 | 1.817712 | 0.39966 | 0.292706 | 0.374808 | 0.444376 | 4874.888973 | 6923.796427 |
| min | 1.0 | 17.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 1.0 | 20.0 | 9.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 549.2984 |
| 50% | 1.0 | 23.0 | 11.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1117.439 | 4232.309 |
| 75% | 1.0 | 27.0 | 12.0 | 1.0 | 0.0 | 0.0 | 1.0 | 4310.455 | 9381.295 |
| max | 1.0 | 49.0 | 16.0 | 1.0 | 1.0 | 1.0 | 1.0 | 37431.66 | 60307.93 |


In [121]:
nsw_t_df[nsw_t_df['treatment_indicator'] == 0].describe()

| | treatment_indicator | age | education | Black | Hispanic | married | nodegree | RE75 | RE78 |
|---|---|---|---|---|---|---|---|---|---|
| count | 425.0 | 425.0 | 425.0 | 425.0 | 425.0 | 425.0 | 425.0 | 425.0 | 425.0 |
| mean | 0.0 | 24.447059 | 10.188235 | 0.8 | 0.112941 | 0.157647 | 0.814118 | 3026.682743 | 5090.048302 |
| std | 0.0 | 6.590276 | 1.618686 | 0.400471 | 0.316894 | 0.364839 | 0.38947 | 5201.249807 | 5718.088763 |
| min | 0.0 | 17.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.0 | 19.0 | 9.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 50% | 0.0 | 23.0 | 10.0 | 1.0 | 0.0 | 0.0 | 1.0 | 823.2544 | 3746.701 |
| 75% | 0.0 | 28.0 | 11.0 | 1.0 | 0.0 | 0.0 | 1.0 | 3649.769 | 8329.823 |
| max | 0.0 | 55.0 | 14.0 | 1.0 | 1.0 | 1.0 | 1.0 | 36941.27 | 39483.53 |
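Because assignment in the NSW experiment was randomized, a simple experimental benchmark is the difference in mean 1978 earnings, $\hat{\tau} = \bar{Y}_{\text{treated}} - \bar{Y}_{\text{control}}$. The sketch below computes it from the `RE78` summary statistics in the two `describe()` outputs above, with an unequal-variance standard error. This is a plain difference in means and not necessarily the exact estimator reported in the paper.

```python
import math

# RE78 summary statistics copied from the describe() outputs above
n_t, mean_t, sd_t = 297, 5976.352033, 6923.796427   # treated
n_c, mean_c, sd_c = 425, 5090.048302, 5718.088763   # control

# Difference in means between treated and control groups
diff = mean_t - mean_c

# Standard error of the difference, allowing unequal variances
# (pandas describe() reports the sample standard deviation, ddof=1)
se = math.sqrt(sd_t**2 / n_t + sd_c**2 / n_c)

print(f"difference in means: {diff:.2f}")
print(f"standard error:      {se:.2f}")
```

With `nsw_t_df` in memory, the same quantities could be computed from the raw rows via `nsw_t_df.groupby('treatment_indicator')['RE78'].agg(['mean', 'std', 'count'])`.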


In [None]:
nsw_t_df