   
The example below illustrates how regression can fail to recover the ATE. The
regression models we're using are designed to capture a constant, additive
treatment effect; if the treatment effect is multiplicative instead, there can
be problems. In this example, a latent variable, _talent_, is a linear
combination of household wealth and intelligence, and the treatment effect of
private education on the observed outcome, salary, is proportional to talent.
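
To make the target explicit, the data-generating process used in the simulation can be written out as follows. The symbols are notation introduced here for illustration only: $Y_i$ is salary, $D_i$ is the private-education indicator, $W_i$ is household wealth, $Q_i$ is intelligence, $T_i$ is talent, and $\tau = 0.2$ is `treatment.multiplier`.

$$
\begin{aligned}
T_i &= 50000 + 5000\,W_i + 100\,Q_i \\
Y_i &= T_i + \tau\,T_i\,D_i + \varepsilon_i \\
Y_i(1) - Y_i(0) &= \tau\,T_i \\
\text{ATE} &= \tau\,\mathbb{E}[T_i] = 0.2 \times (50000 + 5000 \times 0.5 + 100 \times 100) = 12500
\end{aligned}
$$

The last line assumes, as in the simulation, that $W_i$ is 0 or 1 with equal probability and that $Q_i$ has mean 100. Each individual's effect scales with talent, so no single additive coefficient describes it; 12,500 is only its population average.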

In [1]:
library(lfe)
set.seed(1578347079)
N <- 10000
df <- data.frame("ID" = 1:N)
df$hhw <- floor(runif(N, min=0, max=2))
# endogenous selection into treatment
df$private <- 1* (df$hhw + runif(N, min=-0.8, max=0.8) > 0.5)

# additional, continuous, variable
df$intelligence <- rnorm(N, 100, 15)

# treatment effect is multiplicative rather than additive
treatment.multiplier <- 0.2
df$talent <- 50000 + df$hhw*5000 + df$intelligence*100 
df$salary <- df$talent + 
  (df$talent * df$private * treatment.multiplier) + 
  rnorm(N, mean=0, sd=15000)

head(df)

Loading required package: Matrix



,ID,hhw,private,intelligence,talent,salary
,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,1,0,0,90.23538,59023.54,42737.25
2,2,0,1,103.85424,60385.42,88172.67
3,3,1,1,88.99482,63899.48,93138.38
4,4,1,0,90.13135,64013.14,60110.22
5,5,0,0,122.80658,62280.66,70225.56
6,6,1,1,89.68078,63968.08,69614.36


The first few rows above show the relationship between talent and salary
directly. As in class, we write out a short and a long regression model; in this
case neither recovers the treatment effect directly, because both models are
misspecified. Not all treatment effects are additive!

Note that the long regression recovers something like the true intercept and the effect of household wealth. But the coefficients on the continuous variable, intelligence, and on the treatment variable are off.

In [4]:
short <- felm(salary ~ private, data=df)
long <- felm(salary ~ private + hhw + intelligence, data=df)

print(summary(short, robust=TRUE))
print(summary(long, robust=TRUE))


Call:
   felm(formula = salary ~ private, data = df) 

Residuals:
   Min     1Q Median     3Q    Max 
-63540 -10442   -177  10418  56726 

Coefficients:
            Estimate Robust s.e t value Pr(>|t|)    
(Intercept)  60729.4      217.7  278.97   <2e-16 ***
private      16047.1      304.1   52.78   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 15200 on 9998 degrees of freedom
Multiple R-squared(full model): 0.2179   Adjusted R-squared: 0.2179 
Multiple R-squared(proj model): 0.2179   Adjusted R-squared: 0.2179 
F-statistic(full model, *iid*): 2786 on 1 and 9998 DF, p-value: < 2.2e-16 
F-statistic(proj model):  2785 on 1 and 9998 DF, p-value: < 2.2e-16 



Call:
   felm(formula = salary ~ private + hhw + intelligence, data = df) 

Residuals:
   Min     1Q Median     3Q    Max 
-58833 -10143   -101  10104  57837 

Coefficients:
             Estimate Robust s.e t value Pr(>|t|)    
(Intercept)  49157.69    1025.34   47.94   <2e-1

**Donghee's comment: This is a really interesting case. I hope I have time to delve into this more when we discuss heterogeneous treatment effects a bit more!**
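
As one possible follow-up to the heterogeneity point, here is a minimal sketch of a fully interacted regression. It is not part of the original notebook. It re-creates the simulated data so that it runs on its own, and it uses base R's `lm` in place of `felm`. The names `true_ate`, `ate_hat`, `hhw_c`, `intelligence_c`, and `interacted` are introduced here for illustration. Because salary in this simulation is linear in the treatment, the covariates, and their interactions, interacting `private` with centered covariates should make the coefficient on `private` an estimate of the average effect at the covariate means.

```r
# Re-create the simulated data from the first cell (same seed, same draw order).
set.seed(1578347079)
N <- 10000
df <- data.frame(ID = 1:N)
df$hhw <- floor(runif(N, min = 0, max = 2))
df$private <- 1 * (df$hhw + runif(N, min = -0.8, max = 0.8) > 0.5)
df$intelligence <- rnorm(N, 100, 15)
treatment.multiplier <- 0.2
df$talent <- 50000 + df$hhw * 5000 + df$intelligence * 100
df$salary <- df$talent +
  (df$talent * df$private * treatment.multiplier) +
  rnorm(N, mean = 0, sd = 15000)

# Benchmark: each unit's treatment effect is treatment.multiplier * talent,
# so the sample ATE is the mean of that quantity (known only in a simulation).
true_ate <- mean(treatment.multiplier * df$talent)

# Center the covariates so that the coefficient on `private` is the
# treatment effect evaluated at the covariate means.
df$hhw_c <- df$hhw - mean(df$hhw)
df$intelligence_c <- df$intelligence - mean(df$intelligence)

# Fully interacted model: private, both covariates, and both interactions.
interacted <- lm(salary ~ private * (hhw_c + intelligence_c), data = df)
ate_hat <- unname(coef(interacted)["private"])

cat("Sample ATE (from the simulation):", round(true_ate), "\n")
cat("Interacted-regression estimate:  ", round(ate_hat), "\n")
```

The interaction coefficients (`private:hhw_c` and `private:intelligence_c`) describe how the effect changes with each covariate, which is exactly the variation that the short and long regressions above leave out.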