In [1]:
## Load package, set seed and parameters
library(lfe)
set.seed(5)
N <- 3000

## Create empty dataframe
df <- data.frame("ID" = 1:N)

## Simulate first-generation student (fgs) status
### Let's assume that in this hypothetical survey, we only observe a binary indicator: 1 if first-generation student, 0 otherwise.
df$fgs <- floor(runif(N, min=0, max=2))

## Simulate decision to attend private college
### Not completely random: first-generation students are less likely to attend private college
df$private <- 1 * (runif(N, min=0, max=2) - df$fgs > 0.5)

## Simulate earnings
### True effect of private is +10000; true effect of fgs is -10000
treatment.effect <- 10000
df$salary <- 50000 + df$private*treatment.effect - df$fgs*10000 + rnorm(N, mean=0, sd=40000)

## Some sample data
head(df)


## Run regression without/with fgs
short <- felm(salary ~ private, data=df)
long <- felm(salary ~ private + fgs, data=df)

print(summary(short, robust=TRUE))
print(summary(long, robust=TRUE))
# short overestimates the effect of private: the true value (10000) lies outside its 95% CI

### The OVB formula
fs <- felm(fgs ~ private, data=df) ### fs = "first stage": auxiliary regression of the omitted variable on private
print(summary(fs, robust=TRUE))
# this shows that attending private college is negatively correlated with fgs


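# bias = (coefficient on private in fs) * (coefficient on fgs in long)
bias <- coef(summary(fs, robust=TRUE))["private",1] * coef(summary(long, robust=TRUE))["fgs",1]
long.estimate <- coef(summary(long, robust=TRUE))["private",1]
short.estimate <- coef(summary(short, robust=TRUE))["private",1]
print(bias)
# the next two values should match: long estimate + bias = short estimate
print(bias + long.estimate)
print(short.estimate)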

"package 'lfe' was built under R version 3.6.2"
Loading required package: Matrix


ID,fgs,private,salary
1,0,1,78973.884
2,1,1,66067.184
3,1,0,7405.971
4,0,1,39449.905
5,0,1,54470.215
6,1,0,86778.779



Call:
   felm(formula = salary ~ private, data = df) 

Residuals:
    Min      1Q  Median      3Q     Max 
-129119  -27226     -17   27474  137502 

Coefficients:
            Estimate Robust s.e t value Pr(>|t|)    
(Intercept)    42991       1061  40.536   <2e-16 ***
private        14030       1482   9.467   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 40570 on 2998 degrees of freedom
Multiple R-squared(full model): 0.02902   Adjusted R-squared: 0.0287 
Multiple R-squared(proj model): 0.02902   Adjusted R-squared: 0.0287 
F-statistic(full model, *iid*): 89.6 on 1 and 2998 DF, p-value: < 2.2e-16 
F-statistic(proj model): 89.63 on 1 and 2998 DF, p-value: < 2.2e-16 



Call:
   felm(formula = salary ~ private + fgs, data = df) 

Residuals:
    Min      1Q  Median      3Q     Max 
-125588  -27240    -496   26943  140089 

Coefficients:
            Estimate Robust s.e t value Pr(>|t|)    
(Intercept)    50577       1626  31.101  <

**Donghee's comment: a great example of OVB!**
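
For reference, the identity that the last three `print` calls check can be written out as follows. The symbols ($\alpha$, $\beta$, $\gamma$, $\pi$, $\delta$ and the "short"/"long" superscripts) are notation chosen here for illustration and do not appear in the notebook. The population values are worked out from the simulation parameters in the code above.

$$
\begin{aligned}
\text{long:}\quad & \text{salary}_i = \alpha + \beta\,\text{private}_i + \gamma\,\text{fgs}_i + \varepsilon_i \\
\text{auxiliary (fs):}\quad & \text{fgs}_i = \pi + \delta\,\text{private}_i + u_i \\
\text{OVB identity:}\quad & \hat\beta^{\text{short}} = \hat\beta^{\text{long}} + \hat\delta\,\hat\gamma
\end{aligned}
$$

In this simulation $\gamma = -10000$. Because `fgs` is 0 or 1 with probability 0.5 each, and `private` equals 1 with probability 0.75 when `fgs` is 0 and 0.25 when `fgs` is 1, Bayes' rule gives

$$
\delta = P(\text{fgs}=1 \mid \text{private}=1) - P(\text{fgs}=1 \mid \text{private}=0) = 0.25 - 0.75 = -0.5,
$$

$$
\text{bias} = \delta\,\gamma = (-0.5)(-10000) = +5000, \qquad \beta + \delta\,\gamma = 10000 + 5000 = 15000 .
$$

So the short regression is centered on roughly 15000 rather than the true 10000, which is in line with the short estimate of 14030 (robust s.e. 1482) reported above. With OLS the identity holds exactly in the sample, which is why `bias + long.estimate` and `short.estimate` should print the same number.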