# Exercise 1: Returns to College
## (a)

In [1]:
url = paste("https://raw.githubusercontent.com/jtorcasso/teaching/",
"master/econ210_fall2017/data/project/psid_1972.csv", sep="")
df_full = read.csv(url)

In [2]:
df_full$D = ifelse(is.na(df_full$edu), NaN, ifelse(df_full$edu >= 16, 1, 0))
cols = c("inc_labor", "D", "h_sentscore1972")
df = na.omit(df_full[,cols])
N = dim(df)[1]
summary(df)

   inc_labor            D         h_sentscore1972
 Min.   :     0   Min.   :0.000   Min.   : 0.0   
 1st Qu.:  1972   1st Qu.:0.000   1st Qu.: 9.0   
 Median : 29602   Median :0.000   Median :10.0   
 Mean   : 36310   Mean   :0.188   Mean   : 9.9   
 3rd Qu.: 56372   3rd Qu.:0.000   3rd Qu.:11.0   
 Max.   :557620   Max.   :1.000   Max.   :13.0   

### (i)
Writing $u = \gamma_2A + \epsilon$ for the unobserved part of earnings, we have
$$
Y = \gamma_0 + \gamma_1D + u = \gamma_0 + \gamma_1D + E[u] + u - E[u]
$$
$$
= \gamma_0 + E[u] + \gamma_1D + u - E[u] = \gamma_0 + E[u] + \gamma_1D + \gamma_2(A - E[A]) + \epsilon
$$
$$
= \tilde{\gamma}_0 + \gamma_1D + \mu,
$$
where $\tilde{\gamma}_0 = \gamma_0 + E[u] = \gamma_0 + \gamma_2E[A]$ and $\mu=\gamma_2(A - E[A]) + \epsilon$ satisfies $E[\mu] = 0$ because $E[\epsilon]=0$.

### (ii)
$\gamma_1$ is the causal effect of college on earnings. That is, the effect of going to college holding everything else constant. Some authors refer to this as the "rate of return to schooling." 

### (iii)
$\beta_1$ is the increase in earnings associated with going to college. Unlike $\gamma_1$, it is not a causal effect. For instance, if attending college is more costly for low-ability individuals and ability positively affects earnings, then $\beta_1$ will in part capture the effect of high ability on earnings.

### (iv)
From class, we showed
$$
\beta_1 = \frac{Cov[Y,D]}{Var[D]} = \gamma_1 + \frac{Cov[D,\mu]}{Var[D]}
$$
where, after substituting in for $\mu$, we have
$$
= \gamma_1 + \frac{Cov[D,\gamma_2(A - E[A]) + \epsilon]}{Var[D]} = \gamma_1 + \gamma_2\frac{Cov[D,A]}{Var[D]}
$$
and using the fact that (proved in last homework) $\frac{Cov[D,A]}{Var[D]}=E[A|D=1]-E[A|D=0]$, we have
$$
= \gamma_1 + \gamma_2(E[A|D=1] - E[A|D=0]).
$$
Thus, if $\gamma_2>0$ and $E[A|D=1] - E[A|D=0] > 0$, then $\beta_1 > \gamma_1$. That is, if ability increases earnings and the average ability of those who go to college is greater than the average ability of those who don't, then the slope coefficient of the regression is an upwardly biased estimate of $\gamma_1$.
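
The decomposition can be checked numerically. The following is a minimal sketch on simulated data, not the PSID sample: the parameter values, the selection rule for `D.sim`, and every name ending in `.sim` are assumptions made for illustration. In any sample, the OLS slope equals $\gamma_1$ plus $\gamma_2$ times the ability gap, plus a noise term that shrinks as the sample grows.

```r
set.seed(1)
n.sim  <- 5000
g0.sim <- 1; g1.sim <- 2; g2.sim <- 3            # hypothetical structural parameters

A.sim   <- rnorm(n.sim)                          # ability
D.sim   <- as.numeric(A.sim + rnorm(n.sim) > 0)  # higher ability -> more likely to attend
eps.sim <- rnorm(n.sim)                          # drawn independently of D and A
Y.sim   <- g0.sim + g1.sim*D.sim + g2.sim*A.sim + eps.sim

b1.sim    <- cov(Y.sim, D.sim)/var(D.sim)                      # univariate OLS slope
gap.sim   <- mean(A.sim[D.sim == 1]) - mean(A.sim[D.sim == 0]) # ability gap
noise.sim <- cov(eps.sim, D.sim)/var(D.sim)                    # sampling noise

# The two entries agree up to floating-point error
c(slope = b1.sim, decomposition = g1.sim + g2.sim*gap.sim + noise.sim)
```

Because selection into college is positive on ability in this design and `g2.sim > 0`, the slope overstates `g1.sim`.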

### (v)

In [3]:
reg = function(Y, X){solve(t(X)%*%X)%*%t(X)%*%Y}
Y = df$inc_labor
X = cbind(rep(1, N), df$D)
reg(Y, X)[2]

### (vi)
From class, we proved the consistency of the OLS estimator for regression, that is, $\hat{\beta}\overset{p}{\to}\beta$ and therefore $\hat{\beta}_1\overset{p}{\to}\beta_1$. Since we argued that $\beta_1\neq \gamma_1$, it follows that $\hat{\beta}_1$ is not consistent for $\gamma_1$.

## (b)
### (i)
If we define $X=(1,D,A)'$, then $Y=X'\gamma + \epsilon$. From class, we showed that
$$
\beta = E[XX']^{-1}E[XY] = \gamma + E[XX']^{-1}E[X\epsilon]
$$
so that $\beta=\gamma$ if
$$
0=E[X\epsilon]=
\begin{pmatrix}
E[\epsilon] \\ E[D\epsilon] \\ E[A\epsilon]
\end{pmatrix}.
$$
Since we assumed $E[\epsilon] = 0$, this amounts to the condition $E[D\epsilon]=E[A\epsilon]=0$ or no correlation between $D$ and $\epsilon$, and $A$ and $\epsilon$. In other words, that $D$ and $A$ are both exogenous.

### (ii)
Because we argued that $\beta_1$ from a-v is biased upward by omitted ability, I expect the estimate to be smaller after controlling for ability.

### (iii)
I think that we have better reason to believe $\beta_1=\gamma_1$ after controlling for ability because we suspect the ability bias to be the biggest contributor to the bias term. However, if we thought ability was multidimensional, for instance, including both a cognitive and a noncognitive (or personality) component, then we might suspect that even after controlling for the cognitive component, we still have an upward bias in $\beta_1$, since $E[D\epsilon]$ may still be positive.

### (iv)

In [4]:
X = cbind(rep(1, N), df$D, df$h_sentscore1972)
reg(Y, X)[2]

And indeed, we confirm our prediction from b-ii. After controlling for ability, the slope parameter on $D$ decreases.

### (v)
We have that
$$
\beta_1 = \frac{Cov[Y,D]}{Var[D]} = \frac{Cov[\gamma_0 + \gamma_1D + \gamma_2A + \epsilon, D]}{Var[D]}
$$
$$
= \gamma_1 + \gamma_2\frac{Cov[A,D]}{Var[D]}
$$
$$
= \gamma_1 + \gamma_2(E[A|D=1] - E[A|D=0])
$$
so that $\beta_1=\gamma_1$ if $E[A|D=1] = E[A|D=0]$ or ability does not affect earnings ($\gamma_2=0$).

### (vi)
Now, defining $\tilde{Y} = Y - BLP(Y|A)$ and $\tilde{D}=D-BLP(D|A)$, by Frisch-Waugh,
$$
\beta_1 = \frac{Cov[\tilde{Y},\tilde{D}]}{Var[\tilde{D}]} = \frac{Cov[Y - BLP(Y|A),\tilde{D}]}{Var[\tilde{D}]}
$$
$$
= \frac{Cov[Y,\tilde{D}]}{Var[\tilde{D}]} = \frac{Cov[\gamma_0 + \gamma_1D + \gamma_2A + \epsilon,\tilde{D}]}{Var[\tilde{D}]}
$$
$$
= \frac{\gamma_1Cov[D,\tilde{D}] + \gamma_2Cov[A,\tilde{D}] + Cov[\epsilon,\tilde{D}]}{Var[\tilde{D}]}
$$
where, since $Cov[D,\tilde{D}] = Cov[\tilde{D} + BLP(D|A),\tilde{D}] = Var[\tilde{D}]$ and $Cov[A,\tilde{D}]=E[A\tilde{D}]=0$ by properties of BLP,
$$
= \gamma_1 + \frac{E[\epsilon\tilde{D}]}{Var[\tilde{D}]}
$$
$$
= \gamma_1 + \frac{E[\epsilon(D-BLP(D|A))]}{Var[\tilde{D}]} = \gamma_1 - \frac{E[\epsilon BLP(D|A)]}{Var[\tilde{D}]}
$$
$$
=\gamma_1 - \frac{\frac{Cov[D,A]}{Var[A]}E[A\epsilon]}{Var[\tilde{D}]}
$$
$$
=\gamma_1 - \frac{Var[D]}{Var[A]}\frac{E[A\epsilon]}{Var[\tilde{D}]}(E[A|D=1] - E[A|D=0])
$$
so that $\beta_1=\gamma_1$ if $E[A|D=1] = E[A|D=0]$ or $E[A\epsilon]=0$.
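
The Frisch-Waugh step used above has an exact sample analogue, which the following sketch illustrates on simulated data. The coefficients and all names ending in `.sim` are hypothetical and chosen only for illustration. Residualizing both $Y$ and $D$ on $A$ and regressing residual on residual reproduces the coefficient on $D$ from the regression of $Y$ on $D$ and $A$.

```r
set.seed(2)
n.sim <- 1000
A.sim <- rnorm(n.sim)
D.sim <- as.numeric(A.sim + rnorm(n.sim) > 0)
Y.sim <- 1 + 2*D.sim + 3*A.sim + rnorm(n.sim)    # hypothetical coefficients

Y.tilde <- resid(lm(Y.sim ~ A.sim))   # sample analogue of Y - BLP(Y|A)
D.tilde <- resid(lm(D.sim ~ A.sim))   # sample analogue of D - BLP(D|A)

b1.fw   <- cov(Y.tilde, D.tilde)/var(D.tilde)
b1.full <- unname(coef(lm(Y.sim ~ D.sim + A.sim))["D.sim"])

# Both numbers coincide up to floating-point error
c(frisch.waugh = b1.fw, full.regression = b1.full)
```

The residual `D.tilde` is uncorrelated with `A.sim` by construction, which is the sample version of $Cov[A,\tilde{D}]=0$.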

### (vii)
We were given that $E[D\epsilon]=0$. Suppose further that $E[A|D=1]\neq E[A|D=0]$, which can be checked in the data and should hold. What we don't know is $\gamma_2$ or $E[A\epsilon]$.

However, if $\gamma_2=0$ and $E[A\epsilon]\neq 0$, then $\beta_1$ from a univariate regression of $Y$ on $D$ would identify $\gamma_1$ and $\beta_1$ from a bivariate regression of $Y$ on $D$ and $A$ would not. But if $\gamma_2\neq 0$ and $E[A\epsilon]=0$, then $\beta_1$ from a bivariate regression of $Y$ on $D$ and $A$ would identify $\gamma_1$ and $\beta_1$ from a univariate regression of $Y$ on $D$ would not.

Because we don't know $\gamma_2$ and $E[A\epsilon]$, we don't know whether it's a good idea to include $A$ in the regression or not. This is the depressing part.

However, it's likely that $\gamma_2>0$ and that $E[A\epsilon]$ is small so that the bivariate regression is better. But because we don't know, it is best to come up with an instrument to find exogenous variation in $D$.
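
The first case above, where controlling for $A$ hurts, can be made concrete with a small simulation. This is a sketch under assumed values: the design (a proxy `A.sim` that is contaminated by the error term while having no direct effect on earnings) and all names ending in `.sim` are hypothetical. Here $E[D\epsilon]=0$, $\gamma_2=0$, and $E[A\epsilon]\neq 0$.

```r
set.seed(3)
n.sim  <- 100000
g1.sim <- 1                                   # hypothetical true effect of D

D.sim   <- rbinom(n.sim, 1, 0.5)
eps.sim <- rnorm(n.sim)                       # independent of D, so E[D eps] = 0
A.sim   <- D.sim + eps.sim + rnorm(n.sim)     # correlated with D and with eps
Y.sim   <- g1.sim*D.sim + eps.sim             # gamma_2 = 0: A has no direct effect

b.uni <- unname(coef(lm(Y.sim ~ D.sim))["D.sim"])
b.bi  <- unname(coef(lm(Y.sim ~ D.sim + A.sim))["D.sim"])

# The univariate slope is close to g1.sim; the bivariate one is not
c(univariate = b.uni, bivariate = b.bi)
```

In this design the population value of the bivariate coefficient is 0.5, half the true effect, because part of $\epsilon$ is absorbed by `A.sim` and the remainder is negatively correlated with the residualized `D.sim`.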

# Exercise 2: Testing Heterogeneous Returns to Schooling

## (a)

In [5]:
df_full$wage =  ifelse(df_full$inc_labor==0 | df_full$hours==0, NaN, df_full$inc_labor/df_full$hours)
df.m = df_full[df_full$male==1,]
cols = c("wage", "D", "h_sentscore1972", "age", "black")
df.m = na.omit(df.m[,cols])
N = dim(df.m)[1]

In [6]:
fit = lm(log(wage) ~ black + D + D:black + h_sentscore1972 + age + I(age^2), data=df.m)
summary(fit)


Call:
lm(formula = log(wage) ~ black + D + D:black + h_sentscore1972 + 
    age + I(age^2), data = df.m)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.89643 -0.26264  0.05766  0.31886  2.50580 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)      0.6990147  0.2458950   2.843  0.00454 ** 
black           -0.1757128  0.0582436  -3.017  0.00260 ** 
D                0.3522977  0.0351464  10.024  < 2e-16 ***
h_sentscore1972  0.0546821  0.0076922   7.109 1.83e-12 ***
age              0.0862625  0.0119472   7.220 8.32e-13 ***
I(age^2)        -0.0009278  0.0001443  -6.430 1.72e-10 ***
black:D         -0.0994828  0.2276905  -0.437  0.66223    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5342 on 1460 degrees of freedom
Multiple R-squared:  0.1734,	Adjusted R-squared:   0.17 
F-statistic: 51.03 on 6 and 1460 DF,  p-value: < 2.2e-16


## (b)
### (i)

In [7]:
names = c("(Intercept)", "black", "D", "black:D", "h_sentscore1972",
         "age", "I(age^2)")
Y=log(df.m$wage)
X=cbind(rep(1, N), df.m$black, df.m$D, df.m$D*df.m$black,
df.m$h_sentscore1972, df.m$age, df.m$age^2)

reg = function(Y, X){solve(t(X)%*%X)%*%t(X)%*%Y}
cbind(names, reg(Y, X))

names,estimate
(Intercept),0.699014721288868
black,-0.17571282191806
D,0.352297724351845
black:D,-0.0994828489382204
h_sentscore1972,0.0546820984143524
age,0.0862624518137688
I(age^2),-0.0009278427850579


### (ii)

In [8]:
res = function(Y, X){Y - X%*%reg(Y, X)}
U.hat = res(Y, X)

# Multiplying row in X by corresponding element in U.hat
XU = sweep(X, MARGIN=1, U.hat, `*`)

# Calculating variance matrix
V = N*solve(t(X)%*%X)%*%t(XU)%*%XU%*%solve(t(X)%*%X)
se = sqrt(diag(V)/N)
cbind(names, se)

names,se
(Intercept),0.264028329149603
black,0.0576630511077317
D,0.0331171527407276
black:D,0.117551208916448
h_sentscore1972,0.0080739238208821
age,0.012963899819327
I(age^2),0.0001619926454327
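
The `sweep` call above builds the middle term of the heteroskedasticity-robust variance without forming an $N\times N$ matrix. The sketch below, on simulated data with hypothetical names ending in `.sim`, checks that `t(XU) %*% XU` equals the explicit sum $\sum_i \hat{u}_i^2 x_i x_i'$ and then assembles the robust standard errors from it.

```r
set.seed(4)
n.sim <- 200
x.sim <- rnorm(n.sim)
X.sim <- cbind(1, x.sim)
Y.sim <- 1 + 2*x.sim + rnorm(n.sim, sd = exp(x.sim/2))  # error variance depends on x

b.sim <- solve(t(X.sim) %*% X.sim) %*% t(X.sim) %*% Y.sim
u.sim <- as.vector(Y.sim - X.sim %*% b.sim)

# Fast version: scale row i of X by residual i, then take the cross product
XU.sim    <- sweep(X.sim, MARGIN = 1, u.sim, `*`)
meat.fast <- t(XU.sim) %*% XU.sim

# Slow version: accumulate u_i^2 * x_i x_i' one observation at a time
meat.slow <- matrix(0, ncol(X.sim), ncol(X.sim))
for (i in seq_len(n.sim)) {
  meat.slow <- meat.slow + u.sim[i]^2 * tcrossprod(X.sim[i, ])
}

bread.sim <- solve(t(X.sim) %*% X.sim)
se.robust <- sqrt(diag(bread.sim %*% meat.fast %*% bread.sim))
se.robust
```

No degrees-of-freedom correction is applied here, matching the calculation in the cell above.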


In [9]:
K = dim(X)[2] - 1
V.hom = (N/(N-K-1))*solve(t(X)%*%X)*sum(U.hat^2)
se = sqrt(diag(V.hom)/N)
cbind(names, se)

names,se
(Intercept),0.245895004385014
black,0.0582436406826939
D,0.0351464472693296
black:D,0.22769046004374
h_sentscore1972,0.0076921944539625
age,0.0119472228031754
I(age^2),0.0001442940252217


### (iii)

In [10]:
b.hat = reg(Y, X)
t.stats = b.hat/se
cbind(names, t.stats)

names,t.stats
(Intercept),2.84273656976932
black,-3.0168584906175
D,10.0237079910855
black:D,-0.436921463108772
h_sentscore1972,7.10877744207095
age,7.22029322085141
I(age^2),-6.43022317543637


### (iv)

In [11]:
p.vals = 2*(1 - pt(abs(t.stats), df=N-K-1))
cbind(names, p.vals)

names,p.vals
(Intercept),0.0045350436546356
black,0.0025982586510666
D,0.0
black:D,0.662232917153247
h_sentscore1972,1.82565074169361e-12
age,8.31779090049167e-13
I(age^2),1.72193148628708e-10


In [12]:
cbind(names, 2*(1 - pnorm(abs(t.stats))))

names,p.vals (normal)
(Intercept),0.004472802288427
black,0.00255409013199
D,0.0
black:D,0.662168305789016
h_sentscore1972,1.1708412017696899e-12
age,5.18696197104873e-13
I(age^2),1.2741674382255e-10


### (v)

In [27]:
R = diag(K+1)[2:(K+1),]  # rows 2..K+1: restricts every slope, leaves the intercept free
R

0,1,2,3,4,5,6
0,1,0,0,0,0,0
0,0,1,0,0,0,0
0,0,0,1,0,0,0
0,0,0,0,1,0,0
0,0,0,0,0,1,0
0,0,0,0,0,0,1


In [28]:
q = dim(R)[1]
f.stat = N*t(R%*%b.hat)%*%solve(R%*%V.hom%*%t(R))%*%(R%*%b.hat)/q
p.val = 1 - pf(f.stat, df1=q, df2=N-K-1)
c(f.stat, p.val)

### (vi)

In [29]:
R2 = 1 - sum(U.hat^2)/sum((Y - mean(Y))^2)
R2

In [30]:
f.stat = ((R2)/(1-R2))*((N-K-1)/q)
f.stat

Therefore, $X$ explains about 17% of the variance in $Y$.
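
The Wald form from (v) and the $R^2$ form from (vi) are the same statistic when the variance estimate is the homoskedastic one and the null sets every slope to zero. The following sketch checks this on simulated data and compares both with the F-statistic that `summary()` reports for an `lm` fit. The data-generating values and all names ending in `.sim` are hypothetical.

```r
set.seed(5)
n.sim   <- 500
x1.sim  <- rnorm(n.sim)
x2.sim  <- rnorm(n.sim)
y.sim   <- 1 + 0.5*x1.sim - 0.3*x2.sim + rnorm(n.sim)
fit.sim <- lm(y.sim ~ x1.sim + x2.sim)

X.sim <- model.matrix(fit.sim)
K.sim <- ncol(X.sim) - 1
b.sim <- coef(fit.sim)
u.sim <- resid(fit.sim)
R.sim <- diag(K.sim + 1)[2:(K.sim + 1), ]   # selects every slope
q.sim <- nrow(R.sim)

# Classical estimate of Var(b.hat), already divided by the sample size
V.sim <- solve(t(X.sim) %*% X.sim) * sum(u.sim^2)/(n.sim - K.sim - 1)
Rb    <- R.sim %*% b.sim

f.wald <- as.numeric(t(Rb) %*% solve(R.sim %*% V.sim %*% t(R.sim)) %*% Rb)/q.sim
R2.sim <- 1 - sum(u.sim^2)/sum((y.sim - mean(y.sim))^2)
f.r2   <- (R2.sim/(1 - R2.sim)) * ((n.sim - K.sim - 1)/q.sim)
f.lm   <- unname(summary(fit.sim)$fstatistic["value"])

c(wald = f.wald, r.squared.form = f.r2, lm = f.lm)
```

The equivalence does not carry over to the robust variance matrix, which is why the robust F-statistic has to be computed in Wald form.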

### (vii)

In [13]:
library(lmtest)
library(sandwich)

fit = lm(log(wage) ~ black + D + D:black + h_sentscore1972 + age + I(age^2), data=df.m)
coeftest(fit, vcov=vcovHC(fit, type="HC"))


t test of coefficients:

                   Estimate  Std. Error t value  Pr(>|t|)    
(Intercept)      0.69901472  0.26402833  2.6475  0.008196 ** 
black           -0.17571282  0.05766305 -3.0472  0.002351 ** 
D                0.35229772  0.03311715 10.6379 < 2.2e-16 ***
h_sentscore1972  0.05468210  0.00807392  6.7727 1.827e-11 ***
age              0.08626245  0.01296390  6.6541 4.021e-11 ***
I(age^2)        -0.00092784  0.00016199 -5.7277 1.234e-08 ***
black:D         -0.09948285  0.11755121 -0.8463  0.397528    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


In [15]:
se = sqrt(diag(V)/N)
cbind(names, se)

names,se
(Intercept),0.264028329149603
black,0.0576630511077317
D,0.0331171527407276
black:D,0.117551208916448
h_sentscore1972,0.0080739238208821
age,0.012963899819327
I(age^2),0.0001619926454327


In [17]:
t.stats = b.hat/se
cbind(names, t.stats)

names,t.stats
(Intercept),2.64749893899754
black,-3.04723420877914
D,10.6379231061911
black:D,-0.846293711950937
h_sentscore1972,6.77267950843482
age,6.65405109696742
I(age^2),-5.72768462777476


In [19]:
p.vals = 2*(1 - pt(abs(t.stats), df=N-K-1))
cbind(names, p.vals)

names,p.vals
(Intercept),0.0081962426197397
black,0.0023510164314299
D,0.0
black:D,0.397527600090964
h_sentscore1972,1.8269386004021697e-11
age,4.0214054308762596e-11
I(age^2),1.23428662845981e-08


In [31]:
f.stat = N*t(R%*%b.hat)%*%solve(R%*%V%*%t(R))%*%(R%*%b.hat)/q
p.val = 1 - pf(f.stat, df1=q, df2=N-K-1)
c(f.stat, p.val)

## (c)
The null hypothesis is $H_0:\beta_k=0$. Under the assumption $E[X\epsilon]=0$, we have that $\beta=\gamma$, i.e., regression identifies $\gamma$. Therefore, any hypothesis on $\beta$ is equivalent to the same hypothesis on $\gamma$.

## (d)
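We fail to reject the null that $\gamma_{1,d}=0$: the p-value on `black:D` is about 0.66 with homoskedastic standard errors and about 0.40 with robust ones. Thus, we cannot statistically distinguish the returns to college across the two groups. Failing to reject is not evidence that the returns are equal, however. We may have too small a sample of black college graduates, which leads to imprecision in our estimate of $\gamma_{1,d}$.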
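
The precision concern can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the coefficients, the error standard deviation, the rule that every fifth person is a graduate, and the helper `se.interaction` are hypothetical and are not estimates from the PSID data. Holding the total sample size fixed, the standard error on the interaction is much larger when the subgroup is small.

```r
set.seed(6)

# Standard error of the black:D coefficient for a given subgroup size.
# The design is deterministic; only the error term is random.
se.interaction <- function(n.black, n = 1500) {
  black <- rep(c(1, 0), times = c(n.black, n - n.black))
  D     <- as.numeric(seq_len(n) %% 5 == 0)        # every fifth person graduates
  y     <- 0.35*D - 0.18*black - 0.10*D*black + rnorm(n, sd = 0.5)
  coef(summary(lm(y ~ black + D + black:D)))["black:D", "Std. Error"]
}

se.sim <- c(few = se.interaction(50), many = se.interaction(500))
se.sim
```

With 50 subgroup members only 10 of them are graduates in this design, so the interaction is estimated from very few observations.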