<h2> Exercise 10 - Unobserved Factors and Instrumental Variables </h2>

We now consider cases in which our explanatory variable is correlated with the error term in the specification
\begin{align}
y = \beta x + u
\end{align}
and how the estimate of $\beta$ obtained with an instrument $z$ is affected by one of the conditions
\begin{align}
\mathbb{E}zu &= 0\notag\\
\mathbb{E}zx &\neq 0\notag
\end{align}
being invalid.
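Substituting $y = \beta x + u$ into the two estimators used in the cells below (`betaHat` and `betaHatIV`, both computed without an intercept) shows where each condition enters. The following is a sketch of the usual argument, assuming the sample moments converge to their population counterparts:
\begin{align}
\hat\beta_{OLS} &= \frac{x'y}{x'x} = \beta + \frac{x'u}{x'x} \;\longrightarrow\; \beta + \frac{\mathbb{E}xu}{\mathbb{E}x^2}\notag\\
\hat\beta_{IV} &= \frac{z'y}{z'x} = \beta + \frac{z'u}{z'x} \;\longrightarrow\; \beta + \frac{\mathbb{E}zu}{\mathbb{E}zx}\notag
\end{align}
The IV limit equals $\beta$ only if $\mathbb{E}zu = 0$, and the ratio is only well behaved if $\mathbb{E}zx$ is not close to zero. OLS, by contrast, is inconsistent whenever $\mathbb{E}xu \neq 0$.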

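```julia
using Distributions;  # Uniform distribution for the error term
using Statistics;     # cov (in the standard library since Julia 0.7, no longer in Base)
using PyPlot;
```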

First we consider the case where both conditions are satisfied.

```julia
N = 10000;
beta = 0.1;                     # true coefficient
u = rand(Uniform(-1,1),N);      # error term

z = u.*u;                       # valid instrument: E[zu] = E[u^3] = 0, E[zx] != 0
x = 3*z+0.5*u;                  # regressor, correlated with u through 0.5*u
y = beta*x + u;

println("Covariance between x and u: ", cov(x,u))
println("Covariance between x and z: ", cov(x,z))
println("Covariance between z and u: ", cov(z,u))
println()

betaHat = (x'*y)/(x'*x);        # OLS estimator (no intercept)
betaHatIV = (z'*y)/(z'*x);      # simple IV estimator

println("OLS estimator: ", betaHat)
println("IV estimator: ", betaHatIV)
```

```
Covariance between x and u: 0.1733398969147499
Covariance between x and z: 0.2682181204995582
Covariance between z and u: 0.0019402757000700765

OLS estimator: 0.1915295957501989
IV estimator: 0.10363222528008577
```


While the IV estimator is quite close to the true value $\beta = 0.1$, the OLS estimator is substantially biased upwards.
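For this first design the limits can be worked out by hand from the moments of $u \sim U(-1,1)$, namely $\mathbb{E}u^2 = 1/3$, $\mathbb{E}u^3 = 0$ and $\mathbb{E}u^4 = 1/5$. With $z = u^2$ and $x = 3z + 0.5u$:
\begin{align}
\mathbb{E}zu &= \mathbb{E}u^3 = 0\notag\\
\mathbb{E}xu &= 3\,\mathbb{E}u^3 + 0.5\,\mathbb{E}u^2 = \tfrac{1}{6}\notag\\
\mathbb{E}x^2 &= 9\,\mathbb{E}u^4 + 3\,\mathbb{E}u^3 + 0.25\,\mathbb{E}u^2 = \tfrac{9}{5} + \tfrac{1}{12} = \tfrac{113}{60}\notag\\
\operatorname{plim}\hat\beta_{OLS} &= 0.1 + \frac{1/6}{113/60} = 0.1 + \frac{10}{113} \approx 0.1885\notag
\end{align}
This is in line with the simulated OLS value of about 0.19, while $\mathbb{E}zu = 0$ leaves the IV estimator without asymptotic bias. The printed covariances also match their population values, $\operatorname{Cov}(x,u) = 0.5\,\mathbb{E}u^2 = 1/6 \approx 0.167$ and $\operatorname{Cov}(x,z) = 3\operatorname{Var}(z) = 3(1/5 - 1/9) = 4/15 \approx 0.267$.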

Next, we consider the case where the first condition is violated, so the prospective instrument is itself correlated with $u$. The cell below takes the extreme choice $z = u$.

```julia
N = 10000;
beta = 0.1;                     # true coefficient
u = rand(Uniform(-1,1),N);      # error term

z = u;                          # invalid instrument: E[zu] = E[u^2] = 1/3 != 0
x = 3*z+0.5*u;                  # x = 3.5u, an exact multiple of z
y = beta*x + u;

println("Covariance between x and u: ", cov(x,u))
println("Covariance between x and z: ", cov(x,z))
println("Covariance between z and u: ", cov(z,u))
println()

betaHat = (x'*y)/(x'*x);        # OLS estimator (no intercept)
betaHatIV = (z'*y)/(z'*x);      # simple IV estimator

println("OLS estimator: ", betaHat)
println("IV estimator: ", betaHatIV)
```

```
Covariance between x and u: 1.1519427921151577
Covariance between x and z: 1.1519427921151577
Covariance between z and u: 0.3291265120329029

OLS estimator: 0.38571428571428573
IV estimator: 0.3857142857142856
```


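Now both the OLS and IV estimators are heavily biased upwards. They are in fact identical up to rounding error, because $x = 3z + 0.5u = 3.5z$ is an exact multiple of the instrument.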
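In this design the common value can be computed exactly, for any draw of $u$. With $z = u$ the regressor is $x = 3.5u$ and the outcome is $y = 0.1 \cdot 3.5u + u = 1.35u$, so
\begin{align}
\hat\beta_{OLS} &= \frac{x'y}{x'x} = \frac{3.5 \cdot 1.35\,u'u}{3.5^2\,u'u} = \frac{1.35}{3.5} = \frac{27}{70} \approx 0.3857\notag\\
\hat\beta_{IV} &= \frac{z'y}{z'x} = \frac{1.35\,u'u}{3.5\,u'u} = \frac{27}{70}\notag
\end{align}
which reproduces the printed value 0.38571428571428573; the sample size plays no role, so collecting more data would not reduce this bias.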

Finally, we consider the case where the correlation between $x$ and $z$ is very small, which is usually referred to as a <i>weak instrument</i>.

```julia
N = 10000;
beta = 0.1;                     # true coefficient
u = rand(Uniform(-1,1),N);      # error term

z = u.*u;                       # E[zu] = 0 still holds
x = (1e-8)*z+0.5*u;             # weak instrument: tiny coefficient on z
y = beta*x + u;

println("Covariance between x and u: ", cov(x,u))
println("Covariance between x and z: ", cov(x,z))
println("Covariance between z and u: ", cov(z,u))
println()

betaHat = (x'*y)/(x'*x);        # OLS estimator (no intercept)
betaHatIV = (z'*y)/(z'*x);      # simple IV estimator

println("OLS estimator: ", betaHat)
println("IV estimator: ", betaHatIV)
```

```
Covariance between x and u: 0.1671371617162454
Covariance between x and z: -0.001772738080724136
Covariance between z and u: -0.0035454779334527717

OLS estimator: 2.100000000510748
IV estimator: 2.1000018773854827
```


Once again the IV estimator is heavily biased, essentially matching the OLS estimate, even though both population conditions hold in the precise mathematical sense: $\mathbb{E}zu = \mathbb{E}u^3 = 0$ and $\mathbb{E}zx = 10^{-8}\,\mathbb{E}u^4 = 2\times 10^{-9} \neq 0$. The problem persists even with a much larger coefficient on $z$ in the specification of $x$.
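Why the $10^{-8}$ cell lands at about 2.1 can be seen by substituting $x = 10^{-8}z + 0.5u$ into both estimators. This is a back-of-the-envelope sketch, using $z'z \approx N\,\mathbb{E}u^4 = N/5$ and the fact that $z'u$, although it has mean zero, has standard deviation $\sqrt{N\,\mathbb{E}u^6} = \sqrt{N/7} \approx 38$ for $N = 10000$:
\begin{align}
\hat\beta_{IV} - \beta &= \frac{z'u}{z'x} = \frac{z'u}{10^{-8}\,z'z + 0.5\,z'u} \approx \frac{z'u}{0.5\,z'u} = 2\notag\\
\hat\beta_{OLS} - \beta &= \frac{x'u}{x'x} \approx \frac{0.5\,u'u}{0.25\,u'u} = 2\notag
\end{align}
The only term that carries information about $\beta$, $10^{-8}\,z'z \approx 2\times 10^{-5}$, is swamped by the sampling noise in $z'u$, so in a finite sample the IV estimate is pulled to the OLS value $0.1 + 2 = 2.1$ rather than to $\beta$.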

```julia
N = 10000;
beta = 0.1;                     # true coefficient
u = rand(Uniform(-1,1),N);      # error term

z = u.*u;                       # E[zu] = 0 still holds
x = (1e-2)*z+0.5*u;             # larger, but still small, coefficient on z
y = beta*x + u;

println("Covariance between x and u: ", cov(x,u))
println("Covariance between x and z: ", cov(x,z))
println("Covariance between z and u: ", cov(z,u))
println()

betaHat = (x'*y)/(x'*x);        # OLS estimator (no intercept)
betaHatIV = (z'*y)/(z'*x);      # simple IV estimator

println("OLS estimator: ", betaHat)
println("IV estimator: ", betaHatIV)
```

```
Covariance between x and u: 0.1645447657943272
Covariance between x and z: 0.0006106922130842042
Covariance between z and u: -0.000527823387215736

OLS estimator: 2.099714931142283
IV estimator: -1.2384061423303538
```
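A single draw can be misleading, particularly in the last cell, where the IV estimate even has the wrong sign. The following is a minimal sketch, not part of the original exercise, that repeats the simulation many times for several coefficients on $z$ and reports the median of each estimator. It assumes only the Julia standard library (`Random`, `Statistics`) and draws $u$ as `2 .* rand(rng, N) .- 1` in place of `rand(Uniform(-1,1),N)`; the helper names `simulate_estimates` and `median_estimates`, the seed and the number of replications `R` are hypothetical choices made for illustration.

```julia
using Random, Statistics

# Hypothetical helper (not from the original exercise): one draw of the
# data-generating process above, with coefficient `a` on the instrument z = u^2.
function simulate_estimates(a, N, beta, rng)
    u = 2 .* rand(rng, N) .- 1          # u ~ Uniform(-1, 1)
    z = u .* u                          # E[zu] = E[u^3] = 0
    x = a .* z .+ 0.5 .* u              # regressor correlated with u
    y = beta .* x .+ u
    betaHat   = (x' * y) / (x' * x)     # OLS estimator (no intercept)
    betaHatIV = (z' * y) / (z' * x)     # simple IV estimator
    return betaHat, betaHatIV
end

# Hypothetical helper: median of each estimator over R independent samples.
function median_estimates(a; N = 10_000, beta = 0.1, R = 500, seed = 1)
    rng = MersenneTwister(seed)         # fixed seed for reproducibility
    draws = [simulate_estimates(a, N, beta, rng) for _ in 1:R]
    return median(first.(draws)), median(last.(draws))
end

for a in (3.0, 1e-2, 1e-8)
    ols, iv = median_estimates(a)
    println("a = $a: median OLS = $(round(ols, digits=3)), median IV = $(round(iv, digits=3))")
end
```

The median is used rather than the mean because, with a weak instrument, the denominator $z'x$ is occasionally close to zero and produces extreme IV estimates. With coefficient 3 (the first design) the median IV estimate should sit near 0.1 and the median OLS estimate near 0.19; as the coefficient shrinks to $10^{-8}$ both medians move to about 2.1.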
