# Suggested Solution to PS2 

You can optionally activate a project environment. In this case, I move up one directory and activate the environment located there. If you do not use a dedicated environment on your computer, you can skip the cell below.

In [1]:
cd(joinpath(pwd(),".."))

using Pkg
Pkg.activate(".") ;

[32m[1m Activating[22m[39m environment at `C:\Users\paulc\.julia\dev\ECON627_2020\Project.toml`


## Load Main Packages

In [2]:
using Distributions, PrettyTables, Random

# Part (a)

#### Define parameters and other constants

In [3]:
n=500
const β=1.0
const Π=[1.0;1.0;]
const ρ=0.95;
const Σ=[1.0 ρ; ρ 1;];

#### Set Seed

In [4]:
Random.seed!(1234);

#### Define function that generates data

In [5]:
function generate_data(n)
    #Define the Multivariate Normal Distribution instance
    mvnormal = MvNormal([0.0; 0.0], Σ)
    
    #DGP
    Errors=rand(mvnormal,n)'
    ϵ=Errors[:,1]
    V=Errors[:,2]
    Z=randn(n,2)
    X=Z*Π+V
    U=exp.(Z*Π) .* ϵ
    Y=β*X+U
    return (Y = Y , X = X , Z = Z)
end

generate_data (generic function with 1 method)
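
For reference, the data-generating process encoded in `generate_data` can be written out as follows. This is read directly off the code above, with $i = 1, \dots, n$ and $Z_i \in \mathbb{R}^2$:

$$ Y_i = \beta X_i + U_i, \qquad X_i = Z_i'\Pi + V_i, \qquad U_i = \exp(Z_i'\Pi)\,\epsilon_i, \qquad \begin{pmatrix} \epsilon_i \\ V_i \end{pmatrix} \sim N\left(0, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right), \qquad Z_i \sim N(0, I_2), $$

where $Z_i$ is drawn independently of $(\epsilon_i, V_i)$, $\beta = 1$, $\Pi = (1, 1)'$, and $\rho = 0.95$. Because $\epsilon_i$ and $V_i$ are correlated, $X_i$ is endogenous, and because the scale of $U_i$ depends on $Z_i$, the errors are conditionally heteroskedastic.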

Notice that the function returns a named tuple whose fields are, in order, $Y$, $X$, and $Z$. Destructuring assigns values by position, not by name, so suppose we ran this:

In [6]:
X1 , Y1, Z1 = generate_data(10);

Then `X1` would in fact hold $Y$, and `Y1` would hold $X$. You have to be careful! The `@unpack` macro from the `Parameters` package lets us forget about the order and unpack the named tuple using the field names returned by the function.

In [7]:
using Parameters
@unpack X, Y , Z = generate_data(10);
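
The same point can be sketched on a toy example. The function `toy` and the variable names are hypothetical and not part of the problem set. The `(; X, Y, Z)` form assumes Julia 1.7 or later; on older versions, `@unpack` or plain field access does the same job.

```julia
# Hypothetical toy function returning a named tuple in the order (Y, X, Z)
toy() = (Y = 1.0, X = 2.0, Z = 3.0)

# Positional destructuring: the names on the left-hand side are irrelevant,
# so X1 receives the first field (Y) and Y1 receives the second field (X)
X1, Y1, Z1 = toy()

# Access by field name is independent of the order
nt = toy()
Xa = nt.X

# Property destructuring by name (assumes Julia 1.7 or later)
(; X, Y, Z) = toy()
```

Here `X1` ends up holding the `Y` field, while `Xa` and `X` both hold the `X` field.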

# Parts (b,c)

#### Function for estimation of $\Omega$

In [8]:
function Ω(U,Z)
    n=length(U)
    zr = Z.*U
    omega = (zr' * zr)/n
    
    return omega
end

Ω (generic function with 1 method)
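
The function above computes $\hat{\Omega} = \frac{1}{n}\sum_{i=1}^n U_i^2 Z_i Z_i'$: broadcasting `Z.*U` multiplies row $i$ of $Z$ by $U_i$. A minimal check of this equivalence on made-up data is sketched below. The names `m`, `Ztoy`, and `Utoy` are illustrative and chosen so that the notebook's own `n` is not overwritten.

```julia
using Random

Random.seed!(1)
m = 5                  # illustrative sample size (not the notebook's n)
Ztoy = randn(m, 2)     # illustrative instruments
Utoy = randn(m)        # illustrative residuals

# Vectorized form, mirroring the body of Ω(U, Z)
zr = Ztoy .* Utoy
vectorized = (zr' * zr) / m

# Explicit sum of U_i^2 * Z_i * Z_i' over observations
looped = sum(Utoy[i]^2 * Ztoy[i, :] * Ztoy[i, :]' for i in 1:m) / m

# The two should agree up to floating-point error
vectorized ≈ looped
```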

#### Function that computes 2SLS and the two-step efficient GMM and their standard errors

Recall that, for any choice of weighting matrix $W_n$, we can write the estimator as
$$ \hat{\beta}_n(W_n) = (X'Z W_n Z'X)^{-1} X'Z W_n Z'Y $$

and that 2SLS corresponds to the case where $W_n^{-1} = \frac{Z'Z}{n}$.
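
To see why, substitute $W_n = (Z'Z/n)^{-1}$ into the formula; the factors of $n$ cancel:

$$ \hat{\beta}_n\left((Z'Z/n)^{-1}\right) = \left(X'Z (Z'Z)^{-1} Z'X\right)^{-1} X'Z (Z'Z)^{-1} Z'Y = (X'P_Z X)^{-1} X'P_Z Y, \qquad P_Z = Z(Z'Z)^{-1}Z'. $$

The standard errors computed in the function below follow the usual asymptotic variance for a GMM estimator with weighting matrix $W$, written with $Q = Z'X/n$:

$$ \widehat{\mathrm{Var}}\left(\hat{\beta}_n(W)\right) = \frac{1}{n}\,(Q'WQ)^{-1}\, Q'W\hat{\Omega}WQ\,(Q'WQ)^{-1}, $$

which reduces to $\frac{1}{n}(Q'\hat{\Omega}^{-1}Q)^{-1}$ when $W = \hat{\Omega}^{-1}$, the efficient choice.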

In [9]:
function estimators(Y,X,Z)
    
    n=length(Y)
    
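    # 2SLS: β = (X'P_Z X)^{-1} X'P_Z Y, with P_Z the projection onto the columns of Z
    PZ = Z*( (Z'*Z)\Z' )
    β2SLS = (X'*PZ*X)\(X'*PZ*Y)
    Q = Z'*X/n                 # sample cross-moment of Z and X
    W = inv(Z'*Z/n)            # 2SLS weighting matrix
    Ω1 = Ω(Y-β2SLS*X,Z)        # Ω estimated from the 2SLS residuals
    # Sandwich variance (Q'WQ)^{-1} Q'WΩWQ (Q'WQ)^{-1}, scaled by 1/n
    var2sls = ( (Q'*W*Q)\(Q'*W*Ω1*W*Q)/(Q'*W*Q) )/n
    std2SLS = sqrt(var2sls)
    
    # Two-step efficient GMM: the second step uses W = inv(Ω1)
    WGMM=inv(Ω1);
    βGMM=(X'*Z*WGMM*Z'*X)\(X'*Z*WGMM*Z'*Y)
    # Re-estimate Ω from the GMM residuals for the standard error
    Ω2=Ω(Y-βGMM*X,Z)
    WGMM=inv(Ω2)
    stdGMM=sqrt(inv(Q'*WGMM*Q)/n)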
    
    return β2SLS, std2SLS,  βGMM, stdGMM
    
end

estimators (generic function with 1 method)

# Part (d)

In [10]:
R=10^4
Bias2SLS=0.0
BiasGMM=0.0
std2SLS=0.0
stdGMM=0.0
In2SLS=0.0
InGMM=0.0
CritVal = quantile(Normal(0,1), .975);

In [11]:
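for r=1:R
    Y, X, Z = generate_data(n)
    b2SLS, s2SLS, bGMM, sGMM = estimators(Y,X,Z)
    # Accumulate absolute deviations from β (reported as "Bias" in the table)
    Bias2SLS += abs(b2SLS-β)
    BiasGMM += abs(bGMM-β)
    # Accumulate standard errors
    std2SLS += s2SLS
    stdGMM += sGMM
    # Count the replications in which the 95% confidence interval covers β
    In2SLS += (β>b2SLS - CritVal*s2SLS)*(β<b2SLS + CritVal*s2SLS)
    InGMM += (β>bGMM - CritVal*sGMM)*(β<bGMM + CritVal*sGMM)
end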

# Part (e) $n=500$

In [12]:
table_data = ["Bias" Bias2SLS/R BiasGMM/R
        "Ave. std.err" std2SLS/R stdGMM/R;
        "Coverage Prob of CI" In2SLS/R InGMM/R;        
]
header=["Statistic" "2SLS" "Two-step efficient GMM";]
pretty_table(table_data,header)

┌─────────────────────┬──────────┬────────────────────────┐
│           Statistic │     2SLS │ Two-step efficient GMM │
├─────────────────────┼──────────┼────────────────────────┤
│                Bias │ 0.458717 │               0.372466 │
│        Ave. std.err │ 0.523736 │               0.437846 │
│ Coverage Prob of CI │   0.9581 │                 0.9473 │
└─────────────────────┴──────────┴────────────────────────┘


# Part (f)

* The two-step efficient GMM estimator has a smaller bias (0.372 versus 0.459) and a smaller average standard error (0.438 versus 0.524).
* The simulated coverage probabilities for both confidence intervals (0.9581 for 2SLS and 0.9473 for GMM) are close to the nominal 0.95.
* The two-step efficient GMM approach is preferred when $n=500$.
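
Note that the loop in Part (d) accumulates `abs(b2SLS-β)` and `abs(bGMM-β)`, so the "Bias" row reports an average absolute deviation from $\beta$ rather than a signed bias. A minimal sketch of the difference, on made-up numbers (the names `β0` and `estimates` are hypothetical and the values are not simulation output):

```julia
using Statistics

β0 = 1.0                            # true value, as in the simulation
estimates = [0.8, 1.3, 1.1, 0.6]    # hypothetical estimates, for illustration only

# Signed bias: errors of opposite sign cancel
signed_bias = mean(estimates .- β0)

# Mean absolute deviation: what an accumulator of abs(b - β) averages
mean_abs_dev = mean(abs.(estimates .- β0))
```

In this toy case the signed bias is about -0.05 while the mean absolute deviation is about 0.25, so the two measures can differ considerably.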

# Part (g) $n=100$

In [13]:
n=100
Bias2SLS=0.0;
BiasGMM=0.0;
std2SLS=0.0;
stdGMM=0.0;
In2SLS=0.0;
InGMM=0.0;

for r=1:R
    Y, X, Z = generate_data(n)
    b2SLS, s2SLS, bGMM, sGMM = estimators(Y,X,Z)
    Bias2SLS +=abs(b2SLS-β)
    BiasGMM +=abs(bGMM-β)
    std2SLS +=s2SLS
    stdGMM +=sGMM
    In2SLS += (β>b2SLS - CritVal*s2SLS)*(β<b2SLS + CritVal*s2SLS)
    InGMM += (β>bGMM - CritVal*sGMM)*(β<bGMM + CritVal*sGMM)
end

table_data = ["Bias" Bias2SLS/R BiasGMM/R
        "Ave. std.err" std2SLS/R stdGMM/R;
        "Coverage Prob of CI" In2SLS/R InGMM/R;     
]
header=["Statistic" "2SLS" "Two-step efficient GMM";]
pretty_table(table_data,header)

┌─────────────────────┬──────────┬────────────────────────┐
│           Statistic │     2SLS │ Two-step efficient GMM │
├─────────────────────┼──────────┼────────────────────────┤
│                Bias │ 0.843634 │               0.630633 │
│        Ave. std.err │ 0.902724 │               0.686344 │
│ Coverage Prob of CI │    0.952 │                 0.9318 │
└─────────────────────┴──────────┴────────────────────────┘


* The two-step efficient GMM estimator is again less biased (0.631 versus 0.844) and has a smaller average standard error (0.686 versus 0.903).
* The confidence interval based on the two-step efficient GMM estimator has a coverage probability of 0.9318, below the nominal 0.95, whereas the 2SLS interval covers at 0.952.
* The 2SLS approach is more reliable in smaller samples: the two-step efficient GMM approach may result in invalid inference when $n$ is small.
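
One way to judge whether a simulated coverage rate is meaningfully different from 0.95 is to compare the gap with the Monte Carlo standard error of a proportion. The sketch below assumes independent replications and uses the coverage rates reported in the $n=100$ table; the names `R_sims`, `mc_se`, `z_2sls`, and `z_gmm` are illustrative.

```julia
R_sims = 10^4        # number of replications, as in Part (d)
nominal = 0.95

# Binomial standard error of a simulated coverage rate if true coverage were 0.95
mc_se = sqrt(nominal * (1 - nominal) / R_sims)

# Distance of the reported n = 100 coverage rates from nominal, in standard-error units
z_2sls = (0.952 - nominal) / mc_se
z_gmm = (0.9318 - nominal) / mc_se
```

The standard error is roughly 0.0022, so the 2SLS coverage lies within one simulation standard error of 0.95, while the GMM coverage falls short by more than eight. This suggests the undercoverage of the GMM interval is not simulation noise.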