## Finite Sample Properties of Linear GMM



### Introduction



GMM provides a general way to think about instrumental
variables estimators, but there is also evidence that the finite
sample properties of these estimators may be poor.  Here we'll
construct a simple Monte Carlo framework within which to evaluate
the finite-sample behavior of linear IV estimators computed via GMM.



### Asymptotic Variance of GMM estimator



If we have $\mbox{E}g_j(\beta)g_j(\beta)^\top=\Omega$ and
$\mbox{E}\frac{\partial g_j}{\partial b^\top}(\beta)=Q$ then we've
seen that the asymptotic variance of the optimally weighted GMM
estimator is
$$
       V_b = \left(Q^\top\Omega^{-1}Q\right)^{-1}.
   $$



### Data Generating Process



We consider the linear IV model

\begin{align*}
   y &= X\beta + u\\
   \mbox{E}Z^\top u &= 0
\end{align*}

Thus, we need to describe processes that generate $(X,Z,u)$.



In [1]:
import numpy as np
from numpy.linalg import inv

# Let Z have order ell, and X order 1, with Var([X,Z])=VXZ

ell = 4

# Arbitrary (but deterministic) choice for VXZ
A = np.sqrt(np.arange(1,(ell+1)**2+1)).reshape((ell+1,ell+1))
VXZ = A.T@A/100 

Q = VXZ[1:,[0]]  # EZX'

sigma_u = 1

Omega = (sigma_u**2)*VXZ[1:,1:] # E(Zu)(u'Z')

# Asymptotic variance of optimally weighted GMM estimator:
print(inv(Q.T@inv(Omega)@Q))
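
The forms of $Q$ and $\Omega$ used in the code above can be sketched as follows. This assumes, as the code appears to, a scalar regressor, mean-zero $(X,Z)$, and a disturbance $u$ that is homoskedastic and independent of $Z$. Writing $Z_j$ and $X_j$ for the $j$th rows of $Z$ and $X$ (notation introduced here for illustration), the moment function is $g_j(b)=Z_j^\top(y_j-X_jb)$, so that
$$
   Q = \mbox{E}\frac{\partial g_j}{\partial b}(\beta) = -\mbox{E}Z_j^\top X_j,
   \qquad
   \Omega = \mbox{E}u_j^2Z_j^\top Z_j = \sigma_u^2\,\mbox{E}Z_j^\top Z_j.
$$
The sign of $Q$ cancels in $V_b=\left(Q^\top\Omega^{-1}Q\right)^{-1}$, so taking $Q$ to be the off-diagonal block $\mbox{E}Z_j^\top X_j$ of the joint covariance matrix gives the same variance. Adding an independent $u$ to $X$ to make it endogenous leaves $\mbox{E}Z_j^\top X_j$ unchanged.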

Now code to generate N realizations of $(y,X,Z)$:



In [1]:
from scipy.stats import distributions as iid

def dgp(N,beta,sigma_u,VXZ):
    
    u = iid.norm.rvs(size=(N,1))*sigma_u

    # "Square root" of VXZ
    # eigh is meant for symmetric matrices and returns real eigenvalues
    l,v = np.linalg.eigh(VXZ)
    SXZ = v@np.diag(np.sqrt(l))

    # Generate normal random variates [X,Z]
    XZ = iid.norm.rvs(size=(N,VXZ.shape[0]))@SXZ.T

    # But X is endogenous...
    X = XZ[:,[0]] + u
    Z = XZ[:,1:]

    # Calculate y
    y = X*beta + u

    return y,X,Z
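
The "square root" step in the DGP can be checked in isolation. The following is a minimal sketch, not part of the original notebook: the matrix `V` is a hypothetical covariance matrix chosen only for illustration, and `np.linalg.eigh` is used because it is designed for symmetric matrices.

```python
import numpy as np

# Hypothetical 3x3 covariance matrix, used only for this illustration
V = np.array([[2.0, 0.5, 0.3],
              [0.5, 1.0, 0.2],
              [0.3, 0.2, 1.5]])

# Eigendecomposition of a symmetric matrix: V = v @ diag(l) @ v.T
l, v = np.linalg.eigh(V)
S = v @ np.diag(np.sqrt(l))   # a factor satisfying S @ S.T == V

# Standard normal draws post-multiplied by S.T have covariance V
rng = np.random.default_rng(0)
draws = rng.standard_normal((200_000, 3)) @ S.T

max_factor_error = np.abs(S @ S.T - V).max()
max_sample_error = np.abs(np.cov(draws.T) - V).max()
print(max_factor_error, max_sample_error)
```

The first number should be at the level of floating-point error; the second shrinks as the number of draws grows.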

Check on DGP:



In [1]:
N = 1000

y,X,Z = dgp(N,1,1,VXZ)

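# Check that we've computed things correctly.  Differences should be small,
# except the [0,0] entry: X includes u, so Var(X) exceeds VXZ[0,0] by sigma_u**2.
print(np.cov(np.c_[X,Z].T) - VXZ)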

### Estimation



Now that we have a data-generating process we proceed under
   the conceit that we can observe samples generated by this process,
   but otherwise temporarily "forget" the properties of the DGP, and use the
   generated data to try to reconstruct aspects of the DGP.

Here we consider using the optimally weighted linear IV estimator.



#### Construct sample moments



Begin by defining a function to construct the sample moments given
    the data and a parameter estimate $b$:



In [1]:
def gj(b,y,X,Z):
    """Observations of g_j(b)
    """
    return Z*(y - X*b)

def gN(b,y,X,Z):
    """Averages of g_j(b)
    """
    u = gj(b,y,X,Z)
    
    return u.mean(axis=0)

#### Define estimator of Egg'



Next we define a function to estimate the covariance matrix of the moments.
Re-centering can be important in finite samples, even if irrelevant in
the limit.  Since we have $\mbox{E}g_j(\beta)=0$ under the null we may
as well use this information when constructing our weighting matrix.



In [1]:
def Omegahat(b,y,X,Z):
    u = gj(b,y,X,Z)

    # Recenter! We have Eu=0 under null.
    # Important to use this information.
    u = u - u.mean(axis=0) 
    
    return u.T@u/u.shape[0]

# Check construction:
Winv = Omegahat(.3,y,X,Z) 
print(Winv)
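
To see what re-centering changes, note that the centered and uncentered estimators differ by the outer product of the sample mean of the moments. The sketch below verifies this identity numerically; the array `g` is hypothetical simulated data standing in for observations of $g_j(b)$ at a $b$ away from the truth.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical moment observations with a deliberately non-zero mean,
# as would arise when b is far from the true parameter
g = rng.standard_normal((500, 3)) + np.array([0.5, -0.2, 0.1])
gbar = g.mean(axis=0, keepdims=True)   # 1 x 3 row of sample means

uncentered = g.T @ g / g.shape[0]
centered = (g - gbar).T @ (g - gbar) / g.shape[0]

# The two estimators differ exactly by the outer product of the sample mean
gap = uncentered - centered - gbar.T @ gbar
print(np.abs(gap).max())
```

When the sample mean of the moments is close to zero the two estimators nearly coincide, so the choice matters most when the first-step estimate is poor.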

Finally define the criterion function given a weighting matrix $W$:



In [1]:
def J(b,W,y,X,Z):

    m = gN(b,y,X,Z)
    
    return m.T@W@m*y.shape[0] # Scale by sample size

# Check construction
%matplotlib inline
from matplotlib import pyplot as plt

limiting_J = iid.chi2(ell-1)

B = np.linspace(-0,2,100)
W = inv(Winv)

plt.plot(B,[J(b,W,y,X,Z) for b in B.tolist()])
plt.axhline(limiting_J.isf(0.05),color='r')

#### Two Step Estimator



We next implement the two-step GMM estimator:



In [1]:
from scipy.optimize import minimize_scalar

def two_step_gmm(y,X,Z):

    # First step uses identity weighting matrix
    W1 = np.eye(Z.shape[1])

    b1 = minimize_scalar(lambda b: J(b,W1,y,X,Z)).x 

    W2 = inv(Omegahat(b1,y,X,Z))

    return minimize_scalar(lambda b: J(b,W2,y,X,Z))
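
Because the moments are linear in $b$, the minimizer of the criterion for a given symmetric $W$ also has the closed form $b=(X^\top ZWZ^\top X)^{-1}X^\top ZWZ^\top y$, which can serve as a cross-check on the numerical optimizer. The sketch below is not part of the original notebook: `linear_gmm_closed_form` and `crit` are hypothetical helpers, and the toy data are generated independently of the `dgp` function above.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def linear_gmm_closed_form(W, y, X, Z):
    """Minimizer of gN(b)' W gN(b) for scalar b and symmetric W."""
    ZX = Z.T @ X   # ell x 1
    Zy = Z.T @ y   # ell x 1
    return ((ZX.T @ W @ Zy) / (ZX.T @ W @ ZX)).item()

# Toy data (hypothetical): two instruments, true slope equal to 1,
# and X correlated with u so that it is endogenous
rng = np.random.default_rng(2)
N = 500
Z = rng.standard_normal((N, 2))
u = rng.standard_normal((N, 1))
X = Z @ np.array([[1.0], [0.5]]) + 0.5 * u + rng.standard_normal((N, 1))
y = X * 1.0 + u

W = np.eye(2)   # first-step (identity) weighting matrix

def crit(b):
    # Same quadratic form as the criterion, without the sample-size scaling
    m = (Z * (y - X * b)).mean(axis=0)
    return m @ W @ m

b_closed = linear_gmm_closed_form(W, y, X, Z)
b_numeric = minimize_scalar(crit).x
print(b_closed, b_numeric)
```

The two estimates should agree up to the optimizer's tolerance; scaling the criterion by the sample size does not change its minimizer.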

Now let's try it with an actual sample:



In [1]:
soltn = two_step_gmm(y,X,Z)

print("b=%f, J=%f, Critical J=%f" % (soltn.x,soltn.fun,limiting_J.isf(0.05)))

### Monte Carlo Draws



Next we'll generate a sample of estimates of $b$ by drawing repeated
samples of size $N$:



In [1]:
N = 1000

D = 1000

b_list = []
J_list = []
for d in range(D):
    soltn = two_step_gmm(*dgp(N,1,sigma_u,VXZ))
    b_list.append(soltn.x)
    J_list.append(soltn.fun)

_ = plt.hist(b_list,bins=int(np.ceil(np.sqrt(D)))) # Bin count based on number of draws

### Distribution of Monte Carlo draws vs. Asymptotic distribution



Compare Monte Carlo standard errors with asymptotic approximation:



In [1]:
# Limiting distribution of estimator

sigma_0 = np.sqrt(inv(Q.T@inv(Omega)@Q)/N)[0][0] # Limiting std., using population Omega
limiting_b = iid.norm(loc=1,scale=sigma_0) # Centered on the true beta=1 used in the draws

print("Monte Carlo standard errors: %g" % np.std(b_list))
print("Asymptotic approximation: %g" % sigma_0)
print("Critical value for J statistic: %g (5%%)" % limiting_J.isf(.05))
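
Standard errors capture only dispersion; finite-sample bias is often of equal interest for IV estimators. A minimal sketch of a summary that reports both follows. The helper `mc_summary` is hypothetical, and the four numbers passed to it are made-up values for illustration rather than output from the simulation.

```python
import numpy as np

def mc_summary(draws, truth):
    """Bias, standard deviation and RMSE of Monte Carlo draws."""
    draws = np.asarray(draws, dtype=float)
    bias = draws.mean() - truth
    sd = draws.std(ddof=1)
    rmse = np.sqrt(np.mean((draws - truth) ** 2))
    return {"bias": bias, "sd": sd, "rmse": rmse}

# Made-up draws, purely to show the call signature
example = mc_summary([0.9, 1.0, 1.1, 1.2], truth=1.0)
print(example)
```

With the simulation above, one would call it as `mc_summary(b_list, truth=1)`.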

Now construct a probability plot (Monte Carlo draws of $b$ vs. quantiles of
the limiting distribution):



In [1]:
from scipy.stats import probplot

_ = probplot(b_list,dist=limiting_b,fit=False,plot=plt)

Next, consider a $p$-$p$ plot for the $J$ statistics (recall these
should be asymptotically distributed $\chi^2_{\ell-1}$).



In [1]:
from statsmodels.api import ProbPlot

# statsmodels is too good for lists, so create 1-d array
testplots = ProbPlot(np.array(J_list),limiting_J) 
_ = testplots.qqplot(line='45')
_ = testplots.ppplot(line='45')
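
The plots can be complemented by a single number: the share of draws in which the $J$ test rejects at a nominal level, which should be close to that level if the $\chi^2$ approximation is good. The sketch below is an illustration only; `rejection_rate` is a hypothetical helper, and `fake_J` holds simulated $\chi^2_3$ draws standing in for the Monte Carlo $J$ statistics.

```python
import numpy as np
from scipy.stats import chi2

def rejection_rate(J_draws, df, alpha=0.05):
    """Share of J statistics exceeding the chi-squared critical value."""
    crit = chi2(df).isf(alpha)
    return float(np.mean(np.asarray(J_draws) > crit))

# Stand-in draws from the limiting distribution itself, so the
# empirical rejection rate should be close to the nominal 5%
rng = np.random.default_rng(3)
fake_J = rng.chisquare(3, size=20_000)
rate = rejection_rate(fake_J, df=3)
print(rate)
```

Applied to the simulation, the call would be `rejection_rate(J_list, df=ell-1)`; a rate well above or below the nominal level suggests the asymptotic approximation is poor at this sample size.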