# GMM for Over-Identification

Both `MEstimator` and `GMMEstimator` can be used interchangeably for many problems. The key difference between them is how the point estimates are found. `MEstimator` uses a root-finding algorithm to find the approximate zeroes of the estimating equations, whereas `GMMEstimator` builds a quadratic form (a matrix product) from the estimating equations and searches for its minimum. Broadly, these are simply two different ways to compute the point estimates. Which one is preferable in a particular scenario depends on the problem. For this reason, `MEstimator` could be replaced by `GMMEstimator` in any of the applied examples.
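
As a rough illustration of the two strategies, consider estimating a mean with the estimating function $Y - \theta$. This is only a sketch: it calls SciPy directly rather than `delicatessen`, the sample is made up, and the names `psi_mean`, `summed_ee`, and `quadratic_form` are hypothetical helpers. Root-finding looks for the zero of the summed estimating function, while the GMM-style approach minimizes a quadratic form in the averaged estimating function. With one equation and one parameter, the two should agree up to solver tolerance.

```python
import numpy as np
from scipy.optimize import minimize_scalar, root_scalar

# Made-up sample, only for illustration
rng = np.random.default_rng(0)
sample = rng.normal(loc=3.0, scale=1.0, size=200)


def psi_mean(theta):
    # Estimating function for the mean, one value per observation
    return sample - theta


def summed_ee(theta):
    # Root-finding target: the summed estimating function
    return np.sum(psi_mean(theta))


def quadratic_form(theta):
    # GMM-style target: g' W g, with g the averaged estimating function
    # and W a 1x1 identity weight matrix (an assumption of this sketch)
    g = np.atleast_1d(np.mean(psi_mean(theta)))
    return float(g @ np.eye(1) @ g)


theta_root = root_scalar(summed_ee, bracket=[-10.0, 10.0]).root
theta_min = minimize_scalar(quadratic_form, bounds=(-10.0, 10.0),
                            method="bounded").x

# Both should be close to the sample mean
print(theta_root, theta_min, np.mean(sample))
```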

However, GMM and `GMMEstimator` are also able to address problems where there are more estimating equations than parameters. These types of problems are called *over identified*. This is in contrast to *just identified* problems, where there is an equal number of estimating equations and parameters. Due to how the optimization problem is structured, only minimization (and thus `GMMEstimator`) can be used in this scenario. 
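
In symbols, and as a generic sketch rather than a description of `delicatessen`'s internals, write $\psi(O_i; \theta)$ for the stacked estimating functions with $q$ rows and let $\theta$ have $p$ elements. The symbols $q$, $p$, $\bar{\psi}_n$, and $\mathbf{W}$ are notation introduced here for illustration. The GMM point estimate minimizes a quadratic form in the averaged estimating functions
$$ \hat{\theta} = \arg\min_{\theta} \; \bar{\psi}_n(\theta)^T \, \mathbf{W} \, \bar{\psi}_n(\theta), \qquad \bar{\psi}_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} \psi(O_i; \theta) $$
where $\mathbf{W}$ is a $q \times q$ positive-definite weight matrix. When $q = p$ (just identified), the minimum is typically zero and is attained at the root, so both estimators target the same solution. When $q > p$ (over identified), there is generally no $\theta$ that zeroes all $q$ equations at once, but the quadratic form still has a minimizer.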

## Setup 

To illustrate the use of `GMMEstimator`, we will consider an instrumental variable analysis for the effect of $A$ on $Y$. To match the *over* identified setting, we will have access to two different instruments. Data for this example are simulated according to the following mechanism.

In [1]:
import numpy as np
import scipy as sp
import pandas as pd

import delicatessen as deli
from delicatessen import MEstimator, GMMEstimator
from delicatessen.estimating_equations import ee_regression
from delicatessen.utilities import inverse_logit

print("Versions")
print("NumPy:        ", np.__version__)
print("SciPy:        ", sp.__version__)
print("pandas:       ", pd.__version__)
print("Delicatessen: ", deli.__version__)

Versions
NumPy:         1.25.2
SciPy:         1.11.2
pandas:        1.4.1
Delicatessen:  3.0


In [2]:
# Set up for dgm
np.random.seed(777)
n = 500

d = pd.DataFrame()
d['W'] = np.random.binomial(n=1, p=0.25, size=n)
d['Z1'] = np.random.normal(scale=0.5, size=n)
d['Z2'] = np.random.normal(scale=0.5, size=n)
d['A'] = d['Z1'] + d['Z2'] + np.random.normal(size=n)
d['Y'] = 2*d['A'] - 1*d['W']*d['A'] + np.random.normal(scale=1.0, size=n)
d.describe()

| | W | Z1 | Z2 | A | Y |
|:--|--:|--:|--:|--:|--:|
| count | 500.0 | 500.0 | 500.0 | 500.0 | 500.0 |
| mean | 0.226 | 0.033211 | 0.003048 | 0.068616 | 0.137321 |
| std | 0.418658 | 0.517523 | 0.523425 | 1.216829 | 2.408356 |
| min | 0.0 | -1.2281 | -1.545184 | -4.180748 | -8.943065 |
| 25% | 0.0 | -0.35553 | -0.326217 | -0.680898 | -1.353907 |
| 50% | 0.0 | 0.033815 | 0.016186 | 0.083086 | 0.189151 |
| 75% | 0.0 | 0.367397 | 0.348956 | 0.87149 | 1.644282 |
| max | 1.0 | 1.732386 | 2.135354 | 3.534259 | 6.914683 |


## Instrumental Variable Example 1

For this instrumental variable analysis, we use the following estimating equation
$$ E[Z(Y - \beta_a A)] = 0 $$
where $Z$ is an instrument. In our case, there are two instruments available, $Z_1,Z_2$. We might consider applying this estimating equation to each instrument separately, giving each instrument its own parameter. The following code does this.

In [3]:
z1 = np.asarray(d['Z1'])
z2 = np.asarray(d['Z2'])
a = np.asarray(d['A'])
y = np.asarray(d['Y'])

In [4]:
def psi(theta):
    ee_z1 = z1 * (y - theta[0]*a)
    ee_z2 = z2 * (y - theta[1]*a)
    return np.vstack([ee_z1, ee_z2])

In [5]:
estr = MEstimator(psi, init=[0., 0.])
estr.estimate()
print(estr.theta)
print("95% CI")
print(estr.confidence_intervals())

[1.76248289 1.86172303]
95% CI
[[1.54446967 1.98049611]
 [1.67522582 2.04822024]]


In [6]:
estr = GMMEstimator(psi, init=[0., 0.])
estr.estimate(solver='nelder-mead')
print(estr.theta)
print("95% CI")
print(estr.confidence_intervals())

[1.76248289 1.86172303]
95% CI
[[1.54446966 1.98049611]
 [1.67522582 2.04822024]]
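
Because each of these estimating equations is linear in its parameter, the root is also available in closed form as $\hat{\beta}_j = \sum_i Z_{ji} Y_i / \sum_i Z_{ji} A_i$. The sketch below checks this on freshly simulated data. The generator, the seed, the simplified outcome model (no $W$), and the helper name `iv_ratio` are assumptions made for illustration, so the numbers will not match the output above.

```python
import numpy as np

# Simulated data loosely following the mechanism above (new seed, no W)
rng = np.random.default_rng(2024)
n_demo = 500
z1_demo = rng.normal(scale=0.5, size=n_demo)
z2_demo = rng.normal(scale=0.5, size=n_demo)
a_demo = z1_demo + z2_demo + rng.normal(size=n_demo)
y_demo = 2 * a_demo + rng.normal(size=n_demo)


def iv_ratio(z, a, y):
    # Solves sum(z * (y - beta * a)) = 0 for beta
    return np.sum(z * y) / np.sum(z * a)


beta_z1 = iv_ratio(z1_demo, a_demo, y_demo)
beta_z2 = iv_ratio(z2_demo, a_demo, y_demo)

# Each estimate zeroes its own estimating equation (up to rounding)
resid_z1 = np.sum(z1_demo * (y_demo - beta_z1 * a_demo))
resid_z2 = np.sum(z2_demo * (y_demo - beta_z2 * a_demo))
print(beta_z1, beta_z2, resid_z1, resid_z2)
```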


This analysis (whether done via `MEstimator` or `GMMEstimator`) gives us two different estimates. These are close to each other (given the precision, as indicated by the confidence or compatibility intervals), but how do we select which one to report? We could report both, but we could also revise our estimating equations to correspond to an *over* identified setting.

Here, our stacked over identified estimating equations are 
$$ E
\begin{bmatrix}
Z_1(Y - \beta A) \\
Z_2(Y - \beta A) \\
\end{bmatrix}
= 0 
$$
Note that there is only a single parameter being estimated, but we have two separate equations. 

Let's set this up as an estimating function for `delicatessen`

In [7]:
def psi(theta):
    ee_z1 = z1 * (y - theta*a)
    ee_z2 = z2 * (y - theta*a)
    return np.vstack([ee_z1, ee_z2])

Now, let's see what happens when we try to use `MEstimator` for this estimating function

In [8]:
estr = MEstimator(psi, init=[0.,])
estr.estimate()

ValueError: The number of initial values and the number of rows returned by `stacked_equations` should be equal but there are 1 initial values and the `stacked_equations` function returns 2 row(s).

We get an error from `delicatessen`. This error states that the dimension of the parameters must match the dimension of the estimating equations. This is because of how the root-finding procedure is structured. We can't use `MEstimator` for this type of problem.

However, we are still able to use `GMMEstimator`. Let's see how that works

In [9]:
estr = GMMEstimator(psi, init=[0.,])
estr.estimate()
print(estr.theta)
print("95% CI")
print(estr.confidence_intervals())

[1.81930988]
95% CI
[[1.6825304  1.95608936]]


Here, we are able to obtain an estimate. Perhaps more interestingly, our confidence interval is narrower (confidence limit difference, CLD, of 0.274) than the intervals from the previous implementation that treated the instruments separately (CLDs of 0.436 and 0.373). This is because we can leverage more information to estimate a single parameter in this second setup (under the assumption that both are valid instruments that are not weak).
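
One way to see where the gain in precision can come from is a textbook two-step GMM calculation for this linear problem. The sketch below is not `delicatessen`'s algorithm (its weighting scheme and variance estimator may differ). It uses newly simulated data, and `gmm_linear_iv` is a hypothetical helper. For a weight matrix $\mathbf{W}$, the minimizer is $\hat{\beta} = (c^T \mathbf{W} m) / (c^T \mathbf{W} c)$, where $m$ and $c$ hold the sample means of $Z_j Y$ and $Z_j A$.

```python
import numpy as np

rng = np.random.default_rng(2024)
n_demo = 500
Z_demo = rng.normal(scale=0.5, size=(n_demo, 2))   # two instruments
a_demo = Z_demo.sum(axis=1) + rng.normal(size=n_demo)
y_demo = 2 * a_demo + rng.normal(size=n_demo)


def gmm_linear_iv(Z, a, y, weight):
    # Closed-form minimizer of g(beta)' W g(beta), with g(beta) = m - beta * c
    m = (Z * y[:, None]).mean(axis=0)
    c = (Z * a[:, None]).mean(axis=0)
    return (c @ weight @ m) / (c @ weight @ c)


# Step 1: identity weight matrix
beta_step1 = gmm_linear_iv(Z_demo, a_demo, y_demo, np.eye(2))

# Step 2: weight by the inverse covariance of the estimating functions
g_i = Z_demo * (y_demo - beta_step1 * a_demo)[:, None]
omega = g_i.T @ g_i / n_demo
beta_step2 = gmm_linear_iv(Z_demo, a_demo, y_demo, np.linalg.inv(omega))

# Asymptotic standard errors: combined versus each instrument alone
c = (Z_demo * a_demo[:, None]).mean(axis=0)
se_combined = np.sqrt(1 / (c @ np.linalg.inv(omega) @ c) / n_demo)
se_single = np.sqrt(np.diag(omega) / c**2 / n_demo)
print(beta_step1, beta_step2, se_combined, se_single)
```

When the same covariance estimate `omega` is used for all three standard errors, the combined one cannot exceed either single-instrument one, which mirrors the narrower interval seen above.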

## Instrumental Variable Example 2

To develop a slightly more complicated example, we are going to extend the previous instrumental variable analysis with some transportability methods. Here, interest is in the effect of $A$ on $Y$, except we are interested in the effect in a different population. Importantly, we think the only relevant variable differing between our populations is $W$, which was measured in both populations. To transport, we are going to use inverse odds of sampling weights (see the Cole et al. applied example for more details). Again, we have the same instruments $Z_1,Z_2$.

Below is some simulated data corresponding to this scenario

In [10]:
# Set up for dgm
np.random.seed(777)
n = 500

# External data
d0 = pd.DataFrame()
d0['W'] = np.random.binomial(n=1, p=0.25, size=n)
d0['Z1'] = np.random.normal(scale=0.5, size=n)
d0['Z2'] = np.random.normal(scale=0.5, size=n)
d0['A'] = d0['Z1'] + d0['Z2'] + np.random.normal(size=n)
d0['Y'] = 2*d0['A'] - 1*d0['W']*d0['A'] + np.random.normal(scale=1.0, size=n)
d0['S'] = 0

# Target data
d1 = pd.DataFrame()
d1['W'] = np.random.binomial(n=1, p=0.75, size=n)
d1['Z1'] = -99
d1['Z2'] = -99
d1['A'] = -99
d1['Y'] = -99
d1['S'] = 1

# Stacking data together
d = pd.concat([d1, d0], ignore_index=True)
d['C'] = 1

In [11]:
W = np.asarray(d[['C', 'W']])
z1 = np.asarray(d['Z1'])
z2 = np.asarray(d['Z2'])
a = np.asarray(d['A'])
y = np.asarray(d['Y'])
s = np.asarray(d['S'])

For this problem, we are going to estimate inverse odds of sampling weights. To do that, we need to estimate the probability of $S$ (the source population indicator) given $W$. We will do this using logistic regression. Below is an estimating equation that illustrates this process

In [12]:
def psi(theta):
    # This is how the inverse odds weights will be computed
    # pi_s = inverse_logit(np.dot(W, theta))
    # iosw = (1 - s) * pi_s / (1 - pi_s)
    return ee_regression(theta=theta, y=s, X=W, model='logistic')

In [13]:
estr = GMMEstimator(psi, init=[0., 0.])
estr.estimate()
estr.theta

array([-1.0832274 ,  2.26663623])
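
For a logistic model, the odds $\pi_s / (1 - \pi_s)$ simplify to the exponentiated linear predictor, so the weights follow directly from the fitted coefficients. The sketch below plugs in the point estimates printed above. The two-row design matrix `W_levels` and the use of `scipy.special.expit` in place of `inverse_logit` are choices made for this illustration.

```python
import numpy as np
from scipy.special import expit

alpha_hat = np.array([-1.0832274, 2.26663623])   # estimates reported above
W_levels = np.array([[1, 0],                     # intercept, W = 0
                     [1, 1]])                    # intercept, W = 1

pi_s = expit(W_levels @ alpha_hat)               # Pr(S = 1 | W)
odds = pi_s / (1 - pi_s)                         # weight for units with S = 0

# For logistic regression the odds equal exp(linear predictor)
print(odds, np.exp(W_levels @ alpha_hat))
```

Units in the external data with $W = 1$ receive the larger weight, which is expected because $W = 1$ is more common in the target data than in the external data.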

To combine these weights with the instrumental variable setup from before, we will use the following estimating equations
$$ E
\begin{bmatrix}
Z_1(Y - \beta A) \pi_s(W) (1-S) \\
Z_2(Y - \beta A) \pi_s(W) (1-S) \\
\psi_s(S,W; \alpha) \\
\end{bmatrix}
= 0 
$$
where $\pi_s(W)$ denotes the inverse odds of sampling weight (computed in the code below as `pi_s / (1 - pi_s)`) and $\psi_s$ is the estimating function for the logistic model for $S$ given $W$, with parameters $\alpha$. Note that only the external data contribute to the instrumental variable functions (since $A,Y,Z_1,Z_2$ are missing in the target data). Code for these equations is given in the following

In [14]:
def psi(theta):
    beta = theta[0]
    alpha = theta[1:]

    # Calculating inverse odds of sampling weights
    pi_s = inverse_logit(np.dot(W, alpha))
    iosw = (1 - s) * pi_s / (1 - pi_s)

    # Estimating functions
    ee_sample = ee_regression(theta=alpha, y=s, X=W, model='logistic')
    ee_z1 = z1 * (y - beta*a) * iosw * (1 - s)
    ee_z2 = z2 * (y - beta*a) * iosw * (1 - s)
    return np.vstack([ee_z1, ee_z2, ee_sample])

In [15]:
estr = GMMEstimator(psi, init=[0, 0, 0])
estr.estimate()
estr.theta

array([ 1.36473524, -1.08341306,  2.26674163])
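
A quick way to confirm that this system is over identified is to count rows: the stacked function returns four rows (two instrumental variable equations plus two logistic score equations) for three parameters. The sketch below reproduces that count with NumPy and SciPy only. The hand-written logistic score is a stand-in for `ee_regression`, and the data and the name `psi_demo` are made up for illustration.

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(1)
n_demo = 50
s_demo = rng.binomial(n=1, p=0.5, size=n_demo)
X_demo = np.column_stack([np.ones(n_demo),
                          rng.binomial(n=1, p=0.5, size=n_demo)])
z_demo = rng.normal(scale=0.5, size=(2, n_demo))
a_demo = z_demo.sum(axis=0) + rng.normal(size=n_demo)
y_demo = 2 * a_demo + rng.normal(size=n_demo)


def psi_demo(theta):
    beta, alpha = theta[0], np.asarray(theta[1:])
    pi_s = expit(X_demo @ alpha)
    iosw = (1 - s_demo) * pi_s / (1 - pi_s)
    ee_iv = z_demo * (y_demo - beta * a_demo) * iosw   # 2 rows, one per instrument
    ee_score = (s_demo - pi_s) * X_demo.T              # 2 rows, logistic score
    return np.vstack([ee_iv, ee_score])


theta_demo = [0.0, 0.0, 0.0]
n_equations = psi_demo(theta_demo).shape[0]
print(n_equations, len(theta_demo))
```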

This example highlights how we can combine *over* and *just* identified parameters into the same estimation procedure with GMM. 

Note: you may notice that the nuisance model parameters differ slightly from those of the logistic model fit on its own. From my understanding, these differences are due to how the weight matrix is updated and the contributions across the different estimating equations. If you dig further into this example, you will also note that the variance estimates for the nuisance parameters differ as well. Again, this seems to be a result of the differing point estimates. If this information is not correct, please reach out to me.

## Conclusion

This completes the illustration of how `GMMEstimator` can be used for *over* identified parameters.