# Chapter 7: M-Estimation (Estimating Equations)
From Boos DD, & Stefanski LA. (2013). M-estimation (estimating equations). In Essential Statistical Inference (pp. 297-337). Springer, New York, NY.

Examples of M-Estimation provided in that chapter are replicated here using `delicatessen`. Reading the chapter and looking at the corresponding implementations is likely to be the best approach to learning both the theory and application of M-Estimation. 

In [1]:
# Initial setup
import numpy as np
import scipy as sp
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

import delicatessen
from delicatessen import MEstimator

np.random.seed(80950841)

print("NumPy version:       ", np.__version__)
print("SciPy version:       ", sp.__version__)
print("Pandas version:      ", pd.__version__)
print("Delicatessen version:", delicatessen.__version__)

NumPy version:        1.22.2
SciPy version:        1.9.2
Pandas version:       1.4.1
Delicatessen version: 1.1


In [2]:
# Generating a generic data set for following examples
n = 200
data = pd.DataFrame()
data['Y'] = np.random.normal(loc=10, scale=2, size=n)
data['X'] = np.random.normal(loc=5, size=n)
data['C'] = 1

### 7.2.2 Sample Mean and Variance
The first example estimates the mean and variance of $Y$ by stacking their estimating equations:

$$\psi(Y_i, \theta) = 
\begin{bmatrix}
    Y_i - \theta_1\\
    (Y_i - \theta_1)^2 - \theta_2
\end{bmatrix} $$

The top estimating equation is for the mean, and the bottom estimating equation is for the variance of $Y$ (computed with divisor $n$). Here, both the by-hand and built-in estimating equations are demonstrated.

In [3]:
def psi_mean_var(theta):
    """By-hand stacked estimating equations"""
    return (data['Y'] - theta[0],
            (data['Y'] - theta[0])**2 - theta[1])



estr = MEstimator(psi_mean_var, init=[0, 0])
estr.estimate()

print("=========================================================")
print("Mean & Variance")
print("=========================================================")
print("M-Estimation: by-hand")
print("Theta:", estr.theta)
print("Var:  \n", estr.asymptotic_variance)
print("---------------------------------------------------------")
print("Closed-Form")
print("Mean: ", np.mean(data['Y']))
print("Var:  ", np.var(data['Y'], ddof=0))
print("=========================================================")

Mean & Variance
M-Estimation: by-hand
Theta: [10.16284625  4.11208477]
Var:  
 [[ 4.11208477 -1.6739995 ]
 [-1.6739995  36.16386927]]
---------------------------------------------------------
Closed-Form
Mean:  10.162846250198633
Var:   4.112084770881207


Notice that $\theta_2$ also matches the first element of the (asymptotic) variance matrix. These two values should match because both estimate the variance of $Y$, which is the asymptotic variance of $\sqrt{n}(\hat{\theta}_1 - \theta_1)$. Further, as shown, the closed-form solutions for the mean and variance are equal to the M-Estimation results.
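
To make the match between $\theta_2$ and the first element of the variance matrix concrete, the sandwich variance can be sketched by hand with NumPy alone. This is a minimal illustration, not how `delicatessen` computes it internally: the data are simulated here with a hypothetical seed (so the numbers differ from the output above), and the derivative matrix is written out analytically.

```python
import numpy as np

# Hypothetical data for illustration (not the notebook's data set)
rng = np.random.default_rng(2024)
y = rng.normal(loc=10, scale=2, size=200)
n = y.shape[0]

# Closed-form roots of the stacked estimating equations
theta1 = np.mean(y)
theta2 = np.mean((y - theta1) ** 2)

# Estimating functions evaluated at the root, shape (2, n)
psi = np.vstack([y - theta1,
                 (y - theta1) ** 2 - theta2])

# Bread: negative mean derivative of psi with respect to theta.
# The lower-left term is 2 * mean(y - theta1), which is zero at the root.
bread = np.array([[1.0, 0.0],
                  [2.0 * np.mean(y - theta1), 1.0]])

# Meat: mean outer product of the estimating functions
meat = psi @ psi.T / n

# Sandwich: bread^{-1} meat bread^{-T}
bread_inv = np.linalg.inv(bread)
asymptotic_variance = bread_inv @ meat @ bread_inv.T

print("theta2:   ", theta2)
print("Sandwich:\n", asymptotic_variance)
```

Because the bread is (numerically) the identity matrix, the sandwich reduces to the meat, whose first element is the mean of $(Y_i - \theta_1)^2$, which is exactly $\theta_2$.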

The following uses the built-in estimating equation to estimate the mean and variance.

In [4]:
from delicatessen.estimating_equations import ee_mean_variance

def psi_mean_var_default(theta):
    """Built-in stacked estimating equations"""
    return ee_mean_variance(y=np.asarray(data['Y']), theta=theta)


estr = MEstimator(psi_mean_var_default, init=[0, 0])
estr.estimate()

print("=========================================================")
print("Mean & Variance")
print("=========================================================")
print("M-Estimation: built-in")
print("Theta:", estr.theta)
print("Var:  \n", estr.asymptotic_variance)
print("=========================================================")

Mean & Variance
M-Estimation: built-in
Theta: [10.16284625  4.11208477]
Var:  
 [[ 4.11208477 -1.6739995 ]
 [-1.6739995  36.16386927]]


### 7.2.3 Ratio Estimator
The next example is a ratio estimator, which can be written as either a single estimating equation or as three stacked estimating equations. First is the single estimating equation version:

$$\psi(Y_i, \theta) = 
\begin{bmatrix}
    Y_i - X_i \times \theta_1
\end{bmatrix} $$


In [5]:
def psi_ratio(theta):
    return data['Y'] - data['X']*theta


estr = MEstimator(psi_ratio, init=[0, ])
estr.estimate()

print("=========================================================")
print("Ratio Estimator")
print("=========================================================")
print("M-Estimation: single estimating equation")
print("Theta:", estr.theta)
print("Var:  ",estr.asymptotic_variance)
print("---------------------------------------------------------")
print("Closed-Form")

theta = np.mean(data['Y']) / np.mean(data['X'])
b = 1 / np.mean(data['X'])**2
c = np.mean((data['Y'] - theta*data['X'])**2)
var = b * c
print("Ratio:",theta)
print("Var:  ",var)

print("=========================================================")

Ratio Estimator
M-Estimation: single estimating equation
Theta: [2.08234516]
Var:   [[0.33842324]]
---------------------------------------------------------
Closed-Form
Ratio: 2.0823451609959682
Var:   0.33842329733168625
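
The closed-form variance used in the comparison follows from the sandwich formula applied to the single estimating equation. As a short worked derivation (with $A$ and $B$ denoting the usual bread and meat terms, notation assumed here rather than taken from the code):

$$A(\theta) = E\left[-\frac{\partial}{\partial \theta}(Y_i - X_i \theta)\right] = E[X_i], \qquad B(\theta) = E\left[(Y_i - X_i \theta)^2\right]$$

$$V(\theta) = A(\theta)^{-1} B(\theta) A(\theta)^{-1} = \frac{E\left[(Y_i - X_i \theta)^2\right]}{E[X_i]^2}, \qquad \hat{V} = \frac{n^{-1} \sum_{i=1}^n (Y_i - \hat{\theta} X_i)^2}{\bar{X}^2}$$

In the code above, `b` holds $1 / \bar{X}^2$ and `c` holds the sample version of $B$, so `b * c` is $\hat{V}$.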


The next example is the ratio estimator consisting of 3 estimating equations. 

$$\psi(Y_i, \theta) = 
\begin{bmatrix}
    Y_i - \theta_1\\
    X_i - \theta_2\\
    \theta_1 - \theta_2 \theta_3
\end{bmatrix} $$

Note that the last element is the ratio. To keep the dimensions correct, the last element needs to be multiplied by an array of $n$ constants. This is done via the `np.ones` trick.

In [6]:
def psi_ratio_three(theta):
    return (data['Y'] - theta[0],
            data['X'] - theta[1],
            np.ones(data.shape[0])*theta[0] - theta[1]*theta[2])


estr = MEstimator(psi_ratio_three, init=[0.1, 0.1, 0.1])
estr.estimate()

print("=========================================================")
print("Ratio Estimator")
print("=========================================================")
print("M-Estimation: three estimating equations")
print("Theta:", estr.theta)
print("Var:  \n", estr.asymptotic_variance)
print("=========================================================")

Ratio Estimator
M-Estimation: three estimating equations
Theta: [10.16284625  4.88048112  2.08234516]
Var:  
 [[ 4.11208477  0.04326814  0.82409608]
 [ 0.04326814  0.95223639 -0.39742316]
 [ 0.82409608 -0.39742316  0.3384232 ]]
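
The three-equation version is effectively the delta method applied to the two means. As a minimal sketch (using simulated data with a hypothetical seed, so the numbers differ from the output above), the delta-method variance of the ratio can be compared against the single-equation sandwich variance using only NumPy:

```python
import numpy as np

# Hypothetical data for illustration (not the notebook's data set)
rng = np.random.default_rng(7)
n = 200
y = rng.normal(loc=10, scale=2, size=n)
x = rng.normal(loc=5, scale=1, size=n)

mean_y, mean_x = np.mean(y), np.mean(x)
ratio = mean_y / mean_x

# Sandwich variance from the single estimating equation Y - X*theta
var_sandwich = np.mean((y - ratio * x) ** 2) / mean_x ** 2

# Delta method: gradient of theta1 / theta2 with respect to (theta1, theta2)
grad = np.array([1 / mean_x, -mean_y / mean_x ** 2])
cov = np.cov(np.vstack([y, x]), ddof=0)   # 2x2 covariance of (Y, X), divisor n
var_delta = grad @ cov @ grad

print("Sandwich:", var_sandwich)
print("Delta:   ", var_delta)
```

The two quantities agree algebraically, which is why the last diagonal element of the three-equation variance matrix matches the variance from the single-equation version.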


### 7.2.4 Delta Method via M-Estimation
M-estimation also allows for a generalization of the delta method. Below is an example with two transformations of the variance, $\sqrt{\theta_2}$ and $\log(\theta_2)$:

$$\psi(Y_i, \theta) = 
\begin{bmatrix}
    Y_i - \theta_1\\
    (Y_i - \theta_1)^2 - \theta_2\\
    \sqrt{\theta_2} - \theta_3\\
    \log(\theta_2) - \theta_4
\end{bmatrix} $$


In [7]:
def psi_delta(theta):
    return (data['Y'] - theta[0],
            (data['Y'] - theta[0])**2 - theta[1],
            np.ones(data.shape[0])*np.sqrt(theta[1]) - theta[2],
            np.ones(data.shape[0])*np.log(theta[1]) - theta[3])


estr = MEstimator(psi_delta, init=[1., 1., 1., 1.])
estr.estimate()

print("=========================================================")
print("Delta Method")
print("=========================================================")
print("M-Estimation")
print("Theta:", estr.theta)
print("Var:  \n", estr.variance)
print("=========================================================")

Delta Method
M-Estimation
Theta: [10.16284625  4.11208477  2.0278276   1.41393014]
Var:  
 [[ 0.02056042 -0.00837    -0.00206379 -0.00203546]
 [-0.00837     0.18081935  0.04458452  0.04397267]
 [-0.00206379  0.04458452  0.01099318  0.01084232]
 [-0.00203546  0.04397267  0.01084232  0.01069352]]
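
As a cross-check, the classical delta method can be applied by hand to the printed estimates. The sketch below copies $\theta_2$ and its estimated variance (element `[1, 1]` of the matrix) from the output above and multiplies by the squared derivatives of the two transformations; the variable names are illustrative only.

```python
import numpy as np

# Values copied from the printed output above
theta2 = 4.11208477        # estimated variance of Y
var_theta2 = 0.18081935    # estimated variance of theta2, element [1, 1]

# Derivatives of the transformations, evaluated at theta2
d_sqrt = 1 / (2 * np.sqrt(theta2))   # derivative of sqrt(t)
d_log = 1 / theta2                   # derivative of log(t)

# First-order delta method
var_sqrt = d_sqrt ** 2 * var_theta2
var_log = d_log ** 2 * var_theta2
cov_sqrt_log = d_sqrt * d_log * var_theta2

print("Var(sqrt):     ", var_sqrt)
print("Var(log):      ", var_log)
print("Cov(sqrt, log):", cov_sqrt_log)
```

Up to rounding of the copied inputs, these should agree with elements `[2, 2]`, `[3, 3]`, and `[2, 3]` of the variance matrix printed above.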


### 7.2.6 Instrumental Variable Estimation
This section presents two variations on the estimating equations for instrumental variable analyses. $X$ is the exposure of interest, $X^*$ is the mismeasured (observed) version of $X$, $T$ is the instrument for $X$, and $Y$ is the outcome of interest. We are interested in estimating $\beta_1$ of:
$$Y_i = \beta_0 + \beta_1 X_i + e_i$$
Since $X$ is only observed through the mismeasured $X^*$, we can't immediately estimate $\beta_1$. Instead, we need to use an instrumental variable approach. Below is some generated data consistent with this measurement error story:

In [8]:
# Generating some data
n = 500
data = pd.DataFrame()
data['X'] = np.random.normal(size=n)
data['Y'] = 0.5 + 2*data['X'] + np.random.normal(loc=0, size=n)
data['X-star'] = data['X'] + np.random.normal(loc=0, size=n)
data['T'] = -0.75 - 1*data['X'] + np.random.normal(loc=0, size=n)

The estimating equations are
$$\psi(Y_i,X_i^*,T_i, \theta) = 
\begin{bmatrix}
    \theta_1 - T_i\\
    (Y_i - \theta_2X_i^*)(\theta_1 - T_i)
\end{bmatrix} $$
where $\theta_1$ is the mean of the instrument, and $\theta_2$ corresponds to $\beta_1$.

In [9]:
def psi_instrument(theta):
    return (theta[0] - data['T'],
            (data['Y'] - data['X-star']*theta[1])*(theta[0] - data['T']))


estr = MEstimator(psi_instrument, init=[0.1, 0.1])
estr.estimate()

print("=========================================================")
print("Instrumental Variable")
print("=========================================================")
print("M-Estimation")
print("Theta:", estr.theta)
print("Var:  \n", estr.variance)
print("=========================================================")

Instrumental Variable
M-Estimation
Theta: [-0.89989957  2.01777751]
Var:  
 [[ 0.00430115 -0.0006694 ]
 [-0.0006694   0.023841  ]]
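
The second estimating equation has a closed-form root: $\theta_2$ is the ratio of the sample covariance of $Y$ and $T$ to the sample covariance of $X^*$ and $T$. The sketch below illustrates this with NumPy only. The data are simulated with the same mechanism as above but a hypothetical seed, so the numbers will not reproduce the printed output.

```python
import numpy as np

# Hypothetical data following the same data-generating mechanism
rng = np.random.default_rng(11)
n = 500
x = rng.normal(size=n)                     # true exposure (unobserved in practice)
y = 0.5 + 2 * x + rng.normal(size=n)       # outcome
x_star = x + rng.normal(size=n)            # mismeasured exposure
t = -0.75 - 1 * x + rng.normal(size=n)     # instrument

# Root of the first estimating equation: mean of the instrument
theta1 = np.mean(t)

# Root of the second estimating equation, solved for theta2
theta2 = np.sum(y * (theta1 - t)) / np.sum(x_star * (theta1 - t))

# Equivalent ratio of sample covariances
cov_yt = np.cov(y, t, ddof=0)[0, 1]
cov_xt = np.cov(x_star, t, ddof=0)[0, 1]
iv_ratio = cov_yt / cov_xt

# The summed estimating function should be zero at the root
score = np.sum((y - theta2 * x_star) * (theta1 - t))

print("theta2:          ", theta2)
print("Covariance ratio:", iv_ratio)
```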


Another set of estimating equations for this instrumental variable approach is
$$\psi(Y_i,X_i^*,T_i, \theta) = 
\begin{bmatrix}
    \theta_1 - T_i\\
    \theta_2 - X_i^* \\
    (Y_i - \theta_3 X_i^*)(\theta_2 - X_i^*)\\
    (Y_i - \theta_4 X_i^*)(\theta_1 - T_i)
\end{bmatrix} $$
This set of estimating equations further allows for inference on the difference between $\beta_1$ and the coefficient for $Y$ given $X^*$. Here, $\theta_1$ is the mean of the instrument, $\theta_2$ is the mean of the mismeasured value of $X$, $\theta_3$ corresponds to the coefficient for $Y$ given $X^*$, and $\theta_4$ is $\beta_1$.

In [10]:
def psi(theta):
    return (theta[0] - data['T'],
            theta[1] - data['X-star'],
            (data['Y'] - data['X-star']*theta[2])*(theta[1] - data['X-star']),
            (data['Y'] - data['X-star']*theta[3])*(theta[0] - data['T'])
            )


estr = MEstimator(psi, init=[0.1, 0.1, 0.1, 0.1])
estr.estimate()

print("=========================================================")
print("Instrumental Variable")
print("=========================================================")
print("M-Estimation")
print("Theta:", estr.theta)
print("Var:  \n", estr.variance)
print("=========================================================")

Instrumental Variable
M-Estimation
Theta: [-0.89989957  0.02117577  0.95717618  2.01777751]
Var:  
 [[ 0.00430115 -0.00207361 -0.00011136 -0.0006694 ]
 [-0.00207361  0.0041239   0.00023703  0.00039778]
 [-0.00011136  0.00023703  0.00302462  0.00171133]
 [-0.0006694   0.00039778  0.00171133  0.023841  ]]
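
The gap between $\theta_3$ and $\theta_4$ reflects attenuation from measurement error. As a brief worked derivation, write $X_i^* = X_i + U_i$, where $U_i$ is a symbol introduced here for the measurement error and is assumed independent of $X_i$ and $e_i$. Solving the third estimating equation gives the usual least-squares slope, which converges to an attenuated version of $\beta_1$:

$$\hat{\theta}_3 = \frac{\sum_{i=1}^n Y_i (X_i^* - \bar{X}^*)}{\sum_{i=1}^n X_i^* (X_i^* - \bar{X}^*)} \;\longrightarrow\; \frac{Cov(Y, X^*)}{Var(X^*)} = \beta_1 \frac{\sigma_X^2}{\sigma_X^2 + \sigma_U^2}$$

In the simulated data, $\sigma_X^2 = \sigma_U^2 = 1$ and $\beta_1 = 2$, so $\theta_3$ is expected to be near $2 \times 0.5 = 1$, while $\theta_4$ targets $\beta_1 = 2$.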


### 7.4.1 Robust Location Estimation
The robust location estimator reduces the influence of outliers by applying bounds. The robust mean with a simple bounding function is
   
$$\psi(Y_i, \theta_1) = g_k(Y_i) - \theta_1$$

where $g_k$ bounds its argument at $\pm k$: $g_k(Y_i) = k$ if $Y_i > k$, $g_k(Y_i) = -k$ if $Y_i < -k$, and $g_k(Y_i) = Y_i$ otherwise. Below is the estimating equation translated into code.

In [11]:
# Generating some generic data
y = np.random.normal(size=250)
n = y.shape[0]

In [12]:
def psi_robust_mean(theta):
    k = 3                            # Bound value
    yr = np.where(y > k, k, y)       # Applying upper bound
    yr = np.where(yr < -k, -k, yr)   # Applying lower bound to the upper-bounded values
    return yr - theta


estr = MEstimator(psi_robust_mean, init=[0.])
estr.estimate()

print("=========================================================")
print("Robust Location Estimation")
print("=========================================================")
print("M-Estimation")
print("Theta:", estr.theta)
print("Var:  \n", estr.variance)
print("=========================================================")

Robust Location Estimation
M-Estimation
Theta: [0.03056108]
Var:  
 [[0.00370521]]
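
Because the bounding function here does not depend on $\theta_1$, the root of this estimating equation is simply the mean of the bounded values. A minimal sketch with a small made-up array (the helper name `bounded` is hypothetical) shows the bounding step and the resulting estimate:

```python
import numpy as np


def bounded(y, k):
    """Bound values to the interval [-k, k] (the function g_k)."""
    return np.clip(y, -k, k)


# Made-up values with one outlier in each tail
y_demo = np.array([-5.0, 0.0, 1.0, 10.0])
k = 3

g = bounded(y_demo, k)         # outliers are pulled in to -3 and 3
theta_hat = np.mean(g)         # root of sum(g_k(Y_i) - theta) = 0

# Sandwich variance: the bread is 1, the meat is the mean of (g - theta)^2
var_hat = np.mean((g - theta_hat) ** 2) / y_demo.shape[0]

print("Bounded values:", g)
print("Robust mean:   ", theta_hat)
print("Variance:      ", var_hat)
```

Here `np.clip` applies both bounds in one step, which avoids the need to chain two `np.where` calls.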


### 7.5.1 Linear Model with Random $X$
Next, we can run a linear regression model. The variance here is robust to violations of the homoscedasticity assumption. We need to manually add an intercept (the column `C` in the data). For comparison, we provide the equivalent using a `statsmodels` generalized linear model with heteroscedasticity-robust variances.

In [13]:
n = 500
data = pd.DataFrame()
data['X'] = np.random.normal(size=n)
data['Z'] = np.random.normal(size=n)
data['Y'] = 0.5 + 2*data['X'] - 1*data['Z'] + np.random.normal(size=n)
data['C'] = 1

In [14]:
def psi_regression(theta):
    x = np.asarray(data[['C', 'X', 'Z']])
    y = np.asarray(data['Y'])[:, None]
    beta = np.asarray(theta)[:, None]
    return ((y - np.dot(x, beta)) * x).T


mestimator = MEstimator(psi_regression, init=[0.1, 0.1, 0.1])
mestimator.estimate()

print("=========================================================")
print("Linear Model")
print("=========================================================")
print("M-Estimation: by-hand")
print(mestimator.theta)
print(mestimator.variance)
print("---------------------------------------------------------")

print("GLM Estimator")
glm = smf.glm("Y ~ X + Z", data).fit(cov_type="HC1")
print(np.asarray(glm.params))
print(np.asarray(glm.cov_params()))

print("=========================================================")

Linear Model
M-Estimation: by-hand
[ 0.41082601  1.96289222 -1.02663555]
[[ 2.18524097e-03  7.28169740e-05  1.54216618e-04]
 [ 7.28169740e-05  2.08315691e-03 -4.09520902e-05]
 [ 1.54216618e-04 -4.09520902e-05  2.14573782e-03]]
---------------------------------------------------------
GLM Estimator
[ 0.41082601  1.96289222 -1.02663555]
[[ 2.18524092e-03  7.28169947e-05  1.54216630e-04]
 [ 7.28169947e-05  2.08315690e-03 -4.09519947e-05]
 [ 1.54216630e-04 -4.09519947e-05  2.14573770e-03]]
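
For the linear model, both the point estimates and the sandwich variance have closed forms. The sketch below computes them with NumPy only, under the assumption of simulated data with a hypothetical seed (so the numbers differ from the output above). It is meant to illustrate the structure of the robust variance, not the internals of `delicatessen` or `statsmodels`.

```python
import numpy as np

# Hypothetical data following the same data-generating mechanism
rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 0.5 + 2 * x - 1 * z + rng.normal(size=n)
X = np.column_stack([np.ones(n), x, z])    # design matrix with intercept

# Root of the estimating equations sum((y - X beta) * X) = 0
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta

# Bread: negative mean derivative of the estimating functions
bread = X.T @ X / n
# Meat: mean outer product of the estimating functions
meat = (X * resid[:, None] ** 2).T @ X / n

# Sandwich variance, scaled by n to give the variance of beta
bread_inv = np.linalg.inv(bread)
variance = bread_inv @ meat @ bread_inv / n

print("Beta:", beta)
print("Variance:\n", variance)
```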


The following uses the built-in linear regression functionality.

In [15]:
from delicatessen.estimating_equations import ee_regression

def psi_regression(theta):
    return ee_regression(theta=theta,
                         X=data[['C', 'X', 'Z']],
                         y=data['Y'],
                         model='linear')


mestimator = MEstimator(psi_regression, init=[0.1, 0.1, 0.1])
mestimator.estimate()

print("=========================================================")
print("Linear Model")
print("=========================================================")
print("M-Estimation: built-in")
print(mestimator.theta)
print(mestimator.variance)
print("=========================================================")

Linear Model
M-Estimation: built-in
[ 0.41082601  1.96289222 -1.02663555]
[[ 2.18524097e-03  7.28169740e-05  1.54216618e-04]
 [ 7.28169740e-05  2.08315691e-03 -4.09520902e-05]
 [ 1.54216618e-04 -4.09520902e-05  2.14573782e-03]]


### 7.5.4 Robust Regression
Robust regression limits the influence of outlying observations by bounding the residuals. In the code below, the residuals are clipped to $\pm k$ with $k = 1.345$ (the Huber function) before being multiplied by the design matrix.

In [16]:
def psi_robust_regression(theta):
    k = 1.345    
    x = np.asarray(data[['C', 'X', 'Z']])
    y = np.asarray(data['Y'])[:, None]
    beta = np.asarray(theta)[:, None]
    preds = np.clip(y - np.dot(x, beta), -k, k)
    return (preds * x).T


mestimator = MEstimator(psi_robust_regression, init=[0.5, 2., -1.])
mestimator.estimate()

print("=========================================================")
print("Linear Model")
print("=========================================================")
print("M-Estimation: by-hand")
print(mestimator.theta)
print(mestimator.variance)
print("=========================================================")

Linear Model
M-Estimation: by-hand
[ 0.41223641  1.95577495 -1.02508413]
[[ 2.31591852e-03  1.82106073e-04  2.57209797e-04]
 [ 1.82106073e-04  2.12098831e-03 -6.95782727e-05]
 [ 2.57209797e-04 -6.95782727e-05  2.38212599e-03]]
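
One common way to solve these bounded estimating equations without a root-finder is iteratively reweighted least squares (IRLS). The sketch below is an alternative illustration, not what `MEstimator` does: it assumes simulated data with a hypothetical seed and, like the code above, clips the raw residuals without estimating a scale parameter.

```python
import numpy as np

# Hypothetical data following the same data-generating mechanism
rng = np.random.default_rng(5)
n = 500
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 0.5 + 2 * x - 1 * z + rng.normal(size=n)
X = np.column_stack([np.ones(n), x, z])
k = 1.345                                    # Huber bound

# Start from ordinary least squares
beta = np.linalg.solve(X.T @ X, X.T @ y)

for _ in range(500):
    resid = y - X @ beta
    # Huber weights: 1 inside the bound, k / |residual| outside
    abs_r = np.maximum(np.abs(resid), 1e-12)     # guard against division by zero
    w = np.minimum(1.0, k / abs_r)
    # Weighted least squares update
    XtW = X.T * w
    beta_new = np.linalg.solve(XtW @ X, XtW @ y)
    converged = np.max(np.abs(beta_new - beta)) < 1e-12
    beta = beta_new
    if converged:
        break

# The summed bounded estimating functions should be near zero at the solution
score = X.T @ np.clip(y - X @ beta, -k, k)

print("Beta: ", beta)
print("Score:", score)
```

Since the weight times the residual equals the clipped residual, a fixed point of the IRLS update is a root of the same estimating equations used in `psi_robust_regression`.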


The following uses the built-in robust linear regression functionality.

In [17]:
from delicatessen.estimating_equations import ee_robust_regression

def psi_robust_regression(theta):
    return ee_robust_regression(theta=theta,
                                X=data[['C', 'X', 'Z']],
                                y=data['Y'],
                                model='linear',
                                loss='huber', k=1.345)

mestimator = MEstimator(psi_robust_regression, init=[0.5, 2., -1.])
mestimator.estimate()

print("=========================================================")
print("Linear Model")
print("=========================================================")
print("M-Estimation: built-in")
print(mestimator.theta)
print(mestimator.variance)
print("=========================================================")

Linear Model
M-Estimation: built-in
[ 0.41223641  1.95577495 -1.02508413]
[[ 2.31591852e-03  1.82106073e-04  2.57209797e-04]
 [ 1.82106073e-04  2.12098831e-03 -6.95782727e-05]
 [ 2.57209797e-04 -6.95782727e-05  2.38212599e-03]]


End of tutorial