![image.png](attachment:image.png)

Using the law of iterated expectations, we derive properties of the estimators in linear regression models. The law states that for any random variables $X$ and $Y$, the expectation of $X$ equals the expectation of the conditional expectation of $X$ given $Y$:

$$
E[X] = E_{Y}[E[X | Y]]
$$

Under the classical linear regression model, the zero conditional mean assumption states that the expected value of the error term $u$, conditional on the regressor vector $\mathbf{x}$, is zero: $E[u | \mathbf{x}] = 0$. Applying the law of iterated expectations, the expected value of the product of $\mathbf{x}$ and $u$ is then also zero:

$$
E[\mathbf{x}u] = E_{\mathbf{x}}[E[\mathbf{x}u | \mathbf{x}]] = E_{\mathbf{x}}[\mathbf{x}E[u | \mathbf{x}]] = E_{\mathbf{x}}[\mathbf{x} \cdot 0] = 0
$$

Consequently, the regressors are orthogonal to the error term, which is the population moment condition:

$$
E[\mathbf{x}u] = 0
$$

Following this logic, substituting $u = y - \mathbf{x}'\beta$ expresses the moment condition in terms of the dependent variable $y$ and the independent variables contained in $\mathbf{x}$:

$$
E[\mathbf{x}(y - \mathbf{x}'\beta)] = 0
$$
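As a quick numerical illustration, the population moment condition can be checked by simulation. The sketch below is not part of the original derivation: the data-generating process (a constant, one normal regressor, heteroskedastic errors with $E[u | \mathbf{x}] = 0$) and all variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000

# Hypothetical data-generating process (assumption, for illustration only):
# a constant plus one regressor, and errors whose variance depends on x
# while their conditional mean stays at zero.
x1 = rng.normal(size=N)
X = np.column_stack([np.ones(N), x1])
u = rng.normal(size=N) * (1.0 + np.abs(x1))
beta = np.array([1.0, 2.0])
y = X @ beta + u

# Sample analogue of E[x (y - x'beta)], evaluated at the true beta
sample_moment = X.T @ (y - X @ beta) / N
print(sample_moment)
```

Both entries of `sample_moment` should be close to zero in a large sample, even though the errors are heteroskedastic: the condition relies only on the conditional mean of $u$, not on its variance.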

The Method of Moments (MM) estimator is obtained by replacing the population moment conditions with their sample analogues and solving for the parameters. The sample moment condition is expressed as:

$$
\frac{1}{N} \sum_{i=1}^{N} \mathbf{x}_{i}(y_{i} - \mathbf{x}_{i}'\beta) = 0
$$

The MM estimator, $\hat{\beta}_{MM}$, is derived by setting the sample covariance between the regressors and the residuals to zero and solving for $\beta$:

$$
\hat{\beta}_{MM} = \left( \sum_{i=1}^{N} \mathbf{x}_{i} \mathbf{x}_{i}' \right)^{-1} \left( \sum_{i=1}^{N} \mathbf{x}_{i}y_{i} \right)
$$
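The intermediate algebra can be spelled out as a short sketch. It assumes that $\sum_{i=1}^{N} \mathbf{x}_{i}\mathbf{x}_{i}'$ is invertible, which the derivation leaves implicit. Expanding the sample moment condition and cancelling the factor $1/N$ gives

$$
\sum_{i=1}^{N} \mathbf{x}_{i} y_{i} - \left( \sum_{i=1}^{N} \mathbf{x}_{i}\mathbf{x}_{i}' \right) \hat{\beta}_{MM} = 0
\quad \Longrightarrow \quad
\left( \sum_{i=1}^{N} \mathbf{x}_{i}\mathbf{x}_{i}' \right) \hat{\beta}_{MM} = \sum_{i=1}^{N} \mathbf{x}_{i} y_{i}
$$

and premultiplying both sides by the inverse of $\sum_{i=1}^{N} \mathbf{x}_{i}\mathbf{x}_{i}'$ yields the estimator.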

Defining the outcome vector $y$ and the design matrix $X$ as follows facilitates the use of matrix algebra to simplify the computation:

$$
y = \begin{bmatrix}
y_{1} \\
\vdots \\
y_{N}
\end{bmatrix} 
$$

$$
X = \begin{bmatrix}
\mathbf{x}_{1}' \\
\vdots \\
\mathbf{x}_{N}'
\end{bmatrix}
$$

Considering the sample moment function $g(\beta) = \frac{1}{N}\sum_{i=1}^{N} \mathbf{x}_{i}(y_{i} - \mathbf{x}_{i}'\beta) = \frac{1}{N}X'(y - X\beta)$ and solving $g(\beta) = 0$, the MM estimator in matrix notation, which is algebraically equivalent to the Ordinary Least Squares (OLS) estimator, is:

$$
\hat{\beta}_{MM} = (X'X)^{-1}X'y = \hat{\beta}_{OLS}
$$
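The equivalence of the summation form, the matrix form, and least squares can be verified numerically. This is a minimal sketch on simulated data; the sample size, coefficients, and variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 500, 3

# Hypothetical design: a constant and two normal regressors (assumption)
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=N)

# Summation form: (sum_i x_i x_i')^{-1} (sum_i x_i y_i)
Sxx = sum(np.outer(x_i, x_i) for x_i in X)
Sxy = sum(x_i * y_i for x_i, y_i in zip(X, y))
beta_sum = np.linalg.solve(Sxx, Sxy)

# Matrix form: (X'X)^{-1} X'y, computed with solve rather than an explicit inverse
beta_mat = np.linalg.solve(X.T @ X, X.T @ y)

# Least-squares solution from NumPy for comparison
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# The K sample moment conditions hold exactly (up to rounding) at the estimate
g = X.T @ (y - X @ beta_mat) / N
print(beta_sum, beta_mat, beta_ols, g)
```

Using `np.linalg.solve` instead of forming $(X'X)^{-1}$ explicitly is a numerical design choice; the estimator is the same.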

 

![image.png](attachment:image.png)

![image.png](attachment:image.png)

#### Additional Moment Restriction for Skewness

Furthermore, if the error term is symmetrically distributed conditional on $\mathbf{x}$, its third conditional moment is zero, which provides an additional moment condition. Assuming

$$
E[u^3|\mathbf{x}] = 0
$$

the law of iterated expectations gives

$$
E[\mathbf{x}u^3] = E_{\mathbf{x}}[\mathbf{x}E[u^3 | \mathbf{x}]] = 0
$$

When $\beta$ is estimated based on these $2K$ moment conditions, where $K$ is the number of parameters in the vector $\beta$, we stack the conditions into a single vector equation:

$$
\begin{bmatrix}
E[\mathbf{x}(y-\mathbf{x}'\beta)]\\ 
E[\mathbf{x}(y-\mathbf{x}'\beta)^3]
\end{bmatrix}
 = \begin{bmatrix}
\mathbf{0}\\ 
\mathbf{0}
\end{bmatrix}
$$

Accordingly, the Method of Moments (MM) estimator aims to solve these sample moment conditions:

$$
\frac{1}{N} \sum_{i=1}^{N} \mathbf{x}_i (y_i - \mathbf{x}_{i}'\beta) = \mathbf{0} 
$$
$$
\frac{1}{N} \sum_{i=1}^{N} \mathbf{x}_i (y_i - \mathbf{x}_{i}'\beta)^3 = \mathbf{0}
$$

The MM estimator thus seeks to match the sample moments to the population moments. Including the higher-order moment condition can improve the efficiency of the estimator when the error distribution deviates from normality while remaining symmetric, so that the zero-skewness restriction $E[u^3|\mathbf{x}] = 0$ still holds.

#### Addressing Overidentification in the Method of Moments

However, with $2K$ equations and only $K$ unknown parameters in $\beta$, the system is overidentified: it is generally not possible for all of these sample moment conditions to be satisfied simultaneously. The conventional method of moments is therefore inapplicable, because no parameter vector $\beta$ satisfies all the sample moment conditions exactly.
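Overidentification is easy to see numerically. In the sketch below (simulated data, with names and the uniform error distribution chosen as assumptions for illustration), the OLS estimate sets the first $K$ sample moments to zero, yet the remaining $K$ third-moment conditions are left unsatisfied in the finite sample, even though the errors are symmetric.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500

# Hypothetical data: constant plus one regressor, symmetric uniform errors (assumption)
X = np.column_stack([np.ones(N), rng.normal(size=N)])
u = rng.uniform(-1.0, 1.0, size=N)
y = X @ np.array([1.0, 2.0]) + u

# The K first-moment conditions determine beta on their own (this is OLS)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_ols

g1 = X.T @ resid / N        # first K sample moments: zero up to rounding
g3 = X.T @ resid**3 / N     # remaining K sample moments: generally nonzero
print("first-moment conditions:", g1)
print("third-moment conditions:", g3)
```

Since the first block alone pins down $\beta$, there are no free parameters left to set the second block to zero as well.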

To circumvent this challenge, the Generalized Method of Moments (GMM) estimator is employed. GMM aims to set the sample moments as close to zero as feasible by minimizing a quadratic form of the moments, reflecting a trade-off between the various conditions. Thus, the GMM estimator, $\hat{\beta}_{GMM}$, is defined as the argument that minimizes the following quadratic loss function:

$$
\hat{\beta}_{GMM} = \underset{\beta}{\text{argmin}} \ Q_N(\beta) = \underset{\beta}{\text{argmin}} \ \left( \begin{bmatrix}
\frac{1}{N} \sum_{i=1}^{N} \mathbf{x}_i u_i\\ 
\frac{1}{N} \sum_{i=1}^{N} \mathbf{x}_i u_i^3
\end{bmatrix}' 
\mathbf{W}_N
\begin{bmatrix}
\frac{1}{N} \sum_{i=1}^{N} \mathbf{x}_i u_i\\ 
\frac{1}{N} \sum_{i=1}^{N} \mathbf{x}_i u_i^3
\end{bmatrix} \right)
$$

Here, $u_i = y_i - \mathbf{x}_i ' \beta$ denotes the error term evaluated at a candidate $\beta$, and $\mathbf{W}_N$ is a symmetric positive semi-definite (PSD) weighting matrix of dimension $2K \times 2K$. The choice of $\mathbf{W}_N$ is critical: when $\mathbf{W}_N$ is a consistent estimator of the inverse of the variance of the moment vector, the GMM estimator can achieve greater efficiency than the Ordinary Least Squares (OLS) estimator.
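A minimal sketch of how $Q_N(\beta)$ might be minimized in practice is given below. Several choices here are assumptions rather than part of the text above: the common two-step procedure (identity weighting first, then the inverse of the estimated uncentered second-moment matrix of the moments), a hand-written Gauss-Newton minimizer with step halving, simulated data with uniform errors, and all function names.

```python
import numpy as np

def moment_contributions(beta, X, y):
    """Per-observation moments [x_i u_i, x_i u_i^3], shape (N, 2K)."""
    u = y - X @ beta
    return np.hstack([X * u[:, None], X * (u ** 3)[:, None]])

def objective(beta, X, y, W):
    """Quadratic form Q_N(beta) = g' W g, with g the averaged moment vector."""
    g = moment_contributions(beta, X, y).mean(axis=0)
    return g @ W @ g

def jacobian(beta, X, y):
    """Derivative of the averaged moment vector w.r.t. beta, shape (2K, K)."""
    N = X.shape[0]
    u = y - X @ beta
    G1 = -(X.T @ X) / N
    G3 = -3.0 * (X.T @ (X * (u ** 2)[:, None])) / N
    return np.vstack([G1, G3])

def minimize_q(beta0, X, y, W, max_iter=200, tol=1e-10):
    """Gauss-Newton with step halving; every accepted step lowers Q_N."""
    beta = beta0.copy()
    q = objective(beta, X, y, W)
    for _ in range(max_iter):
        g = moment_contributions(beta, X, y).mean(axis=0)
        G = jacobian(beta, X, y)
        step = np.linalg.solve(G.T @ W @ G, G.T @ W @ g)
        t = 1.0
        while t > 1e-8 and objective(beta - t * step, X, y, W) >= q:
            t *= 0.5
        if t <= 1e-8:  # no improving step found: stop
            break
        beta = beta - t * step
        q = objective(beta, X, y, W)
        if np.max(np.abs(t * step)) < tol:
            break
    return beta

# Hypothetical data: symmetric, non-normal (uniform) errors with unit variance
rng = np.random.default_rng(42)
N = 2000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=N)
K = X.shape[1]

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Step 1: identity weighting matrix, starting from OLS
beta_step1 = minimize_q(beta_ols, X, y, np.eye(2 * K))

# Step 2: weight by the inverse of the estimated second-moment matrix of the moments
m = moment_contributions(beta_step1, X, y)
W_hat = np.linalg.inv(m.T @ m / N)
beta_gmm = minimize_q(beta_step1, X, y, W_hat)

print("OLS:", beta_ols)
print("GMM:", beta_gmm)
```

A single simulated sample cannot show an efficiency gain; comparing the sampling variability of `beta_ols` and `beta_gmm` would require repeating the simulation many times. A general-purpose optimizer could replace the hand-written Gauss-Newton loop, which is used here only to keep the example dependent on NumPy alone.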
