# Extremum Estimators

## Introduction

**Definition (Extremum Estimator)**: An estimator $\hat{\theta}$ is called an extremum estimator if there is a scalar objective function $Q_n(\bf{w};\theta)$ such that
$$\hat{\theta} \in \arg \max Q_n(\bf{w};\theta)$$
subject to $\theta \in \Theta \subset \mathbb{R}^p$, where
- $n$ is the number of observations in the data
- $\bf{w} \equiv (\bf{w}_1,\dots,\bf{w}_n)$ is the sample or the data, and 
- $\Theta$ is the set of possible parameter values

This maximization problem does not necessarily have a solution. The following lemma gives sufficient conditions under which a maximizer $\hat{\theta}$ exists and is a measurable function of the data.

**Lemma (Existence of Extremum Estimators)**: Suppose that
1. the parameter space $\Theta$ is a compact subset of $\mathbb{R}^p$
2. $Q_n(\theta)$ is continuous in $\theta$ for any data $\bf{w}$, and
3. $Q_n(\theta)$ is a measurable function of $\bf{w}$ for all $\theta \in \Theta$.

Then there exists a measurable function $\hat{\theta}$ of the data $\bf{w}$ such that $\hat{\theta} \in \arg \max Q_n(\bf{w};\theta)$ subject to $\theta \in \Theta$.

## Two Classes of Extremum Estimators
1. M-Estimators:  $Q_n(\theta)$ is a simple average
$$Q_n(\theta)=\frac{1}{n}\sum_{1}^{n}m(\bf{w}_i;\theta)$$
    - Examples: maximum likelihood (ML) and nonlinear least squares (NLS)
2. Generalized Method of Moments (GMM)
$$Q_n(\theta)=-g_n(\theta)'\hat{\bf{W}}g_n(\theta)$$
where
    - $\hat{\bf{W}}$ is a $K \times K$ symmetric and positive definite matrix that defines the distance of $g_n(\theta)$ from zero, where $K$ is the number of moment conditions.
    - $g_n(\theta) = \frac{1}{n}\sum_{1}^{n}g(\bf{w}_i;\theta)$

### M-Estimator Example: Maximum Likelihood
- ${\bf{w}_i}$ is i.i.d.
- $\theta$ is a finite-dimensional vector
- a functional form of $f({\bf{w}_i};\theta)$ is known
- $\theta_0$ is the true parameter value

The joint density of data $(\bf{w}_1,\dots,\bf{w}_n)$ is 
$$f(\bf{w}_1,\dots,\bf{w}_n;\theta_0)=\prod_{1}^{n} f(\bf{w}_i;\theta_0)$$
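The $Q_n(\theta)$ can be based on either the likelihood or the log-likelihood function, since both have the same maximizer. Dividing the log-likelihood by $n$ gives the M-estimator form with $m(\bf{w}_i;\theta)=\log f(\bf{w}_i;\theta)$: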
$$f(\bf{w}_1,\dots,\bf{w}_n;\theta)=\prod_{1}^{n} f(\bf{w}_i;\theta)$$
$$\log f(\bf{w}_1,\dots,\bf{w}_n;\theta)=\log \left[ \prod_{1}^{n} f(\bf{w}_i;\theta) \right] = \sum_{1}^{n} \log f(\bf{w}_i;\theta) $$ 
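
To make the "arg max of an average log-likelihood" idea concrete, here is a minimal sketch. It assumes an exponential density $f(w;\theta)=\theta e^{-\theta w}$, a true value `theta0 = 2.0`, and a compact parameter space $[0.1, 10]$ searched on a grid; none of these choices come from the notes, they are only for illustration.

```python
import numpy as np

def avg_loglik(theta, w):
    """Average log-likelihood Q_n(theta) for the density theta * exp(-theta * w)."""
    return np.log(theta) - theta * np.mean(w)

rng = np.random.default_rng(0)            # fixed seed, illustrative only
theta0 = 2.0                              # hypothetical true parameter
w = rng.exponential(scale=1.0 / theta0, size=5_000)

# Compact parameter space Theta = [0.1, 10], discretized with step 0.001.
grid = np.linspace(0.1, 10.0, 9_901)
q_values = avg_loglik(grid, w)            # Q_n evaluated at every grid point
theta_hat = grid[np.argmax(q_values)]     # extremum estimator on the grid

theta_closed_form = 1.0 / np.mean(w)      # analytic maximizer of Q_n for this density
print(theta_hat, theta_closed_form)
```

For this density the maximizer is available in closed form, so the grid search is only a check that maximizing $Q_n(\theta)$ reproduces it.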



### M-Estimator Example: Conditional Maximum Likelihood
- ${\bf{w}_i}$ is partitioned into two groups, $y_i$ and $\bf{x}_i$, and the interest is in how $\bf{x}_i$ influences the conditional distribution of $y_i$
- $f(y_i |\bf{x}_i; \theta_0)$ is the conditional density of $y_i$ given $\bf{x}_i$
- $f(\bf{x}_i; \psi_0)$ is the marginal density of $\bf{x}_i$

The joint density of a single observation $\bf{w}_i = (y_i,\bf{x}'_i)'$ is
$$ f(y_i ,\bf{x}_i;\theta_0,\psi_0) = f(y_i | \bf{x}_i;\theta_0)f(\bf{x}_i;\psi_0) $$
The $Q_n(\theta)$ can be based on either the likelihood or the log-likelihood function:
$$\prod_{1}^{n} f(\bf{w}_i;\theta,\psi)=\prod_{1}^{n} f(y_i|\bf{x}_i;\theta) \cdot \prod_{1}^{n} f(\bf{x}_i;\psi)$$
$$\sum_{1}^{n} \log f(\bf{w}_i;\theta,\psi)=\sum_{1}^{n} \log f(y_i|\bf{x}_i;\theta) + \sum_{1}^{n} \log f(\bf{x}_i;\psi)$$
Since the second sum does not depend on $\theta$, the conditional ML estimator of $\theta_0$ maximizes the first sum only.

### M-Estimator Example: Nonlinear Least Squares
- $y_i = \varphi(\bf{x}_i; \theta_0) + \epsilon_i$
- $\mathbb{E}(\epsilon_i|\bf{x}_i)=0$
- The functional form of $\varphi$ is known

The $Q_n(\theta)$ is
$$Q_n(\theta)=-\frac{1}{n}\sum_{1}^{n}\left[ y_i - \varphi(\bf{x}_i; \theta) \right]^2$$
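
The NLS objective above can be maximized numerically. The sketch below assumes a hypothetical regression function $\varphi(x;\theta)=\exp(\theta x)$ with scalar $\theta$ and uses plain Gauss-Newton steps; the functional form, the starting value, and the simulated data are illustrative assumptions, not part of the notes.

```python
import numpy as np

def phi(x, theta):
    """Hypothetical regression function phi(x; theta) = exp(theta * x)."""
    return np.exp(theta * x)

def q_n(theta, y, x):
    """NLS objective Q_n(theta) = -(1/n) * sum of squared residuals."""
    return -np.mean((y - phi(x, theta)) ** 2)

rng = np.random.default_rng(0)                 # fixed seed, illustrative only
n, theta0 = 2_000, 1.0                         # hypothetical sample size and true value
x = rng.uniform(0.0, 1.0, size=n)
y = phi(x, theta0) + rng.normal(scale=0.1, size=n)

theta = 0.5                                    # starting value (assumption)
for _ in range(100):
    resid = y - phi(x, theta)                  # y_i - phi(x_i; theta)
    jac = x * phi(x, theta)                    # d phi / d theta at each observation
    step = (jac @ resid) / (jac @ jac)         # Gauss-Newton step for a scalar parameter
    theta += step
    if abs(step) < 1e-12:
        break
theta_hat = theta
print(theta_hat, q_n(theta_hat, y, x))
```

Gauss-Newton is only one possible optimizer here; any routine that maximizes $Q_n(\theta)$ would serve the same purpose.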

### GMM Example: Nonlinear GMM
- $y_i = \varphi(\bf{x}_i; \theta_0) + \epsilon_i$
- $\mathbb{E}(\epsilon_i|\bf{x}_i)=0$
- The functional form of $\varphi$ is known

Moment condition:
$$\mathbb{E}(\epsilon_i | \bf{x}_i)=0 \Rightarrow \mathbb{E}(\epsilon_i \cdot \bf{x}_i)=0 \Rightarrow \mathbb{E}\bigg( \big[ y_i - \varphi(\bf{x}_i; \theta_0) \big] \cdot \bf{x}_i \bigg)=0$$
Using the moment condition, the $Q_n(\theta)$ is
$$Q_n(\theta)=-g_n(\theta)'\hat{\bf{W}}g_n(\theta)$$
where
$$g_n(\theta) = \frac{1}{n}\sum_{1}^{n}\big[ y_i - \varphi(\bf{x}_i; \theta) \big] \cdot \bf{x}_i$$
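
As a rough illustration of the GMM objective, the sketch below assumes $\bf{x}_i = (1, x_i)'$, so there are $K=2$ moment conditions for one parameter, a hypothetical $\varphi(\bf{x}_i;\theta)=\exp(\theta x_i)$, the identity matrix as $\hat{\bf{W}}$, and a grid search over an assumed compact set $[-2, 3]$.

```python
import numpy as np

def g_n(theta, y, X):
    """Sample moments g_n(theta) = (1/n) * sum_i [y_i - phi(x_i; theta)] * x_i."""
    resid = y - np.exp(theta * X[:, 1])        # hypothetical phi = exp(theta * x)
    return X.T @ resid / len(y)

def q_n(theta, y, X, W):
    """GMM objective Q_n(theta) = -g_n(theta)' W g_n(theta)."""
    g = g_n(theta, y, X)
    return -g @ W @ g

rng = np.random.default_rng(0)                 # fixed seed, illustrative only
n, theta0 = 2_000, 1.0                         # hypothetical sample size and true value
x = rng.uniform(0.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])           # instruments x_i = (1, x_i)'
y = np.exp(theta0 * x) + rng.normal(scale=0.1, size=n)
W = np.eye(2)                                  # weighting matrix (assumption)

grid = np.linspace(-2.0, 3.0, 5_001)           # compact parameter space, step 0.001
q_values = np.array([q_n(t, y, X, W) for t in grid])
theta_hat = grid[np.argmax(q_values)]
print(theta_hat, q_values.max())
```

The objective is never positive, and it is close to zero at `theta_hat` because the sample moments are nearly satisfied there.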

<div style="page-break-after: always"></div>


## Consistency
If the parameter space is compact, the following result applies.

**Proposition (Consistency with Compact Parameter Space)**: Suppose that 
1. $\Theta$ is a compact subset of $\mathbb{R}^p$
2. $Q_n(\bf{w};\theta)$ is a continuous function of $\theta$ for any data $\bf{w}$
3. $Q_n(\bf{w};\theta)$ is a measurable function of $\bf{w}$ for all $\theta \in \Theta$
4. There is a function $Q_0(\theta)$ such that
    - (identification)  $Q_0(\theta)$ is uniquely maximized at $\theta_0 \in \Theta$
    - (uniform convergence) $\sup_{\theta \in \Theta} \vert Q_n(\theta) - Q_0(\theta) \vert \rightarrow_{p} 0 $
    
Then, $\hat{\theta} \rightarrow_{p} \theta_0$

If the parameter space is not compact, compactness can be replaced by concavity of the objective function.

**Proposition (Consistency without Compact Parameter Space)**: Suppose that 
1. $\theta_0 \in \text{interior} \Theta$ and $\Theta$ is a convex subset of $\mathbb{R}^p$
2. $Q_n(\bf{w};\theta)$ is concave over $\Theta$ for any data $\bf{w}$
3. $Q_n(\bf{w};\theta)$ is a measurable function of $\bf{w}$ for all $\theta \in \Theta$
4. There is a function $Q_0(\theta)$ such that
    - (identification)  $Q_0(\theta)$ is uniquely maximized at $\theta_0 \in \Theta$
    - (point-wise convergence) $\vert Q_n(\theta) - Q_0(\theta) \vert \rightarrow_{p} 0$ for all $\theta \in \Theta$
    
Then, $\hat{\theta} \rightarrow_{p} \theta_0$
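
A small simulation can make the consistency statement tangible. The sketch below assumes an exponential density $\theta e^{-\theta w}$, whose average log-likelihood $\log\theta - \theta \bar{w}$ is concave on the non-compact set $(0,\infty)$ and is maximized at $1/\bar{w}$. The sample sizes, number of replications, and true value are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)                 # fixed seed, illustrative only
theta0 = 2.0                                   # hypothetical true parameter
sample_sizes = [50, 500, 5_000]
n_reps = 200                                   # replications per sample size

mean_abs_error = []
for n in sample_sizes:
    # Each row is one simulated data set of size n.
    w = rng.exponential(scale=1.0 / theta0, size=(n_reps, n))
    theta_hat = 1.0 / w.mean(axis=1)           # maximizer of log(theta) - theta * mean(w)
    mean_abs_error.append(np.mean(np.abs(theta_hat - theta0)))

for n, err in zip(sample_sizes, mean_abs_error):
    print(n, err)                              # the average error shrinks as n grows
```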

The propositions above give sufficient conditions under which an extremum estimator is consistent. Now, let's specialize these conditions to M-estimators and GMM estimators by answering four questions.

1. What is $Q_0(\theta)$ for M-estimators and GMM?
2. What are the conditions for the estimator $\hat{\theta}$ to be well-defined?
3. What is the identification condition?
4. What are the uniform and point-wise convergence conditions?

### Consistency of M-Estimators
#### (Q1) What is $Q_0(\theta)$ in the previous consistency propositions?
For M-estimator, the objective function is:
$$Q_n(\theta)=\frac{1}{n}\sum_{1}^{n}m(\bf{w}_i;\theta)$$

If $\mathbb{E}\left[m(\bf{w}_i;\theta)\right]$ exists and is finite, then by the Law of Large Numbers,
$$Q_n(\theta)=\frac{1}{n}\sum_{1}^{n}m(\bf{w}_i;\theta)\rightarrow_{p} \mathbb{E}\left[m(\bf{w}_i;\theta)\right]$$

Therefore, 
$$Q_0(\theta)=\mathbb{E}\left[m(\bf{w}_i;\theta)\right]$$

#### (Q2) What are the conditions for an M-estimator $\hat{\theta}$ to be well-defined?
- If $\Theta$ is compact,
    - $m(\bf{w}_i;\theta)$ is a continuous function of $\theta$ for any data $\bf{w}$
    - $m(\bf{w}_i;\theta)$ is a measurable function of $\bf{w}$ for all $\theta \in \Theta$
- If $\Theta$ is not compact, but is convex and $\theta_0 \in \text{interior} \Theta$:
    - $m(\bf{w}_i;\theta)$ is concave over $\Theta$ for any data $\bf{w}$
    - $m(\bf{w}_i;\theta)$ is a measurable function of $\bf{w}$ for all $\theta \in \Theta$


#### (Q3)  What is the identification condition for an M-estimator?

The identification condition for an M-estimator is that $\mathbb{E}\left[m(\bf{w}_i;\theta)\right]$ is uniquely maximized at $\theta_0 \in \Theta$.
- For ML, where $m(\bf{w}_i;\theta)=\log f(y_i \vert \bf{x}_i;\theta)$, this holds if, for all $\theta \neq \theta_0$,
$$\log f(y_i \vert \bf{x}_i;\theta) \neq \log f(y_i \vert \bf{x}_i;\theta_0) \quad \text{with positive probability}$$

- For NLS, where $m(\bf{w}_i;\theta)=-\left[ y_i - \varphi(\bf{x}_i; \theta) \right]^2$, this holds if, for all $\theta \neq \theta_0$,
$$\varphi(\bf{x_i};\theta) \neq \varphi(\bf{x_i};\theta_0) \quad \text{with positive probability}$$


#### (Q4) What are the uniform and point-wise convergence conditions?
- Uniform convergence condition: by the uniform Law of Large Numbers, the condition becomes
$$\mathbb{E}\left[\sup_{\theta \in \Theta} \vert m(\bf{w}_i;\theta) \vert \right] < \infty$$
- Point-wise convergence condition: by the Ergodic Theorem, the condition becomes
$$\mathbb{E}\left[\vert m(\bf{w}_i;\theta) \vert \right] < \infty$$
for all $\theta \in \Theta$, (i.e., $\mathbb{E}\left[m(\bf{w}_i;\theta) \right]$ exists and is finite)

### Consistency of GMM Estimator 
#### (Q1) What is $Q_0(\theta)$ in the previous consistency propositions?

For the GMM estimator, the objective function is:
$$Q_n(\theta)=-\bigg[\frac{1}{n}\sum_{1}^{n} g(\bf{w}_i;\theta)\bigg]'\hat{\bf{W}}\bigg[\frac{1}{n}\sum_{1}^{n} g(\bf{w}_i;\theta)\bigg]$$
If $\hat{\bf{W}} \rightarrow_{p} \bf{W}$, where $\bf{W}$ is symmetric and positive definite, the limit function is
$$Q_0(\theta)=-\mathbb{E}\big[g(\bf{w}_i;\theta)\big]'\bf{W}\mathbb{E}\big[g(\bf{w}_i;\theta)\big]$$

#### (Q2) What are the conditions for a GMM estimator $\hat{\theta}$ to be well-defined?
1. $g(\bf{w}_i;\theta)$ is a continuous function of $\theta$ for any data $\bf{w}$
2. $g(\bf{w}_i;\theta)$ is a measurable function of $\bf{w}$ for all $\theta \in \Theta$

#### (Q3)  What is the identification condition for a GMM estimator?

- Notice that $Q_0(\theta) \leq 0$ and that the maximum value of zero is attained at $\theta_0$, because of the orthogonality conditions, $\mathbb{E}\big[g(\bf{w}_i;\theta_0)\big]=0$.
- Therefore, identification is satisfied if for all $\theta \neq \theta_0$ in $\Theta$, 
$$\mathbb{E}\big[g(\bf{w}_i;\theta)\big] \neq \mathbb{E}\big[g(\bf{w}_i;\theta_0)\big] = 0$$

#### (Q4) What is the uniform convergence condition?
$$\mathbb{E}\left[\sup_{\theta \in \Theta} \vert\vert g(\bf{w}_i;\theta) \vert\vert \right] < \infty$$

## Asymptotic Normality

## The General Framework

- $\hat{\theta} = \arg \max Q_n(\theta)$
- By the [Mean Value Theorem](https://en.wikipedia.org/wiki/Mean_value_theorem) (a first-order Taylor expansion of the first-order condition), there is a $\bar{\theta}$ between $\theta_0$ and $\hat{\theta}$ such that
$$0 = \frac{\partial{Q_n(\hat{\theta})}}{\partial{\theta}}  = \frac{\partial{Q_n(\theta_0)}}{\partial{\theta}} + 
    \frac{\partial^2{Q_n(\bar{\theta})}}{\partial{\theta}\partial{\theta'}}(\hat{\theta}-\theta_0) $$
- If $\frac{\partial^2{Q_n(\bar{\theta})}}{\partial{\theta}\partial{\theta'}}$ is [nonsingular](https://mathworld.wolfram.com/NonsingularMatrix.html) and $\frac{\partial{Q_n(\hat{\theta})}}{\partial{\theta}}=0$, then
$$\sqrt{n}(\hat{\theta}-\theta_0) 
= -\bigg[\frac{\partial^2{Q_n(\bar{\theta})}}
{\partial{\theta}\partial{\theta'}}\bigg]^{-1}
\sqrt{n}\frac{\partial{Q_n(\theta_0)}}{\partial{\theta}}$$


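Because $\bar{\theta}$ lies between $\theta_0$ and $\hat{\theta}$, and $\hat{\theta}$ is consistent, the Hessian can be evaluated at $\theta_0$ up to a term $o_p$ that converges to zero in probability:
$$\begin{align}
\sqrt{n}(\hat{\theta}-\theta_0) 
&= -\bigg[\frac{\partial^2{Q_n(\theta_0)}}
{\partial{\theta}\partial{\theta'}}\bigg]^{-1}
\sqrt{n}\frac{\partial{Q_n(\theta_0)}}{\partial{\theta}} + o_p \\
&\xrightarrow{d} N(0,A^{-1}BA^{-1})
\end{align}$$
where
$$A = \mathrm{plim}\; \frac{\partial^2{Q_n(\theta_0)}}{\partial{\theta}\partial{\theta'}}$$
$$B = \lim_{n \to \infty} \mathrm{Var}\left(\sqrt{n}\frac{\partial{Q_n(\theta_0)}}{\partial{\theta}}\right)$$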



### Asymptotic Normality for M-Estimators


Let's denote
- **Score vector** as $$\bf{s}(\bf{w_i};\theta) = \frac{\partial{m(\bf{w_i};\theta)}}{\partial{\theta}}, \quad \text{so that} \quad \frac{\partial{Q_n(\theta)}}{\partial{\theta}} = \frac{1}{n}\sum_{1}^{n}\bf{s}(\bf{w_i};\theta)$$ 
- **Hessian** as $$\bf{H}(\bf{w_i};\theta) = \frac{\partial^2{m(\bf{w_i};\theta)}}{\partial{\theta}\partial{\theta'}}, \quad \text{so that} \quad \frac{\partial^2{Q_n(\theta)}}
{\partial{\theta}\partial{\theta'}} = \frac{1}{n}\sum_{1}^{n}\bf{H}(\bf{w_i};\theta)$$



Suppose that a law of large numbers and a central limit theorem apply, so that
$$\frac{1}{n}\sum_{1}^{n}\bf{H}(\bf{w_i};\bar{\theta}) \xrightarrow{p} \mathbb{E}\left[\bf{H}(\bf{w_i};\theta_0)\right]$$
$$\frac{1}{\sqrt{n}}\sum_{1}^{n}\bf{s}(\bf{w_i};\theta_0)\xrightarrow{d} N(0,\Sigma)$$
with $\Sigma = \mathbb{E}\big[\bf{s}(\bf{w_i};\theta_0)\bf{s}(\bf{w_i};\theta_0)'\big]$ under i.i.d. sampling. Then by [Slutsky's theorem](https://en.wikipedia.org/wiki/Slutsky%27s_theorem),
$$\sqrt{n}(\hat{\theta}-\theta_0) \rightarrow_{d} N\Bigg( 0,\mathbb{E}\big[\bf{H}(\bf{w_i};\theta_0)\big]^{-1}\; \Sigma \; \mathbb{E}\big[\bf{H}(\bf{w_i};\theta_0)\big]^{-1} \Bigg)$$
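
The sandwich form can be estimated by replacing expectations with sample averages. As a sketch, take least squares written as an M-estimator with $m(\bf{w}_i;\theta) = -(y_i - \bf{x}_i'\theta)^2$, so that $\bf{s} = 2\bf{x}_i(y_i - \bf{x}_i'\theta)$ and $\bf{H} = -2\bf{x}_i\bf{x}_i'$. The simulated heteroskedastic data and all variable names below are illustrative assumptions. In this special case the sandwich reduces to the familiar heteroskedasticity-robust (White) variance, which the code checks.

```python
import numpy as np

rng = np.random.default_rng(0)                       # fixed seed, illustrative only
n = 1_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
theta0 = np.array([1.0, -0.5])                       # hypothetical true parameter
eps = rng.normal(size=n) * (1.0 + np.abs(X[:, 1]))   # heteroskedastic errors
y = X @ theta0 + eps

theta_hat = np.linalg.solve(X.T @ X, X.T @ y)        # maximizer of Q_n
resid = y - X @ theta_hat

scores = 2.0 * X * resid[:, None]                    # s_i = 2 x_i (y_i - x_i' theta_hat)
H_bar = -2.0 * X.T @ X / n                           # sample average of the Hessians
Sigma_hat = scores.T @ scores / n                    # sample analogue of E[s s']
H_inv = np.linalg.inv(H_bar)
avar_hat = H_inv @ Sigma_hat @ H_inv                 # estimated Avar of sqrt(n)(theta_hat - theta0)

# Direct White formula: (X'X/n)^{-1} (sum e_i^2 x_i x_i' / n) (X'X/n)^{-1}
Sxx_inv = np.linalg.inv(X.T @ X / n)
white = Sxx_inv @ (X.T @ (X * resid[:, None] ** 2) / n) @ Sxx_inv
print(np.allclose(avar_hat, white))
```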


### Asymptotic Normality for GMM-Estimators
$$Q_n(\theta) = -g_n(\theta)'\bf{W}g_n(\theta)$$ where
$$g_n(\theta) = \frac{1}{n}\sum_{1}^{n}g(\bf{w}_i;\theta)$$
Let $\bf{G}_n(\theta)$ be the Jacobian of $g_n(\theta)$:
$$\bf{G}_n(\theta) = \frac{\partial g_n(\theta)}{\partial \theta' } $$

- By the first-order condition and the Mean Value Theorem applied to $g_n$, for some $\bar{\theta}$ between $\theta_0$ and $\hat{\theta}$,
$$\begin{align}0 = \bf{G}_n(\hat{\theta})'\bf{W}g_n(\hat{\theta}) & = \bf{G}_n(\hat{\theta})'\bf{W}\bigg(g_n(\theta_0) + \bf{G}_n(\bar{\theta})\big(\hat{\theta}-\theta_0\big)\bigg) \\
& = \bf{G}_n(\hat{\theta})'\bf{W}g_n(\theta_0) + \bf{G}_n(\hat{\theta})'\bf{W}\bf{G}_n(\bar{\theta})\big(\hat{\theta}-\theta_0\big)
\end{align}$$
Only $g_n(\hat{\theta})$ needs to be expanded, because $Q_n(\theta)$ is already a quadratic form in $g_n(\theta)$.
- If $\bf{G}_n(\hat{\theta})'\bf{W}\bf{G}_n(\bar{\theta})$ is nonsingular, then
$$\sqrt{n}(\hat{\theta}-\theta_0) 
= -\big[\bf{G}_n(\hat{\theta})'\bf{W}\bf{G}_n(\bar{\theta})\big]^{-1}
\bf{G}_n(\hat{\theta})'\bf{W}\sqrt{n}g_n(\theta_0)$$


Let $G=\mathbb{E}\big[G_n(\theta_0)\big]$ and 
$\Omega=\mathbb{E}\big[g(\bf{w};\theta_0)g(\bf{w};\theta_0)'\big]$. Then, with $o_p$ denoting a term that converges to zero in probability,
$$\begin{align}
\sqrt{n}(\hat{\theta}-\theta_0) & = -(G'WG)^{-1}G'W\sqrt{n}g_n(\theta_0) + o_p \\
& \xrightarrow{d} -(G'WG)^{-1}G'W \cdot N(0,\Omega) \\
& = N\bigg(0,(G'WG)^{-1}G'W \Omega WG (G'WG)^{-1}\bigg) 
\end{align}$$


**What is the optimal choice of the weighting matrix $W$?**
- The most efficient choice is $W = \Omega^{-1}$:
$$\begin{align}
\sqrt{n}(\hat{\theta}-\theta_0) & \xrightarrow{d} 
N\bigg(0,(G'\Omega^{-1}G)^{-1}G'\Omega^{-1} \Omega \Omega^{-1} G (G'\Omega^{-1} G)^{-1}\bigg) \\
& = N\bigg(0,(G'\Omega^{-1}G)^{-1}\bigg)
\end{align}$$
- When $G$ is square and invertible (the exactly identified case, $K = p$), $W$ is irrelevant:
$$\begin{align}
\sqrt{n}(\hat{\theta}-\theta_0) & \xrightarrow{d} N\bigg(0,G^{-1}\Omega G'^{-1}\bigg) \\
& = N\bigg(0,(G'\Omega^{-1}G)^{-1}\bigg)
\end{align}$$
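
Both claims about $W$ can be checked numerically. The sketch below draws an arbitrary $G$ and a positive definite $\Omega$ (random, purely illustrative), evaluates the sandwich variance for $W=I$ and for $W=\Omega^{-1}$, and verifies that the difference is positive semidefinite. It then repeats the comparison for a square $G$, where the choice of $W$ should not matter. The helper name `gmm_avar` is hypothetical.

```python
import numpy as np

def gmm_avar(G, W, Omega):
    """Sandwich variance (G'WG)^{-1} G'W Omega W G (G'WG)^{-1}."""
    bread = np.linalg.inv(G.T @ W @ G)
    return bread @ G.T @ W @ Omega @ W @ G @ bread

rng = np.random.default_rng(0)                 # fixed seed, illustrative only
K, p = 4, 2                                    # K moment conditions, p parameters
G = rng.normal(size=(K, p))
M = rng.normal(size=(K, K))
Omega = M @ M.T + K * np.eye(K)                # symmetric positive definite

V_identity = gmm_avar(G, np.eye(K), Omega)
V_efficient = gmm_avar(G, np.linalg.inv(Omega), Omega)

# Efficiency: V(W) - V(Omega^{-1}) should be positive semidefinite.
diff = V_identity - V_efficient
min_eig = np.linalg.eigvalsh((diff + diff.T) / 2).min()
print(min_eig)

# Exactly identified case: with a square G the weighting matrix drops out.
G_sq, Omega_sq = G[:p, :], Omega[:p, :p]
V_sq_identity = gmm_avar(G_sq, np.eye(p), Omega_sq)
V_sq_efficient = gmm_avar(G_sq, np.linalg.inv(Omega_sq), Omega_sq)
print(np.allclose(V_sq_identity, V_sq_efficient))
```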

### GMM vs. ML
$$ \mathrm{Avar}(\hat{\theta})\geq\mathbb{E}\big[\bf{s}(\bf{w_i};\theta_0)\bf{s}(\bf{w_i};\theta_0)'\big]^{-1}$$
where 
$$\bf{s}(\bf{w_i};\theta_0) \equiv \frac{\partial \log f(\bf{w}_i;\theta_0)}{\partial \theta}$$
- The lower bound for the asymptotic variance of GMM estimators is the asymptotic variance of the ML estimator.
- ML is at least as efficient as GMM when the density is correctly specified
- GMM whose orthogonality condition is the score, $\mathbb{E}\big[\bf{s}(\bf{w_i};\theta_0)\big]=0$, is numerically equivalent to ML
- ML exploits the knowledge of the parametric form of $f(\bf{w}_i;\theta)$ while GMM doesn't 
- GMM is more robust than ML to the specification error in $f(\bf{w}_i;\theta)$

## Restrictions and Hypothesis Testing

## Restrictions

Let $\hat{\theta}$ be the extremum estimator in either ML or GMM. The constrained estimator, denoted $\tilde{\theta}$, solves
$$\max_{\theta \in \Theta} Q_n(\theta) \quad s.t. \quad \bf{a}(\theta)=\bf{0}$$

In many cases, economic theory suggests restrictions on the parameters
of a model. For example, a demand function is supposed to be homogeneous
of degree zero in prices and income. 

The general formulation of linear equality restrictions is the model
$$\begin{align}
y&=X\beta+\epsilon \\
R\beta&=r
\end{align}$$
- We assume $R$ has rank $Q$, the number of restrictions, so that there are no redundant restrictions
- We also assume that there exists a $\beta$ that satisfies the restrictions, so they are not infeasible

Taking the Lagrangian,
$$\min_{\beta,\lambda} Q_n(\beta,\lambda)
= \frac{1}{n}\big(y-X\beta\big)'\big(y-X\beta\big) + 2\lambda'\big(R\beta-r\big)$$

$$H_0: R\beta_0=r$$
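
For the linear model the first-order conditions of the Lagrangian, $-\frac{2}{n}X'(y-X\beta)+2R'\lambda=0$ and $R\beta=r$, can be solved in closed form: $\tilde{\beta} = \hat{\beta} - (X'X)^{-1}R'\big[R(X'X)^{-1}R'\big]^{-1}(R\hat{\beta}-r)$. The sketch below evaluates this on simulated data. The restriction $\beta_2+\beta_3=1$ and all numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)                 # fixed seed, illustrative only
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta0 = np.array([1.0, 0.3, 0.7])              # hypothetical truth, satisfies the restriction
y = X @ beta0 + rng.normal(size=n)

R = np.array([[0.0, 1.0, 1.0]])                # one restriction: beta_2 + beta_3 = 1
r = np.array([1.0])

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                   # unrestricted least squares
middle = np.linalg.inv(R @ XtX_inv @ R.T)
gap = R @ beta_hat - r                         # how far beta_hat is from the restriction
beta_tilde = beta_hat - XtX_inv @ R.T @ middle @ gap   # restricted estimator
lam = middle @ gap / n                         # Lagrange multiplier from the first-order condition

ssr_hat = np.sum((y - X @ beta_hat) ** 2)
ssr_tilde = np.sum((y - X @ beta_tilde) ** 2)
print(R @ beta_tilde, ssr_hat, ssr_tilde)
```

Imposing the restriction can only increase the sum of squared residuals, so `ssr_tilde` is never below `ssr_hat`.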


## Hypothesis Testing

In many cases, one wishes to test economic theories. If theory suggests parameter restrictions,
as in the above homogeneity example, one can test theory by testing parameter restrictions.

There is a trio of statistics called **the trinity**:
1. Wald - based on the unconstrained estimator
2. Lagrange multiplier (LM) - based on the constrained estimator
3. Likelihood ratio (LR) - based on both

that can be used for testing the null hypothesis.

- The three statistics share the same asymptotic $\chi^2$ distribution under the null hypothesis
- Applicable for both ML and GMM

### Null Hypothesis
Consider the problem of testing a set of $r$ possibly nonlinear
restrictions on a $p$-dimensional model parameter:
$$ H_0 : \bf{a}(\theta_0) = 0$$

- $\bf{a}(\theta_0)$ has dimension $(r \times 1)$
- $\bf{A}(\theta)$ has dimension $(r \times p)$

Assume
- $\bf{a}(\cdot)$ is continuously differentiable 
- $\bf{A}(\theta)$ is the Jacobian of $\bf{a}(\theta)$
$$\bf{A}(\theta) = \frac{\partial \bf{a}(\theta)}{\partial \theta'}$$
- $\bf{A}(\theta)$ is of full (row) rank (i.e. r restrictions are not
redundant)

### Assumptions for the Trinity
1. MVT or Taylor expansion for the sampling error:
$$\sqrt{n}\big(\hat{\theta}-\theta_0\big) = -\Psi^{-1}\sqrt{n}\frac{\partial{Q_n(\theta_0)}}{\partial{\theta}}+o_p$$
where $\Psi$ is a nonsingular matrix and the term $o_p$ means some random variable that converges to zero in probability, which will depend on the context. 
2. $\quad$
$$\sqrt{n}\frac{\partial{Q_n(\theta_0)}}{\partial{\theta}}
    \xrightarrow{d} N\big(0,\Sigma\big)$$
3. $\sqrt{n}\big(\tilde{\theta}-\theta_0\big)$ converges in distribution to a random variable, where $\tilde{\theta}$ is the constrained estimator:
$$\tilde{\theta} \in \arg \max_{\theta \in \Theta} Q_n(\theta) \quad s.t. \quad \bf{a}(\theta)=0$$

Recall where $\Psi$ in the first assumption comes from:
- For M-estimator:
$$\sqrt{n}(\hat{\theta}-\theta_0) 
= -\bigg[\frac{\partial^2{Q_n(\bar{\theta})}}
{\partial{\theta}\partial{\theta'}}\bigg]^{-1}
\sqrt{n}\frac{\partial{Q_n(\theta_0)}}{\partial{\theta}}$$
$$\Psi = \mathrm{plim}\; \frac{\partial^2{Q_n(\bar{\theta})}}
{\partial{\theta}\partial{\theta'}} = \mathbb{E}\big[\bf{H}(\bf{w_i};\theta_0)\big]$$

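- For GMM, with the objective rescaled to $Q_n(\theta) = -\frac{1}{2}g_n(\theta)'\hat{\bf{W}}g_n(\theta)$ (which leaves the maximizer unchanged), so that $\frac{\partial{Q_n(\theta_0)}}{\partial{\theta}} = -\bf{G}_n(\theta_0)'\hat{\bf{W}}g_n(\theta_0)$:
$$\sqrt{n}(\hat{\theta}-\theta_0) 
= -\big[\bf{G}_n(\hat{\theta})'\hat{\bf{W}}\bf{G}_n(\bar{\theta})\big]^{-1}
\bf{G}_n(\hat{\theta})'\hat{\bf{W}}\sqrt{n}g_n(\theta_0)$$
$$\Psi = -\mathrm{plim}\; \bf{G}_n(\hat{\theta})'\hat{\bf{W}}\bf{G}_n(\bar{\theta}) = -G'WG$$

Notice that for ML (by the information matrix equality) and for efficient GMM ($\bf{W}=\bf{\Omega^{-1}}$),
$$\Sigma = -\Psi$$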

### Wald Statistic
Based on the Mean Value Theorem (a first-order Taylor expansion of $\bf{a}(\hat{\theta})$ around $\theta_0$), under the null:
$$\begin{align}
\sqrt{n} \; \bf{a}(\hat{\theta}) 
& = \bf{A}(\theta_0)\sqrt{n} \; (\hat{\theta}-\theta_0)+o_p \\
& =-\bf{A}(\theta_0)\Psi^{-1}\sqrt{n}\frac{\partial{Q_n(\theta_0)}}{\partial{\theta}}+o_p
\end{align}$$
and the asymptotic variance is:
$$\begin{align}
\mathrm{AVar}\big(\bf{a}(\hat{\theta})\big)
& = \bf{A}(\theta_0)\Psi^{-1}\Sigma\Psi^{-1}\bf{A}(\theta_0)' \\
& = \bf{A}(\theta_0) \Sigma^{-1} \bf{A}(\theta_0)'
\end{align}$$
where the second line uses $\Sigma = -\Psi$.

Since $\bf{A}(\theta_0)$ has full row rank and $\Sigma$ is positive definite, $\mathrm{AVar}\big(\bf{a}(\hat{\theta})\big)$ is positive definite. Therefore, the associated quadratic form 
$$W\equiv n\bf{a}(\hat{\theta})'
\big[\bf{A}(\hat{\theta})\hat{\Sigma}^{-1}\bf{A}(\hat{\theta})'\big]^{-1}\bf{a}(\hat{\theta})$$
is asymptotically $\chi^2(r)$ under the null hypothesis.
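
A Wald statistic for a nonlinear restriction can be sketched as follows. The example assumes a linear regression estimated as an M-estimator with $m = -(y_i - \bf{x}_i'\theta)^2$ and a hypothetical restriction $\bf{a}(\theta)=\theta_1\theta_2 - 1 = 0$, which holds at the simulated true value. Because $\Sigma=-\Psi$ does not hold for this choice of $m$, the code uses the general sandwich form $\bf{A}\Psi^{-1}\Sigma\Psi^{-1}\bf{A}'$ from the first line of the variance derivation above.

```python
import numpy as np

rng = np.random.default_rng(1)                 # fixed seed, illustrative only
n = 4_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
theta0 = np.array([0.5, 2.0])                  # hypothetical truth: theta_1 * theta_2 = 1
y = X @ theta0 + rng.normal(size=n)

theta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # unconstrained estimator
resid = y - X @ theta_hat

Psi_hat = -2.0 * X.T @ X / n                   # sample average Hessian of m
scores = 2.0 * X * resid[:, None]              # s_i evaluated at theta_hat
Sigma_hat = scores.T @ scores / n
Psi_inv = np.linalg.inv(Psi_hat)
avar_theta = Psi_inv @ Sigma_hat @ Psi_inv     # Avar of sqrt(n)(theta_hat - theta0)

a_hat = np.array([theta_hat[0] * theta_hat[1] - 1.0])   # a(theta_hat), r = 1
A_hat = np.array([[theta_hat[1], theta_hat[0]]])        # Jacobian of a, shape (1, 2)

# W = n * a' [A Avar A']^{-1} a, compared with a chi-square(1) critical value.
wald = n * a_hat @ np.linalg.solve(A_hat @ avar_theta @ A_hat.T, a_hat)
print(wald, wald > 3.841)                      # 3.841 is the 5% critical value of chi-square(1)
```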

### Lagrange Multiplier (LM) Statistic
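Let $\gamma_n$ be the Lagrange multiplier of the constrained problem, so that the first-order condition is $\frac{\partial Q_n(\tilde{\theta})}{\partial \theta} + \bf{A}(\tilde{\theta})'\gamma_n = 0$. Then
$$\begin{align}
LM &\equiv n\bigg(\frac{\partial Q_n(\tilde{\theta})}{\partial \theta}\bigg)' \tilde{\Sigma}^{-1} \bigg(\frac{\partial Q_n(\tilde{\theta})}{\partial \theta}\bigg) \\
& = n\gamma'_n \big[\bf{A}(\tilde{\theta})\tilde{\Sigma}^{-1}\bf{A}(\tilde{\theta})'\big] \gamma_n
\end{align}$$
is asymptotically $\chi^2(r)$ under the null hypothesis.

### Likelihood Ratio (LR) Statistic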

$$\begin{align}
LR &\equiv 2n\big[ Q_n(\hat{\theta}) - Q_n(\tilde{\theta}) \big] \\
& = n\gamma'_n \big[\bf{A}(\tilde{\theta})\tilde{\Sigma}^{-1}\bf{A}(\tilde{\theta})'\big] \gamma_n + o_p
\end{align}$$
is asymptotically $\chi^2(r)$ under the null hypothesis.
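
To see the trinity side by side, the sketch below assumes a one-parameter Poisson model with $Q_n(\theta) = -\theta + \bar{w}\log\theta$ (the average log-likelihood up to a constant), score $-1 + w_i/\theta$, and $\Sigma(\theta) = 1/\theta$. The null $H_0: \theta_0 = 2$ and the simulated data are illustrative assumptions. With a single restriction on a scalar parameter, the constrained estimator is simply the null value.

```python
import numpy as np

def q_n(theta, w_bar):
    """Average Poisson log-likelihood, omitting the term that does not depend on theta."""
    return -theta + w_bar * np.log(theta)

rng = np.random.default_rng(0)                 # fixed seed, illustrative only
n, theta_null = 5_000, 2.0                     # H0: theta_0 = 2 (hypothetical)
w = rng.poisson(lam=theta_null, size=n)        # data generated under the null
w_bar = w.mean()

theta_hat = w_bar                              # unconstrained ML estimator
theta_tilde = theta_null                       # constrained estimator under H0

# Wald: a(theta) = theta - theta_null, A = 1, Sigma_hat^{-1} = theta_hat
wald = n * (theta_hat - theta_null) ** 2 / theta_hat

# LM: score of Q_n at the constrained estimator, Sigma_tilde^{-1} = theta_tilde
score_tilde = -1.0 + w_bar / theta_tilde
lm = n * score_tilde ** 2 * theta_tilde

# LR: twice n times the gap in the objective
lr = 2.0 * n * (q_n(theta_hat, w_bar) - q_n(theta_tilde, w_bar))

print(wald, lm, lr)                            # compare each with 3.841, the 5% chi-square(1) value
```

The three numbers differ in finite samples but should be close to one another when the null is true and $n$ is large.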