# Extreme Estimators

## Introduction

**Definition (Extreme Estimator)**: An estimator $\hat{\theta}$ is called an extreme estimator if there is a scalar objective function $Q_n(\bf{w};\theta)$ such that
$$\hat{\theta} \in \arg \max Q_n(\bf{w};\theta)$$
subject to $\theta \in \Theta \subset \mathbb{R}^p$, where
- $n$ is the number of observations in the data
- $\bf{w} \equiv (\bf{w}_1,\dots,\bf{w}_n)$ is the sample or the data, and 
- $\Theta$ is the set of possible parameter values

This maximization problem does not necessarily have a solution. The following lemma gives sufficient conditions under which a maximizer $\hat{\theta}$ exists and is a measurable function of the data.

**Lemma (Existence of Extreme Estimators)**: Suppose that
1. the parameter space $\Theta$ is a compact subset of $\mathbb{R}^p$
2. $Q_n(\theta)$ is continuous in $\theta$ for any data $\bf{w}$, and
3. $Q_n(\theta)$ is a measurable function of $\bf{w}$ for all $\theta \in \Theta$.

Then there exists a measurable function $\hat{\theta}$ of the data $\bf{w}$ such that $\hat{\theta} \in \arg \max Q_n(\bf{w};\theta)$ subject to $\theta \in \Theta$.

## Two Classes of Extreme Estimators
1. M-Estimators:  $Q_n(\theta)$ is a simple average
$$Q_n(\theta)=\frac{1}{n}\sum_{1}^{n}m(\bf{w}_i;\theta)$$
    - Examples: maximum likelihood (ML) and nonlinear least squares (NLS)
2. Generalized Method of Moments (GMM)
$$Q_n(\theta)=-g_n(\theta)'\hat{\bf{W}}g_n(\theta)$$
where
    - $\hat{\bf{W}}$ is a $K \times K$ symmetric and positive definite matrix that defines the distance of $g_n(\theta)$ from zero.
    - $g_n(\theta) = \frac{1}{n}\sum_{1}^{n}g(\bf{w}_i;\theta)$
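
As a rough illustration of the two objective functions, the following Python sketch evaluates both forms on toy data. The helper names (`m_objective`, `gmm_objective`), the toy functions `m` and `g`, the data, and the grid search over $[0, 6]$ are all assumptions made for this example; they do not come from the notes.

```python
import numpy as np

def m_objective(m, w, theta):
    """M-estimator objective: Q_n(theta) = (1/n) * sum_i m(w_i; theta)."""
    return np.mean([m(w_i, theta) for w_i in w])

def gmm_objective(g, w, theta, W_hat):
    """GMM objective: Q_n(theta) = -g_n(theta)' W_hat g_n(theta)."""
    g_n = np.mean([g(w_i, theta) for w_i in w], axis=0)   # sample mean of the moments
    return -float(g_n @ W_hat @ g_n)

# Hypothetical toy problem: estimate the mean of four observations.
w = np.array([1.0, 2.0, 3.0, 6.0])
m = lambda w_i, theta: -(w_i - theta) ** 2        # least-squares criterion
g = lambda w_i, theta: np.array([w_i - theta])    # one moment condition, E[w - theta] = 0
W_hat = np.eye(1)                                 # 1 x 1 weighting matrix

grid = np.linspace(0.0, 6.0, 601)                 # compact parameter space [0, 6]
theta_m = grid[np.argmax([m_objective(m, w, t) for t in grid])]
theta_gmm = grid[np.argmax([gmm_objective(g, w, t, W_hat) for t in grid])]
```

In this toy case both objectives are maximized at the sample mean of `w`, so the two estimators coincide; in general they need not.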

### M-Estimator Example: Maximum Likelihood
- ${\bf{w}_i}$ is i.i.d.
- $\theta$ is a finite-dimensional vector
- a functional form of $f({\bf{w}_i};\theta)$ is known
- $\theta_0$ is the true parameter value

The joint density of data $(\bf{w}_1,\dots,\bf{w}_n)$ is 
$$f(\bf{w}_1,\dots,\bf{w}_n;\theta_0)=\prod_{1}^{n} f(\bf{w}_i;\theta_0)$$
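The $Q_n(\theta)$ can be either the likelihood or the log-likelihood function (dividing the log-likelihood by $n$ gives the M-estimator form with $m(\bf{w}_i;\theta)=\log f(\bf{w}_i;\theta)$):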
$$f(\bf{w}_1,\dots,\bf{w}_n;\theta)=\prod_{1}^{n} f(\bf{w}_i;\theta)$$
$$\log f(\bf{w}_1,\dots,\bf{w}_n;\theta)=\log \left[ \prod_{1}^{n} f(\bf{w}_i;\theta) \right] = \sum_{1}^{n} \log f(\bf{w}_i;\theta) $$ 
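
A minimal sketch of maximum likelihood as an M-estimator, assuming an exponential density $f(w;\theta)=\theta e^{-\theta w}$ (this model, the true value `theta_0`, the sample size, and the grid are illustrative assumptions). It maximizes the average log-likelihood over a grid and compares the result with the closed-form MLE for this particular model, $1/\bar{w}$.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_0 = 2.0                                     # true rate (assumed for the simulation)
w = rng.exponential(scale=1.0 / theta_0, size=5000)

def avg_loglik(theta, w):
    # (1/n) * sum_i log f(w_i; theta) with f(w; theta) = theta * exp(-theta * w)
    return np.mean(np.log(theta) - theta * w)

grid = np.linspace(0.1, 10.0, 9901)               # compact parameter space [0.1, 10]
theta_hat = grid[np.argmax([avg_loglik(t, w) for t in grid])]
theta_closed = 1.0 / w.mean()                     # closed-form MLE for this model only
```

The grid maximizer agrees with `theta_closed` up to the grid spacing, because the average log-likelihood is concave in `theta` for this model.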



### M-Estimator Example: Conditional Maximum Likelihood
- ${\bf{w}_i}$ is partitioned into two groups, $y_i$ and $\bf{x}_i$, and the interest is in how $\bf{x}_i$ influences the conditional distribution of $y_i$
- $f(y_i |\bf{x}_i; \theta_0)$ is the conditional density of $y_i$ given $\bf{x}_i$
- $f(\bf{x}_i; \psi_0)$ is the marginal density of $\bf{x}_i$

The joint density of $\bf{w}_i = (y_i,\bf{x}'_i)'$ is
$$ f(y_i ,\bf{x}_i;\theta_0,\psi_0) = f(y_i | \bf{x}_i;\theta_0)f(\bf{x}_i;\psi_0) $$
The $Q_n(\theta)$ can be either the likelihood or the log-likelihood function:
$$\prod_{1}^{n} f(\bf{w}_i;\theta,\psi)=\prod_{1}^{n} f(y_i|\bf{x}_i;\theta) \cdot \prod_{1}^{n} f(\bf{x}_i;\psi)$$
$$\sum_{1}^{n} \log f(\bf{w}_i;\theta,\psi)=\sum_{1}^{n} \log f(y_i|\bf{x}_i;\theta) + \sum_{1}^{n} \log f(\bf{x}_i;\psi)$$
Conditional ML maximizes only the first sum, which is the only term that depends on $\theta$ when $\theta$ and $\psi$ are not functionally related.
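
To make the conditional likelihood concrete, here is a small sketch that assumes a one-parameter logit model, $P(y_i=1\mid x_i;\theta)=1/(1+e^{-\theta x_i})$. The logit form, the normal marginal for `x`, the true slope `theta_0`, and the grid are all assumptions for illustration. Only the conditional term enters the objective; the marginal density of `x` is never evaluated.

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta_0 = 5000, 1.0                            # sample size and true slope (assumed)
x = rng.normal(loc=0.5, scale=1.0, size=n)        # marginal of x; its parameters play the role of psi
p = 1.0 / (1.0 + np.exp(-theta_0 * x))            # P(y = 1 | x; theta_0) under the assumed logit model
y = (rng.uniform(size=n) < p).astype(float)

def avg_cond_loglik(theta, y, x):
    # (1/n) * sum_i log f(y_i | x_i; theta); the term log f(x_i; psi) is left out
    z = theta * x
    # log P(y=1|x) = -log(1 + exp(-z)), log P(y=0|x) = -log(1 + exp(z))
    return np.mean(-(y * np.logaddexp(0.0, -z) + (1.0 - y) * np.logaddexp(0.0, z)))

grid = np.linspace(-5.0, 5.0, 1001)               # compact parameter space [-5, 5]
theta_hat = grid[np.argmax([avg_cond_loglik(t, y, x) for t in grid])]
```

Changing `loc` or `scale` in the line that draws `x` changes the sample, but the objective itself never needs to know those values.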

### M-Estimator Example: Nonlinear Least Squares
- $y_i = \varphi(\bf{x}_i; \theta_0) + \epsilon_i$
- $\mathbb{E}(\epsilon_i|\bf{x}_i)=0$
- The functional form of $\varphi$ is known

The $Q_n(\theta)$ is
$$Q_n(\theta)=-\frac{1}{n}\sum_{1}^{n}\left[ y_i - \varphi(\bf{x}_i; \theta) \right]^2$$
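
A short sketch of nonlinear least squares under an assumed regression function $\varphi(x;\theta)=e^{\theta x}$, with the parameter written as `theta`. The functional form, the simulated data, and the grid search are illustrative choices, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(2)
n, theta_0 = 2000, 0.5                            # sample size and true parameter (assumed)
x = rng.uniform(0.0, 2.0, size=n)
y = np.exp(theta_0 * x) + rng.normal(scale=0.5, size=n)   # phi(x; theta) = exp(theta * x), assumed

def Q_n(theta, y, x):
    # NLS objective: minus the average squared residual
    return -np.mean((y - np.exp(theta * x)) ** 2)

grid = np.linspace(-1.0, 2.0, 3001)               # compact parameter space [-1, 2]
theta_hat = grid[np.argmax([Q_n(t, y, x) for t in grid])]
```

A grid search is used only to keep the example dependency-free; a gradient-based optimizer would normally be preferred when the parameter has more than one or two dimensions.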

### GMM Example: Nonlinear GMM
- $y_i = \varphi(\bf{x}_i; \theta_0) + \epsilon_i$
- $\mathbb{E}(\epsilon_i|\bf{x}_i)=0$
- The functional form of $\varphi$ is known

Moment condition:
$$\mathbb{E}(\epsilon_i | \bf{x}_i)=0 \Rightarrow \mathbb{E}(\epsilon_i \cdot \bf{x}_i)=0 \Rightarrow \mathbb{E}\bigg( \big[ y_i - \varphi(\bf{x}_i; \theta_0) \big] \cdot \bf{x}_i \bigg)=0$$
Using the moment condition, the $Q_n(\theta)$ is
$$Q_n(\theta)=-g_n(\theta)'\hat{\bf{W}}g_n(\theta)$$
where
$$g_n(\theta) = \frac{1}{n}\sum_{1}^{n}\big[ y_i - \varphi(\bf{x}_i; \theta) \big] \cdot \bf{x}_i$$
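
The moment-based objective can be sketched in the same way. The example below assumes $\bf{x}_i=(1,z_i)'$, a regression function $e^{\theta z_i}$, and an identity weighting matrix; these choices, the parameter name `theta`, and the simulated data are assumptions for illustration. With two moment conditions and one parameter the model is overidentified.

```python
import numpy as np

rng = np.random.default_rng(3)
n, theta_0 = 2000, 0.5                            # sample size and true parameter (assumed)
z = rng.uniform(0.0, 2.0, size=n)
X = np.column_stack([np.ones(n), z])              # x_i = (1, z_i)', so K = 2 moment conditions
y = np.exp(theta_0 * z) + rng.normal(scale=0.5, size=n)   # phi(x_i; theta) = exp(theta * z_i), assumed
W_hat = np.eye(2)                                 # identity weighting matrix (one admissible choice)

def g_n(theta):
    # Sample moments: (1/n) * sum_i [y_i - phi(x_i; theta)] * x_i
    resid = y - np.exp(theta * z)
    return (resid[:, None] * X).mean(axis=0)

def Q_n(theta):
    # GMM objective: minus the quadratic form in the sample moments
    g = g_n(theta)
    return -float(g @ W_hat @ g)

grid = np.linspace(-1.0, 2.0, 3001)               # compact parameter space [-1, 2]
theta_hat = grid[np.argmax([Q_n(t) for t in grid])]
```

The objective is never positive, and it is close to zero at `theta_hat` because the sample moments are close to zero there.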

## Consistency
If the parameter space is compact, the following proposition applies.

**Proposition (Consistency with Compact Parameter Space)**: Suppose that 
1. $\Theta$ is a compact subset of $\mathbb{R}^p$
2. $Q_n(\bf{w};\theta)$ is a continuous function of $\theta$ for any data $\bf{w}$
3. $Q_n(\bf{w};\theta)$ is a measurable function of $\bf{w}$ for all $\theta \in \Theta$
4. There is a function $Q_0(\theta)$ such that
    - (identification)  $Q_0(\theta)$ is uniquely maximized at $\theta_0 \in \Theta$
    - (uniform convergence) $\sup_{\theta \in \Theta} \vert Q_n(\theta) - Q_0(\theta) \vert \rightarrow_{p} 0 $
    
Then, $\hat{\theta} \rightarrow_{p} \theta_0$
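
A small simulation can make the two conditions tangible. The sketch below assumes $w_i \sim N(\theta_0, 1)$ and $m(w_i;\theta)=-(w_i-\theta)^2$, for which $Q_0(\theta)=-\left[1+(\theta-\theta_0)^2\right]$ is known in closed form and uniquely maximized at $\theta_0$. The model, the parameter space $[-5,5]$, and the sample sizes are assumptions chosen for illustration, and the supremum is approximated on a grid.

```python
import numpy as np

rng = np.random.default_rng(4)
theta_0 = 1.0                                     # true mean (assumed)
grid = np.linspace(-5.0, 5.0, 1001)               # compact parameter space [-5, 5]
Q_0 = -(1.0 + (grid - theta_0) ** 2)              # E[-(w - theta)^2] when w ~ N(theta_0, 1)

results = {}
for n in (100, 10_000, 1_000_000):
    w = rng.normal(loc=theta_0, scale=1.0, size=n)
    # Q_n(theta) = -(1/n) sum_i (w_i - theta)^2, expanded so the whole grid is evaluated at once
    Q_n = -(np.mean(w ** 2) - 2.0 * grid * w.mean() + grid ** 2)
    sup_gap = np.max(np.abs(Q_n - Q_0))           # grid approximation of sup |Q_n - Q_0|
    theta_hat = grid[np.argmax(Q_n)]
    results[n] = (sup_gap, theta_hat)
    print(f"n={n:>9}: sup|Q_n - Q_0| = {sup_gap:.4f}, theta_hat = {theta_hat:.2f}")
```

As the sample size grows, the printed gap should shrink toward zero and `theta_hat` should settle near `theta_0`, which is the behavior the proposition describes.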

If the parameter space is not compact, concavity of $Q_n$ can take the place of compactness.

**Proposition (Consistency without Compact Parameter Space)**: Suppose that 
1. $\theta_0 \in \text{interior} \Theta$ and $\Theta$ is a convex subset of $\mathbb{R}^p$
2. $Q_n(\bf{w};\theta)$ is concave over $\Theta$ for any data $\bf{w}$
3. $Q_n(\bf{w};\theta)$ is a measurable function of $\bf{w}$ for all $\theta \in \Theta$
4. There is a function $Q_0(\theta)$ such that
    - (identification)  $Q_0(\theta)$ is uniquely maximized at $\theta_0 \in \Theta$
    - (point-wise convergence) $\vert Q_n(\theta) - Q_0(\theta) \vert \rightarrow_{p} 0$ for all $\theta \in \Theta$
    
Then, $\hat{\theta} \rightarrow_{p} \theta_0$

To apply these propositions to M-estimators and GMM, four questions need to be answered:

1. What is $Q_0(\theta)$ for M-estimators and GMM?
2. What are the conditions for the estimator $\hat{\theta}$ to be well-defined?
3. What is the identification condition?
4. What are the uniform and point-wise convergence conditions?

### Consistency of M-Estimators
#### (Q1) What is $Q_0(\theta)$ in the previous consistency propositions?
For an M-estimator, the objective function is:
$$Q_n(\theta)=\frac{1}{n}\sum_{1}^{n}m(\bf{w}_i;\theta)$$

If $\mathbb{E}\left[m(\bf{w}_i;\theta)\right]$ exists and is finite, then by the Ergodic Theorem,
$$Q_n(\theta)=\frac{1}{n}\sum_{1}^{n}m(\bf{w}_i;\theta)\rightarrow_{p} \mathbb{E}\left[m(\bf{w}_i;\theta)\right]$$

Therefore, 
$$Q_0(\theta)=\mathbb{E}\left[m(\bf{w}_i;\theta)\right]$$

#### (Q2) What are the conditions for an M-estimator $\hat{\theta}$ to be well-defined?
- If $\Theta$ is compact,
    - $m(\bf{w}_i;\theta)$ is a continuous function of $\theta$ for any data $\bf{w}$
    - $m(\bf{w}_i;\theta)$ is a measurable function of $\bf{w}$ for all $\theta \in \Theta$
- If $\Theta$ is not compact, but is convex and $\theta_0 \in \text{interior} \Theta$:
    - $m(\bf{w}_i;\theta)$ is concave over $\Theta$ for any data $\bf{w}$
    - $m(\bf{w}_i;\theta)$ is a measurable function of $\bf{w}$ for all $\theta \in \Theta$


#### (Q3)  What is the identification condition for an M-estimator?

The identification condition for an M-estimator is that $\mathbb{E}\left[m(\bf{w}_i;\theta)\right]$ is uniquely maximized at $\theta_0 \in \Theta$
- For ML, where $m(\bf{w}_i;\theta)=\log f(y_i \vert \bf{x}_i;\theta)$, the condition is that for all $\theta \neq \theta_0$, with positive probability,
$$\log f(y_i \vert \bf{x}_i;\theta) \neq \log f(y_i \vert \bf{x}_i;\theta_0) $$

- For NLS, where $m(\bf{w}_i;\theta)=-\left[ y_i - \varphi(\bf{x}_i; \theta) \right]^2$, the condition is that for all $\theta \neq \theta_0$, with positive probability,
$$\varphi(\bf{x}_i;\theta) \neq \varphi(\bf{x}_i;\theta_0)$$
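
As a hedged illustration of what a failure of the NLS identification condition looks like, the sketch below assumes a regression function $\theta^2 x$, with the parameter written as `theta`. This functional form and the simulated data are invented for the example. Because $\theta$ and $-\theta$ produce the same regression function, the objective takes the same value at both, and only a restriction of the parameter space restores a unique maximizer.

```python
import numpy as np

rng = np.random.default_rng(5)
n, theta_0 = 1000, 1.5                            # sample size and true parameter (assumed)
x = rng.normal(size=n)
y = theta_0 ** 2 * x + rng.normal(scale=0.3, size=n)      # phi(x; theta) = theta^2 * x, assumed

def Q_n(theta):
    # NLS objective: minus the average squared residual
    return -np.mean((y - theta ** 2 * x) ** 2)

# theta and -theta give the same regression function, so the objective cannot tell them apart
same_value = np.isclose(Q_n(theta_0), Q_n(-theta_0))

# Restricting the parameter space to [0, 3] removes the second maximizer
grid_pos = np.linspace(0.0, 3.0, 3001)
theta_hat = grid_pos[np.argmax([Q_n(t) for t in grid_pos])]
```

On a symmetric parameter space such as $[-3, 3]$ the maximizer would not be unique, so the consistency propositions would not apply as stated.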


#### (Q4) What are the uniform and point-wise convergence conditions?
- Uniform convergence condition: by the uniform Law of Large Numbers, the condition becomes
$$\mathbb{E}\left[\sup_{\theta \in \Theta} \vert m(\bf{w}_i;\theta) \vert \right] < \infty$$
- Point-wise convergence condition: by the Ergodic Theorem, the condition becomes
$$\mathbb{E}\left[\vert m(\bf{w}_i;\theta) \vert \right] < \infty$$
for all $\theta \in \Theta$, (i.e., $\mathbb{E}\left[m(\bf{w}_i;\theta) \right]$ exists and is finite)
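
As a worked example of the difference between the two conditions (an illustration under assumed choices, not taken from the notes), take scalar data $w_i$ with $\mathbb{E}[w_i^2] < \infty$, $m(w_i;\theta) = -(w_i-\theta)^2$, and a compact interval $\Theta=[a,b]$. Since $(w_i-\theta)^2 \le 2w_i^2 + 2\theta^2$,
$$\mathbb{E}\left[\sup_{\theta \in [a,b]} \vert m(w_i;\theta) \vert\right] \le 2\,\mathbb{E}\left[w_i^2\right] + 2\max(a^2,b^2) < \infty$$
so the uniform condition holds. If instead $\Theta=\mathbb{R}$, the supremum of $(w_i-\theta)^2$ over $\theta$ is infinite for every $w_i$, so the uniform condition fails, while the point-wise condition $\mathbb{E}\left[(w_i-\theta)^2\right] < \infty$ still holds for each fixed $\theta$.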

### Consistency of GMM Estimator 
#### (Q1) What is $Q_0(\theta)$ in the previous consistency propositions?

For GMM estimator, the objective function is:
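$$Q_n(\theta)=-\bigg[\frac{1}{n}\sum_{1}^{n}g(\bf{w}_i;\theta)\bigg]'\hat{\bf{W}}\bigg[\frac{1}{n}\sum_{1}^{n} g(\bf{w}_i;\theta)\bigg]$$

By the Ergodic Theorem, and if $\hat{\bf{W}} \rightarrow_{p} {\bf{W}}$ for some symmetric positive definite matrix ${\bf{W}}$,
$$Q_0(\theta)=-\mathbb{E}\big[g(\bf{w}_i;\theta)\big]'{\bf{W}}\,\mathbb{E}\big[g(\bf{w}_i;\theta)\big]$$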

#### (Q2) What are the conditions for a GMM estimator $\hat{\theta}$ to be well-defined?
1. $g(\bf{w}_i;\theta)$ is a continuous function of $\theta$ for any data $\bf{w}$
2. $g(\bf{w}_i;\theta)$ is a measurable function of $\bf{w}$ for all $\theta \in \Theta$

#### (Q3)  What is the identification condition for a GMM estimator?

- Notice that the maximum of $Q_0(\theta)$ is zero and is attained at $\theta_0$, because of the orthogonality conditions, $\mathbb{E}\big[g(\bf{w}_i;\theta_0)\big]=0$.
- Therefore, the identification is satisfied if for all $\theta \neq \theta_0$ in $\Theta$,
$$\mathbb{E}\big[g(\bf{w}_i;\theta)\big] \neq \mathbb{E}\big[g(\bf{w}_i;\theta_0)\big] = 0$$

#### (Q4) What is the uniform convergence condition?
$$\mathbb{E}\left[\sup_{\theta \in \Theta} \vert\vert g(\bf{w}_i;\theta) \vert\vert \right] < \infty$$