# Extreme Estimators

## Introduction

**Definition (Extreme Estimator)**: An estimator $\hat{\theta}$ is called an extreme estimator if there is a scalar objective function $Q_n(\bf{w};\theta)$ such that
$$\hat{\theta} \in \arg \max Q_n(\bf{w};\theta)$$
subject to $\theta \in \Theta \subset \mathbb{R}^p$, where
- $n$ is the number of observations in the data
- $\bf{w} \equiv (\bf{w}_1,\dots,\bf{w}_n)$ is the sample or the data, and 
- $\Theta$ is the set of possible parameter values

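This maximization problem does not necessarily have a solution. The following lemma gives sufficient conditions under which a solution exists and is a measurable function of the data. Below, $Q_n(\theta)$ is shorthand for $Q_n(\bf{w};\theta)$.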

**Lemma (Existence of Extreme Estimators)**: Suppose that
1. the parameter space $\Theta$ is a compact subset of $\mathbb{R}^p$
2. $Q_n(\theta)$ is continuous in $\theta$ for any data $\bf{w}$, and
3. $Q_n(\theta)$ is a measurable function of $\bf{w}$ for all $\theta \in \Theta$.

Then there exists a measurable function $\hat{\theta}$ of the data $\bf{w}$ such that $\hat{\theta} \in \arg \max Q_n(\bf{w};\theta)$ subject to $\theta \in \Theta$.
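
As a minimal sketch of the definition and the lemma, the snippet below maximizes a continuous objective over a compact interval by grid search. The objective (negative mean squared deviation), the interval $[-10, 10]$, the grid size, the data, and the helper name `argmax_on_grid` are all illustrative assumptions. A grid only approximates the maximizer whose existence the lemma guarantees.

```python
def argmax_on_grid(Q, lo, hi, num=2001):
    """Approximate the argmax of Q over the compact interval [lo, hi] on an even grid."""
    grid = [lo + (hi - lo) * k / (num - 1) for k in range(num)]
    return max(grid, key=Q)

# Hypothetical data w_1, ..., w_n
w = [1.0, 2.0, 3.0, 6.0]

def Q_n(theta):
    """Q_n(w; theta) = -(1/n) * sum_i (w_i - theta)^2, continuous in theta."""
    return -sum((wi - theta) ** 2 for wi in w) / len(w)

theta_hat = argmax_on_grid(Q_n, -10.0, 10.0)
print(theta_hat)  # close to the sample mean of w
```

For this particular objective the maximizer is the sample mean, so the grid result can be checked by hand.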

## Two Classes of Extreme Estimators
1. M-Estimators: $Q_n(\theta)$ is a sample average
$$Q_n(\theta)=\frac{1}{n}\sum_{i=1}^{n}m(\bf{w}_i;\theta)$$
    - Examples: maximum likelihood (ML) and nonlinear least squares (NLS)
2. Generalized Method of Moments (GMM)
$$Q_n(\theta)=-g_n(\theta)'\hat{\bf{W}}g_n(\theta)$$
where
    - $\hat{\bf{W}}$ is a $K \times K$ symmetric and positive definite matrix that defines the distance of $g_n(\theta)$ from zero.
    - $g_n(\theta) = \frac{1}{n}\sum_{i=1}^{n}g(\bf{w}_i;\theta)$ is the $K \times 1$ vector of sample moments.
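
The two objective functions can be sketched in plain Python as follows. The function names, the example functions `m` and `g`, the data, and the choice $\hat{\bf{W}} = 1$ are hypothetical and only meant to make the two formulas concrete.

```python
def m_objective(m, data, theta):
    """M-estimator objective: Q_n(theta) = (1/n) * sum_i m(w_i; theta)."""
    return sum(m(w, theta) for w in data) / len(data)

def sample_moments(g, data, theta):
    """g_n(theta) = (1/n) * sum_i g(w_i; theta), returned as a list of length K."""
    rows = [g(w, theta) for w in data]
    K = len(rows[0])
    return [sum(r[k] for r in rows) / len(rows) for k in range(K)]

def gmm_objective(g, data, theta, W):
    """GMM objective: Q_n(theta) = -g_n(theta)' W g_n(theta), with W a K x K matrix."""
    gn = sample_moments(g, data, theta)
    K = len(gn)
    return -sum(gn[j] * W[j][k] * gn[k] for j in range(K) for k in range(K))

# Hypothetical scalar example: estimate a mean
data = [1.0, 2.0, 3.0]
m = lambda w, theta: -(w - theta) ** 2   # least-squares criterion
g = lambda w, theta: [w - theta]         # one moment condition: E(w - theta) = 0

print(m_objective(m, data, 2.0))             # -(1 + 0 + 1) / 3
print(gmm_objective(g, data, 1.0, [[1.0]]))  # -(mean(w) - 1)^2
```

Both objectives are written so that larger values are better, which matches the $\arg \max$ convention in the definition.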

### M-Estimator Example: Maximum Likelihood
- ${\bf{w}_i}$ is i.i.d.
- $\theta$ is a finite-dimensional vector
- the functional form of the density $f({\bf{w}_i};\theta)$ is known
- $\theta_0$ is the true parameter value

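The joint density of the data $(\bf{w}_1,\dots,\bf{w}_n)$ is
$$f(\bf{w}_1,\dots,\bf{w}_n;\theta_0)=\prod_{i=1}^{n} f(\bf{w}_i;\theta_0)$$
The objective function can be either the likelihood or the log-likelihood function. The logarithm is strictly increasing, so both have the same maximizer:
$$f(\bf{w}_1,\dots,\bf{w}_n;\theta)=\prod_{i=1}^{n} f(\bf{w}_i;\theta)$$
$$\log f(\bf{w}_1,\dots,\bf{w}_n;\theta)=\log \left[ \prod_{i=1}^{n} f(\bf{w}_i;\theta) \right] = \sum_{i=1}^{n} \log f(\bf{w}_i;\theta) $$
Dividing the log-likelihood by $n$ does not change the maximizer and gives the M-estimator form with $m(\bf{w}_i;\theta)=\log f(\bf{w}_i;\theta)$.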
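
A small numerical check, assuming (purely for illustration) the exponential density $f(w;\theta)=\theta e^{-\theta w}$, a made-up sample, and a grid of candidate values: the likelihood and the average log-likelihood pick the same grid point, which lies next to the closed-form maximizer $1/\bar{w}$.

```python
import math

def avg_loglik(theta, data):
    """(1/n) * sum_i log f(w_i; theta) for f(w; theta) = theta * exp(-theta * w)."""
    return sum(math.log(theta) - theta * w for w in data) / len(data)

def likelihood(theta, data):
    """prod_i f(w_i; theta) for the same density."""
    return math.prod(theta * math.exp(-theta * w) for w in data)

data = [0.5, 1.5, 1.0, 3.0]               # hypothetical i.i.d. sample
grid = [0.01 * k for k in range(1, 500)]  # candidate values of theta in (0, 5)

theta_loglik = max(grid, key=lambda t: avg_loglik(t, data))
theta_lik = max(grid, key=lambda t: likelihood(t, data))
theta_closed_form = len(data) / sum(data)  # from the first-order condition 1/theta = mean(w)

print(theta_loglik, theta_lik, theta_closed_form)
```

In practice the log-likelihood is preferred because a product of many densities underflows quickly in floating point.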



### M-Estimator Example: Conditional Maximum Likelihood
- ${\bf{w}_i}$ is partitioned into two groups, $y_i$ and $\bf{x}_i$, and the interest is in how $\bf{x}_i$ influences the conditional distribution of $y_i$
- $f(y_i |\bf{x}_i; \theta_0)$ is the conditional density of $y_i$ given $\bf{x}_i$
- $f(\bf{x}_i; \psi_0)$ is the marginal density of $\bf{x}_i$

The joint density of $\bf{w}_i = (y_i,\bf{x}'_i)'$ is
$$ f(y_i ,\bf{x}_i;\theta_0,\psi_0) = f(y_i | \bf{x}_i;\theta_0)f(\bf{x}_i;\psi_0) $$
The likelihood and the log-likelihood of the sample are
$$\prod_{i=1}^{n} f(\bf{w}_i;\theta,\psi)=\prod_{i=1}^{n} f(y_i|\bf{x}_i;\theta) \cdot \prod_{i=1}^{n} f(\bf{x}_i;\psi)$$
$$\sum_{i=1}^{n} \log f(\bf{w}_i;\theta,\psi)=\sum_{i=1}^{n} \log f(y_i|\bf{x}_i;\theta) + \sum_{i=1}^{n} \log f(\bf{x}_i;\psi)$$
If $\theta$ and $\psi$ are not functionally related, the second term of the log-likelihood does not depend on $\theta$, so $Q_n(\theta)$ can be taken to be the average conditional log-likelihood $\frac{1}{n}\sum_{i=1}^{n} \log f(y_i|\bf{x}_i;\theta)$.
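
As a worked illustration (an assumed special case, not a general result), suppose the conditional distribution is normal with a linear mean, $y_i | \bf{x}_i \sim N(\bf{x}_i'\beta, \sigma^2)$, so that $\theta = (\beta', \sigma^2)'$. The conditional log-likelihood is then
$$\sum_{i=1}^{n} \log f(y_i|\bf{x}_i;\theta) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - \bf{x}_i'\beta)^2$$
For any fixed $\sigma^2 > 0$, maximizing over $\beta$ is the same as minimizing the sum of squared residuals, so under this normality assumption the conditional ML estimator of $\beta$ coincides with OLS.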

### M-Estimator Example: Nonlinear Least Squares
- $y_i = \varphi_i(\bf{x}_i; \theta_0) + \epsilon_i$
- $\mathbb{E}(\epsilon_i|\bf{x}_i) = 0$
- The functional form of $\varphi$ is known

The $Q_n(\theta)$ is
$$Q_n(\theta) = -\frac{1}{n}\sum_{i=1}^{n}\left[ y_i - \varphi_i(\bf{x}_i; \theta) \right]^2$$
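
A minimal sketch of the NLS objective, assuming a hypothetical regression function $\exp(\theta x)$, noise-free made-up data generated with parameter value 0.5, and a grid search over the compact set $[0, 1]$. A real application would use a numerical optimizer instead of a grid.

```python
import math

def phi(x, theta):
    """Hypothetical regression function: exp(theta * x)."""
    return math.exp(theta * x)

def nls_objective(theta, ys, xs):
    """Negative mean squared residual: -(1/n) * sum_i [y_i - phi(x_i; theta)]^2."""
    return -sum((y - phi(x, theta)) ** 2 for y, x in zip(ys, xs)) / len(ys)

# Noise-free hypothetical data generated with the parameter value 0.5
xs = [0.0, 1.0, 2.0, 3.0]
ys = [phi(x, 0.5) for x in xs]

# Grid search over the compact parameter set [0, 1]
grid = [k / 100 for k in range(101)]
theta_hat = max(grid, key=lambda t: nls_objective(t, ys, xs))
print(theta_hat)
```

The minus sign turns least-squares minimization into a maximization problem, which keeps NLS inside the $\arg \max$ framework of the definition.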

### GMM Example: Nonlinear GMM
- $y_i = \varphi_i(\bf{x}_i; \theta_0) + \epsilon_i$
- $\mathbb{E}(\epsilon_i|\bf{x}_i) = 0$
- The functional form of $\varphi$ is known

Moment condition:
$$\mathbb{E}(\epsilon_i | \bf{x}_i)=0 \Rightarrow \mathbb{E}(\epsilon_i \cdot \bf{x}_i)=0 \Rightarrow \mathbb{E}\bigg( \big[ y_i - \varphi_i(\bf{x}_i; \theta_0) \big] \cdot \bf{x}_i \bigg)=0$$
Using the sample analogue of the moment condition, the $Q_n(\theta)$ is
$$Q_n(\theta)=-g_n(\theta)'\hat{\bf{W}}g_n(\theta)$$
where
$$g_n(\theta) = \frac{1}{n}\sum_{i=1}^{n}\big[ y_i - \varphi_i(\bf{x}_i; \theta) \big] \cdot \bf{x}_i$$
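
The GMM objective built from this moment condition can be sketched as follows. The regression function $\exp(\theta z)$, the regressor vector $(1, z_i)'$ (so $K = 2$), the identity weighting matrix, the noise-free data, and the grid are all assumptions chosen for illustration.

```python
import math

def phi(z, theta):
    """Hypothetical regression function exp(theta * z), with regressors x = (1, z)."""
    return math.exp(theta * z)

def g_n(theta, ys, zs):
    """Sample moments (1/n) * sum_i [y_i - phi(x_i; theta)] * x_i, a list of length K = 2."""
    n = len(ys)
    resid = [y - phi(z, theta) for y, z in zip(ys, zs)]
    return [sum(resid) / n, sum(e * z for e, z in zip(resid, zs)) / n]

def gmm_objective(theta, ys, zs, W):
    """Negative quadratic form -g_n(theta)' W g_n(theta) in the sample moments."""
    g = g_n(theta, ys, zs)
    return -sum(g[j] * W[j][k] * g[k] for j in range(2) for k in range(2))

# Noise-free hypothetical data generated with the parameter value 0.5
zs = [0.0, 1.0, 2.0, 3.0]
ys = [phi(z, 0.5) for z in zs]
W = [[1.0, 0.0], [0.0, 1.0]]  # identity weighting matrix (symmetric, positive definite)

grid = [k / 100 for k in range(101)]
theta_hat = max(grid, key=lambda t: gmm_objective(t, ys, zs, W))
print(theta_hat)
```

With two moments and one parameter the model is overidentified, so with noisy data the choice of $\hat{\bf{W}}$ would affect the estimate. Here the data are noise-free and both moments are exactly zero at the generating value.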

## Consistency

Suppose that 
1. $\Theta$ is a compact subset of $\mathbb{R}^p$
2. $Q_n(\bf{w};\theta)$ is continuous function of 