## 9 Parametric Inference

Parametric models:
$$
\mathfrak{F}=\{ f(x; \theta): \theta \in \Theta \}
$$

Why are nonparametric methods often preferable in statistical inference? Because we rarely know that the data were generated from a specific parametric family.

Two reasons make studying parametric models useful:
1. Some well known scenarios, such as counts of traffic accidents, follow specific models.
2. Parametric models provide background for understanding certain nonparametric methods.

Topics:  
1. Parameters of interest
2. Two estimation methods: the method of moments and the method of maximum likelihood

### 9.1 Parameter of Interest

If $X \sim N(\mu,\delta^2)$, the parameter is $\theta = (\mu,\delta)$. If we are only interested in $\mu$, we can write $\mu = T(\theta)$; then $\mu$ is called the **parameter of interest** and $\delta$ is called a **nuisance parameter**.

9.1 Example  

Let $X_i \sim N(\mu, \delta^2)$. The parameter is $\theta=(\mu,\delta)$ and the parameter space is $\Theta=\{(\mu,\delta): \mu \in \mathbb{R}, \delta > 0\}$. Suppose $X_i$ is the outcome of a blood test and we are interested in $\tau$, the fraction of the population whose test score is larger than 1. Let $Z$ denote a standard Normal random variable. Then
$$
\begin{eqnarray*}
\tau=\mathbb{P}(X>1) &=& 1-\mathbb{P}(X<1) \\
&=& 1 - \mathbb{P}\left(\frac{X-\mu}{\delta}<\frac{1-\mu}{\delta}\right) \\
&=& 1 - \mathbb{P}\left(Z<\frac{1-\mu}{\delta}\right) \\
&=& 1 - \Phi\left(\frac{1-\mu}{\delta}\right)
\end{eqnarray*}
$$
The parameter of interest is $\tau=T(\mu, \delta)=1-\Phi(\frac{1-\mu}{\delta})$.
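
As a quick numerical check of this formula, the sketch below evaluates $\tau = 1-\Phi((1-\mu)/\delta)$ using only the Python standard library. The function names `normal_cdf` and `tail_fraction` and the parameter values are hypothetical, chosen purely for illustration.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard Normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tail_fraction(mu: float, delta: float, cutoff: float = 1.0) -> float:
    """tau = P(X > cutoff) for X ~ N(mu, delta^2)."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    return 1.0 - normal_cdf((cutoff - mu) / delta)

# Hypothetical parameter values, not taken from any data set.
tau_centered = tail_fraction(mu=1.0, delta=2.0)  # cutoff equals the mean
tau_shifted = tail_fraction(mu=0.0, delta=1.0)   # cutoff is one delta above the mean
print(tau_centered, tau_shifted)
```

When the cutoff equals the mean, half of the population lies above it regardless of $\delta$, which gives a simple sanity check on the implementation.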


9.2 Example  

Recall that $X \sim Gamma(\alpha,\beta)$ if
$$
f(x;\alpha,\beta)=\frac{1}{\beta^\alpha \Gamma (\alpha)} x^{\alpha-1} e^{-x/\beta}, \quad x>0
$$
where $\alpha,\beta>0$ and the Gamma function is
$$
\Gamma (\alpha)=\int_0^\infty y^{\alpha-1}e^{-y} dy
$$
The parameter is $\theta=(\alpha,\beta)$. The Gamma distribution is sometimes used to model lifetimes of people, animals, and electronic equipment. If we want to estimate the mean lifetime, the parameter of interest is $T(\alpha,\beta)=\mathbb{E}_\theta (X_1)=\alpha \beta$.
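
The sketch below illustrates the mean lifetime $\alpha\beta$ by simulation. It assumes the scale parameterization of the Gamma density (exponent $-x/\beta$), which is also the convention of `random.gammavariate(alpha, beta)` in the Python standard library. The values $\alpha=2$, $\beta=3$, the seed, and the sample size are arbitrary illustrative choices.

```python
import math
import random

def gamma_pdf(x: float, alpha: float, beta: float) -> float:
    """Gamma(alpha, beta) density with beta as a scale parameter."""
    if x <= 0:
        return 0.0
    return x ** (alpha - 1) * math.exp(-x / beta) / (beta ** alpha * math.gamma(alpha))

def mean_lifetime(alpha: float, beta: float) -> float:
    """Parameter of interest T(alpha, beta) = alpha * beta."""
    return alpha * beta

# Illustrative values only (assumed, not from the text).
alpha, beta = 2.0, 3.0
rng = random.Random(0)
draws = [rng.gammavariate(alpha, beta) for _ in range(50_000)]
sample_mean = sum(draws) / len(draws)

print("theoretical mean:", mean_lifetime(alpha, beta))
print("simulated mean:  ", sample_mean)
```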

### 9.2 The Method of Moments

Method of moments estimators are not optimal, but they are often easy to compute. Suppose that the parameter $\theta=(\theta_1,\dots,\theta_k)$ has $k$ components. For $1\le j \le k$, define the $j^{\text{th}}$ moment
$$
\alpha_j \equiv \alpha_j(\theta)=\mathbb{E}(X^j)=\int x^j dF(x)
$$
and the $j^{\text{th}}$ sample moment (which is computed from the data alone and does not depend on $\theta$)
$$
\hat{\alpha}_j=\frac{1}{n}\sum_{i=1}^n X_i^j
$$

9.3 Definition  

The method of moments estimator $\hat{\theta}_n$ is defined to be the value of $\theta$ such that
$$
\begin{eqnarray*}
\alpha_1(\hat{\theta}_n) &=& \hat{\alpha}_1 \\
\alpha_2(\hat{\theta}_n) &=& \hat{\alpha}_2 \\
&\vdots& \\
\alpha_k(\hat{\theta}_n) &=& \hat{\alpha}_k
\end{eqnarray*}
$$
This is a system of $k$ equations in $k$ unknowns.

9.5 Example  

Let $X_i \sim N(\mu, \delta^2)$. Then $\alpha_1=\mathbb{E}_\theta (X_1)=\mu$ and $\alpha_2=\mathbb{E}_\theta (X_1^2)=\delta^2+\mu^2$. The method of moments estimators $\hat{\mu}$ and $\hat{\delta}$ are obtained by solving
$$
\begin{cases}
  \hat{\mu} = \frac{1}{n}\sum_{i=1}^n X_i \\
  \hat{\delta}^2+\hat{\mu}^2 = \frac{1}{n}\sum_{i=1}^n X_i^2
\end{cases}
$$
This is a system of 2 equations with 2 unknowns. Its solution is $\hat{\mu}=\overline{X}_n$ and $\hat{\delta}^2=\frac{1}{n}\sum_{i=1}^n (X_i-\overline{X}_n)^2$.
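
The two moment equations for the Normal model can be solved in a few lines. The following is a minimal sketch using only the standard library; the function name `normal_method_of_moments`, the true values $\mu=5$, $\delta=2$, the seed, and the sample size are assumptions made for illustration.

```python
import math
import random

def normal_method_of_moments(xs):
    """Solve alpha_1 = m1 and alpha_2 = m2 for (mu, delta) in the N(mu, delta^2) model."""
    n = len(xs)
    m1 = sum(xs) / n                  # first sample moment
    m2 = sum(x * x for x in xs) / n   # second sample moment
    mu_hat = m1
    var_hat = m2 - m1 * m1            # delta^2 = alpha_2 - alpha_1^2
    # Guard against a tiny negative value caused by floating-point rounding.
    delta_hat = math.sqrt(max(var_hat, 0.0))
    return mu_hat, delta_hat

# Simulated data with assumed true values mu = 5, delta = 2.
rng = random.Random(0)
sample = [rng.gauss(5.0, 2.0) for _ in range(20_000)]
mu_hat, delta_hat = normal_method_of_moments(sample)
print(mu_hat, delta_hat)
```

With a sample this large, both estimates should land close to the values used to generate the data, in line with the consistency of the method of moments under suitable conditions.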

9.6 Theorem  

Let $\hat{\theta}_n$ denote the method of moments estimator. Under appropriate conditions on the model, the following statements hold:  
1. The estimate $\hat{\theta}_n$ exists with probability tending to 1.
2. The estimate is consistent: $\hat{\theta}_n \overset{P}{\longrightarrow} \theta$.
3. The estimate is asymptotically Normal:
$$
\sqrt{n}(\hat{\theta}_n-\theta) \rightsquigarrow N(0, \Sigma)
$$
where
$$
\Sigma = g \, \mathbb{E}_\theta (YY^T) \, g^T, \quad Y=(X,X^2,\dots,X^k)^T, \quad g=(g_1,g_2,\dots,g_k) \quad \text{and} \quad g_j=\frac{\partial \alpha_j^{-1}(\theta)}{\partial \theta}
$$

### 9.3 Maximum Likelihood

9.7 Definition (likelihood function)  

The likelihood function is defined by
$$
\mathcal{L}(\theta)=\prod_{i=1}^n f(X_i;\theta)
$$

The log-likelihood function is defined by
$$
\ell(\theta)=\log \mathcal{L}(\theta) = \sum_{i=1}^n \log f(X_i;\theta)
$$
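
Besides being easier to differentiate, the log-likelihood is better behaved numerically: a product of many densities can underflow to zero in floating point, while the sum of log densities stays finite. The sketch below shows this for a Normal model; the function names and the artificial data are assumptions made for illustration.

```python
import math

def normal_log_pdf(x, mu, delta):
    """log f(x; mu, delta) for the N(mu, delta^2) density."""
    z = (x - mu) / delta
    return -0.5 * z * z - math.log(delta) - 0.5 * math.log(2.0 * math.pi)

def likelihood(xs, mu, delta):
    """L(theta): product of the densities f(X_i; theta)."""
    return math.prod(math.exp(normal_log_pdf(x, mu, delta)) for x in xs)

def log_likelihood(xs, mu, delta):
    """l(theta): sum of the log densities log f(X_i; theta)."""
    return sum(normal_log_pdf(x, mu, delta) for x in xs)

# Small sample: log of the product agrees with the sum of logs.
small = [0.5, 1.2, -0.3]
gap = abs(math.log(likelihood(small, 0.0, 1.0)) - log_likelihood(small, 0.0, 1.0))

# Large artificial sample: the product underflows, the sum does not.
large = [((i % 7) - 3) * 0.5 for i in range(2000)]
big_likelihood = likelihood(large, 0.0, 1.0)
big_log_likelihood = log_likelihood(large, 0.0, 1.0)
print(gap, big_likelihood, big_log_likelihood)
```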

9.8 Definition (maximum likelihood estimator)

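The maximum likelihood estimator (MLE), denoted by $\hat{\theta}_n$, is the value of $\theta$ that maximizes the likelihood:
$$
\hat{\theta}_n = \arg \max_{\theta \in \Theta} \mathcal{L}(\theta)
$$
Because $\log$ is strictly increasing, the maximum of $\ell(\theta)$ occurs at the same value of $\theta$ as the maximum of $\mathcal{L}(\theta)$, so maximizing the log-likelihood yields the same estimator.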
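
To see that $\mathcal{L}$ and $\ell$ share the same maximizer, the sketch below runs a crude grid search for the mean of a Normal model with $\delta=1$ treated as known. The grid search is a stand-in for illustration, not a recommended optimizer, and the data values are hypothetical.

```python
import math

def normal_pdf(x, mu, delta=1.0):
    """N(mu, delta^2) density; delta is treated as known here."""
    z = (x - mu) / delta
    return math.exp(-0.5 * z * z) / (delta * math.sqrt(2.0 * math.pi))

def likelihood(mu, xs):
    return math.prod(normal_pdf(x, mu) for x in xs)

def log_likelihood(mu, xs):
    return sum(math.log(normal_pdf(x, mu)) for x in xs)

# Hypothetical observations; their sample mean is 1.84.
xs = [0.8, 1.9, 2.4, 1.1, 3.0]

# Candidate values of mu from -5.00 to 5.00 in steps of 0.01.
grid = [i / 100 for i in range(-500, 501)]

mle_from_likelihood = max(grid, key=lambda mu: likelihood(mu, xs))
mle_from_log_likelihood = max(grid, key=lambda mu: log_likelihood(mu, xs))
print(mle_from_likelihood, mle_from_log_likelihood)
```

Both searches pick the same grid point, and it coincides with the sample mean, which is the known closed-form MLE of $\mu$ in this model.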


9.12 Example (A Hard Example)

Let $X_i \sim \text{Uniform}(0, \theta)$. 
$$ 
f(x;\theta) = 
\begin{cases}
  \frac{1}{\theta} & 0 \le x \le \theta \\
  0                & \text{otherwise}
\end{cases}
$$

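Let $X_{(n)}=\max\{X_1,\dots,X_n\}$. If $\theta < X_{(n)}$, then at least one observation has $f(X_i;\theta)=0$, so the likelihood is zero. Otherwise every factor equals $1/\theta$. The likelihood can therefore be written as:
$$
\mathcal{L}(\theta)=
\begin{cases}
  \left(\frac{1}{\theta}\right)^n & \theta \ge X_{(n)} \\
  0                & \theta \lt X_{(n)}
\end{cases}
$$
Since $\mathcal{L}(\theta)$ is strictly decreasing on $[X_{(n)}, \infty)$, the MLE is $\hat{\theta}_n = X_{(n)}$.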
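
The sketch below evaluates this piecewise likelihood on a handful of candidate values of $\theta$ and checks that none of them beats the sample maximum. The data, the candidate list, and the function names are hypothetical.

```python
def uniform_likelihood(theta, xs):
    """Likelihood of Uniform(0, theta): (1/theta)^n if theta >= max(xs), else 0."""
    if theta <= 0 or min(xs) < 0 or theta < max(xs):
        return 0.0
    return theta ** (-len(xs))

def uniform_mle(xs):
    """The likelihood is zero below max(xs) and decreasing above it."""
    return max(xs)

# Hypothetical observations.
xs = [0.2, 1.5, 0.9]
theta_hat = uniform_mle(xs)

# Compare a few candidate values of theta, including the sample maximum.
candidates = [1.0, 1.4, 1.5, 1.6, 2.0, 3.0]
best = max(candidates, key=lambda t: uniform_likelihood(t, xs))
print(theta_hat, best)
```

This example is hard for calculus-based methods because the likelihood is discontinuous at $X_{(n)}$, so setting a derivative to zero does not locate the maximum.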

### 9.4 Properties of MLE

The main properties (under certain conditions) are:
1. The MLE is consistent;
2. The MLE is equivariant: if $\hat{\theta}_n$ is the MLE of $\theta$, then $g(\hat{\theta}_n)$ is the MLE of $g(\theta)$;
3. The MLE is asymptotically Normal, and the estimated standard error $\widehat{\text{se}}$ can often be computed analytically;
4. The MLE is asymptotically optimal or efficient. Roughly this means that among all well-behaved estimators, the MLE has the smallest variance, at least for large samples.
5. The MLE is approximately the Bayes estimator.

In sufficiently complicated problems these properties may no longer hold, and the MLE may no longer be a good estimator.  
The properties hold only if the model satisfies certain regularity conditions, which are essentially smoothness conditions on $f(x;\theta)$.
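
Equivariance is convenient in practice: to estimate a function of the parameter, plug the MLE into that function. As a sketch, the code below estimates the tail fraction $\tau = 1-\Phi((1-\mu)/\delta)$ of a Normal model. It assumes the standard closed-form Normal MLE ($\hat{\mu}$ is the sample mean and $\hat{\delta}^2$ is the average squared deviation), which is not derived in this section, and the test scores are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard Normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_mle(xs):
    """Closed-form MLE for N(mu, delta^2) (assumed standard result)."""
    n = len(xs)
    mu_hat = sum(xs) / n
    delta_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in xs) / n)
    return mu_hat, delta_hat

def tau_mle(xs, cutoff=1.0):
    """By equivariance, the MLE of tau = g(mu, delta) is g(mu_hat, delta_hat)."""
    mu_hat, delta_hat = normal_mle(xs)
    return 1.0 - normal_cdf((cutoff - mu_hat) / delta_hat)

# Hypothetical blood test scores with sample mean 1.0.
scores = [0.4, 1.3, 0.9, 1.8, 0.6]
tau_hat = tau_mle(scores)
print(tau_hat)
```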

### 9.5 Consistency of MLE