# Maximum Likelihood Estimation

## Overview

Perhaps the most common approach for deriving estimators is the maximum likelihood method.
This is a frequentist probabilistic framework that seeks the set of model parameters
that maximizes a likelihood function. In particular, we wish to maximize the probability
of observing the data $D$ under a specific probability distribution with parameters $\theta$.

Maximum likelihood parameter estimation is a technique that can be used
when we are willing to make assumptions about the probability distribution of
the data. Based on the assumed probability distribution and the observed data,
the likelihood function measures how probable the observed data is under a
particular set of parameter values. If two sets of parameter values are being
compared, the set with the larger likelihood is deemed more consistent
with the observed data.
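
The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are modelled as i.i.d. normal, and the observations, the two candidate parameter sets, and the helper names `normal_pdf` and `likelihood` are all hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def likelihood(data, mu, sigma):
    """Likelihood of i.i.d. data: the product of the individual densities."""
    return math.prod(normal_pdf(x, mu, sigma) for x in data)

# Hypothetical observations, made up for illustration.
data = [4.8, 5.1, 5.3, 4.9, 5.0, 5.2]

L_a = likelihood(data, mu=5.0, sigma=0.2)  # candidate parameter set A
L_b = likelihood(data, mu=3.0, sigma=0.2)  # candidate parameter set B

# The candidate with the larger likelihood is deemed more consistent with the data.
more_consistent = "A" if L_a > L_b else "B"
print(f"L(A) = {L_a:.3e}, L(B) = {L_b:.3e}, more consistent: {more_consistent}")
```

Because the observations cluster around 5, candidate A receives a far larger likelihood than candidate B.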

## Maximum likelihood estimation

We define the likelihood function as follows [1]:

----
**Definition**

Let $X_1, \dots, X_n$ be i.i.d. with probability density function $f(x;\theta)$. The likelihood function is defined by

\begin{equation}
L_{n}(\theta) = \prod_{i=1}^{n} f(X_i;\theta)
\end{equation}


It is often easier to work with the logarithm of $L_{n}(\theta)$. Hence, the log-likelihood function is defined as

\begin{equation}
l_{n}(\theta) = \log L_{n}(\theta)
\end{equation}


----


The likelihood function $L_{n}(\theta)$ is the joint density of the data, assuming that the data are i.i.d. We treat it as a function
of the parameter $\theta$, i.e. [1]

\begin{equation}
L_{n}(\theta):\Theta \rightarrow [0, \infty)
\end{equation}
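
One practical reason to prefer $l_{n}(\theta)$ over $L_{n}(\theta)$ is numerical: a product of many densities can underflow in floating-point arithmetic, while the sum of their logarithms stays well behaved. The following is a minimal sketch, assuming simulated standard normal data and a normal model; the helper names `normal_pdf` and `normal_log_pdf` are hypothetical.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density f(x; mu, sigma) of the normal distribution."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def normal_log_pdf(x, mu, sigma):
    """log f(x; mu, sigma), written out analytically instead of log(normal_pdf(...))."""
    return -math.log(sigma * math.sqrt(2 * math.pi)) - ((x - mu) ** 2) / (2 * sigma ** 2)

random.seed(0)  # fixed seed so the sketch is reproducible
sample = [random.gauss(0.0, 1.0) for _ in range(2000)]  # simulated data, n = 2000

mu, sigma = 0.0, 1.0  # parameter value at which L_n and l_n are evaluated

# L_n(theta): product of n densities, each below 0.4 here.
likelihood = math.prod(normal_pdf(x, mu, sigma) for x in sample)

# l_n(theta): sum of n log-densities.
log_likelihood = sum(normal_log_pdf(x, mu, sigma) for x in sample)

print(likelihood)      # underflows to 0.0 in double precision
print(log_likelihood)  # a finite negative number
```

With $\sigma = 1$ every factor is below $0.4$, so the product of 2000 factors is smaller than the smallest positive double and is rounded to zero, whereas the log-likelihood remains usable.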


----
**Remark**

The function $L_{n}(\theta)$ is not a density function: in general, it does not integrate to 1 with respect to $\theta$ [1].

----
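
A small worked case makes the remark concrete. Assume, purely for illustration, a single Bernoulli observation $X_1 = 1$ with $f(x;p) = p^{x}(1-p)^{1-x}$ and $p \in [0, 1]$; this model is an assumption introduced here, not part of the definition above. Then $L_{1}(p) = p$ and

\begin{equation}
\int_{0}^{1} L_{1}(p) \, dp = \int_{0}^{1} p \, dp = \frac{1}{2} \neq 1
\end{equation}

so in this case the likelihood, viewed as a function of the parameter, is not a probability density over the parameter space.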

The maximum likelihood estimator, or MLE, is the value of $\theta$ that maximizes $L_{n}(\theta)$. We
will denote this value by $\hat{\theta}$. Since the logarithm is strictly increasing, the maximum of the log-likelihood occurs
at the same point as the maximum of $L_{n}(\theta)$, so we will often maximize $l_n(\theta)$ instead. Let's see some theoretical
examples in order to solidify the process.
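
Before the theoretical examples, the claim that $L_{n}(\theta)$ and $l_{n}(\theta)$ share the same maximizer can be checked numerically. The sketch below assumes a Bernoulli model with hypothetical coin-flip data and uses a plain grid search rather than any particular optimizer; for this model the maximizer is known to be the sample mean, which gives a reference value to compare against.

```python
import math

# Hypothetical coin-flip data (1 = heads), made up for illustration.
data = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

def likelihood(p, data):
    """L_n(p) for i.i.d. Bernoulli(p) observations."""
    return math.prod(p if x == 1 else 1 - p for x in data)

def log_likelihood(p, data):
    """l_n(p) = log L_n(p), computed as a sum of log-probabilities."""
    return sum(math.log(p if x == 1 else 1 - p) for x in data)

# Candidate values of p strictly inside (0, 1), so the logarithm is always defined.
grid = [k / 1000 for k in range(1, 1000)]

p_hat_from_L = max(grid, key=lambda p: likelihood(p, data))
p_hat_from_l = max(grid, key=lambda p: log_likelihood(p, data))
sample_mean = sum(data) / len(data)

print(p_hat_from_L, p_hat_from_l, sample_mean)
```

Both searches select the same grid point, and that point coincides with the sample mean of the data.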

### Example 1

This is a classical example often cited when discussing maximum likelihood estimators. Specifically, let the data $X_1, \dots, X_n$
be i.i.d. draws from the normal distribution with mean $\mu$ and variance $\sigma^2$.
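
One standard way to carry this example through is sketched below. It assumes i.i.d. observations $X_1, \dots, X_n$ with density $f(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ and writes $\bar{X}$ for the sample mean (notation introduced here). The log-likelihood is

\begin{equation}
l_{n}(\mu, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log \sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(X_i - \mu)^2
\end{equation}

Setting the partial derivative with respect to $\mu$ to zero gives

\begin{equation}
\frac{\partial l_{n}}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(X_i - \mu) = 0 \quad \Rightarrow \quad \hat{\mu} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i
\end{equation}

and setting the partial derivative with respect to $\sigma^2$ to zero, with $\mu$ replaced by $\hat{\mu}$, gives

\begin{equation}
\frac{\partial l_{n}}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(X_i - \hat{\mu})^2 = 0 \quad \Rightarrow \quad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2
\end{equation}

Note that the resulting variance estimator divides by $n$ rather than $n - 1$.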

## Summary

## References

1. Larry Wasserman, _All of Statistics. A Concise Course in Statistical Inference_, Springer 2003.