# Maximum Likelihood Estimation (MLE)

This note introduces the maximum likelihood method for parameter estimation.

## The problem

Define a model or a data generating process (DGP) as:
 
$$ F(x_t, z_t | \theta) = 0 $$

In practice, a model could also include inequalities representing constraints, but this form is sufficient for our discussion. The goal of maximum likelihood estimation (MLE) is to choose the parameter vector $\theta$ that maximizes the likelihood of observing the data $(x_t, z_t)$ under the model.

## Simple (normal) distribution example

A simple example of a model is a statistical distribution, e.g., the normal distribution $\mathcal{N}(\mu, \sigma)$ with parameter vector $\theta = (\mu, \sigma)$. Its probability density function is:

$$ f(x|\theta) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

The likelihood of drawing value $x_i$ from the distribution $f(x|\theta)$ is the density $f(x_i|\theta)$. If the draws are independent, the likelihood of drawing the vector of two observations $(x_1, x_2)$ from the distribution $f(x|\theta)$ is $f(x_1|\theta) \times f(x_2|\theta)$.
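As a minimal sketch of the two-observation case, the snippet below evaluates the normal density and multiplies two values together. The function name `normal_pdf`, the parameter values, and the observations are hypothetical and chosen only for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution at x (sigma is the standard deviation)."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


# Hypothetical parameter vector theta = (mu, sigma) and two observations.
mu, sigma = 0.0, 1.0
x1, x2 = 0.5, -1.2

# For independent draws, the joint density is the product of the individual densities.
joint = normal_pdf(x1, mu, sigma) * normal_pdf(x2, mu, sigma)
print(joint)
```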

We define the likelihood function $\mathcal{L}$ of $N$ independent draws $(x_1, x_2, ..., x_N)$ from a model with distribution $f(x|\theta)$ as:

$$ \mathcal{L}(x_1, x_2, ..., x_N|\theta) = \prod_{i=1}^N f(x_i|\theta) $$

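Because it can be numerically difficult to **maximize a product of many small numbers** (the product shrinks toward zero as $N$ grows, and one very small value can dominate the entire product), it is almost always easier to use the **log likelihood** function $\ln(\mathcal{L})$. Because the logarithm is strictly increasing, the log likelihood is maximized at the same $\theta$ as the likelihood itself.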

$$\ln(\mathcal{L}(x_1, x_2, ..., x_N|\theta)) = \sum_{i=1}^N \ln(f(x_i | \theta))$$
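The numerical issue can be illustrated with a short sketch. It assumes simulated standard normal data; the sample size, the seed, and the helper name `normal_logpdf` are arbitrary choices made for this example. Each standard normal density value is below 0.4, so the product of 2000 of them falls below the smallest positive double-precision number, while the sum of logs stays finite.

```python
import math
import random


def normal_logpdf(x, mu, sigma):
    """Log of the normal density at x (sigma is the standard deviation)."""
    return (-math.log(sigma)
            - 0.5 * math.log(2.0 * math.pi)
            - (x - mu) ** 2 / (2.0 * sigma ** 2))


# Simulated data (an assumption of this sketch): 2000 standard normal draws.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(2000)]

# Likelihood as a raw product of densities: underflows in floating point.
likelihood = 1.0
for x in data:
    likelihood *= math.exp(normal_logpdf(x, 0.0, 1.0))

# Log likelihood as a sum of log densities: remains a usable finite number.
log_likelihood = sum(normal_logpdf(x, 0.0, 1.0) for x in data)

print(likelihood, log_likelihood)
```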

The maximum likelihood estimate $\hat{\theta}_{MLE}$ is the following:

$$ \hat{\theta}_{MLE} = \arg\max_{\theta} \ln\mathcal{L}(x_1, x_2, ..., x_N|\theta) = \arg\max_{\theta} \sum_{i=1}^N \ln(f(x_i|\theta)) $$
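For the normal distribution, this maximization has a well-known closed-form solution: $\hat{\mu}$ is the sample mean and $\hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^N (x_i - \hat{\mu})^2$. The sketch below computes these estimates on simulated data and checks that nearby parameter values do not give a higher log likelihood. The "true" parameters, the seed, the sample size, and the perturbation grid are all hypothetical choices for illustration. For models without a closed form, a numerical optimizer would typically be applied to the negative log likelihood instead.

```python
import math
import random


def log_likelihood(data, mu, sigma):
    """Log likelihood of independent normal draws (sigma is the standard deviation)."""
    n = len(data)
    sum_sq = sum((x - mu) ** 2 for x in data)
    return (-n * math.log(sigma)
            - 0.5 * n * math.log(2.0 * math.pi)
            - sum_sq / (2.0 * sigma ** 2))


# Simulated data from hypothetical "true" parameters.
random.seed(42)
true_mu, true_sigma = 2.0, 1.5
data = [random.gauss(true_mu, true_sigma) for _ in range(5000)]

# Closed-form maximum likelihood estimates for the normal distribution.
mu_hat = sum(data) / len(data)
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / len(data))

# Sanity check: perturbing the estimates should never raise the log likelihood.
best = log_likelihood(data, mu_hat, sigma_hat)
for d_mu in (-0.05, 0.0, 0.05):
    for d_sigma in (-0.05, 0.0, 0.05):
        assert log_likelihood(data, mu_hat + d_mu, sigma_hat + d_sigma) <= best + 1e-9

print(mu_hat, sigma_hat)
```

Note that $\hat{\sigma}^2$ divides by $N$ rather than $N - 1$, so it differs slightly from the usual unbiased sample variance.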

 
 


## Resources

* https://github.com/rickecon/Notebooks/blob/master/MLE/MLest.ipynb
* https://github.com/QuantEcon/lecture-python.notebooks/blob/master/mle.ipynb