In [1]:
from datascience import *
import numpy as np
from math import *
from scipy import stats
import matplotlib.pyplot as plt
%matplotlib inline

## Lesson 29: Maximum Likelihood Estimation

Last lesson, we studied method of moments estimators. These estimators are obtained by setting the moments of a distribution equal to the sample moments obtained from an independent random sample, and then solving for the parameters of interest. As we saw, method of moments estimators are relatively easy to find, but don't always make sense (as in the case of $X\sim \textsf{Unif}(0,b)$).

Another way to estimate a parameter is by maximizing the likelihood function. First, we should introduce the likelihood function. The likelihood function, $L(\theta \mid \textbf{x})$, is a function of $\theta$ given the data $\textbf{x}$ that is larger for likelier values of $\theta$. Finding the value of $\theta$ that maximizes this function yields a maximum likelihood estimator, or $\hat{\theta}_{ML}$. The likelihood function is really just the joint probability mass or density function viewed from the other direction: in the latter, we treat $\theta$ as known and $\textbf{x}$ as unknown, while in the likelihood function these roles are reversed.

Let $X_1,X_2,...,X_n$ be a sequence of independent and identically distributed (iid) random variables with mass or density function $f(x;\theta)$. The likelihood function is given by:

$L(\theta\mid \textbf{x}) = \prod_{i=1}^n f(x_i;\theta)$

Note that the product comes from independence.  

Often, it is easier to deal with the log of the likelihood function. This is because the log of a product is the sum of individual logs, which is often analytically "nicer". The log-likelihood function is denoted as $l(\theta \mid \textbf{x})$ and is given by:

$l(\theta\mid\textbf{x})=\log \prod_{i=1}^n f(x_i;\theta) = \sum_{i=1}^n \log f(x_i;\theta)$
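Besides being analytically nicer, the log-likelihood is also numerically nicer. The sketch below is only an illustration under assumptions that are not part of the lesson: the helper names `log_likelihood` and `exp_logpdf`, the exponential model, the rate of 2, the sample size, and the seed are all made up for this example. It compares the raw product of densities with the sum of log densities on a large simulated sample.

```python
import numpy as np
from scipy import stats

def log_likelihood(logpdf, x, theta):
    """Sum of log f(x_i; theta) over the sample x."""
    return np.sum(logpdf(x, theta))

def exp_logpdf(x, lam):
    """Log density of Exp(lam); scipy parameterizes by scale = 1 / rate."""
    return stats.expon.logpdf(x, scale=1.0 / lam)

# Simulated data (assumption: true rate 2, n = 5000, fixed seed).
rng = np.random.default_rng(0)
x = rng.exponential(scale=0.5, size=5000)

# Direct product of 5000 densities: too small to represent as a float.
likelihood = np.prod(stats.expon.pdf(x, scale=0.5))

# Sum of logs: the same information on a scale that stays finite.
loglik = log_likelihood(exp_logpdf, x, 2.0)

print("likelihood:    ", likelihood)
print("log-likelihood:", loglik)
```

For a sample this large the product underflows to zero, while the log-likelihood remains an ordinary finite number that can still be compared across values of $\theta$.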
 

### Example 1: Exponential Distribution

Suppose $X_1,X_2,...,X_n$ is an iid sequence of random variables from the exponential distribution with unknown parameter $\lambda$. I would like to obtain $\hat{\lambda}_{ML}$, the maximum likelihood estimate of $\lambda$. 

Recall that if $X\sim \textsf{Exp}(\lambda)$, then $f(x)=\lambda e^{-\lambda x}$. So,

$L(\lambda\mid \textbf{x}) = \prod_{i=1}^n f(x_i;\lambda) = \prod_{i=1}^n \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda \sum x_i}$

Maximizing this through differentiation looks difficult. Let's consider the log-likelihood instead: 

$l(\lambda\mid \textbf{x}) = n \log \lambda - \lambda \sum x_i$

This looks easier. Take the derivative with respect to $\lambda$ and set to 0. Then solve for $\lambda$. I leave this next step to you. How does your answer compare to $\hat{\lambda}_{MoM}$? 

${d \over d\lambda} l(\lambda\mid \textbf{x}) = {d \over d\lambda} \left( n \log \lambda - \lambda \sum x_i \right) = \frac{n}{\lambda} - \sum{x_{i}}$

Setting the derivative equal to 0:

$\frac{n}{\lambda} - \sum{x_{i}} = 0$

$\hat{\lambda}_{ML} = \frac{n}{\sum{x_{i}}} = \frac{1}{\bar{x}}$

This is equal to $\hat{\lambda}_{MoM}$.
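As a sanity check on the derivation, the closed-form answer can be compared with a numerical maximizer of the log-likelihood. This is a minimal sketch on simulated data; the true rate of 3, the sample size, the seed, and the search bounds are illustrative assumptions, and `neg_log_lik` is a helper name introduced here.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Simulated sample (assumption: true rate 3, n = 500, fixed seed).
rng = np.random.default_rng(1)
true_lam = 3.0
x = rng.exponential(scale=1.0 / true_lam, size=500)

def neg_log_lik(lam):
    """Negative of l(lambda | x) = n log(lambda) - lambda * sum(x_i)."""
    return -(len(x) * np.log(lam) - lam * np.sum(x))

# Minimizing the negative log-likelihood maximizes the log-likelihood.
result = minimize_scalar(neg_log_lik, bounds=(1e-6, 50.0), method="bounded")

lam_closed_form = 1.0 / np.mean(x)   # 1 / x-bar from the derivation
lam_numeric = result.x               # numerical maximizer

print("closed form:", lam_closed_form)
print("numerical:  ", lam_numeric)
```

The two values should agree up to the optimizer's tolerance. Neither will equal the true rate exactly, since both are estimates from a finite sample.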

### Example 2: Uniform Distribution

Suppose $X_1,X_2,...,X_n$ is an iid sequence of random variables from the continuous uniform distribution on $0 \leq X \leq b$ with unknown parameter $b$. I would like to obtain $\hat{b}_{ML}$, the maximum likelihood estimate of $b$. 

This one is trickier since the domain of $X$ depends on the parameter we are trying to estimate. So I will start you off with a hint. The pdf of $X$ is $f(x)=\frac{1}{b}$ where $0\leq x \leq b$ and 0 otherwise. Another way to write this is with indicator functions:

$$
f(x)={1\over b}I(0\leq x\leq b)
$$

where $I(0\leq x\leq b)$ is equal to 1 if $0 \leq x \leq b$ and 0 otherwise. 

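Write the likelihood for this problem:

$L(b\mid \textbf{x}) = \prod_{i=1}^n {\frac{1}{b}} = \frac{1}{b^n}$ for $b \geq x_{max}$, and $L(b\mid \textbf{x}) = 0$ otherwise, where $x_{max} = \max(x_1,\ldots,x_n)$.

Note: $L$ is 0 when $b < x_{max}$, because at least one observation would then fall outside $[0,b]$. For $b \geq x_{max}$, $L$ decreases as $b$ grows (at the polynomial rate $b^{-n}$, not exponentially), so the likelihood is largest at the smallest admissible value of $b$. Differentiation does not help here, since the maximum sits on the boundary.

Find $\hat{b}_{ML}$: $\hat{b}_{ML} = x_{max}$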
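Because this maximum is found by inspection rather than calculus, it can help to evaluate the likelihood directly on a grid of candidate values of $b$. The sketch below assumes simulated data with a true $b$ of 5, a sample of size 50, and a fixed seed; `uniform_likelihood` and the grid are hypothetical helpers for illustration only.

```python
import numpy as np

# Simulated sample (assumption: true b = 5, n = 50, fixed seed).
rng = np.random.default_rng(2)
true_b = 5.0
x = rng.uniform(0.0, true_b, size=50)

def uniform_likelihood(b, x):
    """L(b | x) = b^(-n) when b >= max(x), and 0 otherwise."""
    b = np.asarray(b, dtype=float)
    admissible = b >= np.max(x)
    return np.where(admissible, b ** (-len(x)), 0.0)

# Candidate values of b; the sample maximum is appended so that the
# boundary point itself is one of the candidates.
b_grid = np.append(np.linspace(0.1, 10.0, 2000), np.max(x))
L_vals = uniform_likelihood(b_grid, x)

b_hat_grid = b_grid[np.argmax(L_vals)]   # grid maximizer
b_hat = np.max(x)                        # closed-form MLE

print("grid maximizer:", b_hat_grid)
print("sample maximum:", b_hat)
```

The grid maximizer coincides with the sample maximum: every candidate below it has likelihood 0, and every candidate above it has a strictly smaller value of $b^{-n}$.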

### Example 3: Binomial Distribution

Suppose $X_1,X_2,...,X_n$ is an iid sequence of random variables with the binomial distribution with 20 trials and unknown probability of success $\pi$ (written as $p$ in the work below). Find the maximum likelihood estimate of $\pi$. 

$L(p\mid \textbf{x}) = \prod_{i=1}^n{{20\choose{x_{i}}}  p^{x_{i}}(1-p)^{(20-x_{i})}}$

$L(p\mid \textbf{x}) = \left[\prod_{i=1}^n{20\choose{x_{i}}}\right]  p^{\sum{x_{i}}}(1-p)^{\sum{(20-x_{i})}}$

$l(p\mid \textbf{x}) = \sum_{i=1}^n \log{20\choose{x_{i}}} +\sum{x_{i}} \log{p} + \sum{(20-x_{i})}\log{(1-p)}$


$\frac{dl}{dp} = \frac{\sum{x_{i}}}{p} - \frac{\sum{(20-x_{i})}}{1-p} = 0$


$(1-p)\sum{x_{i}}=p\sum{(20-x_{i})}$

$(1-p)n\bar{X} = 20pn - pn\bar{X}$


$\bar{X}-\bar{X}p = 20p - \bar{X}p$

$\hat{p}_{ML} = \frac{\bar{X}}{20}$
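The result can be checked numerically in the same way as the exponential example. This is a sketch on simulated data, assuming a true success probability of 0.3, a sample of 200 binomial counts, and a fixed seed; `neg_log_lik` and the variable names are introduced here for illustration.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

# Simulated sample (assumption: 20 trials, true p = 0.3, n = 200, fixed seed).
rng = np.random.default_rng(3)
m = 20
true_p = 0.3
x = rng.binomial(m, true_p, size=200)

def neg_log_lik(p):
    """Negative binomial log-likelihood, summed over the sample."""
    return -np.sum(stats.binom.logpmf(x, m, p))

# Search strictly inside (0, 1) to avoid log(0) at the endpoints.
result = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")

p_closed_form = np.mean(x) / m   # x-bar / 20 from the derivation
p_numeric = result.x             # numerical maximizer

print("closed form:", p_closed_form)
print("numerical:  ", p_numeric)
```

The binomial coefficient term does not depend on $p$, which is why it drops out of the derivative and has no effect on where the maximum occurs.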