In [1]:
from datascience import *
import numpy as np
from math import *
from scipy import stats
import matplotlib.pyplot as plt
%matplotlib inline

## Lesson 29: Maximum Likelihood Estimation

Last lesson, we studied method of moments estimators. These estimators are obtained by setting the moments of a distribution equal to the sample moments obtained from an independent random sample, and then solving for the parameters of interest. As we saw, method of moments estimators are relatively easy to find but don't always make sense (as in the case of $X\sim \textsf{Unif}(0,b)$).

Another way to estimate a parameter is to maximize the likelihood function. The likelihood function, $L(\theta \mid \textbf{x})$, treats the observed data $\textbf{x}$ as fixed and is a function of $\theta$ that is larger for values of $\theta$ under which the observed data are more probable. The value of $\theta$ that maximizes this function is the maximum likelihood estimator, $\hat{\theta}_{ML}$.

Let $X_1,X_2,...,X_n$ be a sequence of iid random variables with mass or density function $f(x;\theta)$. The likelihood function is given by:

$$
L(\theta\mid \textbf{x}) = \prod_{i=1}^n f(x_i;\theta)
$$

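Often, it is easier to deal with the log of the likelihood function. This is because the log of a product is the sum of individual logs, which is often analytically "nicer". Because the log is an increasing function, the value of $\theta$ that maximizes the log-likelihood also maximizes the likelihood. The log-likelihood function is denoted as $l(\theta \mid \textbf{x})$ and is given by: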

$$
l(\theta\mid\textbf{x})=\log \prod_{i=1}^n f(x_i;\theta) = \sum_{i=1}^n \log f(x_i;\theta)
$$
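
The log-likelihood is also better behaved numerically. The sketch below is only an illustration: it assumes an exponential density with rate 2 and a simulated sample, and the helper names `exp_logpdf` and `log_likelihood` are hypothetical, not part of the lesson. It compares the raw product of densities with the sum of log densities for a large sample.

```python
import numpy as np
from scipy import stats

def exp_logpdf(x, rate):
    # Log density of Exp(rate); scipy parameterizes by scale = 1 / rate.
    return stats.expon.logpdf(x, scale=1.0 / rate)

def log_likelihood(theta, x, logpdf):
    # l(theta | x) = sum_i log f(x_i; theta)
    return np.sum(logpdf(x, theta))

# Simulated sample (an assumption for illustration): n = 5000, true rate = 2.
rng = np.random.default_rng(0)
x = rng.exponential(scale=0.5, size=5000)

# The raw product of 5000 densities is too small for floating point
# and underflows to 0.0.
likelihood = np.prod(stats.expon.pdf(x, scale=0.5))

# The sum of logs stays finite and is safe to compare across values of theta.
loglik = log_likelihood(2.0, x, exp_logpdf)

print(likelihood, loglik)
```

For large samples the product is not usable for comparing parameter values, while the sum of logs is, which is one more reason to work with $l(\theta\mid\textbf{x})$.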
 

### Example 1: Exponential Distribution

Suppose $X_1,X_2,...,X_n$ is an iid sequence of random variables from the exponential distribution with unknown parameter $\lambda$. I would like to obtain $\hat{\lambda}_{ML}$, the maximum likelihood estimate of $\lambda$. 

Recall that if $X\sim \textsf{Exp}(\lambda)$, then $f(x)=\lambda e^{-\lambda x}$. So,

$$
L(\lambda\mid \textbf{x}) = \prod_{i=1}^n f(x_i;\lambda) = \prod_{i=1}^n \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda \sum x_i}
$$

Maximizing this through differentiation looks difficult. Let's consider the log-likelihood instead: 

$$
l(\lambda\mid \textbf{x}) = n \log \lambda - \lambda \sum x_i
$$

This looks easier. Take the derivative with respect to $\lambda$ and set to 0. Then solve for $\lambda$. I leave this next step to you. How does your answer compare to $\hat{\lambda}_{MoM}$? 

Taking the derivative of the log-likelihood with respect to $\lambda$ and setting it equal to 0 gives  
$\frac{n}{\lambda} - \sum x_i = 0$  
$\lambda = \frac{n}{\sum x_i}$  
$$
\hat{\lambda}_{ML} = {1 \over \bar{X}}
$$
This is equal to $\hat{\lambda}_{MoM}$.
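
As a sanity check, the closed form $1/\bar{X}$ can be compared with a numerical maximizer of the exponential log-likelihood $n\log\lambda - \lambda\sum x_i$. This is a minimal sketch under stated assumptions: the true rate of 1.5, the sample size, the seed, and the search bounds are all chosen for illustration only.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Simulated sample (illustrative assumption): true rate 1.5, n = 500.
rng = np.random.default_rng(29)
true_rate = 1.5
x = rng.exponential(scale=1 / true_rate, size=500)

def neg_log_likelihood(lam):
    # Negative of l(lambda | x) = n log(lambda) - lambda * sum(x_i);
    # minimizing this is the same as maximizing the log-likelihood.
    return -(len(x) * np.log(lam) - lam * np.sum(x))

# Closed-form estimate from the derivation: 1 / sample mean.
closed_form = 1 / np.mean(x)

# Numerical estimate; the bounds (1e-6, 50) are an arbitrary search window.
result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 50), method="bounded")
numeric = result.x

print(closed_form, numeric)
```

The two values should agree to within the optimizer's tolerance, and both should land near the rate used to simulate the data.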

### Example 2: Uniform Distribution

Suppose $X_1,X_2,...,X_n$ is an iid sequence of random variables from the continuous uniform distribution on $0 \leq X \leq b$ with unknown parameter $b$. I would like to obtain $\hat{b}_{ML}$, the maximum likelihood estimate of $b$. 

This one is trickier since the domain of $X$ depends on the parameter we are trying to estimate. So I will start you off with a hint. The pdf of $X$ is $f(x)=\frac{1}{b}$ where $0\leq x \leq b$ and 0 otherwise. Another way to write this is with indicator functions:

$$
f(x)={1\over b}I(x\leq b)
$$

where $I(x\leq b)$ is equal to 1 if $x \leq b$ and 0 otherwise. 

$$
L(b\mid \textbf{x}) = \prod_{i=1}^n f(x_i;b) = \prod_{i=1}^n \frac{1}{b} I(x_i\le b) = \frac{1}{b^n} I(x_1,x_2,\ldots,x_n \text{ all} \le b)
$$

If any of the sampled $x_i>b$, then the term for that $i$ will be 0, making the whole product 0. This observation allows us to write

$$
L(b\mid \textbf{x}) = \frac{1}{b^n} I(\max_i x_i\le b)
$$

We note that $\frac{1}{b^n}$ is decreasing in $b$, so $L(b\mid \textbf{x})$ is maximized by picking the smallest possible $b$ for which the condition $\max_i x_i\le b$ is true. This is satisfied when

$$
\hat{b}_{ML} = \max_i X_i
$$
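
The shape of this likelihood can be checked on simulated data. The sketch below assumes a true $b$ of 4 and a sample of 200, both chosen only for illustration, and works with the log of the uniform likelihood: $-n\log b$ when $b$ is at least the sample maximum, and $-\infty$ otherwise. It also computes $2\bar{X}$, the method of moments estimate that follows from $E[X]=b/2$, for comparison.

```python
import numpy as np

# Simulated sample (illustrative assumption): true b = 4, n = 200.
rng = np.random.default_rng(7)
true_b = 4.0
x = rng.uniform(0, true_b, size=200)

def log_likelihood(b):
    # log L(b | x) = -n log(b) if max(x) <= b, and -infinity otherwise.
    b = np.asarray(b, dtype=float)
    return np.where(b >= x.max(), -len(x) * np.log(b), -np.inf)

# Maximum likelihood estimate: the sample maximum.
b_ml = x.max()

# Method of moments estimate for Unif(0, b): twice the sample mean.
b_mom = 2 * x.mean()

# Grid search: the best grid point is the first one at or above max(x).
grid = np.linspace(0.5 * x.max(), 2 * x.max(), 1001)
grid_best = grid[np.argmax(log_likelihood(grid))]

print(b_ml, b_mom, grid_best)
```

The sample maximum can never exceed the true $b$, so this estimator always errs on the low side, while $2\bar{X}$ can fall below the largest observation.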


### Example 3: Binomial Distribution

Suppose $X_1,X_2,...,X_n$ is an iid sequence of random variables with the binomial distribution with 20 trials and unknown probability of success $\pi$. Find the maximum likelihood estimate of $\pi$. 

First, recall that the binomial distribution has the probability mass function
$$
f(k;m,\pi) = \frac {m!}{k!(m-k)!}\pi^k(1-\pi)^{m-k}
$$
where $k$ is the number of successes in $m$ trials, each of which has probability of success $\pi$. We write $m$ for the number of trials because $n$ already denotes the sample size.  
Here $m = 20$, so the likelihood function is  
$$
L(\pi\mid \textbf{x}) = \prod_{i=1}^n f(X_i;20,\pi) = \prod_{i=1}^n \frac {20!}{X_i!(20-X_i)!}\pi^{X_i}(1-\pi)^{20-X_i}
$$
Taking this derivative looks hard, so let's use the log-likelihood function
$$
l(\pi\mid \textbf{x}) = \text{constant} + x\ln(\pi) + (20n-x)\ln(1-\pi)
$$
where $x = \sum_i X_i$ is the total number of successes across all $n$ observations, and $20n$ is the total number of trials.  
Taking $\frac {d}{d\pi}$ and setting it equal to 0 gives
$$
\frac x\pi - \frac{20n-x}{1-\pi} = 0
$$
$$
x(1-\pi) = (20n-x)\pi
$$
$$
\hat{\pi}_{ML} = \frac {x}{20n} = \frac{\bar{X}}{20}
$$

The maximum likelihood estimate is the total number of observed successes divided by the total number of trials, $20n$.
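
A quick simulation can confirm this estimator: total successes divided by total trials, where the total number of trials is 20 times the number of observations. This is a rough sketch with assumed values (true $\pi = 0.3$, 50 observations, a grid of candidate probabilities spaced 0.001 apart), none of which come from the lesson.

```python
import numpy as np
from scipy import stats

# Simulated sample (illustrative assumption): 50 observations of Binomial(20, 0.3).
rng = np.random.default_rng(3)
trials, true_pi, n = 20, 0.3, 50
x = rng.binomial(trials, true_pi, size=n)

# Closed-form estimate: total successes over total trials (20 * n).
pi_ml = x.sum() / (trials * n)

# Grid search over candidate probabilities, spaced 0.001 apart.
grid = np.linspace(0.001, 0.999, 999)
loglik = np.array([stats.binom.logpmf(x, trials, p).sum() for p in grid])
grid_best = grid[np.argmax(loglik)]

print(pi_ml, grid_best)
```

The grid maximizer should match the closed form to within the grid spacing.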