In [2]:
from datascience import *
import numpy as np
from math import *
from scipy import stats
import matplotlib.pyplot as plt
%matplotlib inline

## Lesson 29: Maximum Likelihood Estimation

Last lesson, we studied method of moments estimators. These estimators are obtained by setting the moments of a distribution equal to the sample moments obtained from an independent random sample, and then solving for the parameters of interest. As we saw, method of moments estimators are relatively easy to find, but don't always make sense (as in the case of $X\sim \textsf{Unif}(0,b)$).

Another way to estimate a parameter is by maximizing the likelihood function, so we first introduce it. The likelihood function, $L(\theta \mid \textbf{x})$, is a function of $\theta$ that is larger for values of $\theta$ under which the observed data $\textbf{x}$ are more probable. The value of $\theta$ that maximizes this function is the maximum likelihood estimator, $\hat{\theta}_{ML}$.

Let $X_1,X_2,...,X_n$ be a sequence of iid random variables with mass or density function $f(x;\theta)$. The likelihood function is given by:

$$
L(\theta\mid \textbf{x}) = \prod_{i=1}^n f(x_i;\theta)
$$

Often, it is easier to deal with the log of the likelihood function. This is because the log of a product is the sum of individual logs, which is often analytically "nicer". The log-likelihood function is denoted as $l(\theta \mid \textbf{x})$ and is given by:

$$
l(\theta\mid\textbf{x})=\log \prod_{i=1}^n f(x_i;\theta) = \sum_{i=1}^n \log f(x_i;\theta)
$$
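The two definitions above can be sketched numerically. This is a minimal illustration, not part of the original lesson: the exponential density, the rate of 2, the sample sizes, and the helper names `likelihood` and `log_likelihood` are all assumptions chosen for the example. It also hints at a numerical reason (beyond the analytic one) to prefer logs: a product of many small densities can underflow to zero in floating point, while the sum of logs stays finite.

```python
import numpy as np
from scipy import stats

def likelihood(x, rate):
    """Product of exponential densities over the sample (hypothetical helper)."""
    # scipy parameterizes the exponential by scale = 1 / rate
    return np.prod(stats.expon.pdf(x, scale=1 / rate))

def log_likelihood(x, rate):
    """Sum of exponential log-densities over the sample (hypothetical helper)."""
    return np.sum(stats.expon.logpdf(x, scale=1 / rate))

rng = np.random.default_rng(0)
x_small = rng.exponential(scale=1 / 2.0, size=50)    # simulated data, rate 2
x_large = rng.exponential(scale=1 / 2.0, size=5000)  # same model, larger sample

L_small = likelihood(x_small, 2.0)
l_small = log_likelihood(x_small, 2.0)
L_large = likelihood(x_large, 2.0)
l_large = log_likelihood(x_large, 2.0)

print("small sample:", L_small, l_small)  # log(L_small) matches l_small
print("large sample:", L_large, l_large)  # the product underflows, the sum does not
```

For the small sample, taking the log of the product recovers the sum of logs. For the large sample, only the log-likelihood remains usable.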
 

### Example 1: Exponential Distribution

Suppose $X_1,X_2,...,X_n$ is an iid sequence of random variables from the exponential distribution with unknown parameter $\lambda$. I would like to obtain $\hat{\lambda}_{ML}$, the maximum likelihood estimate of $\lambda$. 

Recall that if $X\sim \textsf{Exp}(\lambda)$, then $f(x)=\lambda e^{-\lambda x}$. So,

$$
L(\lambda\mid \textbf{x}) = \prod_{i=1}^n f(x_i;\lambda) = \prod_{i=1}^n \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda \sum x_i}
$$

Maximizing this through differentiation looks difficult. Let's consider the log-likelihood instead: 

$$
l(\lambda\mid \textbf{x}) = n \log \lambda - \lambda \sum x_i
$$

This looks easier. Take the derivative with respect to $\lambda$ and set to 0. Then solve for $\lambda$. I leave this next step to you. How does your answer compare to $\hat{\lambda}_{MoM}$? 

###### Check notes for full computation

Taking the derivative of the log-likelihood with respect to $\lambda$ gives ${n \over \lambda} - \sum x_i$. Setting this to 0 and solving for $\lambda$ yields $$\hat{\lambda}_{ML} = {n \over \sum x_i} = {1 \over \bar{X}}$$
This is the same as $\hat{\lambda}_{MoM}$.
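As a sanity check on the closed form, the log-likelihood can also be maximized numerically. The sketch below is illustrative only: the true rate of 0.5, the sample size, the search bounds, and the use of `scipy.optimize.minimize_scalar` are assumptions made for the example, not part of the lesson.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
true_rate = 0.5                                    # assumed for the simulation
x = rng.exponential(scale=1 / true_rate, size=200)

def neg_log_lik(lam):
    """Negative of n*log(lambda) - lambda*sum(x), so minimizing maximizes l."""
    return -(len(x) * np.log(lam) - lam * np.sum(x))

# Search over a bounded interval of positive rates (bounds are an assumption).
result = minimize_scalar(neg_log_lik, bounds=(1e-6, 10), method="bounded")

lam_numeric = result.x
lam_closed = 1 / np.mean(x)
print(lam_numeric, lam_closed)  # the two estimates should agree closely
```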

### Example 2: Uniform Distribution

Suppose $X_1,X_2,...,X_n$ is an iid sequence of random variables from the continuous uniform distribution on $0 \leq X \leq b$ with unknown parameter $b$. I would like to obtain $\hat{b}_{ML}$, the maximum likelihood estimate of $b$. 

This one is trickier since the domain of $X$ depends on the parameter we are trying to estimate. So I will start you off with a hint. The pdf of $X$ is $f(x)=\frac{1}{b}$ where $0\leq x \leq b$ and 0 otherwise. Another way to write this is with indicator functions:

$$
f(x)={1\over b}I(x\leq b)
$$

where $I(x\leq b)$ is equal to 1 if $x \leq b$ and 0 otherwise. 

###### Check notes for full computation

Plugging the pdf of a continuous uniform distribution into the likelihood function results in:
$$
L(b\mid \textbf{x}) = \prod_{i=1}^n f(x_i;b) = \prod_{i=1}^n {1\over b}I(x_i\leq b) = {1\over b^n}\prod_{i=1}^n I(x_i\leq b) = {1\over b^n}I(\max_i x_i\leq b)
$$

We don't take the log of this function because there is no need to take the derivative. The indicator function basically indicates True or False. Suppose we graphed the likelihood against $b$ when the largest observation is 6.3, or any other number. (Check notes for graph.) The indicator is 0 to the left of 6.3, because $b$ would then be smaller than an observed value, and 1 to the right of 6.3. Where the indicator is 1, the likelihood equals ${1\over b^n}$, which decreases as $b$ grows, so it is largest at the smallest allowable $b$. Therefore, $\hat{b}_{ML} = \max x_i$. In other words, our best guess of the upper bound $b$ is the largest value in the sample.
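The shape of this likelihood can be checked by evaluating it over a grid of candidate values of $b$. This is a rough sketch under stated assumptions: the simulated sample, the true bound of 6.3, the grid, and the helper name `uniform_likelihood` are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 6.3, size=30)   # simulated data with an assumed true b = 6.3

def uniform_likelihood(b, x):
    """1 / b^n when every observation is <= b, and 0 otherwise."""
    if np.max(x) <= b:
        return b ** (-len(x))
    return 0.0

# Candidate values of b, plus the sample maximum itself so the grid contains it.
b_grid = np.append(np.linspace(0.1, 10, 2000), np.max(x))
lik = np.array([uniform_likelihood(b, x) for b in b_grid])

b_hat = b_grid[np.argmax(lik)]
print(b_hat, np.max(x))  # the grid maximizer is the largest observation
```

Every candidate below the sample maximum gets likelihood 0, and every candidate above it gets a smaller value of $1/b^n$, so the grid search lands on the sample maximum.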

### Example 3: Binomial Distribution

Suppose $X_1,X_2,...,X_n$ is an iid sequence of random variables with the binomial distribution with 20 trials and unknown probability of success $\pi$. Find the maximum likelihood estimate of $\pi$. 

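Recall that if $X\sim \textsf{Binom}(m,\pi)$, then $f(x)={m\choose x} \pi^x (1-\pi)^{m - x}$, where $m$ denotes the number of trials so that it is not confused with the sample size $n$. Here $m=20$. So,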

$$
L(\pi\mid \textbf{x}) = \prod_{i=1}^n f(x_i;20,\pi) = \prod_{i=1}^n {20\choose x_i} \pi^{x_i} (1-\pi)^{20 - x_i} = \left[\prod_{i=1}^n {20\choose x_i}\right] \pi^{\sum x_i} (1-\pi)^{20n - \sum x_i}
$$

Taking the log of $L(\pi\mid \textbf{x})$ makes the derivative easier to compute.

$$
l(\pi\mid \textbf{x}) = \sum_{i=1}^n \log{20\choose x_i} + \left(\sum x_i\right)\log\pi + \left(20n - \sum x_i\right)\log(1-\pi)
$$

Taking the derivative with respect to $\pi$ (the first term does not depend on $\pi$, so it drops out), setting it equal to 0, and solving for $\pi$ yields

$$
\hat{\pi}_{ML} = {\sum x_i\over 20n}
$$
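This closed form can be compared against a numerical maximization of the binomial log-likelihood. The sketch below is only illustrative: the true probability of 0.3, the sample size of 100, and the search bounds are assumptions, and `scipy.optimize.minimize_scalar` stands in for the calculus done by hand.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
trials, true_pi = 20, 0.3                  # 20 trials as in the example; 0.3 is assumed
x = rng.binomial(trials, true_pi, size=100)

def neg_log_lik(p):
    """Negative binomial log-likelihood, summed over the sample."""
    return -np.sum(stats.binom.logpmf(x, trials, p))

# Keep the search strictly inside (0, 1) so the logs stay finite.
result = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")

pi_numeric = result.x
pi_closed = np.sum(x) / (trials * len(x))  # sum(x_i) / (20 n)
print(pi_numeric, pi_closed)               # the two estimates should agree closely
```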