In [2]:
import numpy as np
from scipy.optimize import minimize

Maximum Likelihood Estimation - An Introduction
===

The purpose of this notebook is to introduce or refresh the concept of Maximum Likelihood Estimation (MLE). MLE is a commonly used method of estimating the parameters of a statistical model. It uses probability and numerical optimization to estimate parameters by answering the question "Which parameter values make the observed data most likely?"

This notebook uses the Normal Distribution as a basic application of MLE. The notebook assumes that the reader has an understanding of
+ Probability Theory, specifically joint probability
+ The Normal Distribution
+ Optimization

The notebook briefly covers the intuition behind MLE, and then applies that intuition to an example with the Normal Distribution.

Definition and Intuition of the Likelihood Function
===

Maximum Likelihood Estimation (MLE) is a method of estimating the parameters of a statistical model. Given a set of observations, it looks for the parameter values under which those observations are most likely.

MLE begins with the likelihood function, which is rooted in probability. Recall that the joint probability of two independent events A and B is the product of their individual probabilities: P(AB) = P(A)P(B). P(AB) is the probability of both A and B occurring, so P(A)P(B) can be thought of as its likelihood function. We can generalize this by saying that the likelihood function of a set of independent events is their joint probability, given by the function:

$$ \prod_{i=1}^{n} P(x_{i}) $$ 

for events $ x_{1} $ through $ x_{n} $ where $ P(x_{i}) $ is the probability of event $ x_{i} $ occurring.
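
As a small illustration of this product, here is a minimal sketch using coin flips rather than the normal distribution. The function name `bernoulli_likelihood`, the flips, and the candidate values of `p` are all made up for the example; the point is only that different parameter values give the same observations different likelihoods.

```python
import numpy as np

def bernoulli_likelihood(p, flips):
    """Joint probability of independent coin flips (1 = heads, 0 = tails)
    for a hypothetical heads probability p."""
    flips = np.asarray(flips)
    # P(x_i) is p for a head and (1 - p) for a tail
    probs = np.where(flips == 1, p, 1.0 - p)
    # The likelihood is the product of the individual probabilities
    return np.prod(probs)

flips = [1, 1, 0, 1]  # made-up observations: three heads, one tail
for p in (0.25, 0.5, 0.75):
    print(p, bernoulli_likelihood(p, flips))
```

Of the three candidates, `p = 0.75` gives the largest likelihood, which matches the intuition that a coin showing three heads in four flips is more plausibly biased towards heads.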






Application of MLE to the Normal Distribution
===

Say our model is the normal distribution, and we're fitting the model to a set of observations of a random variable $X$. Recall that the normal distribution is defined with the formula
$$ f\left(X, \mu, \sigma^{2}\right) = \frac{1}{\sqrt{2\sigma^{2}\pi}} e^{-\frac{\left(X-\mu\right)^2}{2\sigma^{2}}}$$

A set of $N$ independent observations of $X$ is a set of $N$ events, so the joint probability (strictly speaking, the joint density) of the observations is the likelihood function:

$$ \prod_{i=1}^{N} P(X_{i}) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\sigma^{2}\pi}} e^{-\frac{\left(X_{i}-\mu\right)^2}{2\sigma^{2}}}$$

This can be interpreted as the probability of this set of events occurring by sampling from a normally distributed population with unknown parameters $\mu$ and $\sigma$. 

The true values of $\mu$ and $\sigma$ are currently unknown and must be estimated from the data. This is where MLE becomes useful.

Let's begin by defining the normal distribution in a Python function:

In [1]:
# Define the normal distribution
# parameters:
#   params - list: contains the std deviation and mean params (in that order) for the normal distribution
#   x - 1d ndarray: contains the observations for the random variable X
def normal_dist(params, x):
    sig, mu = params
    return (1 / np.sqrt(2 * (sig ** 2) * np.pi)) * np.exp((-(x - mu) ** 2) / (2 * (sig ** 2)))

We'll define the likelihood function as the product of the observation probabilities. I've also taken the natural log of the likelihood function so that it's easier to numerically optimize:

In [2]:
# Define the log-likelihood function
# parameters:
#   params - list: contains the std deviation and mean params (in that order) for the normal distribution
#   x_array - 1d ndarray: an array of the sample
def norm_log_likelihood_function(params, x_array):
    # The sum of the log densities equals the log of their product,
    # but does not underflow to zero for large samples
    return np.sum(np.log(normal_dist(params, x_array)))
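
A practical note on taking the log of a product: mathematically $\log \prod_{i} P(X_{i}) = \sum_{i} \log P(X_{i})$, but in floating point the product of many small densities can underflow to zero before the log is taken. The sketch below compares the two forms; the helper `normal_pdf`, the seed, and the sample size of 2000 are assumptions chosen only to make the effect visible.

```python
import numpy as np

def normal_pdf(x, mu, sig):
    # Normal density, vectorized over the array x
    return np.exp(-(x - mu) ** 2 / (2 * sig ** 2)) / np.sqrt(2 * np.pi * sig ** 2)

rng = np.random.default_rng(0)        # arbitrary seed
big_sample = rng.normal(5, 4, 2000)   # hypothetical large sample

densities = normal_pdf(big_sample, 5.0, 4.0)

# Every density here is below 0.1, so the product of 2000 of them
# is far smaller than the smallest representable float
with np.errstate(divide="ignore"):
    log_of_product = np.log(np.prod(densities))

# Summing the logs keeps every intermediate value in a comfortable range
sum_of_logs = np.sum(np.log(densities))

print(log_of_product)
print(sum_of_logs)
```

For a sample of 100 points either form works; the summed form is simply the safer habit as samples grow.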

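Optimizing the Log Likelihood function is equivalent to optimizing the Likelihood function because the logarithm is a monotonically increasing function, so both are maximized by the same parameter values. Let's define a lambda function that we'll optimize. Since `minimize` looks for a minimum, I'll be minimizing the negative of the Log Likelihood: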

In [5]:
nll = lambda *args: -norm_log_likelihood_function(*args)
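
To see the equivalence numerically, here is a rough sketch that evaluates the likelihood, the log-likelihood, and the negative log-likelihood over a grid of candidate means and checks that all three pick the same one. The grid, the seed, the fixed standard deviation of 4, and the helper `likelihood` are assumptions made for the illustration; the sample is kept small so the raw product stays representable.

```python
import numpy as np

rng = np.random.default_rng(1)        # arbitrary seed
small_sample = rng.normal(5, 4, 20)   # small hypothetical sample

def likelihood(mu, sig, x):
    # Product of the normal densities of every observation in x
    dens = np.exp(-(x - mu) ** 2 / (2 * sig ** 2)) / np.sqrt(2 * np.pi * sig ** 2)
    return np.prod(dens)

mu_grid = np.linspace(0, 10, 201)     # candidate means, spaced 0.05 apart
lik = np.array([likelihood(mu, 4.0, small_sample) for mu in mu_grid])
log_lik = np.log(lik)
neg_log_lik = -log_lik

best_by_lik = mu_grid[np.argmax(lik)]          # maximize the likelihood
best_by_log = mu_grid[np.argmax(log_lik)]      # maximize the log-likelihood
best_by_nll = mu_grid[np.argmin(neg_log_lik)]  # minimize the negative log-likelihood

print(best_by_lik, best_by_log, best_by_nll)
print(small_sample.mean())
```

All three searches land on the same grid point, and that point is the one closest to the sample mean.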

Now we'll create a random sample of 100 observations drawn from a normal distribution with a mean of 5 and a standard deviation of 4, so the sample mean and standard deviation should be approximately 5 and 4:

In [8]:
sample = np.random.normal(5, 4, 100)
sample_std = np.std(sample)
sample_mean = np.mean(sample)


print("Sample standard deviation (found via np.std): %s" % str(sample_std))
print("Sample mean (found via np.mean): %s" % str(sample_mean))

Sample standard deviation (found via np.std): 3.60004583384
Sample mean (found via np.mean): 5.04421523434


We'll use `scipy.optimize.minimize` to estimate the parameters, starting from an initial guess of 4 for the standard deviation and 1 for the mean:

In [9]:
result = minimize(nll, [4, 1], args=(sample,))
mle_std = result["x"][0]
mle_mean = result["x"][1]

print("Sample standard deviation (found via MLE): %s" % str(mle_std))
print("Sample mean (found via MLE): %s" % str(mle_mean))

Sample standard deviation (found via MLE): 3.60004606311
Sample mean (found via MLE): 5.04421502909
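
The close agreement is expected. For the normal distribution the maximum likelihood estimates have a closed form: the sample mean, and the square root of the average squared deviation from it, which is what `np.std` computes with its default `ddof=0`. The sketch below checks this on a fresh sample. It is one possible variation rather than the method used above: it assumes `scipy.stats.norm.logpdf` for the log density and bounds the standard deviation away from zero with the `L-BFGS-B` method, and the seed and names are made up.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(42)   # arbitrary seed, for reproducibility
data = rng.normal(5, 4, 100)      # hypothetical sample with the same settings as above

def neg_log_likelihood(params, x):
    sig, mu = params              # same (std deviation, mean) ordering as above
    return -np.sum(norm.logpdf(x, loc=mu, scale=sig))

# Keep the std deviation strictly positive so the optimizer never tries an invalid scale
fit = minimize(neg_log_likelihood, [4, 1], args=(data,),
               method="L-BFGS-B", bounds=[(1e-6, None), (None, None)])
sig_hat, mu_hat = fit.x

# Closed-form maximum likelihood estimates for the normal distribution
mu_closed = data.mean()
sig_closed = np.sqrt(np.mean((data - mu_closed) ** 2))  # same as np.std(data)

print(sig_hat, sig_closed)
print(mu_hat, mu_closed)
```

Numerical optimization is not needed for this particular model, but the same recipe carries over to models where no closed form exists.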


Conclusion
===

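This notebook introduced the likelihood function as the joint probability of a set of independent observations and applied it to the normal distribution. Minimizing the negative log-likelihood with `scipy.optimize.minimize` recovered a standard deviation of about 3.60 and a mean of about 5.04 for the sample, agreeing with the values from `np.std` and `np.mean` to roughly six decimal places.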