In [2]:
import numpy as np
from scipy.optimize import minimize

Definition and Intuition of Maximum Likelihood Estimation
===

Maximum Likelihood Estimation (MLE) is a method of estimating the parameters of a statistical model. It combines probability and numerical optimization to answer the question: "Which parameter values make the observed data most probable?"

MLE begins with the likelihood function, which is rooted in probability. Recall that the joint probability of two independent events A and B is the product of their individual probabilities: P(A and B) = P(A)P(B). This product measures how likely it is that both A and B occur, so it can be thought of as their likelihood function. Generalizing to N independent observations x_1, ..., x_N from a model with parameters θ gives:

$$L(\theta \mid x_1, \dots, x_N) = \prod_{i=1}^{N} P(x_i \mid \theta)$$
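As a minimal illustration of this product form (a hypothetical coin-flip example, not from the original, assuming the flips are independent), the likelihood of a sequence of heads (1) and tails (0) under a candidate heads probability p is p for every head times 1 − p for every tail. Evaluating it over a grid of candidate values shows that the maximizing p is the observed fraction of heads.

```python
import numpy as np

# Hypothetical observations: 1 = heads, 0 = tails (assumed independent)
flips = np.array([1, 0, 1, 1])

def bernoulli_likelihood(p, obs):
    # Joint probability of independent flips = product of the individual probabilities
    return np.prod(np.where(obs == 1, p, 1 - p))

# Evaluate the likelihood on a grid of candidate parameter values
candidates = np.linspace(0.0, 1.0, 101)
likelihoods = np.array([bernoulli_likelihood(p, flips) for p in candidates])

best_p = candidates[np.argmax(likelihoods)]
print(round(best_p, 2))  # 0.75, the observed fraction of heads
```

Here the likelihood is p³(1 − p), whose maximum on [0, 1] is at p = 3/4.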

Say we're fitting a normal distribution to a set of observations of a random variable X. A set of N observations of X is a set of N events; assuming they are independent, the likelihood of observing all of them is:

$$L(\mu, \sigma \mid x_1, \dots, x_N) = \prod_{i=1}^{N} f(x_i \mid \mu, \sigma)$$

This can be interpreted as the probability of obtaining this set of observations by sampling from a normally distributed population with unknown parameters μ and σ (strictly speaking a probability density, since X is continuous). The density of each individual observation is given by the normal probability density function:

$$f(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
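As a quick sanity check of this formula (a sketch, assuming SciPy's `scipy.stats.norm` is available; it takes the mean as `loc` and the standard deviation as `scale`), evaluating it at x = μ gives the peak height 1/√(2πσ²), which for σ = 4 is 1/(4√(2π)) ≈ 0.0997. The hand-written version below, `normal_pdf`, is a hypothetical helper for illustration.

```python
import numpy as np
from scipy.stats import norm

def normal_pdf(x, mu, sigma):
    # Direct transcription of f(x | mu, sigma)
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

x = np.array([-3.0, 1.0, 5.0, 9.0])
manual = normal_pdf(x, mu=5.0, sigma=4.0)
reference = norm.pdf(x, loc=5.0, scale=4.0)

print(np.allclose(manual, reference))  # True
print(round(manual[2], 4))             # 0.0997, the peak height at x = mu
```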

The true values of μ and σ are unknown and must be estimated from the data. This is where MLE comes in: it takes as its estimates the values of μ and σ that maximize the likelihood.
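To make the idea concrete, here is a sketch on a hypothetical five-point data set (not from the original): it evaluates the likelihood for a few candidate (μ, σ) pairs, and MLE simply prefers whichever pair makes the observed data most probable. It uses `scipy.stats.norm.pdf` for the densities and assumes the observations are i.i.d.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical observations
data = np.array([3.1, 4.8, 5.5, 6.2, 4.4])

def likelihood(mu, sigma, x):
    # Product of the individual normal densities (assumes i.i.d. observations)
    return np.prod(norm.pdf(x, loc=mu, scale=sigma))

candidates = [(0.0, 1.0), (5.0, 4.0), (4.8, 1.0)]
for mu, sigma in candidates:
    print(f"mu={mu}, sigma={sigma}: L = {likelihood(mu, sigma, data):.3e}")

# Of these candidates, pick the pair with the highest likelihood
best = max(candidates, key=lambda c: likelihood(c[0], c[1], data))
print("Most likely of these candidates:", best)
```

The winner, (4.8, 1.0), sits close to the data's own mean (4.8) and spread; a grid or an optimizer would refine it further.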


Let's begin by defining the normal distribution function:

In [3]:
# Define the normal probability density function
# params:
#   params - sequence (sigma, mu): the standard deviation and the mean, in that order
#   x - float or NumPy array: observation(s) of the random variable X
def normal_dist(params, x):
    sig, mu = params
    return (1 / np.sqrt(2 * (sig ** 2) * np.pi)) * np.exp((-(x - mu) ** 2) / (2 * (sig ** 2)))

Then we'll define the likelihood function as the product of the observation densities. I've also taken the natural log of the likelihood function, which turns the product into a sum and makes it easier to optimize numerically:

In [4]:
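# Define the log-likelihood function
# params:
#   params - sequence (sigma, mu), passed through to normal_dist
#   x_array - 1xN ndarray: the sample
def norm_log_likelihood_function(params, x_array):
    # Sum of log-densities: equal to log(prod(...)), but does not underflow for large samples
    return np.sum(np.log(normal_dist(params, x_array)))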
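The practical reason for working in log space is numerical: a product of many densities smaller than 1 quickly underflows to 0.0 in double precision, while the sum of their logs stays finite. A sketch with a synthetic standard-normal sample (the sample size of 2,000 and the seed are arbitrary choices for illustration; `scipy.stats.norm.logpdf` computes log-densities directly):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 2000)

# Product of 2,000 densities: underflows to exactly 0.0
product = np.prod(norm.pdf(x))

# The log of that product is -inf; silence the divide-by-zero warning for the demo
with np.errstate(divide="ignore"):
    log_of_product = np.log(product)

# Sum of log-densities: the same quantity, computed stably
sum_of_logs = np.sum(norm.logpdf(x))

print(product, log_of_product, sum_of_logs)
```

With 100 observations, as in this notebook, the product is still representable, but the log-of-product form breaks down as the sample grows.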

Maximizing the log-likelihood is equivalent to maximizing the likelihood because the logarithm is a monotonically increasing function, so both peak at the same parameter values. Since scipy.optimize.minimize minimizes rather than maximizes, we define a lambda function that returns the negative log-likelihood and minimize that:

In [5]:
nll = lambda *args: -norm_log_likelihood_function(*args)
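A quick grid check of that equivalence (a sketch on a hypothetical small sample, with σ held fixed at 1 purely for illustration): the μ that maximizes the likelihood, the μ that maximizes the log-likelihood, and the μ that minimizes the negative log-likelihood all land on the same grid point.

```python
import numpy as np
from scipy.stats import norm

data = np.array([3.1, 4.8, 5.5, 6.2, 4.4])   # hypothetical observations
mu_grid = np.linspace(0.0, 10.0, 1001)       # candidate means, step 0.01

# Likelihood and log-likelihood as functions of mu (sigma fixed at 1)
lik = np.array([np.prod(norm.pdf(data, loc=m, scale=1.0)) for m in mu_grid])
loglik = np.array([np.sum(norm.logpdf(data, loc=m, scale=1.0)) for m in mu_grid])

i_lik = np.argmax(lik)        # maximize the likelihood
i_loglik = np.argmax(loglik)  # maximize the log-likelihood
i_nll = np.argmin(-loglik)    # minimize the negative log-likelihood

# All three pick the same grid point, near the sample mean 4.8
print(mu_grid[i_lik], mu_grid[i_loglik], mu_grid[i_nll])
```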

Now we'll draw a random sample of 100 observations from a normal distribution with mean 5 and standard deviation 4. No random seed is set, so the sample statistics (and the outputs below) will differ slightly from run to run:

In [8]:
sample = np.random.normal(5, 4, 100)
sample_std = np.std(sample)
sample_mean = np.mean(sample)


print("Sample standard deviation (found via np.std): %s" % str(sample_std))
print("Sample mean (found via np.mean): %s" % str(sample_mean))

Sample standard deviation (found via np.std): 3.60004583384
Sample mean (found via np.mean): 5.04421523434
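For the normal distribution the maximization can also be done by hand: setting the derivatives of the log-likelihood with respect to μ and σ to zero gives μ̂ = (1/N) Σ xᵢ and σ̂ = √((1/N) Σ (xᵢ − μ̂)²). That σ̂ divides by N rather than N − 1, which is what np.std computes with its default `ddof=0`; `np.std(..., ddof=1)` gives the square root of the unbiased variance instead. A sketch on a freshly drawn sample (the seed is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(5, 4, 100)
n = len(x)

# Closed-form maximum likelihood estimates for a normal sample
mu_hat = np.sum(x) / n
sigma_hat = np.sqrt(np.sum((x - mu_hat) ** 2) / n)

print(np.isclose(mu_hat, np.mean(x)))            # True
print(np.isclose(sigma_hat, np.std(x)))          # True: np.std defaults to ddof=0
print(np.isclose(sigma_hat, np.std(x, ddof=1)))  # False: ddof=1 divides by N - 1
```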


We'll use scipy.optimize.minimize to estimate the parameters, starting from an initial guess of σ = 4 and μ = 1:

In [9]:
# Initial guess is (sigma, mu) = (4, 1); args must be a tuple, hence the trailing comma
result = minimize(nll, [4, 1], args=(sample,))
mle_std, mle_mean = result.x

print("Sample standard deviation (found via MLE): %s" % str(mle_std))
print("Sample mean (found via MLE): %s" % str(mle_mean))

Sample standard deviation (found via MLE): 3.60004606311
Sample mean (found via MLE): 5.04421502909
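One caveat with the optimization above: because normal_dist only uses `sig ** 2`, nothing stops the optimizer from settling on a negative σ, which has the same likelihood. A variant that rules this out (a sketch, not the original author's code) passes `bounds` to `minimize` with `method="L-BFGS-B"` and uses `scipy.stats.norm.logpdf` for a stable log-likelihood. The seed, starting point, and lower bound of 1e-6 are arbitrary choices; `neg_log_likelihood` is a hypothetical helper, and the (σ, μ) parameter order mirrors the original.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
sample = rng.normal(5, 4, 100)

def neg_log_likelihood(params, x):
    sigma, mu = params  # same (sigma, mu) order as the original normal_dist
    return -np.sum(norm.logpdf(x, loc=mu, scale=sigma))

result = minimize(
    neg_log_likelihood,
    x0=[4.0, 1.0],
    args=(sample,),
    method="L-BFGS-B",
    bounds=[(1e-6, None), (None, None)],  # keep sigma strictly positive; mu unbounded
)
mle_std, mle_mean = result.x

print(result.success)
print(mle_std, np.std(sample))    # should agree closely (np.std uses ddof=0)
print(mle_mean, np.mean(sample))
```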
