## Notebook 10.2


### Required software

In [1]:
# conda install scipy
# conda install numpy
# pip install toyplot

In [2]:
import scipy.optimize as so
import scipy.stats as sc
import numpy as np
import toyplot

### Optimization

Maximum likelihood optimization is a statistical method for finding the best-fitting set of parameters for a model. To use it, we first write a function, called a `likelihood function`, that describes how probable the observed data are under a given set of parameter values.

A likelihood function can be made up of many independent likelihood functions, where each describes the result of a trial relative to a statistical distribution. Let's consider a simple example: flipping a coin and estimating whether the coin is fair (p=0.5).

### Likelihood equation
The likelihood is a statement about probability. There are two outcomes, heads or tails. If the probability of heads is 0.5, then the probability that five flips all come up heads is `(prob. of flipping heads) * (prob. flipping heads) * (prob. flipping heads) * (prob. flipping heads) * (prob. flipping heads)`. Or, stated more concisely, `((prob. flipping heads)**nheads)`. This is because each coin flip is independent, and probability theory tells us that the joint probability of independent events is the product of their individual probabilities. Tails contribute in the same way: each tail has probability `1 - (prob. flipping heads)`, so a sequence with `nheads` heads and `ntails` tails has likelihood `p**nheads * (1-p)**ntails`.
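
As a small illustration of the product rule described above, the sketch below multiplies the per-flip probabilities one at a time and compares the result with the power form. The variable names (`p_heads`, `flip_probs`) are made up for this example.

```python
import numpy as np

p_heads = 0.5   # assumed probability of heads (a fair coin)
nheads = 5      # number of heads observed in a row

# multiply the probability of each independent flip, one by one
flip_probs = [p_heads] * nheads
product = np.prod(flip_probs)

# the same quantity written as a power
power = p_heads ** nheads

print(product, power)
```

Both expressions give the same number, which is why the likelihood can be written with exponents instead of a long product.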

To search all of the possible values of `p` we use a method called `optimization`, or maximum likelihood optimization. This is something `scipy` is very good at. By convention these methods search for a value that *minimizes* a function, so after calculating the likelihood for a set of input values we return its negative. That way, when the optimization function finds the lowest value, it has found the highest likelihood. Let's try it.

In [3]:
# a likelihood equation
def coin_flip(p, nheads, ntails):
    ## calculate likelihood
    likelihood = p**(nheads) * (1-p)**(ntails)
    
    ## return negative likelihood
    return -likelihood

In [5]:
# negative likelihood of 10 heads and 10 tails if the coin is p=0.5
coin_flip(0.5, 10, 10)

-9.5367431640625e-07

In [6]:
# negative likelihood of 10 heads and 10 tails if the coin is p=0.1
coin_flip(0.1, 10, 10)

-3.4867844010000026e-11

#### The optimization
Here we use the `scipy.optimize` library (imported as `so` here) and call its `fmin` function to find the best value of the parameter for our equation given a set of data. We need to pass it a starting value for its search procedure (`x0=[value]`), and then the observed data (the counts of heads and tails) through `args`. Let's assume in this case we flipped 50 heads and 200 tails. The returned value of the function is `0.2`, meaning the probability of heads in this scenario is estimated to be 20%; that is the most likely weighted coin that could produce the observed data. The coin is not fair!

In [9]:
# starting value=0.5; observed flips = (50 heads, 200 tails)
so.fmin(coin_flip, x0=[0.5], args=(50, 200))


Optimization terminated successfully.
         Current function value: -0.000000
         Iterations: 14
         Function evaluations: 28


array([ 0.2])
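
One way to sanity-check the optimizer's answer is to compare it with the observed proportion of heads, which is the known closed-form maximum likelihood estimate for a coin flip. The sketch below repeats the `fmin` call with `disp=False` to silence the convergence report; the names `p_closed_form` and `p_fmin` are chosen only for this example.

```python
import scipy.optimize as so

def coin_flip(p, nheads, ntails):
    # negative likelihood, the same formula used earlier in this notebook
    return -(p**nheads * (1 - p)**ntails)

nheads, ntails = 50, 200

# closed-form estimate: the observed proportion of heads
p_closed_form = nheads / (nheads + ntails)

# numerical estimate from the simplex search, starting at p=0.5
p_fmin = so.fmin(coin_flip, x0=[0.5], args=(nheads, ntails), disp=False)[0]

print(p_closed_form, p_fmin)
```

The two values should agree to within the default tolerance of `fmin`.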

### Same but using -log-likelihood
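Because likelihood scores are very small numbers (each possible outcome has a very low probability), it is often much easier to work with the log of these numbers, which will be a negative real number. Taking the log turns the product into a sum, and because the log is a monotonic function, the value of `p` that maximizes the log-likelihood also maximizes the likelihood.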
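
To see why the log helps, here is a minimal sketch using a hypothetical, much larger experiment (5000 heads and 20000 tails; these counts are invented for illustration). The raw likelihood is too small to represent as a floating point number, while the log-likelihood stays an ordinary negative number.

```python
import numpy as np

p = 0.2
nheads, ntails = 5000, 20000   # hypothetical large experiment

# the raw likelihood underflows to exactly 0.0 in double precision
likelihood = np.power(p, nheads) * np.power(1 - p, ntails)

# the log-likelihood is a sum of logs and remains finite
log_likelihood = nheads * np.log(p) + ntails * np.log(1 - p)

print(likelihood, log_likelihood)
```

Once the likelihood has underflowed to zero, an optimizer can no longer tell nearby values of `p` apart, which is one practical reason to minimize the negative log-likelihood instead.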

In [10]:
def coin_flip_log(p, nheads, ntails):
    ## calculate log-likelihood
    logp = nheads*np.log(p) + ntails*np.log(1.-p)
    
    ## return negative log-likelihood
    return -1*logp
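
For this simple model the minimum can also be worked out by hand. The following is a sketch of that derivation, writing $h$ for `nheads`, $t$ for `ntails`, and $\ell(p)$ for the log-likelihood (shorthand introduced here, not used elsewhere in the notebook):

$$
\ell(p) = h \log p + t \log(1 - p), \qquad
\frac{d\ell}{dp} = \frac{h}{p} - \frac{t}{1 - p} = 0
\;\Longrightarrow\; \hat{p} = \frac{h}{h + t}
$$

With $h = 50$ and $t = 200$ this gives $\hat{p} = 50/250 = 0.2$. Minimizing $-\ell(p)$, which is what `coin_flip_log` returns, finds the same point.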

### Plot the likelihood surface for p=0.2

In [14]:
## evaluate the -log-likelihood at 100 equally spaced values of p
data = [coin_flip_log(p, 50, 200) for p in np.linspace(0.01, 0.99, 100)]
    
## plot the likelihood surface
toyplot.plot(
    b=np.log(data), 
    a=np.linspace(0.01, 0.99, 100),
    width=500, height=300,
    ylabel="log(-log-likelihood)", 
    xlabel="probability of heads");

### Plot the likelihood surface for p=0.5

In [13]:
## evaluate the -log-likelihood at 100 equally spaced values of p
data = [coin_flip_log(p, 100, 100) for p in np.linspace(0.01, 0.99, 100)]
    
## plot the likelihood surface
toyplot.plot(
    b=np.log(data),
    a=np.linspace(0.01, 0.99, 100),
    width=500, height=300,
    ylabel="log(-log-likelihood)", 
    xlabel="probability of heads");