# Probability to Likelihood

### Introduction

In the last class, we learned about probability and likelihood.  Let's do a couple of review problems on these topics.

### Starting with Probability

Let's say that we have a flu vaccine, and we know that it has a .9 (90 percent) effectiveness rate.  We want to calculate the probability of different numbers of successes.  Assume that all observations in the sample are independent and identically distributed.

> Let's represent a successful vaccination as 1, and an unsuccessful vaccination as 0.

What is the probability that we vaccinate five individuals and see the exact sequence `[1, 1, 1, 1, 0]`?

In [1]:
.9*.9*.9*.9*.1

# 0.06561000000000002

0.06561000000000002

Now, what is the probability that we vaccinate five individuals and see that exactly four of the five were successful, in any order?

In [3]:
from scipy.special import comb

comb(5, 4)*.9*.9*.9*.9*.1

0.32805

So we can see that if we vaccinate 5 individuals, about 33 percent of the time exactly four of the five will be successful.  Now calculate the probability that 3 of 5 are successful, given $p = .9$.

In [9]:
comb(5, 3)*.9*.9*.9*.1*.1
# 0.0729

0.0729
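The calculations above share one pattern: count the possible orderings with `comb`, then multiply by the probability of a single ordering. A minimal sketch of that pattern follows. The helper name `binomial_prob` is hypothetical (it is not part of the original notebook), and the standard library's `math.comb` is used here in place of `scipy.special.comb`.

```python
from math import comb

def binomial_prob(n, k, p):
    """Probability of exactly k successes in n independent trials."""
    # comb(n, k) counts the orderings; p**k * (1 - p)**(n - k)
    # is the probability of any one of those orderings.
    return comb(n, k) * p**k * (1 - p)**(n - k)

four_of_five = binomial_prob(5, 4, .9)   # approximately 0.32805
three_of_five = binomial_prob(5, 3, .9)  # approximately 0.0729
print(four_of_five, three_of_five)
```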

### Calculating the likelihood

Now let's imagine that we just received information that a new flu vaccine has a success rate of .8.  We were given five samples and saw that three were successful.

Given what we witnessed, what is the likelihood that the probability of success is .8?

In [10]:
comb(5, 3)*.8*.8*.8*.2*.2

0.20480000000000007

So we see that, assuming p = .8, about twenty percent of the time we will see 3 of 5 successes.  Now, given the evidence above, what is the likelihood that p = .7?

In [12]:
comb(5, 3)*.7*.7*.7*.3*.3

0.3086999999999999

Notice that when calculating likelihood, we are keeping the evidence fixed, and calculating the likelihood of different underlying probabilities being correct.
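To make the "fixed evidence, varying p" idea concrete, here is a small sketch that evaluates the same observation (3 successes in 5 trials) under several candidate values of p. The function name `likelihood` and the list of candidates are illustrative choices, not part of the original notebook.

```python
from math import comb

def likelihood(p, successes=3, trials=5):
    """Likelihood of p given a fixed observation of successes out of trials."""
    return comb(trials, successes) * p**successes * (1 - p)**(trials - successes)

# The evidence (3 of 5) never changes; only the candidate p does.
candidate_ps = [.6, .7, .8, .9]
likelihoods = {p: likelihood(p) for p in candidate_ps}

for p, value in likelihoods.items():
    print(f"p = {p}: likelihood = {value:.4f}")
```

Among these candidates, p = .6 gives the largest value, which matches the observed success rate of 3 of 5.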

### Using the Bernoulli PDF

Now, the Bernoulli probability distribution is defined as follows:

$f(x) = p^x*(1 - p )^{1 - x}$

In [14]:
def bernoulli_pdf(p, x):
    return (p**x)*(1 - p)**(1 - x)

Use the definition to calculate the probability of $x = 1$ and of $x = 0$, with $p = .7$.

In [16]:
bernoulli_pdf(.7, 1)
# 0.7

0.7

In [None]:
bernoulli_pdf(.7, 0)
# approximately .3

Make sure you understand how the above function works.

### Calculating a sequence

Now remember that the probability of a sequence of independent events is:

$f(X) = \prod_{i=1}^n p^{x_i}(1 - p)^{1 - x_i}$

In [18]:
events = [1, 1, 1, 1, 0]
p = .7
event_probs = [bernoulli_pdf(p, event) for event in events]
event_probs

[0.7, 0.7, 0.7, 0.7, 0.30000000000000004]

In [19]:
import numpy as np
np.prod(event_probs)

0.07202999999999998

Now find the parameter, p, that maximizes the likelihood, given the sequence of events above.
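One possible way to approach this is a simple grid search: evaluate the likelihood of the sequence at many candidate values of p and keep the best one. The sketch below is only one approach under stated assumptions. The helper `sequence_likelihood` and the grid of candidates from 0.01 to 0.99 are illustrative choices, not part of the original notebook.

```python
import math

def bernoulli_pdf(p, x):
    return (p**x) * (1 - p)**(1 - x)

def sequence_likelihood(p, events):
    """Product of the Bernoulli probabilities of each event, given p."""
    return math.prod(bernoulli_pdf(p, event) for event in events)

events = [1, 1, 1, 1, 0]

# Candidate values 0.01, 0.02, ..., 0.99 (the grid spacing is an arbitrary choice)
candidates = [i / 100 for i in range(1, 100)]
best_p = max(candidates, key=lambda p: sequence_likelihood(p, events))
print(best_p)
```

On this grid the search lands on 0.8, which is the observed proportion of successes, 4 of 5.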

### Relating to Logistic Regression

Now, we can also think of our logistic regression function as being given the task of maximizing the likelihood.  Except that it is not tasked with finding a single value p, but a set of parameters $\theta$.

For example, imagine we saw the outcomes `[1, 1, 1, 1, 0]` for a vaccine.  But we were also told each participant's age and weight (which, let's assume, have an impact on success).  Our logistic regression algorithm will try to find the parameters $\theta_1$ and $\theta_2$ that return the scores $\sigma(\theta*x)$ for the set of observations that maximize the likelihood of our evidence above, $[1, 1, 1, 1, 0]$.

### Maximizing Log Likelihood 

Because the logarithmic function is monotonically increasing, the parameters that maximize the likelihood also maximize the log of the likelihood:

$\underset{\theta}{\text{arg max}} \prod_{i=1}^n p^{x_i}(1 - p)^{1 - x_i} = \underset{\theta}{\text{arg max}} \log\left( \prod_{i=1}^n p^{x_i}(1 - p)^{1 - x_i} \right)$

Now applying the log to the function above, the log of a product becomes a sum of logs, and we find that:

$\underset{\theta}{\text{arg max}} \log\left( \prod_{i=1}^n p^{x_i}(1 - p)^{1 - x_i} \right) =$

$\underset{\theta}{\text{arg max}} \sum_{i=1}^n \left[ x_i\log(p) + (1 - x_i)\log(1-p) \right]$
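As a quick numerical check of the identity above, the sketch below computes the log of the product and the sum of the logs for the same sequence and the same p. The values of `events` and `p` are reused from the earlier cells. The variable names are illustrative.

```python
import math

events = [1, 1, 1, 1, 0]
p = .7

# Likelihood: the product of the Bernoulli probabilities
likelihood = math.prod(p**x * (1 - p)**(1 - x) for x in events)

# Log likelihood: the sum of x*log(p) + (1 - x)*log(1 - p)
log_likelihood = sum(x * math.log(p) + (1 - x) * math.log(1 - p) for x in events)

# Up to floating point rounding, these two numbers agree
print(math.log(likelihood), log_likelihood)
```

Working with the sum is also more numerically stable for long sequences, because a product of many probabilities quickly becomes a very small number.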

### Turning into a machine learning problem

Finally, we turn this into a machine learning problem:

$\underset{\theta}{\text{arg max}} \sum_{i=1}^n \left[ x_i\log(p) + (1 - x_i)\log(1-p) \right]$

First, we change our x values into y, as y is our target.

$\underset{\theta}{\text{arg max}} \sum_{i=1}^n \left[ y_i\log(p) + (1 - y_i)\log(1-p) \right]$

Second, instead of finding the maximum of the log likelihood, that is, finding the top of our curve, we can flip the sign of the function and find the minimum of the negative log likelihood.

$\underset{\theta}{\text{arg min}} \; J_\theta(X) = \underset{\theta}{\text{arg min}} - \sum_{i=1}^n \left[ y_i\log(p) + (1 - y_i)\log(1-p) \right]$
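The cost function above can be sketched in code as follows. This is a minimal illustration under several assumptions, not a full logistic regression. The feature values in `X` are made up for the example, there is no intercept term, and the names `sigmoid` and `negative_log_likelihood` are hypothetical helpers. Here p for each observation is the score $\sigma(\theta*x)$.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def negative_log_likelihood(theta, X, y):
    """J = -sum(y*log(p) + (1 - y)*log(1 - p)), with p = sigmoid(theta . x)."""
    total = 0.0
    for features, target in zip(X, y):
        p = sigmoid(sum(t * f for t, f in zip(theta, features)))
        total += target * math.log(p) + (1 - target) * math.log(1 - p)
    return -total

# Hypothetical, made-up feature values (two scaled features per participant)
X = [[0.2, 0.5], [0.4, 0.1], [0.3, 0.3], [0.1, 0.6], [0.9, 0.8]]
y = [1, 1, 1, 1, 0]

cost_at_zero = negative_log_likelihood([0.0, 0.0], X, y)  # every p is 0.5 here
cost_other = negative_log_likelihood([-1.0, -1.0], X, y)
print(cost_at_zero, cost_other)
```

A training procedure would search over $\theta$ for the value that makes this cost as small as possible, for example with gradient descent.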