# Probability Mass Function

```python
from empiricaldist import Pmf
```

```python
coin = Pmf()
coin['heads'] = 1/2
coin['tails'] = 1/2
coin
```

|  | probs |
|---|---|
| heads | 0.5 |
| tails | 0.5 |
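A `Pmf` is essentially a mapping from outcomes to probabilities. As a rough plain-Python analogue (an illustration only, not how empiricaldist implements `Pmf`), a dictionary can hold the same coin distribution. A valid PMF has non-negative probabilities that add up to 1.

```python
# Plain-Python analogue of the coin Pmf: a dict from outcome to probability.
# Illustration only; empiricaldist's Pmf is a subclass of pandas Series.
coin = {'heads': 1/2, 'tails': 1/2}

# A valid PMF assigns non-negative probabilities that add up to 1.
assert all(p >= 0 for p in coin.values())
assert sum(coin.values()) == 1.0
print(coin)
```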


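We can also make a `Pmf` from a sequence of possible outcomes. `Pmf.from_seq` counts how many times each outcome appears and, by default, normalizes the counts so the probabilities add up to 1.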

```python
die = Pmf.from_seq([1,2,3,4,5,6])
die
```

|  | probs |
|---|---|
| 1 | 0.166667 |
| 2 | 0.166667 |
| 3 | 0.166667 |
| 4 | 0.166667 |
| 5 | 0.166667 |
| 6 | 0.166667 |
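The idea behind `from_seq` can be sketched in plain Python: count how often each outcome appears, then divide each count by the total. The function name `pmf_from_seq` is hypothetical, it uses exact fractions instead of floats, and it ignores the other options the real method accepts.

```python
from collections import Counter
from fractions import Fraction

def pmf_from_seq(seq):
    """Hypothetical helper: map each distinct outcome to its relative frequency."""
    counts = Counter(seq)
    total = sum(counts.values())
    # Sort the outcomes so the result reads like the tables in this section.
    return {x: Fraction(counts[x], total) for x in sorted(counts)}

die = pmf_from_seq([1, 2, 3, 4, 5, 6])
print(die)  # each face has probability 1/6
```

The same helper applied to `'Mississippi'` gives the letter frequencies used next.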


```python
letters = Pmf.from_seq(list('Mississippi'))
letters
```

|  | probs |
|---|---|
| M | 0.090909 |
| i | 0.363636 |
| p | 0.181818 |
| s | 0.363636 |


```python
letters['s']
```

```
0.36363636363636365
```

In the word "Mississippi", about 36% of the letters are 's'.
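Counting the letters makes these figures exact: "Mississippi" has 11 letters, with one 'M', four 'i', two 'p', and four 's', so

$$
P(\text{M}) = \tfrac{1}{11},\quad P(\text{i}) = \tfrac{4}{11},\quad P(\text{p}) = \tfrac{2}{11},\quad P(\text{s}) = \tfrac{4}{11} \approx 0.364
$$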

```python
letters('t')
```

```
0
```

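The letter 't' does not appear in "Mississippi", so its probability is 0. Calling a `Pmf` with parentheses returns 0 for a quantity that is not in the distribution; looking it up with brackets, as in `letters['t']`, would raise a `KeyError`.

We can also call a `Pmf` with a sequence of quantities and get an array of their respective probabilities: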

```python
die([1,4,7])
```

```
array([0.16666667, 0.16666667, 0.        ])
```
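Both lookup behaviors can be mimicked with a dictionary: `dict.get` with a default of 0 plays the role of calling the `Pmf`, while bracket indexing raises `KeyError` for a missing key. This is a plain-Python sketch; the helper name `lookup` is hypothetical, and it returns a list rather than a NumPy array.

```python
from fractions import Fraction

# Plain-dict stand-in for the die Pmf (exact fractions instead of floats).
die = {face: Fraction(1, 6) for face in range(1, 7)}

def lookup(pmf, qs):
    """Hypothetical helper: probability of each quantity in qs, 0 if absent."""
    return [pmf.get(q, 0) for q in qs]

print(lookup(die, [1, 4, 7]))  # 7 is not a face of the die, so its probability is 0

try:
    die[7]  # bracket lookup of a missing quantity
except KeyError:
    print("KeyError: 7 is not in the distribution")
```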

# The Cookie Problem Revisited

In this section I'll use a `Pmf` to solve the cookie problem from the earlier section on the cookie problem.
Here's the statement of the problem again:

> Suppose there are two bowls of cookies.
>
> * Bowl 1 contains 30 vanilla cookies and 10 chocolate cookies. 
>
> * Bowl 2 contains 20 vanilla cookies and 20 chocolate cookies.
>
> Now suppose you choose one of the bowls at random and, without looking, choose a cookie at random. If the cookie is vanilla, what is the probability that it came from Bowl 1?

Here's a `Pmf` that represents the two hypotheses and their prior probabilities:

```python
prior = Pmf.from_seq(['Bowl 1', 'Bowl 2'])
prior
```

|  | probs |
|---|---|
| Bowl 1 | 0.5 |
| Bowl 2 | 0.5 |


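This is the prior distribution: before we see any data, the two bowls are equally likely.

To update the distribution based on the new data (a vanilla cookie), we multiply the priors by the likelihoods.

The likelihood of drawing a vanilla cookie from Bowl 1 is 3/4 (30 of its 40 cookies).

The likelihood for Bowl 2 is 1/2 (20 of its 40 cookies).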
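Both likelihoods come straight from the bowl contents in the problem statement: each is the fraction of that bowl's cookies that are vanilla. A small sketch with exact fractions (the dictionary layout is just one convenient way to write the contents down):

```python
from fractions import Fraction

# Contents of each bowl, from the problem statement.
bowls = {
    'Bowl 1': {'vanilla': 30, 'chocolate': 10},
    'Bowl 2': {'vanilla': 20, 'chocolate': 20},
}

# Likelihood of the data (a vanilla cookie) under each hypothesis:
# the fraction of that bowl's cookies that are vanilla.
likelihood_vanilla = {
    bowl: Fraction(c['vanilla'], sum(c.values())) for bowl, c in bowls.items()
}
print(likelihood_vanilla)  # Bowl 1: 3/4, Bowl 2: 1/2
```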

```python
likelihood_vanilla = [0.75, 0.5]
posterior = prior * likelihood_vanilla
posterior
```

|  | probs |
|---|---|
| Bowl 1 | 0.375 |
| Bowl 2 | 0.25 |
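Multiplying the `Pmf` by a list works element by element, pairing each prior with the likelihood in the same position. The same arithmetic with exact fractions (a plain-Python sketch, not the `Pmf` operator itself) shows the products are 3/8 and 1/4, which add up to 5/8 rather than 1:

```python
from fractions import Fraction

prior = {'Bowl 1': Fraction(1, 2), 'Bowl 2': Fraction(1, 2)}
likelihood_vanilla = {'Bowl 1': Fraction(3, 4), 'Bowl 2': Fraction(1, 2)}

# Element-wise product: prior times likelihood for each hypothesis.
unnorm = {h: prior[h] * likelihood_vanilla[h] for h in prior}
print(unnorm)                # Bowl 1: 3/8, Bowl 2: 1/4
print(sum(unnorm.values()))  # 5/8, so the result is not yet normalized
```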


These products are the unnormalized posteriors; they don't add up to 1. To normalize them (make them add up to 1), we can use the `normalize` method:

```python
posterior.normalize()
```

```
0.625
```

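`normalize` divides each probability by their sum, modifying `posterior` in place, and returns that sum, which is the total probability of the data, $P(D)$.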
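Written out with Bayes's theorem, where $D$ is the data (a vanilla cookie), the total probability of the data and the posterior for Bowl 1 are:

$$
P(D) = \tfrac{1}{2}\cdot\tfrac{3}{4} + \tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{3}{8} + \tfrac{1}{4} = \tfrac{5}{8} = 0.625
$$

$$
P(\text{Bowl 1} \mid D) = \frac{P(\text{Bowl 1})\,P(D \mid \text{Bowl 1})}{P(D)} = \frac{(1/2)(3/4)}{5/8} = \frac{3}{5} = 0.6
$$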

Now `posterior` contains the posterior distribution:

```python
posterior
```

|  | probs |
|---|---|
| Bowl 1 | 0.6 |
| Bowl 2 | 0.4 |


```python
posterior('Bowl 1')
```

```
0.6
```

We can do successive updates with more data.

For example, suppose you put the first cookie back (so the contents of the bowls don’t change) and draw again from the same bowl. If the second cookie is also vanilla, we can do a second update like this:

```python
posterior *= likelihood_vanilla
posterior.normalize()
posterior
```

|  | probs |
|---|---|
| Bowl 1 | 0.692308 |
| Bowl 2 | 0.307692 |
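Each update multiplies by the likelihoods and renormalizes, so the procedure can be wrapped in a small function and applied once per cookie. The `update` helper below is hypothetical (a plain-Python sketch with exact fractions); two vanilla cookies give 9/13 for Bowl 1, which matches the 0.692308 above.

```python
from fractions import Fraction

def update(pmf, likelihood):
    """Hypothetical helper: multiply by the likelihoods, then normalize."""
    unnorm = {h: pmf[h] * likelihood[h] for h in pmf}
    total = sum(unnorm.values())  # total probability of the data, P(D)
    return {h: p / total for h, p in unnorm.items()}

prior = {'Bowl 1': Fraction(1, 2), 'Bowl 2': Fraction(1, 2)}
likelihood_vanilla = {'Bowl 1': Fraction(3, 4), 'Bowl 2': Fraction(1, 2)}

posterior = update(prior, likelihood_vanilla)      # first vanilla cookie
posterior = update(posterior, likelihood_vanilla)  # second vanilla cookie
print(posterior)  # Bowl 1: 9/13, Bowl 2: 4/13
```

Returning a new dictionary keeps the sketch free of side effects, whereas `Pmf.normalize` modifies the distribution in place.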


The posterior probability of Bowl 1 has increased to almost 70%.

Suppose we do the same thing again and get a chocolate cookie.

Here are the likelihoods for the new data:

```python
likelihood_chocolate = [0.25, 0.5]

# And here's the update
posterior *= likelihood_chocolate
posterior.normalize()
posterior
```

|  | probs |
|---|---|
| Bowl 1 | 0.529412 |
| Bowl 2 | 0.470588 |
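Because each update only multiplies by likelihoods and renormalizes, the order of the cookies doesn't matter, and a single update with the product of all three likelihoods gives the same answer, 9/17 for Bowl 1. A self-contained check with exact fractions (the names are illustrative):

```python
from fractions import Fraction
from math import prod

prior = {'Bowl 1': Fraction(1, 2), 'Bowl 2': Fraction(1, 2)}
vanilla = {'Bowl 1': Fraction(3, 4), 'Bowl 2': Fraction(1, 2)}
chocolate = {'Bowl 1': Fraction(1, 4), 'Bowl 2': Fraction(1, 2)}
data = [vanilla, vanilla, chocolate]

# Joint likelihood of all three draws under each hypothesis
# (the draws are independent given the bowl because each cookie is put back).
joint = {h: prod(lik[h] for lik in data) for h in prior}

unnorm = {h: prior[h] * joint[h] for h in prior}
total = sum(unnorm.values())
posterior = {h: p / total for h, p in unnorm.items()}
print(posterior)  # Bowl 1: 9/17 (about 0.529), Bowl 2: 8/17
```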


# 101 Bowls

Next let's solve a cookie problem with 101 bowls:

* Bowl 0 contains 0% vanilla cookies,

* Bowl 1 contains 1% vanilla cookies,

* Bowl 2 contains 2% vanilla cookies,

and so on, up to

* Bowl 99 contains 99% vanilla cookies, and

* Bowl 100 contains all vanilla cookies.

As in the previous version, there are only two kinds of cookies, vanilla and chocolate.  So Bowl 0 is all chocolate cookies, Bowl 1 is 99% chocolate, and so on.
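Under this setup, the likelihood of drawing a vanilla cookie from Bowl $x$ is $x/100$, and the likelihood of chocolate is $1 - x/100$. A quick plain-Python sketch of those likelihoods for all 101 hypotheses (the names are illustrative):

```python
from fractions import Fraction

hypos = range(101)  # Bowl 0 through Bowl 100

# Bowl x contains x percent vanilla cookies.
likelihood_vanilla = [Fraction(x, 100) for x in hypos]
likelihood_chocolate = [1 - p for p in likelihood_vanilla]

print(likelihood_vanilla[0], likelihood_vanilla[1], likelihood_vanilla[100])  # 0 1/100 1
print(likelihood_chocolate[1])  # 99/100
```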

Suppose we choose a bowl at random, choose a cookie at random, and it turns out to be vanilla.  What is the probability that the cookie came from Bowl $x$, for each value of $x$?

To solve this problem, I'll use `np.arange` to make an array that represents 101 hypotheses, numbered from 0 to 100.

```python
import numpy as np

hypos = np.arange(101)
```

Now let's make the prior distribution:

```python
prior = Pmf(1, hypos)
prior.normalize()
```

```
101
```

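The first argument to `Pmf` is the prior probability, here the same value, 1, for every hypothesis.

The second argument is the sequence of quantities. `normalize` returns 101, the total before normalizing, and divides each probability by it, so every bowl ends up with probability 1/101.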
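The same uniform prior can be built by hand: give every hypothesis the value 1, then divide by the total. This plain-Python sketch mirrors what `normalize` does; the total is 101 and each probability becomes 1/101, about 0.009901, as in the table below.

```python
from fractions import Fraction

hypos = range(101)

# Unnormalized prior: the same value, 1, for every hypothesis.
prior = {x: 1 for x in hypos}

# Normalize: divide by the total, keeping the total (like Pmf.normalize returns it).
total = sum(prior.values())
prior = {x: Fraction(p, total) for x, p in prior.items()}

print(total)            # 101
print(float(prior[0]))  # about 0.009901
```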

```python
prior.head()
```

|  | probs |
|---|---|
| 0 | 0.009901 |
| 1 | 0.009901 |
| 2 | 0.009901 |
