## Distributions
- We'll use the `empiricaldist` package.
- A probability mass function (PMF) maps each discrete outcome to its probability.
- The `empiricaldist.Pmf` class inherits from a pandas `Series`, so anything you can do with a `Series`, you can also do with a `Pmf`.



In [1]:
from empiricaldist import Pmf

In [2]:
coin = Pmf()
coin["heads"] = 1/2
coin["tails"] = 1/2
coin

|       | probs |
|-------|-------|
| heads | 0.5   |
| tails | 0.5   |


In [3]:
die = Pmf.from_seq(range(1, 7))
die

|   | probs    |
|---|----------|
| 1 | 0.166667 |
| 2 | 0.166667 |
| 3 | 0.166667 |
| 4 | 0.166667 |
| 5 | 0.166667 |
| 6 | 0.166667 |


In [4]:
letters = Pmf.from_seq(list("Mississippi"))
letters

|   | probs    |
|---|----------|
| M | 0.090909 |
| i | 0.363636 |
| p | 0.181818 |
| s | 0.363636 |
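
To see where these numbers come from, here is a minimal plain-Python sketch of the counting that `Pmf.from_seq` performs conceptually. The helper `pmf_from_seq` is a hypothetical name used for illustration, not part of `empiricaldist`, and it returns an ordinary dict of exact fractions rather than a `Pmf`.

```python
from collections import Counter
from fractions import Fraction


def pmf_from_seq(seq):
    """Hypothetical helper: map each distinct outcome to its relative frequency."""
    counts = Counter(seq)
    total = sum(counts.values())
    return {outcome: Fraction(count, total) for outcome, count in counts.items()}


letters_sketch = pmf_from_seq("Mississippi")

# Print each letter with its exact and decimal probability.
for outcome in sorted(letters_sketch):
    print(outcome, letters_sketch[outcome], float(letters_sketch[outcome]))
```

"Mississippi" has 11 letters, so `s` and `i` (four occurrences each) get 4/11, `p` gets 2/11, and `M` gets 1/11.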


In [5]:
# Bracket indexing works as it does on a pandas Series
letters["s"]

0.36363636363636365

In [6]:
# A Pmf can also be called like a function to look up a probability
letters("s")

0.36363636363636365

In [7]:
# With parentheses, you can also provide a sequence of quantities and get a sequence of probabilities.
die([1, 4, 7])

array([0.16666667, 0.16666667, 0.        ])
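
The last entry above is 0 because 7 is not a possible outcome of the die. The plain-dict analogy below is only a sketch of that lookup behavior, not `empiricaldist` code; `die_sketch` and `prob` are hypothetical names.

```python
from fractions import Fraction

# A fair six-sided die as an ordinary dict: face -> probability.
die_sketch = {face: Fraction(1, 6) for face in range(1, 7)}


def prob(pmf, quantities):
    """Look up each quantity, treating outcomes outside the distribution as probability 0."""
    return [pmf.get(q, 0) for q in quantities]


result = prob(die_sketch, [1, 4, 7])
print(result)
```

With a plain dict, `die_sketch[7]` would raise a `KeyError`, which is why the sketch uses `dict.get` with a default of 0.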

## The Cookie Problem Revisited

Suppose there are two bowls of cookies.

Bowl 1 contains 30 vanilla cookies and 10 chocolate cookies.

Bowl 2 contains 20 vanilla cookies and 20 chocolate cookies.

Now suppose you choose one of the bowls at random and, without looking, choose a cookie at random. If the cookie is vanilla, what is the probability that it came from Bowl 1?
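
For reference, the answer can be worked by hand with Bayes's theorem. The notation is an assumption introduced here: $B_1$ and $B_2$ stand for the two bowls and $V$ for drawing a vanilla cookie.

$$
P(B_1 \mid V)
= \frac{P(B_1)\,P(V \mid B_1)}{P(B_1)\,P(V \mid B_1) + P(B_2)\,P(V \mid B_2)}
= \frac{\tfrac{1}{2}\cdot\tfrac{3}{4}}{\tfrac{1}{2}\cdot\tfrac{3}{4} + \tfrac{1}{2}\cdot\tfrac{1}{2}}
= \frac{3/8}{5/8}
= \frac{3}{5}
$$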

In [8]:
# This distribution, which contains the prior probability for each
# hypothesis, is called the prior distribution.
prior = Pmf.from_seq(["Bowl 1", "Bowl 2"])
prior

|        | probs |
|--------|-------|
| Bowl 1 | 0.5   |
| Bowl 2 | 0.5   |


In [9]:
# P(Vanilla|Bowl1) = 0.75, P(Vanilla|Bowl2) = 0.5
likelihood_vanilla = [0.75, 0.5]

# unnormalized posteriors
posterior = prior * likelihood_vanilla
posterior

|        | probs |
|--------|-------|
| Bowl 1 | 0.375 |
| Bowl 2 | 0.25  |


In [10]:
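# normalize() rescales the probabilities so they sum to 1 and returns the
# normalizing constant, which is P(data), the total probability of the data.
# Here P(vanilla from either bowl) = 0.375 + 0.25 = 5/8.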
posterior.normalize()

0.625

In [11]:
# Posterior distribution
posterior

|        | probs |
|--------|-------|
| Bowl 1 | 0.6   |
| Bowl 2 | 0.4   |
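
The three steps above (multiply the prior by the likelihood, sum to get the probability of the data, divide) can be checked with exact fractions. This is a minimal sketch in plain Python, assuming nothing beyond the standard library; the dict-based variables are illustrative stand-ins for the `Pmf` objects, not how `empiricaldist` is implemented.

```python
from fractions import Fraction

hypotheses = ["Bowl 1", "Bowl 2"]

# Each bowl is equally likely before we see any cookie.
prior = {h: Fraction(1, 2) for h in hypotheses}

# P(vanilla | bowl), from the cookie counts: 30 of 40 and 20 of 40.
likelihood_vanilla = {"Bowl 1": Fraction(30, 40), "Bowl 2": Fraction(20, 40)}

# Unnormalized posterior: prior times likelihood, per hypothesis.
unnorm = {h: prior[h] * likelihood_vanilla[h] for h in hypotheses}

# The normalizing constant is the total probability of the data.
prob_data = sum(unnorm.values())

# Dividing by the constant makes the posterior sum to 1.
posterior = {h: unnorm[h] / prob_data for h in hypotheses}

print(prob_data)
print(posterior)
```

The exact values are 5/8 for the probability of the data and 3/5 and 2/5 for the two bowls, matching the 0.625, 0.6, and 0.4 shown above.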


> One benefit of using Pmf objects is that it is easy to do successive updates with more data. For example, suppose you put the first cookie back (so the contents of the bowls don’t change) and draw again from the same bowl. If the second cookie is also vanilla, we can do a second update like this:

In [12]:
posterior *= likelihood_vanilla
posterior.normalize()
posterior

|        | probs    |
|--------|----------|
| Bowl 1 | 0.692308 |
| Bowl 2 | 0.307692 |
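
The repeated update can also be written as a loop, where each posterior becomes the prior for the next draw. The sketch below uses plain dicts and exact fractions; `update` and `belief` are hypothetical names for illustration and are not part of `empiricaldist`.

```python
from fractions import Fraction


def update(pmf, likelihood):
    """Hypothetical helper: one Bayesian update on a dict of hypothesis -> probability."""
    unnorm = {h: p * likelihood[h] for h, p in pmf.items()}
    total = sum(unnorm.values())
    return {h: p / total for h, p in unnorm.items()}


likelihood_vanilla = {"Bowl 1": Fraction(3, 4), "Bowl 2": Fraction(1, 2)}
belief = {"Bowl 1": Fraction(1, 2), "Bowl 2": Fraction(1, 2)}

# Two vanilla cookies, each drawn with replacement from the same bowl.
for _ in range(2):
    belief = update(belief, likelihood_vanilla)

print(belief["Bowl 1"], float(belief["Bowl 1"]))
```

After two vanilla cookies the exact posterior for Bowl 1 is 9/13, which rounds to the 0.692308 shown above.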
