# Simple PMF Example

The `Pmf` class in the `thinkbayes2` module lets you build a probability mass function (pmf), which describes a discrete distribution. The following represents the distribution of a fair six-sided die. We use the `Set` method to set the probability of each value to 1/6.

In [6]:
import sys
sys.path.append('../code')

from thinkbayes2 import Pmf

pmf = Pmf()
for x in range(1,7):
    pmf.Set(x, 1/6.0)

1
2
3
4
5
6


Another way to build a pmf is to count how many times each outcome occurs, which gives the empirical pmf. Suppose we observe the following rolls of the same die.

In [10]:
import random

random.seed(123)
rolls = [random.randrange(1, 7, 1) for i in range(25)] 
print(rolls)

[1, 3, 1, 4, 3, 1, 1, 4, 5, 5, 3, 3, 1, 2, 2, 3, 5, 3, 6, 2, 2, 1, 4, 1, 5]


We can convert this to an empirical pmf with the `Incr` method, increasing the count for a value by 1 every time that value is rolled.

In [11]:
pmf = Pmf()
for roll in rolls:
    pmf.Incr(roll, 1)

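At this point each value in the pmf is still a count. To convert the counts to probabilities we use the `Normalize` method, which divides each count by the total and returns that total (25 rolls here).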

In [12]:
pmf.Normalize()

25

To obtain the probability of a certain event we can use the `Prob` method. For example, our observed probability of seeing a 1 rolled is:

In [13]:
print(pmf.Prob(1))

0.28
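
The same count-then-normalize steps can be sketched in plain Python, without `thinkbayes2`. This is only an illustration: `collections.Counter` stands in for the `Incr` calls, the names `counts` and `empirical` are made up for this sketch, and the rolls are copied from the output above.

```python
from collections import Counter

# The 25 rolls printed earlier in this notebook.
rolls = [1, 3, 1, 4, 3, 1, 1, 4, 5, 5, 3, 3, 1, 2, 2, 3, 5, 3, 6, 2, 2, 1, 4, 1, 5]

# Counting step: plays the role of calling Incr once per roll.
counts = Counter(rolls)

# Normalizing step: divide each count by the total number of rolls.
total = sum(counts.values())
empirical = {value: count / total for value, count in counts.items()}

print(total)         # number of rolls
print(empirical[1])  # observed probability of rolling a 1
```

Seven of the 25 rolls are a 1, which is where the 0.28 above comes from.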


# Examples from Chapter 1
We illustrate the use of the `Pmf` class through examples from the previous chapter.

## Cookie Problem
Our goal is to find the posterior pmf over the two hypotheses: that the vanilla cookie came from Bowl 1 or that it came from Bowl 2. We begin by setting the prior probability of each hypothesis.

In [14]:
pmf = Pmf()
pmf.Set('Bowl 1', 0.5)
pmf.Set('Bowl 2', 0.5)

To update the distribution based on the new data (drawing a vanilla cookie), we multiply each prior by its corresponding likelihood using the `Mult` method.

In [15]:
pmf.Mult('Bowl 1', 0.75)
pmf.Mult('Bowl 2', 0.5)

After this update the values no longer sum to 1, so they are not yet probabilities. We renormalize with the `Normalize` method, which returns the normalizing constant (here 0.625, the total probability of drawing a vanilla cookie).

In [16]:
pmf.Normalize()

0.625

The posterior pmf is as follows:

In [17]:
print(pmf.Prob('Bowl 1'))
print(pmf.Prob('Bowl 2'))

0.6000000000000001
0.4
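
As a check on the arithmetic, the update can be reproduced by hand with exact fractions. This is a sketch in plain Python rather than `thinkbayes2` code; the dictionary names are chosen only for this illustration.

```python
from fractions import Fraction

# Priors and likelihoods of a vanilla cookie, as used in the cells above.
prior = {'Bowl 1': Fraction(1, 2), 'Bowl 2': Fraction(1, 2)}
likelihood_vanilla = {'Bowl 1': Fraction(3, 4), 'Bowl 2': Fraction(1, 2)}

# Multiply each prior by its likelihood (the Mult step).
unnormalized = {h: prior[h] * likelihood_vanilla[h] for h in prior}

# Divide by the sum so the values add up to 1 (the Normalize step).
norm_const = sum(unnormalized.values())
posterior = {h: p / norm_const for h, p in unnormalized.items()}

print(norm_const)           # 5/8
print(posterior['Bowl 1'])  # 3/5
print(posterior['Bowl 2'])  # 2/5
```

The exact posterior for Bowl 1 is 3/5; the trailing digits in `0.6000000000000001` above are floating-point rounding.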


### Generalization of Above Code
We can rewrite the above code in a more general form that applies to other problems.

In [21]:
class Cookie(Pmf):
    
    mixes = {
        'Bowl 1':dict(vanilla=0.75,chocolate=0.25),
        'Bowl 2':dict(vanilla=0.5,chocolate=0.5),
    }
    
    def __init__(self, hypos):
        Pmf.__init__(self)
        for hypo in hypos:
            self.Set(hypo, 1)
        self.Normalize()
        
    def Likelihood(self, data, hypo):
        mix = self.mixes[hypo]
        like = mix[data]
        return like
    
    def Update(self, data):
        for hypo in self.Values():
            like = self.Likelihood(data, hypo)
            self.Mult(hypo, like)
        self.Normalize()

The `Cookie` class is a subclass of `Pmf`. Its constructor assigns a prior with equal probability to each hypothesis. In this case there are two hypotheses: that the cookie came from Bowl 1 or that the cookie came from Bowl 2. We define the hypotheses below.

In [22]:
hypos = ['Bowl 1', 'Bowl 2']
pmf = Cookie(hypos)

The `mixes` attribute stores the likelihood of each cookie flavor under each hypothesis. The `Likelihood` method looks up the likelihood of the data under a given hypothesis. The `Update` method multiplies each prior by that likelihood and then normalizes, so that the result is again a distribution. Suppose we draw a vanilla cookie (new data).

In [23]:
pmf.Update('vanilla')

The updated posterior probabilities for the hypotheses are:

In [24]:
for hypo, prob in pmf.Items():
    print(hypo, prob)

Bowl 1 0.6000000000000001
Bowl 2 0.4


Notice that the same code works if we draw several cookies (with replacement) or keep drawing more cookies later. We update the posterior in the same way, once per cookie.

In [25]:
dataset = ['vanilla', 'chocolate', 'vanilla']
for data in dataset:
    pmf.Update(data)
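
To see what repeated updating computes, here is a plain-Python sketch that starts from a fresh uniform prior (unlike the `pmf` above, which has already seen one vanilla cookie) and applies the same three observations. The `update` helper is hypothetical and only mirrors the multiply-then-normalize logic of `Cookie.Update`.

```python
from fractions import Fraction

# Likelihood of each flavor under each hypothesis (draws with replacement).
mixes = {
    'Bowl 1': {'vanilla': Fraction(3, 4), 'chocolate': Fraction(1, 4)},
    'Bowl 2': {'vanilla': Fraction(1, 2), 'chocolate': Fraction(1, 2)},
}

def update(pmf, data):
    """Return a new, normalized posterior after observing one cookie."""
    unnormalized = {h: p * mixes[h][data] for h, p in pmf.items()}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

dataset = ['vanilla', 'chocolate', 'vanilla']

# One update per cookie, starting from a uniform prior.
posterior = {'Bowl 1': Fraction(1, 2), 'Bowl 2': Fraction(1, 2)}
for data in dataset:
    posterior = update(posterior, data)

# Equivalent batch update: multiply all likelihoods first, normalize once.
batch = {h: Fraction(1, 2) for h in mixes}
for h in batch:
    for data in dataset:
        batch[h] *= mixes[h][data]
total = sum(batch.values())
batch = {h: p / total for h, p in batch.items()}

print(posterior['Bowl 1'], posterior['Bowl 2'])  # 9/17 8/17
print(posterior == batch)                        # True
```

Normalizing after every cookie or only once at the end gives the same posterior, because the normalizing constants cancel.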

## Monty Hall Problem
Suppose, without loss of generality, that we chose door A, and that Monty will open either door B or door C, whichever does not have the car behind it (choosing at random if neither does). He opened door B. We start out by using a similar template as above.

In [34]:
class Monty(Pmf):
    
    def __init__(self, hypos):
        Pmf.__init__(self)
        for hypo in hypos:
            self.Set(hypo, 1)
        self.Normalize()
        
    def Likelihood(self, data, hypo):
        if hypo == data:
            return 0
        elif hypo == 'A':
            return 0.5
        else:
            return 1
        
    def Update(self, data):
        for hypo in self.Values():
            like = self.Likelihood(data, hypo)
            self.Mult(hypo, like)
        self.Normalize()

Recall that each hypothesis states which of the three doors hides the car: A, B, or C.

In [35]:
hypos = 'ABC'
pmf = Monty(hypos)

We observed that Monty opened door B; this is our new data.

In [36]:
data = 'B'
pmf.Update(data)

The results are as expected: the car is behind door C with probability 2/3, so switching is the better choice.

In [37]:
for hypo, prob in pmf.Items():
    print(hypo, prob)

A 0.3333333333333333
B 0.0
C 0.6666666666666666
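
One way to sanity-check this posterior is a small Monte Carlo simulation. The sketch below is an assumption-laden illustration, not part of `thinkbayes2`: the function name `simulate_monty`, the seed, and the number of trials are arbitrary choices. It plays many games in which we pick door A, keeps only those where Monty opens door B, and records where the car was.

```python
import random

def simulate_monty(trials, seed=0):
    """Estimate P(car location | we chose A and Monty opened B)."""
    rng = random.Random(seed)
    car_counts = {'A': 0, 'B': 0, 'C': 0}
    for _ in range(trials):
        car = rng.choice('ABC')
        # We always pick door A. Monty opens B or C, never the car's door.
        if car == 'A':
            opened = rng.choice('BC')  # free choice, picked at random
        elif car == 'B':
            opened = 'C'
        else:
            opened = 'B'
        # Condition on the observed data: Monty opened door B.
        if opened == 'B':
            car_counts[car] += 1
    kept = sum(car_counts.values())
    return {door: count / kept for door, count in car_counts.items()}

estimate = simulate_monty(100_000)
for door, prob in estimate.items():
    print(door, round(prob, 3))
```

The estimates for A and C should land close to 1/3 and 2/3, and B is exactly 0 because Monty never opens the door hiding the car.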


## Encapsulating the Framework
The previous two examples share most of their code, so we can encapsulate the common parts in a single class, `Suite`, which is a `Pmf` that provides `__init__`, `Update`, and `Print` methods.

In [38]:
from thinkbayes2 import Suite

Instead of writing everything from scratch, we only have to define the `Likelihood` method. For example, suppose we use `Suite` to define the Monty Hall problem.

In [39]:
class Monty(Suite):
    
    def Likelihood(self, data, hypo):
        if hypo == data:
            return 0
        elif hypo == 'A':
            return 0.5
        else:
            return 1
        
suite = Monty('ABC')
suite.Update('B')
suite.Print()

A 0.3333333333333333
B 0.0
C 0.6666666666666666
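
To make the pattern concrete, here is a minimal dict-backed sketch of the idea behind `Suite`. It is not the `thinkbayes2` implementation: `MiniSuite`, `MiniMonty`, and their lowercase method names are hypothetical and exist only to show how a base class can own the prior and the update loop while a subclass supplies the likelihood.

```python
class MiniSuite:
    """Hypothetical stand-in for the Suite idea (not thinkbayes2's code)."""

    def __init__(self, hypos):
        # Uniform prior over the hypotheses.
        hypos = list(hypos)
        self.probs = {hypo: 1 / len(hypos) for hypo in hypos}

    def likelihood(self, data, hypo):
        raise NotImplementedError("subclasses define the likelihood")

    def update(self, data):
        # Multiply each prior by its likelihood, then renormalize.
        for hypo in self.probs:
            self.probs[hypo] *= self.likelihood(data, hypo)
        total = sum(self.probs.values())
        for hypo in self.probs:
            self.probs[hypo] /= total
        return total


class MiniMonty(MiniSuite):
    def likelihood(self, data, hypo):
        if hypo == data:
            return 0      # Monty never opens the door with the car
        elif hypo == 'A':
            return 0.5    # car behind our door: Monty picks B or C at random
        else:
            return 1      # car behind C: Monty is forced to open B


mini = MiniMonty('ABC')
mini.update('B')
for hypo, prob in mini.probs.items():
    print(hypo, prob)
```

Only `likelihood` changes from problem to problem, which is the point of the encapsulation.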


## M&M Problem
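With the `Suite` framework, we only have to write the `Likelihood` method. Here hypothesis A says that bag 1 holds the 1994 mix and bag 2 the 1996 mix, and hypothesis B says the reverse. The data are a yellow M&M drawn from bag 1 and a green one drawn from bag 2.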

In [40]:
class M_and_M(Suite):
    
    mix94 = dict(brown=30, yellow=20, red=20, green=10, orange=10, tan=10)
    mix96 = dict(blue=24, green=20, orange=16, yellow=14, red=13, brown=13)

    hypotheses = dict(A=dict(bag1=mix94, bag2=mix96), B=dict(bag1=mix96, bag2=mix94))
    
    def Likelihood(self, data, hypo):
        bag, color = data
        mix = self.hypotheses[hypo][bag]
        like = mix[color]
        return like
    
suite = M_and_M('AB')
suite.Update(('bag1', 'yellow'))
suite.Update(('bag2', 'green'))
suite.Print()

A 0.7407407407407407
B 0.2592592592592592
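
The posterior above can be verified by hand. The following sketch uses only the color mixes from the class definition; the variable names are illustrative.

```python
from fractions import Fraction

# Color mixes (in percent) copied from the M_and_M class above.
mix94 = dict(brown=30, yellow=20, red=20, green=10, orange=10, tan=10)
mix96 = dict(blue=24, green=20, orange=16, yellow=14, red=13, brown=13)

# Hypothesis A: bag 1 is the 1994 mix, bag 2 is the 1996 mix.
like_a = mix94['yellow'] * mix96['green']   # 20 * 20 = 400
# Hypothesis B: bag 1 is the 1996 mix, bag 2 is the 1994 mix.
like_b = mix96['yellow'] * mix94['green']   # 14 * 10 = 140

# The priors are equal, so they cancel when we normalize.
posterior_a = Fraction(like_a, like_a + like_b)
posterior_b = Fraction(like_b, like_a + like_b)

print(posterior_a, float(posterior_a))  # 20/27
print(posterior_b, float(posterior_b))  # 7/27
```

The likelihoods are percentages rather than probabilities, but the constant factor cancels during normalization, so the posterior is unaffected.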


# Exercises
In the Cookie Problem above, we said that the solution generalizes to the case where we draw multiple cookies with replacement. But in the more likely scenario where we eat the cookies we draw, the likelihood of each draw depends on the previous draws. Modify the solution in this chapter to handle selection without replacement. Hint: add instance variables to `Cookie` to represent the hypothetical state of the bowls and modify `Likelihood` accordingly. You might want to define a `Bowl` object.

In [44]:
class Bowl:
    
    def __init__(self, vanilla, chocolate):
        self.vanilla = vanilla
        self.chocolate = chocolate
        
    def eat(self, cookie):
        if cookie == 'vanilla' and self.vanilla > 0:
            self.vanilla -= 1
        elif cookie == 'chocolate' and self.chocolate > 0:
            self.chocolate -= 1

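class Cookie(Pmf):
    
    def __init__(self, hypos):
        Pmf.__init__(self)
        # Each Cookie gets its own bowls, so eating cookies while updating
        # one Cookie does not change the bowls seen by another.
        self.mixes = {
            'Bowl 1':Bowl(vanilla=30, chocolate=10),
            'Bowl 2':Bowl(vanilla=20, chocolate=20),
        }
        for hypo in hypos:
            self.Set(hypo, 1)
        self.Normalize()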
        
    def Likelihood(self, data, hypo):
        mix = self.mixes[hypo]
        if data == 'vanilla':
            like = mix.vanilla/(mix.vanilla + mix.chocolate)
            mix.eat('vanilla')
        elif data == 'chocolate':
            like = mix.chocolate/(mix.vanilla + mix.chocolate)
            mix.eat('chocolate')
        return like
    
    def Update(self, data):
        for hypo in self.Values():
            like = self.Likelihood(data, hypo)
            self.Mult(hypo, like)
        self.Normalize()
        
pmf = Cookie(['Bowl 1', 'Bowl 2'])
for i in range(11):
    pmf.Update('chocolate')
    pmf.Print()

Bowl 1 0.3333333333333333
Bowl 2 0.6666666666666666
Bowl 1 0.1914893617021277
Bowl 2 0.8085106382978723
Bowl 1 0.09523809523809526
Bowl 2 0.9047619047619048
Bowl 1 0.041543026706231466
Bowl 2 0.9584569732937686
Bowl 1 0.015993907083016
Bowl 2 0.9840060929169839
Bowl 1 0.0053887605850654365
Bowl 2 0.9946112394149345
Bowl 1 0.0015455950540958277
Bowl 2 0.9984544049459043
Bowl 1 0.0003571003451970005
Bowl 2 0.999642899654803
Bowl 1 5.953444067392988e-05
Bowl 2 0.9999404655593259
Bowl 1 5.412514816759313e-06
Bowl 2 0.9999945874851832
Bowl 1 0.0
Bowl 2 0.9999999999999999
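
The printed posteriors can be cross-checked against a closed-form calculation. The sketch below is independent of the `Cookie` class; the helper names are hypothetical, and the bowl contents (30 vanilla and 10 chocolate versus 20 of each) are taken from the code above. It computes the probability of drawing only chocolate cookies without replacement and normalizes over the two bowls.

```python
from fractions import Fraction

def prob_all_chocolate(vanilla, chocolate, draws):
    """Probability that `draws` cookies taken without replacement are all chocolate."""
    prob = Fraction(1)
    for i in range(draws):
        remaining_chocolate = chocolate - i
        remaining_total = vanilla + chocolate - i
        if remaining_chocolate <= 0:
            return Fraction(0)  # the bowl has run out of chocolate cookies
        prob *= Fraction(remaining_chocolate, remaining_total)
    return prob

def posterior_bowl1(draws):
    """Posterior probability of Bowl 1 after `draws` chocolate cookies."""
    like1 = prob_all_chocolate(30, 10, draws)
    like2 = prob_all_chocolate(20, 20, draws)
    return like1 / (like1 + like2)  # equal priors cancel

print(posterior_bowl1(1))   # 1/3
print(posterior_bowl1(2))   # 9/47
print(posterior_bowl1(11))  # 0
```

Bowl 1 holds only 10 chocolate cookies, so an eleventh chocolate draw is impossible under that hypothesis and its posterior drops to exactly 0.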
