# Computational statistics

## Distributions

* A distribution is a set of values and their corresponding probabilities.
* The rolls of a fair six-sided die have a distribution over the values 1-6, each with a uniform probability of 1/6.
* The frequencies with which words occur in a text are another example.
* We can map each value to its probability; this mapping is called a probability mass function (PMF).
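
As a rough illustration of the idea, independent of any library, a probability mass function can be sketched as a plain dictionary that maps each value to its probability. The name `die_pmf` is made up for this example, and `fractions.Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

# Map each face of a fair six-sided die to its probability.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# A valid PMF assigns non-negative probabilities that sum to 1.
total = sum(die_pmf.values())
print(die_pmf[3])  # 1/6
print(total)       # 1
```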

In [1]:
from thinkbayes import Pmf

In [2]:
pmf = Pmf()
for x in range(1, 7):
    pmf.Set(x, 1/6)

In [6]:
word_list = ["this", "is", "a", "word", "list", "which", "is", "a", "list", "of", "words"]
pmf = Pmf()
for word in word_list:
    pmf.Incr(word, 1)
    
pmf.Normalize()
print(pmf.Prob("list"))

0.18181818181818182


In the first example, each probability is set directly. In the second, `Incr` adds 1 to a word's entry each time the word appears, so the stored values are counts rather than probabilities. To turn them into true probabilities we normalise: each count is divided by the total so that the values add up to 1.
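
The count-then-normalise step can be sketched with the standard library alone. This is not how `Pmf` is implemented; it is a minimal stand-in that assumes a `collections.Counter` for the counts and a plain dictionary (`word_pmf`, a name chosen for this example) for the result.

```python
from collections import Counter

word_list = ["this", "is", "a", "word", "list", "which", "is", "a", "list", "of", "words"]

# Step 1: count occurrences (the role played by repeated increments).
counts = Counter(word_list)

# Step 2: normalise by dividing each count by the total number of words.
total = sum(counts.values())
word_pmf = {word: count / total for word, count in counts.items()}

print(word_pmf["list"])  # 2 occurrences out of 11 words
```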

## The cookie problem

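We can now solve the cookie problem using the `Pmf` class. There are two bowls, each equally likely to be chosen. A cookie drawn from Bowl 1 is vanilla with probability 0.75, and one drawn from Bowl 2 is vanilla with probability 0.5. Given that a vanilla cookie was drawn, what is the probability that it came from Bowl 1?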

In [8]:
pmf = Pmf()

# Prior: each bowl is equally likely before we see any data
pmf.Set('Bowl 1', 0.5)
pmf.Set('Bowl 2', 0.5)

# Update with the data (a vanilla cookie was drawn): multiply each prior
# by the likelihood of vanilla, 75% for Bowl 1 and 50% for Bowl 2
pmf.Mult('Bowl 1', 0.75)
pmf.Mult('Bowl 2', 0.5)

# Normalise, because the products of priors and likelihoods no longer sum to 1
pmf.Normalize()

# Print the posterior probability of Bowl 1
print(pmf.Prob('Bowl 1'))

0.6000000000000001
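
The same number can be checked by hand with Bayes' theorem. The symbols here are notation introduced for this note: $B_1$ and $B_2$ stand for the two bowls and $V$ for drawing a vanilla cookie.

```latex
\begin{aligned}
P(B_1 \mid V)
  &= \frac{P(B_1)\,P(V \mid B_1)}{P(B_1)\,P(V \mid B_1) + P(B_2)\,P(V \mid B_2)} \\
  &= \frac{0.5 \times 0.75}{0.5 \times 0.75 + 0.5 \times 0.5}
   = \frac{0.375}{0.625} = 0.6
\end{aligned}
```

The trailing `...01` in the printed value is floating-point rounding, not part of the answer.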


## The Bayesian framework

We can rewrite the code to generalise it.

In [17]:
class Cookie(Pmf):
    def __init__(self, hypos, mixes):
        
        self.mixes = mixes
        
        Pmf.__init__(self)
        for hypo in hypos:
            self.Set(hypo,1)
        self.Normalize()
        
    def Update(self, data):
        for hypo in self.Values():
            like = self.Likelihood(data, hypo)
            self.Mult(hypo, like)
        self.Normalize()
        
    
    def Likelihood(self, data, hypo):
        mix = self.mixes[hypo]
        like = mix[data]
        return like

hypos = ['Bowl 1', 'Bowl 2']
mixes = {
        'Bowl 1':dict(vanilla=0.75, chocolate=0.25),
        'Bowl 2':dict(vanilla=0.5, chocolate=0.5),
    }
pmf = Cookie(hypos, mixes)
pmf.Update('vanilla')
for hypo, prob in pmf.Items():
    print(hypo, prob)
    
dataset = ['vanilla', 'chocolate', 'vanilla']
for data in dataset:
    pmf.Update(data)
    
    for hypo, prob in pmf.Items():
        print(hypo, prob)
    


Bowl 1 0.6000000000000001
Bowl 2 0.4
Bowl 1 0.6923076923076923
Bowl 2 0.30769230769230765
Bowl 1 0.5294117647058825
Bowl 2 0.4705882352941177
Bowl 1 0.627906976744186
Bowl 2 0.37209302325581395
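
The four pairs of lines in the output correspond to four successive updates: the first `Update('vanilla')` call, then the three items in `dataset`. A library-free sketch of the same sequence, using exact fractions, makes the arithmetic easier to follow. The `update` helper and the `history` list are hypothetical names for this example and are not part of thinkbayes.

```python
from fractions import Fraction

# Likelihood of each cookie flavour under each hypothesis (same mixes as above).
mixes = {
    "Bowl 1": {"vanilla": Fraction(3, 4), "chocolate": Fraction(1, 4)},
    "Bowl 2": {"vanilla": Fraction(1, 2), "chocolate": Fraction(1, 2)},
}

def update(dist, data):
    """Return the posterior after observing one cookie (hypothetical helper)."""
    unnorm = {hypo: p * mixes[hypo][data] for hypo, p in dist.items()}
    total = sum(unnorm.values())
    return {hypo: p / total for hypo, p in unnorm.items()}

# Uniform prior, then one update per observed cookie.
dist = {"Bowl 1": Fraction(1, 2), "Bowl 2": Fraction(1, 2)}
history = []
for cookie in ["vanilla", "vanilla", "chocolate", "vanilla"]:
    dist = update(dist, cookie)
    history.append(dist["Bowl 1"])
    print(cookie, dist["Bowl 1"], float(dist["Bowl 1"]))

# Bowl 1 posteriors after each step: 3/5, 9/13, 9/17, 27/43
```

Each posterior becomes the prior for the next observation, which is why a chocolate cookie pulls the probability of Bowl 1 back down before the final vanilla raises it again.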


## Monty Hall problem

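We can also solve the Monty Hall problem using the `Pmf` class. The hypotheses are the doors that might hide the car. The likelihoods below assume the contestant has picked door A, and the data is the door that Monty opens.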

In [19]:
class Monty(Pmf):
    
    def __init__(self, hypos):
        
        Pmf.__init__(self)
        for hypo in hypos:
            self.Set(hypo, 1)
        self.Normalize()
        
    def Update(self, data):
        for hypo in self.Values():
            like = self.Likelihood(data, hypo)
            self.Mult(hypo, like)
        self.Normalize()
        
    ## So far exactly the same as Cookie class
    
    def Likelihood(self, data, hypo):
        if hypo == data:
            return 0
        elif hypo == 'A':
            return 0.5
        else:
            return 1
        
    ## Likelihood based on the problem setting
    
hypos = 'ABC'
pmf = Monty(hypos)

data = 'B'
pmf.Update(data)

for hypo, prob in pmf.Items():
    print(hypo, prob)
    
    

A 0.3333333333333333
B 0.0
C 0.6666666666666666
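
One way to sanity-check this result is a quick simulation. The sketch below is not from the source; it assumes the usual rules implied by the likelihood function (the contestant picks A, Monty never opens A or the door with the car, and he chooses at random when he has two options) and estimates where the car is, given that Monty opened B.

```python
import random

def simulate(trials=100_000, seed=0):
    """Estimate P(car location | Monty opens B), given the player picked A."""
    rng = random.Random(seed)
    counts = {"A": 0, "B": 0, "C": 0}
    for _ in range(trials):
        car = rng.choice("ABC")
        # Monty never opens the player's door (A) or the door hiding the car.
        options = [door for door in "BC" if door != car]
        opened = rng.choice(options)
        # Keep only the games that match the observed data.
        if opened == "B":
            counts[car] += 1
    total = sum(counts.values())
    return {door: n / total for door, n in counts.items()}

estimate = simulate()
print(estimate)
```

The estimates should land close to 1/3 for A and 2/3 for C, with B at exactly zero because Monty never reveals the car.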


We can capture what these two classes have in common in a `Suite`: a `Pmf` that provides `__init__`, `Update` and `Print` methods, leaving only `Likelihood` to be defined for each problem.

This simplifies our code a lot.

In [21]:
from thinkbayes import Suite

class Monty(Suite):
    
    def Likelihood(self, data, hypo):
        if hypo == data:
            return 0
        elif hypo == 'A':
            return 0.5
        else:
            return 1
        
suite = Monty("ABC")
suite.Update("B")
suite.Print()

A 0.3333333333333333
B 0.0
C 0.6666666666666666


We can use the `Suite` class to solve the M&M problem.

In [31]:
class M_and_M(Suite):
    
    mix94 = dict(brown=30, 
             yellow=20, 
             red=20,
             green=10,
             orange=10,
             tan=10)

    mix96 = dict(blue=24,
             brown=13, 
             yellow=14, 
             red=13,
             green=20,
             orange=16)

    hypoA = dict(bag1=mix94, bag2=mix96)
    hypoB = dict(bag1=mix96, bag2=mix94)

    hypotheses = dict(A=hypoA, B=hypoB) 
    
    def Likelihood(self, data, hypo):
        bag, color = data
        mix = self.hypotheses[hypo][bag]
        like = mix[color]
        return like
    
suite = M_and_M('AB')
suite.Update(('bag1', 'yellow'))
suite.Update(('bag2', 'green'))
suite.Print()

A 0.7407407407407407
B 0.2592592592592592
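
Note that the mixes are given as percentages rather than probabilities. That is harmless here because the same scale factor appears in every likelihood and cancels when the suite is normalised. A hand calculation with the relevant entries of `mix94` and `mix96` shows this; the variable names below are chosen for this example.

```python
from fractions import Fraction

# Percentages taken from the two colour mixes above (not normalised to 1).
yellow_94, green_94 = 20, 10
yellow_96, green_96 = 14, 20

# Hypothesis A: bag1 uses mix94 and bag2 uses mix96.
like_a = yellow_94 * green_96  # yellow from bag1, green from bag2
# Hypothesis B: bag1 uses mix96 and bag2 uses mix94.
like_b = yellow_96 * green_94

# Equal priors cancel, so the posterior is the normalised likelihood.
post_a = Fraction(like_a, like_a + like_b)
post_b = Fraction(like_b, like_a + like_b)
print(post_a, post_b)  # 20/27 7/27
```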


In [33]:
## We can also override __init__ to read in the hypotheses

class M_and_M(Suite):
    
    def __init__(self, hypos, hypotheses):
        
        self.hypotheses = hypotheses
        
        Pmf.__init__(self)
        for hypo in hypos:
            self.Set(hypo, 1)
        self.Normalize()
    
    def Likelihood(self, data, hypo):
        bag, color = data
        mix = self.hypotheses[hypo][bag]
        like = mix[color]
        return like
    
mix94 = dict(brown=30, 
             yellow=20, 
             red=20,
             green=10,
             orange=10,
             tan=10)

mix96 = dict(blue=24,
         brown=13, 
         yellow=14, 
         red=13,
         green=20,
         orange=16)

hypoA = dict(bag1=mix94, bag2=mix96)
hypoB = dict(bag1=mix96, bag2=mix94)

hypotheses = dict(A=hypoA, B=hypoB) 
    
suite = M_and_M('AB', hypotheses)
suite.Update(('bag1', 'yellow'))
suite.Update(('bag2', 'green'))
suite.Print()

A 0.7407407407407407
B 0.2592592592592592


In [40]:
## Or reuse the base class __init__ and extend it

class M_and_M(Suite):
    
    def __init__(self, hypos, hypotheses):
        # Zero-argument super() (Python 3) calls the inherited initialiser;
        # avoid super(self.__class__, self), which recurses if M_and_M is subclassed
        super().__init__(hypos)
        self.hypotheses = hypotheses
    
    def Likelihood(self, data, hypo):
        bag, color = data
        mix = self.hypotheses[hypo][bag]
        like = mix[color]
        return like
    
mix94 = dict(brown=30, 
             yellow=20, 
             red=20,
             green=10,
             orange=10,
             tan=10)

mix96 = dict(blue=24,
         brown=13, 
         yellow=14, 
         red=13,
         green=20,
         orange=16)

hypoA = dict(bag1=mix94, bag2=mix96)
hypoB = dict(bag1=mix96, bag2=mix94)

hypotheses = dict(A=hypoA, B=hypoB) 
    
suite = M_and_M('AB', hypotheses)
suite.Update(('bag1', 'yellow'))
suite.Update(('bag2', 'green'))
suite.Print()

A 0.7407407407407407
B 0.2592592592592592
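
One caution about the `super` call, offered as a general Python note rather than anything specific to thinkbayes: the form `super(self.__class__, self)` looks generic but breaks as soon as the class is subclassed, because `self.__class__` is the runtime class of the instance. The class names in this sketch are invented for illustration.

```python
class Base:
    def __init__(self, hypos):
        self.hypos = list(hypos)

class Fragile(Base):
    def __init__(self, hypos, hypotheses=None):
        # self.__class__ is the runtime class, so in a subclass instance
        # this call resolves back to Fragile.__init__ and never reaches Base.
        super(self.__class__, self).__init__(hypos)
        self.hypotheses = hypotheses

class FragileChild(Fragile):
    pass

class Robust(Base):
    def __init__(self, hypos, hypotheses=None):
        # Zero-argument super() is bound to the class where it is written.
        super().__init__(hypos)
        self.hypotheses = hypotheses

class RobustChild(Robust):
    pass

Fragile("AB")              # fine when the class is used directly
child = RobustChild("AB")  # fine in a subclass too

try:
    FragileChild("AB")
    recursed = False
except RecursionError:
    recursed = True

print(child.hypos, recursed)  # ['A', 'B'] True
```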


`Suite` is an **abstract type**, which means it defines the interface a class is supposed to have but doesn't provide a complete implementation (`Likelihood` is not implemented).

A **concrete type** is a class that extends an abstract parent class and provides an implementation of the missing methods, such as Monty extending Suite by inheriting Update and providing Likelihood.

This is an example of the *template method design pattern.*
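
A minimal sketch of the pattern, assuming nothing about how thinkbayes itself is written: `MiniSuite` and `MiniMonty` are hypothetical stand-ins, and using the standard `abc` module to mark the missing method is a choice made for this example.

```python
from abc import ABC, abstractmethod

class MiniSuite(ABC):
    """Hypothetical stand-in for a Suite-like base class (not thinkbayes code)."""

    def __init__(self, hypos):
        # Uniform prior over the hypotheses.
        self.probs = {hypo: 1 / len(hypos) for hypo in hypos}

    def update(self, data):
        # Template method: the overall algorithm is fixed here.
        for hypo in self.probs:
            self.probs[hypo] *= self.likelihood(data, hypo)
        total = sum(self.probs.values())
        for hypo in self.probs:
            self.probs[hypo] /= total

    @abstractmethod
    def likelihood(self, data, hypo):
        """Problem-specific step that each concrete subclass must supply."""

class MiniMonty(MiniSuite):
    def likelihood(self, data, hypo):
        if hypo == data:
            return 0
        if hypo == "A":
            return 0.5
        return 1

suite = MiniMonty("ABC")
suite.update("B")
print(suite.probs)

# The abstract base cannot be instantiated on its own.
try:
    MiniSuite("ABC")
    instantiated = True
except TypeError:
    instantiated = False
```

The base class owns the update loop, and each concrete class only answers the question "how likely is this data under this hypothesis?".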

Reading on `super`:

* https://fuhm.net/super-harmful/
* https://rhettinger.wordpress.com/2011/05/26/super-considered-super/
* https://stackoverflow.com/questions/576169/understanding-python-super-with-init-methods
* https://stackoverflow.com/questions/8972866/correct-way-to-use-super-argument-passing