# Think Bayes: Chapter 2

This notebook presents example code and exercise solutions for Think Bayes.

Copyright 2016 Allen B. Downey

MIT License: https://opensource.org/licenses/MIT

In [1]:
from __future__ import print_function, division

%matplotlib inline

import sys
import os

sys.path.insert(0, os.path.abspath(os.path.join(os.getcwd(), '..')))
from thinkbayes2 import Hist, Pmf, Suite

## Exercises

**Exercise:** This one is from one of my favorite books, David MacKay's "Information Theory, Inference, and Learning Algorithms":

> Elvis Presley had a twin brother who died at birth.  What is the probability that Elvis was an identical twin?
    
To answer this one, you need some background information. According to the Wikipedia article on twins: "Twins are estimated to be approximately 1.9% of the world population, with monozygotic twins making up 0.2% of the total, and 8% of all twins."

**Solution**

We can start by defining our hypotheses: 
* A. Elvis was an identical twin
* B. Elvis was not an identical twin

The data, D, is that Elvis's twin was a brother, not a sister.

I'll start by working it out by hand.

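| Hypothesis | Prior p(H) | Likelihood p(D given H) | p(H) p(D given H) | Posterior p(H given D) |
|------------|------------|-------------------------|-------------------|------------------------|
| A          | 0.08       | 1                       | 0.08              | 0.148                  |
| B          | 0.92       | 0.5                     | 0.46              | 0.852                  |
| Total      |            |                         | 0.54              | 1                      |

Each posterior is the product in the fourth column divided by the total, p(D) = 0.54.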
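The same arithmetic can be checked in a few lines of plain Python. This is a minimal sketch that uses the standard-library `fractions` module so the posteriors come out exact; `bayes_table` is a hypothetical helper, not part of `thinkbayes2`.

```python
from fractions import Fraction

def bayes_table(priors, likelihoods):
    """Return posteriors given dicts of priors and likelihoods keyed by hypothesis."""
    # Unnormalized posterior: p(H) * p(D | H)
    unnorm = {h: priors[h] * likelihoods[h] for h in priors}
    # The normalizing constant p(D) is the sum over all hypotheses
    total = sum(unnorm.values())
    return {h: unnorm[h] / total for h in unnorm}

priors = {'A': Fraction(8, 100), 'B': Fraction(92, 100)}
likelihoods = {'A': Fraction(1), 'B': Fraction(1, 2)}
posterior = bayes_table(priors, likelihoods)
print(posterior['A'])  # prints 4/27
```

Since 4/27 is about 0.148, this agrees with the table.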

Next, I'll do the same calculation with a subclass of `Pmf`:

In [44]:
class Elvis_Pmf(Pmf):
    """Map from twin type (identical or not) to a probability"""

    def __init__(self):
        """Initialize the distribution.

        hypos: sequence of hypotheses
        """
        Pmf.__init__(self)
        
        self.Set('identical', 0.08)
        self.Set('non-identical', 0.92)
        self.Normalize()

    def Update(self, data):
        """Updates each hypothesis based on the data.

        data: any representation of the data
        """
        for hypo in self.Values():
            like = self.Likelihood(data, hypo)
            self.Mult(hypo, like)
        self.Normalize()

    def Likelihood(self, data, hypo):
        """Compute the likelihood of the data under the hypothesis.

        hypo: {'identical', 'non-identical'}; the type of twin Elvis was
        data: 'brother' or 'sister'; the type of sibling Elvis had
        """
        if hypo == 'identical':
            if data == 'brother':
                return 1
            else:
                return 0
        elif hypo == 'non-identical':
            return 0.5

In [45]:
elvis_pmf = Elvis_Pmf()
elvis_pmf.Update('brother')
elvis_pmf.Print()

identical 0.148148148148
non-identical 0.851851851852


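As expected! Finally, I'll use a `Suite`, which already provides `Update`, so the subclass only needs to define `Likelihood`. I still override `__init__` because my priors are not equal (passing a list of hypotheses to the constructor would give each one the same prior).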
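The division of labor in `Suite` (the base class owns the update loop, the subclass supplies only the likelihood) can be illustrated without `thinkbayes2`. This is a rough sketch of the idea, not the library's actual source; `MiniSuite` and `ElvisMini` are hypothetical names.

```python
class MiniSuite(dict):
    """A dict from hypothesis to probability with a generic Bayesian update."""

    def normalize(self):
        total = sum(self.values())
        for hypo in self:
            self[hypo] /= total

    def update(self, data):
        # Multiply each prior by the likelihood the subclass supplies,
        # then renormalize so the probabilities sum to 1.
        for hypo in self:
            self[hypo] *= self.likelihood(data, hypo)
        self.normalize()

    def likelihood(self, data, hypo):
        raise NotImplementedError  # subclasses must override this


class ElvisMini(MiniSuite):
    def likelihood(self, data, hypo):
        if hypo == 'identical':
            return 1.0 if data == 'brother' else 0.0
        return 0.5  # non-identical twin: brother or sister equally likely


suite = ElvisMini({'identical': 0.08, 'non-identical': 0.92})
suite.update('brother')
print(round(suite['identical'], 4))  # prints 0.1481
```

Because the update loop lives in the base class, each new problem only needs a new `likelihood` (plus a prior), which is the same design the `Elvis_Suite` cell relies on.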

In [55]:
class Elvis_Suite(Suite):
    """Map from twin type (identical or not) to a probability"""

    def __init__(self):
        """Initialize the distribution.

        hypos: sequence of hypotheses
        """
        Suite.__init__(self)
        
        self.Set('identical', 0.08)
        self.Set('non-identical', 0.92)
        self.Normalize()

    def Likelihood(self, data, hypo):
        """Compute the likelihood of the data under the hypothesis.

        hypo: {'identical', 'non-identical'}; the type of twin Elvis was
        data: 'brother' or 'sister'; the type of sibling Elvis had
        """
        if hypo == 'identical':
            if data == 'brother':
                return 1
            else:
                return 0
        elif hypo == 'non-identical':
            return 0.5

In [56]:
elvis_suite = Elvis_Suite()
elvis_suite.Update('brother')
elvis_suite.Print()

identical 0.148148148148
non-identical 0.851851851852


**Exercise:** Let's consider a more general version of the Monty Hall problem where Monty is more unpredictable.  As before, Monty never opens the door you chose (let's call it A) and never opens the door with the prize.  So if you choose the door with the prize, Monty has to decide which door to open.  Suppose he opens B with probability `p` and C with probability `1-p`.  If you choose A and Monty opens B, what is the probability that the car is behind A, in terms of `p`?  What if Monty opens C?

Hint: you might want to use SymPy to do the algebra for you. 

In [46]:
from sympy import symbols
p = symbols('p')

**Solution**

In [49]:
class general_Monty(Pmf):
    """Map from a door choice to a probability"""
    def __init__(self, hypos):
        """Initialize the distribution.

        hypos: sequence of hypotheses
        """
        Pmf.__init__(self)
        for hypo in hypos:
            self.Set(hypo, 1)
        self.Normalize()

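    def Update(self, data, p):
        """Updates each hypothesis based on the data.

        data: string name of the door Monty opened
        p: probability that Monty opens B when the car is behind A
           (a number or a SymPy symbol)
        """
        for hypo in self.Values():
            like = self.Likelihood(data, hypo, p)
            self.Mult(hypo, like)
        self.Normalize()
    
    def Likelihood(self, data, hypo, p):
        """Compute the likelihood of the data under the hypothesis.

        hypo: string name of the door where the prize is
        data: string name of the door Monty opened
        p: probability that Monty opens B when the car is behind A
        """
        if hypo == data:
            # Monty never opens the door with the prize
            return 0
        elif hypo == 'A':
            # The car is behind your door, so Monty chooses between B and C
            if data == 'B':
                return p
            if data == 'C':
                return 1 - p
        else:
            # The car is behind the other closed door, so Monty had no choice
            return 1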

In [51]:
gM = general_Monty('ABC')
gM.Update('B', p)
gM.Print()

A 0.333333333333333*p/(0.333333333333333*p + 0.333333333333333)
B 0
C 0.333333333333333/(0.333333333333333*p + 0.333333333333333)
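Simplifying, the output says that if Monty opens B, the probability the car is behind A is p/(1+p). The exercise also asks what happens if Monty opens C; the likelihood for A then becomes 1-p, which gives (1-p)/(2-p). The sketch below checks both closed forms with exact fractions for a few values of `p`; `posterior_A` is a hypothetical helper that mirrors the likelihoods in `general_Monty`, written without `thinkbayes2` or SymPy.

```python
from fractions import Fraction

def posterior_A(opened, p):
    """P(car behind A | Monty opened `opened`), given that you picked A.

    p is the probability that Monty opens B when the car is behind A.
    """
    prior = Fraction(1, 3)
    # Likelihood of the observed door under each hypothesis about the car
    like = {
        'A': p if opened == 'B' else 1 - p,
        'B': 0 if opened == 'B' else 1,
        'C': 0 if opened == 'C' else 1,
    }
    total = sum(prior * like[h] for h in 'ABC')
    return prior * like['A'] / total

for p in [Fraction(0), Fraction(1, 2), Fraction(3, 4), Fraction(1)]:
    assert posterior_A('B', p) == p / (1 + p)
    assert posterior_A('C', p) == (1 - p) / (2 - p)

print(posterior_A('C', Fraction(3, 4)))  # prints 1/5
```

So when Monty opens C, a Monty who strongly prefers B (large `p`) makes it less likely that the car is behind A.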


Alternatively, I can pass a numeric value for `p`, say 0.75:

In [52]:
gM = general_Monty('ABC')
gM.Update('B', 0.75)
gM.Print()

A 0.428571428571
B 0.0
C 0.571428571429


Next, consider the case where Monty chooses at random (`p = 0.5`), as in the original problem:

In [53]:
gM = general_Monty('ABC')
gM.Update('B', 0.5)
gM.Print()

A 0.333333333333
B 0.0
C 0.666666666667


Or the case where he always opens B when he can (`p = 1`):

In [54]:
gM = general_Monty('ABC')
gM.Update('B', 1)
gM.Print()

A 0.5
B 0.0
C 0.5
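These posteriors can also be checked by simulation. This is a minimal sketch, assuming the game works exactly as the exercise describes: the car is placed uniformly at random, you always pick A, and when the car is behind A, Monty opens B with probability `p`. `simulate` is a hypothetical helper.

```python
import random

def simulate(p, opened='B', trials=100_000, seed=0):
    """Estimate P(car behind A | Monty opens `opened`) by simulation."""
    rng = random.Random(seed)
    matches = wins = 0
    for _ in range(trials):
        car = rng.choice('ABC')
        if car == 'A':
            # Monty may open B or C; he picks B with probability p
            monty = 'B' if rng.random() < p else 'C'
        else:
            # Monty must open the one door that is neither A nor the car
            monty = 'C' if car == 'B' else 'B'
        if monty == opened:
            matches += 1
            wins += (car == 'A')
    return wins / matches

for p in [0.5, 0.75, 1.0]:
    print(p, round(simulate(p), 3))  # roughly 0.333, 0.429 and 0.5
```

Conditioning is done by discarding the trials where Monty opened a different door, which is the simulation counterpart of the `Update` step.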


**Exercise:** According to the CDC, "Compared to nonsmokers, men who smoke are about 23 times more likely to develop lung cancer and women who smoke are about 13 times more likely."  Also, among adults in the U.S. in 2014:

> Nearly 19 of every 100 adult men (18.8%) smoke

> Nearly 15 of every 100 adult women (14.8%) smoke

If you learn that a woman has been diagnosed with lung cancer, and you know nothing else about her, what is the probability that she is a smoker?

**Solution**

My hypotheses are:

* A. She is a smoker
* B. She is not a smoker

and my data, D, is that she has been diagnosed with lung cancer.
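We don't know the absolute rate of lung cancer, but we don't need it: `Likelihood` only has to be right up to a constant factor, so the relative risk can stand in for the pair of likelihoods, and the unknown baseline rate cancels when the suite is normalized. In odds form, the posterior odds are the prior odds times that likelihood ratio. A quick sketch; `posterior_smoker` is a hypothetical helper.

```python
def posterior_smoker(prevalence, relative_risk):
    """P(smoker | lung cancer) from smoking prevalence and relative risk."""
    prior_odds = prevalence / (1 - prevalence)
    # Bayes's rule in odds form: posterior odds = prior odds * likelihood ratio
    post_odds = prior_odds * relative_risk
    return post_odds / (1 + post_odds)

print(round(posterior_smoker(0.148, 13), 3))  # women: prints 0.693
print(round(posterior_smoker(0.188, 23), 3))  # men: prints 0.842
```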

In [60]:
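class DoTheySmoke(Suite):
    """Map from smoking status to a probability, for one gender."""
    
    # Smoking prevalence among U.S. adults in 2014 (CDC figures above)
    PREVALENCE = {'male': 0.188, 'female': 0.148}
    # Relative risk of lung cancer for smokers compared to nonsmokers
    RELATIVE_RISK = {'male': 23, 'female': 13}
    
    def __init__(self, gender):
        """Initialize the prior for the given gender.
        
        gender: 'male' or 'female'
        """
        Suite.__init__(self)
        prevalence = self.PREVALENCE[gender]
        self.Set('smoker', prevalence)
        self.Set('non-smoker', 1 - prevalence)
        self.Normalize()
        
        self.gender = gender
        
    def Likelihood(self, data, hypo):
        """Compute the likelihood of the data under the hypothesis.

        hypo: 'smoker' or 'non-smoker'
        data: True if the person has been diagnosed with lung cancer

        The likelihoods are only known up to a constant factor (the
        relative risk), which is enough because Update normalizes.
        """
        if not data:
            raise ValueError('only a lung cancer diagnosis (True) is supported')
        if hypo == 'smoker':
            return self.RELATIVE_RISK[self.gender]
        else:
            return 1

In [62]:
smoker_suite = DoTheySmoke('female')
smoker_suite.Update(True)
smoker_suite.Print()

non-smoker 0.306916426513
smoker 0.693083573487


And for men?

In [63]:
smoker_suite = DoTheySmoke('male')
smoker_suite.Update(True)
smoker_suite.Print()

non-smoker 0.158099688474
smoker 0.841900311526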


**Exercise:** In Section 2.3 I said that the solution to the cookie problem generalizes to the case where we draw multiple cookies with replacement.

But in the more likely scenario where we eat the cookies we draw, the likelihood of each draw depends on the previous draws.

Modify the solution in this chapter to handle selection without replacement. Hint: add instance variables to Cookie to represent the hypothetical state of the bowls, and modify Likelihood accordingly. You might want to define a Bowl object.
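Following the hint, one way to structure this is to give each hypothesis its own copy of its bowl, so that every draw updates both the probabilities and the hypothetical contents. The sketch below does this in plain Python, without `thinkbayes2`; `Bowl` and `update_without_replacement` are hypothetical names, and the bowl contents match the ones used in the solution that follows.

```python
from fractions import Fraction

class Bowl:
    """Hypothetical state of one bowl: a count for each cookie flavor."""

    def __init__(self, counts):
        self.counts = dict(counts)  # copy, so the caller's dict is untouched

    def draw_prob(self, flavor):
        total = sum(self.counts.values())
        if total == 0:
            return Fraction(0)
        return Fraction(self.counts.get(flavor, 0), total)

    def eat(self, flavor):
        if self.counts.get(flavor, 0) > 0:
            self.counts[flavor] -= 1

def update_without_replacement(bowls, draws):
    """Posterior over bowls after a sequence of draws, eating each cookie."""
    probs = {name: Fraction(1, len(bowls)) for name in bowls}
    state = {name: Bowl(counts) for name, counts in bowls.items()}
    for flavor in draws:
        for name in probs:
            # Likelihood of this draw if the cookies come from bowl `name`
            probs[name] *= state[name].draw_prob(flavor)
            state[name].eat(flavor)  # under this hypothesis, the cookie is gone
        total = sum(probs.values())
        probs = {name: pr / total for name, pr in probs.items()}
    return probs

bowls = {'bowl_1': {'vanilla': 20, 'chocolate': 30},
         'bowl_2': {'vanilla': 20, 'chocolate': 20}}
print(update_without_replacement(bowls, ['vanilla']))
# {'bowl_1': Fraction(4, 9), 'bowl_2': Fraction(5, 9)}
print(update_without_replacement(bowls, ['vanilla', 'vanilla']))
# {'bowl_1': Fraction(156, 401), 'bowl_2': Fraction(245, 401)}
```

With replacement, two vanilla draws would instead give bowl_1 a posterior of 16/41; the difference grows as more cookies are eaten.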

In [72]:
class Cookies(Suite):
    
    def __init__(self, cookie_values):
        """Initialize the distributions
        
        cookie_values: a dictionary 
            {'bowl_name': {'cookie_type': number of cookies}}
        """
        Suite.__init__(self)
        for bowl in cookie_values:
            self.Set(bowl, 1)
        self.Normalize()
        
        self.cookie_values = cookie_values
        
    def Likelihood(self, data, hypo):
        """Compute the likelihood of the data under the hypothesis,
        then remove the drawn cookie from that bowl's hypothetical state.

        hypo: the bowl name
        data: the cookie drawn
        """
        cookies_in_bowl = self.cookie_values[hypo]
        num_drawn_cookies = cookies_in_bowl.get(data, 0)
        total_cookies = sum(cookies_in_bowl.values())
        
        if num_drawn_cookies == 0:
            # this bowl could not have produced the cookie
            return 0
        
        # under this hypothesis the cookie came from this bowl and was eaten,
        # so later draws see one fewer cookie of this type
        cookies_in_bowl[data] = num_drawn_cookies - 1
        
        # probability of drawing this type of cookie from the bowl
        return num_drawn_cookies / total_cookies

In [73]:
cookies = Cookies({'bowl_1': {
                   'vanilla': 20,
                   'chocolate': 30},
                   'bowl_2': {
                   'vanilla': 20,
                   'chocolate': 20}})

In [74]:
cookies.Update('vanilla')
cookies.Print()

bowl_1 0.444444444444
bowl_2 0.555555555556
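As an independent check on the posterior after a single vanilla draw, the sketch below simulates the experiment directly: pick a bowl uniformly at random, draw one cookie with probability proportional to the counts, and keep only the trials that produced vanilla. The estimate should land close to the exact posterior, 4/9 (about 0.444) for bowl_1. `simulate_first_draw` is a hypothetical helper, and the bowl contents are copied from the cell above.

```python
import random

def simulate_first_draw(bowls, flavor, trials=100_000, seed=1):
    """Estimate P(bowl | first cookie is `flavor`) by simulation."""
    rng = random.Random(seed)
    names = list(bowls)
    hits = {name: 0 for name in names}
    for _ in range(trials):
        name = rng.choice(names)  # pick a bowl uniformly at random
        counts = bowls[name]
        # Draw one cookie with probability proportional to its count
        drawn = rng.choices(list(counts), weights=list(counts.values()))[0]
        if drawn == flavor:
            hits[name] += 1
    total = sum(hits.values())
    return {name: hits[name] / total for name in names}

bowls = {'bowl_1': {'vanilla': 20, 'chocolate': 30},
         'bowl_2': {'vanilla': 20, 'chocolate': 20}}
estimate = simulate_first_draw(bowls, 'vanilla')
print({name: round(prob, 3) for name, prob in estimate.items()})
```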
