In [1]:
import numpy as np
from empiricaldist import Pmf

Exercise: Let’s use Bayes’s Rule to solve the Elvis problem from <<_Distributions>>:

    Elvis Presley had a twin brother who died at birth. What is the probability that Elvis was an identical twin?

In 1935, about 2/3 of twins were fraternal and 1/3 were identical. The question contains two pieces of information we can use to update this prior.

    First, Elvis’s twin was also male, which is more likely if they were identical twins, with a likelihood ratio of 2.

    Also, Elvis’s twin died at birth, which is more likely if they were identical twins, with a likelihood ratio of 1.25.

If you are curious about where those numbers come from, I wrote a blog post about it: https://www.allendowney.com/blog/2020/01/28/the-elvis-problem-revisited/

Bayes’s Rule in odds form, where A and B are the competing hypotheses and D is the data: odds(A|D) = odds(A) * P(D|A) / P(D|B)

In [3]:
def odds(p):
    """Convert a probability to odds in favor."""
    return p / (1-p)

def prob(o):
    """Convert odds in favor to a probability."""
    return o / (o+1)
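As a quick sanity check on the two helpers, the sketch below converts a few probabilities to odds and back. The sample values are arbitrary and chosen only for illustration; the functions are repeated so the cell runs on its own.

```python
def odds(p):
    """Convert a probability to odds in favor."""
    return p / (1 - p)

def prob(o):
    """Convert odds in favor to a probability."""
    return o / (o + 1)

# Arbitrary illustrative probabilities; prob(odds(p)) should return p.
for p in [0.1, 1/3, 0.5, 0.75]:
    o = odds(p)
    print(f"p = {p:.4f} -> odds = {o:.4f} -> back to p = {prob(o):.4f}")
```

A probability of 0.75 corresponds to odds of 3 (that is, 3:1 in favor), and the round trip recovers the original probability up to floating-point rounding.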

In [5]:
# A = Elvis was an identical twin; B = Elvis was a fraternal twin
# D1 = the twin was male; D2 = the twin died at birth

prior_odds = odds(1/3)  # 1/3 of twins were identical

# The problem statement gives likelihood ratios P(D|A) / P(D|B) directly,
# so they multiply the prior odds as they are. Converting them with prob()
# first would be a mistake: they are ratios, not odds.
likelihood_ratio_1 = 2     # twin was male
likelihood_ratio_2 = 1.25  # twin died at birth
posterior_odds = prior_odds * likelihood_ratio_1 * likelihood_ratio_2
prob(posterior_odds)


0.5555555555555555
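One way to cross-check the odds-form result is a small Bayes table built from plain dictionaries. This is only a sketch: the exercise gives likelihood ratios, not the likelihoods themselves, so the values in `like_male` and `like_died` are assumed relative likelihoods chosen to have the stated ratios of 2 and 1.25.

```python
# Prior from the exercise: 1/3 of twins identical, 2/3 fraternal.
prior = {"identical": 1/3, "fraternal": 2/3}

# Assumed relative likelihoods; only their ratios (2 and 1.25) come from the exercise.
like_male = {"identical": 2.0, "fraternal": 1.0}
like_died = {"identical": 1.25, "fraternal": 1.0}

# Multiply prior by both likelihoods, then normalize so the posteriors sum to 1.
unnorm = {h: prior[h] * like_male[h] * like_died[h] for h in prior}
total = sum(unnorm.values())
posterior = {h: u / total for h, u in unnorm.items()}

print(posterior["identical"])
```

The posterior probability of "identical" should agree with the odds-form answer of about 0.556, since scaling a likelihood column by a constant does not change the normalized result.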

Exercise: The following is an interview question that appeared on glassdoor.com, attributed to Facebook:

    You’re about to get on a plane to Seattle. You want to know if you should bring an umbrella. You call 3 random friends of yours who live there and ask each independently if it’s raining. Each of your friends has a 2/3 chance of telling you the truth and a 1/3 chance of messing with you by lying. All 3 friends tell you that “Yes” it is raining. What is the probability that it’s actually raining in Seattle?

Use Bayes’s Rule to solve this problem. As a prior you can assume that it rains in Seattle about 10% of the time.

This question causes some confusion about the differences between Bayesian and frequentist interpretations of probability; if you are curious about this point,

In [None]:
# Goal: P(rain | YYY), where YYY means all three friends say "yes".
# Bayes's Theorem: P(rain | YYY) = P(rain) * P(YYY | rain) / P(YYY)

prior_odds = odds(0.1)

# A first attempt used P(YYY | rain) = (2/3)**3 and P(YYY) = (2/3)**3 + (1/3)**3.
# That P(YYY) is wrong because it leaves out the prior weights:
# P(YYY) = 0.1 * (2/3)**3 + 0.9 * (1/3)**3.
# The odds form avoids computing P(YYY) at all. For each friend,
# P(yes | rain) = 2/3 and P(yes | no rain) = 1/3, so the likelihood ratio is 2.
# The friends answer independently, so the three ratios multiply: 2**3 = 8.
likelihood_ratio = 2**3

posterior_odds = prior_odds * likelihood_ratio
print(prob(posterior_odds))

0.4705882352941177
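The same answer can be reached with Bayes's Theorem in probability form, which makes the total probability of the data explicit. The sketch below uses only the numbers from the exercise (10% prior, 2/3 chance of truth, three independent friends); the variable names are my own.

```python
p_rain = 0.1      # prior probability of rain
p_truth = 2 / 3   # chance that a friend tells the truth

# Likelihood of three independent "yes" answers under each hypothesis.
like_rain = p_truth ** 3        # all three tell the truth
like_dry = (1 - p_truth) ** 3   # all three lie

# Total probability of the data, weighted by the prior.
p_yyy = p_rain * like_rain + (1 - p_rain) * like_dry

posterior_rain = p_rain * like_rain / p_yyy
print(posterior_rain)
```

Algebraically this is 0.8 / 1.7 = 8/17, or about 0.47, matching the odds-form calculation. Even three "yes" answers leave rain slightly less likely than not, because the prior is low.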


Exercise: According to the CDC, people who smoke are about 25 times more likely to develop lung cancer than nonsmokers.

Also according to the CDC, about 14% of adults in the U.S. are smokers. If you learn that someone has lung cancer, what is the probability they are a smoker?

In [None]:
prior_odds = odds(0.14)
likelihood_ratio = 25  # P(cancer | smoker) / P(cancer | nonsmoker), from the CDC figure
posterior_odds = prior_odds * likelihood_ratio
prob(posterior_odds)

0.8027522935779816
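The exercise gives only the relative risk, not the absolute rate of lung cancer, and the odds form shows why that is enough. As an illustration, the sketch below applies Bayes's Theorem in probability form with several hypothetical baseline rates for nonsmokers. The `base_rate` values are made up for the demonstration, and `p_smoker_given_cancer` is a helper name introduced here.

```python
def p_smoker_given_cancer(base_rate, p_smoker=0.14, relative_risk=25):
    """Posterior probability of being a smoker given lung cancer.

    base_rate is a hypothetical P(cancer | nonsmoker); it cancels out.
    """
    p_cancer_smoker = relative_risk * base_rate
    p_cancer_nonsmoker = base_rate
    numer = p_smoker * p_cancer_smoker
    denom = numer + (1 - p_smoker) * p_cancer_nonsmoker
    return numer / denom

# Hypothetical baseline rates; the posterior should be the same for each.
for base_rate in [0.0001, 0.001, 0.01]:
    print(base_rate, p_smoker_given_cancer(base_rate))
```

Each line should print roughly 0.803, since the baseline rate appears in both the numerator and the denominator and cancels, leaving 3.5 / 4.36.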