## German Tank Problem

A railroad numbers its locomotives $1,\ldots,N$. You see a locomotive with the number 60 painted on it. The problem is to come up with an estimate for $N$. We'll write $\theta = N$ to stick with our standard notation.

Apply Bayesian analysis to this problem by articulating the hypothesis/hypotheses, the data, and the likelihood. Be sure to try at least three separate prior distributions for $\theta$. What effect does this have on your posterior distribution of $\theta$ and, thus, your estimate for $N$?

In [None]:
## The hypotheses are:
## H_60: N = 60.
## H_61: N = 61.
## H_62: N = 62.
## ...
## H_1000: N = 1,000. (I arbitrarily stop here, but we could add more hypotheses.)

## The data is: we observed locomotive 60.

## The likelihood is P(y=60|H_N) = 1/N for N >= 60, and 0 for N < 60 (which is why
## the hypotheses start at H_60). This assumes every locomotive is equally likely to
## be seen, i.e. a discrete Uniform{1,...,N} distribution. It is certainly possible to
## assume a different form of the likelihood, but this one seems to make the most sense here.
## For example, P(y=60|H_60) = 1/60; P(y=60|H_61) = 1/61; and so on.
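
The exercise asks for at least three priors, so here is a minimal sketch of that comparison. It assumes the upper bound of 1,000 from the hypotheses above. The three priors (uniform, 1/N, and 1/N^2) are illustrative choices made for this sketch, not prescribed by the problem, and `tank_posterior` is a hypothetical helper name.

```python
import numpy as np

def tank_posterior(observed, max_n, prior_fn):
    """Posterior over N = 1..max_n after seeing a single serial number."""
    n = np.arange(1, max_n + 1)
    prior = np.asarray(prior_fn(n), dtype=float)
    # Discrete-uniform likelihood: 1/N if N >= observed, otherwise impossible.
    likelihood = np.where(n >= observed, 1.0 / n, 0.0)
    unnorm = prior * likelihood
    return n, unnorm / unnorm.sum()

# Illustrative priors (assumptions of this sketch); they need not be normalized.
priors = {
    "uniform": lambda n: np.ones_like(n, dtype=float),
    "1/N": lambda n: 1.0 / n,
    "1/N^2": lambda n: 1.0 / n**2,
}

for name, prior_fn in priors.items():
    n, post = tank_posterior(60, 1000, prior_fn)
    mode = n[post.argmax()]
    mean = (n * post).sum()
    print(f"{name:>7} prior: posterior mode = {mode}, posterior mean = {mean:.1f}")
```

Under these assumptions the posterior mode is 60 for all three priors, while the posterior mean shrinks as the prior puts more weight on small values of N. With a single observation, the estimate for N therefore depends noticeably on the prior.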

## Dungeons & Dragons Dice Problem 1

There are five dice: a 4-sided die, 6-sided die, 8-sided die, 12-sided die, 20-sided die. You roll a 6. The problem is to predict which die was thrown.

Apply Bayesian analysis to this problem by articulating the hypothesis/hypotheses, the data, and the likelihood. Identify which die you believe to be the thrown die and how likely this is to be the thrown die.

In [None]:
## The hypotheses are:
## H_4: 4-sided die.
## H_6: 6-sided die.
## H_8: 8-sided die.
## H_12: 12-sided die.
## H_20: 20-sided die.

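## The data is: we rolled a 6.

## The likelihood is P(y=6|H_n) = 1/n if n >= 6 (a fair n-sided die)
## and 0 otherwise. The prior is uniform: P(H_n) = 1/5 for each die.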

$$ P(H_4|y=6) \propto P(y=6|H_4)P(H_4) = 0 \times \frac{1}{5} = 0$$
$$ P(H_6|y=6) \propto P(y=6|H_6)P(H_6) = \frac{1}{6} \times \frac{1}{5} = \frac{1}{30}$$
$$ P(H_8|y=6) \propto P(y=6|H_8)P(H_8) = \frac{1}{8} \times \frac{1}{5} = \frac{1}{40}$$
$$ P(H_{12}|y=6) \propto P(y=6|H_{12})P(H_{12}) = \frac{1}{12} \times \frac{1}{5} = \frac{1}{60}$$
$$ P(H_{20}|y=6) \propto P(y=6|H_{20})P(H_{20}) = \frac{1}{20} \times \frac{1}{5} = \frac{1}{100}$$

In [None]:
## Note that these products omit the normalizing constant (the total
## probability of the data), so they do not yet sum to 1. Our results
## are correct up to a constant of proportionality.

## It is clear that the likeliest die thrown is the 6-sided die,
## but just how likely is it?

## 1/30 / (1/30 + 1/40 + 1/60 + 1/100) is roughly 39.2%.
## 1/40 / (1/30 + 1/40 + 1/60 + 1/100) is roughly 29.4%.
## 1/60 / (1/30 + 1/40 + 1/60 + 1/100) is roughly 19.6%.
## 1/100 / (1/30 + 1/40 + 1/60 + 1/100) is roughly 11.8%.

## There's roughly a four in ten chance that the die we threw was
## the 6-sided die.
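
The normalization above can be checked with exact arithmetic. This is a minimal sketch using Python's `fractions` module; `dice_posterior` and `SIDES` are hypothetical names introduced for illustration, and the uniform prior of 1/5 is taken from the equations above.

```python
from fractions import Fraction

SIDES = [4, 6, 8, 12, 20]

def dice_posterior(roll, sides=SIDES):
    """Exact posterior over the dice after a single roll, with a uniform prior."""
    prior = Fraction(1, len(sides))
    # Likelihood of the roll is 1/n on a fair n-sided die, or 0 if the die
    # does not have that many faces.
    unnorm = {n: prior * (Fraction(1, n) if roll <= n else 0) for n in sides}
    total = sum(unnorm.values())  # normalizing constant P(y)
    return {n: p / total for n, p in unnorm.items()}

posterior = dice_posterior(6)
for n, p in posterior.items():
    print(f"{n:>2}-sided: {p} = {float(p):.1%}")
```

The 6-sided die comes out at exactly 20/51, which matches the 39.2% computed by hand.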

## Dungeons & Dragons Dice Problem 2

There are five dice: a 4-sided die, 6-sided die, 8-sided die, 12-sided die, 20-sided die. You roll the same die six times and get 6, 4, 8, 7, 5, 7. The problem is to predict which die was thrown.

Apply Bayesian analysis to this problem by articulating the hypothesis/hypotheses, the data, and the likelihood. Identify which die you believe to be the thrown die and how likely this is to be the thrown die.

In [None]:
## The hypotheses are:
## H_4: 4-sided die.
## H_6: 6-sided die.
## H_8: 8-sided die.
## H_12: 12-sided die.
## H_20: 20-sided die.

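## The data is: we rolled a 6, 4, 8, 7, 5, 7.

## The rolls are independent, so the likelihood is the product of the
## per-roll likelihoods: P(data|H_n) = (1/n)^6 if every roll is <= n,
## and 0 otherwise. The prior is uniform: P(H_n) = 1/5 for each die.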

$$ P(H_4|\text{data}) \propto P(\text{data}|H_4)P(H_4) = 0 \times \frac{1}{5} = 0$$
$$ P(H_6|\text{data}) \propto P(\text{data}|H_6)P(H_6) = 0 \times \frac{1}{5} = 0$$
$$ P(H_8|\text{data}) \propto P(\text{data}|H_8)P(H_8) = \left(\frac{1}{8}\right)^6 \times \frac{1}{5} = \frac{1}{1{,}310{,}720}$$
$$ P(H_{12}|\text{data}) \propto P(\text{data}|H_{12})P(H_{12}) = \left(\frac{1}{12}\right)^6 \times \frac{1}{5} = \frac{1}{14{,}929{,}920}$$
$$ P(H_{20}|\text{data}) \propto P(\text{data}|H_{20})P(H_{20}) = \left(\frac{1}{20}\right)^6 \times \frac{1}{5} = \frac{1}{320{,}000{,}000}$$

In [None]:
## Note that these products omit the normalizing constant (the total
## probability of the data), so they do not yet sum to 1. Our results
## are correct up to a constant of proportionality.

## It is clear that the likeliest die thrown is now the 8-sided die,
## but just how likely is it?

## 1/1310720 / (1/1310720 + 1/14929920 + 1/320000000) is roughly 91.6%.
## 1/14929920 / (1/1310720 + 1/14929920 + 1/320000000) is roughly 8.0%.
## 1/320000000 / (1/1310720 + 1/14929920 + 1/320000000) is roughly 0.4%.

## There's roughly a nine in ten chance that the die we threw was
## the 8-sided die.
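
The same answer can be reached by updating one roll at a time, treating each posterior as the prior for the next roll. The sketch below illustrates that sequential view. `update` and `belief` are hypothetical names, and floating-point arithmetic is used for brevity.

```python
SIDES = [4, 6, 8, 12, 20]
rolls = [6, 4, 8, 7, 5, 7]

def update(belief, roll):
    """One Bayesian update: multiply by the likelihood of `roll`, then renormalize."""
    unnorm = {n: p * (1 / n if roll <= n else 0.0) for n, p in belief.items()}
    total = sum(unnorm.values())
    return {n: p / total for n, p in unnorm.items()}

belief = {n: 1 / len(SIDES) for n in SIDES}  # uniform prior over the five dice
for roll in rolls:
    belief = update(belief, roll)  # the posterior becomes the next prior

for n, p in belief.items():
    print(f"{n:>2}-sided: {p:.1%}")
```

Because multiplication is commutative, the order of the rolls does not change the final posterior. The first roll of 8 eliminates the 6-sided die, and every further roll shifts weight toward the smallest die that remains possible.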

## M&M Problem

You have two bags of M&Ms. The first bag, created before 1995, has the following color distribution: 30% brown, 20% yellow, 20% red, 10% orange, 10% green, 10% tan. The second bag, created after 1995, has the following color distribution: 24% blue, 20% green, 16% orange, 14% yellow, 12% red, 12% brown.

From one bag, you pull a yellow M&M. The problem is to predict from which bag you pulled the yellow M&M.

Apply Bayesian analysis to this problem by articulating the hypothesis/hypotheses, the data, and the likelihood.

Consider the yellow M&M already pulled (so this is part of your data). From the other bag, you pull a green M&M. Update your posterior appropriately and update your answer to the problem.

In [None]:
## The hypotheses are:
## H_1: Bag 1 is the "pre-1995 bag" and Bag 2 is the "post-1995 bag."
## H_2: Bag 1 is the "post-1995 bag" and Bag 2 is the "pre-1995 bag."

## The data is: we pulled a yellow M&M from Bag 1.

$$ P(H_1|\text{yellow from }B_1) \propto P(\text{yellow from }B_1|H_1)P(H_1) = \frac{1}{5} \times \frac{1}{2} = \frac{1}{10}$$
$$ P(H_2|\text{yellow from }B_1) \propto P(\text{yellow from }B_1|H_2)P(H_2) = \frac{14}{100} \times \frac{1}{2} = \frac{7}{100}$$

In [None]:
## Note that these products omit the normalizing constant (the total
## probability of the data), so they do not yet sum to 1. Our results
## are correct up to a constant of proportionality.

## It is clear that it's currently likelier that H_1 is true, indicating
## Bag 1 is the pre-1995 bag - but just how likely is it?

## 1/10 / (1/10 + 7/100) is roughly 58.8%.
## 7/100 / (1/10 + 7/100) is roughly 41.2%.
## There's roughly a three in five chance that Bag 1 is the pre-1995
## bag.

In [None]:
## Now let's consider the new piece of information: we also pulled a
## green M&M from Bag 2. Our old posterior becomes our new prior. (We
## can reuse the unnormalized values 1/10 and 7/100, since we normalize
## again at the end.)

## The hypotheses are still:
## H_1: Bag 1 is the "pre-1995 bag" and Bag 2 is the "post-1995 bag."
## H_2: Bag 1 is the "post-1995 bag" and Bag 2 is the "pre-1995 bag."

## The data is: we pulled a green M&M from Bag 2.

$$ P(H_1|\text{G from }B_2,\text{Y from }B_1) \propto P(\text{G from }B_2|H_1,\text{Y from }B_1)P(H_1|\text{Y from }B_1) = \frac{1}{5} \times \frac{1}{10} = \frac{1}{50}$$
$$ P(H_2|\text{G from }B_2,\text{Y from }B_1) \propto P(\text{G from }B_2|H_2,\text{Y from }B_1)P(H_2|\text{Y from }B_1) = \frac{1}{10} \times \frac{7}{100} = \frac{7}{1000}$$

In [None]:
## Note that these products omit the normalizing constant (the total
## probability of the data), so they do not yet sum to 1. Our results
## are correct up to a constant of proportionality.

## H_1 is still the likelier hypothesis, and the green M&M has made it
## more so. Bag 1 is probably the pre-1995 bag - but just how likely is it?

## 1/50 / (1/50 + 7/1000) is roughly 74.1%.
## 7/1000 / (1/50 + 7/1000) is roughly 25.9%.
## There's roughly a three in four chance that Bag 1 is the pre-1995
## bag.
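
Both updates can be reproduced exactly in a few lines. This is a minimal sketch, assuming the color proportions from the problem statement; `PRE_1995`, `POST_1995`, and `normalize` are hypothetical names introduced for illustration.

```python
from fractions import Fraction

# Color proportions from the problem statement (only the two colors we need).
PRE_1995 = {"yellow": Fraction(20, 100), "green": Fraction(10, 100)}
POST_1995 = {"yellow": Fraction(14, 100), "green": Fraction(20, 100)}

def normalize(weights):
    """Rescale unnormalized posterior weights so they sum to 1."""
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

# H_1: Bag 1 is pre-1995 and Bag 2 is post-1995. H_2: the reverse.
prior = {"H_1": Fraction(1, 2), "H_2": Fraction(1, 2)}

# Update 1: a yellow M&M from Bag 1.
after_yellow = normalize({
    "H_1": prior["H_1"] * PRE_1995["yellow"],
    "H_2": prior["H_2"] * POST_1995["yellow"],
})

# Update 2: a green M&M from Bag 2, using the previous posterior as the prior.
after_green = normalize({
    "H_1": after_yellow["H_1"] * POST_1995["green"],
    "H_2": after_yellow["H_2"] * PRE_1995["green"],
})

print("P(H_1 | yellow)        =", after_yellow["H_1"], f"= {float(after_yellow['H_1']):.1%}")
print("P(H_1 | yellow, green) =", after_green["H_1"], f"= {float(after_green['H_1']):.1%}")
```

Normalizing after the first update or only at the end gives the same final posterior, which is why the hand calculation can reuse the unnormalized values.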