**Exercise:** Suppose you are giving a talk in a large lecture hall and the fire marshal interrupts because they think the audience exceeds 1200 people, which is the safe capacity of the room. 

You think there are fewer than 1200 people, and you offer to prove it.
It would take too long to count, so you try an experiment:

* You ask how many people were born on May 11, and two people raise their hands.
* You ask how many were born on May 23, and one person raises their hand.
* Finally, you ask how many were born on August 1, and no one raises their hand.

How many people are in the audience? What is the probability that there are more than 1200 people?
Hint: Remember the binomial distribution.
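
To make the hint concrete: if the audience has `n` people and birthdays are spread uniformly over 365 days (an assumption that ignores leap years and seasonal effects), the number of people born on one particular day follows a binomial distribution with parameters `n` and `1/365`. The sketch below spells that out by hand. The names `birthday_likelihood` and `joint_likelihood` are illustrative, and multiplying the three answers together treats them as independent, which is an approximation.

```python
from math import comb

P_DAY = 1 / 365  # assumed: birthdays uniform over 365 days, leap years ignored


def birthday_likelihood(k, n, p=P_DAY):
    """Probability that exactly k of n people were born on one given day."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def joint_likelihood(n):
    """Likelihood of the three answers (2, 1 and 0 raised hands) for audience size n.

    The three counts are treated as independent, which is an approximation.
    """
    return (
        birthday_likelihood(2, n)
        * birthday_likelihood(1, n)
        * birthday_likelihood(0, n)
    )


# Compare a few candidate audience sizes
for n in (300, 600, 1200):
    print(n, joint_likelihood(n))
```

Larger likelihood values mark audience sizes that explain the three answers better; normalizing these values over a range of `n` under a uniform prior gives a posterior distribution.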

In [None]:
import numpy as np
from scipy.stats import binom
import seaborn as sns

# The hypothesis space represents the possible audience sizes in the lecture hall.
# We assume at least 100 people (we would notice otherwise) and fewer than 2000
# (as the posterior will show, having more than 1200 people is somewhat unlikely).
hs = np.arange(100, 2000)

# We choose a uniform prior, treating every audience size as equally likely beforehand
prior = np.full(hs.size, 1)

# Compute the likelihoods making use of binomial dist
like0 = binom.pmf(0, hs, 1/365)
like1 = binom.pmf(1, hs, 1/365)

# What binom.pmf(2, hs, 1/365) computes for each candidate size n in hs:
# the number of distinct pairs that can be formed from n people,
# times the probability that both people in a pair were born on the given day, (1/365)**2,
# times the probability that the remaining n - 2 people were not, (364/365)**(n - 2)
like2 = binom.pmf(2, hs, 1/365)

# Compute posterior
posterior = prior * like0 * like1 * like2
posterior /= posterior.sum()

# Probability of having more than 1200 people in the hall
prob_gt_1200 = posterior[hs > 1200].sum().round(3)

expected_population = int((hs * posterior).sum())  # posterior mean, truncated to an integer

# 90% credible interval: from the 5th to the 95th percentile of the posterior
cdf = posterior.cumsum()
lower_index = np.where(cdf >= .05)[0][0]
upper_index = np.where(cdf >= .95)[0][0]

print(f'The probability of having more than 1200 people in the hall is: {prob_gt_1200}')
print(f'The expected population in the hall is {expected_population}')
print(f'90% of the posterior probability lies between {hs[lower_index]} and {hs[upper_index]} people')

sns.set()
sns.lineplot(x=hs, y=posterior, label="posterior");
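
The percentile lookup on the cumulative posterior can also be written with `np.searchsorted`, which returns the first index at which the CDF reaches a given level. The following is a minimal, self-contained sketch of that idea. The names `credible_interval`, `sizes` and `post` are illustrative and not part of the cell above, and the posterior is rebuilt under the same assumptions (uniform prior over 100 to 1999 people, birthday probability 1/365).

```python
import numpy as np
from scipy.stats import binom


def credible_interval(values, pmf, mass=0.9):
    """Central interval holding roughly `mass` of a discrete distribution."""
    cdf = np.cumsum(pmf)
    tail = (1 - mass) / 2
    lo = np.searchsorted(cdf, tail)      # first index where cdf >= tail
    hi = np.searchsorted(cdf, 1 - tail)  # first index where cdf >= 1 - tail
    hi = min(hi, len(values) - 1)        # guard against floating-point round-off
    return values[lo], values[hi]


# Rebuild the posterior: uniform prior times the three binomial likelihoods
sizes = np.arange(100, 2000)
post = (
    binom.pmf(0, sizes, 1 / 365)
    * binom.pmf(1, sizes, 1 / 365)
    * binom.pmf(2, sizes, 1 / 365)
)
post /= post.sum()

low, high = credible_interval(sizes, post)
print(f"90% credible interval: {low} to {high} people")
```

Passing `mass=0.95` gives a 95% interval by cutting 2.5% from each tail instead of 5%.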

In [None]:
(hs * posterior).sum()


In [None]:
binom.pmf(2, 23, 1/365)

In [None]:
(1/365)**3, (364/365)

In [None]:
binom.pmf(5, 10, .5)