1. What expression corresponds to the sentence, _The probability of being sunny given that it is the 9th of July of 1816_?

    p(sunny | 9th of July of 1816)
    
    The product rule gives an equivalent form. It states:
    
    $ p(A, B) = p(A|B)p(B) $
    
    which can be rearranged to:
    
    $ p(A|B) = \frac{p(A, B)}{p(B)} $
    
    If we replace A with "sunny" and B with "9th of July of 1816," then we see that
    
    $ \frac{p(sunny, 9th\ of\ July\ of\ 1816)}{p(9th\ of\ July\ of\ 1816)} = p(sunny | 9th\ of\ July\ of\ 1816) $
    
    Therefore, 
    
    $ \frac{p(sunny, 9th\ of\ July\ of\ 1816)}{p(9th\ of\ July\ of\ 1816)} $ also corresponds to the sentence.

2. Show that the probability of choosing a human at random and picking the Pope is not the same as the probability of
 the Pope being human.
 
    The probabilities of interest are:

    - The probability that you pick the Pope given that you pick a human: p(Pope | human)
    - The probability that you pick a human given that you pick the Pope: p(human | Pope)

    The population of the world as of March 2020 is estimated at 7.8 billion people. The Pope is exactly one of those 
    people. Consequently, 

    $p(Pope | human) = \frac{1}{7{,}800{,}000{,}000}$
        
    Assuming we are not in *Futurama*, then

    $p(human | Pope) = 1$
        
    If we live in *Futurama*, then both probabilities change because the Pope is **not** human.

    $p(human | Pope) = 0$<br/>
    $p(Pope | human) = 0$
    

3. In the following definition of a probabilistic model, identify the prior and the likelihood:

    $y _{i} \sim Normal(\mu, \sigma)$<br/>
    $\mu \sim Normal(0, 10)$<br/>
    $\sigma \sim HalfNormal(25)$<br/>

    - Likelihood 
        - $y _{i} \sim Normal(\mu, \sigma)$<br/>

    - Prior 
        - $\mu \sim Normal(0, 10)$ 
        - $\sigma \sim HalfNormal(25)$<br/>

4. In the previous model, how many parameters will the posterior have? Compare it to the model for the 
coin-flipping problem.

    The posterior in the previous problem has two parameters, $\mu$ and $\sigma$.
    
    In the coin-flipping problem, the posterior only has a single parameter, $\theta$, modeling the 
    bias of the coin.
    

5. Write Bayes' theorem for the model in exercise 3.

    $p(\mu, \sigma | y) = \frac{p(y | \mu, \sigma) p(\mu) p(\sigma)}{p(y)}$
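
    The prior factorizes as $p(\mu) p(\sigma)$ because the model specifies the two priors independently. As a sketch, assuming the observations $y _{i}$ are conditionally independent given $\mu$ and $\sigma$, and writing $N$ for the number of observations (a symbol not used in the exercise), the likelihood and the evidence can be written out as:

    $p(y | \mu, \sigma) = \prod _{i=1}^{N} Normal(y _{i} | \mu, \sigma)$<br/>
    $p(y) = \int \int p(y | \mu, \sigma) p(\mu) p(\sigma) \, d\mu \, d\sigma$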
    

6. Let's suppose we have two coins: when we toss the first coin, half of the time it lands on tails and 
half of the time on heads. The other coin is a _loaded coin_ that always lands on heads. If we take one of the
coins at random and get a head, what is the probability that this coin is the unfair one?

    Here are our symbols for this problem:
    
    - F: fair coin
    - U: unfair coin
    - H: heads
    - T: tails
        
    If we use Bayes' rule to model this problem, we have:
    
    $p(U | H) = \frac{p(H | U) p(U)}{p(H | U) p(U) + p(H | F) p(F)}$
    
    If we then substitute what we know, this equation becomes:
    
    $p(U | H) = \frac{1 \times ^1/_2}{(1 \times ^1/_2) + (^1/_2 \times ^1/_2)}$
    
    Simplifying yields
    
    $ p(U | H) = \frac {^1/_2} {^1/_2 + ^1/_4} $ <br/>
    $ p(U | H) = \frac {^1/_2} {^3/_4} $ <br/>
    $ p(U | H) = \,^2/_3 $
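
The arithmetic above can be checked with exact fractions. This is a minimal sketch using only the standard library; the dictionary names `prior` and `likelihood_heads` are illustrative choices, not part of the exercise.

```python
from fractions import Fraction

# Prior: each coin is equally likely to be picked.
prior = {"F": Fraction(1, 2), "U": Fraction(1, 2)}

# Likelihood of observing heads with each coin.
likelihood_heads = {"F": Fraction(1, 2), "U": Fraction(1)}

# Evidence p(H): sum of likelihood * prior over both coins.
evidence = sum(likelihood_heads[c] * prior[c] for c in prior)

# Posterior p(coin | H) from Bayes' rule.
posterior = {c: likelihood_heads[c] * prior[c] / evidence for c in prior}

print(posterior["U"])  # prints 2/3
```

Using `Fraction` rather than floats keeps the result exact, so it can be compared directly with the hand calculation.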
    

7. Modify the code that generated _Figure 1.5_ in order to add a dotted line showing the observed rate 
(heads / number of tosses). Compare the location of this line to the mode of the posteriors in each subplot.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats

In [None]:
plt.figure(figsize=(10, 8))

n_trials = [0, 1, 2, 3, 4, 8, 16, 32, 50, 150]
data = [0, 1, 1, 1, 1, 4, 6, 9, 13, 48]
theta_actual = 0.35

beta_params = [(1, 1), (20, 20), (1, 4)]  # Uniform, central tendency, weighted toward tails
dist = scipy.stats.beta
x = np.linspace(0, 1, 200)

for ndx, N in enumerate(n_trials):
    if ndx == 0:
        # Plotting the priors
        plt.subplot(4, 3, 2)
        plt.xlabel('Θ')
    else:
        # Plotting the posteriors for different trials
        plt.subplot(4, 3, ndx + 3)
        plt.xticks([])
    y = data[ndx]
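    for (a_prior, b_prior) in beta_params:
        p_theta_given_y = dist.pdf(x, a_prior + y, b_prior + N - y)
        plt.fill_between(x, 0, p_theta_given_y, alpha=0.7)
    if N > 0:
        # Observed rate (heads / tosses), drawn once per subplot;
        # it is undefined for the prior-only panel (N = 0)
        plt.axvline(y / N, ymax=0.3, color='k', linestyle='dotted')
    # Actual value of theta used to generate the data
    plt.axvline(theta_actual, ymax=0.3, color='k')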
    plt.plot(0, 0, label=f'{N:4d} trials\n{y:4d} heads', alpha=0)
    plt.xlim(0, 1)
    plt.ylim(0, 12)
    plt.legend()
    plt.yticks([])
plt.tight_layout()

For very small numbers of trials, the observed rate of heads can sit far from the modes of the posteriors 
built on the non-uniform priors; however, as the number of trials increases and the posteriors approach each 
other, their modes approach the observed rate, even as the observed rate itself approaches the actual rate 
of the generating distribution. 

The convergence of the observed rate toward the actual bias is the law of large numbers at work. The posteriors 
also become approximately Gaussian as the data accumulate, which is analogous to the Central Limit Theorem, and 
for a Gaussian the mean and the mode coincide.
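
The comparison can also be made numerically rather than by eye. The sketch below uses the closed-form mode of a Beta(a, b) distribution, (a - 1) / (a + b - 2), which holds only when a > 1 and b > 1. The helper `beta_mode` and the `rows` list are hypothetical names introduced for illustration, and the N = 0 entry is dropped because the observed rate is undefined there.

```python
# Same data and priors as the plotting cell, without the N = 0 (prior-only) entry.
n_trials = [1, 2, 3, 4, 8, 16, 32, 50, 150]
data = [1, 1, 1, 1, 4, 6, 9, 13, 48]
beta_params = [(1, 1), (20, 20), (1, 4)]


def beta_mode(a, b):
    """Mode of Beta(a, b); the closed form is valid only for a > 1 and b > 1."""
    if a <= 1 or b <= 1:
        return None  # mode sits on a boundary or is not unique
    return (a - 1) / (a + b - 2)


rows = []
for N, y in zip(n_trials, data):
    # Posterior for a Beta(a, b) prior and y heads in N tosses is Beta(a + y, b + N - y).
    modes = [beta_mode(a + y, b + N - y) for a, b in beta_params]
    rows.append((N, y / N, modes))

for N, rate, modes in rows:
    shown = ", ".join("n/a" if m is None else f"{m:.3f}" for m in modes)
    print(f"N={N:3d}  observed={rate:.3f}  modes=[{shown}]")
```

With the uniform Beta(1, 1) prior the formula reduces to y / N, so that posterior's mode equals the observed rate whenever the closed form applies; the other two priors pull the mode away from it, and the pull shrinks as N grows.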
