In [1]:
# Imports
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.io import templates
from scipy.stats import norm

from assignment1_utils import *

%load_ext autoreload
%autoreload 2

templates.default = "simple_white"


## Exercise 1
### Question
**Using the football dataset, estimate the following conditional probabilities in two different ways (one through relative frequencies and one using an approximated distribution):**

- P1: Pr(Favorite wins | point spread = 8)
- P2: Pr(Favorite wins by at least 8 points | point spread = 8)
- P3: Pr(Favorite wins by at least 8 points | point spread = 8 and favorite wins)

___
### Answer


Here are the results:

In [2]:
football = Football()
football.display_results()



| | Prob (freq) | Prob (approx) |
|---|---|---|
| P1 | 0.755102 | 0.726033 |
| P2 | 0.44898 | 0.506582 |
| P3 | 0.594595 | 0.69774 |




#### Using relative frequencies

We apply the following formula for conditional probabilities:
\begin{equation*}
    \mathbb{P} \left( X | Y \right) = \frac{\mathbb{P} \left( X \cap Y \right)}{\mathbb{P} \left( Y \right)}
\end{equation*}
This is all we need for this section. The details of the computations are available in the `.py` file.
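
As a rough illustration of the relative-frequency approach, the sketch below estimates the three conditional probabilities by counting rows. The toy data and the column names `spread` and `outcome` are assumptions made for this example; the real dataset and the actual implementation are in the `.py` file.

```python
import pandas as pd

# Hypothetical toy data; the real football dataset and its column names
# are defined in assignment1_utils and may differ.
games = pd.DataFrame(
    {
        "spread": [8, 8, 8, 8, 3, 3],
        "outcome": [10, 3, -2, 8, 1, -4],  # favorite's score minus underdog's
    }
)


def relative_frequency(event, given):
    """Estimate Pr(event | given) as count(event and given) / count(given)."""
    n_given = given.sum()
    if n_given == 0:
        raise ValueError("The conditioning event never occurs in the data.")
    return (event & given).sum() / n_given


spread_is_8 = games["spread"] == 8
favorite_wins = games["outcome"] > 0
wins_by_8 = games["outcome"] >= 8

p1 = relative_frequency(favorite_wins, spread_is_8)
p2 = relative_frequency(wins_by_8, spread_is_8)
p3 = relative_frequency(wins_by_8, spread_is_8 & favorite_wins)
print(p1, p2, p3)
```

Dividing the two counts is the same as dividing the two relative frequencies in the formula, since the total number of games cancels.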


#### Using approximated distribution
##### Computation of 1
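Let $y$ be the game outcome (favorite's score minus underdog's score), $x$ the point spread, and $z := y - x$. We assume that $z$ is independent of $x$, which is what lets us drop the conditioning in the last step. \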
Then we write the probability of interest as:
\begin{align*}
\mathbb{P}(y \geq 0 \mid x = 8)
&= \mathbb{P}(y - x \geq -x \mid x = 8) \\
&= \mathbb{P}(z \geq -x \mid x = 8) \\
&= 1 - \mathbb{P}(z < -x \mid x = 8) \\
&= 1 - \mathbb{P}(z < -8) \\
\end{align*}

##### Computation of 2
Let $z := y - x$. \
Then we write the probability of interest as:
\begin{align*}
\mathbb{P}(y \geq 8 \mid x = 8)
&= \mathbb{P}(z \geq 8 - x \mid x = 8) \\
&= 1 - \mathbb{P}(z < 8 - x \mid x = 8) \\
&= 1 - \mathbb{P}(z < 0) \\
\end{align*}

##### Computation of 3
Let $z := y - x$. \
Then we write the probability of interest as:
\begin{align*}
\mathbb{P}(y \geq 8 \mid x = 8, y \geq 0)
&= \mathbb{P}(z \geq 8 - x \mid x = 8, z + x \geq 0) \\
&= \mathbb{P}(z \geq 0 \mid z \geq -8) \\
&= 1 - \mathbb{P}(z < 0 \mid z \geq -8) \\
&= 1 - \frac{\mathbb{P}(z < 0, z \geq -8)}{\mathbb{P}(z \geq -8)} \\
&= 1 - \frac{\mathbb{P}(-8 \leq z < 0)}{1 - \mathbb{P}(z < -8)} \\
\end{align*}
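
The three expressions above can be evaluated with `scipy.stats.norm` once a distribution for $z$ is chosen. The sketch below assumes $z$ is normal with mean 0 and standard deviation 14; these values are placeholders for illustration, not the parameters fitted in the `.py` file, so the numbers will not match the table of results exactly.

```python
from scipy.stats import norm

# Assumed for illustration only: z = outcome - spread ~ Normal(mu, sigma),
# independent of the spread. Placeholder values, not the fitted ones.
mu, sigma = 0.0, 14.0
spread = 8

z = norm(loc=mu, scale=sigma)

# P1 = 1 - P(z < -8)
p1 = 1 - z.cdf(-spread)
# P2 = 1 - P(z < 0)
p2 = 1 - z.cdf(0)
# P3 = 1 - P(-8 <= z < 0) / (1 - P(z < -8))
p3 = 1 - (z.cdf(0) - z.cdf(-spread)) / (1 - z.cdf(-spread))

print(f"P1={p1:.4f}, P2={p2:.4f}, P3={p3:.4f}")
```

Note that P3 always equals P2 / P1 here, because winning by at least 8 points implies winning.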

## Exercise 2
### Question

A random sample of $n$ students is drawn from a large population, and their weights are measured. The average weight of the $n$ sampled students is $y^{mean} = 70$ kg. We assume that the weights in the population are normally distributed with unknown mean $\theta$, and known standard deviation 10 kg. Suppose your prior distribution for $\theta$ is normal with mean 80 kg and standard deviation 15 kg.

**1) Give the posterior distribution of $\theta$ (the answer will be a function of n).**


Let:
\begin{align*}
\mu &:= \theta \\
\sigma &:= 10 \\
\mu_0 &:= 80 \\
\tau_0 &:= 15 \\
\bar{y} &:= y^{mean} = 70 \\
\end{align*}
Then, we get from the question above:
\begin{align*}
y &\sim \mathcal{N}(\mu, \sigma^2) \\
\theta &\sim \mathcal{N}(\mu_0, \tau_0^2)
\end{align*}
Now, the exercise asks us to find the posterior. Following Bayes' rule, that is:
\begin{align*}
\mathbb{P}(\theta | y)
&= \frac{\mathbb{P}(y | \theta) \mathbb{P}(\theta)}{\mathbb{P}(y)} \\
\end{align*}


We skip the long computation that was done in class (lecture 02, "The maths guts of Bayesian inference in Gaussian models" & "Multiple Gaussian observations"). \
Applying that result, we have $\theta \mid y \sim \mathcal{N}(\mu_n,\sigma_n^2)$, with:

\begin{align*}
&
\begin{cases}
\mu_n = \left[\frac{n \bar{y}}{\sigma^2} + \frac{\mu_0}{\tau_0^2}\right] / \left[\frac{n}{\sigma^2} + \frac{1}{\tau_0^2}\right] \\
\sigma_n^2 = 1 / \left[ \frac{n}{\sigma^2} + \frac{1}{\tau_0^2} \right] \\
\end{cases} \\
\implies&
\begin{cases}
\mu_n = \left[ \frac{n \bar{y} \tau_0^2 + \mu_0\sigma^2}{\sigma^2 \tau_0^2} \right] / \left[ \frac{n \tau_0^2 + \sigma^2}{\sigma^2 \tau_0^2} \right] \\
\sigma_n^2 = \frac{1}{\frac{n \tau_0^2 + \sigma^2}{\sigma^2 \tau_0^2}} \\
\end{cases} \\
\implies&
\begin{cases}
\mu_n = \frac{n \bar{y} \tau_0^2 + \mu_0\sigma^2}{n \tau_0^2 + \sigma^2} \\
\sigma_n^2 = \frac{\sigma^2 \tau_0^2}{n \tau_0^2 + \sigma^2} \\
\end{cases} \\
\end{align*}



**2) For n=10, and n=100, give a 95% posterior interval for $\theta$.**

In [3]:
posterior10 = Posterior(n_obs=10)
posterior100 = Posterior(n_obs=100)
print(posterior10, posterior100, sep=2 * "\n")


For 10 observations, we have:
	- The posterior mean for theta is: 70.4255
	- The posterior std for theta is: 3.0943
	- The centered 95% confidence interval for theta is: [64.3609, 76.4902]

For 100 observations, we have:
	- The posterior mean for theta is: 70.0442
	- The posterior std for theta is: 0.9978
	- The centered 95% confidence interval for theta is: [68.0886, 71.9999]
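
The numbers above can be cross-checked directly from the closed-form expressions for $\mu_n$ and $\sigma_n^2$. The functions `normal_posterior` and `posterior_interval` below are hypothetical stand-alone helpers written for this check; they are not the `Posterior` class from `assignment1_utils`.

```python
import math

from scipy.stats import norm


def normal_posterior(n, y_bar=70.0, sigma=10.0, mu_0=80.0, tau_0=15.0):
    """Posterior mean and std of theta for a normal prior and known sigma."""
    precision = n / sigma**2 + 1 / tau_0**2  # posterior precision 1 / sigma_n^2
    mu_n = (n * y_bar / sigma**2 + mu_0 / tau_0**2) / precision
    sigma_n = math.sqrt(1 / precision)
    return mu_n, sigma_n


def posterior_interval(n, level=0.95):
    """Central posterior interval for theta at the given level."""
    mu_n, sigma_n = normal_posterior(n)
    return norm.interval(level, loc=mu_n, scale=sigma_n)


for n in (10, 100):
    mu_n, sigma_n = normal_posterior(n)
    lower, upper = posterior_interval(n)
    print(f"n={n}: mean={mu_n:.4f}, std={sigma_n:.4f}, 95%=[{lower:.4f}, {upper:.4f}]")
```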


#### Note
We notice that increasing the sample size:
1. Reduces the posterior standard deviation, and hence the width of the 95% posterior interval. This follows directly from $\sigma_n^2 = \frac{\sigma^2 \tau_0^2}{n \tau_0^2 + \sigma^2}$, which decreases roughly like $\sigma^2 / n$.
1. Shifts the posterior mean of $\theta$ towards $\bar{y}$. The more data we have, the more we trust the empirical mean and the further we can move away from our prior.

We can go further on the second point. From the expression:
$$\mu_n = \frac{n \bar{y} \tau_0^2 + \mu_0\sigma^2}{n \tau_0^2 + \sigma^2}$$
we see that as $n \to \infty$ the contribution of the prior vanishes and $\mu_n$ converges to $\bar{y}$.
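
One way to make this limit explicit is to rewrite $\mu_n$ as a weighted average of the data mean and the prior mean. The weight $w_n$ is notation introduced here for convenience:
\begin{align*}
\mu_n &= w_n \bar{y} + (1 - w_n) \mu_0,
\qquad w_n := \frac{n \tau_0^2}{n \tau_0^2 + \sigma^2} = \frac{\tau_0^2}{\tau_0^2 + \sigma^2 / n} \\
\lim_{n \to \infty} w_n &= 1
\quad \implies \quad
\lim_{n \to \infty} \mu_n = \bar{y}
\end{align*}
With the values of this exercise, $w_{10} = \frac{2250}{2350} \approx 0.957$ and $w_{100} = \frac{22500}{22600} \approx 0.996$, which matches the posterior means 70.4255 and 70.0442 found above.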

## Exercise 3
### Question

Suppose your prior distribution for $\theta$, the proportion of Californians who support the death penalty, is Beta with mean 0.6 and standard deviation 0.3.

**1) Determine the parameters $\alpha$ and $\beta$ of your prior distribution and plot it.**

___
### Answer

#### 1)
We know that for a random variable $X \sim \mathcal{B} \left( \alpha, \beta \right)$, the expectation is given by:
\begin{equation*}
    \mathbb{E}[X] = \frac{\alpha}{\alpha + \beta}
\end{equation*}
and the variance is given by:
\begin{equation*}
    \mathbb{V}[X] = \sigma^2[X] = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}
\end{equation*}
Therefore, we have a system of two equations in two unknowns, which we can solve:
\begin{align*}
    &
    \begin{cases}
        \mathbb{E}[X] = \frac{\alpha}{\alpha + \beta} \\
        \sigma^2[X] = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}
    \end{cases} \\
    \implies&
    \begin{cases}
        0.6 = \frac{\alpha}{\alpha + \beta} \\
        0.3^2 = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}
    \end{cases} \\
    \implies&
    \begin{cases}
        0.6 \alpha + 0.6 \beta = \alpha \\
        0.3^2 = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}
    \end{cases} \\
    \implies&
    \begin{cases}
        0.4 \alpha = 0.6 \beta \\
        0.3^2 = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}
    \end{cases} \\
    \implies&
    \begin{cases}
        \alpha = \frac{3}{2} \beta \\
        0.3^2 = \frac{\frac{3}{2} \beta^2}{(\frac{3}{2} \beta + \beta)^2 (\frac{3}{2} \beta + \beta + 1)}
    \end{cases} \\
    \implies&
    \begin{cases}
        \alpha = \frac{3}{2} \beta \\
        0.3^2 = \frac{\frac{3}{2} \beta^2}{(\frac{5}{2} \beta)^2 (\frac{5}{2} \beta + 1)}
    \end{cases} \\
    \implies&
    \begin{cases}
        \alpha = \frac{3}{2} \beta \\
        0.3^2 = \frac{3}{\frac{25}{2} (\frac{5}{2} \beta + 1)}
    \end{cases} \\
    \implies&
    \begin{cases}
        \alpha = \frac{3}{2} \beta \\
        \frac{9}{100} = \frac{12}{25 (5 \beta + 2)}
    \end{cases} \\
    \implies&
    \begin{cases}
        \alpha = \frac{3}{2} \beta \\
        5 \beta + 2 = 12 \times \frac{4}{9}
    \end{cases} \\
    \implies&
    \begin{cases}
        \alpha = \frac{3}{2} \beta \\
        5 \beta = \frac{16 - 6}{3}
    \end{cases} \\
    \implies&
    \begin{cases}
        \alpha = \frac{3}{2} \beta \\
        \beta = \frac{2}{3}
    \end{cases} \\
    \implies&
    \begin{cases}
        \alpha = 1 \\
        \beta = \frac{2}{3}
    \end{cases} \\
\end{align*}
So we get the pair $(\alpha, \beta) = \left(1, \frac{2}{3} \right)$.
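
The same result can be obtained numerically with the general method-of-moments solution $\nu = \frac{\mu (1 - \mu)}{\sigma^2} - 1$, $\alpha = \mu \nu$, $\beta = (1 - \mu) \nu$. The helper `beta_from_moments` below is a hypothetical name used only for this check.

```python
from scipy.stats import beta


def beta_from_moments(mean, std):
    """Return (alpha, beta) of the Beta distribution with this mean and std."""
    var = std**2
    if not 0 < var < mean * (1 - mean):
        raise ValueError("No Beta distribution has these moments.")
    nu = mean * (1 - mean) / var - 1  # nu = alpha + beta
    return mean * nu, (1 - mean) * nu


a, b = beta_from_moments(mean=0.6, std=0.3)
print(a, b)

# Cross-check against scipy's own moments.
prior = beta(a, b)
print(prior.mean(), prior.std())
```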

Now we plot this distribution.

In [4]:
a = 1
b = 2 / 3


plot_beta_pdf(a=a, b=b)


**2) A random sample of 1000 Californians is taken, 65% support the death penalty. What are your posterior mean and variance ? Plot the posterior density function.**

Let $\theta$ be the probability that a given Californian supports the death penalty. We determined that the prior distribution of $\theta$ is given by:
$$
\theta \sim Beta \left(1, \frac{2}{3} \right)
$$
therefore we have:
\begin{align*}
\mathbb{P}(\theta)
&= Beta(\alpha,\beta) \\
& \propto \theta^{\alpha-1}(1-\theta)^{\beta-1}
\end{align*}
Let $y$ be the random variable that describes the number of Californians in favor of the death penalty. Under the assumption that drawings from the population of Californians are independent, $y|\theta$ follows a binomial distribution. Hence, with:
- $n=1000$ the sample size
- $k=650$ the number of positive events (being in favor of the death penalty in our case)

the likelihood is given by:
$$
\mathbb{P}(y=k|\theta) = \binom{n}{k}\theta^{k}(1-\theta)^{n-k}
$$
Therefore, by Bayes' theorem, the posterior of $\theta$ is given by:
\begin{align*}
\mathbb{P}(\theta|y)
&\propto \mathbb{P}(y|\theta) \, \mathbb{P}(\theta) \\
&\propto \theta^{k}(1-\theta)^{n-k} \, \theta^{\alpha-1}(1-\theta)^{\beta-1} \\
&= \theta^{\alpha + k - 1}(1-\theta)^{\beta + n - k - 1} \\
\end{align*}
We recognize the kernel of a Beta distribution with parameters $\alpha + k$ and $\beta + n - k$. That is:
$$
\theta | y \sim Beta(\alpha + k, \beta + n - k)
$$
Substituting $\alpha = 1, \beta = \frac{2}{3}, n = 1000, k = 650$, we get:
$$
\theta | y \sim Beta(651, \frac{1052}{3})
$$
Let us define $\alpha' := 651$ and $\beta' := \frac{1052}{3}$. The posterior mean is given by:
\begin{align*}
\mathbb{E}[\theta | y]
&= \frac{\alpha'}{\alpha' + \beta'} \\
&= \frac{651}{651 + \frac{1052}{3}} \\
&= \frac{651}{\frac{3005}{3}} \\
&= \frac{1953}{3005} \\
&\approx 0.6499
\end{align*}
and the posterior variance is given by:
\begin{align*}
\mathbb{V}[\theta | y]
&= \frac{\alpha' \beta'}{(\alpha' + \beta')^2 (\alpha' + \beta' + 1)} \\
&= \frac{651 \times \frac{1052}{3}}{\left(651 + \frac{1052}{3}\right)^2 \left(651 + \frac{1052}{3} + 1\right)} \\
&= \frac{228284}{\left(\frac{3005}{3}\right)^2 \left(\frac{3008}{3}\right)} \\
&= 228284 \left(\frac{3}{3005}\right)^2 \left(\frac{3}{3008}\right) \\
&= \frac{1\ 540\ 917}{6\ 790\ 578\ 800} \\
&\approx 2.27 \times 10^{-4}
\end{align*}
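
The fractions above are easy to get wrong by hand, so here is a small check using exact rational arithmetic. The helper names are hypothetical and chosen only for this sketch.

```python
from fractions import Fraction


def beta_binomial_update(a, b, n, k):
    """Conjugate update: Beta(a, b) prior, k successes out of n trials."""
    return a + k, b + n - k


def beta_mean(a, b):
    return a / (a + b)


def beta_var(a, b):
    return a * b / ((a + b) ** 2 * (a + b + 1))


post_a, post_b = beta_binomial_update(Fraction(1), Fraction(2, 3), n=1000, k=650)
post_mean = beta_mean(post_a, post_b)
post_var = beta_var(post_a, post_b)

print(post_a, post_b)
print(post_mean, float(post_mean))
print(post_var, float(post_var))
```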

In [5]:
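sample_size = 1000
prop_support = 0.65
n_support = round(sample_size * prop_support)  # k = 650 supporters

# Conjugate update of the Beta(1, 2/3) prior: Beta(alpha + k, beta + n - k)
a = 1 + n_support
b = 2 / 3 + sample_size - n_support
fig = plot_beta_pdf(a=a, b=b)
fig.update_layout(title=f"Posterior distribution: Beta({a}, {b:.2f})")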

**3) Examine the impact of the prior parameters on the posterior distribution through different statistics (e.g. mean, median, 95% posterior interval).**

*<u>Note</u>: The following plot contains a slider that lets you modify $\alpha$ for a fixed $\beta$, so you can explore how the distribution changes with $\alpha$.*

In [6]:
slider_plot(b=b)







In [11]:
df = compute_beta_stats()
df

| | prior_alpha | prior_beta | post_alpha | post_beta | post mean | post median | post 95% CI lower bound | post 95% CI upper bound | post 95% CI spread |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 651 | 351 | 0.649701 | 0.649800 | 0.619901 | 0.678934 | 0.059033 |
| 1 | 1 | 2 | 651 | 352 | 0.649053 | 0.649152 | 0.619257 | 0.678286 | 0.059029 |
| 2 | 1 | 5 | 651 | 355 | 0.647117 | 0.647215 | 0.617333 | 0.676348 | 0.059015 |
| 3 | 1 | 10 | 651 | 360 | 0.643917 | 0.644012 | 0.614153 | 0.673142 | 0.058989 |
| 4 | 1 | 15 | 651 | 365 | 0.640748 | 0.640840 | 0.611006 | 0.669966 | 0.058960 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 139 | 50 | 30 | 700 | 380 | 0.648148 | 0.648240 | 0.619427 | 0.676350 | 0.056922 |
| 140 | 50 | 35 | 700 | 385 | 0.645161 | 0.645251 | 0.616458 | 0.673358 | 0.056900 |
| 141 | 50 | 40 | 700 | 390 | 0.642202 | 0.642289 | 0.613517 | 0.670392 | 0.056875 |
| 142 | 50 | 45 | 700 | 395 | 0.639269 | 0.639354 | 0.610605 | 0.667452 | 0.056847 |
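
For readers without access to `assignment1_utils`, the sketch below shows one way such a table could be computed with `scipy.stats.beta`. It assumes an equal-tailed 95% interval and uses a much smaller grid of priors; `posterior_stats` is a hypothetical helper, not the actual `compute_beta_stats` implementation.

```python
import pandas as pd
from scipy.stats import beta


def posterior_stats(prior_a, prior_b, n=1000, k=650, level=0.95):
    """Posterior summary for a Beta(prior_a, prior_b) prior and k successes in n."""
    post_a, post_b = prior_a + k, prior_b + n - k
    dist = beta(post_a, post_b)
    # Equal-tailed interval (an assumption about how the interval is defined).
    lower, upper = dist.ppf([(1 - level) / 2, (1 + level) / 2])
    return {
        "prior_alpha": prior_a,
        "prior_beta": prior_b,
        "post mean": dist.mean(),
        "post median": dist.median(),
        "lower": lower,
        "upper": upper,
        "spread": upper - lower,
    }


stats = pd.DataFrame(
    [posterior_stats(a, b) for a in (1, 50) for b in (1, 10, 45)]
)
print(stats)
```

With 1000 observations, even the strongest prior in the grid only moves the posterior mean by about 0.01, which is the main takeaway of the table.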


*<u>Note:</u> The following 3D plot shows the posterior mean as a function of the prior parameters $\alpha$ and $\beta$*.

In [9]:
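df = compute_beta_stats()
# Rows are indexed by the prior alpha, columns by the prior beta.
df_mean_pivoted = df.pivot(index="prior_alpha", columns="prior_beta", values="post mean")
# go.Surface maps the columns of z to x and the rows of z to y.
surface = go.Surface(z=df_mean_pivoted.values, x=df_mean_pivoted.columns, y=df_mean_pivoted.index)
fig = go.Figure(surface)
fig.update_layout(scene=dict(xaxis_title="prior beta", yaxis_title="prior alpha", zaxis_title="posterior mean"))
fig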

## Exercise 4

**1) Which of the expressions below correspond to the statement: *the probability of rain on Monday* ?**

- Pr(rain)
- **Pr(rain|Monday)** <--
- Pr(Monday|rain)
- **Pr(rain, Monday) / Pr(Monday)** <--


**2) Which of the following statements corresponds to the expression: *Pr(Monday|rain)* ?**

- The probability of rain on Monday.
- The probability of rain, given that it is Monday.
- **The probability that it is Monday, given that it is raining.** <--
- The probability that it is Monday and it is raining.


**3) Which of the expressions below correspond to the statement: *the probability that it is Monday, given that it is raining* ?**

- **Pr(Monday|rain)** <--
- Pr(rain|Monday)
- Pr(rain | Monday)Pr(Monday)
- **Pr(rain | Monday)Pr(Monday)/Pr(rain)** <--
- Pr(Monday|rain)Pr(rain)/Pr(Monday)
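
The identities behind these answers can be checked on a small numerical example. The probabilities below are made up purely for illustration.

```python
from fractions import Fraction

# Hypothetical probabilities, chosen only to illustrate the identities.
p_monday = Fraction(1, 7)
p_rain = Fraction(1, 4)
p_rain_and_monday = Fraction(1, 14)

# Definition of conditional probability.
p_rain_given_monday = p_rain_and_monday / p_monday
p_monday_given_rain = p_rain_and_monday / p_rain

# Bayes' rule: Pr(Monday|rain) = Pr(rain|Monday) Pr(Monday) / Pr(rain)
bayes = p_rain_given_monday * p_monday / p_rain

# The last option in question 3 gives Pr(rain|Monday) instead.
last_option = p_monday_given_rain * p_rain / p_monday

print(p_rain_given_monday, p_monday_given_rain, bayes, last_option)
```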

## Exercise 5

Suppose there are two species of panda bear. Both are equally common in the wild and live in the same places. They look exactly alike and eat the same food, and there is yet no genetic assay capable of telling them apart. They differ however in their family sizes. Species A gives birth to twins 10% of the time, otherwise birthing a single infant. Species B births twins 20% of the time, otherwise birthing singleton infants. Assume these numbers are known with certainty, from many years of field research. Now suppose you are managing a captive panda breeding program. You have a new female panda of unknown species, and she has just given birth to twins. 

**What is the probability that her next birth will also be twins ?**

Let:

- $A$ be the event that the panda is of species A.
- $B$ be the event that the panda is of species B.
- $T$ be the event that the panda has twins.
- $S$ be the event that the panda has a singleton infant.
- $T_P$ be the event that the female panda gave birth to twins in the past.
- $T_F$ be the event that the female panda will give birth to twins in her next litter.

We get from the question above:
\begin{align*}
\mathbb{P}(A) &= \frac{1}{2} \\
\mathbb{P}(B) &= \frac{1}{2} \\
\mathbb{P}(T|A) &= \frac{1}{10} \\
\mathbb{P}(T|B) &= \frac{2}{10} \\
\end{align*}

We want to find the following:
\begin{align*}
\mathbb{P}(T_F | T_P)
&= \frac{\mathbb{P}(T_P, T_F)}{\mathbb{P}(T_P)} \\
&= \frac{\mathbb{P} (T_P, T_F, A) + \mathbb{P} (T_P, T_F, B)}{\mathbb{P} (T_P, A) + \mathbb{P} (T_P, B)} \\
&= \frac{\mathbb{P}(T_P, T_F|A)\mathbb{P}(A) + \mathbb{P}(T_P, T_F|B)\mathbb{P}(B)}{\mathbb{P}(T_P, A) + \mathbb{P}(T_P, B)} \\
&= \frac{\mathbb{P}(T_P|A)\mathbb{P}(T_F|A)\mathbb{P}(A) + \mathbb{P}(T_P|B)\mathbb{P}(T_F|B)\mathbb{P}(B)}{\mathbb{P}(T_P, A) + \mathbb{P}(T_P, B)} \\
&= \frac{\frac{1}{10^2}\frac{1}{2} + \frac{2^2}{10^2}\frac{1}{2}}{\frac{1}{10}\frac{1}{2} + \frac{2}{10}\frac{1}{2}} \\
&= \frac{\frac{5}{200}}{\frac{3}{20}} \\
&= \frac{1}{6} \\
\end{align*}
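The fourth equality uses the assumption that, given the species, successive births are independent. Therefore, the probability that this female panda's next birth is also twins is $\frac{1}{6}$.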
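
The same result can be reached in two steps, by first updating the belief about the species and then averaging the twin rates. The sketch below uses exact fractions; the variable names are chosen for this example only.

```python
from fractions import Fraction

prior = {"A": Fraction(1, 2), "B": Fraction(1, 2)}
p_twins = {"A": Fraction(1, 10), "B": Fraction(2, 10)}

# Pr(T_P): marginal probability of the observed twin birth.
evidence = sum(prior[s] * p_twins[s] for s in prior)

# Pr(species | T_P): posterior over species after observing twins.
posterior = {s: prior[s] * p_twins[s] / evidence for s in prior}

# Pr(T_F | T_P): average the twin rates over the posterior, assuming
# births are independent given the species.
p_next_twins = sum(posterior[s] * p_twins[s] for s in prior)

print(posterior, p_next_twins)
```

After seeing twins, species B becomes twice as likely as species A, which pulls the predictive probability above the prior value of 3/20.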