# Exercise 3.1: Binomial and multinomial models

(a) Let $y=(y_1, \dots, y_J) \mid \theta \sim \text{Multinomial}(\theta_1, \dots, \theta_J)$ where $\sum \theta_j = 1$ and $\theta = (\theta_1, \dots,  \theta_J) \sim \text{Dirichlet}(\alpha_1, \dots, \alpha_J)$. 

Now define $z_i = \alpha_i + y_i$ for all $i = 1, \dots, J$; then by conjugacy we have

\begin{equation*}
\theta \mid y \sim \text{Dirichlet}(z_1, \dots, z_J)
\end{equation*}

and 

\begin{equation*}
\tilde{\theta} = (\theta_1, \theta_2, 1-\theta_1 - \theta_2) \mid y \sim \text{Dirichlet}\Big(z_1, z_2, \sum_{j=3}^J z_j\Big) \:\:\:\: \text{ (cf. Appendix A)}
\end{equation*}

Therefore 

\begin{equation*}
p(\tilde{\theta} \mid y) = p(\theta_1, \theta_2 \mid y) \propto \theta^{z_1 - 1}_1 \theta^{z_2 - 1}_2 (1 - \theta_1 - \theta_2)^{(\sum_{j=3}^J z_j) - 1}
\end{equation*}

Define $\phi$ on $\{(\theta_1, \theta_2) : \theta_1, \theta_2 > 0,\ \theta_1 + \theta_2 < 1\}$, with values in $(0,1)^2$, such that:

\begin{equation*}
\phi(\theta_1, \theta_2) = \left(\frac{\theta_1}{\theta_1 + \theta_2}, \theta_1 + \theta_2 \right) =: (\alpha, \beta)
\end{equation*}

which is a bijection with inverse:

\begin{equation*}
\phi^{-1}(\alpha, \beta) = \big(\alpha\beta, \beta(1-\alpha) \big)
\end{equation*}

Hence 

\begin{equation*}
J_{\phi^{-1}} = \begin{pmatrix}
\beta & -\beta\\
\alpha & (1-\alpha)
\end{pmatrix}
\:\: \Longrightarrow \:\: \big| J_{\phi^{-1}} \big| = \beta
\end{equation*}

Therefore we obtain:

\begin{equation*}
\begin{split}
p(\alpha,\beta\mid y) & = p\big(\phi^{-1}(\alpha, \beta) \mid y \big) \cdot \Big| J_{\phi^{-1}} \Big|\\[5pt]
& \propto (\alpha\beta)^{z_1-1}\beta^{z_2-1}(1-\alpha)^{z_2-1}(1-\beta)^{(\sum_{j=3}^J z_j) - 1}\beta \\[5pt]
& = \alpha^{z_1-1}(1-\alpha)^{z_2-1}\beta^{z_1+z_2-1}(1-\beta)^{(\sum_{j=3}^J z_j) - 1}
\end{split}
\end{equation*}

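The joint density factorizes into a function of $\alpha$ times a function of $\beta$, so marginalizing over $\beta$ only contributes a constant and we get the posterior distribution of $\alpha$: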

\begin{equation*}
\begin{split}
p(\alpha\mid y) & = \int^1_0 p(\alpha,\beta\mid y) d\beta\\
& \propto \alpha^{z_1-1}(1-\alpha)^{z_2-1}\\[10pt]
\Longrightarrow \:\:& \alpha\mid y \sim \text{Beta}(z_1, z_2) = \text{Beta}(\alpha_1 + y_1, \alpha_2 + y_2) 
\end{split}
\end{equation*}
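
The result in (a) can be checked by simulation. The sketch below is only an illustration: the values in `z` are hypothetical posterior parameters $z_j = \alpha_j + y_j$ (not taken from the exercise), and the check compares the first two moments of $\theta_1 / (\theta_1 + \theta_2)$ under Dirichlet draws with those of $\text{Beta}(z_1, z_2)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior parameters z_j = alpha_j + y_j (illustration only)
z = np.array([3.0, 5.0, 2.0, 4.0])

# Draw theta | y ~ Dirichlet(z) and form alpha = theta_1 / (theta_1 + theta_2)
theta = rng.dirichlet(z, size=200_000)
alpha = theta[:, 0] / (theta[:, 0] + theta[:, 1])

# Mean and variance of Beta(z_1, z_2) for comparison
beta_mean = z[0] / (z[0] + z[1])
beta_var = z[0] * z[1] / ((z[0] + z[1]) ** 2 * (z[0] + z[1] + 1))

print(alpha.mean(), beta_mean)
print(alpha.var(), beta_var)
```

The simulated and exact moments should agree up to Monte Carlo error, regardless of the values chosen for $z_3, \dots, z_J$.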

(b) If $y_1 \mid \alpha \sim \text{Binomial}(y_1 + y_2, \alpha)$ and $\alpha \sim \text{Beta}(\alpha_1, \alpha_2)$, then by Beta-Binomial conjugacy
$\alpha \mid y_1 \sim \text{Beta}(\alpha_1 + y_1, \alpha_2 + y_2)$, which is the same posterior as in (a).

# Exercise 3.2: Comparison of two multinomial observations

Let $y=(y_1, y_2, y_3) \mid \theta \sim \text{Multinomial}(\theta_1, \theta_2, \theta_3)$ with $\theta = (\theta_1, \theta_2, \theta_3) \sim \text{Dirichlet}(1,1,1)$, and let $z=(z_1, z_2, z_3) \mid \omega \sim \text{Multinomial}(\omega_1, \omega_2, \omega_3)$ with $\omega = (\omega_1, \omega_2, \omega_3)\sim \text{Dirichlet}(1,1,1)$.

Then $\theta \mid y \sim \text{Dirichlet}(295, 308, 39)$ and $\omega \mid z \sim \text{Dirichlet}(289, 333, 20)$.
Now let $\alpha_1 = \frac{\theta_1}{\theta_1 + \theta_2}$ and $\alpha_2 =\frac{\omega_1}{\omega_1 + \omega_2}$. It follows from Exercise 3.1 that $\alpha_1 \mid y \sim \text{Beta}(295, 308)$ and $\alpha_2 \mid z \sim \text{Beta}(289, 333)$. Sampling from both posteriors:

```python
import numpy as np
import scipy.stats as stats
import plotly.graph_objects as go

a1 = stats.beta(a=295, b=308).rvs(5000)
a2 = stats.beta(a=289, b=333).rvs(5000)

print(np.mean(a2 - a1 > 0))
# 0.1924
```

so the posterior probability that $\alpha_2 > \alpha_1$ is approximately 0.19. The histogram of the sampled values of $\alpha_2 - \alpha_1$ is:

```python
fig = go.Figure(
        go.Histogram(
            x=a2-a1,
            histnorm='probability'         
        )
)

fig.update_layout(
    title={'text': 'Fig 3.1 - Histogram of alpha2 - alpha1',
           'y':0.9, 'x':0.5, 'xanchor': 'center', 'yanchor': 'top'},
    xaxis={'title': 'alpha2 - alpha1'}, yaxis={'title': 'frequencies'})

fig
```
<img src="figures/fig3.1.png">
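
As a deterministic cross-check of the Monte Carlo estimate, the same probability can be written as a one-dimensional integral, $P(\alpha_2 > \alpha_1) = \int_0^1 p(\alpha_1 = x)\, P(\alpha_2 > x)\, dx$, which holds because the two posteriors are independent. The sketch below evaluates it with `scipy.integrate.quad`; passing the posterior mean as a break point is only a precaution for the narrow peak, not something the exercise requires.

```python
from scipy import stats
from scipy.integrate import quad

a1 = stats.beta(a=295, b=308)   # posterior of alpha_1
a2 = stats.beta(a=289, b=333)   # posterior of alpha_2

# Integrand: density of alpha_1 at x times P(alpha_2 > x)
def integrand(x):
    return a1.pdf(x) * a2.sf(x)

# Split the interval at the posterior mean so the narrow peak is not missed
prob, abs_err = quad(integrand, 0, 1, points=[a1.mean()])
print(prob, abs_err)
```

The value should be close to the sampled proportion, with the difference attributable to Monte Carlo error in the 5000 draws.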


# Exercise 3.3: Estimation from two independent experiments

(a) Let's denote with subscript $c$ the control group and subscript $t$ the treatment group.

- _Control group_: $y_c \sim N(\mu_c, \sigma^2_c)$, $n_c=32$, $\bar{y_c}=1.013$, $s_c = 0.24$.
- _Treatment group_: $y_t \sim N(\mu_t, \sigma^2_t)$, $n_t=36$, $\bar{y_t}=1.173$, $s_t =0.2$.

Starting from a noninformative prior distribution, it follows that
\begin{equation*}
\begin{split}
& \mu_c \mid y_c \sim t_{n_c-1}\left( \bar{y_c}, \frac{s^2_c}{n_c} \right) = t_{31}\left( 1.013, \frac{0.24^2}{32}\right) = t_{31}\left( 1.013, (0.0424)^2\right) \\[5pt]
& \mu_t \mid y_t \sim t_{n_t-1}\left( \bar{y_t}, \frac{s^2_t}{n_t} \right) = t_{35}\left( 1.173, \frac{0.20^2}{36}\right) = t_{35}\left( 1.173, (0.0333)^2\right) \\[5pt]
\end{split}
\end{equation*}
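
The scale parameters quoted above are just $s / \sqrt{n}$ for each group; a short check of the arithmetic (variable names are chosen here for illustration):

```python
import math

# Summary statistics from the exercise
n_c, s_c = 32, 0.24   # control group
n_t, s_t = 36, 0.20   # treatment group

# Scale of the posterior t distribution of each mean: s / sqrt(n)
scale_c = s_c / math.sqrt(n_c)
scale_t = s_t / math.sqrt(n_t)

print(round(scale_c, 4), round(scale_t, 4))  # 0.0424 0.0333
```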

(b) Let's sample from these two distributions:

```python
import numpy as np
import scipy.stats as stats
import plotly.graph_objects as go

mu_c = stats.t(df=31, loc=1.013, scale=0.0424).rvs(5000)
mu_t = stats.t(df=35, loc=1.173, scale=0.0333).rvs(5000)

m_diff = mu_t - mu_c 
print(np.percentile(m_diff, [2.5, 97.5]))
# [0.04719944 0.26832581]
```

Based on this sample, the 95% posterior interval for $\mu_t - \mu_c$ is approximately $[0.047, 0.268]$, and its histogram is:

```python
fig = go.Figure(
        go.Histogram(
            x=m_diff,
            histnorm='probability'         
        )
)

fig.update_layout(
    title={'text': 'Fig 3.2 - Histogram of mu_t - mu_c',
           'y':0.9, 'x':0.5, 'xanchor': 'center', 'yanchor': 'top'},
    xaxis={'title': 'mu_t - mu_c'}, yaxis={'title': 'frequencies'})

fig
```
<img src="figures/fig3.2.png">


# Exercise 3.4: Inference for a 2 × 2 table

(a) Let:

- _Control_: $n_0=674, y_0=39$
- _Treatment_: $n_1=680, y_1=22$

\begin{equation*}
\begin{split}
y_i\mid p_i \sim \text{Binomial}(n_i, p_i) \\[5pt]
p_i \sim \text{Beta}\left(\alpha =\frac{1}{2}, \beta =\frac{1}{2}\right)
\end{split}
\end{equation*}

Then 
\begin{equation*}
\begin{split}
p_i \mid y_i \sim \text{Beta}(y_i + \alpha, n_i -y_i + \beta) = \text{Beta}\left(y_i + \frac{1}{2}, n_i -y_i + \frac{1}{2}\right)
\end{split}
\end{equation*}

(b) Define the odds ratio $\tilde{p} = \frac{p_1(1-p_0)}{p_0(1-p_1)}$. Let's sample $p_i\mid y_i$ and plot the histogram of the resulting draws of $\tilde{p}$.

```python
import numpy as np
import scipy.stats as stats
import plotly.graph_objects as go
import pandas as pd
p0 = stats.beta(a=39+0.5, b=674-39+0.5).rvs(5000)
p1 = stats.beta(a=22+0.5, b=680-22+0.5).rvs(5000)

p = (p1*(1-p0))/(p0*(1-p1))
print(pd.DataFrame(p).describe().T)
```
|   count |     mean |      std |      min |      25% |      50% |      75% |     max |
|--------:|---------:|---------:|---------:|---------:|---------:|---------:|--------:|
|    5000 | 0.565959 | 0.155294 | 0.207334 | 0.454819 | 0.546833 | 0.657715 | 1.40768 |

```python
fig = go.Figure(
        go.Histogram(
            x=p,
            histnorm='probability'         
        )
)

fig.update_layout(
    title={'text': 'Fig 3.3 - Histogram of odds ratio',
           'y':0.9, 'x':0.5, 'xanchor': 'center', 'yanchor': 'top'},
    xaxis={'title': 'odds ratio'}, yaxis={'title': 'frequencies'})

fig
```
<img src="figures/fig3.3.png">

(c) Let's compute the posterior mean and standard deviation of $p_0$ and $p_1$ for several values of $\alpha$ and $\beta$, keeping the prior mean $\frac{\alpha}{\alpha+\beta}$ fixed at 0.5.

```python
import numpy as np
import scipy.stats as stats
import pandas as pd

alpha=beta=np.arange(0.5, 500, 50)
p0 = stats.beta(a=39+alpha, b=674-39+beta)
p1 = stats.beta(a=22+alpha, b=680-22+beta)

pd.DataFrame(data={
    'alpha': alpha,
    'beta': beta,
    'alpha/(alpha+beta)': alpha/(alpha+beta),
    'p0 mean': p0.mean(),
    'p0 std': p0.std(),
    'p1 mean': p1.mean(),
    'p1 std': p1.std()
})
```

|    |   alpha |   beta |   alpha/(alpha+beta) |   p0 mean |     p0 std |   p1 mean |     p1 std |
|---:|--------:|-------:|---------------------:|----------:|-----------:|----------:|-----------:|
|  0 |     0.5 |    0.5 |                  0.5 | 0.0585185 | 0.00902774 | 0.0330396 | 0.00684431 |
|  1 |    50.5 |   50.5 |                  0.5 | 0.115484  | 0.0114732  | 0.0928297 | 0.0103773  |
|  2 |   100.5 |  100.5 |                  0.5 | 0.159429  | 0.0123685  | 0.139047  | 0.0116503  |
|  3 |   150.5 |  150.5 |                  0.5 | 0.194359  | 0.0126663  | 0.175841  | 0.0121481  |
|  4 |   200.5 |  200.5 |                  0.5 | 0.222791  | 0.0126856  | 0.205828  | 0.0122912  |
|  5 |   250.5 |  250.5 |                  0.5 | 0.246383  | 0.0125654  | 0.230737  | 0.0122543  |
|  6 |   300.5 |  300.5 |                  0.5 | 0.266275  | 0.0123739  | 0.251756  | 0.0121218  |
|  7 |   350.5 |  350.5 |                  0.5 | 0.283273  | 0.012147   | 0.269732  | 0.0119386  |
|  8 |   400.5 |  400.5 |                  0.5 | 0.297966  | 0.0119047  | 0.28528   | 0.0117295  |
|  9 |   450.5 |  450.5 |                  0.5 | 0.310794  | 0.0116582  | 0.298861  | 0.0115089  |

As $\alpha$ and $\beta$ increase (their sum acts as a proxy for the number of prior observations), the posterior means move towards the prior mean of 0.5. Conversely, when the sample is large relative to $\alpha + \beta$, as in the first row, the posterior summaries are much less sensitive to the prior distribution.
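
The pull towards the prior mean can be made explicit: the posterior mean $\frac{y + \alpha}{n + \alpha + \beta}$ is a weighted average of the prior mean and the sample proportion, with weight $\frac{\alpha + \beta}{\alpha + \beta + n}$ on the prior. A small sketch for the control group (the names `weight` and `post_mean` are introduced here for illustration):

```python
import numpy as np

n0, y0 = 674, 39                       # control group data
alpha = beta = np.arange(0.5, 500, 50)  # same prior grid as in the table

prior_mean = alpha / (alpha + beta)             # 0.5 in every row
weight = (alpha + beta) / (alpha + beta + n0)   # weight given to the prior mean
post_mean = weight * prior_mean + (1 - weight) * (y0 / n0)

# Agrees with the closed-form Beta posterior mean
print(np.allclose(post_mean, (y0 + alpha) / (n0 + alpha + beta)))
print(np.column_stack([alpha + beta, weight, post_mean]))
```

As the prior "sample size" $\alpha + \beta$ grows from 1 to about 900, the weight on the prior mean grows from almost 0 to more than one half, which matches the drift of the `p0 mean` column.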

# Exercise 3.5: Rounded data

Let $y=(y_1, \dots, y_5)$ be such that $y_i \mid \mu, \sigma^2 \sim N(\mu, \sigma^2)$ and let $z=(z_1, \dots, z_5) = (10,10,12,11,9)$ be such that $z_i = \text{round}(y_i)$. Then $\bar{z}=10.4, s^2_z=1.3$ and $n=5$.

(a) Assuming $y=z$ we get:

\begin{equation*}
\begin{split}
\sigma^2 \mid z \sim \text{Inv-}\chi^2(n-1, s^2_z) = \text{Inv-}\chi^2(4, 1.3)\\[5pt]
\mu \mid \sigma^2, z \sim N\left(\bar{z}, \frac{\sigma^2}{n}\right) = N\left(10.4, \frac{\sigma^2}{5}\right) = N\left( 10.4, \left( \frac{ \sigma}{\sqrt{5}} \right)^2 \right)
\end{split}
\end{equation*}

and 

\begin{equation*}
\begin{split}
p(\mu, \sigma^2 \mid z) \propto \sigma^{-n-2} \text{exp}\left(-\frac{1}{2\sigma^2} \left( (n-1)s^2_z + n(\bar{z}-\mu)^2\right) \right)
\end{split}
\end{equation*}


(b) Notice that we observe $z_i$ exactly when $y_i$ falls in the interval $[z_i-0.5, z_i+0.5]$, therefore:

\begin{equation*}
\begin{split}
p(z_i \mid \mu,\sigma^2) & = \int^{z_i +0.5}_{z_i-0.5} p(y_i \mid \mu,\sigma^2) dy_i\\[5pt]
& = \int^{z_i +0.5}_{z_i-0.5} N(y_i \mid \mu,\sigma^2) dy_i\\[5pt]
& = \Phi\left( \frac{z_i+0.5 - \mu}{\sigma}\right) - \Phi\left( \frac{z_i-0.5 - \mu}{\sigma}\right)\\[5pt]
\end{split}
\end{equation*}

Now starting from a noninformative prior distribution $p(\mu,\sigma^2)\propto\sigma^{-2}$, we end up with:

\begin{equation*}
\begin{split}
p(\mu, \sigma^2 \mid z) & \propto p(\mu,\sigma^2) \prod^{5}_{i=1} p(z_i \mid \mu,\sigma^2) \\[5pt]
& \propto \sigma^{-2} \prod^{5}_{i=1}\left[\Phi\left( \frac{z_i+0.5 - \mu}{\sigma}\right) - \Phi\left( \frac{z_i-0.5 - \mu}{\sigma}\right)\right]\\[5pt]
\end{split}
\end{equation*}

(c) Using the results from (a) and (b):

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

z=np.array([10,10,12,11,9])
mu = np.linspace(9, 12, 100)
sigma2 = np.linspace(0.01, 4.5, 100)

mu_, sigma2_ = np.meshgrid(mu, sigma2)

def get_posterior(mu, sigma2, z):
                
    n_rv= stats.norm()
    phi1 = n_rv.cdf((z[:,None,None] + 0.5 - mu[None, :, :])/np.sqrt(sigma2[None,:,:]))
    phi2 = n_rv.cdf((z[:,None,None] - 0.5 - mu[None, :, :])/np.sqrt(sigma2[None,:,:]))
    
    p = (1/sigma2) * np.prod( phi1 - phi2, axis=0)
    return p/p.sum()

def get_posterior_rounded(mu, sigma2, z):
    
    sigma = np.sqrt(sigma2)
    n = z.size
    
    z_bar = np.mean(z)
    s2 = np.square(np.std(z, ddof=1))
    
    p = sigma**(-n-2) * np.exp(-((n-1)*s2 + n*(z_bar-mu)**2)/(2*sigma2))
    
    return p/p.sum()

p = get_posterior(mu_, sigma2_, z)
p_z = get_posterior_rounded(mu_, sigma2_, z)

results = pd.DataFrame(data={'p': p.reshape(-1), 'p_rounded': p_z.reshape(-1)})
results.describe(percentiles=np.array([5, 25, 50, 75, 95])/100).T
```
|           |   count |   mean |         std |   min |          5% |         25% |         50% |         75% |         95% |         max |
|:----------|--------:|-------:|------------:|------:|------------:|------------:|------------:|------------:|------------:|------------:|
| p         |   10000 | 0.0001 | 0.000150288 |     0 | 5.53288e-08 | 1.85669e-05 | 3.91275e-05 | 0.00010643  | 0.000449053 | 0.000852894 |
| p_rounded |   10000 | 0.0001 | 0.000146384 |     0 | 8.05102e-10 | 1.87752e-05 | 4.08523e-05 | 0.000109288 | 0.000445568 | 0.000813034 |

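Contour plots of the two posteriors (left: `p`, which accounts for the rounding as in (b); right: `p_z`, which treats the rounded values as exact as in (a)):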
```python
import plotly.graph_objects as go
from plotly.subplots import make_subplots

contours={
    'start': p.min(),
    'end': p.max(),
    'size': (p.max() - p.min())/20,
}

fig = make_subplots(
        rows=1, cols=2,
        subplot_titles=['p(mu, sigma^2|y)', 'p(mu, sigma^2|z)'])


fig.add_trace(
    go.Contour(
        x=mu, y=sigma2, z=p,
        colorscale='Hot',
        showscale=False,
        contours=contours
    ),
    row=1, col=1
)

fig.add_trace(
    go.Contour(
        x=mu, y=sigma2, z=p_z,
        colorscale='Hot',
        showscale=False,
        contours=contours
    ),
    row=1, col=2
)

fig.update_layout(
    xaxis1={'title': 'mu'},
    xaxis2={'title': 'mu'},
    yaxis1={'title': 'sigma^2'},
    yaxis2={'title': 'sigma^2', 'side': 'right'}
)

fig
```

<img src="figures/fig3.4.png">

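(d) I am not sure this is the way to go. The code below draws $\mu$ and $\sigma^2$ independently from their marginal posteriors on the grid (which ignores any posterior dependence between them), then draws each unrounded $y_i$ from $N(\mu, \sigma^2)$ truncated to $[z_i - 0.5, z_i + 0.5]$ by inverting the CDF, and finally estimates the posterior mean of $(y_1 - y_2)^2$: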

```python
mu_sim = np.random.choice(mu, size=(10000,), p=p.sum(axis=0)/p.sum())
sigma2_sim = np.random.choice(sigma2, size=(10000,), p=p.sum(axis=1)/p.sum())

def get_posterior_sampling(mu, sigma2, z):
                
    n_rv = stats.norm(loc=mu[None, :], scale=np.sqrt(sigma2[None,:]))
    phi1 = n_rv.cdf(z[:,None] + 0.5)
    phi2 = n_rv.cdf(z[:,None] - 0.5)
    
    z_ = n_rv.ppf(phi1 + np.random.uniform(size=phi1.shape)*(phi2-phi1))
    return z_

z_ = get_posterior_sampling(mu_sim, sigma2_sim, z)
print(np.mean(np.square((z_[0] - z_[1]))))
0.15854576510766008
```
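
A possible variant, offered as a sketch rather than a definitive answer, draws $(\mu, \sigma^2)$ jointly by sampling grid cells with probability proportional to the posterior from (b), so that each $\mu$ stays paired with its $\sigma^2$. The grid is the same as in (c); the names `y_sim` and `est` and the final clipping step are choices made here, not part of the original solution.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)

z = np.array([10, 10, 12, 11, 9])
mu = np.linspace(9, 12, 100)
sigma2 = np.linspace(0.01, 4.5, 100)
mu_, sigma2_ = np.meshgrid(mu, sigma2)
sd_ = np.sqrt(sigma2_)

# Grid posterior from (b): prior sigma^-2 times the product of interval probabilities
upper = stats.norm.cdf((z[:, None, None] + 0.5 - mu_) / sd_)
lower = stats.norm.cdf((z[:, None, None] - 0.5 - mu_) / sd_)
p = np.prod(upper - lower, axis=0) / sigma2_
p = p / p.sum()

# Joint draw: sample flat cell indices, then read off the paired (mu, sigma)
idx = rng.choice(p.size, size=10_000, p=p.ravel())
mu_sim = mu_.ravel()[idx]
sd_sim = sd_.ravel()[idx]

# Inverse-CDF draw of y_i from N(mu, sigma^2) truncated to [z_i - 0.5, z_i + 0.5]
lo = stats.norm.cdf((z[:, None] - 0.5 - mu_sim) / sd_sim)
hi = stats.norm.cdf((z[:, None] + 0.5 - mu_sim) / sd_sim)
u = rng.uniform(size=lo.shape)
y_sim = mu_sim + sd_sim * stats.norm.ppf(lo + u * (hi - lo))

# Guard against floating-point round-off at the interval edges
y_sim = np.clip(y_sim, z[:, None] - 0.5, z[:, None] + 0.5)

# Posterior mean of (y_1 - y_2)^2
est = np.mean((y_sim[0] - y_sim[1]) ** 2)
print(est)
```

Since $y_1$ and $y_2$ both lie in $[9.5, 10.5]$, the estimate is necessarily between 0 and 1.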

# Exercise 3.6: Binomial with unknown probability and sample size

We have $y_i \mid N, \theta \sim \text{Binomial}(N, \theta)$ for all $i\in \{1,\dots, n\}$, $N\mid \mu \sim \text{Poisson}(\mu)$ and $\theta\sim \text{Uniform}[0,1]$. Define $\lambda = \mu \theta$, with noninformative prior $p(\lambda, \theta) \propto \lambda^{-1}$.

(a) Note that $\lambda$ is the expected number of successes in each observation, i.e. $\lambda = E[y_i \mid \mu, \theta] = E[N \mid \mu]\,\theta = \mu\theta$. The noninformative prior $p(\lambda, \theta) \propto \lambda^{-1}$ is improper, since its integral over $\lambda$ diverges:

\begin{equation*}
\int^\infty_0 \lambda^{-1} d\lambda = \left[ \text{log}(\lambda) \right]^\infty_0 = \infty
\end{equation*}

Now let $\phi(\mu, \theta) = (\lambda=\mu\theta, \theta)$, then $J_{\phi} = \begin{pmatrix}
\theta & \mu\\
0 & 1
\end{pmatrix}
$, hence 

\begin{equation*}
\begin{split}
p(\mu,\theta) = p(\lambda, \theta) \cdot \big| \det J_{\phi} \big| \propto (\mu\theta)^{-1} \cdot \theta = \mu^{-1} \Longrightarrow p(\mu) = \int^1_0 p(\mu,\theta) d\theta \propto \mu^{-1}
\end{split}
\end{equation*}

Now we obtain the marginal distribution for $N$:

\begin{equation*}
\begin{split}
p(N) & = \int p(N \mid \mu) p(\mu) d\mu\\[5pt]
& \propto \int \frac{\mu^N e^{-\mu}}{N!} \mu^{-1} d\mu\\[5pt]
& = \frac{1}{N!}\int \mu^{N-1} e^{-\mu} d\mu\\[5pt]
& = \frac{(N-1)!}{N!}= \frac{1}{N}
\end{split}
\end{equation*}
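
A quick numerical sanity check of the last step, $\frac{1}{N!}\int_0^\infty \mu^{N-1} e^{-\mu} d\mu = \frac{1}{N}$, for a few values of $N$ chosen arbitrarily here. The integrand is evaluated on the log scale to avoid overflow for large $\mu$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

def prior_N(N):
    # Integrand mu^(N-1) * exp(-mu) / N!, computed via logs for stability
    def integrand(mu):
        return np.exp((N - 1) * np.log(mu) - mu - gammaln(N + 1))
    value, _ = quad(integrand, 0, np.inf)
    return value

Ns = [1, 2, 5, 10]
values = np.array([prior_N(N) for N in Ns])
print(values)
print(1 / np.array(Ns))
```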

Let's compute the joint posterior distribution of $(N, \theta)$, as it will be useful in part (b). Let $y=(y_1, \dots, y_n)$, then:

\begin{equation}
\begin{split}
p(N, \theta \mid y) &  \propto p(y\mid N,\theta) p(N, \theta)\\[5pt]
& = \frac{1}{N} \prod^n_{i=1}{N \choose y_i}\theta^{y_i}(1-\theta)^{N-y_i}\\[5pt]
& = \frac{1}{N} \theta^{\sum_i y_i}(1-\theta)^{nN-\sum_i y_i} \prod^n_{i=1}{N \choose y_i}\\[5pt]
& = \frac{1}{N} \theta^{\sum_i y_i}(1-\theta)^{nN-\sum_i y_i} \prod^n_{i=1}\frac{N!}{(N-y_i)!(y_i)!}\\[5pt]
\end{split}
\end{equation}

(b) Using equation (1), let's compute the marginal posterior distribution for $N$:

\begin{equation}
\begin{split}
p(N\mid y) & \propto \int^1_0 p(N, \theta \mid y) d\theta\\[5pt]
& = \frac{1}{N} \prod^n_{i=1}{N \choose y_i} \int^1_0 \theta^{\sum_i y_i} (1-\theta)^{nN-\sum_i y_i} d\theta\\[5pt]
& = \frac{1}{N} \prod^n_{i=1}{N \choose y_i} \left(\frac{\Gamma(\sum_i y_i + 1) \Gamma(nN-\sum_i y_i + 1)}{\Gamma(nN +2)}\right)\\[5pt]
\end{split}
\end{equation}

To avoid overflow in the factorials, let's work with the logarithms of both expressions (up to additive constants):

\begin{equation*}
\begin{split}
\text{log}\big(p(N, \theta \mid y)\big) & = -\text{log}(N) +  \left(\sum_i y_i\right)\text{log}(\theta) + \left(nN-\sum_i y_i\right)\text{log}(1-\theta) + \sum^n_{i=1}\left(\text{log}(N!) - \text{log}\left((N-y_i)!\right) - \text{log}(y_i!)\right)\\[5pt]
& = -\text{log}(N) +  \left(\sum_i y_i\right)\text{log}(\theta) + \left(nN-\sum_i y_i\right)\text{log}(1-\theta) + \sum^n_{i=1}\left(\sum^N_{x=1}\text{log}(x) - \sum^{N-y_i}_{x=1}\text{log}(x) -\sum^{y_i}_{x=1}\text{log}(x)\right)
\end{split}
\end{equation*}

and similarly:

\begin{equation*}
\begin{split}
\text{log}\big(p(N \mid y)\big) & = -\text{log}(N) + \sum^n_{i=1}\left(\sum^N_{x=1}\text{log}(x) - \sum^{N-y_i}_{x=1}\text{log}(x) -\sum^{y_i}_{x=1}\text{log}(x)\right) + \left(\sum^{\sum y_i}_{x=1}\text{log}(x) +  \sum^{nN-\sum y_i}_{x=1}\text{log}(x) -\sum^{nN+1}_{x=1}\text{log}(x)\right)
\end{split}
\end{equation*}

```python
import numpy as np
import scipy.stats as stats
from scipy.special import gammaln
import plotly.graph_objects as go
from plotly.subplots import make_subplots

def log_factorial(a):
    return gammaln(a+1)

def log_comb(n, y):
    return log_factorial(n) - log_factorial(n-y) - log_factorial(y)

def log_posterior(N, t, y):
    
    s,n = y.sum(), y.size
    return -np.log(N) + s * np.log(t) + (n*N - s) * np.log(1-t) + log_comb(N[None, :, :], y[:, None, None]).sum(axis=0)

def log_marginal_N(N,y):
    
    s,n = y.sum(), y.size
    # Beta integral from equation (2): s! (nN-s)! / (nN+1)! = 1 / ((nN+1) * C(nN, s))
    return -np.log(N) + log_comb(N[None, :], y[:, None]).sum(axis=0) - log_comb(n*N, s) - np.log(n*N+1)


y = np.array([53, 57, 66, 67, 72])

N=np.arange(y.max(), 800, 1)
t=np.linspace(1, 99, 200)/100
N_, t_ = np.meshgrid(N,t)

posterior = np.exp(log_posterior(N_, t_, y))
posterior = posterior/posterior.sum()

marginal = np.exp(log_marginal_N(N, y))
marginal = marginal/marginal.sum()



contours={
    'start': posterior.min(),
    'end': posterior.max(),
    'size': (posterior.max() - posterior.min())/20,
}


fig = make_subplots(
        rows=1, cols=2,
        subplot_titles=['p(N, theta|y)', 'p(N|y)'])


fig.add_trace(
    go.Contour(
        x=N, y=t, z=posterior,
        colorscale='Hot',
        showscale=False,
        contours=contours
    ),
    row=1, col=1
)

fig.add_trace(
    go.Scatter(
        x=N, y=marginal,
    ),
    row=1, col=2
)

fig.update_layout(
    xaxis1={'title': 'N'},
    xaxis2={'title': 'N'},
    yaxis1={'title': 'theta'},
    yaxis2={'title': 'p(N|y)', 'side': 'right'}
)

fig
```
<img src="figures/fig3.5.png">

Let's evaluate the marginal posterior on a wider grid of $N$ (truncated at 10000) to approximate the posterior probability that $N > 100$:
```python
N=np.arange(y.max(), 10000, 1)
marginal = np.exp(log_marginal_N(N, y))
marginal = marginal/marginal.sum()
print(1-np.sum(marginal[N<100]))
0.9825474445497654
```
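
As a cross-check of the marginal in equation (2), the Beta integral can also be evaluated directly with `scipy.special.betaln`. The sketch below is self-contained and assumes the same data and the same truncation of $N$ at 10000; it also verifies the identity $B(s+1, nN-s+1) = \frac{1}{(nN+1)\binom{nN}{s}}$ with $s = \sum_i y_i$. The names `log_beta_direct`, `log_beta_comb` and `prob_gt_100` are introduced here for illustration.

```python
import numpy as np
from scipy.special import betaln, gammaln

def log_comb(n, k):
    # log of the binomial coefficient C(n, k)
    return gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)

y = np.array([53, 57, 66, 67, 72])
s, n = y.sum(), y.size
N = np.arange(y.max(), 10_000)

# Two equivalent ways to write the log of the Beta integral
log_beta_direct = betaln(s + 1, n * N - s + 1)
log_beta_comb = -log_comb(n * N, s) - np.log(n * N + 1)
print(np.allclose(log_beta_direct, log_beta_comb))

# Log marginal posterior of N up to a constant, then normalise on the grid
log_marginal = -np.log(N) + log_comb(N[None, :], y[:, None]).sum(axis=0) + log_beta_direct
marginal = np.exp(log_marginal - log_marginal.max())
marginal = marginal / marginal.sum()

# Strict inequality N > 100
prob_gt_100 = marginal[N > 100].sum()
print(prob_gt_100)
```

Subtracting the maximum before exponentiating is a standard guard against underflow; it does not change the normalised result.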

# Exercise 3.7: Poisson and binomial distributions

- Model 1: Let $b\mid \theta_b\sim \text{Poisson}(\theta_b)$ and  $v\mid \theta_v\sim \text{Poisson}(\theta_v)$. Denote the likelihood of this model by $p_1$.
- Model 2: Let $b\mid b+v, \theta\sim \text{Binomial}(b+v, \theta)$ with $\theta=\frac{\theta_b}{\theta_b+\theta_v}$. Denote the likelihood of this model by $p_2$.

\begin{equation*}
\begin{split}
p_1(b \mid b+v = n, \theta_b, \theta_v) & = \frac{p_1(b, b+v=n \mid \theta_b, \theta_v)}{p_1(b+v=n \mid \theta_b, \theta_v)}\\[10pt]
& = \frac{\frac{\theta^b_b}{b!}\text{exp}(-\theta_b) \cdot \frac{\theta^v_v}{v!}\text{exp}(-\theta_v)}{\frac{(\theta_b + \theta_v)^{b+v}}{(b+v)!}\text{exp}(-\theta_b-\theta_v)} \\[5pt]
& = \frac{\theta^b_b\theta^v_v}{(\theta_b + \theta_v)^{b+v}} \cdot {b+v \choose b}\\[5pt]
& = {b+v \choose b}  \cdot \left(\frac{\theta_b}{(\theta_b + \theta_v)}\right)^b \cdot \left(\frac{\theta_v}{(\theta_b + \theta_v)}\right)^v\\[5pt]
& = {b+v \choose b}  \cdot \theta^b (1-\theta)^v\\[5pt]
& = p_2(b\mid b+v=n, \theta)
\end{split}
\end{equation*}

where $v = n - b$ throughout, and the denominator in the second line follows from the fact that the sum of two independent Poisson variables is again Poisson, $\text{Poisson}(\theta_b) + \text{Poisson}(\theta_v) = \text{Poisson}(\theta_b + \theta_v)$ (abusing notation for $X_b + X_v$).
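
The equality of the two likelihoods can be verified numerically. In the sketch below the rates `theta_b`, `theta_v` and the total `n` are hypothetical values picked for illustration.

```python
import numpy as np
from scipy import stats

theta_b, theta_v, n = 3.0, 5.0, 10   # hypothetical rates and total count
b = np.arange(n + 1)
v = n - b

# Model 1: Poisson likelihood for (b, v), conditioned on b + v = n
joint = stats.poisson.pmf(b, theta_b) * stats.poisson.pmf(v, theta_v)
total = stats.poisson.pmf(n, theta_b + theta_v)
p1 = joint / total

# Model 2: binomial likelihood with theta = theta_b / (theta_b + theta_v)
p2 = stats.binom.pmf(b, n, theta_b / (theta_b + theta_v))

print(np.allclose(p1, p2))
```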

# Exercise 3.8: Analysis of proportions


# Exercise 3.10: Comparison of normal variances

Given the assumptions, we know that $\sigma^2_j \mid y_j \sim \text{Scaled-Inv-}\chi^2(n_j-1, s^2_j)$, and by rescaling we get $\frac{\sigma^2_j}{s^2_j(n_j-1)} \big| \: y_j \sim \text{Inv-}\chi^2(n_j-1)$. Therefore its reciprocal is $\chi^2$-distributed:

\begin{equation*}
\frac{s^2_j(n_j-1)}{\sigma^2_j} \Big| \:y_j \sim \chi^2(n_j-1)
\end{equation*}
Since the two are independent, the ratio of these $\chi^2$ variables, each divided by its degrees of freedom, $\frac{\frac{s^2_1(n_1-1)}{\sigma^2_1} \big/ (n_1-1)}{\frac{s^2_2(n_2-1)}{\sigma^2_2} \big/ (n_2-1)} = \frac{s^2_1 / s^2_2}{\sigma^2_1 / \sigma^2_2}$, is $F$-distributed with parameters $n_1-1, n_2-1$.
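
A Monte Carlo check that $\frac{s^2_1 / s^2_2}{\sigma^2_1 / \sigma^2_2}$ follows an $F(n_1 - 1, n_2 - 1)$ distribution under the posterior. The sample sizes and sample variances below are hypothetical, chosen only to make the sketch runnable.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical sample sizes and sample variances (illustration only)
n1, s2_1 = 10, 2.0
n2, s2_2 = 15, 3.0
draws = 200_000

# sigma_j^2 | y_j ~ Scaled-Inv-chi2(n_j - 1, s_j^2), i.e. (n_j - 1) s_j^2 / chi2(n_j - 1)
sigma2_1 = (n1 - 1) * s2_1 / rng.chisquare(n1 - 1, size=draws)
sigma2_2 = (n2 - 1) * s2_2 / rng.chisquare(n2 - 1, size=draws)

ratio = (s2_1 / s2_2) / (sigma2_1 / sigma2_2)

# Compare a few sample quantiles with the exact F quantiles
qs = [0.1, 0.5, 0.9]
sample_q = np.quantile(ratio, qs)
exact_q = stats.f(n1 - 1, n2 - 1).ppf(qs)
print(sample_q)
print(exact_q)
```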

_Useful links_:
- [Scaled_inverse_chi-squared_distribution](https://en.wikipedia.org/wiki/Scaled_inverse_chi-squared_distribution)
- [F-distribution Characterization](https://en.wikipedia.org/wiki/F-distribution#Characterization)