In [None]:
from datascience import *
from prob140 import *

import numpy as np
from scipy import stats

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
%matplotlib inline

# Worksheet 11 #

## 1. Gamma and Poisson Conjugacy
Let $X_1, X_2, \ldots, X_n$ be i.i.d. Poisson $(\theta)$, and let the prior distribution of $\theta$ be gamma $(r, \lambda)$.

**(a)** Find the posterior distribution of $\theta$ given $X_i = x_i$ for $1 \le i \le n$.

**(b)** Find the Bayes estimator of $\theta$ using squared error loss. Is it a weighted average of the prior mean and the MLE?
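
A closed-form answer to Parts **(a)** and **(b)** can be checked numerically with a grid approximation. The sketch below is only a checking aid, not a derivation: the values of `r`, `lam`, and the sample `x` are hypothetical and chosen purely for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical prior parameters and data, chosen only for illustration
r, lam = 2.0, 1.0
x = np.array([3, 1, 4, 2, 2])

# Grid of candidate theta values (upper limit assumed large enough here)
theta = np.linspace(0.001, 15, 20001)
dtheta = theta[1] - theta[0]

# Gamma (r, lam) prior density; scipy uses shape a and scale = 1 / rate
prior = stats.gamma.pdf(theta, a=r, scale=1 / lam)

# Likelihood of the whole sample at each theta (product of Poisson pmfs)
likelihood = stats.poisson.pmf(x[:, None], theta[None, :]).prod(axis=0)

# Normalize prior * likelihood with a Riemann sum to approximate the posterior
unnormalized = prior * likelihood
posterior = unnormalized / (unnormalized.sum() * dtheta)

# Approximate posterior mean, to compare with your Bayes estimator
posterior_mean = (theta * posterior).sum() * dtheta
print(posterior_mean)
```

Comparing `posterior` with the density you derive, evaluated on the same grid, is one way to catch algebra slips.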

## 2. Times to Extinction

*The following is adapted with permission from Professor Mike Jordan's Statistics 260 class [Bayesian Modeling and Inference](https://people.eecs.berkeley.edu/~jordan/courses/260-spring10/lectures/index.html).*

Paleobotanists estimate the moment in the remote past when a given species became extinct by taking cylindrical, vertical core samples well below the earth's surface and looking for the last occurrence of the species in the fossil record, measured in meters above a point $P$ at which the species was known to have first emerged. Letting $\mathbf{Y} = (Y_1, \ldots, Y_n)^{\rm T}$ denote a sample of such distances above $P$ at a random set of locations, the model
    $$Y_i \stackrel{\rm \tiny iid}{\sim} \text{Uniform}(0, \theta), \qquad i = 1, \ldots, n$$
emerges from simple and plausible assumptions. In this model the unknown $\theta > 0$ can be used, through carbon dating, to estimate the species' extinction time. This problem is about Bayesian inference for $\theta$, and it will be seen that some of our usual intuitions do not quite hold in this case.


**(a)** Show that the likelihood function for $\theta$ can be written as 
    $${\rm Lik}(\theta) = \theta^{-n} \mathbf{1}(\theta > \max(Y_1, Y_2, \ldots, Y_n))$$
where $\mathbf{1}(A)$ is $1$ if $A$ is true and $0$ otherwise.
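
The likelihood formula above translates directly into code. The following is a minimal sketch: the function name `uniform_likelihood` and the small sample `y` are hypothetical, and `theta` is assumed to be strictly positive.

```python
import numpy as np

def uniform_likelihood(theta, y):
    """Evaluate theta^(-n) * 1(theta > max(y)) for positive theta."""
    theta = np.asarray(theta, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    # The indicator zeroes out every theta that is not above the sample maximum
    return np.where(theta > y.max(), theta ** (-n), 0.0)

# Hypothetical sample, for illustration only
y = [0.5, 1.2, 2.0]
print(uniform_likelihood([1.0, 2.0, 4.0], y))
```

Only the sample maximum and the sample size enter the computation, which is worth keeping in mind for the later parts.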

**(b)** The Pareto distribution with shape parameter $\alpha > 0$ and scale parameter $\beta > 0$ has density
    $$p(\theta) ~ = ~ 
            \begin{cases}
                \alpha\beta^\alpha \theta^{-(\alpha+1)} & \text{if } \theta \ge \beta \\
                0 & \text{otherwise.}
            \end{cases}$$
We say that $\theta$ has the Pareto $(\alpha, \beta)$ distribution.
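
As a sanity check on the parametrization, the density above can be coded by hand and compared with `scipy.stats.pareto`, whose shape argument `b` plays the role of $\alpha$ and whose `scale` argument plays the role of $\beta$. This is a sketch only: the function name `pareto_density` and the parameter values are hypothetical.

```python
import numpy as np
from scipy import stats

def pareto_density(theta, alpha, beta):
    """Pareto (alpha, beta) density as defined above, for positive theta."""
    theta = np.asarray(theta, dtype=float)
    body = alpha * beta ** alpha * theta ** (-(alpha + 1))
    return np.where(theta >= beta, body, 0.0)

# Hypothetical parameter values, for illustration only
alpha, beta = 3.0, 2.0
grid = np.linspace(0.5, 10, 200)

ours = pareto_density(grid, alpha, beta)
scipys = stats.pareto.pdf(grid, b=alpha, scale=beta)

# True if the hand-coded density matches scipy's on the grid
agree = np.allclose(ours, scipys)
print(agree)
```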
    
Show that the likelihood function, viewed as a function of $\theta$, is proportional to a Pareto density, and find the parameters of that Pareto distribution.

**(c)** Let the prior for $\theta$ be taken to be Pareto $(\alpha, \beta)$. Find the posterior density $p(\theta \mid \mathbf{Y})$. Is the Pareto conjugate to the uniform?

**(d)** In an experiment conducted in the Antarctic in the 1980s to study a particular species of fossil ammonite, the following was a linearly rescaled version of the data obtained, in ascending order: 
    $$\mathbf{Y} = (0.4, 1.0, 1.5, 1.7, 2.0, 2.1, 3.1, 3.7, 4.3, 4.9)^{\rm T}.$$
Prior information regarding the extinction time of the ammonite was equivalent to a Pareto $(2.5, 4)$ prior. 
    
Plot the prior, likelihood, and posterior distributions arising from this dataset on the same graph. Briefly discuss what this picture implies about the updating of information from prior to posterior in this case.

In [None]:
# Part (d)

...

*Your answer here.*

**(e)** Make a table summarizing the mean and standard deviation for the prior, likelihood, and posterior distributions, using the $(\alpha, \beta)$ choices and the data given in Part **(d)** above.
    
In Bayesian updating the posterior mean is often a weighted average of the prior mean and the likelihood mean (with positive weights), and the posterior standard deviation is typically smaller than both the prior and the likelihood standard deviations. Does each of these behaviors hold in this case? Explain briefly.

*You will need to derive the mean and variance of the Pareto $(\alpha, \beta)$ distribution. You may use without proof the result from [Data 140 Chapter 15, Exercise 7](http://prob140.org/textbook/content/Chapter_15/06_Exercises.html) that for $T \sim \text{Pareto}(\alpha, 1)$ with $\alpha > 2$,*
    $${\rm E}(T) = \frac{\alpha}{\alpha - 1} \qquad \textit{ and }  \qquad {\rm Var}(T) = \frac{\alpha}{(\alpha - 1)^2(\alpha - 2)}.$$
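
The quoted formulas for Pareto $(\alpha, 1)$ can be spot-checked against `scipy.stats.pareto` before you extend them to general $\beta$. This is a sketch only: the value of `alpha` is hypothetical, and the check covers just the $\beta = 1$ case stated above.

```python
import numpy as np
from scipy import stats

# Hypothetical shape parameter with alpha > 2, for illustration only
alpha = 3.0

# Formulas quoted above for T ~ Pareto(alpha, 1)
mean_formula = alpha / (alpha - 1)
var_formula = alpha / ((alpha - 1) ** 2 * (alpha - 2))

# scipy's pareto with shape b = alpha and default scale 1
T = stats.pareto(b=alpha)

mean_ok = np.isclose(T.mean(), mean_formula)
var_ok = np.isclose(T.var(), var_formula)
print(mean_ok, var_ok)
```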