
# Week 3

## Posterior Predictive Distribution

* After the data $y$ have been observed, we can predict an unknown observable $\tilde y$.
* The posterior predictive distribution of a future observation $\tilde y$ is:
  $$ \begin{aligned} p(\tilde y \vert y) &= \int p(\tilde y, \theta \vert y) d\theta \\ &= \int p(\tilde y \vert \theta, y) p(\theta \vert y ) d \theta \\ &= \int p(\tilde y \vert \theta) p(\theta \vert y) d \theta \end{aligned}$$
* The last step assumes that $y$ and $\tilde y$ are conditionally independent given $\theta$.

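* The prior predictive distribution is the same quantity before $y$ is observed, with the prior in place of the posterior: $p(\tilde y) = \int p(\tilde y \vert \theta) p(\theta) d \theta$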
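
The posterior predictive integral can be approximated by simulation: draw $\theta$ from $p(\theta \vert y)$, then draw $\tilde y$ from $p(\tilde y \vert \theta)$. The following is a minimal Python sketch under assumptions that are not part of these notes: the helper name `posterior_predictive_draws` is hypothetical, and the Bernoulli model with a Beta posterior and the values `y = 7`, `n = 10` are chosen only for illustration.

```python
import random

def posterior_predictive_draws(sample_posterior, sample_observation, n_draws, rng):
    """Approximate p(y_new | y) by composition sampling (hypothetical helper)."""
    draws = []
    for _ in range(n_draws):
        theta = sample_posterior(rng)                 # theta ~ p(theta | y)
        draws.append(sample_observation(theta, rng))  # y_new ~ p(y_new | theta)
    return draws

# Illustrative assumption: Bernoulli data, uniform prior, y = 7 successes in n = 10 trials,
# so the posterior is Beta(y + 1, n - y + 1).
y, n = 7, 10
rng = random.Random(0)
draws = posterior_predictive_draws(
    lambda r: r.betavariate(y + 1, n - y + 1),
    lambda theta, r: 1 if r.random() < theta else 0,
    n_draws=20_000,
    rng=rng,
)
estimate = sum(draws) / len(draws)  # Monte Carlo estimate of P(y_new = 1 | y)
print(estimate)
```

For a Bernoulli observation, $P(\tilde y = 1 \vert y) = E[\theta \vert y]$, so the estimate should land close to the posterior mean of $\theta$.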

### Example 1 : Binomial model
   $$ y_i \overset{iid}{\sim} Bern(\theta) $$
   $$ Y \sim Bin(n, \theta), 0 \le \theta \le 1$$
   $$ \begin{aligned} \theta &\sim Unif(0, 1) \\ &\Rightarrow \theta \vert y \sim Beta(y + 1, n - y + 1) \end{aligned}$$
 * Posterior predictive distribution for $\tilde y = 1$
   $$ P(\tilde y  =1 \vert y) = \frac{y + 1}{n + 2}$$
   which is known as `Laplace's law of succession`.
 
 $$ \begin{aligned} p(\tilde y = 1 \vert y) = \int_0^1 p(\tilde y = 1 \vert \theta) p(\theta \vert y) d \theta = \int_0^1 \theta p(\theta \vert y) d \theta = E[\theta \vert y] = \frac{y + 1}{n + 2} \end{aligned}$$
 $$ y = 0 \text{ (all failures) } \Rightarrow p(\tilde y = 1 \vert y) = \frac{1}{n + 2}$$
 $$ y = n \text{ (all successes) } \Rightarrow p(\tilde y = 1 \vert y) = \frac{n + 1}{n + 2}$$

 * prior predictive distribution
   $$p(\tilde y = 1) = \int_0^1 p(\tilde y = 1 \vert \theta) p(\theta) d \theta = \int_0^1 \theta d \theta = \frac{1}{2}$$
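
The closed form $\frac{y + 1}{n + 2}$ can be checked numerically against the integral $\int_0^1 \theta \, p(\theta \vert y) d\theta$. This is a small Python sketch, not part of the original notes: the function names are hypothetical, the midpoint rule is just one convenient way to integrate, and the values `y = 3`, `n = 10` are made up.

```python
from fractions import Fraction
from math import comb

def laplace_rule(y, n):
    """Closed-form P(y_new = 1 | y) under a Unif(0, 1) prior."""
    return Fraction(y + 1, n + 2)

def predictive_by_integration(y, n, grid=20_000):
    """Midpoint-rule approximation of the integral of theta * Beta(y + 1, n - y + 1) density."""
    # 1 / B(y + 1, n - y + 1) = (n + 1)! / (y! (n - y)!) = (n + 1) * C(n, y)
    const = (n + 1) * comb(n, y)
    h = 1.0 / grid
    total = 0.0
    for k in range(grid):
        t = (k + 0.5) * h
        total += t * const * t**y * (1.0 - t) ** (n - y)
    return total * h

y, n = 3, 10  # illustrative values only
print(laplace_rule(y, n), predictive_by_integration(y, n))
print(laplace_rule(0, n), laplace_rule(n, n))  # all failures, all successes
```

Even after $n$ failures in a row the predictive probability of a success stays above zero, which is the practical appeal of the rule.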

### Example 2 : Poisson Model
 * Data model: $y_i \overset{iid}{\sim} Poisson(\theta), i= 1, ..., n$
 * Prior distribution : $\theta \sim Gamma(\alpha, \beta)$
 * Likelihood of $\theta$ given $y$
 $$ L(\theta) = \prod_{i=1}^n \frac{1}{y_i !} \theta^{y_i} e ^{- \theta} = (\prod_{i=1}^n \frac{1}{y_i !}) \theta^{\sum y_i} e ^{-n \theta}$$
 * MLE for $\theta$:
   $$ \log L(\theta) = \log(\prod \frac{1}{y_i!}) + \sum y_i \log \theta - n \theta$$
   $$ \begin{aligned} \frac{\partial \log L}{\partial \theta} = \frac{\sum y_i}{\theta} - n = 0 \\ \Rightarrow \hat \theta_{ML} = \frac{1}{n} \sum_{i=1}^n y_i = \bar y \end{aligned}$$
   
 * Posterior Distribution
   $$\begin{aligned} p(\theta \vert y) &\propto p(y \vert \theta) p(\theta) 
   \\ &= [\prod_{i=1}^n \frac{1}{y_i !} \theta^{y_i} e^{- \theta}] \frac{1}{\Gamma(\alpha) \beta^{\alpha}} \theta^{\alpha - 1} e ^{- \frac{\theta}{\beta}} 
   \\ &\propto \theta^{\sum y_i} e ^{- n\theta} \theta^{\alpha - 1} e^{- \frac{\theta}{\beta}} 
   \\ &= \theta^{\sum y_i + \alpha - 1} e ^{-(n + \frac{1}{\beta}) \theta} \end{aligned}$$
   $$ \begin{aligned}\int_0^{\infty} c \theta^{\sum y_i + \alpha - 1} e^{-(n + \frac{1}{\beta})\theta} d \theta = 1 
   \\ \Rightarrow \theta \vert y \sim Gamma(\sum y_i + \alpha, [n + \frac{1}{\beta}]^{-1}) 
   \\ \Rightarrow p(\theta \vert y) = \frac{(n + \frac{1}{\beta})^{\sum y_i + \alpha}}{\Gamma(\sum y_i + \alpha)} \theta^{\sum y_i + \alpha - 1} e ^{-(n + \frac{1}{\beta}) \theta} \end{aligned}$$
   $$ \begin{aligned}E[\theta \vert y] &= \frac{\sum y_i + \alpha}{n + \frac{1}{\beta}} = \hat \theta_{Bayes} 
   \\ &= \frac{n}{n+\frac{1}{\beta}} (\frac{\sum y_i}{n}) + \frac{\frac{1}{\beta}}{n + \frac{1}{\beta}} (\alpha \beta) 
   \\ &\Rightarrow \text{ Weighted average of sample mean and prior mean} \end{aligned}$$
     * $n \uparrow \Rightarrow E[\theta \vert y] \rightarrow \hat \theta_{ML}$
     * $n \downarrow \Rightarrow E[\theta \vert y] \rightarrow \alpha \beta \text{ (prior mean) }$
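
The conjugate update and the weighted-average form of the posterior mean can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the counts and prior values are made up, and $\beta$ is treated as the scale of the Gamma prior (prior mean $\alpha \beta$), matching the density used in this section.

```python
def poisson_mle(ys):
    """Maximum likelihood estimate of theta: the sample mean."""
    return sum(ys) / len(ys)

def gamma_poisson_posterior(ys, alpha, beta):
    """Return (shape, scale) of the Gamma posterior; beta is the prior scale."""
    shape = sum(ys) + alpha
    scale = 1.0 / (len(ys) + 1.0 / beta)
    return shape, scale

def posterior_mean_weighted(ys, alpha, beta):
    """Posterior mean written as a weighted average of sample mean and prior mean."""
    n = len(ys)
    w = n / (n + 1.0 / beta)  # weight on the sample mean
    return w * poisson_mle(ys) + (1.0 - w) * alpha * beta

ys = [2, 4, 3, 5, 1]    # made-up counts
alpha, beta = 2.0, 0.5  # made-up prior, prior mean alpha * beta = 1.0
shape, scale = gamma_poisson_posterior(ys, alpha, beta)
print(poisson_mle(ys), shape * scale, posterior_mean_weighted(ys, alpha, beta))
```

The posterior mean `shape * scale` falls between the prior mean and the MLE, and repeating the same data many times (larger $n$) pulls it toward the MLE.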

## Posterior Predictive Distribution of Poisson Model
 * Posterior predictive distribution, $p(\tilde y \vert y)$:
   $$ \begin{aligned}p(\tilde y \vert y_1, ..., y_n) &= \int_0^{\infty} p(\tilde y \vert \theta) p(\theta \vert y_1, ..., y_n) d \theta 
   \\ &= \int_0^{\infty} \frac{1}{\tilde y !} e^{- \theta} \theta^{\tilde y} \frac{(n + \frac{1}{\beta})^{\sum y_i + \alpha}}{\Gamma(\sum y_i + \alpha)} \theta^{\sum y_i + \alpha - 1} e ^{-(n + \frac{1}{\beta}) \theta} d \theta 
   \\ &= [\frac{1}{\tilde y!} \frac{(n + \frac{1}{\beta})^{\sum y_i + \alpha}}{\Gamma(\sum y_i + \alpha)}] \int_0^{\infty} \theta^{\tilde y + \sum y_i + \alpha - 1} e^{-(n + \frac{1}{\beta} + 1) \theta} d\theta 
   \\ &= [\frac{1}{\tilde y !} \frac{(n + \frac{1}{\beta})^{\sum y_i + \alpha}}{\Gamma(\sum y_i + \alpha)}] [\frac{\Gamma(\tilde y + \sum y_i + \alpha)}{(n + \frac{1}{\beta} + 1)^{\tilde y + \sum y_i + \alpha}}] 
   \\ &= \frac{\Gamma(\tilde y + \sum y_i + \alpha)}{\Gamma(\sum y_i + \alpha) \tilde y!} (\frac{n + \frac{1}{\beta}}{n + \frac{1}{\beta} + 1})^{\sum y_i + \alpha} (\frac{1}{n + \frac{1}{\beta} + 1})^{\tilde y} 
   \\ \Rightarrow \tilde y \vert y \sim NegBin(\sum y_i + \alpha, \frac{n + \frac{1}{\beta}}{n + \frac{1}{\beta} + 1})
   \\ \tilde y = \text{ the number of failures before the } r \text{th success, with } r = \sum y_i + \alpha
   \end{aligned}$$
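
The final expression above can be evaluated directly with log-gamma functions. The sketch below is a hedged illustration rather than part of the notes: the function name is hypothetical, the data and prior values are made up, and $\beta$ is again taken as the Gamma scale. It checks that the probabilities sum to one and that the predictive mean equals the posterior mean of $\theta$.

```python
import math

def poisson_gamma_predictive_pmf(y_new, ys, alpha, beta):
    """Posterior predictive P(y_new | ys) for Poisson data with a Gamma(alpha, scale=beta) prior."""
    r = sum(ys) + alpha           # negative binomial size parameter
    rate = len(ys) + 1.0 / beta   # n + 1 / beta
    p = rate / (rate + 1.0)       # success probability
    log_pmf = (
        math.lgamma(y_new + r) - math.lgamma(r) - math.lgamma(y_new + 1)
        + r * math.log(p) + y_new * math.log(1.0 - p)
    )
    return math.exp(log_pmf)

ys = [2, 4, 3, 5, 1]    # made-up counts
alpha, beta = 2.0, 0.5  # made-up prior
probs = [poisson_gamma_predictive_pmf(k, ys, alpha, beta) for k in range(200)]
total = sum(probs)
mean = sum(k * pk for k, pk in enumerate(probs))
print(total, mean)
```

Working on the log scale avoids overflow in the Gamma functions when $\sum y_i$ is large.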

## Normal Model with a Single Observation
 * Normal model with unknown mean $\theta$ and known variance $\sigma^2$
   $$ y \sim N(\theta, \sigma^2)$$
 * Prior distribution : $\theta \sim N(\mu, \tau^2)$
 * Posterior distribution of $\theta$ given $y$
   $$\begin{aligned}p(\theta \vert y) &\propto p(y \vert \theta) p(\theta) 
   \\ &= [\frac{1}{\sqrt{2 \pi} \sigma} e ^{- \frac{1}{2 \sigma^2}(y - \theta)^2}] [ \frac{1}{\sqrt{2 \pi} \tau} e ^{- \frac{1}{2 \tau^2} (\theta - \mu)^2}]
   \\ &\propto exp[-\frac{1}{2 \sigma^2}(y ^2 - 2 y \theta + \theta^2) - \frac{1}{2 \tau^2} (\theta^2 - 2 \theta \mu + \mu^2)]
   \\ &\propto exp[- \frac{1}{2 \sigma^2}(\theta^2 - 2 y \theta) - \frac{1}{2 \tau^2}(\theta^2 - 2 \mu \theta)]
   \\ &= exp[ - \frac{1}{2} (\frac{1}{\sigma^2} + \frac{1}{\tau^2}) \theta^2 + (\frac{y}{\sigma^2} + \frac{\mu}{\tau^2}) \theta]
   \\ &= exp[ - \frac{1}{2} (\frac{1}{\sigma^2} + \frac{1}{\tau^2}) (\theta^2 - 2 \frac{\frac{y}{\sigma^2} + \frac{\mu}{\tau^2}}{\frac{1}{\sigma^2} + \frac{1}{\tau^2}} \theta)]
   \\ &\propto exp[- \frac{1}{2} (\frac{1}{\sigma^2} + \frac{1}{\tau^2})(\theta - \frac{\frac{y}{\sigma^2} + \frac{\mu}{\tau^2}}{\frac{1}{\sigma^2} + \frac{1}{\tau^2}})^2]
   \\ \theta \vert y \sim N( \frac{\frac{y}{\sigma^2} + \frac{\mu}{\tau^2}}{\frac{1}{\sigma^2} + \frac{1}{\tau^2}}, [\frac{1}{\sigma^2} + \frac{1}{\tau^2}]^{-1})
   \end{aligned}$$
   $$ E[\theta \vert y] = \frac{\frac{1}{\sigma^2}}{\frac{1}{\sigma^2} + \frac{1}{\tau^2}} y + \frac{\frac{1}{\tau^2}}{\frac{1}{\sigma^2} + \frac{1}{\tau^2}} \mu $$
   * $\tau^2$ is the prior variance
   * If $\tau^2 \uparrow \Rightarrow \text{ Little prior information } \Rightarrow E[\theta \vert y] \rightarrow y \text{ (the single observation) }$
   * If $\tau^2 \downarrow \Rightarrow \text{ Much prior information } \Rightarrow E[\theta \vert y] \rightarrow \mu \text{ (prior mean) }$
   
* posterior variance = $[\frac{1}{\sigma^2} + \frac{1}{\tau^2}]^{-1}$
* $\text{precision} = \frac{1}{\text{variance}}$ (reciprocal of variance)
* $\frac{1}{\sigma^2}$ : precision of data model
* $\frac{1}{\tau ^2}$ : precision of prior
* posterior precision = prior precision + data precision
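
The precision bookkeeping above translates directly into code. This is a minimal Python sketch with a hypothetical function name and made-up numbers, intended only to show that the posterior precision is the sum of the two precisions and that the posterior mean is the precision-weighted average of $y$ and $\mu$.

```python
def normal_posterior_single(y, sigma2, mu, tau2):
    """Posterior (mean, variance) of theta for one observation y ~ N(theta, sigma2)."""
    data_precision = 1.0 / sigma2
    prior_precision = 1.0 / tau2
    post_precision = data_precision + prior_precision
    post_mean = (data_precision * y + prior_precision * mu) / post_precision
    return post_mean, 1.0 / post_precision

# Made-up values: observation y = 2, prior centered at 0.
for tau2 in (0.01, 1.0, 100.0):
    print(tau2, normal_posterior_single(y=2.0, sigma2=1.0, mu=0.0, tau2=tau2))
```

A small `tau2` keeps the posterior mean near `mu`, while a large `tau2` moves it toward `y`.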

## Normal Model with Multiple Observations
 * Normal model with unknown mean $\theta$ and known variance $\sigma^2$
   $$y_i \overset{iid}{\sim} N(\theta, \sigma^2), i =1,...,n$$
 * Prior distribution : $\theta \sim N(\mu, \tau^2)$
 * Posterior distribution of $\theta$ given $y_1, ..., y_n$
   $$\begin{aligned}p(\theta \vert y) &\propto p(y \vert \theta) p(\theta)
   \\ &\propto [\prod_{i=1}^n e ^{- \frac{1}{2 \sigma^2} (y_i - \theta)^2}] e^{- \frac{1}{2 \tau^2} (\theta - \mu)^2 } 
   \\ &= e^{- \frac{1}{2 \sigma^2} \sum_{i=1}^n (y_i - \theta)^2} e ^{- \frac{1}{2 \tau^2} (\theta - \mu)^2}
   \\ &\propto exp[-\frac{1}{2} (\frac{n}{\sigma^2} + \frac{1}{\tau^2}) (\theta - \frac{\frac{\sum y_i}{\sigma^2} + \frac{\mu}{\tau^2}}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}})^2]
   \\ \Rightarrow \theta \vert y \sim N (\frac{\frac{\sum y_i}{\sigma^2} + \frac{\mu}{\tau^2}}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}}, [\frac{n}{\sigma^2} + \frac{1}{\tau^2}]^{-1})
   \end{aligned}$$
   
 * Posterior mean
   $$E[\theta \vert y] = \frac{\frac{n}{\sigma^2}}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}} (\frac{\sum y_i}{n}) + \frac{\frac{1}{\tau^2}}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}}(\mu)$$
   $$ n \uparrow \Rightarrow E[\theta \vert y] \rightarrow \hat \theta_{ML} = \bar y$$
   $$ \tau^2 \uparrow \text{(no information)} \Rightarrow E[\theta \vert y] \rightarrow \bar y$$
   $$ \tau^2 \downarrow \text{(much information)} \Rightarrow E[\theta \vert y] \rightarrow \mu$$
   * Posterior precision = sample precision + prior precision
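
The limiting behavior of the posterior mean can be seen by varying $n$ and $\tau^2$ numerically. The sketch below uses a hypothetical function name and made-up data; it is only meant to mirror the formulas in this section, where the sample precision is $\frac{n}{\sigma^2}$ and the prior precision is $\frac{1}{\tau^2}$.

```python
def normal_posterior(ys, sigma2, mu, tau2):
    """Posterior (mean, variance) of theta for iid y_i ~ N(theta, sigma2) and theta ~ N(mu, tau2)."""
    n = len(ys)
    sample_precision = n / sigma2
    prior_precision = 1.0 / tau2
    post_precision = sample_precision + prior_precision
    post_mean = (sum(ys) / sigma2 + mu * prior_precision) / post_precision
    return post_mean, 1.0 / post_precision

ys = [1.0, 2.0, 3.0]  # made-up observations, sample mean 2.0

# Effect of the prior variance at fixed n.
for tau2 in (0.01, 1.0, 100.0):
    print("tau2 =", tau2, normal_posterior(ys, sigma2=1.0, mu=0.0, tau2=tau2))

# Effect of the sample size at fixed prior (data repeated to keep the same sample mean).
for reps in (1, 10, 100):
    print("n =", 3 * reps, normal_posterior(ys * reps, sigma2=1.0, mu=0.0, tau2=1.0))
```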