# P07: Bayesian inference

## Problem 1: Posterior mean of Gaussian random variable

Let us assume that we make $n$ measurements, $D=\{x_{1}, x_{2}, \ldots, x_{n}\}$, of a Gaussian random variable with mean $\mu$ and variance $\sigma^{2}$ (e.g. the weight of a person). The likelihood of each individual measurement is therefore given by 
$$P(x_{i}\vert \mu, \sigma) = \frac{1}{\sqrt{2 \pi \sigma^{2}}}e^{-\frac{1}{2 \sigma^{2}}(x_{i}-\mu)^{2}}.$$ We are interested in the posterior distribution of the mean of this Gaussian random variable. 

Let us assume that we know the variance $\sigma^{2}$, e.g. because it is determined solely by measurement noise, which is well known. 

(i) Compute the posterior distribution for the mean $\mu$ for a known variance $\sigma^{2}$ using Bayes' Theorem and assuming a Gaussian prior 
$$\pi(\mu) = \frac{1}{\sqrt{2 \pi \sigma_{0}^{2}}}e^{-\frac{1}{2 \sigma_{0}^{2}}(\mu-\mu_{0})^{2}},$$ 
on the mean.

(ii) Show and discuss what happens for $\sigma_0\to\infty$.

(iii) Show and discuss what happens for $n\to\infty$.

(iv) Discuss what happens for finite $n$ and $\sigma\gg \sigma_0$.

**Hint**: It will be useful to express the joint likelihood of the $n$ measurements in terms of the sample mean $\bar{x}$: 
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_{i},$$ 
and the biased sample variance $s^{2}$: 
$$s^{2} = \frac{1}{n} \sum_{i=1}^{n} (x_{i}-\bar{x})^{2}.$$
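
The reason these two quantities are useful is the following identity, obtained by adding and subtracting $\bar{x}$ inside the square; the cross term vanishes because $\sum_{i=1}^{n}(x_{i}-\bar{x}) = 0$: 
$$\sum_{i=1}^{n}(x_{i}-\mu)^{2} = \sum_{i=1}^{n}(x_{i}-\bar{x})^{2} + n(\bar{x}-\mu)^{2} = n s^{2} + n(\bar{x}-\mu)^{2}.$$ 
The data therefore enter the exponent of the joint likelihood only through $\bar{x}$ and $s^{2}$, and only the term containing $\bar{x}$ depends on $\mu$.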

## Problem 2: Posterior mean of Gaussian random variable with unknown variance

Let us repeat the experiment from Problem 1, but this time assuming that we do not know the variance of the measurements a priori. We would therefore like to estimate both the mean and the variance from the data. We assume a *reference prior* on both mean and variance. This results in a uniform prior on $\mu$ and a uniform prior on $\log{\sigma}$, which leads to the joint prior 
$$\pi(\mu, \sigma) = \frac{1}{\sigma}.$$ 

(i) Using Bayes' Theorem, determine the posterior of the mean by marginalizing the posterior $p(\mu, \sigma \vert D)$ over $\sigma$, i.e. 
$$p(\mu \vert D) = \int p(\mu, \sigma \vert D) d\sigma.$$

(ii) Do you recognize the distribution? How does it differ from our earlier discussion of this distribution?

**Note**: A reference prior is a prior chosen such that the contribution of the data to the posterior is maximized. This leads to different priors for location and scale parameters (denoted $\theta$) of a pdf, which we can understand intuitively:
* location parameter (measures the location of the pdf, e.g. the mean): if we are ignorant about where to center the pdf, we apply a uniform prior on the real axis, i.e. $\pi(\theta) \propto 1$.
* scale parameter (measures the dispersion of the pdf, e.g. the variance): if we are ignorant about the dispersion of the pdf, we apply a prior that treats each order of magnitude equally, i.e. is uniform in $\log{\theta}$; this is equivalent to $\pi(\theta) \propto \frac{1}{\theta}$.
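
The statement about scale parameters can be checked numerically. The following is a minimal sketch, not part of the exercise: it draws samples that are uniform in $\log{\theta}$ and counts how many fall in each decade. The range $[10^{-2}, 10^{2}]$, the sample size and the variable names are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed illustration range for the scale parameter: four decades.
lo, hi = 1e-2, 1e2

# Draw theta such that log(theta) is uniform, i.e. pi(theta) ∝ 1/theta on [lo, hi].
theta = np.exp(rng.uniform(np.log(lo), np.log(hi), size=200_000))

# Fraction of samples per decade; a log-uniform prior puts equal mass in each.
edges = np.logspace(-2, 2, 5)  # 0.01, 0.1, 1, 10, 100
counts, _ = np.histogram(theta, bins=edges)
fractions = counts / theta.size
print(fractions)  # each entry should be close to 0.25
```

A prior that is uniform in $\theta$ itself would instead place about 90% of its mass in the top decade, which is a strong statement about the scale rather than an expression of ignorance.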

## Problem 3: The effect of the prior

![PetitNicolas.png](attachment:PetitNicolas.png)

Bob and Tina are both trying to measure the height of *petit Nicolas*. They do this by repeatedly measuring the angle $\theta$ he subtends when they are standing at a large distance $L=3$ m from him, as shown in the figure above. Using the small-angle approximation, the inferred height is thus given by $h_{\mathrm{PN}}=L\theta$.

Let us assume that Bob and Tina conduct their measurements together but then perform the analysis separately. The likelihood of a measurement of $\theta$ given the model $h_{\mathrm{PN}}=L\theta$ is $$p(\theta \vert h_{\mathrm{PN}}) = \frac{1}{\sqrt{2\pi\sigma_{\theta}^{2}}}e^{-\frac{1}{2\sigma_{\theta}^{2}}(\theta-\frac{h_{\mathrm{PN}}}{L})^{2}},$$ where the measurement uncertainty is given by $\sigma_\theta=0.008$ rad.

Let us assume that Bob knows that *petit Nicolas* is the smallest boy in his class. He therefore assumes a flat, top-hat prior on the height $h_{\mathrm{PN}}$ (in cm), i.e. 
$$\pi(h_{\mathrm{PN}}) = \begin{cases} 
                   \frac{1}{40},& 80 \leq h_{\mathrm{PN}} \leq 120 \\
                    0,& \mathrm{otherwise}. 
                    \end{cases}$$
Tina, on the other hand, is unaware of this fact, so she adopts a prior centered on the average height of eight-year-old boys, which is approximately $\bar{h} = 125$ cm with a standard deviation of $\sigma_{h}=5$ cm. She therefore adopts a Gaussian prior given by 
$$\pi(h_{\mathrm{PN}}) = \frac{1}{\sqrt{2 \pi \sigma_{h}^{2}}}e^{-\frac{1}{2 \sigma_{h}^{2}}(h_{\mathrm{PN}}-\bar{h})^{2}}.$$


Bob and Tina perform 1000 joint measurements of the angle subtended by *petit Nicolas*, which are given in `height_PtitNic.npy`. 


(i) Use these measurements to compute the posterior on the height of *petit Nicolas*, $p(h_{\mathrm{PN}}\vert D)$, after 1, 2 and 3 experiments, i.e. use the experimental data measured by both Bob and Tina to update their respective priors.

(ii) Compare the posteriors after all 1000 measurements given in `height_PtitNic.npy` have been conducted.

(iii) Compare the resulting posteriors in all cases and, using your results, discuss the effect of the prior on the derived posterior.
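
One possible way to set up the numerical part is to evaluate prior times likelihood on a grid of heights. The sketch below is only an illustration under stated assumptions: heights are taken in cm (so $L = 300$ cm), the grid range and all function names are hypothetical, and the angles are synthetic draws around an assumed true height of 100 cm rather than the contents of `height_PtitNic.npy`.

```python
import numpy as np

# Values from the problem statement; heights in cm, so L = 3 m = 300 cm (assumed units).
L_CM = 300.0
SIGMA_THETA = 0.008            # rad
H_BAR, SIGMA_H = 125.0, 5.0    # Tina's Gaussian prior (cm)

# Hypothetical grid over the height in cm; the range is an assumption.
h = np.linspace(60.0, 160.0, 2001)
dh = h[1] - h[0]

def normalize(p):
    """Scale a gridded density so that it integrates to one."""
    return p / (p.sum() * dh)

prior_bob = normalize(((h >= 80.0) & (h <= 120.0)).astype(float))
prior_tina = normalize(np.exp(-0.5 * ((h - H_BAR) / SIGMA_H) ** 2))

def log_likelihood(thetas):
    """Summed Gaussian log-likelihood of the angles for every height on the grid."""
    thetas = np.atleast_1d(thetas)
    resid = thetas[:, None] - h[None, :] / L_CM
    return -0.5 * np.sum((resid / SIGMA_THETA) ** 2, axis=0)

def posterior(prior, thetas):
    """Grid posterior: prior times likelihood, with the likelihood rescaled for stability."""
    logl = log_likelihood(thetas)
    return normalize(prior * np.exp(logl - logl.max()))

# Synthetic stand-in data, NOT the course file. With the real data one would
# presumably use np.load("height_PtitNic.npy"), assuming it holds a 1D array of angles.
rng = np.random.default_rng(1)
thetas = rng.normal(100.0 / L_CM, SIGMA_THETA, size=1000)

for n in (1, 2, 3, 1000):
    post_bob = posterior(prior_bob, thetas[:n])
    post_tina = posterior(prior_tina, thetas[:n])
    mean_bob = np.sum(h * post_bob) * dh
    mean_tina = np.sum(h * post_tina) * dh
    print(n, round(mean_bob, 2), round(mean_tina, 2))
```

Working with the summed log-likelihood and subtracting its maximum before exponentiating avoids numerical underflow when many measurements are combined. Plotting `post_bob` and `post_tina` against `h` for each `n` is one way to make the comparison asked for in (iii).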