# Chapter 7: Examples of the advantages of Bayesian statistics

Suggested topics: 
    - Bayesian Billiards game 
    - Rejecting outliers with MCMC 
Idea: create simulation of BB

## Bayesian Billiards
A famous thought experiment comparing Bayesian and frequentist estimators is the 'Bayesian billiards'.

Suppose we have a rectangular billiard table with length $L$. We throw one black ball onto the table, which bounces around until it stops at a random $x$-coordinate; call this coordinate $z$, with $0\leq z\leq L$. For the purposes of this thought experiment, suppose we don't know $z$ and want to estimate it. We could do this by throwing $n$ white balls onto the table and seeing whether they end up to the left or to the right of the black ball. By counting how many are on each side, we can reasonably 'guess' the value of $z$. To make the notation easier, suppose $L=1$.

Note that one white ball is to the left of the black one with probability $z$, and to the right with probability $1-z$. The random variable 
$$
X_i=
\begin{cases}
1 &\text{ if ball $i$ is to the left}\\
0 &\text{if ball $i$ is to the right}
\end{cases}
$$
is then a Bernoulli random variable with parameter $z$. 
The probability mass function is $f_{z}(X_i)=z^{X_i}(1-z)^{1-X_i}$.
The observations $X_1,\ldots,X_n$ are an i.i.d. sample, and the log-likelihood is
$$\ell(z)
=
\log(1-z)(n-\sum_{i=1}^n X_i)
+\log(z)\sum_{i=1}^nX_i .
$$
Setting the derivative of the log-likelihood equal to 0 gives
$$
\frac{\text{d}\ell(z)}{\text{d}z}
=
-\frac{n-\sum_{i=1}^n X_i}{1-z}
+
\frac{\sum_{i=1}^n X_i}{z}
=
0,
$$
which gives us the following frequentist MLE:
$$
\hat z_{\text{MLE}}
=
\frac{\sum_{i=1}^n X_i}{n}=\overline{X}.
$$
So if exactly half of the $n$ white balls lie to the right, the estimate is 
$\hat z_{\text{MLE}}=1/2$,
which is to be expected. 
But if all of the balls lie to the right, the estimate becomes $\hat z_{\text{MLE}}=0$, which means that the black ball would lie on the left edge.
Intuitively, however, we expect the black ball to lie some small distance away from the edge, a distance that shrinks as $n$ grows.
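The chapter notes suggest simulating the game. A minimal sketch of such a simulation follows, assuming Python's standard `random` module; the helper name `simulate_billiards`, the seed, and the choice $n=10$ are illustrative assumptions, not part of the text.

```python
import random

def simulate_billiards(n, rng):
    """Throw one black ball and n white balls on a table of length L = 1.

    Returns the hidden position z of the black ball and the MLE,
    i.e. the fraction of white balls that stopped to the left of it.
    """
    z = rng.random()  # black ball: uniform on [0, 1)
    # X_i = 1 if white ball i stops to the left of the black ball
    x = [1 if rng.random() < z else 0 for _ in range(n)]
    z_mle = sum(x) / n
    return z, z_mle

rng = random.Random(0)  # fixed seed so the run is reproducible
results = [simulate_billiards(10, rng) for _ in range(10_000)]

# Fraction of games in which the MLE puts the black ball exactly on an edge
on_edge = sum(1 for _, z_mle in results if z_mle in (0.0, 1.0)) / len(results)
print(f"MLE on an edge in {on_edge:.1%} of games")
```

For a uniformly distributed $z$, the probability that all $n$ white balls land on one given side is $\int_0^1 z^n\,\text{d}z = 1/(n+1)$, so with $n=10$ the edge estimate should show up in roughly $2/11\approx 18\%$ of the simulated games.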

Now consider the Bayesian framework. 
The prior of $z$ is the uniform distribution between 0 and 1, which has density $\pi(z)=1$ for all $0\leq z\leq1$.
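One standard way to continue from here, given as a sketch under the uniform prior rather than as the chapter's final derivation: writing $k=\sum_{i=1}^n X_i$ for the number of white balls to the left (a symbol introduced here for convenience), Bayes' theorem gives a posterior proportional to the likelihood,

$$
\pi(z\mid X_1,\ldots,X_n)
\propto
z^{k}(1-z)^{n-k},
$$

which is the density of a $\text{Beta}(k+1,\,n-k+1)$ distribution. Its mean is

$$
\mathbb{E}[z\mid X_1,\ldots,X_n]
=
\frac{k+1}{n+2},
$$

so if all balls lie to the right ($k=0$), the posterior mean is $1/(n+2)$ instead of $0$: close to the left edge, and closer as $n$ grows.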
(The rest I'll finish later.)

## Rejecting Outliers with Markov Chain Monte Carlo simulations

<b>I will check for spelling/correct wording later :)</b>

When measuring a physical relation, you would not expect every data point to lie within one standard deviation of the theoretical relation. If the sample is large enough, we can assume a normal distribution (by the central limit theorem), so the chance that a data point falls outside the 1-$\sigma$ region is about $32$%. You would therefore expect about a third of the measurements to lie at least 1-$\sigma$ away from the exact relation. The further we move away from the relation, however, the fewer data points we expect to find. An outlier, a data point very far away from the relation, can substantially affect the fit, for example because it can substantially shift the mean. Hence we need to reduce the sensitivity of the fit to such outliers, and since there is always a chance that outliers occur, we will do this by removing them. Outliers can have several causes, for example unmodeled experimental uncertainty or (rare) noise sources, which we cannot always account for. One way of removing outliers is by hand. However, as one can imagine, this is far from ideal, since it can be very subjective and hard to reproduce.
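The numbers in this paragraph are easy to check. The sketch below uses only Python's standard library; the function name `tail_probability` and the measurement values are made up for illustration.

```python
import math
import statistics

def tail_probability(k):
    """P(|Z| > k) for a standard normal Z, i.e. 1 - erf(k / sqrt(2))."""
    return 1.0 - math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"P(|Z| > {k} sigma) = {tail_probability(k):.4f}")

# Made-up measurements scattered around 10, plus one far outlier.
clean = [9.8, 10.1, 10.0, 9.9, 10.2]
with_outlier = clean + [25.0]

# A single outlier is enough to pull the mean far away from 10.
print(statistics.mean(clean), statistics.mean(with_outlier))
```

The tail probability drops quickly with distance, which is why a point many $\sigma$ away is suspicious, while the second part shows how strongly one such point can move the mean.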

Let us construct a more systematic way of rejecting outliers. For simplicity, we consider fitting a straight line. Let $X_1, \dots, X_n$ be an i.i.d. sample, and let $a = (q_1, \dots, q_n)$ be a set of $n$ binary integers, where $q_i = 1$ if the $i$th data point is "good" and $q_i = 0$ if the $i$th data point is "bad", that is, an outlier. Furthermore, let $P_b$ be the prior probability that a data point is bad, and let $(Y_b, V_b)$ be the mean and variance of the distribution of bad points. These extra parameters will later be marginalized out, so we do not need to worry about the fact that we now have more parameters than data points. Let $f_g$ be the generative model for the "good" data points and $f_b$ the generative model for the "bad" data points. Then we can calculate the likelihood:

Here $m$ and $b$ are the parameters of the line, and $I$ denotes any other information.

$$
\mathcal{L} = \prod_{i = 1}^n  \left[f_g(X_i \mid m, b, I)\right]^{q_i} \left[f_b(X_i \mid Y_b, V_b, I)\right]^{1-q_i}
$$

(For a good data point $q_i = 1$, so the $i$th factor is $f_g \cdot 1$; for a bad data point $q_i = 0$, so it is $1 \cdot f_b$.)


$$
\mathcal{L} = \prod_{i = 1}^n  \left[ \frac{1}{\sqrt{2\pi \sigma^2_{X_i}}} \exp \left(-\frac{[X_i - my_i - b]^2}{2\sigma_{X_i}^2} \right)\right]^{q_i}  \cdot  \left[ \frac{1}{\sqrt{2\pi \left[V_b + \sigma^2_{X_i}\right]}} \exp\left(-\frac{[X_i - Y_b]^2}{2[V_b + \sigma_{X_i}^2]}\right)\right]^{1-q_i}
$$

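(Here $y_i$ is the position at which $X_i$ is measured, so that $my_i + b$ is the value the line predicts for $X_i$, and $\sigma_{X_i}^2$ is the variance of the measurement $X_i$.)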
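To make the likelihood concrete, here is a minimal sketch of its logarithm in plain Python. It assumes that a good point is Gaussian around the line $my_i+b$ with variance $\sigma_{X_i}^2$ and that a bad point is Gaussian around $Y_b$ with variance $V_b+\sigma_{X_i}^2$. The function names, the data, and the parameter values are hypothetical and chosen only for illustration.

```python
import math

def log_normal_pdf(x, mean, var):
    """Log of the normal density with the given mean and variance."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def log_likelihood(X, y, sigma2, q, m, b, Y_b, V_b):
    """Log-likelihood of the good/bad model for a fixed flag vector q."""
    total = 0.0
    for X_i, y_i, s2_i, q_i in zip(X, y, sigma2, q):
        if q_i == 1:
            # good point: scattered around the line with its own variance
            total += log_normal_pdf(X_i, m * y_i + b, s2_i)
        else:
            # bad point: drawn from the broad outlier distribution
            total += log_normal_pdf(X_i, Y_b, V_b + s2_i)
    return total

# Made-up data close to the line X = 2y + 1, with the last point far off.
y = [0.0, 1.0, 2.0, 3.0]
X = [1.0, 3.1, 4.9, 20.0]
sigma2 = [0.04] * 4
params = dict(m=2.0, b=1.0, Y_b=10.0, V_b=100.0)

all_good = log_likelihood(X, y, sigma2, [1, 1, 1, 1], **params)
flagged = log_likelihood(X, y, sigma2, [1, 1, 1, 0], **params)
print(all_good, flagged)
```

Flagging the last point as bad gives a much higher log-likelihood than treating all points as good, which is the behaviour the flags $q_i$ are meant to capture.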

* p. 11 Hogg et al.: "the marginalization will require that we have a measure on our parameters (integrals require measures) and that measure is provided by a prior." -> marginalization requires Bayesian statistics
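As a sketch of what this marginalization looks like for the flags, assume each $q_i$ is independently $0$ with prior probability $P_b$ and $1$ with probability $1-P_b$. Summing the likelihood over both values of every $q_i$, weighted by this prior, gives a mixture of the two generative models for each data point:

$$
\mathcal{L}_{\text{marg}}
=
\prod_{i=1}^n \left[(1-P_b)\, f_g(X_i \mid m, b, I) + P_b\, f_b(X_i \mid Y_b, V_b, I)\right].
$$

The $n$ flags have disappeared, and only $m$, $b$, $P_b$, $Y_b$ and $V_b$ remain as parameters. (The subscript "marg" is a label introduced here, not notation from the source.)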



### Sigma Clipping





used sources:

Data analysis recipes: Fitting a model to data, Hogg et al, 2011 (from BS), chapter 3


possible sources: 

https://www.astroml.org/book_figures/chapter8/fig_outlier_rejection.html 

https://www.stat.cmu.edu/technometrics/59-69/VOL-02-02/v0202123.pdf

https://d-nb.info/1221556185/34

