# Lesson 2: Bayesian and frequentist schools of thought
## 2.1: Bayes' Theorem

Consider two events, $A$ and $B$.

The joint probability of both $A$ and $B$ occurring, $P(A\cap B)$, can be written as:

\begin{align}
P(A\cap B) = P(A|B) \times P(B),
\end{align}

where $P(A|B)$ represents the probability of $A$ occurring given that $B$ has occurred. But it can *also* be written in the reverse sense:

\begin{align}
P(A\cap B) = P(B|A) \times P(A).
\end{align}

Thus, these two representations are equal and we can write:

\begin{align}
P(A|B) = \frac{P(B|A)}{P(B)}P(A).
\end{align}

$P(A)$ and $P(B)$ are *unconditional* probabilities. In the context of Bayes' theorem, $P(A)$ is our *prior probability* (also called a Bayesian prior or simply a prior), and $P(B)$ is the overall probability of $B$, which normalizes the result.

$P(B|A)/P(B)$ is the *support* that $B$ provides for $A$.

$P(A|B)$ is the *posterior*: the degree of belief after accounting for $B$.  

Let's do a very simple example with a 6-sided die.  

What is the probability of rolling a 4, given we rolled between 1 and 4, $P(4| 1-4)$?

In [None]:
.....
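One way the die example could be worked through is sketched below. It is a minimal sketch, assuming a fair die; the variable names (`p_A`, `p_B`, `p_B_given_A`) are illustrative and not taken from the original notebook. Here $A$ is "rolled a 4" and $B$ is "rolled between 1 and 4", so $P(B|A) = 1$.

```python
from fractions import Fraction

# Assumption: a fair six-sided die, so every face has probability 1/6.
faces = range(1, 7)
p_face = Fraction(1, 6)

# Event A: we rolled a 4. Event B: we rolled between 1 and 4 (inclusive).
p_A = sum(p_face for f in faces if f == 4)
p_B = sum(p_face for f in faces if 1 <= f <= 4)

# P(B|A): if we rolled a 4, the roll is certainly between 1 and 4.
p_B_given_A = Fraction(1)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B

# Cross-check by direct counting: one favourable face out of four allowed faces.
p_direct = Fraction(
    sum(1 for f in faces if f == 4),
    sum(1 for f in faces if 1 <= f <= 4),
)

print(p_A_given_B)  # 1/4
```

Using `Fraction` keeps the arithmetic exact, which makes the agreement between Bayes' theorem and direct counting easy to see.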

*example 1.5 (page 19) from Data Analysis in High Energy Physics*

Now let's extend to a more complicated situation.

In some city, 15% of taxis are yellow ($Y$) and 85% are green ($G$). A taxi was in an accident, and the eyewitness says that it was yellow. The police have established that eyewitness statements get the colour correct 80% of the time ($c$), and are wrong 20% of the time ($w$). 

What is the probability that the taxi really was yellow - $P(Y | \mbox{witness says yellow})$?

Let's write this out using Bayes' theorem. 

\begin{align}
P(Y | \mbox{witness says yellow}) = \frac{P( \mbox{witness says yellow}|Y) \times P(Y)}{P(\mbox{witness says yellow})}
\end{align}

We are not given the denominator $P(\mbox{witness says yellow})$ directly. However, we can expand it using the *law of total probability*:

\begin{align}
P(\mbox{witness says yellow}) &= P(\mbox{witness says yellow} | Y )P(Y) + P(\mbox{witness says yellow} | G)P(G)\\
&= P(c)P(Y) + P(w)P(G).
\end{align}

Then, the full equation becomes:


\begin{align}
P(Y | \mbox{witness says yellow}) &= \frac{P( \mbox{witness says yellow} | Y) \times P(Y)}{P(c)P(Y) + P(w)P(G)}\\
&= \frac{P(c)P(Y)}{P(c)P(Y) + P(w)P(G)}.
\end{align}

In [2]:
#plug in the numbers...
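A possible way to plug in the numbers is sketched below, using only the values stated in the problem (15% yellow, 85% green, 80% correct, 20% wrong). The variable names are illustrative choices, not the original notebook's.

```python
# Inputs taken from the problem statement.
p_Y = 0.15  # prior probability that a taxi is yellow
p_G = 0.85  # prior probability that a taxi is green
p_c = 0.80  # probability the witness reports the colour correctly
p_w = 0.20  # probability the witness reports the colour wrongly

# Law of total probability: the witness says "yellow" either because the taxi
# was yellow and they were right, or because it was green and they were wrong.
p_says_yellow = p_c * p_Y + p_w * p_G

# Bayes' theorem for the posterior.
p_Y_given_says_yellow = p_c * p_Y / p_says_yellow

print(f"P(witness says yellow)     = {p_says_yellow:.2f}")          # 0.29
print(f"P(Y | witness says yellow) = {p_Y_given_says_yellow:.3f}")  # 0.414
```

Even with a fairly reliable witness, the posterior stays below 50% because yellow taxis are rare to begin with: the small prior $P(Y)$ outweighs the witness's accuracy.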

## 2.2: Applying Bayes' theorem to the case study from Lesson 1

Recall that we calculated the likelihoods of our observation occurring under various scenarios ($L(N_{obs} | H_{s+b})$ and $L(N_{obs} | H_{b})$). However, we really wanted to know $P(\mbox{signal+bkg hypothesis} | \mbox{observation}) = P(H_{s+b} | N_{obs} = 9)$.

Now that we have Bayes' theorem, we can figure this out:

\begin{align}
P(H_{s+b} | N_{obs}) = \frac{L(N_{obs} | H_{s+b}) \times P(H_{s+b})}{P(N_{obs})}.
\end{align}

It's not possible in our scenario to know $P(N_{obs})$ explicitly (it is the unconditional probability of observing e.g. 9 events), but we can expand it with the law of total probability:

\begin{align}
P(N_{obs}) = \sum_{i} L(N_{obs} | H_{i}) \times P(H_{i}),
\end{align}

where the sum runs over *all* possible hypotheses. There are, of course, an infinite number of possible hypotheses, but our prior belief in most of them is vanishingly small. The only two hypotheses that we believe have any real possibility, under our test, are our background-only hypothesis and our signal+background hypothesis, so we take $P(H_{b}) + P(H_{s+b}) = 1$. Thus,

\begin{align}
P(H_{s+b} | N_{obs}) = \frac{L(N_{obs} | H_{s+b}) \times P(H_{s+b})}{L(N_{obs}|H_{b})P(H_{b}) + L(N_{obs}|H_{s+b})P(H_{s+b})  }.
\end{align}

$P(H_{b})$ and $P(H_{s+b})$ are our *priors*: what is our prior belief in $H_{b}$ and $H_{s+b}$?
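To get a feel for how much the answer depends on the choice of prior, the two-hypothesis formula can be evaluated for a range of priors. This is a minimal sketch: the function name `posterior_signal` is hypothetical, the likelihood values are illustrative placeholders (not necessarily the Lesson 1 results), and it assumes $P(H_{b}) = 1 - P(H_{s+b})$.

```python
def posterior_signal(l_sb, l_b, prior_sb):
    """P(H_s+b | N_obs) when H_b and H_s+b are the only two hypotheses.

    l_sb, l_b : likelihoods L(N_obs | H_s+b) and L(N_obs | H_b)
    prior_sb  : prior P(H_s+b); the prior P(H_b) is taken as 1 - prior_sb
    """
    prior_b = 1.0 - prior_sb
    # Law of total probability restricted to the two hypotheses.
    evidence = l_sb * prior_sb + l_b * prior_b
    return l_sb * prior_sb / evidence


# Illustrative likelihood values (an assumption made for this sketch).
L_SB = 0.125
L_B = 0.036

for prior in (0.01, 0.1, 0.5, 0.9):
    post = posterior_signal(L_SB, L_B, prior)
    print(f"P(H_s+b) = {prior:4.2f}  ->  P(H_s+b | N_obs) = {post:.3f}")
```

The same data lead to very different posteriors depending on the prior, which is why the choice of prior has to be stated alongside any Bayesian result.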

In this context it is instructive to compare the formulation of evidence for discovery of a new particle in both frameworks. In the Bayesian framework, evidence for a hypothesis is cast as an odds ratio. The ratio of probabilities prior to the experiment defines the *prior odds ratio*:

\begin{align}
O_{\mbox{prior}}=\frac{P(H_{s+b})}{P(H_{b})}=\frac{P(H_{s+b})}{1-P(H_{s+b})}.
\end{align}

The posterior odds ratio is defined as the ratio of posterior probabilities:

\begin{align}
O_{\mbox{posterior}}&=\frac{P(H_{s+b}|N_{obs})}{P(H_{b} | N_{obs})}\\
 &= \frac{L(N_{obs} | H_{s+b}) P(H_{s+b})}{L(N_{obs} | H_{b})P(H_{b})}\\
 &= \frac{L(N_{obs}|H_{s+b})}{L(N_{obs} | H_{b})} \cdot O_{\mbox{prior}}.
\end{align}


The posterior odds ratio can be factorized as the prior odds ratio multiplied by the so-called *Bayes factor*, the likelihood ratio $L(N_{obs}|H_{s+b})/L(N_{obs}|H_{b})$, which contains the experimental information.

For example, for equal prior odds (both prior probabilities = 0.5) and an observation with $L(N_{obs}|H_{b})=0.036$ and $L(N_{obs}|H_{s+b})=0.125$, the posterior odds ratio becomes about 3.5:1 in favor of the s+b hypothesis.
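The arithmetic behind this example can be checked with a few lines of Python. This is only a sketch; the helper `odds_to_probability` is a hypothetical name introduced here to convert odds back into a probability.

```python
def odds_to_probability(odds):
    """Convert odds O = p / (1 - p) back into the probability p."""
    return odds / (1.0 + odds)


# Values quoted in the example above.
l_b = 0.036    # L(N_obs | H_b)
l_sb = 0.125   # L(N_obs | H_s+b)
prior_sb = 0.5
prior_b = 0.5

prior_odds = prior_sb / prior_b            # equal priors give odds of 1
bayes_factor = l_sb / l_b                  # likelihood ratio
posterior_odds = bayes_factor * prior_odds

print(f"Bayes factor     = {bayes_factor:.2f}")    # 3.47
print(f"Posterior odds   = {posterior_odds:.2f}")  # 3.47
print(f"P(H_s+b | N_obs) = {odds_to_probability(posterior_odds):.3f}")  # 0.776
```

With equal priors the posterior odds equal the Bayes factor, so all of the shift in belief comes from the data.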

## 2.3: The frequentist school of thought

*Inference*: the process of getting from the data to the theory, e.g. our probability $P(H_{s+b} | N_{obs})$. In 2.2, we used Bayesian inference to calculate $P(H_{s+b} | N_{obs})$.

Frequentist statements deal with *frequencies* of something occurring. In the frequentist framework no probabilities can be assigned to theories (i.e. no priors!) as there is no concept of repetition for hypotheses. A frequentist would say $P(H_{b})$ is either 0 or 1. It either is or it is not - it's not something we can test 1000 times and then present our estimated frequency of it occurring in the future.

Framed another way, we can say that frequentist statements are restricted to probabilities on data, while in the Bayesian framework probabilities are assigned to constants of nature.

Consider the current top quark mass measurement, $173.2 \pm 0.9$ GeV. A Bayesian interpretation of this is that there is a 68% chance that the top mass lies within $[172.3, 174.1]$ GeV. But the frequentist interpretation says the top mass is just some fixed value $m_{t}$, which is either in the $1\sigma = 68\%$ window or not. What the frequentist can say is that, if the measurement were repeated many times, intervals constructed in this way would contain $m_{t}$ in 68% of cases. We therefore say $172.3 < m_{t} < 174.1$ GeV with 68% confidence ($172.3 < m_{t} < 174.1$ GeV is the $68\%$ confidence interval).
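The frequentist reading of a confidence interval is a statement about repeated experiments, which can be illustrated with a toy simulation. The sketch below makes several assumptions that are not in the text: the true mass `TRUE_MASS` is a made-up value (in reality it is unknown), each measurement is Gaussian-distributed around it with width 0.9 GeV, and the number of pseudo-experiments is arbitrary.

```python
import random

random.seed(1)  # fixed seed so the toy study is reproducible

TRUE_MASS = 173.0      # hypothetical "true" top mass in GeV (assumption)
SIGMA = 0.9            # measurement uncertainty in GeV, as quoted in the text
N_EXPERIMENTS = 20000  # number of simulated repeat experiments (arbitrary)

covered = 0
for _ in range(N_EXPERIMENTS):
    # Each pseudo-experiment yields one measurement and its 1-sigma interval.
    measured = random.gauss(TRUE_MASS, SIGMA)
    low, high = measured - SIGMA, measured + SIGMA
    # The true mass is fixed; it is the interval that changes between experiments.
    if low < TRUE_MASS < high:
        covered += 1

coverage = covered / N_EXPERIMENTS
print(f"Fraction of intervals containing the true mass: {coverage:.3f}")
```

The fraction comes out close to 68%. Any single interval either contains the true mass or it does not; the 68% describes how often the interval-building procedure succeeds.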

Generally, in particle physics, we present results as *frequentist* statements, thus avoiding the discussion of appropriate priors.

discuss: https://xkcd.com/1132/