# Probability concept review

Please read over this page before the first session! If you have any questions about the concepts presented here, please ask :)

## Contents
1. [Probability laws review](#prob_laws)
2. [Introduction to Bayes Theorem and terminology](#intro_bayes)
3. [Resources for workshop](#resources)

## 1. Probability laws review <a id='prob_laws'></a>

Here are some introductory concepts and definitions that will be helpful to remember. If you have taken a probability or statistics class, you have probably already seen these, so this is just a friendly reminder.

1. Since different places use different symbols and terminology, here's a review to ensure we're on the same page:
    - A **discrete random variable** is a variable with a countable set of possible outcomes (often a finite number), such as the number of heads in 10 coin flips, the number of people registered for a class, or the number of planets in an exoplanet system.
    - A **continuous random variable** is a variable that can take a continuous range of values, such as the height of a random person, the mass of a galaxy, or the semi-major axis of a planet's orbit.


2.  For a case where there are $N$ possible outcomes (let's call them $A$, $B$, etc.), we can say the following.
    - $\mathrm{prob}(A)$ means the probability of outcome $A$ happening. This value is always between 0 and 1 (in the *discrete* case).
    - The probabilities of all $N$ outcomes must sum to 1. So if only outcomes $A$ and $B$ are possible, then by definition, $$\mathrm{prob}(B) = 1 - \mathrm{prob}(A).$$
    - Similarly, if you are only considering whether or not outcome $A$ happened, then the two possibilities are "$A$ happened" and "$A$ didn't happen", which we can write as "$A$" and "not $A$", and by definition, 
    $$\mathrm{prob}(A) + \mathrm{prob}(\mathrm{not}\ A) = 1.$$

3. With two events, we might be interested in whether one or the other has happened. In general, the probability that $A$ **or** $B$ happened is
    $$\mathrm{prob}(A\ \mathrm{or}\ B) = \mathrm{prob}(A) + \mathrm{prob}(B) - \mathrm{prob}(A\ \mathrm{and}\ B).$$
    Here, "or" means the "inclusive or": $A$ happened, $B$ happened, or both did. We have to subtract the third term because we don't want to double-count cases where both $A$ and $B$ happen. In special cases where $A$ and $B$ are "disjoint" (i.e. they can't both be true, such as $A$ being a single coin flip resulting in heads and $B$ being that same flip resulting in tails), you can drop the third term since $\mathrm{prob}(A\ \mathrm{and}\ B) = 0.$


4. $\mathrm{prob}(A|B)$ means the probability of outcome $A$ happening, given that outcome $B$ already happened. If $A$ and $B$ are independent of each other, then $\mathrm{prob}(A|B) = \mathrm{prob}(A)$, since independence means that the probability of $A$ is the same whether or not $B$ happened.


5. With two events, we might be interested in whether both happen. In general, the probability that $A$ **and** $B$ happened is
    $$\mathrm{prob}(A\ \mathrm{and}\ B) = \mathrm{prob}(A) * \mathrm{prob}(B|A).$$
    - This is the probability of $A$ happening multiplied by the probability of $B$ happening given that $A$ already happened. Note that there's no special order to $A$ and $B$, so you can also say
    $$\mathrm{prob}(A\ \mathrm{and}\ B) = \mathrm{prob}(B) * \mathrm{prob}(A|B).$$
    - This leads to the useful equality/identity
    $$\mathrm{prob}(A) * \mathrm{prob}(B|A) = \mathrm{prob}(B) * \mathrm{prob}(A|B).$$
    - And, in the special case where $A$ and $B$ are independent of each other,
    $$\mathrm{prob}(A\ \mathrm{and}\ B) = \mathrm{prob}(A) * \mathrm{prob}(B).$$
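
If you'd like to convince yourself that the rules above hold, here is a small sketch that checks them by brute-force counting. The two-dice setup and the events `A`, `B` and `C` are made up for illustration only; they are not part of the workshop material.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, where event is a function outcome -> bool."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

def prob_given(event, condition):
    """Conditional probability prob(event | condition), by counting."""
    kept = [o for o in outcomes if condition(o)]
    return Fraction(sum(1 for o in kept if event(o)), len(kept))

# Illustrative events (assumptions for this sketch).
A = lambda o: o[0] == 6           # first die shows a 6
B = lambda o: o[0] + o[1] >= 10   # the two dice sum to 10 or more
C = lambda o: o[1] % 2 == 0       # second die is even (independent of A)

A_and_B = lambda o: A(o) and B(o)
A_or_B = lambda o: A(o) or B(o)
not_A = lambda o: not A(o)

# prob(A) + prob(not A) = 1
assert prob(A) + prob(not_A) == 1
# prob(A or B) = prob(A) + prob(B) - prob(A and B)
assert prob(A_or_B) == prob(A) + prob(B) - prob(A_and_B)
# prob(A and B) = prob(A) * prob(B|A) = prob(B) * prob(A|B)
assert prob(A_and_B) == prob(A) * prob_given(B, A)
assert prob(A_and_B) == prob(B) * prob_given(A, B)
# Independent events: prob(C|A) = prob(C) and prob(A and C) = prob(A) * prob(C)
assert prob_given(C, A) == prob(C)
assert prob(lambda o: A(o) and C(o)) == prob(A) * prob(C)

print(prob(A), prob(B), prob(A_and_B), prob(A_or_B))  # 1/6 1/6 1/12 1/4
```

Using `Fraction` keeps every probability exact, so the equalities can be checked without worrying about floating-point rounding.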

## 2. Introduction to Bayes Theorem and terminology <a id='intro_bayes'></a>

We can now introduce Bayes' theorem and the basis for our Bayesian inference framework.

Bayes Theorem can be stated as an equation:

$$\mathrm{prob}(M|D) = \frac{\mathrm{prob}(D|M) * \mathrm{prob}(M)}{\mathrm{prob}(D)}.$$

Here, I am using $M$ to mean "the model" and $D$ to mean "the data". When we say "the model" we include all the parameters that describe the model. When we say "the data" we include all of the measurements and the uncertainties (error bars) on those measurements. If we put this in words, we're saying:

- $\mathrm{prob}(M | D)$  is $\mathrm{prob}($our model is right, given the data we measured$)$,
- $\mathrm{prob}(D | M)$ is $\mathrm{prob}($getting the data we measured, given the model$)$,
- $\mathrm{prob}(M)$ is $\mathrm{prob}($the model is right [without considering any data]$)$, and
- $\mathrm{prob}(D)$ is $\mathrm{prob}($getting the data we measured [without considering any particular model]$)$.

In Bayesian-speak, these terms are often called:

- $\mathrm{prob} (M | D)$ is the **posterior probability** on $M$
- $\mathrm{prob} (D | M)$ is the **likelihood**
- $\mathrm{prob} (M)$ is the **prior probability** on $M$
- $\mathrm{prob} (D)$ is the **evidence** 
- Warning: many different fields / books / websites use different names for these same things and some even switch them!
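
To make the four terms concrete, here is a minimal sketch with numbers plugged in. Everything specific in it is an assumption for illustration: the two candidate models (a fair coin and a coin that lands heads 80% of the time), the prior probabilities of 0.9 and 0.1, and the data of 8 heads in 10 flips.

```python
from math import comb

def binomial_likelihood(n_heads, n_flips, p_heads):
    """prob(D | M): chance of n_heads in n_flips if each flip lands heads with p_heads."""
    return comb(n_flips, n_heads) * p_heads**n_heads * (1 - p_heads)**(n_flips - n_heads)

# Hypothetical data D: 8 heads in 10 flips.
n_heads, n_flips = 8, 10

# Two hypothetical models M, and our prior probability for each.
models = {"fair coin (p=0.5)": 0.5, "biased coin (p=0.8)": 0.8}
prior = {"fair coin (p=0.5)": 0.9, "biased coin (p=0.8)": 0.1}

# Likelihood: prob(D | M) for each model.
likelihood = {name: binomial_likelihood(n_heads, n_flips, p) for name, p in models.items()}

# Evidence: prob(D), summed over every model we are considering.
evidence = sum(likelihood[name] * prior[name] for name in models)

# Posterior: Bayes Theorem, prob(M | D) = prob(D | M) * prob(M) / prob(D).
posterior = {name: likelihood[name] * prior[name] / evidence for name in models}

for name in models:
    print(f"{name}: prior={prior[name]:.2f}  "
          f"likelihood={likelihood[name]:.4f}  posterior={posterior[name]:.3f}")
```

With these made-up numbers, the data pull the probability of the biased coin up from 0.10 to roughly 0.43, even though we started out fairly sure the coin was fair.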

So if you look at the definitions and the equation again, there is a flow. We start with some initial prior probability on $M$, $\mathrm{prob}(M)$. Bayes Theorem is an equation that gives us a posterior probability on $M$, $\mathrm{prob}(M|D)$, once we have some data, and we get there using the likelihood term, $\mathrm{prob}(D|M)$.

In normal practice, we choose a prior probability distribution for $M$, and we know how to calculate $\mathrm{prob}(D|M)$ (e.g. for Gaussian error bars, from the same sum of squared residuals used in least squares), so we use Bayes Theorem to compute $\mathrm{prob}(M|D)$. Then, we can repeat this cycle when we have more data: the posterior from last time becomes the new prior, and the new data lets us "update" it into a new posterior.
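
Here is a rough sketch of that update cycle. The setup is assumed for illustration: the model has one parameter $p$ (the probability that a coin lands heads), we start from a flat prior on a grid of $p$ values, and the coin-flip data arrive in two made-up batches. The helper `update` is hypothetical, not something from a library.

```python
def update(prior, grid, n_heads, n_tails):
    """Return the normalised posterior on the grid after seeing new coin flips.

    The binomial coefficient is left out because it does not depend on p,
    so it cancels when we normalise.
    """
    unnormalised = [pr * p**n_heads * (1 - p)**n_tails for pr, p in zip(prior, grid)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

# Grid of candidate values for the model parameter p = prob(heads).
grid = [i / 100 for i in range(101)]
flat_prior = [1 / len(grid)] * len(grid)

# Hypothetical data arriving in two batches: (heads, tails).
batch_1 = (3, 1)
batch_2 = (5, 1)

posterior_1 = update(flat_prior, grid, *batch_1)   # prior -> posterior
posterior_2 = update(posterior_1, grid, *batch_2)  # old posterior is the new prior

# Analysing all the data in one go (8 heads, 2 tails) gives the same answer.
all_at_once = update(flat_prior, grid, 8, 2)
assert all(abs(a - b) < 1e-12 for a, b in zip(posterior_2, all_at_once))

best_p = grid[max(range(len(grid)), key=lambda i: posterior_2[i])]
print("most probable p after both batches:", best_p)  # 0.8
```

The check in the middle is the reassuring part: updating batch by batch lands on the same posterior as analysing all the data at once.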

This is the whole point of Bayesian inference, and this concept of "updating our probabilities" is one of the key concepts of the "philosophy" of Bayesian inference (as compared to other frameworks for computing probabilities and statistics, such as frequentist statistics).

As humans, we naturally apply these Bayesian concepts. One example: growing up in Richmond, I knew that cloudy skies in the morning mean a high chance of rain that day, so I would bring an umbrella. When I moved to Pasadena, I started preparing for rain whenever the morning started cloudy (i.e. my "model"). However, I soon found that it almost never rains in Pasadena and every cloudy morning turned into bright sun in the afternoon (i.e. "new data"). So, using this new data, I updated my model and stopped preparing for rain whenever it was cloudy in the morning!

If you've been paying attention, you may have noticed that we have ignored the "evidence" term completely. For this workshop, we can ignore it because the evidence term is $\mathrm{prob}(D)$, which does not depend on the model parameters at all, so we can treat it as a normalizing constant in Bayes Theorem. That is, we will only look at how the posterior probability varies when the model parameters change, so we can "cancel out" this term and just compute $\mathrm{prob}(M|D) \propto \mathrm{prob}(D|M) * \mathrm{prob}(M).$ We would need the evidence term for something more advanced where the properly normalized value of the posterior probability matters, such as comparing between different types of models. But that is outside the scope of this workshop.
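
As a sketch of why dropping the evidence is harmless here, the example below fits a single constant value `mu` to some measurements. The data, the error bars, the flat prior on $0 \le \mu \le 10$, and the Gaussian form of the likelihood are all assumptions made for illustration.

```python
import math

# Hypothetical measurements of a single quantity, with their error bars.
data = [4.8, 5.3, 5.1, 4.6]
sigma = [0.2, 0.3, 0.2, 0.4]

def log_likelihood(mu):
    """log prob(D | M) for Gaussian errors, up to a constant: -chi^2 / 2."""
    return -0.5 * sum(((d - mu) / s) ** 2 for d, s in zip(data, sigma))

def log_prior(mu):
    """Flat prior on 0 <= mu <= 10 (an assumption made for this sketch)."""
    return 0.0 if 0.0 <= mu <= 10.0 else -math.inf

# Values of the model parameter to try.
step = 0.01
grid = [i / 100 for i in range(1001)]

# Unnormalised posterior: prob(D | M) * prob(M), with prob(D) left out.
unnormalised = [math.exp(log_likelihood(mu) + log_prior(mu)) for mu in grid]

# The evidence is the single number that makes the posterior integrate to 1
# (approximated here by a simple sum over the grid).
evidence = sum(unnormalised) * step
normalised = [u / evidence for u in unnormalised]

# Dividing by a constant rescales the curve but cannot move its peak.
i_best = max(range(len(grid)), key=lambda i: unnormalised[i])
j_best = max(range(len(grid)), key=lambda i: normalised[i])
assert i_best == j_best

print("most probable mu:", grid[i_best])
print("evidence (just a constant):", evidence)
```

Because the evidence is one number shared by every value of `mu`, the shape of the posterior, its peak, and the ratio of probabilities between any two values of `mu` are the same with or without it.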

## 3. Resources for workshop <a id='resources'></a>

No need to read these before the workshop; I just wanted to provide references in case you want to review the material at a later date.

### "Data Analysis: A Bayesian Tutorial, 2nd edition" by D. S. Sivia 

I really like how this book presents Bayesian inference ideas, and the equations/math are accessible (i.e. it's written with an audience like us in mind, rather than as a series of proofs!).

It turns out you can get this book online, for free, from the UVic Library: http://voyager.library.uvic.ca/vwebv/holdingsInfo?searchId=3384&recCount=25&recPointer=0&bibId=1761051

In this workshop, we will be using concepts from the first two chapters only, primarily drawing on the introductory material in Chapter 1 and some examples in Chapter 2. It does not cover MCMC.

### "Markov Chain Monte Carlo", Chapter 15.8 in "Numerical Recipes: The Art of Scientific Computing, 3rd Edition" by Press et al. (2007), pp. 824-836

If you have a copy of this book or can find this book chapter/subsection, it would be a nice companion to our workshop. But you won't need it to do the workshop; it's just a handy reference.


### "Markov Chain Monte Carlo Without all the Bullshit", Jeremy Kun, *Math $\cap$ Programming* blog post

A brief introduction that I recently learned about which does what its title promises!  

https://jeremykun.com/2015/04/06/markov-chain-monte-carlo-without-all-the-bullshit/