Basic theory of likelihood methods
===================

This Jupyter notebook gives a basic overview of the theory of likelihood and maximum likelihood estimators as applied to the *Fermi* LAT.

In the introduction to probabilistic concepts below, I was blatantly inspired by the excellent [Bayesian methods in astronomy](https://github.com/jakevdp/BayesianAstronomy) tutorial. 

There are two fundamental types of statistical questions we want to answer:

**1. Model Fitting:** *Given this Model, what parameters best fit my data?*

Examples:

- What are the slope and intercept of a line of best-fit?
- What are the frequency, amplitude, and phase of a sinusoidal fit?
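
As a concrete instance of the model-fitting question, the sketch below fits a straight line to synthetic data with `numpy.polyfit`. The data, the true slope and intercept, and the noise level are all made up for illustration, and ordinary least squares is just one convenient choice of fitting method here.

```python
import numpy as np

# Synthetic data: a straight line plus Gaussian noise.
# All numerical values are assumptions chosen for illustration.
rng = np.random.default_rng(42)
true_slope, true_intercept, sigma = 2.0, 1.0, 0.5
x = np.linspace(0, 10, 50)
y = true_slope * x + true_intercept + rng.normal(0, sigma, x.size)

# For deg=1, np.polyfit returns the least-squares coefficients
# in the order [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)
print(f"best-fit slope = {slope:.2f}, intercept = {intercept:.2f}")
```

The recovered values should land close to the assumed true slope and intercept, with a scatter set by the noise level and the number of points.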

**2. Model Selection:** *Given two potential Models, which better describes my data?*

Examples:

- Does a linear or quadratic fit describe our data better?

Often one of the two models is a *null hypothesis*, or a baseline model in which the effect you're interested in is not observed.
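
One simple way to make the model-selection question concrete is to fit both models to the same data and compare their goodness of fit. The sketch below does this for a linear versus a quadratic polynomial, using synthetic data and an assumed known Gaussian noise level `sigma`. The $\chi^2$ comparison is an illustrative choice, not a prescription.

```python
import numpy as np

# Synthetic data drawn from a quadratic model with Gaussian noise.
# Coefficients and noise level are assumptions for illustration.
rng = np.random.default_rng(0)
sigma = 1.0
x = np.linspace(-5, 5, 100)
y = 0.5 * x**2 + 1.0 * x + 2.0 + rng.normal(0, sigma, x.size)

def chi2(deg):
    """Chi-square of the least-squares polynomial fit of degree `deg`."""
    coeffs = np.polyfit(x, y, deg)
    resid = y - np.polyval(coeffs, x)
    return np.sum((resid / sigma) ** 2)

chi2_lin = chi2(1)    # baseline (null) model: straight line
chi2_quad = chi2(2)   # alternative model: one extra parameter
delta_chi2 = chi2_lin - chi2_quad
print(f"chi2 linear = {chi2_lin:.1f}, chi2 quadratic = {chi2_quad:.1f}")
print(f"improvement = {delta_chi2:.1f}")
```

Here the linear model plays the role of the null hypothesis. Because it is nested in the quadratic model, the quadratic fit can never have a larger $\chi^2$; the real question is whether the improvement is larger than what one extra free parameter would give by chance.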

## The Bayesian Problem Setting

The end goal of a Bayesian analysis is a probabilistic statement about the universe.
Roughly, we want to measure

$$
P(science)
$$

where "science" might be encapsulated in the cosmological model, the mass of a planet around a star, or whatever else we're interested in learning about.

Naturally, we don't measure this without reference to data, so more specifically we want to measure

$$
P(science~|~data)
$$

which should be read "the probability of the science *given* the data."

Of course, we should be explicit that this measurement is not done in a vacuum: generally, before observing any data, we have *some* degree of background information that informs the science, so we should actually write

$$
P(science~|~data, background\ info)
$$

This should be read "the probability of the science given the data *and* the background information".

This is starting to get a bit cumbersome, so let's create some symbols that will let us express this more easily:

$$
P(\theta_S, \theta_N~|~D, I)
$$

- $\theta_S$ represents the "science": the set of parameters that we are interested in constraining.
- $\theta_N$ represents the "nuisance parameters": the set of parameters that are important in the model, but are not particularly interesting for the scientific result.
- $D$ represents the "observed data".
- $I$ represents the information or knowledge you had before observing the data, including whatever made you choose the model you're fitting.

Finally, we'll often just write $\theta = (\theta_S, \theta_N)$ as a shorthand for all the model parameters.

This quantity, $P(\theta~|~D,I)$, is called the "posterior probability", and determining it is the ultimate goal of a Bayesian analysis.

Now all we need to do is compute it!