# Framework Introduction

In this section we detail the mathematics that underpins how the Python package `GammaBayes` works.

## Intro to the intro

To keep this intro from taking up a whole textbook, I'm going to make some simplifying assumptions.

1. The individual background components do not have hyperparameters (although they do)
2. The Instrument Response Functions (IRFs) are perfectly specified (i.e. no hyperparameters, although in reality they have them)
3. The CTA (or detector) pointing scheme is constant, so you don't have some areas of the sky with a larger flux because of the observation strategy of the detector
4. We perform the integration analytically (although we don't)

Now with that out of the way, we are going to essentially start at the end of what we do in our analysis, and work backwards, assuming a full probability model.

## The posterior

In searches for dark matter we want to constrain hyperparameters, namely: (1) the dark matter mass, $m_\chi$; (2) any other fundamental values underpinning the dark matter model and its flux, $\vec{\theta}$; and (3) the flux of dark matter, or the fraction of events that come from the signal and backgrounds, $\vec{\xi}$. More specifically, we want to generate a 'map' of probabilities for these values given the observed data and the assumptions made, which in a Bayesian framework is typically called the _posterior_, represented as

$$p(m_\chi, \vec{\theta}, \vec{\xi}|\vec{d}, \mathcal{S}, \mathcal{B}, \mathcal{L}).$$

Here $\vec{d}$ represents the set of gamma-ray observations, $\mathcal{S}$ represents the dark matter _Signal_ model used, $\mathcal{B}$ represents the _Background_ model and $\mathcal{L}$ represents the IRF model used, or observation _Likelihoods_.

In a Bayesian framework, such as the one implemented in `GammaBayes`, we can then use Bayes' theorem to find,
$$p(m_\chi, \vec{\theta}, \vec{\xi}|\vec{d}, \mathcal{S}, \mathcal{B}, \mathcal{L}) = \frac{p(\vec{d}|\vec{\xi}, m_\chi, \vec{\theta}, \mathcal{S}, \mathcal{B}, \mathcal{L}) \times \pi(m_\chi, \vec{\theta}, \vec{\xi}| \mathcal{S}, \mathcal{B})}{\mathcal{Z}(\vec{d}|\mathcal{S}, \mathcal{B}, \mathcal{L})}$$

The $\pi(m_\chi, \vec{\theta}, \vec{\xi}| \mathcal{S}, \mathcal{B})$ term is our _prior_, representing our _prior_ assumptions on the dark matter model parameters and mixture fractions.
As we have not observed dark matter, we usually put what are called _uninformative_ priors, such as uniform or log-uniform priors, on the dark matter parameters.
With previous observations we _could_ construct informative priors on the mixture fractions; however, for simplicity we will leave them as uninformative.

$\mathcal{Z}(\vec{d}|\mathcal{S}, \mathcal{B}, \mathcal{L})$ is called the _evidence_ or _fully marginalised likelihood_ that represents the likelihood once we have integrated or _marginalised_ over the hyperparameters. This quantity can be simply viewed as a normalisation constant, however, later in this document we will detail how these values can be used for robust model comparison.
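As a rough illustration of how the prior, likelihood and evidence fit together, here is a minimal sketch of Bayes' theorem evaluated on a one-dimensional grid. Everything in it is an assumption made for illustration: the mass range, the toy Gaussian likelihood and the helper names are hypothetical and are not part of `GammaBayes`.

```python
import math

def trapezoid(y, x):
    """Trapezoidal-rule integral of samples y over the grid x."""
    return sum(0.5 * (y[k] + y[k + 1]) * (x[k + 1] - x[k]) for k in range(len(x) - 1))

# Hypothetical grid of dark matter masses in TeV, log-spaced from 0.1 to 100.
n_grid = 201
log10_mass = [-1.0 + 3.0 * k / (n_grid - 1) for k in range(n_grid)]
mass = [10.0 ** lm for lm in log10_mass]

# Log-uniform prior: density proportional to 1/m, normalised over the grid range.
prior_norm = math.log(mass[-1] / mass[0])
prior = [1.0 / (m * prior_norm) for m in mass]

# Toy likelihood (an assumption, not a real IRF-based likelihood):
# a Gaussian in log10(m) centred on 1 TeV.
def toy_likelihood(lm, centre=0.0, width=0.2):
    return math.exp(-0.5 * ((lm - centre) / width) ** 2)

likelihood = [toy_likelihood(lm) for lm in log10_mass]

# Evidence Z: the likelihood integrated (marginalised) over the prior.
unnormalised = [like * pri for like, pri in zip(likelihood, prior)]
evidence = trapezoid(unnormalised, mass)

# Posterior density: Bayes' theorem evaluated at each grid point.
posterior = [u / evidence for u in unnormalised]
best_mass = mass[posterior.index(max(posterior))]
```

Dividing by `evidence` is what makes `posterior` integrate to one over the mass grid, which is the sense in which the evidence acts as a normalisation constant.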

Finally, $p(\vec{d}|\vec{\xi}, m_\chi, \vec{\theta}, \mathcal{S}, \mathcal{B}, \mathcal{L})$ represents the _likelihood_ on the hyperparameters and can be expanded into the form below.

$$\begin{aligned}
p(\vec{d}|\vec{\xi}, m_\chi, \vec{\theta}, \mathcal{S}, \mathcal{B}, \mathcal{L}) = \prod_i \Big[\, & \xi_1 \times p(d_i|\vec{\xi}, m_\chi, \vec{\theta}, \mathcal{S}, \mathcal{L}) \\
& + (1-\xi_1)\times\xi_2 \times p(d_i|\vec{\xi}, m_\chi, \vec{\theta}, \mathcal{B}_{\mathrm{CCR}}, \mathcal{L}) \\
& + (1-\xi_1)\times(1-\xi_2)\times \xi_3 \times p(d_i|\vec{\xi}, m_\chi, \vec{\theta}, \mathcal{B}_{\mathrm{Diffuse}}, \mathcal{L}) \\
& + (1-\xi_1)\times(1-\xi_2)\times (1-\xi_3) \times p(d_i|\vec{\xi}, m_\chi, \vec{\theta}, \mathcal{B}_{\mathrm{Point}}, \mathcal{L}) \Big]
\end{aligned}$$

This is an example of a four-component mixture model, with the mixture fractions ($\xi_j$) set up following the Dirichlet 'stick-breaking' process. The product over $i$ reflects that the events are treated as independent, so the per-event mixture terms multiply. Set up in this fashion, the priors on the mixture fractions are trivial to compute and, as we chose to make them uninformative, can simply be set to be uniform from 0 to 1. We could instead specify all four mixture fractions simultaneously, in the following format:

$$\begin{aligned}
p(\vec{d}|\vec{\xi}, m_\chi, \vec{\theta}, \mathcal{S}, \mathcal{B}, \mathcal{L}) = \prod_i \Big[\, & \xi_1 \times p(d_i|\vec{\xi}, m_\chi, \vec{\theta}, \mathcal{S}, \mathcal{L}) \\
& + \xi_2 \times p(d_i|\vec{\xi}, m_\chi, \vec{\theta}, \mathcal{B}_{\mathrm{CCR}}, \mathcal{L}) \\
& + \xi_3 \times p(d_i|\vec{\xi}, m_\chi, \vec{\theta}, \mathcal{B}_{\mathrm{Diffuse}}, \mathcal{L}) \\
& + \xi_4 \times p(d_i|\vec{\xi}, m_\chi, \vec{\theta}, \mathcal{B}_{\mathrm{Point}}, \mathcal{L}) \Big].
\end{aligned}$$

Here $\sum_j \xi_j = 1$, so the mixture fractions must jointly lie on the 3-simplex, a bounded three-dimensional region of a hyperplane embedded in 4D space, and can no longer be normalised independently of one another. This can be used with the pipeline with the use of the relevant `scipy` function, for example, but it is much simpler to use the stick-breaking process, where each mixture fraction, $\xi_j$, can be independently normalised as required without fear of mis-specifying the fractions.
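To make the stick-breaking construction concrete, the sketch below maps three independent fractions to four mixture weights and evaluates a mixture log-likelihood. The function names and the per-event component likelihood values are hypothetical placeholders, not the `GammaBayes` API, and the events are assumed to be independent so that the per-event mixture values multiply (that is, their logarithms add).

```python
import math

def stick_breaking_weights(fractions):
    """Map K-1 independent fractions in [0, 1] to K mixture weights that sum to 1."""
    weights = []
    remaining = 1.0  # length of the 'stick' not yet assigned to a component
    for xi in fractions:
        if not 0.0 <= xi <= 1.0:
            raise ValueError("each stick-breaking fraction must lie in [0, 1]")
        weights.append(remaining * xi)
        remaining *= 1.0 - xi
    weights.append(remaining)  # the final component takes whatever is left
    return weights

def mixture_log_likelihood(fractions, component_likelihoods):
    """Log-likelihood of independent events under the mixture model.

    component_likelihoods[i][j] is the likelihood of event i under component j
    (signal, CCR, diffuse, point sources); the values used below are placeholders.
    """
    weights = stick_breaking_weights(fractions)
    total = 0.0
    for event in component_likelihoods:
        total += math.log(sum(w * p for w, p in zip(weights, event)))
    return total

# Three fractions (xi_1, xi_2, xi_3) give four weights.
weights = stick_breaking_weights([0.1, 0.5, 0.4])

# Two made-up events with made-up component likelihoods.
log_like = mixture_log_likelihood([0.1, 0.5, 0.4],
                                  [[2.0, 0.5, 0.1, 0.3],
                                   [0.2, 1.5, 0.4, 0.1]])
```

Any choice of fractions inside the unit cube yields weights on the simplex, which is why a simple uniform prior on each $\xi_j$ can be used without an explicit simplex constraint.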

For more information on Dirichlet distributions one can visit the very nice [Wikipedia page](https://en.wikipedia.org/wiki/Dirichlet_distribution); the [Wikipedia page for Dirichlet processes](https://en.wikipedia.org/wiki/Dirichlet_process) (such as the stick-breaking process) is also very nicely written.

## Nuisance parameter marginalisation

It is at this stage that we first encounter the explicit need for the Bayesian tool of __marginalisation__. We will show what this means by picking out a particular term from the mixture model above.

$$ p(d_i|\vec{\xi}, m_\chi, \vec{\theta}, \mathcal{S}, \mathcal{L})  = \int d(d^t_i) \;p(d_i|d^t_i)\times \pi(d^t_i | \vec{\xi}, m_\chi, \vec{\theta}, \mathcal{S}, \mathcal{L}) $$

Within the observations of detectors such as the CTA, we do not presume that the measured values are the same as the actual values of the gamma rays. We describe the actual, or 'true', values as nuisance parameters, denoted with the superscript $t$. These variables include the true energy, $E^t$, and the true sky position, $\Omega^t$, which together make up a full 'true' datum, $d^t$. We do not wish to infer these parameters, which is why they are called 'nuisance parameters' within this framework. As such, we use the Bayesian tool of marginalisation to remove the dependence of the likelihood on these values, leaving only the parameters of more interest.
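As a small numerical illustration of marginalising over a true value, the sketch below integrates out a single nuisance parameter (a true log-energy) on a grid. The Gaussian energy dispersion, the narrow 'line' prior on the true value, the grid range and all function names are toy assumptions chosen so that the result can be checked analytically; they do not represent real IRFs or the `GammaBayes` implementation.

```python
import math

def gaussian_pdf(x, mean, sigma):
    """Normalised Gaussian density."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# Grid over the true log10-energy (the nuisance parameter); hypothetical range.
n_grid = 601
true_grid = [-1.0 + 3.0 * k / (n_grid - 1) for k in range(n_grid)]
step = true_grid[1] - true_grid[0]

def true_value_prior(log_e_true, line_position, line_width=0.05):
    """Toy prior on the true value given a model parameter: a narrow spectral line."""
    return gaussian_pdf(log_e_true, line_position, line_width)

def irf_likelihood(log_e_measured, log_e_true, resolution=0.1):
    """Toy p(d | d^t): Gaussian energy dispersion around the true value."""
    return gaussian_pdf(log_e_measured, log_e_true, resolution)

def marginalised_likelihood(log_e_measured, line_position):
    """Riemann-sum approximation of the integral over the true value."""
    return step * sum(
        irf_likelihood(log_e_measured, t) * true_value_prior(t, line_position)
        for t in true_grid
    )

# Likelihood of measuring log10(E) = 0.7 when the line sits at log10(E) = 0.5.
value = marginalised_likelihood(0.7, 0.5)

# For two Gaussians the integral is known in closed form: a Gaussian whose
# width is the two widths added in quadrature.
analytic = gaussian_pdf(0.7, 0.5, math.hypot(0.05, 0.1))
```

After the integral, `value` depends only on the measured value and the model parameter (here the line position); the true energy no longer appears.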

The process of marginalisation is ef