# KiDS-GGL available priors

Since `v2.0.0`, several priors are available. These are all defined in `kids_ggl_pipeline/sampling/priors.py`:

 * `exp`: Exponential prior, $f=\exp(-x)$. (Note: location and scale, as in [`scipy.stats`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.expon.html), are not yet implemented.)
 * `jeffreys`: [Jeffreys prior](https://en.wikipedia.org/wiki/Jeffreys_prior), $f=1/x$, typically used for scatter (more generally, for *scale estimators*).
 * `lognormal`: Lognormal probability, $f = (1/x)(2\pi\sigma^2)^{-1/2}\exp\left[-(\log(x)-x_0)^2/(2\sigma^2)\right]$
 * `normal`: Normal (Gaussian) probability, $f = (2\pi\sigma^2)^{-1/2}\exp\left[-(x-x_0)^2/(2\sigma^2)\right]$
 * `student`: [Student's t distribution](https://en.wikipedia.org/wiki/Student%27s_t-distribution), appropriate for slopes (when used with 1 degree of freedom). Exercise: see why this works and a uniform prior does not!
 * `uniform`: Uniform distribution.
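
The normalized normal and lognormal densities can be written directly with NumPy and compared against `scipy.stats`. This is a minimal sketch, not the code in `priors.py`: the function names are illustrative, and it assumes that $\log$ is the natural logarithm and that $x_0$ and $\sigma$ are the mean and standard deviation of $\log(x)$ in the lognormal case.

```python
import numpy as np
from scipy import stats


def normal_pdf(x, x0, sigma):
    """Gaussian density with mean x0 and standard deviation sigma."""
    norm = np.sqrt(2 * np.pi * sigma**2)
    return np.exp(-(x - x0)**2 / (2 * sigma**2)) / norm


def lognormal_pdf(x, x0, sigma):
    """Lognormal density; x0 and sigma describe log(x) (assumed convention)."""
    norm = x * np.sqrt(2 * np.pi * sigma**2)
    return np.exp(-(np.log(x) - x0)**2 / (2 * sigma**2)) / norm


x = np.linspace(0.1, 5, 50)
x0, sigma = 0.5, 0.8

# scipy parametrizes the lognormal with s=sigma and scale=exp(x0)
normal_ok = np.allclose(
    normal_pdf(x, x0, sigma), stats.norm.pdf(x, loc=x0, scale=sigma))
lognormal_ok = np.allclose(
    lognormal_pdf(x, x0, sigma),
    stats.lognorm.pdf(x, s=sigma, scale=np.exp(x0)))
print(normal_ok, lognormal_ok)
```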

Three additional "priors" for fixed parameters (i.e., that are not allowed to vary in the MCMC) are available:

 * `fixed`: Fixed scalar
 * `array`: Array of fixed scalars
 * `read`: Array of fixed scalars, read from a file

Finally, there is a "prior" that signals that a parameter has already been defined elsewhere in the configuration file (see the [configuration](configuration.ipynb) help page): `repeat`.

They are all defined in the configuration file as

```
param_name    prior    [arg1    [arg2]]    [lower    upper]    [starting]
```

but each prior takes different arguments. The notation follows the Unix convention that values in brackets are optional; if a set of brackets contains more than one value, those values must be specified together or not at all. The values taken by each of the available priors are:

```
param_name            exp          [lower    upper]    [starting]
param_name            jeffreys     [lower    upper]    [starting]
param_name            lognormal    centre     scale    [lower    upper]     [starting]
param_name            normal       centre     scale    [lower    upper]     [starting]
param_name            student      dof    [lower    upper]     [starting]
param_name            uniform      lower    upper    [starting]
param_name            fixed        value
param_name            array        value1,value2,value3,...
param_name            read         file    column(s)
section.param_name
```

where `lower` and `upper` are lower and upper bounds of the allowed range for the prior. For instance, a mass might have a normal prior $2\pm1$, but it cannot physically go below zero. In this case, you'd want `lower=0`. If not provided, the default limits are as follows:

```
exp: [-10, 10]              # cumulative probability outside this range ~ 2e-9
jeffreys: [1e-10, 100]
lognormal: 10 sigma
normal: 10 sigma
student: [-1e6, 1e6]        # cumulative probability outside this range ~ 3e-7
```
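
The tail probabilities quoted in the comments can be reproduced with `scipy.stats` under one particular reading, which is an assumption rather than something stated in the pipeline: the `student` figure is the one-sided tail beyond `1e6` for `dof=1`, and the `exp` figure is the tail of a unit exponential beyond 20 scale lengths (the width of the default range). The two-sided probability beyond 10 sigma for a normal prior is included for comparison.

```python
from scipy import stats

# One-sided tail of a Student's t distribution with 1 degree of freedom
student_tail = stats.t.sf(1e6, df=1)

# Tail of a unit exponential beyond 20 scale lengths
# (assumption: 20 is the width of the default [-10, 10] range)
exp_tail = stats.expon.sf(20)

# Two-sided probability outside 10 sigma for a normal distribution
normal_tail = 2 * stats.norm.sf(10)

print(f"student: {student_tail:.1e}")
print(f"exp:     {exp_tail:.1e}")
print(f"normal:  {normal_tail:.1e}")
```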

Note that if you set a lower bound, you *must* also set an upper bound, for the pipeline to interpret values correctly; you may set the lower bound to `-inf` and the upper bound to `inf` if you do not wish to set bounds at all (but google "improper Bayesian priors").
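
The requirement that both bounds be given together can be seen from how the trailing values might be interpreted. The sketch below is a hypothetical helper, not the pipeline's parser: it assumes that one trailing value is read as `starting`, two as `lower upper`, and three as `lower upper starting`. The `uniform` prior is left out because its bounds are mandatory.

```python
# Number of mandatory arguments per prior, following the table above
N_REQUIRED = {'exp': 0, 'jeffreys': 0, 'lognormal': 2,
              'normal': 2, 'student': 1}


def split_values(prior, values):
    """Hypothetical helper: split the values that follow a prior name
    into (args, bounds, starting)."""
    n = N_REQUIRED[prior]
    args = [float(v) for v in values[:n]]
    rest = [float(v) for v in values[n:]]  # float() also accepts 'inf'
    if len(rest) == 0:
        return args, None, None
    if len(rest) == 1:
        # a lone value cannot be a bound, so it is taken as `starting`
        return args, None, rest[0]
    if len(rest) == 2:
        return args, (rest[0], rest[1]), None
    if len(rest) == 3:
        return args, (rest[0], rest[1]), rest[2]
    raise ValueError(f'too many values for prior {prior!r}')


print(split_values('normal', '2 1 0 12'.split()))
print(split_values('normal', '2 1 1.5'.split()))
print(split_values('student', '1 -inf inf 0.5'.split()))
```

Under this reading, a line such as `normal 2 1 0` would silently use `0` as the starting value rather than as a lower bound, which is why the two bounds must always appear together.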

`starting` is the starting point for each parameter in the MCMC chain. If not provided, the starting point is calculated as a random number generated from the prior within the specified range.
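
As an illustration of drawing a starting point from a prior within its allowed range, the sketch below uses `scipy.stats.truncnorm` for the mass example above (a normal prior $2\pm1$ with `lower=0`). It assumes the upper bound is the default 10 sigma above the centre; how the pipeline itself draws the value is not shown here.

```python
import numpy as np
from scipy import stats

centre, scale = 2.0, 1.0
lower = 0.0
upper = centre + 10 * scale  # assumed default: 10 sigma above the centre

# truncnorm expects the bounds in units of `scale`, relative to `loc`
a = (lower - centre) / scale
b = (upper - centre) / scale

rng = np.random.default_rng(42)
starting = stats.truncnorm.rvs(
    a, b, loc=centre, scale=scale, size=5, random_state=rng)
print(starting)
```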

The `read` prior takes a filename (relative path from the working directory) and a list of comma-separated columns to be read from that file.
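
Reading selected columns from a text file can be sketched with `numpy.loadtxt`. This is only an illustration: the in-memory table stands in for a file on disk, and the zero-based column numbering is an assumption that may not match the pipeline's convention.

```python
import io

import numpy as np

# Stand-in for a whitespace-separated file in the working directory
table = io.StringIO(
    "0.1 1.0 10.0\n"
    "0.2 2.0 20.0\n"
    "0.3 3.0 30.0\n"
)

columns = "0,2"  # comma-separated columns, as they might appear in the config
usecols = [int(c) for c in columns.split(',')]

# unpack=True returns one array per requested column
values = np.loadtxt(table, usecols=usecols, unpack=True)
print(values)
```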

The last entry above is a `repeat` "prior", which means that the parameter in question is the same parameter as `param_name` defined in `section`. This is useful to avoid errors from repetition, but most useful when a single free parameter enters more than one section (for instance, if the mass-observable relation for satellites is a scaled version of that for centrals).

Some of these priors are set in `demo/ggl_model_demo.txt` for illustration.

**Notes**
 * The `exp` and `jeffreys` distributions take no free parameters. For them to make sense, the variable in question must be relatively small (say, typically ~0-10).
 * As mentioned before, we recommend the `student` prior to be used with the slope of a line, setting `dof=1`.
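
One common argument for this choice is that a prior uniform in the slope $m$ strongly favours steep lines, whereas a prior uniform in the angle $\theta=\arctan(m)$ does not. Changing variables from $\theta$ to $m$ gives $p(m) = 1/[\pi(1+m^2)]$, which is the Student's t distribution with 1 degree of freedom (the Cauchy distribution). The sketch below checks this numerically; it illustrates the argument and is not code from the pipeline.

```python
import numpy as np
from scipy import stats

m = np.linspace(-50, 50, 201)

# Density in slope implied by a uniform prior on the angle theta = arctan(m)
uniform_in_angle = 1 / (np.pi * (1 + m**2))

# Identical to a Student's t density with 1 degree of freedom
same = np.allclose(uniform_in_angle, stats.t.pdf(m, df=1))

# Half of the prior mass lies at |m| < 1, i.e. angles within 45 degrees
frac_shallow = stats.t.cdf(1, df=1) - stats.t.cdf(-1, df=1)
print(same, frac_shallow)
```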

In [21]:
%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
from scipy.integrate import quad

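# Jeffreys prior, f = 1/x, and its derivative
x = np.logspace(-10, 10, 1000)
f = lambda x: 1/x
dfdx = lambda x: -1/x**2

# Integrate the unnormalized prior. Both integrals diverge logarithmically
# at x = 0 (and the second one also at infinity), so quad emits an
# IntegrationWarning and the returned values are not meaningful.
integrand = lambda x: f(x)
cdf = quad(integrand, 0, 10)
total = quad(integrand, 0, np.inf)
cdf, total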

  If increasing the limit yields no improvement it is advised to analyze 
  the integrand in order to determine the difficulties.  If the position of a 
  local difficulty can be determined (singularity, discontinuity) one will 
  probably gain from splitting up the interval and calling the integrator 
  on the subranges.  Perhaps a special-purpose integrator should be used.


((41.67684067538799, 9.350560373140489),
 (48.720960971461565, 16.30167063049395))
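
The $1/x$ density cannot be normalized over $(0, \infty)$, which is why finite limits are needed. With the default `jeffreys` limits quoted above, `[1e-10, 100]`, the normalized density is $f = 1/[x\ln(b/a)]$, which `scipy.stats.loguniform` implements. The following is a minimal sketch under that assumption, not the pipeline's implementation.

```python
import numpy as np
from scipy import stats

a, b = 1e-10, 100  # default jeffreys limits quoted above

# Normalization of 1/x over [a, b]
norm = np.log(b / a)

# loguniform has density 1 / (x * log(b / a)) on [a, b]
jeffreys = stats.loguniform(a, b)

pdf_ok = np.isclose(jeffreys.pdf(1.0), 1 / norm)
cdf_at_10 = jeffreys.cdf(10)  # log(10 / a) / log(b / a)
print(norm, pdf_ok, cdf_at_10)
```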
