# FIGARO: an introductory guide

This notebook shows how to use FIGARO, *Fast Inference for GW Astronomy, Research & Observations*.

## 1D probability density

We will start from a simple problem: inferring a 1D probability density given a set of samples drawn from it.
Let's draw some samples from a Gaussian distribution.

In [None]:
import numpy as np
from scipy.stats import norm, uniform
import matplotlib.pyplot as plt
from tqdm import tqdm

mu = 30
sigma = 3
n_samps = 1000
dist = norm(mu, sigma) 

samples = dist.rvs(n_samps)

n, b, p = plt.hist(samples, bins = int(np.sqrt(len(samples))), histtype = 'step', density = True)

FIGARO contains a class designed to infer probability densities given a set of samples.

In order to instantiate the class, we need to specify the boundaries of the distribution.
We will assume that our probability density is bounded between 10 and 50.

In [None]:
from figaro.mixture import DPGMM

x_min = 10
x_max = 50

mix = DPGMM([[x_min, x_max]])

Please note that the boundaries must be passed as a 2D array, with one `[min, max]` pair per dimension. This ensures that the same syntax holds for multidimensional distributions too.
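As a quick illustration of the expected shape, here is a minimal check. `as_bounds_array` is a hypothetical helper written for this notebook, not part of FIGARO, and FIGARO's own input validation may differ.

```python
import numpy as np

def as_bounds_array(bounds):
    """Hypothetical helper: convert bounds to a (D, 2) float array and sanity-check them."""
    arr = np.asarray(bounds, dtype=float)
    if arr.ndim != 2 or arr.shape[1] != 2:
        raise ValueError("bounds must have shape (D, 2): one [min, max] pair per dimension")
    if np.any(arr[:, 0] >= arr[:, 1]):
        raise ValueError("each lower bound must be smaller than the upper bound")
    return arr

print(as_bounds_array([[10, 50]]).shape)          # (1, 2): 1D density, one pair
print(as_bounds_array([[-5, 5], [-5, 5]]).shape)  # (2, 2): 2D density, one pair per dimension
```

Note that a flat list such as `[10, 50]` is rejected by this check, which mirrors why the 1D example above uses `[[x_min, x_max]]`.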

The idea is that the algorithm *learns* the shape of the probability density from the available samples, one at a time: every new sample adds a piece of information to the inference. Therefore, we need to pass the samples to our mixture one at a time in order to draw a single realisation of the Dirichlet Process.
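The one-at-a-time structure comes from the Dirichlet Process prior. As a rough illustration, the sketch below draws cluster labels sequentially from a Chinese Restaurant Process with concentration `alpha`. It shows only the prior part: in a DPGMM each probability would additionally be weighted by how well the new point fits each Gaussian component, and this is not FIGARO's implementation.

```python
import numpy as np

def crp_assignments(n_points, alpha, rng=None):
    """Cluster labels drawn one point at a time from a Chinese Restaurant Process prior."""
    rng = np.random.default_rng(rng)
    counts = []  # number of points already assigned to each cluster
    labels = []
    for i in range(n_points):
        # existing cluster j: counts[j] / (i + alpha); new cluster: alpha / (i + alpha)
        probs = np.array(counts + [alpha], dtype=float) / (i + alpha)
        j = rng.choice(len(probs), p=probs)
        if j == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[j] += 1
        labels.append(j)
    return np.array(labels), np.array(counts)

crp_labels, crp_counts = crp_assignments(1000, alpha=1.0, rng=42)
print(len(crp_counts), crp_counts.max())  # number of clusters and size of the largest one
```

Each new point either joins an existing cluster, with probability proportional to its size, or opens a new one, so the number of clusters grows slowly with the number of points.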

In [None]:
for s in tqdm(samples):
    mix.add_new_point(s)

Now that our mixture knows the shape of the distribution, we can build the probability density:

In [None]:
rec = mix.build_mixture()

Before starting a new inference, the mixture must be initialised; otherwise it will remember the samples from the previous run.\
**Please note:** from now on, the mixture we just inferred is stored in `rec`. Calling any of the following methods on the now-empty mixture `mix` will raise an exception.

In [None]:
mix.initialise()

Let's have a look at this reconstruction. `rec` contains the realisation we just drew, together with some useful methods.

In [None]:
[method_name for method_name in dir(rec)
                  if callable(getattr(rec, method_name)) and not method_name.startswith('_')]

`pdf` and `logpdf` take a 1D or 2D array and return, respectively, the probability density and the log-probability density of the inferred distribution, while `rvs` takes the number of desired samples and returns an array of draws. `cdf` and `logcdf` are the cumulative distribution function and its logarithm; these, however, are defined only for 1D distributions.
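Since `cdf` should be the integral of `pdf`, one simple consistency check for a 1D reconstruction is to integrate `pdf` numerically and compare. The sketch below uses scipy's `norm` as a stand-in; with `rec` one would pass `rec.pdf` and compare against `rec.cdf`, assuming both accept and return 1D arrays for 1D input. The grid is kept strictly inside the bounds, for the reason explained in the warning below.

```python
import numpy as np
from scipy.stats import norm

def cdf_from_pdf(pdf, grid):
    """Cumulative trapezoidal integral of pdf on grid, equal to 0 at grid[0]."""
    values = pdf(grid)
    increments = 0.5 * (values[1:] + values[:-1]) * np.diff(grid)
    return np.concatenate([[0.0], np.cumsum(increments)])

stand_in = norm(30, 3)  # placeholder for a 1D reconstruction exposing .pdf and .cdf
grid = np.linspace(11, 49, 1000)
diff = cdf_from_pdf(stand_in.pdf, grid) - (stand_in.cdf(grid) - stand_in.cdf(grid[0]))
print(np.abs(diff).max())  # small: only the discretisation error remains
```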

We now want to evaluate the probability density over the interval $[x_{min},x_{max}]$.   
**WARNING: FIGARO uses a coordinate change that is singular at the boundaries. Be careful not to evaluate the mixture on or outside the boundaries: doing so will result in infs or NaNs.**
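To see why, here is a sketch of a probit-type map from $(x_{min}, x_{max})$ to the real line. We assume the form $y = \Phi^{-1}\big((x-x_{min})/(x_{max}-x_{min})\big)$, with $\Phi$ the standard normal CDF; FIGARO's exact transformation may include further details.

```python
import numpy as np
from scipy.special import ndtri  # inverse of the standard normal CDF

def to_probit(x, x_min, x_max):
    """Assumed probit map from (x_min, x_max) to the real line."""
    u = (np.asarray(x, dtype=float) - x_min) / (x_max - x_min)  # rescale to [0, 1]
    return ndtri(u)

print(to_probit([10, 30, 50], 10, 50))  # the two boundaries map to -inf and +inf
grid = np.linspace(10, 50, 1002)[1:-1]  # same interior grid as in the next cell
print(np.isfinite(to_probit(grid, 10, 50)).all())  # True
```

Under this assumed map, a density evaluated back in natural space also picks up the Jacobian $\mathrm{d}y/\mathrm{d}x = 1/\big[(x_{max}-x_{min})\,\phi(y)\big]$, which diverges at both boundaries: this is where the infs and NaNs come from, and why the next cell drops the two endpoints of the grid.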

In [None]:
x = np.linspace(x_min, x_max, 1002)[1:-1]
p = rec.pdf(x)

Let's compare the reconstruction with the samples and with the true distribution:

In [None]:
n, b, t = plt.hist(samples, bins = int(np.sqrt(len(samples))), histtype = 'step', density = True, label = 'Samples')
plt.plot(x, dist.pdf(x), color = 'red', lw = 0.7, label = 'Simulated')
plt.plot(x, p, color = 'forestgreen', label = 'DPGMM')
plt.legend(loc = 0, frameon = False)
plt.grid(alpha = 0.6)

This is a *single* realisation from the Dirichlet Process. In order to properly explore the distribution space, we need a set of draws: therefore, we need to repeat the training of the DPGMM for every new draw we want.

The DPGMM class contains a method that is a wrapper for the `for` loop we wrote before, `DPGMM.density_from_samples()`, which returns a realisation from the DP.

In [None]:
n_draws = 100
draws = np.array([mix.density_from_samples(samples) for _ in tqdm(range(n_draws))])

Each call to `density_from_samples` reshuffles the samples and automatically initialises the mixture at the end.
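Conceptually, the wrapper behaves like the following sketch. This is a hypothetical re-implementation built only from the methods used above (`add_new_point`, `build_mixture`, `initialise`), not FIGARO's actual source; in particular, shuffling with a NumPy `Generator` is our assumption.

```python
import numpy as np

def density_from_samples_sketch(mixture, samples, rng=None):
    """Hypothetical equivalent of DPGMM.density_from_samples (not FIGARO's source)."""
    rng = np.random.default_rng(rng)
    # the sequential update depends on the order of the samples, so shuffle them
    for s in rng.permutation(samples):
        mixture.add_new_point(s)
    realisation = mixture.build_mixture()
    mixture.initialise()  # forget the samples so the next call starts from scratch
    return realisation
```

With the `mix` object above, `density_from_samples_sketch(mix, samples)` would play the role of `mix.density_from_samples(samples)`: different shuffles lead to different realisations, which is what makes repeated calls explore the distribution space.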

With the set of draws we have, we can compute median and credible regions for the probability distribution.
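A note on the normalisation step in the next cell: a pointwise percentile of a set of normalised densities is not, in general, normalised itself. The cell therefore divides every band by the integral of the median, estimated with a Riemann sum on the evaluation grid $\{x_j\}$ with spacing $\Delta x$:

$$
N = \int_{x_{min}}^{x_{max}} p_{50}(x)\,\mathrm{d}x \approx \sum_j p_{50}(x_j)\,\Delta x, \qquad \tilde{p}_q(x) = \frac{p_q(x)}{N}, \quad q \in \{5, 16, 50, 84, 95\}
$$

Using the same $N$ for all percentiles keeps the credible bands consistent with the normalised median.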

In [None]:
probs = np.array([d.pdf(x) for d in draws])

percentiles = [50, 5, 16, 84, 95]
p = {}
for perc in percentiles:
    p[perc] = np.percentile(probs, perc, axis = 0)
N = p[50].sum()*(x[1]-x[0])
for perc in percentiles:
    p[perc] = p[perc]/N

n, b, t = plt.hist(samples, bins = int(np.sqrt(len(samples))), histtype = 'step', density = True, label = 'Samples')
plt.fill_between(x, p[95], p[5], color = 'mediumturquoise', alpha = 0.5)
plt.fill_between(x, p[84], p[16], color = 'darkturquoise', alpha = 0.5)
plt.plot(x, dist.pdf(x), color = 'red', lw = 0.7, label = 'Simulated')
plt.plot(x, p[50], color = 'steelblue', label = 'DPGMM')
plt.legend(loc = 0, frameon = False)
plt.grid(alpha = 0.6)   

The same plot can be obtained with the dedicated method:

In [None]:
from figaro.utils import plot_median_cr
plot_median_cr(draws,
               injected = dist.pdf,
               samples  = samples,
               save     = False,
               show     = True,
               )

The draws are uncorrelated with each other, as the autocorrelation function shows:
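For reference, here is a generic estimator of the normalised autocorrelation of a scalar sequence. Applying it to a scalar summary of each draw (for instance, each draw's pdf evaluated at a fixed point) is one possible choice; we do not claim it matches how `figaro.diagnostic.autocorrelation` defines its diagnostic.

```python
import numpy as np

def autocorrelation_1d(series, max_lag=None):
    """Normalised autocorrelation of a scalar sequence (biased estimator, acf[0] = 1)."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    n = len(x)
    max_lag = n - 1 if max_lag is None else max_lag
    var = np.dot(x, x) / n
    return np.array([np.dot(x[:n - lag], x[lag:]) / (n * var) for lag in range(max_lag + 1)])

# white noise: autocorrelation close to 0 at every lag > 0
noise = np.random.default_rng(0).standard_normal(5000)
print(np.round(autocorrelation_1d(noise, max_lag=5), 3))
```

For independent draws the values at lags greater than zero scatter around zero with a spread of roughly $1/\sqrt{n}$.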

In [None]:
from figaro.diagnostic import autocorrelation
acf = autocorrelation(draws, bounds = [20, 40], save = False, show = True)

Let's look at the entropy to assess the convergence of the recovered distribution to the injected one.\
In order to do so, we need to build a single realisation incrementally, saving the intermediate mixture every time we add a new sample.

In [None]:
mix.initialise()
updated_mixture = []

for s in tqdm(samples):
    mix.add_new_point(s)
    updated_mixture.append(mix.build_mixture())

Once we have the full history of how this single realisation has been built, we can use the dedicated FIGARO method to produce entropy plots:
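For intuition about the quantity being tracked: the differential entropy of a density $p$ is $S = -\int p(x)\log p(x)\,\mathrm{d}x = -\mathbb{E}_p[\log p(x)]$, which can be estimated by Monte Carlo from draws of $p$ itself. The sketch below uses scipy's `norm` as a stand-in for a reconstruction exposing `rvs` and `logpdf`; whether FIGARO's `entropy` uses this estimator or a different numerical integral is not stated here.

```python
import numpy as np
from scipy.stats import norm

def mc_entropy(logpdf, rvs, n=10_000):
    """Monte Carlo estimate of the differential entropy S = -E_p[log p(x)], in nats."""
    draws = rvs(n)
    return -np.mean(logpdf(draws))

stand_in = norm(30, 3)  # any object with rvs(n) and logpdf(x) would do
print(mc_entropy(stand_in.logpdf, stand_in.rvs))
print(0.5 * np.log(2 * np.pi * np.e * 3**2))  # exact Gaussian entropy, for comparison
```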

In [None]:
from figaro.diagnostic import entropy

S = entropy(updated_mixture, show = True, save = False)

It is also possible to compute an approximation of the entropy derivative (the *angular coefficient*, i.e. the slope) to assess whether the distribution has converged.
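One simple way to approximate the derivative of a noisy sequence $S_i$ is the slope of a straight-line fit over a sliding window of consecutive samples. This is a generic sketch: the window length is an arbitrary choice, and `plot_angular_coefficient` may compute its approximant differently.

```python
import numpy as np

def sliding_slope(S, window=100):
    """Slope of a least-squares line fitted to each window of consecutive values of S."""
    S = np.asarray(S, dtype=float)
    t = np.arange(window)
    return np.array([np.polyfit(t, S[i:i + window], 1)[0]
                     for i in range(len(S) - window + 1)])

# a sequence that rises and then plateaus: the slope drops to ~0 on the plateau
S_demo = np.minimum(np.arange(1000) / 500, 1.0)
slopes = sliding_slope(S_demo, window=100)
print(slopes[0], slopes[-1])  # ~0.002, then ~0
```

Longer windows smooth out the fluctuations of $S$ but react more slowly when the entropy starts to flatten.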

In [None]:
from figaro.diagnostic import plot_angular_coefficient

ac = plot_angular_coefficient(S, show = True, save = False)

When the number of accumulated samples is large enough to provide a good representation of the underlying distribution, the entropy reaches a plateau and its derivative fluctuates around zero.\
Let's repeat the exercise with a larger number of samples:

In [None]:
n_samps = 5000
samples = dist.rvs(n_samps)

mix.initialise()
updated_mixture = []

for s in tqdm(samples):
    mix.add_new_point(s)
    updated_mixture.append(mix.build_mixture())

Let's look at the recovered distribution:

In [None]:
plot_median_cr([updated_mixture[-1]],
               injected = dist.pdf,
               samples  = samples,
               save     = False,
               show     = True
               )

Entropy and angular coefficient:

In [None]:
S  = entropy(updated_mixture, show = True, save = False)
ac = plot_angular_coefficient(S, show = True, save = False, ac_expected = 0)

With this number of samples, the angular coefficient starts fluctuating around 0 after ~3000 samples.

## Setting prior parameters

The prior distribution for means and covariances is the Normal-Inverse-Wishart distribution, which requires 4 parameters:
* $\nu$ is the number of degrees of freedom of the Inverse Wishart distribution. It must be greater than $D+1$, where $D$ is the dimensionality of the distribution;
* $k$ is the scale parameter for the multivariate Normal distribution;
* $\mu$ is the mean of the multivariate Normal distribution;
* $\Lambda$ is the expected value for the Inverse Wishart distribution, a covariance matrix.
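To make the role of the four parameters concrete, here is a NumPy sketch that draws $(\mu, \Sigma)$ pairs from a Normal-Inverse-Wishart. It assumes the parameterisation suggested by the list above: the Inverse Wishart scale matrix is chosen as $\Lambda(\nu - D - 1)$ so that $\mathbb{E}[\Sigma] = \Lambda$ (which is why $\nu > D+1$ is needed), and the mean is drawn as $\mathcal{N}(\mu_0, \Sigma/k)$, where $\mu_0$ is the location parameter called $\mu$ in the list. It works in whatever space the parameters live in (for FIGARO, the probit space) and is not FIGARO's internal code.

```python
import numpy as np

def sample_niw(mu0, Lambda, nu, k, n_draws, rng=None):
    """Draw (mean, covariance) pairs from a Normal-Inverse-Wishart prior.

    Assumed parameterisation: E[Sigma] = Lambda, mean | Sigma ~ N(mu0, Sigma / k).
    """
    rng = np.random.default_rng(rng)
    mu0 = np.atleast_1d(np.asarray(mu0, dtype=float))
    Lambda = np.atleast_2d(np.asarray(Lambda, dtype=float))
    D = len(mu0)
    nu = int(nu)
    if nu <= D + 1:
        raise ValueError("nu must be greater than D + 1 for E[Sigma] to exist")
    # Inverse Wishart scale matrix chosen so that E[Sigma] = Psi / (nu - D - 1) = Lambda
    Psi = Lambda * (nu - D - 1)
    # Sigma^{-1} ~ Wishart(nu, Psi^{-1}): sum of nu outer products of N(0, Psi^{-1}) vectors
    chol_prec = np.linalg.cholesky(np.linalg.inv(Psi))
    X = rng.standard_normal((n_draws, nu, D)) @ chol_prec.T
    precision = np.einsum('nvi,nvj->nij', X, X)
    Sigma = np.linalg.inv(precision)
    # mean | Sigma ~ N(mu0, Sigma / k): small k means widely spread means
    chol_cov = np.linalg.cholesky(Sigma / k)
    means = mu0 + np.einsum('nij,nj->ni', chol_cov, rng.standard_normal((n_draws, D)))
    return means, Sigma

means, covs = sample_niw(mu0=[0.0], Lambda=[[1.0]], nu=10, k=1e-2, n_draws=5, rng=0)
print(means.ravel(), covs.ravel())
```

In this parameterisation, smaller $k$ spreads the component means more widely around $\mu_0$, while larger $\nu$ concentrates the covariances around $\Lambda$.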

Setting these priors is nontrivial, because FIGARO applies a coordinate change in a completely user-transparent way and these distributions are defined in the transformed space.
We suggest using the `figaro.utils.get_priors` method, which provides an easy way to get the right parameters for instantiating `figaro.mixture.DPGMM/HDPGMM` given the desired values in natural space.

The following list describes the arguments that can be passed to `get_priors` and their effect on the parameters:

* `bounds` specifies the boundaries of the interval over which the reconstructed density will be defined, as when instantiating the `DPGMM` class. It is the only mandatory argument;
* `samples` contains the samples that will be used to reconstruct the probability density. They can be used to compute $\mu$ and $\Lambda$ if specific keyword arguments are not provided;
* `mean` is the expected value for $\mu$ in natural space, must be a $(D,)$-shaped array. If provided, it overrides `samples`;
* `std` is the expected standard deviation for each dimension. It can be passed as a 1D array with shape ($D$,) or as a scalar `float` (in which case the same std is used for all dimensions). If provided, it overrides `samples` in computing $\Lambda$;
* `cov` is the expected covariance matrix. It must be passed as 2D array with shape ($D$,$D$). If provided, overrides both `std` and `samples` in computing $\Lambda$;
* `df` corresponds to $\nu$ and must be an integer value. It must be greater than $D+1$, otherwise the default value will be used;
* `k` is the Normal scale parameter $k$ and it must be a positive `float`.
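The precedence rules above can be summarised with two hypothetical helpers (`prior_mean_natural` and `prior_cov_natural` are names made up for this notebook, not FIGARO functions). They only select the natural-space values; the conversion to probit space that `get_priors` performs afterwards is not reproduced here. When no information is given, the mean falls back to the centre of the interval, as stated in the list of defaults further on, while for $\Lambda$ the sketch returns `None` to stand for FIGARO's own default.

```python
import numpy as np

def prior_mean_natural(bounds, samples=None, mean=None):
    """Hypothetical helper: natural-space prior mean, precedence mean > samples > default."""
    bounds = np.atleast_2d(np.asarray(bounds, dtype=float))
    D = len(bounds)
    if mean is not None:
        return np.atleast_1d(np.asarray(mean, dtype=float))
    if samples is not None:
        return np.asarray(samples, dtype=float).reshape(-1, D).mean(axis=0)
    return bounds.mean(axis=1)  # default: centre of the ND interval

def prior_cov_natural(D, samples=None, std=None, cov=None):
    """Hypothetical helper: natural-space covariance, precedence cov > std > samples."""
    if cov is not None:
        return np.atleast_2d(np.asarray(cov, dtype=float))
    if std is not None:
        # a scalar std is broadcast to every dimension
        std = np.broadcast_to(np.asarray(std, dtype=float), (D,))
        return np.diag(std ** 2)
    if samples is not None:
        return np.atleast_2d(np.cov(np.asarray(samples, dtype=float).reshape(-1, D), rowvar=False))
    return None  # no information given: FIGARO's own default applies

print(prior_cov_natural(2, std=0.5))  # scalar std -> diag(0.25, 0.25)
```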

With the exception of `bounds`, all the arguments are optional. Moreover, the user may decide to call `get_priors` with only some of them: the method will return default values for the others.

The following list contains the default values for the prior parameters, along with some hints on how to set them for a sensible run. Keep in mind that, since the NIW is a prior distribution, most of the information comes from the data themselves:

* $\mu$ is set by default to the center of the ND interval, and in most cases it is fine to leave it unchanged. We suggest setting it to the samples mean (by passing the available samples via the `samples` keyword argument) when reconstructing the single-event posterior distributions of a hierarchical inference: this is because the interval over which the hierarchical distribution is defined can be wider than the support of the single-event posterior distribution, and the samples might be located far from the interval center;
* $\Lambda$ is by default set to a diagonal matrix. The default value is a good choice in most cases (samples spanning widely over the ND interval, with a target distribution expected to have only blunt features). If, on the other hand, the distribution is expected to show sharp features, like a relatively narrow Gaussian peak, we suggest using the `std` argument and setting it to something below the expected width of the feature. Finally, as above, for single-event posteriors in a hierarchical inference we suggest using the `samples` keyword argument and deriving this parameter from them;
* $k$ controls the width of the Normal distribution. The default value is $10^{-2}$, which is a good choice in most cases. In general, we suggest $10^{-4} \lesssim k \lesssim 10^{-1}$;
* $\nu$ controls the width of the Inverse Wishart distribution. It must be a positive integer and at least $D+2$, which is the default value. It can be interpreted as the *strength* of the prior on the covariance matrix: greater values of $\nu$ correspond to giving more importance to the prior with respect to the likelihood (the samples). For most applications the default value is a good choice. We suggest increasing its value in those situations in which the samples are not available a priori (e.g. online reconstruction of probability densities) and the target distribution is expected to have a support much smaller than the whole ND interval, like for skymaps. We found that a good choice for this situation could be $2(D+2)$.

`get_priors` returns a tuple which can be directly used to instantiate `DPGMM/HDPGMM`.\
We strongly recommend using this method to convert the prior parameters. For the brave user who is still willing to pass their own tuple, the order in which the parameters must be passed to FIGARO is $(k,\Lambda,\nu,\mu)$.\
Keep in mind that $\mu$ and $\Lambda$ must be in probit space: FIGARO is not able to distinguish between parameters in natural space or probit space, therefore no exception can be raised for this error.\
If the user passes their own tuple and forgets to convert the parameters first, the behaviour of the code might be unpredictable.

*Note:* A small fluctuation in $\Lambda$ between subsequent calls with the same arguments is expected, and it is due to the fact that transforming a covariance matrix into probit space is nontrivial. To simplify the process, we sample $10^4$ points from a multivariate Gaussian centered in $\mu$ with the given covariance or std (still in natural space), transform the samples into probit space and use the covariance of the transformed samples as $\Lambda$: hence the fluctuations.
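A sketch of the procedure described in the note, assuming the probit map $y = \Phi^{-1}\big((x-x_{min})/(x_{max}-x_{min})\big)$ in each dimension. Discarding the Gaussian draws that fall outside the bounds is our own assumption (the map is undefined there), and this is not the `get_priors` source. Running it with different seeds reproduces the small fluctuation mentioned above.

```python
import numpy as np
from scipy.special import ndtri  # inverse of the standard normal CDF

def probit_covariance(mean, cov, bounds, n_samples=10_000, rng=None):
    """Monte Carlo estimate of a natural-space Gaussian covariance mapped to probit space."""
    rng = np.random.default_rng(rng)
    bounds = np.atleast_2d(np.asarray(bounds, dtype=float))
    mean = np.atleast_1d(np.asarray(mean, dtype=float))
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    pts = rng.multivariate_normal(mean, cov, size=n_samples)
    # assumption: drop points outside the bounds, where the probit map is undefined
    inside = np.all((pts > bounds[:, 0]) & (pts < bounds[:, 1]), axis=1)
    u = (pts[inside] - bounds[:, 0]) / (bounds[:, 1] - bounds[:, 0])
    return np.atleast_2d(np.cov(ndtri(u), rowvar=False))

# two calls with different seeds give slightly different Lambda values
print(probit_covariance(1.0, [[0.25]], [[-5, 5]], rng=0))
print(probit_covariance(1.0, [[0.25]], [[-5, 5]], rng=1))
```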

The default values for $(k, \Lambda,\nu,\mu)$ are:

In [None]:
from figaro.utils import get_priors

bounds = [[-5,5]]

get_priors(bounds)

One can directly call this method while instantiating the `DPGMM` class:

In [None]:
mix = DPGMM(bounds, prior_pars = get_priors(bounds))

Priors from samples:

In [None]:
samples = norm(loc = -3, scale = 0.1).rvs(1000)

get_priors(bounds, samples)

User-defined parameter values:

In [None]:
get_priors(bounds, mean = 1, std = 0.5, df = 7)

FIGARO also works with multidimensional probability densities, as you will see in the following section. `get_priors` automatically adjusts the default parameters to the number of dimensions as well:

In [None]:
# 4-dimensional distribution
bounds = [[0,1] for _ in range(4)]

get_priors(bounds)

## Multidimensional probability density

Multidimensional probability densities can be inferred using the same functions.

Let's generate some data from a bivariate Gaussian distribution:

In [None]:
from scipy.stats import multivariate_normal as mn
from corner import corner

n_samps = 1000
samples = mn(np.zeros(2), np.identity(2)).rvs(n_samps)

c = corner(samples, color = 'coral', labels = ['$x$','$y$'], hist_kwargs={'density':True})

The only difference with the previous case is that the mixture needs to be instantiated specifying the bounds for both dimensions.

In [None]:
x_min = -5
x_max = 5
y_min = -5
y_max = 5

mix_2d = DPGMM([[x_min, x_max],[y_min, y_max]])

The inference runs exactly as before:

In [None]:
for s in tqdm(samples):
    mix_2d.add_new_point(s)
rec = mix_2d.build_mixture()
mix_2d.initialise()

Let's compare the initial samples with a set of samples drawn from the recovered distribution.

In [None]:
mix_samples = rec.rvs(n_samps)


c = corner(samples, color = 'coral', labels = ['$x$','$y$'], hist_kwargs={'density':True, 'label':r'$\mathrm{Samples}$'})
c = corner(mix_samples, fig = c, color = 'dodgerblue', labels = ['$x$','$y$'], hist_kwargs={'density':True, 'label':r'$\mathrm{DPGMM}$'})
l = plt.legend(loc = 0,frameon = False,fontsize = 15, bbox_to_anchor = (1-0.05, 1.8))

Multiple draws:

In [None]:
n_draws = 100
draws = []

for _ in tqdm(range(n_draws)):
    draws.append(mix_2d.density_from_samples(samples))

## Hierarchical inference

Let's assume we have a set of samples $\{x_1,\ldots,x_k\}$ from some probability density $F(x)$. Around each $x_i$, another process generated a set of samples $\mathbf{y}_i = \{y_1^i,\ldots,y_n^i\}$ according to some distribution $f_i(y|x_i)$.   
To give a bit of context, $\{x_1,\ldots,x_k\}$ could be the true masses of the black holes observed by LIGO and Virgo, drawn from the mass function $F(x)$, and each $\mathbf{y}_i$ could be the set of single-event primary mass posterior samples drawn from the posterior distribution $f_i(y|x_i)$.

In this section we'll see how to use FIGARO to infer $F(x)$ using $\{\mathbf{y}_1,\ldots,\mathbf{y}_k\}$.
In the following example, both $F(x)$ and $f_i(y|x_i)$ are Gaussian distributions.
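For reference, in the standard hierarchical formulation (neglecting selection effects, and assuming each single-event posterior $p_i(x)$ was obtained with a flat prior, so that it is proportional to the single-event likelihood), the posterior on $F$ factorises over the events. This textbook expression is given only to make the goal explicit; we do not claim it describes the internals of FIGARO's `HDPGMM`.

$$
p(F \mid \mathbf{y}_1,\ldots,\mathbf{y}_k) \propto p(F)\prod_{i=1}^{k}\int p_i(x)\,F(x)\,\mathrm{d}x
$$

Here $p_i(x)$ is the single-event posterior density reconstructed from $\mathbf{y}_i$, which is the role played by the DPGMM draws computed for each event in the next cells.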

In [None]:
mu = 30
sigma = 5
n_evs = 1000
n_post_samps = 100

mass_function = norm(mu, sigma)
true_masses = mass_function.rvs(n_evs)

single_event_posteriors = [norm(norm(M, s).rvs(), s).rvs(n_post_samps) for M, s in zip(true_masses, np.random.uniform(1,3, size = len(true_masses)))]

First of all, we need to reconstruct the $k$ probability densities $f_i$. For each $\mathbf{y}_i$, we can use the DPGMM class.
A proper analysis would require to draw multiple realisations for each posterior distribution. In this example, for the sake of time, we will draw only a handful of realisations for each event.

In [None]:
n_draws = 10
x_min = 1
x_max = 70
mix = DPGMM([[x_min, x_max]])

posteriors = []
for event in tqdm(single_event_posteriors, desc = 'Events'):
    draws = []
    for _ in range(n_draws):
        draws.append(mix.density_from_samples(event))
    posteriors.append(draws)

Once we have the single-event posterior reconstructions, we need the HDPGMM class:

In [None]:
from figaro.mixture import HDPGMM
hier_mix = HDPGMM([[x_min, x_max]])

The methods for this new class are the same as those we used before.

In [None]:
n_draws_hier = 100
hier_draws = []

for _ in tqdm(range(n_draws_hier)):
    hier_draws.append(hier_mix.density_from_samples(posteriors))

In the same fashion, we can plot the recovered distribution using the dedicated method:

In [None]:
plot_median_cr(hier_draws,
               samples  = true_masses,
               injected = mass_function.pdf,
               show     = True,
               hierarchical = True
               )