# Generating Mock Posterior Estimates

[![Open in GitHub](https://img.shields.io/badge/Open-GitHub-black?logo=github)][REPRODUCIBILITY_LINK]

## Introduction

In this tutorial, we will generate mock posterior estimates similar to those used in
{cite:p}`PhysRevD.110.063009` using predefined CLIs in GWKokab. The primary source mass
and mass ratio of binary black holes (BBHs) are jointly parameterized by the Powerlaw
Primary Mass Ratio model:

$$
p(m_1, q\mid \alpha,\beta) \propto
m_1^{-\alpha} q^{\beta} \quad \text{where} \quad
(m_1,q) \in [m_\mathrm{min}, m_\mathrm{max}]\times [m_\mathrm{min}/m_1, 1]
$$

The eccentricity of their orbits is parameterized by a Normal distribution located at
$0$ and truncated to $[0, 1]$:

$$
p(\epsilon \mid \sigma) =
\mathcal{N}_{[0,1]}(\epsilon \mid \mu=0, \sigma)
\quad \text{where} \quad \epsilon \in [0, 1]
$$

The joint distribution of $(m_1, q, \epsilon)$ scaled by the merger rate is given by:

$$
\rho(m_1, q, \epsilon \mid \mathcal{R}, \alpha, \beta, \sigma) =
\mathcal{R} \cdot p(m_1, q\mid \alpha,\beta) \cdot p(\epsilon \mid \sigma)
$$

````{admonition} Note
:class: note

The expression above is not a normalized probability density because of the merger rate
factor $\mathcal{R}$, which is why we write $\rho$ instead of $p$.
````
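
As a rough illustration of the expressions above, the sketch below evaluates the rate density in plain Python. The function names are ours and are not part of GWKokab's API. The mass term is deliberately left unnormalized (the proportionality constant is omitted), while the truncated Normal is normalized with the standard error-function formula.

```python
import math


def powerlaw_primary_mass_ratio_unnorm(m1, q, alpha, beta, mmin, mmax):
    """Unnormalized p(m1, q | alpha, beta); zero outside the support."""
    if not (mmin <= m1 <= mmax) or not (mmin / m1 <= q <= 1.0):
        return 0.0
    return m1 ** (-alpha) * q**beta


def _std_normal_cdf(x):
    """CDF of the standard Normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def truncated_normal_pdf(x, loc, scale, low, high):
    """Normal(loc, scale) density renormalized on [low, high]."""
    if not (low <= x <= high):
        return 0.0
    z = (x - loc) / scale
    pdf = math.exp(-0.5 * z * z) / (scale * math.sqrt(2.0 * math.pi))
    mass = _std_normal_cdf((high - loc) / scale) - _std_normal_cdf((low - loc) / scale)
    return pdf / mass


def rate_density_unnorm(m1, q, ecc, log_rate, alpha, beta, mmin, mmax, sigma):
    """rho(m1, q, ecc) up to the mass-model normalization constant."""
    mass_term = powerlaw_primary_mass_ratio_unnorm(m1, q, alpha, beta, mmin, mmax)
    ecc_term = truncated_normal_pdf(ecc, 0.0, sigma, 0.0, 1.0)
    return math.exp(log_rate) * mass_term * ecc_term


# Illustrative parameter values only.
print(rate_density_unnorm(20.0, 0.8, 0.05, 5.0, 2.35, 1.0, 5.0, 50.0, 0.1))
```

Points outside the support, such as a mass ratio below $m_\mathrm{min}/m_1$, evaluate to zero.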

GWKokab implements niche[^1] models as subclasses of
[`numpyro.distributions.Distribution`](https://num.pyro.ai/en/stable/distributions.html#numpyro.distributions.distribution.Distribution)
and imports common models directly from NumPyro. The Powerlaw Primary Mass Ratio model
is implemented as
[`PowerlawPrimaryMassRatio`](https://gwkokab.readthedocs.io/en/latest/autoapi/gwkokab/models/mass/index.html#gwkokab.models.mass.PowerlawPrimaryMassRatio),
and the truncated Normal distribution is [`numpyro.distributions.TruncatedNormal`](https://num.pyro.ai/en/stable/distributions.html#numpyro.distributions.truncated.TruncatedNormal).

## Model Specification

We have to provide a JSON file specifying the parameters of the distributions. We name it [`model.json`](https://github.com/gwkokab/hello-gwkokab/blob/main/generating_mock_posterior_estimates/model.json), and it looks like this:

```json
{
    "log_rate": 5.0,
    "alpha_m": 2.35,
    "beta": 1.0,
    "mmin": 5.0,
    "mmax": 50.0,
    "loc": 0.0,
    "scale": 0.1,
    "low": 0.0,
    "high": 1.0
}
```

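`log_rate` is the natural logarithm of the merger rate $\mathcal{R}$. The remaining keys follow the notation of the introduction: `alpha_m` and `beta` are the power-law indices $\alpha$ and $\beta$, `mmin` and `mmax` are the mass bounds $m_\mathrm{min}$ and $m_\mathrm{max}$, and `loc`, `scale`, `low`, and `high` specify the truncated Normal distribution of the eccentricity.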
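
Because the rate is stored as a logarithm, it can help to convert it back before reasoning about event counts. The following minimal sketch parses the same JSON content with Python's standard library and recovers the rate. The consistency checks are our own suggestion, not something GWKokab is known to enforce.

```python
import json
import math

# Same content as model.json, inlined so the snippet is self-contained.
model_json = """
{
    "log_rate": 5.0,
    "alpha_m": 2.35,
    "beta": 1.0,
    "mmin": 5.0,
    "mmax": 50.0,
    "loc": 0.0,
    "scale": 0.1,
    "low": 0.0,
    "high": 1.0
}
"""

params = json.loads(model_json)

# The rate is stored as a natural logarithm, so exponentiate to recover it.
rate = math.exp(params["log_rate"])
print(f"merger rate R = {rate:.2f}")  # exp(5) is about 148.41

# Basic consistency checks on the bounds (illustrative, not required by GWKokab).
assert params["mmin"] < params["mmax"]
assert params["low"] <= params["loc"] <= params["high"]
assert params["scale"] > 0.0
```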

## Measurement Uncertainties

To generate mock posterior estimates, we also need to simulate the measurement
uncertainties. This tutorial uses what we call the banana error, described in Section 3
of {cite:p}`10.1093/mnras/stw2883`. It adds errors to the chirp mass
and symmetric mass ratio, tunable via `scale_Mc` and `scale_eta` respectively, and
converts them back to component source masses. Eccentricity has a truncated Normal
uncertainty with `scale` as the width of the distribution and `low` and `high` as the
truncation limits. Let's save them in
[`err.json`](https://github.com/gwkokab/hello-gwkokab/blob/main/generating_mock_posterior_estimates/err.json).

```json
{
    "scale_Mc": 1.0,
    "scale_eta": 1.0,
    "scale": 0.1,
    "low": 0.0,
    "high": 1.0
}
```
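
The change of variables behind this error model can be sketched with the standard definitions of chirp mass, $\mathcal{M}_c = (m_1 m_2)^{3/5} / (m_1 + m_2)^{1/5}$, and symmetric mass ratio, $\eta = m_1 m_2 / (m_1 + m_2)^2$. The helper names below are ours, and the snippet only shows the round trip between the two parameterizations. It does not reproduce the noise model of the cited paper or GWKokab's implementation.

```python
import math


def component_to_chirp(m1, m2):
    """Map component masses to (chirp mass, symmetric mass ratio)."""
    total = m1 + m2
    eta = m1 * m2 / total**2
    chirp_mass = total * eta**0.6  # equals (m1*m2)**(3/5) / (m1+m2)**(1/5)
    return chirp_mass, eta


def chirp_to_component(chirp_mass, eta):
    """Invert the mapping, returning (m1, m2) with m1 >= m2."""
    total = chirp_mass * eta ** (-0.6)
    # Guard against tiny negative values from rounding when eta is close to 1/4.
    root = math.sqrt(max(0.0, 1.0 - 4.0 * eta))
    m1 = 0.5 * total * (1.0 + root)
    m2 = 0.5 * total * (1.0 - root)
    return m1, m2


chirp_mass, eta = component_to_chirp(30.0, 20.0)
m1, m2 = chirp_to_component(chirp_mass, eta)
print(chirp_mass, eta, m1, m2)
```

Noise would be added in the $(\mathcal{M}_c, \eta)$ space between the two calls. Any perturbed $\eta$ must stay in $(0, 1/4]$ for the inverse map to return real masses.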

## Sensitivity of Detectors

For the purpose of mock posterior estimates, GWKokab prefers to use a Multi-Layer
Perceptron (MLP) based neural network (with a specific format) that predicts the
sensitivity of the detectors for a given set of parameters. The purpose is to avoid
running expensive interpolations. We have trained several such networks, and a
collection of them can be found [here](https://github.com/gwkokab/VTs).

There are two ways to specify the sensitivity:

1. Probability of detection ($P_{\mathrm{det}}$)
2. Volume Time Sensitivity ($\mathrm{VT}$)

Volume Time Sensitivity is related to Probability of detection as:

$$
\mathrm{VT}(\theta) = \int_{0}^{z_\mathrm{max}} \frac{dV_c}{dz} \frac{1}{1+z} P_{\mathrm{det}}(\theta, z) dz
$$

where $dV_c/dz$ is the differential comoving volume element at redshift $z$ and
$z_\mathrm{max}$ is the maximum redshift considered. If redshift is a parameter of the
model, $P_{\mathrm{det}}$ is used; otherwise $\mathrm{VT}$ is used. In this tutorial, we
use the $\mathrm{VT}$-based sensitivity. As with the model parameters, we have to provide
a JSON file specifying which sensitivity to use, along with a few other quantities such
as the total observation time. We name it
[`pmean.json`](https://github.com/gwkokab/hello-gwkokab/blob/main/generating_mock_posterior_estimates/pmean.json),
because its main purpose is to compute the Poisson mean of the inhomogeneous Poisson
process used in hierarchical Bayesian inference. It looks like this:

```json
{
    "estimator_type": "neural_vt",
    "filename": "neural_vt_1_200_1000_ecc_matters.hdf5",
    "time_scale": 365.0,
    "num_samples": 2000,
    "batch_size": 1000
}
```

Here, `filename` is the name of the file containing the MLP. `time_scale` is the total
observation time in the units the network was trained with. `num_samples` is the number
of samples drawn from the population model to estimate the Poisson mean using importance
sampling. `batch_size` is the number of samples per batch when evaluating the
sensitivity. `neural_vt_1_200_1000_ecc_matters.hdf5` was generated with time measured in
days, so `time_scale` is 365.0 for one year of observation.
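
To make the roles of `time_scale`, `num_samples`, and `batch_size` concrete, here is a minimal sketch of a Monte Carlo estimate of the Poisson mean. It assumes the mean is $\mathcal{R}$ times the observation time times the population average of $\mathrm{VT}$, and it draws directly from the population, which is the special case of importance sampling with unit weights. The function `estimate_poisson_mean`, the constant `toy_vt`, and the placeholder `toy_sampler` are hypothetical stand-ins, not GWKokab code or the trained network.

```python
import math
import random


def estimate_poisson_mean(log_rate, time_scale, vt_fn, sample_fn,
                          num_samples, batch_size, seed=0):
    """Monte Carlo estimate of rate * time * E_population[VT(theta)]."""
    rng = random.Random(seed)
    total = 0.0
    done = 0
    while done < num_samples:
        # Evaluate the sensitivity in batches, as batch_size suggests.
        n = min(batch_size, num_samples - done)
        batch = [sample_fn(rng) for _ in range(n)]
        total += sum(vt_fn(theta) for theta in batch)
        done += n
    return math.exp(log_rate) * time_scale * total / num_samples


def toy_sampler(rng):
    """Hypothetical population draw: (m1, q, eccentricity). Not the real model."""
    return rng.uniform(5.0, 50.0), rng.uniform(0.1, 1.0), rng.uniform(0.0, 1.0)


def toy_vt(theta):
    """Hypothetical constant sensitivity, standing in for the neural network."""
    return 1e-3


mu = estimate_poisson_mean(5.0, 365.0, toy_vt, toy_sampler,
                           num_samples=2000, batch_size=1000)
print(mu)
```

With a constant sensitivity the estimate reduces to $e^{5} \times 365 \times 10^{-3}$, which makes the scaling with `time_scale` easy to check by hand.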

## Mock Posterior Estimates

Now we have everything to generate mock posterior estimates. We will use the CLI
`genie_ecc_matters` provided by GWKokab. The command is as follows:

```bash
genie_ecc_matters \
    --error-size 2000 \
    --num-realizations 1 \
    --seed $RANDOM \
    --model-json model.json \
    --pmean-json pmean.json \
    --err-json err.json
```

This generates one realization in which each event has at most 2000 posterior
samples. `--seed` sets the random seed. The shell's `$RANDOM` draws a new seed on every
run, so pass a fixed integer instead if you need reproducible output. The output is
saved in the current working directory, in a folder named `data` with the following
structure:

```
data
└── realization_0
    ├── injections.dat
    ├── posteriors
    │   ├── event_0.dat
    │   └── ...
    └── raw_injections.dat
```

Here, `raw_injections.dat` contains the true injections before selection effects
(i.e. the detectors' sensitivity) are applied, and `injections.dat` contains the
injections that remain after selection effects. `posteriors/event_0.dat` contains the
posterior samples for event 0, and so on. Some additional files are generated that are
not relevant for this tutorial.

A peek into each file shows:

```
$ head -n 5 data/realization_0/injections.dat
mass_1_source mass_2_source eccentricity
1.582525634765625000e+01 1.211350154876708984e+01 1.333083678036928177e-02
2.813809776306152344e+01 1.973993301391601562e+01 2.429485321044921875e-02
1.920602798461914062e+01 1.582667064666748047e+01 5.360946059226989746e-02
1.299305725097656250e+01 8.506275177001953125e+00 8.243023417890071869e-03
```

```
$ head -n 5 data/realization_0/raw_injections.dat
mass_1_source mass_2_source eccentricity
6.057229518890380859e+00 5.316216945648193359e+00 4.142199084162712097e-02
6.311927318572998047e+00 5.126795768737792969e+00 1.041059270501136780e-01
8.150541305541992188e+00 8.072440147399902344e+00 6.612396985292434692e-02
1.301545238494873047e+01 1.287599658966064453e+01 1.073014573194086552e-03
```

```
$ head -n 5 data/realization_0/posteriors/event_0.dat
mass_1_source mass_2_source eccentricity
1.565965175628662109e+01 1.120151329040527344e+01 4.000317305326461792e-02
1.629046249389648438e+01 1.068410491943359375e+01 1.384858191013336182e-01
1.673245811462402344e+01 1.088646125793457031e+01 1.342954784631729126e-01
1.680053520202636719e+01 1.106582164764404297e+01 1.797602921724319458e-01
```
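
Each file is a whitespace-separated table with a single header line, so it can be read with a few lines of standard-library Python. The helper `read_dat` below is our own sketch, not a GWKokab utility. For the demonstration it writes two rows copied from the `injections.dat` output above into a temporary file and derives the mass ratio from the two mass columns.

```python
import os
import tempfile


def read_dat(path):
    """Read a whitespace-separated table whose first line holds column names."""
    with open(path) as f:
        names = f.readline().split()
        columns = {name: [] for name in names}
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            for name, value in zip(names, line.split()):
                columns[name].append(float(value))
    return columns


# Two rows copied from the injections.dat output shown above.
sample = (
    "mass_1_source mass_2_source eccentricity\n"
    "1.582525634765625000e+01 1.211350154876708984e+01 1.333083678036928177e-02\n"
    "2.813809776306152344e+01 1.973993301391601562e+01 2.429485321044921875e-02\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "injections.dat")
    with open(path, "w") as f:
        f.write(sample)
    table = read_dat(path)

# Mass ratio q = m2 / m1 for every row.
mass_ratio = [
    m2 / m1 for m1, m2 in zip(table["mass_1_source"], table["mass_2_source"])
]
print(mass_ratio)
```

To read real output, point `read_dat` at a path such as `data/realization_0/injections.dat`.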

---

All the code and files used in this tutorial can be found in
[hello-gwkokab/generating_mock_posterior_estimates][REPRODUCIBILITY_LINK].

[REPRODUCIBILITY_LINK]: https://github.com/gwkokab/hello-gwkokab/tree/main/generating_mock_posterior_estimates

## References

```{bibliography} refs.bib
```

[^1]: A niche model is a model that is specific to population inference of compact
        binary coalescences (CBCs) and is not available in NumPyro.