## Run this notebook in Google Colab by clicking here: [Google Colab](https://colab.research.google.com/github/AaronDJohnson/12p5yr_stochastic_analysis/blob/master/tutorials/model_selection.ipynb)

### Run these cells if using Colab. Otherwise, skip them!

In [None]:
# This cell will reset the kernel.
# Run this cell, wait until it's done, then run the next.
!pip install -q condacolab
import condacolab
condacolab.install_mambaforge()

In [None]:
%%capture
!mamba install -y -c conda-forge enterprise_extensions la_forge
!git clone https://github.com/AaronDJohnson/12p5yr_stochastic_analysis
import sys
sys.path.insert(0,'/content/12p5yr_stochastic_analysis/tutorials')

# Using `enterprise` to perform model selection

In this notebook you will learn:
* How to use `enterprise_extensions` to create models with NANOGrav data
* How to perform model selection on the NANOGrav 15-year data set using `HyperModel`
* How to reproduce part of Figure 2 of the NANOGrav 15-year GWB paper

# Load packages and modules

In [3]:
%%capture
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
%load_ext autoreload
%autoreload 2

import json, sys, glob
import matplotlib.pyplot as plt
import numpy as np

from enterprise_extensions import models, model_utils, hypermodel

from h5pulsar.pulsar import FilePulsar

IN_COLAB = 'google.colab' in sys.modules

In [4]:
if IN_COLAB:
    datadir = '/content/12p5yr_stochastic_analysis/tutorials/data'
else:
    datadir = './data'

## Load the full set of Pulsar objects

  * The pulsars are stored as `HDF5` files, which load much faster than the original timing files and take up little space

  * See the `explore_data.ipynb` tutorial for what these files contain and for how to load `.par` and `.tim` files

In [5]:
psrs = []
for hdf5_file in glob.glob(datadir + '/hdf5/*.hdf5'):
    psrs.append(FilePulsar(hdf5_file))
print('Loaded {0} pulsars from hdf5 files'.format(len(psrs)))

Loaded 67 pulsars from hdf5 files


## Read in white noise dictionaries
  * We can read in previously computed noise properties from single-pulsar white noise analyses. These include `EFAC`, `EQUAD`, and (for `NANOGrav`) `ECORR`.

  * In practice, we hold these white-noise properties fixed in the low-frequency noise / GW searches, which significantly reduces the computational cost of the analysis.

  * The noise properties are stored as `json` files and are read into a single parameter dictionary.
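As a rough illustration of what such a parameter dictionary might look like, the sketch below groups entries by pulsar. It assumes keys of the form `<pulsar>_<backend>_<parameter>`; the pulsar name, backend labels, and values are hypothetical and are not taken from `15yr_wn_dict.json`.

```python
from collections import defaultdict

# Hypothetical entries: the pulsar name, backend labels, and values are made up
# for illustration and do not come from the real noise file.
example_wn = {
    "J0000+0000_BackendA_efac": 1.05,
    "J0000+0000_BackendA_log10_t2equad": -6.8,
    "J0000+0000_BackendA_log10_ecorr": -7.1,
    "J0000+0000_BackendB_efac": 0.98,
}


def group_by_pulsar(noise_dict):
    """Group noise parameters by the pulsar name that prefixes each key."""
    grouped = defaultdict(dict)
    for key, value in noise_dict.items():
        # Assumption: everything before the first underscore is the pulsar name.
        psr_name, _, param = key.partition("_")
        grouped[psr_name][param] = value
    return dict(grouped)


grouped = group_by_pulsar(example_wn)
print(sorted(grouped["J0000+0000"]))
```

The same helper could be applied to `wn_params` after it is loaded, provided the real keys follow the assumed naming pattern.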

In [7]:
## Get parameter noise dictionary
noise_ng15 = datadir + '/15yr_wn_dict.json'

wn_params = {}
with open(noise_ng15, 'r') as fp:
    wn_params.update(json.load(fp))

# Model Selection: `model_2a` vs. `model_3a`

* This notebook reproduces one of the Bayes factors shown in Figure 2 of the 15-year GWB analysis paper

* We want to compute the Bayes factor for a Hellings-Downs (HD) correlated signal in the data. We do this with the `HyperModel` class, which lets the sampler choose between `model_2a`, with a common (but uncorrelated) red process in the pulsars, and `model_3a`, with a common, HD-correlated red process among all pulsars
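For reference, the HD correlation that distinguishes `model_3a` from `model_2a` can be sketched as a function of the angular separation between two distinct pulsars. This is a standalone illustration of the standard formula, not the code that `enterprise` uses internally; the function name `hd_correlation` is hypothetical.

```python
import numpy as np


def hd_correlation(xi):
    """Hellings-Downs correlation for two distinct pulsars.

    xi is the angular separation in radians (scalar or array).
    """
    x = (1.0 - np.cos(np.asarray(xi, dtype=float))) / 2.0
    # x * log(x) -> 0 as x -> 0, so guard against log(0) at zero separation.
    safe_x = np.where(x > 0, x, 1.0)
    xlogx = np.where(x > 0, x * np.log(safe_x), 0.0)
    return 1.5 * xlogx - 0.25 * x + 0.5


angles = np.linspace(0.0, np.pi, 5)
print(hd_correlation(angles))
```

With this normalization the correlation approaches 0.5 at zero separation, becomes negative near 90 degrees, and rises to 0.25 for antipodal pulsars. In `model_2a` the cross-pulsar correlations are instead zero.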

In [8]:
nmodels = 2
mod_index = np.arange(nmodels)

# Make dictionary of PTAs.
pta = dict.fromkeys(mod_index)
pta[0] = models.model_2a(psrs, noisedict=wn_params, n_gwbfreqs=14,
                         tm_marg=True, tm_svd=True)
pta[1] = models.model_3a(psrs, noisedict=wn_params, n_gwbfreqs=14,
                         tm_marg=True, tm_svd=True)

* When setting up the `HyperModel` in the next cell, we set weights so that the sampler visits both models more evenly.
* `log_weights` is a list with one entry per model; each entry is added to the log-likelihood of the corresponding model
* We will undo the `log_weights` later when post-processing the chains
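Undoing the weights in post-processing could look roughly like the sketch below. It assumes the model index is recovered by rounding the sampled `nmodel` value to the nearest integer; the helper name `bayes_factor_from_nmodel` and the synthetic samples are hypothetical and stand in for a real chain.

```python
import numpy as np


def bayes_factor_from_nmodel(nmodel_samples, log_weights):
    """Estimate the model 1 vs. model 0 Bayes factor from nmodel samples.

    Assumes equal prior odds and that rounding nmodel gives the model index.
    """
    idx = np.rint(np.asarray(nmodel_samples, dtype=float)).astype(int)
    n0 = np.sum(idx == 0)
    n1 = np.sum(idx == 1)
    if n0 == 0 or n1 == 0:
        raise ValueError("Both models must be visited to form a ratio.")
    raw_odds = n1 / n0
    # Model 0 was boosted by exp(log_weights[0]) and model 1 by
    # exp(log_weights[1]), so multiply the raw odds by the inverse ratio.
    return raw_odds * np.exp(log_weights[0] - log_weights[1])


# Synthetic stand-in for the nmodel column of a chain: 80 samples in
# model 0 and 20 samples in model 1.
fake_nmodel = np.concatenate([np.full(80, 0.1), np.full(20, 0.9)])
bf = bayes_factor_from_nmodel(fake_nmodel, [np.log(200), 0])
print(bf)  # raw odds of 20/80 times the weight factor of 200, about 50
```

A real analysis would also discard burn-in samples and estimate the uncertainty on the ratio, which this sketch omits.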

In [9]:
super_model = hypermodel.HyperModel(pta, log_weights=[np.log(200), 0])

In [10]:
if IN_COLAB:
    outDir = '/content/12p5yr_stochastic_analysis/tutorials/chains/ms_2a3a_chains'
else:
    outDir = './chains/ms_2a3a_chains'
sampler = super_model.setup_sampler(resume=True, outdir=outDir, sample_nmodel=True,)

Adding red noise prior draws...

Adding GWB uniform distribution draws...

Adding gw param prior draws...

Adding nmodel uniform distribution draws...



In [11]:
# Number of sampler steps and the initial sample
N = int(5e6)  # 5e6 is a good number for a real analysis
x0 = super_model.initial_sample()

In [None]:
# Run the sampler (commented out here; uncomment the line below to sample)
# sampler.sample(x0, N, SCAMweight=30, AMweight=15, DEweight=50, )

## Thermodynamic Integration