# Using enterprise to analyze PTA data

In this notebook you will learn:
* How to use `enterprise` to interact with IPTA data,
* How to set up an analysis of individual pulsar noise properties,
* How to post-process your results.

# Load modules

In [None]:
from __future__ import division

%matplotlib inline
%config InlineBackend.figure_format = 'retina'
%load_ext autoreload
%autoreload 2

import os, glob, json, pickle
import matplotlib.pyplot as plt
import numpy as np
import scipy.linalg as sl

import enterprise
from enterprise.pulsar import Pulsar
import enterprise.signals.parameter as parameter
from enterprise.signals import utils
from enterprise.signals import signal_base
from enterprise.signals import selections
from enterprise.signals.selections import Selection
from enterprise.signals import white_signals
from enterprise.signals import gp_signals
from enterprise.signals import deterministic_signals
import enterprise.constants as const

import corner
from PTMCMCSampler.PTMCMCSampler import PTSampler as ptmcmc

## Get par, tim, and noise files
Here we collect the `tim` and `par` files. This is not the preferred method when pickled `enterprise` Pulsar files have been supplied; see below.

In [None]:
psrlist = None # define a list of pulsar name strings that can be used to filter.

In [None]:
datadir = '/Users/taylosr8/Research/NANOGrav/NANOGrav_12y/' # set your data directory

In [None]:
parfiles = sorted(glob.glob(datadir + 'par/*.par'))
timfiles = sorted(glob.glob(datadir + 'tim/*.tim'))

# filter
if psrlist is not None:
    parfiles = [x for x in parfiles if x.split('/')[-1].split('.')[0] in psrlist]
    timfiles = [x for x in timfiles if x.split('/')[-1].split('.')[0] in psrlist]   
    
# Make sure you use the tempo2 parfile for J1713+0747!!
# ...filtering out the tempo parfile... 
parfiles = [x for x in parfiles if 'J1713+0747_NANOGrav_12yv2.gls.par' not in x]

In [None]:
len(parfiles)
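
Because the `par` and `tim` files are paired by position when they are read in, it is worth checking that the two sorted lists line up after filtering. The following is a minimal sketch of such a check. The helpers `pulsar_name` and `check_pairing` and the example file names are hypothetical, and the sketch assumes that each file name starts with the pulsar name followed by an underscore or a dot; adjust this to your data release.

```python
import os

def pulsar_name(path):
    """Return the pulsar name, assumed to be the text before the first '_' or '.'."""
    base = os.path.basename(path)
    return base.replace('.', '_').split('_')[0]

def check_pairing(parfiles, timfiles):
    """Raise ValueError if the two lists do not pair up by pulsar name."""
    if len(parfiles) != len(timfiles):
        raise ValueError('{} par files but {} tim files'.format(
            len(parfiles), len(timfiles)))
    for p, t in zip(parfiles, timfiles):
        if pulsar_name(p) != pulsar_name(t):
            raise ValueError('mismatched pair: {} / {}'.format(p, t))
    return True

# hypothetical file names, for illustration only
example_par = ['par/B1855+09_example.par', 'par/J1713+0747_example.par']
example_tim = ['tim/B1855+09_example.tim', 'tim/J1713+0747_example.tim']
pairing_ok = check_pairing(example_par, example_tim)
```

In the notebook you would call `check_pairing(parfiles, timfiles)` on the real lists before building the `Pulsar` objects.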

## Load into Pulsar class list

* The `enterprise` Pulsar class uses `libstempo` to read in `par` and `tim` files, then stores all pulsar data in a `Pulsar` object. This object contains all data and metadata needed for the ensuing pulsar and PTA analysis. You no longer need to reference the `par` and `tim` files after this cell.
* Note below that you can explicitly declare which version of the JPL solar-system ephemeris model will be used to compute the Roemer delay between the geocenter and the barycenter (e.g. `DE438`). Otherwise the default value will be taken from the `par` file. Explicitly declaring the version here is good practice.
* You can also explicitly set the clock file to a version of `BIPM`, e.g. `BIPM(2018)`. This is less important, and you can let the code take the value from the `par` file.
* When you execute the following cell, you will get warnings like `WARNING: Could not find pulsar distance for PSR ...`. Don't worry! This is expected, and fine. Not all pulsars have well-constrained distances; those without one are assigned `1 kpc` with a `20%` uncertainty.

### Read par and tim files into enterprise Pulsar objects

In [None]:
## Let's look at 1713
psrs = []
for p, t in zip(parfiles, timfiles):
    if 'J1713' in p:
        psr = Pulsar(p, t, ephem='DE438', clk='BIPM(2018)')
        psrs.append(psr)

## OR... load in enterprise pickled Pulsar instances that we've prepared!
Go here for full details: https://paper.dropbox.com/doc/NG-12.5yr_v3-GWB-Analysis--A2zJbxQU704Oq9jU1oqAWJCHAQ-DICJei6NxsPjxnO90mGMo

Pickled 12.5yr pulsars: https://drive.google.com/file/d/1GUcmdj9OMf7-hrAOydeHEO4ylQxgC9Kb/edit

Noise files: https://drive.google.com/file/d/1V7bu2y5hxFSj_7KWO3uNM7Q_0dciXfSX/edit

Empirical red noise proposal distributions: https://drive.google.com/file/d/19odsqZ93Wh8og1AGdU7SfvXzB0DJwipG/edit

In [None]:
## set your data directory
datadir = '/Users/taylosr8/Downloads/'

## read in pickles (the context manager closes the file after loading)
with open(datadir + 'channelized_12yr_v3_partim_DE438.pkl', 'rb') as fin:
    psrs = pickle.load(fin)

In [None]:
## Get parameter noise dictionary
noise_ng12 = datadir + 'channelized_12p5yr_v3_full_noisedict.json'

params = {}
with open(noise_ng12, 'r') as fp:
    params.update(json.load(fp))

In [None]:
## Load in empirical distributions
#emp_dists = pickle.load(open(datadir + '12yr_emp_dist_RNonly_py3.pkl', 'rb'))
#emp_dists = '/home/stephen.taylor/NANOGrav/nanograv_12p5yr_analysis/nanograv_12p5yr_analysis_mar2020/data/12yr_emp_dist_RNonly_py3.pkl'

# Single pulsar analysis

* `enterprise` is structured so that one first creates `parameters`, then `signals` that these `parameters` belong to, then finally a `model` that is the union of all `signals` and the `data`.

* We will show this explicitly below, then introduce some model shortcut code that will make your life easier.
* We test on `J1713+0747`.

In [None]:
psr = [p for p in psrs if p.name == 'J1713+0747'][0]

In [None]:
# find the time span of the data to set red-noise/DM-variation frequency sampling
tmin = psr.toas.min()
tmax = psr.toas.max()
Tspan = tmax - tmin
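
The time span sets the frequency grid of the red-noise model: with the usual choice of linearly spaced frequencies, the k-th component sits at `k / Tspan`. The sketch below illustrates this with synthetic TOAs; the names `example_toas`, `example_Tspan`, and `example_freqs` are introduced here for illustration, and the 12.5-year baseline is an assumption rather than a value read from the data.

```python
import numpy as np

# synthetic TOAs in seconds spanning 12.5 years (illustrative, not real data)
example_toas = np.linspace(0.0, 12.5 * 365.25 * 86400.0, 500)
example_Tspan = example_toas.max() - example_toas.min()

# 30 linearly spaced frequencies, k / Tspan for k = 1..30, in Hz
components = 30
example_freqs = np.arange(1, components + 1) / example_Tspan

print('lowest frequency  [Hz]:', example_freqs[0])
print('highest frequency [Hz]:', example_freqs[-1])
```

The lowest frequency is one cycle over the full data span, so a longer data set probes lower frequencies with the same number of components.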

In [None]:
# define selection by observing backend
selection = selections.Selection(selections.by_backend)

## Create parameters

In [None]:
# white noise parameters
white_vary = True
if white_vary:
    efac = parameter.Uniform(0.01, 10.0)
    equad = parameter.Uniform(-8.5, -5)
    ecorr = parameter.Uniform(-8.5, -5)
else:
    efac = parameter.Constant() 
    equad = parameter.Constant() 
    ecorr = parameter.Constant() # we'll set these later with the params dictionary

# red noise parameters
log10_A = parameter.Uniform(-20, -11)
gamma = parameter.Uniform(0, 7)

### [NOTE] If fixing white-noise, simply use the previously loaded params dictionary
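
Before fixing the white noise, it can help to check which entries of the noise dictionary apply to the pulsar at hand. The sketch below assumes that keys follow the pattern `<pulsar>_<backend>_<parameter>`; the helper `white_noise_for`, the backend label, and all numerical values are placeholders for illustration, not values from the real noise file.

```python
# placeholder entries, for illustration only; real values come from the JSON file
example_params = {
    'J1713+0747_L-wide_PUPPI_efac': 1.0,
    'J1713+0747_L-wide_PUPPI_log10_equad': -7.0,
    'J1713+0747_L-wide_PUPPI_log10_ecorr': -7.0,
    'J1713+0747_red_noise_log10_A': -14.0,
    'J1713+0747_red_noise_gamma': 3.0,
    'B1855+09_430_ASP_efac': 1.0,
}

def white_noise_for(psrname, noisedict):
    """Return the white-noise entries of noisedict that belong to psrname."""
    suffixes = ('efac', 'log10_equad', 'log10_ecorr')
    return {key: value for key, value in noisedict.items()
            if key.startswith(psrname + '_') and key.endswith(suffixes)}

wn = white_noise_for('J1713+0747', example_params)
for key in sorted(wn):
    print(key, wn[key])
```

An empty result for a pulsar is a hint that the key pattern assumed here does not match your noise file.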

## Create signals

In [None]:
# white noise
ef = white_signals.MeasurementNoise(efac=efac, selection=selection)
eq = white_signals.EquadNoise(log10_equad=equad, selection=selection)
ec = white_signals.EcorrKernelNoise(log10_ecorr=ecorr, selection=selection)

# red noise (powerlaw with 30 frequencies)
pl = utils.powerlaw(log10_A=log10_A, gamma=gamma)
rn = gp_signals.FourierBasisGP(spectrum=pl, components=30, Tspan=Tspan)

# timing model
tm = gp_signals.TimingModel(use_svd=True) # stabilizing timing model design matrix with SVD
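
To build intuition for what `log10_A` and `gamma` control, here is a plain NumPy sketch of the power-law spectral form commonly used in PTA analyses, referenced to a frequency of one cycle per year. The function `powerlaw_psd`, the constant `FYR`, and the 12.5-year baseline are introduced here for illustration; this is not a copy of the `enterprise` implementation.

```python
import numpy as np

FYR = 1.0 / (365.25 * 86400.0)  # one cycle per year, in Hz (illustrative constant)

def powerlaw_psd(f, log10_A, gamma):
    """Power-law PSD of timing residuals, in s^3, for frequencies f in Hz."""
    A = 10.0 ** log10_A
    return A**2 / (12.0 * np.pi**2) * FYR**(gamma - 3.0) * f**(-gamma)

# 30 frequencies for an assumed 12.5-year baseline (synthetic, not real data)
example_Tspan = 12.5 * 365.25 * 86400.0
example_freqs = np.arange(1, 31) / example_Tspan
psd = powerlaw_psd(example_freqs, log10_A=-14.0, gamma=13.0 / 3.0)

# variance assigned to each frequency: density times the bin width 1 / Tspan
variance = psd / example_Tspan
```

A larger `gamma` gives a steeper (redder) spectrum, so more of the power sits in the lowest few frequencies, while `log10_A` scales the whole spectrum up or down.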

## Piece the full model together

In [None]:
# full model
s = ef + eq + ec + rn + tm

In [None]:
# initialize a single-pulsar pta model
# see how the "model" acts on the "pulsar" object...
pta = signal_base.PTA(s(psr))

In [None]:
# [Optional] Set white-noise parameters from previous analysis
pta.set_default_params(params)

In [None]:
len(pta.params)

## Draw initial sample from model parameter space

In [None]:
x0 = np.hstack([p.sample() for p in pta.params])
ndim = len(x0)

In [None]:
ndim

## Set up sampler (simple, with no tricks)

In [None]:
# initial jump covariance matrix
cov = np.diag(np.ones(ndim) * 0.01**2) # helps to tune MCMC proposal distribution

# where chains will be written to
outdir = './chains_singlepsr_test_{}/'.format(str(psr.name))

# sampler object
sampler = ptmcmc(ndim, pta.get_lnlikelihood, pta.get_lnprior, cov,
                 outDir=outdir, 
                 resume=False)

## Sample the parameter space

In [None]:
# sample for N steps
N = int(1e6)

# SCAM = Single Component Adaptive Metropolis
# AM = Adaptive Metropolis
# DE = Differential Evolution
## You can keep all these set at default values
sampler.sample(x0, N, SCAMweight=30, AMweight=15, DEweight=50)
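
To give a feel for one of the jump proposals named above, the sketch below shows the idea behind a differential-evolution step: the proposed point is the current point plus a scaled difference of two past samples. This is an illustration of the general technique, not the code used inside `PTMCMCSampler`; the function `de_jump` and the scale `2.38 / sqrt(2 * ndim)` (a commonly quoted default) are assumptions made here.

```python
import numpy as np

def de_jump(x, history, rng, scale=None):
    """Propose x + scale * (a - b), with a and b drawn from past samples."""
    ndim = x.size
    if scale is None:
        scale = 2.38 / np.sqrt(2.0 * ndim)  # commonly quoted default scale
    # pick two distinct rows of the sample history
    i, j = rng.choice(len(history), size=2, replace=False)
    return x + scale * (history[i] - history[j])

rng = np.random.default_rng(0)
history = rng.normal(size=(100, 3))  # synthetic stand-in for past chain samples
x = np.zeros(3)
x_new = de_jump(x, history, rng)
```

Because the step is built from differences of earlier samples, its size and orientation adapt to the shape of the posterior as the chain accumulates history.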

## Simple post-processing

In [None]:
chain = np.loadtxt(outdir + 'chain_1.txt')
burn = int(0.25 * chain.shape[0]) # experiment with burn-in

In [None]:
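# Find column of chain file corresponding to a parameter
# (matching on the name suffix works however the red-noise signal is named)
ind = [i for i, name in enumerate(pta.param_names) if name.endswith('log10_A')][0]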

In [None]:
# Make trace-plot to diagnose sampling
plt.plot(chain[burn:, ind])

In [None]:
# Plot a histogram of the marginalized posterior distribution
# (`density=True` replaces the `normed` keyword removed from recent Matplotlib)
plt.hist(chain[burn:, ind], 50, density=True, histtype='stepfilled',
         lw=2, color='C0', alpha=0.5);
plt.xlabel('J1713+0747_log10_A')
plt.ylabel('PDF')

In [None]:
# Make 2d histogram plot
# (matching on the name suffix works however the red-noise signal is named)
ind_redA = [i for i, n in enumerate(pta.param_names) if n.endswith('log10_A')][0]
ind_redgam = [i for i, n in enumerate(pta.param_names) if n.endswith('gamma')][0]
fig = corner.corner(chain[burn:, [ind_redA, ind_redgam]],
                    labels=[pta.param_names[ind_redA], pta.param_names[ind_redgam]],
                    levels=[0.68, 0.95]);
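
Plots are often complemented by numerical summaries. The sketch below computes a posterior median and a central 68% interval from a synthetic stand-in for the chain. The array `fake_chain` and the helper `summarize` are hypothetical, and the sketch assumes that the sampler appends four bookkeeping columns after the parameters (log-posterior, log-likelihood, and two acceptance rates), which is why only the leading columns are treated as parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

# synthetic stand-in for a chain: 2 parameter columns + 4 bookkeeping columns
nsamp = 4000
fake_chain = np.column_stack([
    rng.normal(-14.0, 0.2, nsamp),  # stand-in for log10_A
    rng.normal(3.0, 0.5, nsamp),    # stand-in for gamma
    np.zeros((nsamp, 4)),           # placeholders for the trailing columns
])
fake_burn = int(0.25 * fake_chain.shape[0])

def summarize(samples):
    """Return the median and the distances to the 16th and 84th percentiles."""
    lo, med, hi = np.percentile(samples, [16, 50, 84])
    return med, med - lo, hi - med

med, minus, plus = summarize(fake_chain[fake_burn:, 0])
print('log10_A = {:.2f} -{:.2f} +{:.2f}'.format(med, minus, plus))
```

With a real chain, replace `fake_chain[fake_burn:, 0]` by `chain[burn:, ind]`.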

## Now, the easy way to do all of this

Many of us have created shortcuts to carry out these tasks. You will find them in `enterprise_extensions`: https://github.com/nanograv/enterprise_extensions.

In [None]:
import enterprise_extensions
from enterprise_extensions import models, model_utils

In [None]:
# Create a single pulsar model
pta = models.model_singlepsr_noise(psr, red_var=True, psd='powerlaw', 
                                   noisedict=None, white_vary=True, 
                                   tm_svd=True, components=30)

In [None]:
len(pta.params)

In [None]:
pta.params

In [None]:
# Set up a sampler instance.
# This will add some fancier stuff than before, like prior draws,
# and custom sample groupings.
sampler = model_utils.setup_sampler(pta, outdir=outdir, resume=False)

In [None]:
# sample for N steps
N = int(1e6)
# pass a list to np.hstack; recent NumPy versions reject a bare generator
x0 = np.hstack([p.sample() for p in pta.params])

# SCAM = Single Component Adaptive Metropolis
# AM = Adaptive Metropolis
# DE = Differential Evolution
## You can keep all these set at default values
sampler.sample(x0, N, SCAMweight=30, AMweight=15, DEweight=50)

In [None]:
chain = np.loadtxt(os.path.join(outdir, 'chain_1.txt'))
burn = int(0.25 * chain.shape[0])
# `dtype=str` replaces `np.unicode_`, which was removed in NumPy 2.0
pars = np.loadtxt(os.path.join(outdir, 'pars.txt'), dtype=str)
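
With the parameter names loaded alongside the chain, columns can be looked up by name rather than by position. A minimal sketch follows, using illustrative names and a synthetic array in place of the real files; `example_pars`, `example_chain`, and `column` are hypothetical.

```python
import numpy as np

# illustrative parameter names, standing in for the contents of pars.txt
example_pars = np.array(['J1713+0747_red_noise_gamma',
                         'J1713+0747_red_noise_log10_A'])

# synthetic chain: one column per parameter, followed by extra sampler columns
example_chain = np.zeros((100, len(example_pars) + 4))
example_chain[:, 1] = -14.0

# map each parameter name to its column index
column = {str(name): i for i, name in enumerate(example_pars)}
samples = example_chain[:, column['J1713+0747_red_noise_log10_A']]
```

The same dictionary can then drive loops over parameters when making trace plots or histograms.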