GPCC.jl

ℹ What is this?

It is a julia implementation of the Gaussian Process Cross Correlation (GPCC) method introduced in

A Gaussian process cross-correlation approach to time delay estimation for reverberation mapping of active galactic nuclei.

💾 Installation

Apart from cloning, an easy way of using the package is the following:

1 - Add the registry AINJuliaRegistry

2 - Switch into "package mode" with ] and add the package with

add GPCC

The package exposes the following functions of interest to the user:

gpcc
simulatetwolightcurves and simulatethreelightcurves,
getprobabilities
uniformpriordelay.

These functions can be queried in help mode in the Julia REPL.

If installing GPCC.jl in an existing Julia environment, there is a chance one may run into dependency problems that prevent installation. In this case, it is advisable to work in a new environment. That is

mkdir("myGPCC")
cd("myGPCC")
# press `]` to enter package mode:
(@v1.7) pkg> activate .

and use this environment for installing and working with the package. When restarting Julia, one can re-enter this environment by simply starting Julia in the respective folder ("myGPCC") and using activate . in package mode.

▶ How to simulate data

Method simulatedata can be used to simulate data in 2 arbitrary (non-physical) bands:

using GPCC
tobs, yobs, σobs, truedelays = simulatetwolightcurves() # output omitted

A figure like the one above should show up displaying simulated light curves.

It is important to note how the simulated data are organised because function gpcc expects the data passed to it to be organised in the exact same way. First of all, we note that all three returned outputs are vectors whose elements are vectors (i.e. arrays of arrays) and that they share the same size:

typeof(tobs), typeof(yobs), typeof(σobs) 
size(tobs), size(yobs), size(σobs)

Each output contains data for 2 bands. tobs contains the observed times. tobs[1] contains the observed times for the 1st band, tobs[2] for the 2nd band. Similarly yobs[1] contains the flux measurements for the 1st band and σobs[1] the error measurements for the 1st band and so on. We can plot the data pertaining to the 2nd band as an example:

using PyPlot # must be indepedently installed, other plotting packages can be used instead
figure()
errorbar(tobs[2], yobs[2], yerr=σobs[2], fmt="o", label="2nd band")

▶ How to fit a dataset with `gpcc`

Having generated the simulated data, we will now model them with the GPCC model. To that end we use the function gpcc. Options for gpcc can be queried in help mode.

using GPCC

tobs, yobs, σobs, truedelays = simulatetwolightcurves();

# We choose the Matern32 kernel. Other choices are GPCC.OU, GPCC.rbf, GPCC.matern32, GPCC.matern52
# We fit the model for the given the true delays 
# Note that without loss of generality we can always set the delay of the 1st band equal to zero
# The optimisation of the GP hyperparameters runs for a maximum of 1000 iterations.

loglikel, pred, (α, postb, ρ) = gpcc(tobs, yobs, σobs; kernel = GPCC.matern32, delays = truedelays, iterations = 1000, rhomax = 300)

The call returns three outputs:

the marginal log likelihood loglikel reached by the optimiser.
a function pred for making predictions.
a tuple that contains the scaling coefficients $\alpha$, posterior distribution postb (of type MvNormal) for shift $b$ and lengthscale $\rho$ of the latent Gaussian process.

We show below how function pred can be used both for making predictions and calculating the predictive likelihood.

▶ How to make predictions

Having fitted the model to the data, we can now make predictions. We first define the interval over which we want to predict and use pred:

t_test = collect(0:0.1:20);
μpred, σpred = pred(t_test);

Both μpred and σpred are arrays of arrays. The $l$-th inner array refers to predictions for the $l$-th band, e.g. μpred[2] and σpred[2] hold respectively the mean prediction and standard deviation of the $2$-band. We plot the predictions for all bands:

using PyPlot # must be independently installed, other plotting packages can be used instead

colours = ["blue", "orange"] # define colours

for i in 1:2
    plot(t_test, μpred[i], "-", color=colours[i])
    fill_between(t_test, μpred[i] + σpred[i], μpred[i] - σpred[i], color=colours[i], alpha=0.2) # plot uncertainty tube
end

▶ How to calculate log-likelihood on test data

Suppose we want to calculate the log-likelihood on some new data (test data perhaps):

ttest = [[9.0; 10.0; 11.0], [9.0; 10.0; 11.0]]
ytest = [ [6.34, 5.49, 5.38], [13.08, 12.37, 15.69]]
σtest = [[0.34, 0.42, 0.2], [0.87, 0.8, 0.66]]

pred(ttest, ytest, σtest)

▶ Evaluating a set of candidate delays

Given the simulated data, suppose we would like to evaluate the posterior probability of a set of candidate delays. Noting that without loss of generality we can always set the delay of the 1st band equal to zero, we define the following grid of delays:

candidatedelays = collect(0.0:0.2:20)

We use map to run gpcc on all candidate delays as follows:

using GPCC

using PyPlot # we need this for plotting the posterior probabilities, must be independently installed. Other plotting packages can be used instead

tobs, yobs, σobs, truedelays = simulatetwolightcurves();

helper(delay) = gpcc(tobs, yobs, σobs; kernel = GPCC.matern32, delays = [0;delay], iterations = 1000, rhomax = 300)[1] # keep only first output

loglikel = map(helper, candidatedelays)

figure()

plot(candidatedelays, getprobabilities(loglikel))

▶ Evaluating a set of candidate delays in parallel

One can easily parallelise cross-validation on multiple cores by simply replacing map with pmap. Before that, one has to make sure that multiple workers are available:

using Distributed

addprocs(4) # add four workers. Alternatively start Julia with mulitple workers e.g. julia -p 4

@everywhere using GPCC # make sure GPCC is made available to all workers

@everywhere using ProgressMeter, Suppressor # need to be independently installed

using PyPlot # we need this to plot the posterior probabilities, must be independently installed. Other plotting packages can be used instead

candidatedelays = collect(0.0:0.1:20)

tobs, yobs, σobs, truedelays = simulatetwolightcurves();

# macro @showprogress below reports progress of pmap with a progress bar
# macro @suppress below suppresses terminal messages produced by gpcc

loglikel = @showprogress pmap(candidatedelays) do delay

  @suppress gpcc(tobs, yobs, σobs; kernel = GPCC.matern32, delays = [0;delay], iterations = 1000, rhomax = 300)[1] # keep only first output

end

figure()

plot(candidatedelays, getprobabilities(loglikel))

▶ Evaluating a set of candidate delays for 3 light curves

We show an example for calculating the posterior for 3 light curves. Instead of function simulatetwolightcurves, we use function simulatethreelightcurves to generate 3 synthetic light curves. We evaluate the delays using a nested map:

using GPCC

using ProgressMeter, Suppressor # need to be independently installed

using PyPlot # we need this to plot the posterior probabilities, must be independently installed. Other plotting packages can be used instead

candidatedelays = collect(0.5:0.05:6) # use smaller and finer range

tobs, yobs, σobs, truedelays = simulatethreelightcurves();

out = @showprogress map(d2 -> map(d1 -> (@suppress gpcc(tobs, yobs, σobs; kernel = GPCC.matern32, delays = [0;d1;d2], iterations = 1000, rhomax = 300)[1]), candidatedelays), candidatedelays);

posterior = getprobabilities(reduce(vcat, out));

posterior = reshape(posterior, length(candidatedelays), length(candidatedelays));

figure()
subplot(311)
title("joint posterior")
pcolor(candidatedelays,candidatedelays,posterior)
ylabel("lightcurve 2"); xlabel("lightcurve 3")

subplot(312)
title("marginal posterior for lightcurve 2")
plot(candidatedelays,vec(sum(posterior,dims=2)))

subplot(313)
title("marginal posterior for lightcurve 3")
plot(candidatedelays,vec(sum(posterior,dims=1)))

We should obtain a joint posterior and marginal posteriors like the ones plotted below:

The above computation can be parallelised by starting additional workers and replacing the outer map with a pmap.

Name		Name	Last commit message	Last commit date
Latest commit History 327 Commits
.github/workflows		.github/workflows
.vscode		.vscode
src		src
test		test
.gitignore		.gitignore
2Dposterior.png		2Dposterior.png
2Dposterior.svg		2Dposterior.svg
LICENSE		LICENSE
Project.toml		Project.toml
README.md		README.md
delay_vs_prob.png		delay_vs_prob.png
logo.png		logo.png
simulateddata.png		simulateddata.png
simulateddata_predictions.png		simulateddata_predictions.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GPCC.jl

ℹ What is this?

💾 Installation

▶ How to simulate data

▶ How to fit a dataset with `gpcc`

▶ How to make predictions

▶ How to calculate log-likelihood on test data

▶ Evaluating a set of candidate delays

▶ Evaluating a set of candidate delays in parallel

▶ Evaluating a set of candidate delays for 3 light curves

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

GPCC.jl

ℹ What is this?

💾 Installation

▶ How to simulate data

▶ How to fit a dataset with gpcc

▶ How to make predictions

▶ How to calculate log-likelihood on test data

▶ Evaluating a set of candidate delays

▶ Evaluating a set of candidate delays in parallel

▶ Evaluating a set of candidate delays for 3 light curves

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

▶ How to fit a dataset with `gpcc`

Packages