Skip to content

Getting started

João Faria edited this page Jun 4, 2018 · 22 revisions

Let's start by briefly describing the organization of kima.
If you want to skip this section and just run the code, scroll down.

Submodules

  • /DNest4
    This folder links back to DNest4, a C++11 implementation of Diffusive Nested Sampling.
    See Brewer et al. 2011 and Brewer & Foreman-Mackey 2016 for references.
    This is the MCMC algorithm that kima uses to sample the posterior distribution for the model parameters.

  • /eigen
    Eigen is a C++ library for linear algebra.
    We use it to invert the covariance matrix in models with Gaussian processes

Examples

The /examples directory contains a few test cases on how to work with kima. These can be run from the terminal, simply typing make on each directory to compile everything (or make examples from the kima directory) followed by ./run to do the sampling. We will take a closer look at each one of them in the next section and later see how to analyze the results.

pykima

The /pykima directory contains a Python package which can be used to analyse the results obtained with kima. A more detailed explanation of what this package can do will be given in one of the next sections.

The source!

The /src folder contains several files, the most important of which are:

  • RVmodel.cpp contains the implementation of the Keplerian model, and the GP covariance function.
    kima was created to analyse RV data with a model composed of a sum-of-Keplerians and correlated noise. This source file implements: a solver for Kepler's equation, the function that builds the covariance matrix using a quasi-periodic kernel, the calculation of the log-likelihood, and a few methods necessary for DNest4.

  • RVConditionalPrior.cpp implements the priors and proposals for the orbital parameters.
    kima can also use another level of hyperparameters which control the orbital period and the semi-amplitudes, as was done in Brewer & Donovan (2015) and Faria et al. (2016).

  • main.cpp sets the most important options for a model.
    First off, it includes the default priors: see this page for more information.
    A number of boolean variables are then set as options for the model:

    const bool obs_after_HARPS_fibers = false; 
    const bool GP = false;
    const bool hyperpriors = false;
    const bool trend = false;
    

    The first option is specific for HARPS observations, and should be set to true if you have observations after the HARPS change of optical fibers. kima will adjust an offset between observations taken before and after this change.
    The GP option defines if you want to use a Gaussian process model (= true) or a standard sum-of-Keplerians model (= false).
    The third option allows you to choose if the planet parameters should be defined conditionally on hyperparameters, as in Brewer & Donovan (2015) and Faria et al. (2016).
    The last option allows for a linear trend to be included in the model.

    The RVmodel constructor

    RVmodel::RVmodel()
    :planets(5, 1, true, RVConditionalPrior())
    

    sets the number of parameters per planet (this is always equal to 5, and should not be changed), the maximum number of planets (in this case set to 1), and whether of not the number of planets is fixed (in this case yes). With these options, kima will sample the posterior distributions for the 5 parameters of a 1-planet model.

    Inside the RVmodel constructor, you can define extra priors which are data-dependent. For example, the prior for the systemic velocity is usually defined between the minimum and maximum observed RV.

    Calling the save_setup() function at the end of the constructor will save a file with information about the current kima model, to be read by pykima when analysing the results.

    To finish, the main function is where the file with the RV data can be set, using the datafile variable. The load function will read this file, assuming the RVs are in km/s and skipping 0 rows in the header. The second and third arguments to load can be changed to "ms" and the number of header rows to skip, respectively.

kima/OPTIONS

This file is quite important, so there's a page just for it.


Running kima

Ok, we're ready to start sampling!
The first step (you probably did this already) is to go to the kima main directory and compile everything using

make

To run the sampler, type

./run

By default this script starts DNest4 with 4 threads, runs the model, and then prints how much time it took. When it's finished, a few .txt files should have been created in main directory.

  • levels.txt contains information about the levels created during the run. The first column is the enclosed prior mass in units of nats, the second column contains the log likelihood of each level, the third column gives a tiebreaker value for the levels, the forth columns has the number of accepted proposals of the MCMC, and the fifth column contains the total number of proposals in the level.

  • sample.txt contains samples from the mixture of constrained distributions. It does not contain posterior samples! To get those, we need to assign weights to the samples and this is done in a postprocessing step (see Brewer, 2015).

  • sample_info.txt contains extra information about the samples from sample.txt. The first column is the level corresponding to that sample, the second column is the value of the log-likelihood for each sample, the third a likelihood tiebreaker, and the fourth column informs you o the thread the particle was in.

And now we're ready to analyze the results.