# K2-24

RadVel and exoplanet both have excellent tutorials in which they fit the two planets b and c of the K2-24 system. We will do the same here to show how to fit RV data with Ravest.

**Links:**  
K2-24 paper (Petigura et al. 2015): https://arxiv.org/abs/1511.04497  
RadVel tutorial: https://radvel.readthedocs.io/en/latest/tutorials/K2-24_Fitting+MCMC.html  
exoplanet tutorial: https://gallery.exoplanet.codes/tutorials/rv/  

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import ravest.prior
from ravest.fit import Fitter
from ravest.model import calculate_mpsini
from ravest.param import Parameter, Parameterisation

Import the data

In [None]:
url = "https://raw.githubusercontent.com/California-Planet-Search/radvel/master/example_data/epic203771098.csv"
data = pd.read_csv(url, usecols=[1,2,3], names=["errvel", "time", "vel"], skiprows=1)
data

In [None]:
plt.figure(figsize=(15,3.5))
plt.title("K2-24 radial velocity data")
plt.ylabel("Radial Velocity [m/s]")
plt.xlabel("BJD_TDB - 2454833")
plt.errorbar(data["time"], data["vel"], yerr=data["errvel"], marker=".", linestyle="None")
plt.show()

Create a `Fitter` object, then choose the parameterisation to fit with, whether each parameter is fixed or free, and the initial parameter values. We can fit a circular model by fixing the eccentricity at $e=0$. The argument of periapsis $\omega_\star$ is then degenerate and can be fixed at any value; by convention we fix it at $\pi/2$. The reference zero-point time `t0` is used for the linear and quadratic trend terms $\dot{\gamma}$ and $\ddot{\gamma}$.
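
As a rough illustration of what such a circular model with a trend looks like, here is a minimal NumPy sketch. It assumes a common convention in which, for $e=0$ and $\omega_\star=\pi/2$, the planet signal is $-K\sin[2\pi(t-t_c)/P]$ and the trend is $\gamma + \dot{\gamma}(t-t_0) + \ddot{\gamma}(t-t_0)^2$. The function name `circular_rv` is hypothetical, and this is not Ravest's internal implementation.

```python
import numpy as np

def circular_rv(t, per, k, tc, g=0.0, gd=0.0, gdd=0.0, t0=0.0):
    """Circular-orbit RV plus a polynomial trend (illustrative sketch only)."""
    t = np.asarray(t, dtype=float)
    # Orbital phase measured from the time of conjunction tc
    phase = 2.0 * np.pi * (t - tc) / per
    # Assumed sign convention: the star approaches the observer just after conjunction
    planet = -k * np.sin(phase)
    # Trend terms are measured relative to the reference time t0
    dt = t - t0
    trend = g + gd * dt + gdd * dt**2
    return planet + trend

# Evaluate at conjunction and a quarter of a period later, using planet b's values
t = np.array([2072.7948, 2072.7948 + 20.8851 / 4])
rv = circular_rv(t, per=20.8851, k=10.0, tc=2072.7948, t0=2420.0)
print(rv)
```

With the trend terms left at zero, the signal vanishes at conjunction and reaches $-K$ a quarter of a period later.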

In [None]:
fitter = Fitter(planet_letters=["b","c"], parameterisation=Parameterisation("per k e w tc"))
fitter.add_data(time=data["time"].to_numpy(), 
                vel=data["vel"].to_numpy(), 
                verr=data["errvel"].to_numpy(), 
                t0=2420)

# Construct the params dict
# These values will be used as your initial guess for the fit
params = {"per_b": Parameter(20.8851, "d", fixed=True),
          "k_b": Parameter(10, "m/s", fixed=False),
          "e_b": Parameter(0, "", fixed=True),
          "w_b": Parameter(np.pi/2, "rad", fixed=True),
          "tc_b": Parameter(2072.7948, "d", fixed=True),

          "per_c": Parameter(42.3633, "d", fixed=True),
          "k_c": Parameter(10, "m/s", fixed=False),
          "e_c": Parameter(0, "", fixed=True),
          "w_c": Parameter(np.pi/2, "rad", fixed=True),
          "tc_c": Parameter(2082.6251, "d", fixed=True),
          
          "g": Parameter(0, "m/s", fixed=False),
          "gd": Parameter(0, "m/s/day", fixed=False),
          "gdd": Parameter(0, "m/s/day^2", fixed=True),
          
          "jit": Parameter(0, "m/s", fixed=False),}

fitter.params = params
fitter.params

Define the prior functions for the free parameters. You can see a list of available prior functions at `ravest.prior.PRIOR_FUNCTIONS`.
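
To make concrete what a uniform prior contributes to the fit, here is a minimal sketch of its log-density. The helper `log_uniform` is hypothetical and is only meant to illustrate the idea; it is not how `ravest.prior.Uniform` is necessarily implemented.

```python
import numpy as np

def log_uniform(value, lower, upper):
    """Log-density of a uniform prior on [lower, upper] (hypothetical helper)."""
    if lower <= value <= upper:
        # Constant density 1 / (upper - lower) inside the bounds
        return -np.log(upper - lower)
    # Zero probability outside the bounds
    return -np.inf

print(log_uniform(5.0, 0.0, 20.0))   # finite: inside the bounds
print(log_uniform(25.0, 0.0, 20.0))  # -inf: outside the bounds
```

A value outside the bounds gives a log-prior of $-\infty$, so any sample there is rejected regardless of how well it fits the data.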

In [None]:
ravest.prior.PRIOR_FUNCTIONS

In [None]:
# Construct the priors dict. Every parameter that isn't fixed requires a prior.
priors = {
          "k_b": ravest.prior.Uniform(0,20),
          "k_c": ravest.prior.Uniform(0,20),

          "g": ravest.prior.Uniform(-10, 10),
          "gd": ravest.prior.Uniform(-1, 1),
          
          "jit": ravest.prior.Uniform(0, 5),
         }

fitter.priors = priors
fitter.priors

Now that we have loaded the `Fitter` with the data, our parameterisation, our initial parameter values, and priors for each of the free parameters, we can fit the free parameters of the model to the data.  
  
First, Maximum A Posteriori (MAP) optimisation is performed to find the best-fit solution.
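
The MAP solution is the set of parameters that maximises the log-posterior, i.e. the log-prior plus the log-likelihood. The sketch below shows the standard Gaussian log-likelihood with the jitter term added in quadrature to the measurement errors. That Ravest uses exactly this form is an assumption, and the function names are hypothetical.

```python
import numpy as np

def log_likelihood(vel, verr, model, jit):
    """Gaussian log-likelihood with jitter added in quadrature to the errors."""
    var = verr**2 + jit**2
    resid = vel - model
    return -0.5 * np.sum(resid**2 / var + np.log(2.0 * np.pi * var))

def log_posterior(vel, verr, model, jit, log_prior):
    """Log-posterior (up to a constant): log-prior plus log-likelihood."""
    if not np.isfinite(log_prior):
        return -np.inf
    return log_prior + log_likelihood(vel, verr, model, jit)

# Toy data: a model that matches the data scores higher than a flat line
vel = np.array([1.0, -2.0, 0.5])
verr = np.array([1.0, 1.0, 1.0])
good = log_posterior(vel, verr, model=vel, jit=0.0, log_prior=0.0)
bad = log_posterior(vel, verr, model=np.zeros(3), jit=0.0, log_prior=0.0)
print(good > bad)
```

An optimiser such as Powell's method would minimise the negative of this quantity over the free parameters.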

In [None]:
map_results = fitter.find_map_estimate(method="Powell")
map_results

Then, MCMC is used to explore the parameter space and estimate the parameter uncertainties. To keep this notebook quick to run, the chain is only run for 10000 steps; you should run considerably more. `ravest` enforces a minimum of 2 walkers per free parameter, though again you should consider running more. You should also consider using randomly initialised starting points, rather than the MAP solution, to better explore the parameter space.

In [None]:
nwalkers = 2 * len(fitter.free_params_dict)
nsteps = 10000

# Fit the free parameters to the data
fitter.run_mcmc(initial_values=map_results.x, nwalkers=nwalkers, nsteps=nsteps, progress=True)  # This will take a few minutes!

Now that the MCMC is finished, the state of the `emcee` sampler has been saved into the `Fitter` object. We can therefore export the posterior samples as a `numpy` array that can be passed into other functions (such as for comparing two models by calculating the Bayesian evidence - example notebook coming soon!). We can also export them into a Pandas dataframe, which keeps each parameter labelled. In both cases, we can pass in the `discard_start`, `discard_end` and `thin` arguments as desired; the `numpy` export also accepts `flat`.
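
To illustrate what discarding, thinning and flattening do to a chain, here is a small NumPy sketch operating on a dummy array of shape `(nsteps, nwalkers, ndim)`. The helper `trim_chain` is hypothetical, and the exact slicing Ravest applies internally is an assumption.

```python
import numpy as np

def trim_chain(chain, discard_start=0, discard_end=0, thin=1, flat=False):
    """Trim a (nsteps, nwalkers, ndim) chain (hypothetical helper)."""
    nsteps = chain.shape[0]
    # Drop steps from both ends, then keep every `thin`-th step
    kept = chain[discard_start:nsteps - discard_end:thin]
    if flat:
        # Merge the step and walker axes into a single sample axis
        return kept.reshape(-1, kept.shape[-1])
    return kept

# Dummy chain: 10000 steps, 10 walkers, 5 free parameters
chain = np.zeros((10000, 10, 5))
print(trim_chain(chain, discard_start=2000, thin=10).shape)
print(trim_chain(chain, discard_start=2000, thin=10, flat=True).shape)
```

Discarding the early steps removes the burn-in phase, while thinning reduces the correlation between consecutive samples.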

In [None]:
# Get the samples as a numpy array
samples = fitter.get_samples_np(discard_start=0, discard_end=0, thin=1, flat=False) # shape (nsteps, nwalkers, ndim)

# Get the samples as a labelled Pandas dataframe
samples_df = fitter.get_samples_df(discard_start=0, discard_end=0, thin=1)  # shape (nsteps*nwalkers, ndim)
samples_df

To inspect the chains visually, we can plot (and optionally save) the time series of each parameter in the chain.

In [None]:
fitter.plot_chains(discard_start=0, discard_end=0, thin=1, save=False)

We can also visualise the posterior parameter distributions in corner plots, using the `corner` module.

In [None]:
fitter.plot_corner(discard_start=0, discard_end=0, thin=1, save=False)

Inspecting the posteriors, we can see the 16th, 50th and 84th percentiles, which could be used for a quoted value and uncertainty. It's still a good idea to inspect the posterior distributions visually with the corner plots, as they may not always be nice Gaussians, in which case those percentiles may not be a good representation. For further analysis and inspection, recall that we can get a dataframe of the samples (e.g. to plot them in a histogram and inspect the distribution more closely) by using the `Fitter.get_samples_df()` method that we saw earlier.
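
As a minimal sketch of how a quoted value and uncertainty follow from those percentiles, the example below uses synthetic Gaussian samples rather than real posterior samples. The variable names are illustrative only.

```python
import numpy as np

# Synthetic stand-in for the posterior samples of one parameter
rng = np.random.default_rng(42)
samples = rng.normal(loc=5.0, scale=1.0, size=100_000)

# 16th, 50th and 84th percentiles
lo, med, hi = np.percentile(samples, [16, 50, 84])
upper = hi - med   # upper uncertainty
lower = med - lo   # lower uncertainty
print(f"{med:.2f} +{upper:.2f} -{lower:.2f}")
```

For a Gaussian the two uncertainties are nearly equal; for a skewed posterior they can differ noticeably, which is one reason to check the distribution visually.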

Let's inspect how well our fitted solution matches the data. We take every sample in the chain and calculate the resultant RV. Then, at every timestep, we look at the distribution of the velocities calculated from all of the samples, and plot the median, 16th and 84th percentiles.
  
Here the chains are being thinned by 100 to ensure this notebook runs quickly. You may want a lower thinning factor.
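
The percentile band described above can be sketched with NumPy alone. The example below uses a toy circular signal and synthetic samples of the semi-amplitude, both of which are assumptions for illustration; it is not the code behind `plot_posterior_rv`.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 40.0, 200)   # time grid [d]
per = 20.0                        # fixed period [d]

# Synthetic stand-in for posterior samples of the semi-amplitude K [m/s]
k_samples = rng.normal(loc=5.0, scale=0.5, size=500)

# One RV curve per sample: shape (nsamples, ntimes)
rv_curves = -k_samples[:, None] * np.sin(2.0 * np.pi * t[None, :] / per)

# Percentiles across the samples at every timestep
rv_lo, rv_med, rv_hi = np.percentile(rv_curves, [16, 50, 84], axis=0)
print(rv_curves.shape, rv_med.shape)
```

The arrays `rv_lo` and `rv_hi` could then be passed to `plt.fill_between` to draw the band around the median curve.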

In [None]:
fitter.plot_posterior_rv(discard_start=0, discard_end=0, thin=100)

In [None]:
fitter.plot_posterior_phase(discard_start=0, discard_end=0, thin=100)

To see the resulting planetary mass estimate $M_p\sin{i}$, we need to know the stellar mass. Using the value $M_*=1.12\pm0.05\,M_\odot$ used in Dai et al. 2016, we can generate a distribution of stellar mass values to combine with the posterior samples.
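
For reference, in the limit $M_p \ll M_*$ the minimum mass follows from $M_p\sin{i} \approx K\sqrt{1-e^2}\,(P/2\pi G)^{1/3}M_*^{2/3}$. The sketch below implements this approximation directly; the function name `mpsini_approx` and the constants are assumptions for illustration, and `calculate_mpsini` in Ravest may differ in its details.

```python
import numpy as np

G = 6.67430e-11        # gravitational constant [m^3 kg^-1 s^-2]
M_SUN = 1.98847e30     # solar mass [kg]
M_EARTH = 5.9722e24    # Earth mass [kg]
DAY = 86400.0          # seconds per day

def mpsini_approx(mstar, per, k, e):
    """Approximate Mp sin(i) in Earth masses, assuming Mp << M*.

    mstar in solar masses, per in days, k in m/s (hypothetical helper).
    """
    mstar_kg = np.asarray(mstar) * M_SUN
    per_s = np.asarray(per) * DAY
    mpsini_kg = (k * np.sqrt(1.0 - np.asarray(e)**2)
                 * (per_s / (2.0 * np.pi * G))**(1.0 / 3.0)
                 * mstar_kg**(2.0 / 3.0))
    return mpsini_kg / M_EARTH

# Sanity check with roughly Jupiter-like values around a Sun-like star
print(mpsini_approx(mstar=1.0, per=4332.59, k=12.5, e=0.0))
```

Because the function uses NumPy operations throughout, it accepts arrays of samples as well as scalars, so a stellar mass distribution and posterior samples of $K$ can be combined element by element.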

In [None]:
# Stellar mass values from Dai et al. 2016
mstar_val = 1.12  # [M_sun]
mstar_err = 0.05 # [M_sun]

# Create a distribution of stellar mass values from the published value and uncertainty
mstar = np.random.normal(loc=mstar_val, scale=mstar_err, size=len(samples_df))
# Ensure all values in mstar are positive
while np.any(mstar <= 0):
    bad = mstar <= 0
    mstar[bad] = np.random.normal(loc=mstar_val, scale=mstar_err, size=np.count_nonzero(bad))

In [None]:
# get the posterior samples (both free and fixed) as a dictionary
posterior_params = fitter.get_posterior_params_dict(discard_start=0, discard_end=0, thin=1)  # get the fixed value for fixed parameters, get the MCMC samples for the free parameters

# use the MCMC samples and the stellar mass distribution to get a distribution for Mp sin(i)
mpsini_b = calculate_mpsini(mstar, posterior_params["per_b"], posterior_params["k_b"], posterior_params["e_b"], unit="M_earth")
mpsini_c = calculate_mpsini(mstar, posterior_params["per_c"], posterior_params["k_c"], posterior_params["e_c"], unit="M_earth")

# calculate the median and 1-sigma uncertainties
perc_b = np.percentile(mpsini_b, [16, 50, 84])
perc_c = np.percentile(mpsini_c, [16, 50, 84])
# upper uncertainty is (84th - 50th); lower uncertainty is (50th - 16th)
print("Planet b Mpsin(i):", perc_b[1], "+", perc_b[2] - perc_b[1], "-", perc_b[1] - perc_b[0])
print("Planet c Mpsin(i):", perc_c[1], "+", perc_c[2] - perc_c[1], "-", perc_c[1] - perc_c[0])

# Plot the mass posteriors for inspection
plt.hist(mpsini_b, bins=15, histtype="step")
plt.hist(mpsini_c, bins=15, histtype="step")
plt.show()

## Eccentric orbits

Let's make a new `Fitter` object and fit for eccentricity. We'll fit in the $\sqrt{e}\cos{\omega_\star}$ and $\sqrt{e}\sin{\omega_\star}$ parameterisation, and convert back to $e$ and $\omega_\star$ afterwards.
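
The mapping between the two parameterisations is simple enough to sketch in a few lines: $e$ is the sum of the squares of the two quantities, and $\omega_\star$ is their two-argument arctangent. The helper below is hypothetical; Ravest provides its own conversion method, whose conventions (such as the range of $\omega_\star$) may differ.

```python
import numpy as np

def secosw_sesinw_to_e_w(secosw, sesinw):
    """Convert sqrt(e)cos(w), sqrt(e)sin(w) to e and w (hypothetical helper)."""
    e = secosw**2 + sesinw**2
    w = np.arctan2(sesinw, secosw)   # radians, in (-pi, pi]
    return e, w

# Round trip: start from known e and w, convert there and back
e_true, w_true = 0.3, 1.0
secosw = np.sqrt(e_true) * np.cos(w_true)
sesinw = np.sqrt(e_true) * np.sin(w_true)
e, w = secosw_sesinw_to_e_w(secosw, sesinw)
print(e, w)
```

Sampling in these two quantities avoids the hard boundary at $e=0$ and the poorly constrained $\omega_\star$ that arise when fitting $e$ and $\omega_\star$ directly for near-circular orbits.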

In [None]:
# Fit in the sqrt(e) parameterisation
parameterisation_se = Parameterisation("per k secosw sesinw tc")

fitter_se = Fitter(planet_letters=["b","c"], parameterisation=parameterisation_se)
fitter_se.add_data(time=data["time"].to_numpy(), 
                   vel=data["vel"].to_numpy(), 
                   verr=data["errvel"].to_numpy(), 
                   t0=2420)
print(fitter_se.t0)

# Construct the params dict
# These values will be used as your initial guess for the fit
params_se = {"per_b": Parameter(20.8851, "d", fixed=True),
            "k_b": Parameter(10, "m/s", fixed=False),
            "secosw_b": Parameter(0, "", fixed=False),
            "sesinw_b": Parameter(0, "", fixed=False),
            "tc_b": Parameter(2072.7948, "d", fixed=True),

            "per_c": Parameter(42.3633, "d", fixed=True),
            "k_c": Parameter(10, "m/s", fixed=False),
            "secosw_c": Parameter(0, "", fixed=False),
            "sesinw_c": Parameter(0, "", fixed=False),
            "tc_c": Parameter(2082.6251, "d", fixed=True),
            
            "g": Parameter(0, "m/s", fixed=False),
            "gd": Parameter(0, "m/s/day", fixed=False),
            "gdd": Parameter(0, "m/s/day^2", fixed=True),

            "jit": Parameter(0, "m/s", fixed=False),
            }

fitter_se.params = params_se
fitter_se.params

In [None]:
# Construct the priors dict. Every parameter that isn't fixed requires a prior.
priors_se = {
          "k_b": ravest.prior.Uniform(0,50),
          "secosw_b": ravest.prior.Uniform(-np.sqrt(0.8), np.sqrt(0.8)),
          "sesinw_b": ravest.prior.Uniform(-np.sqrt(0.8), np.sqrt(0.8)),

          "k_c": ravest.prior.Uniform(0,50),
          "secosw_c": ravest.prior.Uniform(-np.sqrt(0.8), np.sqrt(0.8)),
          "sesinw_c": ravest.prior.Uniform(-np.sqrt(0.8), np.sqrt(0.8)),

          "g": ravest.prior.Uniform(-10, 10),
          "gd": ravest.prior.Uniform(-0.1, 0.1),
          "jit": ravest.prior.Uniform(0, 5),
        }

fitter_se.priors = priors_se
fitter_se.priors

In [None]:
map_results_se = fitter_se.find_map_estimate(method="Powell")
map_results_se

In [None]:
nwalkers = 2 * len(fitter_se.free_params_dict)
nsteps = 10000

# Fit the free parameters to the data
samples_se = fitter_se.run_mcmc(initial_values=map_results_se.x, nwalkers=nwalkers, nsteps=nsteps, progress=True)  # This will take a while!

In [None]:
# Get the samples as a numpy array
samples_se = fitter_se.get_samples_np(discard_start=0, discard_end=0, thin=1, flat=False) # shape (nsteps, nwalkers, ndim)

# Get the samples as a labelled Pandas dataframe
samples_df_se = fitter_se.get_samples_df(discard_start=0, discard_end=0, thin=1)  # shape (nsteps*nwalkers, ndim)
samples_df_se

In [None]:
fitter_se.plot_chains(discard_start=0, discard_end=0, thin=1, save=False)

In [None]:
fitter_se.plot_corner(discard_start=0, discard_end=0, thin=1, save=False)

In [None]:
samples_df_se

Let's inspect how well our fitted solution matches the data. Again we are thinning the chains just to keep this notebook running quickly.

In [None]:
fitter_se.plot_posterior_rv(discard_start=0, discard_end=0, thin=100)

In [None]:
fitter_se.plot_posterior_phase(discard_start=0, discard_end=0, thin=100)

Let's see how allowing for eccentric orbits affects the mass estimate $M_p\sin{i}$. We will use the same stellar mass as before, from Dai et al. 2016.

In [None]:
# Stellar mass values from Dai et al. 2016
mstar_val = 1.12  # [M_sun]
mstar_err = 0.05 # [M_sun]

# Create a distribution of stellar mass values from the published value and uncertainty
mstar = np.random.normal(loc=mstar_val, scale=mstar_err, size=len(samples_df_se))
# Ensure all values in mstar are positive
while np.any(mstar <= 0):
    bad = mstar <= 0
    mstar[bad] = np.random.normal(loc=mstar_val, scale=mstar_err, size=np.count_nonzero(bad))

In [None]:
# get the posterior samples (both free and fixed) as a dictionary
posterior_params_se = fitter_se.get_posterior_params_dict(discard_start=0, discard_end=0, thin=1)  # get the fixed value for fixed parameters, get the MCMC samples for the free parameters

# we need to convert secosw and sesinw to e and w
# let's do this using the parameterisation class, parameterisation_se
posterior_params_se["e_b"], posterior_params_se["w_b"] = parameterisation_se.convert_secosw_sesinw_to_e_w(posterior_params_se["secosw_b"], posterior_params_se["sesinw_b"])
# and the same for _c
posterior_params_se["e_c"], posterior_params_se["w_c"] = parameterisation_se.convert_secosw_sesinw_to_e_w(posterior_params_se["secosw_c"], posterior_params_se["sesinw_c"])


# use the MCMC samples and the stellar mass distribution to get a distribution for Mp sin(i)

mpsini_b_se = calculate_mpsini(mstar, posterior_params_se["per_b"], posterior_params_se["k_b"], posterior_params_se["e_b"], unit="M_earth")
mpsini_c_se = calculate_mpsini(mstar, posterior_params_se["per_c"], posterior_params_se["k_c"], posterior_params_se["e_c"], unit="M_earth")

# calculate the median and 1-sigma uncertainties
perc_b_se = np.percentile(mpsini_b_se, [16, 50, 84])
perc_c_se = np.percentile(mpsini_c_se, [16, 50, 84])
# upper uncertainty is (84th - 50th); lower uncertainty is (50th - 16th)
print("Planet b Mpsin(i):", perc_b_se[1], "+", perc_b_se[2] - perc_b_se[1], "-", perc_b_se[1] - perc_b_se[0])
print("Planet c Mpsin(i):", perc_c_se[1], "+", perc_c_se[2] - perc_c_se[1], "-", perc_c_se[1] - perc_c_se[0])

# Plot the mass posteriors for inspection
plt.hist(mpsini_b_se, bins=15, histtype="step")
plt.hist(mpsini_c_se, bins=15, histtype="step")
plt.show()