# -*- coding: utf-8 -*-
# ---
# jupyter:
# jupytext:
# text_representation:
# extension: .py
# format_name: light
# format_version: '1.5'
# jupytext_version: 1.7.1
# kernelspec:
# display_name: Python 3
# language: python
# name: python3
# ---
# + nbsphinx="hidden"
# %matplotlib inline
# + nbsphinx="hidden"
# %run notebook_setup
# -
# # A quick intro to PyMC3 for exoplaneteers
# [Hamiltonian Monte Carlo (HMC)](https://en.wikipedia.org/wiki/Hamiltonian_Monte_Carlo) methods haven't been widely used in astrophysics, but they are the standard methods for probabilistic inference using Markov chain Monte Carlo (MCMC) in many other fields.
# *exoplanet* is designed to provide the building blocks for fitting many exoplanet datasets using this technology, and this tutorial presents some of the basic features of the [PyMC3](https://docs.pymc.io/) modeling language and inference engine.
# The [documentation for PyMC3](https://docs.pymc.io/) includes many other tutorials that you should check out to get more familiar with the features that are available.
#
# In this tutorial, we will go through two simple examples of fitting some data using PyMC3.
# The first is the classic fitting a line to data with unknown error bars, and the second is a more relevant example where we fit a radial velocity model to the public radial velocity observations of [51 Peg](https://en.wikipedia.org/wiki/51_Pegasi).
# You can read more about fitting lines to data [in the bible of line fitting](https://arxiv.org/abs/1008.4686) and you can see another example of fitting the 51 Peg data using HMC (this time using [Stan](http://mc-stan.org)) [here](https://dfm.io/posts/stan-c++/).
#
# ## Hello world (AKA fitting a line to data)
#
# My standard intro to a new modeling language or inference framework is to fit a line to data.
# So. Let's do that with PyMC3.
#
# To start, we'll generate some fake data using a linear model.
# Feel free to change the random number seed to try out a different dataset.
# +
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
true_m = 0.5
true_b = -1.3
true_logs = np.log(0.3)
x = np.sort(np.random.uniform(0, 5, 50))
y = true_b + true_m * x + np.exp(true_logs) * np.random.randn(len(x))
plt.plot(x, y, ".k")
plt.ylim(-2, 2)
plt.xlabel("x")
plt.ylabel("y")
# -
# To fit these data, our model will have 3 parameters: the slope $m$, the intercept $b$, and the log of the uncertainty $\log(\sigma)$.
# To start, let's choose broad uniform priors on these parameters:
#
# $$
# \begin{eqnarray}
# p(m) &=& \left\{\begin{array}{ll}
# 1/10 & \mathrm{if}\,-5 < m < 5 \\
# 0 & \mathrm{otherwise} \\
# \end{array}\right. \\
# p(b) &=& \left\{\begin{array}{ll}
# 1/10 & \mathrm{if}\,-5 < b < 5 \\
# 0 & \mathrm{otherwise} \\
# \end{array}\right. \\
# p(\log(\sigma)) &=& \left\{\begin{array}{ll}
# 1/10 & \mathrm{if}\,-5 < \log(\sigma) < 5 \\
# 0 & \mathrm{otherwise} \\
# \end{array}\right.
# \end{eqnarray}
# $$
#
# Then, the log-likelihood function will be
#
# $$
# \log p(\{y_n\}\,|\,m,\,b,\,\log(\sigma)) = -\frac{1}{2}\sum_{n=1}^N \left[\frac{(y_n - m\,x_n - b)^2}{\sigma^2} + \log(2\,\pi\,\sigma^2)\right]
# $$
#
# [**Note:** the second normalization term is needed in this model because we are fitting for $\sigma$ and the second term is *not* a constant.]
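#
# As a sanity check, this model is simple enough to write down as a plain NumPy function. Here is a minimal sketch of the log-posterior (the uniform priors contribute only a constant inside their support, which we drop); we won't need this function below, it's just to make the math concrete:

```python
import numpy as np

def log_prob(m, b, logs, x, y):
    # Uniform(-5, 5) priors: the log-prior is a constant inside the
    # support and -inf outside, so only the bounds check matters
    if any(abs(p) >= 5 for p in (m, b, logs)):
        return -np.inf
    sigma2 = np.exp(2 * logs)
    resid = y - (m * x + b)
    # Gaussian log-likelihood, including the non-constant
    # log(2 * pi * sigma^2) normalization term
    return -0.5 * np.sum(resid**2 / sigma2 + np.log(2 * np.pi * sigma2))
```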
#
# Another way of writing this model that might not be familiar is the following:
#
# $$
# \begin{eqnarray}
# m &\sim& \mathrm{Uniform}(-5,\,5) \\
# b &\sim& \mathrm{Uniform}(-5,\,5) \\
# \log(\sigma) &\sim& \mathrm{Uniform}(-5,\,5) \\
# y_n &\sim& \mathrm{Normal}(m\,x_n+b,\,\sigma)
# \end{eqnarray}
# $$
#
# This is the way that a model like this is often defined in statistics, and it will be useful when we implement our model in PyMC3, so take a moment to make sure that you understand the notation.
#
# Now, let's implement this model in PyMC3.
# The documentation for the distributions available in PyMC3's modeling language can be [found here](https://docs.pymc.io/api/distributions/continuous.html) and these will come in handy as you go on to write your own models.
# +
import pymc3 as pm

with pm.Model() as model:
    # Define the priors on each parameter:
    m = pm.Uniform("m", lower=-5, upper=5)
    b = pm.Uniform("b", lower=-5, upper=5)
    logs = pm.Uniform("logs", lower=-5, upper=5)

    # Define the likelihood. A few comments:
    #  1. For mathematical operations like "exp", you can't use
    #     numpy. Instead, use the mathematical operations defined
    #     in "pm.math".
    #  2. To condition on data, you use the "observed" keyword
    #     argument to any distribution. In this case, we want to
    #     use the "Normal" distribution (look up the docs for
    #     this).
    pm.Normal("obs", mu=m * x + b, sd=pm.math.exp(logs), observed=y)

    # This is how you will sample the model. Take a look at the
    # docs to see the other parameters that are available.
    trace = pm.sample(draws=1000, tune=1000, chains=2, cores=2)
# -
# Now that we have samples, let's make some diagnostic plots.
# The first plot to look at is the trace plot, implemented in ArviZ as `plot_trace`.
# In this plot, you'll see the marginalized distribution for each parameter on the left and the trace plot (parameter value as a function of step number) on the right.
# In each panel, you should see two lines with different colors.
# These are the results of different independent chains and if the results are substantially different in the different chains then there is probably something going wrong.
# +
import arviz as az

with model:
    az.plot_trace(trace, var_names=["m", "b", "logs"])
# -
# It's also good to quantify that "looking substantially different" argument.
# This is implemented as the `summary` function in ArviZ.
# In this table, some of the key columns to look at are `n_eff` and `Rhat`.
# * `n_eff` shows an estimate of the number of effective (or independent) samples for that parameter. In this case, `n_eff` should probably be around 500 per chain (there should have been 2 chains run).
# * `Rhat` shows the [Gelman–Rubin statistic](https://docs.pymc.io/api/diagnostics.html#pymc3.diagnostics.gelman_rubin) and it should be close to 1.
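#
# To build some intuition for `Rhat`, here is a minimal NumPy sketch of the (non-split) Gelman–Rubin statistic for a single parameter. Note that this is a simplified version of the diagnostic; the estimator used in practice by PyMC3/ArviZ is a refined split-chain variant:

```python
import numpy as np

def gelman_rubin(chains):
    # chains has shape (n_chains, n_samples) for one parameter
    m, n = chains.shape
    B = n * chains.mean(axis=1).var(ddof=1)  # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # mean within-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(42)

# Two well-mixed chains targeting the same distribution: Rhat ~ 1
print(gelman_rubin(rng.normal(size=(2, 1000))))

# Two chains stuck in different places: Rhat >> 1
print(gelman_rubin(np.stack([rng.normal(0, 1, 1000), rng.normal(10, 1, 1000)])))
```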
with model:
    summary = az.summary(trace, var_names=["m", "b", "logs"])
summary
# The last diagnostic plot that we'll make here is the [corner plot made using corner.py](https://corner.readthedocs.io).
# Recent versions of corner.py can take the trace object directly; we just pass it in along with the list of variable names and the true values to overplot.
# +
import corner  # https://corner.readthedocs.io

names = ["m", "b", "logs"]
_ = corner.corner(
    trace,
    var_names=names,
    truths=dict(zip(names, [true_m, true_b, true_logs])),
)
# -
# **Extra credit:** Here are a few suggestions for things to try out while getting more familiar with PyMC3:
#
# 1. Try initializing the parameters using the `testval` argument to the distributions. Does this improve performance in this case? It will substantially improve performance in more complicated examples.
# 2. Try changing the priors on the parameters. For example, try the "uninformative" prior [recommended by Jake VanderPlas on his blog](http://jakevdp.github.io/blog/2014/06/14/frequentism-and-bayesianism-4-bayesian-in-python/#Prior-on-Slope-and-Intercept).
# 3. What happens as you substantially increase or decrease the simulated noise? Does the performance change significantly? Why?
#
# ## A more realistic example: radial velocity exoplanets
#
# While the example above was cute, it doesn't fully exploit the power of PyMC3 and it doesn't show some of the real issues that you will face when you use PyMC3 as an astronomer.
# To get a better sense of how you might use PyMC3 in Real Life™, let's take a look at a more realistic example: fitting a Keplerian orbit to radial velocity observations.
#
# One of the key aspects of this problem that I want to highlight is the fact that PyMC3 (and the underlying model building framework [Theano (recently renamed to Aesara)](https://aesara.readthedocs.io/)) don't have out-of-the-box support for the root-finding that is required to solve Kepler's equation.
# As part of the process of computing a Keplerian RV model, we must solve the equation:
#
# $$
# M = E - e\,\sin E
# $$
#
# for the eccentric anomaly $E$ given some mean anomaly $M$ and eccentricity $e$.
# There are commonly accepted methods of solving this equation using [Newton's method](https://en.wikipedia.org/wiki/Newton%27s_method), but if we want to expose that to PyMC3, we have to define a [custom Theano operation](https://aesara.readthedocs.io/en/latest/extending/index.html) with a custom gradient.
# I won't go into the details of the math (because [I blogged about it](https://dfm.io/posts/stan-c++/)) and I won't go into the details of the implementation (because [you can take a look at it on GitHub](https://github.com/exoplanet-dev/exoplanet/tree/main/exoplanet/theano_ops/kepler.py)).
# So, for this tutorial, we'll use the custom Kepler solver that is implemented as part of *exoplanet* and fit the publicly available radial velocity observations of the famous exoplanetary system 51 Peg using PyMC3.
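#
# To make the root-finding step concrete, here is a simple NumPy sketch of solving Kepler's equation with Newton's method. This is only an illustration (and it provides no gradient, which is the whole point of the custom op); below we'll use the solver built into *exoplanet*:

```python
import numpy as np

def kepler_newton(M, e, tol=1e-12, max_iter=50):
    # Solve M = E - e * sin(E) for the eccentric anomaly E,
    # starting from the initial guess E = M
    E = np.array(M, dtype=float)
    for _ in range(max_iter):
        # Newton step for f(E) = E - e * sin(E) - M
        delta = (E - e * np.sin(E) - M) / (1 - e * np.cos(E))
        E -= delta
        if np.max(np.abs(delta)) < tol:
            break
    return E

M = np.linspace(0, 2 * np.pi, 5)
E = kepler_newton(M, 0.3)
print(np.allclose(E - 0.3 * np.sin(E), M))  # True: Kepler's equation holds
```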
#
# First, we need to download the data from the exoplanet archive:
# +
import requests
import pandas as pd
import matplotlib.pyplot as plt

# Download the dataset from the Exoplanet Archive:
url = "https://exoplanetarchive.ipac.caltech.edu/data/ExoData/0113/0113357/data/UID_0113357_RVC_001.tbl"
r = requests.get(url)
if r.status_code != requests.codes.ok:
    r.raise_for_status()
data = np.array(
    [
        l.split()
        for l in r.text.splitlines()
        if not l.startswith("\\") and not l.startswith("|")
    ],
    dtype=float,
)
t, rv, rv_err = data.T
t -= np.mean(t)

# Plot the observations "folded" on the published period:
# Butler et al. (2006) https://arxiv.org/abs/astro-ph/0607493
lit_period = 4.230785
plt.errorbar(
    (t % lit_period) / lit_period, rv, yerr=rv_err, fmt=".k", capsize=0
)
plt.xlim(0, 1)
plt.ylim(-110, 110)
plt.annotate(
    "period = {0:.6f} days".format(lit_period),
    xy=(1, 0),
    xycoords="axes fraction",
    xytext=(-5, 5),
    textcoords="offset points",
    ha="right",
    va="bottom",
    fontsize=12,
)
plt.ylabel("radial velocity [m/s]")
_ = plt.xlabel("phase")
# -
# Now, here's the implementation of a radial velocity model in PyMC3.
# Some of this will look familiar after the Hello World example, but things are a bit more complicated now.
# Take a minute to take a look through this and see if you can follow it.
# There's a lot going on, so I want to point out a few things to pay attention to:
#
# 1. All of the mathematical operations (for example `exp` and `sqrt`) are being performed using Theano instead of NumPy.
# 2. All of the parameters have initial guesses provided. This is an example where this makes a big difference because some of the parameters (like period) are very tightly constrained.
# 3. Some of the lines are wrapped in `Deterministic` distributions. This can be useful because it allows us to track values as the chain progresses even if they're not parameters. For example, after sampling, we will have a sample for `bkg` (the background RV trend) for each step in the chain. This can be especially useful for making plots of the results.
# 4. Similarly, at the end of the model definition, we compute the RV curve for a single orbit on a fine grid. This can be very useful for diagnosing fits gone wrong.
# 5. For parameters that specify angles (like $\omega$, called `w` in the model below), it can be inefficient to sample the angle directly because its value wraps around at $2\pi$. Instead, it can be better to sample the unit vector specified by the angle or, when combined with the eccentricity, a point in the unit disk. In practice, this can be achieved by sampling a 2-vector from an isotropic Gaussian and normalizing the components by the norm. These parameterizations are implemented in the [pymc3-ext](https://github.com/exoplanet-dev/pymc3-ext) package.
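#
# To see how that last point works in practice, here is a small NumPy sketch of the round trip for the eccentricity/angle parameterization used in the model below (the values of $e$ and $\omega$ here are made up for illustration):

```python
import numpy as np

# Made-up "true" eccentricity and argument of periastron:
e_true, w_true = 0.3, 1.2

# Map to a point inside the unit disk...
hk = np.array(
    [np.sqrt(e_true) * np.cos(w_true), np.sqrt(e_true) * np.sin(w_true)]
)

# ...and map back: no discontinuity as the angle wraps past 2*pi
e = hk[0] ** 2 + hk[1] ** 2
w = np.arctan2(hk[1], hk[0])
print(np.allclose([e, w], [e_true, w_true]))  # True
```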
# +
import pymc3_ext as pmx
import aesara_theano_fallback.tensor as tt
import exoplanet as xo

with pm.Model() as model:

    # Parameters
    logK = pm.Uniform(
        "logK",
        lower=0,
        upper=np.log(200),
        testval=np.log(0.5 * (np.max(rv) - np.min(rv))),
    )
    logP = pm.Uniform(
        "logP", lower=0, upper=np.log(10), testval=np.log(lit_period)
    )
    phi = pm.Uniform("phi", lower=0, upper=2 * np.pi, testval=0.1)

    # Parameterize the eccentricity using:
    #   hk[0] = sqrt(e) * cos(w)
    #   hk[1] = sqrt(e) * sin(w)
    hk = pmx.UnitDisk("hk", testval=np.array([0.01, 0.01]))
    e = pm.Deterministic("e", hk[0] ** 2 + hk[1] ** 2)
    w = pm.Deterministic("w", tt.arctan2(hk[1], hk[0]))
    rv0 = pm.Normal("rv0", mu=0.0, sd=10.0, testval=0.0)
    rvtrend = pm.Normal("rvtrend", mu=0.0, sd=10.0, testval=0.0)

    # Deterministic transformations
    n = 2 * np.pi * tt.exp(-logP)
    P = pm.Deterministic("P", tt.exp(logP))
    K = pm.Deterministic("K", tt.exp(logK))
    cosw = tt.cos(w)
    sinw = tt.sin(w)
    t0 = (phi + w) / n

    # The RV model
    bkg = pm.Deterministic("bkg", rv0 + rvtrend * t / 365.25)
    M = n * t - (phi + w)

    # This is the line that uses the custom Kepler solver
    f = xo.orbits.get_true_anomaly(M, e + tt.zeros_like(M))
    rvmodel = pm.Deterministic(
        "rvmodel", bkg + K * (cosw * (tt.cos(f) + e) - sinw * tt.sin(f))
    )

    # Condition on the observations
    pm.Normal("obs", mu=rvmodel, sd=rv_err, observed=rv)

    # Compute the phased RV signal
    phase = np.linspace(0, 1, 500)
    M_pred = 2 * np.pi * phase - (phi + w)
    f_pred = xo.orbits.get_true_anomaly(M_pred, e + tt.zeros_like(M_pred))
    rvphase = pm.Deterministic(
        "rvphase", K * (cosw * (tt.cos(f_pred) + e) - sinw * tt.sin(f_pred))
    )
# -
# In this case, I've found that it is useful to first optimize the parameters to find the "maximum a posteriori" (MAP) parameters and then start the sampler from there.
# This is useful here because MCMC is not designed to *find* the maximum of the posterior; it's just meant to sample the shape of the posterior.
# The performance of all MCMC methods can be really bad when the initialization isn't good (especially when some parameters are very well constrained).
# To find the maximum a posteriori parameters using PyMC3, you can use the `optimize` function from [pymc3-ext](https://github.com/exoplanet-dev/pymc3-ext):
with model:
    map_params = pmx.optimize()
# Let's make a plot to check that this initialization looks reasonable.
# In the top plot, we're looking at the RV observations as a function of time with the initial guess for the long-term trend overplotted in blue.
# In the lower panel, we plot the "folded" curve, where we have wrapped the observations onto the best-fit period, with the prediction for a single orbit overplotted in orange.
# If this doesn't look good, try adjusting the initial guesses for the parameters and see if you can get a better fit.
#
# **Exercise:** Try changing the initial guesses for the parameters (as specified by the `testval` argument) and see how sensitive the results are to these values. Are there some parameters that are less important? Why is this?
# +
fig, axes = plt.subplots(2, 1, figsize=(8, 8))
period = map_params["P"]
ax = axes[0]
ax.errorbar(t, rv, yerr=rv_err, fmt=".k")
ax.plot(t, map_params["bkg"], color="C0", lw=1)
ax.set_ylim(-110, 110)
ax.set_ylabel("radial velocity [m/s]")
ax.set_xlabel("time [days]")
ax = axes[1]
ax.errorbar(t % period, rv - map_params["bkg"], yerr=rv_err, fmt=".k")
ax.plot(phase * period, map_params["rvphase"], color="C1", lw=1)
ax.set_ylim(-110, 110)
ax.set_ylabel("radial velocity [m/s]")
ax.set_xlabel("phase [days]")
plt.tight_layout()
# -
# Now let's sample the posterior starting from our MAP estimate (here we're using the sampling routine from [pymc3-ext](https://github.com/exoplanet-dev/pymc3-ext) which wraps the PyMC3 function with some better defaults).
with model:
    trace = pmx.sample(
        draws=2000,
        tune=1000,
        start=map_params,
        chains=2,
        cores=2,
        target_accept=0.95,
    )
# As above, it's always a good idea to take a look at the summary statistics for the chain.
# If everything went as planned, there should be more than 1000 effective samples per chain and the Rhat values should be close to 1.
# (Not too bad for less than 30 seconds of run time!)
# +
import arviz as az

with model:
    summary = az.summary(
        trace,
        var_names=["logK", "logP", "phi", "e", "w", "rv0", "rvtrend"],
    )
summary
# -
# Similarly, we can make the corner plot again for this model.
_ = corner.corner(trace, var_names=["K", "P", "e", "w"])
# Finally, the last plot that we'll make here is of the posterior predictive density.
# In this case, this means that we want to look at the distribution of predicted models that are consistent with the data.
# As above, the top plot shows the raw observations as black error bars and the RV trend model is overplotted in blue.
# But, this time, the blue line is actually composed of 25 lines that are samples from the posterior over trends that are consistent with the data.
# In the bottom panel, the orange lines indicate the same 25 posterior samples for the RV curve of one orbit.
# +
fig, axes = plt.subplots(2, 1, figsize=(8, 8))
period = map_params["P"]
ax = axes[0]
ax.errorbar(t, rv, yerr=rv_err, fmt=".k")
ax.set_ylabel("radial velocity [m/s]")
ax.set_xlabel("time [days]")
ax = axes[1]
ax.errorbar(t % period, rv - map_params["bkg"], yerr=rv_err, fmt=".k")
ax.set_ylabel("radial velocity [m/s]")
ax.set_xlabel("phase [days]")
for i in np.random.randint(len(trace) * trace.nchains, size=25):
    axes[0].plot(t, trace["bkg"][i], color="C0", lw=1, alpha=0.3)
    axes[1].plot(
        phase * period, trace["rvphase"][i], color="C1", lw=1, alpha=0.3
    )
axes[0].set_ylim(-110, 110)
axes[1].set_ylim(-110, 110)
plt.tight_layout()