
Add MLDA stepper #3926

Merged
merged 89 commits into pymc-devs:master on Sep 15, 2020

Conversation

@gmingas (Contributor) commented May 18, 2020

The MLDA stepper

The MLDA (Multi-Level Delayed Acceptance) [1] method employs a hierarchy of chains to sample from a posterior. The top-level chain samples from the fine model posterior (i.e. the most accurate one / the one we are interested in) and the lower level chains sample from coarse model posteriors (i.e. approximations of declining accuracy).

MLDA uses samples generated in one level as proposals for the level above. A chain runs for a fixed number of iterations and then the last sample is used as the proposal for the higher-level chain. The bottom level is a Metropolis sampler, although an algorithm like pCN or adaptive MH is expected to be used in the future.

The stepper is suitable for situations where a model is expensive to evaluate at high spatial or temporal resolution (e.g. realistic subsurface flow models where a PDE needs to be solved in each iteration). In those cases we can use cheaper, lower resolutions as coarse models. If the approximations are sufficiently good, this leads to good-quality proposals, high acceptance rates and a high ES/sec (effective samples per second) compared to other methods, because only a few expensive fine-model solves are needed to achieve the same ESS.
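
To make the recursion concrete, here is a minimal standalone sketch of the two-level delayed-acceptance logic from [1]. This is not the PyMC3 implementation in this PR; the function names and the toy Gaussian posteriors are invented for illustration.

```python
import numpy as np

def metropolis_step(q, log_post, scale=0.5):
    """Plain random-walk Metropolis step, used at the coarsest level."""
    q_new = q + scale * np.random.randn()
    if np.log(np.random.rand()) < log_post(q_new) - log_post(q):
        return q_new
    return q

def mlda_step(q, level, log_posts, n_sub):
    """One MLDA step on `level` (0 = coarsest), recursing downwards.

    A subchain of n_sub iterations runs on the level below; its final
    state becomes the proposal, accepted with a delayed-acceptance
    ratio that corrects the coarse posterior towards the fine one.
    """
    if level == 0:
        return metropolis_step(q, log_posts[0])

    # Run a subchain on the coarser level; its last state is the proposal.
    q_prop = q
    for _ in range(n_sub):
        q_prop = mlda_step(q_prop, level - 1, log_posts, n_sub)

    # log(alpha) = [fine-level ratio] - [coarse-level ratio]
    log_alpha = (log_posts[level](q_prop) - log_posts[level](q)
                 - log_posts[level - 1](q_prop) + log_posts[level - 1](q))
    if np.log(np.random.rand()) < min(0.0, log_alpha):
        return q_prop
    return q

# Toy hierarchy: a slightly-off "coarse" Gaussian and the "fine" target.
log_posts = [lambda q: -0.5 * (q / 1.2) ** 2,  # coarse approximation
             lambda q: -0.5 * q ** 2]          # fine target
samples, q = [], 0.0
for _ in range(1000):
    q = mlda_step(q, level=1, log_posts=log_posts, n_sub=5)
    samples.append(q)
```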

Changes

  • MLDA is a new ArrayStepShared step method in metropolis.py. Its constructor instantiates step methods for all the levels in the hierarchy.
  • RecursiveDA is a new Proposal subclass in metropolis.py. It recursively calls sample() to run the hierarchy of chains across levels, using the step methods instantiated by MLDA.__init__. Note that logging is switched off here to avoid unwanted console output. Each sample() call generates a fixed number of samples; we keep only the last one and return control to the level above. Whenever control comes back to that level, we rerun sample() from that point.
  • A new internal variable called is_mlda_base is added to Metropolis. It is detected within sample() to avoid running reset_tuning(). This is done because we want the Metropolis step at the bottom level to maintain its tuning information across subsequent sample() calls. Note that resetting was introduced by #3733 (Metropolis chain tuning differs between single- and multiprocessing). MLDA is not affected by the issue there, as it always uses cores=1 and chains=1 in each level of the hierarchy. A more elegant solution might be to detect cores and chains and pass that information to _iter_sample to decide whether reset_tuning runs.
  • Tests for MLDA and RecursiveDA, plus two new models for testing in tests/models.py.
  • A notebook comparing the performance of MLDA with Metropolis for a groundwater flow model (a Bayesian inverse problem setting). The forward model requires a PDE solve, which is done using the C-based FEniCS library (see the notebook for details). It is thus also a demonstration of using black-box external code in the likelihood. The PR adds all the code necessary to solve the model. The example is also provided as a Python script.

Performance

Running the notebook on a MacBook Pro (see specs within), with 3 unknown parameters and model resolutions of (30, 30) and (120, 120), the resulting performance comparison is the following:

[Performance comparison figure from the notebook]

Work in progress

  • I am trying to find ways to accelerate the method. It seems there is a lot of overhead from creating/destroying traces and from Theano parsing the likelihood when sample() is called multiple times. Stopping and restarting is necessary because I need to switch between chains and continue a chain from where it last left off. Are there faster ways to pause and resume a chain than saving the trace/tuning state and calling a new sample() afterwards? Would iter_sample be more efficient in this case (see the sketch after this list)?
  • I am working on adding an adaptive bottom-level sampler and on applying an adaptive likelihood correction technique found in [2]. Is there any ongoing work on adaptive samplers like adaptive MH or pCN within the pymc3 community?
  • I plan to add one more feature from [1] (the variance reduction technique for integral estimates).
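
On the iter_sample question in the first item above, here is a hedged sketch of the pause/resume pattern with pymc3's iter_sample generator (the toy model is invented, and whether this actually avoids the overhead is exactly the open question):

```python
import pymc3 as pm

# Toy model purely to illustrate the pause/resume pattern.
with pm.Model():
    x = pm.Normal("x", mu=0.0, sigma=1.0)
    step = pm.Metropolis()

    # iter_sample yields the growing trace after every draw, so control
    # returns to the caller between draws without tearing the sampler down.
    for trace in pm.iter_sample(100, step):
        pass  # "paused" here: `trace` holds all draws so far

    # resume later by passing the accumulated trace back in
    for trace in pm.iter_sample(100, step, trace=trace):
        pass
```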

Usage

MLDA can be used like any other step method, but it has one positional argument that needs to be provided (coarse_models). See the notebook and tests for examples, and the sketch below.
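
A minimal, hedged usage sketch (the toy models are invented for illustration; only the coarse_models argument comes from this PR, and the list is assumed to be ordered from coarsest to finest):

```python
import numpy as np
import pymc3 as pm

y = np.random.normal(loc=1.0, scale=1.0, size=200)

# A deliberately cruder likelihood stands in for a cheap coarse model.
with pm.Model() as coarse_model:
    mu = pm.Normal("mu", mu=0.0, sigma=5.0)
    pm.Normal("obs", mu=mu, sigma=2.0, observed=y)

with pm.Model() as model:
    mu = pm.Normal("mu", mu=0.0, sigma=5.0)
    pm.Normal("obs", mu=mu, sigma=1.0, observed=y)

    step = pm.MLDA(coarse_models=[coarse_model])
    trace = pm.sample(draws=2000, step=step, chains=1, cores=1)
```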

References

[1] Dodwell, T., Ketelsen, C., Scheichl, R., & Teckentrup, A. (2019). Multilevel Markov Chain Monte Carlo. SIAM Review, 61, 509-545. https://doi.org/10.1137/19M126966X
[2] Cui, T., Fox, C., & O'Sullivan, M. (2012). Adaptive Error Modelling in MCMC Sampling for Large Scale Inverse Problems.


@twiecki (Member) commented May 18, 2020

This looks really cool and massive @gmingas. Thanks for the contribution. I'll wait to see if someone jumps in to check this out in more detail. Some questions:

  1. What type of feedback are you looking for at this stage? If there is serious work that you still plan to do it makes more sense to wait with a review until you consider it final, as reviewing a PR of this size is quite a time investment.
  2. Could this sampler live outside of PyMC3, as an extension module? A related question is how widely applicable this sampler is, especially compared to NUTS. I suppose this doesn't require gradients which would require a comparison of MH (which you did somewhat) as well as to DE-Metropolis.

@junpenglao (Member):

Adding @aseyboldt as reviewer, as he has been working with PDEs.

@gmingas (Contributor, Author) commented May 19, 2020

> 1. What type of feedback are you looking for at this stage? If there is serious work that you still plan to do it makes more sense to wait with a review until you consider it final, as reviewing a PR of this size is quite a time investment.

In its current form the stepper is usable and has benefits for the type of problems described above (also see below). There are a few extra features that I would like to add in the near future. These will be enhancements/add-ons rather than refactoring/rewriting, so it might be worth reviewing it in its current form. I published this as WIP to see initial reactions and prioritise next steps. I can remove the WIP from the title if this makes sense. In terms of feedback, I was thinking about a few things:

  • Comments/preferences on future development and features from any people familiar with multilevel models and MCMC methods or users of them.
  • Ways in which the recursive part of the code can be accelerated by avoiding repeated calls to sample() or making them more efficient.
  • Comments on code style, the example provided, extra tests that might be needed.
  • Discussion on adaptive enhancements to the algorithm, i.e. which adaptive algorithm to use at the bottom level (is there any work in progress on adaptive methods in pymc3?) and how to correct for errors between approximations, e.g. at the moment I am considering the method described in Cui et al. 2012 (link in the initial PR post).
> 2. Could this sampler live outside of PyMC3, as an extension module? A related question is how widely applicable this sampler is, especially compared to NUTS. I suppose this doesn't require gradients which would require a comparison of MH (which you did somewhat) as well as to DE-Metropolis.

In terms of applicability, it does not require gradients. It does require that you provide at least one coarse (approximate) model in addition to the main model. Despite that, I think the sampler is general enough to live inside PyMC3, as many people work with likelihoods for which an approximate/coarse version can be defined: inverse problems for various physical systems where spatial resolution can vary and PDE solves are usually necessary (e.g. subsurface fluid transport, resistivity/conductivity fields in impedance imaging, ultrasound imaging, emission tomography, flow fields in ocean circulation). MLDA can reduce the computational cost in those cases. There are also other problems where an approximate model can be defined (it would be good to hear about possible applications from users). Note also that in many of these problems it may not be easy to get gradients for use with NUTS.

A comparison with DE-Metropolis is indeed needed and I will work on this. Apart from the ES/sec metric, what other metrics would you like to have for benchmarking and scaling exploration?

@volpatto commented May 19, 2020

Sorry for including myself in this discussion (I'm not a PyMC3 dev, but a user): I think this feature is absolutely amazing! This is related to the MLMC work that Mike Giles has been doing, right? That's fantastic; the only library I know of that provides it in a generic way is QUESO. Just for reference, I wrote a post about it on the PyMC3 discourse.

If you are looking for some user feedback, I can try it out in recent work I have been doing in a COVID-19 research group, with the potential to replace our current method (SMC from PyMC3) with your new method. The group has previous experience in using MLMC from QUESO.

And it's nice that you tested it with FEniCS. I use Firedrake in my PhD, which also uses UFL and has almost the same interface. This PR helps me a lot in several ways, thank you for your effort.

@twiecki (Member) commented May 19, 2020

@volpatto That's valuable feedback. Certainly if you'd be willing to help test and review it would be very helpful and much appreciated.

@volpatto commented May 19, 2020

> @volpatto That's valuable feedback. Certainly if you'd be willing to help test and review it would be very helpful and much appreciated.

Thanks, @twiecki! I don't know PyMC3's internals well, but I think I can give some help (with my experience as a Python developer).

Actually, this is interesting because I have been meaning to try implementing MLMC in PyMC3 (a basic version; the version in this PR is more advanced, and I need to study it), and I think it could be really useful.

Comment on lines 1106 to 1119
"""One MLDA step, given current sample q0"""
# Check if the tuning flag has been changed and if yes,
# change the proposal's tuning flag and reset self.accepted
# This is triggered by iter_sample while the highest-level MLDA step
# method is running. It then propagates to all levels.
if self.proposal_dist.tune != self.tune:
self.proposal_dist.tune = self.tune
# set tune in sub-methods of compound stepper explicitly because
# it is not set within sample() (only the CompoundStep's tune flag is)
if isinstance(self.next_step_method, CompoundStep):
for method in self.next_step_method.methods:
method.tune = self.tune
self.accepted = 0
@gmingas (Contributor, Author):

Here, if the bottom-level Metropolis methods are within a CompoundStep, I manually set their tune flags.

CompoundStep has a stop_tuning() function that is called by _iter_sample to set tune=False for all the methods it contains, but the Metropolis class does not have a stop_tuning() function, so the Metropolis flags are not set.

If there is no reason not to, should I add a stop_tuning() to Metropolis?
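
For reference, a toy stand-in (not PyMC3 code) showing the contract such a hook would have to satisfy; _iter_sample would call it once the tuning phase ends, instead of poking the tune attribute from outside:

```python
class ToyStepMethod:
    """Toy stand-in for a step method, showing the stop_tuning() contract."""

    def __init__(self):
        self.tune = True  # proposal scaling adapts while this is True

    def stop_tuning(self):
        # Called once at the end of the tuning phase; from then on the
        # proposal scaling is frozen.
        self.tune = False
```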

@tjdodwell:

@volpatto thanks for your interest in our contribution. Yes, it builds directly on Mike Giles' original work on Multilevel Monte Carlo (MLMC), which addressed just the forward problem. This was then extended to the Bayesian setting, a.k.a. Multilevel Markov Chain Monte Carlo, which can be found here. The multilevel delayed acceptance method is very close to this, and the connection is discussed in the MLMCMC paper. A paper is to follow soon, but ask if you have any questions.

@gmingas (Contributor, Author) commented May 20, 2020

@volpatto thanks for the comments. It would be great if you could try it out on the COVID work you mentioned. I would be interested to see whether you notice any unexpected behaviour or bugs, and what extra features and improvements your application would benefit from. Also, how performance compares to SMC and QUESO. We could potentially add new examples and tests based on your application.

@volpatto commented May 21, 2020

> @volpatto thanks for your interest in our contribution. Yes, it builds directly on Mike Giles' original work on Multilevel Monte Carlo (MLMC), which addressed just the forward problem. This was then extended to the Bayesian setting, a.k.a. Multilevel Markov Chain Monte Carlo, which can be found here. The multilevel delayed acceptance method is very close to this, and the connection is discussed in the MLMCMC paper. A paper is to follow soon, but ask if you have any questions.

I see! I have been reading this paper (btw, nicely done) in order to get a proper understanding of your method. As soon as I can, I'll come back here to help, hopefully this week.

> @volpatto thanks for the comments. It would be great if you could try it out on the COVID work you mentioned. I would be interested to see whether you notice any unexpected behaviour or bugs, and what extra features and improvements your application would benefit from. Also, how performance compares to SMC and QUESO. We could potentially add new examples and tests based on your application.

Amazing! Looking forward to doing some tests with MLDA. I'll prepare a simplified SIRD/SEIRD model calibration and forward UQ with SMC for us, so we can compare it with MLDA. Would it be helpful if I fork your PyMC3 fork and submit a PR (with the ipynb) pointing to this branch, @gmingas?

Cheers!

@gmingas (Contributor, Author) commented May 21, 2020

> Amazing! Looking forward to doing some tests with MLDA. I'll prepare a simplified SIRD/SEIRD model calibration and forward UQ with SMC for us, so we can compare it with MLDA. Would it be helpful if I fork your PyMC3 fork and submit a PR (with the ipynb) pointing to this branch, @gmingas?

@volpatto Sure, this sounds great if you are willing to do it! Let me know if you don't understand something or if I can help in any other way.

@gmingas (Contributor, Author) commented May 21, 2020

> 2. Could this sampler live outside of PyMC3, as an extension module? A related question is how widely applicable this sampler is, especially compared to NUTS. I suppose this doesn't require gradients which would require a comparison of MH (which you did somewhat) as well as to DE-Metropolis.

I have added a comparison with DEMetropolis in the existing notebook here; a summary is shown below:
[Screenshot: comparison summary including DEMetropolis]

I also included a second notebook here with the same comparison done when using blocked sampling for Metropolis and MLDA:
[Screenshot: comparison summary with blocked sampling]

@michaelosthege (Member):

Hi, I'm currently just on the phone, but skimming through the notebook..
Please note that the effective sample size is not reliable if the chains are not converged.

@gmingas (Contributor, Author) commented May 26, 2020

> Hi, I'm currently just on the phone, but skimming through the notebook..
> Please note that the effective sample size is not reliable if the chains are not converged.

@michaelosthege it looks like MLDA and DEMetropolis have r_hat values of 1.00-1.02 and Metropolis up to 1.06. What is a good threshold to assess convergence?

@michaelosthege (Member):

> Hi, I'm currently just on the phone, but skimming through the notebook..
> Please note that the effective sample size is not reliable if the chains are not converged.
>
> @michaelosthege it looks like MLDA and DEMetropolis have r_hat values of 1.00-1.02 and Metropolis up to 1.06. What is a good threshold to assess convergence?

Depends. For the benchmarks, I'd say something like 1.03. The Metropolis trace didn't look that great.
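
For reference, a hedged sketch of how the r_hat values under discussion can be computed with ArviZ (the toy model exists only to produce a multi-chain trace):

```python
import arviz as az
import pymc3 as pm

with pm.Model():
    x = pm.Normal("x", mu=0.0, sigma=1.0)
    trace = pm.sample(1000, step=pm.Metropolis(), chains=4, cores=1)
    idata = az.from_pymc3(trace)

# r_hat close to 1.0 (e.g. below ~1.03, per the discussion) suggests
# the chains have converged and the ESS estimates can be trusted.
print(az.summary(idata)["r_hat"])
```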

@michaelosthege (Member) commented May 26, 2020

DEMetropolis doesn't perform well with few chains. Try DEMetropolisZ instead; it can really outperform DEMetropolis and parallelizes much better.

It does need enough tuning though. Initializing chains at the MAP often helps. See the example, and the sketch below.
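
A hedged sketch of that suggestion (the model is invented and the draw/tune counts are indicative only):

```python
import pymc3 as pm

with pm.Model():
    x = pm.Normal("x", mu=0.0, sigma=1.0)

    start = pm.find_MAP()  # initialize the chains at the MAP estimate
    trace = pm.sample(
        draws=5000,
        tune=5000,                 # DEMetropolisZ needs generous tuning
        step=pm.DEMetropolisZ(),
        start=start,
        chains=4,
        cores=4,
    )
```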

@gmingas (Contributor, Author) commented May 27, 2020

Also, check this topic in the PyMC3 discourse; it is related to the additional feature I am looking to add, which performs adaptive error correction in the coarse likelihoods.

@gmingas (Contributor, Author) commented May 27, 2020

> DEMetropolis doesn't perform well with few chains. Try DEMetropolisZ instead; it can really outperform DEMetropolis and parallelizes much better.
>
> Depends. For the benchmarks, I'd say something like 1.03. The Metropolis trace didn't look that great.

Cool, I will do longer runs and compare with DEMetropolisZ too.

@gmingas (Contributor, Author) commented Jun 19, 2020

Sorry for the big delay. I have now updated this notebook to contain a comparison with DEMCMC-Z and longer runs. All the r-hats are now very close to 1.0 and the traces look decent. Also, I am initialising all samplers at the true parameter values (which should be quite close to the MLE), although I have not seen any big change from doing that.

I am also preparing a second notebook, similar to the example that @michaelosthege linked above, to benchmark more model configurations and to demonstrate MLDA's tuning process.


@gmingas (Contributor, Author) commented Jun 23, 2020

I have also added this notebook which contains some extra benchmarks and a demonstration of how MLDA performance changes with one of the tuning parameters (subsampling_rate).

You are also now allowed to pick a different subsampling rate for each level.
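
A hedged sketch of per-level rates (the toy models are invented, and the subsampling_rates keyword, one entry per coarse level with the coarsest first, is an assumption based on this comment):

```python
import numpy as np
import pymc3 as pm

y = np.random.normal(loc=1.0, scale=1.0, size=200)

def make_model(sigma):
    """Same toy posterior at three fidelities; cruder noise = coarser model."""
    with pm.Model() as m:
        mu = pm.Normal("mu", mu=0.0, sigma=5.0)
        pm.Normal("obs", mu=mu, sigma=sigma, observed=y)
    return m

coarse_0, coarse_1 = make_model(2.0), make_model(1.5)  # coarsest first
with make_model(1.0):
    step = pm.MLDA(
        coarse_models=[coarse_0, coarse_1],
        subsampling_rates=[5, 3],  # a different rate for each coarse level
    )
    trace = pm.sample(draws=2000, step=step, chains=1, cores=1)
```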

@gmingas changed the title from "[WIP] Add MLDA stepper" to "Add MLDA stepper" on Jul 9, 2020
@gmingas (Contributor, Author) commented Jul 9, 2020

@michaelosthege @aseyboldt I have now removed the WIP from the title, as I think this is now ready to be reviewed in its current form.

@junpenglao (Member):

@volpatto did you have a chance to test it out?

@gmingas (Contributor, Author) commented Sep 14, 2020

This is odd. Tests are now passing here but still failing in my fork (with the message I posted earlier).

Also, codecov now fails, but the results/flagged issues do not make sense (e.g. it says that metropolis.py coverage dropped ~44%, yet the PR modifies only a few spaces in Metropolis; also, some of the lines the tool thinks are untested are actually tested, and there was never a warning before).

@twiecki (Member) commented Sep 15, 2020

Yeah, codecov isn't always that reliable. We'll merge this as I believe those issues you're seeing are unrelated to anything in this PR.

Congrats -- this is a massive contribution and a very high quality PR, thank you guys so much for your contribution!

@twiecki twiecki merged commit 07b584a into pymc-devs:master Sep 15, 2020
@gmingas (Contributor, Author) commented Sep 15, 2020

That's great! Thank you @twiecki and everyone else for the feedback and support, this was a very positive experience for all of us and helped us understand and improve our method and code. We will keep working on improving/enhancing MLDA, you will hear from us again very soon. Cheers!
