Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MCMCSamples methods for burn-in removal and Gelman--Rubin statistic #258

Merged
merged 23 commits into from
Feb 9, 2023

Conversation

lukashergt
Copy link
Collaborator

@lukashergt lukashergt commented Feb 8, 2023

This PR introduces new MCMCSamples methods that are particular to MCMC chains:

  • remove_burn_in: discards the first few samples
  • Gelman_Rubin: computes the Gelman--Rubin Rminus1 convergence statistic

Fixes #219

Checklist:

  • I have performed a self-review of my own code
  • My code is PEP8 compliant (flake8 anesthetic tests)
  • My code contains compliant docstrings (pydocstyle --convention=numpy anesthetic)
  • New and existing unit tests pass locally with my changes (python -m pytest)
  • I have added tests that prove my fix is effective or that my feature works

For the upcoming additions to the `MCMCSamples` class of a
`remove_burn_in` and a `Gelman_Rubin` method we need good example data
that actually visibly require the removal of a burn in. The new example
data was generated with an artificially low proposal scale that ensures
a slow and hence visible burn-in process.
@lukashergt lukashergt added the enhancement New feature or request label Feb 8, 2023
@lukashergt lukashergt self-assigned this Feb 8, 2023
@codecov
Copy link

codecov bot commented Feb 8, 2023

Codecov Report

Merging #258 (7ef7fba) into master (0aeaded) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master      #258   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           29        28    -1     
  Lines         2422      2442   +20     
=========================================
+ Hits          2422      2442   +20     
Impacted Files Coverage Δ
anesthetic/read/chain.py 100.00% <100.00%> (ø)
anesthetic/read/cobaya.py 100.00% <100.00%> (ø)
anesthetic/read/getdist.py 100.00% <100.00%> (ø)
anesthetic/samples.py 100.00% <100.00%> (ø)
anesthetic/weighted_pandas.py 100.00% <100.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@lukashergt
Copy link
Collaborator Author

With the new remove_burn_in method, we could now ask ourselves whether we would like to make this the only way of getting rid of burn-in samples, i.e. remove the burn_in kwargs in the readers and remove the read/utils.py file. This would be a major change, so should be part of the 2.0.0 milestone.

I'd be happy to handle it that way. That makes it quite clear that always the full data is read in. And if you would like to get rid of burn-in samples, then that should be an active part of post-processing. It can even be done within the same line as the chain reading: mcmc = read_chain("root").remove_burn_in(100).

Thoughts, @williamjameshandley?

@lukashergt
Copy link
Collaborator Author

I'm running into new CI issues (unrelated to the changes of this PR...), making me think that it might be better to turn off parallel tests again...

anesthetic/samples.py Outdated Show resolved Hide resolved
@williamjameshandley
Copy link
Collaborator

I'm running into new CI issues (unrelated to the changes of this PR...), making me think that it might be better to turn off parallel tests again...

In the interests of sanity I have turned this off e9b0749

Copy link
Collaborator

@williamjameshandley williamjameshandley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work! I presume this is functionality you've tested in the wild a little?

With the new remove_burn_in method, we could now ask ourselves whether we would like to make this the only way of getting rid of burn-in samples, i.e. remove the burn_in kwargs in the readers and remove the read/utils.py file. This would be a major change, so should be part of the 2.0.0 milestone.
I'd be happy to handle it that way. That makes it quite clear that always the full data is read in. And if you would like to get rid of burn-in samples, then that should be an active part of post-processing. It can even be done within the same line as the chain reading: mcmc = read_chain("root").remove_burn_in(100).

Providing this is clear in the examples then this is indeed conceptually neater. We would need to put in a temporary error which informs users in the initial release in how to update their code.

You should go ahead and remove the corresponding code for burn in at read-time

anesthetic/samples.py Outdated Show resolved Hide resolved
anesthetic/samples.py Outdated Show resolved Hide resolved
anesthetic/samples.py Show resolved Hide resolved
tests/test_samples.py Outdated Show resolved Hide resolved
* Allow negative `burn_in` inputs, which specify the last samples to
  keep, as opposed to positive `burn_in` inputs, which specify the first
  samples to remove.
* Fix list/array input to `burn_in`, to work for both `0<abs(burn_in)<1`
  and `1<abs(burn_in)`, matching the behaviour with scalar input.
@lukashergt
Copy link
Collaborator Author

Excellent work! I presume this is functionality you've tested in the wild a little?

Yes, I played a bit with this for the BeyondPlanck stuff.

Providing this is clear in the examples then this is indeed conceptually neater. We would need to put in a temporary error which informs users in the initial release in how to update their code.

You should go ahead and remove the corresponding code for burn in at read-time

Done in 0c65b44.

Copy link
Collaborator

@williamjameshandley williamjameshandley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good -- please squash and merge!

anesthetic/read/chain.py Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

remove_burn_in function for MCMCSamples
2 participants