Stochastic Gradient Hamiltonian Monte Carlo #1958

Closed
shkr opened this Issue Mar 27, 2017 · 15 comments

@shkr (Contributor) commented Mar 27, 2017

I was wondering if this is part of the roadmap, or if anyone is working on an implementation of this stochastic gradient Hamiltonian Monte Carlo sampler?

If not, I can work on the implementation and submit a PR.

https://arxiv.org/pdf/1402.4102.pdf
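(For context, the core of that paper is an SGD-with-momentum update plus a friction term that compensates for the minibatch gradient noise. Below is a minimal NumPy sketch of Eq. (15) from Chen, Fox & Guestrin (2014); the function and parameter names here are illustrative assumptions, not a proposed pymc3 API.)

```python
import numpy as np

def sghmc_step(theta, v, grad_neg_log_post, eta=1e-4, alpha=0.01, beta_hat=0.0):
    """One SGHMC update in the SGD-with-momentum parametrization
    (Chen, Fox & Guestrin 2014, Eq. 15).

    grad_neg_log_post(theta) is a minibatch estimate of the gradient of
    -log p(theta | data), rescaled to the full data set. eta is the
    learning rate, alpha the friction / momentum decay, and beta_hat an
    estimate of the gradient-noise contribution (0 if unknown).
    """
    noise = np.random.normal(0.0, np.sqrt(2.0 * (alpha - beta_hat) * eta),
                             size=theta.shape)
    v = (1.0 - alpha) * v - eta * grad_neg_log_post(theta) + noise
    theta = theta + v
    return theta, v
```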

@twiecki (Member) commented Mar 27, 2017

@shkr Yes, that one has been of interest. @jsalvatier had some thoughts too, he might chime in. This paper was also relevant in that regard: http://aad.informatik.uni-freiburg.de/papers/16-NIPS-BOHamiANN.pdf

@shkr (Contributor) commented Mar 27, 2017

Thanks, that's very helpful! I skimmed through that paper; estimating the hyperparameters introduced in the original paper will make the sampler robust and user-friendly.

@jsalvatier (Member) commented Mar 27, 2017

It's important to note that stochastic gradient HMC unfortunately doesn't preserve detailed balance.

It may still be quite useful, because it would be a scalable way of getting close to the region of high probability.

I suspect that stochastic gradient Langevin dynamics does preserve detailed balance, though I haven't checked.

Langevin dynamics has only a minor scaling penalty relative to HMC (O(n^1.33) vs. O(n^1.25)).
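(For concreteness, the SGLD transition mentioned here is essentially a noisy SGD step. Below is a minimal NumPy sketch after Welling & Teh (2011); the gradient callables are illustrative stand-ins for a model's actual gradients, not pymc3 API.)

```python
import numpy as np

def sgld_step(theta, minibatch, grad_log_prior, grad_log_lik, eps, N):
    """One SGLD update (Welling & Teh 2011): half a step size times the
    minibatch-estimated posterior gradient, plus N(0, eps) noise."""
    n = len(minibatch)
    # Rescale the minibatch likelihood gradient to the full data set of size N.
    grad = grad_log_prior(theta) + (N / float(n)) * sum(
        grad_log_lik(theta, x) for x in minibatch)
    noise = np.random.normal(0.0, np.sqrt(eps), size=theta.shape)
    return theta + 0.5 * eps * grad + noise
```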

@twiecki (Member) commented Mar 27, 2017

@jsalvatier Good points.

@shkr Perhaps it's better to start with SGLD or SG Fisher scoring (Max Welling).

@shkr (Contributor) commented Mar 28, 2017

This seems to be the latest (and most comprehensive) paper on SGLD: http://people.ee.duke.edu/~lcarin/782.pdf. I am now referring to it for the implementation instead of the earlier paper, given the evidence it presents.

@asifzubair commented Mar 29, 2017

Actually, this paper - https://arxiv.org/abs/1506.04696 - nicely summarizes all stochastic-gradient-based approaches.

@shkr (Contributor) commented Mar 31, 2017

I have read through the two papers. IMO the implementations of SGLD and SGFS have enough in common that they can share a base class. I have submitted a WIP PR (it is not at all close to completion).

Any pointers on how (and where) to handle the batching of data from observed variables?

Currently, I am looking at:

https://github.com/pymc-devs/pymc3/blob/250e2f81a19c38a88b38be5cfef7a6c212890b1a/pymc3/tests/test_advi.py

and

https://github.com/pymc-devs/pymc3/blob/master/pymc3/variational/advi_minibatch.py

for reference

@ferrine (Member) commented Apr 4, 2017

Minibatches can be handled via a callback or the experimental pm.generator; a rough sketch follows below. You can use https://github.com/ferrine/pymc3/blob/master/pymc3/tests/test_variational_inference.py#L166 as a reference.
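(To make that concrete, here is a rough sketch of the pm.generator route. pm.generator was experimental at the time, so treat the exact calls, in particular the total_size keyword, as assumptions and defer to the linked test for authoritative usage.)

```python
import numpy as np
import pymc3 as pm

data = np.random.randn(10000)  # toy full data set

def create_minibatches(batch_size=100):
    # Endless stream of random minibatches; pm.generator pulls one
    # each time the graph is evaluated.
    while True:
        idx = np.random.randint(0, len(data), batch_size)
        yield data[idx]

minibatch = pm.generator(create_minibatches())

with pm.Model():
    mu = pm.Normal('mu', mu=0, sd=10)
    # total_size rescales the minibatch log-likelihood to the full data set.
    pm.Normal('obs', mu=mu, sd=1, observed=minibatch, total_size=len(data))
```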

@twiecki (Member) commented Apr 4, 2017

@asifzubair commented May 10, 2017

Hi folks,

Some great discussion here. For completeness, I just wanted to add Betancourt's paper, http://proceedings.mlr.press/v37/betancourt15.pdf, which raises some concerns about stochastic gradient methods. I thought it would be good to be mindful of it.

Thanks!

@shkr (Contributor) commented May 11, 2017

@asifzubair Thanks for the reference. I am running into a scaling issue with the number of parameters in the current implementation of SGFS in PR #1977, using @twiecki's CNN example problem built with Lasagne.

Do you have recommendations for other models with a smaller set of parameters to test it on?
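(One obvious low-dimensional candidate, offered purely as an illustration rather than something from the thread, is Bayesian logistic regression on synthetic data: it keeps the parameter count at four while still providing enough data for minibatching.)

```python
import numpy as np
import pymc3 as pm

# Tiny synthetic logistic-regression problem: 3 weights + 1 intercept.
rng = np.random.RandomState(42)
X = rng.randn(50000, 3)
true_w = np.array([1.5, -2.0, 0.5])
p_true = 1.0 / (1.0 + np.exp(-(X.dot(true_w) + 0.3)))
y = rng.binomial(1, p_true)

with pm.Model() as model:
    w = pm.Normal('w', mu=0, sd=5, shape=3)
    b = pm.Normal('b', mu=0, sd=5)
    p = pm.math.sigmoid(pm.math.dot(X, w) + b)
    pm.Bernoulli('obs', p=p, observed=y)
    # A stochastic gradient step method (e.g. the SGFS from PR #1977)
    # would be swapped in here in place of the default sampler.
```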

@shkr (Contributor) commented May 27, 2017

@twiecki I will try those this weekend.

@philipperemy commented Aug 14, 2017

@shkr any updates?

@junpenglao (Member) commented Aug 14, 2017

@philipperemy SGMCMC has already been implemented by @shkr in pymc3 ;-)
You can have a look at an example here.
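(The target of that link has not survived here. As a hedged sketch only: usage along the following lines should apply, assuming the SGFS step method from PR #1977 is exposed as pm.SGFS and takes the minibatch arguments of its base class; consult the pymc3 docs for the actual signature.)

```python
import pymc3 as pm

# Hypothetical invocation -- the argument names (batch_size, step_size,
# total_size) mirror the PR's base class and may differ in released pymc3.
with model:  # e.g. the logistic-regression model sketched earlier
    step = pm.SGFS(batch_size=500, step_size=1.0, total_size=50000)
    trace = pm.sample(5000, step=step)
```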
