Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Particle samplers and traces for Emcee #1689

Closed

Conversation

philastrophist
Copy link
Contributor

@philastrophist philastrophist commented Jan 20, 2017

So one of the major gripes I have with pymc3 is that I couldn't use the wonderful emcee from @dfm's emcee sampler very easily with the flexible model specification that pymc3 has. So I've added a generic particle sampler and an emcee sampler along with a trace to be used for particles.

This is somewhat hacky, but it works. There are obvious improvements to be made and a modified emcee sampler could be built in to avoid those pesky reshapes!

However, I have a limited knowledge of the workings of pymc3 despite writing this and so I'm not sure how to incorporate initialisations correctly for the walkers. I want to either use the advi and randomise the result for the walkers, or generate samples from the specified priors (currently doing this).

It'll be great if someone else could look at this!

Thanks

EDIT: Implemented some suggestions and improvements. Still a bit hacky, but I have some ideas

@philastrophist
Copy link
Contributor Author

philastrophist commented Jan 20, 2017

An example:

import pymc3 as pm
import numpy as np
import matplotlib.pyplot as plt
from pymc3.external.emcee_samplers import EmceeEnsemble

x = np.linspace(0, 1, 100)
y = 3 + (x*2)
y += np.random.normal(0, 0.5, size=len(x))
yerr = np.ones_like(y) * 0.1

with pm.Model() as model:
    eps = pm.HalfCauchy('eps', beta=10)

    intercept = pm.Normal('intercept', mu=0, sd=10)
    gradient = pm.Normal('gradient', mu=0, sd=10)
    obs = pm.Normal('obs', mu=y, sd=yerr, shape=len(y))
    line = pm.Deterministic('line', intercept + (x * gradient))
    like = pm.Normal('like', mu=line, sd=eps, observed=obs)
    chain = pm.sample(5000, init='advi', step=EmceeEnsemble())

i = np.percentile(chain['intercept'], [16, 50, 84])
g = np.percentile(chain['gradient'], [16, 50, 84])
e = np.percentile(chain['eps'], [16, 50, 84])

print i
print g
print e

pm.traceplot(chain[::10], varnames=['eps', 'intercept', 'gradient'])

@philastrophist
Copy link
Contributor Author

There was a previous proposal about this: #659

@@ -83,7 +85,7 @@ def assign_step_methods(model, step=None, methods=(NUTS, HamiltonianMC, Metropol

def sample(draws, step=None, init='advi', n_init=200000, start=None,
trace=None, chain=0, njobs=1, tune=None, progressbar=True,
model=None, random_seed=-1):
model=None, random_seed=-1, ):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trailing comma

return bij.rmap(apoint)


class EmceeSamplerStep(ParticleStep):
Copy link
Member

@fonnesbeck fonnesbeck Jan 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scratch the last comment (deleted), it is using emcee directly. In that case, I think this should be in pymc3.external, just as the Edward extension is.

@fonnesbeck
Copy link
Member

Thanks for the PR; I had a quick peek at the code and will look at it in depth later. I'm delighted to see the addition of particle samplers, in general. As I commented above, my own preference would be to put the EmceeSampler class in the external submodule, which is where we keep the Edward extension, and have emcee as a soft dependency.

return bij.rmap(apoint)


class EmceeSamplerStep(ParticleStep):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our convention is not to include Step in the name of user-facing sampler classes, so I suggest renaming to simply Emcee.


f = theano.function([inarray0], logp0)
f.trust_input = True
return f
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new line

@philastrophist
Copy link
Contributor Author

philastrophist commented Jan 20, 2017

I'll get onto those suggestions listed above, but I'm thinking now that there could be a factory function to build a step around an external sampler. This would remove the dependency on emcee and allow easy addition of other samplers by user, no?

@twiecki
Copy link
Member

twiecki commented Jan 21, 2017

@philastrophist Very cool. Did you look at #1569?

@springcoil
Copy link
Contributor

Very cool stuff @philastrophist. A PhD student I mentored at a previous job was very upset that particle samplers weren't in PyMC3 - I'll let him know.

@twiecki twiecki mentioned this pull request Jan 21, 2017
@philastrophist philastrophist changed the title Particle samplers and traces [WIP] Particle samplers and traces Jan 22, 2017
@twiecki
Copy link
Member

twiecki commented Jan 25, 2017

@philastrophist also needs tests and an example NB for the docs.

@philastrophist
Copy link
Contributor Author

Now the particle samplers can use the advi/map etc easily in the call to pm.sample!

Not read #1569 yet but plan to see how it works.

Tests and the NB are underway. For now I'll update the example above.

There is currently somewhat more lines than strictly necessary in sample but I'll prune those in the near future. I currently have it so that you can have multiple multiprocessing jobs along with multiple particles. I don't know why you would want to do that but the functionality is there. I might remove it though to clean up some confusing sections.
I'm almost there with integrating other samplers into a ParticleCompoundStep for those scenarios that require something like discrete choice (emcee doesn't do that).

All in all it's basically done and works. What I'm currently doing is making sure it can do all the things current samplers can whilst incorporating some flexibility for people to add other ParticleSteps.

P.S. This started out as something I did for myself and I'm doing it when I have time. So it may take time for me to finish everything!

@twiecki
Copy link
Member

twiecki commented Jan 27, 2017

@philastrophist Seems like you need to rebase.

@fonnesbeck
Copy link
Member

Great, just remove the WIP from the tag and title when you want a pre-merge review.

@twiecki
Copy link
Member

twiecki commented Feb 14, 2017

@philastrophist Any progress?

@philastrophist
Copy link
Contributor Author

Currently the way this method works is quite restrictive on how the sampler works. So it only works for 1 particle sampler at a time. I think this is fine for now since any further development will need the multindtrace.
I need to rebase but i'll push the latest version very soon.
I've made some progress on generalising particle samplers in a separate branch which I'd like some feedback on once I've finished it.

@philastrophist philastrophist changed the title [WIP] Particle samplers and traces Particle samplers and traces for Emcee Feb 18, 2017
@philastrophist
Copy link
Contributor Author

I have made some subtle pm.sample api changes with regard to initialisation and start points to make it more general (and so it can be applied to particlesteps easily). However, this means it breaks in a small subset of cases (4 tests fail) where start is specified. The error thrown relates to variable type.

Since this is a minor problem with a change in api, I think now is the time to remove WIP (since it works for any of my cases) and start some discussion...

PS. A notebook for an intro into how to use EmceeEnsemble is included in the last commit

return start, step
def transform_start_particles(start, nparticles, model=None):
"""
:param start:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These doc-strings need to be made numpy-style.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's my editor's default. Will change

start = start[0]
else:
raise NotImplemented('Initializer {} is not supported.'.format(init))
def get_random_starters(nwalkers, model=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this only be used with a particle sampler? Seems like the function is more general than that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I'll move it outside the particlestep block


return start, step
def transform_start_particles(start, nparticles, model=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be moved to external/emcee?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this should stay here since it's applicable to specifying jobs and other particle steps not just emcee

@twiecki
Copy link
Member

twiecki commented Feb 22, 2017

I think this looks good overall. I'm a bit concerned of the added code complexity but the particle stuff should be usable by other samples except emcee too.

The changes to the API make sense.

It does need tests however.

@philastrophist
Copy link
Contributor Author

Thanks for the feedback. I've made some comments, but what in particular are you concerned about with added code complexity? The additions to .sample? Or the duplication of NDArray?

With respect to NDArray, I could merge MultiNDArray and NDarray by making particles=1 the default instead of having two separate classes for particles=None and particles!=None.
I guess since I have implemented nparticles in all steps it would make sense to have nparticles=1 as the default and merge the different trace styles.

Tests will be implemented of course, I just wanted to get the api out of the way first.

----------
varname : str
burn : int
thin : int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing particles doc

@philastrophist
Copy link
Contributor Author

There is a test: test_step.TestStepMethods.test_sample_exact(NUTS) that fails specifically for NUTS.
The arrays don't match where x is the master "known sample" and y is the one generated.

x: array([  1.203436e+00,   1.203436e+00,   3.284390e-01,  -5.632208e-01,
         1.374194e+00,   1.729373e+00,   4.398738e-01,   4.398738e-01,
        -4.584570e-01,  -3.028040e-01,   9.709908e-01,   1.390899e+00,...
 y: array([ 1.118324,  1.118324,  0.428752, -1.063373, -1.175014, -0.277752,
       -0.277752,  0.003286,  0.23796 ,  1.655296,  1.790141, -0.623905,
        0.60124 ,  0.60124 ,  0.684407,  0.704353,  0.602374,  0.037173,...

I'm not quite sure what to make of this.

Other than that I think I have, or can easily, fix the errors raised in testing for this branch. Now I'll just put in a MultiNDArray test and we'll be set.

@twiecki
Copy link
Member

twiecki commented Mar 6, 2017

@philastrophist Any progress?

@philastrophist
Copy link
Contributor Author

Still going... Rectifying errors

`transform_start_positions` now accounts for njobs
@philastrophist
Copy link
Contributor Author

Tests just underway

@reac2
Copy link

reac2 commented May 19, 2017

Just wondering - what's the status with the Emcee sampler? Excited to test this out on our data.

@twiecki
Copy link
Member

twiecki commented May 19, 2017

Good question, would need to ask @philastrophist.

@reac2 is your model non-differentiable?

@reac2
Copy link

reac2 commented May 19, 2017

Thanks - will do!
Sadly yes and currently seems to look like it has converged after relatively lengthly chains but away from some of the correct values - was hoping the many walker approach would explore parameter space more efficiently. Will try implementing something similar to your 2013 hack too http://twiecki.github.io/blog/2013/09/23/emcee-pymc/

@twiecki
Copy link
Member

twiecki commented May 19, 2017

Have you tried the new ATMCMC sampler?

@reac2
Copy link

reac2 commented May 19, 2017

Ah, no I was looking for ATMCMC and thought it had been removed. That's great - I will try this one out! Thanks very much.

@reac2
Copy link

reac2 commented May 19, 2017

Is there an example for the ATMCMC code you can point me to? Our model is similar in form to the arbitrary deterministic disasters example so i'm starting there but running into trouble with redefining the likelihood_name parameter in smc.SMC.

@ColCarroll
Copy link
Member

I'd check out the test case in test_smc.py. There's another super basic one in test_step.py. I'd be very interested for more examples of using it, and how to make the API better. Note -- still working on checking it thoroughly for accuracy, too!

@reac2
Copy link

reac2 commented May 19, 2017

Great, found it! Thanks. I'll see if I can get this going with the arbitrary deterministic disasters model.

@junpenglao
Copy link
Member

@reac2 you can also find an example here fitting a mixed-effect model https://github.com/junpenglao/GLMM-in-Python/blob/master/pymc3_different_sampling.py#L86-L113

@philastrophist
Copy link
Contributor Author

Please see #2253 for the up-to-date refactored version. I consider this pull request outdated and will close it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants