Factor out repeated code in inference algorithms #64
There is a lot of overlap in pf, pmcmc, smc; and in mh and smc. One way to combat this would be to just have a single smc algorithm, and implement all of the algorithms mentioned above as parameterizations of this uber algorithm. This has some advantages, but doesn't feel entirely right, based on what smc currently looks like; it would be better if we had smaller compositional pieces.
Also, instrument inference algorithms with hooks, such that we can pass in functions to collect and compute diagnostics without the diagnostics code being part of the core algorithms.
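As a hypothetical illustration of what such hooks might look like, using the option style proposed later in this thread (the `onIteration` name and its signature are invented for illustration; nothing like this exists yet):

```js
var diagnostics = [];

// Hypothetical hook: diagnostic code is passed in from outside,
// rather than being baked into the inference implementation itself.
Infer(model, {
  method: MCMC,
  onIteration: function(trace, i) {
    // e.g. collect scores to compute convergence diagnostics later
    diagnostics.push({ iteration: i, score: trace.score });
  }
});
```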
I've made an initial attempt to refactor mh and smc. The basic idea is to structure things around functions which operate on traces. For example, there's a new [...]. One thing I'm keeping in mind is Noah's suggested interface for SMC. This gets us part of the way there, although it's not clear to me how much work it would take to make it possible to plug in other kernels. This is a very rough sketch (lots of rough edges, I'm not certain how well it extends to the other algorithms, I don't know if it incurs a performance hit, etc.), but do you think it's heading in a useful direction?
My initial attempt at refactoring some of the inference algorithms (more detail in the previous comment) factors out the (rejection) strategy used to initialize the Markov chain. At present this increases the amount of code (because the initialization strategy is a new coroutine), but it seems possible that going this way will yield further opportunities for simplification, leaving us with less code once we're finished. The idea of "make traces more central" that was mentioned in the planning meeting sounds superficially similar to this approach. I'll investigate further and report back once I've understood incrementalization.
I've spent a little more time on this - here's what MH and SMC currently look like. This approach has already removed quite a lot of duplication, and if anything SMC and MH are now more similar than they were before, so I'm confident they can be simplified further. I'll make a little more progress and perhaps have an initial stab at (the closely related) #86 before eliciting initial feedback from anyone interested.
This is now at a point where it would be good to get some feedback. In particular, I'd like to know whether you think this is the direction we should be heading. Here's the latest diff. To save you reading back through the thread, I'll give a brief overview again here: the core idea is to organize things around procedures which operate on traces. I've applied this to simplified versions of MH and SMC so far. Doing this helps in two main ways: [...]
I think the resulting code is quite a bit easier to understand than what we have now. (Though I'm not the best person to judge, having just written the new stuff.) I think the biggest improvement is to the PF rejuvenation code, so that might be something to look at and compare. I've also begun to implement the [...]:

```js
// num iterations etc. dropped for clarity
Infer(model, { method: MCMC, init: Rejection, kernel: MHKernel });

// Initialize a Markov chain using a particle filter, then do MH.
Infer(model, { method: MCMC, init: PFInit, kernel: MHKernel });

Infer(model, { method: PF, rejuvKernel: MHKernel });
```

This was straight-forward to implement once things were organized around traces. What about all the other inference algorithms?
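For a sense of how those options might fit together, here's a rough sketch of `MCMC` composed from an `init` strategy and a `kernel`, both plain functions over traces. Everything here is hypothetical: real code would be written in continuation-passing style and handle options properly.

```js
// Sketch only: init and kernel are both functions from traces to traces,
// so MCMC itself no longer cares how the chain was initialized.
function MCMC(model, options) {
  var trace = options.init(model);    // e.g. Rejection or PFInit
  var samples = [];
  for (var i = 0; i < options.iterations; i++) {
    trace = options.kernel(trace);    // e.g. MHKernel
    samples.push(trace);
  }
  return samples;
}
```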
One outstanding question is whether organising things this way has a significant impact on performance. I don't expect it will, but I need to check. Hope that makes sense, happy to answer any questions if not.
This direction looks good to me. I think factoring out the trace data structure and making the core inference algorithms operate on traces is the way to go. Some comments, at various levels of abstraction: [...]
The [...]. For particles, you can simply extend the [...].
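For illustration only (these field names are invented, not the actual implementation's), a particle might then be a trace plus the extra state SMC needs:

```js
// Hypothetical: a particle wraps a trace with an importance weight and
// the continuation at which execution is currently paused (e.g. at a factor).
function Particle(trace, weight, continuation) {
  this.trace = trace;
  this.weight = weight;
  this.continuation = continuation;
}
```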
These are missing from my simplified/refactored MH/SMC implementation: [...]

I also need to: [...]

Enhancements: [...]

WIP plan of what happens to each inference algorithm once we're done: [...]
We might consider tackling HMC and IncrementalMH later as separate pieces of work. If we do that, I think it would be a good idea to at least convince ourselves that this approach (organizing things around traces) is going to work.
Thanks for this - I think I agree with everything you said! I've had a couple of new thoughts about naming. [...]
Thanks for making the table - very helpful!
Agreed.
If we take the planned SMC implementation and ignore the rejuvenation bits, will it be essentially identical in complexity to what a PF implementation would be, or are there a bunch of extra bits? If there is extra complexity (outside of the rejuvenation loop), it would be better to maintain a separate ParticleFilter implementation. If not (which I expect to be the case), then let's retire ParticleFilter.
At some point, we should rename it, but it could use a bit more testing (in real usage) before then.
Sounds fine to me, as long as we are confident that we'll be able to integrate those later.
Yes, would be good to have everything available through the new interface.
That would be great to have (though could also be a separate piece of work, if you wanted).
I like having [...]
My understanding of the naming conventions (which are indeed confusing) is that neither SIS nor SIR usually has rejuvenation steps, and that SMC is used to refer to SIR with rejuvenation steps. I think of SIS, SIR, and SMC as having the same type, so I'm not sure we should rename [what is plugged in in place of]
The sampler we use to initialize MH is a weird sampler, though, no? Unlike a traditional rejection sampler, we'll accept as soon as we find any trace with non-zero probability (and we wouldn't want it to be otherwise, or init would take forever for realistic problems). I think it's fine to name it [...]
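A minimal sketch of that initializer, assuming a hypothetical `runModel` helper that samples a complete trace from the prior and records its score:

```js
// "Rejection"-style initialization: accept the first trace with
// non-zero probability, i.e. score > -Infinity.
function initialize(runModel) {
  while (true) {
    var trace = runModel();
    if (trace.score > -Infinity) {
      return trace;
    }
  }
}
```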
Interesting idea. It would indeed be useful to have a [...]
I've been trying to familiarize myself with IncrementalMH to make sure we can. This has raised a couple of questions, the answers to which might help me see how I need to structure things: [...]
(cc @dritchie)
First: yes, the ERP master list is just a listing of all the ERP nodes that are in the cache. This is maintained separately so that choosing an ERP to propose to at random can be done quickly, without having to traverse the cache. As for interop with other algorithms, this has been on my mind a bit lately. It would be ideal to have an [...]. I've been thinking that the best way to do this might be to define a minimal interface that traces need to expose: operations like "get/set the value of an ERP given an address," "re-execute from this address," or "reject proposal and reset trace." If algorithms that use MCMC kernels only interact with traces via these methods, then things should work out. I'm hopeful that such a common interface does exist, but obviously I haven't done the legwork to flesh it out. If you do start designing such an interface, and you're wondering if Incremental MH can support such-and-such an operation, feel free to ask.
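For concreteness, here's one shape such an interface might take. This is purely illustrative - the method names simply restate the operations quoted above and are not any actual API:

```js
// A trace implementation (lightweight or cached/incremental) would
// supply these operations; MCMC kernels would use nothing else.
function Trace() {
  this.choices = {}; // address -> { erp, params, value, score }
  this.score = 0;
}
// Get/set the value of an ERP given an address.
Trace.prototype.getChoice = function(address) {
  return this.choices[address];
};
Trace.prototype.setChoice = function(address, value) {
  this.choices[address].value = value;
};
// Re-execute the program from a given address (stub: a real
// implementation would resume the stored continuation).
Trace.prototype.continueFrom = function(address) {
  throw new Error('sketch only');
};
// Reject a proposal and reset the trace to its previous state (stub).
Trace.prototype.reset = function() {
  throw new Error('sketch only');
};
```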
@dritchie It appears that the HMC code does include something along the lines you specify; in particular, the code in [...]. The [...]
@dritchie That's helpful, thanks! I want to make sure everyone understands what I'm up to, so here's a quick summary to save you reading back through the issue history.
A large part of this (i.e. issue #64) is exactly that. It already includes interop between SMC and MH via an interface on traces similar to the one you described, and similar to that which @iffsid has in HMC. The rough plan is: [...]
I'm thinking about IncrementalMH now because I don't want to head down this path and find out later that it's a dead end.
I'm currently imagining that this is slightly more general than that. One consequence of my approach is that operations like: [...]

...would belong to particular algorithms and not the trace itself. (@iffsid described something similar for HMC.) My current feeling is that this will indeed work out and I should press ahead, but if there's something I'm overlooking, please let me know! I'm happy to explain things in more detail to anyone who is interested.
I did some crude benchmarking of the refactored algorithms. In summary: [...]

Here are the details. This is runtime as measured with [...]. The refactored code contains lots of additional assertions, which I removed before doing this.

(*) [...]
Looks great!
One of the reasons for keeping it around was that it was easier to understand than the version with rejuvenation, so it was potentially useful as an educational tool for getting people started on the codebase (the next step up from the particle filter described in dippl) and as a basis for inference experiments. The refactored code is hopefully readable enough that we don't need to keep around an extra particle filter implementation, but it's worth keeping that motivation in mind.
Is the refactored version (doing the equivalent of) not deep-copying the trace? If so, we should make sure that that's correct in general. There may have been a reason for using deep copies.
Gotcha, thanks. You'll have to see what you think of the refactored version. My feeling is that [...]
Yes. I think that's correct, but I'll double check and then maybe give you an argument why.
Is [...]?
Yeah, good idea.
Just for testing, for now - but my intuition is that our SMC implementation should work in the extreme case of only one particle.
In the case where advancing the single particle to the next observation leads to a score of -Inf, I wonder if it would be preferable to keep re-running that step of SMC until we find an extension to the particle/trace with non-zero probability before doing rejuv. Otherwise, the first change rejuv proposes will be accepted, and it seems quite likely it will wipe out all the progress made during this careful initialization.
Progress update: There are only a few things left to do on my todo list. I'm hoping to open a pull request for this issue at the beginning of next week.
I don't think that's possible - we don't know that there is any local extension with non-zero probability. (Consider the case where no random choices have happened since the last factor.)
A big reason for this is that if both old and new score are -Inf, [...]
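For reference, the textbook accept step (proposal corrections omitted) behaves as follows at `-Infinity`. This is a sketch to illustrate the arithmetic, not the actual kernel code:

```js
function accept(oldScore, newScore) {
  // oldScore === -Infinity, newScore finite:
  //   exp(newScore - oldScore) === Infinity, so p is 1: always accept.
  // oldScore === newScore === -Infinity:
  //   newScore - oldScore is NaN, so the comparison below is false: never accept.
  var p = Math.min(1, Math.exp(newScore - oldScore));
  return Math.random() < p;
}
```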
Awesome!
I've not been able to think of a reason why a deep copy is necessary. The reason I think a shallow copy is sufficient is that I don't think we mutate any of the objects representing the choices made at sample statements (e.g. by reaching back into the trace history), nor do we mutate anything that these choice objects reference. (With the exception of the store, of course, but we take care to clone that where necessary.) For reference, the trace in the current version of [...]
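A rough sketch of the kind of shallow copy being argued for, with hypothetical field names (and assuming something like lodash's `_.clone` for the store):

```js
// Shallow copy: a new array, but the choice records themselves are
// shared, since they are never mutated. The store is the exception
// and is cloned separately.
Trace.prototype.copy = function() {
  var t = Object.create(Trace.prototype);
  t.choices = this.choices.slice();
  t.score = this.score;
  t.store = _.clone(this.store);
  return t;
};
```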
A deep copy happens during resampling and before MH generates a new proposal.
I had a look at the commit history to see if I could find any clues but didn't have much luck. It looks like the initial commit will have been broken, as traces will have been shared between particles. The deep copy was added here - ring any bells?
It turns out that this is because the current rejuvenation code doesn't have the optimizations which bail out of the proposal early, e.g. when the probability becomes zero.
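Roughly, the optimization in question (a sketch, not the actual coroutine code):

```js
// Re-score a proposed trace, bailing out as soon as the running score
// hits -Infinity, since such a proposal can never be accepted.
function scoreProposal(choices, scoreFn) {
  var score = 0;
  for (var i = 0; i < choices.length; i++) {
    score += scoreFn(choices[i]);
    if (score === -Infinity) {
      return -Infinity; // early exit
    }
  }
  return score;
}
```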
I've not done this yet and would prefer to tackle it as a separate task. The refactored MH doesn't do anything with [...]
This is not true for HMC.
I don't remember why exactly this was added. I should have added a note in the commit message or code - sorry! I suspect I was concerned about the store not getting copied wherever it needed to be copied, possibly because I ran into a case where this actually happened (though I also wouldn't be surprised if I just added it preemptively).
Fine with me.
Also fine - we can revisit this when we add hooks.
If it's only necessary for HMC, we should only deep-copy when we're actually running HMC, so that we don't incur unnecessary cost otherwise.
This should be fairly straightforward. You'd only need to check if the score for the trace is a [...]. However, one thing to note is that this does necessitate that the trace is entirely self-contained; i.e., the trace is built without other properties of the coroutine/kernel also computing or referring to parts of the tape. I believe that this is handled correctly in the refactor, but I haven't checked that thoroughly, so I'm not 100% certain.
Closed by #169.