Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign up(Setting) Algorithm Parameters #23
Comments
|
Thanks Adam. I had intended on making it easy to use algorithm parameters that aren't adapted online like we discussed, but I think I got a bit carried away thinking about adaptation. My reasoning behind the current set up is that users can still add additional algorithm parameters through the template argument. Not allowing users to change them was pretty silly though, so I'll add some setters. Another thing that doesn't seem ideal in the adaptation is that you need to write things like On an unrelated note, I was thinking the other day about adding more getters to sampler - especially getters related to the history. We have a GetESS(lGeneration) function, so I think it would be useful to do similar things with everything else stored in history. I used code like There are a few other small things like that that I've been thinking about doing, but I'm wary of making changes to the library together with changes to examples in pull requests. P.S. A quick update: I found the normalizing constant for the 5D linear Gaussian example using a Kalman filter and it matches what I'm getting with the BPF. I guess this suggests that my APF normalizing constant is the one that is wrong so now I need to find the error... |
|
Thanks, @LeahPrice, I don't think there's any problem with the way that additional parameters are specified, it's just as you noted inconvenient not to have a simple way to set them Although Adding extra helper functions to simplify access to things which are otherwise buried within objects indeed seems like a good idea. (What we had to begin with was enough to do things, but not necessarily in the simplest possible way.) There is a slight issue in dealing with the fact that the history isn't necessarily stored ( If the Kalman filter agrees with the bootstrap particle filter then, indeed, they're probably both right... the KF should give ground truth, bugs excepted. |
|
Having spent quite a bit of today trying to implement some rather non-standard SMC things using RcppSMC, I've had a few more thoughts on related things. I think there's actually a more fundamental design issue here, actually, and I think that it's something that will need to be addressed at some point. I don't know whether it's feasible to get this done during what remains of GSOC, but I think this is the right place to make a note of it, anyway. The issue as I see it is that there's a bit of a "failure of encapsulation" when we're dealing with algorithm parameters in the library. In an ideal world, allocating the "workspace" with new and cleaning up afterwards with delete should be the only place in which the algorithm parameters get touched without being mediated by the sampler object if you see what I mean. In the current implementation, the fact that the parameters are allocated externally to the sampler is used to allow them to be accessed in other places as global variables (in the exemplar adaptive SMC code, for example, One significant problem with the approach is that there is no way for the function to "know" which parameter object is associated with the sampler that called it unless there is only one sampler object in the code. I think I'd like to see a situation in which the parameters get passed explicitly to any function that might want to use them (an alternative would be providing a reference to the associated sampler from which any needed parameters could be retrieved). This would probably break backward compatability, but I think this is easily an important enough issue to justify doing that and it would make sense to do it earlier rather than later if we're going to end up needing to eventually anyway. [It might make sense to consider how population-level initialisation and move functions might be supported at the same time, of course.] Here's a use case which illustrates why the encapsulation is important. Imagine a programme which contains many instances of a sampler which makes use of the same functions (concrete example: it is the same sampler applied to different data sets). Each instance of the sampler has an associated collection of parameters and the move functions should use the correct instance of the parameter object. Unfortunately, those functions at present have no way of knowing which sampler called them or what the appropriate parameters are. Any thoughts would be very welcome (even if those thoughts are that this is completely incoherent in which case I will try to clarify what I mean). |
|
Thanks Adam. It's really good to get your thoughts on these things because I've got a lot to learn about C++ still and also because you have a different (and obviously more experienced!) perspective on SMC and how people might want to use the library. I agree pretty much with everything you said. Making sure the code that's currently there is designed well sounds like it should be a priority, so I've started working on this and will hopefully have a fork ready by the end of my day/start of yours. |
|
Getting config objects right is really hard. We could try to aim for, say, a single I don't think there are perfect answers. There may just be reasonable attempts. I'd be open for trying different things. |
|
Would this be to get around some examples needing parameters passed to them and others not? It seems that changing the initialize, move and MCMC functions to have the parameters as an argument will mean that I need a parameter type for all examples. I guess I could use an empty class but if there's a way to get around that then that might be nice. I'm not really familiar with some of the terminology you used in that last comment @eddelbuettel . I've found a couple of sites which seem to talk about related topics (e.g. this and perhaps this), but if you think that might be the way to go and if you happen to know of any good sources then that would be great. |
|
I typed sloppily: lists or environments objects. The different is minor, we'd use either as key-value lookups. |
|
It's certainly hard, and as with many things good solutions are thin on the ground. Adequate solutions that can be incorporated without completely refactoring things similarly, but I don't think the situation is so bad here. I don't think we need a hash table of this sort because @LeahPrice has already provided a mechanism for associating a particular parameter object with a particular sampler. The problem arises downstream of that in that the sampler calls user-supplied callback functions which don't know which sampler they're associated with (and which may be called by more than one sampler object). A key wouldn't help here, particularly, as we'd still need to add a mechanism for passing the key to the function -- and a reference to the parameter object could be passed directly if we were adding parameters. That is, the new bits that @LeahPrice has added work sensibly but they add functionality that wasn't originally envisaged and the existing code needs a bit of reworking to accommodate the extra flexibility in a sensible way. This dates back to an unfortunate design design from smctc which led to callback functions being passed as pointers. If I'd done things properly at that stage and insisted that inheritance be used to specify them as proper member functions then this issue wouldn't arise as they would belong to the appropriate sampler. One day maybe refactoring this would be sensible. In the short term, I think all that is really needed is to change the specification of the various functions used in running the sampler so that they take an additional parameter which receives a reference to the appropriate parameter object. |
|
I started this morning with passing the parameters to the functions so I figured I would get something basic working with that. Otherwise there is a good chance I would miss details about what works and what doesn't work with that approach. I did a branch for this, although there are a lot of changes there so here's a brief summary:
And my thoughts on it
I am a little bit lost on how the list/environment as key-value lookup thing works. That would mean that the parameters are accessed in a similar way to what we were doing before rather than passing them to functions, right? Do you know any packages that might do this? Perhaps I'm looking up the wrong search terms, but I haven't been able to find any detailed descriptions to help me get started with this. |
|
I think the last two comments were written simultaneously, so to respond to @LeahPrice.... Wow! You appear to have implemented more or less what I was suggesting in less time than it's taken me to ponder about what might make sense... Obviously, there's a lot of code here and I haven't looked that carefully at it, but this seems sensible, consistent with the way that other things currently work and likely to address the issue that I'd identified. I think that's quite a big step forward. I'd have thought specifying an smc::nullParams object in the library and using that whenever an empty parameter class is needed would be slightly cleaner than defining a new empty class in each application, but that's a detail and was the only thing that struck me on skimming through the code. |
|
Thanks @adamjohansen. Yeah I think we were writing at the same time. An extra detail that's probably more relevant now is that accessing the parameters outside of the main function and outside of initialize/move/MCMC functions was a bit annoying. I think a nice way to get around that might be to make Sampler available in the example namespace. To do that, I imagine I would need to declare it in the namespace first but I can't do Also, I have the feeling that it will be easier to do the work with the arma::mat population values using inheritance rather than functions passed as pointers. Of course that feeling could be wrong, but I'd be happy to try refactoring it now. Thanks for the suggestion of using a single smc::nullParams. That sounds good. |
|
Sorry, @LeahPrice, I don't quite follow what you mean by "make Sampler available in the namespace". It's possible to have a The copy constructor should be defined, but there were some technical complications in the early days and for most purposes people don't need to copy what might be very large objects. I don't think there are currently any problems with deep copying a sampler object and it would be good to have the operator defined unless I'm overlooking anything. |
|
Yeah sorry about that. I edited it right after sending to say "in the example namespace" which is probably also not very clear, but you've answered the question anyway. It sounds like using a pointer to the sampler might be a good way to go. Cool, I'll add the copy constructor but won't use it for this. |
|
Having a constructor which creates an "empty" sampler and using the copy constructor to initialise it would also be absolutely fine; I was just suggesting something that could be done without a copy constructor. (Without want to be repetitive, do commit early and commit often, as it's much easy to understand what's going on if each commit corresponds to a self contained change, like adding a copy constructor.) |
|
Okay, I will do the copy constructor in its own commit. I know there is a very annoying number of changes in the branch I did today, sorry. To be honest I didn't put too much time into creating sensibly sized commits because I just wanted to get it working and I wasn't sure if this was going to be the solution we ended up on. It sounds like it will be so I can break it into smaller changes for the pull requests (i.e. no need to spend too long looking at the details now if you would prefer not to). Having said that, there will still be at least one pull request with 16+ files changed because changelog, moveset and all examples need to be changed together to keep it all working. |
|
I wasn't complaining; sometimes when you change the structure of things it difficult to avoid changing many things at the same time. That's not so bad if all the changes are coherent and doing the same thing. |
|
Can this we considered closed per #24? |
|
I would consider it closed but it would be good to get Adam's opinion also. |
|
It's resolved as far as I'm concerned, too. |
Context: I've been playing around with the current library while implementing some code for a research problem and some things which I'd imagined would be simple have proved less so than I'd hoped. My impression is that a few tweaks to the library would be helpful in similar situations, but @LeahPrice has thought about this more than me so perhaps there are complications I'm overlooking.
Neither
algParamsnorsamplerprovides a mechanism to allow the user to explicitly set the value of the parameters (which are protected). It seems likely that there will be situations in which the user doesn't want to do anything complicated but does want to explicitly specify the value of parameters -- think about a case in which a program contains many samplers and the parameters specify a particular data subset, for example.An I missing something, or would it make sense for their to be a
SetParamfunction inalgParamsand aSetAlgParamValuefuncion insampler?Slightly related, at present the only way that
pAlgParamscan be specified is via one of the constructors ofsmc::sampler. It certainly isn't critical, but might it be sensible to have aSetAlgParamPointerfunction or similar to allow this to be changed on the fly?