-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Particle marginal Metropolis Hastings Example #13
Conversation
The log normalising constant estimate was not initialised to 0 (which causes problems when doing multiple SMC runs by calling Initialise() and Iterate() functions multiple times)
Adding an example using particle marginal Metropolis Hastings. This example is based on a very similar model to pfNonlinBS. So that a single function can be used to simulate data from these models, the simNonlin function has been generalised and documentation updated accordingly.
Hi Leah, thanks for this. This is a trivial thing, but it's likely to recur so I'll mention it... There are a few more minor things popping up in the travis checks; nothing major, but it's a good idea to take notice of notes and warnings as they appear. Check out lines 900-905 in particular and see Dirk's commit bdc201c for a prototypical solution to this type of NOTE. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch on the initialisation, this could have caused some subtle problems later on if you hadn't noticed this.
Thanks Adam. I ran R CMD check on my computer and saw I didn't get any warnings so I'll have to make sure I look more closely at the Travis checks next time. Are lines 907-912 an issue too? I haven't touched the linear regression examples in this pull request so I'm not sure where that's coming from. I'm using Lazydata: TRUE so I might need to look into what else could be causing the warning. Are Travis checks only done for pull requests? I tried to find the checks from Dirk's most recent commit but I couldn't. By the way, I wrote a message on the PMMH issue ticket this morning thanking you for your comments and saying I'd followed your advice with the cosine sequence offset. It seems it's been sitting on my computer all day, but I guess there's no point hitting send now. |
I realized I can just do --as-cran in my CMD check, so feel free to disregard my question about whether Travis checks are only done for pull requests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is essentially there, but it would be good if you could resolve the NAMESPACE issue identified by travis, tweak any of the code that you want to in response to any of these rather trivial comments, and see if there's anything you can do about the identation/whitespace in your editor.
If it's too much of a pain to force your editor to indent consistently then it can be fixed later, it would just be nice to be as efficient as possible.
man/pfNonlinBS.Rd
Outdated
\code{\link{simNonlin}} for a function to simulate from the model and | ||
\code{\link{nonLinPMMH}} for an example of particle | ||
marginal Metropolis Hastings applied to a non-linear state space model. | ||
} | ||
\author{Adam M. Johansen and Dirk Eddelbuettel} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and yourself...
#include "nonLinPMMH.h" | ||
|
||
namespace nonLinPMMH { | ||
const double a_prior = 0.01; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation seems a bit variable (cf. 27, 42 amongst others). This is easily fixable, but can your editor be persuaded to "do the right thing"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I have fixed this now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. It looks much more consistent now. BTW I hope you didn't spend too much time on this as it can be done quite automatically (even if you don't like emacs, you can always use it occasionally for things like auto-indent, for example).
(I know things like this seem trivial but it's amazing how much more readable code is if it's indented in a consistent way and when you're working with a big code base it /really/ helps.)
src/nonLinPMMH.cpp
Outdated
sigw(i) = theta_prop.sigw; | ||
loglike(i) = loglike_prop; | ||
logprior(i) = logprior_prop; | ||
Rcpp::Rcout << "ACCEPTED proposal (" << setw(6) << sigv(i) << ", " << setw(6) << sigw(i) << ") at iteration " << setw(4) << i+1 << " with loglike " << setw(8) << loglike(i) << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want to produce this output? I can see it has value for debugging but it's not clear to me that it's something a normal user would want to see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had this stuff printed out for debugging and also to give the user a sense of how long is left. This example takes a while to run (especially if you want to do the 50,000 iterations suggested by the paper!), but I guess it's fairly standard to not get a progress update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. I wouldn't personally want ~10k lines of output printed to the console when doing something like this; a boolean "verbose" or "quiet" argument might be a good idea so that the user can suppress it. Personally, I'd just print an "iteration xx/yy" every 100 or 1000 iterations if I wanted to reassure the user that something was happening.
I don't recall how the buffering works with Rcout, but you might need some sort of flush instruction to guarantee that the output appears in a timely fashion, too. @eddelbuettel?
src/nonLinPMMH.cpp
Outdated
logweight += R::dnorm(y(lTime),mean,theta_prop.sigw,TRUE); | ||
} | ||
|
||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More trivia: It's conventional (and good practice) to end source files with a new line (some compilers expect the final source line to be terminated with one).
src/nonLinPMMH.cpp
Outdated
double MH_ratio; | ||
for (unsigned int i = 1; i<lMCMCits; i++){ | ||
// RW proposal for parameters | ||
theta_prop.sigv = R::rnorm(sigv(i-1),0.15); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks fine and as we've said before working is far more important than fast, but one thought which might be worth bearing in mind for the future.
In this context, we know how many normal and uniform random variables we need in advance. From a speed perspective, it would make more sense to call R::rnorm only once or twice to obtain vectors of these RVs and similarly to call R::runif only once rather than once per iteration. I.e. To generate vectors of all of the RVs required in advance outside of the main loop. There's a memory cost associated with it, but one which isn't enormous and it's best to minimize the number of calls back to R. It might be interesting to use rprof (or profvis) to compare the two approaches, actually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that's a good point. I've used vectors for this now. I tried using profvis quickly to compare the two approaches. I did 100 iterations without any plotting and the non-vectorised time was 91040 (milliseconds I think) and vectorised was 92240, so it was slightly slower with vectorisation. That's probably not a very fair test though -- 100 iterations is unrealistically small.
I can do a proper check with a more realistic number of iterations and look at time and memory if that's useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't worry too much; I'm slightly surprised by the numbers but this isn't a priority anyway.
The data() NOTE is another small thing. I didn't mention it because, as you observe, it's unrelated to this pull request. It's nice if individual commits / pulls are coherent change sets. It would, in the fullness of time, be worth cleaning this up in that it currently dumps the data into the global workspace which is bad practice because (a) the name might collide with something which is already there and (b) it will persist there until it's deleted. Using the "envir=environment()" parameter in the call to data might be sufficient to avoid this. Dirk might have stronger views about data in R packages, but I'm largely agnostic as long as we don't do anything too antisocial (and putting stuff in a global workspace when it doesn't really need to be there might qualify as mildly antisocial). Another think @eddelbuettel is better placed to answer, is whether we ought to do a bit of work to kill the notes at line 924 of the travis log (cf. RcppCore/Rcpp#636). |
Correcting indentation in nonLinPMMH.h and nonLinPMMH.cpp, storing vectors for random component of RW and correcting authors in simNonlin and nonLinPMMH.
I tried using envir=environment() and the NOTE doesn't appear anymore. Thanks for this fix. If @eddelbuettel is also happy with this, then I'll update it in a separate pull request. I have the news file updated too and ready for a future pull request. |
Yes, it is good practice to run I'll merge this now and take a look later. |
Adding an example using particle marginal Metropolis Hastings. This example is based on a very similar model to pfNonlinBS. So that a single function can be used to simulate data from these models, the simNonlin function has been generalized and documentation updated accordingly.