Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Particle marginal Metropolis Hastings Example #13

Merged
merged 4 commits into from
Aug 3, 2017

Conversation

LeahPrice
Copy link
Collaborator

Adding an example using particle marginal Metropolis Hastings. This example is based on a very similar model to pfNonlinBS. So that a single function can be used to simulate data from these models, the simNonlin function has been generalized and documentation updated accordingly.

The log normalising constant estimate was not initialised to 0 (which causes problems when doing multiple SMC runs by calling Initialise() and Iterate() functions multiple times)
Adding an example using particle marginal Metropolis Hastings. This example is based on a very similar model to pfNonlinBS. So that a single function can be used to simulate data from these models, the simNonlin function has been generalised and documentation updated accordingly.
@adamjohansen
Copy link
Collaborator

Hi Leah, thanks for this. This is a trivial thing, but it's likely to recur so I'll mention it...

There are a few more minor things popping up in the travis checks; nothing major, but it's a good idea to take notice of notes and warnings as they appear. Check out lines 900-905 in particular and see Dirk's commit bdc201c for a prototypical solution to this type of NOTE.

Copy link
Collaborator

@adamjohansen adamjohansen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch on the initialisation, this could have caused some subtle problems later on if you hadn't noticed this.

@LeahPrice
Copy link
Collaborator Author

Thanks Adam. I ran R CMD check on my computer and saw I didn't get any warnings so I'll have to make sure I look more closely at the Travis checks next time. Are lines 907-912 an issue too? I haven't touched the linear regression examples in this pull request so I'm not sure where that's coming from. I'm using Lazydata: TRUE so I might need to look into what else could be causing the warning. Are Travis checks only done for pull requests? I tried to find the checks from Dirk's most recent commit but I couldn't.

By the way, I wrote a message on the PMMH issue ticket this morning thanking you for your comments and saying I'd followed your advice with the cosine sequence offset. It seems it's been sitting on my computer all day, but I guess there's no point hitting send now.

@LeahPrice
Copy link
Collaborator Author

I realized I can just do --as-cran in my CMD check, so feel free to disregard my question about whether Travis checks are only done for pull requests.

Copy link
Collaborator

@adamjohansen adamjohansen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is essentially there, but it would be good if you could resolve the NAMESPACE issue identified by travis, tweak any of the code that you want to in response to any of these rather trivial comments, and see if there's anything you can do about the identation/whitespace in your editor.

If it's too much of a pain to force your editor to indent consistently then it can be fixed later, it would just be nice to be as efficient as possible.

\code{\link{simNonlin}} for a function to simulate from the model and
\code{\link{nonLinPMMH}} for an example of particle
marginal Metropolis Hastings applied to a non-linear state space model.
}
\author{Adam M. Johansen and Dirk Eddelbuettel}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and yourself...

#include "nonLinPMMH.h"

namespace nonLinPMMH {
const double a_prior = 0.01;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation seems a bit variable (cf. 27, 42 amongst others). This is easily fixable, but can your editor be persuaded to "do the right thing"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have fixed this now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. It looks much more consistent now. BTW I hope you didn't spend too much time on this as it can be done quite automatically (even if you don't like emacs, you can always use it occasionally for things like auto-indent, for example).

(I know things like this seem trivial but it's amazing how much more readable code is if it's indented in a consistent way and when you're working with a big code base it /really/ helps.)

sigw(i) = theta_prop.sigw;
loglike(i) = loglike_prop;
logprior(i) = logprior_prop;
Rcpp::Rcout << "ACCEPTED proposal (" << setw(6) << sigv(i) << ", " << setw(6) << sigw(i) << ") at iteration " << setw(4) << i+1 << " with loglike " << setw(8) << loglike(i) << std::endl;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want to produce this output? I can see it has value for debugging but it's not clear to me that it's something a normal user would want to see.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had this stuff printed out for debugging and also to give the user a sense of how long is left. This example takes a while to run (especially if you want to do the 50,000 iterations suggested by the paper!), but I guess it's fairly standard to not get a progress update.

Copy link
Collaborator

@adamjohansen adamjohansen Aug 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. I wouldn't personally want ~10k lines of output printed to the console when doing something like this; a boolean "verbose" or "quiet" argument might be a good idea so that the user can suppress it. Personally, I'd just print an "iteration xx/yy" every 100 or 1000 iterations if I wanted to reassure the user that something was happening.

I don't recall how the buffering works with Rcout, but you might need some sort of flush instruction to guarantee that the output appears in a timely fashion, too. @eddelbuettel?

logweight += R::dnorm(y(lTime),mean,theta_prop.sigw,TRUE);
}

}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More trivia: It's conventional (and good practice) to end source files with a new line (some compilers expect the final source line to be terminated with one).

double MH_ratio;
for (unsigned int i = 1; i<lMCMCits; i++){
// RW proposal for parameters
theta_prop.sigv = R::rnorm(sigv(i-1),0.15);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks fine and as we've said before working is far more important than fast, but one thought which might be worth bearing in mind for the future.

In this context, we know how many normal and uniform random variables we need in advance. From a speed perspective, it would make more sense to call R::rnorm only once or twice to obtain vectors of these RVs and similarly to call R::runif only once rather than once per iteration. I.e. To generate vectors of all of the RVs required in advance outside of the main loop. There's a memory cost associated with it, but one which isn't enormous and it's best to minimize the number of calls back to R. It might be interesting to use rprof (or profvis) to compare the two approaches, actually.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's a good point. I've used vectors for this now. I tried using profvis quickly to compare the two approaches. I did 100 iterations without any plotting and the non-vectorised time was 91040 (milliseconds I think) and vectorised was 92240, so it was slightly slower with vectorisation. That's probably not a very fair test though -- 100 iterations is unrealistically small.

I can do a proper check with a more realistic number of iterations and look at time and memory if that's useful.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't worry too much; I'm slightly surprised by the numbers but this isn't a priority anyway.

@adamjohansen
Copy link
Collaborator

adamjohansen commented Aug 3, 2017

The data() NOTE is another small thing. I didn't mention it because, as you observe, it's unrelated to this pull request. It's nice if individual commits / pulls are coherent change sets.

It would, in the fullness of time, be worth cleaning this up in that it currently dumps the data into the global workspace which is bad practice because (a) the name might collide with something which is already there and (b) it will persist there until it's deleted. Using the "envir=environment()" parameter in the call to data might be sufficient to avoid this.

Dirk might have stronger views about data in R packages, but I'm largely agnostic as long as we don't do anything too antisocial (and putting stuff in a global workspace when it doesn't really need to be there might qualify as mildly antisocial).

Another think @eddelbuettel is better placed to answer, is whether we ought to do a bit of work to kill the notes at line 924 of the travis log (cf. RcppCore/Rcpp#636).

Correcting indentation in nonLinPMMH.h and nonLinPMMH.cpp, storing vectors for random component of RW and correcting authors in simNonlin and nonLinPMMH.
@LeahPrice
Copy link
Collaborator Author

I tried using envir=environment() and the NOTE doesn't appear anymore. Thanks for this fix. If @eddelbuettel is also happy with this, then I'll update it in a separate pull request. I have the news file updated too and ready for a future pull request.

@eddelbuettel
Copy link
Collaborator

Yes, it is good practice to run R CMD check --as-cran. You can also tell you system to do it (ie if you use RSrudio) and/or just do it by hand. "Real submissions" also have to do it via the R-devel version which is often pickier still (and you get access to that for free via win-builder and/or R-hub if you don't have R-devel built locally).

I'll merge this now and take a look later.

@eddelbuettel eddelbuettel merged commit fe4ea98 into rcppsmc:master Aug 3, 2017
@LeahPrice LeahPrice deleted the examplePMMH branch August 4, 2017 01:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants