Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add user output (convergence diagnostics) for PMMH example #50

Merged
merged 4 commits into from
May 21, 2021

Conversation

ilyaZar
Copy link
Member

@ilyaZar ilyaZar commented May 19, 2021

This addresses bullet point 5 of #25

  • adding two arguments to nonLinPMMH_impl() and nonLinPMMH() to alter user output:
    • boolean verbose: if TRUE, then (additionally to percentage completion)
      mean parameter estimates, current estimated log-likelihood and log-prior values
      are reported; if FALSE, only percentage of completion is printed
    • int msg_frq: number of iterations after which percentage completion is printed;
      defaults to every 100 iterations

The output is not exactly a one-liner but still quite readable, I hope. I have not added the log-target density yet which, as suggested by @adamjohansen , may be useful to investigate to determine if the chain is in a transient regime or reached a typical set (I can add this if you like).

- adding two arguments to onLinPMMH_impl() and nonLinPMMH() to alter user output:
    - boolean verbose: if TRUE, then (additionally to percentage completion)
    mean parameter estimates, current estimated log-likelihood and log-prior values
    are reported; if FALSE, only percentage of completion is printed
    - int msg_frq: number of iterations after which percentage completion is printed;
    defaults to every 100 iterations
Copy link
Collaborator

@eddelbuettel eddelbuettel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me (at a casual glance)

@adamjohansen
Copy link
Collaborator

Indeed, it looks good and does what is needed.

A few very minor thoughts;

  • Pedantically, it would be good to have an entry in the ChangeLog noting what has been modified here.
  • It wasn't clear to me why check.prog was a double rather than an integer type (I'm always nervous about equality comparisons with floating points...).
  • It has no functional effect, but in lines 117,134 casting with static_cast rather than double(...) might be nicer (see, e.g., the top answer at https://stackoverflow.com/questions/103512/why-use-static-castintx-instead-of-intx) and consistent with the style elsewhere. [I won't claim to be entirely consistent with this principle when writing code myself, despite my best intentions, but I thought I may as well mention it.]

@LeahPrice
Copy link
Collaborator

This looks great to me also and I agree with Adam's suggestions. It might also be nice to add the acceptance rate to the output of nonLinPMMH_impl.

Additions are:
  - add a ChangeLog entry
  - change check_prog to integer (with integer equality comparison in l.117)
  - use static_cast() instead of double() for all (both) occasions
  - add NumericVector acceptance_rate to output of nonLinPMMH_impl() to store all
  acceptance rates
@ilyaZar
Copy link
Member Author

ilyaZar commented May 20, 2021

Thanks for all remarks!

@adamjohansen hopefully addressed your points (special thanks for the note on static_cast() with the accompanying SO-post, this was completely new to me!).

@LeahPrice added a NumericVector to the dataframe output that reports all acceptance rates (may seem redundant as the acceptance rate tend to converge to a rather constant number as well but the output is a dataframe so it requires all its columns to be of the same length, hence all acceptance rates and not just the last from the final iteration).

@adamjohansen
Copy link
Collaborator

Looks great.

I feel tremendously pedantic at this point, but the convention with the changelog is to identify the files in which the changes took place with a

  • filename: what changed
  • other filename: idem
    sort of structure, so in this case it would be good to identify that there are changes in
    src/nonLinPMMH.cpp and R/nonLinPMMH.R

I guess this is a bit outdated, but old habits die hard and we've done this so far. (I think we're more or less following the GNU style insofar as there is one, see https://www.gnu.org/prep/standards/html_node/Style-of-Change-Logs.html#Style-of-Change-Logs).

@ilyaZar
Copy link
Member Author

ilyaZar commented May 20, 2021

I don't think this is pedantic at all, and the earlier I get used to the style of this project the better!

Also:
I guess we do these ChangeLog entries with each successful pull request, solved issue then?

Copy link
Collaborator

@adamjohansen adamjohansen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks perfect to me; can you merge this if you're happy, @LeahPrice?

I tend to keep the ChangeLog more or less up to date as I go, as much because I'd probably forget about things otherwise as anything, but as long as its consistent with what's been done by the time a pull request reaches the repository it should be fine.

@eddelbuettel
Copy link
Collaborator

I picked up the ChangeLog habit from other GNU project and use it in almost all my projects to, well, "log changes". The entries are typically a little more detailed than git log entries. (Emacs makes it easy: in a file, press C-x 4 a and a stanza with date + user + email for the file gets created, sadly other editors do not always offer that by default but there may be add-ons.) No one-to-one mapping to git commits. Sometimes a commit can be large (and hit multiple files), sometimes one changelog line may need multiple git commits.

@eddelbuettel
Copy link
Collaborator

@ilyaZar I just looked into why the continuous integration failed, on both Travis CI and at GitHub Actions. It is one of those small things: You expanded the interface of nonLinPMMH_impl from three parameters to five. I am sure it builds for you at your end. But what you missed that interface changes require running of Rcpp::compileAttributes() (which RStudio does for you, so one sometimes forgets) to update the generated files R/RcppExports.R and src/RcppExports.cpp. And those did not get committed leading the compiler to look for the old three argument one:

[.... more stuff omitted ...]
** testing if installed package can be loaded from temporary location
Error: package or namespace load failed for ‘RcppSMC’ in dyn.load(file, DLLpath = DLLpath, ...):                                                                                                                   
 unable to load shared object '/usr/local/lib/R/site-library/00LOCK-rcppsmc/00new/RcppSMC/libs/RcppSMC.so':                    
  /usr/local/lib/R/site-library/00LOCK-rcppsmc/00new/RcppSMC/libs/RcppSMC.so: undefined symbol: _Z15nonLinPMMH_implN4arma3ColIdEEmm
Error: loading failed                                                                                    
Execution halted                                                                                         
ERROR: loading failed                                                                                                                                                                                              
* removing ‘/usr/local/lib/R/site-library/RcppSMC’                                                                                                                                                                 
* restoring previous ‘/usr/local/lib/R/site-library/RcppSMC’                                                                                                                                                       
Warning message:                                                                                                                                                                                                   
In install.packages(pkgs = f, lib = lib, repos = if (isMatchingFile(f)) NULL else repos) :
  installation of package ‘.’ had non-zero exit status                                                                                                                                                             
edd@rob:~/git/rcppsmc(pr50)$ c++filt _Z15nonLinPMMH_implN4arma3ColIdEEmm                                                                                                                                           
nonLinPMMH_impl(arma::Col<double>, unsigned long, unsigned long)
edd@rob:~/git/rcppsmc(pr50)$ 

So I just added those two (as GitHub lets us commit (by default, unless opted out) to repos of PR submitters). We soon should have 'green' CI as we prefer. Keep an eye on it going forward.

@adamjohansen
Copy link
Collaborator

Thanks, Dirk, I'd made a note to look at what was causing the trouble but as ever you're more efficient...

@eddelbuettel
Copy link
Collaborator

Hah. I am timely for tasks if and when they fall into the very small window of morning coffee hackery before the real day starts.

@ilyaZar
Copy link
Member Author

ilyaZar commented May 20, 2021

@eddelbuettel sorry! And thanks, I was not aware how Travis CI and Github Actions handle these and for, whatever reason, assumed that generated files should not bee committed...

So to just to sort things out for the future:

As far as I understood so far, compileAttributes() deals with exporting C++ functions to R i.e. both generated files (as they only contain the external C wrappers required to call exported C++ functions and their corresponding .Call wrappers required to call them) are "platform independent boilerplate" and thus can always be included in a commit? (so as to be always on the safe side, not worrying if some function signature changed...)

Or would it be particularly annoying or bad practice, say for small changes to the function body, to always include them in a commit?

@eddelbuettel
Copy link
Collaborator

Yes. Just experiement with it a little in a corner and it should become clear. git status is good. Start at steady state (say by checking out any odd project, RcppExamples is fine as small and self-contained and trivial-ish). Mod one file. Rebuild. Look at git status. Object files (as real temporaries) are ignored. RcppExport.*, while also "generated" are not as they are needed to build. (Real test: R CMD build ... to create a new tar.gz, R CMD check ... on that.)

And what also becomes clear after a while is that people around the world write code in projects. This isn't math -- we don't stand around a board with chalk all over our pants. It's code, and it runs. So more like 'engineering' if you will. And that is why we use GitHub with all the bells and whistles. Which ... is a lot but that is why we have GSoC to make new victims^Hvolunteers more familiar with how it works. Just ask -- there are no silly questions!

So the iron rule: don't break things (at least not in the main branch). PRs tend to get ignored when they do not show 'green'.

As for 'small commits versus large' and small versus large change sets: that is a more personal matter. I try to keep them in self contained chunks. But e.g. last evening I had to 'tame' and adjust CI for the work project, and it took me six commits but only commits trigger GitHub Actions runs here. At the end the change was overall simple so I "rebased" and "squashed" what was six commits (for purely procedural reasons) into just one and PRed that earlier. So it depends.

Your change set was great as start. We're here to help you with the small mechanics -- like the compileAttributes() run. It's all good. (And yes, compileAttributes() is smart enough. When we change a function body no change results, no git status or git diff change on the RcppExports* file and all is good. Hope this helps.

@ilyaZar ilyaZar marked this pull request as ready for review May 20, 2021 14:48
@ilyaZar
Copy link
Member Author

ilyaZar commented May 20, 2021

Thanks for all the details @eddelbuettel !

Sorry @LeahPrice , I probably realised too late that draft pull requests cannot be merged until I mark them as "ready for review", now it should be ok.

@LeahPrice
Copy link
Collaborator

These changes look good to me. Thanks everyone and sorry for the slow reply!

@LeahPrice LeahPrice merged commit e7c60de into rcppsmc:master May 21, 2021
@adamjohansen
Copy link
Collaborator

Excellent. It's always nice to be able to cross something of a list. There is still one tiny glitch we should sort out, not something that's critical but we ought to keep what documentation we do have as current as we can:

  • checking for code/documentation mismatches ... WARNING
    Codoc mismatches from documentation object 'nonLinPMMH':
    nonLinPMMH
    Code: function(data, particles = 5000, iterations = 10000, burnin =
    0, verbose = FALSE, msg_freq = 100, plot = FALSE)
    Docs: function(data, particles = 5000, iterations = 10000, burnin =
    0, plot = FALSE)
    Argument names in code not in docs:
    verbose msg_freq
    Mismatches in argument names:
    Position: 5 Code: verbose Docs: plot

which really just means we need to add the new arguments that got added to the interface in this PR to the documentation in rcppsmc/man/nonLinPMMH.Rd

@ilyaZar
Copy link
Member Author

ilyaZar commented May 21, 2021

Excellent. It's always nice to be able to cross something of a list. There is still one tiny glitch we should sort out, not something that's critical but we ought to keep what documentation we do have as current as we can:

  • checking for code/documentation mismatches ... WARNING
    Codoc mismatches from documentation object 'nonLinPMMH':
    nonLinPMMH
    Code: function(data, particles = 5000, iterations = 10000, burnin =
    0, verbose = FALSE, msg_freq = 100, plot = FALSE)
    Docs: function(data, particles = 5000, iterations = 10000, burnin =
    0, plot = FALSE)
    Argument names in code not in docs:
    verbose msg_freq
    Mismatches in argument names:
    Position: 5 Code: verbose Docs: plot

which really just means we need to add the new arguments that got added to the interface in this PR to the documentation in rcppsmc/man/nonLinPMMH.Rd

There might be two things to consider here:

  1. I add the arguments manually, commit and push again (can one do this within this pull request i.e. does it require re-opening for that?).

  2. I noticed that I could not document the package automatically with 'devtools::document()', presumably, because roxygen hasn't been used before (see the 'NAMESPACE' part in the code below, that brought that thought to me in the first place, and also I have noticed any roxygen commenting style i.e. #' or //'). I am not sure if that's on me, or whether the project does not use roxygen, though.

This is what happens when I run 'devtools::document()' on my plattform:

> devtools::document()
Updating RcppSMC documentation
First time using roxygen2. Upgrading automatically...
Loading RcppSMC
Warning: The existing 'NAMESPACE' file was not generated by roxygen2, and will not be overwritten.
Warning message:
roxygen2 requires Encoding: UTF-8 

There seems to be non UTF-8 character(s) somewhere hidden, I'll try to find them on this weekend :) !

@adamjohansen
Copy link
Collaborator

Logically further commits belong in a separate PR given that this one has already been merged, at least IMO -- and we should really take this discussion elsewhere, but the ongoing discussion at the end of a closed PR is entirely my fault.

Although smctc did use doxygen, and the C++ code seems more or less to have the right /// comments for this, I don't recall using Roxygen for RcppSMC and there indeed doesn't seem to be any evidence of it in the R code. I suspect the Rd files were produced manually, but I think @LeahPrice has worked on these rather more recently than I have. I don't think there was a strong reason for that, but others may remember something I don't.

Non UTF-8 characters seem unlikely to be wanted... they often seem to be hyphen-like symbols in references copied from elsewhere.

@ilyaZar ilyaZar deleted the pmmh-user-diagnostics branch May 21, 2021 10:51
@LeahPrice
Copy link
Collaborator

LeahPrice commented May 21, 2021

We haven't used Roxygen so far but I'm not opposed to using it. It might be good to stick with the current approach for now and then we can all discuss whether we want to switch to Roxygen later. I believe Dirk said he'd prefer to only use Roxygen to generate the .Rd files when we were having discussions about this. Is that right, @eddelbuettel?

Apologies for putting this discussion here. I'm happy to write it elsewhere too if we decide to create an issue ticket or discussion on the topic.

@eddelbuettel
Copy link
Collaborator

eddelbuettel commented May 21, 2021

We are splattering discussion now over

  • multiple issue tickets (fine, these can cross reference, hello add missing documentation to nonLinPMMH() (or moving documentation to roxygen) #52 how do you do?) -- so I still prefer that
  • parallel email threads (why? can we do without those)
  • freshly setup discussion forum (fine, we do not have to use it)
  • video calls (coming ... because it is 2021 and everybody else does them, but I will NOT have the pencilled in time for it ;-) )

We can chat next week on the call (kidding aside that should be useful).

Now, kudos to @adamjohansen for spotting the discrepance. Help updates sometimes fall to the way side but that is also what we have R CMD check for. I don't mind roxygen2, I use it in newer projects but I have too many existing ones (still with working Rd files) to obsess over converting old ones. We can spend time there, but I would rather schedule that at a lower priority and keep it to the end. My $0.02, if others feel strongly (and do the work :-) ) I will not object, of course.

@eddelbuettel
Copy link
Collaborator

For completeness, the standard workflow (and the builder in RStudio does that for you too, whether or not you use devtools which I don't):

  • run compileAttributes() (where my personal preference is a shell wrapper compAttr.r using the littler package's r binary)
  • run roxygenize() with proper arguments, I tend to limit roxygen2 often to 'just .R -> .Rd, nothing else' and even use an older version of roxygen2 for that as I despise the newer default of always re-compiling / re-installing the package for that (which is wasteful in time and resources and I am impatient, I also recompile enough by myself). I use roxy.r for that, with switches to flip between the old 'cached' version of roxygen2 and the new one (which I need for some project where I do use @export and NAMESPACE auto-generation.

The key point here is that the former updates the (generated) R file for which the documentation has to match, so the sequence matters for the latter to pick this up and refresh the Rd file. Again, the tools generally do that, one just has to remain aware and check that it really happens. Or else R CMD check nags. One can (should, really) also make sure R CMD check before committing but I don't always do that either knowing that the CI will do it anyway. Shame is then on me when I get a red fail in a rush. It happens.

But for any changes meriting a ChangeLog entry (which, really, should be most) it is worth to go a little slower if in doubt.

Lastly, the utf8 nag is, I think, automatic these days. We can just add the line Encoding: UTF-8 to DESCRIPTION. (Then again I am right now updating RcppArmadillo for a new release and it doesn't have it -- interesting for a widely-used 10+ year old package. Maybe no utf8 char anywhere in its docs....)

@adamjohansen
Copy link
Collaborator

@eddelbuettel: agreed, we don't want lots of parallel channels and there's not much benefit to using email in parallel with all of the other stuff.

The point about video calls and time is well taken, I'm certainly not expecting us to do this regularly (at least not with all 4 of us given the time zones) but I do think there's some value in having one "meeting" to say "hi" and put names to faces.

FWIW, I have no objection to sticking "Use roxygen" up as a new issue, especially if it can be at least partially automated, although I agree its not a high priority.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants