Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conditional SMC #60

Merged
merged 19 commits into from
Aug 22, 2021
Merged

Conditional SMC #60

merged 19 commits into from
Aug 22, 2021

Conversation

ilyaZar
Copy link
Member

@ilyaZar ilyaZar commented Aug 15, 2021

The conditional SMC branch from the discussion.

ilyaZar added 16 commits July 1, 2021 17:50
- add specific Initialize functions
- change the order of conditioning: now, conditioning is performed exactly after
initialization or propagation
- TO-DO (IMPORTANT):
  * conditioning coordinate needs to be re-weighted and the weights
  need to be re-normalized
  * get rid of, or adjust, facilities for adaptation; adaptive resampling; additional MCMC moves and similar
- user specified weighting function added to moveset.h
- sampler.h:
    - possible bugfix in base sampler class: line 515 and 727 have integer in
    if-condition where bool is required, so change to bool via
    "htHistoryMode == HistoryType::AL"
    - fix redundant variable in getALineInd and getALineSpace: long N = GetNumber()
    is never used

- "moveset.h"
  - add whitespace in function argument i.e. "time, value, weight,..." instead of
  "time,value,weight,..." e.g. in l73,76 etc.
  - add boilerplate for virtual functions for setting conditional particle coordinate
  and re-weighting appropriately: "pfWeight", "DoConditionalMove", "*defaultWeight"
  - update constructr for moveset-class accordingly to provide placeholder for
  weighting function

- sampler.h: changes to conditionalSampler-class
  - add referenceTrajectoryIndices for storing specifically index assignment of the
  conditional reference path which is useful for cond. SMC resampling schemes
  - set referenceTrajectoryIndices(T) = referenceTrajectoryIndices(T-1) if no
  resampling is performed and add a conditionalResampling function (instead of
  sampler<Space,Params>::Resample) that does a proper conditional resampling not
  necessarily permuting the particle set but allowing to draw from the lambda-density
  (see @adam 's WP)
  - conditionalResampling currently only supports multinomial resampling for the
  trivial lambda()=1/N density case (with other resampling schemes to be adjusted for
  conditional resampling)
- Implements Alg. 3-5 of Appendix C of WP: multinomial,
stratified, systematic and residual conditional resampling.
- Alg. 6, conditional residual resampling yet to come.
- floor(unif_rand()*N) instead of a call to Rcpp::sample() that produced too much
overhead when drawing just one uniform index.
…ampling cases.

- Fixed errors
  - referenceTrajectoryIndices is never initilialized with proper length; now takes
  length of referenceTrajectory
  - derived conditionalSampler class used Resample instead of conditionalResample
  - for some cases wrong indexing of
    referenceTrajectoryIndices.at(T) = referenceTrajectoryIndices.at(T - 1);
    changed to correct version
    referenceTrajectoryIndices.at(T_ + 1) = referenceTrajectoryIndices.at(T);
  - uRSIndices uninitialized in call to conditionalResample();
  - other algorithmic mistakes for stratified and systematic

WIP part searching for bugs:
  - see `Rcpp::Rcout` statements
- Change name lookup in derived conditionalSampler class from explicit
naming to using which improves readability of the code.
Completes blueprint started in commit 4ec3b5f.
Some facilities are not required/implemented in the conditionalSampler derived class
e.g. adaptation and MCMC moves.

Overload anything that works in the base sampler class but not in the conditional
derived class a statement that throws an exception, and make adjustments to member
functions and data members used in the derived class. Specifically

1. conditionalSampler.h
    - Delete redundant data members (e.g. nRepeats)
    - Redundant member functions (e.g. SetAdaptMethods) throw an exception
    - Other member functions using above data members and member functions are
    adjusted or deleted (which also has the benefit of making the code more readable
    and understandable in case of conditional SMC)
    - access to member function or member variable of other object
    (e.g. adaptMethods<Space,Params>;) through a pointer (pAdapt->updateForMCMC())
    are deleted
2. history.h
    - add function overload to Set()-member of the historyelement class that does not
    use nAccepted or nRepeat since that need not be tracked within a conditional
    sampler that doesn't make use of adaptation or MCMC moves
3. smc-exception
    - add new exception (message) types to throw when facilities under 1. are used
- ostream<< operator overload for derived class to print:
   - particle value
   - particle log-weight
   - particle weight
- add a digitsPrint integer so that rounding is performed for digitsPrint number of
digits
- Printing of an instance of a conditionalSampler (within e.g. other packages or
.cpp files sourced via Rcpp::source()) is safest for Rcpp::Rcout()
- Because of the above, a dedicated .print() member may be useful.
- Also, one should add a pretty printing of ancestral lines (either per default in
current facilities other via an addtional bool argument to a .print() member)
Rcpp::Rostream<true>& Rcpp::Rcout = Rcpp::Rcpp_cout_get();
Rcpp::Rostream<false>& Rcpp::Rcerr = Rcpp::Rcpp_cerr_get();
#endif

Copy link
Member Author

@ilyaZar ilyaZar Aug 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @eddelbuettel. Sorry for not checking RcppExports.cpp for that particular commit (I remembered that this was a thing last time).

It think it appeared because of my use of Rcpp::Rcout to do some debugging thing and then I forgot to remove print statements before committing (as I usually do) to have a clean commit.

I think it probably appears via compile attributes

  • R CMD INSTALL --preclean --no-multiarch --with-keep.source is executed directly in vscode whenever I build rcppsmc there ->lines 10-13 do NOT appear
  • "Clean and rebuild in RStudio" from reading the RStudio pane:
    ==> Rcpp::compileAttributes()
    
    * Updated src/RcppExports.cpp
    * Updated R/RcppExports.R
    
    ==> R CMD INSTALL --no-multiarch --with-keep.source rcppsmc
    

So the latter specifically adds those lines while my vscode build execution does not (compile attributes is missing).

Now here is the thing (why this might be worth mentioning):

  1. Now if comment out that particular statement like so // Rcpp::Rcout <<.... and rebuild neither RStudio nor vscode removes that entry in RcppExports.cpp.
  2. If I delete that (commented out) line alltogether neither RStudio nor vscode remove the entry in RcppExports.cpp
  3. If I delete lines 10-13 manually while either 1. or 2. happened, nothing is added via RStudios building statement or R CMD INSTALL --preclean --no-multiarch --with-keep.source of my vscode build.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(That is a non issue and simply due to Rcpp 1.0.7 being newer than 1.0.6 or other. It's a minor new feature for the output device Rcpp uses -- see the release notes for Rcpp 1.0.7. In short nothing to worrry about. [ Unless I completely misunderstand what you are asking. ] )

@adamjohansen
Copy link
Collaborator

Hi @ilyaZar, I'm starting to lose track of where we are.

Is there stuff you still want to do before we consider this "finished" or should I have a final look at the overall PR now?

@ilyaZar
Copy link
Member Author

ilyaZar commented Aug 17, 2021

Hi @adamjohansen !

Sorry, you are absolutely right, I probably should have said what my plans are before doing a random walk through the library and changing and committing things that may not be necessary.

I want to do a small change int the main body of the conditionalSampler and check on the residual resampling part of the conditionalResampling function a final time, going through it step by step. I'll let you know when I am done (probably tommorrow afternoon)!

Thanks!

@eddelbuettel
Copy link
Collaborator

eddelbuettel commented Aug 18, 2021

I am not a fan of PRs that cover a dozen files in whitespace-only changes.

It maybe too much work to revert that now (you could revert to a prior commit and only apply the real changes and force push, but don't do it unless you're comfortable doing so). It's not the end of the world but it obscures the 'who changed what where and why' and just adds noise.

@ilyaZar
Copy link
Member Author

ilyaZar commented Aug 18, 2021

Sorry, I was in the middle of writing some remarks in the discussion and I missed this message in the process @eddelbuettel . I'll have a thought how to do revert this (removing whitespaces helped me a bit with my editor behaviour but I realize now that it's on me to make my vscode not highlighting all the extra whitespaces and that this does not really belong into a committ...)

@eddelbuettel
Copy link
Collaborator

Yeah if you could tame vscode for now (as Adam and I do with The Editor That Matters (TM) ;-) ) and maybe resubmit it would be cleaner. It is clearly on us -- should not have those back then when -- but there is also a general sentiment that bulk whitespace commits are meh.

- add a MoveReferenceParticle() function to call DoConditionalMove member
    function of  moveset-class similar to MoveParticles() function in sampler.h
- change to deterministic assignment Kt=0 instead of sampling from lambda for the
conditional multinomial ressampling case to save one draw
- change rounding in operator<< overload when printing conditional class:
    - rounding is still performed for log-weights and normalized weights
    - rounding is removed when printing state values since useful implementation
    of operator<< overload for Space-class cannot be assumed (or should be well
    documented so the user implements this herself)
- fix a bug at the end of conditional residual resampling

- Update ChangeLog with commit hisory of this branch.
@ilyaZar
Copy link
Member Author

ilyaZar commented Aug 19, 2021

Hi, Done: just reverted to a prior commit and forced pushed only relevant changes.

I hope I did not mess around with your git-histories @adamjohansen @LeahPrice @eddelbuettel. From what I could find out, one should not git rebase and force push because rebasing changes previously written commit objects in git histoy(I think).

I believe that caused the problem @adamjohansen had last time when merging my pull request because I did exactly that...

It may be different if one git revert, then apply only relevant changes and then force push (as @eddelbuettel suggested) but I could not really find out more about this ... So sorry if this causes any problems in your local branches or annoying merge conflicts...

Copy link
Collaborator

@eddelbuettel eddelbuettel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me (on a somewhat superficial skim).

Thanks for removing the whitespace-only changes, it's appreciated. We should be fine pulling once merged.

inst/include/conditionalSampler.h Show resolved Hide resolved
Copy link
Collaborator

@adamjohansen adamjohansen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @ilyaZar -- I've just read through all of this and still have a few small comments, one in particular about exactly how you're managing the act of setting particle values within the conditional resamping that didn't seem completely clear to me. But this is very close to being done, I think.

@@ -0,0 +1,673 @@
// -*- mode: C++; c-indent-level: 4; c-basic-offset: 4; indent-tabs-mode: nil; -*-
//
// sampler.h: Rcpp integration of SMC library -- sampler object
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A pedantic point, but shouldn't this be conditionalSampler.h with a slughtly more descriptive name?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! I think I understood the point about the first line so I won't remove it (for consistency with the other files in the package).

// along with RcppSMC. If not, see <http://www.gnu.org/licenses/>.

//! \file
//! \brief Defines the overall sampler object.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good for the one-line summary of what the file does to be different to that for sampler.h

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

inst/include/conditionalSampler.h Show resolved Hide resolved
inst/include/conditionalSampler.h Outdated Show resolved Hide resolved
inst/include/conditionalSampler.h Show resolved Hide resolved
inst/include/smc-exception.h Outdated Show resolved Hide resolved
inst/include/smc-exception.h Outdated Show resolved Hide resolved
//Perform the replication of the chosen.
for(int i = 0; i < N ; ++i) {
if(uRSIndices(i) != static_cast<unsigned int>(i)){
pPopulation.SetValueN(pPopulation.GetValueN(static_cast<int>(uRSIndices(i))), i);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be misunderstanding something here, but is there a possible race condition here? The value of each particle is set to the value of its ancestor in a loop but the value of the ancestor could already have been overwritten, no? (This was dealt with in the standard case by assuring that if particle i had any offspring then one of them had index i but I don't think that this is the case here.)

Am I missing something (I'm quite prepared to believe that I am)? Otherwise the cleanest option is to copy the pre-resampling particle set and use the values within that collection where we have GetValueN at present.

Copy link
Member Author

@ilyaZar ilyaZar Aug 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are absolutely right @adamjohansen !

I've done a change though I am not sure if it is 100% what you suggested. Copying the whole particle set might be too much (and also It think as currently implemented = does a shallow copy of the pPopulation object).

So I copied only pPopulation.value member via GetValue() into a seperate populationValueCopy and did the ancestor assignment with values taken from that vector.

We might want to add a note to the To-Do:

  1. Either add a member that permutes the particle set to the population class (and then implement it efficiently later). Maybe the operation might be common enough to justify a member-function a dedicated member function?
  2. Permutation member function ccould be optimized. I think there are some algorithmic ways how to permute without pre-copying that might be worth considering for large particle sets, but that is definitely something for the future too.
  3. Shallow vs. deep copy of the population object? (just mentioning it for completeness)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Indeed, I meant copy the collection of values rather than the object itself.
I agree that 1, 2 might make sense -- and that we at least want the facility to perform deep copying, whether that's default behaviour (which perhaps it should be) or not.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you have here should be fine; but perhaps the copy should be explicitly cleared once we're finished with it?

conditionalSampler.h:
- Improve description of conditionalSampler class.
- Add comment when fixing reference trajectory index to zero for conditional
    multinomial resampling.
- Fix error when replication particle values in conditionalResampling():
    - pre-copy pPopulation values before replicatoin (based on ancestor indices) such
    that previous ancestor value won't be overwritten in for loop

smc-exception.h:
- remove duplicated lines
- add unique codes to different exception types
@ilyaZar
Copy link
Member Author

ilyaZar commented Aug 20, 2021

Hi all,

quick sorry @eddelbuettel . I wanted to ask @adamjohansen for reviewing but ended up hitting the wrong button, hence the unnecessary re-request. Seems like I can't undo it once the request is out...

Are there still changes you'd like me to add or test something to help finalize the PR? (I am essentially free this weekend)

If not, I'll tend to focus on finalizing and writing up the remaining parts (in terms of commits made so far and other things) for GSoC now (deadline's next monday) and we can discuss this PR and other issues that might or might not come up in the next/following weeks?

Copy link
Collaborator

@adamjohansen adamjohansen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're essentially there; one query about whether the vector of copied particle values ought to be explicitly cleared after use is all that remains and that's not critical.

@adamjohansen
Copy link
Collaborator

Is there an example that uses this code available to the world at large? That's the one thing that it would obviously be good to provide; e.g. a simple linear gaussian setting in which one could compare the various implementations (perhaps via normalizing constant estimates which are pretty well understood).

@adamjohansen
Copy link
Collaborator

@LeahPrice does this look good to you?

Copy link
Collaborator

@eddelbuettel eddelbuettel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine, one trivial comment.

ChangeLog Outdated
2021-08-11 Ilya Zarubin <ilya.a.zarub@gmail.com>

* inst/include/conditionalSampler.h: resolve nondependent name look-up in
derived conditionalSampler class via using-statements instead of explicit naming to make code cleaner and facilitate readability (implicit name
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line length could do with adjustment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@LeahPrice
Copy link
Collaborator

Thanks everyone. I read through all of the code and made a couple of very small comments last night. I can't say I understand algorithm 6 completely, but I'm happy with the PR and ready for it to be merged when others are

//Step 2:
//Calculation of empirical distribution function F_{t - 1}^N(i) is done and equal to computation of cumulative normalized weights stored in dRSWeightsCumulative.
//Step 3: Generate ancestor indices and sssign to offspring indices {1,...,N}\{Kt}.
std::vector<unsigned int> tmpIterator(N - 1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this and line 493 have (N) rather than (N-1)?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if this and the other comment didn't come through last night (20th) - I didn't realise I had to select approve before these would be visible

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

//Generate D_i set:
if(expectedNumberOffspring > 0) {
//Set D_i={l + 1, ..., l + expectedNumberOffspring}
DsetCurrent = arma::linspace<arma::Col<unsigned int> >(l + 1, l + expectedNumberOffspring);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a third argument in the linspace function? Based on the Armadillo documentation, I think this by default creates a vector of length 100 but I think we want expectedNumberOffspring instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch!

@ilyaZar
Copy link
Member Author

ilyaZar commented Aug 21, 2021

  • Is there an example that uses this code available to the world at large? That's the one thing that it would obviously be good to provide; e.g. a simple linear gaussian setting in which one could compare the various implementations (perhaps via normalizing constant estimates which are pretty well understood).

    @adamjohansen yes! The SVmodelRcppSMC package is implementing PG. I added a link to the package in PR Documentation and miscellaneous #61 in the README file (where the other packages are listed). I guess we can provide an "example use case to the world" whenever we decide that this PR is finished. (simple linear Gaussian setting as an explicit package example is certainly also doable if you fancied that let me know!)

  • I feel like this is a really stupid question on my side but why do you want to clear populationValueCopy (the vector of particle values) explicitly? It's a local variable to the function (call) conditionalResample() and should be destroyed (is destroyed or deleted from the stack what you mean by "explicitly cleared"?) automatically once the function returns? I found https://stackoverflow.com/questions/35514909/how-to-clear-vector-in-c-from-memory and based on the post from Johan Lundbeg populationValueCopy.clear() would do the job, yet it gets "cleared/destroyed" right after the loop, or?

  • @LeahPrice thanks for pointing out these cases! (I could swear we already had the N-1 vs N thing, but strangely, though, that this keeps hunting me :) )

ChangeLog:
- adjust line width in ChangeLog

conditionalSampler.h:
- fix missing `break;` in switch statements from conditionalResample()
- fix N-1 vs. N sizes in construction of the tmpIterator
- add missing argument to arma::linspace
@adamjohansen
Copy link
Collaborator

I'm certainly not insisting on explicitly tidying up local variables, it just always seems to me that code is more readable if you get rid of things which you've explicitly created when you've finished with them and it keeps things tidy if the code is extended later (and I tend to distrust automatic garbage collection because I'm old...).

@adamjohansen
Copy link
Collaborator

Time is finite and it may not be worth the energy, but the advantage of a linear Gaussian case is that you can compare Monte Carlo estimates with the truth which makes it easier to check that everything is working -- and particularly with the NC estimates, one can exploit unbiasedness to gain some confidence that everything is working.

@adamjohansen
Copy link
Collaborator

I think we're probably ready to merge, but @LeahPrice raised the most recent queries so it probably makes sense for her to press the button if the current version look okay.

@LeahPrice LeahPrice merged commit ff94516 into rcppsmc:master Aug 22, 2021
@LeahPrice
Copy link
Collaborator

This all looks good to me now, thanks @ilyaZar for the hard work!

@ilyaZar
Copy link
Member Author

ilyaZar commented Aug 22, 2021

Thanks for merging @LeahPrice !

@ilyaZar ilyaZar deleted the conditional-smc2 branch September 5, 2021 16:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants