Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add testing and examples for conditional SMC. #67

Merged
merged 8 commits into from
Jun 14, 2022

Conversation

ilyaZar
Copy link
Member

@ilyaZar ilyaZar commented Sep 11, 2021

So hopefully this goes into the right direction for testing and adding a simple example to consolidate some of the past work on conditional SMC.

It's not yet ready for review but I decided to make it visible so you can have a look at it from time to time.

@eddelbuettel
Copy link
Collaborator

Note that the CI failed here. Maybe you forgot to commit an updated RcppExports.{R, cpp} or NAMESPACE file?

@ilyaZar
Copy link
Member Author

ilyaZar commented Sep 11, 2021

Yes I keep screwing things up quite strongly here for quite some time already ...

I sorry I do not recall the exact workflow with compileAttributes() (used only the roxygen command from RStudio before).

If you have a minute, could you be so kind to briefly sketch the workflow:

  • add functions/files in src/... and R/...
  • add documentation to man/..
  • run Rcpp::compileAttributes() from within package folder to update RcppExports.cpp and RcppExports.R
  • build the package

Now, simGaussianSSM is not documented within ./man or NAMESPACE yet (which causes above CI error, maybe?); but it doesn't need to be documented to be validly used within the package, or? (I only use it inside the function compareNCestimates() from the file ./R/cSMCexamples.R and do not want to export it yet, still it seems to be alien to the package).

Just pushed again with updated NAMESPACE maybe that was the thing... I am probably missing something very obvious ?

@eddelbuettel
Copy link
Collaborator

In the narrow sense, compileAttributes() deals with the Rcpp side. It updates RcppExports.{R,cpp} as well as symbol registration for the dynamic library (otherwise in init.c). All is well there. You can test that in baby packages created via RcppArmadillo.package.skeleton().

Where users can tripped up is the R and roxygen2 side. You either do or do not let roxygen2 write NAMESPACE. For many (smaller) packages I do not and then add the function to NAMESPACE myself. Others, and I as well in larger packages, like @export tags in the roxygen header in the C++ source that compileAttributes() bring to R code (!!) from where roxygen2 picks it up. More brittle.

In any event, if it doesn't pass R CMD build ... followed by R CMD check ... locally then there is little point committing and just hoping it may sort itself out. It likely won't. It can all be extra tedious but try to break it into small steps you can check one by one.

@eddelbuettel
Copy link
Collaborator

BTW, given that we are all one happy SMC family now you can just branch off RcppSMC/rcppsmc and it becomes easier for us to follow.

@ilyaZar
Copy link
Member Author

ilyaZar commented Sep 11, 2021

Thanks for the information @eddelbuettel !

Yes keeping on and try to hammer through with my forced pushes, hoping for a miracle hole to appear in the CI wall, (which is rather well protected against brute force and should be passed via thinking) is unwise for sure...

I'll give it a try next week, we are not in a hurry (I hope).

@eddelbuettel
Copy link
Collaborator

eddelbuettel commented Sep 11, 2021

In this case you were missing the second closing ] in // [[Rcpp::export]]. The little things.

(The fact that compareNCestimates_imp was not in either RcppExports.* was a tell. Once I fix that I get ... a set of vanilla compiler error -- can I hand that back to you?)

(Part of those are due to a bad idea. I would not do

using namespace std;
using namespace arma;
using namespace cSMCexamples;

and then

Rcpp::DataFrame compareNCestimates_imp(vec data,

as 'nobody knows' what vec is -- skip the using and use arma::vec. It is easier and saner.)

@ilyaZar
Copy link
Member Author

ilyaZar commented Sep 11, 2021

Yes!! I'll check it locally, thanks for the hint!

  1. standard BPF
  2. conditional BPF with (conditional) multinomial resampling
@ilyaZar
Copy link
Member Author

ilyaZar commented Sep 11, 2021

Note for all: though CI passed, this remains highly WIP and is far from ready for a merge since the other parts from #63 are meant to be implemented within this PR as well.

@ilyaZar
Copy link
Member Author

ilyaZar commented Apr 12, 2022

BTW, given that we are all one happy SMC family now you can just branch off RcppSMC/rcppsmc and it becomes easier for us to follow.

Just tried that @eddelbuettel but it seems I do not have write/read permissions (or I am doing sth. completely wrong here...)

git push -u upstream conditional-smc-testing 
ERROR: Permission to rcppsmc/rcppsmc.git denied to ilyaZar.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights

I am absolutely fine with working with the fork and do PRs from there (as before), so no need to change this for me, but could probably make things easier for all to follow, as you said above.

@eddelbuettel
Copy link
Collaborator

eddelbuettel commented Apr 12, 2022

What does git remote -v say? Do you by change still have the https:// entries?

edd@rob:~/git/rcppsmc(master)$ git remote -v
origin  git@github.com:eddelbuettel/rcppsmc.git (fetch)
origin  git@github.com:eddelbuettel/rcppsmc.git (push)
edd@rob:~/git/rcppsmc(master)$ 

Edit Never mind, my mistake. You are indeed not listed at the repo so my bad. Just fork as previously then.

- D.E. bugfix during development of conditional SMC testing.
- Some changes to the innovation and measurement variances of data simulation.
@ilyaZar
Copy link
Member Author

ilyaZar commented Apr 29, 2022

Merged the changes from the other skeleton PR, so nothing to have a look at atm. Will try to fill in my additions over the weekend so we may have a more detailed look at it next week or so!

- some bugfixes
- make simulation function simGaussianSSM easier to handle
  - reduce number of model parameters
  - but include optional initial state value x_0
- update doc
- add cSMC to output
- add preliminary plot of the results
@ilyaZar
Copy link
Member Author

ilyaZar commented May 27, 2022

I've been working for some time on this PR, specifically I was double checking the conditional resampling code I implemented back then and the working paper. Regarding the stratified resampling, the following is given (p.7 of the WP):

$$p_{t-1}^i(j) := min(\sum_{k=1}^i \bar{w}^k_{t-1}, \frac{j}{N}) - max(\sum_{k=1}^{i-1} \bar{w}^{k}_{t-1}, \frac{j-1}{N}) \in[0,1/N]$$

The definition is important since, for implementing cond. resampling we first need to sample the $k_t$ index, given $k_{t-1}, w_{t-1}^{1:N}$, then set $A_{t-1}^{k_t}=k_{t-1}$, and after that sample the remaining indices. To get the realization $k_t$, we have to be able to draw from:

$$\lambda(k_t|w_{t-1}^{1:N},k_{t-1})=\frac{p^{k_{t-1}}_{t-1}(k_t)}{w^*}$$

where

  • $w^* = \bar{w}^{k_{t-1}}_{t-1}$
  • $\bar{w}^i=w^i / \sum_{j\in (1,...N)} w^j $ is a normalized weight for some index $i\in {1,\ldots,N}$

I do not even see why $p_{t-1}^i(j)$ is always positive, the cumulative sum can certainly be smaller than $j-1/N$ i.e. $\exists j$ s.th.

$$\sum_{k=1}^i \bar{w}^k_{t-1} < j-1/N < j/N \Rightarrow min(\sum_{k=1}^i \bar{w}^k_{t-1}, j/N) = \sum_{k=1}^i \bar{w}^k_{t-1} \land max(\sum_{k=1}^{i-1} \bar{w}^{k}_{t-1}, \frac{j-1}{N}) = \frac{j-1}{N} $$

(where the latter follows by the monotonicity of the normalized weight sum). And that would mean

$$p_{t-1}^i(j)= \sum_{k=1}^i \bar{w}^k_{t-1} - \frac{j-1}{N} < 0 $$

(implictly by my assumption that such a $j$ exists). I am of course also implying that the above can be true for $i=k_{t-1}$ and $j=k_t$, but then, if $p_{t-1}^i (j)$ is not always positive, how can $\lambda$ be a density we can sample from the $k_t$ index. It must be something very obvious I am missing here...

@adamjohansen
Copy link
Collaborator

Thanks, @ilyaZar, indeed the probabilities need to be positive for it to make sense. I think the presentation in the technical report was a bit lazy (and should be fixed) and this expression applies to those i for which the probability is non-zero and that the issue is resolved by replacing the probabilties specified here with the maximum of 0 and the expression given.
What you need to arrive at is a probability corresponding to the proportion of the region between j/n and (j-1)/n which the inverse of the empirical distribution function maps to index i.
Does that resolve the issue or have I missed something else? (Thanks for pointing this out, we'll tidy it up if that ancient document ever sees the light of day.)

- revise systematic and stratiefied resampling
   - improve in function documentation of algorithms
   - fix error that allowed for negative weights by wrapping
   with max(previous_expression, 0) the strata weights

- revise plot design: use layoutMatrix to fit all into one plot

- toDo: fix error in residual resampling from back then GSOC
@ilyaZar
Copy link
Member Author

ilyaZar commented May 28, 2022

Update:

To make the PR easier to follow, I moved most of the "technical" discussion of the simulation results to #63
and summarise here only the main additions:

  • fix systematic/stratified resampling scheme (see above discussion of the exact form of $p_{t-1}^i(j)$)
  • add the conditional residual resampling scheme
  • fix the the log-likelihood estimate difference of the standard BPF and conditional BPF relative to the Kalman log-likelihood estimate by:
    • changing the autoregressive parameter from 0.9 to 0.7 (which sometimes helps as it reduces the autocorrelation in the states), increased the number of particles to 1000 and the number of simulation runs to 10000
    • fixing a subtle bug: I first called Sampler.Initialize() and only then reset the resampling scheme via Sampler.SetResampleParams() that led to the first state $X_1$ being probably incorrectly approximated; now the order is changed
    • tweaking the initialization step of the BPFs: specifically initialize all states to $X_0 = 0$ (the true initialization value of the data generating process) and perform one step, from t=0 to t=1, within void sampler<Space,Params>::Initialise(void) which uses the following pfInitialize
    void cSMCexamples_move::pfInitialise(States& stateValue,
                                            double& logweight,
                                            smc::nullParams& param)
       {
           // Initialize state value: X_0 set deterministically:
           stateValue.xState = stateInit;
           //Move/propose particles from t=0 to t = 1, i.e. X_0 to X_1
           stateValue.xState = params.statePhi * stateValue.xState + R::rnorm(0.0,sqrt(params.stateVarEvol));
           // Initilialize log-weight W_1
           logweight = computeLogLikelihood(0, stateValue);
       }
    

Maybe some of my thoughts:

  • one could think of putting all this plots jointly in a single big one (as now) or split them into smaller ones?
  • one could think of any other aesthic changes to these plots?
  • regarding the PR code and commits: the changes relate to the example implementation (cSMCexamples.cpp, cSMCexamples.h and cSMCexamples.R) as well as some changes to the main library i.e. the conditional resampling facilities fixes and additions from above; I can split up these changes a bit, the PR is quite messy atm. because there is a lot of changes to different places, but this may need some time

@ilyaZar
Copy link
Member Author

ilyaZar commented May 30, 2022

Updates to this PR

I just shortened results of this PR in the above comment, and moved the main simulation results to the corresponding discussion section.

I would like to turn this PR from a draft to be merged. What do you think @adamjohansen @LeahPrice @eddelbuettel , are there any specific requests, or things I have missed and could check additionally?

I know this is quite a lot for a PR, but at least we have now a good and thorough testing of these conditional facilities and a good reference point/ documentation we can go back to and re-check if something breaks in the future.

When this is done I would like to turn to writing up the paper finally, but I still do not have write access to the paper repo (https://github.com/rcppsmc/post-gsoc) so I don't think I can merge @LeahPrice 's PR https://github.com/rcppsmc/post-gsoc/pull/2 , unfortunately.

@ilyaZar ilyaZar force-pushed the conditional-smc-testing branch 5 times, most recently from 21629f3 to 0793cea Compare June 6, 2022 12:34
- add improvement of initialization step: move from X_0 to X_1
within the pfInitialize function
- possible bug fixed: change order of .SetResampleParams() and
.Initialise() member functions of the sampler and conditionalSampler
class; SetResampleParams should be performed BEFORE first
initializaion (otherwise first period state X_1 is incorrectly
sampled)
- add printing statemenet of the simulation iteration
- change default autoregressive parameter phi from 0.9 to 0.7
(reduce autocorellatio in the simulated model by default)
- provide possible fix to conditional residual resampling alg.
within conditionalSampler class
    -> output of outputNCestimates indicates that residual
    resampling is working fine but further checks may be necessary
- allow outputNCestimates to make plot of conditional resampling
as an illustration of this feature (changes to cSMCexamples.cpp
and cSMCexamples.R)
    -> update labels for Kalman filter and title of boxplots:
    use 'log-likelihood' as this is what is actually shown
- make boxplots horizontal
- major revision of the documentation
    - add model description to the details part of simGaussianSSM
    - remove old text parts
    - change output names to more evocative
    - fix typos: 'likelihood' must be 'log-likelihood' and other
    typos
- reduce number of "headers" in cSMCexamples.{h,cpp} for clarity
- update ChangeLog
@ilyaZar ilyaZar marked this pull request as ready for review June 9, 2022 11:37
@LeahPrice
Copy link
Collaborator

I've taken a look at the output Ilya created here and in #63 and it all looks reasonable to me. I've taken a quick look at the code but haven't checked every line thoroughly. I suggest we merge it now as Ilya suggests so he can move on with other things. What do you think @adamjohansen and @eddelbuettel?

@ilyaZar
Copy link
Member Author

ilyaZar commented Jun 10, 2022

If all give green light (please feel free to add any suggestions before of course!!) I could go and merge this (would be my first merge, so I am curious to try).

Overusing the screenshot functions once more, write access is given here in this repo (otherwise I could not merge), but (and this is strange, or I am missing something obvious) the repo where this PR lives, https://github.com/rcppsmc/post-gsoc/pull/3, seems to not grant write access or I am not able to merge because of some Github quirk I am unaware of.

See the two snippets

From this repo:

From https://github.com/rcppsmc/post-gsoc/

@adamjohansen
Copy link
Collaborator

Sorry for being so very slow to get back to you with this, given the work involved I did want to take the time to have at least a bit of a look at it (and somehow although I kept expecting to have time to fit it in other things kept happening).

It looks great. I'd be happy to merge. I'll have a slightly more careful look at the conditional resampling code when things have settled down but it looked good on a quick reading and the testing results are very suggestive.

There's a trade-off between being self-contained and re-inventing the wheel and I'd have been inclined to use, say, fkf::fks to do smoothing for a linear Gaussian model rather than implementing something that's already been done in a fast and extensively tested way but it isn't a big deal with something this simple particularly given that we only need a univariate implementation (and I can see there are advantages to avoiding multiplying dependencies which are only used with what you might think of as rather ancillary code).

@adamjohansen
Copy link
Collaborator

Overusing the screenshot functions once more, write access is given here in this repo (otherwise I could not merge), but (and this is strange, or I am missing something obvious) the repo where this PR lives, https://github.com/rcppsmc/post-gsoc/pull/3, seems to not grant write access or I am not able to merge because of some Github quirk I am unaware of.

I've now added you as a contributor with write access to that repo. Hopefully that resolves the issue.

@eddelbuettel
Copy link
Collaborator

Seconded. It's 2022, we've been here for a while together now and training wheels should be off for everybody -- we may as well set everybody to owner. Sadly, I think, changing it at the org level does not auto-propagate to repos within so if you "see something, say something" i.e. just ping @adamjohansen or myself and we adjust settings accordingly..

- does the type 2 simulation study, see the discussion in rcppsmc
- changes from manual KF and smoother implementation to FKF::fkf
and FKF::fks
@ilyaZar
Copy link
Member Author

ilyaZar commented Jun 13, 2022

@adamjohansen added FKF::fkf and FKF::fks to the last commit and removed the manual implementation.

  1. I agree that there is a trade-off and though I checked my results against the FKF and both got the same likelihood values etc. it's probably for the better to rely on that package as also other users may recognize it.

  2. Regarding code dependency/duplication: for a perfect world one might think about the "cleanest" way being with 2 "real" dependencies Rcpp and RcppAramadillo (kitten was not a strict dependency) and re-working all old examples and move them into some demo package (e.g. RcppSMCexamples or RcppSMCclassics) and polish their figures, documentation etc.

That might have the appeal that it allows demo/example packages to be with loads of dependencies, nice graphics from ggplot2, the FKF or other implementation dependencies, or other data handling functions from the tidyverse, and god knows what, and if there is an issue with some dependency it is "only" the examples package that is affected whereas the main RcppSMC implementation still works perfectly due to its minimal dependencies. But that's just a thought and certainly something for the future, if it becomes important.

  1. Also added the boxplot of the last commit to To-Dos for possible 0.2.7 release #63 for documentation purposes, but it all looks the same as expected.

@eddelbuettel @LeahPrice @adamjohansen I'll leave that PR for a couple of days if someone comes up with something, and if not, I'll be happy to merge it by the end of the week!

@adamjohansen
Copy link
Collaborator

Looks good to me @ilyaZar and it doesn't hurt to have a few fewer lines of code to maintain.
On 2. I agree in principle separating essentials from examples which have their own dependencies does make sense in principle. (But time is finite and I don't think the current setup is causing significant difficulties at the moment.)

@adamjohansen
Copy link
Collaborator

(So I think you can go ahead and merge, if that wasn't apparent in what I wrote...)

@ilyaZar ilyaZar merged commit b47484e into rcppsmc:master Jun 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants