Skip to content

wrap BNP cluster samplers so only used if cluster has observations#839

Closed
paciorek wants to merge 64 commits into
BNPfrom
nosample_empty_clusters
Closed

wrap BNP cluster samplers so only used if cluster has observations#839
paciorek wants to merge 64 commits into
BNPfrom
nosample_empty_clusters

Conversation

@paciorek
Copy link
Copy Markdown
Contributor

@paciorek paciorek commented Jan 3, 2019

Do not merge: for @danielturek to review @paciorek approach to not running MCMC samplers for BNP cluster parameters when a given cluster has no constituent observations.

This work builds off @perrydv reversible jump sampler, extending it to wrap whatever sampler is assigned to the cluster parameter by configureMCMC.

danielturek and others added 19 commits November 21, 2018 15:38
* defensive WAIC model handling

* undid changes to tr travis testing again

* changed to model$calculate()

* changed to currentVals2

* commented out currentVals <- values(model)

* added back in someVals <- values(model)

* added back in someVals <- values(model)

* changed to using model$values() syntax

* using values(model, sampledNodes) again....

* wont work

* different approach: storing logProb values
also fixed bug where $waic not $WAIC used in demo code
* fix dimension handling in matrix2VecNimArr

* added test.  removed C++ diagnostic code.  Supported only matrix input.
* added warning about multiply-defined nodes in genExpandedNode...

* fix check of redefined nodes for temporary unknownIndex declarations
* monkeying with ignore.stderr

* augmented failed to create shared library msg

* full rework of compilation output to
print errors under failure

* update compiler error output so only printed upon
user request if showCompilerOutput is FALSE
* avoid flag multiply defined nodes
when node is lifted from RHS

* fix car_proper test model
and omit stray browser
* fix bug in compareFilesByLine for gold checking

* updated test-graphStructure to use std goldFile system

* updated filtering goldFile in light of 0.6-12 changes;
now that goldFile testing is fixed

* minor edit to test-dynIdx to suppress output
from sapply

* minor edits to various test files to
suppress sapply output so goldfile results match

* updated user goldfile in light of changes
s

* reinstantiated user-supplied fxn tests in test-user;
not clear why they were turned off

* clean up error msg in initialize model regarding determ nodes

* updated mcmc goldfile

* update stripping of testthat deprecation message

* clean up typo in test-mcmc.R

* fixed extraneous stuff in test-mcmc.R

* trying to fix goldfile comparisons

* debug dynIdx test failures

* try again with dynIdx test

* another try with dynIdx

* another try with dynIdx

* fix dynIdx goldfile
… control param to RW_block. (#772)

* set up RW and RW_block to calculate prior log prob first.  Added tries control param to RW_block.

* make correct separation between first and second stage of dependency calculations in sampler_RW_block

* set jump <- FALSE on -Inf prior logProb value

* put in unnecessary calls to jump() in order to trigger runif() to get same sampling sequence for testing

* fix typo in sampler_RW_block

* in inst/CppCpde/Utils.cpp decide(), force RNG use even when input is NaN, purely for test comparison purposes.

* Also modify R version of decide()

* remove code designed for matching chains to calcPriorFirst-devel-proxy

* updated test-dynamicIndexing in light
of calcPriorFirst (changes to "basic mixture model with conjugacy")

* monkey with exact mcmc output for calcPriorFirst

* monkey with exact mcmc output for calcPriorFirst(2)

* update test-trunc in light of calcPriorFirst

* remove stray line from decide in calcPriorFirst machinations

* fix bug in compareFilesByLine that was causing
no goldFile comparison to be made; ever since Sep. 2017

* update mcmc/trunc goldfiles in light of calcPriorFirst changes to RNG sequence

* updated tolerance for interval cens test
@paciorek paciorek requested a review from danielturek January 3, 2019 01:50
@paciorek
Copy link
Copy Markdown
Contributor Author

paciorek commented Jan 3, 2019

@danielturek ignore the line

clusterNodeInfo <- list(clusterVars = 'muTilde', clusterNodes = list( paste0('muTilde[', 1:50, ']')))

in configureMCMC.R. That is only there because there is some work on a separate branch that would need to get merged in for us to be able to automatically create clusterNodeInfo. At the moment, I'm just hard-coding in what clusterNodeInfo looks like for a specific example model I was using when developing this.

In case you want to actually be able to run the code, here is the example model:

code <- nimbleCode({
    xi[1:n] ~ dCRP(conc0, n)
    for(i in 1:n){
        mu[i] <- muTilde[xi[i]]  
        y[i] ~ dnorm(mu[i], 1)
    }
    for(i in 1:n) {
        muTilde[i] ~ dnorm(mu0, sd = s0)
    }
})

n = 50
model <- nimbleModel(code, constants = list(n=n), data = list(y=rnorm(n)), inits = list(xi = rep(1,n),
                                                                                        conc0 = 1, muTilde = c(1:n), mu0=0, s0=1))
conf <- configureMCMC(model)
mcmc = buildMCMC(conf)
cmodel = compileNimble(model)
cmcmc <- compileNimble(mcmc,project=model)

@paciorek
Copy link
Copy Markdown
Contributor Author

paciorek commented Jan 4, 2019

Also, @danielturek just flagging that this is now multiple times that I've revised configureMCMC such that I ignore your work on samplerAssignmentRules. I wanted to mention that as I'm not sure where that stands and want to make sure I am not messing anything up.

@danielturek
Copy link
Copy Markdown
Member

@paciorek Thanks for flagging this for review.

I've taken a pretty careful look over it, and the approach looks sound, only one thing I wasn't sure if it's correct. Could a model have more than one dCRP node, and hence the code you added for clusterNodInfo, etc, would all be invoked twice? In which case, the additional code for modifying the samplers to "wrap" these (beginning on line 271 of MCMC_configuration.R), since this appears outside of the main loop beginning on line 201, that would only catch one of the occurrences of the dCRP node, right? Since closterNodeInfo and depNode would have been overwritten. Is that right? Is this situation possible?

If it's right, then I understand why the "wrapping" code must appear later, to remove the samplers from all the clusterNodes. But still, could easily be fixed by maintaining clusterNodeInfo as a list of lists, where each element contains the clusterNodeInfo of one dCRP node.

Of course, please disregard this if I'm missing some part of the big picture, and this situation isn't possible.

Thanks for the heads up about not maintaining samplerAssignmentRules. That's fine. If that system ever comes into use, I would need to bring it up-to-date anyway.

I pushed one commit to this PR, only with stylistic changes.

@paciorek
Copy link
Copy Markdown
Contributor Author

paciorek commented Jan 7, 2019

Thanks @danielturek I think you are right that we need to handle the possibility of two dCRP nodes. I will fix that when I come back to this (hopefully soon).

danielturek and others added 5 commits January 8, 2019 16:43
* added case for 5-dimensional nimArray to NimArr.h

* adding missing "int" to 5d nimArray implementation

* updated NimArrBase.h to handle up to 5-dimensional nimArrays

* added a test for 5-dimensional nimArrays

* fixed type 4 becomes 5 in NimArr.h

* added a failing 5d nimArray test to blockTests in test-coreR.R

* Filled in tests.  Fixed two identified and one new bug.
* added inherits=FALSE to exists where seems safe

* full ddexp with tests

* align parameterization of ddexp with Gelman/BUGS; fix tests; add conjugacy
fix incorrect param names in manual tables for ddexp
paciorek and others added 26 commits January 24, 2019 07:41
…split in calculations (#845)

* make sampler_categorical and sampler_binary use a prior/dependencies split in calculations

* fix underflow in sampler_categorical
* change initializeModel approach to be faster

* speed up by using maps objects more directly
@paciorek
Copy link
Copy Markdown
Contributor Author

paciorek commented Feb 9, 2019

Ok, I am fixing the issue with possibility of having two dCRP nodes in one model and also making use of nimbleFunctionList in wrapped sampler in case one model has two types of wrapped samplers. For reasons not worth going into, I'm removing this branch and replacing with nosample_empty_clusters2.

@paciorek paciorek closed this Feb 9, 2019
@paciorek paciorek deleted the nosample_empty_clusters branch February 23, 2019 16:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants