fixed error with non-constant mu vector in dcar_proper sampler by danielturek · Pull Request #1158 · nimble-dev/nimble

danielturek · 2021-09-15T22:48:23Z

Fixes a bug in MCMC sampling of dcar_proper distributions, when the mean vector mu is non-constant.

The original (buggy) version correctly implemented the equations in the GeoBUGS User Manual, Version 1.2 (September 2004), which contains a typo in the conditional specification of the proper CAR distribution.
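For reference, a sketch of the conditional in question, written in the usual proper CAR notation. The symbols $C$ (weights), $M$ (diagonal conditional variance matrix), $\tau$ (precision) and $\gamma$ (spatial dependence parameter) are not defined in this thread and are assumed here. If the joint distribution is $x \sim \mathrm{N}\big(\mu,\ \tau^{-1}(I - \gamma C)^{-1} M\big)$, then the precision matrix is $Q = \tau M^{-1}(I - \gamma C)$, and the standard Gaussian conditioning identities give:

```latex
\begin{aligned}
\mathrm{E}[x_i \mid x_{-i}] &= \mu_i - \frac{1}{Q_{ii}} \sum_{j \ne i} Q_{ij}\,(x_j - \mu_j)
  = \mu_i + \gamma \sum_{j \in \mathcal{N}_i} C_{ij}\,(x_j - \mu_j), \\
\mathrm{Var}[x_i \mid x_{-i}] &= \frac{1}{Q_{ii}} = \frac{M_{ii}}{\tau}.
\end{aligned}
```

Note that $\mu$ enters twice, through the target mean $\mu_i$ and through each neighbor mean $\mu_j$, which is why the sampler needs both the target and the neighbor elements of mu.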

Fixes #1157

danielturek · 2021-09-15T22:49:30Z

Noting this PR might not initially pass testing, since testing may well include posterior results for proper CAR models.

paciorek · 2021-09-16T01:03:01Z

Thanks, @danielturek . Would you like to fix the equation for the conditional density for the CAR in the manual or should I?


danielturek · 2021-09-16T12:06:33Z

Done. Nice eye @paciorek

paciorek · 2021-09-16T16:08:38Z

@danielturek I have an efficiency request here. Can you call getParam for mu once (instead of twice) and then index into it for the target and for the neighbors? I did a quick check, and that change looks like it might save 10% of the MCMC time on Connor's example.

danielturek · 2021-09-16T17:14:53Z

@paciorek I just made these changes. Only one call to getParam now. See what you think.

paciorek · 2021-09-16T19:20:19Z

@danielturek did you forget to push to github?

danielturek · 2021-09-16T20:25:46Z

@paciorek Sorry. Done. Pushed just now.

paciorek · 2021-09-17T01:42:33Z

Three further thoughts (sorry to belabor things):

1. The 'plusOne' business seems hard to read. Is there a reason not to assign the full vector of means to a variable and then index that as needed to get the targetMu and neighborMus? I don't think there will be extra cost, since presumably at least a temporary vector is created from the output of getParam regardless of whether we create one explicitly. Sorry to back-seat drive, but given I'm looking at it, that's my reaction.
2. Can we take this opportunity to remove the for loop in the setup code and use %in%:

   neighborIndices[1, ] <- which(targetDCARscalarComponents %in% neighborNodes)

3. Also, do you know why we are using 1-row 2-d arrays instead of vectors for neighborsC and neighborIndices?

danielturek · 2021-09-17T02:05:34Z

No worries about pushing back on these design/implementation choices.

1. The logic is that the total number of means could be huge, while the largest number of neighbors is comparatively small. I think this makes sense.
2. We could use %in%, but I'm trying to be defensive against non-consecutive declarations of CAR process nodes. Maybe that's not possible, but the loop is explicit, and it's in setup code, so this seemed safer to me. Also note that the code you drafted would not work correctly, due to the LHS indexing, so more care and code would be necessary regardless.
3. The choice is between creating vectors with at least 2 extra elements (to avoid scalar vs. vector ambiguity, while also accounting for the num neighbors = 0 case) or using arrays, as was done here. If using vectors is known to be much better for performance, then this design choice could be changed.

paciorek · 2021-09-17T15:22:20Z

Ok, thanks for responses.

Regarding item 1, I don't see how your approach saves anything. In this code

targetNeighborMus <- model$getParam(targetDCAR, 'mu')[targetNeighborIndices[1,1:numNeighborsPlusOne]]

based on looking at the generated C++, an intermediate variable of length N is getting created based on the call to getParam, so doing the indexing within the RHS is no different than assigning the result of getParam to an explicit variable and then grabbing the subsets out of the full vector.

paciorek · 2021-09-17T15:27:00Z

Side note: I was curious whether it matters that we call getParam for every individual target, so I did some monkeying around (I needed a version of $run that takes an input parameter, which required creating sampler_BASE2). If we instead call getParam once before looping through the componentSamplerFunctions and pass the mean vector into the individual samplers, it does save a bit of time on this example (9.2 seconds vs. 9.8). I'm not sure how that would scale as N changes. I'm not advocating we do this now, but it's something to consider at some point.

danielturek · 2021-09-22T21:59:20Z

@paciorek Thanks for explaining this. I understand your comments, about each component sampler getting the entire mean vector, into a length-N vector, then subsetting as necessary.

I agree, this could be avoided by one call to getParam in the encompassing CAR sampler function, then passing the relevant neighbor mean components to each component (scalar) sampler function. Given the small amount of testing you did, my inclination is to leave this as-is - which is working, simpler, and very similar in your speed test.

How do you feel about that? Or, is there another suggestion that you made, which I missed?

paciorek · 2021-09-22T22:32:17Z

I'm happy to skip calling getParam once outside of the individual sampler functions, but I would still vote that, inside the run code of each individual sampler function, we use getParam to extract the entire mean vector into a local variable and then subset that to get the target and neighbor means. That avoids the "plusOne" approach of concatenating the target and neighbor indices, which I think makes the code harder to read. As I mentioned, in either your approach or my suggestion a temporary N-long vector is created in the C++, so calling getParam and extracting the target+neighbor values in one line of code doesn't actually save us anything.
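To make the two options concrete, here is a minimal base-R sketch. It does not use nimble: the plain vector `mu` stands in for the result of `model$getParam(targetDCAR, 'mu')`, and the values, `targetIndex`, and the target-first layout of the concatenated index vector are assumptions made for illustration, not taken from the PR diff.

```r
# Hypothetical stand-in for model$getParam(targetDCAR, 'mu') (length N = 5)
mu <- c(0.5, 1.0, 1.5, 2.0, 2.5)
targetIndex <- 3            # assumed position of the target node
neighborIndices <- c(2, 4)  # assumed positions of its neighbors

# "plusOne" style: concatenate target and neighbor indices, subset once,
# then split the result back apart (target-first layout is an assumption)
targetNeighborIndices <- c(targetIndex, neighborIndices)
numNeighborsPlusOne <- length(targetNeighborIndices)
targetNeighborMus <- mu[targetNeighborIndices[1:numNeighborsPlusOne]]
targetMu1 <- targetNeighborMus[1]
neighborMus1 <- targetNeighborMus[2:numNeighborsPlusOne]

# Local-variable style: keep the full vector and index it where needed
targetMu2 <- mu[targetIndex]
neighborMus2 <- mu[neighborIndices]

# Both styles yield the same target and neighbor means
stopifnot(identical(targetMu1, targetMu2),
          identical(neighborMus1, neighborMus2))
```

Both styles read the same elements of the full vector. The second avoids the bookkeeping of splitting a combined result, which is the readability point being made here.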

danielturek · 2021-09-22T23:18:02Z

@paciorek Ok, that makes sense. Are you able to make this change in the CAR component sampler code?

paciorek · 2021-09-22T23:59:30Z

@danielturek Yes, I'll make that change.

paciorek · 2021-09-23T01:37:54Z

Ok, we now set a variable that contains the entire vector of mu values in the run code and then index that.

I also realized that we could use match in place of the for loop in setup code; it should achieve the same robustness but be much faster. With a for loop over target elements and then the for loop in setup code, I do think we could have non-negligible time spent in that loop, and using match should speed things up by at least an order of magnitude.
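As a small base-R illustration of why match can replace the loop while `which(... %in% ...)` cannot always do so, consider the sketch below. The node names and the neighbor ordering are made up, and the assumption that neighbor order matters (for example, to stay aligned with per-neighbor weights) is mine rather than something stated in the thread.

```r
# Hypothetical node names, for illustration only
targetDCARscalarComponents <- c("x[1]", "x[2]", "x[3]", "x[4]", "x[5]")
# Neighbors of some target, deliberately not in declaration order
neighborNodes <- c("x[4]", "x[2]")

# Explicit loop: look up the position of each neighbor, one at a time
idxLoop <- integer(length(neighborNodes))
for (i in seq_along(neighborNodes)) {
    idxLoop[i] <- which(targetDCARscalarComponents == neighborNodes[i])
}

# match(): the same lookup, vectorized, preserving the order of neighborNodes
idxMatch <- match(neighborNodes, targetDCARscalarComponents)

# which(%in%): returns positions in the order of the searched vector,
# so the neighbor ordering is lost
idxIn <- which(targetDCARscalarComponents %in% neighborNodes)

stopifnot(identical(idxLoop, idxMatch))  # both are c(4L, 2L)
stopifnot(identical(idxIn, c(2L, 4L)))   # same set, different order
```

`match(x, table)` returns, for each element of `x`, the position of its first match in `table`, so it reproduces the loop's result element by element.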

@danielturek feel free to take a look.

danielturek · 2021-09-23T01:46:53Z

Just pushed a minor change of variable name.

(The previous name targetNeighborIndices was meant to suggest "target index and neighbor indices together". Changed this name back to neighborIndices)

paciorek · 2021-09-23T01:54:34Z

Good catch. Thx.


fixed error with non-constant mu vector in dcar_proper sampler

71b807b

updated testing for bug fix for dcar_proper distribution

fb94a2b

updated User Manual to reflect bug fix for dcar_proper conditional specification mean term

5b874cc

changed dcar_proper density to make one call to get_param

3da8cd2

paciorek added 2 commits September 22, 2021 18:04

speed up finding of neighbors with match()

a84e5b0

finalize speedup of CAR_proper_evaluateDensity

e08174e

variable renaming

bdf41c8

paciorek added 4 commits September 23, 2021 08:19

add note about islands and car proper

1bc6203

and remove Mi<=0 check in car proper sampling

one more comment on islands in car

ac47ca5

Merge branch 'devel' into fix_car_proper_mean

573f605

reorder car test results order given modelvalues reordering

39730ab

paciorek merged commit 5449f68 into devel Sep 24, 2021

paciorek deleted the fix_car_proper_mean branch September 24, 2021 14:53
