Skip to content

fixed error with non-constant mu vector in dcar_proper sampler#1158

Merged
paciorek merged 11 commits into
develfrom
fix_car_proper_mean
Sep 24, 2021
Merged

fixed error with non-constant mu vector in dcar_proper sampler#1158
paciorek merged 11 commits into
develfrom
fix_car_proper_mean

Conversation

@danielturek
Copy link
Copy Markdown
Member

Fixes a bug in MCMC sampling of dcar_proper distributions, when the mean vector mu is non-constant.

The original (buggy) version correctly implemented the equations in the GeoBUGS User Manual, Version 1.2 (September 2004), which contains a typo in the conditional specification of the proper CAR distribution.

Fixes #1157

@danielturek
Copy link
Copy Markdown
Member Author

Noting this PR might not initially pass testing, since testing may well include posterior results for proper CAR models.

@paciorek
Copy link
Copy Markdown
Contributor

Thanks, @danielturek . Would you like to fix the equation for the conditional density for the CAR in the manual or should I?

@danielturek
Copy link
Copy Markdown
Member Author

Done. Nice eye @paciorek

@paciorek
Copy link
Copy Markdown
Contributor

paciorek commented Sep 16, 2021

@danielturek I have an efficiency request here. Can you call getParam for mu once (instead of twice) and then index into it for the target and for the neighbors. I did a quick check and changing to that looks like it might save 10% of the MCMC time on Connor's example.

@danielturek
Copy link
Copy Markdown
Member Author

@paciorek I just made these changes. Only one call to getParam now. See what you think.

@paciorek
Copy link
Copy Markdown
Contributor

@danielturek did you forget to push to github?

@danielturek
Copy link
Copy Markdown
Member Author

@paciorek Sorry. Done. Pushed just now.

@paciorek
Copy link
Copy Markdown
Contributor

Three further thoughts (sorry to belabor things)

  1. So the 'plusOne' business seems hard to read. Is there a reason not to assign the full vector of means to a variable and then index that as needed to get the targetMu and neighborMus? I don't think there will be extra cost since there presumably is at least temporary vector created by the output of getParam, regardless of whether we create one explicitly. Sorry to back-seat drive, but given I'm looking at it, that's my reaction.
  2. Can we take this opportunity to remove the for loop in the setup code and use %in%:
neighborIndices[1, ] <- which(targetDCARscalarComponents %in% neighborNodes)
  1. Also, do you know why we are using 1-row 2-d arrays instead of a vector for neighborsC, and neighborIndices?

@danielturek
Copy link
Copy Markdown
Member Author

No worries about pushing back on these design/implementation choices.

  1. Logic is that the total number of means could be huge, but largest number of neighbors comparatively small. I think this makes sense.
  2. Could use %in%, but trying to be defensive against non-consecutive declarations of CAR process nodes. Maybe that's not possible, but this is explicit, and it's in setup code, so this seemed safer to me. Also noting, the code you drafted would not work, would be incorrect, due to LHS indexing - so more care and code would be necessary regardless.
  3. Option is between creating vectors of at least +2 length extra (to avoid scalar vs. vector ambiguity, while also accounting for the num neighbors = 0 case) or otherwise to use arrays, as was done here. If using vectors is known to be much better for performance reasons, then this design choice could be changed.

@paciorek
Copy link
Copy Markdown
Contributor

Ok, thanks for responses.

Regarding item 1, I don't see how your approach saves anything. In this code

targetNeighborMus <- model$getParam(targetDCAR, 'mu')[targetNeighborIndices[1,1:numNeighborsPlusOne]]

based on looking at the generated C++, an intermediate variable of length N is getting created based on the call to getParam, so doing the indexing within the RHS is no different than assigning the result of getParam to an explicit variable and then grabbing the subsets out of the full vector.

@paciorek
Copy link
Copy Markdown
Contributor

Side note: I was curious if it mattered that we call getParam on every individual target, so I did some monkeying around (I needed to create a version of $run that took an input parameter, which required creating sampler_BASE2) and if we instead call getParam before looping through the componentSamplerFunctions and pass the mean vector into the individual samplers it does save a bit of time on this example (9.2 seconds vs. 9.8). Not sure how that would scale as N changes. Not advocating we do this now, but something to consider more at some point.

@danielturek
Copy link
Copy Markdown
Member Author

@paciorek Thanks for explaining this. I understand your comments, about each component sampler getting the entire mean vector, into a length-N vector, then subsetting as necessary.

I agree, this could be avoided by one call to getParam in the encompassing CAR sampler function, then passing the relevant neighbor mean components to each component (scalar) sampler function. Given the small amount of testing you did, my inclination is to leave this as-is - which is working, simpler, and very similar in your speed test.

How do you feel about that? Or, is there another suggestion that you made, which I missed?

@paciorek
Copy link
Copy Markdown
Contributor

I'm happy to skip using getParam once outside of the individual sampler functions, but I would still vote that we use getParam inside the individual sampler function run code to extract the entire mean vector to a local variable in the run code and then subset that to get the target and the neighbor means, avoiding the "plusOne" approach of concatenating the target and neighbor indices, which I think makes the code harder to read. As I mentioned in either your approach or my suggestion, there is a temporary N-long vector being created in the C++, so using getParam and then extracting the target+neighbor values in one line of code doesn't actually save us anything.

@danielturek
Copy link
Copy Markdown
Member Author

@paciorek Ok, that makes sense. Are you able to make this change in the CAR component sampler code?

@paciorek
Copy link
Copy Markdown
Contributor

@danielturek Yes, I'll make that change.

@paciorek
Copy link
Copy Markdown
Contributor

Ok, we now set a variable that contains the entire vector of mu values in the run code and then index that.

I also realized that we could use match in place of the for loop in setup code and it should achieve the same robustness but much faster. I do think that with a for loop over target elements and then the for loop in setup code that we could have non-negligible time spent on that loop and the use of match should speed things up by at least an order of magnitude if not more.

@danielturek feel free to take a look.

@danielturek
Copy link
Copy Markdown
Member Author

Just pushed a minor change of variable name.

(The previous name targetNeighborIndices was meant to suggest "target index and neighbor indices together". Changed this name back to neighborIndices)

@paciorek
Copy link
Copy Markdown
Contributor

Good catch. Thx.

@paciorek paciorek merged commit 5449f68 into devel Sep 24, 2021
@paciorek paciorek deleted the fix_car_proper_mean branch September 24, 2021 14:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

dcar_proper posterior appearing biased with covariates in monte carlo study

2 participants