simulate new random effects/conditional modes conditional on observed data #388
Comments

bbolker commented

Nothing obvious springs to mind (although I read this fairly quickly). Can we have a reproducible example, please?
alireza202 commented Jul 7, 2016 (edited)

In the …
Since it uses …
bbolker commented

We really need a reproducible example. An anonymized version would be fine; a simplified version (i.e., construct a small, artificial data set that reproduces the general problem) would be even better.

Just a hunch: what does …
alireza202 commented Jul 7, 2016

Here is the RData file with a smaller set of the training and testing data, along with the R code. I got a few warnings while running …
alireza202 commented Jul 8, 2016

Also, it would be great if you could point me towards any article or writing on the math behind …

bbolker commented
I'm afraid there isn't much more available about the math behind … There are a number of things going on here. The basic problem is that item-specific confidence intervals are quite tricky. I don't think any of the issues are specific to your data set or to negative binomial models (sorry for making you make up a reproducible example that I turned out not to need). I have set up the example below with the built-in …

Confidence and prediction intervals for unobserved levels

(I think, after all that, this may not have been your particular question, but it was useful to think about.) For unobserved levels, you need to make a population-level prediction (i.e., you don't have any information about the new, previously unobserved group), and there are three levels of uncertainty to consider:

1. parameter uncertainty (fixed- and random-effect, i.e. "beta" and "theta");
2. variation of groups around population mean values;
3. conditional variation, i.e. Gaussian/Poisson/binomial distributions around predicted group-level values.

Therefore, when you simulate, you have to use … The following code illustrates three different levels of variation.
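The code referred to here is not reproduced in this copy of the thread. The following is a hedged sketch of the same idea, using the built-in sleepstudy data as a stand-in; the dataset, model formula, `nsim`, and variable names are assumptions of mine, not the original example:

```r
library(lme4)
set.seed(101)

## stand-in model (the thread's actual model and data are elided above)
fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

## population-level prediction for a new, unobserved subject:
## re.form = NA sets all random effects to zero
newdat <- data.frame(Days = 0:9)
pred <- predict(fm, newdata = newdat, re.form = NA)

## level (1), parametric (beta + theta) uncertainty only:
## bootstrap the population-level prediction itself
b1 <- bootMer(fm,
              function(m) predict(m, newdata = newdat, re.form = NA),
              nsim = 200)
ci1 <- apply(b1$t, 2, quantile, c(0.025, 0.975))

## levels (1)+(2)+(3), parametric + among-group + conditional variation:
## simulate() with re.form = NA draws new random effects *and* new
## conditional (residual) noise inside every bootstrap replicate
b3 <- bootMer(fm,
              function(m) unlist(simulate(m, newdata = newdat,
                                          re.form = NA,
                                          allow.new.levels = TRUE)),
              nsim = 200)
ci3 <- apply(b3$t, 2, quantile, c(0.025, 0.975))

## the intermediate interval, (1)+(2) without conditional noise, can be
## approximated by adding multivariate-normal draws based on VarCorr(fm)
## to the bootstrapped linear predictor
```

This only sketches the structure of the three nested intervals; the original comment's exact code and plotting are lost from this copy.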
The points are the population-level prediction; the narrowest CI is parametric uncertainty only; the middle one is parametric + random-effects variation; the outer one is parametric + random-effects + conditional variation.

Confidence and prediction intervals for observed levels

This one is, believe it or not, even slightly trickier. Here we have parametric uncertainty and conditional variation as before, but the random-effect (group-level predictor) variation is more subtle. We do have some information about the uncertainty of the conditional modes (see …).
At present there is no easy way to incorporate the uncertainty in the conditional modes, but I may try to do something with it soon ...
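The reference in "(see …)" above is elided in this copy; for what it's worth, the conditional-mode variances can be inspected via `ranef()`. A minimal sketch, again assuming the sleepstudy stand-in model rather than the thread's actual model:

```r
library(lme4)
fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

## condVar = TRUE attaches the estimated (co)variances of the
## conditional modes as an attribute (historically named "postVar")
rr <- ranef(fm, condVar = TRUE)
pv <- attr(rr$Subject, "postVar")
dim(pv)  # 2 x 2 x 18: one 2x2 covariance matrix per subject
```

These per-group covariance matrices are the "some information" about conditional-mode uncertainty mentioned above; what is missing in lme4 is a built-in way to propagate them through `simulate()`/`bootMer()`.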
alireza202 commented Jul 11, 2016

Thanks for the comprehensive answer, Ben. I will closely follow future developments then.
alireza202 referenced this issue in paul-buerkner/brms on Jul 11, 2016: "Understanding predict function" #82 (closed)
bbolker commented

So, just to clarify this for the record: this boils down to a wishlist item, namely the ability to simulate new values of the conditional modes, conditioned on the observations (rather than the extremes of setting the conditional modes/random effects to zero or simulating them unconditionally)?
alireza202 commented Aug 30, 2016

My goal is to get a full prediction interval that captures all the uncertainties in a GLMM. I'm not sure what elements it needs.
bbolker commented

That may not be a sufficiently precise statement. For a start, I'm assuming that you want to compute the uncertainty for particular subjects, conditioning on the observed values?
alireza202 commented Aug 31, 2016

That would be accurate.



alireza202 commented Jul 7, 2016 (edited)

I'm doing a multilevel negative binomial regression:

and do the prediction for new data using:

Since I need the prediction interval, I use bootMer:

Now, when I plot the mean that I get from bootMer and the one from predict, they wildly differ. Here is an example:

[figure: red is the data, green is the prediction from the predict function, blue is the prediction from bootMer, with the corresponding confidence intervals; left of the dotted line is the training data, right of it the testing data]

Am I doing something wrong here? I tried doing the same thing with Poisson regression and got the same result.
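The model-fitting and bootstrap calls are elided from this copy of the post. The general pattern being described is roughly the following, shown here on simulated data; the formula, data, and `nsim` are placeholders of mine, not the poster's code:

```r
library(lme4)
set.seed(1)

## simulated stand-in data (the poster's data are not shown)
dd <- data.frame(x = rnorm(400),
                 g = factor(rep(1:20, each = 20)))
dd$y <- rnbinom(400,
                mu = exp(1 + 0.5 * dd$x + rnorm(20, sd = 0.3)[dd$g]),
                size = 2)

## multilevel negative binomial regression
fit <- glmer.nb(y ~ x + (1 | g), data = dd)

## point predictions on the response scale
pred <- predict(fit, newdata = dd, type = "response")

## bootstrap interval via bootMer: refit on parametrically simulated
## responses, predicting from each refit
bb <- bootMer(fit,
              function(m) predict(m, newdata = dd, type = "response"),
              nsim = 100)
ci <- apply(bb$t, 2, quantile, c(0.025, 0.975))
```

Note that, as discussed in the replies above, a `bootMer` interval built this way reflects parameter uncertainty but not necessarily the among-group and conditional variation needed for a full prediction interval.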