Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rgampois seems to be incorrectly defined ( with respect to neg_binomial_2 from stan) #53

Closed
grepinsight opened this issue Sep 22, 2016 · 3 comments
Labels

Comments

@grepinsight
Copy link

grepinsight commented Sep 22, 2016

Hi,

Thank you so much writing an amazing textbook. I really enjoyed it a lot and also really liked the rethinking package. It made bayesian analysis enjoyable thing to do.

While using rethinking::rgampois, I noticed some inconsistency.
Here's my small experiment that illustrates the point:

# this is the function definition of rethinking::rgampois 
rgampois_rethinking <- function (n, mu, scale)  {
  shape <- mu/scale
  prob <- 1/(1 + scale)
  rnbinom(n, size = shape, prob = prob)
}

# this is what I propose
rgampois_proposed <- function(n, mu, scale) {
  prob <- scale/(scale + mu) 
 #  this definition is directly copied off of the documentation of rnbionom : 
 # "...An alternative parametrization (often used in ecology) is 
 # by the mean mu (see above), and size, the dispersion parameter, 
 # where prob = size/(size+mu). The variance is mu + mu^2/size in this parametrization...."

  rnbinom(n = n, size = scale,  prob = prob)
}
library(tidyverse)
library(rethinking)

b0_truth  <- 3
phi_truth <- 10
data_proposed <- rgampois_proposed(100,exp(b0_truth), phi_truth)
data_rethinking <- rgampois_rethinking(100, exp(b0_truth), phi_truth)

Plot

We can see from the bottom plot that the samples are very different from each other

list(data_proposed=data_proposed, data_rethinking=data_rethinking) %>%
  bind_rows() %>%
  gather(type, data) %>%
  ggplot(aes(x=data, fill=type)) +
  geom_density(alpha=0.3)

Fitting the data with map2stan

m2s_proposed <- map2stan(
  alist(
    y ~ dgampois(mu,phi),
    log(mu) ~ b0,
    b0 ~ dnorm(0,1),
    phi ~ dcauchy(0,1)
  ),
  data=data.frame(y=data_proposed)
)

m2s_rethinking <- map2stan(
  alist(
    y ~ dgampois(mu,phi),
    log(mu) ~ b0,
    b0 ~ dnorm(0,1),
    phi ~ dcauchy(0,1)
  ),
  data=data.frame(y=data_rethinking)
)
precis(m2s_proposed)
##     Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
## b0  3.03   0.05       2.97       3.11   734    1
## phi 8.27   1.91       5.44      11.07   753    1
precis(m2s_rethinking)
##     Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
## b0  3.05   0.09       2.92       3.18   719    1
## phi 1.78   0.28       1.36       2.21   616    1

As you can see, the estimate of phi is more or less closer to the correct phi_truth=10 in m2s_proposed but way off in m2s_rethinking.

@rmcelreath
Copy link
Owner

Thanks. If I understand right, the issue is just that the internal Stan function has a different parameterization than is assumed to be consistent with random number generator function? The map2stan model seems to be working fine.

@rmcelreath rmcelreath added the bug label Sep 23, 2016
@grepinsight
Copy link
Author

You are right.

Based on my observation, I think the parameterization of negative binomial in stan and rethinking are different as follows:

Using X ~ Gamma(a, b) where E[X]= a * b, Var[X]=a* b^2 as a reference gamma distribution(where a is a shape parameter and b a scale parameter),

  1. In rgampois of rethinking, the focus is mu and b( which is the scale parameter of the above gamma distribution), hence its function definition.
  2. In neg_binomial_2 of stan, the focus is on mu and a(which is a shape parameter of the above gamma distribution), which is also an inverse dispersion parameter. The critical insight I learned while debugging this is that the notion of dispersion is embeded in a as opposed to b.

I was a bit confused initially, because when I think of negative binomial(or gamma-poisson mixture), I always think of poisson + overdispersion. So I automatically assumed that "scale" represented the parameter for the overdispersion, when in fact it's not.

@stevejburr
Copy link

Just found this bug myself - I was going to post an issue but found this one.

Rethinking will fit an NBD model correctly using rgampois - I can get results to match brms() glm.nb() and match the model parameters for some simulated data. (I can share detail on this if helpful)

However, if you run sim() on the fitted rethinking model, then the simulated data won't be right as there's a mismatch between the different parameterisations. To get the correct simulations, you need to take the fitted values and pass them to the built in rnbinom function with different transformations.

I agree with rgampois_proposed() as above for getting the right numbers out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants