
numerical instability in beta_binomial_lpmf at large shape parameters #3153

@NikVetr


hi all,

(sorry if this is the wrong place for this)

I was fitting a mixture model that involves multiple calls to beta_binomial_lpmf, with a linear model on log(concentration) that is then converted to the shape parameters passed to beta_binomial_lpmf. Fitting the model many hundreds of times over replicate datasets, I noticed that chains would occasionally get stuck at very high values of log(concentration) and of the log target density. I think this is due to numerical instability in how beta_binomial_lpmf is computed, probably in the evaluation of the log Beta function.

As a point of contrast, if you look at, e.g., the VGAM implementation of the beta-binomial log-PMF in R, VGAM::dbetabinom.ab has the argument:

Inf.shape Numeric. A large value such that, if shape1 or shape2 exceeds this, then special measures are taken, e.g., calling dbinom. Also, if shape1 or shape2 is less than its reciprocal, then special measures are also taken. This feature/approximation is needed to avoid numerical problem with catastrophic cancellation of multiple lbeta calls.
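For illustration, here's a minimal pure-Python sketch of that kind of large-shape guard (the function names and the inf_shape threshold are my own choices for the demo, not VGAM's or Stan's actual code): when either shape exceeds the threshold, fall back to the binomial PMF at p = a / (a + b) instead of differencing enormous lbeta terms.

```python
import math

def lbeta(a, b):
    # naive log Beta via lgamma -- the differencing that cancels badly at large shapes
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def lchoose(n, k):
    # log binomial coefficient via lgamma
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def binomial_lpmf(k, n, p):
    return lchoose(n, k) + k * math.log(p) + (n - k) * math.log1p(-p)

def beta_binomial_lpmf(k, n, a, b, inf_shape=1e6):
    """Beta-binomial log-PMF with a VGAM-style large-shape guard.

    If either shape exceeds inf_shape, the distribution is numerically
    indistinguishable from a binomial with p = a / (a + b), so fall back
    to that rather than differencing huge lbeta terms.
    (inf_shape = 1e6 is an illustrative threshold, not VGAM's default.)
    """
    if a > inf_shape or b > inf_shape:
        return binomial_lpmf(k, n, a / (a + b))
    return lchoose(n, k) + lbeta(k + a, n - k + b) - lbeta(a, b)
```

Below the threshold the naive lbeta route is still accurate (at a = b = 1e4 it sits within a few thousandths of a nat of the binomial at p = 0.5), so the guard only changes the regime where cancellation would otherwise dominate.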

Here's a quick demo showing the behavior. If I fit the Stan model:

data {
  int<lower=0> count;
  int<lower=count> total;
  real<lower=0> sd_logconc;
}

parameters {
  real log_concentration;
}

model {
  log_concentration ~ normal(0, sd_logconc);
}

generated quantities {
  // Convert to alpha, beta
  real shape = exp(log_concentration) / 2;

  // Compare binomial PMF at p = 0.5 vs. beta-binomial
  real log_binomial_05 = binomial_lpmf(count | total, 0.5);
  real log_beta_binomial = beta_binomial_lpmf(count | total, shape, shape);
  real diff_lprob = log_beta_binomial - log_binomial_05;
}

with arbitrary count and total, and with sd_logconc at a small value (e.g., count = 57, total = 117, sd_logconc = 5), then diff_lprob converges to 0 as log_concentration takes on high values, as expected:

[Figure: diff_lprob vs. log_concentration, converging to 0 at high log_concentration]

but if you set sd_logconc = 50, diff_lprob grows erratic before settling at a high positive value:

[Figure: diff_lprob vs. log_concentration, growing erratic and then plateauing at a high positive value]

(in this example, the erratic behavior starts around log_concentration > 33).
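Both the erratic region and the plateau can be reproduced outside Stan with a pure-Python reimplementation of the naive lgamma-based computation (my own sketch, not Stan's actual code path, assuming the instability comes from differencing large lgamma values): in the intermediate regime the rounding error of the enormous lgamma terms dominates, and once shape is so large that count + shape rounds to shape in double precision, the two lbeta terms cancel exactly and diff_lprob sticks at a constant positive offset.

```python
import math

def lbeta(a, b):
    # naive log Beta via lgamma -- the differencing suspected of cancelling
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

count, total = 57, 117
lchoose = (math.lgamma(total + 1) - math.lgamma(count + 1)
           - math.lgamma(total - count + 1))
log_binomial_05 = lchoose + total * math.log(0.5)

diffs = {}
for log_conc in (5, 15, 25, 35, 45, 55):
    # mirror the generated quantities block: shape = exp(log_concentration) / 2
    shape = math.exp(log_conc) / 2.0
    log_beta_binomial = (lchoose
                         + lbeta(count + shape, total - count + shape)
                         - lbeta(shape, shape))
    diffs[log_conc] = log_beta_binomial - log_binomial_05
    print(f"log_concentration = {log_conc:2d}: diff_lprob = {diffs[log_conc]:+.6f}")
```

At log_concentration = 15 the diff is within ~1e-3 of 0 as it should be; by 45 the shift count + shape is absorbed entirely by rounding and the diff is stuck far from 0, consistent with the plateau in the second plot.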

The linear model I had on log(concentration) should not have put much probability at all on parameters that could produce such high values of log_concentration (doing so would imply a very informative prior on concentration), so I'm guessing the chain got there during a loosey-goosey adaptation phase (assuming the post-adaptation chain is initialized to wherever it ended up at the end of adaptation).

Here's a code snippet that generates the graphs: https://gist.github.com/NikVetr/2d99b235a66423429776762ab73c9388
