
"numeric overflow" error for models with von Mises distributions and large kappa values #1076

Open
syclik opened this issue Dec 4, 2018 · 15 comments
@syclik (Member)

syclik commented Dec 4, 2018

From @a-hurst on December 4, 2018 22:59

Summary:

Attempting to run a model that uses the von_mises_lpdf() function will cause the sampler to error out immediately if kappa values are too large (~10 or larger).

Description:

When trying to run a model that uses a von Mises distribution via the von_mises_lpdf() function, the sampler crashes immediately with an error from a Boost function (cyl_bessel_i) that raises a 'numeric overflow' exception when kappa values are too high. This appears to be either an issue with how Stan uses the Boost function or a bug in cyl_bessel_i itself.
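For context on the overflow itself: the modified Bessel function I0(kappa) in the von Mises normalizing constant grows like exp(kappa)/sqrt(2*pi*kappa), so any routine that returns I0 directly (rather than its logarithm) must overflow a double once kappa passes roughly 709. A small Python sketch of the arithmetic (an editorial illustration, not Stan code):

```python
import math

def log_i0_large(kappa):
    # Leading term of the asymptotic expansion for large kappa:
    # log I0(kappa) ~ kappa - 0.5 * log(2*pi*kappa).
    return kappa - 0.5 * math.log(2.0 * math.pi * kappa)

# The log-domain value is small and perfectly representable...
for kappa in (10.0, 100.0, 800.0):
    print(f"kappa={kappa}: log I0(kappa) ~ {log_i0_large(kappa):.3f}")

# ...but exponentiating it (which a routine returning I0 itself must
# effectively do) overflows once kappa exceeds ~709.78, the largest x
# for which exp(x) is finite in double precision.
try:
    math.exp(800.0)
except OverflowError as err:
    print("overflow:", err)
```

This is why a log-domain Bessel evaluation sidesteps the problem entirely: log I0(800) is about 795.7, a perfectly ordinary double.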

This is an issue that has come up multiple times over the past few years, from several threads in the old Stan Google Groups page (1, 2, 3) and on the new Stan forum (4). To my knowledge, a formal bug report has yet to be filed for it. I'm new to Stan (just trying to make sure someone else's analysis is reproducible before I upload everything to OSF), so the linked threads above are probably a better source for info on the issue than what I've provided here.

Thanks in advance for the help!

Reproducible Steps:

Here is the model in full:

data{
  int<lower=0> N ;
  int<lower=0> L ;
  real lrt[L] ;
  real angle[L] ;
  int<lower=1> id[L] ;
  int<lower=1,upper=3> cue[L] ;  // invalid/valid/neutral
}

transformed data{
  vector[10] zeros ;
  real neglog2pi ;
  neglog2pi = -log(2.0 * pi()) ; // log-probability of uniform component (it's data invariant)
  zeros = rep_vector(0,10) ;
}

parameters{
  vector[10] within_means ;
  vector<lower=0>[10] within_sds ;
  // correlation
  corr_matrix[10] cor ;
  //dummy variable for matt trick
  matrix[N,10] beta;
}

transformed parameters{
vector[L] p ; // for storing log-probabilities associated with model
{
  // id-level parameter values
  vector[N] id_logit_rho_intercept ;
  vector[N] id_logit_rho_cuing1 ;
  vector[N] id_logit_rho_cuing2 ;
  vector[N] id_log_kappa_intercept ;
  vector[N] id_log_kappa_cuing1 ;
  vector[N] id_log_kappa_cuing2 ;
  vector[N] id_lrt_mean_intercept ;
  vector[N] id_lrt_mean_cuing1 ;
  vector[N] id_lrt_mean_cuing2 ;
  vector[N] id_lrt_sd ;
  // id-level cell values as matrix
  vector[N] id_logit_rho[3] ;
  vector[N] id_log_kappa[3] ;
  vector[N] id_lrt_mean[3] ;
  //useful transformations
  vector[N] id_kappa[3] ;
  vector[N] id_rho[3] ;
  
  //convert from beta scale to observed scale & add group effects
  id_logit_rho_intercept = beta[,1] * within_sds[1]
  + within_means[1]
  ;
  id_logit_rho_cuing1 = beta[,2] * within_sds[2]
  + within_means[2]
  ;
  id_logit_rho_cuing2 = beta[,3] * within_sds[3]
  + within_means[3]
  ;
  id_log_kappa_intercept = beta[,4] * within_sds[4]
  + within_means[4]
  ;
  id_log_kappa_cuing1 = beta[,5] * within_sds[5]
  + within_means[5]
  ;
  id_log_kappa_cuing2 = beta[,6] * within_sds[6]
  + within_means[6]
  ;
  id_lrt_mean_intercept = beta[,7] * within_sds[7]
  + within_means[7]
  ;
  id_lrt_mean_cuing1 = beta[,8] * within_sds[8]
  + within_means[8]
  ;
  id_lrt_mean_cuing2 = beta[,9] * within_sds[9]
  + within_means[9]
  ;
  id_lrt_sd = exp(
  beta[,10] * within_sds[10]
  + within_means[10]
  )
  ;
  
  //compute values for each cell
  id_lrt_mean[1] = id_lrt_mean_intercept + id_lrt_mean_cuing2;
  id_lrt_mean[2] = id_lrt_mean_intercept - id_lrt_mean_cuing1;
  id_lrt_mean[3] = id_lrt_mean_intercept;
  
  id_logit_rho[1] = id_logit_rho_intercept + id_lrt_mean_cuing2;
  id_logit_rho[2] = id_logit_rho_intercept - id_lrt_mean_cuing1;
  id_logit_rho[3] = id_logit_rho_intercept;
  
  id_log_kappa[1] = id_log_kappa_intercept + id_lrt_mean_cuing2;
  id_log_kappa[2] = id_log_kappa_intercept + id_lrt_mean_cuing1;
  id_log_kappa[3] = id_log_kappa_intercept;
  
  //compute the transforms
  for(i in 1:3){
    id_kappa[i] = exp(id_log_kappa[i]) ;
    for(n in 1:N){
      id_rho[i][n] = inv_logit(id_logit_rho[i][n]) ;
    }
  }
  
  //iterate over trials (this version doesn't have pre-computed id cell values)
  for(l in 1:L){
    // if (id_kappa[cue[l]][id[l]] > 10) {
    //   p[l] = normal_lpdf(lrt[l] | id_lrt_mean[cue[l]][id[l]], id_lrt_sd[id[l]])
    //   + log_mix(
    //     id_rho[cue[l]][id[l]]
    //     , normal_lpdf(angle[l] | pi(), sqrt(1/id_kappa[cue[l]][id[l]]))
    //     , neglog2pi
    //   );
    // } else {
      p[l] = normal_lpdf(lrt[l]|id_lrt_mean[cue[l]][id[l]], id_lrt_sd[id[l]])
      + log_mix(
        id_rho[cue[l]][id[l]]
        , von_mises_lpdf(angle[l]|pi(), id_kappa[cue[l]][id[l]])
        , neglog2pi
      );
    // }
  }
}
}

model{
  //priors
  within_means ~ student_t(4,0,1) ;
  within_sds ~ student_t(4,0,1) ;
  within_means[1] ~ student_t(4,3,3) ; //logit-rho intercept
  within_means[4] ~ student_t(4,3,3) ; //log-kappa intercept
  cor ~ lkj_corr(4) ;
  //assert sampling of each id's betas from multivariate student-t with df=4
  for(this_id in 1:N){
    beta[this_id,] ~ multi_student_t(4,zeros,cor) ;
  }
  //update the log-probability from p (defined in transformed parameters)
  target += p ;//used to be: increment_log_prob(p) ;
}
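The commented-out branch in the transformed parameters block swaps in normal_lpdf(angle | pi(), sqrt(1/kappa)) when kappa > 10. As a quick sanity check on that substitution, here is a Python sketch (an editorial illustration, not part of the reported model; it approximates log I0 by its leading large-kappa asymptotic term) showing that the von Mises log density is very close to the corresponding normal log density once kappa is large:

```python
import math

def vonmises_lpdf(x, mu, kappa):
    # log von Mises density; log I0(kappa) is replaced by its leading
    # large-kappa asymptotic term, so this is only valid for kappa >> 1.
    log_i0 = kappa - 0.5 * math.log(2.0 * math.pi * kappa)
    return kappa * math.cos(x - mu) - math.log(2.0 * math.pi) - log_i0

def normal_lpdf(x, mu, sigma):
    return (-0.5 * math.log(2.0 * math.pi * sigma * sigma)
            - (x - mu) ** 2 / (2.0 * sigma * sigma))

# Near the mode, the two log densities agree closely for kappa = 50,
# with sigma = 1/sqrt(kappa) as in the commented-out branch above.
kappa = 50.0
for x in (math.pi, math.pi + 0.1, math.pi + 0.3):
    vm = vonmises_lpdf(x, math.pi, kappa)
    nm = normal_lpdf(x, math.pi, 1.0 / math.sqrt(kappa))
    print(f"x={x:.2f}: von_mises={vm:.4f}, normal={nm:.4f}")
```

At the mode the two expressions coincide exactly, and the discrepancy grows only with distance from the mode and shrinks as kappa increases.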

Here's the R code being used to run the model:

# Upload stan_data ----
load("endo_data_for_stan.rdata")


# Load Packages ----
library(tidyverse)
library(rstan)


# Run Stan Model 1 ----
mod = rstan::stan_model("_endo_model.stan")

post = sampling(
  mod
  , data_for_stan
  , pars = c("p")
  , include = FALSE
  , chains = 10
  , cores = 10
  , iter = 2000
  , refresh = 1
  , verbose = T
  , control = list(
    adapt_delta=0.99
    # , max_treedepth = 15
    )
)

I have an Rdata with all the data for the model as well, but I'm not sure how best to share that.

Current Output:

Here's what the output looks like (number of chains dropped to 1 for readability):


CHECKING DATA AND PREPROCESSING FOR MODEL '_endo_model' NOW.

COMPILING MODEL '_endo_model' NOW.

STARTING SAMPLER FOR MODEL '_endo_model' NOW.

SAMPLING FOR MODEL '_endo_model' NOW (CHAIN 1).
Chain 1: Unrecoverable error evaluating the log probability at the initial value.
Chain 1: Exception: Error in function boost::math::cyl_bessel_i<double>(double,double): numeric overflow  (in 'model1b1d79c30c67__endo_model' at line 120)

[1] "Error in sampler$call_sampler(args_list[[i]]) : "
[2] "  Exception: Error in function boost::math::cyl_bessel_i<double>(double,double): numeric overflow  (in 'model1b1d79c30c67__endo_model' at line 120)"
error occurred during calling the sampler; sampling not done

RStan Version:

2.18.2

R Version:

R version 3.5.1 (2018-07-02)

Operating System:

macOS 10.14.1

Copied from original issue: stan-dev/rstan#593

@syclik (Member, Author)

syclik commented Dec 4, 2018

@a-hurst, thank you very much for reporting. I'm going to move it to the Math library because that's where it'll need to be fixed.

There's going to be a little noise on this issue (the issue mover will close this automatically and I want to reopen it). I'd like the issue to remain open here until we can verify that it's been fixed properly.

@a-hurst

a-hurst commented Dec 11, 2018

So I did a bit of digging into this today to see if I could find anything helpful. Here's what I learned:

  • The bug isn't macOS specific: it happens with RStan on Ubuntu 16.04 LTS as well.
  • The bug only happens sometimes: if I try to run the sampling function a couple of times, sometimes it will work and the chain will start and proceed just fine. I tried running the model with 10 chains, and only 8 of them crashed immediately with the 'numeric overflow' exception; the 2 that didn't proceeded normally until I left for the day. In other words: if it doesn't crash the chain at the very start, it doesn't crash the chain at all.
  • It looks like there was a merged pull request from a little over a year ago (Feature/log bessel i #640) that was meant to deal with "numeric instability" in the von_mises_lpdf() function by writing a new and more robust log modified bessel function to replace the problematic one from boost. However, boost's cyl_bessel_i() function is still indirectly used twice in the von_mises_lpdf() function in certain cases (which I don't fully understand based on my reading of the code, but seem to have something to do with whether kappa is a constant or not?):

T_partials_return bessel0 = 0;
if (compute_bessel0)
  bessel0 = modified_bessel_first_kind(0, kappa_dbl[n]);
T_partials_return bessel1 = 0;
if (compute_bessel1)
  bessel1 = modified_bessel_first_kind(-1, kappa_dbl[n]);

The return values from the modified_bessel_first_kind() calls (each of which is just a wrapper for cyl_bessel_i()) only appear to be used in the case where kappa_const is false:

if (!kappa_const)
  ops_partials.edge3_.partials_[n]
      += kappa_cos / kappa_dbl[n] - bessel1 / bessel0;

Would the improved bessel function provided by @bgoodri be able to be adapted to replace the invocations of modified_bessel_first_kind() here? If not, in what use cases would those conditionals that cause modified_bessel_first_kind() to run be false when using von_mises_lpdf(), so that the bug might be avoided that way while waiting for a more permanent fix?
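As an illustration of why a log-scale (or ratio-based) replacement is plausible here: the gradient only needs the ratio bessel1/bessel0, which stays bounded (it tends to 1 as kappa grows) even though each Bessel value on its own overflows a double. A rough Python sketch (my own illustration, not Stan Math code), comparing a convergent power-series evaluation at moderate kappa against the standard large-kappa expansion of the ratio:

```python
import math

def i_ratio_series(kappa, terms=60):
    # I1(kappa)/I0(kappa) from the power series of each function.
    # Fine at moderate kappa, but the raw sums themselves overflow for
    # large kappa -- exactly the failure mode reported in this issue.
    half_k = kappa / 2.0
    i0 = sum(half_k ** (2 * m) / math.factorial(m) ** 2
             for m in range(terms))
    i1 = sum(half_k ** (2 * m + 1) / (math.factorial(m) * math.factorial(m + 1))
             for m in range(terms))
    return i1 / i0

def i_ratio_asymptotic(kappa):
    # Standard large-kappa expansion: the ratio tends to 1, so the
    # gradient term bessel1/bessel0 is well behaved even when I0 and I1
    # individually would overflow.
    return 1.0 - 1.0 / (2.0 * kappa) - 1.0 / (8.0 * kappa ** 2)

for k in (10.0, 30.0):
    print(f"kappa={k}: series={i_ratio_series(k):.6f}, "
          f"asymptotic={i_ratio_asymptotic(k):.6f}")
```

Already at kappa = 10 the two agree to about three decimal places, so evaluating the ratio directly (instead of dividing two overflowing Bessel values) looks like a viable path.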

Hope any of this is helpful in finding a fix!

@bob-carpenter (Contributor)

Thanks a lot for sharing what you found. I'm sure it'll help in debugging this.

In some cases, we just can't get the stability we need and have reverted to things like normal approximations. So we can branch on segments of the input domain if necessary.

@a-hurst

a-hurst commented Jan 5, 2019

@bob-carpenter @syclik Is there anything else I can do to help with this, whilst not being confident enough in my C or advanced math to try my hand at a fix myself?

I'm in a bit of a strange dilemma: The Stan code I posted above apparently ran without issue ~2 years ago with RStan 2.15 on a rented Amazon Linux VM (a colleague did the analysis for me), and I wrote up a whole paper based on the findings from those posterior samples. Now there's an upcoming special issue that the paper would be great for, with a submission deadline at the end of this month, but I can't seem to get the analysis code to run due to this bug and thus can't put it on OSF (my colleague says he remembers encountering the same bug on his MacBook, and doesn't remember how he got it to work on the Amazon VM). Since I'd ideally like my science to be reproducible, I'm willing to do any background work that would make a fix easier.

If it would help, I can post the .Rdata file with the successful posterior distribution here so you can look at the compiled model code from it (I've already confirmed that the Stan code in the object is identical to that posted above) and see if that sheds light on anything. I have yet to try running the old compiled model with its matching old version of rStan in a VM, but that's on my list of things to try next.

Thanks again!

@bbbales2 (Member)

bbbales2 commented Jan 5, 2019

In other words: if it doesn't crash the chain at the very start, it doesn't crash the chain at all.

It sounds like the initial conditions are making it difficult to compute the log densities. It's totally possible that these numeric difficulties don't really exist where your posterior mass is and it's just an artifact of the random initialization.

You might be able to get away from these crazy initial values by just making the random initializations smaller. Check the "init_r" argument to stan in rstan. Maybe make it 1 instead of 2 (the default) and see if that gets rid of the problem. You can use custom initial values to try to track down more precisely what is blowing up here. You can also add prints to your model since you'll know pretty quickly if the model is or is not working (and check which parameters seem to cause the blow up).
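To put numbers on those "crazy initial values", here is a back-of-the-envelope Python sketch (an editorial addition, assuming the parameterization from the model above, where log_kappa is built from beta * within_sd + within_mean). It suggests why init_r = 2 can routinely overflow cyl_bessel_i while init_r = 1 cannot:

```python
import math

# Worst-case initial log-kappa under Stan's uniform(-r, r) initialization
# of unconstrained parameters. The <lower=0> within_sds parameter is
# initialized on the log scale, so its initial value can reach exp(r).
def worst_case_log_kappa(r):
    within_mean = r          # unconstrained, up to r
    within_sd = math.exp(r)  # positive-constrained, up to exp(r)
    beta = r                 # unconstrained, up to r
    return beta * within_sd + within_mean

# I0(kappa) grows like exp(kappa), so it overflows a double once kappa
# exceeds ~709. With r = 2 the worst-case kappa is around exp(16.8) ~ 2e7,
# far past that threshold; with r = 1 it is only around exp(3.7) ~ 41.
for r in (2.0, 1.5, 1.0):
    lk = worst_case_log_kappa(r)
    print(f"init_r={r}: log_kappa up to ~{lk:.2f}, "
          f"kappa up to ~{math.exp(lk):.3g}")
```

This also matches the later observation in the thread that init_r = 1.5 still fails occasionally: its worst-case kappa (~exp(8.2) ~ 3700) is smaller, but still well past the overflow threshold.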

@bob-carpenter (Contributor)

bob-carpenter commented Jan 6, 2019 via email

@a-hurst

a-hurst commented Jan 6, 2019

@bbbales2 @bob-carpenter thank you both for your helpful replies, it sounds extremely likely that this is a problem with the initial parameter values sometimes starting too high or low. Looking at the posterior object it looks like the default init radius of 2 was used for the successful run, but I'm going to try extracting the random seed and init values from it and seeing if the model runs now when I use them. Thanks again!

@syclik (Member, Author)

syclik commented Jan 7, 2019

I think I've seen something similar and it's been fixed post v2.18, but we should verify.

I think what's going on is that certain exceptions aren't caught, which ends up completely stopping the sampler from running. Instead, the exception should be caught and treated as a rejection.

(With this particular issue, this might not have been a problem with the version of Boost we used in 2.15.)

@syclik syclik added the bug label Jan 7, 2019
@a-hurst

a-hurst commented Jan 9, 2019

If this is the case, should I try installing the latest Stan/RStan from the develop branch on GitHub and see if the bug still occurs?

EDIT: Just tried again with RStan installed from source using

remotes::install_github("stan-dev/rstan", ref = "develop", 
                        subdir = "rstan/rstan", build_opts = "")

and I get the same error:

Chain 3: Unrecoverable error evaluating the log probability at the initial value.
Chain 3: Exception: Error in function boost::math::cyl_bessel_i<double>(double,double): numeric overflow  (in 'model4b485d3c172d__endo_model' at line 120)

[1] "Error in sampler$call_sampler(args_list[[i]]) : "
[2] "  Exception: Error in function boost::math::cyl_bessel_i<double>(double,double): numeric overflow  (in 'model4b485d3c172d__endo_model' at line 120)"
Chain 6: Unrecoverable error evaluating the log probability at the initial value.
Chain 6: Exception: Error in function boost::math::cyl_bessel_i<double>(double,double): numeric overflow  (in 'model4b485d3c172d__endo_model' at line 120)

error occurred during calling the sampler; sampling not done
[1] "Error in sampler$call_sampler(args_list[[i]]) : "
[2] "  Exception: Error in function boost::math::cyl_bessel_i<double>(double,double): numeric overflow  (in 'model4b485d3c172d__endo_model' at line 120)"
error occurred during calling the sampler; sampling not done

Unless I didn't install the latest Stan/RStan from source properly, it looks like the bug is still present post 2.18 as well.

@a-hurst

a-hurst commented Jan 9, 2019

Okay, after some reading up, I understand now that the 'develop' branch of RStan != the 'develop' branch of Stan, and that I would need to use CmdStan somehow if I wanted to try out the bleeding-edge version. Given that I don't understand CmdStan well enough to test out the model this way, I'm going to leave it for now.

Also, I just tried the suggestion of setting init_r to 1 and all 10 chains started properly, so that seems to confirm it's a problem with starting values being too extreme. Setting it to 1.5 made 2 of the 10 chains fail (as opposed to the usual ~8/10). Does changing this constraint have any notable effects on the output model that I should be worried about?

Oddly enough, I tried reusing the seed and inits from the old 2017 stanfit as arguments for the new one and I still got the exception on most of the chains, so the fact that it worked a couple versions ago doesn't seem to be due to some particularly lucky random values that avoided an overflow. This lends some support to @syclik's Boost theory, though the fact that this bug has been reported as early as 2016 in the old Stan mailing list would make that very odd, since it would have been broken, fixed, and then broken again by a later Boost release. Maybe the Amazon VM was using a custom version of Boost?

@bbbales2 (Member)

Does changing this constraint have any notable effects on the output model that I should be worried about?

Shouldn't matter, but this is a function of your problem. Did changing this affect your conclusions at all? If it didn't, rock n' roll. bessel functions n' such can be very finicky, so it's not surprising one might blow up if it gets a weird input. Since you're not seeing this problem after warmup, no need to worry. All sorts of weird things are going on in warmup which is why we don't use those samples in our posterior estimates.

As far as I know 2 is just an arbitrary number.

@bob-carpenter (Contributor)

bob-carpenter commented Jan 10, 2019 via email

@tbrown122387

FWIW, I was brought here by the same error message, and for me, it wasn't really a bug. Personally, the reason I was getting it was because I was simulating from a nonsensical noninformative prior. I had one crazy variance parameter out of many, and that in turn generated some even crazier latent data.

@bob-carpenter (Contributor)

bob-carpenter commented Feb 3, 2020 via email

@venpopov (Contributor)

Fixed by #3036
