Exception: In serializer: Storage capacity [53] exceeded while writing value of size [1] #3182

Closed
seabbs opened this issue Apr 24, 2023 · 8 comments

@seabbs
Sponsor

seabbs commented Apr 24, 2023

Summary:

I am one of the developers of the epinowcast package, which uses Stan via CmdStan and cmdstanr. The latest CmdStan release (v2.32.0) gives the above error for some model formulations (including our default), resulting in CI failures across all platforms we test on. The same models worked as expected on v2.31.0 and earlier versions. The only model specifications that cause issues are those that have no intercept for one of our submodules. This has a range of impacts in the Stan code, but the line indicated by the error is a simple integer declaration in the data section of the model (happy to go into more detail about exactly what is happening here, but I am assuming it isn't key).

As per the error message, I am reporting this here, but given our fairly complex use case and the experimental nature of epinowcast it is very possible the bug is ours (if that is the case, any pointers as to where we have gone wrong would be appreciated).

Any pointers as to the kinds of thing that can trigger this error would also help as we look into it in more detail.

Very happy to move this to another repository or the forums etc. depending on where it best fits.

Reproducible Steps:

There is a reprex here using epinowcast: epinowcast/epinowcast#246 (comment)

Stan code: https://github.com/epinowcast/epinowcast/blob/develop/inst/stan/epinowcast.stan
Data: https://drive.google.com/file/d/1ORPb40YFhaZI5CJGayko2DBxJ_9dt_c8/view?usp=share_link
Initial conditions: https://drive.google.com/file/d/1PkmGhqDp3C6KiWkbcPlQ3gHckTCORkiE/view?usp=share_link

Note this is fairly involved and removed from the Stan code. I am working on isolating the issue with a simple reprex that does not use any of our package machinery.

Current Output:

#> Running MCMC with 2 sequential chains, with 2 thread(s) per chain...
#>
#> Chain 1 Unrecoverable error evaluating the log probability at the initial value.
#> Chain 1 Exception: In serializer: Storage capacity [53] exceeded while writing value of size [1] from position [53]. This is an internal error, if you see it please report it as an issue on the Stan github repository. (found before start of program)
#> Chain 1 Exception: In serializer: Storage capacity [53] exceeded while writing value of size [1] from position [53]. This is an internal error, if you see it please report it as an issue on the Stan github repository. (found before start of program)
#> Chain 1 Exception: In serializer: Storage capacity [53] exceeded while writing value of size [1] from position [53]. This is an internal error, if you see it please report it as an issue on the Stan github repository. (found before start of program)
#> Warning: Chain 1 finished unexpectedly!
#> Chain 2 Unrecoverable error evaluating the log probability at the initial value.
#> Chain 2 Exception: In serializer: Storage capacity [53] exceeded while writing value of size [1] from position [53]. This is an internal error, if you see it please report it as an issue on the Stan github repository. (found before start of program)
#> Chain 2 Exception: In serializer: Storage capacity [53] exceeded while writing value of size [1] from position [53]. This is an internal error, if you see it please report it as an issue on the Stan github repository. (found before start of program)
#> Chain 2 Exception: In serializer: Storage capacity [53] exceeded while writing value of size [1] from position [53]. This is an internal error, if you see it please report it as an issue on the Stan github repository. (found before start of program)
#> Warning: Chain 2 finished unexpectedly!
#> Warning: All chains finished unexpectedly! Use the $output(chain_id) method for more information.
#> Warning: Use read_cmdstan_csv() to read the results of the failed chains.
#> Warning: No chains finished successfully. Unable to retrieve the fit.
#> Error: No chains finished successfully. Unable to retrieve the sampler diagnostics.

Expected Output:

A successful call to sample which returns a cmdstan object containing posterior draws etc.

Additional Information:

I've tested both locally and in CI keeping everything the same but changing between v2.31.0 and v2.32.0.

Something to flag is that we are passing quite a lot of variables of different kinds, so this might be quite a strange edge case (which I guess is why it triggers this error).

Current Version:

v2.32.0

@bob-carpenter
Contributor

Thanks much for reporting with a reproducible example, @seabbs. This doesn't look good.

@WardBrian
Member

@seabbs is there an easy way to get your reproducible example to produce just the data (and initialization, if it's supplied) files that are ultimately being passed to cmdstan? I'm currently installing the requirements for epinowcast in R, but if you could drop those files in JSON format somewhere I could try to debug with just your Stan code and those files.

@seabbs
Sponsor Author

seabbs commented Apr 24, 2023

> Thanks much for reporting with a reproducible example, @seabbs. This doesn't look good.

No problem, sorry it isn't a cleaner reprex.

> @seabbs is there an easy way to get your reproducible example to produce just the data (and initialization, if it's supplied) files that are ultimately being passed to cmdstan? I'm currently installing the requirements for epinowcast in R, but if you could drop those files in JSON format somewhere I could try to debug with just your Stan code and those files.

Yes, I can do this. I'm in a few hours of meetings at the moment but will try to get to this in the breaks.

@seabbs
Sponsor Author

seabbs commented Apr 24, 2023

I've pulled out the data and initial conditions as JSON (hopefully in the correct format etc.) and updated the issue above.

@WardBrian
Member

Thanks, very helpful. I'm working on tracking it down now. The bug was introduced sometime between stan-dev/stanc3@7961ac5 and the release, I suspect in stan-dev/stanc3#1305.

@WardBrian
Member

Ah, I found a very small reproducible example:

parameters {
  array[0] real a;
  array[1] real b;
}

and inits.json:

{ "a":  2.1223971837869353 , "b": [ 5.1902179174507062 ] }

this yields:

Unrecoverable error evaluating the log probability at the initial value.
Exception: In serializer: Storage capacity [1] exceeded while writing value of size [1] from position [1]. This is an internal error, if you see it please report it as an issue on the Stan github repository. (found before start of program)

The mismatch between a dimension 0 object and actually supplying an initial value is the problem. This is caused by the variable expr_r_int in the reprex you provided. It has size 0 (really the size is expr_fintercept ? 1 : 0, and expr_fintercept is 0) but in the inits json it is given a value: "expr_r_int": 0.011131409787436,.
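
To illustrate why a size-0 declaration plus a supplied value produces exactly this message, here is a toy sketch in Python. It is not Stan's actual serializer: `ToySerializer`, `flatten`, and `serialize_inits` are hypothetical names, and the only assumption is that the buffer capacity is computed from the declared sizes while the writes come from the supplied inits.

```python
class ToySerializer:
    """Fixed-capacity writer; a toy stand-in, not Stan's real serializer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []

    def write(self, values):
        size = len(values)
        position = len(self.buffer)
        # Refuse any write that would run past the preallocated capacity.
        if position + size > self.capacity:
            raise OverflowError(
                f"Storage capacity [{self.capacity}] exceeded while writing "
                f"value of size [{size}] from position [{position}]"
            )
        self.buffer.extend(values)


def flatten(value):
    """Flatten a scalar or nested list into a flat list of floats."""
    if isinstance(value, list):
        return [x for item in value for x in flatten(item)]
    return [float(value)]


def serialize_inits(declared_sizes, inits):
    # Capacity comes from the declared sizes, not from the supplied values.
    writer = ToySerializer(sum(declared_sizes.values()))
    for name in declared_sizes:
        writer.write(flatten(inits[name]))
    return writer.buffer


declared = {"a": 0, "b": 1}  # array[0] real a; array[1] real b;
bad_inits = {"a": 2.1223971837869353, "b": [5.1902179174507062]}
good_inits = {"a": [], "b": [5.1902179174507062]}

print(serialize_inits(declared, good_inits))
try:
    serialize_inits(declared, bad_inits)
except OverflowError as err:
    print(err)
```

In this toy version the unexpected value for `a` fills the single available slot, so the failure is only reported when `b` is written at position 1, which is why the message does not name the variable that was actually wrong.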

The PR I identified above (stan-dev/stanc3#1305) changed how transform_inits reads in variables.
It did so in such a way that the results were the same (or better, it fixed a bug) on correct inputs, but for incorrect inputs it seems that it got pickier than it was before.

We should be generating a nicer error than we are, which is our bad!
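
Until a clearer error is available, a check along these lines could be run on the init file before sampling. This is a minimal sketch, not part of Stan or cmdstanr: `shape_of` and `check_init_shapes` are hypothetical helpers, and the expected shapes are assumed to be supplied by hand from the model's declarations. It only handles scalars and rectangular nested lists.

```python
def shape_of(value):
    """Return the shape of a scalar or rectangular nested list as a tuple."""
    if not isinstance(value, list):
        return ()
    if not value:
        return (0,)
    return (len(value),) + shape_of(value[0])


def check_init_shapes(expected_shapes, inits):
    """Return readable messages for inits whose shape differs from the declaration."""
    problems = []
    for name, expected in expected_shapes.items():
        if name not in inits:
            continue  # parameters without a supplied init are skipped here
        actual = shape_of(inits[name])
        if actual != tuple(expected):
            problems.append(
                f"{name}: declared shape {tuple(expected)}, init has shape {actual}"
            )
    return problems


# Shapes taken by hand from the small example: array[0] real a; array[1] real b;
expected = {"a": (0,), "b": (1,)}
inits = {"a": 2.1223971837869353, "b": [5.1902179174507062]}

for message in check_init_shapes(expected, inits):
    print(message)
```

For the small example this reports `a` as the offending variable, which the serializer message alone does not reveal.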

@WardBrian
Member

I've opened an issue about better errors for this sort of thing in stan-dev/stanc3#1315 and a PR which adds that in stan-dev/stanc3#1316.

In any case, you should be able to fix the issue in your model as it stands by making sure the initial values you're passing are the correct shape/size for the model as written. It appears your current code is generating initializations which are the wrong shape for (at least): rep_beta_sd, refp_sd_int, refp_mean_int, expl_beta_sd, expr_beta_sd, expr_r_int and expr_lelatent_int.
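
One way to keep the inits in step with the model is to derive each init's length from the same flag that sizes the declaration. The sketch below is in Python purely for illustration (epinowcast generates its inits in R), and `make_inits` and `r_int_value` are hypothetical names. It assumes expr_r_int is a one-dimensional container of size expr_fintercept ? 1 : 0, as described above, and that an empty container is written as `[]` in the JSON init file.

```python
import json


def make_inits(expr_fintercept, r_int_value=0.0):
    """Build an init dict whose expr_r_int length follows the intercept flag."""
    # Mirrors the size rule quoted above: expr_fintercept ? 1 : 0.
    expr_r_int = [r_int_value] if expr_fintercept else []
    return {"expr_r_int": expr_r_int}


# No intercept: the init must be empty rather than a scalar.
print(json.dumps(make_inits(0)))  # prints {"expr_r_int": []}

# With an intercept: a length-1 list holding the initial value.
print(json.dumps(make_inits(1, 0.011131409787436)))
```

The same pattern would apply to the other variables listed above, each using whichever flag or dimension sizes its declaration.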

Thank you again for reporting!

@seabbs
Sponsor Author

seabbs commented Apr 24, 2023

> The mismatch between a dimension 0 object and actually supplying an initial value is the problem. This is caused by the variable expr_r_int in the reprex you provided. It has size 0 (really the size is expr_fintercept ? 1 : 0, and expr_fintercept is 0) but in the inits json it is given a value: "expr_r_int": 0.011131409787436,.

😍

> In any case, you should be able to fix the issue in your model as it stands by making sure the initial values you're passing are the correct shape/size for the model as written. It appears your current code is generating initializations which are the wrong shape for (at least): rep_beta_sd, refp_sd_int, refp_mean_int, expl_beta_sd, expr_beta_sd, expr_r_int and expr_lelatent_int.

Nice one, thanks. I think we were being lazy and abusing this before, which has clearly come around to bite us!

Thanks for handling this so quickly and clearly.
