NaNs in generated quantities since 2.27.0 #3057
Comments
Can you try running the model and just print everything:
vector[Niactions] iaction_labu = obj_base_labu[iaction2obj];
print(obj_base_labu);
print(iaction2obj);
print(iaction_labu); |
I'm also printing:
Here's the excerpt:
And this looks fine. However, in the chain output there are mostly NaNs for |
Gentle ping. |
I can confirm that things are really odd here, and I agree that this is worrying! When running this model with 2.26.1, I get no exceptions thrown during the sampling phase and all Rhats are 1.0. When running the model with 2.27.0, I get this message a huge number of times (during sampling only):
... and even worse, the Rhats are all crap (much bigger than the 1.0 they are with 2.26.1). From skimming over the model, it uses a lot of Eigen expressions.
@SteveBronder do you recall if anything changed in this regard from 2.26 to 2.27, and if expressions could be the root cause? Also pinging @t4c1, just in case. I think we should fix this issue before releasing 2.28.
@alyst Can you think of a way to very quickly tickle the buggy behaviour with 2.27 (while not tickling it with 2.26)? E.g. use good initial values and few samples. This way we should be able to quickly bisect the commits between 2.26 and 2.27... I have never done that...
@rok-cesnovar do we have some Jenkins facility for that, or how would that work?
@alyst You could try to create a debug version of the model by these steps (for example):
|
Actually... we do have a BUG here which needs fixing. The gradients are crap with 2.27. The diagnose utility with 2.27 gives me
while with 2.26 they are ok:
This really needs fixing. |
I just wanted to build this under develop, but then I get this:
@WardBrian maybe... ideas? |
@alyst No more model simplification is needed... the cmdstan Thanks for reporting and thanks for being persistent!!! |
That error looks like what I’d expect if I fed in something that wasn’t a stan model into the compiler, like an xml file |
is what triggers the above. The stan file is posted in the issue description above and the same call just works with 2.27 and 2.26... could you try to compile yourself? |
Yeah, unable to recreate with a local build of stanc3/master |
Here is the header file that produces: https://gist.github.com/WardBrian/db929d3edbc2bc7490a168f008e76b1a |
And why does develop fail to produce the hpp? Is this an issue with stanc3? |
Stanc3 doesn't use the |
Agreed, we will postpone until we figure this out. Hopefully now that we know we are able to use diagnose, this will be quick. |
I can confirm that I have no issue compiling with develop CmdStan that uses the nightly version of stanc3 (current master) and can see the issues with diagnose. |
The minimal example I can make is this:
transformed data {
int<lower=0> N = 5;
vector[N] obsXobs_shift0_w = [-0.67082,0.5,-0.223607,-0.223607,-0.5]';
int obsXobs_shift0_v[N] = {1,2,3,1,2};
int obsXobs_shift0_u[N+1] = {1,2,3,4,5,6};
}
parameters {
vector[N] obs_shift0;
}
model {
vector[N] obs_repl_shift_unscaled;
obs_repl_shift_unscaled = csr_matrix_times_vector(N, N, obsXobs_shift0_w, obsXobs_shift0_v, obsXobs_shift0_u, obs_shift0);
obs_repl_shift_unscaled ~ std_normal();
}
You can also replace the model part with:
model {
target += sum(csr_matrix_times_vector(N, N, obsXobs_shift0_w, obsXobs_shift0_v, obsXobs_shift0_u, obs_shift0));
}
I am fairly certain the issue is in
@SteveBronder I don't have enough knowledge of the csr_* parts of the code. It would be great if you could replicate my findings. I would prefer finding a fix over a simple revert though. |
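For anyone who wants to check the minimal example by hand, here is a small Python sketch of what it should compute. It is not Stan Math code: the helper names `csr_times_vector` and `grad_of_sum` and the test vector `x` are made up for illustration. It only assumes the documented CSR convention that `v` (column indices) and `u` (row start positions) are 1-based. For the `target += sum(...)` variant the gradient does not depend on the parameter values, so a correct implementation should report the same numbers at any point.

```python
# Reference sketch (hypothetical helpers, not the Stan Math implementation).
# w: non-zero values, v: 1-based column index of each value,
# u: 1-based position in w where each row starts (length m + 1).

def csr_times_vector(m, n, w, v, u, b):
    """Multiply the m x n CSR matrix (w, v, u) by the vector b."""
    out = [0.0] * m
    for row in range(m):
        # Convert the 1-based row start/stop positions to Python's 0-based ones.
        for k in range(u[row] - 1, u[row + 1] - 1):
            out[row] += w[k] * b[v[k] - 1]
    return out

def grad_of_sum(m, n, w, v, u):
    """Gradient of sum(A * b) with respect to b, i.e. the column sums of A."""
    grad = [0.0] * n
    for row in range(m):
        for k in range(u[row] - 1, u[row + 1] - 1):
            grad[v[k] - 1] += w[k]
    return grad

N = 5
w = [-0.67082, 0.5, -0.223607, -0.223607, -0.5]
v = [1, 2, 3, 1, 2]
u = [1, 2, 3, 4, 5, 6]
x = [1.0, 2.0, 3.0, 4.0, 5.0]  # arbitrary test point, not from the thread

product = csr_times_vector(N, N, w, v, u, x)
gradient = grad_of_sum(N, N, w, v, u)
print(product)
print(gradient)
```

Each row of this matrix has exactly one non-zero, so the gradient entries for the fourth and fifth parameters are zero and the second is `0.5 + (-0.5)`. Finite, simple values like these make it easy to compare against what the diagnose utility prints.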
I am still having trouble with stanc3 master as of now. Maybe this is a Mac thing? I will wait for the RC 2.28 and try again. Really cool that @rok-cesnovar found the culprit of this! |
ARGH...
... so not even the Bernoulli example works for me! Let's wait for the RC... |
@wds15 can you try downloading the latest nightly for your system (https://github.com/stan-dev/stanc3/releases/tag/nightly) and running it directly (outside cmdstan)? |
Now it runs ok when I download as you suggest... and it shows that develop is affected by the bug as well. |
@rok-cesnovar yes I'm seeing a bug in the test for |
Thanks!
I think fixing it for 2.28 is fine. At least that is how we handled things in the past with similar problems. We will prominently list this bugfix in the release notes though. |
Just posted the patch above which should fix this, @alyst sorry for the trouble! The issue was that in the 2.27 version I was treating the non-zero indices passed by users as zero-based rather than one-based (which is rather egregious, I'm surprised we did not catch that). The patch above adds some tests to make sure the indexing is all correct.
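To make the off-by-one concrete, here is a rough Python sketch of the same kind of mistake. It is an illustration only, not the patched Stan Math code: the function `csr_times_vector` and its `index_base` parameter are hypothetical, and the inputs are the ones from the minimal example in this thread plus an arbitrary vector `b`.

```python
# Illustration only: read the same CSR inputs with the right and wrong index base.

def csr_times_vector(w, v, u, b, index_base):
    """CSR matrix-vector product, interpreting v and u with the given base."""
    m = len(u) - 1
    out = [0.0] * m
    for row in range(m):
        start = u[row] - index_base
        stop = u[row + 1] - index_base
        for k in range(start, stop):
            out[row] += w[k] * b[v[k] - index_base]
    return out

w = [-0.67082, 0.5, -0.223607, -0.223607, -0.5]
v = [1, 2, 3, 1, 2]        # 1-based column indices, as Stan users pass them
u = [1, 2, 3, 4, 5, 6]     # 1-based row start positions
b = [1.0, 2.0, 3.0, 4.0, 5.0]  # arbitrary input vector

# Correct reading: indices are 1-based.
correct = csr_times_vector(w, v, u, b, index_base=1)

# Wrong reading: treating the same indices as 0-based shifts every lookup
# by one and eventually runs past the end of w.
try:
    csr_times_vector(w, v, u, b, index_base=0)
    zero_based_failed = False
except IndexError:
    zero_based_failed = True

print(correct)
print(zero_based_failed)
```

Python's bounds checking turns the wrong reading into an `IndexError`. In C++ without bounds checks, a similar mistake would read whatever memory sits past the array, which would be consistent with the NaNs and bad gradients reported here, though that link is an assumption on my part.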
Summary:
Since Stan v2.27.0, my model generates NaNs in the generated quantities section.
In v2.26.1 it works fine. The parameters and transformed parameters blocks look fine with both versions.
Description:
I use Stan via cmdstanr. When I run the model (in NUTS mode) compiled with Stan v2.27.0 (both with and without MKL), I get tons of messages like
This refers to the generated quantities section, and it's related to the generated quantity on L630, which evaluates to NaN, although the values on the RHS are just fine.
This model works fine in Stan v2.26.1 and versions before.
Reproducible Steps:
I've uploaded my model and the data.
Unfortunately, the model is rather big.
I've tried to come up with a minimal reproducible example, but very simple examples with a single vector variable work just fine.
However, the fact that I'm doing sparse matrix multiplication at L630 is not essential. Simple indexing expressions like
also generate NaNs.
Current Output:
NaNs for generated quantities. I can provide the stan output file.
Expected Output:
The generated quantities are properly calculated and don't contain NaNs.
Additional Information:
Provide any additional information here.
Current Version:
v2.27.0