[WIP] Feature/issue 2224 allow abs state tols #2244
Conversation
Sure, no problem. This is a 3-minute job. Once this PR is in, just make a stanc3 issue and please also assign me. EDIT: or if you need a version to test this in a model, let me know and I can do it now.
If you could prep a branch for testing in stanc3, that would be great. It's good for any reviewer here to know that things will work out upstream.
@bbbales2 maybe you are going to review anyway, but I would like to hear your opinion about passing absolute tolerances by state, which is done via Eigen vectors. I was thinking about doing this via std vectors, but since we handle the state vectors of the ODE system as Eigen things, an Eigen vector makes sense here. Would you agree with an Eigen vector, or prefer a std vector for this?
The new interfaces favor Eigen vectors, so that'd be my preference here.

As far as the feature goes, I'm just so disconnected from ODE modeling in my day to day that it's hard for me to think about how to evaluate these things. This looks like the type of thing you'd want if some variables get near zero but others stay away. I saw the conversation on Discourse over the sensitivity tolerance option that we turned on a couple releases ago, which makes me instantly nervous about changing anything. This isn't necessarily changing anything though -- it's adding a new argument, which, the way Stan works, means adding a new function.

Would default arguments or named arguments at a stanc3 level get us anywhere with this? In that case we'd be more free to add new options and such without messing up the default experience or exploding the number of different ODE functions. I don't know how difficult that would be, but the combinatorics of new ODE functions isn't gonna be fun at all, nor is changing people's defaults.
This change allows setting the absolute error for each system state. Since absolute errors are only relevant for things which go down to zero, this change matters if you have an ODE system with multiple quantities tending towards zero which have different noise levels. An example is a pharmacokinetic / pharmacodynamic model which looks at drug concentrations and the effects of the drug on some clinical output. The concentrations and the clinical measurements live on different scales, and their noise levels are usually different.
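To make the proposal concrete, a minimal sketch of the existing scalar call next to the proposed per-state overload (rhs, y0, t0, ts, n, and all tolerance values are illustrative placeholders, not taken from this PR):

vector[2] abs_tol = [1e-10, 1e-6]';  // one absolute tolerance per state
// existing signature: scalar absolute tolerance
vector[2] ya[n] = ode_bdf_tol(rhs, y0, t0, ts, 1e-8, 1e-10, 10000);
// proposed overload: vector absolute tolerance, same length as the state
vector[2] yb[n] = ode_bdf_tol(rhs, y0, t0, ts, 1e-8, abs_tol, 10000);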
Well, I tried to do a more thorough evaluation of the change. I think the study I did justifies our change. This PR would not need to introduce a new function name -- my suggestion here is to go with a simple overload of the existing ode_bdf_tol and ode_adams_tol calls.
Changing defaults is never nice, sure. To me the default calls stay untouched anyway. In any case, this change here extends only the advanced use case of the ODE functions. Named arguments sound like a good thing at some point, but for now everything is still fine. Maybe I am going to play with scaling factors at a later point, which would give the solver order-of-magnitude information on the parameter sensitivities to tune it even further, but let's go one step at a time. If you could review the change, that would be great, of course. I am not so sure who else could review it with limited effort, since you know the structure of the code very well... and it's really not a massive change, as the absolute tolerance becomes a vector, nothing more. Once @rok-cesnovar has a prototype of the stanc3 parser for this, I am happy to code up an example using this facility with a PK/PD example.
Hmm, you're right that we do have overloads at the Stan language level. And that's only three additional functions to doc if we do each, I guess. Also, can we do this for rk45? Couldn't a user also work around this issue manually by doing:

vector f(real t, vector y_, vector scales) {
  vector[rows(y_)] y = scales .* y_;  // undo the scaling inside the rhs
  ...  // compute dydt from y in the original, unscaled space
return dydt ./ scales;
}

And the difficulty from a user perspective is all going to be in documentation anyway -- how we communicate what this is for and what the interface is, etc. We could just document how to do a scale transformation like this somewhere, and then that saves us from all the coding, testing, interface doc, etc. It can just be a user's guide/case study thing.
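For completeness, a hedged sketch of the full workaround, assuming a 2-state system (the scales values and variable names are made up for illustration): under the usual error criterion this effectively applies abs_tol * scales[i] to state i, but the initial conditions and outputs must be mapped by hand.

vector[2] scales = [1.0, 1e-6]';     // assumed per-state scales
vector[2] y0_scaled = y0 ./ scales;  // move initial conditions into scaled space
vector[2] y_scaled[n] = ode_bdf_tol(f, y0_scaled, t0, ts, rel_tol, abs_tol, 10000, scales);
vector[2] y[n];
for (i in 1:n)
  y[i] = scales .* y_scaled[i];      // undo the scaling on the outputs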
At this point I'm basically willing to accept that there is a use case for any option CVODES includes in their library. I thought the experiments you did were interesting and the plots were really good. However, it seems like with these ODE things it's fairly easy to find a counterexample to any attempt to establish Good Defaults, and so our practical options are less about justifying defaults and more about how to expose options in a way that isn't a ton of work.
So, yes, the approach you outline of rescaling stuff should be equivalent to setting the per-state tolerances. It's just that you force the user to change the problem formulation in a possibly awkward way which is prone to errors, I think. Just giving users the option to set tolerances differently by state is a lot easier than having to rescale the ODE integration process and then scale back when you compare things to the data. I think this change is easy enough for us to implement that we should do it. ODE problems are really a difficult thing, and it's good to give users knobs which improve performance... and absolute tolerance can be such a knob, depending on the problem.

The plots I did were very helpful for me. The counter-example given by Yi did not even meet his own bar for what he considers a worthwhile experiment for a bdf integrator, but it's definitely true that there are no ubiquitous defaults. So, yes, I think exposing options from CVODES is a sensible thing to do. These options are in that software for some reason. I do not quite see that we have to make firm recommendations in our doc about every option of ODE solving. The topic is way too complex... but we should provide at least some control for those users who have problems where it's worth exploring these options.

It's possible to code this up for RK45 as well, but it's far more work, since odeint does not have the concept of a per-state absolute tolerance. One would have to code the error checking in a custom way completely from scratch, which could be easy or a lot of work (don't know yet).
That's true that you have to scale the output and initial conditions too, and that is inconvenient. Also, as much as I don't want to break symmetry in the solver interfaces because that seems confusing, we're certainly not going to duplicate all the other CVODES behaviors that are not implemented in rk45 as we enable them. Vaguely this seems okay, cause we can do it without modifying defaults via overloading. I'm gonna have to think on this some more. But I guess a simple way of possibly avoiding another disagreement here is asking -- @yizhang-yiz -- vector absolute tolerance arguments: do you think exposing this is good or no (assuming we expose this in a way that does not affect current defaults)?
And by that I mean: if I go silent here for a couple days, ping me. Just not sure I'm thinking all the way through the problem at this second.
https://github.com/stan-dev/stanc3/actions/runs/409006719

The signature compiles but the error messages were not updated. But for testing it will suffice.
@bbbales2 okdoki... I now used @rok-cesnovar's stanc3 binary together with the really cool new cmdstanr expose_stan_function feature to benchmark a very simple system. The ODE system is a 2-cmt oral dosing PK system coupled to a turnover PD; the Stan code below gives the equations. I ran this ODE with fixed parameter values at a low-precision (1E-4) and a high-precision (1E-10) absolute tolerance and a relative tolerance of 1E-8. This evaluation is for the basic case without any sensitivities:

Unit: microseconds
expr min lq mean median uq max neval
high 430.055 480.4465 515.0107 506.0725 524.1485 852.905 500
low1 418.729 467.1560 499.1071 493.1225 504.7150 835.634 500
low2 395.812 444.2555 472.7079 467.3970 481.3845 692.243 500
low3 381.697 427.0020 460.9625 450.7065 465.1390 853.172 500
low4 180.049 198.4460 212.8404 209.5215 213.8555 380.342 500
low_pd 402.840 450.7460 483.4028 474.3370 488.8185 1351.513 500
high_pkpd 400.880 446.2325 478.1826 468.0600 481.8160 811.185 500

high has all 4 states at high precision; low1..low4 have state 1, states 1+2, states 1-3, and states 1-4, respectively, at the lower precision. Obviously, the biggest gain comes from setting the absolute tolerance low for all states, but the speed gains from setting things only partially to a lower precision are considerable... in particular, the gains should be much larger once sensitivities are computed, as the lower precision for one state is propagated to the respective sensitivities. To me this all looks quite good: an easy knob for users to get more speed with these nasty ODE problems.

functions {
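// two-compartment PK model with oral dosing: a[1] gut, a[2] central, a[3] peripheral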
vector pk_2cmt(real time, vector a, real ka, real ke, real k12, real k21) {
vector[3] dydt;
dydt[1] = -ka * a[1];
dydt[2] = +ka * a[1] - ke * a[2] - k12 * a[2] + k21 * a[3];
dydt[3] = +k12 * a[2] - k21 * a[3];
return dydt;
}
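// turnover (indirect response) PD model: zero-order production kin, first-order loss kout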
vector turnover(real time, vector pd, real kin, real kout) {
vector[1] dydt;
dydt[1] = kin - kout * pd[1];
return dydt;
}
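// coupled PK/PD right-hand side: drug amount in the central compartment inhibits kout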
vector pkpd_rhs(real time, vector pkpd,
real ka, real ke, real k12, real k21,
real kin, real kout, real ea50) {
vector[rows(pkpd)] dydt;
real a_main = pkpd[2];
dydt[1:3] = pk_2cmt(time, pkpd[1:3], ka, ke, k12, k21);
dydt[4:4] = turnover(time, pkpd[4:4], kin, kout * (1.0 - a_main / (ea50 + a_main)));
return dydt;
}
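// convert the clearance parameterization (Cl, Q, V1, V2) into rate constants (ke, k12, k21)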
vector transform_2cmt_cl_rate(real Cl, real Q, real V1, real V2) {
real k12 = Q / V1;
return [
Cl / V1, // ke
k12, // k12
k12 * V1 / V2 // k21
]';
}
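// solve the PK system (per-state abs_tol) and convert amounts to concentrations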
vector[] pk_simulate(real t0, vector a0, real[] ts, real rel_tol, vector abs_tol, real ka, real Cl, real Q, real V1, real V2) {
int num_sim = num_elements(ts);
vector[3] rates = transform_2cmt_cl_rate(Cl, Q, V1, V2);
real ke = rates[1];
real k12 = rates[2];
real k21 = rates[3];
vector[3] y[num_sim] = ode_bdf_tol(pk_2cmt, a0, t0, ts, rel_tol, abs_tol, 10000, ka, ke, k12, k21);
for(i in 1:num_sim) {
y[i,2] = y[i,2] / V1;
y[i,3] = y[i,3] / V2;
}
return y;
}
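// solve the coupled PK/PD system (per-state abs_tol) and convert amounts to concentrations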
vector[] pkpd_simulate(real t0, vector a0, real[] ts, real rel_tol, vector abs_tol,
real ka, real Cl, real Q, real V1, real V2,
real kin, real kout, real ec50) {
int num_sim = num_elements(ts);
vector[3] rates = transform_2cmt_cl_rate(Cl, Q, V1, V2);
real ke = rates[1];
real k12 = rates[2];
real k21 = rates[3];
vector[4] y[num_sim] = ode_bdf_tol(pkpd_rhs, a0, t0, ts, rel_tol, abs_tol, 10000,
ka, ke, k12, k21,
kin, kout, ec50 * V1);
for(i in 1:num_sim) {
y[i,2] = y[i,2] / V1;
y[i,3] = y[i,3] / V2;
}
return y;
}
}
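For orientation, a plausible reconstruction of the tolerance vectors behind the scenario names above (the exact vectors are not shown in the thread, so treat these as assumptions; rel_tol was 1E-8 throughout):

vector[4] abs_tol_high = rep_vector(1e-10, 4);         // "high": all 4 states tight
vector[4] abs_tol_low2 = [1e-4, 1e-4, 1e-10, 1e-10]';  // "low2": states 1+2 relaxed
vector[4] abs_tol_low4 = rep_vector(1e-4, 4);          // "low4": all states relaxed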
So I wanted to see this in action for an actual model. I turned the above example into a full Stan model which simulates fake data from the above parameters and then fits the problem. This triggers all sensitivities to be calculated. The relative tolerance is always 1E-6, and I set up 3 scenarios for the absolute tolerance of each state.
Timings for 500 warmup + 500 iterations are:
So I think the approach is useful and gives users more control over the precision, which obviously has a huge impact on speed. The Stan results do agree closely in this case.
I'm all for vector abstol. My questions:
Ok, great, then we have agreement that we want this PR in. To your questions, @yizhang-yiz:
Still, the above example gives one case where this feature is useful. The example is chosen so that it should generalise to a number of other problems, as I said (different scales for the PK and PD systems, latent states may need less precision). Given we have one case here where the feature shows its utility, we can move ahead and merge. @bbbales2 bump?!
We have agreement that vector absolute tolerances are something we want.
I think we should try to doc this, especially since we're expanding it. The user's guide is an appropriate place, and the example you gave is probably good for the docs.
Well we should be able to say things without overstating our confidence or generalizing too far. Like we have to make all of these decisions ourselves, so what are we thinking?
Can we rescale all the states internally and get this behavior? Or will this mess up the relative tolerances? We have the eq. for the RK45 tolerances: https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.RK45.html
Go ahead and ask. If they're willing to implement such a thing, that makes our lives easier. -- Yeah, I can do a code review of what is here, but I'd like to have an idea of how RK45 is going to work out.
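For reference, the scipy page linked above states that RK45 keeps the local error estimate of each state below atol + rtol * abs(y); the per-state variant discussed in this PR would only change the first term:

$$|e_i| \le \mathrm{atol} + \mathrm{rtol}\,|y_i| \quad\longrightarrow\quad |e_i| \le \mathrm{atol}_i + \mathrm{rtol}\,|y_i|$$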
No. What I said is that I agree we want the vector-abstol feature. I can say little about this PR without looking into the code.
I'm assuming by "lowering" you meant "increasing", but I still don't quite understand what you mean. Could you please explain?
In that case I suggest we add this model in this PR for future reference, so when people ask about it, instead of simply saying "go dig up the PR" we can point to the model and tests. In particular, I'd suggest adding the model and documenting the fact that it is about per-state noise levels, i.e. accuracy, because what was emphasized earlier was speed, while "noise level" indicates accuracy.
Are latent states of an ODE system always allowed to be less accurate than the states entering the likelihood?
You're addressing it in general, and this doesn't answer my second question. If there's a situation where vector-abstol doesn't help, or in general where decreasing abstol doesn't help, we should add it to this PR too. Such a problem renders the above PKPD-model argument less powerful and can serve as an antidote. I believe so far all our ODE development has been relying on a handful of models, and we're eager to demo "it works!". To show/document "when it shouldn't work" is just as important. Bugs can exist in both directions.
This statement is self-contradictory. odeint doesn't provide sensitivity analysis, but we still did it. Had we stayed with "as long as the base integrator does not provide this functionality", there wouldn't be rk45, not to mention keeping improving it.
Why not? Is that PKPD model believed to be stiff? If not, why do we want to suggest users use the bdf integrator, when we clearly indicate in the user guide that it's "for stiff systems"? If rk45 is more efficient with a single lower abstol, the above PKPD model becomes more or less artificial and loses some of its speedup claim power.
Sure, we can add the intuition behind my example here to the docs. That makes sense. Still, I want to link to the CVODES documentation as the main source for this. The CVODES guys are the experts on ODEs; I am not, in comparison.

I raised an issue with odeint, but that is going to take a while, if it happens at all. Are you fine with filing an issue for RK45 absolute tolerance vectors in stan-math for now and postponing the RK45 feature for the moment? I looked into how errors are calculated, and the CVODES docs say that they use in their error norms the term

|y_i| * rel_error + abs_error_i

Hence scaling is not a good idea, but shifting of y may do it... though I do find it very messy to shift/unshift states back and forth. I suggest we wait for the odeint author to answer and take it from there. In any case, my suggestion is to move forward here now and enable what CVODES offers. Does that sound like a plan? (The full CVODES error norm is spelled out after the replies below.)

To answer @yizhang-yiz's questions:
Your input was asked on whether you agree with the feature, which you do, not on the code.
Latent states only need to be as accurate as necessary to give sufficient accuracy for what is being put into the log-likelihood. I have seen the ESS improve whenever ODE integrators are asked to spit out greater accuracy... but I will not make any claim here about this, nor plan to benchmark it. If in doubt, this is again very problem specific, and it's extremely difficult to say what gives the better ESS per time. A lot of work is needed to tease this out, and that is not needed for a PR.
I don't think we need to raise the bar for including things. If you have an example to share which you want to contribute, please do so. Other than that, the reasoning for including these changes is "CVODES implements it for a reason" and "here is an example which should generalize to some extent where it's useful". I am happy to include this example model in some form... but someone would need to tell me how; we have never done this.
CVODES knows that we are calculating sensitivities of the ODE solution; it knows what we are structurally doing. odeint, on the contrary, is just given a huge coupled ODE system when we ask for sensitivities. That's the difference. Since CVODES knows the structure, it will handle the sensitivities according to the per-state absolute tolerances.
Things are too problem specific. In principle RK45 should be able to handle an oral 2cmt system coupled to a turn-over model just fine... but for Stan this is not always the case, as I have seen. When you use weak priors, Stan will kill an RK45 integrator during warmup, and things will crawl until you reach sampling (provided the data you fit is informative, sampling will run OK-ish). So doing warmup with bdf and then switching to RK45 during sampling is the best thing to do, as @charlesm93 showed a while ago. However, this is way too intricate to detail to users, nor is it settled enough that I would document it in our manuals. Still, bdf is useful for this system, and RK45 is also useful (with rather informative priors). It's too complicated for my taste, such that I do not think I would be able to write down anything useful in this regard. These choices are hard, and users need to find their own way through their problems.
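For reference, spelling out the weighted error norm from the CVODES user guide that the quoted term comes from (stated here for completeness; $N$ is the number of states, and rtol/atol_i correspond to rel_error/abs_error_i above): the local error test uses per-state weights

$$W_i = \frac{1}{\mathrm{rtol}\,|y_i| + \mathrm{atol}_i}, \qquad \|e\|_{\mathrm{WRMS}} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\big(e_i\,W_i\big)^2} \le 1,$$

so each atol_i independently sets the error floor for its state.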
Yeah, I am happy with providing this intuition upfront and recommending the CVODES docs for the details.
Want to try writing a User's Guide section for this? That way @yizhang-yiz can suggest changes directly to it, and we can iterate on exactly what needs to be said to make clear how/when to use this feature. That will also give us some time to hear back from the odeint people on rk45. @yizhang-yiz, do you have any ideas on doing this for rk45?
If that's what the physical phenomenon behind the ODE requires, then yes.
Not necessarily, as shown in my example. I can also show an example where computation cost remains the same across different abstol. But my point is that there are not many definitive things we can say about computation.
Yes. Even if not, the solution may be polluted. In the above 3-states 1e-10 vs 1e-12 example, if we keep solving state 2 when it reaches 1e-10 (its presumed physical scale), even before it crosses zero its solution is useless, because it's no longer at its right scale.
"computation" & "instability" are intertwined. vector abstol is the solution only when user believe states have different noise level. This is what cvodes document suggests, and it says nothing about computation efficiency(I don't want to give you bad news but if you go back to textbook you'll find out even retol + vector atol cannot guarantee global accuracy). |
Well yeah, but there is something we can say. The same thing happens with statistics in general: all models are wrong, etc. Alright, so I'll update my list. Pros of lower abstol:
And cons:
Solutions if you tried abstol and hit one of the cons:
What does noise level mean in an ODE solution? I'm assuming there's no data/inverse problem/statistics happening.
Also, I tried to make a list without computational cost, but both you and Sebastian have now provided examples where computational cost is a big component of what happened. Different things are happening in those two problems, and we don't get to say something simple, but there should be something there that gives people a heads-up.
It roughly means a threshold below which the state is not observable in reality and its effect can be safely ignored without compromising the computation of states that depend on or couple with it.
Which is why I said efficiency is a byproduct, not the purpose, of tolerance, and the user should focus on the accuracy of their solution and do whatever that requires. What we can say is:
Just taking a pause from abstol -- will get back to this tomorrow, but is reltol this finicky? Or is it as simple as: more precision -> more computation?
The story here is similar: no definitive map between rtol & speed. If you expect lower rtol to lead to more computing, here's a counter-example.

functions {
vector ode(real t, vector y) {
vector[2] dydt;
dydt[1] = 1.0 + sin(y[2] - y[1]);
dydt[2] = 1.5 + sin(y[1] - y[2]);
return dydt;
}
}
// assumed data block (not shown in the original comment) to make the excerpt self-contained
data {
  int n;
  real t[n];
  real rtol;
  real atol;
}
transformed parameters {
vector[2] y0 = [3., 0.]';
vector[2] y[n] = ode_rk45_tol(ode, y0, 0.0, t, rtol, atol, 100000);
}

The model doesn't do fitting, so I ran 1000 fixed-param iterations to make the time difference more obvious. Since the solution at the end is at scale 1.e-1 to 1.e+4, I fix atol and vary rtol.
Huh. I always thought the basic ODE tradeoff was speed vs. accuracy, where tolerance was a proxy for accuracy. So let me extend the list to include reltol. Pros of lower reltol:
And cons:
Pros of lower abstol:
And cons:
Solutions if you tried abstol and hit one of the cons:
-- I don't want to get rid of the performance component yet, because it is kinda dramatic. Like, by this tolerance-is-all-that-matters argument, we could flip a lot of switches, be conservative about things, and end up with really slow solvers. But is it fair to say that the method is the most important thing in terms of performance? And then the tolerance is a smaller knob that you can jiggle a bit if you want?
I like the saying "tolerance is a proxy for accuracy", but I believe the "speed-accuracy tradeoff" framing is dangerous. In addition to the counter-examples showing there's no definitive relation between the two, many users will not hesitate to sacrifice accuracy in pursuit of speed (I've committed the same sin because it's so attractive) with the documentation's blessing, and are extremely reluctant to put thought into tolerance in the context of the problem. So let me repeat: tolerance is a tool for the sake of accuracy, not speed tuning, and we need to test that and doc that. The rest is just background noise.

I want to like your list, but it's still missing something (if it's intended for documentation -- and if so, maybe we should move the discussion there). There are mainly two factors that determine a solver's stepsize & order: accuracy & stability. If stability dominates, the cost is nearly unaffected by tolerance: one can use rk45 with a crude tolerance on a stiff problem and it is still gonna cost more than bdf with a fine tolerance, and rk45's cost can remain basically the same across different orders of tolerance. This is another circumstance where tuning tolerance for speed's sake is not "totally obvious and trivial". For that I give one last example, based on the ode I showed earlier. A user unknowingly using rk45 on this stiff problem will get wall time akin to the following (here I ran only one subject among the 8 subjects from the original post, with fixed param):
See? Tolerance has no power here. And the user can be rightly assured that each run has reached the accuracy he requested, because that's the purpose of tolerance. Had he believed he could get better speed by tuning atol, he'd be grossly confused. You may think our bdf is free from this kind of thing, but it's not; it's not a silver bullet for stiff problems. But I think that's a topic for another day.
Yes, probably. I agree at this point about the no-simple-relationship thing. Obviously it's still valuable to change your tolerances if you find a set that gives a suitably accurate answer and the solve runs faster, but the predictive power of raise-tolerance-get-speed isn't gonna be great (cause there's so much else going on). Gonna take a pause on this for a bit, though. I think we should try to write something for the User's Guide from the perspective that the speed-vs-accuracy tradeoff framing is misleading, and why. For this, I guess the question that remains is odeint. You think you can get to that in the not-too-distant future (before 2.26, which feature-freezes on January 13th)?
I like that there is discussion about this. I have also talked about a speed-accuracy tradeoff, originating from the fact that when doing a single ODE solve with a fixed-step explicit solver, there is theory which shows that a smaller step size means more accuracy, and more steps need to be taken, so more computation. But it is a good point that maybe it is bad to use the term tradeoff with adaptive solvers, because the total number of steps that need to be taken depends on the curvature of the solution and how that changes over the time course.

Then there is the MCMC aspect. Even if some relation between control parameters and speed exists for a single ODE solution, it might not be appropriate to say that when applied with HMC, because the tolerance will affect the accuracy of gradients and log posterior evaluations, and thus things that can cause a change in the overall speed of fitting a model.

Edit: For example, I imagine that when changing the tolerance (either way), the sampler can be allowed into a new parameter region where the step size needs to be adapted very small, and therefore cause a dramatic change in speed.
I couldn't agree more that we should fix the user guide. In fact, I'd like to have that first, before this feature. If this one doesn't make it to 2.26, the documentation certainly can. @jtimonen
I am not aware of any progress on the ODE doc, nor are there new commits to this PR for more than 3 weeks now. As such, I would propose to move forward with this PR in its current form. The rationale is that:
If that makes sense to others, then we should proceed and merge the PR -- then we can look forward to having this available in 2.26, which is around the corner; that would be great to have.
Well, three weeks ago was the time to plan this for going into 2.26. I don't mind rebooting this, but the reason this stopped moving forward was that it dropped as a priority (#2244 (comment)). I get the argument that we can just add this feature cause it's in CVODES and presumably it's there for a reason, but I don't like the asymmetry of the interfaces, and I don't like the problems we have even describing what this feature is and how it's useful. I honestly don't have time to try to resolve this stuff in three days, so I want to wait on this.
There is going to be asymmetry between the solvers anyway once we introduce adjoint. In the adjoint discussion we also agreed on saying in the doc that things are difficult in general, which also applies here given this discussion. Anyway, whoever wants to take this up can do so, of course.
Hmm, I don't think it's a good resolution to close this. We all agreed it was probably useful; we disagreed on the way forward. I'm down to do these changes, but I wanna work out the RK45/documentation problem. If you disagree with this, call a vote. We're supposed to use those when we get deadlocked like this. It's not a big deal -- maybe everyone else that looks at this thinks we should just merge. Anyway, apologies for slowing this down. It wasn't my intention to stop it, though (take as evidence all the conversations we've had so far about this feature -- if I didn't think it was worth doing, I'd like to think I would have told you earlier). This isn't the highest thing on my priority list though, so when you decided not to spend much time on this a few weeks ago, I happily mirrored that. We can certainly knock the dust off, but not in three days.
My apologies, but it was not clear to me that you wanted to work on this. However, the code in here is by now not even feature complete anymore, given that RK45 support is deemed a must-have. This PR is at best a draft PR now, so as not to waste too many resources. I do not mean to stop this activity by closing the PR; it just reflects that no one is working on it and things have changed too much anyway. So please feel free to just reopen this PR. I am personally concentrating on the adjoint thing (really cool that you got that started!), which promises to unlock much larger problems for Stan due to the much better scaling properties of adjoint as compared to forward.
Yeah, that's true. I'm gonna reopen this and leave it, cause it has some unfinished code/commentary and there's really no better place to put it (even if this isn't the cleanest way to record this). Anyhow, we can always close it in the future.
Sorry guys, I want to apologize for not following up in a couple of weeks. I took a 2-week time-off around the holidays and last week got caught up in an SGB crash course. To echo @bbbales2's points, I believe we need to address the rk45 & doc gap; in addition, it's also agreed among some of us that more tests are needed on the effect of tol on accuracy. For rk45 I'm moving Torsten toward using ARKode instead of boost::odeint, and can follow up later when it's ready. Vector tol is not my main motivation for using arkode there, but it does come out as a trivial by-product. For documentation there's been some discussion, though I haven't had time to respond. So I wouldn't say everything is in limbo, but it may just take a while, as it's not a priority.
Alright, I haven't gotten around to doing anything here for another month and a half. Anyone have objections to me closing this down? We can re-open a new pull or whatever in the future.
@betanalpha did you overlook this PR? I was lobbying hard for per-state absolute tolerances to be added to our CVODES solvers, but I was not successful due to too much resistance from others. I am not sure if you are aware of this episode.
@wds15 Yes, I agree that the vector tolerances in the BDF integrator should be exposed, for the same reason all of the tolerances should have been exposed in the adjoint solver -- to allow advanced users to experiment with integrator configurations on realistic problems and develop a better understanding of which configurations are robust enough for use with Markov chain Monte Carlo. We should not expect to know how to optimally configure the integrators before we expose those configurations; that is an incredibly challenging problem that is not going to be solved in a GitHub thread. We have no idea how good the default configurations actually are, and we cannot even explore that question until the configurations are exposed to as many advanced users as possible for experimentation. Maintaining arbitrary defaults is not a "safe" or "conservative" option. I fear that the precedent set by the adjoint-integrator design-doc discussion will provide an excuse for not exposing integrator configurations here as well, but my opinion is the same in both cases: until we have an explicit argument for how to set default configurations robust to Markov chain Monte Carlo use, all of the configurations should be exposed.
@betanalpha this PR has all the code ready to expose vector absolute tolerances for the states. No other extra options are exposed, and this is only done for the CVODES solvers (adams and bdf). Should we re-open this PR with the content as is? If we re-open, I would be happy to write some tests which integrate the new signatures with the testing framework in a basic way. (This is not to excuse anything here, but take it as a matter of fact that exposing what is already implemented is straightforward.)
Summary
This addresses performance issues with ODEs whose state variables have differing absolute tolerance needs. The absolute tolerance critically influences integration performance, and giving users more control over it should allow for faster ODE integration. This feature is supported by the CVODES integrators (bdf and adams).
The implementation is based on overloading the ode_bdf_tol and ode_adams_tol calls. So far the argument for the absolute tolerance was a scalar only; now there is an additional signature available where the absolute tolerance is an Eigen vector. This vector must have the same length as the state vector and specifies for each state the absolute tolerance to use during integration.

To make this feature available to Stan users, the stanc3 parser needs an update as well (@rok-cesnovar could you look at this possibly?).
Tests
Tests were added for invalid input, and one case was added where a harmonic oscillator problem was integrated with different absolute tolerances per state.
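A hedged sketch of what such a per-state-tolerance check could look like as a Stan model (the damped-oscillator right-hand side, the damping value, and the tolerance values are assumptions for illustration, not copied from the test code):

functions {
  // damped harmonic oscillator: y[1] = position, y[2] = velocity
  vector ho(real t, vector y, real theta) {
    vector[2] dydt;
    dydt[1] = y[2];
    dydt[2] = -y[1] - theta * y[2];
    return dydt;
  }
}
data {
  int n;
  real ts[n];
}
generated quantities {
  vector[2] y0 = [1.0, 0.0]';
  vector[2] abs_tol = [1e-6, 1e-10]';  // per-state absolute tolerances
  vector[2] y[n] = ode_bdf_tol(ho, y0, 0.0, ts, 1e-8, abs_tol, 10000, 0.15);
}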
Side Effects
None.
Release notes
Allow absolute tolerances to be specified for each ODE state for the ode_bdf_tol and ode_adams_tol integrators.

Checklist
Math issue BDF ODE integrator slow #2224
Copyright holder: Sebastian Weber
The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
- the basic tests are passing
  - unit tests pass (./runTests.py test/unit)
  - header checks pass (make test-headers)
  - dependencies checks pass (make test-math-dependencies)
  - docs build (make doxygen)
  - code passes the built-in C++ parser (make cpplint)
- the code is written in idiomatic C++ and changes are documented in the doxygen
- the new changes are tested