Hi :)
I am using Diffrax through the Dynamiqs quantum simulation library, and I think I encountered an endpoint-handling issue in Diffrax.
This issue is similar but not the same mechanism as : Issue #703
Issue
I was investigating the stability of Tsit5() on a difficult quantum simulation where the PID controller converges to small time steps compared with the final integration time. The simulation runs fine for almost the whole interval, but at the very last step the solver gets stuck in what looks like an infinite rejection loop until max_steps is reached.
Diagnosis of the issue
Diffrax has a mechanism to avoid taking a very small last time step. If the proposed tnext is within the last 100 * ULP(t1) before the final time, then Diffrax treats this as close enough to the end and clips toward t1:
def _clip_to_end(tprev, tnext, t1, t1_clip_floor, keep_step):
clip = tnext > t1_clip_floor
tclip = jnp.where(keep_step, t1, tprev + 0.5 * (t1 - tprev))
return jnp.where(clip, tclip, tnext)
I understand the motivation: if the solver is at
tprev = t1 - epsilon
with very small epsilon, then taking a tiny final step to exactly t1 may be numerically unstable, especially for dense output.
In my case, with float32 and t1 = 600.0:
ULP(t1) ≈ 6.1e-5
100 * ULP(t1) ≈ 6.1e-3
PID-selected dt ≈ 3e-3
So the PID controller wants steps smaller than the final clipping window.
For example, suppose we are at:
tprev ≈ 599.993
and the PID controller naturally proposes:
tnext ≈ 599.996
This is already inside the interval:
[t1 - 100 * ULP(t1), t1]
so Diffrax treats it as a near-final step. If it tries to go to t1, the effective final step becomes roughly:
599.993 -> 600.0
which is about 0.007, larger than what the PID controller wanted. The step is rejected.
Then Diffrax proposes the midpoint:
tnext = 0.5 * (tprev + t1)
which is around:
599.9965
but this is still inside the final 100-ULP window. So Diffrax again treats it as a near-final step, tries to finish, rejects, proposes another point still inside the window, and the loop repeats until max_steps is reached.
Suggested fix
A possible fix, if you agree with the diagnosis, would be to make endpoint handling an option. The current behavior could remain the default, but another mode could be “overshoot and interpolate”:
Instead of forcing the last numerical step to land exactly on t1, allow the solver to take one normal accepted step past t1, then evaluate the existing dense/local interpolation at t1.
This would treat the final time like any other saved intermediate time. For example:
9.9975 -> 9.9985 -> 9.9995 -> 10.005
and then return:
y(10.0)
by evaluating the local interpolation on the accepted step [9.9995, 10.005].
Would this endpoint mode make sense for Diffrax? If so, I would be happy to try opening a PR.
Hi :)
I am using Diffrax through the Dynamiqs quantum simulation library, and I think I encountered an endpoint-handling issue in Diffrax.
This issue is similar but not the same mechanism as : Issue #703
Issue
I was investigating the stability of Tsit5() on a difficult quantum simulation where the PID controller converges to small time steps compared with the final integration time. The simulation runs fine for almost the whole interval, but at the very last step the solver gets stuck in what looks like an infinite rejection loop until max_steps is reached.
Diagnosis of the issue
Diffrax has a mechanism to avoid taking a very small last time step. If the proposed tnext is within the last 100 * ULP(t1) before the final time, then Diffrax treats this as close enough to the end and clips toward t1:
I understand the motivation: if the solver is at
tprev = t1 - epsilonwith very small epsilon, then taking a tiny final step to exactly t1 may be numerically unstable, especially for dense output.
In my case, with
float32andt1 = 600.0:So the PID controller wants steps smaller than the final clipping window.
For example, suppose we are at:
tprev ≈ 599.993and the PID controller naturally proposes:
tnext ≈ 599.996This is already inside the interval:
[t1 - 100 * ULP(t1), t1]so Diffrax treats it as a near-final step. If it tries to go to t1, the effective final step becomes roughly:
599.993 -> 600.0which is about
0.007, larger than what the PID controller wanted. The step is rejected.Then Diffrax proposes the midpoint:
tnext = 0.5 * (tprev + t1)which is around:
599.9965but this is still inside the final 100-ULP window. So Diffrax again treats it as a near-final step, tries to finish, rejects, proposes another point still inside the window, and the loop repeats until max_steps is reached.
Suggested fix
A possible fix, if you agree with the diagnosis, would be to make endpoint handling an option. The current behavior could remain the default, but another mode could be “overshoot and interpolate”:
Instead of forcing the last numerical step to land exactly on t1, allow the solver to take one normal accepted step past t1, then evaluate the existing dense/local interpolation at t1.
This would treat the final time like any other saved intermediate time. For example:
9.9975 -> 9.9985 -> 9.9995 -> 10.005and then return:
y(10.0)by evaluating the local interpolation on the accepted step
[9.9995, 10.005].Would this endpoint mode make sense for Diffrax? If so, I would be happy to try opening a PR.