Infinite final-step rejection loop due to conflicting endpoint clipping and adaptative RK step rejection

Hi :)

I am using Diffrax through the Dynamiqs quantum simulation library, and I think I encountered an endpoint-handling issue in Diffrax.

This issue is similar but not the same mechanism as : [Issue #703](https://github.com/patrick-kidger/diffrax/issues/703#issue-3591397040) 

### Issue

I was investigating the stability of Tsit5() on a difficult quantum simulation where the PID controller converges to small time steps compared with the final integration time. The simulation runs fine for almost the whole interval, but at the very last step the solver gets stuck in what looks like an infinite rejection loop until max_steps is reached.

### Diagnosis of the issue 

Diffrax has a mechanism to avoid taking a very small last time step. If the proposed tnext is within the last 100 * ULP(t1) before the final time, then Diffrax treats this as close enough to the end and clips toward t1:

```
def _clip_to_end(tprev, tnext, t1, t1_clip_floor, keep_step):
    clip = tnext > t1_clip_floor
    tclip = jnp.where(keep_step, t1, tprev + 0.5 * (t1 - tprev))
    return jnp.where(clip, tclip, tnext)
```

I understand the motivation: if the solver is at

`tprev = t1 - epsilon`

with very small epsilon, then taking a tiny final step to exactly t1 may be numerically unstable, especially for dense output.

In my case, with `float32` and `t1 = 600.0`:

```
ULP(t1) ≈ 6.1e-5
100 * ULP(t1) ≈ 6.1e-3
PID-selected dt ≈ 3e-3
```

So the PID controller wants steps smaller than the final clipping window.

For example, suppose we are at:

`tprev ≈ 599.993
`
and the PID controller naturally proposes:

`tnext ≈ 599.996
`
This is already inside the interval:

`[t1 - 100 * ULP(t1), t1]
`
so Diffrax treats it as a near-final step. If it tries to go to t1, the effective final step becomes roughly:

`599.993 -> 600.0
`
which is about `0.007`, larger than what the PID controller wanted. The step is rejected.

Then Diffrax proposes the midpoint:

`tnext = 0.5 * (tprev + t1)
`
which is around:

`599.9965
`
but this is still inside the final 100-ULP window. So Diffrax again treats it as a near-final step, tries to finish, rejects, proposes another point still inside the window, and the loop repeats until max_steps is reached.

### Suggested fix

A possible fix, if you agree with the diagnosis, would be to make endpoint handling an option. The current behavior could remain the default, but another mode could be “overshoot and interpolate”:

**Instead of forcing the last numerical step to land exactly on t1, allow the solver to take one normal accepted step past t1, then evaluate the existing dense/local interpolation at t1.**

This would treat the final time like any other saved intermediate time. For example:

`9.9975 -> 9.9985 -> 9.9995 -> 10.005`

and then return:

`y(10.0)`

by evaluating the local interpolation on the accepted step `[9.9995, 10.005]`.

Would this endpoint mode make sense for Diffrax? If so, I would be happy to try opening a PR.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Infinite final-step rejection loop due to conflicting endpoint clipping and adaptative RK step rejection #756

Issue

Diagnosis of the issue

Suggested fix

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Uh oh!

Infinite final-step rejection loop due to conflicting endpoint clipping and adaptative RK step rejection #756

Description

Issue

Diagnosis of the issue

Suggested fix

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions