solve_ivp has a massive overhead #8257

Wrzlprmft · 2017-12-31T20:18:22Z

While the new integrators or solve_ivp, respectively, can compete with ode for large differential equations, it is up to twenty times slower for small ones, which suggests a massive overhead. This is not so nice, in particular considering that ode already has a considerable overhead when compared to odeint (which mostly comes through the latter internalising the loops, as far as I can tell).

Benchmark

The benchmark can be found in this Gist.

I here use my own module JiTCODE to compile the derivative to make things generally faster and emphasise what exactly is done by the benchmarked modules. Through this, the benchmark is not fair for odeint since the compiled derivative has to be wrapped to invert the arguments.

On my machine this yields:

----------------------------------------
two coupled FitzHugh–Nagumo oscillators
----------------------------------------
ode (dopri5) took 0.32029 s
odeint with suboptimal function (LSODA) took 0.24195 s
solve_ivp (RK45) took 6.30288 s
solve_ivp (RK45) with dense_output took 6.03647 s
RK45 with dense output took 5.78149 s
RK45 with manual resetting took 5.74096 s
----------------------------------------
random network of Kuramoto oscillators
----------------------------------------
ode (dopri5) took 6.65827 s
odeint with suboptimal function (LSODA) took 7.76285 s
solve_ivp (RK45) took 9.19205 s
solve_ivp (RK45) with dense_output took 9.28966 s
RK45 with dense output took 9.04160 s
RK45 with manual resetting took 9.15578 s

The text was updated successfully, but these errors were encountered:

ghost · 2018-01-01T21:59:42Z

Have you profiled the source to determine what it taking up the most time?

pv · 2018-01-02T08:33:06Z

iirc, this is mostly python per-call overhead, as per my last profiling. The plan probably is to rewrite the innermost tight looping parts later. 1.1.2018 23.59 "xoviat" <notifications@github.com> kirjoitti:

…

Have you profiled the source to determine what it taking up the most time? — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#8257 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AACI5iGyiur9zygE4tD7UPl6dMVBmhN2ks5tGVVOgaJpZM4RP7VC> .

Wrzlprmft · 2018-01-02T10:09:15Z

iirc, this is mostly python per-call overhead, as per my last profiling.

@pv @xoviat My profiling confirms this and this would match the symptoms.

nmayorov · 2018-01-20T16:19:42Z

What can I say, it was written in Python relatively fast to provide features that people enjoy in Matlab. It is very far from ideal in efficiency for small problems, but I think it is sort of acceptable for typical scipy usage I think having code in Python also provides some value, it is easy to read and understand.

Sure let's try to improve things step by step. For example, Runge-Kutta methods are quite straightforward and can be rewritten to compiled code relatively easy.

ghost · 2018-02-09T05:38:27Z

I'm attempting to address this in gh-8386. However, I couldn't get the benchmark to work with either master or the cythonized version.

ghost · 2018-02-09T17:39:22Z

With SciPy 1.0 and other dependencies installed, I saw the following error:

Python 3.6.3 |Anaconda, Inc.| (default, Nov  8 2017, 15:10:56) [MSC v.1900 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]:             0.0135*y(2)-0.02*y(3)
   ...:         ]),
   ...:     initial = np.array([1.,2.,3.,4.]),
   ...:     times = 2000+np.arange(0,100000,10),
   ...:     rtol = 1e-5,
   ...:     atol = 1e-8,
   ...: )
   ...:
   ...:
   ...: n, c, q = 100, 3.0, 0.2
   ...: A = choice( [1,0], size=(n,n), p=[q,1-q] )
   ...: omega = uniform(-0.5,0.5,n)
   ...: def kuramotos_f():
   ...:     for i in range(n):
   ...:         coupling_sum = sum(
   ...:                 sin(y(j)-y(i))
   ...:                 for j in range(n)
   ...:                 if A[j,i]
   ...:             )
   ...:         yield omega[i] + c/(n-1)*coupling_sum
   ...:
   ...: test_scenario(
   ...:     name = "random network of Kuramoto oscillators",
   ...:     fun = get_compiled_function(kuramotos_f),
   ...:     initial = uniform(0,2*np.pi,n),
   ...:     times = range(1,10001,10),
   ...:     rtol = 1e-13,
   ...:     atol = 1e-6,
   ...: )
   ...:
jitced.c
   Creating library jitced.cp36-win_amd64.lib and object jitced.cp36-win_amd64.exp
Generating code
Finished generating code
----------------------------------------
two coupled FitzHughNagumo oscillators
----------------------------------------
Call-back argument must be function|instance|instance.__call__|f2py-function but got NoneType.
ode (dopri5) took 0.00000s
---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
<ipython-input-1-169f9c9d0e1a> in <module>()
    112     times = 2000+np.arange(0,100000,10),
    113     rtol = 1e-5,
--> 114     atol = 1e-8,
    115 )
    116

<ipython-input-1-169f9c9d0e1a> in test_scenario(name, fun, initial, times, rtol, atol)
     31         I.set_integrator(solver_ode,rtol=rtol,atol=atol,nsteps=10**8)
     32         I.set_initial_value(initial,0.0)
---> 33         result = np.vstack(I.integrate(time) for time in times)
     34     assert I.successful()
     35

~\Miniconda3\lib\site-packages\numpy\core\shape_base.py in vstack(tup)
    232
    233     """
--> 234     return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
    235
    236 def hstack(tup):

~\Miniconda3\lib\site-packages\numpy\core\shape_base.py in <listcomp>(.0)
    232
    233     """
--> 234     return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
    235
    236 def hstack(tup):

<ipython-input-1-169f9c9d0e1a> in <genexpr>(.0)
     31         I.set_integrator(solver_ode,rtol=rtol,atol=atol,nsteps=10**8)
     32         I.set_initial_value(initial,0.0)
---> 33         result = np.vstack(I.integrate(time) for time in times)
     34     assert I.successful()
     35

~\Miniconda3\lib\site-packages\scipy\integrate\_ode.py in integrate(self, t, step, relax)
    430             self._y, self.t = mth(self.f, self.jac or (lambda: None),
    431                                   self._y, self.t, t,
--> 432                                   self.f_params, self.jac_params)
    433         except SystemError:
    434             # f2py issue with tuple returns, see ticket 1187.

~\Miniconda3\lib\site-packages\scipy\integrate\_ode.py in run(self, f, jac, y0, t0, t1, f_params, jac_params)
   1088     def run(self, f, jac, y0, t0, t1, f_params, jac_params):
   1089         x, y, iwork, istate = self.runner(*((f, t0, y0, t1) +
-> 1090                                           tuple(self.call_args) + (f_params,)))
   1091         self.istate = istate
   1092         if istate < 0:

error: failed in processing argument list for call-back fcn.

In [2]:

ghost · 2018-02-15T18:18:18Z

cc @Wrzlprmft

Have you any idea why the benchmark isn't working?

Wrzlprmft · 2018-02-15T18:24:15Z

Call-back argument must be function|instance|instance.call|f2py-function but got NoneType.

@xoviat: This is weird. Could be a problem of scipy.ode. Can you remove lines 29 to 34 in the benchmark (which I moved to a Gist) and try again?

If this fails as well, can you try to run all tests of jitcxde_common and jitcode and raise the issue at the first module which throws errors?

ghost · 2018-02-15T18:28:08Z

Result without lines 29-34:

Generating code
Finished generating code
----------------------------------------
two coupled FitzHughNagumo oscillators
----------------------------------------
odeint with suboptimal function (LSODA) took 0.00000s
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-9a360840056b> in <module>()
    106     times = 2000+np.arange(0,100000,10),
    107     rtol = 1e-5,
--> 108     atol = 1e-8,
    109 )
    110

<ipython-input-1-9a360840056b> in test_scenario(name, fun, initial, times, rtol, atol)
     34                 y0=initial, t=[0.0]+list(times),
     35                 rtol=rtol, atol=atol,
---> 36                 mxstep=10**8
     37                 )
     38

~\Miniconda3\lib\site-packages\scipy\integrate\odepack.py in odeint(func, y0, t, args, Dfun, col_deriv, full_output, ml, mu, rtol, atol, tcrit, h0, hmax, hmin, ixpr, mxstep, mxhnil, mxordn, mxords, printmessg)
    213     output = _odepack.odeint(func, y0, t, args, Dfun, col_deriv, ml, mu,
    214                              full_output, rtol, atol, tcrit, h0, hmax, hmin,
--> 215                              ixpr, mxstep, mxhnil, mxordn, mxords)
    216     if output[-1] < 0:
    217         warning_msg = _msgs[output[-1]] + " Run with full_output = 1 to get quantitative information."

<ipython-input-1-9a360840056b> in <lambda>(y, t)
     28
     29
---> 30     inv_fun = lambda y,t: fun(t,y)
     31     with timer("odeint with suboptimal function (LSODA)"):
     32         result = odeint(

TypeError: 'NoneType' object is not callable

I'm running the tests now.

ghost · 2018-02-15T19:41:24Z

@Wrzlprmft I'm struggling to create an environment that satisfies all of the dependencies required to run your script. I've attempted to build symengine.py from source to get ahead of the relevant issues, but have not been successful. Perhaps you could give some instructions to set-up the environment?

Wrzlprmft · 2018-02-15T20:36:27Z

@xoviat I never encountered problems with that or had any reports along that line (otherwise I would have left instructions in the documentation). Can you please open a separate issue here detailing your problems, so we do not have to digress too much here?

My first guess is that the compilation fails but doesn’t throw an error for whatever reason. I vaguely recall that something like this happens with Jupyter and similar. Did you run it in a terminal?

ghost · 2018-02-15T21:26:20Z

What I might do is wait for a symengine release and then come back to this.

15940260868 · 2018-02-24T00:17:25Z

@Wrzlprmft hi, did you run the benchmark successfully? I run it with a problem of "TypeError: test_scenario() missing 6 required positional arguments: 'name', 'fun', 'initial', 'times', 'rtol', and 'atol'" Do you know why? Thank you

Wrzlprmft · 2018-02-24T07:13:23Z

@15940260868 This doesn’t happen to me and I really do not see why it should happen. It’s almost certainly not a problem of SciPy or JiTCODE. Did you make any changes to the code or run it in a weird way?

billtubbs · 2019-11-18T20:56:21Z

Would be great to see solve_ivp speeded up.

For my problem (Lorenz system for 5000 time steps):
odeint took 0.21s
solve_ivp (with method='LSODA') took 1.17s

Maximum difference in solutions: 5.4e-06

syockit · 2020-01-05T15:52:42Z

@billtubbs For LSODA, try using scipy-1.4.1 (should be fixed with gh-9901)

I found that for the FitzHugh–Nagumo oscillators, dopri5 and solve_ivp(RK45) do not converge to the same solution. I tried looking at the trajectory (for time period 0-2000, instead of 2000-100000 as in the benchmark) and dopri5 shows a spike at around t=1800 before correcting itself to ivp(RK45)'s trajectory (but being late in phase)

billtubbs · 2020-01-05T19:55:04Z

Thanks. I just upgraded to scipy-1.4.1 and here are the results (again, on my simulated Lorenz system for 5000 timesteps).

Not much different:

Scipy: 1.3.1

odeint: 93.1 ms
solve_ivp(LSODA): 217.1 ms

Scipy: 1.4.1

odeint: 98.1 ms
solve_ivp(LSODA): 214.7 ms

It's not a very accurate test (using time.time()) but I also checked using the iPython %timeit function in a Jupyter notebook I get a similar difference - (89 ms vs 217 ms).

B.t.w. I'm not concerned that they converge to different solutions. The Lorenz system is divergent.

Code is here if you want to take a look.

Nicholaswogan · 2021-07-06T18:07:32Z

I wrote a wrapper to LSODA which has no overhead: https://github.com/Nicholaswogan/NumbaLSODA . Solving ODEs with this wrapper should be just as fast as pure C/C++. Below is a comparison with your sample lorenz system:

Scipy: 1.5.2

odeint: 143.4 ms
solve_ivp(LSODA): 310.7 ms
NumbaLSODA: 3.6 ms

"""Comparison of scipy's integration functions.
"""
import time
from functools import partial
import numpy as np
import scipy
from scipy.integrate import odeint, solve_ivp

from NumbaLSODA import lsoda_sig, lsoda
import numba as nb

def lorenz_odes(t, y, sigma, beta, rho):
    """The Lorenz system of ordinary differential equations.
    Returns:
        dydt (tuple): Derivative (w.r.t. time)
    """
    y1, y2, y3 = y
    return (sigma * (y2 - y1), y1 * (rho - y3) - y2, y1 * y2 - beta * y3)

@nb.cfunc(lsoda_sig)
def lorenz_odes_nb(t, y_, dy, p_):
    y = nb.carray(y_,(3,))
    p = nb.carray(p_,(3,))
    y1, y2, y3 = y
    sigma, beta, rho = p
    dy[0] = sigma * (y2 - y1)
    dy[1] = y1 * (rho - y3) - y2
    dy[2] = y1 * y2 - beta * y3
funcptr = lorenz_odes_nb.address

print(f"Scipy: {scipy.__version__}")

dt = 0.01
T = 50
t = np.arange(dt, T + dt, dt)

# Lorenz system parameters
beta = 8 / 3
sigma = 10
rho = 28
n = 3

# Function to be integrated - with parameter values
fun = partial(lorenz_odes, sigma=sigma, beta=beta, rho=rho)

# Initial condition
y0 = (-8, 8, 27)

# Simulate using scipy.integrate.odeint method
# Produces same results as Matlab
rtol = 10e-12
atol = 10e-12 * np.ones_like(y0)
t0 = time.time()
y = odeint(fun, y0, t, tfirst=True, rtol=rtol, atol=atol)
print(f"odeint: {(time.time() - t0)*1000:.1f} ms")
assert y.shape == (5000, 3)

# Simulate using scipy.integrate.solve_ivp method
t_span = [t[0], t[-1]]
t0 = time.time()
sol = solve_ivp(fun, t_span, y0, t_eval=t, method='LSODA')
print(f"solve_ivp(LSODA): {(time.time() - t0)*1000:.1f} ms")

# Simulate using NumbaLSODA
data = np.array([sigma, beta, rho],dtype = np.float64)
y0_ = np.array(y0,dtype = np.float64)
t0 = time.time()
usol, success = lsoda(funcptr, y0_, t, data, rtol=rtol, atol=atol[0])
print(f"NumbaLSODA: {(time.time() - t0)*1000:.1f} ms")

astrojuanlu · 2021-10-30T14:36:13Z

Related: https://github.com/hgrecco/numbakit-ode/

astrojuanlu · 2021-10-30T14:38:01Z

Apart from that, even though #8386 by 2018-@/xoviat was closed, I wonder if there's still interest in bringing it back to life?

slayoo · 2023-05-10T12:21:35Z

In an attempt to speed things up when working with solve_ivp, we've used Numba to JIT-compile parts of the ODE system. This works well, however not with all ODE solvers in SciPy. In particular, we've noticed that using the BDF method is impossible with Numba, most surprisingly merely using Numba in another parts of a project already makes the BDF solver behave non-deterministically. We've described the issue in the Numba issue tracker: numba/numba#8931

Mentioning here because perhaps some of you might have hints how to approach the problem. In short, using Numba in unrelated code makes NaN value occur in the the BDF solver essentially at random. Never noticed so far on macOS; one documented case on Windows; easy to catch on Linux. The referenced Numba issue includes a minimal reproducer with a Dockerfile as well as a Github Actions CI set up for the very purpose of depicting the issue: https://github.com/slayoo/numba_issue_8931/actions/

Help needed :)

HannoSpreeuw · 2023-08-07T20:57:26Z

@slayoo that's great to hear!
I am happy with using "Radau" or "LSODA", with a functional Jacobian, so is your speeded up version of solve_ivp available?

Wrzlprmft mentioned this issue Dec 31, 2017

Make new integrators support user-controlled steps (solve_ivp) #8258

Open

rgommers mentioned this issue Jan 19, 2018

DOC: recommend scipy.integrate.solve_ivp #8222

Merged

nmayorov added scipy.integrate enhancement A new feature or improvement labels Jan 20, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

solve_ivp has a massive overhead #8257

solve_ivp has a massive overhead #8257

Wrzlprmft commented Dec 31, 2017 •

edited

ghost commented Jan 1, 2018

pv commented Jan 2, 2018 via email

Wrzlprmft commented Jan 2, 2018

nmayorov commented Jan 20, 2018

ghost commented Feb 9, 2018

ghost commented Feb 9, 2018

ghost commented Feb 15, 2018

Wrzlprmft commented Feb 15, 2018

ghost commented Feb 15, 2018

ghost commented Feb 15, 2018

Wrzlprmft commented Feb 15, 2018

ghost commented Feb 15, 2018

15940260868 commented Feb 24, 2018

Wrzlprmft commented Feb 24, 2018

billtubbs commented Nov 18, 2019 •

edited

syockit commented Jan 5, 2020

billtubbs commented Jan 5, 2020 •

edited

Nicholaswogan commented Jul 6, 2021

astrojuanlu commented Oct 30, 2021

astrojuanlu commented Oct 30, 2021

slayoo commented May 10, 2023

HannoSpreeuw commented Aug 7, 2023

solve_ivp has a massive overhead #8257

solve_ivp has a massive overhead #8257

Comments

Wrzlprmft commented Dec 31, 2017 • edited

Benchmark

ghost commented Jan 1, 2018

pv commented Jan 2, 2018 via email

Wrzlprmft commented Jan 2, 2018

nmayorov commented Jan 20, 2018

ghost commented Feb 9, 2018

ghost commented Feb 9, 2018

ghost commented Feb 15, 2018

Wrzlprmft commented Feb 15, 2018

ghost commented Feb 15, 2018

ghost commented Feb 15, 2018

Wrzlprmft commented Feb 15, 2018

ghost commented Feb 15, 2018

15940260868 commented Feb 24, 2018

Wrzlprmft commented Feb 24, 2018

billtubbs commented Nov 18, 2019 • edited

syockit commented Jan 5, 2020

billtubbs commented Jan 5, 2020 • edited

Nicholaswogan commented Jul 6, 2021

astrojuanlu commented Oct 30, 2021

astrojuanlu commented Oct 30, 2021

slayoo commented May 10, 2023

HannoSpreeuw commented Aug 7, 2023

Wrzlprmft commented Dec 31, 2017 •

edited

billtubbs commented Nov 18, 2019 •

edited

billtubbs commented Jan 5, 2020 •

edited