Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

solve_ivp has a massive overhead #8257

Open
Wrzlprmft opened this issue Dec 31, 2017 · 22 comments
Open

solve_ivp has a massive overhead #8257

Wrzlprmft opened this issue Dec 31, 2017 · 22 comments
Labels
enhancement A new feature or improvement scipy.integrate

Comments

@Wrzlprmft
Copy link
Contributor

Wrzlprmft commented Dec 31, 2017

While the new integrators or solve_ivp, respectively, can compete with ode for large differential equations, it is up to twenty times slower for small ones, which suggests a massive overhead. This is not so nice, in particular considering that ode already has a considerable overhead when compared to odeint (which mostly comes through the latter internalising the loops, as far as I can tell).

Benchmark

The benchmark can be found in this Gist.

I here use my own module JiTCODE to compile the derivative to make things generally faster and emphasise what exactly is done by the benchmarked modules. Through this, the benchmark is not fair for odeint since the compiled derivative has to be wrapped to invert the arguments.

On my machine this yields:

----------------------------------------
two coupled FitzHugh–Nagumo oscillators
----------------------------------------
ode (dopri5) took 0.32029 s
odeint with suboptimal function (LSODA) took 0.24195 s
solve_ivp (RK45) took 6.30288 s
solve_ivp (RK45) with dense_output took 6.03647 s
RK45 with dense output took 5.78149 s
RK45 with manual resetting took 5.74096 s
----------------------------------------
random network of Kuramoto oscillators
----------------------------------------
ode (dopri5) took 6.65827 s
odeint with suboptimal function (LSODA) took 7.76285 s
solve_ivp (RK45) took 9.19205 s
solve_ivp (RK45) with dense_output took 9.28966 s
RK45 with dense output took 9.04160 s
RK45 with manual resetting took 9.15578 s
@ghost
Copy link

ghost commented Jan 1, 2018

Have you profiled the source to determine what it taking up the most time?

@pv
Copy link
Member

pv commented Jan 2, 2018 via email

@Wrzlprmft
Copy link
Contributor Author

iirc, this is mostly python per-call overhead, as per my last profiling.

@pv @xoviat My profiling confirms this and this would match the symptoms.

@nmayorov
Copy link
Contributor

What can I say, it was written in Python relatively fast to provide features that people enjoy in Matlab. It is very far from ideal in efficiency for small problems, but I think it is sort of acceptable for typical scipy usage I think having code in Python also provides some value, it is easy to read and understand.

Sure let's try to improve things step by step. For example, Runge-Kutta methods are quite straightforward and can be rewritten to compiled code relatively easy.

@nmayorov nmayorov added scipy.integrate enhancement A new feature or improvement labels Jan 20, 2018
@ghost
Copy link

ghost commented Feb 9, 2018

I'm attempting to address this in gh-8386. However, I couldn't get the benchmark to work with either master or the cythonized version.

@ghost
Copy link

ghost commented Feb 9, 2018

With SciPy 1.0 and other dependencies installed, I saw the following error:

Python 3.6.3 |Anaconda, Inc.| (default, Nov  8 2017, 15:10:56) [MSC v.1900 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]:             0.0135*y(2)-0.02*y(3)
   ...:         ]),
   ...:     initial = np.array([1.,2.,3.,4.]),
   ...:     times = 2000+np.arange(0,100000,10),
   ...:     rtol = 1e-5,
   ...:     atol = 1e-8,
   ...: )
   ...:
   ...:
   ...: n, c, q = 100, 3.0, 0.2
   ...: A = choice( [1,0], size=(n,n), p=[q,1-q] )
   ...: omega = uniform(-0.5,0.5,n)
   ...: def kuramotos_f():
   ...:     for i in range(n):
   ...:         coupling_sum = sum(
   ...:                 sin(y(j)-y(i))
   ...:                 for j in range(n)
   ...:                 if A[j,i]
   ...:             )
   ...:         yield omega[i] + c/(n-1)*coupling_sum
   ...:
   ...: test_scenario(
   ...:     name = "random network of Kuramoto oscillators",
   ...:     fun = get_compiled_function(kuramotos_f),
   ...:     initial = uniform(0,2*np.pi,n),
   ...:     times = range(1,10001,10),
   ...:     rtol = 1e-13,
   ...:     atol = 1e-6,
   ...: )
   ...:
jitced.c
   Creating library jitced.cp36-win_amd64.lib and object jitced.cp36-win_amd64.exp
Generating code
Finished generating code
----------------------------------------
two coupled FitzHughNagumo oscillators
----------------------------------------
Call-back argument must be function|instance|instance.__call__|f2py-function but got NoneType.
ode (dopri5) took 0.00000s
---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
<ipython-input-1-169f9c9d0e1a> in <module>()
    112     times = 2000+np.arange(0,100000,10),
    113     rtol = 1e-5,
--> 114     atol = 1e-8,
    115 )
    116

<ipython-input-1-169f9c9d0e1a> in test_scenario(name, fun, initial, times, rtol, atol)
     31         I.set_integrator(solver_ode,rtol=rtol,atol=atol,nsteps=10**8)
     32         I.set_initial_value(initial,0.0)
---> 33         result = np.vstack(I.integrate(time) for time in times)
     34     assert I.successful()
     35

~\Miniconda3\lib\site-packages\numpy\core\shape_base.py in vstack(tup)
    232
    233     """
--> 234     return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
    235
    236 def hstack(tup):

~\Miniconda3\lib\site-packages\numpy\core\shape_base.py in <listcomp>(.0)
    232
    233     """
--> 234     return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
    235
    236 def hstack(tup):

<ipython-input-1-169f9c9d0e1a> in <genexpr>(.0)
     31         I.set_integrator(solver_ode,rtol=rtol,atol=atol,nsteps=10**8)
     32         I.set_initial_value(initial,0.0)
---> 33         result = np.vstack(I.integrate(time) for time in times)
     34     assert I.successful()
     35

~\Miniconda3\lib\site-packages\scipy\integrate\_ode.py in integrate(self, t, step, relax)
    430             self._y, self.t = mth(self.f, self.jac or (lambda: None),
    431                                   self._y, self.t, t,
--> 432                                   self.f_params, self.jac_params)
    433         except SystemError:
    434             # f2py issue with tuple returns, see ticket 1187.

~\Miniconda3\lib\site-packages\scipy\integrate\_ode.py in run(self, f, jac, y0, t0, t1, f_params, jac_params)
   1088     def run(self, f, jac, y0, t0, t1, f_params, jac_params):
   1089         x, y, iwork, istate = self.runner(*((f, t0, y0, t1) +
-> 1090                                           tuple(self.call_args) + (f_params,)))
   1091         self.istate = istate
   1092         if istate < 0:

error: failed in processing argument list for call-back fcn.

In [2]:

@ghost
Copy link

ghost commented Feb 15, 2018

cc @Wrzlprmft

Have you any idea why the benchmark isn't working?

@Wrzlprmft
Copy link
Contributor Author

Call-back argument must be function|instance|instance.call|f2py-function but got NoneType.

@xoviat: This is weird. Could be a problem of scipy.ode. Can you remove lines 29 to 34 in the benchmark (which I moved to a Gist) and try again?

If this fails as well, can you try to run all tests of jitcxde_common and jitcode and raise the issue at the first module which throws errors?

@ghost
Copy link

ghost commented Feb 15, 2018

Result without lines 29-34:

Generating code
Finished generating code
----------------------------------------
two coupled FitzHughNagumo oscillators
----------------------------------------
odeint with suboptimal function (LSODA) took 0.00000s
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-9a360840056b> in <module>()
    106     times = 2000+np.arange(0,100000,10),
    107     rtol = 1e-5,
--> 108     atol = 1e-8,
    109 )
    110

<ipython-input-1-9a360840056b> in test_scenario(name, fun, initial, times, rtol, atol)
     34                 y0=initial, t=[0.0]+list(times),
     35                 rtol=rtol, atol=atol,
---> 36                 mxstep=10**8
     37                 )
     38

~\Miniconda3\lib\site-packages\scipy\integrate\odepack.py in odeint(func, y0, t, args, Dfun, col_deriv, full_output, ml, mu, rtol, atol, tcrit, h0, hmax, hmin, ixpr, mxstep, mxhnil, mxordn, mxords, printmessg)
    213     output = _odepack.odeint(func, y0, t, args, Dfun, col_deriv, ml, mu,
    214                              full_output, rtol, atol, tcrit, h0, hmax, hmin,
--> 215                              ixpr, mxstep, mxhnil, mxordn, mxords)
    216     if output[-1] < 0:
    217         warning_msg = _msgs[output[-1]] + " Run with full_output = 1 to get quantitative information."

<ipython-input-1-9a360840056b> in <lambda>(y, t)
     28
     29
---> 30     inv_fun = lambda y,t: fun(t,y)
     31     with timer("odeint with suboptimal function (LSODA)"):
     32         result = odeint(

TypeError: 'NoneType' object is not callable

I'm running the tests now.

@ghost
Copy link

ghost commented Feb 15, 2018

@Wrzlprmft I'm struggling to create an environment that satisfies all of the dependencies required to run your script. I've attempted to build symengine.py from source to get ahead of the relevant issues, but have not been successful. Perhaps you could give some instructions to set-up the environment?

@Wrzlprmft
Copy link
Contributor Author

@xoviat I never encountered problems with that or had any reports along that line (otherwise I would have left instructions in the documentation). Can you please open a separate issue here detailing your problems, so we do not have to digress too much here?

My first guess is that the compilation fails but doesn’t throw an error for whatever reason. I vaguely recall that something like this happens with Jupyter and similar. Did you run it in a terminal?

@ghost
Copy link

ghost commented Feb 15, 2018

What I might do is wait for a symengine release and then come back to this.

@15940260868
Copy link

@Wrzlprmft hi, did you run the benchmark successfully? I run it with a problem of "TypeError: test_scenario() missing 6 required positional arguments: 'name', 'fun', 'initial', 'times', 'rtol', and 'atol'" Do you know why? Thank you

@Wrzlprmft
Copy link
Contributor Author

@15940260868 This doesn’t happen to me and I really do not see why it should happen. It’s almost certainly not a problem of SciPy or JiTCODE. Did you make any changes to the code or run it in a weird way?

@billtubbs
Copy link

billtubbs commented Nov 18, 2019

Would be great to see solve_ivp speeded up.

For my problem (Lorenz system for 5000 time steps):
odeint took 0.21s
solve_ivp (with method='LSODA') took 1.17s

Maximum difference in solutions: 5.4e-06

@syockit
Copy link

syockit commented Jan 5, 2020

@billtubbs For LSODA, try using scipy-1.4.1 (should be fixed with gh-9901)

I found that for the FitzHugh–Nagumo oscillators, dopri5 and solve_ivp(RK45) do not converge to the same solution. I tried looking at the trajectory (for time period 0-2000, instead of 2000-100000 as in the benchmark) and dopri5 shows a spike at around t=1800 before correcting itself to ivp(RK45)'s trajectory (but being late in phase)
oscillator

@billtubbs
Copy link

billtubbs commented Jan 5, 2020

Thanks. I just upgraded to scipy-1.4.1 and here are the results (again, on my simulated Lorenz system for 5000 timesteps).

Not much different:

Scipy: 1.3.1

  1. odeint: 93.1 ms
  2. solve_ivp(LSODA): 217.1 ms

Scipy: 1.4.1

  1. odeint: 98.1 ms
  2. solve_ivp(LSODA): 214.7 ms

It's not a very accurate test (using time.time()) but I also checked using the iPython %timeit function in a Jupyter notebook I get a similar difference - (89 ms vs 217 ms).

B.t.w. I'm not concerned that they converge to different solutions. The Lorenz system is divergent.

Code is here if you want to take a look.

@Nicholaswogan
Copy link

I wrote a wrapper to LSODA which has no overhead: https://github.com/Nicholaswogan/NumbaLSODA . Solving ODEs with this wrapper should be just as fast as pure C/C++. Below is a comparison with your sample lorenz system:

Scipy: 1.5.2

  1. odeint: 143.4 ms
  2. solve_ivp(LSODA): 310.7 ms
  3. NumbaLSODA: 3.6 ms
"""Comparison of scipy's integration functions.
"""
import time
from functools import partial
import numpy as np
import scipy
from scipy.integrate import odeint, solve_ivp

from NumbaLSODA import lsoda_sig, lsoda
import numba as nb

def lorenz_odes(t, y, sigma, beta, rho):
    """The Lorenz system of ordinary differential equations.
    Returns:
        dydt (tuple): Derivative (w.r.t. time)
    """
    y1, y2, y3 = y
    return (sigma * (y2 - y1), y1 * (rho - y3) - y2, y1 * y2 - beta * y3)

@nb.cfunc(lsoda_sig)
def lorenz_odes_nb(t, y_, dy, p_):
    y = nb.carray(y_,(3,))
    p = nb.carray(p_,(3,))
    y1, y2, y3 = y
    sigma, beta, rho = p
    dy[0] = sigma * (y2 - y1)
    dy[1] = y1 * (rho - y3) - y2
    dy[2] = y1 * y2 - beta * y3
funcptr = lorenz_odes_nb.address

print(f"Scipy: {scipy.__version__}")

dt = 0.01
T = 50
t = np.arange(dt, T + dt, dt)

# Lorenz system parameters
beta = 8 / 3
sigma = 10
rho = 28
n = 3

# Function to be integrated - with parameter values
fun = partial(lorenz_odes, sigma=sigma, beta=beta, rho=rho)

# Initial condition
y0 = (-8, 8, 27)

# Simulate using scipy.integrate.odeint method
# Produces same results as Matlab
rtol = 10e-12
atol = 10e-12 * np.ones_like(y0)
t0 = time.time()
y = odeint(fun, y0, t, tfirst=True, rtol=rtol, atol=atol)
print(f"odeint: {(time.time() - t0)*1000:.1f} ms")
assert y.shape == (5000, 3)

# Simulate using scipy.integrate.solve_ivp method
t_span = [t[0], t[-1]]
t0 = time.time()
sol = solve_ivp(fun, t_span, y0, t_eval=t, method='LSODA')
print(f"solve_ivp(LSODA): {(time.time() - t0)*1000:.1f} ms")

# Simulate using NumbaLSODA
data = np.array([sigma, beta, rho],dtype = np.float64)
y0_ = np.array(y0,dtype = np.float64)
t0 = time.time()
usol, success = lsoda(funcptr, y0_, t, data, rtol=rtol, atol=atol[0])
print(f"NumbaLSODA: {(time.time() - t0)*1000:.1f} ms")

@astrojuanlu
Copy link
Contributor

Related: https://github.com/hgrecco/numbakit-ode/

@astrojuanlu
Copy link
Contributor

Apart from that, even though #8386 by 2018-@/xoviat was closed, I wonder if there's still interest in bringing it back to life?

@slayoo
Copy link
Contributor

slayoo commented May 10, 2023

In an attempt to speed things up when working with solve_ivp, we've used Numba to JIT-compile parts of the ODE system. This works well, however not with all ODE solvers in SciPy. In particular, we've noticed that using the BDF method is impossible with Numba, most surprisingly merely using Numba in another parts of a project already makes the BDF solver behave non-deterministically. We've described the issue in the Numba issue tracker: numba/numba#8931

Mentioning here because perhaps some of you might have hints how to approach the problem. In short, using Numba in unrelated code makes NaN value occur in the the BDF solver essentially at random. Never noticed so far on macOS; one documented case on Windows; easy to catch on Linux. The referenced Numba issue includes a minimal reproducer with a Dockerfile as well as a Github Actions CI set up for the very purpose of depicting the issue: https://github.com/slayoo/numba_issue_8931/actions/

Help needed :)

@HannoSpreeuw
Copy link

@slayoo that's great to hear!
I am happy with using "Radau" or "LSODA", with a functional Jacobian, so is your speeded up version of solve_ivp available?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.integrate
Projects
None yet
Development

No branches or pull requests

10 participants