Cythonize scalar function root finders #9216

Closed
mikofski opened this issue Sep 3, 2018 · 21 comments

@mikofski
Contributor

mikofski commented Sep 3, 2018

@person142 responded to #8354 that his preference was for a Cython optimize API.

This came up before: #7242.

As you can see I was against the idea. Handling all reasonable halting conditions for a scalar solver is already a PITA, and I think the problem gets much worse for a vectorized scalar solver. IMO it is better to provide a Cython API to a scalar solver and let users handle looping over it.

#8357 only vectorized Newton methods. #8431 allows users to use a "tight loop" in C with Brent, Ridder, and other root finders.

We decided it was okay to split up Newton

Is it okay to split up Newton, Halley's, and secant into different calls in cython_optimize?

I am personally pro this (wish it had been done that way in optimize), but maybe others will feel more strongly that the APIs should match.

We discussed which callbacks to support, and the fact that cython_optimize should be pure C so it can release the GIL, in this comment.

This commit has links to annotated HTML showing that zeros_struct and zeros_array are pure C (no Python), so they can release the GIL and be used with Cython's prange.
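
For context, annotated HTML like that can be regenerated with Cython's annotate mode; a minimal sketch, assuming the .pyx sources are available locally:

# writes zeros_struct.html, highlighting lines that interact with Python
from Cython.Build import cythonize
cythonize("zeros_struct.pyx", annotate=True)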

The Cython optimize API in #8431 was discussed in this SciPy-Dev post and this one too when I specifically asked about callback signatures.

@pvanmulbregt
Contributor

If I understand correctly then:
gh-8354 (Issue) ENH: "vectorize" newton and other scalar methods
gh-8357 (PR) ENH: vectorize scalar zero-search-functions
At one stage a reviewer expressed a preference for a different approach, using a Cython optimize API. The PR gh-8357, which used vectorization and not a Cython optimize API, was accepted & merged.
gh-8431 (New PR) ENH: Cython optimize zeros api

What is the implication of the new PR? Are you now saying that vectorization is not the right approach and that it should be replaced with a Cython optimize API?

@mikofski
Contributor Author

mikofski commented Sep 4, 2018

What is the implication of the new PR?

PR #8431 is not new; it was opened in February, around the same time as #8357.

Are you now saying that vectorization is not the right approach and that it should be replaced with a Cython optimize API?

No, I still think that for newton, using NumPy was appropriate: since it is pure Python, it was amenable to vectorization. Also, being a Python function, _array_newton is accessible to anyone without using Cython or compiling C source.

However, in pvlib-python, for example, brentq is also used, and currently it is not vectorized. For the remaining root-finding functions (brentq, brenth, ridder, bisect), I agree with the original reviewer that a Cython optimize API is the right approach.

Also, the Cython optimize API serves other purposes in addition to just vectorizing the calculations. For example, using Cython it is trivial to parallelize the calculation with low overhead using prange, provided the caller is pure C with no Python, such as the zeros_struct and zeros_array solvers. And the Cython optimize API would allow using scipy.LowLevelCallable, which I had hoped to work on next.

Are you against adding the Cython optimize API? Is anyone in favor of this? I worry that I have wasted my time, especially if there is no need or desire for these features other than my own.

@rgommers
Member

rgommers commented Sep 4, 2018

Are you against adding the Cython optimize API? Is anyone in favor of this? I worry that I have wasted my time, especially if there is need or desire for these features, other than my own.

I'm not against it. The existing Cython APIs, especially the linalg one, have been very successful. However, we do need a clean API, and we need to make sure that this is not going to be a maintenance burden or block new features in the future. Currently the API isn't looking right, and even reviewing it is going to be a lot of work. Hence the reluctance.

Would be good to get a sense of what others think of cython_optimize in principle (not the current API) - @person142, @pv, other optimize-interested devs?

@andyfaff
Contributor

andyfaff commented Sep 4, 2018 via email

@mikofski
Contributor Author

mikofski commented Sep 4, 2018

@andyfaff thanks for your feedback. I copied the integrate.quad signatures, based on your suggestion to implement LowLevelCallable, specifically:

double func(double x, void *user_data) # 1
double func(int n, double *xx) # 2

My goal, as I mentioned in this comment, was to use cython_optimize.zeros to dispatch the LowLevelCallable based on its signature to either zeros_struct (1) or zeros_array (2), but also to leave those exposed in the Cython API for direct use in a pure-C, no-Python use case where nogil is desired. I was hoping to do this in two separate PRs: complete the Cython API first, then add the dispatcher for LowLevelCallable or Python callables.
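
A rough sketch of that dispatch idea, for illustration only; the _solve_* helpers are hypothetical, but scipy.LowLevelCallable really does expose the C signature string:

from scipy import LowLevelCallable

def zeros(callback, *args, **kwargs):
    # route a LowLevelCallable to the matching solver by its C signature
    if isinstance(callback, LowLevelCallable):
        if callback.signature == "double (double, void *)":
            return _solve_struct(callback, *args, **kwargs)  # hypothetical helper
        if callback.signature == "double (int, double *)":
            return _solve_array(callback, *args, **kwargs)   # hypothetical helper
        raise ValueError("unsupported signature: " + callback.signature)
    return _solve_python(callback, *args, **kwargs)  # hypothetical fallback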

Also, I tried to keep the solvers' signatures identical to their Python counterparts, hence func, x0, fprime, ... for newton, in which the function and its derivatives are separate arguments. The signatures for brentq, brenth, ridder, and bisect are the same as their Python versions'. The only signatures that differ slightly are secant and halley, which I've separated from newton as discussed in this comment.

@ev-br
Member

ev-br commented Sep 4, 2018

Would be good to get a sense of what others think of cython_optimize in principle (not the current API)

IMO it's very valuable.
There's a question of scope, of course: not sure about the whole optimize, but Cython root-finders, emphatically yes.

I would definitely use Cython brentq, the other scalar root-finders might be lower priority.

@pvanmulbregt
Contributor

@mikofski Please take my questions as my attempt to understand what the issue is and what you are proposing; I'm not expressing an opinion. Procedurally, any discussion of this Issue and PR should be in the context of today's SciPy, which has two implementations of newton, rather than the SciPy of January, which had one. The PR gh-8431 would then add a third implementation.

What I'd like to see are the use cases, how they are handled in today's SciPy, what the issue is with using current implementations, how that might be changed, and then the benefit(s) of such change. That's why I was asking for a new Issue to be filed, to make all that explicit in today's context.

@andyfaff
Contributor

andyfaff commented Sep 4, 2018 via email

@mikofski
Contributor Author

mikofski commented Sep 5, 2018

PR #8431 does not attempt to modify any of the minimizers, sorry; it covers only the scalar-function root finders. In addition, it does not currently address LowLevelCallable, although I would like to in a follow-on PR.

The use cases for a Cython optimize API are twofold:

  1. enable brentq and the other scalar-function root finders to solve the same callback with over 100,000 different sets of arguments, by binding them to C using Cython. Although newton and its siblings were vectorized in ENH: vectorize scalar zero-search-functions #8357, the other root finders such as brentq were not, because they are written in C and are not as easily vectorized; an alternative is to use Cython.
  2. enable all of the scalar-function root finders to take advantage of the benefits of a C binding via Cython, such as:
    • the ability to use LowLevelCallable, which requires a C binding, for the purpose of using non-Python callbacks
    • the ability to use prange for embarrassingly parallel calculations in a loop with the same callback and over 100,000 different sets of arguments (see the sketch after this list)
    • the ability to use non-Python memoryviews to access array elements, so that the compiler can optimize loops and increase the performance of solving the same callback with over 100,000 different sets of extra arguments
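
A minimal sketch of the prange use case, assuming the zeros_struct.brentq signature proposed in #8431 and that a NULL full-output pointer is accepted; the my_params struct and the callback f are made up for illustration:

import cython
import numpy as np
from cython.parallel cimport prange
from libc.math cimport exp
from libc.stdlib cimport malloc, free
from scipy.optimize.cython_optimize cimport zeros_struct

ctypedef struct my_params:
    double C0
    double C1

@cython.cdivision(True)
cdef double f(double x, void *args) nogil:
    # hypothetical callback: the same function solved for many parameter sets
    cdef my_params *p = <my_params *> args
    return p.C0 - exp(x) / p.C1

@cython.boundscheck(False)
@cython.wraparound(False)
def solve_many(double[::1] c0, double[::1] c1):
    cdef Py_ssize_t i, n = c0.shape[0]
    cdef double[::1] roots = np.empty(n)
    cdef my_params *ps = <my_params *> malloc(n * sizeof(my_params))
    try:
        for i in prange(n, nogil=True):
            # one parameter struct per problem, so threads never share state
            ps[i].C0 = c0[i]
            ps[i].C1 = c1[i]
            roots[i] = zeros_struct.brentq(f, 0.5, 1.0, &ps[i],
                                           1e-3, 1e-3, 10, NULL)
    finally:
        free(ps)
    return np.asarray(roots)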

For example, pvlib-python uses brentq to solve for the maximum power point in the single-diode model of a solar cell. In a typical simulation of a PV system at 15-second intervals, there could be on the order of 1e6 daytime time steps, each needing a unique solution. Other approaches such as caching/memoizing and interpolating are possible, but their overhead may be greater than that of vectorizing, embarrassingly parallel loops, or compiler optimizations.

There may be other use cases, and a few individuals have already stated that it may be useful. Also, based on the success of other Cython APIs such as linalg and special, a Cython optimize API might also be successful.

cc: @andyfaff and @pvanmulbregt

@mikofski
Contributor Author

mikofski commented Sep 5, 2018

today's SciPy, which has two implementations of newton, rather than the SciPy of January which had one. The PR #8431 would then add a 3rd implementation.

@pvanmulbregt are you concerned that there are too many implementations of newton?

@pvanmulbregt
Contributor

I'm concerned about the number of sub-case specializations of newton. Each new sub-case introduces additional maintenance and reduces usage (hence reliability) of the existing implementations.

The title of the Issue, "Cythonize scalar function root finders", is implementation-centric rather than user-centric. It's also a little ambiguous on one key point: whether the Cythonized root finders will replace the existing Python/C root finders. (The PR gh-8431 doesn't replace them; it adds new ones handling a subset of functions.) The discussion may go in different directions based on this point, so I'll ask it now.
Is it your intent that the Cythonized versions replace the existing versions?

@mikofski
Contributor Author

mikofski commented Sep 6, 2018

No, I do not intend or expect the Cython versions of any of the zero-finders to replace the existing Python APIs. Using the Cython versions requires a non-trivial effort by the user to write and compile Cython code, plus additional software such as the Cython package and a C compiler, none of which is needed by users who only wish to use the SciPy Python APIs.

The usage of the Cython API is fundamentally different from the Python API:

Python API
>>> from math import exp
>>> from scipy.optimize import brentq

>>> brentq(lambda x, c0, c1: c0 - exp(x) / c1, a=0.5, b=1.0, args=(1.0, 2.0),
...        xtol=0.001, rtol=0.001, maxiter=10, full_output=True)
(0.68314716056,
       converged: True
            flag: 'converged'
  function_calls: 6
      iterations: 5
            root: 0.68314716056)

A similar calculation using the Cython API would probably not be worth the effort if there were not significant performance improvements.

Cython API
import cython
from libc.math cimport exp
from scipy.optimize.cython_optimize cimport zeros_struct

ctypedef struct test_full_output:
    int funcalls
    int iterations
    int error_num
    double root

ctypedef struct test_params:
    double C0
    double C1

@cython.cdivision(True)
cdef double f(double x, void *args):
    cdef test_params *myargs = <test_params *> args
    return myargs.C0 - exp(x) / myargs.C1

cdef test_full_output show_full_output(dict args, double xa, double xb,
                                       double xtol, double rtol, int mitr):
    cdef test_params myargs = args
    cdef test_full_output full_output
    full_output.root = zeros_struct.brentq(f, xa, xb, <test_params *> &myargs,
        xtol, rtol, mitr, <zeros_struct.scipy_zeros_parameters *> &full_output)
    return full_output

def test_cython_brentq(args={'C0': 1.0, 'C1': 2.0}, xa=0.5, xb=1.0,
                       xtol=1e-3, rtol=1e-3, mitr=10):
    """test Cython brentq with full output"""
    return show_full_output(args, xa, xb, xtol, rtol, mitr)

After compiling the Cython code, in Python ...

>>> test_cython_brentq()
{'funcalls': 6,
 'iterations': 5,
 'error_num': 0,
 'root': 0.68314716056}
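
For completeness, a typical build sketch, assuming the example above is saved as test_brentq.pyx; any standard Cython build approach works:

# setup.py
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("test_brentq.pyx"))

Built in place with, e.g., python setup.py build_ext --inplace.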

I'm also concerned about the duplicate implementations of Newton. One idea that could combine the various Newton implementations would be to implement a Python extension module, based on the current extension module for brentq, ..., in zeros.c, replacing the Python implementation and calling the C source code for Newton, secant, and Halley; IMHO that would be a different PR.

@rgommers
Member

rgommers commented Sep 6, 2018

No, I do not intend or expect the Cython versions of any of the zero-finders to replace the existing Python API[s]

@mikofski the question was not about the API; it's clear that any changes there need to be backwards compatible. It's about the implementations, i.e. will there be duplication of implementation between Python, Cython, and C?

One idea that I think could possibly combine the various Newton implementations would be to implement a Python extension module, based on the current Python extension module for brentq, ..., in zeros.c to replace the Python implementation and call the C source code for Newton, secant, and Halley, which IMHO would be a different PR.

Yes that sounds better. If you have a Cython API for newton, then you'd want one of:

  • 1 implementation in C, and exposing that from Cython and Python
  • 1 implementation in Cython, and calling that from Python (preferred if current implementation is Python-only)
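
A minimal sketch of the first option, with hypothetical names: one C routine (newton_c, assumed declared in a header), a cimport-able declaration for Cython users, and a thin Python wrapper that funnels a Python callable through the void * user-data slot:

cdef extern from "zeros.h":
    # hypothetical single C implementation of the Newton iteration
    ctypedef double (*callback_type)(double x, void *args)
    double newton_c(callback_type f, double x0, void *args,
                    double tol, int maxiter)

cdef double _py_trampoline(double x, void *data):
    # recover the Python callable stashed in the void * slot (GIL is held);
    # error propagation is omitted in this simplified sketch
    return (<object> data)(x)

def newton_py(pyfunc, double x0, double tol=1.48e-8, int maxiter=50):
    # Python-facing wrapper: same C implementation, Python callback via trampoline
    return newton_c(_py_trampoline, x0, <void *> pyfunc, tol, maxiter)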

@mikofski
Contributor Author

mikofski commented Sep 6, 2018

I.e. will there be duplication of implementation between Python, Cython, and C?

Yes, unfortunately #8431 currently has duplicate implementations that serve different purposes:

  • a new C version that is only used for the Cython API, but could potentially be used for a Python API, either by adding a Python extension module or by exposing it via a wrapper around the Cython version; the second option is perhaps easier than the first?
  • the original Python version, which is only used for the Python API. I believe it could potentially be used for the Cython API, but IMO that would defeat some of the purpose of creating a Cython API: I think it would be slower than a typed version in C, it would take a lot of work, and it might require changes that are too big 😦

Yes that sounds better. If you have a Cython API for newton, then you'd want one of:

  • 1 implementation in C, and exposing that from Cython and Python

Based on my answer above, I think the first suggestion would be best, but I realize that there are many other opinions from people with much more experience and knowledge than me, and that this is a decision for the maintainers to make.

One nice outcome of having a single implementation in C, exposed from both Cython and Python, is that we could probably remove the _array_newton Python version while keeping the same API for sequences. Although it would require some thought to implement well, I think it could be more efficient than the current implementation. Another nice outcome is that these functions could take scipy.LowLevelCallable if implemented in C.

Another option to consider would be to not Cythonize Newton, Halley, and secant, and to only Cythonize Brent, Ridder, and bisect, since those are already implemented in C and exposed via Python. Then there would be no additional implementations of Newton, Halley, or secant, and we would delete the C implementation in the PR.

@pvanmulbregt
Contributor

Thanks for sharing your real-world use case of solving solar cell equations. It helped clarify for me some aspects of the computation costs. Some long thoughts follow.

My naive model of the computation is that newton takes time approximately
#iterations * (NewtonStepTime + FuncCallOverhead + TIME(FuncAndDerivative))
(similarly for brentq, ridder, ...).

Computers are much faster now than they were 40 years ago :-) when brentq was developed, and #iterations is on the order of 10-100 for well-behaved problems.

  1. If solving a single root takes a "long time", then it must be consuming cycles in FuncAndDerivative. The user could write the function in C or Cython, or decorate a Python function with numba.jit, but modifying newton won't have much overall impact.

When solving multiple equations, my (still naive) model is
#equations * (#iterations * (NewtonStepTime + FuncCallOverhead + TIME(FuncAndDerivative)))

  2. Solving a single function but from many starting points (or starting intervals).
    Your previous PR ENH: vectorize scalar zero-search-functions #8357 vectorized newton so that it could simultaneously find the zeros of a single function from many starting points, essentially mitigating the impact of #equations by reducing the overhead of NewtonStepTime/FuncCallOverhead, and possibly of FuncAndDerivative as well. The only change to the newton API was to accept a vector of x0 values.

  3. The solar-cell example is a case of a single, parameterized function, simultaneously solved for many (parameters=[v, il, io, rs, rsh, vt], starting_point=x0) pairs.
    This particular solar-cell function has one additional property: it only involves a few additions/multiplications and an exp. Rewriting it in C/Cython/numba.jit will make its cycles much smaller in comparison with the Python NewtonStepTime and FuncCallOverhead. Replacing the Python newton infrastructure with Cython would reduce NewtonStepTime and FuncCallOverhead, without any parallelization.
    [An alternative approach might be to vectorize this just as in ENH: vectorize scalar zero-search-functions #8357, both for x0 and the parameters, and let numpy handle the broadcasting; see the sketch after this list.]

  4. The last case is solving many unrelated equations.
    No obvious parallelization gains.
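
A sketch of that broadcasting alternative, using the vectorized newton from gh-8357 (SciPy >= 1.2); the function and coefficients are made-up stand-ins for the solar-cell equation:

import numpy as np
from scipy.optimize import newton

def f(x, c0, c1):
    # written with numpy so it broadcasts over arrays of parameters
    return c0 - np.exp(x) / c1

n = 100000
x0 = np.full(n, 0.7)                  # one starting point per problem
c0 = np.random.uniform(0.8, 1.2, n)   # per-problem parameters
c1 = np.random.uniform(1.5, 2.5, n)
roots = newton(f, x0, args=(c0, c1))  # array x0 selects the vectorized path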

[This does perhaps raise the question of what will most benefit the user: a faster implementation that solves one equation (helps cases 1-4 a little), or an implementation that solves multiple equations simultaneously (may help cases 2 & 3 a lot)? But I suspect the stopping criteria for the latter are not as simple as for a single equation.]

Having now said all that, my ideal scenario would be one where there is a single implementation of the algorithm in SciPy, and the user sees the benefit without having to change the way they build the function; i.e., they could write it in Python/Cython/... and SciPy would be agnostic. And SciPy would have a single implementation to maintain.

@charris
Member

charris commented Sep 7, 2018

I would definitely use Cython brentq, the other scalar root-finders might be lower priority.

If I were doing this today, I would only implement brentq and bisect. Back when I wrote these (in 2002?), I just threw in everything, just because.

@ev-br
Member

ev-br commented Sep 9, 2018

@charris (Having read Brett Cannon's blog post, I feel compelled to clarify, just in case.) My comment was only meant to imply that, IMO, it became clear over time that brentq turned out to be so robust a magic-black-box-well-oiled-machine of a routine that it just overshadows the other ones. IOW, thanks a lot for writing it!

@mikofski mikofski changed the title Cythonize scalar function root finders Collapse scalar and array newton functions together and eliminate dual codebase Oct 27, 2018
@pvanmulbregt
Contributor

@mikofski The change of title suggests a quite different topic than the original. My suggestion would be to restore the old title and then create a new Issue to cover the new topic. If the original topic is no longer an Issue, also close it out. That way all the existing comments, which were presumably written with the original topic in mind and are mostly relevant to that topic, do not clog up the new Issue. Additionally a new Issue is likely to draw new thoughts and comments more than a title change in an existing Issue!

@mikofski mikofski changed the title Collapse scalar and array newton functions together and eliminate dual codebase Cythonize scalar function root finders Nov 7, 2018
@mikofski mikofski closed this as completed Nov 7, 2018
@rgommers
Member

rgommers commented Nov 8, 2018

@mikofski did you mean to close this issue?

After reading through the whole thing again, my summary is:

It's still hard to weigh the pros and cons of each approach, given the multiple use cases. We should try and write a summary document together I think, because my head hurts ....

@mikofski
Contributor Author

mikofski commented Nov 8, 2018

I closed it because I thought #7242 explained the need better, whereas this issue seems to justify the implementation. I thought closing this might be helpful, but I'm happy to reopen it if that's not the case.

@rgommers
Member

Okay, let's leave this one closed then. There's still useful content here, I'll cross-link from gh-7242.
