
ENH: vectorize scalar zero-search-functions #8357

Merged
rgommers merged 57 commits into scipy:master from mikofski:vectorize_newton on Jun 25, 2018

Conversation

@mikofski (Contributor) commented Feb 2, 2018

Fixes #8354

  • add test for newton with arrays
  • import asarray and np.abs
  • check any of the derivatives for zero, cast fder to array
  • check all newton step deltas, use numpy abs to cast (p-p0) to array

Signed-off-by: Mark Mikofski <bwana.marko@yahoo.com>
@nmayorov (Contributor) commented Feb 2, 2018

Hi!

I suggest changing the title to explain what kind of change this is (BUG, ENH, etc.); just look at the recent pull requests. And move the references to the issues into the description, so everyone can follow the links (they are not useful in the title).

@ilayn changed the title from "fixes #8354 and addresses #7242 - vectorize scalar zero-search-functions" to "WIP: BUG: ENH: vectorize scalar zero-search-functions" Feb 2, 2018
@ilayn (Member) commented Feb 2, 2018

I took the liberty of doing that, actually, and also GitHub-linked the issue in question.

* make pep8 happy
* instead of just at testing, so that halley and secant are easier
    - remove asarray(fder == 0).any() when testing if any fder is zero,
      just use (fder == 0).any() since fder is now an ndarray by default
    - still need np_abs(p - p0) because there is no .abs() method for
      ndarrays
* convert initial guess, p0 -> ndarray using asarray(1.0 * p0)
* convert fder -> ndarray using asarray(fprime(*myargs))
* convert fval -> ndarray using asarray(func(*myargs))
* just compare max newton-step instead of all of them,
  eg: abs(p-p0).max() < tol
* use the non-variant form of halley's,
  eg: dx = - 1 / (fder/fval - fder2/fder/2 - fder(N+1)/fderN/N ...)
* use boolean indexing for parabolic halley's variant, but comment out
* add test for arrays in newton for secant
* remove sign import, but import where from numpy
* replace 1e-4 with machine specific delta as finfo(float).eps**0.33
* convert x0 to ndarray using asarray
* use where to make array of +dx where x0>=0 else -dx
* combine if - else conditions into 1-line
* convert q0 and q1 to ndarray using asarray
* add comment "check for divide by zero"
* use .any() on arrays to check for divide by zero
* use np_abs for all newton steps and just check the .max()
* convert q1 if re-evaluated to ndarray using asarray
@mikofski changed the title from "WIP: BUG: ENH: vectorize scalar zero-search-functions" to "BUG: ENH: vectorize scalar zero-search-functions" Feb 3, 2018
@mikofski (Author) commented Feb 3, 2018

Hi @person142, @ilayn, @rgommers, @nmayorov, @andyfaff: OK, I think this is ready for review. Can you please review these changes or suggest reviewers, and consider merging it into master? I look forward to your feedback. Thanks!

@andyfaff (Contributor) commented Feb 3, 2018

@mikofski thank you for your eagerness to contribute. For new features like this it's advisable to post on the scipy-dev mailing list to begin with, so there can be wider discussion about whether the new feature is desired, and possible design, before the development work is started.
For my part I think the functionality that's being added here is better suited to a short wrapper (e.g. map, multiprocessing.Pool.map) which will do any vectorisation required.
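For illustration, a minimal sketch of that wrapper idea (the names and the example equation here are made up, not from this PR):

import numpy as np
from multiprocessing import Pool

from scipy import optimize

def f(x, a):
    # one scalar equation per coefficient a: x**3 - a = 0
    return x**3 - a

def solve_one(a):
    # plain scalar newton call, mapped over the coefficients
    return optimize.newton(f, 1.0, args=(a,))

if __name__ == '__main__':
    coefficients = np.linspace(0.5, 4.0, 1000)
    with Pool() as pool:
        roots = pool.map(solve_one, coefficients)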

@mikofski (Author) commented Feb 3, 2018

Thanks @andyfaff. I've posted a comment with a link on the scipy-dev mailing list, as you suggested. IMO multiprocessing will add unnecessary overhead, but I'll do some benchmarking to demonstrate this. Also IMO a much easier approach is to use numpy arrays: I only had to make a few changes to use them, and I also improved the Halley method, addressing #5922, and removed several if-else statements, which IMO simplifies the codebase; the new code is 5 lines shorter. I've added new tests for arrays using all 3 methods (Newton, Halley, and secant), and all tests pass.

@mikofski (Author) commented Feb 4, 2018

@andyfaff here is the benchmark of mp versus numpy arrays:

TL;DR: comparison of multiprocessing vs numpy arrays

Using NumPy arrays is about 200x faster than using multiprocessing.

method | multiprocessing                    | numpy
-------|------------------------------------|----------------------------------------
newton | 1 loop, best of 3: 2.66 s per loop | 100 loops, best of 3: 13.9 ms per loop
halley | 1 loop, best of 3: 4.25 s per loop | 10 loops, best of 3: 19.3 ms per loop
secant | 1 loop, best of 3: 2.15 s per loop | 100 loops, best of 3: 11.1 ms per loop

Here are the code snippets:

I ran these from an interpreter and timed them using the %timeit magic in IPython.

>>> from benchmarking_newton import bench_mp_newton, bench_mp_secant, bench_mp_halley
>>> %timeit bench_mp_newton()
# 1 loop, best of 3: 2.66 s per loop

>>> %timeit bench_mp_halley()
# 1 loop, best of 3: 4.25 s per loop

>>> %timeit bench_mp_secant()
# 1 loop, best of 3: 2.15 s per loop
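For comparison, a sketch of what the numpy-array side might look like with this branch (the actual benchmarking_newton module isn't shown here, so the coefficients and names are made up):

import numpy as np
from scipy import optimize  # this PR's branch, where newton accepts arrays

a = np.linspace(0.1, 2.0, 100000)  # same equation form, different coefficients

def f(x, a):
    return x**3 - a

def fprime(x, a):
    return 3 * x**2

def bench_np_newton():
    # one vectorized call solves all 100000 problems at once
    return optimize.newton(f, np.ones_like(a), fprime=fprime, args=(a,))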

Then I checked the output to make sure it was consistent, then repeated the process with the other (numpy) snippet.

Of course, this is probably partly because I have a 2011 Mac with an Intel 2.3 GHz i5 and 16 GB of 1333 MHz DDR3 RAM, but I think that just underscores my reason to prefer numpy arrays over mp for this application.

* use assert_warns context to catch runtime warning
* don't return anything
* comparing a sequence with a scalar is always False, so use p0
  instead of x0; since p0 is an array, it can be compared
* also can't multiply a sequence by a float, so use asarray(x0) * 1.0 to
  convert to floats
* just p0 instead of x0 in secant
* need to find just where the denominator is zero but the numerator is
  still non-zero, that's the only places that should issue a warning.
* fix assert_warns, don't use context, remove test for secant, just
  check zero derivative
@mikofski (Author) commented Feb 4, 2018

Hi @person142, @ilayn, @rgommers, @nmayorov, @andyfaff,
So sorry, I guess I jumped the gun a bit, but all tests really are passing now, and I fixed a few edge cases as well. I really think this is in good shape now. I hope the benchmarks above make a strong argument for why numpy arrays are better than mp for this particular use case. I would appreciate your comments, and I hope you will consider merging this. I started a thread on the scipy-dev mailing list, and although there hasn't been much chatter, my collaborators on pvlib-python (a solar-industry and Sandia National Labs collaborative open source project) are very interested in seeing this implemented. There is also another issue, #7242, with a similar request, and a bug, #5922, re: the parabolic Halley's method, that I think this PR closes. Also, this PR removes around 4 lines, or about 2%. 😉
Best Regards,
Mark Mikofski

PS @bmeyers , @scopatz , @jakevdp , @katyhuff , @tkelman maybe could you take a look at this and comment? thx!

@tkelman left a comment:
I'm not too familiar with this code, but I guess vectorizing roughly works here if you're wanting to simultaneously solve many independent univariate problems. I'd personally throw a multivariate constrained solver at the problem (e.g. ipopt via pyomo) but that's my solution to everything.

- fder = fprime(*myargs)
- if fder == 0:
+ fder = asarray(fprime(*myargs))  # convert to ndarray
+ if (fder == 0).any():
@tkelman:

wouldn't you need to keep iterating the other elements?

@mikofski (Author):

You could, but IMO you really don't need to. I think it requires indexing which IMO makes the code a little messy. Do you feel very strongly that it should keep iterating the other cases? I'm open to changes.

@tkelman:

you'd exit early upon hitting a problematic termination condition for any element, even if remaining elements are far from convergence. but I'd defer to a scipy maintainer's opinion here, since non-scalar usage would be new functionality (right? does the released code error on arrays?)

@mikofski (Author):

Yes. See the traceback in #8354.

fder2 = asarray(fprime2(*myargs)) # convert to ndarray
# Halley's method
# https://en.wikipedia.org/wiki/Halley%27s_method
p = p0 - 2 * fval * fder / (2 * fder ** 2 - fval * fder2)
@tkelman:

is removing the switch changing the scalar behavior here?

@mikofski (Author):

No. This switch is particular to the "parabolic" variant of Halley's method; see the link in #5922. I removed it because branching requires indexing, which IMO makes the code a little messy. Using the traditional Halley's method (see the Wikipedia link in the code) removes the need for branching and IMHO makes the code cleaner. Does this answer your question?
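For reference, the update in the snippet above is the standard Halley iteration (in LaTeX form):

x_{n+1} = x_n - \frac{2 f(x_n)\, f'(x_n)}{2\, f'(x_n)^2 - f(x_n)\, f''(x_n)}

which matches the code line p = p0 - 2 * fval * fder / (2 * fder ** 2 - fval * fder2), with fval = f(p0), fder = f'(p0), and fder2 = f''(p0).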

@tkelman:

Gotcha. Guess I'd avoid coupling behavior changes for the scalar case to the feature addition of vectorization (but again my opinion here matters less than the maintainers'). The two changes could be done separately, one but not the other, or both, though vectorizing the modified version is simpler as you say.

A SciPy member:

> Guess I'd avoid coupling behavior changes for the scalar case to the feature addition of vectorization

I agree 100% with this. Let's stick to one issue at a time. (The modification looks more prone to overflow because of the fder**2, so it's not clear the changes are a win.)

@mikofski (Author):

The current version (as of scipy-1.0.0) with the "parabolic" variant of Halley's already has fder**2 on L183, so if it is likely to overflow, then it would have already.

I agree it would be more explicit for future maintainers if we moved the switch from the parabolic variant of Halley's to the more widely accepted form into a separate PR. However, it's a prerequisite for this PR, because the more common version of Halley's doesn't require any branching, which I'm really hoping to avoid here. So should I open another PR to propose switching out the parabolic variant of Halley's?

- if p1 != p0:
-     msg = "Tolerance of %s reached" % (p1 - p0)
+ divide_by_zero = (q1 == q0)
+ if divide_by_zero.any():
@tkelman:

here as well, wouldn't it be better to continue for all other elements?

@mikofski (Author):

See my previous answer in the newton branch. My ethic was to make the smallest changes possible while adding some new features. If users demand more features, then I think we should discuss the trade-offs.

A SciPy member:

The halting conditions are the biggest thing to think about here. This way is simple, but I'm sure that someone is going to yell at us in the future about evaluating their callback function too many times. Might be worth seeing how messy the indexing would look.
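For illustration, a rough sketch of what that indexing might look like (hypothetical, not the PR's code; assumes 1-D float input and elementwise func and fprime):

import numpy as np

def masked_newton(func, fprime, x0, tol=1.48e-8, maxiter=50):
    # keep iterating only the elements that haven't converged or failed
    p = np.asarray(x0, dtype=float).copy()
    active = np.ones(p.shape, dtype=bool)
    for _ in range(maxiter):
        idx = np.flatnonzero(active)  # positions still being iterated
        fval = func(p[idx])
        fder = fprime(p[idx])
        nonzero = fder != 0
        step = np.zeros_like(fval)
        step[nonzero] = fval[nonzero] / fder[nonzero]
        p[idx] = p[idx] - step
        # retire elements that took a small enough step or hit f' == 0
        done = (np.abs(step) < tol) | ~nonzero
        active[idx[done]] = False
        if not active.any():
            break
    return p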

@mikofski (Author) commented Feb 7, 2018

Thanks Tony! Great feedback! I had already thought of using hybrj or hybrd via scipy.optimize.root or fsolve, but I didn't think of ipopt. The multivariate methods do work, using a sparse Jacobian with the derivatives on the diagonal, but surprisingly they are still not as fast as using numpy arrays. But I will give ipopt a try.
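A sketch of that multivariate approach (coefficients made up; a dense diagonal Jacobian is used here for simplicity, even though hybr doesn't exploit the sparsity):

import numpy as np
from scipy import optimize

a = np.linspace(0.1, 2.0, 1000)  # per-problem coefficients

def F(x):
    # 1000 independent residuals stacked into one system
    return x**3 - a

def J(x):
    # independent equations -> diagonal Jacobian
    return np.diag(3 * x**2)

sol = optimize.root(F, np.ones_like(a), jac=J, method='hybr')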

@mikofski (Author) commented Feb 7, 2018

PS @tkelman and others, this PR is in support of pvlib-python issue 408 in case you want to comment or contribute.

@mikofski (Author) commented Feb 7, 2018

@tkelman, see this comment for a comparison that includes multivariate methods root and fsolve.

@mikofski (Author):

Hi @person142 ,

Are you the maintainer for scipy.optimize? I know you are very busy and volunteering your time. But may I please ask if there is any news on this pull request or the related issue #8354? I greatly appreciate your consideration, and I thank you for your time.

Best Regards

@person142 (Member):

> Are you the maintainer

No, we don't really have dedicated maintainers, just people who work more or less on various modules.

> may I please ask if there is any news on this pull request or the related issue

I guess I already laid out my personal preferences, but it seems there's a demand here and I don't want to stand in the way of that. I can take a look at the code, though I would note that there are other people around more qualified to review optimize than me.

@mikofski (Author):

Thank you very much for your response @person142. I really appreciate your time, and I look forward to your review. If there's anyone else who you think should review this PR, please let me know.

I am also working on the cython_optimize branch that you suggested, and I'll try to get a PR up ASAP. In fact, @wholmgren has already started work on it, and I'm merging his work into my branch. In general, I agree with your concerns, although specifically for newton I think that numpy arrays are the efficient way to go, for a couple of reasons:

  1. numpy arrays, I believe, make better use of the CPU cache and SIMD registers, because they perform one operation on the entire array at a time, versus all of the operations on one item at a time.
  2. newton is fairly simple, with almost no branching: the only conditions decide which method to use (newton, halley, or secant), and they do not depend on individual array elements, so IMO the code is not complex to vectorize or confusing to read. In other words, I think newton is very similar to your np.sin(x) + np.cos(x) example of great code to vectorize (see the snippet below).
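A minimal illustration of the array API this PR adds (output approximate; with no fprime given, the secant method is used):

>>> import numpy as np
>>> from scipy import optimize
>>> optimize.newton(lambda x: x**2 - np.array([1., 4., 9.]), np.full(3, 1.5))
array([1., 2., 3.])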

Thanks again!

@mikofski (Author):

@rgommers I've merged the fix for #8904 (check if fval is zero before fder). Do you think this PR can be merged now? Thanks!

@rgommers (Member):

Thanks @mikofski

If "vector newton" goes as its own specialized piece of code, that is one thing. If it goes in and then "vector newton" and "scalar newton" have to be kept in sync, ...

@pvanmulbregt IIRC there was some more conversation after your last comment above in this PR discussion. Can't find it so quickly - where are we now on this?

@rgommers dismissed jaimefrio's stale review June 17, 2018 19:20: "@jaimefrio's comments seem to have been addressed long ago"

@pvanmulbregt (Contributor):

If "vector newton" goes as its own specialized piece of code, that is one thing. If it goes in and then "vector newton" and "scalar newton" have to be kept in sync,

@rgommers I haven't seen any response to my comments.

@mikofski (Author):

> If it goes in and then "vector newton" and "scalar newton" have to be kept in sync, that's a different situation. In particular, if future changes can't be made to "scalar newton" because they don't fit within the "vector newton" paradigm, then that's a problem for me.

I'm OK with letting the scalar and array versions diverge. I'm sorry if I implied otherwise. IMO it's up to the SciPy Dev community to decide the future.

@pvanmulbregt are you OK with merging this PR now, or are there more changes you want to discuss?

@mikofski (Author):

Hi @rgommers, if there are no further comments, do you think this PR can be merged now? Thanks!

@rgommers (Member):

Hi @mikofski, one more request: please add a clear example in the docstring. I did some testing, and it seems to me that your intended use case is quite specific here, namely many starting points that are close together. If they're far apart, my first test sees a 5x slowdown over a naive loop:

In [12]: def loop(x0):
    ...:     res = []
    ...:     for x00 in x0:
    ...:         res.append(optimize.newton(f, x00, fprime=lambda x: 3 * x**2))
    ...:     return res
    ...: 

In [16]: x0
Out[16]: [-10, -0.1, 0, 1.5, 2]

In [17]: loop(x0)
/Users/rgommers/Code/scipy/scipy/optimize/zeros.py:218: RuntimeWarning: derivative was zero.
  warnings.warn(msg, RuntimeWarning)
Out[17]: [1.0, 1.0000000000000002, 0.0, 1.0, 1.0000000000000002]

In [18]: %timeit loop(x0)
/Users/rgommers/Code/scipy/scipy/optimize/zeros.py:218: RuntimeWarning: derivative was zero.
  warnings.warn(msg, RuntimeWarning)
37 µs ± 123 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [19]: %timeit optimize.newton(f, x0, fprime=lambda x: 3 * x**2)
/Users/rgommers/Code/scipy/scipy/optimize/zeros.py:342: RuntimeWarning: some derivatives were zero
  warnings.warn(msg, RuntimeWarning)
232 µs ± 4.24 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [21]: x1 = np.random.randn(100)

In [22]: x1
Out[22]: 
array([ 1.41609833,  1.04025197,  1.66960851,  0.46059693,  1.48430028,
        0.94540877, -0.27935118, -0.74926466, -0.13886066,  0.32667485,
       -1.68669016,  1.0034203 , -1.54834386,  0.58685757, -0.60555813,
       -0.54121248, -0.12990681,  0.08451024,  1.19283028,  1.54771281,
        0.98637133, -1.1708113 ,  0.61718709,  0.99237219, -0.67036913,
        1.00300018, -2.20807856, -0.04758557,  0.18629305, -1.270167  ,
        0.37242728, -1.00057862,  1.47378774,  0.85928708, -0.15305182,
        0.69245111,  0.2097883 ,  0.62229393,  0.1167718 , -0.24468124,
       -0.58607686,  1.46611212, -0.61677397,  0.69270566,  0.53881705,
       -0.63744367,  0.60740258, -0.3928028 ,  0.94028144, -0.10888972,
        1.1526264 ,  1.81230203,  0.33019543,  0.1271731 , -0.23379744,
       -1.04741483, -0.20111341,  0.34254768, -1.06763832, -1.19269645,
       -0.84060414,  0.18237724, -1.30099065, -0.83624874,  0.3568043 ,
        0.52238604, -0.75529355,  1.02751746,  0.35498641,  1.28244214,
        0.54933607, -0.35843335, -0.16835571, -1.19527499, -1.52479884,
        0.71891142, -0.09804082, -1.78604302, -1.26407257,  0.94010916,
       -0.48064219,  0.50223763,  0.00371838, -1.18707256, -1.88412904,
       -1.02572829, -0.17449417,  1.99486829, -1.2262213 ,  1.19163304,
       -0.62562627,  0.08796651,  0.54374973, -0.58012312,  0.60174995,
       -0.44156488,  1.87408098,  0.07009939, -1.08970241,  0.94315773])

In [23]: %timeit optimize.newton(f, x1, fprime=lambda x: 3 * x**2)
28.6 µs ± 118 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [24]: %timeit loop(x1)
286 µs ± 1.86 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
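(Note: f is not defined in the session above; given fprime = lambda x: 3 * x**2 and the roots found, it was presumably something like f = lambda x: x**3 - 1.)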

Did I understand your intention with this code? Either way, a clear example would help.

@rgommers (Member) commented Jun 24, 2018

I did go through the whole conversation here again. Overall there does seem to be a need, although there are also some reservations. It looks like a hesitant +0.5 on balance.

> I'm OK with letting the scalar and array versions diverge. I'm sorry if I implied otherwise. IMO it's up to the SciPy Dev community to decide the future.

It looks like, given this API, users would expect the same behavior for scalar as for sequence input, also for future extensions. However, it always seems possible to support sequence input with a simple for-loop in case of future extensions that aren't easily vectorizable (see the sketch below). That would then just not be as fast as expected, but it would work.
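A sketch of that for-loop fallback (a hypothetical wrapper, not actual scipy code):

import numpy as np
from scipy import optimize

def newton_any(func, x0, **kwargs):
    # scalar input goes straight through; sequence input is solved
    # element by element, trading speed for generality
    x0 = np.asarray(x0, dtype=float)
    if x0.ndim == 0:
        return optimize.newton(func, float(x0), **kwargs)
    return np.array([optimize.newton(func, x, **kwargs) for x in x0])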

@mikofski (Author):

Hi @rgommers, I added a note to indicate the use cases that might benefit most from the vectorized form, and an example of a vectorized function.

RE: the intention of this code: it was prompted by work on a solar energy modeling package called pvlib-python, in which I needed to solve a very large (>1e5) set of implicit equations that have the same form but different coefficients.

Thanks very much for your input, I am glad you think it will be a positive contribution, and I hope that others do as well. I will strive to keep it up to date with the scalar branch when appropriate and in collaboration with the SciPy Dev community.

@rgommers merged commit e1d50a7 into scipy:master Jun 25, 2018
@rgommers (Member):

Okay, in it goes. Thanks @mikofski for this feature and the persistence, and everyone else for review/discussion.

@rgommers (Member):

I'll send a follow-up with some minor tweaks, it was just time to get this merged.

@rgommers changed the title from "BUG: ENH: vectorize scalar zero-search-functions" to "ENH: vectorize scalar zero-search-functions" Jun 25, 2018
rgommers added two commits to rgommers/scipy that referenced this pull request Jun 25, 2018
@rgommers (Member):

Follow-up in gh-8969; most importantly, it removes the unnecessary converged keyword.

@mikofski deleted the vectorize_newton branch June 25, 2018 17:20
rgommers added two commits to rgommers/scipy that referenced this pull request Jun 28, 2018
@pvanmulbregt mentioned this pull request Aug 26, 2018
@jason-s commented Jan 29, 2019

Sorry I didn't have a chance to review the PR when it first came out. I support this change and think it's useful for applications that can benefit from Newton's method and related root finders.

It is not a solution for addressing #7242 and I would ask that you edit the issue description accordingly. I've edited the title of #7242 to clarify that I meant a robust method such as Brent's or Chandrupatla's.
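(For context, the kind of bounded, robust solver meant there, in its existing scalar scipy form; the result shown is approximate:)

>>> from scipy import optimize
>>> optimize.brentq(lambda x: x**3 - 1, 0.0, 2.0)  # bracketing [0, 2]
1.0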

@mikofski (Author):

@jason-s thanks. I removed "addresses #7242" from the description to clarify that this PR only addresses Newton-type methods, not bounded root finders.

Labels: maintenance, scipy.optimize