
ENH: vectorize scalar zero-search-functions #8357

Merged
rgommers merged 57 commits into scipy:master from mikofski:vectorize_newton on Jun 25, 2018

Conversation

@mikofski (Contributor) commented Feb 2, 2018

Fixes #8354

  • add test for newton with arrays
  • import asarray and np.abs
  • check any of the derivatives for zero, cast fder to array
  • check all newton step deltas, use numpy abs to cast (p-p0) to array

Signed-off-by: Mark Mikofski <bwana.marko@yahoo.com>
@nmayorov (Contributor) commented Feb 2, 2018

Hi!

I suggest changing the title to explain what kind of change this is (BUG, ENH, etc.); just look at the recent pull requests. And move the references to the issues into the description, so everyone can follow the links (they are not useful in the title).

@ilayn changed the title from "fixes #8354 and addresses #7242 - vectorize scalar zero-search-functions" to "WIP: BUG: ENH: vectorize scalar zero-search-functions" Feb 2, 2018
@ilayn (Member) commented Feb 2, 2018

I took the liberty of doing that, actually, and also GitHub-linked the issue in question.

* make pep8 happy
* instead of just at testing, so that halley and secant are easier
    - remove asarray(fder == 0).any() when testing if any fder is zero,
      just use (fder == 0).any() since fder is now an ndarray by default
    - still need np_abs(p - p0) because there is no .abs() method for
      ndarrays
* convert initial guess, p0 -> ndarray using asarray(1.0 * p0)
* convert fder -> ndarray using asarray(fprime(*myargs))
* convert fval -> ndarray using asarray(func(*myargs))
* just compare max newton-step instead of all of them,
  eg: abs(p-p0).max() < tol
* use the non-variant form of halley's,
  eg: dx = - 1 / (fder/fval - fder2/fder/2 - fder(N+1)/fderN/N ...)
* use boolean indexing for parabolic halley's variant, but comment out
* add test for arrays in newton for secant
* remove sign import, but import where from numpy
* replace 1e-4 with machine specific delta as finfo(float).eps**0.33
* convert x0 to ndarray using asarray
* use where to make array of +dx where x0>=0 else -dx
* combine if - else conditions into 1-line
* convert q0 and q1 to ndarray using asarray
* add comment "check for divide by zero"
* use .any() on arrays to check for divide by zero
* use np_abs for all newton steps and just check the .max()
* convert q1 if re-evaluated to ndarray using asarray
@mikofski changed the title from "WIP: BUG: ENH: vectorize scalar zero-search-functions" to "BUG: ENH: vectorize scalar zero-search-functions" Feb 3, 2018
@mikofski (Author) commented Feb 3, 2018

Hi @person142, @ilayn, @rgommers, @nmayorov, @andyfaff: OK, I think this is ready for review. Can you please review these changes or suggest reviewers, and consider merging it into master? I look forward to your feedback. Thanks!

@andyfaff (Contributor) commented Feb 3, 2018

@mikofski thank you for your eagerness to contribute. For new features like this it's advisable to post on the scipy-dev mailing list to begin with, so there can be wider discussion about whether the new feature is desired, and possible design, before the development work is started.
For my part I think the functionality that's being added here is better suited to a short wrapper (e.g. map, multiprocessing.Pool.map) which will do any vectorisation required.
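For illustration, a minimal sketch of that wrapper idea (the names and the example equation here are made up, not from this PR):

import numpy as np
from multiprocessing import Pool

from scipy import optimize

def f(x, a):
    # one scalar equation per coefficient a: x**3 - a = 0
    return x**3 - a

def solve_one(a):
    # plain scalar newton call, mapped over the coefficients
    return optimize.newton(f, 1.0, args=(a,))

if __name__ == '__main__':
    coefficients = np.linspace(0.5, 4.0, 1000)
    with Pool() as pool:
        roots = pool.map(solve_one, coefficients)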

@mikofski (Author) commented Feb 3, 2018

Thanks @andyfaff. I've posted a comment with a link on the scipy-dev mailing list, as you suggested. IMO multiprocessing will add unnecessary overhead, but I'll do some benchmarking to demonstrate this. Also IMO a much easier approach is to use numpy arrays: I only had to make a few changes to use them, and I also improved the Halley method, addressing #5922, and removed several if-else statements, which IMO simplifies the codebase; the new code is 5 lines shorter. I've added new tests for arrays using all 3 methods (Newton, Halley, and secant), and all tests pass.

@mikofski (Author) commented Feb 4, 2018

@andyfaff here is the benchmark of mp versus numpy arrays:

TL;DR: comparison of multiprocessing vs numpy arrays

Using NumPy arrays is about 200x faster than using multiprocessing.

method | multiprocessing                    | numpy
-------|------------------------------------|----------------------------------------
newton | 1 loop, best of 3: 2.66 s per loop | 100 loops, best of 3: 13.9 ms per loop
halley | 1 loop, best of 3: 4.25 s per loop | 10 loops, best of 3: 19.3 ms per loop
secant | 1 loop, best of 3: 2.15 s per loop | 100 loops, best of 3: 11.1 ms per loop

Here are the code snippets:

I ran these from an interpreter and timed them using the %timeit magic in IPython.

>>> from benchmarking_newton import bench_mp_newton, bench_mp_secant, bench_mp_halley
>>> %timeit bench_mp_newton()
# 1 loop, best of 3: 2.66 s per loop

>>> %timeit bench_mp_halley()
# 1 loop, best of 3: 4.25 s per loop

>>> %timeit bench_mp_secant()
# 1 loop, best of 3: 2.15 s per loop
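For comparison, a sketch of what the numpy-array side might look like with this branch (the actual benchmarking_newton module isn't shown here, so the coefficients and names are made up):

import numpy as np
from scipy import optimize  # this PR's branch, where newton accepts arrays

a = np.linspace(0.1, 2.0, 100000)  # same equation form, different coefficients

def f(x, a):
    return x**3 - a

def fprime(x, a):
    return 3 * x**2

def bench_np_newton():
    # one vectorized call solves all 100000 problems at once
    return optimize.newton(f, np.ones_like(a), fprime=fprime, args=(a,))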

Then I checked the output to make sure it was consistent, then repeated the process with the other (numpy) snippet.

Of course, this is probably partly because I have a 2011 Mac with an Intel 2.3 GHz i5 and 16 GB of 1333 MHz DDR3 RAM, but I think that just underscores my reason to prefer numpy arrays over mp for this application.

* use assert_warns context to catch runtime warning
* don't return anything
* comparing a sequence with a scalar is always False, so use p0
  instead of x0; since p0 is an array, it can be compared
* also can't multiply a sequence by a float, so use asarray(x0) * 1.0 to
  convert to floats
* just p0 instead of x0 in secant
* need to find just where the denominator is zero but the numerator is
  still non-zero, that's the only places that should issue a warning.
* fix assert_warns, don't use context, remove test for secant, just
  check zero derivative
@mikofski (Author) commented Feb 4, 2018

Hi @person142, @ilayn, @rgommers, @nmayorov, @andyfaff,
So sorry, I guess I jumped the gun a bit, but all tests really are passing now, and I fixed a few edge cases as well. I really think this is in good shape now. I hope the benchmarks above make a strong argument for why numpy arrays are better than mp for this particular use case. I would appreciate your comments, and I hope you will consider merging this. I started a thread on the scipy-dev mailing list, and although there hasn't been much chatter, my collaborators on pvlib-python (a solar-industry and Sandia National Labs collaborative open source project) are very interested in seeing this implemented. There is also another issue, #7242, with a similar request, and a bug, #5922, re: the parabolic Halley's method, that I think this PR closes. Also, this PR removes around 4 lines, or about 2%. 😉
Best Regards,
Mark Mikofski

PS @bmeyers , @scopatz , @jakevdp , @katyhuff , @tkelman maybe could you take a look at this and comment? thx!

@tkelman left a comment:
I'm not too familiar with this code, but I guess vectorizing roughly works here if you're wanting to simultaneously solve many independent univariate problems. I'd personally throw a multivariate constrained solver at the problem (e.g. ipopt via pyomo) but that's my solution to everything.

- fder = fprime(*myargs)
- if fder == 0:
+ fder = asarray(fprime(*myargs))  # convert to ndarray
+ if (fder == 0).any():
@tkelman:

wouldn't you need to keep iterating the other elements?

@mikofski (Author):

You could, but IMO you really don't need to. I think it requires indexing which IMO makes the code a little messy. Do you feel very strongly that it should keep iterating the other cases? I'm open to changes.

@tkelman:

you'd exit early upon hitting a problematic termination condition for any element, even if remaining elements are far from convergence. but I'd defer to a scipy maintainer's opinion here, since non-scalar usage would be new functionality (right? does the released code error on arrays?)

@mikofski (Author):

Yes. See the traceback in #8354.

fder2 = asarray(fprime2(*myargs)) # convert to ndarray
# Halley's method
# https://en.wikipedia.org/wiki/Halley%27s_method
p = p0 - 2 * fval * fder / (2 * fder ** 2 - fval * fder2)
@tkelman:

is removing the switch changing the scalar behavior here?

@mikofski (Author):

No. This switch is particular to the "parabolic" variant of Halley's method; see the link in #5922. I removed it because branching requires indexing, which IMO makes the code a little messy. Using the traditional Halley's method (see the Wikipedia link in the code) removes the need for branching and IMHO makes the code cleaner. Does this answer your question?
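For reference, the update in the snippet above is the standard Halley iteration (in LaTeX form):

x_{n+1} = x_n - \frac{2 f(x_n)\, f'(x_n)}{2\, f'(x_n)^2 - f(x_n)\, f''(x_n)}

which matches the code line p = p0 - 2 * fval * fder / (2 * fder ** 2 - fval * fder2), with fval = f(p0), fder = f'(p0), and fder2 = f''(p0).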

@tkelman:

Gotcha. Guess I'd avoid coupling behavior changes for the scalar case to the feature addition of vectorization (but again my opinion here matters less than the maintainers'). The two changes could be done separately, one but not the other, or both, though vectorizing the modified version is simpler as you say.

A SciPy member:

> Guess I'd avoid coupling behavior changes for the scalar case to the feature addition of vectorization

I agree 100% with this. Let's stick to one issue at a time. (The modification looks more prone to overflow because of the fder**2, so it's not clear the changes are a win.)

@mikofski (Author):

The current version (as of scipy-1.0.0) with the "parabolic" variant of Halley's already has fder**2 on L183, so if it is likely to overflow, then it would have already.

I agree it would be more explicit for future maintainers if we moved the switch from the parabolic variant of Halley's to the more widely accepted form into a separate PR. However, it's a prerequisite for this PR, because the more common version of Halley's doesn't require any branching, which I'm really hoping to avoid here. So should I open another PR to propose switching out the parabolic variant of Halley's?

- if p1 != p0:
-     msg = "Tolerance of %s reached" % (p1 - p0)
+ divide_by_zero = (q1 == q0)
+ if divide_by_zero.any():
@tkelman:

here as well, wouldn't it be better to continue for all other elements?

@mikofski (Author):

See my previous answer in the newton branch. My ethic was to make the smallest changes possible while adding some new features. If users demand more features, then I think we should discuss the trade-offs.

A SciPy member:

The halting conditions are the biggest thing to think about here. This way is simple, but I'm sure that someone is going to yell at us in the future about evaluating their callback function too many times. Might be worth seeing how messy the indexing would look.
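For illustration, a rough sketch of what that indexing might look like (hypothetical, not the PR's code; assumes 1-D float input and elementwise func and fprime):

import numpy as np

def masked_newton(func, fprime, x0, tol=1.48e-8, maxiter=50):
    # keep iterating only the elements that haven't converged or failed
    p = np.asarray(x0, dtype=float).copy()
    active = np.ones(p.shape, dtype=bool)
    for _ in range(maxiter):
        idx = np.flatnonzero(active)  # positions still being iterated
        fval = func(p[idx])
        fder = fprime(p[idx])
        nonzero = fder != 0
        step = np.zeros_like(fval)
        step[nonzero] = fval[nonzero] / fder[nonzero]
        p[idx] = p[idx] - step
        # retire elements that took a small enough step or hit f' == 0
        done = (np.abs(step) < tol) | ~nonzero
        active[idx[done]] = False
        if not active.any():
            break
    return p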

@mikofski (Author) commented Feb 7, 2018

Thanks Tony! Great feedback! I had already thought of using hybrj or hybrd via scipy.optimize.root or fsolve, but I didn't think of ipopt. The multivariate methods do work, using a sparse Jacobian with the derivatives on the diagonal, but surprisingly they are still not as fast as using numpy arrays. But I will give ipopt a try.
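A sketch of that multivariate approach (coefficients made up; a dense diagonal Jacobian is used here for simplicity, even though hybr doesn't exploit the sparsity):

import numpy as np
from scipy import optimize

a = np.linspace(0.1, 2.0, 1000)  # per-problem coefficients

def F(x):
    # 1000 independent residuals stacked into one system
    return x**3 - a

def J(x):
    # independent equations -> diagonal Jacobian
    return np.diag(3 * x**2)

sol = optimize.root(F, np.ones_like(a), jac=J, method='hybr')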

@mikofski (Author) commented Feb 7, 2018

PS @tkelman and others, this PR is in support of pvlib-python issue 408 in case you want to comment or contribute.

@mikofski (Author) commented Feb 7, 2018

@tkelman, see this comment for a comparison that includes multivariate methods root and fsolve.

@mikofski (Author):

Hi @person142 ,

Are you the maintainer for scipy.optimize? I know you are very busy and volunteering your time. But may I please ask if there is any news on this pull request or the related issue #8354? I greatly appreciate your consideration, and I thank you for your time.

Best Regards

@person142 (Member):

> Are you the maintainer

No, we don't really have dedicated maintainers, just people who work more or less on various modules.

> may I please ask if there is any news on this pull request or the related issue

I guess I already laid out my personal preferences, but it seems there's a demand here and I don't want to stand in the way of that. I can take a look at the code, though I would note that there are other people around more qualified to review optimize than me.

@mikofski (Author):

Thank you very much for your response @person142. I really appreciate your time, and I look forward to your review. If there's anyone else who you think should review this PR, please let me know.

I am also working on the cython_optimize branch that you suggested, and I'll try to get a PR up ASAP. In fact, @wholmgren has already started work on it, and I'm merging his work into my branch. In general, I agree with your concerns, although specifically for newton I think that numpy arrays are the efficient way to go, for a couple of reasons:

  1. numpy arrays, I believe, make better use of the CPU cache and SIMD registers, because they perform one operation on the entire array at a time, versus all of the operations on one item at a time.
  2. newton is fairly simple, with almost no branching: the only conditions decide which method to use (newton, halley, or secant), and they do not depend on individual array elements, so IMO the code is not complex to vectorize or confusing to read. In other words, I think newton is very similar to your np.sin(x) + np.cos(x) example of great code to vectorize (see the snippet below).
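A minimal illustration of the array API this PR adds (output approximate; with no fprime given, the secant method is used):

>>> import numpy as np
>>> from scipy import optimize
>>> optimize.newton(lambda x: x**2 - np.array([1., 4., 9.]), np.full(3, 1.5))
array([1., 2., 3.])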

Thanks again!

@mikofski (Author):

@rgommers I've merged the fix for #8904 (check if fval is zero before fder). Do you think this PR can be merged now? Thanks!

@rgommers (Member):

Thanks @mikofski

If "vector newton" goes as its own specialized piece of code, that is one thing. If it goes in and then "vector newton" and "scalar newton" have to be kept in sync, ...

@pvanmulbregt IIRC there was some more conversation after your last comment above in this PR discussion. Can't find it so quickly - where are we now on this?

@rgommers dismissed jaimefrio's stale review June 17, 2018 19:20: "@jaimefrio's comments seem to have been addressed long ago"

@pvanmulbregt (Contributor):

If "vector newton" goes as its own specialized piece of code, that is one thing. If it goes in and then "vector newton" and "scalar newton" have to be kept in sync,

@rgommers I haven't seen any response to my comments.

@mikofski (Author):

> If it goes in and then "vector newton" and "scalar newton" have to be kept in sync, that's a different situation. In particular, if future changes can't be made to "scalar newton" because they don't fit within the "vector newton" paradigm, then that's a problem for me.

I'm OK with letting the scalar and array versions diverge. I'm sorry if I implied otherwise. IMO it's up to the SciPy Dev community to decide the future.

@pvanmulbregt are you OK with merging this PR now, or are there more changes you want to discuss?

@mikofski (Author):

Hi @rgommers, if there are no further comments, do you think this PR can be merged now? Thanks!

@rgommers (Member):

Hi @mikofski, one more request: please add a clear example in the docstring. I did some testing, and it seems to me that your intended use case is quite specific here, namely many starting points that are close together. If they're far apart, my first test sees a 5x slowdown over a naive loop:

In [12]: def loop(x0):
    ...:     res = []
    ...:     for x00 in x0:
    ...:         res.append(optimize.newton(f, x00, fprime=lambda x: 3 * x**2))
    ...:     return res
    ...: 

In [16]: x0
Out[16]: [-10, -0.1, 0, 1.5, 2]

In [17]: loop(x0)
/Users/rgommers/Code/scipy/scipy/optimize/zeros.py:218: RuntimeWarning: derivative was zero.
  warnings.warn(msg, RuntimeWarning)
Out[17]: [1.0, 1.0000000000000002, 0.0, 1.0, 1.0000000000000002]

In [18]: %timeit loop(x0)
/Users/rgommers/Code/scipy/scipy/optimize/zeros.py:218: RuntimeWarning: derivative was zero.
  warnings.warn(msg, RuntimeWarning)
37 µs ± 123 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [19]: %timeit optimize.newton(f, x0, fprime=lambda x: 3 * x**2)
/Users/rgommers/Code/scipy/scipy/optimize/zeros.py:342: RuntimeWarning: some derivatives were zero
  warnings.warn(msg, RuntimeWarning)
232 µs ± 4.24 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [21]: x1 = np.random.randn(100)

In [22]: x1
Out[22]: 
array([ 1.41609833,  1.04025197,  1.66960851,  0.46059693,  1.48430028,
        0.94540877, -0.27935118, -0.74926466, -0.13886066,  0.32667485,
       -1.68669016,  1.0034203 , -1.54834386,  0.58685757, -0.60555813,
       -0.54121248, -0.12990681,  0.08451024,  1.19283028,  1.54771281,
        0.98637133, -1.1708113 ,  0.61718709,  0.99237219, -0.67036913,
        1.00300018, -2.20807856, -0.04758557,  0.18629305, -1.270167  ,
        0.37242728, -1.00057862,  1.47378774,  0.85928708, -0.15305182,
        0.69245111,  0.2097883 ,  0.62229393,  0.1167718 , -0.24468124,
       -0.58607686,  1.46611212, -0.61677397,  0.69270566,  0.53881705,
       -0.63744367,  0.60740258, -0.3928028 ,  0.94028144, -0.10888972,
        1.1526264 ,  1.81230203,  0.33019543,  0.1271731 , -0.23379744,
       -1.04741483, -0.20111341,  0.34254768, -1.06763832, -1.19269645,
       -0.84060414,  0.18237724, -1.30099065, -0.83624874,  0.3568043 ,
        0.52238604, -0.75529355,  1.02751746,  0.35498641,  1.28244214,
        0.54933607, -0.35843335, -0.16835571, -1.19527499, -1.52479884,
        0.71891142, -0.09804082, -1.78604302, -1.26407257,  0.94010916,
       -0.48064219,  0.50223763,  0.00371838, -1.18707256, -1.88412904,
       -1.02572829, -0.17449417,  1.99486829, -1.2262213 ,  1.19163304,
       -0.62562627,  0.08796651,  0.54374973, -0.58012312,  0.60174995,
       -0.44156488,  1.87408098,  0.07009939, -1.08970241,  0.94315773])

In [23]: %timeit optimize.newton(f, x1, fprime=lambda x: 3 * x**2)
28.6 µs ± 118 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [24]: %timeit loop(x1)
286 µs ± 1.86 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
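(Note: f is not defined in the session above; given fprime = lambda x: 3 * x**2 and the roots found, it was presumably something like f = lambda x: x**3 - 1.)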

Did I understand your intention with this code? Either way, a clear example would help.

@rgommers (Member) commented Jun 24, 2018

I did go through the whole conversation here again. Overall there does seem to be a need, although there are also some reservations. It looks like a hesitant +0.5 on balance.

> I'm OK with letting the scalar and array versions diverge. I'm sorry if I implied otherwise. IMO it's up to the SciPy Dev community to decide the future.

It looks like, given this API, users would expect the same behavior for scalar as for sequence input, also for future extensions. However, it always seems possible to support sequence input with a simple for-loop in case of future extensions that aren't easily vectorizable (see the sketch below). That would then just not be as fast as expected, but it would work.
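A sketch of that for-loop fallback (a hypothetical wrapper, not actual scipy code):

import numpy as np
from scipy import optimize

def newton_any(func, x0, **kwargs):
    # scalar input goes straight through; sequence input is solved
    # element by element, trading speed for generality
    x0 = np.asarray(x0, dtype=float)
    if x0.ndim == 0:
        return optimize.newton(func, float(x0), **kwargs)
    return np.array([optimize.newton(func, x, **kwargs) for x in x0])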

@mikofski (Author):

Hi @rgommers, I added a note to indicate the use cases that might benefit most from the vectorized form, and an example of a vectorized function.

RE: the intention of this code: it was prompted by work on a solar energy modeling package called pvlib-python, in which I needed to solve a very large (>1e5) set of implicit equations that have the same form but different coefficients.

Thanks very much for your input, I am glad you think it will be a positive contribution, and I hope that others do as well. I will strive to keep it up to date with the scalar branch when appropriate and in collaboration with the SciPy Dev community.

@rgommers merged commit e1d50a7 into scipy:master Jun 25, 2018
@rgommers (Member):

Okay, in it goes. Thanks @mikofski for this feature and the persistence, and everyone else for review/discussion.

@rgommers (Member):

I'll send a follow-up with some minor tweaks, it was just time to get this merged.

@rgommers changed the title from "BUG: ENH: vectorize scalar zero-search-functions" to "ENH: vectorize scalar zero-search-functions" Jun 25, 2018
rgommers added two commits to rgommers/scipy that referenced this pull request Jun 25, 2018
@rgommers (Member):

Follow-up in gh-8969; most importantly, it removes the unnecessary converged keyword.

@mikofski deleted the vectorize_newton branch June 25, 2018 17:20
rgommers added two commits to rgommers/scipy that referenced this pull request Jun 28, 2018
@pvanmulbregt mentioned this pull request Aug 26, 2018
@jason-s commented Jan 29, 2019

Sorry I didn't have a chance to review the PR when it first came out. I support this change and think it's useful for applications that can benefit from Newton's method and related root finders.

It is not a solution for addressing #7242 and I would ask that you edit the issue description accordingly. I've edited the title of #7242 to clarify that I meant a robust method such as Brent's or Chandrupatla's.
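(For context, the kind of bounded, robust solver meant there, in its existing scalar scipy form; the result shown is approximate:)

>>> from scipy import optimize
>>> optimize.brentq(lambda x: x**3 - 1, 0.0, 2.0)  # bracketing [0, 2]
1.0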

@mikofski (Author):

@jason-s thanks. I removed "addresses #7242" from the description to clarify that this PR only addresses Newton-type methods, not bounded root finders.

Labels: maintenance, scipy.optimize