
MAINT: Ediff1d performance #8183

Merged 2 commits on Oct 24, 2016

Conversation

mattharrigan (Contributor)

Eliminate a copy operation when to_begin or to_end is given. Also use ravel instead of flatiter which is much faster.

Closes #8175

Benchmark:
python -m timeit --setup="import numpy as np;x=np.arange(1e7)" "np.ediff1d(x, 0)"

The new version is about 5x faster on my machine.

@shoyer (Member) left a comment


I reviewed this over in mattharrigan#1. Looks good to me!

@shoyer (Member) commented Oct 21, 2016

@mattharrigan Can you rename the second commit to mention ediff1d in the title?

@mattharrigan (Contributor, Author)

I haven't ever done that before. Do I commit --amend and then push --force like this?

@kernc commented Oct 21, 2016

commit --amend and then push --force

That should work.

@shoyer shoyer merged commit 3bd79ab into numpy:master Oct 24, 2016
@shoyer (Member) commented Oct 24, 2016

Thanks!

@shoyer (Member) commented Oct 25, 2016

Yes, that sounds right.


@mhvk (Contributor) commented Oct 25, 2016

This causes a regression in astropy, since with the creation of an empty array, subclasses are no longer propagated correctly. See https://travis-ci.org/astropy/astropy/jobs/170586124#L2564.

@shoyer (Member) commented Oct 25, 2016

@mhvk Remind me, do we have a standard way of allocating results with the proper subclass?

@mattharrigan (Contributor, Author)

I'll defer to the experts, but this looks promising:

l = len(ary) - 1  # lowercase L: the number of differences
result = np.empty(l + len(to_begin) + len(to_end), dtype=ary.dtype)
result[:len(to_begin)] = to_begin
result[len(to_begin) + l:] = to_end
np.subtract(ary[1:], ary[:-1], result[len(to_begin):len(to_begin) + l])
Review comment (Contributor):

Surely this is wrong? It always writes only one element of the output!

@seberg (Member) commented Oct 26, 2016

@shoyer look at functions like np.delete which use get and then at the end use the array wrap. Not sure if this is the right thing here though. The other thing is to try to stick to ufuncs which know how to wrap.

@mhvk (Contributor) commented Oct 26, 2016

Looking at the new code, I don't think it is right for input arrays longer than 2 elements... Revert this for now?!

As for Quantity, the old code would only have worked when to_begin and to_end were not set, as in that case simply the difference is returned. If either were present, np.hstack would ignore the subclass and return a normal array. In that sense, the code was somewhat broken already.

To get this right is not totally trivial, as, at least for Quantity, one should consider that to_begin and to_end may have units that are different from the result (say, be in cm rather than in m). The safest (but certainly not fastest) way would be to create the difference array as in the old code, and then call result.insert(...) for the begin/end. This does follow good coding style in that, by calling array methods, it leaves as much as possible up to the subclass to implement.

Alternatively, after calculating the difference, one could still create an empty array, and either call, as @mattharrigan suggested, diff.__array_wrap__(result) or result.__array_finalize__(diff) (I like the latter a bit better) to ensure it gets the right properties. After that, result[:len(to_begin)] = to_begin and result[len(result) - len(to_end):] = to_end would do proper conversion. To me, though, this feels more fragile than the insert tactic, and it is not obvious to me that it would be faster.

Note, though, that at least for Quantity and its subclasses, one must calculate the difference first, since there is no guarantee that the unit remains the same as the input (e.g., a Magnitude, which is a subclass of a Quantity holding logarithmic values, would change its unit upon taking a difference (from say mag(Jy) to mag).
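The "insert tactic" described above might look roughly like this. This is only a sketch, not the PR's code: np.insert stands in for the array-method-style insert mentioned, and whether it dispatches correctly for a given subclass is not guaranteed.

```python
import numpy as np

def ediff1d_via_insert(ary, to_end=None, to_begin=None):
    # Compute the subclass-aware difference first (so a subclass like
    # Quantity can adjust its unit/dtype), then splice in to_begin/to_end,
    # leaving element conversion to the array machinery rather than a
    # raw preallocated buffer.
    ary = np.asanyarray(ary).ravel()
    result = ary[1:] - ary[:-1]
    if to_end is not None:
        result = np.insert(result, len(result), to_end)
    if to_begin is not None:
        result = np.insert(result, 0, to_begin)
    return result
```

As noted above, this is the safest but certainly not the fastest route, since each np.insert call copies the array.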

@seberg (Member) commented Oct 26, 2016

One thing about wrapping: I have no idea how far __numpy_ufunc__ is and how it plays into this.

@seberg (Member) commented Oct 26, 2016

@mhvk if you think you will have the fix in quickly, don't know if a revert is vital (if this is wrong), but in that case mark a followup PR as a release blocker (or create a release blocking issue).

@mattharrigan (Contributor, Author) commented Oct 26, 2016

@mhvk I don't understand the concern about not being right for arrays longer than 2. Are you confusing the lower case L with the number 1? If so I apologize for not using better variable names.

I didn't know much at all about ndarray subclasses or how much complexity might creep into this PR because of them. Most of the inefficiency of the previous implementation was due to the flat iterator instead of just calling ravel and doing the subtraction with an array. Perhaps that change alone should be made. But you point out that Quantities are not handled well in either case.

@mhvk (Contributor) commented Oct 26, 2016

Ah, yes, that was an l, not a 1. Sorry for the noise!

@mhvk (Contributor) commented Oct 26, 2016

I think it would be good at least to restore the previous behaviour (and thus remove the regression): without to_begin and to_end, surely by far the fastest is to just return the difference, rather than creating an array to fill with it.

@mattharrigan (Contributor, Author) commented Oct 26, 2016

In either case the output array must be created; a[1:] - a[:-1] just does that behind the scenes. This code does the subtraction with the output array given, so the output array is not created twice.

@mhvk (Contributor) commented Oct 26, 2016

It is true that an output array has to be created in either case, but the speed is not the same:

a = np.arange(10000.)
%timeit result = a[1:]-a[:-1]
# 100000 loops, best of 3: 12.6 µs per loop
%timeit result = np.empty(9999); np.subtract(a[1:], a[:-1], result)
# 100000 loops, best of 3: 15.6 µs per loop

It gets worse for smaller inputs:

a = np.arange(100.)
%timeit result = a[1:]-a[:-1]
# 1000000 loops, best of 3: 1.63 µs per loop
%timeit result = np.empty(99); np.subtract(a[1:], a[:-1], result)
# 100000 loops, best of 3: 3.78 µs per loop

@mattharrigan (Contributor, Author)

Good point. There should be something to the effect of:

if to_begin is None and to_end is None:
    return a[1:]-a[:-1]
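Putting that fast path together with the preallocated-buffer approach from earlier in the thread gives roughly the following. This is only a sketch with illustrative names; subclass handling is deliberately ignored here.

```python
import numpy as np

def ediff1d_sketch(ary, to_end=None, to_begin=None):
    # Fast path: no concatenation needed, just return the difference.
    # Per the discussion above, this also preserves ndarray subclasses.
    ary = np.asanyarray(ary).ravel()
    if to_begin is None and to_end is None:
        return ary[1:] - ary[:-1]

    # Otherwise allocate the output once and subtract directly into it,
    # avoiding the extra copy that np.hstack would make.
    to_begin = np.array([]) if to_begin is None else np.asanyarray(to_begin).ravel()
    to_end = np.array([]) if to_end is None else np.asanyarray(to_end).ravel()
    n = max(len(ary) - 1, 0)
    result = np.empty(len(to_begin) + n + len(to_end), dtype=ary.dtype)
    result[:len(to_begin)] = to_begin
    result[len(to_begin) + n:] = to_end
    np.subtract(ary[1:], ary[:-1], result[len(to_begin):len(to_begin) + n])
    return result
```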

@mattharrigan (Contributor, Author)

My intuition about performance was wrong (big surprise), but I would like to understand why. Here are three timeit calls. The second is the fastest, as I expected. However, creating that empty array takes forever relatively speaking, so step 2 + step 3 take far longer than step 1, even though step 1 internally has to create an array also. Admittedly these are very small arrays. Is the difference just due to overhead? Is there hidden work somewhere? Thank you.

python -m timeit -s "import numpy as np;a = np.arange(10, dtype=float)" "a[1:]-a[:-1]"
python -m timeit -s "import numpy as np;a = np.arange(10, dtype=float);buf=np.empty(9, dtype=float)" "np.subtract(a[1:], a[:-1], buf)"
python -m timeit -s "import numpy as np" "buf=np.empty(9, dtype=float)"

@seberg (Member) commented Oct 26, 2016

There is plenty of hidden work... For the things you are timing, simply the argument (and keyword argument) parsing is probably a significant portion of the cost.

@shoyer (Member) commented Oct 26, 2016

The speed difference here (~2 µs) is pretty much just unavoidable Python overhead. Anything less than 5-10 µs is not really worth optimizing -- anyone who cares about performance at this level should not be using Python in their inner loop.

We can certainly switch to use simple subtraction for preserving subclasses in this case, but it is very unfortunate how all these implementation details have leaked to become public API for handling subclasses. This is not the only case where NumPy has been locked inadvertently into strange/inefficient implementations for the sake of subclasses. In any case I'm glad you are testing the behavior you are relying on! Our internal test coverage for subclasses is pretty sparse, and it's difficult to imagine covering all cases without expanding the test suite in a major way (and also without adding appropriate APIs like __numpy_ufunc__ for handling this stuff in a consistent fashion).

@mattharrigan (Contributor, Author)

I think there are several options; I'm not sure which is preferred:

  1. Leave the previous copy logic, but still change from flat to ravel for the easy performance win.
  2. Fast-track to just return the diff when to_begin and to_end are both None; if not, special-case ndarray subclasses to use the previous copy logic. Feels hacky.
  3. Fast-track to just return the diff when to_begin and to_end are both None; if not, use the current implementation but wrap/finalize/apply black magic before returning, maybe like this.

Please let me know what you think.

@charris (Member) commented Oct 26, 2016

I'd like to branch 1.12 soonish so it would be good to get this fixed up.

@mhvk (Contributor) commented Oct 26, 2016

@shoyer - it is not just subclasses outside of numpy -- MaskedArray would not work here either... (and does not work with to_begin or to_end with the old version either). Of course, masked arrays are annoying too, though in fairness the problem encountered here, that one cannot easily concatenate different instances of subclasses -- be it with np.hstack or by creating a large array -- is about the only one for which there is a real problem for subclasses (but sadly not one too easy to solve).

Unfortunately, for a general subclass, for wrapping to work, one does have to calculate the difference, since it may have properties that are required for the wrapping and cannot simply be inferred from the input array (the same would hold for a hypothetical SafeDtype subclass, which would change the dtype on subtraction). Given this, I would vote for the first option, i.e., return to the previous implementation but use ravel instead of flat. Then, if concatenation is solved eventually, this will just work for all cases.
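Option (1) — the old hstack-based concatenation logic with ravel() substituted for the flat iterator — would look roughly like this. A sketch, not the actual patch:

```python
import numpy as np

def ediff1d_option1(ary, to_end=None, to_begin=None):
    # Keep the previous concatenation behaviour (which consistently
    # returns a plain ndarray when to_begin/to_end are given), but do the
    # subtraction on ravel()'ed views instead of the slow flat iterator.
    ary = np.asanyarray(ary).ravel()
    ed = ary[1:] - ary[:-1]
    arrays = [ed]
    if to_begin is not None:
        arrays.insert(0, to_begin)
    if to_end is not None:
        arrays.append(to_end)
    if len(arrays) != 1:
        ed = np.hstack(arrays)
    return ed
```

When neither to_begin nor to_end is set, the difference is returned directly, so subclasses propagate exactly as they did before the change.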

@shoyer (Member) commented Oct 27, 2016

@mattharrigan I am happy with either (1) or (2). You are also welcome to give (3) a shot, but that would be significantly more involved. We do need to unbreak astropy, but we don't need to make this work in cases that didn't work before.

@mhvk I recognize that these issues are true for MaskedArray as well. Supporting subclasses is wonderful, as long as it can be done without majorly impacting performance or readability. However, I am opposed to using concatenate here merely for the sake of hypothetical future subclass-compatibility improvements, when it has a very real impact on performance for use with base numpy.ndarray objects.

I do find this feature creep around subclasses quite frustrating. It puts us, as NumPy developers, in an untenable position: we are locked into supporting features we neither intentionally added nor even know we have! If something happens to work properly for subclasses through fortuitous implementation details, then we are stuck supporting it forever.

This is particularly problematic for subclasses outside the NumPy code base, because they aren't tied in to our CI systems. Maybe this is a good time to revise the documentation to officially discourage subclassing by third parties.

@mattharrigan (Contributor, Author) commented Oct 27, 2016

I too am frustrated. The original point of this PR was performance, and option 1 still provides most of the performance benefit, is clean, and shares the same subclass peculiarities as the previous implementation.

Does this PR get reopened or should I start another PR?

@mhvk (Contributor) commented Oct 27, 2016

@shoyer - I guess the annoyances are real, but it may be good to keep in mind the benefits as well: it would seem to me that numpy in general, and MaskedArray and recarray in particular, have improved greatly from having been subclassed (in astropy not just by Quantity but also by MaskedColumn in astropy.table and by Column in astropy.io.fits). And it certainly has helped astropy a lot that the original developers of numpy (including the people who did Numeric as part of their work at STScI for HST) had the foresight to write code in such a way that numpy could be used very broadly, including by subclassing it.

On the PR proper, I do think that given that there was a general hope expressed to have ediff1d be deprecated in favour of the faster & better diff, we have spent far too much time arguing about what is a small additional improvement over replacing .flat with .ravel().

@mattharrigan (Contributor, Author) commented Oct 27, 2016

I could make a strong argument that the proposed implementation plus calling wrap is better specifically for astropy units. Using the previous ediff1d, these two calls produce identical output:

np.ediff1d(np.arange(5)*u.s, to_begin=10*u.meter)
np.ediff1d(np.arange(5)*u.s, to_begin=10*u.s)

The first should error out and the second should keep the unit. Both end up being wrong. hstack, or concatenate in general, doesn't check for compatible units or keep the unit.

However this errors out as it should:

x = np.arange(5) * u.s
x[:2] = np.arange(2) * u.meter

Which is what the proposed implementation plus wrap would do. That's also what diff will do eventually, I hope. The proposed version would also work for the following, while the previous ediff1d chokes:

x[:2] = np.arange(2) * u.ms
np.ediff1d(np.arange(5)*u.s, to_begin=10*u.ms)

@mhvk (Contributor) commented Oct 27, 2016

@mattharrigan - indeed, if I were to write a version of concatenate that would work for Quantity, it would effectively have to do just what you describe! However, your current code will also give a wrong result, since your np.empty command creates a regular array, and if one does array[some-slice] = quantity, no unit checking is done (there is nothing to hook into at present that allows us to check that quantity is in fact dimensionless). So, you would need to create an empty array that already is of the right subclass (i.e., an np.empty_like with a given shape, or np.empty(...).view(type(...))). Indeed, I wondered a bit whether to send a patch to this PR to do that, except that then the problem is that without taking the difference I cannot be entirely sure what the unit would be (one could start from to_begin or to_end, which should have a consistent unit, but by this point I felt it was getting too complicated...)

Anyway, just to be clear, I think anything that makes things faster for regular arrays and does not cause a regression is a clear benefit, i.e., either do your 1 or 2 (and perhaps add a test with MaskedArray to be sure this doesn't recur); I'd also add a comment that the later part of the code should ideally become subclass safe.
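The allocation idea mentioned above — create the empty buffer already viewed as the right subclass, then let __array_finalize__ copy over the subclass's metadata from a template — can be sketched with MaskedArray. This assumes view plus __array_finalize__ is enough for the subclass in question, which is not guaranteed in general (Quantity, for example, needs its unit set as well):

```python
import numpy as np

template = np.ma.MaskedArray(np.arange(3.0))
# Allocate a plain buffer, view it as the template's subclass, then give
# the subclass a chance to set up its attributes from the template.
result = np.empty(5).view(type(template))
result.__array_finalize__(template)
print(type(result).__name__)  # MaskedArray
```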

@shoyer (Member) commented Oct 27, 2016

Does this PR get reopened or should I start another PR?

Please open a new PR.

@mattharrigan (Contributor, Author)

I haven't used masked arrays or astropy units previously, but I have used np.matrix, so that's what I looked at first. I planned to copy examples from np.linalg, which return a matrix or array depending on the input by using wrap. It works nicely with the matrix subclass; see below. I would do that immediately after creating the empty array.

a = np.arange(5)
m = np.matrix(0)
wrap = getattr(m, "__array_prepare__", m.__array_wrap__)
out = wrap(a)

However it doesn't seem to work with astropy.Quantity. Are the various methods implemented correctly? I am not familiar with that code at all but I suspect it might be an issue.

@mhvk (Contributor) commented Oct 27, 2016

@mattharrigan - what I meant was that __array_wrap__ should always work for ndarray subclasses (while I have not seen __array_prepare__ used outside of ufuncs). At least, for matrix, MaskedArray, and Quantity that is true:

a = np.arange(5.)
np.matrix(0).__array_wrap__(a)
# matrix([[ 0.,  1.,  2.,  3.,  4.]])
np.ma.MaskedArray(0.).__array_wrap__(a)
# masked_array(data = [ 0.  1.  2.  3.  4.],
#              mask = False,
#        fill_value = 1e+20)
u.Quantity(1., u.m).__array_wrap__(a)
# <Quantity [ 0., 1., 2., 3., 4.] m>

I should add that now that I look at this again, this form of wrapping will at least make everything work for regular Quantity -- so maybe that is the right path for ediff1d as well after all! (it won't work for Magnitude, but so be it... perfect is the enemy of good!)

@mhvk (Contributor) commented Oct 27, 2016

P.S. Sorry, I'm replying as if your quote were to a comment I gave on your np.diff PR -- anyway, bottom line: with the __array_wrap__ here and there, things should work for units for which subtraction does not change the unit (which is all regular units). Hooray!

@charris charris changed the title Ediff1d performance MAINT: Ediff1d performance Oct 28, 2016