
MAINT: Ediff1d performance #8183

Merged 2 commits on Oct 24, 2016

Conversation

mattharrigan (Contributor)

Eliminate a copy operation when to_begin or to_end is given. Also use ravel instead of flatiter which is much faster.

Closes #8175

Benchmark:
python -m timeit --setup="import numpy as np;x=np.arange(1e7)" "np.ediff1d(x, 0)"

The new version is about 5x faster on my machine.

@shoyer (Member) left a comment


I reviewed this over in mattharrigan#1. Looks good to me!

@shoyer (Member) commented Oct 21, 2016

@mattharrigan Can you rename the second commit to mention ediff1d in the title?

@mattharrigan (Contributor, Author)

I haven't ever done that before. Do I commit --amend and then push --force like this?

@kernc commented Oct 21, 2016

commit --amend and then push --force

That should work.

@shoyer shoyer merged commit 3bd79ab into numpy:master Oct 24, 2016
@shoyer (Member) commented Oct 24, 2016

Thanks!

@shoyer (Member) commented Oct 25, 2016

Yes, that sounds right.


@mhvk (Contributor) commented Oct 25, 2016

This causes a regression in astropy, since with the creation of an empty array, subclasses are no longer propagated correctly. See https://travis-ci.org/astropy/astropy/jobs/170586124#L2564.

@shoyer (Member) commented Oct 25, 2016

@mhvk Remind me, do we have a standard way of allocating results with the proper subclass?

@mattharrigan (Contributor, Author)

I'll defer to the experts, but this looks promising:

l = len(ary) - 1  # lowercase L: the number of differences
result = np.empty(l + len(to_begin) + len(to_end), dtype=ary.dtype)
result[:len(to_begin)] = to_begin
result[len(to_begin) + l:] = to_end
np.subtract(ary[1:], ary[:-1], result[len(to_begin):len(to_begin) + l])
Review comment (Contributor):

Surely this is wrong? It always writes only one element of the output!

@seberg (Member) commented Oct 26, 2016

@shoyer look at functions like np.delete which use get and then at the end use the array wrap. Not sure if this is the right thing here though. The other thing is to try to stick to ufuncs which know how to wrap.

@mhvk (Contributor) commented Oct 26, 2016

Looking at the new code, I don't think it is right for input arrays longer than 2 elements... Revert this for now?!

As for Quantity, the old code would only have worked when to_begin and to_end were not set, as in that case simply the difference is returned. If either were present, np.hstack would ignore the subclass and return a normal array. In that sense, the code was somewhat broken already.

To get this right is not totally trivial, as, at least for Quantity, one should consider that to_begin and to_end may have units that are different from the result (say, be in cm rather than in m). The safest (but certainly not fastest) way would be to create the difference array as in the old code, and then call result.insert(...) for the begin/end. This does follow good coding style in that, by calling array methods, it leaves as much as possible up to the subclass to implement.

Alternatively, after calculating the difference, one could still create an empty array, and either call, as @mattharrigan suggested, diff.__array_wrap__(result) or result.__array_finalize__(diff) (I like the latter a bit better) to ensure it gets the right properties. After that, result[:len(to_begin)] = to_begin and result[len(result) - len(to_end):] = to_end would do proper conversion. To me, though, this feels more fragile than the insert tactic, and it is not obvious to me that it would be faster.

Note, though, that at least for Quantity and its subclasses, one must calculate the difference first, since there is no guarantee that the unit remains the same as the input (e.g., a Magnitude, which is a subclass of a Quantity holding logarithmic values, would change its unit upon taking a difference (from say mag(Jy) to mag).
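The "insert tactic" described above might look roughly like this. This is only a sketch, not the PR's code: np.insert stands in for the array-method-style insert mentioned, and whether it dispatches correctly for a given subclass is not guaranteed.

```python
import numpy as np

def ediff1d_via_insert(ary, to_end=None, to_begin=None):
    # Compute the subclass-aware difference first (so a subclass like
    # Quantity can adjust its unit/dtype), then splice in to_begin/to_end,
    # leaving element conversion to the array machinery rather than a
    # raw preallocated buffer.
    ary = np.asanyarray(ary).ravel()
    result = ary[1:] - ary[:-1]
    if to_end is not None:
        result = np.insert(result, len(result), to_end)
    if to_begin is not None:
        result = np.insert(result, 0, to_begin)
    return result
```

As noted above, this is the safest but certainly not the fastest route, since each np.insert call copies the array.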

@seberg (Member) commented Oct 26, 2016

One thing about wrapping: I have no idea how far __numpy_ufunc__ is and how it plays into this.

@seberg (Member) commented Oct 26, 2016

@mhvk if you think you will have the fix in quickly, don't know if a revert is vital (if this is wrong), but in that case mark a followup PR as a release blocker (or create a release blocking issue).

@mattharrigan (Contributor, Author) commented Oct 26, 2016

@mhvk I don't understand the concern about not being right for arrays longer than 2. Are you confusing the lower case L with the number 1? If so I apologize for not using better variable names.

I didn't know much at all about ndarray subclasses or how much complexity might creep into this PR because of them. Most of the inefficiency of the previous implementation was due to the flat iterator instead of just calling ravel and doing the subtraction with an array. Perhaps that change alone should be made. But you point out that Quantities are not handled well in either case.

@mhvk (Contributor) commented Oct 26, 2016

Ah, yes, that was an l, not a 1. Sorry for the noise!

@mhvk (Contributor) commented Oct 26, 2016

I think it would be good at least to restore the previous behaviour (and thus remove the regression): without to_begin and to_end, surely by far the fastest is to just return the difference, rather than creating an array to fill with it.

@mattharrigan (Contributor, Author) commented Oct 26, 2016

In either case the output array must be created; a[1:] - a[:-1] just does that behind the scenes. This code does the subtraction with the output array given, so the output array is not created twice.

@mhvk (Contributor) commented Oct 26, 2016

It is true that an output array has to be created in either case, but the speed is not the same:

a = np.arange(10000.)
%timeit result = a[1:]-a[:-1]
# 100000 loops, best of 3: 12.6 µs per loop
%timeit result = np.empty(9999); np.subtract(a[1:], a[:-1], result)
# 100000 loops, best of 3: 15.6 µs per loop

It gets worse for smaller inputs:

a = np.arange(100.)
%timeit result = a[1:]-a[:-1]
# 1000000 loops, best of 3: 1.63 µs per loop
%timeit result = np.empty(99); np.subtract(a[1:], a[:-1], result)
# 100000 loops, best of 3: 3.78 µs per loop

@mattharrigan (Contributor, Author)

Good point. There should be something to the effect of:

if to_begin is None and to_end is None:
    return a[1:]-a[:-1]
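Putting that fast path together with the preallocated-buffer approach from earlier in the thread gives roughly the following. This is only a sketch with illustrative names; subclass handling is deliberately ignored here.

```python
import numpy as np

def ediff1d_sketch(ary, to_end=None, to_begin=None):
    # Fast path: no concatenation needed, just return the difference.
    # Per the discussion above, this also preserves ndarray subclasses.
    ary = np.asanyarray(ary).ravel()
    if to_begin is None and to_end is None:
        return ary[1:] - ary[:-1]

    # Otherwise allocate the output once and subtract directly into it,
    # avoiding the extra copy that np.hstack would make.
    to_begin = np.array([]) if to_begin is None else np.asanyarray(to_begin).ravel()
    to_end = np.array([]) if to_end is None else np.asanyarray(to_end).ravel()
    n = max(len(ary) - 1, 0)
    result = np.empty(len(to_begin) + n + len(to_end), dtype=ary.dtype)
    result[:len(to_begin)] = to_begin
    result[len(to_begin) + n:] = to_end
    np.subtract(ary[1:], ary[:-1], result[len(to_begin):len(to_begin) + n])
    return result
```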

@mattharrigan (Contributor, Author)

My intuition about performance was wrong (big surprise), but I would like to understand why. Here are three timeit calls. The second is the fastest, as I expected. However, creating that empty array takes forever relatively speaking, so step 2 + step 3 take far longer than step 1, even though step 1 internally has to create an array also. Admittedly these are very small arrays. Is the difference just due to overhead? Is there hidden work somewhere? Thank you.

python -m timeit -s "import numpy as np;a = np.arange(10, dtype=float)" "a[1:]-a[:-1]"
python -m timeit -s "import numpy as np;a = np.arange(10, dtype=float);buf=np.empty(9, dtype=float)" "np.subtract(a[1:], a[:-1], buf)"
python -m timeit -s "import numpy as np" "buf=np.empty(9, dtype=float)"

@seberg (Member) commented Oct 26, 2016

There is plenty of hidden work... For the things you are timing, simply the argument (and keyword argument) parsing is probably a significant portion of the cost.

@shoyer (Member) commented Oct 26, 2016

The speed difference here (~2 µs) is pretty much just unavoidable Python overhead. Anything less than 5-10 µs is not really worth optimizing -- anyone who cares about performance at this level should not be using Python in their inner loop.

We can certainly switch to use simple subtraction for preserving subclasses in this case, but it is very unfortunate how all these implementation details have leaked to become public API for handling subclasses. This is not the only case where NumPy has been locked inadvertently into strange/inefficient implementations for the sake of subclasses. In any case I'm glad you are testing the behavior you are relying on! Our internal test coverage for subclasses is pretty sparse, and it's difficult to imagine covering all cases without expanding the test suite in a major way (and also without adding appropriate APIs like __numpy_ufunc__ for handling this stuff in a consistent fashion).

@mattharrigan (Contributor, Author)

I think there are several options; I'm not sure which is preferred:

  1. Leave the previous copy logic, but still change from flat to ravel for the easy performance win.
  2. Fast-track to just return the diff when to_begin and to_end are both None; if not, special-case ndarray subclasses to use the previous copy logic. Feels hacky.
  3. Fast-track to just return the diff when to_begin and to_end are both None; if not, use the current implementation but wrap/finalize/apply black magic before returning, maybe like this.

Please let me know what you think.

@charris (Member) commented Oct 26, 2016

I'd like to branch 1.12 soonish so it would be good to get this fixed up.

@mhvk (Contributor) commented Oct 26, 2016

@shoyer - it is not just subclasses outside of numpy -- MaskedArray would not work here either... (and does not work with to_begin or to_end with the old version either). Of course, masked arrays are annoying too, though in fairness the problem encountered here, that one cannot easily concatenate different instances of subclasses -- be it with np.hstack or by creating a large array -- is about the only one for which there is a real problem for subclasses (but sadly not one too easy to solve).

Unfortunately, for a general subclass, for wrapping to work, one does have to calculate the difference, since it may have properties that are required for the wrapping and cannot simply be inferred from the input array (the same would hold for a hypothetical SafeDtype subclass, which would change the dtype on subtraction). Given this, I would vote for the first option, i.e., return to the previous implementation but use ravel instead of flat. Then, if concatenation is solved eventually, this will just work for all cases.
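Option (1) — the old hstack-based concatenation logic with ravel() substituted for the flat iterator — would look roughly like this. A sketch, not the actual patch:

```python
import numpy as np

def ediff1d_option1(ary, to_end=None, to_begin=None):
    # Keep the previous concatenation behaviour (which consistently
    # returns a plain ndarray when to_begin/to_end are given), but do the
    # subtraction on ravel()'ed views instead of the slow flat iterator.
    ary = np.asanyarray(ary).ravel()
    ed = ary[1:] - ary[:-1]
    arrays = [ed]
    if to_begin is not None:
        arrays.insert(0, to_begin)
    if to_end is not None:
        arrays.append(to_end)
    if len(arrays) != 1:
        ed = np.hstack(arrays)
    return ed
```

When neither to_begin nor to_end is set, the difference is returned directly, so subclasses propagate exactly as they did before the change.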

@shoyer (Member) commented Oct 27, 2016

@mattharrigan I am happy with either (1) or (2). You are also welcome to give (3) a shot, but that would be significantly more involved. We do need to unbreak astropy, but we don't need to make this work in cases that didn't work before.

@mhvk I recognize that these issues are true for MaskedArray as well. Supporting subclasses is wonderful, as long as it can be done without majorly impacting performance or readability. However, I am opposed to using concatenate here merely for the sake of hypothetical future subclass-compatibility improvements, when it has a very real impact on performance for use with base numpy.ndarray objects.

I do find this feature creep around subclasses quite frustrating. It puts us, as NumPy developers, in an untenable position: we are locked into supporting features we neither intentionally added nor even know we have! If something happens to work properly for subclasses through fortuitous implementation details, then we are stuck supporting it forever.

This is particularly problematic for subclasses outside the NumPy code base, because they aren't tied in to our CI systems. Maybe this is a good time to revise the documentation to officially discourage subclassing by third parties.

@mattharrigan (Contributor, Author) commented Oct 27, 2016

I too am frustrated. The original point of this PR was performance, and option 1 still provides most of the performance benefit, is clean, and shares the same subclass peculiarities as the previous implementation.

Does this PR get reopened or should I start another PR?

@mhvk (Contributor) commented Oct 27, 2016

@shoyer - I guess the annoyances are real, but it may be good to keep in mind the benefits as well: it would seem to me that numpy in general, and MaskedArray and recarray in particular, have improved greatly from having been subclassed (in astropy not just by Quantity but also by MaskedColumn in astropy.table and by Column in astropy.io.fits). And it certainly has helped astropy a lot that the original developers of numpy (including the people who did Numeric as part of their work at STScI for HST) had the foresight to write code in such a way that numpy could be used very broadly, including by subclassing it.

On the PR proper, I do think that given that there was a general hope expressed to have ediff1d be deprecated in favour of the faster & better diff, we have spent far too much time arguing about what is a small additional improvement over replacing .flat with .ravel().

@mattharrigan (Contributor, Author) commented Oct 27, 2016

I could make a strong argument that the proposed implementation plus calling wrap is better specifically for astropy units. Using the previous ediff1d, these two calls produce identical output:

np.ediff1d(np.arange(5)*u.s, to_begin=10*u.meter)
np.ediff1d(np.arange(5)*u.s, to_begin=10*u.s)

The first should error out and the second should keep the unit. Both end up being wrong. hstack, or concatenate in general, doesn't check for compatible units or keep the unit.

However this errors out as it should:

x = np.arange(5) * u.s
x[:2] = np.arange(2) * u.meter

Which is what the proposed implementation plus wrap would do. That's also what diff will do eventually, I hope. The proposed version would also work for the following, while the previous ediff1d chokes:

x[:2] = np.arange(2) * u.ms
np.ediff1d(np.arange(5)*u.s, to_begin=10*u.ms)

@mhvk (Contributor) commented Oct 27, 2016

@mattharrigan - indeed, if I were to write a version of concatenate that would work for Quantity, it would effectively have to do just what you describe! However, your current code will also give a wrong result, since your np.empty command creates a regular array, and if one does array[some-slice] = quantity, no unit checking is done (there is nothing to hook into at present that allows us to check that quantity is in fact dimensionless). So, you would need to create an empty array that already is of the right subclass (i.e., an np.empty_like with a given shape, or np.empty(...).view(type(...))). Indeed, I wondered a bit whether to send a patch to this PR to do that, except that then the problem is that without taking the difference I cannot be entirely sure what the unit would be (one could start from to_begin or to_end, which should have a consistent unit, but by this point I felt it was getting too complicated...)

Anyway, just to be clear, I think anything that makes things faster for regular arrays and does not cause a regression is a clear benefit, i.e., either do your 1 or 2 (and perhaps add a test with MaskedArray to be sure this doesn't recur); I'd also add a comment that the later part of the code should ideally become subclass safe.
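The allocation idea mentioned above — create the empty buffer already viewed as the right subclass, then let __array_finalize__ copy over the subclass's metadata from a template — can be sketched with MaskedArray. This assumes view plus __array_finalize__ is enough for the subclass in question, which is not guaranteed in general (Quantity, for example, needs its unit set as well):

```python
import numpy as np

template = np.ma.MaskedArray(np.arange(3.0))
# Allocate a plain buffer, view it as the template's subclass, then give
# the subclass a chance to set up its attributes from the template.
result = np.empty(5).view(type(template))
result.__array_finalize__(template)
print(type(result).__name__)  # MaskedArray
```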

@shoyer (Member) commented Oct 27, 2016

Does this PR get reopened or should I start another PR?

Please open a new PR.

@mattharrigan (Contributor, Author)

I haven't used masked arrays or astropy units previously, but I have used np.matrix, so that's what I looked at first. I planned to copy examples from np.linalg, which return a matrix or array depending on the input by using wrap. It works nicely with the matrix subclass; see below. I would do that immediately after creating the empty array.

a = np.arange(5)
m = np.matrix(0)
wrap = getattr(m, "__array_prepare__", m.__array_wrap__)
out = wrap(a)

However it doesn't seem to work with astropy.Quantity. Are the various methods implemented correctly? I am not familiar with that code at all but I suspect it might be an issue.

@mhvk (Contributor) commented Oct 27, 2016

@mattharrigan - what I meant was that __array_wrap__ should always work for ndarray subclasses (while I have not seen __array_prepare__ used outside of ufuncs). At least, for matrix, MaskedArray, and Quantity that is true:

a = np.arange(5.)
np.matrix(0).__array_wrap__(a)
# matrix([[ 0.,  1.,  2.,  3.,  4.]])
np.ma.MaskedArray(0.).__array_wrap__(a)
# masked_array(data = [ 0.  1.  2.  3.  4.],
#              mask = False,
#        fill_value = 1e+20)
u.Quantity(1., u.m).__array_wrap__(a)
# <Quantity [ 0., 1., 2., 3., 4.] m>

I should add that now that I look at this again, this form of wrapping will at least make everything work for regular Quantity -- so maybe that is the right path for ediff1d as well after all! (it won't work for Magnitude, but so be it... perfect is the enemy of good!)

@mhvk (Contributor) commented Oct 27, 2016

P.S. Sorry, I'm replying as if your quote were to a comment I gave on your np.diff PR -- anyway, bottom line: with the __array_wrap__ here and there, things should work for units for which subtraction does not change the unit (which is all regular units). Hooray!

@charris charris changed the title Ediff1d performance MAINT: Ediff1d performance Oct 28, 2016