ENH: optimisation of array_equal #24663
Conversation
Just my first thoughts: I am not sure it needs to be our job to check for identity, even when we know it's correct. I am happy to skip NaN checks when we know for sure that there are no NaNs.
Yeah, I'm unsure. I think it's pretty rare, but at the same time, it can be quite a bit more performant.
Do you mean we should have an explicit list of non-nanable types instead of `issubdtype`? Regardless, neither is a hill I'm going to die on; I'm happy to split the logic into two PRs if you wish.
I am just not sure about your current […]
Actually, an alternative that we may or may not like: I always think that it might be useful to add a ufunc for […], i.e. so that we don't have to do the NaN dance at all, but have a ufunc that matches NaNs (I think this would also help fix a bunch of […]).

EDIT: Arg, I keep mixing this up. What I want is something that effectively considers NaNs equal, but […]
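For context, the "NaN dance" mentioned here can be sketched as follows. `nan_aware_equal` is a hypothetical helper (not an existing NumPy API): it fuses elementwise equality with a joint-NaN mask, which is what a dedicated ufunc would do in a single pass.

```python
import numpy as np

def nan_aware_equal(a, b):
    # Hypothetical helper (not a NumPy API): elementwise equality that
    # treats NaN == NaN as True, emulating the proposed ufunc.
    a, b = np.asarray(a), np.asarray(b)
    return (a == b) | (np.isnan(a) & np.isnan(b))

x = np.array([1.0, np.nan, 3.0])
print(nan_aware_equal(x, x).all())  # True: NaNs are considered equal
```

Today this costs several temporary arrays and passes over the data, which is exactly the overhead a fused ufunc would avoid.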
(force-pushed from c3b453a to 359e22c)
The approach […]. I've refactored the tests to use pytest parametrize, added a number of cases of interest, and rebased to put the tests commit first (the modified tests pass before the PR).
(force-pushed from 1b6ffb4 to 48217cc)
Added a test for 0-d arrays in the first commit. I can remove the […]
Maybe a stupid question; it's not my area of expertise, but assuming the arrays have the same dtypes, wouldn't […]?
Just tested, that does not work for […]
I think it might work for all but object arrays, since you can't know the truthiness of a Python value without testing it.
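To illustrate why object arrays are the odd one out, here is a minimal sketch of a no-NaN dtype check (the names are illustrative assumptions, not the PR's exact code): it must answer False for object dtype, since an object array can hold anything, including NaN.

```python
import numpy as np

# Illustrative sketch, not the PR's exact code: integer and bool dtypes
# can never contain NaN, so the expensive NaN-masking step can be skipped.
_no_nan_dtypes = {np.dtype(np.bool_), np.dtype(np.int8), np.dtype(np.int16),
                  np.dtype(np.int32), np.dtype(np.int64)}

def dtype_cannot_hold_nan(dtype):
    return dtype in _no_nan_dtypes

print(dtype_cannot_hold_nan(np.dtype(np.int64)))    # True: ints never hold NaN
print(dtype_cannot_hold_nan(np.dtype(np.float64)))  # False
print(dtype_cannot_hold_nan(np.dtype(object)))      # False: can hold anything
```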
Moving to ready for review. I'm happy to split this into multiple chunks, like only the tests for example.
numpy/core/numeric.py (outdated):

```python
def _dtype_cannot_hold_nan(dtype):
    return dtype in {np.int8, np.int16, np.int32, np.int64}
```
Suggested change:

```diff
-def _dtype_cannot_hold_nan(dtype):
-    return dtype in {np.int8, np.int16, np.int32, np.int64}
+_no_nan_dtypes = {np.int8, np.int16, np.int32, np.int64}
+
+def _dtype_cannot_hold_nan(dtype):
+    return dtype in _no_nan_dtypes
```
This is much faster; a benchmark:

```python
import numpy as np

def _dtype_cannot_hold_nan(dtype):
    return dtype in {np.int8, np.int16, np.int32, np.int64}

_no_nan_dtypes = {np.int8, np.int16, np.int32, np.int64}

def _dtype_cannot_hold_nan_2(dtype):
    return dtype in _no_nan_dtypes

dtype = np.array([1.]).dtype
%timeit _dtype_cannot_hold_nan(dtype)
%timeit _dtype_cannot_hold_nan_2(dtype)
%timeit dtype in _no_nan_dtypes
```

results in

```
507 ns ± 63.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
112 ns ± 1.94 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
57.9 ns ± 5.47 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
```

The inline version is the fastest, so that would be an option as well.
One question is whether we want to let people add things to `_no_nan_dtypes` by monkeypatching.
For both the […], I'm tempted to optimize for large arrays (even if my use case is small-ish); this is where we'll see the most gain. Timing-wise, as we are at least in the µs range, a few ns for dtype checking won't make much difference. For large-array checking, the […]
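The identity shortcut under discussion can be sketched like this (a simplified illustration, not NumPy's actual implementation): when both arguments are the very same array and `equal_nan=True`, every element is NaN-equal to itself, so the data never needs to be touched.

```python
import numpy as np

def array_equal_sketch(a1, a2, equal_nan=False):
    # Simplified sketch of the fast path discussed above; not NumPy's code.
    a1, a2 = np.asarray(a1), np.asarray(a2)
    if a1.shape != a2.shape:
        return False
    if equal_nan and a1 is a2:
        return True  # O(1): an array is always NaN-equal to itself
    if not equal_nan:
        return bool((a1 == a2).all())
    # NaN-aware slow path: match NaNs positionally
    return bool(((a1 == a2) | (np.isnan(a1) & np.isnan(a2))).all())

b = np.array([1.0, np.nan])
print(array_equal_sketch(b, b, equal_nan=True))   # True, via the shortcut
print(array_equal_sketch(b, b, equal_nan=False))  # False: nan != nan
```

Note the shortcut is only valid under `equal_nan=True`; with `equal_nan=False`, an array containing NaN is not equal to itself.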
This creates a circular import.
Gentle poke, to know the status. I'm happy to split the test refactor into its own PR. There is still the question about the complex test that @rossbar wrote, where I'm not sure what was written was what was intended.
Updated comments following discussion. |
Going through my list of open PRs, just checking if there is more I can do to help. I can pull the test refactor into a separate PR to make this more digestible/less contentious.
(force-pushed from ebef6dd to 7616f57)
Gentle nudge again, as this was relatively painful to rebase on master after the recent changes. I can again pull the test refactor into its own PR if that makes it easier to get accepted.
It looks like some of the new tests are failing. I believe a separate PR that just implements the refactoring and expands the tests would have an easier time getting merged.

I don't have a strong opinion here on the performance changes. I personally don't think the overhead of checking object identity first is big enough to worry about slowing down non-identity cases, but I suspect that's why people are a little leery. A new set of benchmark runs with the latest version of the PR might help as well?

If you'd like, stop by the next NumPy triage zoom meeting on Wednesday at 1:00 PM US Eastern time and you'll have more people's attention.
This is a pure-test refactor (as well as adding a few cases), extracted from numpy#24663. Testing each case individually has a small pytest overhead, but has the advantage that each test can fail independently when the tested function is modified. It also makes it much easier to add a new test, and I believe is much more readable.
(force-pushed from 7616f57 to aebd2d7)
Yeah, I've botched the merge/rebase. Redoing now.
I'll try, but those times are usually tough for me here, and today I can't for sure. I've pulled out the test refactor into its own PR #25010.
(force-pushed from aebd2d7 to 035d92e)
This is a pure-test refactor (as well as adding a few cases), extracted from numpy#24663. Testing each case individually has a small pytest overhead, but has the advantage that each test can fail independently when the tested function is modified. It also makes it much easier to add a new test, and I believe is much more readable. [skip cirrus] [skip circle] [skip azp]
For various reasons I was looking at `array_equal`, and I believe there are a few cases where we can get some speedup/less memory allocation:

- if `equal_nan == True`, then we could add a shortcut: we can sometimes check that both arrays are the same object and avoid the computation.
- if the dtypes cannot contain `nan`, then we do not need the expensive fancy indexing and removal of `nan`s.

We can test with the following script:

```python
import numpy as np

base = np.random.randint(1, 100, (10000, 10000), dtype='int64')

a = base.copy()
a1 = base.copy()

b = base.copy().astype('float')
b[b == 50] = np.nan
b1 = b.copy()

data = {'a': a, 'b': b, 'a1': a1, 'b1': b1}

for (x, y) in [('a', 'a'), ('a', 'a1'), ('b', 'b'), ('b', 'b1')]:
    ax = data[x]
    bx = data[y]
    for en in [True, False]:
        res = np.array_equal(ax, bx, equal_nan=en)
        print(f"np.array_equal({x}, {y}, equal_nan={en}) == {res}")
        %timeit -r10 -o np.array_equal(ax, bx, equal_nan=en)
```

Which gives us the following results:

```diff
--- before.txt	2023-09-07 15:19:01
+++ after.txt	2023-09-07 15:24:30
@@ -1,16 +1,16 @@
 np.array_equal(a, a, equal_nan=True) == True
-419 ms ± 42.4 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)
+1.41 µs ± 5.37 ns per loop (mean ± std. dev. of 10 runs, 1,000,000 loops each)
 np.array_equal(a, a, equal_nan=False) == True
-31.1 ms ± 200 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)
+31 ms ± 348 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)
 np.array_equal(a, a1, equal_nan=True) == True
-1.07 s ± 326 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)
+40.3 ms ± 133 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)
 np.array_equal(a, a1, equal_nan=False) == True
-41.9 ms ± 130 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)
+40.3 ms ± 210 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)
 np.array_equal(b, b, equal_nan=True) == True
-569 ms ± 61.7 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)
+909 ns ± 1.61 ns per loop (mean ± std. dev. of 10 runs, 1,000,000 loops each)
 np.array_equal(b, b, equal_nan=False) == False
-31.2 ms ± 208 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)
+30.2 ms ± 228 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)
 np.array_equal(b, b1, equal_nan=True) == True
-842 ms ± 171 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)
+461 ms ± 3.07 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)
 np.array_equal(b, b1, equal_nan=False) == False
-40.4 ms ± 383 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)
+39.2 ms ± 234 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)
```
(force-pushed from 04650f7 to c9933a5)
I don't love these special cases, but I think they are fine. Just nitpicks; I don't care what you do with them, although I think I would prefer to use the `nt.<type>` approach at least, because the global is just awkward.
numpy/_core/numeric.py (outdated):

```python
cannot_have_nan = _dtype_cannot_hold_nan(a1.dtype)\
    and _dtype_cannot_hold_nan(a2.dtype)
```
Suggested change:

```diff
-cannot_have_nan = _dtype_cannot_hold_nan(a1.dtype)\
-    and _dtype_cannot_hold_nan(a2.dtype)
+cannot_have_nan = (_dtype_cannot_hold_nan(a1.dtype)
+                   and _dtype_cannot_hold_nan(a2.dtype))
```

Just avoid the `\`, because I hate it :).
👍
numpy/_core/numeric.py (outdated):

```python
def _dtype_cannot_hold_nan(dtype):
    global _no_nan_types
    if _no_nan_types is None:
        _no_nan_types = {np.bool_, np.int8, np.int16, np.int32, np.int64}
```
Could just use `import .numerictypes as nt`. The problem is that things are wonky while we are setting things up.

I might slightly prefer (but verbose) to write:

```python
_no_nan_dtypes = {np.dtypes.BoolDType, np.dtypes.Int8DType, ...}
```

and then compare `type(dtype)` rather than `dtype.type`. We could put the `_no_nan_dtypes` into `np.dtypes` also.

I do not want to consider this public, though, because the right API should be different (maybe a `np.dtypes.NoNaNDType.register(MyDType)`).
Yeah, I agree that the API should not be public, at least for now. And `numerictypes` seems great!
```diff
@@ -2505,7 +2525,7 @@ def array_equiv(a1, a2):
     except Exception:
         return False

-    return bool(asarray(a1 == a2).all())
+    return bool((a1 == a2).all())
```
Nice! This does seem OK to remove, because we guarantee that `a == b` returns a NumPy boolean array/scalar, so we don't run into problems even for object dtype!
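A quick check of the guarantee mentioned here: even for object dtype, comparing two arrays with `==` yields a NumPy boolean array, so `.all()` can be called on it directly and the `asarray(...)` wrapper is redundant.

```python
import numpy as np

a = np.array([1, "x", None], dtype=object)
b = np.array([1, "x", None], dtype=object)

# Elementwise comparison of object arrays still returns a bool ndarray.
eq = a == b
print(type(eq))        # <class 'numpy.ndarray'>
print(eq.dtype)        # bool
print(bool(eq.all()))  # True
```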
I can also send that as its own PR if you wish.
```python
ret = getattr(numeric, attr_name, None)
if ret is None and attr_name != "newaxis":
    …
sentinel = object()
```
Not really this PR, but I don't think I care too much.
If I don't do this, the test fails because `_no_nan_dtypes` is `None`.
Here are some redone benchmarks with a 1000x1000 random array. It's only a guideline, as the larger the arrays are, the more the effect will be felt. (Source to replicate: https://gist.github.com/Carreau/baecfc0b02773bea6a744935e0dc9d1b) CI seems to have a problem with `np.dtypes`, but it works locally, which is weird.
```python
    np.dtypes.Int16DType,
    np.dtypes.Int32DType,
    np.dtypes.Int64DType,
}
```
Strange that I cannot repro locally, but trying to import modules lazily seems to make things barf. Importing `dtypes` lazily is useless (the C side has to already import it to fill it!), but I guess explicitly importing here will work...

EDIT: ahh, it is when reloading, in which case the C code isn't run again and `dtypes` is never explicitly imported until here.
It is actually a bit annoying, because `np.longlong` or `np.cint` could fall through the cracks (right now), but I will ignore that.
I'm not quite sure why arm64 is failing; can you restart it? Or I can merge main/rebase.
Sorry, forgot about this a bit. The `np.dtypes` namespace population hates reloading, so let's use the ugly way to get the same thing. The `in` check needs to work on types, so either `dtype.type` or `type(dtype)` (depending on what is used).
Thanks @Carreau, passing now, and my last set of changes were pretty small.
🥳 🎊 Thanks!