
ENH: optimisation of array_equal #24663

Merged: 7 commits into numpy:main on Nov 7, 2023

Conversation

@Carreau (Contributor) commented Sep 7, 2023

For various reasons I was looking at array_equal, and I believe there are
a few cases where we can get some speedup / fewer memory allocations.

  • if equal_nan == True, we could add a shortcut: we can sometimes
    check that both arrays are the same object and avoid the computation
    entirely (see the sketch after this list)
  • if the dtypes cannot contain nan, then we do not need the expensive
    fancy indexing and removal of nans
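
A minimal sketch of what these two shortcuts could look like (my illustration with hypothetical helper names, not the code as merged):

```python
import numpy as np

# Illustrative, deliberately incomplete list of dtypes that can never hold NaN.
_no_nan_dtypes = {np.dtype(np.bool_), np.dtype(np.int8), np.dtype(np.int16),
                  np.dtype(np.int32), np.dtype(np.int64)}

def _array_equal_sketch(a1, a2, equal_nan=False):
    a1, a2 = np.asarray(a1), np.asarray(a2)
    if a1.shape != a2.shape:
        return False
    if not equal_nan:
        return bool((a1 == a2).all())
    # Shortcut 1: an array compared with itself is equal by definition
    # once NaNs are treated as equal.
    if a1 is a2:
        return True
    # Shortcut 2: if neither dtype can hold NaN, skip the NaN handling.
    if a1.dtype in _no_nan_dtypes and a2.dtype in _no_nan_dtypes:
        return bool((a1 == a2).all())
    # General path: NaN masks must match, then compare the non-NaN values.
    a1nan, a2nan = np.isnan(a1), np.isnan(a2)
    if not (a1nan == a2nan).all():
        return False
    return bool((a1[~a1nan] == a2[~a1nan]).all())
```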

We can test with the following script:

import numpy as np

base = np.random.randint(1, 100, (10000, 10000), dtype='int64')
a = base.copy()
a1 = base.copy()

b = base.copy().astype('float')
b[b == 50] = np.nan

b1 = b.copy()

data = {'a':a, 'b':b, 'a1':a1, 'b1': b1}

for (x, y) in [('a','a'),('a','a1'),('b','b'),('b','b1')]:
    ax = data[x]
    bx = data[y]
    for en in [True, False]:
        res = np.array_equal(ax, bx, equal_nan=en)
        print(f"np.array_equal({x}, {y}, equal_nan={en}) == {res}")
        %timeit -r10 -o np.array_equal(ax, bx, equal_nan=en)

Which gives us the following results.

--- before.txt  2023-09-07 15:19:01
+++ after.txt   2023-09-07 15:24:30
@@ -1,16 +1,16 @@
np.array_equal(a, a, equal_nan=True) == True
-419 ms ± 42.4 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)
+1.41 µs ± 5.37 ns per loop (mean ± std. dev. of 10 runs, 1,000,000 loops each)

np.array_equal(a, a, equal_nan=False) == True
-31.1 ms ± 200 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)
+31 ms ± 348 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)

np.array_equal(a, a1, equal_nan=True) == True
-1.07 s ± 326 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)
+40.3 ms ± 133 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)

np.array_equal(a, a1, equal_nan=False) == True
-41.9 ms ± 130 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)
+40.3 ms ± 210 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)

np.array_equal(b, b, equal_nan=True) == True
-569 ms ± 61.7 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)
+909 ns ± 1.61 ns per loop (mean ± std. dev. of 10 runs, 1,000,000 loops each)

np.array_equal(b, b, equal_nan=False) == False
-31.2 ms ± 208 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)
+30.2 ms ± 228 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)

np.array_equal(b, b1, equal_nan=True) == True
-842 ms ± 171 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)
+461 ms ± 3.07 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)

np.array_equal(b, b1, equal_nan=False) == False
-40.4 ms ± 383 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)
+39.2 ms ± 234 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)

@seberg (Member) commented Sep 7, 2023

Just my first thoughts: I am not sure it needs to be our job to check for identity, even when we know it's correct. I am happy to skip NaN checks when we know for sure that there are no NaNs.
One thing to beware of: timedeltas have NaNs, and I am not quite sure you are catching them (which I agree is wrong of NumPy if that is the case, but we need to take care here either way).
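
For context (my addition, not from the thread): timedelta64 has a NaN-like value, NaT, and on the NumPy versions I tried its scalar type sits under np.signedinteger in the type hierarchy, so an issubdtype-based "cannot hold NaN" test would wrongly whitelist it:

```python
import numpy as np

t = np.array(["NaT"], dtype="timedelta64[s]")
print(np.isnat(t))   # [ True]
print(t == t)        # [False]  NaT, like NaN, is not equal to itself

# timedelta64 is (surprisingly) a subclass of signedinteger, so a naive
# "integers cannot hold NaN" check based on issubdtype misses NaT:
print(np.issubdtype(np.timedelta64, np.integer))  # True
```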

@Carreau (Contributor, author) commented Sep 7, 2023

I am not sure it needs to be our job to check for identity, even when we know it's correct

Yeah, I'm unsure. I think it's pretty rare, but at the same time, it can be quite a bit more performant.

One thing to beware of: timedeltas have NaNs, and I am not quite sure you are catching them (which I agree is wrong of NumPy if that is the case, but we need to take care here either way).

Do you mean we should have an explicit list of non-nannable types instead of issubdtype?

Regardless, neither is a hill I'm going to die on; I'm happy to split the logic into two PRs if you wish.

@seberg (Member) commented Sep 7, 2023

I am just not sure that your current issubdtype check (which I don't like, but there probably isn't a much cleaner alternative) doesn't also match timedeltas: this needs a test.

@seberg (Member) commented Sep 7, 2023

Actually, an alternative that we may or may not like: I always think it might be useful to add a ufunc for islessgreater, or maybe an inverted version. Maybe with a better name, not_different?!

I.e. so that we don't have to do the NaN dance at all, but have a ufunc that matches NaNs (I think this would also help fix a bunch of np.unique annoyances/bugs).

EDIT: Arg, I keep mixing this up. Because what I want is something that effectively considers NaNs equal. But islessgreater isn't that because islessgreater(1., NaN) is also False.
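
For concreteness (my sketch, not an existing ufunc): the NaN-matching comparison described here can be emulated elementwise today as:

```python
import numpy as np

def not_different(a, b):
    # Elementwise equality that also treats NaN == NaN as True,
    # unlike islessgreater, where islessgreater(1., NaN) is False too.
    a, b = np.asarray(a), np.asarray(b)
    return (a == b) | (np.isnan(a) & np.isnan(b))

print(not_different([1., np.nan], [1., np.nan]))  # [ True  True]
print(not_different([1., np.nan], [np.nan, 2.]))  # [False False]
```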

@Carreau Carreau force-pushed the faster_np_array_equal branch 3 times, most recently from c3b453a to 359e22c on September 8, 2023 13:12
@Carreau (Contributor, author) commented Sep 8, 2023

The issubdtype approach does indeed fail for timedelta64, so I decided to fall back to a `dtype in {...}` check with an explicit list of dtypes that cannot have NaNs, in a utility function that makes it easy to update this list.

I've refactored the tests to use pytest parametrize, added a number of cases of interest, and rebased to put the tests commit first (the modified tests pass before the PR).
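
As an illustration of the parametrized style (my sketch, not the actual test file):

```python
import numpy as np
import pytest

@pytest.mark.parametrize("a1, a2, equal_nan, expected", [
    ([1, 2], [1, 2], False, True),
    ([1, np.nan], [1, np.nan], False, False),
    ([1, np.nan], [1, np.nan], True, True),
    (np.array(np.nan), np.array(np.nan), True, True),  # 0-d arrays
    ([1, 2], [1, 3], True, False),
])
def test_array_equal(a1, a2, equal_nan, expected):
    # array_equal returns a builtin bool, so identity comparison is safe.
    assert np.array_equal(a1, a2, equal_nan=equal_nan) is expected
```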

(Resolved, outdated review thread on numpy/core/numeric.py)
@Carreau Carreau force-pushed the faster_np_array_equal branch 2 times, most recently from 1b6ffb4 to 48217cc on September 10, 2023 08:23
@Carreau (Contributor, author) commented Sep 10, 2023

Added a test for 0-d arrays in the first commit.

I can remove the asarray(? == ?) in the function, but I'm not entirely confident about it.

@Carreau (Contributor, author) commented Sep 11, 2023

I.e. so that we don't have to do the NaN dance at all, but have a ufunc that matches NaNs (I think this would also help fix a bunch of np.unique annoyances/bugs)

Maybe a stupid question (it's not my area of expertise), but assuming the arrays have the same dtype, wouldn't a.tobytes() == b.tobytes() work for this? I guess tobytes() makes copies, but maybe there is a way to do it without copies?

@Carreau (Contributor, author) commented Sep 11, 2023

Maybe a stupid question (it's not my area of expertise), but assuming the arrays have the same dtype, wouldn't a.tobytes() == b.tobytes() work for this? I guess tobytes() makes copies, but maybe there is a way to do it without copies?

Just tested: that does not work for 1+nanj == nan+1j, but I guess we at least have a.tobytes() == b.tobytes() => a == b. Maybe that's an extra optimisation.
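
To make that counterexample concrete (my demo): with equal_nan=True, a complex value counts as NaN if either component is NaN, so these two arrays compare equal even though their bytes differ:

```python
import numpy as np

a = np.array([complex(1, np.nan)])
b = np.array([complex(np.nan, 1)])

print(np.array_equal(a, b, equal_nan=True))  # True: both entries count as NaN
print(a.tobytes() == b.tobytes())            # False: different byte patterns
```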

@ngoldbaum (Member) commented:

wouldn't a.tobytes() == b.tobytes() work for this

I think it might work for all but object arrays, since you can't know the truthiness of a Python value without testing it.

@Carreau (Contributor, author) commented Sep 20, 2023

Moving to ready for review. I'm happy to split this into multiple chunks, for example only the tests first.

@Carreau Carreau marked this pull request as ready for review September 20, 2023 08:55
Comment on lines 2382 to 2383
def _dtype_cannot_hold_nan(dtype):
    return dtype in {np.int8, np.int16, np.int32, np.int64}
A reviewer (Contributor) commented:
Suggested change (old, then new):
def _dtype_cannot_hold_nan(dtype):
    return dtype in {np.int8, np.int16, np.int32, np.int64}

_no_nan_dtypes = {np.int8, np.int16, np.int32, np.int64}
def _dtype_cannot_hold_nan(dtype):
    return dtype in _no_nan_dtypes

This is much faster, a benchmark:

import numpy as np

def _dtype_cannot_hold_nan(dtype):
    return dtype in {np.int8, np.int16, np.int32, np.int64}

_no_nan_dtypes = {np.int8, np.int16, np.int32, np.int64}

def _dtype_cannot_hold_nan_2(dtype):
    return dtype in _no_nan_dtypes

dtype=np.array([1.]).dtype

%timeit _dtype_cannot_hold_nan(dtype)
%timeit _dtype_cannot_hold_nan_2(dtype)
%timeit dtype in _no_nan_dtypes

results in

507 ns ± 63.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
112 ns ± 1.94 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
57.9 ns ± 5.47 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

The inline version is the fastest, so that would be an option as well.

Carreau (Contributor, author) replied:
One question: do we want to let people add things to _no_nan_dtypes by monkeypatching?

(Two resolved, outdated review threads on numpy/core/numeric.py)
@Carreau (Contributor, author) commented Sep 21, 2023

For both the `is` question and the dtype micro-optimisation, I think it depends on whether you want to optimize for large or small arrays.

I'm tempted to optimize for large arrays (even if my use case is small-ish); this is where we'll see the most gain. Timing-wise, as we are at least in the µs range, a few ns for the dtype check won't make much of a difference. For large arrays, the `is` check can also have a large impact on memory allocation.

@Carreau (Contributor, author) commented Sep 21, 2023

_no_nan_dtypes = {np.int8, np.int16, np.int32, np.int64}

This creates a circular import.

(Resolved, outdated review thread on numpy/core/numeric.py)
@Carreau (Contributor, author) commented Oct 2, 2023

Gentle poke to check the status. I'm happy to split the test refactor into its own PR.

There is still the question about the complex test that @rossbar wrote, where I'm not sure what was written was what was intended.

@Carreau (Contributor, author) commented Oct 2, 2023

There is still the question about the complex test that @rossbar wrote, where I'm not sure what was written was what was intended.

Updated comments following discussion.

@Carreau (Contributor, author) commented Oct 17, 2023

Going through my list of open PRs, just checking if there is more I can do to help.

I can pull the test refactor into a separate PR to make this more digestible/less contentious.

@Carreau Carreau force-pushed the faster_np_array_equal branch 2 times, most recently from ebef6dd to 7616f57 on October 26, 2023 10:01
@Carreau (Contributor, author) commented Oct 26, 2023

Gentle nudge again, as this was relatively painful to rebase on master after the recent changes.

I can again pull the test refactor into its own PR if that makes it easier to get accepted.

@ngoldbaum (Member) commented:

It looks like some of the new tests are failing.

I believe a separate PR that just implements the refactoring and expands the tests would have an easier time getting merged.

I don't have a strong opinion here on the performance changes. I personally don't think the overhead of checking object identity first is big enough to worry about slowing down non-identity cases, but I suspect that's why people are a little leery. A new set of benchmark runs with the latest version of the PR might help as well?

If you'd like, stop by the next NumPy triage zoom meeting on Wednesday at 1:00 PM US Eastern time and you'll have more people's attention.

Carreau added a commit to Carreau/numpy that referenced this pull request Oct 26, 2023
This is a pure-test refactor (as well as adding a few cases), extracted
from numpy#24663.

Testing each case individually has a small pytest overhead, but has the
advantage that each test can fail independently when the tested function
is modified.

It also makes it much easier to add a new test, and I believe it is much
more readable.
@Carreau
Copy link
Contributor Author

Carreau commented Oct 26, 2023

It looks like some of the new tests are failing.

Yeah, I've botched the merge/rebase. Redoing now.

If you'd like, stop by the next NumPy triage zoom meeting on Wednesday at 1:00 PM US Eastern time and you'll have more people's attention.

I'll try, but those times are usually tough for me here, and today I can't for sure.

I've pulled out the test refactor into its own PR: #25010.

(The same test-refactor commit, referencing this pull request, was pushed to Carreau/numpy several more times by rgommers and Carreau on Oct 27, 2023.)
@seberg (Member) left a review:
I don't love these special cases, but I think they are fine.

Just nitpicks; I don't care what you do with them, although I think I would prefer to use the nt.<type> approach at least, because the global is just awkward.

Comment on lines 2460 to 2461
cannot_have_nan = _dtype_cannot_hold_nan(a1.dtype)\
    and _dtype_cannot_hold_nan(a2.dtype)
seberg (Member) commented:
Suggested change
cannot_have_nan = _dtype_cannot_hold_nan(a1.dtype)\
    and _dtype_cannot_hold_nan(a2.dtype)
cannot_have_nan = (_dtype_cannot_hold_nan(a1.dtype)
                   and _dtype_cannot_hold_nan(a2.dtype))

Just avoid the \ because I hate it :).

Carreau (Contributor, author) replied:

👍

def _dtype_cannot_hold_nan(dtype):
    global _no_nan_types
    if _no_nan_types is None:
        _no_nan_types = {np.bool_, np.int8, np.int16, np.int32, np.int64}
seberg (Member) commented:

Could just use `from . import numerictypes as nt`. The problem is that things are wonky while we are setting things up.
I might slightly prefer (but it's verbose) to write:
_no_nan_dtypes = {np.dtypes.BoolDType, np.dtypes.Int8DType, ...}
and then compare type(dtype) rather than dtype.type. We could put the _no_nan_dtypes into np.dtypes also.

I do not want to consider this public, though, because the right API should be different (maybe a np.dtypes.NoNaNDType.register(MyDType)).
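
A sketch of that variant (my illustration; requires NumPy >= 1.25 for np.dtypes, and the set is deliberately incomplete, standing in for the elided entries):

```python
import numpy as np

# Compare the dtype *class* (type(dtype)) against DType classes.
_no_nan_dtypes = {np.dtypes.BoolDType, np.dtypes.Int8DType,
                  np.dtypes.Int16DType, np.dtypes.Int32DType,
                  np.dtypes.Int64DType}

def _dtype_cannot_hold_nan(dtype):
    return type(dtype) in _no_nan_dtypes

print(_dtype_cannot_hold_nan(np.dtype("int64")))    # True
print(_dtype_cannot_hold_nan(np.dtype("float64")))  # False
```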

Carreau (Contributor, author) replied:

Yeah, I agree that the API should not be public, at least for now. And numerictypes seems great!

@@ -2505,7 +2525,7 @@ def array_equiv(a1, a2):
     except Exception:
         return False

-    return bool(asarray(a1 == a2).all())
+    return bool((a1 == a2).all())
seberg (Member) commented:

Nice! This does seem OK to remove, because we guarantee that a == b returns a NumPy boolean array/scalar, so we don't run into problems even for object dtype!
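
A quick demonstration of that guarantee (my example): even for object dtype, the elementwise comparison comes back as a NumPy bool array, so calling .all() on it is safe without the asarray wrapper:

```python
import numpy as np

a = np.array([1, "x", None], dtype=object)
b = np.array([1, "x", None], dtype=object)

eq = a == b
print(type(eq), eq.dtype)  # <class 'numpy.ndarray'> bool
print(bool(eq.all()))      # True
```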

Carreau (Contributor, author) replied:

I can also send that as its own PR if you wish.

ret = getattr(numeric, attr_name, None)
if ret is None and attr_name != "newaxis":

sentinel = object()
seberg (Member) commented:

Not really this PR, but I don't think I care too much.

Carreau (Contributor, author) replied:

If I don't do this, a test fails because _no_nan_dtypes is None.

@Carreau (Contributor, author) commented Oct 30, 2023

Here are some redone benchmarks with a 1000x1000 random array:

[benchmark results table]

It's only a guideline: the larger the arrays, the more the effect will be felt.

(Source to replicate : https://gist.github.com/Carreau/baecfc0b02773bea6a744935e0dc9d1b)

CI seems to have a problem with np.dtypes, but it works locally, which is weird.

np.dtypes.Int16DType,
np.dtypes.Int32DType,
np.dtypes.Int64DType,
}
@seberg (Member) commented Oct 30, 2023:

Strange that I cannot repro locally, but the attempt to import modules lazily seems to make things barf. Importing dtypes lazily is useless (the C side has to import it already to fill it!), but I guess explicitly importing here will work...

EDIT: ahh, it is when reloading, in which case the C code isn't run again and dtypes is never explicitly imported until here.

seberg (Member) commented:

It is actually a bit annoying, because np.longlong or np.intc could fall through the cracks (right now), but I will ignore that.

@Carreau (Contributor, author) commented Nov 6, 2023

I'm not quite sure why arm64 is failing; can you restart it? Or I can merge main/rebase.

@seberg (Member) left a review:
Sorry, I forgot about this a bit. The np.dtypes namespace population hates reloading, so let's use the ugly way to get the same thing. The `in` check needs to work on types, so either dtype.type or type(dtype) (depending on what is used).

(Four resolved, outdated review threads on numpy/_core/numeric.py)
@seberg seberg merged commit 7ff7ec7 into numpy:main Nov 7, 2023
60 checks passed
@seberg (Member) commented Nov 7, 2023

Thanks @Carreau, it's passing now, and my last set of changes was pretty small.

@Carreau (Contributor, author) commented Nov 7, 2023

🥳 🎊 Thanks!
