
ENH: optimisation of array_equal #24663

Merged: 7 commits into numpy:main on Nov 7, 2023

Conversation

@Carreau (Contributor) commented Sep 7, 2023

For various reasons I was looking at array_equal, and I believe there are
a few cases where we can get some speedup / fewer memory allocations.

  • if equal_nan == True, we could add a shortcut: we can sometimes
    check that both arrays are the same object and avoid the computation
    entirely (see the sketch after this list)
  • if the dtypes cannot contain nan, then we do not need the expensive
    fancy indexing and removal of nans
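
A minimal sketch of what these two shortcuts could look like (my illustration with hypothetical helper names, not the code as merged):

```python
import numpy as np

# Illustrative, deliberately incomplete list of dtypes that can never hold NaN.
_no_nan_dtypes = {np.dtype(np.bool_), np.dtype(np.int8), np.dtype(np.int16),
                  np.dtype(np.int32), np.dtype(np.int64)}

def _array_equal_sketch(a1, a2, equal_nan=False):
    a1, a2 = np.asarray(a1), np.asarray(a2)
    if a1.shape != a2.shape:
        return False
    if not equal_nan:
        return bool((a1 == a2).all())
    # Shortcut 1: an array compared with itself is equal by definition
    # once NaNs are treated as equal.
    if a1 is a2:
        return True
    # Shortcut 2: if neither dtype can hold NaN, skip the NaN handling.
    if a1.dtype in _no_nan_dtypes and a2.dtype in _no_nan_dtypes:
        return bool((a1 == a2).all())
    # General path: NaN masks must match, then compare the non-NaN values.
    a1nan, a2nan = np.isnan(a1), np.isnan(a2)
    if not (a1nan == a2nan).all():
        return False
    return bool((a1[~a1nan] == a2[~a1nan]).all())
```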

We can test with the following script:

import numpy as np

base = np.random.randint(1, 100, (10000, 10000), dtype='int64')
a = base.copy()
a1 = base.copy()

b = base.copy().astype('float')
b[b == 50] = np.nan

b1 = b.copy()

data = {'a':a, 'b':b, 'a1':a1, 'b1': b1}

for (x, y) in [('a','a'),('a','a1'),('b','b'),('b','b1')]:
    ax = data[x]
    bx = data[y]
    for en in [True, False]:
        res = np.array_equal(ax, bx, equal_nan=en)
        print(f"np.array_equal({x}, {y}, equal_nan={en}) == {res}")
        %timeit -r10 -o np.array_equal(ax, bx, equal_nan=en)

Which gives us the following results.

--- before.txt  2023-09-07 15:19:01
+++ after.txt   2023-09-07 15:24:30
@@ -1,16 +1,16 @@
np.array_equal(a, a, equal_nan=True) == True
-419 ms ± 42.4 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)
+1.41 µs ± 5.37 ns per loop (mean ± std. dev. of 10 runs, 1,000,000 loops each)

np.array_equal(a, a, equal_nan=False) == True
-31.1 ms ± 200 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)
+31 ms ± 348 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)

np.array_equal(a, a1, equal_nan=True) == True
-1.07 s ± 326 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)
+40.3 ms ± 133 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)

np.array_equal(a, a1, equal_nan=False) == True
-41.9 ms ± 130 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)
+40.3 ms ± 210 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)

np.array_equal(b, b, equal_nan=True) == True
-569 ms ± 61.7 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)
+909 ns ± 1.61 ns per loop (mean ± std. dev. of 10 runs, 1,000,000 loops each)

np.array_equal(b, b, equal_nan=False) == False
-31.2 ms ± 208 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)
+30.2 ms ± 228 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)

np.array_equal(b, b1, equal_nan=True) == True
-842 ms ± 171 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)
+461 ms ± 3.07 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)

np.array_equal(b, b1, equal_nan=False) == False
-40.4 ms ± 383 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)
+39.2 ms ± 234 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)

@seberg (Member) commented Sep 7, 2023

Just my first thoughts: I am not sure it needs to be our job to check for identity, even when we know it's correct. I am happy to skip NaN checks when we know for sure that there are no NaNs.
One thing to beware of: timedeltas have NaNs, and I am not quite sure you are catching them (which I agree is wrong of NumPy if that is the case, but we need to take care here either way).
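
For context (my addition, not from the thread): timedelta64 has a NaN-like value, NaT, and on the NumPy versions I tried its scalar type sits under np.signedinteger in the type hierarchy, so an issubdtype-based "cannot hold NaN" test would wrongly whitelist it:

```python
import numpy as np

t = np.array(["NaT"], dtype="timedelta64[s]")
print(np.isnat(t))   # [ True]
print(t == t)        # [False]  NaT, like NaN, is not equal to itself

# timedelta64 is (surprisingly) a subclass of signedinteger, so a naive
# "integers cannot hold NaN" check based on issubdtype misses NaT:
print(np.issubdtype(np.timedelta64, np.integer))  # True
```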

@Carreau (Contributor, author) commented Sep 7, 2023

I am not sure it needs to be our job to check for identity, even when we know it's correct

Yeah, I'm unsure. I think it's pretty rare, but at the same time, it can be quite a bit more performant.

One thing to beware of: timedeltas have NaNs, and I am not quite sure you are catching them (which I agree is wrong of NumPy if that is the case, but we need to take care here either way).

Do you mean we should have an explicit list of non-nannable types instead of issubdtype?

Regardless, neither is a hill I'm going to die on; I'm happy to split the logic into two PRs if you wish.

@seberg (Member) commented Sep 7, 2023

I am just not sure that your current issubdtype check (which I don't like, but there probably isn't a much cleaner alternative) doesn't also match timedeltas: this needs a test.

@seberg (Member) commented Sep 7, 2023

Actually, an alternative that we may or may not like: I always think it might be useful to add a ufunc for islessgreater, or maybe an inverted version. Maybe with a better name, not_different?!

I.e. so that we don't have to do the NaN dance at all, but have a ufunc that matches NaNs (I think this would also help fix a bunch of np.unique annoyances/bugs).

EDIT: Arg, I keep mixing this up. Because what I want is something that effectively considers NaNs equal. But islessgreater isn't that because islessgreater(1., NaN) is also False.
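
For concreteness (my sketch, not an existing ufunc): the NaN-matching comparison described here can be emulated elementwise today as:

```python
import numpy as np

def not_different(a, b):
    # Elementwise equality that also treats NaN == NaN as True,
    # unlike islessgreater, where islessgreater(1., NaN) is False too.
    a, b = np.asarray(a), np.asarray(b)
    return (a == b) | (np.isnan(a) & np.isnan(b))

print(not_different([1., np.nan], [1., np.nan]))  # [ True  True]
print(not_different([1., np.nan], [np.nan, 2.]))  # [False False]
```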

@Carreau Carreau force-pushed the faster_np_array_equal branch 3 times, most recently from c3b453a to 359e22c on September 8, 2023 13:12
@Carreau (Contributor, author) commented Sep 8, 2023

The issubdtype approach does indeed fail for timedelta64, so I decided to fall back to a `dtype in {...}` check with an explicit list of dtypes that cannot have NaNs, in a utility function that makes it easy to update this list.

I've refactored the tests to use pytest parametrize, added a number of cases of interest, and rebased to put the tests commit first (the modified tests pass before the PR).
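
As an illustration of the parametrized style (my sketch, not the actual test file):

```python
import numpy as np
import pytest

@pytest.mark.parametrize("a1, a2, equal_nan, expected", [
    ([1, 2], [1, 2], False, True),
    ([1, np.nan], [1, np.nan], False, False),
    ([1, np.nan], [1, np.nan], True, True),
    (np.array(np.nan), np.array(np.nan), True, True),  # 0-d arrays
    ([1, 2], [1, 3], True, False),
])
def test_array_equal(a1, a2, equal_nan, expected):
    # array_equal returns a builtin bool, so identity comparison is safe.
    assert np.array_equal(a1, a2, equal_nan=equal_nan) is expected
```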

(Resolved, outdated review thread on numpy/core/numeric.py)
@Carreau Carreau force-pushed the faster_np_array_equal branch 2 times, most recently from 1b6ffb4 to 48217cc on September 10, 2023 08:23
@Carreau (Contributor, author) commented Sep 10, 2023

Added a test for 0-d arrays in the first commit.

I can remove the asarray(? == ?) in the function, but I'm not entirely confident about it.

@Carreau (Contributor, author) commented Sep 11, 2023

I.e. so that we don't have to do the NaN dance at all, but have a ufunc that matches NaNs (I think this would also help fix a bunch of np.unique annoyances/bugs)

Maybe a stupid question (it's not my area of expertise), but assuming the arrays have the same dtype, wouldn't a.tobytes() == b.tobytes() work for this? I guess tobytes() makes copies, but maybe there is a way to do it without copies?

@Carreau (Contributor, author) commented Sep 11, 2023

Maybe a stupid question (it's not my area of expertise), but assuming the arrays have the same dtype, wouldn't a.tobytes() == b.tobytes() work for this? I guess tobytes() makes copies, but maybe there is a way to do it without copies?

Just tested: that does not work for 1+nanj == nan+1j, but I guess we at least have a.tobytes() == b.tobytes() => a == b. Maybe that's an extra optimisation.
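
To make that counterexample concrete (my demo): with equal_nan=True, a complex value counts as NaN if either component is NaN, so these two arrays compare equal even though their bytes differ:

```python
import numpy as np

a = np.array([complex(1, np.nan)])
b = np.array([complex(np.nan, 1)])

print(np.array_equal(a, b, equal_nan=True))  # True: both entries count as NaN
print(a.tobytes() == b.tobytes())            # False: different byte patterns
```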

@ngoldbaum (Member) commented:

wouldn't a.tobytes() == b.tobytes() work for this

I think it might work for all but object arrays, since you can't know the truthiness of a Python value without testing it.

@Carreau (Contributor, author) commented Sep 20, 2023

Moving to ready for review. I'm happy to split this into multiple chunks, for example only the tests first.

@Carreau Carreau marked this pull request as ready for review September 20, 2023 08:55
Comment on lines 2382 to 2383
def _dtype_cannot_hold_nan(dtype):
    return dtype in {np.int8, np.int16, np.int32, np.int64}
A reviewer (Contributor) commented:
Suggested change (old, then new):
def _dtype_cannot_hold_nan(dtype):
    return dtype in {np.int8, np.int16, np.int32, np.int64}

_no_nan_dtypes = {np.int8, np.int16, np.int32, np.int64}
def _dtype_cannot_hold_nan(dtype):
    return dtype in _no_nan_dtypes

This is much faster, a benchmark:

import numpy as np

def _dtype_cannot_hold_nan(dtype):
    return dtype in {np.int8, np.int16, np.int32, np.int64}

_no_nan_dtypes = {np.int8, np.int16, np.int32, np.int64}

def _dtype_cannot_hold_nan_2(dtype):
    return dtype in _no_nan_dtypes

dtype=np.array([1.]).dtype

%timeit _dtype_cannot_hold_nan(dtype)
%timeit _dtype_cannot_hold_nan_2(dtype)
%timeit dtype in _no_nan_dtypes

results in

507 ns ± 63.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
112 ns ± 1.94 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
57.9 ns ± 5.47 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

The inline version is the fastest, so that would be an option as well.

Carreau (Contributor, author) replied:
One question: do we want to let people add things to _no_nan_dtypes by monkeypatching?

(Two resolved, outdated review threads on numpy/core/numeric.py)
@Carreau (Contributor, author) commented Sep 21, 2023

For both the `is` question and the dtype micro-optimisation, I think it depends on whether you want to optimize for large or small arrays.

I'm tempted to optimize for large arrays (even if my use case is small-ish); this is where we'll see the most gain. Timing-wise, as we are at least in the µs range, a few ns for the dtype check won't make much of a difference. For large arrays, the `is` check can also have a large impact on memory allocation.

@Carreau (Contributor, author) commented Sep 21, 2023

_no_nan_dtypes = {np.int8, np.int16, np.int32, np.int64}

This creates a circular import.

(Resolved, outdated review thread on numpy/core/numeric.py)
@Carreau (Contributor, author) commented Oct 2, 2023

Gentle poke to check the status. I'm happy to split the test refactor into its own PR.

There is still the question about the complex test that @rossbar wrote, where I'm not sure what was written was what was intended.

@Carreau (Contributor, author) commented Oct 2, 2023

There is still the question about the complex test that @rossbar wrote, where I'm not sure what was written was what was intended.

Updated comments following discussion.

@Carreau (Contributor, author) commented Oct 17, 2023

Going through my list of open PRs, just checking if there is more I can do to help.

I can pull the test refactor into a separate PR to make this more digestible/less contentious.

@Carreau Carreau force-pushed the faster_np_array_equal branch 2 times, most recently from ebef6dd to 7616f57 on October 26, 2023 10:01
@Carreau (Contributor, author) commented Oct 26, 2023

Gentle nudge again, as this was relatively painful to rebase on master after the recent changes.

I can again pull the test refactor into its own PR if that makes it easier to get accepted.

@ngoldbaum (Member) commented:

It looks like some of the new tests are failing.

I believe a separate PR that just implements the refactoring and expands the tests would have an easier time getting merged.

I don't have a strong opinion here on the performance changes. I personally don't think the overhead of checking object identity first is big enough to worry about slowing down non-identity cases, but I suspect that's why people are a little leery. A new set of benchmark runs with the latest version of the PR might help as well?

If you'd like, stop by the next NumPy triage zoom meeting on Wednesday at 1:00 PM US Eastern time and you'll have more people's attention.

Carreau added a commit to Carreau/numpy that referenced this pull request Oct 26, 2023
This is a pure-test refactor (as well as adding a few cases), extracted
from numpy#24663.

Testing each case individually has a small pytest overhead, but has the
advantage that each test can fail independently when the tested function
is modified.

It also makes it much easier to add a new test, and I believe it is much
more readable.
@Carreau
Copy link
Contributor Author

Carreau commented Oct 26, 2023

It looks like some of the new tests are failing.

Yeah, I've botched the merge/rebase. Redoing now.

If you'd like, stop by the next NumPy triage zoom meeting on Wednesday at 1:00 PM US Eastern time and you'll have more people's attention.

I'll try, but those times are usually tough for me here, and today I can't for sure.

I've pulled out the test refactor into its own PR: #25010.

(The same test-refactor commit, referencing this pull request, was pushed to Carreau/numpy several more times by rgommers and Carreau on Oct 27, 2023.)
@seberg (Member) left a review:
I don't love these special cases, but I think they are fine.

Just nitpicks; I don't care what you do with them, although I think I would prefer to use the nt.<type> approach at least, because the global is just awkward.

Comment on lines 2460 to 2461
cannot_have_nan = _dtype_cannot_hold_nan(a1.dtype)\
    and _dtype_cannot_hold_nan(a2.dtype)
seberg (Member) commented:
Suggested change
cannot_have_nan = _dtype_cannot_hold_nan(a1.dtype)\
    and _dtype_cannot_hold_nan(a2.dtype)
cannot_have_nan = (_dtype_cannot_hold_nan(a1.dtype)
                   and _dtype_cannot_hold_nan(a2.dtype))

Just avoid the \ because I hate it :).

Carreau (Contributor, author) replied:

👍

def _dtype_cannot_hold_nan(dtype):
    global _no_nan_types
    if _no_nan_types is None:
        _no_nan_types = {np.bool_, np.int8, np.int16, np.int32, np.int64}
seberg (Member) commented:

Could just use `from . import numerictypes as nt`. The problem is that things are wonky while we are setting things up.
I might slightly prefer (but it's verbose) to write:
_no_nan_dtypes = {np.dtypes.BoolDType, np.dtypes.Int8DType, ...}
and then compare type(dtype) rather than dtype.type. We could put the _no_nan_dtypes into np.dtypes also.

I do not want to consider this public, though, because the right API should be different (maybe a np.dtypes.NoNaNDType.register(MyDType)).
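
A sketch of that variant (my illustration; requires NumPy >= 1.25 for np.dtypes, and the set is deliberately incomplete, standing in for the elided entries):

```python
import numpy as np

# Compare the dtype *class* (type(dtype)) against DType classes.
_no_nan_dtypes = {np.dtypes.BoolDType, np.dtypes.Int8DType,
                  np.dtypes.Int16DType, np.dtypes.Int32DType,
                  np.dtypes.Int64DType}

def _dtype_cannot_hold_nan(dtype):
    return type(dtype) in _no_nan_dtypes

print(_dtype_cannot_hold_nan(np.dtype("int64")))    # True
print(_dtype_cannot_hold_nan(np.dtype("float64")))  # False
```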

Carreau (Contributor, author) replied:

Yeah, I agree that the API should not be public, at least for now. And numerictypes seems great!

@@ -2505,7 +2525,7 @@ def array_equiv(a1, a2):
     except Exception:
         return False

-    return bool(asarray(a1 == a2).all())
+    return bool((a1 == a2).all())
seberg (Member) commented:

Nice! This does seem OK to remove, because we guarantee that a == b returns a NumPy boolean array/scalar, so we don't run into problems even for object dtype!
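
A quick demonstration of that guarantee (my example): even for object dtype, the elementwise comparison comes back as a NumPy bool array, so calling .all() on it is safe without the asarray wrapper:

```python
import numpy as np

a = np.array([1, "x", None], dtype=object)
b = np.array([1, "x", None], dtype=object)

eq = a == b
print(type(eq), eq.dtype)  # <class 'numpy.ndarray'> bool
print(bool(eq.all()))      # True
```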

Carreau (Contributor, author) replied:

I can also send that as its own PR if you wish.

ret = getattr(numeric, attr_name, None)
if ret is None and attr_name != "newaxis":

sentinel = object()
seberg (Member) commented:

Not really this PR, but I don't think I care too much.

Carreau (Contributor, author) replied:

If I don't do this, a test fails because _no_nan_dtypes is None.

@Carreau (Contributor, author) commented Oct 30, 2023

Here are some redone benchmarks with a 1000x1000 random array:

[benchmark results table]

It's only a guideline: the larger the arrays, the more the effect will be felt.

(Source to replicate : https://gist.github.com/Carreau/baecfc0b02773bea6a744935e0dc9d1b)

CI seems to have a problem with np.dtypes, but it works locally, which is weird.

np.dtypes.Int16DType,
np.dtypes.Int32DType,
np.dtypes.Int64DType,
}
@seberg (Member) commented Oct 30, 2023:

Strange that I cannot repro locally, but the attempt to import modules lazily seems to make things barf. Importing dtypes lazily is useless (the C side has to import it already to fill it!), but I guess explicitly importing here will work...

EDIT: ahh, it is when reloading, in which case the C code isn't run again and dtypes is never explicitly imported until here.

seberg (Member) commented:

It is actually a bit annoying, because np.longlong or np.intc could fall through the cracks (right now), but I will ignore that.

@Carreau (Contributor, author) commented Nov 6, 2023

I'm not quite sure why arm64 is failing; can you restart it? Or I can merge main/rebase.

@seberg (Member) left a review:
Sorry, I forgot about this a bit. The np.dtypes namespace population hates reloading, so let's use the ugly way to get the same thing. The `in` check needs to work on types, so either dtype.type or type(dtype) (depending on what is used).

(Four resolved, outdated review threads on numpy/_core/numeric.py)
@seberg seberg merged commit 7ff7ec7 into numpy:main Nov 7, 2023
60 checks passed
@seberg (Member) commented Nov 7, 2023

Thanks @Carreau, it's passing now, and my last set of changes was pretty small.

@Carreau (Contributor, author) commented Nov 7, 2023

🥳 🎊 Thanks!
