
ENH/BUG: optimisations and fixes for np.unique with nan values #23288

Open · wants to merge 7 commits into main from jiwaszki/np_unique_nan_bug
Conversation

jiwaszki

BUG: fix for np.unique with nan values (See #23286)

Fix np.unique, which returns multiple NaNs despite equal_nan=True when axis is not None.

According to the ongoing discussion in #23286, there should be a way to fix this behaviour with a new function/ufunc. Let's treat this PR as a hot-fix.

Summary of changes:

  • Scan the incoming input array with np.isnan(ar).any() to determine whether it contains any NaN values. If not, redirect the function to the existing "fallback" path in the if-else body (see the sketch after this list).
  • Keep the previous optimization that relies on axis=None and sorting properties. Leave comments for future developers.
  • Allow dtypes of kind V (np.void) to be processed in _unique1d().
  • Add tests covering the problematic multiple-NaN cases.
  • Refactor the benchmark and parametrise it to cover 1-D/2-D cases with the axis set to None/0.
    Usage: python runtests.py --bench-compare main bench_lib.Unique
  • Performance numbers are really close to each other and tests are passing once again. The only significant differences appear in cases with axis set to 0 or higher (which is currently the price of the working NaN detection).
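
A minimal sketch of that NaN scan and fallback (hypothetical helper and names, not the PR's exact code):

import numpy as np

def _unique_with_nan_check(ar, equal_nan=True):
    # Only float/complex dtypes can hold NaN; everything else (and the
    # NaN-free case) goes straight to the existing fallback path.
    if not equal_nan or ar.dtype.kind not in "fc" or not np.isnan(ar).any():
        return np.unique(ar)  # the existing "fallback" branch
    # ... otherwise take the NaN-aware branch introduced in this PR ...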

Main branch:

[ 75.00%] · For numpy commit e141d907 <main> (round 2/2):
[ 87.50%] ··· bench_lib.Unique.time_unique_1D    ok
[ 87.50%] ··· ============ ============= ============ ============ ============ ============= ============ ============= ============ ============= ============
              --                                                                    percent_nans / axis                                                         
              ------------ -------------------------------------------------------------------------------------------------------------------------------------
               array_size     0 / None      0 / 0      0.1 / None    0.1 / 0      2.0 / None    2.0 / 0     50.0 / None    50.0 / 0    90.0 / None    90.0 / 0  
              ============ ============= ============ ============ ============ ============= ============ ============= ============ ============= ============
                   64       3.31±0.08μs   26.0±0.8μs   3.57±0.7μs   26.2±0.6μs   3.25±0.03μs   25.7±0.6μs    5.21±0.1μs   23.7±0.4μs   5.16±0.09μs   19.8±0.2μs 
                  300       4.96±0.05μs    81.4±3μs    5.06±0.1μs    83.6±2μs     7.13±0.2μs    88.8±2μs     7.15±0.1μs    79.3±2μs     6.44±0.2μs   60.3±0.9μs 
                  512        7.32±0.2μs    161±1μs     9.28±0.5μs    151±2μs     8.85±0.08μs    146±5μs     8.11±0.02μs    127±2μs     7.42±0.09μs    92.9±1μs  
              ============ ============= ============ ============ ============ ============= ============ ============= ============ ============= ============

This PR:

[ 50.00%] · For numpy commit 7ef3abf3 <jiwaszki/np_unique_nan_bug> (round 2/2):
[ 62.50%] ··· bench_lib.Unique.time_unique_1D    ok
[ 62.50%] ··· ============ ============= ============ ============ ============ ============ ============ ============= ========== ============= ===========
              --                                                                  percent_nans / axis                                                       
              ------------ ---------------------------------------------------------------------------------------------------------------------------------
               array_size     0 / None      0 / 0      0.1 / None    0.1 / 0     2.0 / None    2.0 / 0     50.0 / None   50.0 / 0   90.0 / None    90.0 / 0 
              ============ ============= ============ ============ ============ ============ ============ ============= ========== ============= ===========
                   64        3.46±0.5μs    27.5±2μs    3.45±0.1μs   27.8±0.8μs   3.42±0.1μs   27.3±0.5μs    5.01±0.1μs   36.4±7μs   5.19±0.03μs    25.0±1μs 
                  300       5.15±0.06μs   83.0±0.9μs    5.57±3μs     93.4±3μs    8.02±0.3μs    104±30μs     7.26±0.8μs   86.6±3μs    6.17±0.2μs   84.2±20μs 
                  512        7.75±0.2μs    154±2μs     8.94±0.6μs    152±8μs     8.98±0.5μs    158±6μs      8.60±0.1μs   134±7μs     7.28±0.5μs    99.2±3μs 
              ============ ============= ============ ============ ============ ============ ============ ============= ========== ============= ===========

TODO:
There is one more "early return" case which I am considering. If the input array is already 1-D, the axis value does not affect the output shape: it can only be set to None or 0, and axis 0 is swapped with itself:

>>> a = np.array([0., np.nan, 2., np.nan, 2., 1., 0., 1., 2., 0.])
>>> np.unique(a, equal_nan=True, axis=None)
array([ 0.,  1.,  2., nan])
>>> a = np.array([0., np.nan, 2., np.nan, 2., 1., 0., 1., 2., 0.])
>>> np.unique(a, equal_nan=True, axis=0)
array([ 0.,  1.,  2., nan])
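
A minimal sketch of that early return (hypothetical dispatch code assuming NumPy's private _unique1d helper, not the PR's exact implementation):

import numpy as np
from numpy.lib.arraysetops import _unique1d  # private helper, assumed here

def _unique_dispatch(ar, axis=None, **kwargs):
    ar = np.asanyarray(ar)
    # For a 1-D input, axis can only be None or 0, and moving axis 0 onto
    # itself followed by flattening is a no-op, so both cases reduce to the
    # plain 1-D code path.
    if axis is None or ar.ndim == 1:
        return _unique1d(ar.ravel(), **kwargs)
    # ... general axis handling ...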

@MatteoRaso
Contributor

Can you add a test to see if this also works for masked arrays? If possible, it'd be nice if this PR could close #23281 as well.

@jiwaszki
Author

> Can you add a test to see if this also works for masked arrays? If possible, it'd be nice if this PR could close #23281 as well.

Sure thing! But first, regarding the problems with the axis argument: currently, the function seems to be broken when axis is combined with anything else. Example:

>>> import numpy as np
>>> import numpy.ma as ma
>>> a = ma.array([1,2,3,4,2,3,1,4], mask=[0,0,1,1,0,0,0,1])
>>> 
>>> out = np.unique(a, return_index=True)
>>> print(type(out), out)
<class 'tuple'> (masked_array(data=[1, 2, 3, --],
             mask=[False, False, False,  True],
       fill_value=999999), array([0, 1, 5, 2]))
>>> 
>>> out = np.unique(a, axis=0, return_index=True)
>>> print(type(out), out)
<class 'tuple'> (array([1, 2, 3, 4]), array([0, 1, 2, 3]))

It will take me some extra time to figure out how to fix this while keeping performance "as-is".

Thanks for the heads up!

@seberg
Member

seberg commented Mar 6, 2023

I have my doubts that this works. The NaN can be in only one place (i.e. the first position), in which case it isn't even sorted to the end (as in np.unique([[np.nan, 0], [1, 2]], axis=0); the value of axis doesn't really matter).

It is indeed interesting whether this helps the masked-array code paths, but the masked-array fix likely lies not here but in np.ma itself (it is fine if np.unique does the wrong thing when given a masked array).

@jiwaszki
Author

@seberg you are right! I am back with another proposal. Some highlights:

  • A highly efficient solution for 2-D arrays with non-structured dtypes, plus an additional finding for the 1-D case.
  • All test cases fixed and new ones added.

I was scratching my head a lot; this "NaN issue" seemed unsolvable in any (efficient) way. Then I found an interesting solution by @rjeb in #20896, which was also reviewed by you. It was a game-changer for me.

Fixing the bug:

Generally, I applied a similar idea. For structured dtypes it descends recursively to check every element correctly. The basic building block, checking for NaNs, is done like this:

a != b and not (np.isnan(a) and np.isnan(b))

@rjeb's solution was not perfect in the "nested" scenario. Now it works in all existing test cases. For structured dtypes it might not be the most performant implementation... but hey! It works, looks clean, and solves the NaN issue. :)
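
A minimal sketch of that building block applied recursively (an assumed shape of the helper, not the PR's exact code; phrased here as "equal", the negation of the expression above):

import numpy as np

def _nan_equal(a, b):
    # Two scalar fields match if they compare equal, or if both are NaN.
    try:
        return bool(a == b) or bool(np.isnan(a) and np.isnan(b))
    except TypeError:
        # np.isnan is undefined for non-numeric kinds; plain equality then.
        return bool(a == b)

def _elements_equal(a, b):
    # Structured (np.void) elements are compared field by field, recursing
    # into nested structured fields.
    if isinstance(a, np.void):
        return all(_elements_equal(a[name], b[name])
                   for name in (a.dtype.names or ()))
    return _nan_equal(a, b)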

Optimising solution:

While working on this case I found two really nice optimisations to apply. I left a lot of comments for future developers. This is a fairly advanced implementation, so it is best to read this description alongside the code.

Based on some properties (such as dtype, array shape, and axis), there are cases which allow the required computations to be simplified. "Consolidating" the array always turns it into a 2-D structured-like ndarray, which is not good for performance. Thus, depending on the function input:

  • The array is 1-D and axis is 0. "Consolidation" was turning it into a 2-D array with a single entry per row, then flattening it and applying all the computations... why bother? If the array is 1-D and axis=0, the result is the same as with axis=None: axis 0 is moved onto itself and the array is flattened eventually. Skipping this saves a lot of time and essentially matches the axis=None case.
  • The array is 2-D and not structured (a "simple" dtype). If this is fulfilled, proceed with an optimised version of _unique1d called _unique2d. The new function has the same preprocessing step that switches the axes and sorts correctly. The whole trick is to reconstruct the array so that optimised NumPy calls can operate on slices. Now the cool part: with this optimisation the bottleneck of the unique call moves to flattening/sorting the array, and the previous check/loop can be rewritten simply as (worked example below):

_nans = np.isnan(aux)
mask[1:] = np.any((aux[1:] != aux[:-1]) & ~(_nans[1:] & _nans[:-1]), axis=1)
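
For illustration, a self-contained run of that mask computation on a small, already-sorted 2-D array (the set-up lines are assumed; in the PR, aux is the sorted array and mask selects the unique rows):

import numpy as np

aux = np.array([[0., 1.],
                [0., 1.],
                [0., np.nan],
                [0., np.nan]])        # rows sorted, duplicates adjacent

mask = np.empty(aux.shape[0], dtype=bool)
mask[0] = True
_nans = np.isnan(aux)
# A row is "new" if any column differs from the previous row, ignoring
# columns where both entries are NaN.
mask[1:] = np.any((aux[1:] != aux[:-1]) & ~(_nans[1:] & _nans[:-1]), axis=1)
print(aux[mask])                      # [[ 0.  1.]
                                      #  [ 0. nan]]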

Let the numbers speak for themselves:

       before           after         ratio
     [e141d907]       [7e7cd9f5]
     <main>           <jiwaszki/np_unique_nan_bug>
# These three are not significant at all, right? 
+     2.89±0.01μs      3.11±0.03μs     1.07  bench_lib.Unique.time_unique_1D(64, 0.1, None)
+     2.88±0.02μs      3.08±0.01μs     1.07  bench_lib.Unique.time_unique_1D(64, 0, None)
+     2.93±0.01μs      3.08±0.02μs     1.05  bench_lib.Unique.time_unique_1D(64, 2.0, None)
# 2-D optimisation effects:
-         157±3μs          143±4μs     0.91  bench_lib.Unique.time_unique_2D(64, 0, None)
-         111±6ms         70.2±9ms     0.63  bench_lib.Unique.time_unique_2D(2048, 90.0, 0)
-         108±7ms         64.6±8ms     0.60  bench_lib.Unique.time_unique_2D(2048, 2.0, 0)
-     1.35±0.01ms          811±2μs     0.60  bench_lib.Unique.time_unique_2D(300, 90.0, 0)
-      21.2±0.2ms       12.5±0.3ms     0.59  bench_lib.Unique.time_unique_2D(1024, 90.0, 0)
-     4.98±0.04ms      2.90±0.02ms     0.58  bench_lib.Unique.time_unique_2D(512, 90.0, 0)
-         121±6ms         69.3±5ms     0.57  bench_lib.Unique.time_unique_2D(2048, 0.1, 0)
-         183±1μs          103±1μs     0.56  bench_lib.Unique.time_unique_2D(64, 90.0, 0)
-      19.8±0.1ms      11.2±0.09ms     0.56  bench_lib.Unique.time_unique_2D(1024, 2.0, 0)
-      20.0±0.3ms      11.2±0.09ms     0.56  bench_lib.Unique.time_unique_2D(1024, 0, 0)
-      20.2±0.2ms      11.2±0.03ms     0.56  bench_lib.Unique.time_unique_2D(1024, 50.0, 0)
-     19.9±0.07ms       11.0±0.2ms     0.55  bench_lib.Unique.time_unique_2D(1024, 0.1, 0)
-     4.35±0.03ms       2.34±0.2ms     0.54  bench_lib.Unique.time_unique_2D(512, 2.0, 0)
-     4.46±0.05ms      2.38±0.02ms     0.53  bench_lib.Unique.time_unique_2D(512, 50.0, 0)
-     4.31±0.01ms      2.30±0.05ms     0.53  bench_lib.Unique.time_unique_2D(512, 0.1, 0)
-     4.35±0.04ms      2.29±0.01ms     0.53  bench_lib.Unique.time_unique_2D(512, 0, 0)
-         125±9ms         64.1±5ms     0.51  bench_lib.Unique.time_unique_2D(2048, 50.0, 0)
-     1.10±0.01ms        562±0.9μs     0.51  bench_lib.Unique.time_unique_2D(300, 50.0, 0)
-        1.07±0ms          537±5μs     0.50  bench_lib.Unique.time_unique_2D(300, 2.0, 0)
-         119±2ms         59.1±3ms     0.49  bench_lib.Unique.time_unique_2D(2048, 0, 0)
-        1.07±0ms          528±6μs     0.49  bench_lib.Unique.time_unique_2D(300, 0.1, 0)
-        1.07±0ms          525±1μs     0.49  bench_lib.Unique.time_unique_2D(300, 0, 0)
-         151±2μs         71.3±1μs     0.47  bench_lib.Unique.time_unique_2D(64, 50.0, 0)
-         145±2μs       65.9±0.4μs     0.46  bench_lib.Unique.time_unique_2D(64, 2.0, 0)
-       144±0.5μs       65.3±0.4μs     0.45  bench_lib.Unique.time_unique_2D(64, 0.1, 0)
-       145±0.4μs       65.6±0.6μs     0.45  bench_lib.Unique.time_unique_2D(64, 0, 0)
# 1-D optimisation effects:
-      17.5±0.5μs      4.44±0.02μs     0.25  bench_lib.Unique.time_unique_1D(64, 90.0, 0)
-      20.6±0.3μs      4.52±0.02μs     0.22  bench_lib.Unique.time_unique_1D(64, 50.0, 0)
-      22.5±0.4μs      3.11±0.03μs     0.14  bench_lib.Unique.time_unique_1D(64, 2.0, 0)
-      22.4±0.5μs      3.09±0.01μs     0.14  bench_lib.Unique.time_unique_1D(64, 0, 0)
-      22.5±0.3μs      3.10±0.02μs     0.14  bench_lib.Unique.time_unique_1D(64, 0.1, 0)
-        52.9±2μs      5.68±0.01μs     0.11  bench_lib.Unique.time_unique_1D(300, 90.0, 0)
-        69.9±2μs      6.22±0.03μs     0.09  bench_lib.Unique.time_unique_1D(300, 50.0, 0)
-      77.4±0.2μs      6.46±0.02μs     0.08  bench_lib.Unique.time_unique_1D(300, 2.0, 0)
-      81.4±0.9μs      6.70±0.03μs     0.08  bench_lib.Unique.time_unique_1D(512, 90.0, 0)
-       112±0.3μs      7.46±0.05μs     0.07  bench_lib.Unique.time_unique_1D(512, 50.0, 0)
-        72.5±3μs      4.68±0.02μs     0.06  bench_lib.Unique.time_unique_1D(300, 0, 0)
-      73.0±0.5μs      4.69±0.02μs     0.06  bench_lib.Unique.time_unique_1D(300, 0.1, 0)
-       131±0.8μs      8.17±0.03μs     0.06  bench_lib.Unique.time_unique_1D(512, 0.1, 0)
-       131±0.7μs       8.15±0.2μs     0.06  bench_lib.Unique.time_unique_1D(512, 2.0, 0)
-         593±3μs       33.5±0.8μs     0.06  bench_lib.Unique.time_unique_1D(2048, 2.0, 0)
-       171±0.4μs      9.62±0.04μs     0.06  bench_lib.Unique.time_unique_1D(1024, 90.0, 0)
-         139±4μs      6.71±0.01μs     0.05  bench_lib.Unique.time_unique_1D(512, 0, 0)
-         234±1μs      11.0±0.08μs     0.05  bench_lib.Unique.time_unique_1D(1024, 50.0, 0)
-         273±5μs      12.1±0.01μs     0.04  bench_lib.Unique.time_unique_1D(1024, 0.1, 0)
-         273±5μs       12.0±0.1μs     0.04  bench_lib.Unique.time_unique_1D(1024, 2.0, 0)
-         377±6μs      16.2±0.06μs     0.04  bench_lib.Unique.time_unique_1D(2048, 90.0, 0)
-         500±2μs         21.3±3μs     0.04  bench_lib.Unique.time_unique_1D(2048, 50.0, 0)
-         579±3μs       23.4±0.7μs     0.04  bench_lib.Unique.time_unique_1D(2048, 0.1, 0)
-         275±3μs       10.7±0.2μs     0.04  bench_lib.Unique.time_unique_1D(1024, 0, 0)
-         602±3μs       22.9±0.8μs     0.04  bench_lib.Unique.time_unique_1D(2048, 0, 0)

Thanks in advance for another review!

@jiwaszki jiwaszki changed the title BUG: fix for np.unique with nan values ENH/BUG: optimisations and fixes for np.unique with nan values Mar 15, 2023
@jiwaszki
Author

@seberg ping

@jiwaszki
Author

@seberg could you review it again? Thanks!

@jiwaszki jiwaszki force-pushed the jiwaszki/np_unique_nan_bug branch from d9c2a7b to 67f12f3 on May 15, 2023 19:26
@seberg
Member

seberg left a comment

This is quite complicated. To make it easier for me to get started: do I get it right that the approach is to use the structured-dtype trick for sorting only?
If we undo that view immediately after sorting, do we then apply the same NaN-sanitizing logic, just with the right .any()/.all() calls sprinkled in to account for the extra axis?

In that case, I suspect you can simplify the branching a bit: just view the old dtype (if it was already structured, no harm done).

(I almost wonder whether np.lexsort might work just as well for that sorting trick.)
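
For reference, a small sketch of the np.lexsort idea (illustrative only, not part of this PR):

import numpy as np

a = np.array([[1., 2.],
              [0., 3.],
              [1., 0.]])
# lexsort treats the *last* key as primary, so pass the columns in reverse
# order to sort rows by column 0, then column 1, ...
order = np.lexsort(a.T[::-1])
sorted_rows = a[order]                # [[0., 3.], [1., 0.], [1., 2.]]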

# Then this optimization might be used to speed up unique computation along
# axis equal to either 0 or 1.
if (orig_dim == 2 and len(orig_dtype) < 2 and
        all(i > 0 for i in orig_shape)):
@seberg
Member

I don't understand why orig_ndim matters. Isn't the array always flattened along all but one dim?

Status: Awaiting a code review
3 participants