Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add np.in1d, np.isin, np.setxor1d, np.setdiff1d, extend np.intersect1d. #9338

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

jaredjeya
Copy link

@jaredjeya jaredjeya commented Dec 7, 2023

  • Adds support for the functions np.in1d, np.isin, np.setxor1d, and np.setdiff1d, all including optional argument assume_unique as well as invert for np.in1d and np.isin.

  • Extends np.intersect1d to support the first optional argument (assume_unique).

Resolves #4677. Improves on #4074.

I have written unit tests for all the new functions and functionality, and applied flake8 to my code.

The only question I have is how to handle the optional argument kind in np.in1d: I've implemented the kind="sort" algorithm, but this is not the default behaviour of numpy (which picks an algorithm based on performance heuristics), so I've made it explicit that only kind="sort" is supported.

@jaredjeya
Copy link
Author

I’ve just now seen #4815, possibly some of the code could be ported across from there too.

@jaredjeya jaredjeya mentioned this pull request Dec 8, 2023
4 tasks
@jaredjeya jaredjeya marked this pull request as draft December 8, 2023 12:10
…gorithm for np.in1d with short second inputs.
@jaredjeya jaredjeya marked this pull request as ready for review December 8, 2023 13:39
@jaredjeya
Copy link
Author

jaredjeya commented Dec 8, 2023

I'm not sure why those tests are failing all of a sudden - it seems to be at the flake8 step? But the errors flake8 is highlighting are unrelated to code I've modified and it didn't fail in the earlier test.

Edit: nvm, spotted #9342.

@stuartarchibald
Copy link
Contributor

@jaredjeya feel free to ignore the flake8 failures until #9342 is merged later today (a new version of flake8 was released and it picked up a couple of new issues). Thanks.

@jaredjeya jaredjeya changed the title Add np.in1d, np.setxor1d, np.setdiff1d, extend np.intersect1d. Add np.in1d, np.isin, np.setxor1d, np.setdiff1d, extend np.intersect1d. Dec 8, 2023
@jaredjeya
Copy link
Author

@jaredjeya feel free to ignore the flake8 failures until #9342 is merged later today (a new version of flake8 was released and it picked up a couple of new issues). Thanks.

Now it’s been merged, how do I get the failed tests to run again? Is there anything I need to do to keep the new code up-to-date with the main branch? (Sorry, this is my first ever PR to a project in active development!)

@gmarkall
Copy link
Member

/azp run

Copy link
Contributor

Azure Pipelines successfully started running 1 pipeline(s).

@gmarkall
Copy link
Member

@jaredjeya Many thanks for the PR! I'm tentatively moving it to the 0.60 milestone - at the moment we're working on the 0.59 release, and the 0.60 release will be focused around supporting NumPy 2.0, so it may be some time before it's possible to review this PR.

Copy link
Collaborator

@guilhermeleobas guilhermeleobas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @jaredjeya. Code seems fine as it is almost a 1:1 port of what NumPy does. Just have some minor comments that should be easy to address.

numba/np/arraymath.py Outdated Show resolved Hide resolved
numba/np/arraymath.py Outdated Show resolved Hide resolved
numba/np/arraymath.py Outdated Show resolved Hide resolved
numba/np/arraymath.py Outdated Show resolved Hide resolved
numba/np/arraymath.py Outdated Show resolved Hide resolved
@guilhermeleobas guilhermeleobas added 4 - Waiting on author Waiting for author to respond to review Effort - medium Medium size effort needed and removed 3 - Ready for Review labels Jan 12, 2024
@guilhermeleobas
Copy link
Collaborator

@jaredjeya, could you include the tests from NumPy for the set of functions added in this PR?
https://github.com/numpy/numpy/blob/b0371ef240560e78b651a5d7c9407ae3212a3d56/numpy/lib/tests/test_arraysetops.py#L15

…owed from numpy's own testing.

Also fixed some bugs in the tests (including that lists were never tested).
@jaredjeya
Copy link
Author

So what I've done here is add the numpy test cases to the tests I'd already written, rather than writing new tests entirely. Also fixed that I'd been casting all the inputs to arrays before testing them, so I wasn't testing the functionality with lists.

Otherwise I also added a check to catch at compile time if someone calls these functions with incorrect literal values for kind, but I don't enforce that it's a compile time constant. I also changed some of the error messages for readability and consistency.

@jaredjeya
Copy link
Author

I forgot to add the attribution for test_setops_manyways, which should read:

# https://github.com/numpy/numpy/blob/b0371ef240560e78b651a5d7c9407ae3212a3d56/numpy/lib/tests/test_arraysetops.py#L588 # noqa: E501

Since it's just a comment is there a way to add that in without re-running all the tests?

numba/np/arraymath.py Outdated Show resolved Hide resolved
@@ -4982,20 +4981,23 @@ def np_in1d_impl(ar1, ar2, assume_unique=False, invert=False, kind="sort"):
def jit_np_isin(element, test_elements, assume_unique=False, invert=False,
kind="sort"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value for kind should be None, not "sort".

The following should work on your branch:

import numpy as np
from numba import njit

@njit
def foo():
    return np.isin(3, 4, kind=None)

print(foo())

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I've done it like I did is because I didn't implement the numpy kind="table" method. (The output is equivalent, but I suppose the performance is higher in certain cases with the table method (I haven't tested)). When kind=None, the numpy code will use the table method if certain conditions are met, otherwise the sorting method, so we don't support that either -- strictly, we only support kind="sort". So I thought np.isin(3, 4, kind=None) should fail.

If you think it should work I'm happy to change it, or otherwise we just say we don't support the "kind" argument and leave it at that. Or, I could even implement the "table" method, it isn't that much more complicated than the sort method, and then we fully support the "kind" argument.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I think it is best to match as close as possible the NumPy signature but keep a check internally for kind == "sort".

numba/np/arraymath.py Outdated Show resolved Hide resolved
@jaredjeya
Copy link
Author

jaredjeya commented Jan 22, 2024

I decided to add some tests to make sure the correct errors result when nonsense inputs are supplied, including e.g. literals like kind="foo". However I can't for the life of me figure out how to get numba to actually compile these as literals.

This is the full type-checking code within the @overload function:

if kind is not None:
    if (isinstance(kind, types.StringLiteral)
            and kind.literal_value != "sort"):
        raise NumbaValueError('isin: Only kind="sort" is supported')
    elif not isinstance(kind, (types.UnicodeType, str, types.NoneType)):
        raise TypingError('isin: Argument "kind" must be a string or None')

this is the code in the actual implementation,

if kind is not None and kind != "sort":
    raise ValueError('isin: Only kind="sort" is supported')

and this is the testing code:

@njit()
def table_isin(a, b):
    return np.isin(a, b, kind="table")
with self.assertRaises(NumbaValueError):
    table_isin(a, b)

I also tried kind=literal("table") and kind=literally("table"). But the code always raises a ValueError at runtime rather than a NumbaValueError at compilation time, and if I insert debug print statements, I see repr(kind) == unicode_type and type(kind) == <class 'numba.core.types.misc.UnicodeType'>.

What am I doing wrong? Other functions e.g. np.searchsorted use identical patterns in their tests.

@jaredjeya
Copy link
Author

Nevermind, I sorted it

Comment on lines 5004 to 5005
if kind is not None and kind != "sort":
raise ValueError('isin: Only kind="sort" is supported')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If kind=None, NumPy will use table by default, which is not supported in this PR. Change the clause to something like this:

Suggested change
if kind is not None and kind != "sort":
raise ValueError('isin: Only kind="sort" is supported')
if kind != "sort":
raise ValueError('isin: Only kind="sort" is supported')

This will force a user to spell the kind="sort" keyword argument and not assume Numba supports the table option.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If kind=None, NumPy will use table by default, which is not supported in this PR.

Yes, that is why I didn't implement kind=None in the first place. I'll change it back.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function signature needs to be with kind=None as it is what NumPy specifies. But let's keep a check to only run the function if the user specifies kind="sort".

Let me know if you have any questions regarding this.

Comment on lines 4992 to 4998
if not (kind is None or isinstance(kind, types.NoneType)):
kindval = getattr(kind, "literal_value", kind)
if not isinstance(kindval,
(types.UnicodeType, str, types.StringLiteral)):
raise TypingError('in1d: Argument "kind" must be a string or None')
elif kindval != "sort":
raise NumbaValueError('in1d: Only kind="sort" is supported')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine, but we can keep only the ValueError to simplify the codebase.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, I should get rid of this whole section?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I've reduced it to just,

if not isinstance(kind, (types.UnicodeType, str, types.StringLiteral)):
    raise TypingError('in1d: Only kind="sort" is supported')

which makes sure kind=None doesn't make it past compilation. It seemed to be causing some type inference issues if I didn't have that.

numba/np/arraymath.py Outdated Show resolved Hide resolved
numba/np/arraymath.py Outdated Show resolved Hide resolved
@jaredjeya
Copy link
Author

jaredjeya commented Feb 14, 2024

@guilhermeleobas Tests seem to be failing now because numpy < 1.24 doesn't support the kind keyword, and because we now force this to be included explicitly, the numpy and numba behaviours diverge. What do you suggest?

According to the release notes, pre-1.24, the kind="sort" behaviour was the default. Newer versions automatically pick the optimal algorithm.

It might not be too hard to implement kind="table", but it would take me a while as I'm quite busy atm. Otherwise we can drop the keyword entirely and say we're only replicating pre-1.24 behaviour? The documentation suggests using a version gate but I don't know what that means in this context.

@guilhermeleobas
Copy link
Collaborator

According to the release notes, pre-1.24, the kind="sort" behaviour was the default. Newer versions automatically pick the optimal algorithm.

I think we can have an if-else statement to support the kind argument if NumPy version if greater or equal 1.24:

if numpy_version < (1, 24):
    @overload(np.isin)
    def jit_np_isin(element, test_elements, assume_unique=False, invert=False):
         ....

else:
    @overload(np.isin)
    def jit_np_isin(element, test_elements, assume_unique=False, invert=False,
                    kind=None):
        ...

You may have to update the tests as well to only pass kind="sort" if NumPy >= 1.24

Otherwise we can drop the keyword entirely and say we're only replicating pre-1.24 behaviour?

That's also an option. Choose the one that is easiest for you to implement.

@gmarkall gmarkall modified the milestones: 0.60.0-rc1, 0.61.0-rc1 Apr 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4 - Waiting on author Waiting for author to respond to review Effort - medium Medium size effort needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants