Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Integer promotion fixes needed for NumPy 2 for comparison operators #16282

Closed
seberg opened this issue Jul 15, 2024 · 2 comments · Fixed by #16532
Closed

[BUG] Integer promotion fixes needed for NumPy 2 for comparison operators #16282

seberg opened this issue Jul 15, 2024 · 2 comments · Fixed by #16532

Comments

@seberg
Copy link
Contributor

seberg commented Jul 15, 2024

While NumPy 2 and pandas promotions should now work mostly correctly on the main branch (tests pass with cupy==12.3.1 (not actually released) and numpy==2.0.0, the comparison operators are still a problem.

As of now the following new test would just fail:

@pytest.mark.parametrize(
    "dtype,val",
    [("int8", 200), ("int32", 2**32), ("uint8", -128), ("uint64", -1)],
)
def test_series_compare_integer(dtype, val):
    # Tests that these actually work, even though they are out of bound.
    force_cast = np.array(val).astype(dtype)
    sr = Series(
        [np.iinfo(dtype).min, np.iinfo(dtype).max, force_cast], dtype=dtype
    )
    assert (sr != val).all()
    assert not (sr == val).any()

    if val < 0:
        assert (sr > val).all()
    else:
        assert (sr < val).all()

Because when comparing an integer series and a 0-D object, we would assume it is OK to use the series' dtype even when the integer does not fit.

The solution to this should not be very complicated. For integer comparison operations change normalize_binop_value to:

  • Check whether series.dtype.type(other_integer) fits (if yes OK)
  • If it does not fit, we need some fallback path, either:
    • convert to a NumPy array (which will end up upcasting the series), or
    • add an immediate return path since in that case the result is known to be all True or all False. (i.e. avoid an upcast and also guarantee correct behavior for any scalar comparison).

I may look into it now that the other things are merged. But if anyone has a preference for the approach, happy to hear it.

N.B. for series to series comparisons, pandas/numpy now compare uint64 and int64 correctly by value (for scalars the above fixes already do that when they work). One could fix that, but it is a separate issue and seems lower priority.

@jakirkham
Copy link
Member

If this will be fixed in CuPy 13.3.0, is there anything more for us to do here?

@seberg
Copy link
Contributor Author

seberg commented Aug 1, 2024

This needs to be fixed in CuDF itself to avoid users running into annoying limitations (such as uint32_series > -1 failing).

It isn't a difficult thing as such: We need a custom promotion logic for comparisons, so getting a good-enough fix should be quick. But opened an issue, because I thought someone deeper in cudf may have a better thought on the best approach.

rapids-bot bot pushed a commit that referenced this issue Aug 13, 2024
)

When Python integers are compared to a series of integers, the result can always be correctly defined no matter the values of the Python integer.

This was always a very mild issue.  But with NumPy 2 behavior not upcasting the computation result type based on the value anymore, even things like:
```
cudf.Series([1, 2, 3], dtype="int8") < 1000
```
would fail.
(Similar paths could be taken for other integer scalars, but there would be mostly nice for performance.)

N.B. NumPy/pandas also support exact comparisons when mixing e.g. uint64 and int64.  This is another rare exception that cudf currently does not support.

Closes gh-16282

Authors:
  - Sebastian Berg (https://github.com/seberg)

Approvers:
  - Matthew Roeschke (https://github.com/mroeschke)

URL: #16532
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

2 participants