[BUG] Pymod operation is missing `null` values #5938

galipremsagar · 2020-08-11T22:54:46Z

Describe the bug
When the denominator of a pymod operation is 0, we seem to return a garbage value instead of a null value.

Steps/Code to reproduce bug

Type "help", "copyright", "credits" or "license" for more information.
>>> 1 % 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero
>>> import pandas as pd
>>> x = pd.Series([1, 2, 3, 4, 0])
>>> x  % 2
0    1
1    0
2    1
3    0
4    0
dtype: int64
>>> 2 % x
0    0.0
1    0.0
2    2.0
3    2.0
4    NaN
dtype: float64

>>> import cudf
>>> x = cudf.from_pandas(x)
>>> x % 2
0    1
1    0
2    1
3    0
4    0
dtype: int64
>>> 2 % x
0             0
1             0
2             2
3             2
4    4294967295   #<- This could either be `nan` or `null`
dtype: int64

Expected behavior
Return either nan or null in place of garbage values.

Environment overview (please complete the following information)

Environment location: Docker
Method of cuDF install: from source (0.15)

Additional context
This looks similar to #5722, but I'm not sure whether they share the same code flow, so I'm just putting it out there.

kkraus14 · 2020-08-12T00:21:57Z

I guess it makes sense for a modulo-zero operation to return null since it's undefined behavior, but this is somewhere we would differ from Pandas.

github-actions · 2021-02-16T21:18:33Z

This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

kkraus14 · 2021-02-24T21:44:17Z

cc @brandon-b-miller for more binaryop null things

vyasr · 2022-08-04T18:47:49Z

Binary operations in libcudf are designed so that nulls always propagate. As a result, although we have created a specialized mod operator PYMOD in libcudf for compatibility with Python's rules for handling negatives, we can't trivially exploit that for null handling since the operator doesn't return a nullable value.

However, in this instance there is some precedent for solving this at the libcudf level. Specifically, all the Null* operators have special handling to bypass null propagation in the wrapper for ops. While I would normally say that we should solve this issue by post-processing the result in Python (similar to how #11441 solves #7389), the existing infrastructure for solving this problem makes me less wary of introducing this logic into C++.

@jrhemstad I'd be curious what you think about this. The tl;dr is that in pandas x % 0 returns nulls, and we're trying to figure out how best to do that. Options are (1) using the same wrapping logic that we currently use in libcudf for the Null* operators, which I find slightly icky (but only slightly since we're already doing this, it's just a little less obvious why you'd do this for an operator that isn't strictly about specialized null handling) or (2) just post-processing the result in Python based on whether the input values contained zeros.

CC @brandon-b-miller

jrhemstad · 2022-08-04T19:08:43Z

> although we have created a specialized mod operator PYMOD in libcudf for compatibility with Python's rules for handling negatives

"PYMOD" is a misnomer. It wasn't intended to be 100% compliant with Python's %. It was intended to be just the mathematical modulo operation, whereas C++'s % is the remainder operation.

See: https://stackoverflow.com/questions/13683563/whats-the-difference-between-mod-and-remainder

I wanted to rename MOD to REMAINDER and PYMOD would be MODULO.
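
To illustrate the distinction, here is a minimal Python sketch. It assumes Python's built-in `%` as a stand-in for the modulo definition (the result takes the sign of the divisor) and `math.fmod` as a stand-in for the C/C++-style remainder (the result takes the sign of the dividend). It only illustrates the two definitions; it is not libcudf's MOD or PYMOD implementation.

```python
import math

# Operand pairs chosen so that every sign combination is covered.
pairs = [(7, 3), (-7, 3), (7, -3), (-7, -3)]

# Each entry is (dividend, divisor, modulo, remainder).
results = [(a, b, a % b, math.fmod(a, b)) for a, b in pairs]

for a, b, modulo, remainder in results:
    # The two values differ only when the operands have opposite signs.
    print(f"{a:>3} and {b:>3}: modulo={modulo:>3}, remainder={remainder:>5}")
```

The two definitions agree whenever both operands share a sign, which is why the difference is easy to overlook.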

> While I would normally say that we should solve this issue by post-processing the result in Python

This is the correct answer. It's no different than #7389 (comment) imo.

jrhemstad · 2022-08-04T19:19:00Z

Found the original conversation: #1985 (review)

> What's worse is calling an operation PyMod. It doesn't tell a user of the C++ interface what it actually means. In general, we should use descriptions that stand on their own and do not resort to knowledge of another language or library.

I still agree with my past self :)

vyasr · 2022-08-04T19:26:12Z

I'm fine with that explanation, although to @harrism's point we'd definitely need to document how we define remainder vs modulo, since at best it's a nonstandard but common distinction. So perhaps we should do that rename while also addressing this issue here :)

I think resolving this issue then requires two things:

1. Rename PYMOD to REMAINDER (along with suitable documentation of the meaning)
2. Add the same kind of logic as #11441 ("Return zero when floor dividing an integer data by zero"), except for modulo.

@brandon-b-miller would you be up for handling this? I'm hoping it would be a near carbon copy of #11441 except with a different fill value. I can take on 1 myself if you don't want to.
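
The post-processing idea discussed above can be sketched in plain Python. This is a hypothetical illustration, not the cuDF implementation: the function name `rmod_postprocessed`, the use of lists in place of columns, and the default fill value of NaN are all assumptions made for the example.

```python
import math


def rmod_postprocessed(scalar, divisors, fill=math.nan):
    """Compute `scalar % d` for each d, then overwrite zero-divisor slots.

    Hypothetical helper: plain lists stand in for columns, and `fill`
    stands in for whatever null/NaN value the library would choose.
    """
    # Step 1: record where the divisor is zero.
    zero_mask = [d == 0 for d in divisors]
    # Step 2: compute the raw result, using 1 as a placeholder divisor
    # in the zero slots so the raw pass never divides by zero.
    raw = [scalar % (d if d != 0 else 1) for d in divisors]
    # Step 3: replace the placeholder results with the fill value.
    return [fill if is_zero else value for value, is_zero in zip(raw, zero_mask)]


result = rmod_postprocessed(2, [1, 2, 3, 4, 0])
print(result)
```

Passing `fill=None` instead would model returning a null rather than NaN, which is the other option raised in the issue description.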

brandon-b-miller · 2022-08-04T19:27:53Z

happy to handle making the change 👍

The type normalisation applied before heading into libcudf previously had slightly unexpected consequences for large int64 values. If not providing a `cudf.Scalar`, a bare `int64` scalar would be cast to `uint64` and then normal numpy type promotion would unify to `float64`. This is lossy, since int64 to float64 is neither surjective nor injective. To avoid this, try very hard to maintain the dtype of the object coming in, and match pandas behaviour by applying numpy type promotion rules via `numpy.result_type`.

- Closes #5938
- Closes #7389
- Closes #12072
- Closes #12092

Authors:
- Lawrence Mitchell (https://github.com/wence-)

Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #12074
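
The lossiness of converting int64 to float64 described in the PR text can be checked with a small sketch. It assumes Python's built-in `int` and `float` as stand-ins for int64 and float64 (Python floats are IEEE 754 doubles); it does not exercise cuDF or numpy.

```python
# float64 has a 53-bit significand, so not every int64 value is representable.
a = 2**53
b = 2**53 + 1

# Not injective: two distinct integers map to the same float.
same_float = float(a) == float(b)

# Round-tripping the larger value does not recover it.
round_trip = int(float(b))

# The largest int64 value rounds up to 2**63, which is outside the int64 range.
int64_max = 2**63 - 1
rounded_max = int(float(int64_max))

print(same_float, round_trip == b, rounded_max > int64_max)
```

The conversion is also not surjective in the sense that floats such as 0.5 have no integer preimage at all.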

galipremsagar added bug Something isn't working Needs Triage Need team to review and classify labels Aug 11, 2020

kkraus14 added Python Affects Python cuDF API. and removed Needs Triage Need team to review and classify labels Aug 12, 2020

rgsl888prabhu self-assigned this Aug 14, 2020

github-actions bot added the rotten label Feb 16, 2021

kkraus14 unassigned rgsl888prabhu Feb 23, 2021

github-actions bot removed the inactive-90d label Feb 24, 2021

vyasr mentioned this issue Aug 2, 2022

[BUG] Floor division behavior of integer by series with zeroes differs from Pandas #7389

Closed

vyasr assigned brandon-b-miller Aug 4, 2022

wence- added a commit to wence-/cudf that referenced this issue Nov 8, 2022

Fix check for zero with columns in __floordiv__ and __mod__

77d94d1

Closes rapidsai#5938.

wence- mentioned this issue Nov 8, 2022

Fix type promotion edge cases in numerical binops #12074

Merged

3 tasks

wence- added a commit to wence-/cudf that referenced this issue Nov 11, 2022

Remove xfails related to rapidsai#5938

f06a6b9

rapids-bot bot closed this as completed in #12074 Nov 16, 2022
