[BUG] Floor division behavior of integer by series with zeroes differs from Pandas #7389

brandon-b-miller · 2021-02-16T17:22:41Z

Describe the bug
Dividing an int by a series or dataframe with zeroes in it yields a different result from Pandas nullable dtypes.

Steps/Code to reproduce bug

>>> import pandas as pd
>>> 1 // pd.Series([0], dtype='Int64')  # nullable dtype
0    0
dtype: Int64

vs

>>> import cudf
>>> 1 // cudf.Series([0])
0    9223372036854775807

Expected behavior
I'm not sure which result is preferred here. We could cast to float but that'd require a scan.
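For reference, the two candidate behaviours can be compared on the CPU with plain NumPy. This is only an illustrative sketch: NumPy arrays are used as a stand-in, the variable names are made up for the example, and nothing here reflects how cuDF or libcudf implement the operation.

```python
import numpy as np

numerator = np.array([1, 5, 7], dtype=np.int64)
divisor = np.array([0, 2, 0], dtype=np.int64)

# Silence the divide-by-zero RuntimeWarning NumPy would otherwise emit.
with np.errstate(divide="ignore", invalid="ignore"):
    # Option 1: stay in int64. NumPy returns 0 for integer division by zero.
    int_result = numerator // divisor
    # Option 2: cast to float64 first, so division by zero becomes inf.
    float_result = np.floor(
        numerator.astype(np.float64) / divisor.astype(np.float64)
    )

print(int_result)    # integer result, zeros where the divisor was 0
print(float_result)  # float result, inf where the divisor was 0
```

The float variant changes the output dtype, which is why deciding between the two up front would need a pass over the divisor to check for zeroes.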

Environment overview (please complete the following information)

Environment location: Bare Metal
Method of cuDF install: Source


jrhemstad · 2021-02-16T17:28:36Z

So Pandas expects division by zero to yield zero?

brandon-b-miller · 2021-02-16T19:20:00Z

Not sure if it's a bug, something that just hasn't been implemented yet, or otherwise. Raised pandas-dev/pandas#39847 to ask. In our case maybe we want to raise.

github-actions · 2021-03-18T22:18:05Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

vyasr · 2022-07-22T22:49:24Z

Based on discussion in #7492, we initially decided that the appropriate solution would be to match pandas by setting division by zero entries to zero in the output, but in subsequent discussions we decided to take no action. @brandon-b-miller at this point do you think it's worth revisiting this and doing the post facto 0 filling, or should we just close this issue?

brandon-b-miller · 2022-07-25T18:10:36Z

As long as we're doing things like INT_POW could there be precedent to add this at the libcudf level?

jrhemstad · 2022-07-25T20:14:01Z

> As long as we're doing things like INT_POW could there be precedent to add this at the libcudf level?

INT_POW is different. It's a distinct algorithm for doing the "pow" operation.

"INT_DIVISION_BY_0_YIELDS_0" is far more niche.

If this behavior is desired, it should be a pre/post-processing step.

vyasr · 2022-07-26T00:00:56Z

I agree, I'd still want to do this as a post-processing step. I think we'd probably be OK to eat the perf hit though.

brandon-b-miller · 2022-08-01T19:02:31Z

So we're OK with doing a filter and I guess a scatter every time we do floor division by a cuDF series?

vyasr · 2022-08-01T22:45:07Z

Yeah I think so. This feels like a very typical case where cuDF Python should pay a performance cost for matching pandas behavior exactly. No need to overspecialize libcudf.
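A minimal sketch of what such a post-processing step could look like, written against NumPy rather than cuDF so it runs anywhere. The helper name `floordiv_zero_fill` is hypothetical and is not an existing cuDF or libcudf function; the mask plays the role of the "filter" and the final `np.where` plays the role of the "scatter".

```python
import numpy as np

def floordiv_zero_fill(numerator, divisor):
    """Integer floor division that writes 0 wherever the divisor is 0.

    Hypothetical helper for illustration only.
    """
    divisor = np.asarray(divisor)
    zero_mask = divisor == 0                        # find the zero divisors
    safe_divisor = np.where(zero_mask, 1, divisor)  # never divide by zero
    quotient = numerator // safe_divisor
    return np.where(zero_mask, 0, quotient)         # fill those slots with 0

out = floordiv_zero_fill(7, np.array([0, 2, -3, 0], dtype=np.int64))
print(out)
```

The extra mask and select are the performance cost being discussed: they run on every floor division, whether or not the divisor actually contains a zero.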

vyasr · 2022-08-02T00:16:01Z

This issue is pretty similar to #5938 in spirit, although it'll require a pretty different type of fix.

mroeschke · 2022-08-23T20:59:41Z

This appears to be a moving target on the pandas side

>>> import pandas as pd
>>> pd.__version__
'1.4.3'
>>> 1 // pd.Series([0], dtype='Int64')
0    0
dtype: Int64

vs

In [1]: pd.__version__
Out[1]: '1.6.0.dev0'  # essentially 1.5

In [2]: 1 // pd.Series([0], dtype='Int64')
Out[2]:
0    inf
dtype: Float64

FWIW I do find the 0 result from pandas a bit odd. Given the discussion on the pandas side related to this (pandas-dev/pandas#30188 and pandas-dev/pandas#32265), maybe in this case it makes sense for cuDF to return a more sensible result (e.g. not 0)?

wence- · 2022-11-08T18:59:32Z

At least as of #12074, cudf will match pandas 1.5 AFAICT in these cases.

The type normalisation applied before heading into libcudf previously had slightly unexpected consequences for large int64 values. If not providing a `cudf.Scalar`, a bare `int64` scalar would be cast to `uint64` and then normal numpy type promotion would unify to `float64`. This is lossy, since int64 to float64 is neither surjective nor injective. To avoid this, try very hard to maintain the dtype of the object coming in, and match pandas behaviour by applying numpy type promotion rules via `numpy.result_type`.

- Closes #5938
- Closes #7389
- Closes #12072
- Closes #12092

Authors:
- Lawrence Mitchell (https://github.com/wence-)

Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #12074
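The promotion problem described in the PR text above can be seen with NumPy alone. The snippet below is a sketch of the general effect, not a reproduction of cuDF's internal normalisation code; the variable names are chosen for the example.

```python
import numpy as np

# int64 and uint64 have no common integer type, so NumPy promotes to float64.
promoted = np.result_type(np.int64, np.uint64)

# float64 has a 53-bit significand, so distinct large int64 values collide
# once converted: the conversion is not injective.
a = np.int64(2**53)
b = np.int64(2**53 + 1)
collide = np.float64(a) == np.float64(b)

# Keeping both operands as int64 avoids the lossy round trip.
kept = np.result_type(np.int64, np.int64)

print(promoted, collide, kept)
```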

brandon-b-miller added the bug and Needs Triage labels Feb 16, 2021

kkraus14 added the Python label and removed the Needs Triage label Feb 16, 2021

brandon-b-miller mentioned this issue Mar 2, 2021: Return zero for integer floor division by zero #7492 (Closed)

github-actions bot added the inactive-30d label Mar 18, 2021

vyasr mentioned this issue Jul 22, 2022: [FEA] Match Pandas behaviour for boolean-int ops #2172 (Closed)

brandon-b-miller self-assigned this Aug 2, 2022

brandon-b-miller mentioned this issue Aug 2, 2022: Return zero when floor dividing an integer data by zero #11441 (Closed)

vyasr mentioned this issue Aug 4, 2022: [BUG] Pymod operation is missing null values #5938 (Closed)

wence- mentioned this issue Nov 9, 2022: Fix type promotion edge cases in numerical binops #12074 (Merged)

rapids-bot bot closed this as completed in #12074 Nov 16, 2022
