Return zero when floor dividing an integer data by zero #11441

Conversation

brandon-b-miller
Contributor

Closes #7389

@brandon-b-miller brandon-b-miller requested a review from a team as a code owner August 2, 2022 19:30
@github-actions github-actions bot added the Python Affects Python cuDF API. label Aug 2, 2022
@brandon-b-miller brandon-b-miller added bug Something isn't working non-breaking Non-breaking change 2 - In Progress Currently a work in progress labels Aug 2, 2022
@vyasr
Contributor

vyasr commented Aug 4, 2022

@brandon-b-miller it looks like there's one code path somewhere that is (potentially erroneously) relying on the old behavior.

@brandon-b-miller brandon-b-miller added breaking Breaking change and removed non-breaking Non-breaking change labels Aug 5, 2022
@codecov

codecov bot commented Aug 8, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.10@e431440).
The diff coverage is n/a.

❗ Current head e500fe5 differs from pull request most recent head b2b22c8. Consider uploading reports for the commit b2b22c8 to get more accurate results

@@               Coverage Diff               @@
##             branch-22.10   #11441   +/-   ##
===============================================
  Coverage                ?   86.41%           
===============================================
  Files                   ?      145           
  Lines                   ?    22975           
  Branches                ?        0           
===============================================
  Hits                    ?    19853           
  Misses                  ?     3122           
  Partials                ?        0           


@brandon-b-miller brandon-b-miller added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Aug 8, 2022
@mroeschke
Contributor

On the pandas side, there appears to be some agreement that the pandas nullable-int dtype should match the pandas non-nullable int dtype for this operation, even though that differs from numpy: pandas-dev/pandas#48223

In [1]: pd.__version__
Out[1]: '1.6.0.dev0+3.g1f41ff07d1'

In [2]: 1 // pd.Series([0], dtype='Int64')  # changed in 1.5
Out[2]:
0    inf
dtype: Float64

In [3]: 1 // pd.Series([0], dtype='int64')
Out[3]:
0    inf
dtype: float64

In [4]: 1 // np.array([0], dtype=np.int64)
<ipython-input-4-9d5b9660db55>:1: RuntimeWarning: divide by zero encountered in floor_divide
  1 // np.array([0], dtype=np.int64)
Out[4]: array([0])

But as mentioned in #7389 (comment), if cuDF would like to implement a more realistic (from a math perspective) result for this operation, IMO that wouldn't be a bad thing.
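To make the "more realistic from a math perspective" result concrete, here is a minimal scalar sketch in plain Python. The function name `floor_div_or_inf` is hypothetical, and the handling of negative numerators (`-inf`) and of `0 // 0` (`nan`) is an assumption that goes beyond the single `1 // 0` case shown in this thread; it is not code from cuDF or pandas.

```python
import math


def floor_div_or_inf(numerator: int, divisor: int):
    """Floor-divide two ints; return a float sentinel when divisor is zero.

    Hypothetical helper for illustration only.
    """
    if divisor != 0:
        # Ordinary case: the result stays an integer.
        return numerator // divisor
    if numerator == 0:
        # Assumption: 0 // 0 is treated as undefined (NaN).
        return math.nan
    # Assumption: the sign of the infinity follows the numerator.
    return math.copysign(math.inf, numerator)
```

With this sketch, `floor_div_or_inf(1, 0)` gives a positive infinity, while `floor_div_or_inf(1, 2)` remains the integer `0`.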

@brandon-b-miller
Contributor Author

I'm completely happy with casting to float and returning inf, especially if that's where pandas is headed as well. The only question I have is whether we can avoid the data scan. @mroeschke, in pandas >1.5, does it specifically check for zero and then cast, or would we get Float64 for this operation too? I am guessing it ends up int, matching the non-nullable behavior, meaning we would still need to post-process.

1 // pd.Series([2], dtype='Int64')

@mroeschke
Contributor

In the 1 // 0 case, pandas has special logic to take the 1 // 0 = 0 result from numpy and replace it with inf:

https://github.com/pandas-dev/pandas/blob/e0cf2645095a5164ea7a7b143097bf0051f11481/pandas/core/ops/missing.py#L130

For results that do not involve division by zero, the nullable types (should) match the non-nullable types (and numpy) and return int:

In [1]: 1 // pd.Series([2], dtype='Int64')
Out[1]:
0    0
dtype: Int64

In [2]: 1 // pd.Series([2], dtype=np.int64)
Out[2]:
0    0
dtype: int64

In [4]: 1 // np.array([2])
Out[4]: array([0])
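The behavior described in this comment, together with the data scan asked about earlier in the thread, can be sketched with NumPy. This is only an illustration under stated assumptions: `floordiv_with_zero_fill` is a hypothetical name, inputs are assumed to be int64, and the `-inf`/`nan` fills for negative and zero numerators are assumptions about the general case. It is not the pandas or cuDF implementation.

```python
import numpy as np


def floordiv_with_zero_fill(lhs, rhs):
    """Integer floor division that returns float inf/nan where rhs == 0.

    Hypothetical sketch: the result stays int64 when no divisor is zero
    and is cast to float64 only when a zero divisor is found.
    """
    lhs = np.atleast_1d(np.asarray(lhs, dtype=np.int64))
    rhs = np.atleast_1d(np.asarray(rhs, dtype=np.int64))
    lhs, rhs = np.broadcast_arrays(lhs, rhs)

    # This comparison is the "data scan": one pass over the divisor.
    zero_mask = rhs == 0

    # NumPy returns 0 (with a RuntimeWarning) for integer division by zero;
    # the warning is silenced here because those slots are overwritten below.
    with np.errstate(divide="ignore", invalid="ignore"):
        result = lhs // rhs

    if not zero_mask.any():
        return result  # no zero divisors: keep the integer dtype

    # Post-process: cast to float and fill the zero-divisor positions.
    result = result.astype(np.float64)
    result[zero_mask & (lhs > 0)] = np.inf
    result[zero_mask & (lhs < 0)] = -np.inf
    result[zero_mask & (lhs == 0)] = np.nan
    return result
```

In this sketch the output dtype depends on the data: `floordiv_with_zero_fill(1, [2])` stays int64, while `floordiv_with_zero_fill(1, [0])` becomes float64. That value-dependent cast is the reason the scan cannot be skipped in this approach.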

@github-actions

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

@wence-
Contributor

wence- commented Nov 10, 2022

#12074 fixes this to match modern pandas (and would close this instead).

@quasiben
Member

PR is superseded by #12074

@quasiben quasiben closed this Nov 16, 2022
Labels
3 - Ready for Review Ready for review by team breaking Breaking change bug Something isn't working Python Affects Python cuDF API.
Development

Successfully merging this pull request may close these issues.

[BUG] Floor division behavior of integer by series with zeroes differs from Pandas
5 participants