[BUG] Floor division behavior of integer by series with zeroes differs from Pandas #7389
Comments
So Pandas expects division by zero to yield zero?
Not sure if it's a bug, something that just hasn't been implemented yet, or otherwise. Raised pandas-dev/pandas#39847 to ask. In our case maybe we want to raise.
This issue has been labeled
Based on discussion in #7492, we initially decided that the appropriate solution would be to match pandas by setting division by zero entries to zero in the output, but in subsequent discussions we decided to take no action. @brandon-b-miller at this point do you think it's worth revisiting this and doing the post facto 0 filling, or should we just close this issue?
As long as we're doing things like
INT_POW is different. It's a distinct algorithm for doing the "pow" operation. "INT_DIVISION_BY_0_YIELDS_0" is far more niche. If this behavior is desired, it should be a pre/post-processing step.
I agree, I'd still want to do this as a post-processing step. I think we'd probably be OK to eat the perf hit though.
So we're OK with doing a filter and I guess a scatter every time we do floor division by a cuDF series?
Yeah I think so. This feels like a very typical case where cuDF Python should pay a performance cost for matching pandas behavior exactly. No need to overspecialize libcudf.
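For illustration, the "filter and scatter" post-processing discussed above can be sketched in plain Python. This is a library-agnostic sketch, not cuDF or libcudf code: the function name `floordiv_zero_filled` is hypothetical, and lists stand in for a series.

```python
def floordiv_zero_filled(numerator, divisors):
    """Floor-divide an integer by each divisor, yielding 0 where the divisor is 0.

    Hypothetical helper: a list-based stand-in for the series operation.
    """
    # "Filter": record the positions whose divisor is zero.
    zero_positions = [i for i, d in enumerate(divisors) if d == 0]

    # Divide everywhere, substituting 1 for zero divisors so the
    # division itself never raises ZeroDivisionError.
    result = [numerator // (d if d != 0 else 1) for d in divisors]

    # "Scatter": overwrite the recorded positions with 0.
    for i in zero_positions:
        result[i] = 0
    return result


print(floordiv_zero_filled(1, [1, 0, 2, -3]))  # [1, 0, 0, -1]
```

The extra pass over the divisors is the performance cost mentioned in the thread: the zero check and the overwrite happen on every floor division, whether or not any zeroes are present.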
This issue is pretty similar to #5938 in spirit, although it'll require a pretty different type of fix.
This appears to be a moving target on the pandas side
vs
FWIW I do find the 0 result from pandas a bit odd. And given the discussion on the pandas side related to this (pandas-dev/pandas#30188 and pandas-dev/pandas#32265), maybe in this case it makes sense for cuDF to return a more sensible result (e.g. not 0)?
At least as of #12074, cudf will match pandas 1.5 AFAICT in these cases.
The type normalisation applied before heading into libcudf previously had slightly unexpected consequences for large int64 values. If not providing a `cudf.Scalar`, a bare `int64` scalar would be cast to `uint64` and then normal numpy type promotion would unify to `float64`. This is lossy, since int64 to float64 is neither surjective nor injective. To avoid this, try very hard to maintain the dtype of the object coming in, and match pandas behaviour by applying numpy type promotion rules via `numpy.result_type`.
- Closes #5938
- Closes #7389
- Closes #12072
- Closes #12092
Authors:
- Lawrence Mitchell (https://github.com/wence-)
Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)
URL: #12074
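To make the "neither surjective nor injective" remark in the PR description concrete, here is a small sketch using only Python's built-in `int` and `float` (an IEEE 754 double, i.e. the same format as `float64`). It does not reproduce the cuDF code path; the variable names are illustrative.

```python
INT64_MAX = 2**63 - 1

# Not injective: two distinct int64 values map to the same float64,
# because float64 has only 53 bits of significand.
a = 2**53
b = 2**53 + 1
collide = float(a) == float(b)

# Consequence: a round trip through float64 can change a large value.
round_tripped = int(float(INT64_MAX))

# Not surjective: many float64 values (e.g. 0.5) are not the image
# of any integer.
has_integer_preimage = (0.5).is_integer()

print(collide)               # True
print(round_tripped)         # 9223372036854775808, i.e. 2**63, not 2**63 - 1
print(has_integer_preimage)  # False
```

This is why unifying a large `int64` operand to `float64` before the operation can silently change the result, and why keeping the incoming dtype avoids the problem.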
Describe the bug
Dividing an int by a series or dataframe with zeroes in it yields a different result from Pandas nullable dtypes.
Steps/Code to reproduce bug
vs
Expected behavior
I'm not sure which result is preferred here. We could cast to float but that'd require a scan.
Environment overview (please complete the following information)