Fix type promotion edge cases in numerical binops #12074

Merged: 15 commits into rapidsai:branch-22.12 from the wence/fix/issue-12072 branch, Nov 16, 2022

@wence- (Contributor) commented Nov 4, 2022

Description

The type normalisation applied before heading into libcudf previously had slightly unexpected consequences for large int64 values. When the operand was not provided as a cudf.Scalar, a bare int64 scalar would be cast to uint64, and normal numpy type promotion would then unify the operands to float64. This is lossy, since the int64 to float64 conversion is neither surjective nor injective.

To avoid this, try very hard to maintain the dtype of the object coming in, and match pandas behaviour by applying numpy type promotion rules via numpy.result_type.
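
To make the failure mode concrete, here is a minimal NumPy-only sketch of the promotion chain described above. It is not the cuDF code path: the variable names are illustrative, and `2**62 + 1` is just an example of a large value that fits in int64.

```python
import numpy as np

# An example "large" value that still fits in int64 (illustrative choice).
big = 2**62 + 1

# Value-based inspection prefers an unsigned type for a non-negative integer.
unsigned_guess = np.min_scalar_type(big)

# No integer dtype can hold both int64 and uint64, so NumPy promotes the
# pair to float64.
unified = np.result_type(np.int64, np.uint64)

# float64 has a 53-bit significand, so the round trip does not recover `big`.
round_tripped = int(np.float64(big))

# Two distinct int64 values collapse to the same float64, which is how an
# equality comparison can give the wrong answer after promotion.
collide = float(big) == float(big - 1)

print(unsigned_guess, unified, round_tripped == big, collide)
```

Keeping the operand as int64 avoids both the detour through uint64 and the rounding that follows it.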

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@wence- requested a review from a team as a code owner November 4, 2022 18:03
@github-actions bot added the "Python" (Affects Python cuDF API.) label Nov 4, 2022
@wence- added the "3 - Ready for Review" (Ready for review by team) and "breaking" (Breaking change) labels Nov 4, 2022
@wence- (Contributor Author) commented Nov 4, 2022

This is a breaking change because it changes the behaviour of a user-facing API, though I very much hope no one was relying on the old behaviour. FWIW, pandas gets this case right.

Try and do everything following numpy using types rather than values
by first attempting to use the dtype of the passed in operand and
subsequently (if it does not have one) using result_type. This way
we avoid problems with min_scalar_type wanting to pick unsigned int
types for bare Python integers.
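
The "dtype first, result_type as fallback" idea in this commit message can be sketched roughly as follows. This is an illustration only, not the code in numerical.py: `promoted_dtype` is a hypothetical helper name, and it uses plain NumPy objects in place of cuDF columns and scalars.

```python
import numpy as np


def promoted_dtype(column_dtype, other):
    """Hypothetical helper: pick a common dtype using types, not values."""
    other_dtype = getattr(other, "dtype", None)
    if other_dtype is not None:
        # The operand carries a dtype: promote dtype against dtype, so the
        # operand's value never influences the result.
        return np.result_type(column_dtype, other_dtype)
    # Bare Python scalar: let result_type decide, rather than asking
    # min_scalar_type for the smallest (possibly unsigned) type that fits.
    return np.result_type(column_dtype, other)


# A typed int64 operand stays int64 against an int64 column.
same = promoted_dtype(np.dtype("int64"), np.int64(2**62))

# A typed operand is promoted by dtype even when its value is tiny.
widened = promoted_dtype(np.dtype("uint8"), np.int64(0))

# A bare Python int does not pull an int64 column towards uint64.
bare = promoted_dtype(np.dtype("int64"), 1)
```

How `np.result_type` treats bare Python scalars with out-of-range values depends on the NumPy version, so the fallback branch is the part of this sketch most likely to need adjusting.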
@wence- added the "bug" (Something isn't working) label Nov 8, 2022
No idea how to handle the pandas weirdness here.
@wence- (Contributor Author) commented Nov 8, 2022

Requesting some careful eyes from reviewers here (assuming the test suite at least passes). This kind of type promotion/casting, with all its edge cases, is really hard for me to think through.

@wence- changed the title from "Fix equality edge case in numerical binops" to "Fix type promotion edge cases in numerical binops" Nov 8, 2022
@seberg (Contributor) left a comment

I thought I would have a look and have added a few comments/questions in the hope that they are useful.

python/cudf/cudf/core/column/column.py (resolved)
python/cudf/cudf/core/column/numerical.py (resolved)
python/cudf/cudf/core/column/numerical.py (resolved)
python/cudf/cudf/core/column/numerical.py (outdated, resolved)
python/cudf/cudf/core/column/numerical.py (resolved)
@codecov bot commented Nov 9, 2022

Codecov Report

Base: 88.07% // Head: 88.11% // Increases project coverage by +0.04% 🎉

Coverage data is based on head (3fdc7a8) compared to base (b2e5069).
Patch coverage: 100.00% of modified lines in pull request are covered.

❗ Current head 3fdc7a8 differs from pull request most recent head 56cd889. Consider uploading reports for the commit 56cd889 to get more accurate results

Additional details and impacted files
@@               Coverage Diff                @@
##           branch-22.12   #12074      +/-   ##
================================================
+ Coverage         88.07%   88.11%   +0.04%     
================================================
  Files               135      135              
  Lines             22133    22124       -9     
================================================
+ Hits              19494    19495       +1     
+ Misses             2639     2629      -10     
| Impacted Files | Coverage Δ |
|---|---|
| python/cudf/cudf/core/index.py | 92.88% <ø> (ø) |
| python/cudf/cudf/core/series.py | 95.71% <ø> (ø) |
| python/cudf/cudf/core/column/column.py | 87.96% <100.00%> (ø) |
| python/cudf/cudf/core/column/numerical.py | 96.51% <100.00%> (+1.04%) ⬆️ |
| python/cudf/cudf/core/column/timedelta.py | 90.17% <100.00%> (ø) |
| python/cudf/cudf/core/dataframe.py | 93.64% <0.00%> (+0.04%) ⬆️ |
| python/cudf/cudf/core/column/string.py | 88.65% <0.00%> (+0.12%) ⬆️ |
| python/cudf/cudf/core/groupby/groupby.py | 91.51% <0.00%> (+0.20%) ⬆️ |
| python/cudf/cudf/core/tools/datetimes.py | 84.49% <0.00%> (+0.30%) ⬆️ |
| ... and 1 more | |


Now that binop with cudf.Scalar matches pandas behaviour with
numpy-dtype-enabled scalars, we need to manually promote here.
@wence- (Contributor Author) commented Nov 15, 2022

I think this is now ready for another look (though happy to retarget to 23.0x)

@wence- (Contributor Author) commented Nov 16, 2022

@gpucibot merge

@rapids-bot bot merged commit a8c0f4b into rapidsai:branch-22.12 Nov 16, 2022
@wence- added the "5 - Ready to Merge" (Testing and reviews complete, ready to merge) label and removed the "3 - Ready for Review" (Ready for review by team) label Nov 16, 2022
@wence- deleted the wence/fix/issue-12072 branch November 16, 2022 11:39
Labels
5 - Ready to Merge (Testing and reviews complete, ready to merge)
breaking (Breaking change)
bug (Something isn't working)
Python (Affects Python cuDF API.)
Projects
Archived in project
4 participants