Support equal NaNs in NullEquals binary operation #12275
Conversation
if (!lhs_valid || !rhs_valid) return rhs_valid == lhs_valid;
if constexpr (cudf::is_floating_point<TypeLhs>() && cudf::is_floating_point<TypeRhs>()) {
  if (isnan(x) && isnan(y)) return true;
}
return x == y;
I think doing this unconditionally will break us. We usually want NaNs to compare not equal:
In [11]: s = cudf.Series([1.0, 2.0, np.nan], nan_as_null=False)
In [12]: s == s
Out[12]:
0 True
1 True
2 False
dtype: bool
Except when using .equals(), when we want NaNs to compare equal:
# s.equals(s.copy()) should return [True, True, True]
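The two behaviours described above (NaNs unequal under `==`, equal under `.equals()`) can be sketched in plain Python as an opt-in flag rather than an unconditional change. This is only an illustration of the semantics under discussion, not cuDF or libcudf code: the function `null_equals` and its `nans_equal` parameter are hypothetical names, and `None` stands in for a null element.

```python
import math

def null_equals(x, y, nans_equal=False):
    """Null-aware equality for two scalars; None stands in for a null.

    Hypothetical helper for illustration only, not part of cuDF.
    """
    lhs_valid = x is not None
    rhs_valid = y is not None
    # If either side is null, the result is true only when both are null.
    if not lhs_valid or not rhs_valid:
        return lhs_valid == rhs_valid
    # Opt-in behaviour: two floating-point NaNs compare equal.
    if nans_equal and isinstance(x, float) and isinstance(y, float):
        if math.isnan(x) and math.isnan(y):
            return True
    # Default IEEE 754 behaviour: NaN != NaN.
    return x == y

values = [1.0, 2.0, float("nan"), None]
# Mirrors `s == s`: the NaN element compares unequal to itself.
default_result = [null_equals(v, v) for v in values]
# Mirrors the behaviour wanted for `.equals()`: the NaN element compares equal.
equals_result = [null_equals(v, v, nans_equal=True) for v in values]
```

Keeping the NaN check behind a flag leaves the default comparison untouched, which is the concern raised in this comment.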
OK. I wasn't sure whether coding this only in NullEquals would help you here.
Codecov Report
Additional details and impacted files
@@             Coverage Diff              @@
##             branch-23.02   #12275   +/-   ##
===============================================
  Coverage                ?   88.20%
===============================================
  Files                   ?      137
  Lines                   ?    22690
  Branches                ?        0
===============================================
  Hits                    ?    20013
  Misses                  ?     2677
  Partials                ?        0
I think this is fine from the Spark side. Everywhere I looked, our uses of NULL_EQUALS are followed by special-case handling to make NaNs equal. I would add that we have to do special-case processing for NaNs on all of the comparison operators, not just NULL_EQUALS, but I think changing those is more likely to cause issues if we don't do it carefully.
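As a rough sketch of what that special-case processing could look like for the other comparison operators, the Python below builds every operator on one three-way comparison. It assumes the NaN semantics described in the Spark SQL documentation (NaN equal to NaN, and NaN larger than any other numeric value). The function names are hypothetical and this is not the actual Spark plugin code.

```python
import math

def nan_aware_compare(x, y):
    """Three-way compare returning -1, 0, or 1.

    Assumed semantics: NaN equals NaN and sorts above every other float.
    Hypothetical helper for illustration only.
    """
    x_nan, y_nan = math.isnan(x), math.isnan(y)
    if x_nan or y_nan:
        # Both NaN -> 0, only x NaN -> 1, only y NaN -> -1.
        return int(x_nan) - int(y_nan)
    return (x > y) - (x < y)

def nan_aware_equal(x, y):
    return nan_aware_compare(x, y) == 0

def nan_aware_less(x, y):
    return nan_aware_compare(x, y) < 0

def nan_aware_greater_equal(x, y):
    return nan_aware_compare(x, y) >= 0
```

Deriving every operator from a single comparison keeps the NaN rule in one place, so the operators cannot disagree with each other.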
Closing this based on comment from Ashwin that this would break cudf.
Description
Test change.
Reference #12266
Checklist