Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support equal NaNs in NullEquals binary operation #12275

Closed

Conversation

davidwendt
Copy link
Contributor

Description

Test change.
Reference #12266

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt davidwendt added bug Something isn't working 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change labels Nov 30, 2022
@davidwendt davidwendt self-assigned this Nov 30, 2022
Comment on lines +393 to +397
if (!lhs_valid || !rhs_valid) return rhs_valid == lhs_valid;
if constexpr (cudf::is_floating_point<TypeLhs>() && cudf::is_floating_point<TypeRhs>()) {
if (isnan(x) && isnan(y)) return true;
}
return x == y;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think doing this unconditionally will break us. We usually want NaNs to compare not equal:

In [11]: s = cudf.Series([1.0, 2.0, np.nan], nan_as_null=False)

In [12]: s == s
Out[12]: 
0     True
1     True
2    False
dtype: bool

Except when using .equals(), when we want NaNs to compare equal:

# s.equals(s.copy()) should return [True, True, True]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I wasn't sure if coding this only in NullsEquals would you help here.

@codecov
Copy link

codecov bot commented Dec 1, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-23.02@5f83a84). Click here to learn what that means.
Patch has no changes to coverable lines.

❗ Current head 952359f differs from pull request most recent head 06fd63a. Consider uploading reports for the commit 06fd63a to get more accurate results

Additional details and impacted files
@@               Coverage Diff               @@
##             branch-23.02   #12275   +/-   ##
===============================================
  Coverage                ?   88.20%           
===============================================
  Files                   ?      137           
  Lines                   ?    22690           
  Branches                ?        0           
===============================================
  Hits                    ?    20013           
  Misses                  ?     2677           
  Partials                ?        0           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@revans2
Copy link
Contributor

revans2 commented Dec 5, 2022

I think this is fine from the spark side. Everywhere I looked that we use NULL_EQUALS it is followed by special case handling to make NaNs equal.

I would add that we have to do some special case processing for NaNs on all of the comparison operators, not just NULL_EQUALS. But I think those are more likely to cause issues if we don't do this carefully.

@davidwendt
Copy link
Contributor Author

Closing this based on comment from Ashwin that this would break cudf.

@davidwendt davidwendt closed this Dec 8, 2022
@davidwendt davidwendt deleted the null-equals-nans branch December 9, 2022 13:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 - In Progress Currently a work in progress bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants