-
Notifications
You must be signed in to change notification settings - Fork 871
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move cudf::strings::findall_record to cudf::strings::findall #11575
Move cudf::strings::findall_record to cudf::strings::findall #11575
Conversation
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## branch-22.10 #11575 +/- ##
===============================================
Coverage ? 86.41%
===============================================
Files ? 145
Lines ? 22992
Branches ? 0
===============================================
Hits ? 19869
Misses ? 3123
Partials ? 0 Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
@@ -19,142 +19,124 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing actually has been modified in this file. The original findall.cu
was deleted and this file was renamed to findall.cu
. Github did not detect this and so is just highlighting the differences between the two files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CMake changes LGTM
2 <NA> | ||
0 [on] | ||
1 [] | ||
2 [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. I would have hoped the doctests would fail here, and in other places where the dtype of the output is missing. Can we verify if the doctests are running this?
2 [] | |
2 [] | |
dtype: list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked that, and doc-tests don't seem to be covering StringMethods
somehow. Is it because it isn't listed in __all__
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this is being missed in our doctests. I suspect that none of the .str
class StringMethods
or similar accessors for lists/structs are being doctested. I'll file an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue filed. I'd like to see these docstrings tested/verified by hand, and the issue will guide us for future work. #11606
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the docstring and verified it manually by executing:
>>> s = cudf.Series(['Lion', 'Monkey', 'Rabbit'])
>>> s.str.findall('Monkey')
0 []
1 [Monkey]
2 []
dtype: list
>>> s.str.findall('on')
0 [on]
1 [on]
2 []
dtype: list
>>> s.str.findall('on$')
0 [on]
1 []
2 []
dtype: list
>>> s.str.findall('b')
0 []
1 []
2 [b, b]
dtype: list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @davidwendt!
@gpucibot merge |
Description
Replaces
cudf::strings::findall
with the implementation fromcudf::strings::findall_record
.As referenced in #11510, the column-based
findall
implementation is not used and unnecessary overfindall_record
which returns a lists result. For documentation and discoverabilityfindall_record
is renamed tofindall
and the currentfindall
implementation is removed.Closes #11510
Checklist