Move cudf::strings::findall_record to cudf::strings::findall #11575

Merged
merged 12 commits into from
Aug 29, 2022

Conversation

davidwendt
Contributor

@davidwendt davidwendt commented Aug 22, 2022

Description

Replaces cudf::strings::findall with the implementation from cudf::strings::findall_record.
As discussed in #11510, the column-based findall implementation is unused and is made redundant by findall_record, which returns a lists column. To improve documentation and discoverability, findall_record is renamed to findall and the current findall implementation is removed.

Closes #11510

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt davidwendt added 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Aug 22, 2022
@davidwendt davidwendt self-assigned this Aug 22, 2022
@davidwendt davidwendt added this to PR-WIP in v22.10 Release via automation Aug 22, 2022
@github-actions github-actions bot added CMake CMake build issue Java Affects Java cuDF API. Python Affects Python cuDF API. labels Aug 22, 2022
@codecov

codecov bot commented Aug 22, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.10@ccd72f2).
Patch has no changes to coverable lines.

Additional details and impacted files
@@               Coverage Diff               @@
##             branch-22.10   #11575   +/-   ##
===============================================
  Coverage                ?   86.41%           
===============================================
  Files                   ?      145           
  Lines                   ?    22992           
  Branches                ?        0           
===============================================
  Hits                    ?    19869           
  Misses                  ?     3123           
  Partials                ?        0           

@davidwendt davidwendt added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Aug 23, 2022
@davidwendt davidwendt moved this from PR-WIP to PR-Needs review in v22.10 Release Aug 23, 2022
@davidwendt davidwendt marked this pull request as ready for review August 24, 2022 00:08
@davidwendt davidwendt requested review from a team as code owners August 24, 2022 00:08
@@ -19,142 +19,124 @@

Contributor Author

Nothing has actually been modified in this file. The original findall.cu was deleted and this file was renamed to findall.cu. GitHub did not detect the rename, so it is just highlighting the differences between the two files.

Contributor

@robertmaynard robertmaynard left a comment

CMake changes LGTM

2 <NA>
0 [on]
1 []
2 []
Contributor

Hmm. I would have hoped the doctests would fail here, and in other places where the dtype of the output is missing. Can we verify if the doctests are running this?

Suggested change
- 2 []
+ 2 []
+ dtype: list

Contributor

I checked that, and doc-tests don't seem to be covering StringMethods somehow. Is it because it isn't listed in __all__?

Contributor

It looks like this is being missed in our doctests. I suspect that none of the .str class StringMethods or similar accessors for lists/structs are being doctested. I'll file an issue.

Contributor

Issue filed. I'd like to see these docstrings tested/verified by hand, and the issue will guide us for future work. #11606
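
Until the accessor docstrings are picked up automatically, one way to check them by hand is to run the standard-library `doctest` machinery directly against a class. The following is only a minimal sketch: `StringMethodsStandIn` and `run_class_doctests` are hypothetical names used for illustration, not part of cuDF, and the sketch does not claim to explain why the existing doctest run misses `StringMethods`.

```python
import doctest
import inspect


class StringMethodsStandIn:
    """Hypothetical stand-in for an accessor class such as StringMethods."""

    def __init__(self, values):
        self._values = values

    def upper(self):
        """
        >>> StringMethodsStandIn(["on"]).upper()
        ['ON']
        """
        return [v.upper() for v in self._values]


def run_class_doctests(cls):
    """Run the docstring examples of every function defined directly on cls."""
    parser = doctest.DocTestParser()
    runner = doctest.DocTestRunner(verbose=False)
    for name, member in vars(cls).items():
        doc = getattr(member, "__doc__", None)
        if not inspect.isfunction(member) or not doc:
            continue
        # Each test gets its own globals that expose only the class itself.
        test = parser.get_doctest(
            doc, {cls.__name__: cls}, f"{cls.__name__}.{name}", None, 0
        )
        runner.run(test)
    # TestResults(failed=..., attempted=...)
    return runner.summarize(verbose=False)


results = run_class_doctests(StringMethodsStandIn)
print(results)
```

Passing the real accessor class instead of the stand-in would require supplying whatever names its docstrings rely on (for example `cudf`) in the globals dictionary.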

Contributor Author

I've updated the docstring and verified it manually by executing:

>>> s = cudf.Series(['Lion', 'Monkey', 'Rabbit'])
>>> s.str.findall('Monkey')
0          []
1    [Monkey]
2          []
dtype: list
>>> s.str.findall('on')
0    [on]
1    [on]
2      []
dtype: list
>>> s.str.findall('on$')
0    [on]
1      []
2      []
dtype: list
>>> s.str.findall('b')
0        []
1        []
2    [b, b]
dtype: list
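
As a CPU-side cross-check of the outputs above, the same patterns can be run through Python's standard `re.findall`. This is only a sketch: it assumes the cuDF regex semantics agree with Python's for these simple patterns, and the helper name `findall_rows` is hypothetical, not part of cuDF.

```python
import re


def findall_rows(rows, pattern):
    """Return one list of matches per input row (hypothetical CPU stand-in)."""
    compiled = re.compile(pattern)
    return [compiled.findall(row) for row in rows]


rows = ["Lion", "Monkey", "Rabbit"]
for pattern in ("Monkey", "on", "on$", "b"):
    # One list per row, mirroring the list-typed Series shown above.
    print(pattern, findall_rows(rows, pattern))
```

Each row yields a list of its own, empty when nothing matches, which is the same per-row shape as the `dtype: list` results in the docstring.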

@davidwendt davidwendt added breaking Breaking change and removed non-breaking Non-breaking change labels Aug 26, 2022
@davidwendt davidwendt requested a review from bdice August 29, 2022 12:07
Contributor

@bdice bdice left a comment

Thanks @davidwendt!

@bdice bdice added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Aug 29, 2022
v22.10 Release automation moved this from PR-Needs review to PR-Reviewer approved Aug 29, 2022
@galipremsagar
Contributor

@gpucibot merge

@rapids-bot rapids-bot bot merged commit ecf4662 into rapidsai:branch-22.10 Aug 29, 2022
v22.10 Release automation moved this from PR-Reviewer approved to Done Aug 29, 2022
@davidwendt davidwendt deleted the remove-str-findall branch August 29, 2022 21:51
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge breaking Breaking change CMake CMake build issue improvement Improvement / enhancement to an existing function Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API.
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

Remove cudf::strings::findall.
5 participants