[REVIEW] Add filter_characters_of_type strings API #5666

Merged
16 commits merged into rapidsai:branch-0.15 from fea-filter-non-alphanum on Jul 15, 2020


davidwendt (Contributor)

Closes #5520

Create a new strings API that filters characters out of strings based on their character types.
It also accepts an optional replacement character or string to substitute for each removed character.
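The behavior described above (drop characters whose type matches, optionally substituting a replacement string) can be illustrated with a minimal pure-Python sketch. It is not the libcudf or cudf implementation: the `CharType` flags, the `filter_characters_of_type` function here, and its `types_to_remove`, `types_to_keep`, and `replacement` parameters are hypothetical names chosen for illustration. Character classes come from Python's built-in `str.isalpha`, `str.isdigit`, and related predicates rather than libcudf's own character tables.

```python
from enum import Flag, auto
from typing import List, Optional


class CharType(Flag):
    """Hypothetical character-type flags, combinable with | (not the libcudf enum)."""
    ALPHA = auto()
    DIGIT = auto()
    SPACE = auto()
    UPPER = auto()
    LOWER = auto()


NONE = CharType(0)  # empty set of types


def char_types(ch: str) -> CharType:
    """Classify one character using Python's built-in str predicates."""
    types = NONE
    if ch.isalpha():
        types |= CharType.ALPHA
    if ch.isdigit():
        types |= CharType.DIGIT
    if ch.isspace():
        types |= CharType.SPACE
    if ch.isupper():
        types |= CharType.UPPER
    if ch.islower():
        types |= CharType.LOWER
    return types


def filter_characters_of_type(
    strings: List[Optional[str]],
    types_to_remove: CharType = NONE,
    replacement: str = "",
    types_to_keep: CharType = NONE,
) -> List[Optional[str]]:
    """Drop (or replace) characters by type; None entries (nulls) pass through.

    A character is removed if it matches any flag in types_to_remove, or if
    types_to_keep is non-empty and the character matches none of its flags.
    """
    result: List[Optional[str]] = []
    for s in strings:
        if s is None:
            result.append(None)  # preserve null rows unchanged
            continue
        pieces = []
        for ch in s:
            t = char_types(ch)
            remove = bool(t & types_to_remove) or (
                bool(types_to_keep) and not (t & types_to_keep)
            )
            pieces.append(replacement if remove else ch)
        result.append("".join(pieces))
    return result


# Keep only alphanumeric characters (the "non-alphanumeric filtering" use case).
print(filter_characters_of_type(["a-b c!", None, "x1_y2"],
                                types_to_keep=CharType.ALPHA | CharType.DIGIT))
# ['abc', None, 'x1y2']

# Replace every digit with a replacement string instead of dropping it.
print(filter_characters_of_type(["abc123"], types_to_remove=CharType.DIGIT,
                                replacement="#"))
# ['abc###']
```

Using combinable flags lets a single call target several character classes at once, and the keep-mode variant expresses "remove everything that is not alphanumeric" without enumerating every punctuation or whitespace class.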

davidwendt self-assigned this Jul 9, 2020
davidwendt added the labels 2 - In Progress, libcudf, and strings Jul 9, 2020
codecov bot commented Jul 9, 2020

Codecov Report

Merging #5666 into branch-0.15 will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@             Coverage Diff              @@
##           branch-0.15    #5666   +/-   ##
============================================
  Coverage        86.25%   86.25%           
============================================
  Files               72       72           
  Lines            12672    12676    +4     
============================================
+ Hits             10930    10934    +4     
  Misses            1742     1742           
Impacted Files | Coverage Δ
python/cudf/cudf/core/column/string.py | 86.57% <100.00%> (+0.07%) ⬆️


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update dad22eb...09978c8.

davidwendt marked this pull request as ready for review July 13, 2020 16:48
davidwendt requested review from a team as code owners July 13, 2020 16:48
davidwendt changed the title from [WIP] Add filter_characters_of_type strings API to [REVIEW] Add filter_characters_of_type strings API Jul 13, 2020
davidwendt added the 3 - Ready for Review label and removed the 2 - In Progress label Jul 13, 2020
Review threads:
python/cudf/cudf/core/column/string.py (resolved)
python/cudf/cudf/tests/test_string.py (resolved)
cpp/include/cudf/strings/char_types/char_types.hpp (outdated, resolved)
cpp/include/cudf/strings/char_types/char_types.hpp (outdated, resolved)
cpp/src/strings/char_types/char_types.cu (outdated, resolved)
cpp/src/strings/char_types/char_types.cu (resolved)
cwharris (Contributor)

Do we want to include benchmarks for strings features?

davidwendt (Contributor, Author)

> Do we want to include benchmarks for strings features?

Argh. I keep forgetting about the benchmarks. I need to create an issue to add these.

davidwendt requested a review from cwharris July 14, 2020 22:26
davidwendt merged commit c6f24a8 into rapidsai:branch-0.15 Jul 15, 2020
davidwendt deleted the fea-filter-non-alphanum branch July 15, 2020 12:16
Labels: 3 - Ready for Review, libcudf, strings
Projects: None yet
Development: successfully merging this pull request may close these issues:
[FEA] Filtering non-alphanumeric characters
4 participants