Support string indexing in strings_udf #12127

Draft
wants to merge 5 commits into
base: branch-23.02

brandon-b-miller
Contributor

This PR adds support for the following operator in strings_udf:

  • st[idx]

Part of #9639
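
As a rough illustration of the kind of UDF this is meant to enable, here is a minimal sketch in plain Python. The function name `first_char` and the sample data are hypothetical, and the list comprehension only stands in for a row-wise apply; it does not exercise strings_udf itself.

```python
def first_char(st):
    # Index into the string with st[idx]; here idx is the constant 0.
    return st[0]


# Hypothetical sample data; the comprehension stands in for a row-wise apply.
words = ["cudf", "numba", "udf"]
result = [first_char(w) for w in words]
print(result)
```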

@brandon-b-miller added the feature request, 2 - In Progress, numba, Python, and non-breaking labels Nov 10, 2022
@brandon-b-miller self-assigned this Nov 10, 2022
@brandon-b-miller
Contributor Author

Something I didn't fully consider when first mocking this up is that, especially with string indexing, the error cases are really important. In Python, for instance, we get an `IndexError` (string index out of range) for indices beyond the length of the string. A UDF in pandas encountering this would just throw during the loop over the data, but in our case we're executing the equivalent code at the thread level and can't raise in the same way.
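
To make the Python-side behavior concrete, here is a small plain-Python sketch. The function `third_char` and the sample data are made up for illustration, and the loop only approximates what a row-wise apply does on the host.

```python
def third_char(st):
    # Raises IndexError when the string has fewer than three characters.
    return st[2]


data = ["abc", "de", "fgh"]

results = []
error = None
try:
    # A host-side loop stops at the first row that fails.
    for s in data:
        results.append(third_char(s))
except IndexError as exc:
    error = exc

print(results)
print(repr(error))
```

The loop never reaches the third row: the failure on "de" surfaces immediately, which is the behavior a per-thread GPU kernel cannot reproduce directly.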

I think this could be a little dangerous because it could hide logic errors in the UDF that pandas highlights by failing. The user could have their UDF compile and run and just get garbage values for strings that would have thrown in Python.

We could maybe fix this using a similar approach to what we started discussing in #8774. Until then we're going to be missing some important functionality here.
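
One possible shape for such a fix, sketched in plain Python under the assumption that each thread can write a status code to a side buffer that the host checks after the launch. This is only an illustration and not necessarily the approach discussed in #8774; `checked_index`, `run_udf`, and the status codes are all hypothetical names.

```python
# Hypothetical per-row status codes.
OK, INDEX_ERROR = 0, 1


def checked_index(st, idx):
    """Return (char, status) instead of raising, as device code would have to."""
    if -len(st) <= idx < len(st):
        return st[idx], OK
    return "", INDEX_ERROR


def run_udf(data, idx):
    out = [None] * len(data)
    status = [OK] * len(data)
    # Each loop iteration stands in for one GPU thread handling one row.
    for row, st in enumerate(data):
        out[row], status[row] = checked_index(st, idx)
    # Host-side check after the simulated kernel finishes.
    bad = [row for row, code in enumerate(status) if code == INDEX_ERROR]
    if bad:
        raise IndexError(f"string index out of range in rows {bad}")
    return out
```

With this pattern, `run_udf(["abc", "def"], 2)` returns the indexed characters, while `run_udf(["abc", "de"], 2)` raises on the host instead of silently returning a placeholder for the short row.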

@brandon-b-miller changed the base branch from branch-22.12 to branch-23.02 November 18, 2022 20:16