Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add index/rindex ufuncs for unicode and bytes dtypes #25775

Merged
merged 8 commits into from
Feb 15, 2024

Conversation

lysnikolaou
Copy link
Contributor

No description provided.

Copy link
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks good to me! Once #25347 is merged I will push commits to this PR adding loops for stringdtype and then this can be merged.

@ngoldbaum
Copy link
Member

Looking at this again, I think it would be better to refactor the loops for find and rfind so we can reuse that code, since this is almost the same. Take a look at my last push that does this for the stringdtype find and rfind loops. Also I rebased on main to get stringdtype in this branch, so you'll need to pull from github on your local branch. I probably should have merged but I wasn't thinking, sorry!

@ngoldbaum
Copy link
Member

Also maybe some unicode characters in the tests too? The stringdtype tests have some so not a biggie.

Copy link
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @mhvk would you like to take a look at this as well?

numpy/_core/code_generators/generate_umath.py Show resolved Hide resolved
numpy/_core/code_generators/ufunc_docstrings.py Outdated Show resolved Hide resolved
Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment in-line.

Two general comments:

  1. About the implementation, I think using a template implies these pieces of code get compiled to two different units, i.e., essentially doubling the compiled code, while if we passed in raise_error through aux_data it would not (for a minimal decrease in speed, since string_find itself is much slower than checking a bool).
  2. Do we really need this? It was there for np.char, possibly just because str has it. But do we care enough about failing fast? Or can we just use a python wrapper that checks whether any of the outputs is -1? Given that it is a simple addition, perhaps no worry, but I do wonder what the use case is for an array of strings...

numpy/_core/src/umath/string_ufuncs.cpp Outdated Show resolved Hide resolved
@lysnikolaou
Copy link
Contributor Author

About the implementation, I think using a template implies these pieces of code get compiled to two different units, i.e., essentially doubling the compiled code, while if we passed in raise_error through aux_data it would not (for a minimal decrease in speed, since string_find itself is much slower than checking a bool).

That's true. I'll try this out as soon as your PR that does this for stringdtype gets merged.

Do we really need this? It was there for np.char, possibly just because str has it. But do we care enough about failing fast? Or can we just use a python wrapper that checks whether any of the outputs is -1? Given that it is a simple addition, perhaps no worry, but I do wonder what the use case is for an array of strings...

That's a good point, although this matches the CPython API with a ufunc for very little effort, so I'm wondering if it'd hurt either way.

@ngoldbaum
Copy link
Member

Marten do you mind if we merge this now so we get fast ufuncs for these operations in the NumPy 2.0 RC? We can do the refactor to use auxdata in a followup.

@mhvk
Copy link
Contributor

mhvk commented Feb 13, 2024

I'd prefer to try to get this right in one go, which would mean merging #25796 first and then rebasing this one and use that framework. This of course partially is laziness, as doing this one first means quite a bit more work for me and I really should spend a bit more time on my astronomy job... (though admittedly the main work was finding out how to pass along method function pointers).

On the other hand, I see the need for getting things in. How about this, we ask @seberg if he's happy with the changes to the dtype API in #25796, so that that one can indeed be merged? And if he'd rather take more time, we put this one in first.

@lysnikolaou
Copy link
Contributor Author

I adjusted this to the new static_data approach, so it's ready for another review.

@lysnikolaou
Copy link
Contributor Author

I'm not sure what to make of the benchmark CI failure. I skimmed through the logs and didn't see something that might be related to this PR, but I might be wrong.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Neither comment needs to be addressed (the first I think is actually better not to address , but I left it in for you to have a look), so I'll approve now

numpy/_core/src/umath/string_ufuncs.cpp Outdated Show resolved Hide resolved
numpy/_core/tests/test_strings.py Outdated Show resolved Hide resolved
@ngoldbaum
Copy link
Member

I'll merge this in a little while assuming the tests pass after the last fixup commit for the docstrings I just pushed.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the returning -2 - that seems much cleaner! I think that captures everything, so approving. Thanks!

@ngoldbaum ngoldbaum merged commit 74893ab into numpy:main Feb 15, 2024
62 of 63 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants