Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add rest of unary ufuncs for unicode/bytes dtypes #25791

Merged
merged 11 commits into from
Feb 17, 2024

Conversation

lysnikolaou
Copy link
Contributor

No description provided.

@lysnikolaou lysnikolaou changed the title ENH: Add islower/isupper/istitle ufuncs for unicode/bytes dtypes ENH: Add rest of unary ufuncs for unicode/bytes dtypes Feb 13, 2024
@ngoldbaum
Copy link
Member

Refactoring all the unary loops to call a single helper makes sense to me, looks like a nice cleanup which I will do in stringdtype after this.

@lysnikolaou
Copy link
Contributor Author

lysnikolaou commented Feb 14, 2024

Refactoring all the unary loops to call a single helper makes sense to me, looks like a nice cleanup which I will do in stringdtype after this.

Well, that was your design in stringdtype, which I kinda replicated. So you've done this already here!

@lysnikolaou
Copy link
Contributor Author

@ngoldbaum I did a minor refactoring of the test classes/methods, but I'm not sure if that's better for you or not. Could you please have a look?

Copy link
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests look good now, thanks for doing that refactor! I suspect a large fraction of the xfails in numpy can be safely removed and probably should have been specified with strict=True so I'm trying to make sure that happens going forward when it's possible (e.g. a failure that only happens on some architectures can't be strict).

@ngoldbaum
Copy link
Member

OK, I'll go ahead and add implementations for stringdtype in this PR, I think that will be pretty straightforward.

@ngoldbaum
Copy link
Member

Nice, did it in 20 minutes :)

@seberg or @mhvk since you've been reviewing string stuff, care to take a look at this?

@mhvk
Copy link
Contributor

mhvk commented Feb 14, 2024

I had a quick look earlier and it seemed a case where #25796 will make the PR a lot smaller. So, I'd prefer to review after that...

Independently, it really speaks for the whole approach that it is now so relatively easy to add loops for all the string types. Kudos!

@lysnikolaou
Copy link
Contributor Author

FYI I'll reimplement the unary ufuncs with the new approach after #25775 has been merged.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lysnikolaou - I think there still is a large gain to be made by not generating separate inner loops for each of the boolean query functions but rather passing in the method that should be called - see inline comment.

numpy/_core/src/umath/string_ufuncs.cpp Outdated Show resolved Hide resolved
@mhvk
Copy link
Contributor

mhvk commented Feb 15, 2024

Hmm, looks like I was reviewing the wrong PR, misreading that you had already changed it, sorry!!

@lysnikolaou
Copy link
Contributor Author

This is ready for review now.

@ngoldbaum
Copy link
Member

It looks like the windows pypy failures are real, let me know if you want a hand debugging that.

@lysnikolaou
Copy link
Contributor Author

It looks like the windows pypy failures are real, let me know if you want a hand debugging that.

Oh, I didn't see that. If you have access to a Windows machine and can spare some cycles on this, it'd really help me out, cause I don't have access to Windows right now (we'd have to wait for a bit until I do).

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Sadly, no insight in the pypy failures, so clearly I am missing something but I'll approve regardless for the pieces that I do understand...

numpy/_core/src/umath/string_buffer.h Outdated Show resolved Hide resolved
numpy/_core/src/umath/string_ufuncs.cpp Show resolved Hide resolved
};

for (int i = 0; i < 9; i++) {
if (i < 7) { // isdecimal & isnumeric do not support ASCII
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really mind either way, but is there a reason that isdecimal and isnumeric are not supported?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the python3 APIs we're implementing are unicode-aware:

https://docs.python.org/3/library/stdtypes.html#str.isdecimal
https://docs.python.org/3/library/stdtypes.html#str.isnumeric

Thankfully they're not locale-aware, that would be way worse.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess in principle one could support it right? I'm totally fine not doing so, though!

@ngoldbaum
Copy link
Member

It turns out to be a pypy bug - probably OK to just xfail the tests on pypy (unless @mattip would prefer something else?).

@ngoldbaum
Copy link
Member

pypy/pypy#4901

@lysnikolaou
Copy link
Contributor Author

Thanks for looking into this @ngoldbaum! I've xfail'd the tests.

@lysnikolaou
Copy link
Contributor Author

The two test failures seem unrelated now.

@ngoldbaum
Copy link
Member

Darn, not sure what's up with that sanitizers heisenbug. I hoped that was fixed by #25834 but apparently not. Restarted the failed test jobs.

@ngoldbaum
Copy link
Member

Passing now. @mhvk looked this over and approved, so let’s bring this one in.

@ngoldbaum ngoldbaum merged commit 9a73ab2 into numpy:main Feb 17, 2024
64 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants