Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Change add/isalpha ufuncs to use buffer class & general clean-up #25062

Merged
merged 2 commits into from
Nov 7, 2023

Conversation

lysnikolaou
Copy link
Contributor

No description provided.

Copy link
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments, overall this looks nice.

numpy/_core/src/umath/string_buffer.h Show resolved Hide resolved
return memcmp(buf, other.buf, len);
}

inline void
buffer_memcpy(void *out, npy_int64 n_chars)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought a little bit about if this API makes any sense for UTF-8 and I think it might be OK.

pypy has a "fastutf8" implementation that they use to support O(1) indexing into UTF-8 strings. It's based around an index struct: https://github.com/mozillazg/pypy/blob/426d4e15472884bdf109f6745160c3f77a0ea79e/rpython/rlib/fastutf8/src/utf8.c#L59-L63.

I think the UTF-8 version of Buffer could adapt some of this pypy fast string indexing code so that when a Buffer is created from a UTF-8 string it also stores an index along with the byte stream. This is a single O(N) operation but then allows fast indexing thereafter and should require a smaller allocation than copying to UCS-4, which also requires an O(N) pass over the UTF-8 bytes.

numpy/_core/src/umath/string_ufuncs.cpp Outdated Show resolved Hide resolved
@lysnikolaou
Copy link
Contributor Author

@ngoldbaum Could you review this again please?

Copy link
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spotted two more minor issues I missed on the first pass. I'm going to go ahead and merge this as-is though to unblock you since these are preexisting issues. Thanks for following up on this.

numpy/_core/src/umath/string_buffer.h Show resolved Hide resolved
numpy/_core/src/umath/string_buffer.h Show resolved Hide resolved
@ngoldbaum ngoldbaum merged commit c91f775 into numpy:main Nov 7, 2023
59 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants