-
-
Notifications
You must be signed in to change notification settings - Fork 9.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Change add/isalpha ufuncs to use buffer class & general clean-up #25062
Conversation
c9a1f5a
to
f794aa1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments, overall this looks nice.
return memcmp(buf, other.buf, len); | ||
} | ||
|
||
inline void | ||
buffer_memcpy(void *out, npy_int64 n_chars) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought a little bit about if this API makes any sense for UTF-8 and I think it might be OK.
pypy has a "fastutf8" implementation that they use to support O(1) indexing into UTF-8 strings. It's based around an index
struct: https://github.com/mozillazg/pypy/blob/426d4e15472884bdf109f6745160c3f77a0ea79e/rpython/rlib/fastutf8/src/utf8.c#L59-L63.
I think the UTF-8 version of Buffer
could adapt some of this pypy fast string indexing code so that when a Buffer
is created from a UTF-8 string it also stores an index along with the byte stream. This is a single O(N) operation but then allows fast indexing thereafter and should require a smaller allocation than copying to UCS-4, which also requires an O(N) pass over the UTF-8 bytes.
@ngoldbaum Could you review this again please? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spotted two more minor issues I missed on the first pass. I'm going to go ahead and merge this as-is though to unblock you since these are preexisting issues. Thanks for following up on this.
No description provided.