ENH: Add ufunc for np.char.isalpha #24835
One problem with the current state of this PR that I can't really find a solution to is this: The
$ export PYTHONPATH="/Users/lysnikolaou/repos/python/numpy/build-install/usr/lib/python3.11/site-packages"
🐍 Launching Python with PYTHONPATH="/Users/lysnikolaou/repos/python/numpy/build-install/usr/lib/python3.11/site-packages"
$ /usr/bin/env python -P
Python 3.11.3 (tags/v3.11.3:f3909b8bc8, May 29 2023, 12:51:44) [Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> a = np.array([['abc', 'def'], ['world', ' he']])
>>> np.core.multi
np.core.multiarray np.core.multiply(
>>> np.core.multiarray.isalpha(a)
array([[ True, True],
[ True, False]])
>>> np.char.isalpha(a)
array([[ True, True],
[ True, False]])
However, I'm getting a value error when passing a
$ export PYTHONPATH="/Users/lysnikolaou/repos/python/numpy/build-install/usr/lib/python3.11/site-packages"
🐍 Launching Python with PYTHONPATH="/Users/lysnikolaou/repos/python/numpy/build-install/usr/lib/python3.11/site-packages"
$ /usr/bin/env python -P
Python 3.11.3 (tags/v3.11.3:f3909b8bc8, May 29 2023, 12:51:44) [Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> a = np.char.array([['abc', 'def'], ['world', ' he']])
>>> np.core.multiarray.isalpha(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/lysnikolaou/repos/python/numpy/build-install/usr/lib/python3.11/site-packages/numpy/core/defchararray.py", line 2099, in __array_finalize__
raise ValueError("Can only create a chararray from string data.")
ValueError: Can only create a chararray from string data. It seems that the |
Yes, that should be happening. Unlike a custom function, UFuncs will correctly preserve subclass information. Could probably add something like (untested):
to make it work, I doubt that has any negative side-effects. Maybe |
Is there a way to tell the ufunc to force a specific return type? Because |
You can use EDIT: Or does |
The same is still happening after applying this:
$ git diff
diff --git a/numpy/core/defchararray.py b/numpy/core/defchararray.py
index c3968bc43..2c2c7c955 100644
--- a/numpy/core/defchararray.py
+++ b/numpy/core/defchararray.py
@@ -2093,6 +2093,11 @@ def __new__(subtype, shape, itemsize=1, unicode=False, buffer=None,
         _globalvar = 0
         return self
 
+    def __array_prepare__(self, arr, context=None):
+        if arr.dtype.char in "US":
+            return arr.view(type(self))
+        return arr
+
     def __array_finalize__(self, obj):
         # The b is a special case because it is used for reconstructing.
         if not _globalvar and self.dtype.char not in 'SUbc':
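To make the failure mode in this thread concrete, here is a minimal sketch of why a boolean result cannot be wrapped in a string-only subclass. `StrOnlyArray` is a hypothetical stand-in for `chararray`, not NumPy code, and the `np.asarray` workaround at the end is an assumption for illustration, not the fix adopted in this PR.

```python
import numpy as np


class StrOnlyArray(np.ndarray):
    """Hypothetical subclass that, like chararray, accepts only string dtypes."""

    def __array_finalize__(self, obj):
        # Called whenever an instance is created, including through .view().
        if self.dtype.char not in "SU":
            raise ValueError("Can only hold string data.")


strings = np.array(["abc", " he"]).view(StrOnlyArray)

# A ufunc that preserves the subclass has to wrap its boolean result in the
# same class, which is roughly equivalent to this view and fails the check.
try:
    np.array([True, False]).view(StrOnlyArray)
    error = None
except ValueError as exc:
    error = str(exc)

# Assumed workaround: drop the subclass first so the result is a plain ndarray.
result = np.char.isalpha(np.asarray(strings))
print(error, result)
```

The `__array_prepare__` hook in the diff above takes the other route: it keeps the subclass for string results and hands back the plain array otherwise.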
Nice. I think the |
Similar results:
|
@ngoldbaum Is |
I don't see this when I build stringdtype using numpy's main branch or this PR. Can you open an issue over on the numpy-user-dtypes repo with more detail about what's happening on your setup? Maybe it's just a documentation thing but if it isn't I'd like to know what the problem is. FWIW, this does break |
 * however it may be that this should be moved into `auxdata` eventually,
 * which may also be slightly faster/cleaner (but more involved).
 */
int len = steps[0] / sizeof(character);
I had ignored it, assuming this was a first iteration for testing. Yes, I know it adds some boilerplate, but we should probably just create the ufunc without any loops and then add them the same way as the string equality is done.
Then change the signature of this loop and the above comment actually can make sense.
Right now, you should get crashes for some sliced arrays, e.g. arr = arr[::10].
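A small sketch of why deriving the string length from the loop step breaks on sliced input: the step is the distance between consecutive elements, which equals the element size only for contiguous arrays. The example inspects the same quantities from Python; the array contents are arbitrary, and dividing by 4 assumes NumPy's 4-byte-per-character storage for `U` dtypes.

```python
import numpy as np

arr = np.array(["abc", "de", "f", "ghi"] * 5)  # dtype <U3, 20 elements
sliced = arr[::10]                             # every 10th element, 2 elements

itemsize = sliced.dtype.itemsize  # bytes occupied by one element
stride = sliced.strides[0]        # bytes between consecutive elements

# Unicode strings are stored as 4-byte code units.
chars_from_itemsize = itemsize // 4
chars_from_stride = stride // 4

print(chars_from_itemsize, chars_from_stride)
```

For the contiguous `arr` the two values agree; for `sliced` the stride-based value is ten times too large, so a loop using it would read well past the end of each string.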
@seberg I think I've got something better now. Can you check again and let me know whether I've missed anything else?
3c6d570 to e767771
Generally LGTM now; mainly, I would like the `rstrip` template parameter to be removed from `isalpha`.
I will let Matti take a final quick look when you feel my comments are addressed well enough.
@@ -1137,6 +1144,13 @@ def english_upper(s):
TD(O),
signature='(n?,k),(k,m?)->(n?,m?)',
),
'isalpha' :
Just to make sure: this only shows up in the internal ufunc namespace? (Actually, I guess tests would fail if not.)
That's correct, yes.
docstrings.get('numpy.core.umath.isalpha'),
None,
[TypeDescription(U, EmptyFunctionTypeDescr, U, '?'),
TypeDescription(S, EmptyFunctionTypeDescr, S, '?')],
Could just leave this empty, instead of the `EmptyFunctionTypeDescr`? If not, fine, but maybe add a comment to `EmptyFunctionTypeDescr` to explain what it is used for.
Yes, this ensures it shows up in `.types`, and it may actually be a reasonable way to ensure that in practice. But TBH, it is half-private anyway, so maybe we can just ignore it for now.
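For readers unfamiliar with `.types`: it is the ufunc attribute that lists the registered type signatures as strings. The sketch below uses `np.add` and `np.isfinite` as stand-ins, because the string ufunc from this PR may not exist in the reader's NumPy version.

```python
import numpy as np

# Each entry has the form "<input type chars>-><output type chars>".
print(np.add.types[:3])

# '?' is the type character for bool, so these loops return booleans,
# just like the '?' output in the TypeDescription entries above.
bool_loops = [t for t in np.isfinite.types if t.endswith("->?")]
print(bool_loops[:3])
```

A ufunc created without any type descriptions and given its loops later would not get entries in this list from the generator, which is the trade-off discussed here.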
This wasn't possible before because of a syntax error on empty arrays under MSVC, but I wrote some code so that it's okay.
I'm glad it was relatively straightforward to add support for string ufuncs besides the logic operators, looking forward to improved string operator performance! 🚀 |
A few comments, and some documentation nits.
return -1;
}}
f = PyUFunc_FromFuncAndDataAndSignatureAndIdentity(
NULL, NULL, NULL, {nloops},
Only this line is different between the two stanzas. It should be possible to refactor this
- to select on an attribute of `uf` rather than have a separate `empty` list; passing it around seems awkward. Something like what I suggested above?
- extra points for tweaking `fmt` for an empty `uf` rather than rewriting the whole thing
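One way the suggested refactor could look, as a rough sketch only: a single template whose first argument group is chosen from an attribute of the ufunc description. `UfuncSpec`, `TEMPLATE`, and `render` are hypothetical names, and the template is a simplified stand-in for the generator's real format string.

```python
from dataclasses import dataclass, field


# Hypothetical stand-in for the generator's ufunc description.
@dataclass
class UfuncSpec:
    nin: int
    nout: int
    # Empty list: no generated loops, they are registered later from C.
    loop_names: list = field(default_factory=list)


# One template for both cases; only the first argument group changes.
TEMPLATE = (
    "f = PyUFunc_FromFuncAndDataAndSignatureAndIdentity(\n"
    "    {funcs}, {nloops},\n"
    "    {nin}, {nout}, ...);\n"
)


def render(name, spec):
    """Emit the creation call, using NULL tables when there are no loops."""
    if spec.loop_names:
        funcs = f"{name}_functions, {name}_data, {name}_signatures"
    else:
        funcs = "NULL, NULL, NULL"
    return TEMPLATE.format(
        funcs=funcs, nloops=len(spec.loop_names), nin=spec.nin, nout=spec.nout
    )


print(render("isalpha", UfuncSpec(nin=1, nout=1)))
print(render("add", UfuncSpec(nin=2, nout=1, loop_names=["dd->d", "ff->f"])))
```

Deciding from the description itself avoids threading a separate list of empty ufuncs through the generator.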
Thanks @lysnikolaou |
@charris This was the first one of the PRs adding ufuncs for |
While it seems nice that these are backportable, I don't really think they should be backported for a 1.26.x bug-fix release. These are quite large changes, and chances are some behavior is changed. |
Agree with @seberg here, we don't usually add new features in patch releases. |
Got it, thanks both for your input! |