New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG/REG: Possible bug/regression in np.vectorize
with strings
#26269
Comments
Maybe because of #26136? Although looking at that now I don't think it was backported. If that's the reason, could add an exception there if |
Aha, that seems likely, the time frame of 2-3 week had come up! I am not sure that the CI run in question isn't using the nightlies (@keewis you probably know?). But in that case, it is also good to know that we don't have to worry about it for 2.0. |
we're indeed using the nightlies from scientific-python-nightly-wheels (not the release candidate), so yes, that might be it. Edit: do you know what the upload frequency of that is? Once per week? |
Twice a week: on Wednesday and Sunday (although, now that 2.0 is out, I guess we could reduce it again to once a week), and occasionally we trigger it manually to unblock something or push a change out quickly. |
Copying from pydata/xarray#8932 (comment) just to at least check. I am not sure there is too much to be done here if the string length changes, to be honest (and we have better strings now).
It would seem that
np.vectorize
is probably just fundamentally tricky around it (i.e. why NumPy uses vectorize with object output and then casts the result, at least unless the result length is obvious).On the other hand, I would feel a bit safer knowing why it changed, and I don' tknow right now:
Coping from @keewis, there:
The pure numpy reproducer is (which was kinda what @kmuehlbauer described, but now you can try yourself):
str.casefold
returns a string that's four characters longer, and before numpy<2 the dtype would be extended to fit the string (<30U) while with numpy>=2 the old dtype would be used, truncating the string.str.capitalize
does not change the length of the string, which is why that doesn't fail.FWIW, the new string dtype doesn't have that issue, and numpy.strings has a lot of the vectorized functions we've been creating using np.vectorize (i.e. those will most likely be much faster).
The text was updated successfully, but these errors were encountered: