BUG/REG: Possible bug/regression in `np.vectorize` with strings #26269

seberg · 2024-04-12T14:16:00Z

Copying from pydata/xarray#8932 (comment) just to at least check. I am not sure there is too much to be done here if the string length changes, to be honest (and we have better strings now).
It would seem that np.vectorize is probably just fundamentally tricky around it (i.e. why NumPy uses vectorize with object output and then casts the result, at least unless the result length is obvious).

On the other hand, I would feel a bit safer knowing why it changed, and I don't know right now:

Copying from @keewis, there:

The pure numpy reproducer is (which was kinda what @kmuehlbauer described, but now you can try yourself):

import numpy as np

arr = np.array(["SOme wOrd Ǆ ß ᾛ ΣΣ ﬃ⁵Å Ç Ⅰ"]).astype(np.str_)
f = str.casefold
print(np.vectorize(f, otypes=[arr.dtype])(arr))

`str.casefold` returns a string that's four characters longer. With numpy<2 the dtype would be extended to fit the string (`<U30`), while with numpy>=2 the old dtype is used, truncating the string.

str.capitalize does not change the length of the string, which is why that doesn't fail.

FWIW, the new string dtype doesn't have that issue, and numpy.strings has a lot of the vectorized functions we've been creating using np.vectorize (i.e. those will most likely be much faster).
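For reference, the "object output, then cast" approach mentioned above can be sketched as follows. This is only a minimal illustration, assuming the same input as the reproducer; the variable names `folded_obj` and `folded` are made up here. Because the intermediate result is an object array, no fixed width is imposed until the final cast, which picks a width from the actual strings.

```python
import numpy as np

arr = np.array(["SOme wOrd Ǆ ß ᾛ ΣΣ ﬃ⁵Å Ç Ⅰ"]).astype(np.str_)

# Step 1: collect the results as Python objects, so no fixed string
# width is applied to the output while the function is being mapped.
folded_obj = np.vectorize(str.casefold, otypes=[object])(arr)

# Step 2: cast to a fixed-width unicode dtype; NumPy chooses the width
# from the longest string actually present in the object array.
folded = folded_obj.astype(np.str_)

print(arr.dtype, "->", folded.dtype)
print(folded)
```

The trade-off is an extra pass over the data and a temporary object array, which is usually acceptable when the function is a Python-level callable anyway.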


ngoldbaum · 2024-04-12T14:19:26Z

Maybe because of #26136? Although looking at that now I don't think it was backported. If that's the reason, could add an exception there if dtype.kind is S or U.
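To make the suggestion concrete, here is a rough sketch of what an exception for `S`/`U` kinds could look like. The helper name `unsized_if_string` is hypothetical and this is not taken from the NumPy source; it only illustrates the idea of dropping the fixed width from string dtypes so that the output length can be discovered from the results, while leaving every other dtype untouched.

```python
import numpy as np


def unsized_if_string(dtype):
    """Hypothetical helper: strip the width from fixed-width string dtypes.

    For bytes ("S") and unicode ("U") dtypes, return the unsized dtype so
    that a later array conversion can pick a width that fits the data.
    Any other dtype is returned unchanged.
    """
    dtype = np.dtype(dtype)
    if dtype.kind in "SU":
        # np.dtype("U") / np.dtype("S") carry no fixed length (itemsize 0).
        return np.dtype(dtype.char)
    return dtype


print(unsized_if_string("<U26"))    # unsized unicode dtype
print(unsized_if_string("S10"))     # unsized bytes dtype
print(unsized_if_string("float64")) # unchanged
```

Whether a user-requested width should be ignored this way is a design decision; the sketch assumes that avoiding silent truncation is preferred.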

seberg · 2024-04-12T14:29:41Z

Aha, that seems likely, the time frame of 2-3 weeks had come up! I am not sure that the CI run in question isn't using the nightlies (@keewis you probably know?).

But in that case, it is also good to know that we don't have to worry about it for 2.0.

keewis · 2024-04-12T16:12:33Z

we're indeed using the nightlies from scientific-python-nightly-wheels (not the release candidate), so yes, that might be it.

Edit: do you know what the upload frequency of that is? Once per week?
Edit2: in any case, thanks a lot for the very quick fix, @seberg, @ngoldbaum

seberg · 2024-04-12T16:52:14Z

> Edit: do you know what the upload frequency of that is? Once per week?

Twice a week: on Wednesday and Sunday (although, now that 2.0 is out, I guess we could reduce it again to once a week), and occasionally we trigger it manually to unblock something or push a change out quickly.

ngoldbaum mentioned this issue Apr 12, 2024

BUG: ensure np.vectorize doesn't truncate fixed-width strings #26270

Merged

charris closed this as completed in #26270 Apr 12, 2024

charris mentioned this issue Apr 12, 2024

BUG: Fixes for np.vectorize. #26272

Merged
