FEAT: Adding array casting support to fixed length strings #225

SwayamInSync · 2025-11-13T11:22:47Z

This PR introduces casting support for np.str_ and str (I will work on bytes after this gets merged).

juntyr · 2025-11-13T12:15:52Z

Slightly off topic - what's the current timeline for the next release?

SwayamInSync · 2025-11-13T12:19:52Z

> Slightly off topic - what's the current timeline for the next release?

I got a bit busy with new things and my job. NumPy 2.4 is about to be out, so the release will probably happen around the same time.
Hopefully all these major integrations will be completed by then (with the upcoming bytes support PR, quaddtype should be good enough to pass NumPy's test_longdouble.py).

ngoldbaum

Just a few comments. I didn't carefully review the reference counting.

ngoldbaum · 2025-11-13T19:22:30Z

quaddtype/numpy_quaddtype/src/casts.cpp

+                PyErr_Format(PyExc_ValueError,
+                             "Cannot cast non-ASCII character '%c' to QuadPrecision", c);
+                return -1;
+            }


won't the string-conversion functions called by cstring_to_quad catch this?

By the way, you might take inspiration from the string_to_long_double function in numpy:

https://github.com/numpy/numpy/blob/fabf1844667c81344ec0d0ab4958cde9cb9a845c/numpy/_core/src/multiarray/arraytypes.c.src#L533-L595

You may also want to simply copy/paste and vendor the code in NumPyOS_ascii_strtold into this project to get a parser that acts exactly like NumPy's.
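For readers following the hunk above, here is a minimal Python sketch of the kind of check being discussed: walking the UCS4 code units of a fixed-width string and rejecting anything outside ASCII before handing the text to a C-string parser. The function name `ucs4_to_ascii` is hypothetical and this is not the code in casts.cpp; it only illustrates the idea.

```python
def ucs4_to_ascii(code_units):
    """Convert UCS4 code units to ASCII bytes, rejecting non-ASCII input.

    Hypothetical helper for illustration only. NumPy fixed-width strings
    are NUL-padded, so the first NUL ends the string.
    """
    out = bytearray()
    for c in code_units:
        if c == 0:
            break  # NUL padding marks the end of the fixed-width string
        if c > 127:
            raise ValueError(
                f"Cannot cast non-ASCII character U+{c:04X} to QuadPrecision"
            )
        out.append(c)
    return bytes(out)


ascii_text = ucs4_to_ascii([0x31, 0x2E, 0x35, 0, 0])  # "1.5" plus padding
```

Whether such a pre-check is needed at all depends on what the downstream parser does with unexpected bytes, which is the question raised in the review comment.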

Hmm, reading the code, there is not much difference. We work directly with UCS4, whereas NumPy first converts it to UTF-8 (we can do this too). Special-case handling, whitespace, etc. are already handled by Sleef_strtoq and strtold. The exception is locale handling, which we might not be able to do because:

- SLEEF's conversion function does not support it
- strtold_l (used by NumPyOS_ascii_strtold) is not standard C (it is a POSIX extension)

Also, I just realised that NumPy's longdouble and float64 string-to-dtype conversions are inconsistent:

    a = np.array(['1.0000000000000002', " 1 "], dtype=np.str_)
    print(repr(a))
    print(a.astype(QuadPrecDType()))
    print(a.astype(np.float64))
    print(a.astype(np.longdouble))

    # output
    array(['1.0000000000000002', ' 1 '], dtype='<U18')
    [1. 1.]
    [1. 1.]
    Traceback (most recent call last):
      File "/home/OSS/numpy-user-dtypes/main.py", line 64, in <module>
        print(a.astype(np.longdouble))
              ^^^^^^^^^^^^^^^^^^^^^^^
    ValueError: invalid literal for long double: 1

longdouble does not handle trailing whitespace (in quaddtype I handle it by setting require_full_parse to false).

https://github.com/numpy/numpy/blob/fabf1844667c81344ec0d0ab4958cde9cb9a845c/numpy/_core/src/multiarray/arraytypes.c.src#L579

This line is the issue.

We can fix it. loadtxt strips whitespace anyway (unless you quote); np.fromtxt probably chokes on it, although I wouldn't recommend it over loadtxt anyway.

So yeah, probably reasonable to fix, but also seems very low priority to me.
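The difference discussed in this thread comes down to what a parser does with the characters left over after the numeric prefix. The sketch below models that in plain Python: a strtod-style prefix parse returns the value and the end position, and a flag decides whether trailing whitespace is tolerated. The names `parse_float_prefix`, `parse_number`, and `allow_trailing_whitespace` are hypothetical; this is neither NumPy's nor quaddtype's implementation, and it uses Python floats as a stand-in for the wider types.

```python
import re

# Leading whitespace, optional sign, then a decimal literal, inf, or nan.
_FLOAT_PREFIX = re.compile(
    r"[ \t\n\r\f\v]*[+-]?"
    r"(?:\d+\.?\d*(?:e[+-]?\d+)?|\.\d+(?:e[+-]?\d+)?|inf(?:inity)?|nan)",
    re.IGNORECASE,
)
_WHITESPACE = " \t\n\r\f\v"


def parse_float_prefix(text):
    """Parse a numeric prefix and return (value, end), like strtod's endptr."""
    match = _FLOAT_PREFIX.match(text)
    if match is None:
        raise ValueError(f"invalid literal: {text!r}")
    return float(match.group(0)), match.end()


def parse_number(text, allow_trailing_whitespace):
    """Parse text as a number, optionally accepting trailing whitespace."""
    value, end = parse_float_prefix(text)
    rest = text[end:]
    if allow_trailing_whitespace:
        rest = rest.strip(_WHITESPACE)
    if rest:
        # Unparsed characters remain after the numeric prefix.
        raise ValueError(f"invalid literal: {text!r}")
    return value


lenient = parse_number(" 1 ", allow_trailing_whitespace=True)
```

With `allow_trailing_whitespace=False`, the same input `" 1 "` raises `ValueError`, which mirrors the longdouble behaviour reported above.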

quaddtype/numpy_quaddtype/src/casts.cpp

SwayamInSync · 2025-11-15T18:38:47Z

The overall diff became quite big, but most of the code is adapted from NumPy with slightly changed logic, plus additional tests. So casts.cpp is probably the only file that needs a focused review.
From my side, I also checked the reference counting, and there does not seem to be any memory leak.

I could also move the casting helper functions into a utility file (to cut the diff), but as of now no other file seems to need them.

ngoldbaum

I skimmed this - unfortunately I don't have time to give this a low-level code review.

Overall, this looks awesome. I think clearly indicating when stuff is copied from NumPy is fine.

Maybe @seberg wants to look this over too?

SwayamInSync · 2025-11-18T16:43:45Z

Cool. Also, independent of this work, @ngoldbaum @seberg, should we also patch the longdouble inconsistency shown in #225 (comment) within this thread?

seberg

I didn't have a very close look, but overall this looks reasonable to me. There is a small issue with byte-swapped unicode string inputs, but that is rather niche (and will fail obviously anyway).

seberg · 2025-11-19T09:35:13Z

quaddtype/numpy_quaddtype/src/casts.cpp

+                                    npy_intp *view_offset)
+{
+    Py_INCREF(given_descrs[0]);
+    loop_descrs[0] = given_descrs[0];


This really needs a canonical/ISNBO call, since you presumably do not support byte-swapping.

But, I have to ask: I guess NumPy forces you to implement this? Maybe we should change this in some form, even passing the slot as NULL explicitly to mean: Just use the default already.

Because this smells like the default would work fine so that this is all just a bunch of boilerplate...

Setting the slot to NULL gives a segfault at runtime. Also, in the future (after the version 1 release) I was thinking of supporting byte-swapping.

Yeah, you can't set it explicitly to NULL right now. But omitting it doesn't work, I guess? I.e. NumPy rejects it because your dtype is parametric or so?!

If you want to support non-native byte order, then yes, the default won't work for you anyway. I was thinking of filling in the default if the slot is NULL, but not erroring in case omitting it is rejected by NumPy. (I.e. explicitly tell NumPy: I know the default promoter is fine, even if you think it may not be.)

I don't think byte-swapping support is really worthwhile to bother here, but sure, if you want to add it you'll need it anyway, I guess.

seberg · 2025-11-19T09:38:00Z

quaddtype/numpy_quaddtype/src/casts.cpp

+}
+
+static int
+unicode_to_quad_strided_loop_unaligned(PyArrayMethod_Context *context, char *const data[],


Btw you don't have to implement an unaligned version if you don't want to. NumPy will just make another cast/copy for you.

Oh, maybe that's why in my earlier experiments I was almost never able to hit the unaligned loop. I thought my system might just handle unaligned access well and that these loops were only required for older systems.

Also @seberg, is this always true? In that case, can I remove all the added unaligned loops but still keep the NPY_METH_SUPPORTS_UNALIGNED flag?

You must support unaligned for the quaddtype <-> quaddtype conversions, but not for any others (because for any other, NumPy can chain them to ensure alignment).

For most systems these days unaligned access just works (although it may be a bit slower, IIRC). But, UBSAN or so likes to shout at you anyway and also it is not true for all systems.
If you know that it is irrelevant for a system, you could probably only write the unaligned version and the compiler will be smart enough to optimize the extra memcpy's away (the other way works, but UBSAN...).

Either way, there is no big need to set NPY_METH_SUPPORTS_UNALIGNED, only for the "copy" one I enforced it in NumPy to ensure that unaligned arrays are guaranteed to work.
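The "unaligned version" pattern mentioned above (copy the bytes into a properly aligned temporary, then interpret the copy) can be illustrated in Python with the `struct` module. This is only a conceptual sketch: it uses an 8-byte double as a stand-in for a 16-byte quad, and `read_double_unaligned` is a hypothetical name, not part of NumPy or quaddtype.

```python
import struct


def read_double_unaligned(buf, offset):
    """Read a native-endian double that may start at any byte offset.

    The 8 bytes are copied out first (the analogue of memcpy into a
    local variable in C), and only the copy is interpreted as a double.
    """
    tmp = bytes(buf[offset:offset + 8])
    return struct.unpack("=d", tmp)[0]


# Place a double at odd offset 1, so it is not 8-byte aligned in the buffer.
buf = bytearray(1 + 8)
struct.pack_into("=d", buf, 1, 1.5)
value = read_double_unaligned(buf, 1)
```

In C the aligned loop would instead dereference a typed pointer directly, which is the case UBSAN flags when the pointer is not suitably aligned.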

SwayamInSync · 2025-11-19T14:56:28Z

Cool, so I think this PR is fine as is: some of the work pointed out can be useful in the future if we support byte-swapping (otherwise we can refactor it then).

Proceeding to merge it in!

seberg · 2025-11-19T15:19:27Z

> Cool, so I think this PR is fine as is: some of the work pointed out can be useful in the future if we support byte-swapping (otherwise we can refactor it then).

Not quite. As is, the tests should fail if you pass in a byte-swapped unicode string (legacy dtype)? The resolve_descriptors function needs to ensure native byte order for unicode.
The unaligned loop seems unnecessary, but sure, it is completely fine and good.
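To see why the loop needs native byte order, consider what happens when 4-byte UCS4 code units stored in the opposite byte order are read as if they were native. The Python sketch below uses only the standard library; `decode_ucs4` and `swap_to_native` are hypothetical helper names, and the swap stands in for the conversion that happens when a cast requests a native-order descriptor.

```python
import sys


def decode_ucs4(raw, byteorder):
    """Interpret raw bytes as 4-byte code units in the given byte order."""
    return [
        int.from_bytes(raw[i:i + 4], byteorder)
        for i in range(0, len(raw), 4)
    ]


def swap_to_native(raw):
    """Reverse each 4-byte unit, turning swapped data into native order."""
    return b"".join(raw[i:i + 4][::-1] for i in range(0, len(raw), 4))


native = sys.byteorder
swapped = "big" if native == "little" else "little"

# "1.5" stored with the non-native byte order.
raw_swapped = b"".join(ord(ch).to_bytes(4, swapped) for ch in "1.5")

# Reading swapped data as native yields values that are not valid code points.
wrong = decode_ucs4(raw_swapped, native)
# After swapping to native order, the expected code points come back.
right = decode_ucs4(swap_to_native(raw_swapped), native)
```

Here `right` is `[0x31, 0x2E, 0x35]`, while every entry of `wrong` is larger than 0x10FFFF, so a loop that assumes native order would see garbage for byte-swapped input.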

SwayamInSync · 2025-11-19T15:40:33Z

Oh sorry, I mistakenly pushed that commit to a different branch (the one for bytes casting support).
Let me copy the changes over here and push them.

I am keeping the unaligned loops, since you mentioned that UBSAN complains about unaligned access.

SwayamInSync · 2025-11-19T15:57:57Z

quaddtype/numpy_quaddtype/src/casts.cpp

+                                    npy_intp *view_offset)
+{
+    if (!PyArray_ISNBO(given_descrs[0]->byteorder)) {
+        loop_descrs[0] = PyArray_DescrNewByteorder(given_descrs[0], NPY_NATIVE);


Creating a new descriptor might be better than just throwing an incompatibility error.

seberg · 2025-11-19T16:00:05Z

quaddtype/numpy_quaddtype/src/casts.cpp

+    else {
+        Py_INCREF(given_descrs[0]);
+        loop_descrs[0] = given_descrs[0];
+    }


Hmmmm yeah this works, so fine to move on.

I am a bit surprised (but not much) that I never made something like NPY_DT_CALL_ensure_canonical public. That seems like a pretty big oversight, as it is a rather common need here.
(For simple builtins, canonical is the same as ensuring native byteorder)

Also, is it the case that PyArray_GetDefaultDescr always gives a descriptor with native byte order?
I use this for float->quad casting, and with '>f4' it seems to work.

That resolve_descriptor is basically a general one used for every numeric->quad cast.
I don't know why I did that (one year ago); it sounds illogical now.

PyArray_GetDefaultDescr will always be native and is defined for these non-parametric types.
So I think that would also be fine to use for sure. (There is one silly difference that doesn't even matter here, this path preserves metadata I suspect. But while metadata seems kinda useful these days, I don't think propagating it is useful.)

SwayamInSync · 2025-11-20T04:51:58Z

okay merging this in now!

SwayamInSync added 5 commits November 13, 2025 09:51

small refactor

0f223d3

added string_to_quad and tests

77def74

adapt to size

84c4dfd

roundtrip

a38a651

T and S10 TBA

4cdef99

SwayamInSync added the numpy_quaddtype label Nov 13, 2025

file reformat

d1605c7

SwayamInSync requested a review from ngoldbaum November 13, 2025 15:40

ngoldbaum reviewed Nov 13, 2025

SwayamInSync added 4 commits November 15, 2025 13:23

refactoring duplicated logic

7e9df19

better string handling

10b7964

import math

0e97ae7

casting as per size

fa70b10

SwayamInSync requested a review from ngoldbaum November 15, 2025 16:21

simplify trailing whitespace logic

d76ef3f

ngoldbaum approved these changes Nov 18, 2025

seberg reviewed Nov 19, 2025

change here

0e0614c

SwayamInSync commented Nov 19, 2025

seberg reviewed Nov 19, 2025

SwayamInSync merged commit 13d62b1 into numpy:main Nov 20, 2025
11 checks passed

SwayamInSync mentioned this pull request Nov 20, 2025

FEAT: Adding Quad to and from bytes array casting support #228

Open
