@SwayamInSync
Member

This PR is part of the work to make quaddtype compatible with NumPy's tests for longdouble.

  • Adds support for creating quaddtype values from bytes
  • Refactors how the locks are defined (they may later be used in multiple files)
  • Adds tests

Member

@ngoldbaum ngoldbaum left a comment

I spotted a bug and have a suggestion to refactor to avoid the buggy pattern you're replicating that occurs elsewhere in the library.

Otherwise looks good although I didn't look carefully at the tests.

}
char *endptr = NULL;
if (backend == BACKEND_SLEEF) {
self->value.sleef_value = Sleef_strtoq(s, &endptr);
Member

here and in the other path you should check if the returned value is zero. If it is, the conversion failed and you should exit early.

I'm not sure, but I think it's likely that in that case, with the current code, endptr would still be NULL and you'd segfault when you dereference endptr in the next if block.

I'd also add some explicit tests for strings that contain values that are outside the range of representable values. You might also need to deal with errno, I'm not sure if sleef sets that like strtold is supposed to do.

It looks like we have similar bugs in our other uses of strtold and Sleef_strtoq:

goldbaum at Nathans-MBP in ~/Documents/numpy-user-dtypes on 216!
± rg -A 5 strtold
quaddtype/numpy_quaddtype/src/scalar.c
167:            self->value.longdouble_value = strtold(s, &endptr);
168-        }
169-        if (*endptr != '\0' || endptr == s) {
170-            PyErr_SetString(PyExc_ValueError, "Unable to parse string to QuadPrecision");
171-            Py_DECREF(self);
172-            return NULL;

quaddtype/numpy_quaddtype/src/dtype.c
364:        long double val = strtold(buffer, &endptr);
365-        if (endptr == buffer) {
366-            return 0;  /* Return 0 on parse error (no items read) */
367-        }
368-        *(long double *)dptr = val;
369-    }
--
387:        long double val = strtold(s, endptr);
388-        if (*endptr == s) {
389-            return -1;
390-        }
391-        *(long double *)dptr = val;
392-    }

Maybe it makes sense to refactor this operation into a new function called string_to_quad?

Member Author

I had the same doubt, so I read up on it. In the C standard, page 343, points 4 and 7 say:

  • If no conversion is performed, strtold returns 0.0L and sets endptr to s (the same string as input).
  • If a conversion partially succeeds, endptr will point to the first character after the converted part.

So in both possible cases, endptr can't be NULL.

And the SLEEF docs say:

This is a QP version of the strtod function.

So this should also follow the same rules.
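
The two rules quoted above can be checked directly against the standard strtold. The following is a minimal sketch in C; classify_strtold is a hypothetical helper written for illustration and is not part of quaddtype. It only exercises strtold, so whether Sleef_strtoq behaves identically remains the assumption stated above.

```c
#include <stdlib.h>

/* Hypothetical helper for illustration: classify what strtold did with `s`.
 * Returns 0 if nothing was converted (endptr == s),
 *         1 if only a prefix was converted (characters remain),
 *         2 if the whole string was consumed. */
static int classify_strtold(const char *s)
{
    char *endptr = NULL;
    (void)strtold(s, &endptr);  /* strtold stores the end position in endptr */
    if (endptr == s) {
        return 0;
    }
    return (*endptr != '\0') ? 1 : 2;
}
```

With this helper, "" and "not_a_number" fall into the first case, "1.23abc" into the second, and "0" into the third. The last one shows that a zero return value alone does not indicate failure; the position of endptr does.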

Member Author

You might also need to deal with errno, I'm not sure if sleef sets that like strtold is supposed to do.

I'm afraid SLEEF does not set errno (in the nextafter implementation, I was setting it myself in the draft PR).
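
For the long double path, the standard strtold does report overflow through errno. A minimal sketch of that check follows; parse_long_double_checked and its return codes are hypothetical and only illustrate the pattern. It says nothing about Sleef_strtoq, which per the comment above would need a separate range check.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse with strtold and report range errors via errno.
 * Returns 0 on success, -1 if nothing was parsed, -2 on overflow. */
static int parse_long_double_checked(const char *s, long double *out)
{
    char *endptr = NULL;
    errno = 0;                  /* strtold never resets errno to zero itself */
    long double val = strtold(s, &endptr);
    if (endptr == s) {
        return -1;              /* no conversion was performed */
    }
    if (errno == ERANGE) {
        return -2;              /* value was outside the representable range */
    }
    *out = val;
    return 0;
}
```

Only overflow is guaranteed to set ERANGE; whether underflow does is implementation-defined, so tests for very small magnitudes should not rely on it.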

@pytest.mark.parametrize("invalid_bytes", [
b"", # Empty bytes
b"not_a_number", # Invalid format
b"1.23abc", # Trailing garbage
Member Author

These are some edge cases that make Sleef_strtoq fail.

@ngoldbaum
Member

ngoldbaum commented Nov 10, 2025

Sorry for missing all that!

Still, what do you think about refactoring the code I commented on into a new cstring_to_quad helper function? I think there are three other spots in the library that could use it.

@SwayamInSync
Member Author

Yup, I'll refactor that part into a utility header (I'll keep doing these capsule-sized refactors in future PRs as well).

Member

@jorenham jorenham left a comment

It could help to add bytes to the allowed input types in the stubs:

_IntoQuad: TypeAlias = (
QuadPrecision
| float
| str
| np.floating[Any]
| np.integer[Any]
| np.bool_
) # fmt: skip

@SwayamInSync
Member Author

It could help to add bytes to the allowed input types in the stubs:

_IntoQuad: TypeAlias = (
QuadPrecision
| float
| str
| np.floating[Any]
| np.integer[Any]
| np.bool_
) # fmt: skip

Ah yes, sorry I missed that

@SwayamInSync
Member Author

Also Python int too, right? (Or does it get handled as np.integer?)

@jorenham
Member

jorenham commented Nov 10, 2025

int

float already includes int (it contradicts the runtime behaviour, which I think is horrible, but it's the way it is). So basically type-checkers pretend that float is the same as float | int

@SwayamInSync
Member Author

@ngoldbaum this is ready

Member

@jorenham jorenham left a comment

the static typing changes look good

else
{
out_value->longdouble_value = strtold(str, endptr);
}
Member

Can you make this function return int, do the error-handling, and return -1 on error and 0 on success? Unless it was intentional for some reason to leave the error handling different at all the call sites, I don't think it was.

Member Author

Actually, for NPY_DT_PyArray_ArrFuncs_fromstr NumPy passes its own endptr and we use exactly that for parsing. This way NumPy checks whether *endptr moved forward (if not, it throws the "unmatched data" error). The declaration is in NumPy: link

That's why I thought to keep it like this.
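
To illustrate why a caller may want to own endptr, here is a minimal sketch of a loop that reads several values from one buffer. It is not NumPy's actual code: parse_separated is a hypothetical function, and the comma separator is an assumption made for the example. The point is that each conversion is intentionally partial, and the caller decides what an unmoved pointer means.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical caller loop, not NumPy's implementation: read up to `max`
 * comma-separated values from `s`, letting strtold report where each value
 * ends. Returns the number of values stored in `out`. */
static size_t parse_separated(const char *s, long double *out, size_t max)
{
    size_t n = 0;
    while (n < max) {
        char *end = NULL;
        long double val = strtold(s, &end);
        if (end == s) {
            break;          /* pointer did not advance: unmatched data */
        }
        out[n++] = val;
        if (*end != ',') {
            break;          /* no separator follows: end of the sequence */
        }
        s = end + 1;        /* skip the separator and parse the next value */
    }
    return n;
}
```

A helper that always rejected trailing characters would fail on the first value here, which is the situation described above for the fromstr path.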

Member Author

So there are two cases: some where we check for errors by defining our own endptr, and some where NumPy keeps track.

Member

Can't you handle that with a boolean flag or something? partial_conversion_check?

if (endptr == s) {
    // didn't parse anything
    return -1;
}
if (partial_conversion_check && *endptr != '\0') {
    // characters remain to be converted
    return -1;
}

Member Author

That's possible, something like:

int cstring_to_quad(const char *str, QuadBackendType backend, quad_value *out_value, 
char **endptr, bool require_full_parse)
{
  if(backend == BACKEND_SLEEF) {
    out_value->sleef_value = Sleef_strtoq(str, endptr);
  } else {
    out_value->longdouble_value = strtold(str, endptr);
  }
  if(*endptr == str) 
    return -1; // parse error - nothing was parsed
  
  // If full parse is required
  if(require_full_parse && **endptr != '\0')
    return -1; // parse error - characters remain to be converted
  
  return 0; // success
}

Both endptr and require_full_parse come from the caller, which decides whether to pass true or false. For example, NPY_DT_PyArray_ArrFuncs_fromstr should pass false and the others true.

@ngoldbaum, let me know if you want something like this and I'll proceed.
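
To make the two calling styles concrete, here is a minimal sketch restricted to the long double path so that it builds without SLEEF. All names (cstring_to_longdouble, parse_whole, parse_prefix) are stand-ins chosen for illustration; they are not the functions that ended up in the library.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for the proposed helper, long double path only.
 * The name and signature are illustrative assumptions. */
static int cstring_to_longdouble(const char *str, long double *out_value,
                                 char **endptr, bool require_full_parse)
{
    *out_value = strtold(str, endptr);
    if (*endptr == str) {
        return -1;  /* nothing was parsed */
    }
    if (require_full_parse && **endptr != '\0') {
        return -1;  /* characters remain after the number */
    }
    return 0;
}

/* Scalar-construction style call site: the whole string must be a number. */
static int parse_whole(const char *s, long double *out)
{
    char *endptr = NULL;
    return cstring_to_longdouble(s, out, &endptr, true);
}

/* fromstr-style call site: the caller supplies endptr and inspects it. */
static int parse_prefix(const char *s, long double *out, char **endptr)
{
    return cstring_to_longdouble(s, out, endptr, false);
}
```

Under this sketch, "1.23abc" is rejected by parse_whole but accepted by parse_prefix, which leaves endptr pointing at the 'a' for the caller to handle.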

Member

Yes, I think that's better than scattering the error checking throughout the code. Either way you need to know which kind of error checking you need to do.

Member Author

Done

Member

@ngoldbaum ngoldbaum left a comment

Thanks, much cleaner now!

Contributor

@juntyr juntyr left a comment

LGTM

"2.71828182845904523536028747135266249775",
])
def test_bytes_encoding_compatibility(self, test_str):
"""Test that bytes created from different encodings work correctly."""
Contributor

Which encodings do we support, just ASCII and UTF8? Is this the same as numpy and is it documented somewhere?

Member Author

AFAIK
Strings: UTF-8 (Python 3 strings are Unicode)
Bytes: Any encoding, but numeric literals are typically ASCII

@SwayamInSync
Member Author

Merging this in!

@SwayamInSync SwayamInSync merged commit 7094b3c into numpy:main Nov 12, 2025
11 checks passed

4 participants