-
-
Notifications
You must be signed in to change notification settings - Fork 13
FEAT: Adding Byte support to QuadPrecision scalar constructor #223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ngoldbaum
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spotted a bug and have a suggestion to refactor to avoid the buggy pattern you're replicating that occurs elsewhere in the library.
Otherwise looks good although I didn't look carefully at the tests.
| } | ||
| char *endptr = NULL; | ||
| if (backend == BACKEND_SLEEF) { | ||
| self->value.sleef_value = Sleef_strtoq(s, &endptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here and in the other path you should check if the returned value is zero. If it is, the conversion failed and you should exit early.
I'm not sure but I think it's likely that in the current code in that case, endptr would still be NULL in that case and you'd segfault when you deference endptr in the next if block.
I'd also add some explicit tests for strings that contain values that are outside the range of representable values. You might also need to deal with errno, I'm not sure if sleef sets that like strtold is supposed to do.
It looks like we have similar bugs in our other uses of strtold and Sleef_strtoq:
goldbaum at Nathans-MBP in ~/Documents/numpy-user-dtypes on 216!
± rg -A 5 strtold
quaddtype/numpy_quaddtype/src/scalar.c
167: self->value.longdouble_value = strtold(s, &endptr);
168- }
169- if (*endptr != '\0' || endptr == s) {
170- PyErr_SetString(PyExc_ValueError, "Unable to parse string to QuadPrecision");
171- Py_DECREF(self);
172- return NULL;
quaddtype/numpy_quaddtype/src/dtype.c
364: long double val = strtold(buffer, &endptr);
365- if (endptr == buffer) {
366- return 0; /* Return 0 on parse error (no items read) */
367- }
368- *(long double *)dptr = val;
369- }
--
387: long double val = strtold(s, endptr);
388- if (*endptr == s) {
389- return -1;
390- }
391- *(long double *)dptr = val;
392- }
Maybe it makes sense to refactor this operation into a new function called string_to_quad?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually also had this doubt, so I read about it and as per C standards page-343's point 4 and 7 say
- If no conversion is performed,
strtoldreturns0.0Land setsendptrtos(the same string as input). - If a conversion partially succeeds,
endptrwill point to the first character after the converted part.
so in both possible cases, it can't be NULL
And from SLEEF doc's they say
This is a QP version of the
strtodfunction.
So this should also follow the same rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might also need to deal with errno, I'm not sure if sleef sets that like strtold is supposed to do.
I am afraid as SLEEF does not set errno (as in implementation of nextafter, I was setting those myself in the draft PR)
| @pytest.mark.parametrize("invalid_bytes", [ | ||
| b"", # Empty bytes | ||
| b"not_a_number", # Invalid format | ||
| b"1.23abc", # Trailing garbage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some edge cases to make the sleef_strtoq fail
|
Sorry for missing all that! Still, what do you think about refactoring the code I commented on into a new |
|
Yup, I'll refactor that part into a utility header (will keep doing these capsule size refactors in future PRs as well) |
jorenham
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could help to add bytes to the allowed input types in the stubs:
numpy-user-dtypes/quaddtype/numpy_quaddtype/_quaddtype_main.pyi
Lines 8 to 15 in 2cbc0c6
| _IntoQuad: TypeAlias = ( | |
| QuadPrecision | |
| | float | |
| | str | |
| | np.floating[Any] | |
| | np.integer[Any] | |
| | np.bool_ | |
| ) # fmt: skip |
Ah yes, sorry I missed that |
|
Also python |
|
|
@ngoldbaum this is ready |
jorenham
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the static typing changes look good
| else | ||
| { | ||
| out_value->longdouble_value = strtold(str, endptr); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make this function return int, do the error-handling, and return -1 on error and 0 on success? Unless it was intentional for some reason to leave the error handling different at all the call sites, I don't think it was.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually for NPY_DT_PyArray_ArrFuncs_fromstr NumPy passes its own endptr and we use exactly that in parsing, this way NumPy checks if *endptr moved forward (if not, it throws the "unmatched data" error). Here is the declaration link in NumPy link
That's why I thought to keep it like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So there are 2 cases, where we check the exceptions by defninig our own endptr and some where NumPy keeps track
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you handle that with a boolean flag or something? partial_conversion_check?
if (endptr == s) {
// didn't parse anything
return -1;
}
if (partial_conversion_check && endptr != "\0") {
// characters remain to be converted
return -1;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be possible as
int cstring_to_quad(const char *str, QuadBackendType backend, quad_value *out_value,
char **endptr, bool require_full_parse)
{
if(backend == BACKEND_SLEEF) {
out_value->sleef_value = Sleef_strtoq(str, endptr);
} else {
out_value->longdouble_value = strtold(str, endptr);
}
if(*endptr == str)
return -1; // parse error - nothing was parsed
// If full parse is required
if(require_full_parse && **endptr != '\0')
return -1; // parse error - characters remain to be converted
return 0; // success
}Both endptr and require_full_parse need to be coming from the calling context can decide passing true or false for example in NPY_DT_PyArray_ArrFuncs_fromstr should pass false and others true
Let me know @ngoldbaum if you want something like this then I can proceed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that's better than scattering the error checking throughout the code. Either way you need to know which kind of error checking you need to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
ngoldbaum
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, much cleaner now!
juntyr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| "2.71828182845904523536028747135266249775", | ||
| ]) | ||
| def test_bytes_encoding_compatibility(self, test_str): | ||
| """Test that bytes created from different encodings work correctly.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which encodings do we support, just ASCII and UTF8? Is this the same as numpy and is it documented somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK
Strings: UTF-8 (Python 3 strings are Unicode)
Bytes: Any encoding, but numeric literals are typically ASCII
|
Merging this in! |
This PR is part of the work making quaddtype compatible with NumPy testing for longdouble