New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: introduce PyArray_SafeCast to fix issues around stringdtype views #26147
Conversation
if ((type1 != type2) && (type1->kind == 'T')) { | ||
return 0; | ||
} | ||
return 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to dig slightly deeper and use this pattern:
numpy/numpy/_core/src/umath/ufunc_object.c
Lines 790 to 812 in 51fd714
if (dtypes[i] != PyArray_DESCR(op[i])) { | |
npy_intp view_offset; | |
NPY_CASTING safety = PyArray_GetCastInfo( | |
PyArray_DESCR(op[i]), dtypes[i], NULL, &view_offset); | |
if (safety < 0 && PyErr_Occurred()) { | |
/* A proper error during a cast check, should be rare */ | |
return -1; | |
} | |
if (view_offset != 0) { | |
/* NOTE: Could possibly implement non-zero view offsets */ | |
must_copy = 1; | |
} | |
if (force_cast_input && i < nin) { | |
/* | |
* ArrayMethod flagged to ignore casting (logical funcs | |
* can force cast to bool) | |
*/ | |
} | |
else if (PyArray_MinCastSafety(safety, casting) != casting) { | |
return 0; /* the cast is not safe enough */ | |
} | |
} |
Making this not be string dtype specific!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah nice and then I can control how view_offset
gets set in string_to_string_resolve_descriptors
, that seems much more idiomatic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this idea doesn't quite work as cleanly as I'd hoped. If I set view_offset != 0
in string_to_string_resolve_descriptors
, there's an assert that triggers if casting == NPY_NO_CASTING
:
numpy/numpy/_core/src/multiarray/convert_datatype.c
Lines 465 to 482 in e1bf1d6
/* | |
* Check for less harmful non-standard returns. The following two returns | |
* should never happen: | |
* 1. No-casting must imply a view offset of 0. | |
* 2. Equivalent-casting + 0 view offset is (usually) the definition | |
* of a "no" cast. However, changing the order of fields can also | |
* create descriptors that are not equivalent but views. | |
* Note that unsafe casts can have a view offset. For example, in | |
* principle, casting `<i8` to `<i4` is a cast with 0 offset. | |
*/ | |
if (*view_offset != 0) { | |
assert(casting != NPY_NO_CASTING); | |
} | |
else { | |
assert(casting != NPY_EQUIV_CASTING | |
|| (PyDataType_HASFIELDS(from) && PyDataType_HASFIELDS(to))); | |
} | |
return casting; |
In the latest version I'm about to push I guard that assert
with an if statement that checks for stringdtype, but that seems unsatisfying. Maybe the correct thing is for stringdtype to set e.g. NPY_EQUIV_CASTING
for casts with distinct allocators, but when I tried doing that it broke reductions and I couldn't see a clear path forward on fixing that, because the check for valid reductions relies on PyArray_EquivTypes
, which enfoces NPY_NO_CASTING
as the minimum casting level:
numpy/numpy/_core/src/umath/ufunc_object.c
Lines 2394 to 2411 in e1bf1d6
/* | |
* The first operand and output should be the same array, so they should | |
* be identical. The second argument can be different for reductions, | |
* but is checked to be identical for accumulate and reduceat. | |
* Ideally, the type-resolver ensures that all are identical, but we do | |
* not enforce this here strictly. Otherwise correct handling of | |
* byte-order changes (or metadata) requires a lot of care; see gh-20699. | |
*/ | |
if (!PyArray_EquivTypes(out_descrs[0], out_descrs[2]) || ( | |
enforce_uniform_args && !PyArray_EquivTypes( | |
out_descrs[0], out_descrs[1]))) { | |
PyErr_Format(PyExc_TypeError, | |
"the resolved dtypes are not compatible with %s.%s. " | |
"Resolved (%R, %R, %R)", | |
ufunc_get_name_cstr(ufunc), method, | |
out_descrs[0], out_descrs[1], out_descrs[2]); | |
goto fail; | |
} |
Maybe we need a new casting level between NO_CASTING
and EQUIV_CASTING
for this case where data can live outside the array buffer on the descriptor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can probably just remove the assert, just need to make sure to not document anywhere that no-casting implies view (I doubt we use it anywhere, but not sure). It used to, for these strings it can't since they use a whole new mechanism for storage... that seems fine.
(you could keep the assert together with checking that the dtype implements the hook for fixing the dtype in the array creation call, given that reason.)
* We ignore that possibility for simplicity; it really is not our bug. | ||
*/ | ||
return 0; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still necessary here and in EquivTypes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove it, as it I am very certain it was fixed in boost python a while ago.
return 0; | ||
} | ||
/* If casting is "no casting" this dtypes are considered equivalent. */ | ||
return PyArray_MinCastSafety(safety, NPY_NO_CASTING) == NPY_NO_CASTING; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that most of this implementation is copied from PyArray_EquivTypes
, to avoid calling GetCastInfo
twice. If I just called EquivTypes
, there would be an unnecessary second call to GetCastInfo
to get the view_offset
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair, could add that as a code comment if you like.
What is the status of this? |
I think it just needs another round of review. It’s not critical for the RC but it would be good to have it in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good, but the view offset needs fixing. I cannot review if other places using eqivtypes shojld be using this meaning, right now.
Its maybe subtle but only really affects strings, so with that fix happy to see it in. (Other comments could wait).
* We ignore that possibility for simplicity; it really is not our bug. | ||
*/ | ||
return 0; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove it, as it I am very certain it was fixed in boost python a while ago.
return 0; | ||
} | ||
/* If casting is "no casting" this dtypes are considered equivalent. */ | ||
return PyArray_MinCastSafety(safety, NPY_NO_CASTING) == NPY_NO_CASTING; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair, could add that as a code comment if you like.
@@ -79,7 +79,8 @@ string_to_string_resolve_descriptors(PyObject *NPY_UNUSED(self), | |||
return NPY_UNSAFE_CASTING; | |||
} | |||
|
|||
*view_offset = 0; | |||
// views are only legal between descriptors that share allocators (e.g. the same object) | |||
*view_offset = descr0->allocator != descr1->allocator; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wrong. You must set it to 0 or leave it unchanged! (Intp min, I think)
* is true as well. | ||
*/ | ||
NPY_NO_EXPORT unsigned char | ||
PyArray_ViewableTypes(PyArray_Descr *type1, PyArray_Descr *type2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can encode the equiv in the name? Or pass in the cast safety? Passing it in would make it work in the ufunc code I think.
But if not its OK.
@@ -1609,7 +1652,11 @@ _array_fromobject_generic( | |||
|
|||
/* One more chance for faster exit if user specified the dtype. */ | |||
oldtype = PyArray_DESCR(oparr); | |||
if (PyArray_EquivTypes(oldtype, dtype)) { | |||
unsigned char viewable = PyArray_ViewableTypes(oldtype, dtype); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The equiv cast may not matter, but then a single field struct to non-struct could return views... not today, I guess
Thanks for the prompt to refactor so I could use it in Now instead of |
Even though sebastian approved this, probably worth one more round of review before merging after the last commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Mostly just queries/requests for clarification
numpy/_core/src/multiarray/ctors.c
Outdated
@@ -1922,7 +1926,7 @@ PyArray_FromArray(PyArrayObject *arr, PyArray_Descr *newtype, int flags) | |||
/* If a writeable array was requested, and arr is not */ | |||
((flags & NPY_ARRAY_WRITEABLE) && | |||
(!(arrflags & NPY_ARRAY_WRITEABLE))) || | |||
!PyArray_EquivTypes(oldtype, newtype); | |||
!view_safe; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the actual work has now been done before defining copy
, I think it should be the first item for copy
, i.e.,
copy = !view_safe ||
...
Alternatively, maybe better, since this now makes every case where a copy was already required slower, one can remove the last bit, and do the calculation like,
if (!copy) { /* Check a view is in fact possible */
npy_intp view_offset
npy_intp is_safe = ...
copy = copy && is_safe && (view_offset == 0);
}
* *minimum_safety* casting level. Sets the view_offset if that is set | ||
* for the cast. If ignore_error is 1, errors in cast setup are ignored. | ||
*/ | ||
NPY_NO_EXPORT npy_intp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment should perhaps add
... are ignored;
otherwise the error is kept and -1 is returned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, remnant of an older version that unconditionally ignored errors.
numpy/_core/src/multiarray/ctors.c
Outdated
@@ -1908,6 +1908,10 @@ PyArray_FromArray(PyArrayObject *arr, PyArray_Descr *newtype, int flags) | |||
} | |||
|
|||
arrflags = PyArray_FLAGS(arr); | |||
npy_intp view_offset; | |||
npy_intp is_safe = PyArray_SafeCast(oldtype, newtype, &view_offset, NPY_NO_CASTING, 1); | |||
npy_intp view_safe = (is_safe && (view_offset == 0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably fine as is, but in the code proper, impossible views are marked with view_offset = NPY_MIN_INTP
if (PyArray_EquivTypes(oldtype, dtype)) { | ||
npy_intp view_offset; | ||
npy_intp is_safe = PyArray_SafeCast(oldtype, dtype, &view_offset, NPY_NO_CASTING, 1); | ||
npy_intp view_safe = (is_safe && (view_offset == 0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, should one check view_offset != NPY_MIN_INTP
instead?
@@ -21,4 +21,9 @@ NPY_VISIBILITY_HIDDEN extern PyObject * npy_ma_str_convert_if_no_array; | |||
NPY_VISIBILITY_HIDDEN extern PyObject * npy_ma_str_cpu; | |||
NPY_VISIBILITY_HIDDEN extern PyObject * npy_ma_str_array_err_msg_substr; | |||
|
|||
NPY_NO_EXPORT npy_intp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the right module to define it in? I know PyArray_EquivTypes
is here, but maybe casts.c
makes more sense?
All review comments should be resolved now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! A suggestion for a tiny further improvement in-line, though probably barely worth the CI since the compiler will likely do it already.
numpy/_core/src/multiarray/ctors.c
Outdated
if (!copy) { | ||
npy_intp view_offset; | ||
npy_intp is_safe = PyArray_SafeCast(oldtype, newtype, &view_offset, NPY_NO_CASTING, 1); | ||
copy = copy || !(is_safe && (view_offset != NPY_MIN_INTP)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion was imperfect: this can be copy = !(is_safe...
since we know copy
is false.
Pulling this one in, thanks for the review! |
Fixes #26140.
This is a minimal implementation of a new function that generalizes
PyArray_EquivTypes
.