-
-
Notifications
You must be signed in to change notification settings - Fork 9.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API,MAINT: Rewrite promotion using common DType and common instance #17137
Conversation
61cdd89
to
afb3757
Compare
afb3757
to
24cd2d9
Compare
static PyArray_Descr * | ||
string_unicode_common_instance(PyArray_Descr *descr1, PyArray_Descr *descr2) | ||
{ | ||
if (descr1->elsize >= descr2->elsize) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this to prefer the first dtype. Doesn't change anything except which identity gets returned, which also means the "metadata" that is returned is different.
This is a tiny change here: Since the other dtype gets cast first and then we call "common instance", the metadata and instance which is returned can differ from the one that was returned before.
I have a test for metadata preservation, which maps out the current (strange) space. The changes I see are:
I am not convinced that it is worth bothering, since there are no consistent rules for metadata preservation. We could make such rules at some point, and I don't think this is an issue in that regard. |
To be precise, the test which I added to @pytest.mark.parametrize(["dtype1", "dtype2"],
itertools.product(
list(np.typecodes["All"]) +
["i,i", "S3", "S100", "U3", "U100", rational],
repeat=2))
def test_promote_types_metadata(self, dtype1, dtype2):
"""Metadata handling in promotion does not appear formalized
right now in NumPy. This test should thus be considered to
document behaviour, rather than test the correct definition of it.
This test is very ugly, it was useful for rewriting part of the
promotion, but probably should eventually be replaced/deleted
(i.e. when metadata handling in promotion is better defined).
"""
metadata1 = {1: 1}
metadata2 = {2: 2}
dtype1 = np.dtype(dtype1, metadata=metadata1)
dtype2 = np.dtype(dtype2, metadata=metadata2)
# Identical dtypes always preserve metadata (if not byteswapped):
res = np.promote_types(dtype1, dtype1)
assert res.metadata == dtype1.metadata
assert res.isnative # result must be native
try:
res = np.promote_types(dtype1, dtype2)
except TypeError:
# Promotion failed, this test only checks metadata
return
assert res.isnative
# The rules for when metadata is preserved and which dtypes metadta
# will be used are very confusing and depend on multiple paths.
# This long if statement attempts to reproduce this:
if dtype1.type is rational or dtype2.type is rational:
# User dtype promotion preserves byte-order here:
if np.can_cast(res, dtype1):
assert res.metadata == dtype1.metadata
else:
assert res.metadata == dtype2.metadata
elif res.char in "?bhilqpBHILQPefdgFDGOmM":
# All simple types lose metadata (due to using promotion table):
assert res.metadata is None
elif res.kind in "SU" and dtype1 == dtype2:
# Strings give precedence to the second dtype:
assert res is dtype2
elif res == dtype1:
# If one result is the result, it is usually returned unchanged:
assert res is dtype1
elif res == dtype2:
# If one result is the result, it is usually returned unchanged:
assert res is dtype2
elif dtype1.kind == "S" and dtype2.kind == "U":
# Promotion creates a new unicode dtype from scratch
assert res.metadata is None
elif dtype1.kind == "U" and dtype2.kind == "S":
# Promotion creates a new unicode dtype from scratch
assert res.metadata is None
elif res.kind in "SU" and dtype2.kind != res.kind:
# We build on top of dtype1:
assert res.metadata == dtype1.metadata
elif res.kind in "SU" and res.kind == dtype1.kind:
assert res.metadata == dtype1.metadata
elif res.kind in "SU" and res.kind == dtype2.kind:
assert res.metadata == dtype2.metadata
else:
assert res.metadata is None
# Try again for byteswapped version
dtype1 = dtype1.newbyteorder()
assert dtype1.metadata == metadata1
res_bs = np.promote_types(dtype1, dtype2)
if res_bs.names is not None:
# Structured promotion doesn't remove byteswap:
assert res_bs.newbyteorder() == res
else:
assert res_bs == res
assert res_bs.metadata == res.metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great step forward.
That version is on master
Not clear what you mean here: the test in the detail passes on master? In any case, if handling metadata can be split out and merged first that sounds good.
@@ -1894,6 +1898,8 @@ typedef void (PyDataMem_EventHookFunc)(void *inp, void *outp, size_t size, | |||
discover_descr_from_pyobject_function *discover_descr_from_pyobject; | |||
is_known_scalar_type_function *is_known_scalar_type; | |||
default_descr_function *default_descr; | |||
common_dtype_function *common_dtype; | |||
common_instance_function *common_instance; | |||
}; | |||
|
|||
#endif /* NPY_INTERNAL_BUILD */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a follow-on PR it would be nice to move this out of public headers, it is guarded by NPY_INTERNAL_BUILD
anyway.
I meant the test as I posted is works on master, and fails on this branch, I opened a PR here for the test. gh-17168 (it requires the void special case fix to run successfully). If we merge this, it will at least show what changed. About the header: IIRC the problem was how to spell |
We can leave it for another PR. |
This function is better housed in convert_datatype.c and was only in array_coercion, because we did not use it anywhere else before. This also somewhat modifies the logic and cleans up use-cases of it in array_coercion.c
There were two versions of this, since the merger of umath and multiarraymodule, this is unnecessary.
This defines `common_dtype` and `common_instance` (only for parametric DTypes), and uses them to implement the `PyArray_CommonDType` operation. `PyArray_CommonDType()` together with the `common_instance` method then define the existing PromoteTypes. This does not (yet) affect "value based promotion" as defined by `PyArray_ResultType()`. We also require the step of casting to the common DType to define this type of example: ``` np.promote_types("S1", "i8") == np.dtype('S21') ``` This steps requires finding the string length corresponding to the integer (21 characters). This is here handled by the `PyArray_CastDescrToDType` function. However, that function still relies on `PyArray_AdaptFlexibleDType` and thus does not generalize to arbitrary DTypes. See NEP 42 (currently "Common DType Operations" section): https://numpy.org/neps/nep-0042-new-dtypes.html#common-dtype-operations
24cd2d9
to
b40f6bb
Compare
Rebased. There is a new PR at the end which does the test modifications. I can squash it later, but leaving the old commits untouched is better for review. |
if np.promote_types(dtype1, dtype2.kind) == dtype2: | ||
res.metadata is None | ||
else: | ||
res.metadata == metadata2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is much more straight-forward now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Thanks @seberg |
This defines
common_dtype
andcommon_instance
(only for parametricDTypes), and uses them to implement the
PyArray_CommonDType
operation.PyArray_CommonDType()
together with thecommon_instance
method then define the existing PromoteTypes.This does not (yet) affect "value based promotion" as defined by
PyArray_ResultType()
. We also require the step of castingto the common DType to define this type of example:
This steps requires finding the string length corresponding to
the integer (21 characters). This is here handled by the
PyArray_CastDescrToDType
function. However, that functionstill relies on
PyArray_AdaptFlexibleDType
and thus does notgeneralize to arbitrary DTypes.
See NEP 42 (currently "Common DType Operations" section):
https://numpy.org/neps/nep-0042-new-dtypes.html#common-dtype-operations
The first three commits separate out well (I can separate out the first three to a different PR if it helps). A side-by-side view of the diffs is probably much better (at least for the last commit), since the diff is too large.
I moved the one
common_dtype
function for user-dtypes to theusertypes.c
file, but much of that will need some revision in the future to make the DTypeMeta creation more structured. That block of code is