API,MAINT: Rewrite promotion using common DType and common instance #17137

seberg · 2020-08-21T22:44:44Z

This defines common_dtype and common_instance (the latter only for parametric
DTypes), and uses them to implement the PyArray_CommonDType operation.

PyArray_CommonDType(), together with the common_instance method, then defines the existing PromoteTypes.

This does not (yet) affect "value based promotion" as defined by
PyArray_ResultType(). Promotion also requires the step of casting
to the common DType, which is what defines this type of example:

np.promote_types("S1", "i8") == np.dtype('S21')

This step requires finding the string length corresponding to
the integer (21 characters). Here, this is handled by the
PyArray_CastDescrToDType function. However, that function
still relies on PyArray_AdaptFlexibleDType and thus does not
generalize to arbitrary DTypes.

See NEP 42 (currently "Common DType Operations" section):
https://numpy.org/neps/nep-0042-new-dtypes.html#common-dtype-operations
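
The structure described above (find the common DType class, cast both descriptors to it, then pick a common instance for parametric DTypes) can be sketched in plain Python. This is a toy model under stated assumptions, not NumPy's implementation: the classes `Int64` and `String`, the `cast_to` method, and the module-level `common_dtype` and `promote_types` functions are hypothetical names chosen for illustration. Only the 21-character length for an int64 is taken from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Int64:
    """Toy non-parametric DType: all instances are equivalent."""

    def cast_to(self, dtype_cls):
        if dtype_cls is Int64:
            return self
        if dtype_cls is String:
            # Per the PR description, an int64 needs 21 characters.
            return String(21)
        raise TypeError(f"cannot cast Int64 to {dtype_cls.__name__}")


@dataclass(frozen=True)
class String:
    """Toy parametric DType: instances differ by their length."""

    length: int

    def cast_to(self, dtype_cls):
        if dtype_cls is String:
            return self
        raise TypeError(f"cannot cast String to {dtype_cls.__name__}")

    def common_instance(self, other):
        # Both operands are already String instances here;
        # the longer one can hold either value.
        return self if self.length >= other.length else other


def common_dtype(cls1, cls2):
    """Class-level step: which DType class can represent both inputs?"""
    if cls1 is cls2:
        return cls1
    if {cls1, cls2} == {Int64, String}:
        return String
    raise TypeError("no common DType")


def promote_types(descr1, descr2):
    """Instance-level promotion built from the steps above."""
    common = common_dtype(type(descr1), type(descr2))
    descr1 = descr1.cast_to(common)
    descr2 = descr2.cast_to(common)
    if hasattr(common, "common_instance"):
        # Only parametric DTypes define common_instance in this toy model.
        return descr1.common_instance(descr2)
    return descr1


result = promote_types(String(1), Int64())
```

In this model, `promote_types(String(1), Int64())` first selects `String` as the common class, casts the integer descriptor to `String(21)`, and then `common_instance` picks the longer of the two strings, mirroring the "S1" and "i8" example.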

The first three commits separate out well (I can move them to a different PR if it helps). A side-by-side view of the diff is probably much better (at least for the last commit), since the diff is large.

I moved the one common_dtype function for user-dtypes to the usertypes.c file, but much of that will need some revision in the future to make the DTypeMeta creation more structured. That block of code is

seberg · 2020-08-26T00:19:15Z

numpy/core/src/multiarray/dtypemeta.c

+static PyArray_Descr *
+string_unicode_common_instance(PyArray_Descr *descr1, PyArray_Descr *descr2)
+{
+    if (descr1->elsize >= descr2->elsize) {


I changed this to prefer the first dtype. Doesn't change anything except which identity gets returned, which also means the "metadata" that is returned is different.

This is a tiny change here: Since the other dtype gets cast first and then we call "common instance", the metadata and instance which is returned can differ from the one that was returned before.
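
A small sketch of why the `>=` comparison changes which identity (and therefore which metadata) is returned. The `Descr` class and the function name `common_instance_prefer_first` are hypothetical stand-ins; the body of the real C function is not shown in the hunk above, so the return values here are an assumption based on the comment that the first dtype is preferred.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # eq=False keeps identity comparison, like distinct descr objects
class Descr:
    elsize: int
    metadata: Optional[dict] = None


def common_instance_prefer_first(descr1, descr2):
    # Assumed behaviour: with `>=`, a tie in element size returns descr1,
    # so descr1's identity and metadata are the ones that survive.
    if descr1.elsize >= descr2.elsize:
        return descr1
    return descr2


a = Descr(3, metadata={1: 1})
b = Descr(3, metadata={2: 2})
tie_result = common_instance_prefer_first(a, b)
```

With equal sizes the result depends only on argument order: `common_instance_prefer_first(a, b)` returns `a`, while swapping the arguments returns `b`. When the sizes differ, the larger descriptor is returned regardless of order.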

seberg · 2020-08-26T00:23:32Z

I have a test for metadata preservation, which maps out the current (strange) space. The changes I see are:

In some cases when promoting non-strings and strings, we would never preserve metadata. This code does.
User dtypes did preserve metadata, but builtin types often drop it. I currently preserve the dropping behaviour of builtin types, which means that user dtypes start dropping metadata.

I am not convinced that it is worth bothering, since there are no consistent rules for metadata preservation. We could make such rules at some point, and I don't think this is an issue in that regard.

seberg · 2020-08-26T00:30:18Z

To be precise, the test which I added to test_numeric.py is below. Note the large if/elif/... block, which tries to map out everything. That version is on master, I am happy to put that into master, and then adjust it in this PR to make things work?

    @pytest.mark.parametrize(["dtype1", "dtype2"],
            itertools.product(
                list(np.typecodes["All"]) +
                ["i,i", "S3", "S100", "U3", "U100", rational],
                repeat=2))
    def test_promote_types_metadata(self, dtype1, dtype2):
        """Metadata handling in promotion does not appear formalized
        right now in NumPy. This test should thus be considered to
        document behaviour, rather than test the correct definition of it.

        This test is very ugly, it was useful for rewriting part of the
        promotion, but probably should eventually be replaced/deleted
        (i.e. when metadata handling in promotion is better defined).
        """
        metadata1 = {1: 1}
        metadata2 = {2: 2}
        dtype1 = np.dtype(dtype1, metadata=metadata1)
        dtype2 = np.dtype(dtype2, metadata=metadata2)

        # Identical dtypes always preserve metadata (if not byteswapped):
        res = np.promote_types(dtype1, dtype1)
        assert res.metadata == dtype1.metadata
        assert res.isnative  # result must be native

        try:
            res = np.promote_types(dtype1, dtype2)
        except TypeError:
            # Promotion failed, this test only checks metadata
            return
        assert res.isnative

        # The rules for when metadata is preserved and which dtype's metadata
        # will be used are very confusing and depend on multiple paths.
        # This long if statement attempts to reproduce this:
        if dtype1.type is rational or dtype2.type is rational:
            # User dtype promotion preserves byte-order here:
            if np.can_cast(res, dtype1):
                assert res.metadata == dtype1.metadata
            else:
                assert res.metadata == dtype2.metadata

        elif res.char in "?bhilqpBHILQPefdgFDGOmM":
            # All simple types lose metadata (due to using promotion table):
            assert res.metadata is None
        elif res.kind in "SU" and dtype1 == dtype2:
            # Strings give precedence to the second dtype:
            assert res is dtype2
        elif res == dtype1:
            # If one input is the result, it is usually returned unchanged:
            assert res is dtype1
        elif res == dtype2:
            # If one input is the result, it is usually returned unchanged:
            assert res is dtype2
        elif dtype1.kind == "S" and dtype2.kind == "U":
            # Promotion creates a new unicode dtype from scratch
            assert res.metadata is None
        elif dtype1.kind == "U" and dtype2.kind == "S":
            # Promotion creates a new unicode dtype from scratch
            assert res.metadata is None
        elif res.kind in "SU" and dtype2.kind != res.kind:
            # We build on top of dtype1:
            assert res.metadata == dtype1.metadata
        elif res.kind in "SU" and res.kind == dtype1.kind:
            assert res.metadata == dtype1.metadata
        elif res.kind in "SU" and res.kind == dtype2.kind:
            assert res.metadata == dtype2.metadata
        else:
            assert res.metadata is None

        # Try again for byteswapped version
        dtype1 = dtype1.newbyteorder()
        assert dtype1.metadata == metadata1
        res_bs = np.promote_types(dtype1, dtype2)
        if res_bs.names is not None:
            # Structured promotion doesn't remove byteswap:
            assert res_bs.newbyteorder() == res
        else:
            assert res_bs == res
        assert res_bs.metadata == res.metadata

mattip

This is a great step forward.

> That version is on master

Not clear what you mean here: the test in the detail passes on master? In any case, if handling metadata can be split out and merged first that sounds good.

mattip · 2020-08-26T07:11:17Z

numpy/core/include/numpy/ndarraytypes.h

@@ -1894,6 +1898,8 @@ typedef void (PyDataMem_EventHookFunc)(void *inp, void *outp, size_t size,
        discover_descr_from_pyobject_function *discover_descr_from_pyobject;
        is_known_scalar_type_function *is_known_scalar_type;
        default_descr_function *default_descr;
+        common_dtype_function *common_dtype;
+        common_instance_function *common_instance;
    };

 #endif  /* NPY_INTERNAL_BUILD */


In a follow-on PR it would be nice to move this out of the public headers; it is guarded by NPY_INTERNAL_BUILD anyway.

seberg · 2020-08-26T14:59:34Z

I meant that the test as I posted it works on master and fails on this branch. I opened a PR for the test: gh-17168 (it requires the void special case fix to run successfully). If we merge this, it will at least show what changed.

About the header: IIRC the problem was how to spell PyArrayDescr_Type internally and externally, which I could not think of a nice solution yet (I tried at least twice). There probably is a nice solution, I just didn't stumble on it yet.
I will try another swing...

mattip · 2020-08-26T15:19:23Z

> how to spell PyArrayDescr_Type internally and externally, which I could not think of a nice solution yet (I tried at least twice)

We can leave it for another PR.

seberg · 2020-09-02T18:44:23Z

Rebased. There is a new PR at the end which does the test modifications. I can squash it later, but leaving the old commits untouched is better for review.

mattip · 2020-09-03T06:14:06Z

numpy/core/tests/test_numeric.py

+            if np.promote_types(dtype1, dtype2.kind) == dtype2:
+                res.metadata is None
+            else:
+                res.metadata == metadata2


This is much more straightforward now.

mattip

LGTM.

mattip · 2020-09-22T12:54:28Z

Thanks @seberg

seberg force-pushed the restructure-dtype-promotion branch from 61cdd89 to afb3757 Compare August 21, 2020 22:47

seberg added the labels "01 - Enhancement", "30 - API", "component: numpy._core", "component: numpy.dtype" Aug 21, 2020

seberg force-pushed the restructure-dtype-promotion branch from afb3757 to 24cd2d9 Compare August 26, 2020 00:16

mattip reviewed Aug 26, 2020

mattip mentioned this pull request Sep 2, 2020

TST: Add tests mapping out the rules for metadata in promotion #17168

Merged

seberg added 6 commits September 2, 2020 12:53

MAINT: Move dtype instance to DType class cast

b0dd380

This function is better housed in convert_datatype.c and was only in array_coercion, because we did not use it anywhere else before. This also somewhat modifies the logic and cleans up use-cases of it in array_coercion.c

MAINT: Use existing ensure_dtype_nbo in ufunc resolution

6b1c643

There were two versions of this, since the merger of umath and multiarraymodule, this is unnecessary.

MAINT: Always define default_descr() and simplify code

d9075b7

TST: Test void promotion uses equivalent casting

07c2e66

TST: Adapt metadata-promotion tests to new implementation

b40f6bb

seberg force-pushed the restructure-dtype-promotion branch from 24cd2d9 to b40f6bb Compare September 2, 2020 18:43

mattip reviewed Sep 3, 2020

mattip approved these changes Sep 3, 2020

mattip merged commit 0d25366 into numpy:master Sep 22, 2020
