Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API,MAINT: Rewrite promotion using common DType and common instance #17137

Merged
merged 6 commits into from
Sep 22, 2020

Conversation

seberg
Copy link
Member

@seberg seberg commented Aug 21, 2020

This defines common_dtype and common_instance (only for parametric
DTypes), and uses them to implement the PyArray_CommonDType operation.

PyArray_CommonDType() together with the common_instance method then define the existing PromoteTypes.

This does not (yet) affect "value based promotion" as defined by
PyArray_ResultType(). We also require the step of casting
to the common DType to define this type of example:

np.promote_types("S1", "i8") == np.dtype('S21')

This steps requires finding the string length corresponding to
the integer (21 characters). This is here handled by the
PyArray_CastDescrToDType function. However, that function
still relies on PyArray_AdaptFlexibleDType and thus does not
generalize to arbitrary DTypes.

See NEP 42 (currently "Common DType Operations" section):
https://numpy.org/neps/nep-0042-new-dtypes.html#common-dtype-operations


The first three commits separate out well (I can separate out the first three to a different PR if it helps). A side-by-side view of the diffs is probably much better (at least for the last commit), since the diff is too large.

I moved the one common_dtype function for user-dtypes to the usertypes.c file, but much of that will need some revision in the future to make the DTypeMeta creation more structured. That block of code is

static PyArray_Descr *
string_unicode_common_instance(PyArray_Descr *descr1, PyArray_Descr *descr2)
{
if (descr1->elsize >= descr2->elsize) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this to prefer the first dtype. Doesn't change anything except which identity gets returned, which also means the "metadata" that is returned is different.

This is a tiny change here: Since the other dtype gets cast first and then we call "common instance", the metadata and instance which is returned can differ from the one that was returned before.

@seberg
Copy link
Member Author

seberg commented Aug 26, 2020

I have a test for metadata preservation, which maps out the current (strange) space. The changes I see are:

  1. In some cases when promoting non-string and strings, we would never preserve metadata. This code does.
  2. User dtype did preserve metadata, but builtin types often drop it. I currently preserve the dropping behaviour of builtin types, which means that user dtypes start dropping metadata.

I am not convinced that it is worth bothering, since there are no consistent rules for metadata preservation. We could make such rules at some point, and I don't think this is an issue in that regard.

@seberg
Copy link
Member Author

seberg commented Aug 26, 2020

To be precise, the test which I added to test_numeric.py is below. Note that large if/elif/... block to try to map out everything. That version is on master, I am happy to put that into master, and then adjust it in this PR to make things work?

    @pytest.mark.parametrize(["dtype1", "dtype2"],
            itertools.product(
                list(np.typecodes["All"]) +
                ["i,i", "S3", "S100", "U3", "U100", rational],
                repeat=2))
    def test_promote_types_metadata(self, dtype1, dtype2):
        """Metadata handling in promotion does not appear formalized
        right now in NumPy. This test should thus be considered to
        document behaviour, rather than test the correct definition of it.

        This test is very ugly, it was useful for rewriting part of the
        promotion, but probably should eventually be replaced/deleted
        (i.e. when metadata handling in promotion is better defined).
        """
        metadata1 = {1: 1}
        metadata2 = {2: 2}
        dtype1 = np.dtype(dtype1, metadata=metadata1)
        dtype2 = np.dtype(dtype2, metadata=metadata2)

        # Identical dtypes always preserve metadata (if not byteswapped):
        res = np.promote_types(dtype1, dtype1)
        assert res.metadata == dtype1.metadata
        assert res.isnative  # result must be native

        try:
            res = np.promote_types(dtype1, dtype2)
        except TypeError:
            # Promotion failed, this test only checks metadata
            return
        assert res.isnative

        # The rules for when metadata is preserved and which dtypes metadta
        # will be used are very confusing and depend on multiple paths.
        # This long if statement attempts to reproduce this:
        if dtype1.type is rational or dtype2.type is rational:
            # User dtype promotion preserves byte-order here:
            if np.can_cast(res, dtype1):
                assert res.metadata == dtype1.metadata
            else:
                assert res.metadata == dtype2.metadata

        elif res.char in "?bhilqpBHILQPefdgFDGOmM":
            # All simple types lose metadata (due to using promotion table):
            assert res.metadata is None
        elif res.kind in "SU" and dtype1 == dtype2:
            # Strings give precedence to the second dtype:
            assert res is dtype2
        elif res == dtype1:
            # If one result is the result, it is usually returned unchanged:
            assert res is dtype1
        elif res == dtype2:
            # If one result is the result, it is usually returned unchanged:
            assert res is dtype2
        elif dtype1.kind == "S" and dtype2.kind == "U":
            # Promotion creates a new unicode dtype from scratch
            assert res.metadata is None
        elif dtype1.kind == "U" and dtype2.kind == "S":
            # Promotion creates a new unicode dtype from scratch
            assert res.metadata is None
        elif res.kind in "SU" and dtype2.kind != res.kind:
            # We build on top of dtype1:
            assert res.metadata == dtype1.metadata
        elif res.kind in "SU" and res.kind == dtype1.kind:
            assert res.metadata == dtype1.metadata
        elif res.kind in "SU" and res.kind == dtype2.kind:
            assert res.metadata == dtype2.metadata
        else:
            assert res.metadata is None

        # Try again for byteswapped version
        dtype1 = dtype1.newbyteorder()
        assert dtype1.metadata == metadata1
        res_bs = np.promote_types(dtype1, dtype2)
        if res_bs.names is not None:
            # Structured promotion doesn't remove byteswap:
            assert res_bs.newbyteorder() == res
        else:
            assert res_bs == res
        assert res_bs.metadata == res.metadata

Copy link
Member

@mattip mattip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great step forward.

That version is on master

Not clear what you mean here: the test in the detail passes on master? In any case, if handling metadata can be split out and merged first that sounds good.

@@ -1894,6 +1898,8 @@ typedef void (PyDataMem_EventHookFunc)(void *inp, void *outp, size_t size,
discover_descr_from_pyobject_function *discover_descr_from_pyobject;
is_known_scalar_type_function *is_known_scalar_type;
default_descr_function *default_descr;
common_dtype_function *common_dtype;
common_instance_function *common_instance;
};

#endif /* NPY_INTERNAL_BUILD */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a follow-on PR it would be nice to move this out of public headers, it is guarded by NPY_INTERNAL_BUILD anyway.

@seberg
Copy link
Member Author

seberg commented Aug 26, 2020

I meant the test as I posted is works on master, and fails on this branch, I opened a PR here for the test. gh-17168 (it requires the void special case fix to run successfully). If we merge this, it will at least show what changed.

About the header: IIRC the problem was how to spell PyArrayDescr_Type internally and externally, which I could not think of a nice solution yet (I tried at least twice). There probably is a nice solution, I just didn't stumble on it yet.
I will try another swing...

@mattip
Copy link
Member

mattip commented Aug 26, 2020

how to spell PyArrayDescr_Type internally and externally, which I could not think of a nice solution yet (I tried at least twice)

We can leave it for another PR.

This function is better housed in convert_datatype.c and was only
in array_coercion, because we did not use it anywhere else before.

This also somewhat modifies the logic and cleans up use-cases of
it in array_coercion.c
There were two versions of this, since the merger of umath and
multiarraymodule, this is unnecessary.
This defines `common_dtype` and `common_instance` (only for parametric
DTypes), and uses them to implement the `PyArray_CommonDType` operation.

`PyArray_CommonDType()` together with the `common_instance` method
then define the existing PromoteTypes.

This does not (yet) affect "value based promotion" as defined by
`PyArray_ResultType()`.  We also require the step of casting
to the common DType to define this type of example:
```
np.promote_types("S1", "i8") == np.dtype('S21')
```
This steps requires finding the string length corresponding to
the integer (21 characters).  This is here handled by the
`PyArray_CastDescrToDType` function.  However, that function
still relies on `PyArray_AdaptFlexibleDType` and thus does not
generalize to arbitrary DTypes.

See NEP 42 (currently "Common DType Operations" section):
https://numpy.org/neps/nep-0042-new-dtypes.html#common-dtype-operations
@seberg
Copy link
Member Author

seberg commented Sep 2, 2020

Rebased. There is a new PR at the end which does the test modifications. I can squash it later, but leaving the old commits untouched is better for review.

if np.promote_types(dtype1, dtype2.kind) == dtype2:
res.metadata is None
else:
res.metadata == metadata2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is much more straight-forward now

Copy link
Member

@mattip mattip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@mattip mattip merged commit 0d25366 into numpy:master Sep 22, 2020
@mattip
Copy link
Member

mattip commented Sep 22, 2020

Thanks @seberg

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants