BUG: add checks for some invalid structured dtypes. Fixes #2865. #8235

J-Sand · 2016-11-04T20:03:06Z

This also fixes a similar bug I found while investigating the issue: with structured dtypes of the form ('i', {'name': ('i', offset, 'optional title')}), if the inner tuple has the wrong number or types of items, a bad Python C-API call is made, resulting in a SystemError.
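As a rough illustration of the kind of check described above, here is a minimal pure-Python sketch that validates one `(dtype, offset, 'optional title')` entry before it is used. The helper name `check_field_spec` and its messages are hypothetical; the actual fix is written in C in descriptor.c and may differ in detail.

```python
def check_field_spec(name, spec):
    """Validate one (dtype, offset[, title]) entry (hypothetical helper).

    Raises ValueError for a malformed entry so that a bad value never
    reaches lower-level code.
    """
    # The entry must be a tuple with two or three items.
    if not isinstance(spec, tuple) or len(spec) not in (2, 3):
        raise ValueError(
            f"field {name!r}: expected (dtype, offset[, title]), got {spec!r}"
        )
    # The second item is the byte offset and must be an integer.
    if not isinstance(spec[1], int):
        raise ValueError(f"field {name!r}: offset must be an integer")
    return spec


# Try one well-formed entry and two malformed ones.
outcomes = []
for spec in [("i", 0, "title"), ("i", 0, "title", "oops"), ("i", "wrongtype", "title")]:
    try:
        check_field_spec("name", spec)
        outcomes.append("ok")
    except ValueError:
        outcomes.append("ValueError")

print(outcomes)
```

The point of the sketch is only that both failure modes (wrong item count, wrong item type) end in an ordinary ValueError rather than an internal error.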

I'm completely new to the numpy source, so I may have made some mistakes. In particular, I wasn't certain how to deal with a dtype like ('O', [('name', 'O')]). As far as I can tell, it's the only structured dtype of this form containing an object dtype that currently works, so I've allowed it as a special case. On the other hand, it seems pretty useless.

charris · 2016-11-05T16:31:32Z

@ahaldane Heads up.

ahaldane

Overall, nice catch and solution!

ahaldane · 2016-11-07T03:18:34Z

numpy/core/src/multiarray/descriptor.c

@@ -287,7 +287,7 @@ _convert_from_tuple(PyObject *obj)
            type->elsize = itemsize;
        }
    }
-    else if (PyDict_Check(val) || PyDictProxy_Check(val)) {
+    else if (type->metadata && (PyDict_Check(val) || PyDictProxy_Check(val))) {


This is fine, but I just want to note for posterity that this whole if-block is weird undocumented behavior (as are a number of things related to metadata).

(Actually, conceivably the "correct" behavior here if type->metadata == NULL might be to do type->metadata = val, e.g. compare to the end of convert_from_dict. But that's not clear.)

In the future we might consider removing the block anyway, so I'm fine with this as-is.

ahaldane · 2016-11-07T03:24:01Z

numpy/core/src/multiarray/descriptor.c

+                goto fail;
+            }
+        }
+        else {


Can this whole else block be replaced by

if (PyDataType_REFCHK(conv)) { goto fail; }

?

That would also catch the case where there are objects in nested structures.
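To see why a whole-descriptor check matters for nested structures, here is a toy pure-Python model. It is not how PyDataType_REFCHK is implemented; `contains_object` and the nested-list spec format are assumptions made up for this sketch.

```python
def contains_object(spec):
    """Return True if an object ('O') appears anywhere inside a dtype spec.

    `spec` is either a type-code string such as 'O' or 'i8', or a list of
    (field_name, sub_spec) pairs describing a structure.
    """
    if isinstance(spec, str):
        return spec == "O"
    # Recurse into every field so objects in nested structures are found.
    return any(contains_object(sub) for _name, sub in spec)


def top_level_has_object(spec):
    """Naive variant that only inspects the immediate fields."""
    if isinstance(spec, str):
        return spec == "O"
    return any(sub == "O" for _name, sub in spec)


nested = [("name", [("name", "O")])]
print(contains_object(nested), top_level_has_object(nested))
```

The naive variant misses the object buried one level down, which is the case the single recursive check is meant to cover.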

ahaldane · 2016-11-07T03:31:51Z

numpy/core/tests/test_regression.py

+        assert_raises(ValueError, np.dtype,
+                      ('i', {'name': ('i', 'wrongtype', 'title')}))
+        # this one is allowed as a special case though
+        a = np.ones(1, dtype=('O', [('name', 'O')]))


Is it important that this example works? If not, I would be tempted to altogether disallow the (base_dtype, new_dtype) form of specification if either base_dtype or new_dtype have objects (tested with PyDataType_REFCHK).

Note that similar operations are already disallowed:

>>> np.zeros(3, dtype='O').view(np.dtype([('name', 'O')]))
TypeError: Cannot change data-type for object array.

That wouldn't allow your special case though.

J-Sand · 2016-11-23

Thanks for the feedback. I noticed that test_dtype.py already has a test that requires that this case works:

def test_base_dtype_with_object_type(self):
    # Issue gh-2798, should not error.
    np.array(['a'], dtype="O").astype(("O", [("name", "O")]))

There is a discussion at #2798, but I don't really understand the rationale. Hmm, I also notice that you can currently have a dtype with object fields in the first element of the tuple, like this:

>>> np.zeros(1, dtype=('O,O', 'O,i8'))
array([(0, 0)], dtype=[('f0', 'O'), ('f1', '<i8')])

or even multiple fields in the first element and just 'O' in the second one, like this:

>>> np.zeros(1, dtype=('i4,i4', 'O'))
array([(0, 0)], dtype=[('f0', '<i4'), ('f1', '<i4')])

but this seems especially useless, since one of the two dtypes in the tuple is just ignored (as long as it has the correct size), even if it doesn't make sense to map it to the other one.

So should we allow all of the dtypes that currently work just in case somebody is using them for some reason, or just ones that kind of make sense (e.g. ('O,O', 'O,O')), or just the one that is already tested for, or none of them?

ahaldane · 2016-11-23

You're right, #2798 suggests that some people (e.g. h5py) have wanted to be able to create dtype(("O", [("name", "O")])), in order to add a field name. I would have liked to simply disallow that, but maybe we can't.

It's probably safe to assume no one is doing anything like dtype(('O,O','O,i8')), so my suggestion is to disallow all cases involving objects (by REFCHK on both elements of the tuple) except the special case you already have for a single field.

I am a little tempted to add a deprecation warning for a release cycle, but it's probably such a rare problem that I would rather just go ahead with the change. But since it can break code, we should clearly document it. So please add something to doc/release/1.13.0-notes.rst describing what will break.

By the way, the example causing problems in h5py is in h5py/h5py#217. I tried the example, but it looks like it no longer involves the special case dtype. Also, incidentally, after #6053 gets in they will be able to write np.zeros(3, dtype='O').astype([('name', 'O')]) instead (this is currently disallowed). So there will be a better syntax for adding a field.

Also, by the way, I once wrote a lot of code in #5548 (in _check_field_overlap) to carefully check whether views like 'O,O' to 'O,i8' would clobber pointers or not, by searching out the positions of all the pointers. But it was a huge complicated affair we got rid of later because it was too slow. That's why I favor the stupid solution of simply disallowing views involving objects.
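The pointer-clobbering concern can be illustrated with a small pure-Python sketch. This is not the removed `_check_field_overlap` code; the function names, the packed (unaligned) layout, and the 8-byte pointer size are all assumptions chosen for the example.

```python
# Assumed item sizes in bytes; 'O' is a pointer on a 64-bit build.
ITEMSIZE = {"O": 8, "i4": 4, "i8": 8}


def pointer_offsets(codes):
    """Byte offsets of object pointers in a packed record with these fields."""
    offsets, pos = set(), 0
    for code in codes:
        if code == "O":
            offsets.add(pos)
        pos += ITEMSIZE[code]
    return offsets


def view_is_pointer_safe(old, new):
    """A view is treated as safe only if both layouts hold pointers at the same offsets."""
    return pointer_offsets(old) == pointer_offsets(new)


# Viewing 'O,O' as 'O,i8' exposes the second pointer as a plain integer.
print(view_is_pointer_safe(["O", "O"], ["O", "i8"]))
# Identical pointer layouts are fine.
print(view_is_pointer_safe(["O", "O"], ["O", "O"]))
```

Even this simplified version has to walk every field of both layouts, which hints at why the full check was expensive and why rejecting all object-containing views is the cheaper rule.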

charris · 2016-11-23T18:36:15Z

For future reference, it is preferable to put the Fixes ... comment in the commit message body. Maybe we need to update some documentation somewhere...

J-Sand · 2016-11-24T00:37:41Z

OK, I think I've covered everything, though I was slightly unsure where to put the release note.

ahaldane

If you address the two comments and squash I think it's good to go.

ahaldane · 2016-11-24T17:45:20Z

numpy/core/src/multiarray/descriptor.c

+ * people have been using to add a field to an object array without fields
+ */
+static int
+validate_structured_object_dtype(PyArray_Descr *new, PyArray_Descr *conv)


I might like a less general function name. Maybe invalid_union_object_dtype?

ahaldane · 2016-11-24T17:45:41Z

numpy/core/src/multiarray/descriptor.c

+    if (!PyDataType_REFCHK(new) && !PyDataType_REFCHK(conv)) {
+        return 0;
+    }
+    if (PyDataType_HASFIELDS(new)) {


I think by organizing the tests this way you are trying to allow cases like np.dtype(('i4', 'O')). But I think we should rule that out too: Consider the strange fact that np.dtype(('i8', 'O')).hasobject is True.

Maybe reorganize the if-statements here a little, e.g. start with if (new->kind != 'O'), then if (!PyDataType_HASFIELDS(conv)), then if (PyTuple_GET_SIZE(names) != 1) and so on, and fail if any is true.
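The suggested ordering of checks can be modelled in pure Python as follows. This is a sketch of the review suggestion, not the merged C code: the `Descr` class is a hypothetical stand-in for a dtype descriptor, and keeping the early return for object-free pairs is an assumption carried over from the hunk under review.

```python
from dataclasses import dataclass, field


@dataclass
class Descr:
    """Toy stand-in for a dtype descriptor (hypothetical, not NumPy's struct)."""
    kind: str                                   # e.g. 'O', 'i', or 'V' for structured
    fields: list = field(default_factory=list)  # list of (name, Descr) pairs

    def has_object(self):
        # Objects count whether they are at the top level or nested in fields.
        return self.kind == "O" or any(d.has_object() for _n, d in self.fields)


def invalid_union_object_dtype(new, conv):
    """Return True if the (new, conv) pair should be rejected."""
    if not new.has_object() and not conv.has_object():
        return False              # no objects involved: nothing to check
    if new.kind != "O":
        return True               # base dtype must be a plain object dtype
    if not conv.fields:
        return True               # second dtype must be structured
    if len(conv.fields) != 1:
        return True               # with exactly one field
    _name, only = conv.fields[0]
    return only.kind != "O"       # and that field must itself be an object


obj, i8 = Descr("O"), Descr("i")
print(invalid_union_object_dtype(obj, Descr("V", [("name", obj)])))  # the special case
print(invalid_union_object_dtype(i8, obj))                           # like ('i8', 'O')
```

With this ordering, a pair such as ('i8', 'O') fails at the first kind check, and only the single-object-field special case passes once objects are involved.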

ahaldane · 2016-11-25T15:31:20Z

Looks good. I'll merge in a few minutes.

Thanks @J-Sand !

charris added 00 - Bug component: numpy._core labels Nov 4, 2016

ahaldane reviewed Nov 7, 2016

J-Sand force-pushed the invalid-structured-dtypes-fix branch from a91be86 to cb65c8a Compare November 24, 2016 00:17

ahaldane reviewed Nov 24, 2016

BUG: add checks for some invalid structured dtypes.

9fe73dd

Fixes numpy#2865.

J-Sand force-pushed the invalid-structured-dtypes-fix branch from 452061b to 9fe73dd Compare November 24, 2016 21:02

ahaldane merged commit e80b948 into numpy:master Nov 25, 2016

ahaldane mentioned this pull request Nov 25, 2016

Segfault from using probably invalid dtype #2865

Closed

homu mentioned this pull request Nov 25, 2016

BUG: void .item() doesn't hold reference to original array #8157

Merged

J-Sand deleted the invalid-structured-dtypes-fix branch November 29, 2016 02:19

dmbelov mentioned this pull request Apr 13, 2020

ENH: better handle dtype creation with metadata #15962

Closed
