Unexpected compound datatype construction #15638
The listing of how the
>>> my_type_specification = (('one', int), ('two', float))
>>> dt = np.dtype(my_type_specification)
TypeError Traceback (most recent call last)
<ipython-input-6-dc4df5e4645d> in <module>
----> 1 dt = np.dtype(my_type_specification)
TypeError: data type 'one' not understood
This warrants a closer look - thanks for reporting! |
There are similar examples of my second form in the docs where union types are discussed; it is noted, though:
|
Good point, that's another Note that
# Specifying a structured dtype with a list
>>> type_specifier = [('time', float), ('temp', float), ('alt', int)]
>>> my_dtype = np.dtype(type_specifier)
>>> my_dtype
dtype([('time', '<f8'), ('temp', '<f8'), ('alt', '<i8')])
>>> my_dtype = np.dtype(tuple(type_specifier))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-61-380fa230aff7> in <module>
----> 1 my_dtype = np.dtype(tuple(type_specifier))
TypeError: Tuple must have size 2, but has size 3
This is the expected behavior. The problem arises when tuples with
bad_type_specifier = (('int', int), ('float', float))
# This *should* raise a TypeError as the type specifier doesn't
# match one of the accepted input specifications for tuples
my_dtype = np.dtype(bad_type_specifier)
I believe the bug originates in the input checking here: numpy/numpy/core/src/multiarray/descriptor.c Line 242 in 55cce7d
|
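For anyone hitting this in the meantime, one way to sidestep the ambiguity is to normalize the field specification to a list before handing it to np.dtype, so the outer container is never a tuple. This is only a minimal sketch; as_structured_dtype is a hypothetical helper name and not part of NumPy:

```python
import numpy as np


def as_structured_dtype(fields):
    """Build a structured dtype from any iterable of (name, type) pairs.

    Hypothetical helper, not a NumPy API. The pairs are copied into a list
    first, so np.dtype never sees an outer tuple and the specification
    cannot be mistaken for one of the 2-tuple forms.
    """
    pairs = []
    for item in fields:
        # Each field must itself be a (name, type) or (name, type, shape) tuple.
        if not isinstance(item, tuple) or len(item) not in (2, 3):
            raise TypeError(
                f"expected a (name, type[, shape]) tuple, got {item!r}"
            )
        pairs.append(item)
    return np.dtype(pairs)


spec = (("one", int), ("two", float))  # tuple of tuples, as in the report
dt = as_structured_dtype(spec)
print(dt.names)  # ('one', 'two')
```

The helper rejects anything whose items are not tuples, so a bare pair such as ("one", int) fails loudly instead of being read as a single 2-tuple specification.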
Maybe something like this could work to make things more strict:
diff --git a/numpy/core/src/multiarray/descriptor.c b/numpy/core/src/multiarray/descriptor.c
index eb4f68959..faa416231 100644
--- a/numpy/core/src/multiarray/descriptor.c
+++ b/numpy/core/src/multiarray/descriptor.c
@@ -857,16 +857,27 @@ _try_convert_from_inherit_tuple(PyArray_Descr *type, PyObject *newobj)
     if (new == NULL) {
         goto fail;
     }
+    if (_validate_union_object_dtype(new, conv) < 0) {
+        Py_DECREF(new);
+        goto fail;
+    }
+    else if (!PyDataType_HASFIELDS(conv)) {
+        PyErr_SetString(PyExc_TypeError,
+                "Only structured dtypes can be used as a union base.");
+        goto fail;
+    }
+    else if (PyDataType_HASFIELDS(new) || PyDataType_ISUSERDEF(new)) {
+        PyErr_SetString(PyExc_TypeError,
+                "Can only create union dtype for basic NumPy dtypes and not "
+                "structured ones.");
+        goto fail;
+    }
     if (PyDataType_ISUNSIZED(new)) {
         new->elsize = conv->elsize;
     }
     else if (new->elsize != conv->elsize) {
         PyErr_SetString(PyExc_ValueError,
-                "mismatch in size of old and new data-descriptor");
-        Py_DECREF(new);
-        goto fail;
-    }
-    else if (_validate_union_object_dtype(new, conv) < 0) {
+                    "mismatch in size of old and new data-descriptor");
         Py_DECREF(new);
         goto fail;
     } |
Thanks for the time spent chasing this down @seberg. Something along the lines you proposed, plus additional tests, seems like a good approach (plus some extra time for fully grokking the change to the logic :) ). I'd also advocate adding an example to the docstring to highlight the "special" behavior of len-2 tuples for specifying dtypes. |
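A docstring example along these lines might look something like the following. It is only a sketch of what could be shown, not proposed wording; it sticks to the documented (base_dtype, shape) and (flexible_dtype, itemsize) forms and contrasts them with the list form used for structured dtypes:

```python
import numpy as np

# A 2-tuple of (base_dtype, shape) is read as a subarray dtype,
# not as a single (name, type) field.
sub = np.dtype((np.int32, (2, 3)))
print(sub.base, sub.shape)  # int32 (2, 3)

# A 2-tuple of (flexible_dtype, itemsize) sizes a flexible dtype;
# here, a unicode string of 10 characters (4 bytes per character).
text = np.dtype(("U", 10))
print(text.kind, text.itemsize)  # U 40

# A *list* of (name, type) pairs is the structured-dtype form.
rec = np.dtype([("one", int), ("two", float)])
print(rec.names)  # ('one', 'two')
```

The point of putting the three side by side is that the outer container decides the interpretation: a tuple of length 2 is one of the special forms, while a list is always a sequence of fields.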
I think we should do that. Not limiting user datatypes would be possible, but I am currently not sure what the point of an |
A deprecation might have been nice, but overall, I think that we should remove any union/compound dtype construction (except
Since this is weird, and removing it should allow simplifications and just make code paths less brittle, we should disallow this for NumPy 2.0. (TBH, I would be surprised if this isn't buggy in places.) |
@rossbar do you remember this a bit more? I dug a bit into this, and it sounds like h5py at least at some point was using these. Maybe they stopped, maybe not but... It makes me think though that it would maybe still be good to force more sanity here. I could imagine mainly:
A |
For context, I was using h5py a lot three years ago which is probably how I came across this; though I really don't recall! |
When constructing compound types, giving a tuple of tuples instead of a list of tuples (as is common in the docs) leads to unexpected results.
Reproducing code example:
On my system this is producing: