Structured array drops field-titles when being 'sliced' by field-names #9625
I vaguely recall that field titles are a deprecated feature, so it's not too surprising that they are not well supported.
Are field titles officially or unofficially deprecated? :-) Just in case it's the latter, let me try to advocate for field titles by explaining my use case: I acquire large data series with dozens of fields and find it very useful to have concise field names (which improves code readability) while also having field documentation in human-readable form. It is handy that both names and titles are already defined where the data originates and are simply passed along the pipeline: acquisition --> processing --> storage...
Any consumer or receiver of that data then has the benefit of a fully documented dtype, where the titles document important aspects of a field, such as the physical unit (e.g. mm or inches?). BTW, recreating a NumPy array that has been serialized as shown above takes just a single line of code:
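The single-line snippet referred to above did not survive. Here is a minimal sketch of one way to do it, assuming the dtype is serialized via `dtype.descr` (the field names and titles below are made up for illustration). `descr` keeps each title next to its field name, and `np.dtype()` accepts that list back:

```python
import numpy as np

# Illustrative dtype: each field carries a (title, name) pair.
dt = np.dtype([(('x position / mm', 'x'), '<f8'),
               (('sensor temperature / K', 'T'), '<f4')])

# descr is a plain list of tuples that can be stored alongside the data.
descr = dt.descr

# Rebuilding the dtype from descr restores both names and titles.
restored = np.dtype(descr)
print(restored.names)           # ('x', 'T')
print(restored.fields['x'][2])  # x position / mm
```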
We still support titles, but my impression is that they are sometimes forgotten, so they can be a bit buggy. Multi-field indexing using titles behaves strangely in other ways too:

>>> a = np.zeros(4, dtype=[(('title', 'b'), 'i'), ('c', 'i')])
>>> a[['title', 'c']]
array([(0, 0), (0, 0), (0, 0), (0, 0)],
      dtype=[('title', '<i4'), ('c', '<i4')])
>>> a[['b', 'c']]
array([(0, 0), (0, 0), (0, 0), (0, 0)],
      dtype=[('b', '<i4'), ('c', '<i4')])
>>> a[['title', 'b']]
array([(0, 0), (0, 0), (0, 0), (0, 0)],
      dtype=[('title', '<i4'), ('b', '<i4')])

I've been working on multi-field indexing in another PR (#6053); I'll see if I can get titles to work more sensibly.
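One likely reason titles are easy to forget: `dtype.fields` stores an extra entry keyed by each title, while `dtype.names` lists only the real field names, so code that walks `fields` can visit a titled field twice. A small sketch of reading titles safely by iterating over `names` (the helper `field_titles` is hypothetical, not a NumPy API):

```python
import numpy as np

dt = np.dtype([(('title', 'b'), 'i4'), ('c', 'i4')])

# dt.names lists only the real field names ...
print(dt.names)           # ('b', 'c')

# ... but dt.fields also has a key for every title.
print(sorted(dt.fields))  # ['b', 'c', 'title']

def field_titles(dtype):
    """Hypothetical helper: map each field name to its title (or None)."""
    titles = {}
    for name in dtype.names:
        entry = dtype.fields[name]  # (field dtype, offset) or (field dtype, offset, title)
        titles[name] = entry[2] if len(entry) > 2 else None
    return titles

print(field_titles(dt))   # {'b': 'title', 'c': None}
```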
Also, I have proposed docs for structured arrays that don't deprecate titles, but I do say they are "obsolete" and that "their use is discouraged". See #9056. Is that too strong?
@axeloide, since you are an actual user of titles with multi-field indexing, can you comment on how you think the code in my last example should behave? Here are two possibilities:

Behavior 1 (easier to implement):

>>> a = np.zeros(4, dtype=[(('title', 'b'), 'i'), ('c', 'i')])
>>> a[['title', 'c']]
KeyError: No field named 'title'
>>> a[['b', 'c']]
array([(0, 0), (0, 0), (0, 0), (0, 0)],
      dtype=[(('title', 'b'), '<i4'), ('c', '<i4')])
>>> a[['title', 'b']]
KeyError: No field named 'title'

Behavior 2:

>>> a = np.zeros(4, dtype=[(('title', 'b'), 'i'), ('c', 'i')])
>>> a[['title', 'c']]
array([(0, 0), (0, 0), (0, 0), (0, 0)],
      dtype=[(('title', 'b'), '<i4'), ('c', '<i4')])
>>> a[['b', 'c']]
array([(0, 0), (0, 0), (0, 0), (0, 0)],
      dtype=[(('title', 'b'), '<i4'), ('c', '<i4')])
>>> a[['title', 'b']]
KeyError: duplicate field name 'b'
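A user-level approximation of Behavior 1, as a sketch: `select_fields` is a hypothetical helper, not a NumPy API. It accepts field names only, raises `KeyError` for anything else (including titles), and carries each field's title into the result. Unlike NumPy's own multi-field indexing, it returns a packed copy rather than a view.

```python
import numpy as np

def select_fields(arr, names):
    """Hypothetical helper approximating "Behavior 1"."""
    dt = arr.dtype
    new_fields = []
    for name in names:
        if name not in dt.names:  # titles are not accepted as keys
            raise KeyError(f"No field named {name!r}")
        entry = dt.fields[name]   # (dtype, offset) or (dtype, offset, title)
        fmt = entry[0]
        title = entry[2] if len(entry) > 2 else None
        new_fields.append(((title, name), fmt) if title is not None else (name, fmt))
    # Packed dtype in the requested order; np.dtype rejects duplicate names.
    out = np.empty(arr.shape, dtype=np.dtype(new_fields))
    for name in names:
        out[name] = arr[name]     # copy field by field
    return out

a = np.zeros(4, dtype=[(('title', 'b'), 'i4'), ('c', 'i4')])
sub = select_fields(a, ['b', 'c'])
print(sub.dtype.descr)  # [(('title', 'b'), '<i4'), ('c', '<i4')] on a little-endian machine
```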
Also, one last note: we are planning to merge #6053, which will probably affect you since it changes the way multi-field indexing works. If you are using NumPy 1.13, you should be getting lots of FutureWarnings.
@ahaldane, thanks a lot for addressing this in your commit! 👍 As you probably guessed, I would indeed expect Behavior 1. Making titles also work as indices seems like one feature too many and, in my opinion, violates the principle of "separation of concerns". In my naive understanding, names are unique keys and titles are optional metadata that piggyback on them but serve only documentation purposes. Also, thanks for the FutureWarnings. They already made me check my usages, and I'm OK with views instead of copies; that is actually what I had expected anyway. As for your changes to the docs stating that titles are an obsolete feature: yes, that is way too strong!
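For reference, in NumPy 1.16 and later, indexing with a list of field names returns a view, so writes go through to the original array and the view keeps the original itemsize (skipped fields are left out via offsets). A short sketch with made-up field names; `numpy.lib.recfunctions.repack_fields` (also 1.16+) gives a packed, independent copy when one is needed:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

a = np.zeros(3, dtype=[('x', 'i4'), ('y', 'f4'), ('z', 'f8')])

v = a[['x', 'z']]                             # a view, not a copy
print(np.shares_memory(a, v))                 # True
print(v.dtype.itemsize == a.dtype.itemsize)   # True: 'y' is skipped via offsets

v['x'] = 7                                    # writes through to the original
print(a['x'])                                 # [7 7 7]

packed = rfn.repack_fields(a[['x', 'z']])     # packed copy: 4 + 8 bytes
print(packed.dtype.itemsize)                  # 12
```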
Even if this metadata is useful, I'd argue that it's in the wrong place. I think it should be attached to the field type, not to the field name, so it belongs on the field dtypes rather than on the containing structured dtype:

import numpy as np

# metadata must be a dict; the 'description' key is an arbitrary choice
x_dt = np.dtype(np.float32, metadata={'description': 'elevation / m'})
T_dt = np.dtype(np.float32, metadata={'description': 'temperature / K'})
new_dt = np.dtype([('x', x_dt), ('T', T_dt)])
some_data = np.zeros(3, new_dt)  # placeholder data
some_x = some_data['x']
# some_x still has the metadata attached - using titles forces it to be discarded
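Building on that suggestion, a sketch of how a consumer could read such per-field annotations back from a structured dtype. `field_metadata` and the `'description'` key are hypothetical conventions: NumPy stores dtype metadata but attaches no meaning to its keys, and its docs note that metadata propagation is not well supported, so it is worth checking which operations in your own pipeline keep it.

```python
import numpy as np

def field_metadata(dtype, key='description'):
    """Hypothetical helper: collect one metadata entry per field name."""
    out = {}
    for name in dtype.names:
        meta = dtype.fields[name][0].metadata  # None when no metadata was attached
        out[name] = None if meta is None else meta.get(key)
    return out

x_dt = np.dtype(np.float32, metadata={'description': 'elevation / m'})
t_dt = np.dtype(np.float32, metadata={'description': 'temperature / K'})
new_dt = np.dtype([('x', x_dt), ('T', t_dt), ('flag', np.bool_)])

print(field_metadata(new_dt))
```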
Note also that using `descr` to rebuild a dtype loses information:

>>> dt = np.dtype((int, 3))
>>> dt
dtype(('<i4', (3,)))
>>> np.dtype(dt.descr)
dtype([('f0', 'V12')])

While it seems I'm wrong about
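To make the contrast concrete, a sketch comparing `descr` with pickling. Pickling is a swapped-in technique here, not something proposed in the thread: it preserves the full dtype (subarray shape and titles included) at the cost of being Python-specific. `np.int32` is used instead of `int` so the sizes don't depend on the platform.

```python
import pickle
import numpy as np

dt = np.dtype((np.int32, 3))             # subarray dtype: three int32 per element

lossy = np.dtype(dt.descr)               # round trip through descr
print(lossy)                             # a single opaque void field, shape lost

exact = pickle.loads(pickle.dumps(dt))   # pickling keeps the full dtype
print(exact == dt, exact.shape)          # True (3,)

# Titles survive pickling as well.
titled = np.dtype([(('title', 'b'), 'i4'), ('c', 'i4')])
print(pickle.loads(pickle.dumps(titled)).fields['b'][2])  # title
```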
@eric-wieser I agree with both of your remarks. Having metadata/titles on the field type instead would also be great.
Is the following behaviour of numpy (v1.13.1) a bug or by design?
I would have expected the last expression to have returned this instead:
[(('title 2', 'y'), '>f4'), (('title 1', 'x'), '|i1')]
Why are the field-titles missing on a view obtained by indexing?
It is somewhat surprising that some aspects of the structured-array dtype just get dropped.
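The snippet from the original report did not survive. A minimal reproduction sketch, with the field layout inferred from the expected `descr` quoted above (the array length is arbitrary). It prints what your NumPy version does rather than asserting a particular outcome for the titles. As far as I can tell, on NumPy 1.16+ the indexed result is a view with out-of-order offsets, for which `.descr` raises `ValueError`, so the titles are inspected through `.fields` instead.

```python
import numpy as np

# Field layout inferred from the expected descr quoted above.
dt = np.dtype([(('title 1', 'x'), '|i1'), (('title 2', 'y'), '>f4')])
a = np.zeros(2, dtype=dt)

print(a.dtype.descr)      # the full dtype carries both titles

sub = a[['y', 'x']]       # multi-field index by field name
print(sub.dtype.names)    # ('y', 'x')

# Print each selected field's title, or None if it was dropped.
for name in sub.dtype.names:
    entry = sub.dtype.fields[name]
    print(name, entry[2] if len(entry) > 2 else None)
```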