Structured array drops field-titles when being 'sliced' by field-names #9625

axeloide · 2017-08-29T23:09:27Z

Is the following behaviour of numpy (v1.13.1) a bug or by design?

>>> import numpy as np
>>> a = np.zeros((1,), dtype=[(('title 1', 'x'), '|i1'), (('title 2', 'y'), '>f4')])

>>> a.dtype.descr
[(('title 1', 'x'), '|i1'), (('title 2', 'y'), '>f4')]

>>> a[['y','x']].dtype.descr
[('y', '>f4'), ('x', '|i1')]

I would have expected the last expression to have returned this instead:
[(('title 2', 'y'), '>f4'), (('title 1', 'x'), '|i1')]

Why are the field-titles missing on a view obtained by indexing?
It is kind of surprising that some aspects of the structured-array dtype just get dropped.


eric-wieser · 2017-08-30T22:52:02Z

I vaguely recall that field titles are a deprecated feature, so it's not too surprising that they are not well supported.

axeloide · 2017-08-31T11:42:20Z

Are field-titles officially or unofficially deprecated? :-)

Just in case it's the latter, let me try to advocate for field-titles, by explaining my use-case:

I acquire large data series with dozens of fields and find it very useful to have concise field names (which improves code readability) while also having field documentation in human-readable form. It is handy that both names and titles are already defined where the data originates and simply get passed along through the pipeline: acquisition-->processing-->storage...
Losing titles along that pipeline is especially inconvenient, because they tend to become more useful the further down the pipeline they travel, e.g. for numpy arrays that have been serialized for storage or exchange. As an example: it takes just these two lines of code to get any ndarray persisted to a database or sent over the wire:

serialized_array.dtype = repr(numpy_array.dtype.descr)
serialized_array.buffer = numpy_array.tostring()

And any consumer/receiver of that data has the benefit of a fully documented dtype, where the titles document important aspects of a field, like the physical unit (e.g.: mm or inches?)

BTW: Recreating a numpy array that has been serialized as shown above is done in just a single line of code:
numpy_array = np.frombuffer(serialized_array.buffer, dtype=eval(serialized_array.dtype))
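A minimal, self-contained sketch of the round trip described above, for readers who want to try it. It makes two substitutions that are not part of the original comment: `ast.literal_eval` stands in for `eval` (it parses only Python literals, which is all a `descr` list contains), and `tobytes()` stands in for `tostring()`, which was later deprecated. The variable names `dtype_text` and `payload` are illustrative only.

```python
import ast

import numpy as np

# A small array with titled fields, same dtype as in the original report.
a = np.zeros((2,), dtype=[(('title 1', 'x'), '|i1'), (('title 2', 'y'), '>f4')])
a['x'] = [1, 2]
a['y'] = [0.5, 1.5]

# Serialize: the dtype description as text, the data as raw bytes.
dtype_text = repr(a.dtype.descr)
payload = a.tobytes()

# Deserialize: parse the description as a literal, then reinterpret the bytes.
restored_dtype = np.dtype(ast.literal_eval(dtype_text))
restored = np.frombuffer(payload, dtype=restored_dtype)

print(restored.dtype.descr)
print(restored)
```

This only works for dtypes whose `descr` round-trips; simple packed structured dtypes like this one do, but more exotic dtypes may not.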

ahaldane · 2017-09-03T23:38:05Z

We still support titles, but my impression is they are sometimes forgotten so they can be a bit buggy.

Multi-field indexing using titles behaves strangely in other ways too:

>>> a = np.zeros(4, dtype=[(('title', 'b'), 'i'), ('c', 'i')])
>>> a[['title', 'c']]
array([(0, 0), (0, 0), (0, 0), (0, 0)],
      dtype=[('title', '<i4'), ('c', '<i4')])
>>> a[['b', 'c']]
array([(0, 0), (0, 0), (0, 0), (0, 0)],
      dtype=[('b', '<i4'), ('c', '<i4')])
>>> a[['title', 'b']]
array([(0, 0), (0, 0), (0, 0), (0, 0)],
      dtype=[('title', '<i4'), ('b', '<i4')])

I've been working on multi-field indexing in another PR (#6053); I'll see if I can get titles to work more sensibly.
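One way to see why a title can be used as an index at all is to look at how the dtype stores it. The sketch below is an illustration rather than a description of the indexing code itself: it inspects `dtype.names` and `dtype.fields` for the dtype used above. The dictionary `titles_by_name` is a hypothetical helper variable, not a NumPy attribute.

```python
import numpy as np

dt = np.dtype([(('title', 'b'), 'i'), ('c', 'i')])

# Only real field names appear in .names ...
print(dt.names)

# ... but .fields has an extra key for every title, pointing at the same entry.
print(sorted(dt.fields))
print(dt.fields['title'] == dt.fields['b'])

# A titled field's entry is (dtype, offset, title); an untitled one is (dtype, offset).
titles_by_name = {
    name: (dt.fields[name][2] if len(dt.fields[name]) > 2 else None)
    for name in dt.names
}
print(titles_by_name)
```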

ahaldane · 2017-09-03T23:39:58Z

Also, I have proposed docs for structured arrays which don't deprecate titles, but I do say they are "obsolete" and that "their use is discouraged". See #9056. Is that too strong?

ahaldane · 2017-09-03T23:49:35Z

@axeloide, since you are an actual user of titles with multi-field indexing, can you comment on how you think the code in my last example should behave?

Here are two possibilities:

Behavior 1 (easier to implement):

>>> a = np.zeros(4, dtype=[(('title', 'b'), 'i'), ('c', 'i')])
>>> a[['title', 'c']]
KeyError: No field named 'title'
>>> a[['b', 'c']]
array([(0, 0), (0, 0), (0, 0), (0, 0)],
      dtype=[(('title', 'b'), '<i4'), ('c', '<i4')])
>>> a[['title', 'b']]
KeyError: No field named 'title'

Behavior 2:

>>> a = np.zeros(4, dtype=[(('title', 'b'), 'i'), ('c', 'i')])
>>> a[['title', 'c']]
array([(0, 0), (0, 0), (0, 0), (0, 0)],
      dtype=[(('title', 'b'), '<i4'), ('c', '<i4')])
>>> a[['b', 'c']]
array([(0, 0), (0, 0), (0, 0), (0, 0)],
      dtype=[(('title', 'b'), '<i4'), ('c', '<i4')])
>>> a[['title', 'b']]
KeyError: duplicate field name 'b'

ahaldane · 2017-09-03T23:53:50Z

Also, one last note: We are planning to merge #6053 which will probably affect you, since it changes the way multi-field indexing works. If you are using numpy 1.13 you should be getting lots of FutureWarnings about it.

axeloide · 2017-09-04T07:05:04Z

@ahaldane, thanks a lot for addressing this in your commit! 👍

As you probably have guessed, I indeed would expect behaviour 1. Making titles also work as indices seems like one feature too much and violates in my opinion the principle of "separation of concerns". In my naive understanding names are unique keys and titles are optional meta-data that go piggy-back on them, but only serve documentation purposes.

Also thanks for the "FutureWarnings". They already made me check my usages and I'm OK with views instead of copies, it is actually what I had expected anyway.

As for your changes to the docs stating that titles are an obsolete feature: yes, that is way too strong!
I really consider titles a very useful feature, so much so, that I invested more effort in explaining why here: https://stackoverflow.com/q/45939506/2239469
I'd be willing to contribute to the documentation of titles: I find the pyplot example illustrative (title as axis labels).
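For anyone stuck on a NumPy version where multi-field indexing drops titles, a possible workaround is to build the sub-dtype by hand and take a view. This is only a sketch under the "names are keys, titles are metadata" model described above; `select_fields_keep_titles` is a hypothetical helper, not a NumPy function, and it is not the fix that went into NumPy.

```python
import numpy as np


def select_fields_keep_titles(arr, names):
    """Return a view of `arr` restricted to `names`, carrying titles along."""
    formats, offsets, titles = [], [], []
    for name in names:
        info = arr.dtype.fields[name]  # (dtype, offset) or (dtype, offset, title)
        formats.append(info[0])
        offsets.append(info[1])
        titles.append(info[2] if len(info) == 3 else None)
    # Keep the original itemsize so the view lines up with the parent's memory.
    sub_dtype = np.dtype({
        'names': list(names),
        'formats': formats,
        'offsets': offsets,
        'titles': titles,
        'itemsize': arr.dtype.itemsize,
    })
    return arr.view(sub_dtype)


a = np.zeros((3,), dtype=[(('title 1', 'x'), '|i1'), (('title 2', 'y'), '>f4')])
a['x'] = [1, 2, 3]
a['y'] = [1.5, 2.5, 3.5]

sub = select_fields_keep_titles(a, ['y', 'x'])
print(sub.dtype.names)
print(sub.dtype.fields['y'][2], '|', sub.dtype.fields['x'][2])
```

Because the result is a view, writing to `sub` also changes `a`.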

eric-wieser · 2017-09-04T08:05:55Z

> I really consider titles a very useful feature, so much so, that I invested more effort in explaining why here:

Even if this metadata is useful, I'd argue that it's in the wrong place - I think it should be attached to the field type, not to the field name; so it belongs on the dtypes, not the containing np.void. So for instance, I'd favor something like:

x_dt = np.dtype(np.float32, metadata='elevation / m')
T_dt = np.dtype(np.float32, metadata='temperature / K')
new_dt = np.dtype([('x', x_dt), ('T', T_dt)])


some_data = np.array(..., new_dt)
some_x = some_data['x']

# some_x still has the metadata attached - using titles forces it to be discarded
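The snippet above is written as a proposal ("something like"). As far as I know, `np.dtype` does accept a `metadata` argument, but it expects a dict rather than a plain string. Below is a runnable variant under that assumption; the key `'description'` is an arbitrary choice for illustration. NumPy does not promise that dtype metadata survives every operation, so treat this as a sketch rather than a guarantee.

```python
import numpy as np

# Attach the documentation to the field *types* instead of the field names.
x_dt = np.dtype(np.float32, metadata={'description': 'elevation / m'})
T_dt = np.dtype(np.float32, metadata={'description': 'temperature / K'})
new_dt = np.dtype([('x', x_dt), ('T', T_dt)])

some_data = np.zeros(3, dtype=new_dt)

# The per-field dtypes stored in the structured dtype still carry the metadata.
for name in new_dt.names:
    print(name, '->', new_dt[name].metadata['description'])

# Inspect what a single-field view reports (no particular outcome is promised).
some_x = some_data['x']
print(some_x.dtype.metadata)
```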

eric-wieser · 2017-09-04T08:13:46Z

Note also that using .descr is pretty fragile, and fails to roundtrip on more exotic dtypes:

>>> dt = np.dtype((int, 3))
>>> dt
dtype(('<i4', (3,)))
>>> np.dtype(dt.descr)
dtype([('f0', 'V12')])

While it seems I'm wrong about titles being obsolete (and was just remembering @ahaldane's PR), I would definitely recommend avoiding dtype.descr, and given that all internal uses are slowly being removed to fix bugs, would consider it obsolete.

axeloide · 2017-09-04T10:02:55Z

@eric-wieser I agree with both your remarks. Having metadata/titles on the field-type instead would also be great.
I was being too lazy/confused when using repr(numpy_array.dtype.descr), I guess it would be much more reliable to use repr(numpy_array.dtype), instead.

ahaldane mentioned this issue Sep 4, 2017

MAINT: struct assignment "by field position", multi-field indices return views #6053

Merged

charris closed this as completed in 9f27418 Sep 9, 2017

theodoregoetz pushed a commit to theodoregoetz/numpy that referenced this issue Oct 23, 2017

BUG: account for field titles in multi-field indexes

9d4b61d

Fixes numpy#9625

ahaldane mentioned this issue Dec 5, 2017

Struct dtype compat for NumPy 1.14 dask/dask#2964

Merged
