ENH(?): Improved structured array creation support #10910

attoblarg · 2018-04-15T22:49:38Z

Creating a structured array from already existing data (to enable named-field access) seems to be more difficult than I expected.

I expected that I could pass my existing 2D ndarray to numpy.array with a new dtype:

import numpy as np
a = np.arange(4).reshape(2,2)
b = np.array(a, dtype=[('a', float), ('b', int)])

This does give the expected dtype ([('a', '<f8'), ('b', '<i4')]), but the resulting array has repeated entries and more dimensions:

[[(0., 0) (1., 1)]
 [(2., 2) (3., 3)]]

and the field access b['a'] gives:

[[0. 1.]
 [2. 3.]]

np.array, np.asarray, and ndarray.astype all do this.

Using direct assignment of ndarray.dtype or using ndarray.view (with dtype=[('a', float), ('b', float)]) gives something a little different (still the wrong number of dimensions):

[[(0, 1)]
 [(2, 3)]]

and the field access b['a'] gives:

[[0]
 [2]]

What does give the expected result:

If numpy.array is passed a list of tuples ([(0,1), (2,3)]), then it behaves as expected (but not with a list of lists, a list of 1D ndarrays, a 2D ndarray, etc.). But copying large arrays into tuples first is not great.
Using numpy.rec.fromarrays, passing the transpose of a. A record array isn't necessary, but this is an easy solution.
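
As a rough illustration of the second option, here is a minimal sketch that assumes the same 2x2 integer array a as in the example at the top. The variable name rec is only illustrative.

```python
import numpy as np

a = np.arange(4).reshape(2, 2)

# Each row of a.T is one column of a, so fromarrays receives one array per field.
rec = np.rec.fromarrays(a.T, dtype=[('a', float), ('b', int)])

# rec is 1D with one record per original row; fields are accessed by name.
col_a = rec['a']
col_b = rec['b']
```

The result has shape (2,), with the first column of a stored as floats in field 'a' and the second column stored as ints in field 'b'.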

So there is a currently available solution, but it still seems that some of the seemingly straight-forward ways of doing this don't give the expected result.


eric-wieser · 2018-04-16T01:04:43Z

arr.view([('a', float), ('b', float)]).squeeze(axis=-1) should do what you want.

If you want the b column to be an int, then you can do .astype([('a', float), ('b', int)]) on the result.

You might consider this a little verbose, but it ensures that only one copy is made.
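
A minimal sketch of that sequence, assuming the input is already float64 so that each row is exactly two 8-byte values. The names viewed, records and mixed are illustrative only.

```python
import numpy as np

# Start from float64 data so each row occupies exactly 16 bytes.
arr = np.arange(4, dtype=np.float64).reshape(2, 2)

# The view reinterprets each 16-byte row as one record, leaving shape (2, 1).
viewed = arr.view([('a', float), ('b', float)])

# squeeze drops the leftover length-1 axis, giving shape (2,).
records = viewed.squeeze(axis=-1)

# astype makes the one copy and converts field 'b' to int.
mixed = records.astype([('a', float), ('b', int)])
```

Here mixed['a'] holds the first column and mixed['b'] holds the second column as integers.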

Dan-Patterson · 2018-04-16T06:24:26Z

A discussion on Stack Overflow provides some extra examples and insight. I prefer working with numpy rather than pandas for mixed data type arrays for a variety of reasons. Using a list comprehension to change a list of lists into a list of tuples before applying the desired dtype seems to be the easiest approach to remember.

eric-wieser · 2018-04-16T06:42:48Z

@Dan-Patterson: Converting to tuples and back is O(arr.size), using .view(dt).squeeze(axis=-1) is O(1)
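
One way to see the difference, as a small sketch assuming a float64 input: the view shares the original buffer, while the tuple round trip builds a new array element by element. The variable names are illustrative.

```python
import numpy as np

arr = np.arange(6, dtype=np.float64).reshape(3, 2)
dt = [('a', float), ('b', float)]

# O(1): the view reuses arr's buffer, so no element is copied.
viewed = arr.view(dt).squeeze(axis=-1)
shares_view = np.shares_memory(arr, viewed)

# O(arr.size): every row becomes a Python tuple and is then copied into a new array.
copied = np.array([tuple(row) for row in arr], dtype=dt)
shares_copy = np.shares_memory(arr, copied)
```

Both results hold the same values, but only viewed is backed by arr's memory, so writing to viewed also changes arr.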

attoblarg · 2018-04-16T12:47:50Z

@eric-wieser: Thanks, that will also do. However, I would like to know if the current result from .view without squeeze is actually the intended result, or if it might get "fixed" eventually, causing old code containing this work-around to fail.

Dan-Patterson · 2018-04-16T15:42:05Z

@eric-wieser your edited expression using .astype is much clearer now.

ahaldane · 2018-04-16T18:58:16Z

We actually discussed introducing a function structured_to_unstructured and vice versa on the mailing list back in January:

http://numpy-discussion.10968.n7.nabble.com/Setting-custom-dtypes-and-1-14-tp45156p45207.html

You can see the docstrings and implementations here:

ahaldane@f779c49

Further suggestions welcome.

I think merging it is on hold because we were waiting to decide how to implement repack_fields first; see the ongoing discussion in #10411.

ahaldane · 2018-04-16T19:51:53Z

@attoblarg the current result from .view should always work in the future, but be aware that views are memory-layout-dependent.

In other words, if you are sure your starting array is contiguous in memory, with no padding bytes or unusual strides, and has exactly 64-bit entries, then arr.view('f8,f8') works now and forever. Just be aware that your code will fail if arr is not contiguous: e.g., rand(10,2).view('f8,f8') works, but rand(10,4)[:, ::2].view('f8,f8') does not, and neither will rand(10,2).view('f4,f4').
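
The contiguity caveat can be sketched as follows. The use of np.ascontiguousarray as a fix is an addition here, not something proposed in the thread, and it costs one copy.

```python
import numpy as np

# Contiguous float64 rows: each 16-byte row maps cleanly onto one 'f8,f8' record.
ok = np.random.rand(10, 2).view('f8,f8')

# Every other column of a wider array: the two values in a row are not adjacent
# in memory, so NumPy refuses to reinterpret them as one record.
strided = np.random.rand(10, 4)[:, ::2]
try:
    strided.view('f8,f8')
    view_failed = False
except ValueError:
    view_failed = True

# Copying into a contiguous buffer first makes the view valid again.
fixed = np.ascontiguousarray(strided).view('f8,f8')
```

The first and last views both have shape (10, 1), so squeeze(axis=-1) is still needed to get a 1D structured array.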

mattip · 2019-01-16T08:58:03Z

Closing. BTW, structured_to_unstructured was added to NumPy in 1.16
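
For reference, a short sketch of how the original example might look with the functions in numpy.lib.recfunctions, assuming NumPy 1.16 or later. The alias rfn is just a convention.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

a = np.arange(4).reshape(2, 2)

# The last axis of a supplies the fields, so the result has shape (2,).
b = rfn.unstructured_to_structured(a, np.dtype([('a', float), ('b', int)]))

# The reverse direction returns a plain 2D array with a common dtype.
back = rfn.structured_to_unstructured(b)
```

Here b['a'] is the first column as floats, b['b'] is the second column as ints, and back recovers a (2, 2) float array.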

mattip added the 15 - Discussion and 57 - Close? (issues which may be closable unless discussion continued) labels Apr 18, 2018

mattip added the component: numpy.dtype label Aug 10, 2018

mattip closed this as completed Jan 16, 2019
