ENH(?): Improved structured array creation support #10910

Closed
attoblarg opened this issue Apr 15, 2018 · 8 comments
Labels
15 - Discussion, 57 - Close?, component: numpy.dtype

Comments

@attoblarg

Creating a structured array from already existing data (to enable named-field access) seems to be more difficult than I expected.

What I expected was that I could pass my existing 2D ndarray to numpy.array with a new dtype:

import numpy as np

a = np.arange(4).reshape(2, 2)
b = np.array(a, dtype=[('a', float), ('b', int)])

This does give the expected dtype ([('a', '<f8'), ('b', '<i4')]), but the resulting array has repeated entries and more dimensions:

[[(0., 0) (1., 1)]
 [(2., 2) (3., 3)]]

and the field access b['a'] gives:

[[0. 1.]
 [2. 3.]]

np.array, np.asarray, and ndarray.astype all do this.

Using direct assignment of ndarray.dtype or using ndarray.view (with dtype=[('a', float), ('b', float)]) gives something a little different (still the wrong number of dimensions):

[[(0, 1)]
 [(2, 3)]]

and the field access b['a'] gives:

[[0]
 [2]]
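
The two behaviours described above can be reproduced side by side. This is only a sketch: the names b_cast, b_view, dt_mixed and dt_same are made up for illustration, and the view uses fields of a's own integer dtype so that the bytes are reinterpreted without changing their meaning.

```python
import numpy as np

a = np.arange(4).reshape(2, 2)

# Casting (np.array / np.asarray / astype): each scalar element is cast
# into every field, so the shape stays (2, 2) and values are repeated.
dt_mixed = np.dtype([('a', float), ('b', int)])
b_cast = a.astype(dt_mixed)
print(b_cast.shape)   # (2, 2)
print(b_cast['a'])

# Viewing: the bytes of each row are reinterpreted as one record, so the
# last axis shrinks from 2 to 1 but is not removed.
dt_same = np.dtype([('a', a.dtype), ('b', a.dtype)])
b_view = a.view(dt_same)
print(b_view.shape)   # (2, 1)
print(b_view['a'])
```

In both cases the number of dimensions is preserved, which is why neither gives the 1D array of records that was expected.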

What does give the expected result:

  • If numpy.array is passed a list of tuples ([(0,1), (2,3)]), then it behaves as expected (but not with a list of lists, a list of 1D ndarrays, a 2D ndarray, etc.). However, copying large arrays into tuples first is not great.
  • Using numpy.rec.fromarrays, passing the transpose of a. A record array isn't necessary, but this is an easy solution.
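
For reference, a minimal sketch of the two working approaches listed above. The variable names (b_tuples, b_rec, dt) are illustrative only.

```python
import numpy as np

a = np.arange(4).reshape(2, 2)
dt = np.dtype([('a', float), ('b', int)])

# Option 1: convert each row to a tuple first (copies element by element).
b_tuples = np.array([tuple(row) for row in a], dtype=dt)
print(b_tuples.shape)   # (2,)
print(b_tuples['a'])

# Option 2: pass the columns (the transpose) to np.rec.fromarrays.
# The result is a record array, which is a subclass of ndarray.
b_rec = np.rec.fromarrays(a.T, dtype=dt)
print(b_rec.shape)      # (2,)
print(b_rec['b'])
```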

So there is a currently available solution, but it still seems that some of the seemingly straightforward ways of doing this don't give the expected result.

@eric-wieser
Member

eric-wieser commented Apr 16, 2018

arr.view([('a', float), ('b', float)]).squeeze(axis=-1) should do what you want.

If you want the b column to be an int, then you can do .astype([('a', float), ('b', int)]) on the result.

You might consider this a little verbose, but it ensures that only one copy is made.
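
A sketch of that suggestion, assuming the starting array already holds floats (viewing an integer array with float fields would reinterpret the raw bytes rather than convert the values). The names same and mixed are illustrative.

```python
import numpy as np

# Assumption: the data is float64 and C-contiguous.
arr = np.arange(4, dtype=float).reshape(2, 2)

# Reinterpret each row as one record, then drop the length-1 last axis.
same = arr.view([('a', float), ('b', float)]).squeeze(axis=-1)
print(same.shape)    # (2,)

# Convert field 'b' to int; this is the step that makes a copy.
mixed = same.astype([('a', float), ('b', int)])
print(mixed['a'], mixed['b'])
```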

@Dan-Patterson

A discussion on Stack Overflow provides some extra examples and insight. I prefer working with numpy rather than pandas for mixed data type arrays for a variety of reasons. Using a list comprehension to change a list of lists to a list of tuples, before applying the desired dtype, seems to be the easiest approach to remember.

@eric-wieser
Member

eric-wieser commented Apr 16, 2018

@Dan-Patterson: Converting to tuples and back is O(arr.size), whereas using .view(dt).squeeze(axis=-1) is O(1).
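
One way to see the difference is to check whether the result shares memory with the original. This is a sketch with illustrative names (viewed, copied), assuming a contiguous float64 array.

```python
import numpy as np

arr = np.arange(6, dtype=float).reshape(3, 2)
dt = np.dtype([('a', float), ('b', float)])

# view + squeeze only creates new array metadata; no data is copied.
viewed = arr.view(dt).squeeze(axis=-1)
print(np.shares_memory(arr, viewed))   # True

# Writing to the original is visible through the structured view.
arr[0, 0] = 99.0
print(viewed['a'][0])                  # 99.0

# The tuple route touches every element and allocates a new buffer.
copied = np.array([tuple(row) for row in arr], dtype=dt)
print(np.shares_memory(arr, copied))   # False
```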

@attoblarg
Author

@eric-wieser: Thanks, that will also do. However, I would like to know if the current result from .view without squeeze is actually the intended result, or if it might get "fixed" eventually, causing old code containing this work-around to fail.

@Dan-Patterson

@eric-wieser your edited expression using .astype is much clearer now.

@ahaldane
Member

We actually discussed introducing a function structured_to_unstructured and vice versa on the mailing list back in January:

http://numpy-discussion.10968.n7.nabble.com/Setting-custom-dtypes-and-1-14-tp45156p45207.html

You can see the docstrings and implementations here:

ahaldane@f779c49

Further suggestions welcome.

I think merging it is on hold because we wanted to decide how to implement repack_fields first; see the ongoing discussion in #10411.

@ahaldane
Member

@attoblarg the current result from .view should always work in the future, but be aware that views are memory-layout-dependent.

In other words, if you are sure your starting array is contiguous in memory, with no padding bytes or non-trivial strides, and has exactly 64-bit entries, then arr.view('f8,f8') works now and forever. Just be aware that your code will fail if arr is not contiguous or its entries are not 64 bits wide (e.g., rand(10,2).view('f8,f8') works, but rand(10,4)[:, ::2].view('f8,f8') does not, and neither does rand(10,2).view('f4,f4')).
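
The three cases can be checked with a short sketch. It uses np.random.default_rng instead of rand, and the np.ascontiguousarray workaround at the end is an addition not mentioned in the thread; it makes a copy.

```python
import numpy as np

rng = np.random.default_rng(0)
dt = np.dtype('f8,f8')

# Contiguous float64 input: the view works.
ok = rng.random((10, 2)).view(dt)
print(ok.shape)   # (10, 1)

# Strided input: the last axis is not contiguous, so the view is refused.
strided = rng.random((10, 4))[:, ::2]
view_failed = False
try:
    strided.view(dt)
except ValueError as exc:
    view_failed = True
    print("view failed:", exc)

# Mismatched field width: no error, but each float64 is split into two
# float32 halves, so the shape stays (10, 2) and the values are meaningless.
garbled = rng.random((10, 2)).view('f4,f4')
print(garbled.shape)   # (10, 2)

# Possible workaround (an assumption, and it copies): make the data
# contiguous first, then view it.
fixed = np.ascontiguousarray(strided).view(dt)
print(fixed.shape)     # (10, 1)
```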

@mattip added the 15 - Discussion and 57 - Close? labels Apr 18, 2018
@mattip
Member

mattip commented Jan 16, 2019

Closing. BTW, structured_to_unstructured was added to NumPy in 1.16
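
For readers arriving later, a brief sketch of how these helpers might be used for the original question. It assumes NumPy 1.16 or newer and relies on the companion function unstructured_to_structured from the same numpy.lib.recfunctions module; the variable names are illustrative.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

a = np.arange(4).reshape(2, 2)
dt = np.dtype([('a', float), ('b', int)])

# The last axis of the plain array becomes the fields of each record.
b = rfn.unstructured_to_structured(a, dtype=dt)
print(b.shape)   # (2,)
print(b['a'], b['b'])

# And back again: the fields are stacked along a new last axis.
back = rfn.structured_to_unstructured(b)
print(back.shape)   # (2, 2)
```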

@mattip closed this as completed Jan 16, 2019
5 participants