record array is made up of numpy.void, not numpy.record #3581

Closed
potyt opened this issue Aug 6, 2013 · 20 comments

Comments

@potyt

potyt commented Aug 6, 2013

I raised this previously on the PyTables issue list, but they have asked me to transfer it to numpy. (PyTables/PyTables#271)

>>> import numpy as np
>>> a = (1, 2, 3)
>>> ra = np.rec.fromrecords([a])
>>> na = np.array([a], dtype='i8,i8,i8').view(type=np.recarray)
>>> na
rec.array([(1, 2, 3)], 
      dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8')])
>>> ra
rec.array([(1, 2, 3)], 
      dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8')])
>>> type(ra)
<class 'numpy.core.records.recarray'>
>>> type(na)
<class 'numpy.core.records.recarray'>
>>> ra[0]
(1, 2, 3)
>>> na[0]
(1, 2, 3)
>>> type(ra[0])
<class 'numpy.core.records.record'>
>>> type(na[0])
<class 'numpy.void'>

Any reason why .view can't return record arrays instead of sort of fake record arrays, or voidarrays? Is there a cast missing in the code somewhere?
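
The two construction routes can be probed on any NumPy build with a short script like the one below. This is only a sketch and not part of the original report: the helper name `element_type_name` is made up for illustration, and on releases where the view behavior was later changed both routes may report `record`.

```python
import numpy as np

def element_type_name(arr):
    """Return the class name of the first element of a 1-D structured array."""
    return type(arr[0]).__name__

a = (1, 2, 3)
ra = np.rec.fromrecords([a])                                  # np.rec creation function
na = np.array([a], dtype='i8,i8,i8').view(type=np.recarray)   # view of a plain ndarray

# Both containers are recarrays; only the element type may differ.
print(type(ra).__name__, element_type_name(ra))
print(type(na).__name__, element_type_name(na))
```

Since `numpy.record` is a subclass of `numpy.void`, an `isinstance(x, np.void)` check is true for both kinds of element, so comparing the exact type is what reveals the difference.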

@potyt
Author

potyt commented Sep 16, 2013

Just wondering if anyone is in a position to confirm/deny this as an issue?

@potyt
Author

potyt commented May 1, 2014

Bueller?
Bueller?

@nevion

nevion commented Oct 9, 2014

bump for this little annoyer - I can confirm this bug and it regularly affects me... accessing structure arrays with ['fieldname'] as the longtime workaround just looks so wrong to boot...
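
For readers unfamiliar with the workaround mentioned here, the sketch below contrasts key access with attribute access. It uses a plain structured ndarray for the `numpy.void` case so that it behaves the same regardless of how a given release treats `.view(np.recarray)`; the variable names are illustrative only.

```python
import numpy as np

a = np.array([(1, 2, 3)], dtype='i8,i8,i8')   # plain structured ndarray
row = a[0]                                     # a numpy.void scalar

by_key = row['f0']                             # key lookup works on numpy.void

try:
    row.f0                                     # attribute lookup is not defined on numpy.void
    has_attr_access = True
except AttributeError:
    has_attr_access = False

rec_row = np.rec.array(a)[0]                   # a numpy.record scalar
by_attr = rec_row.f0                           # attribute lookup works on numpy.record

print(by_key, has_attr_access, by_attr)
```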

ahaldane added a commit to ahaldane/numpy that referenced this issue Jan 24, 2015
This is a modification to the dtype str and repr functions that helps
solve numpy#3581.

I discussed it on the mailing list in a message "Re: structured arrays,
recarrays, and record arrays" on Jan 19 2015. I didn't get any replies,
but hopefully that merely means "no opinion" rather than "bad idea".

What it does: For structured arrays, if the dtype is not np.void then
print the dtype as `(base_dtype, dtype)`.

New Behavior:

 >>> a = np.array([(1,'ABC'), (2, "DEF")], dtype=[('foo', int), ('bar', 'S4')])
 >>> np.rec.array(a)
 rec.array([(1, 'ABC'), (2, 'DEF')],
       dtype=(numpy.record, [('foo', '<i8'), ('bar', 'S4')]))
 >>> a.view(np.recarray)
 rec.array([(1, 'ABC'), (2, 'DEF')],
       dtype=[('foo', '<i8'), ('bar', 'S4')])
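
The distinction this commit message relies on can be seen directly on the dtype objects, independent of any array. The following is a sketch under the assumption that the installed NumPy prints record dtypes in the `(numpy.record, ...)` form described above; the exact repr text may vary between releases, so it is printed rather than relied upon.

```python
import numpy as np

fields = [('foo', 'i8'), ('bar', 'S4')]
plain_dt = np.dtype(fields)               # scalar type is numpy.void
rec_dt = np.dtype((np.record, fields))    # same fields, scalar type is numpy.record

print(repr(plain_dt))
print(repr(rec_dt))
print(plain_dt.type is np.void, rec_dt.type is np.record)
```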
@dalexander

Anything that could be done to fix this issue would be appreciated... recarrays are powerful but this issue makes them tricky to use.

@jaimefrio
Member

Does #5921 fix this issue, or is it not quite the same?

@embray
Contributor

embray commented Jun 2, 2015

I just checked, and this doesn't appear to be fixed. I'll take a look though.

@embray
Contributor

embray commented Jun 2, 2015

Although it's not clear to me why .view(type=numpy.recarray) returns an ndarray object and not a numpy.recarray object. Is that intentional? I'm not even sure.

@ahaldane
Member

ahaldane commented Jun 4, 2015

@embray .view(type=numpy.recarray) does return a recarray, but it is hard to represent it in string form. If you look carefully in the repr there is an extra 'view' tacked on:

>>> np.array([a], dtype='i8,i8,i8').view(type=np.recarray)
array([(1, 2, 3)], 
  dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8')]).view(numpy.recarray)

I discussed it in #5523.

Also, I added new docs for record arrays which explain what I've done in #5482, which aren't visible in the online docs yet. Improvements welcome.

@ahaldane
Member

ahaldane commented Jun 4, 2015

Also, I consider this issue fixed, as explained in #5523.

In the example code, ra is a recarray with dtype of np.record, while na is a recarray with dtype of np.void.
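
To make the two separate properties concrete, the container class and the element class can be inspected side by side. A minimal sketch follows; `describe` is a hypothetical helper, not a NumPy function.

```python
import numpy as np

def describe(arr):
    """Return (container class name, element class name) for an array."""
    return type(arr).__name__, arr.dtype.type.__name__

a = np.array([(1, 2, 3)], dtype='i8,i8,i8')

print(describe(a))                 # plain structured ndarray
print(describe(np.rec.array(a)))   # array made by an np.rec creation function
```

The first element of the pair answers "is this a recarray?", the second answers "are its elements records?". The confusion in this issue comes from an array that answers yes to the first and no to the second.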

@nevion

nevion commented Jun 4, 2015

You might consider this issue fixed, but highly related #4443 is still broken. Just tested against the issue in that example and the others I considered dupes.

@ahaldane
Member

ahaldane commented Jun 4, 2015

@nevion ha, I didn't mean to sound so final.

But #4443 is working as I expect. As the new docs I wrote in #5482 (here) explain, np.rec.array(a) and a.view(np.recarray) (for some ndarray a) give you two differently behaving objects.

The current issue #3581 arose because their reprs were identical, which was confusing. So in #5523 I made sure their reprs are different, to make it clear to the user that they are different. I couldn't think of a better fix, but if there is one that would be great. The problem is that there are two separate aspects of record arrays: 1. They have type np.recarray and 2. their dtype is np.record. I don't see a way of guaranteeing that both properties always come together, since the user can always do .view(np.recarray) which misses the np.record dtype.

I think that a.view(np.recarray) should be discouraged, since as I explain in the docs it is a kind of "halfway" record array. Users should always use np.rec.array, or one of the other np.rec creation functions.

As I note in the docs you can get a "true" record array using a view by doing

a.view(dtype=(np.record, a.dtype), type=np.recarray)
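
A short sketch of that explicit view, showing that both properties are set at once and that the result still shares memory with the original array. The array contents and variable names are chosen only for illustration.

```python
import numpy as np

a = np.zeros(2, dtype='i4,f4')

# Request the record dtype and the recarray class in a single view.
r = a.view(dtype=(np.record, a.dtype), type=np.recarray)

r.f0[:] = 7    # attribute access to a field; the write goes through to `a`

print(type(r).__name__, r.dtype.type.__name__, type(r[0]).__name__)
print(a['f0'])
```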

@embray
Contributor

embray commented Jun 4, 2015

Okay, so I guess the current behavior could be considered consistent, if confusing. In any case it's maybe not worth worrying about since, last I heard, recarrays are all but deprecated?

@charris
Member

charris commented Jun 4, 2015

Recarrays seem pretty widely used, but have never been 'official', which may be an oversight. I'm not sure what the long term plan for them should be. Feedback welcome, and maybe a post on the mailing list for discussion.

@mhvk
Contributor

mhvk commented Jun 4, 2015

Sorry to be late in this, but I'm slightly confused why recarray wouldn't just turn the dtype into a proper np.record if a view is taken? This could be handled by adding an appropriate __array_finalize__ to np.recarray.

@ahaldane
Member

ahaldane commented Jun 4, 2015

@mhvk when I wrote those PRs I didn't know about __array_finalize__, but now I do and I was just thinking about that this morning. It looks like adding

def __array_finalize__(self, obj):
    self.dtype = dtype(record, self.dtype)

to np.recarray works. Is that the right way to change the dtype?

Edit: I spoke too soon, that doesn't work.

@mhvk
Contributor

mhvk commented Jun 4, 2015

@ahaldane - yes, in principle, though be careful that you can get into __array_finalize__ via different routes [1]. You may want to do something like

def __array_finalize__(self, obj):
    if obj is not None and type(obj) is not type(self):
        self.dtype = dtype(record, obj.dtype)

This way nothing gets changed when initializing or slicing a recarray.

Though in practice, since for initialization or slicing the dtype is already correct, you could also just check whether self.dtype is correct and only change it if it is not.

[1] http://docs.scipy.org/doc/numpy/user/basics.subclassing.html#the-role-of-array-finalize
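
The different routes into `__array_finalize__` can be observed with a toy subclass. This is a sketch only: `Traced` and its `route` attribute are hypothetical and have nothing to do with `np.recarray` itself. It simply records which of the three documented cases triggered the call, using the same `obj is None` and `type(obj) is not type(self)` tests as the snippet above.

```python
import numpy as np

class Traced(np.ndarray):
    """Hypothetical ndarray subclass that records how __array_finalize__ was reached."""

    def __array_finalize__(self, obj):
        if obj is None:
            self.route = "explicit constructor"   # Traced(shape)
        elif type(obj) is not type(self):
            self.route = "view casting"           # some_ndarray.view(Traced)
        else:
            self.route = "new from template"      # slicing an existing Traced

base = np.zeros(3)
viewed = base.view(Traced)
sliced = viewed[:2]
built = Traced((3,))

print(viewed.route, sliced.route, built.route, sep=" | ")
```

Only the "view casting" branch corresponds to the case discussed here, where a plain ndarray is being turned into the subclass and its dtype may still need adjusting.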

@ahaldane
Member

ahaldane commented Jun 4, 2015

@mhvk I see. I think I got it to work (I was missing ()) with this:

def __array_finalize__(self, obj):
    if obj is not None and type(obj) is not type(self): 
        self.dtype = sb.dtype((record, self.dtype))

then I see it works:

>>> a = np.zeros(2, dtype='i4,f4')
>>> a.view(np.recarray)
rec.array([(0, 0.0), (0, 0.0)], 
      dtype=[('f0', '<i4'), ('f1', '<f4')])
>>> a.view(np.recarray).dtype
dtype((numpy.record, [('f0', '<i4'), ('f1', '<f4')]))

I'll submit a PR. Good thing this came up for discussion again!

@mhvk
Contributor

mhvk commented Jun 4, 2015

@ahaldane - great! hope this solves @embray's issues too...

@embray
Contributor

embray commented Jun 4, 2015

@mhvk I was thinking this as well. I guess I figured if you take a view with type=np.recarray but not with dtype=(np.record, arr.dtype) then maybe it would be consistent. But I agree it's actually more consistent that recarray objects always return elements with np.record type, and your suggestion is exactly how that is done.

@charris
Member

charris commented Jun 5, 2015

@abalkin would be good to have a PR for this. Does defining __array_finalize__ solve any of the problems for which there are other PRs?

8 participants