record array is made up of numpy.void, not numpy.record #3581

potyt · 2013-08-06T08:33:42Z

I raised this previously on the PyTables issue list, but they have asked me to transfer it to numpy. (PyTables/PyTables#271)

>>> import numpy as np
>>> a = (1, 2, 3)
>>> ra = np.rec.fromrecords([a])
>>> na = np.array([a], dtype='i8,i8,i8').view(type=np.recarray)
>>> na
rec.array([(1, 2, 3)], 
      dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8')])
>>> ra
rec.array([(1, 2, 3)], 
      dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8')])
>>> type(ra)
<class 'numpy.core.records.recarray'>
>>> type(na)
<class 'numpy.core.records.recarray'>
>>> ra[0]
(1, 2, 3)
>>> na[0]
(1, 2, 3)
>>> type(ra[0])
<class 'numpy.core.records.record'>
>>> type(na[0])
<class 'numpy.void'>

Any reason why .view can't return record arrays instead of sort of fake record arrays, or voidarrays? Is there a cast missing in the code somewhere?

potyt · 2013-09-16T13:29:33Z

Just wondering if anyone is in a position to confirm/deny this as an issue?

potyt · 2014-05-01T19:16:18Z

Bueller?
Bueller?

nevion · 2014-10-09T10:15:08Z

bump for this little annoyer - I can confirm this bug and it regularly affects me... accessing structured arrays with ['fieldname'] as the longtime workaround just looks so wrong to boot...
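
The practical difference between the two element types is attribute access. A minimal sketch, assuming only the public NumPy API (the variable names are illustrative): an np.void element exposes its fields only through ['fieldname'] indexing, while an np.record element also exposes them as attributes.

```python
import numpy as np

row = (1, 2, 3)

# Element of a plain structured array: a numpy.void scalar.
void_elem = np.array([row], dtype='i8,i8,i8')[0]
# Element of a record array built by np.rec.fromrecords: a numpy.record scalar.
rec_elem = np.rec.fromrecords([row])[0]

print(type(void_elem).__name__, type(rec_elem).__name__)

# Both element types support the ['fieldname'] workaround.
print(void_elem['f0'], rec_elem['f0'])

# Only numpy.record supports attribute-style field access.
print(hasattr(void_elem, 'f0'), hasattr(rec_elem, 'f0'))
```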

This is a modification to the dtype str and repr functions that helps solve numpy#3581. I discussed it on the mailing list in a message "Re: structured arrays, recarrays, and record arrays" on Jan 19 2015. I didn't get any replies, but hopefully that merely means "no opinion" rather than "bad idea".

What it does: For structured arrays, if the dtype is not np.void then print the dtype as `(base_dtype, dtype)`.

New Behavior:

>>> a = np.array([(1,'ABC'), (2, "DEF")], dtype=[('foo', int), ('bar', 'S4')])
>>> np.rec.array(a)
rec.array([(1, 'ABC'), (2, 'DEF')],
      dtype=(numpy.record, [('foo', '<i8'), ('bar', 'S4')]))
>>> a.view(np.recarray)
rec.array([(1, 'ABC'), (2, 'DEF')],
      dtype=[('foo', '<i8'), ('bar', 'S4')])

dalexander · 2015-06-02T00:48:57Z

Anything that could be done to fix this issue would be appreciated... recarrays are powerful but this issue makes them tricky to use.

jaimefrio · 2015-06-02T01:22:47Z

Does #5921 fix this issue, or is it not quite the same?

embray · 2015-06-02T21:47:48Z

I just checked, and this doesn't appear to be fixed. I'll take a look though.

embray · 2015-06-02T21:50:19Z

Although it's not clear to me why .view(type=numpy.recarray) returns an ndarray object and not a numpy.recarray object. Is that intentional? I'm not even sure.

ahaldane · 2015-06-04T04:18:49Z

@embray .view(type=numpy.recarray) does return a recarray, but it is hard to represent it in string form. If you look carefully in the repr there is an extra 'view' tacked on:

>>> np.array([a], dtype='i8,i8,i8').view(type=np.recarray)
array([(1, 2, 3)], 
  dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8')]).view(numpy.recarray)

I discussed it in #5523.

Also, I added new docs for record arrays which explain what I've done in #5482, which aren't visible in the online docs yet. Improvements welcome.

ahaldane · 2015-06-04T04:24:43Z

Also, I consider this issue fixed, as explained in #5523.

In the example code, ra is a recarray with dtype of np.record, while na is a recarray with dtype of np.void.

nevion · 2015-06-04T05:02:59Z

You might consider this issue fixed, but highly related #4443 is still broken. Just tested against the issue in that example and the others I considered dupes.

ahaldane · 2015-06-04T05:20:04Z

@nevion ha, I didn't mean to sound so final.

But #4443 is working as I expect. As the new docs I wrote in #5482 explain, np.rec.array(a) and a.view(np.recarray) (for some ndarray a) give you two differently behaving objects.

The current issue #3581 arose because their reprs were identical, which was confusing. So in #5523 I made sure their reprs are different, to make it clear to the user that they are different. I couldn't think of a better fix, but if there is one that would be great. The problem is that there are two separate aspects of record arrays: 1. They have type np.recarray and 2. their dtype is np.record. I don't see a way of guaranteeing that both properties always come together, since the user can always do .view(np.recarray) which misses the np.record dtype.

I think that a.view(np.recarray) should be discouraged, since as I explain in the docs it is a kind of "halfway" record array. Users should always use np.rec.array, or one of the other np.rec creation functions.

As I note in the docs you can get a "true" record array using a view by doing

a.view(dtype=(np.record, a.dtype), type=np.recarray)
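
A short sketch of that explicit view, assuming only the public NumPy API (the array `a` here is an illustrative example, not taken from the docs): passing both `dtype` and `type` yields a recarray whose dtype is based on np.record, so its elements come back as np.record.

```python
import numpy as np

a = np.array([(1, 2, 3)], dtype='i8,i8,i8')

# View with both the record-based dtype and the recarray type.
r = a.view(dtype=(np.record, a.dtype), type=np.recarray)

print(type(r).__name__)        # array type
print(r.dtype.type.__name__)   # scalar type backing the dtype
print(type(r[0]).__name__)     # type of a single element
print(r[0].f0, r.f1)           # attribute access on element and on array
```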

embray · 2015-06-04T15:13:29Z

Okay, so I guess the current behavior could be considered consistent, if confusing. In any case it's maybe not worth worrying about since, last I heard, recarrays are all but deprecated?

charris · 2015-06-04T15:40:45Z

Recarrays seem pretty widely used, but have never been 'official', which may be an oversight. I'm not sure what the long term plan for them should be. Feedback welcome, and maybe a post on the mailing list for discussion.

mhvk · 2015-06-04T16:04:01Z

Sorry to be late in this, but I'm slightly confused why recarray wouldn't just turn the dtype into a proper np.record if a view is taken? This could be handled by adding an appropriate __array_finalize__ to np.recarray.

ahaldane · 2015-06-04T16:09:17Z

@mhvk when I wrote those PRs I didn't know about __array_finalize__, but now I do and I was just thinking about that this morning. It looks like adding

def __array_finalize__(self, obj):
    self.dtype = dtype(record, self.dtype)

to np.recarray works. Is that the right way to change the dtype?

Edit: I spoke too soon, that doesn't work.

mhvk · 2015-06-04T16:20:14Z

@ahaldane - yes, in principle, though be careful that you can get into __array_finalize__ via different routes [1]. You may want to do something like

def __array_finalize__(self, obj):
    if obj is not None and type(obj) is not type(self):
        self.dtype = dtype(record, obj.dtype)

This way nothing gets changed when initializing or slicing a recarray.

Though in practice, since for initialization or slicing the dtype is already correct, you could also just check whether self.dtype is correct and only change it if it is not.

[1] http://docs.scipy.org/doc/numpy/user/basics.subclassing.html#the-role-of-array-finalize
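
The different routes into __array_finalize__ can be seen with a small tracking subclass. This is only an illustrative sketch: `Tracker` and `calls` are hypothetical names, not part of NumPy. It records the type of `obj` on each call, together with the value of the guard suggested above.

```python
import numpy as np

calls = []  # (type name of obj, value of the suggested guard)

class Tracker(np.ndarray):
    def __array_finalize__(self, obj):
        # Guard from the suggestion above: true only for view casting
        # from a different array type.
        guard = obj is not None and type(obj) is not type(self)
        calls.append((type(obj).__name__, guard))

t = Tracker((3,))                # explicit construction: obj is None
v = np.arange(3).view(Tracker)   # view casting: obj is a plain ndarray
s = v[:2]                        # slicing: obj is already a Tracker

for entry in calls:
    print(entry)
```

Only the view-casting route sets the guard, which matches the statement that nothing changes when initializing or slicing.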

ahaldane · 2015-06-04T16:27:02Z

@mhvk I see. I think I got it to work (I was missing ()) with this:

def __array_finalize__(self, obj):
    if obj is not None and type(obj) is not type(self): 
        self.dtype = sb.dtype((record, self.dtype))

then I see it works:

>>> a = np.zeros(2, dtype='i4,f4')
>>> a.view(np.recarray)
rec.array([(0, 0.0), (0, 0.0)], 
      dtype=[('f0', '<i4'), ('f1', '<f4')])
>>> a.view(np.recarray).dtype
dtype((numpy.record, [('f0', '<i4'), ('f1', '<f4')]))

I'll submit a PR. Good thing this came up for discussion again!
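
A quick way to check whether an installed NumPy shows this behavior. This is a sketch that assumes a release containing the change later merged as #5943; on older releases the first two checks would be expected to report False.

```python
import numpy as np

a = np.zeros(2, dtype='i4,f4')
r = a.view(np.recarray)

# With the fix, taking a recarray view converts the dtype to np.record.
dtype_is_record = r.dtype.type is np.record
element_is_record = type(r[0]) is np.record
print(dtype_is_record, element_is_record)

# The original array is not modified: its elements are still numpy.void.
original_is_void = type(a[0]) is np.void
print(original_is_void)
```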

mhvk · 2015-06-04T16:43:42Z

@ahaldane - great! hope this solves @embray's issues too...

embray · 2015-06-04T20:59:42Z

@mhvk I was thinking this as well. I guess I figured if you take a view with type=np.recarray but not with dtype=(np.record, arr.dtype) then maybe it would be consistent. But I agree it's actually more consistent that recarray objects always return elements with np.record type, and your suggestion is exactly how that is done.

charris · 2015-06-05T16:27:18Z

@abalkin would be good to have a PR for this. Does defining __array_finalize__ solve any of the problems for which there are other PRs?

charris added Defect labels Feb 23, 2014

This was referenced Oct 9, 2014

Inconsistent behaviour of recarrays #4443 (Closed)

Recarray indexing returns numpy.void for nested dtypes #4444 (Closed)

ahaldane mentioned this issue Jan 18, 2015

make recarray.attr return ndarray (not chararray) #5454 (Closed)

ahaldane mentioned this issue Jan 22, 2015

ENH: Show subclass type in dtype repr and str of structured arrays #5483 (Merged)

ahaldane mentioned this issue Jan 29, 2015

BUG: recarray __repr__ gives inaccurate representation #5523 (Merged)

ahaldane mentioned this issue Jun 5, 2015

BUG: automatically convert recarray dtype to np.record #5943 (Merged)

charris closed this as completed in a93b862 Jun 19, 2015
