Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataFrame constructor raises exception with masked recarray #3478

Closed
jnothman opened this issue Apr 29, 2013 · 6 comments · Fixed by #4836
Closed

DataFrame constructor raises exception with masked recarray #3478

jnothman opened this issue Apr 29, 2013 · 6 comments · Fixed by #4836
Labels
Dtype Conversions Unexpected or buggy dtype conversions Enhancement
Milestone

Comments

@jnothman
Copy link
Contributor

The DataFrame constructor supports numpy's ndarray of primitive type, recarray (or structured array) or a masked array of simple type. It breaks when given a masked recarray:

>>> a = np.ma.mrecords.fromrecords([(1,2)], names=['a', 'b'], mask=False)
>>> pandas.DataFrame(a)
...
TypeError: cannot perform reduce with flexible type

Tested at latest public release and HEAD.

@jreback
Copy link
Contributor

jreback commented Apr 29, 2013

this is not a good way of doing this, MaskedReocrds are a sub-class of MaskedArray so it could be caught in that part of the if (and then dealt with). This also makes you always check if something is masked (when that is not very common). Would welcome fixes to this as well as some tests.

@jnothman
Copy link
Contributor Author

@jreback, I can't tell whether you're critiquing my particular implementation (#3479), or the existing implementation. I will comment as if you're critiquing my PR:

Admittedly, for efficiency, my solution is slightly slower for non-masked record arrays, because it calls _unmask and isinstance for each field. I don't expect DataFrame (or numpy record arrays) is especially useful for dealing with thousands of fields, so I don't think that's an efficiency loss worth worrying about. In terms of code cleanliness, it reduces the duplicate code between the isinstance(data, ma.MaskedArray) and isinstance(data, np.ndarray) cases, so I think it is an improvement.

If you're talking about the existing code (though this also applies to my PR), it's unnecessary to explicitly use isinstance to check if something is masked, because ma.getmaskarray(arr) will work anyway. But some kind of check for a mask is necessary first, even if it's a rare case (again, do you have a use-case that requires thousands of DataFrames to be constructed per second that this becomes an issue?). And the code uses isinstanceto avoid a .copy() where otherwise getmaskedarray would suffice.

@jreback
Copy link
Contributor

jreback commented Apr 30, 2013

I was referring to your PR.

you would be suprised about performance. We spend an inordinate amount of time trying to avoid bottlenecks. This is probably not a big deal, but it is easily avoided, by putting this under the MaskedArray check.

I am not exactly sure what ma.getmaskarray(arr) actually does but I can't imagine this is a good general purpose solution. The constructor is already complicated and have tried to limit the special cases.

Why would you not put all of the Masked stuff in a single if?

@jnothman
Copy link
Contributor Author

So let's discuss this over at #3479.

@jreback
Copy link
Contributor

jreback commented Sep 13, 2013

@jnothman getting ready to put this in (I am using your tests, but implementing slightly different).

is the intent of the fill value in a rec array to provide an immediate fillna on the Nan?

e.g. say you construct a DataFrame then ends of having NaN (say because you pass in an explicity index/columns)

DataFrames don't normally carry around a fill value, so would you a) drop it, b) immediately fill with it, c) have an option for this?

we do a) with masked arrays IIRC now

@jreback
Copy link
Contributor

jreback commented Sep 13, 2013

@jnothman sorry...i realize it is option b) on masked arrays.....will do the same then

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants