BUG: df.from_records() is all kinds of broken #2179

ghost · 2012-11-05T12:58:16Z

the columns argument gets clobbered:
In [13]: pd.DataFrame.from_records({1:["foo"],2:["bar"]},columns=['a','b']).columns
Out[13]: Index([1, 2], dtype=int64)
if index is specified, and result_index computed, it will get clobbered
later on.
if index is specified as a list of labels, with the first few matching columns names
and the others not, then sdict will get mutilated, because the removal of columns
occurs within the try clause rather then after success.
EDIT - ignore, I misread the code.
There's duplication against the main Dataframe ctor which also accepts dicts and arrays,
and The exclusion and float coercsion which are unique to from_records() would be useful
to have generally available in the main ctor anyway.
It's unclear if the columns argument should be specified relative to the original
data, or relative to the data modulo excluded columns
Doesn't support duplicate column names (Although that's checked with
a warning)
the docstring specifies the datatypes for data , which do not include a dict.
But the code specifically checks and handles dict input, This is a minor thing, but
using columns along with dict is not well defined because of key (non-)ordering,
so the original docstring spec seems more sane.

The text was updated successfully, but these errors were encountered:

wesm · 2012-11-05T14:42:47Z

Indeed this code has been fairly neglected. I'll see if I can easily fix some of the issues you raised

wesm · 2012-11-05T17:19:26Z

A major issue here is that the index argument is overloaded to take either index field names or an array to be used as the index. This was a pretty bone-headed decision made a long time ago

ghost · 2012-11-05T17:26:10Z

agreed, that was a bit too flexible. But if the loop for removing the columns
gets put in an else caluse to the try/except that's taken care of.

I couldn't figure out a way to get the main ctor to plug data in as columns rather
then rows after sorting out the rest.

wesm · 2012-11-05T17:30:16Z

I'm adding a docs clarification about what columns means (i.e: renaming columns is not possible with this function at the moment) and leaving this issue open for more cleanup at a later time (and fixing the bug from the first bullet point).

ghost · 2012-11-05T17:32:56Z

I'm pretty sure if you move the if/else setting result_index above
the check on index , you can close the 2rd as well.

wesm · 2012-11-05T17:40:50Z

Test case?

ghost · 2012-11-05T17:55:41Z

no need, I see you've removed the if entirely.

…-dev#2179

wesm · 2012-11-24T00:33:54Z

What still needs to get fixed here?

ghost · 2012-11-24T00:37:02Z

support for duplicate columns is still missing. other then that I think it's good.

wesm added a commit that referenced this issue Nov 5, 2012

BUG: from_records bug when passing columns. docs clarification #2179

c1206b3

joaonatali pushed a commit to joaonatali/pandas that referenced this issue Nov 10, 2012

BUG: from_records bug when passing columns. docs clarification pandas…

b02aec7

…-dev#2179

ghost assigned wesm Nov 24, 2012

wesm closed this as completed in f64ccf0 Nov 24, 2012

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: df.from_records() is all kinds of broken #2179

BUG: df.from_records() is all kinds of broken #2179

ghost commented Nov 5, 2012

wesm commented Nov 5, 2012

wesm commented Nov 5, 2012

ghost commented Nov 5, 2012

wesm commented Nov 5, 2012

ghost commented Nov 5, 2012

wesm commented Nov 5, 2012

ghost commented Nov 5, 2012

wesm commented Nov 24, 2012

ghost commented Nov 24, 2012

BUG: df.from_records() is all kinds of broken #2179

BUG: df.from_records() is all kinds of broken #2179

Comments

ghost commented Nov 5, 2012

wesm commented Nov 5, 2012

wesm commented Nov 5, 2012

ghost commented Nov 5, 2012

wesm commented Nov 5, 2012

ghost commented Nov 5, 2012

wesm commented Nov 5, 2012

ghost commented Nov 5, 2012

wesm commented Nov 24, 2012

ghost commented Nov 24, 2012