Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: df.from_records() is all kinds of broken #2179

Closed
ghost opened this issue Nov 5, 2012 · 9 comments
Closed

BUG: df.from_records() is all kinds of broken #2179

ghost opened this issue Nov 5, 2012 · 9 comments
Assignees
Milestone

Comments

@ghost
Copy link

ghost commented Nov 5, 2012

  • the columns argument gets clobbered:
    In [13]: pd.DataFrame.from_records({1:["foo"],2:["bar"]},columns=['a','b']).columns
    Out[13]: Index([1, 2], dtype=int64)
  • if index is specified, and result_index computed, it will get clobbered
    later on.
  • if index is specified as a list of labels, with the first few matching columns names
    and the others not, then sdict will get mutilated, because the removal of columns
    occurs within the try clause rather then after success.
    EDIT - ignore, I misread the code.
  • There's duplication against the main Dataframe ctor which also accepts dicts and arrays,
    and The exclusion and float coercsion which are unique to from_records() would be useful
    to have generally available in the main ctor anyway.
  • It's unclear if the columns argument should be specified relative to the original
    data, or relative to the data modulo excluded columns
  • Doesn't support duplicate column names (Although that's checked with
    a warning)
  • the docstring specifies the datatypes for data , which do not include a dict.
    But the code specifically checks and handles dict input, This is a minor thing, but
    using columns along with dict is not well defined because of key (non-)ordering,
    so the original docstring spec seems more sane.
@wesm
Copy link
Member

wesm commented Nov 5, 2012

Indeed this code has been fairly neglected. I'll see if I can easily fix some of the issues you raised

@wesm
Copy link
Member

wesm commented Nov 5, 2012

A major issue here is that the index argument is overloaded to take either index field names or an array to be used as the index. This was a pretty bone-headed decision made a long time ago

@ghost
Copy link
Author

ghost commented Nov 5, 2012

agreed, that was a bit too flexible. But if the loop for removing the columns
gets put in an else caluse to the try/except that's taken care of.

I couldn't figure out a way to get the main ctor to plug data in as columns rather
then rows after sorting out the rest.

@wesm
Copy link
Member

wesm commented Nov 5, 2012

I'm adding a docs clarification about what columns means (i.e: renaming columns is not possible with this function at the moment) and leaving this issue open for more cleanup at a later time (and fixing the bug from the first bullet point).

@ghost
Copy link
Author

ghost commented Nov 5, 2012

I'm pretty sure if you move the if/else setting result_index above
the check on index , you can close the 2rd as well.

@wesm
Copy link
Member

wesm commented Nov 5, 2012

Test case?

@ghost
Copy link
Author

ghost commented Nov 5, 2012

no need, I see you've removed the if entirely.

@wesm
Copy link
Member

wesm commented Nov 24, 2012

What still needs to get fixed here?

@ghost
Copy link
Author

ghost commented Nov 24, 2012

support for duplicate columns is still missing. other then that I think it's good.

@ghost ghost assigned wesm Nov 24, 2012
@wesm wesm closed this as completed in f64ccf0 Nov 24, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant