[BUG] inconsistent behaviour of cudf.DataFrame and pandas.DataFrame from list of tuples #1705
Comments
There's a few ways to go about this:
- Longest Dev Time + Better Performance
- Shorter Dev Time + "No Pandas"
- Shortest Dev Time + 100% Compatibility

I'd go with the shortest dev time, assuming we just want matching functionality and don't care about "code purity" or performance.

That brings up an interesting question about the current implementation of
@harrism does libcudf happen to have rows => table conversion already?
I don't think we have that functionality.
@cwharris what exactly do you mean by "row to column" format? I assume you don't mean a transpose. There is no concept of a "row" in the traditional database sense in cuDF -- everything is stored in columns. Tables are always made up of columns. In any case I think the shortest dev time approach is the right first step, since the request here isn't about performance, it's about compatibility. (Also, this bug originates internally, not from an end user.)
Personally I think that no promise of performance can/should be made for the case of tuple inputs, especially given that the data is assumed to be rows in this case. I think going through Pandas in this case might be the best approach.
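To make the "go through Pandas" idea concrete, here is a minimal sketch of turning row tuples into named columns by letting pandas do the row interpretation. The helper name `rows_to_columns` and the sample data are hypothetical and not part of cuDF; the resulting columns (or the intermediate pandas DataFrame) could then be handed to a columnar constructor.

```python
import pandas as pd


def rows_to_columns(rows, columns=None):
    """Hypothetical helper: interpret each tuple as a row (as pandas does)
    and return a dict mapping column name -> list of column values."""
    # pandas treats every tuple in the list as one row.
    pdf = pd.DataFrame(rows, columns=columns)
    # Pivot the row-oriented input into column-oriented data.
    return {name: pdf[name].tolist() for name in pdf.columns}


# Sample data (an assumption for illustration): three rows, two fields each.
rows = [(1, "x"), (2, "y"), (3, "z")]
cols = rows_to_columns(rows, columns=["id", "label"])
print(cols)
```

This trades performance for compatibility: the data makes a round trip through host memory, which matches the point above that no performance promise is made for tuple inputs.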
While we can update
Describe the bug
When initialized from a list of tuples, a cuDF DataFrame treats the first element of each tuple as the column name and the second element as that column's data, while pandas treats each tuple as a row.
cudf DataFrame behaviour is similar to the pandas.DataFrame.from_items API, which is deprecated.
Steps/Code to reproduce bug
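The original reproduction snippet is not preserved here. As a rough sketch of the two interpretations described above, the following uses pandas only, with made-up sample data; the column-wise result is emulated with a dict and is an assumption about what the from_items-like behaviour produces, not output captured from cuDF.

```python
import pandas as pd

# Hypothetical sample data: a list of (name, values) tuples.
data = [("a", [1, 2, 3]), ("b", [4, 5, 6])]

# pandas behaviour: each tuple is a row, so this has 2 rows and 2 columns
# (column 0 holds the strings, column 1 holds the lists as objects).
row_wise = pd.DataFrame(data)

# from_items-like behaviour: the first element of each tuple is the column
# name and the second is the column's values (emulated here via a dict).
column_wise = pd.DataFrame(dict(data))

print(row_wise.shape)
print(column_wise.shape)
```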
Expected behavior
cudf DataFrame initialization should match pandas DataFrame initialization behavior.
Environment details :
Additional context
Also fix the documentation in dataframe.iloc and related tests.