[REVIEW] Fix columns & index handling in dataframe constructor #6838

Merged 28 commits on Dec 8, 2020


galipremsagar
Contributor

@galipremsagar galipremsagar commented Nov 24, 2020

Fixes: #6821

This PR fixes an issue where the `columns` and `index` arguments to the DataFrame constructor are not handled correctly in specific scenarios.
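
To illustrate the kind of behavior at stake, here is a minimal pure-Python sketch of how a `columns` argument is expected to filter dict input, modelled on pandas semantics (which cuDF generally aims to match). It is not cuDF's implementation: `select_columns` is a hypothetical helper, and `None` stands in for a null value.

```python
from typing import Any, Dict, List, Optional, Sequence


def select_columns(
    data: Dict[str, List[Any]],
    columns: Optional[Sequence[str]] = None,
) -> Dict[str, List[Any]]:
    """Hypothetical helper modelling how `columns` filters dict input."""
    if columns is None:
        # No selection requested: keep every key in insertion order.
        return {name: list(values) for name, values in data.items()}

    # The row count comes from the input data, not from `columns`.
    n_rows = max((len(values) for values in data.values()), default=0)

    result: Dict[str, List[Any]] = {}
    for name in columns:
        if name in data:
            # Requested and present: keep it, in the requested order.
            result[name] = list(data[name])
        else:
            # Requested but absent: an all-null column (None as placeholder).
            result[name] = [None] * n_rows
    return result


frame = select_columns({"a": [1, 2], "b": [3, 4]}, columns=["b", "c"])
print(frame)  # {'b': [3, 4], 'c': [None, None]}
```

Under this model, a key that is not listed in `columns` (here `a`) is dropped rather than kept, which is the opposite of the parameter being ignored.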

@galipremsagar galipremsagar added 2 - In Progress Currently a work in progress Python Affects Python cuDF API. labels Nov 24, 2020
@galipremsagar galipremsagar self-assigned this Nov 24, 2020
@galipremsagar galipremsagar added this to PR-WIP in v0.17 Release via automation Nov 24, 2020
@GPUtester
Collaborator

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

@codecov

codecov bot commented Nov 24, 2020

Codecov Report

Merging #6838 (f877d51) into branch-0.18 (598a14d) will increase coverage by 0.46%.
The diff coverage is 98.33%.

@@               Coverage Diff               @@
##           branch-0.18    #6838      +/-   ##
===============================================
+ Coverage        81.53%   82.00%   +0.46%     
===============================================
  Files               96       96              
  Lines            15876    16245     +369     
===============================================
+ Hits             12945    13321     +376     
+ Misses            2931     2924       -7     
Impacted Files                                    Coverage Δ
python/cudf/cudf/core/dataframe.py                91.08% <96.29%> (+<0.01%) ⬆️
python/cudf/cudf/core/frame.py                    90.34% <100.00%> (+0.34%) ⬆️
python/dask_cudf/dask_cudf/sorting.py             93.38% <100.00%> (+0.25%) ⬆️
python/cudf/cudf/io/feather.py                    100.00% <0.00%> (ø)
python/cudf/cudf/comm/serialize.py                0.00% <0.00%> (ø)
python/cudf/cudf/_fuzz_testing/io.py              0.00% <0.00%> (ø)
python/dask_cudf/dask_cudf/_version.py            0.00% <0.00%> (ø)
python/dask_cudf/dask_cudf/io/tests/test_csv.py   100.00% <0.00%> (ø)
python/dask_cudf/dask_cudf/io/tests/test_orc.py   100.00% <0.00%> (ø)
python/dask_cudf/dask_cudf/io/tests/test_json.py  100.00% <0.00%> (ø)
... and 38 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@galipremsagar galipremsagar changed the title [WIP] Fix columns & index handling in dataframe constructor [REVIEW] Fix columns & index handling in dataframe constructor Nov 24, 2020
@galipremsagar galipremsagar marked this pull request as ready for review November 24, 2020 22:07
@galipremsagar galipremsagar requested review from a team as code owners November 24, 2020 22:07
@shwina
Contributor

shwina commented Nov 24, 2020

At a high level it looks like this shares some logic with reindex.

Can we find a way to reuse code between the two?
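
For context on the overlap mentioned here: reindexing conforms existing rows to a requested set of labels and fills labels that were absent with nulls, which is similar to what a constructor has to do when it receives already-indexed data together with an explicit index. A minimal pure-Python sketch of that alignment step, assuming unique labels; `reindex_rows` is a hypothetical name and not part of cuDF or pandas:

```python
from typing import Any, Hashable, List, Sequence


def reindex_rows(
    index: Sequence[Hashable],
    values: Sequence[Any],
    new_index: Sequence[Hashable],
) -> List[Any]:
    """Hypothetical helper: align `values` (labelled by `index`) to `new_index`."""
    # Map each existing label to its row position (labels assumed unique).
    position = {label: i for i, label in enumerate(index)}
    # Labels missing from the original index become null (None as placeholder).
    return [
        values[position[label]] if label in position else None
        for label in new_index
    ]


aligned = reindex_rows(["x", "y"], [10, 20], ["y", "z", "x"])
print(aligned)  # [20, None, 10]
```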

@galipremsagar galipremsagar changed the title [REVIEW] Fix columns & index handling in dataframe constructor [WIP] Fix columns & index handling in dataframe constructor Nov 25, 2020
@galipremsagar galipremsagar changed the title [WIP] Fix columns & index handling in dataframe constructor [REVIEW] Fix columns & index handling in dataframe constructor Dec 2, 2020
@galipremsagar galipremsagar requested review from harrism and codereport and removed request for a team, harrism and codereport December 4, 2020 00:23
@galipremsagar galipremsagar changed the title [WIP] Fix columns & index handling in dataframe constructor [REVIEW] Fix columns & index handling in dataframe constructor Dec 4, 2020
@galipremsagar galipremsagar added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Dec 4, 2020
@galipremsagar galipremsagar moved this from PR-WIP to PR-Needs review in v0.18 Release Dec 4, 2020
Member

@rjzamora rjzamora left a comment

The dask_cudf changes seem fine to me. If the map_partitions call could happen on a large graph, I would suggest that we find a way to push the set_index operation into an earlier task. However, this is a single-partition dask_cudf.DataFrame by design, so graph size will never be an issue.

Contributor

@shwina shwina left a comment

LGTM!

v0.18 Release automation moved this from PR-Needs review to PR-Reviewer approved Dec 8, 2020
@kkraus14 kkraus14 added 5 - Ready to Merge Testing and reviews complete, ready to merge 6 - Okay to Auto-Merge and removed 3 - Ready for Review Ready for review by team labels Dec 8, 2020
@rapids-bot rapids-bot bot merged commit f6b16ab into rapidsai:branch-0.18 Dec 8, 2020
v0.18 Release automation moved this from PR-Reviewer approved to Done Dec 8, 2020
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge bug Something isn't working non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
No open projects
v0.18 Release
  
Done
Development

Successfully merging this pull request may close these issues.

[BUG] columns parameter is being ignored in specific cases in DataFrame constructor
5 participants