Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add checks for HLG layers in dask-cudf groupby tests #10853

Merged

Conversation

charlesbluca
Copy link
Member

This PR adds helper function check_groupby_result to dask-cudf's groupby tests, and is used in the basic tests to ensure that we are using dask-cudf's groupby_agg function to compute the result as expected.

I also expanded test_groupby_agg to test all supported aggregations, and removed tests that were made superfluous by this change.

@charlesbluca charlesbluca added 3 - Ready for Review Ready for review by team tests Unit testing for project dask-cudf non-breaking Non-breaking change labels May 13, 2022
@charlesbluca charlesbluca requested a review from a team as a code owner May 13, 2022 20:07
@charlesbluca charlesbluca added the improvement Improvement / enhancement to an existing function label May 13, 2022
@github-actions github-actions bot added the Python Affects Python cuDF API. label May 13, 2022
@codecov
Copy link

codecov bot commented May 13, 2022

Codecov Report

Base: 88.02% // Head: 88.05% // Increases project coverage by +0.02% 🎉

Coverage data is based on head (c1045bd) compared to base (11b875b).
Patch has no changes to coverable lines.

❗ Current head c1045bd differs from pull request most recent head 78be977. Consider uploading reports for the commit 78be977 to get more accurate results

Additional details and impacted files
@@               Coverage Diff                @@
##           branch-22.12   #10853      +/-   ##
================================================
+ Coverage         88.02%   88.05%   +0.02%     
================================================
  Files               135      135              
  Lines             22025    22025              
================================================
+ Hits              19388    19394       +6     
+ Misses             2637     2631       -6     
Impacted Files Coverage Δ
python/cudf/cudf/core/dataframe.py 93.67% <0.00%> (+0.04%) ⬆️
python/cudf/cudf/core/column/string.py 88.65% <0.00%> (+0.12%) ⬆️
python/cudf/cudf/core/groupby/groupby.py 91.51% <0.00%> (+0.20%) ⬆️
python/cudf/cudf/core/tools/datetimes.py 84.49% <0.00%> (+0.30%) ⬆️
python/cudf/cudf/core/column/lists.py 93.75% <0.00%> (+0.96%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@caryr35 caryr35 added this to PR-WIP in v22.08 Release via automation May 20, 2022
@caryr35 caryr35 moved this from PR-WIP to PR-Needs review in v22.08 Release May 20, 2022
@charlesbluca charlesbluca changed the base branch from branch-22.06 to branch-22.08 May 24, 2022 13:42
Copy link
Member

@madsbk madsbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, nice clean up @charlesbluca !

@github-actions
Copy link

github-actions bot commented Jul 2, 2022

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

@bdice bdice changed the base branch from branch-22.08 to branch-22.10 August 2, 2022 21:51
@bdice bdice removed this from PR-Needs review in v22.08 Release Aug 2, 2022
@bdice bdice added this to PR-WIP in v22.10 Release via automation Aug 2, 2022
@charlesbluca charlesbluca changed the base branch from branch-22.10 to branch-22.12 September 27, 2022 13:41
@caryr35 caryr35 removed this from PR-WIP in v22.10 Release Oct 18, 2022
@caryr35 caryr35 added this to PR-WIP in v22.12 Release via automation Oct 18, 2022
@vyasr
Copy link
Contributor

vyasr commented Nov 1, 2022

@charlesbluca it looks like this PR is just test improvements and was already approved. Any reason not to merge in the latest 22.12, make sure tests pass, then merge?

@charlesbluca
Copy link
Member Author

charlesbluca commented Nov 1, 2022

It was approved by @madsbk, who IIUC isn't a part of cudf-dask-codeowners; @rjzamora, if you get a chance, would you mind reviewing this (related to discussion we had around improving dask-cudf's groupby testing)?

Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor quibbles. I have a slight concern that the second deleted test reduces coverage of corner-cases, can you explain why not?

python/dask_cudf/dask_cudf/tests/test_groupby.py Outdated Show resolved Hide resolved
python/dask_cudf/dask_cudf/tests/test_groupby.py Outdated Show resolved Hide resolved
python/dask_cudf/dask_cudf/tests/test_groupby.py Outdated Show resolved Hide resolved
v22.12 Release automation moved this from PR-WIP to PR-Needs review Nov 2, 2022
@wence-
Copy link
Contributor

wence- commented Nov 3, 2022

@charlesbluca let me know when you think this is ready for another look (e.g. the discussion above is resolved one way or another).

@charlesbluca
Copy link
Member Author

charlesbluca commented Nov 3, 2022

@wence- this should be good for another round of review. The conclusion of the above discussion was to:

  • leave the cumulative testing as is, but add an xfail for the failing null cases (opened [BUG] Cumulative aggregation functions fail on dataframe/series groupbys with nulls #12055 to track this)
  • remove cumulative aggs from SUPPORTED_AGGS so that we don't use dask-cudf's codepath for them since it doesn't give the correct result; that being said, upstream Dask doesn't support passing cumulative aggs into .agg(), so will want to open an issue there to track adding support for this

Not sure if you think it would be worth it to add in tests for cumulative aggregations using .agg() now, knowing they would be failing, and linking to a related Dask issue?

Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two very minor things, but otherwise thanks, LGTM!

v22.12 Release automation moved this from PR-Needs review to PR-Reviewer approved Nov 3, 2022
@charlesbluca
Copy link
Member Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 17b6b2e into rapidsai:branch-22.12 Nov 7, 2022
v22.12 Release automation moved this from PR-Reviewer approved to Done Nov 7, 2022
@vyasr vyasr added dask Dask issue and removed dask-cudf labels Feb 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team dask Dask issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API. tests Unit testing for project
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

6 participants