Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Enable covariance/correlation in dask_cudf #3393

Merged
merged 6 commits into from
Nov 19, 2019

Conversation

rjzamora
Copy link
Member

@rjzamora rjzamora commented Nov 15, 2019

Closes #3363

In order to use the upstream dask covariance/correlation implementation, a few changes are needed in cudf/dask_cudf:

  1. concat_cudf must support axis=1 (already supported in cudf, so this is a trivial change)
  2. cudf must define a dataframe-level cov method

This PR addresses both of these changes. However, (2) is only addressed with a cupy-based implementation (which does not handle dataframes with null values).

TODO Before Merging:

@rjzamora rjzamora requested review from a team as code owners November 15, 2019 19:40
@codecov
Copy link

codecov bot commented Nov 15, 2019

Codecov Report

Merging #3393 into branch-0.11 will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@              Coverage Diff               @@
##           branch-0.11   #3393      +/-   ##
==============================================
+ Coverage         87.3%   87.3%   +<.01%     
==============================================
  Files               49      49              
  Lines             9214    9218       +4     
==============================================
+ Hits              8044    8048       +4     
  Misses            1170    1170
Impacted Files Coverage Δ
python/cudf/cudf/core/dataframe.py 93.48% <100%> (+0.02%) ⬆️
python/dask_cudf/dask_cudf/backends.py 94.54% <100%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3722385...6548477. Read the comment docs.

@rjzamora rjzamora self-assigned this Nov 18, 2019
@rjzamora rjzamora changed the title [WIP] Enable covariance/correlation in dask_cudf [REVIEW] Enable covariance/correlation in dask_cudf Nov 18, 2019
Copy link
Member

@quasiben quasiben left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With dask/dask#5597 merged in, I think this is good

@rjzamora rjzamora merged commit 3ef5863 into rapidsai:branch-0.11 Nov 19, 2019
@rjzamora rjzamora deleted the correlation branch November 21, 2019 18:33
@vyasr vyasr added 4 - Needs Review Waiting for reviewer to review or respond and removed 4 - Needs cuDF (Python) Reviewer labels Feb 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team 4 - Needs Review Waiting for reviewer to review or respond
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] dask-cudf doesn't support "corr"/correlation function like Pandas and cuDF
4 participants