
Add MG weighted k-means #3959

Merged
merged 25 commits into from
Jun 29, 2021

Conversation

lowener
Contributor

@lowener lowener commented Jun 8, 2021

This PR adds support for MG weighted k-means and is a continuation of @akkamesh's and @cjnolet's work on PR #2126.

@lowener lowener requested review from a team as code owners June 8, 2021 07:41
@github-actions github-actions bot added CUDA/C++ Cython / Python Cython or Python issue labels Jun 8, 2021
@lowener lowener added Dask / cuml.dask Issue/PR related to Python level dask or cuml.dask features. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jun 8, 2021
@dantegd dantegd added this to PR-WIP in v21.08 Release via automation Jun 8, 2021
@dantegd dantegd added the 3 - Ready for Review Ready for review by team label Jun 8, 2021
Member

@cjnolet cjnolet left a comment


Thanks for picking this one up. Overall it looks great but we do still have an issue to fix (see comment in the review).

@@ -620,6 +657,9 @@ void fit(const raft::handle_t &handle, const KMeansParams &params,
MLCommon::device_buffer<char> workspace(handle.get_device_allocator(),
stream);

// check if weights sum up to n_samples
checkWeights(handle, workspace, weight, stream);
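The `checkWeights` prim itself is not shown in this diff. As a rough host-side sketch of the behavior the comment describes (rescaling the sample weights so they sum to `n_samples`), assuming that is all the prim does (the function name, host types, and loop below are illustrative only, not the actual on-device cumlprims code):

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Hypothetical host-side analogue of the checkWeights prim:
// if the sample weights do not already sum to n_samples,
// rescale them in place so that they do. The real prim would
// perform the sum as a device-side reduction on `stream`.
void check_weights(std::vector<double>& weight) {
  const double n_samples = static_cast<double>(weight.size());
  const double wsum = std::accumulate(weight.begin(), weight.end(), 0.0);
  if (wsum != n_samples) {
    const double scale = n_samples / wsum;
    for (double& w : weight) {
      w *= scale;
    }
  }
}
```

With this convention, uniform weights are the identity case (`{1, 1, 1, 1}` stays unchanged), and any other weighting is interpreted relative to that baseline.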

Great, you pulled this over from cumlprims! IIRC, the one remaining issue is that the single-GPU k-means normalizes the weights in predict, which will cause the multi-GPU version to normalize each partition individually, since it is embarrassingly parallel.

The weights are already normalized globally in the Dask-based predict, but the single-GPU predict is going to re-normalize them locally. The more straightforward fix might be to have the C++ predict() function accept a normalize_weights argument that defaults to true, and have the multi-GPU predict function turn it off. The goal here is to eliminate the need for predict() to use the comms, because then it would no longer be able to execute embarrassingly parallel.
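To see concretely why local re-normalization is a problem, here is a toy host-side sketch (hypothetical helper, not cuml code): if each partition independently rescales its weights to sum to its own length, the relative weighting *between* partitions is destroyed.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Mimics the single-GPU predict's local normalization: rescale
// the weights of one partition so they sum to its element count.
void normalize_local(std::vector<double>& w) {
  const double s = std::accumulate(w.begin(), w.end(), 0.0);
  const double scale = static_cast<double>(w.size()) / s;
  for (double& x : w) {
    x *= scale;
  }
}
```

Consider two partitions `{1, 1}` and `{3, 3}`: globally, the second partition's samples should count 3x as much, but after each partition normalizes locally both become `{1, 1}`, so the 3x weighting silently disappears. A `normalize_weights=false` flag for the multi-GPU path avoids this without requiring predict() to communicate.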

v21.08 Release automation moved this from PR-WIP to PR-Needs review Jun 9, 2021
@dantegd dantegd added 4 - Waiting on Author Waiting for author to respond to review and removed 3 - Ready for Review Ready for review by team labels Jun 10, 2021
@lowener
Contributor Author

lowener commented Jun 22, 2021

rerun tests

@lowener lowener requested a review from cjnolet June 22, 2021 20:34
Member

@cjnolet cjnolet left a comment


LGTM pending CI

v21.08 Release automation moved this from PR-Needs review to PR-Reviewer approved Jun 23, 2021
@cjnolet
Member

cjnolet commented Jun 23, 2021

rerun tests

@dantegd
Member

dantegd commented Jun 29, 2021

rerun tests

@dantegd
Member

dantegd commented Jun 29, 2021

Docstring fix identified in CI:

Generating docs for compound /workspace/cpp/include/cuml/cluster/kmeans_mg.hpp:49: error: The following parameter of ML::kmeans::opg::fit(const raft::handle_t &handle, const KMeansParams &params, const float *X, int n_samples, int n_features, const float *sample_weight, float *centroids, float &inertia, int &n_iter) is not documented:

@dantegd
Member

dantegd commented Jun 29, 2021

@gpucibot merge

@codecov-commenter

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.08@3887e32).
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-21.08    #3959   +/-   ##
===============================================
  Coverage                ?   85.46%           
===============================================
  Files                   ?      230           
  Lines                   ?    18116           
  Branches                ?        0           
===============================================
  Hits                    ?    15482           
  Misses                  ?     2634           
  Partials                ?        0           
Flag | Coverage Δ
--- | ---
dask | 48.11% <0.00%> (?)
non-dask | 77.73% <0.00%> (?)

Last update 3887e32...7e09369.

@rapids-bot rapids-bot bot merged commit 166667b into rapidsai:branch-21.08 Jun 29, 2021
v21.08 Release automation moved this from PR-Reviewer approved to Done Jun 29, 2021
@lowener lowener deleted the enh-ext-mg-weighted-kmeans branch June 29, 2021 22:10
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
This PR adds support for MG weighted k-means and is a continuation of @akkamesh's and @cjnolet's work on PR rapidsai#2126.

Authors:
  - Micka (https://github.com/lowener)
  - https://github.com/akkamesh
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#3959