Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add hamming, jensen-shannon, kl-divergence, correlation and russellrao distance metrics #4155

Merged
merged 23 commits into from
Sep 14, 2021

Conversation

mdoijade
Copy link
Contributor

-- This PR depends on RAFT PR - rapidsai/raft#306
-- Adds cpp & python interfaces for these distance metrics with pytest support for each of them.
-- also remove redundant commented code in canberra distance metric

@mdoijade mdoijade requested review from a team as code owners August 11, 2021 15:08
@github-actions github-actions bot added CMake CUDA/C++ Cython / Python Cython or Python issue labels Aug 11, 2021
@mdoijade
Copy link
Contributor Author

@dantegd the cuml build looks to be broken due to this error below, is there a PR which is working on fixing this?
This blocks my PR.
@teju85 @cjnolet for help with reviewing this cuML side PR.

https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cuml/job/prb/job/cuml-cpu-cuda-build/CUDA=11.2/1452/console
3:31:26 ../bench/prims/gram_matrix.cu(61): error: identifier "CUBLAS_CHECK_NO_THROW" is undefined
23:31:26 detected during:
23:31:26 instantiation of "MLCommon::Bench::Matrix::GramMatrix::GramMatrix(const std::string &, const MLCommon::Bench::Matrix::GramTestParams &) [with T=float]"
23:31:26 ../bench/common/ml_benchmark.hpp(197): here
23:31:26 instantiation of "MLCommon::Bench::internal::Registrar<Params, Class>::Registrar(const std::vector<Params, std::allocator> &, const std::string &, const std::string &) [with Params=MLCommon::Bench::Matrix::GramTestParams, Class=MLCommon::Bench::Matrix::GramMatrix]"
23:31:26 (137): here
23:31:26
23:31:26 1 error detected in the compilation of "../bench/prims/gram_matrix.cu".

@mdoijade
Copy link
Contributor Author

seems that this PR introduced this build failure - e977f3e
@harrism

@harrism
Copy link
Member

harrism commented Aug 12, 2021

@mdoijade CUBLAS_CHECK_NO_THROW was enabled by rapidsai/raft#311, on which the PR linked depended. That PR had to be merged before e977f3e would pass CI.

@mdoijade
Copy link
Contributor Author

@mdoijade CUBLAS_CHECK_NO_THROW was enabled by rapidsai/raft#311, on which the PR linked depended. That PR had to be merged before e977f3e would pass CI.

I see. thanks! merged it in my RAFT PR now.

@mdoijade
Copy link
Contributor Author

@dantegd all the test runs are failing with some CI error - https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cuml/job/prb/job/cuml-cpu-python-build/CUDA=11.2,PYTHON=3.7/1239/console

Also can we merge the raft PR now, as I believe it shouldn't break cuml builds as here all builds looks fine ? - rapidsai/raft#306

@GPUtester
Copy link
Contributor

Can one of the admins verify this patch?

@mdoijade
Copy link
Contributor Author

@dantegd all the test runs are failing with some CI error, is it due to using custom raft fork ?

https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cuml/job/prb/job/cuml-gpu-test/CUDA=11.0,GPU_LABEL=gpu-a100,LINUX_VER=centos7,PYTHON=3.7/1254/console

00:29:19 Cloning into '_external_repositories/raft'...
00:29:20 error: pathspec 'branch-21.10' did not match any file(s) known to git
00:29:20 -- CUDA_HOME detected with value: /usr/local/cuda
00:29:20 -- CUML_BUILD_PATH environment variable not set.
00:29:20 -- RAFT_PATH environment variable not set.
00:29:20 -- Third party repositories have not been found so they will be cloned. To avoid this set the environment variable CUML_BUILD_PATH, containing the absolute path to the build folder where libcuml++ was built.
00:29:20 Cloning repository raft into _external_repositories/raft
00:29:20 Traceback (most recent call last):
00:29:20 File "setup.py", line 108, in
00:29:20 raft_include_dir = use_raft_package(raft_path, libcuml_path)
00:29:20 File "/workspace/python/setuputils.py", line 100, in use_raft_package
00:29:20 git_info_file=git_info_file)
00:29:20 File "/workspace/python/setuputils.py", line 132, in clone_repo_if_needed
00:29:20 git_info_file=git_info_file)
00:29:20 File "/workspace/python/setuputils.py", line 193, in get_submodule_dependency
00:29:20 clone_repo(repo, repo_info[repo][0], repo_info[repo][1])
00:29:20 File "/workspace/python/setuputils.py", line 230, in clone_repo
00:29:20 GIT_TAG])
00:29:20 File "/opt/conda/envs/rapids/lib/python3.7/subprocess.py", line 363, in check_call
00:29:20 raise CalledProcessError(retcode, cmd)
00:29:20 subprocess.CalledProcessError: Command '['git', 'checkout', 'branch-21.10']' returned non-zero exit status 1.
00:29:20 Build step 'Execute shell' marked build as failure
00:29:21 Recording test results
00:29:24 [Checks API] No suitable checks publisher found.

Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a question and a reminder to change the raft branch back, otherwise this looks good.

GIT_REPOSITORY https://github.com/${PKG_FORK}/raft.git
GIT_TAG ${PKG_PINNED_TAG}
GIT_REPOSITORY https://github.com/mdoijade/raft.git
GIT_TAG additionalDistPrims
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder to update this now that the raft side has been merged


// Call the distance function
switch (metric) {
case raft::distance::DistanceType::CorrelationExpanded:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the switch state propagated into this in case there is also an unexpanded version in the future? (Asking for the other distances as well)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No there is no unexpanded version planned for this or other distance metrics. should I replace switch() with an if condition ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These functions are named after the associated distance measure so I would call a function named pairwise_distance_correlation expecting it to compute the correlation distance. If it's not possible that this will ever compute a distance other than correlation, why are we passing the DistanceType into these functions at all? Rather than using an if condition, I would propose removing the metric argument altogether and just having the function compute the distance for which its name implies. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I think the metric arg got propagated to all distances due to euclidean dist having multiple sub-implementations.
I've removed this redundant check & arg now for all non-euclidean metrics which doesn't have multiple implementations.

@mdoijade
Copy link
Contributor Author

@teju85 can I get a rerun of tests here?
PR is hitting some CI errors below,

json.decoder.JSONDecodeError: Unterminated string starting at: line 2407838 column 18 (char 72712781)

Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/conda/core/subdir_data.py", line 705, in fetch_repodata_remote_request
resp.raise_for_status()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/requests/models.py", line 953, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 504 Server Error: Gateway Time-out for url: https://conda.anaconda.org/rapidsai/linux-64/repodata.json

@teju85 teju85 added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Aug 27, 2021
@teju85
Copy link
Member

teju85 commented Aug 27, 2021

rerun tests

…contains single distance metric implementation
Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Majesh. The changes look great! Just pending CI now.

@mdoijade
Copy link
Contributor Author

mdoijade commented Sep 1, 2021

@dantegd before merging branch-21.10 recent changes in my PR all pytests related to new distance metrics were passing,
but since inclusion of I guess this change 903d576
It has started failing in CI.
locally I am unable to reproduce these failures using recent raft branch-21.10 and changes from this cuML PR.
I see that in CI results all the values from raft kernel is getting reported as 0's so I guess the gpu kernel is not getting called, but I am not sure why.
@viclafargue to help check the changes done to pairwise_distance_correlation.cu in this PR if it is as expected.

@teju85 for visibility.

@viclafargue
Copy link
Contributor

viclafargue commented Sep 6, 2021

@dantegd before merging branch-21.10 recent changes in my PR all pytests related to new distance metrics were passing,
but since inclusion of I guess this change 903d576
It has started failing in CI.
locally I am unable to reproduce these failures using recent raft branch-21.10 and changes from this cuML PR.
I see that in CI results all the values from raft kernel is getting reported as 0's so I guess the gpu kernel is not getting called, but I am not sure why.
@viclafargue to help check the changes done to pairwise_distance_correlation.cu in this PR if it is as expected.

@teju85 for visibility.

Sorry for the late reply, I was on vacation. Thank you for working on this. I took a look at this PR and the RAFT changes. It's hard to tell at the moment which change might cause the issue. However, I could reproduce the issue locally. For this, you may need to either update your environment or to remove RAFT package from your conda env (get_rmm.cmake file won't be used for cpp if RAFT is present in env). I also noticed that the final commit in rapidsai/raft#306 containing the merge doesn't appear to be fully CI approved. Has it been tested some other way? Sorry, that's the RAFT CI issue.

@viclafargue
Copy link
Contributor

Seems like the issue comes from the libcumlprims package that is shipped with cmake files that refer to an older version of RAFT. The changes in RAFT that go hand-in-hand with this PR are absent. cc @mdoijade @dantegd

@caryr35 caryr35 added this to PR-WIP in v21.10 Release via automation Sep 8, 2021
@dantegd
Copy link
Member

dantegd commented Sep 8, 2021

rerun tests

@caryr35 caryr35 moved this from PR-WIP to PR-Reviewer approved in v21.10 Release Sep 8, 2021
@codecov-commenter
Copy link

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.10@0e770fa). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff               @@
##             branch-21.10    #4155   +/-   ##
===============================================
  Coverage                ?   85.98%           
===============================================
  Files                   ?      231           
  Lines                   ?    18515           
  Branches                ?        0           
===============================================
  Hits                    ?    15920           
  Misses                  ?     2595           
  Partials                ?        0           
Flag Coverage Δ
dask 47.31% <0.00%> (?)
non-dask 78.59% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0e770fa...6427584. Read the comment docs.

@mdoijade
Copy link
Contributor Author

can this PR get merged now?

@dantegd dantegd added feature request New feature or request and removed improvement Improvement / enhancement to an existing function labels Sep 14, 2021
@dantegd
Copy link
Member

dantegd commented Sep 14, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 55ddc8b into rapidsai:branch-21.10 Sep 14, 2021
v21.10 Release automation moved this from PR-Reviewer approved to Done Sep 14, 2021
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
…o distance metrics (rapidsai#4155)

-- This PR depends on RAFT PR - rapidsai/raft#306
-- Adds cpp & python interfaces for these distance metrics with pytest support for each of them.
-- also remove redundant commented code in canberra distance metric

Authors:
  - Mahesh Doijade (https://github.com/mdoijade)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#4155
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake CUDA/C++ Cython / Python Cython or Python issue feature request New feature or request non-breaking Non-breaking change
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

8 participants