Accelerate adjacency matrix to CSR conversion for DBSCAN #4803

ahendriksen · 2022-07-06T17:12:28Z

Fixes issue #2387.

For large data sizes, the batch size of the DBSCAN algorithm is small in order to fit the distance matrix in memory.

This results in a matrix that has dimensions num_points x batch_size, both for the distance and adjacency matrix.

The conversion of the boolean adjacency matrix to CSR format is performed in the 'adjgraph' step. This step was slow when the batch size was small, as described in issue #2387.

In this commit, the adjgraph step is sped up. This is done in two ways:

The adjacency matrix is now stored in row-major batch_size x num_points format --- it was transposed before. This required changes in the vertexdeg step.
The csr_row_op kernel has been replaced by the adj_to_csr kernel. This kernel can divide the work over multiple blocks even when the number of rows (batch size) is small. It makes optimal use of memory bandwidth because rows of the matrix are laid out contiguously in memory.

ahendriksen · 2022-07-07T10:07:11Z

This PR is a follow up to PR #4776, which also involves DBSCAN.

tfeher

Thanks @ahendriksen for this work! I have marked a few issue that shall be addressed. Overall it looks good, and I am excited to see this improvement.

cpp/src/dbscan/adjgraph/algo.cuh

For large data sizes, the batch size of the DBSCAN algorithm is small in order to fit the distance matrix in memory. This results in a matrix that has dimensions num_points x batch_size, both for the distance and adjacency matrix. The conversion of the boolean adjacency matrix to CSR format is performed in the 'adjgraph' step. This step was slow when the batch size was small, as described in issue rapidsai#2387. In this commit, the adjgraph step is sped up. This is done in two ways: 1. The adjacency matrix is now stored in row-major batch_size x num_points format --- it was transposed before. This required changes in the vertexdeg step. 2. The csr_row_op kernel has been replaced by the adj_to_csr kernel. This kernel can divide the work over multiple blocks even when the number of rows (batch size) is small. It makes optimal use of memory bandwidth because rows of the matrix are laid out contiguously in memory.

Temporary commit to check if CI passes with new raft warp-aggregated atomics.

cjnolet

@ahendriksen overall this good to me, minus moving the csr_to_adj back over to RAFT. I also left a little reminder to update the raft pin now that the PR for the atomic utilty function has been moved.

cpp/cmake/thirdparty/get_raft.cmake

cpp/src/dbscan/adjgraph/algo.cuh

cjnolet · 2022-07-22T19:38:07Z

rerun tests

tfeher

The PR looks good to me (apart from removing adj_to_csr).

cjnolet · 2022-07-22T22:53:52Z

@gpucibot merge

codecov-commenter · 2022-07-22T23:20:44Z

Codecov Report

Merging #4803 (b38347b) into branch-22.08 (b5a48db) will increase coverage by 0.00%.
The diff coverage is n/a.

@@              Coverage Diff              @@
##           branch-22.08    #4803   +/-   ##
=============================================
  Coverage         77.62%   77.62%           
=============================================
  Files               180      180           
  Lines             11382    11384    +2     
=============================================
+ Hits               8835     8837    +2     
  Misses             2547     2547

Flag	Coverage Δ
dask	`45.52% <ø> (+<0.01%)`	⬆️
non-dask	`67.26% <ø> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
python/cuml/metrics/__init__.py	`100.00% <0.00%> (ø)`
python/cuml/metrics/cluster/__init__.py	`100.00% <0.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b5a48db...b38347b. Read the comment docs.

cjnolet · 2022-07-22T23:21:11Z

@gpucibot merge

Fixes issue rapidsai#2387. For large data sizes, the batch size of the DBSCAN algorithm is small in order to fit the distance matrix in memory. This results in a matrix that has dimensions num_points x batch_size, both for the distance and adjacency matrix. The conversion of the boolean adjacency matrix to CSR format is performed in the 'adjgraph' step. This step was slow when the batch size was small, as described in issue rapidsai#2387. In this commit, the adjgraph step is sped up. This is done in two ways: 1. The adjacency matrix is now stored in row-major batch_size x num_points format --- it was transposed before. This required changes in the vertexdeg step. 2. The csr_row_op kernel has been replaced by the adj_to_csr kernel. This kernel can divide the work over multiple blocks even when the number of rows (batch size) is small. It makes optimal use of memory bandwidth because rows of the matrix are laid out contiguously in memory. Authors: - Allard Hendriksen (https://github.com/ahendriksen) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - Tamas Bela Feher (https://github.com/tfeher) URL: rapidsai#4803

ahendriksen requested review from a team as code owners July 6, 2022 17:12

github-actions bot added CUDA/C++ Cython / Python Cython or Python issue labels Jul 6, 2022

tfeher self-assigned this Jul 6, 2022

ahendriksen added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jul 7, 2022

ahendriksen mentioned this pull request Jul 7, 2022

Fix #3414: remove naive versions dbscan algorithms #4804

Merged

tfeher requested changes Jul 7, 2022

View reviewed changes

ahendriksen force-pushed the fea-2387-improve-perf-dbscan-adj-csr branch from 167dfd6 to f68e6fa Compare July 8, 2022 14:32

github-actions bot removed the Cython / Python Cython or Python issue label Jul 8, 2022

ahendriksen mentioned this pull request Jul 8, 2022

Add warp-aggregated atomic increment rapidsai/raft#735

Merged

ahendriksen added the 0 - Blocked Cannot progress due to external reasons label Jul 8, 2022

ahendriksen added 2 commits July 12, 2022 19:00

DBSCAN: bandaid for raft fork

f6caeab

Temporary commit to check if CI passes with new raft warp-aggregated atomics.

ahendriksen force-pushed the fea-2387-improve-perf-dbscan-adj-csr branch from f68e6fa to f6caeab Compare July 12, 2022 17:02

ahendriksen requested a review from a team as a code owner July 12, 2022 17:02

github-actions bot added the CMake label Jul 12, 2022

cjnolet requested changes Jul 14, 2022

View reviewed changes

cpp/cmake/thirdparty/get_raft.cmake Outdated Show resolved Hide resolved

cpp/src/dbscan/adjgraph/algo.cuh Show resolved Hide resolved

ahendriksen mentioned this pull request Jul 18, 2022

Replace csr_adj_graph functions with faster equivalent rapidsai/raft#746

Merged

caryr35 added this to PR-WIP in v22.08 Release via automation Jul 19, 2022

Updating raft pin

b38347b

github-actions bot removed the CMake label Jul 22, 2022

cjnolet approved these changes Jul 22, 2022

View reviewed changes

v22.08 Release automation moved this from PR-WIP to PR-Reviewer approved Jul 22, 2022

tfeher approved these changes Jul 22, 2022

View reviewed changes

tfeher changed the title ~~[Feature] improve perf dbscan adj csr~~ Accelerate adjacency matrix to CSR conversion for DBSCAN Jul 22, 2022

rapids-bot bot merged commit 2fad4c0 into rapidsai:branch-22.08 Jul 22, 2022

v22.08 Release automation moved this from PR-Reviewer approved to Done Jul 22, 2022

ahendriksen mentioned this pull request Jul 25, 2022

Remove duplicate adj_to_csr implementation #4829

Merged

ahendriksen deleted the fea-2387-improve-perf-dbscan-adj-csr branch July 25, 2022 10:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Accelerate adjacency matrix to CSR conversion for DBSCAN #4803

Accelerate adjacency matrix to CSR conversion for DBSCAN #4803

ahendriksen commented Jul 6, 2022 •

edited

Loading

ahendriksen commented Jul 7, 2022

tfeher left a comment

cjnolet left a comment

cjnolet commented Jul 22, 2022

tfeher left a comment

cjnolet commented Jul 22, 2022

codecov-commenter commented Jul 22, 2022

cjnolet commented Jul 22, 2022

Accelerate adjacency matrix to CSR conversion for DBSCAN #4803

Accelerate adjacency matrix to CSR conversion for DBSCAN #4803

Conversation

ahendriksen commented Jul 6, 2022 • edited Loading

ahendriksen commented Jul 7, 2022

tfeher left a comment

Choose a reason for hiding this comment

cjnolet left a comment

Choose a reason for hiding this comment

cjnolet commented Jul 22, 2022

tfeher left a comment

Choose a reason for hiding this comment

cjnolet commented Jul 22, 2022

codecov-commenter commented Jul 22, 2022

Codecov Report

cjnolet commented Jul 22, 2022

ahendriksen commented Jul 6, 2022 •

edited

Loading