Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accelerate adjacency matrix to CSR conversion for DBSCAN #4803

Conversation

ahendriksen
Copy link
Contributor

@ahendriksen ahendriksen commented Jul 6, 2022

Fixes issue #2387.

For large data sizes, the batch size of the DBSCAN algorithm is small in order to fit the distance matrix in memory.

This results in a matrix that has dimensions num_points x batch_size, both for the distance and adjacency matrix.

The conversion of the boolean adjacency matrix to CSR format is performed in the 'adjgraph' step. This step was slow when the batch size was small, as described in issue #2387.

In this commit, the adjgraph step is sped up. This is done in two ways:

  1. The adjacency matrix is now stored in row-major batch_size x num_points format --- it was transposed before. This required changes in the vertexdeg step.

  2. The csr_row_op kernel has been replaced by the adj_to_csr kernel. This kernel can divide the work over multiple blocks even when the number of rows (batch size) is small. It makes optimal use of memory bandwidth because rows of the matrix are laid out contiguously in memory.

@ahendriksen ahendriksen requested review from a team as code owners July 6, 2022 17:12
@github-actions github-actions bot added CUDA/C++ Cython / Python Cython or Python issue labels Jul 6, 2022
@tfeher tfeher self-assigned this Jul 6, 2022
@ahendriksen
Copy link
Contributor Author

This PR is a follow up to PR #4776, which also involves DBSCAN.

@ahendriksen ahendriksen added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jul 7, 2022
Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ahendriksen for this work! I have marked a few issue that shall be addressed. Overall it looks good, and I am excited to see this improvement.

cpp/src/dbscan/adjgraph/algo.cuh Outdated Show resolved Hide resolved
cpp/src/dbscan/adjgraph/algo.cuh Outdated Show resolved Hide resolved
cpp/src/dbscan/adjgraph/algo.cuh Outdated Show resolved Hide resolved
cpp/src/dbscan/adjgraph/algo.cuh Outdated Show resolved Hide resolved
cpp/src/dbscan/adjgraph/algo.cuh Outdated Show resolved Hide resolved
cpp/src/dbscan/adjgraph/algo.cuh Outdated Show resolved Hide resolved
cpp/src/dbscan/adjgraph/algo.cuh Outdated Show resolved Hide resolved
cpp/src/dbscan/adjgraph/algo.cuh Outdated Show resolved Hide resolved
@ahendriksen ahendriksen force-pushed the fea-2387-improve-perf-dbscan-adj-csr branch from 167dfd6 to f68e6fa Compare July 8, 2022 14:32
@github-actions github-actions bot removed the Cython / Python Cython or Python issue label Jul 8, 2022
@ahendriksen ahendriksen added the 0 - Blocked Cannot progress due to external reasons label Jul 8, 2022
For large data sizes, the batch size of the DBSCAN algorithm is small in
order to fit the distance matrix in memory.

This results in a matrix that has dimensions num_points x batch_size,
both for the distance and adjacency matrix.

The conversion of the boolean adjacency matrix to CSR format is
performed in the 'adjgraph' step. This step was slow when the batch size
was small, as described in issue rapidsai#2387.

In this commit, the adjgraph step is sped up. This is done in two ways:

1. The adjacency matrix is now stored in row-major batch_size x
   num_points format --- it was transposed before. This required changes
   in the vertexdeg step.

2. The csr_row_op kernel has been replaced by the adj_to_csr kernel.
   This kernel can divide the work over multiple blocks even when the
   number of rows (batch size) is small. It makes optimal use of memory
   bandwidth because rows of the matrix are laid out contiguously in
   memory.
Temporary commit to check if CI passes with new raft warp-aggregated
atomics.
@ahendriksen ahendriksen force-pushed the fea-2387-improve-perf-dbscan-adj-csr branch from f68e6fa to f6caeab Compare July 12, 2022 17:02
@ahendriksen ahendriksen requested a review from a team as a code owner July 12, 2022 17:02
@github-actions github-actions bot added the CMake label Jul 12, 2022
Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahendriksen overall this good to me, minus moving the csr_to_adj back over to RAFT. I also left a little reminder to update the raft pin now that the PR for the atomic utilty function has been moved.

cpp/cmake/thirdparty/get_raft.cmake Outdated Show resolved Hide resolved
cpp/src/dbscan/adjgraph/algo.cuh Show resolved Hide resolved
@cjnolet
Copy link
Member

cjnolet commented Jul 22, 2022

rerun tests

@github-actions github-actions bot removed the CMake label Jul 22, 2022
v22.08 Release automation moved this from PR-WIP to PR-Reviewer approved Jul 22, 2022
Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks good to me (apart from removing adj_to_csr).

@tfeher tfeher changed the title [Feature] improve perf dbscan adj csr Accelerate adjacency matrix to CSR conversion for DBSCAN Jul 22, 2022
@cjnolet
Copy link
Member

cjnolet commented Jul 22, 2022

@gpucibot merge

@codecov-commenter
Copy link

Codecov Report

Merging #4803 (b38347b) into branch-22.08 (b5a48db) will increase coverage by 0.00%.
The diff coverage is n/a.

@@              Coverage Diff              @@
##           branch-22.08    #4803   +/-   ##
=============================================
  Coverage         77.62%   77.62%           
=============================================
  Files               180      180           
  Lines             11382    11384    +2     
=============================================
+ Hits               8835     8837    +2     
  Misses             2547     2547           
Flag Coverage Δ
dask 45.52% <ø> (+<0.01%) ⬆️
non-dask 67.26% <ø> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
python/cuml/metrics/__init__.py 100.00% <0.00%> (ø)
python/cuml/metrics/cluster/__init__.py 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b5a48db...b38347b. Read the comment docs.

@cjnolet
Copy link
Member

cjnolet commented Jul 22, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 2fad4c0 into rapidsai:branch-22.08 Jul 22, 2022
v22.08 Release automation moved this from PR-Reviewer approved to Done Jul 22, 2022
@ahendriksen ahendriksen deleted the fea-2387-improve-perf-dbscan-adj-csr branch July 25, 2022 10:00
jakirkham pushed a commit to jakirkham/cuml that referenced this pull request Feb 27, 2023
Fixes issue rapidsai#2387.

For large data sizes, the batch size of the DBSCAN algorithm is small in order to fit the distance matrix in memory.

This results in a matrix that has dimensions num_points x batch_size, both for the distance and adjacency matrix.

The conversion of the boolean adjacency matrix to CSR format is performed in the 'adjgraph' step. This step was slow when the batch size was small, as described in issue rapidsai#2387.

In this commit, the adjgraph step is sped up. This is done in two ways:

1. The adjacency matrix is now stored in row-major batch_size x num_points format --- it was transposed before. This required changes    in the vertexdeg step.

2. The csr_row_op kernel has been replaced by the adj_to_csr kernel.    This kernel can divide the work over multiple blocks even when the    number of rows (batch size) is small. It makes optimal use of memory    bandwidth because rows of the matrix are laid out contiguously in memory.

Authors:
  - Allard Hendriksen (https://github.com/ahendriksen)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Tamas Bela Feher (https://github.com/tfeher)

URL: rapidsai#4803
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0 - Blocked Cannot progress due to external reasons CUDA/C++ improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

4 participants