[FEA] Batching NN Descent #2403

Merged: 49 commits merged into rapidsai:branch-24.10 on Aug 23, 2024

Conversation

@jinsolp (Contributor) commented Jul 31, 2024

Description

This PR implements batching for NN Descent. It helps reduce device memory usage for large datasets (specifically when the dataset is kept on host).
index_params now has:

  • n_clusters: number of clusters to split the dataset into. A larger number of clusters reduces device memory usage. Defaults to 1, in which case batched NND is not used.
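To illustrate the idea (this is a toy sketch, not RAFT's implementation; all names here are illustrative), the batched build can be pictured as: assign each point to a few nearby clusters, build a knn graph inside each cluster one at a time (so only one cluster's slice of the data needs to be resident at once), then merge the per-cluster neighbor candidates, keeping the k closest per point:

```python
import math
import random

def batched_knn(points, k, n_clusters, overlap=2):
    """Toy sketch of batched knn-graph construction: each point joins its
    `overlap` closest clusters, neighbors are searched inside each cluster
    (so only one cluster's data would need to be on the device at a time),
    and the per-cluster candidate lists are merged, keeping the k closest."""
    # Pick cluster centers (real code would use balanced k-means).
    centers = random.sample(points, n_clusters)
    batches = [[] for _ in range(n_clusters)]
    for i, p in enumerate(points):
        ranked = sorted(range(n_clusters), key=lambda c: math.dist(p, centers[c]))
        for c in ranked[:overlap]:
            batches[c].append(i)

    # Collect neighbor candidates per point; the dict keyed by neighbor index
    # deduplicates pairs that appear in more than one shared cluster.
    candidates = {i: {} for i in range(len(points))}
    for members in batches:
        for i in members:  # brute force stands in for the NND iterations
            for j in members:
                if i != j:
                    candidates[i][j] = math.dist(points[i], points[j])

    # Merge: keep the k closest candidates for each point.
    return {
        i: [j for j, _ in sorted(cand.items(), key=lambda kv: kv[1])[:k]]
        for i, cand in candidates.items()
    }
```

With n_clusters=1 (and overlap clamped to 1 cluster) every point lands in the same batch, which degenerates to the non-batched build, matching the default described above.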

Notes

  • The batching approach may (in rare cases) produce duplicate indices in the knn graph, because distances computed for the same pair can differ slightly between batches. The same index can then land far apart after sorting by distance, making deduplication difficult (uniqueness is checked by looking only at the 2 indices before the current one).
    • Handled by adding a max_duplicates tolerance to check_unique_indices in the tests.
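A sketch of the kind of tolerant check described above (the helper name and signature are hypothetical, mirroring the check_unique_indices / max_duplicates idea, not the actual test code): scan each neighbor row, count indices that repeat anywhere earlier in the row, and accept the row as long as the count stays within max_duplicates:

```python
def check_unique_indices(knn_graph, max_duplicates=0):
    """Return True if no row of the knn graph contains more than
    `max_duplicates` repeated neighbor indices. A strict check
    (max_duplicates=0) can fail for batched builds, where the same pair
    may get two slightly different distances and appear twice in a row."""
    for row in knn_graph:
        seen = set()
        duplicates = 0
        for idx in row:
            if idx in seen:
                duplicates += 1  # repeat found anywhere earlier in the row
            seen.add(idx)
        if duplicates > max_duplicates:
            return False
    return True
```

Using a set over the whole row (rather than a 2-wide window) catches duplicates that sorting pushed far apart, which is exactly the failure mode described in the note.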

Benchmarks

  • The dataset for NND (batched and non-batched) is on host.
  • The dataset for brute-force knn is on device (it still can't run with large datasets even if the data is put on host, because it brings the entire dataset to the device anyway).
  • Each dataset is a slice of the wiki-all dataset (88M x 768), sliced to test different sizes.
[Screenshot: benchmark results, 2024-08-02]

@jinsolp jinsolp changed the title from [FEA] Batching NN Descent to [WIP] Batching NN Descent on Aug 1, 2024
Review comments on cpp/include/raft/neighbors/detail/nn_descent.cuh and cpp/include/raft/neighbors/detail/nn_descent_batch.cuh (resolved)
@jinsolp jinsolp requested a review from divyegala August 20, 2024 20:01
@jinsolp jinsolp changed the title from [FEA] Batching NN Descent to [DO NOT MERGE] Batching NN Descent on Aug 21, 2024
@cjnolet (Member) left a comment

So sorry for being so late to review this @jinsolp. I think this is very close, but in order for this to be useful to users, we should make it as easy as possible to use and document very clearly what the algorithm does (and why a user would want to use it).

Review comments on cpp/include/raft/neighbors headers (resolved)
@jinsolp jinsolp requested a review from cjnolet August 22, 2024 18:52
@jinsolp jinsolp changed the title from [DO NOT MERGE] Batching NN Descent to [FEA] Batching NN Descent on Aug 22, 2024
@cjnolet (Member) commented Aug 23, 2024

/merge

@rapids-bot rapids-bot bot merged commit 2f587b1 into rapidsai:branch-24.10 Aug 23, 2024
57 checks passed
rapids-bot bot pushed a commit to rapidsai/cuml that referenced this pull request Aug 23, 2024
Adds the following parameter as part of the `build_kwds`:
- `n_clusters`: number of clusters to use when batching. A larger number of clusters reduces GPU memory usage. Defaults to 1 (no batching).

Results show consistent trustworthiness scores with and without batching.

Also, as shown below, UMAP can now run with datasets that don't fit on the GPU: putting the dataset on host and enabling the batching method allows UMAP to run on a 50M x 768 dataset (153 GB).

![Screenshot 2024-08-13 at 5 55 27 PM](https://github.com/user-attachments/assets/39263583-4ffc-4b0b-886f-f1b0f21a99be)


### Notes
[This PR in raft](rapidsai/raft#2403) needs to be merged before this PR

Authors:
  - Jinsol Park (https://github.com/jinsolp)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #6022
Labels: CMake, cpp, improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change)

3 participants