[FEA] Batching NN Descent #2403

Merged: 49 commits merged into rapidsai:branch-24.10 on Aug 23, 2024

Conversation

@jinsolp (Contributor) commented Jul 31, 2024

Description

This PR implements batching for NN Descent. It helps reduce device memory usage for large datasets (specifically when the dataset is kept on host).
index_params now has:

  • n_clusters: number of clusters to split the dataset into. A larger number of clusters reduces device memory usage. Defaults to 1, in which case batched NND is not used.
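To illustrate the idea (this is a toy sketch, not RAFT's implementation; all names here are illustrative), the batched build can be pictured as: assign each point to a few nearby clusters, build a knn graph inside each cluster one at a time (so only one cluster's slice of the data needs to be resident at once), then merge the per-cluster neighbor candidates, keeping the k closest per point:

```python
import math
import random

def batched_knn(points, k, n_clusters, overlap=2):
    """Toy sketch of batched knn-graph construction: each point joins its
    `overlap` closest clusters, neighbors are searched inside each cluster
    (so only one cluster's data would need to be on the device at a time),
    and the per-cluster candidate lists are merged, keeping the k closest."""
    # Pick cluster centers (real code would use balanced k-means).
    centers = random.sample(points, n_clusters)
    batches = [[] for _ in range(n_clusters)]
    for i, p in enumerate(points):
        ranked = sorted(range(n_clusters), key=lambda c: math.dist(p, centers[c]))
        for c in ranked[:overlap]:
            batches[c].append(i)

    # Collect neighbor candidates per point; the dict keyed by neighbor index
    # deduplicates pairs that appear in more than one shared cluster.
    candidates = {i: {} for i in range(len(points))}
    for members in batches:
        for i in members:  # brute force stands in for the NND iterations
            for j in members:
                if i != j:
                    candidates[i][j] = math.dist(points[i], points[j])

    # Merge: keep the k closest candidates for each point.
    return {
        i: [j for j, _ in sorted(cand.items(), key=lambda kv: kv[1])[:k]]
        for i, cand in candidates.items()
    }
```

With n_clusters=1 (and overlap clamped to 1 cluster) every point lands in the same batch, which degenerates to the non-batched build, matching the default described above.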

Notes

  • The batching approach may (in rare cases) produce duplicate indices in the knn graph, because distances computed for the same pair can differ slightly between batches. The same index can then land far apart after sorting by distance, making deduplication difficult (uniqueness is checked by looking only at the 2 indices before the current one).
    • Handled by adding a max_duplicates tolerance to check_unique_indices in the tests.
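A sketch of the kind of tolerant check described above (the helper name and signature are hypothetical, mirroring the check_unique_indices / max_duplicates idea, not the actual test code): scan each neighbor row, count indices that repeat anywhere earlier in the row, and accept the row as long as the count stays within max_duplicates:

```python
def check_unique_indices(knn_graph, max_duplicates=0):
    """Return True if no row of the knn graph contains more than
    `max_duplicates` repeated neighbor indices. A strict check
    (max_duplicates=0) can fail for batched builds, where the same pair
    may get two slightly different distances and appear twice in a row."""
    for row in knn_graph:
        seen = set()
        duplicates = 0
        for idx in row:
            if idx in seen:
                duplicates += 1  # repeat found anywhere earlier in the row
            seen.add(idx)
        if duplicates > max_duplicates:
            return False
    return True
```

Using a set over the whole row (rather than a 2-wide window) catches duplicates that sorting pushed far apart, which is exactly the failure mode described in the note.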

Benchmarks

  • The dataset for NND (batched and non-batched) is on host.
  • The dataset for brute-force knn is on device (it still can't run with large datasets even if the data is put on host, because it brings the entire dataset to the device anyway).
  • Each dataset is a slice of the wiki-all dataset (88M x 768), sliced to test different sizes.
[Screenshot: benchmark results, 2024-08-02]

@jinsolp jinsolp changed the title from [FEA] Batching NN Descent to [WIP] Batching NN Descent on Aug 1, 2024
Review comments on cpp/include/raft/neighbors/detail/nn_descent.cuh and cpp/include/raft/neighbors/detail/nn_descent_batch.cuh (resolved)
@jinsolp jinsolp requested a review from divyegala August 20, 2024 20:01
@jinsolp jinsolp changed the title from [FEA] Batching NN Descent to [DO NOT MERGE] Batching NN Descent on Aug 21, 2024
@cjnolet (Member) left a comment

So sorry for being so late to review this @jinsolp. I think this is very close, but in order for this to be useful to users, we should make it as easy as possible to use and document very clearly what the algorithm does (and why a user would want to use it).

Review comments on cpp/include/raft/neighbors headers (resolved)
@jinsolp jinsolp requested a review from cjnolet August 22, 2024 18:52
@jinsolp jinsolp changed the title from [DO NOT MERGE] Batching NN Descent to [FEA] Batching NN Descent on Aug 22, 2024
@cjnolet (Member) commented Aug 23, 2024

/merge

@rapids-bot rapids-bot bot merged commit 2f587b1 into rapidsai:branch-24.10 Aug 23, 2024
57 checks passed
rapids-bot bot pushed a commit to rapidsai/cuml that referenced this pull request Aug 23, 2024
Adds the following parameter as part of the `build_kwds`:
- `n_clusters`: number of clusters to use when batching. A larger number of clusters reduces GPU memory usage. Defaults to 1 (no batching).

Results show consistent trustworthiness scores with and without batching.

Also, as shown below, UMAP can now run with datasets that don't fit on the GPU: putting the dataset on host and enabling the batching method allows UMAP to run on a 50M x 768 dataset (153 GB).

![Screenshot 2024-08-13 at 5 55 27 PM](https://github.com/user-attachments/assets/39263583-4ffc-4b0b-886f-f1b0f21a99be)


### Notes
[This PR in raft](rapidsai/raft#2403) needs to be merged before this PR

Authors:
  - Jinsol Park (https://github.com/jinsolp)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #6022
Labels: CMake, cpp, improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change)

3 participants