
Add MaskedL2NN #838

Merged: 57 commits into rapidsai:branch-23.02 from enh-sparse-l2-nn, Jan 27, 2023

Conversation

ahendriksen (Contributor)

This PR adds the sparseL2NN functionality.

This enables faster computation of pairwise distances by exploiting sparsity in the problem: distance computations for masked-out point pairs can be skipped.

The sparsity between arrays of points X and Y is expressed as follows:

  • X is split into rows (points)
  • Y is split into contiguous groups of points (i.e. all points in a group are adjacent)
  • A boolean adjacency matrix indicates for each row of X and each group in Y whether to compute the distance.

To speed up computation, the adjacency matrix is compressed into a bitfield.
To ensure competitive speeds, the caller must make sure that consecutive rows in X are adjacent to the same groups in Y (as much as possible) to enable efficient skipping in the kernel.
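
For intuition, here is a minimal host-side sketch of that compression, packing one bit per (row, group) pair into 64-bit words. This is an illustration only; the function name and layout are assumptions, not the PR's actual device-side compress_to_bits:

```cpp
#include <cstdint>
#include <vector>

// Pack a row-major boolean adjacency matrix (m rows of X, n_groups groups
// of Y) into a bitfield with one bit per (row, group) pair. Hypothetical
// host-side illustration of the idea; the PR compresses on device.
std::vector<uint64_t> compress_to_bits_host(const std::vector<bool>& adj,
                                            int m, int n_groups)
{
  const int words_per_row = (n_groups + 63) / 64;
  std::vector<uint64_t> bits(static_cast<size_t>(m) * words_per_row, 0);
  for (int i = 0; i < m; ++i) {
    for (int j = 0; j < n_groups; ++j) {
      if (adj[static_cast<size_t>(i) * n_groups + j]) {
        bits[static_cast<size_t>(i) * words_per_row + j / 64] |=
          uint64_t{1} << (j % 64);
      }
    }
  }
  return bits;
}
```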

Some work is still TODO:

  • Flesh out documentation
  • Discuss / remove allocation of intermediate array
  • Optimize for skinny matrices by using a different KernelPolicy.

@ahendriksen ahendriksen requested review from a team as code owners September 22, 2022 13:16
@ahendriksen ahendriksen added labels enhancement (New feature or request), 2 - In Progress (Currently a work in progress), non-breaking (Non-breaking change) and removed the CMake label Sep 22, 2022
@github-actions github-actions bot added the CMake label Oct 5, 2022
cjnolet (Member) commented Nov 9, 2022

@ahendriksen, it's getting very close to burndown for 22.12. Mind if we bump this forward to 23.02? Do you think you might have something working by then?

ahendriksen (Contributor, Author) commented Nov 10, 2022 via email

cjnolet (Member) commented Dec 5, 2022

Hey @ahendriksen, we are still really excited about this feature. I was looking through your changes a bit, mostly to see where we are with it. I have a few review items, but I've held off on submitting the review until you feel this is in a good place.

> To ensure competitive speeds, the caller must make sure that consecutive rows in X are adjacent to the same groups in Y (as much as possible) to enable efficient skipping in the kernel.

This won't be easy to do in an algorithm like HDBSCAN without modifying the user's input data, though I understand the importance of memory co-location like coalescing the reads. The pattern that this PR solves pops up quite a bit, though. Do you know the expected perf hit from not co-locating the data points w/ the groups? Is it pretty much all from coalescing?

ahendriksen (Contributor, Author)

Hi @cjnolet,

Feel free to add your comments!

> To ensure competitive speeds, the caller must make sure that consecutive rows in X are adjacent to the same groups in Y (as much as possible) to enable efficient skipping in the kernel.
>
> This won't be easy to do in an algorithm like HDBSCAN without modifying the user's input data, though I understand the importance of memory co-location like coalescing the reads. The pattern that this PR solves pops up quite a bit, though. Do you know the expected perf hit from not co-locating the data points w/ the groups? Is it pretty much all from coalescing?

Coalescing is absolutely necessary.
I ran a representative benchmark where roughly 99% of distance calculations could be skipped. In the theoretical best case, sparseL2NN should therefore have taken 1% of the time of fusedL2NN. In practice, the relative runtime was:

  • ~98% without coalescing;
  • ~10% with coalescing (achieved by sorting the input points).

There will be a cutoff point below which it does not make sense to sort the inputs. However, since the number of distance calculations scales quadratically with the number of input points, sorting will be worth it, especially for medium to large data sizes.

As you suggest, the sorting may even be done in place and reverted after the sparseL2NN call (rather than allocating a separate buffer with sorted inputs).
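
To illustrate the idea (not code from this PR; the function name and the single-column simplification are assumptions), sorting by group key, running the kernel, and reverting could look roughly like this with Thrust:

```cpp
#include <thrust/device_vector.h>
#include <thrust/gather.h>
#include <thrust/scatter.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>

// Sort rows of X by a per-row group key so that consecutive rows touch the
// same groups in Y, run the masked kernel, then undo the sort. A single
// float column stands in for a full row here to keep the sketch short.
void sort_call_revert(thrust::device_vector<int>& keys,
                      thrust::device_vector<float>& x)
{
  const auto n = x.size();
  thrust::device_vector<int> perm(n);
  thrust::sequence(perm.begin(), perm.end());

  // Sort the keys and record the permutation that achieves the order.
  thrust::sort_by_key(keys.begin(), keys.end(), perm.begin());

  // Apply the permutation: x_sorted[i] = x[perm[i]].
  thrust::device_vector<float> x_sorted(n);
  thrust::gather(perm.begin(), perm.end(), x.begin(), x_sorted.begin());

  // ... call the masked distance kernel on x_sorted here ...

  // Revert: scatter each sorted value back to its original position.
  thrust::scatter(x_sorted.begin(), x_sorted.end(), perm.begin(), x.begin());
}
```

A fully in-place variant would apply the permutation without the temporary buffer, at the cost of a more involved cycle-following pass.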

@ahendriksen ahendriksen requested review from a team as code owners January 11, 2023 13:33
@ahendriksen ahendriksen changed the base branch from branch-22.10 to branch-23.02 January 11, 2023 13:34
@ahendriksen ahendriksen added the improvement (Improvement / enhancement to an existing function) label Jan 11, 2023
@github-actions github-actions bot removed the python label Jan 11, 2023
ajschmidt8 (Member)

Please use GitHub's Draft PR feature instead of WIP tags in the future. Draft PRs have the benefit of preventing notifications to codeowners until PRs are marked Ready for Review, which helps cut down on excessive notifications for PRs that are still being worked on. CI will still run on Draft PRs.

@ajschmidt8 ajschmidt8 marked this pull request as draft January 11, 2023 14:26
tfeher (Contributor) left a comment

Thanks Allard for addressing the issues! The PR looks good to me!

cjnolet (Member) commented Jan 25, 2023

@ahendriksen, the PR looks great. I have one additional ask, though. Can you create GitHub issues for these remaining items just so they stay on our radar?

> I did not get around to:
>
>   • Exposing compress_to_bits in raft/utils.
>   • Adding an overload that accepts a bitfield.

cjnolet (Member) commented Jan 26, 2023

@ahendriksen, one thing I've noticed is that this PR and #837 both seem to be consistently timing out in CI, while some other PRs are not. This makes me a little nervous about merging these right before burndown. In case this behavior is isolated to this PR, I want to be careful not to cause issues for other PRs during this release.

We still have some time before code freeze to merge these, but it would help if you could verify locally that the build time (and ideally the memory requirement during the build) isn't drastically affected by these changes. Maybe compare the build time for your changes against branch-23.02 just to see? It could be something super simple: a pre-compiled template instantiation might no longer be getting used, causing things to be re-compiled from scratch more often.
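
For context, the pattern being referred to looks roughly like this (names hypothetical, not RAFT's actual symbols); if the extern declaration stops matching the call sites, every including translation unit silently re-instantiates the template:

```cpp
// In a header: declare the template and suppress implicit instantiation
// for known type combinations.
template <typename T>
void masked_l2_nn_impl(const T* x, const T* y, int m, int n);

extern template void masked_l2_nn_impl<float>(const float*, const float*, int, int);

// In exactly one .cu/.cpp file: compile the float instantiation once.
template void masked_l2_nn_impl<float>(const float*, const float*, int, int);
```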

cjnolet (Member) left a comment

Holding off on merging right away so we can investigate the CI timeouts.

cjnolet (Member) commented Jan 26, 2023

@ahendriksen can you please update this PR to accept raft::device_resources everywhere instead of raft::handle_t?

ahendriksen (Contributor, Author)

> @ahendriksen can you please update this PR to accept raft::device_resources everywhere instead of raft::handle_t?

Done.
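
For reference, the kind of change this refers to, as a hedged sketch (the function name and parameter list are assumptions, not the PR's actual signature):

```cpp
#include <raft/core/device_resources.hpp>

// Before: void masked_l2_nn(raft::handle_t const& handle, /* ... */);
// After:  raft::device_resources supersedes raft::handle_t and exposes the
//         same accessors, e.g. handle.get_stream().
void masked_l2_nn(raft::device_resources const& handle /*, ... */);
```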

ahendriksen (Contributor, Author)

> @ahendriksen, the PR looks great. I have one additional ask, though. Can you create GitHub issues for these remaining items just so they stay on our radar?

I have created issues for the remaining items.

cjnolet (Member) commented Jan 27, 2023

/merge

@rapids-bot rapids-bot bot merged commit 2fb5c06 into rapidsai:branch-23.02 Jan 27, 2023
cjnolet added a commit to cjnolet/raft that referenced this pull request Feb 2, 2023
@ahendriksen ahendriksen deleted the enh-sparse-l2-nn branch March 17, 2023 09:27

Labels: 2 - In Progress, CMake, cpp, enhancement, gpuCI, improvement, non-breaking