[REVIEW] Add Fused L2 Expanded KNN kernel #339
Conversation
…n higher dimensions than L2 unexpanded version
…encountered, then the current atomicCAS based implementation
…ed. make function namings consistent
…licit fusedL2KNN function call
@cjnolet I've now reverted ball cover tests to use
…n, make customAtomicMax float only by removing the template as it is float specific function
Changes look great overall. Mostly minor/mechanical things, but we still need explicit gtests for these like we've done w/ other knn primitives that don't just proxy to FAISS (such as ball cover and haversine knn).
cpp/test/spatial/ball_cover.cu (outdated)
@@ -80,40 +78,6 @@ uint32_t count_discrepancies(value_idx *actual_idx, value_idx *expected_idx,
  return result;
}

template <typename value_t>
Thanks for reverting this! Though we no longer need to invoke the fused knn directly, I still see a benefit to keeping the additional helper function, so it's simpler to change the bfknn call across all the gtests in the future.
std::vector<float *> input_vec = {d_train_inputs.data()};
std::vector<uint32_t> sizes_vec = {n};

compute_bfknn(handle, d_train_inputs.data(), d_train_inputs.data(), n, d, k,
              metric, d_ref_D.data(), d_ref_I.data());
raft::spatial::knn::detail::brute_force_knn_impl<uint32_t, int64_t>(
Now that we have a knn that's not just proxying down to faiss, we should be gtesting it accordingly, similar to what's being done w/ the haversine and ball cover gtests. It's also important going forward because RAFT is beginning to get used by more projects and thus the impact of breaking tests is more than just cuml.
My suggestion is to test l2_unexpanded_knn and l2_expanded_knn directly in the gtests, and then we can test brute_force_knn more generally.
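For illustration, one way such a gtest can produce reference answers without proxying to FAISS is a naive CPU brute force on small inputs; the sketch below is only an example of that idea (the function name and layout are assumptions, not an existing RAFT API):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Minimal CPU reference for squared-L2 k-NN on an n x d row-major matrix;
// a gtest could compare the fused kernel's [n x k] indices/distances
// against this on small inputs. Illustration only, not the PR's test code.
void naive_l2_knn(const std::vector<float>& data, int n, int d, int k,
                  std::vector<int64_t>& out_idx, std::vector<float>& out_dist) {
  out_idx.assign(static_cast<size_t>(n) * k, -1);
  out_dist.assign(static_cast<size_t>(n) * k, 0.f);
  std::vector<std::pair<float, int64_t>> row(n);
  for (int q = 0; q < n; ++q) {
    for (int r = 0; r < n; ++r) {
      float acc = 0.f;
      for (int j = 0; j < d; ++j) {
        float diff = data[q * d + j] - data[r * d + j];
        acc += diff * diff;
      }
      row[r] = {acc, r};
    }
    // Keep the k smallest distances per query row.
    std::partial_sort(row.begin(), row.begin() + k, row.end());
    for (int i = 0; i < k; ++i) {
      out_dist[q * k + i] = row[i].first;
      out_idx[q * k + i]  = row[i].second;
    }
  }
}
```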
Certainly, rigorous tests within RAFT are needed for them; I'll add gtests for them separately. Note that instead of l2_unexpanded_knn and l2_expanded_knn we now have a single-entry fusedL2Knn function covering both.
For these kernels I relied on the cuML cpp knn tests & pytests so far, which, as you rightly mention, is no longer sufficient.
This PR is still WIP; I'm working on adding tests and some fixes needed due to API changes.
A new test, tests/spatial/fused_l2_knn.cu, has been added; it covers both the L2 expanded and unexpanded cases and compares the output against the faiss bfknn call.
I've also polished the fp32 atomicMax device function, which I believe is faster than the atomicCAS-based version and also takes care of NaNs.
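For reference, a common non-CAS pattern for an fp32 atomicMax on non-negative values (such as L2 distances) is sketched below; this illustrates the general technique only, not the exact device function in this PR, and it omits the extra NaN handling mentioned above:

```cuda
// Sketch of a non-CAS float atomicMax for non-negative values: for such
// floats the IEEE-754 bit pattern is monotonically increasing when viewed
// as a signed int, so an integer atomicMax yields the float maximum.
__device__ __forceinline__ float atomicMaxFloat(float* addr, float val) {
  int old = atomicMax(reinterpret_cast<int*>(addr), __float_as_int(val));
  return __int_as_float(old);
}
```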
Apologies for the delay in updating this PR with the unit test
…o separate function which is now part of fused_l2_knn.cuh
Can one of the admins verify this patch?
add to allowlist
Shouldn't affect cugraph
…ith same distance value exists and faiss picks one vs fusedL2KNN another, so we verify both vec index as well as distance val
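The tie handling above amounts to something like the following host-side check (a sketch, not the actual test code): a mismatched neighbor index is accepted only when the corresponding distances agree, i.e. when it is an equal-distance tie.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Sketch of tie-tolerant verification: FAISS and fusedL2KNN may legitimately
// pick different neighbors when several points share the same distance, so a
// differing index is accepted as long as the distances match within eps.
bool knn_results_match(const std::vector<int64_t>& ref_idx,
                       const std::vector<float>& ref_dist,
                       const std::vector<int64_t>& got_idx,
                       const std::vector<float>& got_dist,
                       float eps = 1e-4f) {
  for (size_t i = 0; i < ref_idx.size(); ++i) {
    bool same_index = ref_idx[i] == got_idx[i];
    bool same_dist  = std::fabs(ref_dist[i] - got_dist[i]) <= eps;
    if (!same_index && !same_dist) return false;  // genuine discrepancy
  }
  return true;
}
```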
@cjnolet can this PR get merged?
@cjnolet I see this PR is marked for v22.02, so will you be auto-merging it, or is there additional action required from my side?
@mdoijade, I scraped through the PRs a couple weeks ago and aligned them to expected releases. Looking back through my review, I'm really happy with the new tests, but it looks like there are still a couple (very minor) things to address.
@cjnolet I believe I have now addressed all the points in this PR; the build failure is coming from
LGTM
@gpucibot merge
-- adds fused L2 expanded kNN kernel; this is faster by at least 20-25% on higher dimensions (D >= 128) than the L2 unexpanded version (a sketch of the expanded form follows after this list).
-- also on smaller dimensions (D <= 32) L2 expanded is always faster by 10-15%.
-- slight improvement in the updateSortedWarpQ device function by reducing redundant instructions.
-- Fix incorrect output for the NN > 32 case when taking the producer-consumer knn merge path; this was caught in an HDBSCAN pytest.
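For context on the expanded vs. unexpanded naming: the expanded form rewrites ||x - y||^2 as ||x||^2 + ||y||^2 - 2*x.y, so the dominant cross term can be computed with GEMM-like tiles, which is where the gain at large D comes from. A scalar sketch of the two forms (illustrative only, not the kernel code):

```cpp
#include <cstddef>
#include <vector>

// Unexpanded squared L2: accumulate (x_i - y_i)^2 directly.
float l2_unexpanded(const std::vector<float>& x, const std::vector<float>& y) {
  float acc = 0.f;
  for (std::size_t i = 0; i < x.size(); ++i) {
    float diff = x[i] - y[i];
    acc += diff * diff;
  }
  return acc;
}

// Expanded squared L2: ||x||^2 + ||y||^2 - 2*x.y. The norms can be
// precomputed once per row, and the x.y cross term maps onto a GEMM.
float l2_expanded(const std::vector<float>& x, const std::vector<float>& y) {
  float xx = 0.f, yy = 0.f, xy = 0.f;
  for (std::size_t i = 0; i < x.size(); ++i) {
    xx += x[i] * x[i];
    yy += y[i] * y[i];
    xy += x[i] * y[i];
  }
  return xx + yy - 2.f * xy;
}
```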