[REVIEW] Add Fused L2 Expanded KNN kernel #339

Merged: 29 commits merged into rapidsai:branch-22.02 on Nov 23, 2021

Conversation

@mdoijade (Contributor) commented Sep 27, 2021

-- Adds a fused L2 expanded kNN kernel, which is at least 20-25% faster than the L2 unexpanded version at higher dimensions (D >= 128); see the expanded-form sketch below.
-- At smaller dimensions (D <= 32), the L2 expanded version is also consistently 10-15% faster.
-- Slight improvement to the updateSortedWarpQ device function by reducing redundant instructions.
-- Fixes incorrect output in the NN > 32 case when taking the producer-consumer kNN merge path; this was caught by an HDBSCAN pytest.
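
For context, here is a minimal sketch (plain C++, illustrative only; these helper names are not from the PR) of the distance identity the expanded kernel exploits: ||x - y||^2 = ||x||^2 + ||y||^2 - 2<x, y>. The cross term <x, y> over all query/index pairs is a matrix product, which a fused kernel can tile like a GEMM, and that is where the gains at higher dimensionality come from.

#include <cstddef>

// Unexpanded L2: accumulate (x_i - y_i)^2 directly for each pair.
float l2_unexpanded(const float* x, const float* y, std::size_t d) {
  float acc = 0.f;
  for (std::size_t i = 0; i < d; ++i) {
    float diff = x[i] - y[i];
    acc += diff * diff;
  }
  return acc;
}

// Expanded L2: ||x||^2 + ||y||^2 - 2<x, y>. The squared norms are cheap to
// precompute once per row; the dot products across all pairs map onto
// GEMM-style compute, which is what the fused kernel tiles on the GPU.
float l2_expanded(const float* x, const float* y, std::size_t d,
                  float x_norm_sq, float y_norm_sq) {
  float dot = 0.f;
  for (std::size_t i = 0; i < d; ++i) dot += x[i] * y[i];
  return x_norm_sq + y_norm_sq - 2.f * dot;
}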

@mdoijade mdoijade requested review from a team as code owners September 27, 2021 13:09
@github-actions github-actions bot added the cpp label Sep 27, 2021
@mdoijade mdoijade requested a review from a team as a code owner September 28, 2021 16:30
@github-actions github-actions bot added the gpuCI label Sep 28, 2021
ci/prtest.config (outdated; resolved)
@mdoijade (Contributor, Author) commented:

@cjnolet @teju85 @dantegd I think this PR is now ready for review; please help with the review whenever possible.

@cjnolet cjnolet added the 3 - Ready for Review, improvement (Improvement / enhancement to an existing function), and non-breaking (Non-breaking change) labels Oct 5, 2021
@mdoijade (Contributor, Author) commented Oct 6, 2021

@cjnolet I've now reverted the ball cover tests to use brute_force_knn instead of l2_unexpanded_knn, since with this PR that functionality is back in brute_force_knn.

…n, make customAtomicMax float-only by removing the template, as it is a float-specific function
@cjnolet (Member) left a comment

Changes look great overall. Mostly minor/mechanical things, but we still need explicit gtests for these, like we've done with other kNN primitives that don't just proxy to FAISS (such as ball cover and haversine kNN).

ci/prtest.config (outdated; resolved)
cpp/include/raft/spatial/knn/detail/fused_l2_knn.cuh (outdated; resolved)
cpp/include/raft/spatial/knn/knn.hpp (resolved)
@@ -80,40 +78,6 @@ uint32_t count_discrepancies(value_idx *actual_idx, value_idx *expected_idx,
return result;
}

template <typename value_t>
Member commented:

Thanks for reverting this! Though we no longer need to invoke the fused kNN directly, I still see benefit in keeping the additional helper function so it's simpler to change the bfknn call across all the gtests in the future.

std::vector<float *> input_vec = {d_train_inputs.data()};
std::vector<uint32_t> sizes_vec = {n};

compute_bfknn(handle, d_train_inputs.data(), d_train_inputs.data(), n, d, k,
metric, d_ref_D.data(), d_ref_I.data());
raft::spatial::knn::detail::brute_force_knn_impl<uint32_t, int64_t>(
Member commented:

Now that we have a kNN that's not just proxying down to FAISS, we should be gtesting it accordingly, similar to what's being done with the haversine and ball cover gtests. It's also important going forward because RAFT is beginning to be used by more projects, so the impact of breaking tests extends beyond just cuML.

My suggestion is to test l2_unexpanded_knn and l2_expanded_knn directly in the gtests and then we can test the brute_force_knn more generally.
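
As a rough, self-contained illustration of that suggestion (hypothetical helper, not the PR's actual test code), a direct gtest would compare the kernel's output against a naive CPU reference such as the one below, or against a FAISS bfknn call:

#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Naive CPU reference: for each query, compute the L2 distance to every index
// point and keep the k smallest. A gtest for the fused kernel can check its
// indices/distances against this kind of reference.
void reference_l2_knn(const std::vector<float>& index, const std::vector<float>& queries,
                      int n_index, int n_queries, int d, int k,
                      std::vector<float>& out_dists, std::vector<int64_t>& out_inds) {
  out_dists.assign(static_cast<std::size_t>(n_queries) * k, 0.f);
  out_inds.assign(static_cast<std::size_t>(n_queries) * k, -1);
  for (int q = 0; q < n_queries; ++q) {
    std::vector<std::pair<float, int64_t>> cand(n_index);
    for (int i = 0; i < n_index; ++i) {
      float acc = 0.f;
      for (int j = 0; j < d; ++j) {
        float diff = queries[q * d + j] - index[i * d + j];
        acc += diff * diff;
      }
      cand[i] = {acc, i};
    }
    std::partial_sort(cand.begin(), cand.begin() + k, cand.end());
    for (int t = 0; t < k; ++t) {
      out_dists[q * k + t] = cand[t].first;
      out_inds[q * k + t]  = cand[t].second;
    }
  }
}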

@mdoijade (Contributor, Author) commented:

Certainly, rigorous tests within RAFT are needed for them; I'll add gtests for them separately. Also, instead of l2_unexpanded_knn and l2_expanded_knn we now have a single entry point, fusedL2Knn, covering both.
For these kernels I have relied on the cuML C++ kNN tests and pytests so far, which, as you rightly mention, is no longer sufficient.

@mdoijade (Contributor, Author) commented:

This PR is still WIP; I'm working on adding tests and some fixes needed due to API changes.

@mdoijade (Contributor, Author) commented:

A new test, tests/spatial/fused_l2_knn.cu, has been added; it covers both the L2 expanded and unexpanded cases and compares the output against the FAISS bfknn call.
I've also polished the fp32 atomicMax device function, which I believe is faster than the atomicCAS-based version and also takes care of NaNs.
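
For reference, a common pattern for a loop-free fp32 atomicMax looks like the sketch below (a sketch only, not necessarily this PR's implementation, and it does not address the NaN handling mentioned above): for non-negative values such as L2 distances, the IEEE-754 bit pattern preserves ordering under signed-integer comparison, so a single integer atomicMax on the reinterpreted bits is enough.

// Sketch: float atomicMax built on the integer atomicMax. Assumes the stored
// values are non-negative (true for L2 distances); NaN inputs are not handled.
__device__ float atomicMaxNonNegativeFloat(float* addr, float val) {
  int old = atomicMax(reinterpret_cast<int*>(addr), __float_as_int(val));
  return __int_as_float(old);
}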

@mdoijade (Contributor, Author) commented:

Apologies for the delay in updating this PR with the unit test

…o separate function which is now part of fused_l2_knn.cuh
@GPUtester (Contributor) commented:

Can one of the admins verify this patch?

@github-actions github-actions bot added the CMake label Nov 2, 2021
@cjnolet (Member) commented Nov 2, 2021

add to allowlist

@mdoijade mdoijade changed the title [WIP] Add Fused L2 Expanded KNN kernel [REVIEW] Add Fused L2 Expanded KNN kernel Nov 2, 2021
@ChuckHastings (Contributor) left a comment

Shouldn't affect cugraph

…ith the same distance value exists and FAISS picks one while fusedL2KNN picks the other, so we verify both the vector index and the distance value
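
That commit message boils down to a tie-tolerant comparison; a minimal sketch of the idea (illustrative names, not the PR's exact test code):

#include <cmath>
#include <cstdint>

// Accept a neighbor if its index matches the reference, or if its distance
// matches within a small tolerance: two points at the same distance may be
// ordered differently by FAISS and by fusedL2KNN.
inline bool neighbor_matches(int64_t ref_idx, int64_t got_idx,
                             float ref_dist, float got_dist, float eps = 1e-4f) {
  return ref_idx == got_idx || std::fabs(ref_dist - got_dist) <= eps;
}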
@mdoijade (Contributor, Author) commented:

@cjnolet can this PR get merged?

@mdoijade (Contributor, Author) commented:

@cjnolet I see this PR is marked for v22.02; will you be auto-merging it, or is additional action required from my side?

@cjnolet cjnolet changed the base branch from branch-21.12 to branch-22.02 November 15, 2021 18:50
@cjnolet (Member) commented Nov 15, 2021

@mdoijade, I scraped through the PRs a couple of weeks ago and aligned them to expected releases. Looking back through my review, I'm really happy with the new tests, but it looks like there are still a couple of (very minor) things to address.

@mdoijade (Contributor, Author) commented:

@cjnolet I believe I have now addressed all the points in this PR. The build failure is coming from ../test/sparse/dist_coo_spmv.cu in CI, but I cannot reproduce it locally.

@github-actions github-actions bot added the gpuCI label Nov 17, 2021
@github-actions github-actions bot removed the gpuCI label Nov 23, 2021
@cjnolet (Member) left a comment

LGTM

@cjnolet (Member) commented Nov 23, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 6166a47 into rapidsai:branch-22.02 Nov 23, 2021
Labels: 3 - Ready for Review, CMake, cpp, improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change)
Projects: None yet
5 participants