Add FAISS with RAFT enabled Benchmarking to raft-ann-bench #2026

Status: Merged (Jun 17, 2024), 165 commits
Changes shown from 115 of 165 commits

Commits
cc9cbd3
Unpack list data kernel
tarang-jain Jul 1, 2023
28484ef
Merge branch 'branch-23.08' of https://github.com/rapidsai/raft into …
tarang-jain Jul 1, 2023
e39ee56
update packing and unpacking functions
tarang-jain Jul 5, 2023
68bf927
Merge branch 'branch-23.08' of https://github.com/rapidsai/raft into …
tarang-jain Jul 5, 2023
78d6380
Update codepacker
tarang-jain Jul 14, 2023
49a8834
Merge branch 'branch-23.08' of https://github.com/rapidsai/raft into …
tarang-jain Jul 14, 2023
897338e
refactor codepacker (does not build)
tarang-jain Jul 17, 2023
c1d80f5
Merge branch 'branch-23.08' of https://github.com/rapidsai/raft into …
tarang-jain Jul 17, 2023
2a2ee51
Undo deletions
tarang-jain Jul 17, 2023
834dd2c
undo yaml changes
tarang-jain Jul 17, 2023
6013429
style
tarang-jain Jul 17, 2023
ab6345a
Update tests, correct make_list_extents
tarang-jain Jul 18, 2023
ed80d1a
More changes
tarang-jain Jul 19, 2023
cdff9e1
Merge branch 'branch-23.08' of https://github.com/rapidsai/raft into …
tarang-jain Jul 19, 2023
7412272
debugging
tarang-jain Jul 20, 2023
700ea82
Working build
tarang-jain Jul 21, 2023
27451c6
Merge branch 'branch-23.08' of https://github.com/rapidsai/raft into …
tarang-jain Jul 21, 2023
9d742ef
rename codepacking api
tarang-jain Jul 21, 2023
d1ef8a1
Updated gtest
tarang-jain Jul 27, 2023
e187147
Merge branch 'branch-23.08' of https://github.com/rapidsai/raft into …
tarang-jain Jul 27, 2023
4f233a6
Merge branch 'branch-23.08' of https://github.com/rapidsai/raft into …
tarang-jain Jul 27, 2023
4ee99e3
updates
tarang-jain Jul 27, 2023
22f4f80
update testing
tarang-jain Jul 28, 2023
9f4e22c
Merge branch 'branch-23.08' of https://github.com/rapidsai/raft into …
tarang-jain Jul 28, 2023
c95d1e0
updates
tarang-jain Jul 28, 2023
da78c66
Update testing, pow2
tarang-jain Jul 31, 2023
5cc6dc9
Merge branch 'branch-23.08' of https://github.com/rapidsai/raft into …
tarang-jain Jul 31, 2023
15db0c6
remove unneccessary changes
tarang-jain Jul 31, 2023
154dc6d
Delete log.txt
tarang-jain Jul 31, 2023
47d6421
updates
tarang-jain Jul 31, 2023
0f1d106
Merge branch 'faiss-ivf' of https://github.com/tarang-jain/raft into …
tarang-jain Jul 31, 2023
e2e1308
ore cleanup
tarang-jain Jul 31, 2023
3f470c8
Merge branch 'branch-23.08' of https://github.com/rapidsai/raft into …
tarang-jain Jul 31, 2023
41a49b2
style
tarang-jain Jul 31, 2023
1d2a5b0
Merge branch 'branch-23.10' of https://github.com/rapidsai/raft into …
tarang-jain Aug 9, 2023
8ce8115
Merge branch 'branch-23.10' of https://github.com/rapidsai/raft into …
tarang-jain Aug 23, 2023
171215b
Merge branch 'branch-23.10' of https://github.com/rapidsai/raft into …
tarang-jain Sep 11, 2023
135d973
Initial commit
tarang-jain Sep 13, 2023
d7a9b4e
Merge branch 'branch-23.10' of https://github.com/rapidsai/raft into …
tarang-jain Sep 13, 2023
5738cca
im
tarang-jain Sep 21, 2023
62b39cf
host pq codepacker
tarang-jain Sep 22, 2023
8702b92
refactored codepacker
tarang-jain Sep 22, 2023
5b2a7e0
Merge branch 'branch-23.10' of https://github.com/rapidsai/raft into …
tarang-jain Sep 22, 2023
4139c7e
updated CP
tarang-jain Sep 22, 2023
e846352
undo some diffs
tarang-jain Sep 22, 2023
2ab3da2
undo some diffs
tarang-jain Sep 22, 2023
eb493a7
undo some diffs
tarang-jain Sep 22, 2023
28b7125
update docs
tarang-jain Sep 22, 2023
4b3b3bb
Merge branch 'branch-23.10' into faiss-ivf
tarang-jain Sep 25, 2023
3da5265
Merge branch 'branch-23.12' into faiss-ivf
cjnolet Oct 5, 2023
d546d89
initial efforts for compress/decompress codepacker
tarang-jain Oct 6, 2023
b6e3de9
Merge branch 'branch-23.12' of https://github.com/rapidsai/raft into …
tarang-jain Oct 6, 2023
4b94c45
Merge branch 'branch-23.12' into faiss-ivf
cjnolet Oct 11, 2023
ec11fd8
Merge branch 'branch-23.12' into faiss-ivf
cjnolet Oct 12, 2023
8a41330
Update codepacker and helpers
tarang-jain Oct 17, 2023
86f1aa4
Merge branch 'branch-23.12' of https://github.com/rapidsai/raft into …
tarang-jain Oct 17, 2023
0baee4a
Merge branch 'faiss-ivf' of https://github.com/tarang-jain/raft into …
tarang-jain Oct 17, 2023
9d66a8f
more helpers and debugging
tarang-jain Oct 26, 2023
3be7afd
Merge branch 'branch-23.12' of https://github.com/rapidsai/raft into …
tarang-jain Oct 26, 2023
fd01442
Update tests
tarang-jain Oct 26, 2023
1b4fd0e
action struct correction
tarang-jain Nov 2, 2023
7d760e9
Merge branch 'branch-23.12' of https://github.com/rapidsai/raft into …
tarang-jain Nov 2, 2023
aaff0bf
testing
tarang-jain Nov 3, 2023
c4bc220
Merge branch 'branch-23.12' of https://github.com/rapidsai/raft into …
tarang-jain Nov 3, 2023
6a5443a
remove unneeded funcs
tarang-jain Nov 3, 2023
bca8f40
Merge branch 'branch-23.12' into faiss-ivf
cjnolet Nov 7, 2023
8edc7a1
Add helper for extracting cluster centers
tarang-jain Nov 7, 2023
93eebab
Merge branch 'branch-23.12' of https://github.com/rapidsai/raft into …
tarang-jain Nov 7, 2023
140701e
Merge branch 'faiss-ivf' of https://github.com/tarang-jain/raft into …
tarang-jain Nov 7, 2023
0b88ca4
Update docs
tarang-jain Nov 9, 2023
d67fe8d
Merge branch 'branch-23.12' of https://github.com/rapidsai/raft into …
tarang-jain Nov 9, 2023
a68d7a7
Add test
tarang-jain Nov 9, 2023
41ac27f
correction
tarang-jain Nov 9, 2023
5073ea3
Update docs
tarang-jain Nov 16, 2023
889bbdd
Merge branch 'branch-23.12' of https://github.com/rapidsai/raft into …
tarang-jain Nov 16, 2023
3dbf3a7
more updates to docs
tarang-jain Nov 16, 2023
30bdee5
style
tarang-jain Nov 16, 2023
55fa0ef
more docs
tarang-jain Nov 16, 2023
8eb07f8
undo small docstring change
tarang-jain Nov 16, 2023
f8956d5
style
tarang-jain Nov 16, 2023
228e997
more doc updates
tarang-jain Nov 16, 2023
bdd75cf
small doc fix
tarang-jain Nov 16, 2023
6adcb98
resource docs
tarang-jain Nov 16, 2023
1893963
Update docs for ivf_flat::helpers::reset_index
tarang-jain Nov 16, 2023
91e17c2
Merge branch 'branch-23.12' of https://github.com/rapidsai/raft into …
tarang-jain Nov 16, 2023
a2d4575
update reset_index
tarang-jain Nov 16, 2023
1efd28f
change helpers name to contiguous
tarang-jain Nov 17, 2023
9841e6c
move get_list_size to index struct
tarang-jain Nov 17, 2023
3f8baaa
change test name
tarang-jain Nov 17, 2023
11a681f
raft enabled BM
tarang-jain Nov 29, 2023
3cd2d4a
Merge branch 'branch-24.02' of https://github.com/rapidsai/raft into …
tarang-jain Nov 29, 2023
633ad86
raft enabled IVF-Flat BM
tarang-jain Nov 29, 2023
09bcbd8
style
tarang-jain Nov 29, 2023
ab442b3
remove hardcoded pool size
tarang-jain Nov 29, 2023
a3acb5d
update faiss::gpu::benchmark main, revert pool MR in constructor
tarang-jain Dec 1, 2023
8bc00aa
Merge branch 'branch-24.02' of https://github.com/rapidsai/raft into …
tarang-jain Dec 1, 2023
e539fd2
Merge branch 'branch-24.02' of https://github.com/rapidsai/raft into …
tarang-jain Dec 2, 2023
2b089bb
Merge branch 'branch-24.02' of https://github.com/rapidsai/raft into …
tarang-jain Dec 4, 2023
87b3eb5
updated yaml
tarang-jain Dec 6, 2023
5057525
Merge branch 'branch-24.02' of https://github.com/rapidsai/raft into …
tarang-jain Dec 6, 2023
bdf7196
update config, faiss bm
tarang-jain Dec 12, 2023
a045f8e
Merge branch 'branch-24.02' of https://github.com/rapidsai/raft into …
tarang-jain Dec 12, 2023
72b7e00
debug
tarang-jain Dec 16, 2023
3bbf67a
merge changes
tarang-jain Dec 18, 2023
22b6754
raft refinement for faiss index
tarang-jain Dec 25, 2023
9d9a078
merge
tarang-jain Dec 25, 2023
1385cf8
dbg
tarang-jain Dec 26, 2023
651ea18
Merge branch 'faiss-ivf' of https://github.com/tarang-jain/raft into …
tarang-jain Dec 26, 2023
5847a09
Merge branch 'branch-24.02' into faiss-ivf
cjnolet Jan 10, 2024
77f9366
changes
tarang-jain Jan 11, 2024
31f444d
Merge branch 'faiss-ivf' of https://github.com/tarang-jain/raft into …
tarang-jain Jan 11, 2024
dfb2c2c
changes
tarang-jain Jan 16, 2024
9be5ecc
cleanup
tarang-jain Jan 17, 2024
02bdc23
Merge branch 'branch-24.02' of https://github.com/rapidsai/raft into …
tarang-jain Jan 17, 2024
27bf943
cleanup
tarang-jain Jan 17, 2024
9012267
Merge branch 'branch-24.02' of https://github.com/rapidsai/raft into …
tarang-jain Jan 18, 2024
0c714d5
updates,cleanup,style
tarang-jain Jan 18, 2024
df10536
updates,cleanup
tarang-jain Jan 18, 2024
395402c
updates,changes,style
tarang-jain Jan 18, 2024
4b8843d
Merge branch 'branch-24.02' into faiss-ivf
tarang-jain Feb 2, 2024
b1e7495
Merge branch 'branch-24.04' into faiss-ivf
tarang-jain Feb 2, 2024
95dcd10
Remove unnecessary copyright date changes
tfeher Feb 4, 2024
e25acf1
Merge branch 'branch-24.04' into faiss-ivf
tfeher Feb 4, 2024
8975a81
add 100M params,remove debug statements
tarang-jain Feb 6, 2024
8188767
merge
tarang-jain Feb 6, 2024
f697549
Merge branch 'faiss-ivf' of https://github.com/tarang-jain/raft into …
tarang-jain Feb 6, 2024
f0aa1db
small correction in 100M params
tarang-jain Feb 6, 2024
7a429e5
Merge branch 'branch-24.04' into faiss-ivf
cjnolet Feb 13, 2024
a54408f
Merge branch 'branch-24.02' of https://github.com/rapidsai/raft into …
tarang-jain Feb 14, 2024
757e07a
adding faiss cpu configs
tarang-jain Feb 14, 2024
65f096f
Merge branch 'branch-24.04' of https://github.com/rapidsai/raft into …
tarang-jain Feb 14, 2024
d472c06
Merge branch 'faiss-ivf' of https://github.com/tarang-jain/raft into …
tarang-jain Feb 14, 2024
dbb773d
merge
tarang-jain Feb 21, 2024
66baf65
update get_faiss.cmake
tarang-jain Feb 21, 2024
ccc8056
Merge branch 'branch-24.06' into faiss-ivf
tarang-jain Apr 3, 2024
2d223dd
Merge branch 'branch-24.06' into faiss-ivf
cjnolet Apr 10, 2024
092b9b9
Merge branch 'branch-24.06' into faiss-ivf
tarang-jain Apr 11, 2024
b9c64be
style
tarang-jain Apr 11, 2024
981a730
Merge branch 'branch-24.06' of https://github.com/rapidsai/raft into …
tarang-jain May 10, 2024
507ce25
undo copyright change
tarang-jain May 10, 2024
dc14d8b
remove debug statements
tarang-jain May 10, 2024
af37e68
match func signature
tarang-jain May 11, 2024
e5170a8
make build
tarang-jain May 11, 2024
9df0d73
add metric conversion func
tarang-jain May 12, 2024
bd1fe4c
remove metric parsing bugs
tarang-jain May 12, 2024
7295308
include utils header
tarang-jain May 12, 2024
f2f2e3b
Merge branch 'branch-24.06' of https://github.com/rapidsai/raft into …
tarang-jain May 13, 2024
1838102
Merge branch 'branch-24.06' into faiss-ivf
tarang-jain May 13, 2024
fe389db
bm configs for ivfflat
tarang-jain May 13, 2024
9c5cf50
update docs to keep track of FAISS issue
tarang-jain May 13, 2024
09d2422
rm name
tarang-jain May 13, 2024
d56089d
Merge branch 'faiss-ivf' of https://github.com/tarang-jain/raft into …
tarang-jain May 13, 2024
ba2cdd8
Merge branch 'branch-24.06' of https://github.com/rapidsai/raft into …
tarang-jain May 13, 2024
921eadd
Update python/raft-ann-bench/src/raft-ann-bench/run/conf/algos/faiss_…
tarang-jain May 14, 2024
c91a94b
revert comment, final changes
tarang-jain May 14, 2024
29e08cb
merge
tarang-jain May 14, 2024
cca3927
merge 24.06
tarang-jain May 21, 2024
b0ce3ee
merge 24.08
tarang-jain Jun 7, 2024
fc5c2b3
add warning when throughput mode is enabled
tarang-jain Jun 7, 2024
0fa20a9
make compile
tarang-jain Jun 10, 2024
a2c1d7f
Merge branch 'branch-24.08' of https://github.com/rapidsai/raft into …
tarang-jain Jun 10, 2024
bd6d5b5
make compile
tarang-jain Jun 10, 2024
36c97e8
style
tarang-jain Jun 10, 2024
04a3342
corrections
tarang-jain Jun 13, 2024
60d5927
Merge branch 'branch-24.08' into faiss-ivf
tarang-jain Jun 13, 2024
2 changes: 2 additions & 0 deletions cpp/bench/ann/src/common/benchmark.hpp
@@ -18,6 +18,7 @@
#include "ann_types.hpp"
#include "conf.hpp"
#include "dataset.hpp"
#include "raft/util/cudart_utils.hpp"
#include "util.hpp"

#include <benchmark/benchmark.h>
@@ -317,6 +318,7 @@ void bench_search(::benchmark::State& state,
}
auto end = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::duration<double>>(end - start).count();
// std::cout << "duration" << duration << std::endl;
Contributor: remove
if (state.thread_index() == 0) { state.counters.insert({{"end_to_end", duration}}); }
state.counters.insert(
{"Latency", {duration / double(state.iterations()), benchmark::Counter::kAvgThreads}});
33 changes: 32 additions & 1 deletion cpp/bench/ann/src/faiss/faiss_gpu_benchmark.cu
@@ -43,6 +43,11 @@ void parse_build_param(const nlohmann::json& conf,
typename raft::bench::ann::FaissGpuIVFFlat<T>::BuildParam& param)
{
parse_base_build_param<T>(conf, param);
if (conf.contains("use_raft")) {
param.use_raft = conf.at("use_raft");
} else {
param.use_raft = false;
}
}

template <typename T>
@@ -61,6 +66,16 @@ void parse_build_param(const nlohmann::json& conf,
} else {
param.useFloat16 = false;
}
if (conf.contains("use_raft")) {
param.use_raft = conf.at("use_raft");
} else {
param.use_raft = false;
}
if (conf.contains("bitsPerCode")) {
param.bitsPerCode = conf.at("bitsPerCode");
} else {
param.bitsPerCode = 8;
}
}
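
An aside, not part of the diff: each contains/at-else pair above can be collapsed with nlohmann::json's value() accessor, which returns the supplied default when the key is absent. A minimal, equivalent sketch:

    // Sketch: equivalent defaulting via nlohmann::json::value().
    param.use_raft    = conf.value("use_raft", false);  // false when key absent
    param.bitsPerCode = conf.value("bitsPerCode", 8);   // 8 when key absent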

template <typename T>
@@ -77,6 +92,12 @@ void parse_search_param(const nlohmann::json& conf,
{
param.nprobe = conf.at("nprobe");
if (conf.contains("refine_ratio")) { param.refine_ratio = conf.at("refine_ratio"); }
if (conf.contains("raft_refinement")) {
RAFT_LOG_INFO("found raft_refinement");
param.raft_refinement = conf.at("raft_refinement");
} else {
param.raft_refinement = false;
}
}
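
For reference, a hypothetical pair of config entries exercising the keys read by the parsers above (key names come from these parsers; the values and the nlist key are illustrative assumptions, not from the PR's config files):

    // Sketch: build/search config fragments matching the parsers above.
    nlohmann::json build_conf =
      nlohmann::json::parse(R"({"nlist": 1024, "use_raft": true, "bitsPerCode": 8})");
    nlohmann::json search_conf =
      nlohmann::json::parse(R"({"nprobe": 64, "refine_ratio": 2.0, "raft_refinement": true})");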

template <typename T, template <typename> class Algo>
@@ -158,5 +179,15 @@ REGISTER_ALGO_INSTANCE(std::uint8_t);

#ifdef ANN_BENCH_BUILD_MAIN
#include "../common/benchmark.hpp"
int main(int argc, char** argv) { return raft::bench::ann::run_main(argc, argv); }
int main(int argc, char** argv)
{
rmm::mr::cuda_memory_resource cuda_mr;
// Construct a resource that uses a coalescing best-fit pool allocator
rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource> pool_mr{&cuda_mr};
rmm::mr::set_current_device_resource(
Contributor (tfeher), Jan 17, 2024: For single-threaded benchmarks this is fine, and for multi-threaded benchmarks the pool will be correctly shared.

We shall consider how the RAFT handle is shared in a multi-threaded environment. (Commenting here because the other relevant code path, in faiss_gpu_wrapper.h, is not changed.) Tagging @achirkin for advice.

Notes:

Contributor: Oh, yes, I've added the note on multithreading here:

/** [NOTE Multithreading]
*
* `gpu_resource_` is a shared resource:
* 1. It uses a shared_ptr under the hood, so the copies of it refer to the same
* resource implementation instance
* 2. GpuIndex is probably keeping a reference to it, as it's passed to the constructor
*
* To avoid copying the index (database) in each thread, we make both the index and
* the gpu_resource shared.
* This means faiss GPU streams are possibly shared among the CPU threads;
* the throughput search mode may be inaccurate.
*
* WARNING: we haven't investigated whether faiss::gpu::GpuIndex or
* faiss::gpu::StandardGpuResources are thread-safe.
*
*/

The gist is that, in the current state, we share a single faiss handle among multiple threads; this is in contrast to a new raft handle being created for every thread in raft algorithms.
We know raft handle is not thread-safe (due to stateful cublas calls). I didn't know the internals of the FAISS StandardGpuResource at the time of writing this, so I postponed this investigation to push the raft fixes. Now we know FAISS keeps the raft handle and we probably need to do something about it.

Contributor Author (tarang-jain): Yes, there is just one handle for each device: https://github.com/facebookresearch/faiss/blob/5e3eae4fccb20723dbc674b3ffa005ce09afcd8d/faiss/gpu/StandardGpuResources.cpp#L432

For benchmarking purposes I have just been using a single CPU thread to prevent differences due to number of threads.

Contributor Author (tarang-jain): There is a method getAlternateStreams (https://github.com/facebookresearch/faiss/blob/5e3eae4fccb20723dbc674b3ffa005ce09afcd8d/faiss/gpu/StandardGpuResources.cpp#L450) that can be used to get a vector of alternate streams for the device. Perhaps we can have something similar for raft handles.

Member (quoting the above): "For benchmarking purposes I have just been using a single CPU thread to prevent differences due to number of threads."

@tarang-jain unfortunately we don't yet have automated testing for the benchmarks tool, which means we need to be vigilant about manually testing changes in the meantime. At a minimum, the tests should be run with both search-mode=latency and search-mode=throughput. The latter is going to require using separate raft::resources/StandardGpuResources for each thread.

Contributor: I think we can just create a separate StandardGpuResources object for each copy of the FaissGpu wrapper. That way they will have separate raft resource handles. Since the pool allocator is set globally, all the handles will share it.

Contributor Author (tarang-jain): @tfeher but that would mean that the whole index would have to be copied for each thread.

Contributor Author (tarang-jain): @tfeher @achirkin I studied the benchmark wrappers of other algorithms in detail and found a fix for this: FAISS is missing two critical pieces. I have created issues #3424 and #3425 in FAISS to address these. They are easy fixes, and once we have them in place we can do exactly what raft-ann-bench's GGNN wrapper currently does: when an index is copied, change the stream of the index to the one designated for the current thread.

Contributor: Yes, setDefaultStream will need to update the RAFT handle's default stream. But you would need to make sure that gpu_resource_ is no longer a shared object, because otherwise everyone is trying to update the stream of the same object.

In practice we use the GpuResource object as a wrapper around a stream. But GpuResource contains other resources, like the temporary memory allocator, which might grab 1.5 GiB per GpuResource object. That can be circumvented by setting the TempMem size to 0.
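
A minimal sketch of that suggestion, with illustrative names (device, thread_stream) that are not from the PR: each per-thread copy of the wrapper builds its own FAISS resources, disables the per-object temporary pool, and pins its own stream, while device memory still comes from the global RMM pool installed in main():

    // Sketch: per-copy FAISS resources so handle/stream state is not shared
    // across CPU threads (needs <faiss/gpu/StandardGpuResources.h>).
    auto res = std::make_shared<faiss::gpu::StandardGpuResources>();
    res->setTempMemory(0);                         // avoid ~1.5 GiB per object
    res->setDefaultStream(device, thread_stream);  // this thread's CUDA stream
    // The index itself stays shared; only the resources object is per-thread.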

&pool_mr); // Updates the current device resource pointer to `pool_mr`
rmm::mr::device_memory_resource* mr =
rmm::mr::get_current_device_resource(); // Points to `pool_mr`
return raft::bench::ann::run_main(argc, argv);
}
#endif
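
A side note on the pool above: the single-argument constructor leaves the pool size to RMM's defaults. If a bounded pool is wanted, an initial size can be passed explicitly (a sketch; the 1 GiB figure is illustrative, not from the PR):

    // Sketch: the same pool with an explicit initial size in bytes.
    rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource> pool_mr{
      &cuda_mr, 1ull << 30 /* initial_pool_size: 1 GiB */};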
139 changes: 122 additions & 17 deletions cpp/bench/ann/src/faiss/faiss_gpu_wrapper.h
@@ -17,10 +17,15 @@
#define FAISS_WRAPPER_H_

#include "../common/ann_types.hpp"
#include <raft/core/device_mdarray.hpp>
#include <raft/core/host_mdarray.hpp>
#include <raft/core/host_mdspan.hpp>
#include <raft/distance/distance_types.hpp>

#include <raft/core/logger.hpp>
#include <raft/util/cudart_utils.hpp>

#include <faiss/MetricType.h>
#include <faiss/IndexFlat.h>
#include <faiss/IndexIVFFlat.h>
#include <faiss/IndexIVFPQ.h>
@@ -37,6 +42,10 @@

#include <raft/core/device_resources.hpp>
#include <raft/core/resource/stream_view.hpp>
#include <raft_runtime/neighbors/refine.hpp>
#include <rmm/cuda_device.hpp>
#include <rmm/mr/device/device_memory_resource.hpp>
#include <rmm/mr/device/per_device_resource.hpp>

#include <cassert>
#include <memory>
@@ -99,7 +108,8 @@ class FaissGpu : public ANN<T> {
using typename ANN<T>::AnnSearchParam;
struct SearchParam : public AnnSearchParam {
int nprobe;
float refine_ratio = 1.0;
float refine_ratio = 1.0;
bool raft_refinement = false;
auto needs_dataset() const -> bool override { return refine_ratio > 1.0f; }
};

@@ -143,6 +153,8 @@ class FaissGpu : public ANN<T> {
return property;
}

auto metric_faiss_to_raft(faiss::MetricType metric) const -> raft::distance::DistanceType;

protected:
template <typename GpuIndex, typename CpuIndex>
void save_(const std::string& file) const;
@@ -181,13 +193,27 @@
copyable_event sync_{};
double training_sample_fraction_;
std::shared_ptr<faiss::SearchParameters> search_params_;
std::shared_ptr<faiss::IndexRefineSearchParameters> refine_search_params_{nullptr};
const T* dataset_;
float refine_ratio_ = 1.0;
float refine_ratio_ = 1.0;
bool raft_refinement_ = false;
};

template <typename T>
auto FaissGpu<T>::metric_faiss_to_raft(faiss::MetricType metric) const
-> raft::distance::DistanceType
{
switch (metric) {
case faiss::MetricType::METRIC_L2: return raft::distance::DistanceType::L2Expanded;
case faiss::MetricType::METRIC_INNER_PRODUCT: return raft::distance::DistanceType::InnerProduct;
default: throw std::runtime_error("FAISS supports only metric type of inner product and L2");
}
}

template <typename T>
void FaissGpu<T>::build(const T* dataset, size_t nrow, cudaStream_t stream)
{
// raft::print_host_vector("faiss dataset", dataset, 100, std::cout);
Contributor: remove.

OmpSingleThreadScope omp_single_thread;
auto index_ivf = dynamic_cast<faiss::gpu::GpuIndexIVF*>(index_.get());
if (index_ivf != nullptr) {
@@ -208,7 +234,7 @@ void FaissGpu<T>::build(const T* dataset, size_t nrow, cudaStream_t stream)
nlist_,
index_ivf->cp.min_points_per_centroid);
}
index_ivf->cp.max_points_per_centroid = max_ppc;
index_ivf->cp.max_points_per_centroid = 300;
Contributor: Why are you changing this? max_ppc was defined to be consistent with raft kmeans_trainset_fraction.

Contributor Author (tarang-jain): Yes, these were all part of debugging experiments. Changed it back.

Contributor: Could you change this back to max_ppc?

Contributor: Could you change it back to max_ppc?

index_ivf->cp.min_points_per_centroid = min_ppc;
}
index_->train(nrow, dataset); // faiss::gpu::GpuIndexFlat::train() will do nothing
@@ -225,19 +251,79 @@ void FaissGpu<T>::search(const T* queries,
float* distances,
cudaStream_t stream) const
{
using IdxT = faiss::idx_t;
static_assert(sizeof(size_t) == sizeof(faiss::idx_t),
"sizes of size_t and faiss::idx_t are different");

if (this->refine_ratio_ > 1.0) {
// TODO: FAISS changed their search APIs to accept the search parameters as a struct object
// but their refine API doesn't allow the struct to be passed in. Once this is fixed, we
// need to re-enable refinement below
// index_refine_->search(batch_size, queries, k, distances,
// reinterpret_cast<faiss::idx_t*>(neighbors), this->search_params_.get()); Related FAISS issue:
// https://github.com/facebookresearch/faiss/issues/3118
throw std::runtime_error(
"FAISS doesn't support refinement in their new APIs so this feature is disabled in the "
"benchmarks for the time being.");
if (refine_ratio_ > 1.0) {
if (raft_refinement_) {
uint32_t k0 = static_cast<uint32_t>(refine_ratio_ * k);
// auto distances_tmp = raft::make_host_matrix<float, IdxT>(batch_size, k0);
// auto candidates = raft::make_host_matrix<IdxT, IdxT>(batch_size, k0);
auto distances_tmp = raft::make_device_matrix<float, IdxT>(gpu_resource_->getRaftHandle(device_), batch_size, k0);
auto candidates = raft::make_device_matrix<IdxT, IdxT>(gpu_resource_->getRaftHandle(device_), batch_size, k0);
index_->search(batch_size,
queries,
k0,
distances_tmp.data_handle(),
candidates.data_handle(),
this->search_params_.get());
// auto queries_v = raft::make_host_matrix_view<const T, IdxT>(queries, batch_size, index_->d);


// auto dataset_v = raft::make_host_matrix_view<const T, faiss::idx_t>(
// this->dataset_, index_->ntotal, index_->d);

// auto neighbors_v =
// raft::make_host_matrix_view<IdxT, IdxT>(reinterpret_cast<IdxT*>(neighbors), batch_size, k);
// auto distances_v = raft::make_host_matrix_view<float, IdxT>(distances, batch_size, k);

// raft::runtime::neighbors::refine(gpu_resource_->getRaftHandle(device_),
// dataset_v,
// queries_v,
// candidates.view(),
// neighbors_v,
// distances_v,
// metric_faiss_to_raft(index_->metric_type));

auto queries_host = raft::make_host_matrix<T, IdxT>(batch_size, index_->d);
auto candidates_host = raft::make_host_matrix<IdxT, IdxT>(batch_size, k0);
auto neighbors_host = raft::make_host_matrix<IdxT, IdxT>(batch_size, k);
auto distances_host = raft::make_host_matrix<float, IdxT>(batch_size, k);
auto dataset_v = raft::make_host_matrix_view<const T, faiss::idx_t>(
this->dataset_, index_->ntotal, index_->d);

auto handle_ = gpu_resource_->getRaftHandle(device_);

raft::copy(queries_host.data_handle(), queries, queries_host.size(), stream);
Contributor: Aren't these already on host? We have the following property defined for our wrapper class:

property.query_memory_type = MemoryType::Host;

I would expect that our benchmark provides queries on the host. If I understand correctly, we could also provide the candidates array as a host array to FAISS.

Contributor Author (tarang-jain): I am allowing queries to be on both host and device. When the queries are on host and we have refinement ratios > 1, we use FAISS's refinement methods. When they are on device and refinement is enabled, we use raft's refinement API, because FAISS's IndexRefineFlat requires the queries, distances, and candidates to be on host. The latter gives us a more apples-to-apples comparison with RAFT because:

  1. No unwanted copies of queries to device in FAISS's internal impl
  2. Avoid unnecessary overhead (if any) due to FAISS's refinement methods
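
To make the two-stage flow concrete with a worked example (numbers illustrative): with k = 10 and refine_ratio = 2, the first-pass search returns k0 = 20 approximate candidates per query; refine() then recomputes exact distances for those 20 against the host-resident dataset and keeps the best 10.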

raft::copy(candidates_host.data_handle(),
candidates.data_handle(),
candidates_host.size(),
resource::get_cuda_stream(handle_));

// wait for the queries to copy to host in 'stream` and for IVF-PQ::search to finish
// RAFT_CUDA_TRY(cudaEventRecord(handle_.get_sync_event(), resource::get_cuda_stream(handle_)));
// RAFT_CUDA_TRY(cudaEventRecord(handle_.get_sync_event(), stream));
// RAFT_CUDA_TRY(cudaEventSynchronize(handle_.get_sync_event()));
handle_.sync_stream();
raft::runtime::neighbors::refine(handle_,
dataset_v,
queries_host.view(),
candidates_host.view(),
neighbors_host.view(),
distances_host.view(),
metric_faiss_to_raft(index_->metric_type));

raft::copy(neighbors, (size_t*)neighbors_host.data_handle(), neighbors_host.size(), stream);
raft::copy(distances, distances_host.data_handle(), distances_host.size(), stream);
} else {
index_refine_->search(batch_size,
queries,
k,
distances,
reinterpret_cast<faiss::idx_t*>(neighbors),
this->refine_search_params_.get());
}
} else {
index_->search(batch_size,
queries,
@@ -280,13 +366,16 @@ void FaissGpu<T>::load_(const std::string& file)
template <typename T>
class FaissGpuIVFFlat : public FaissGpu<T> {
public:
using typename FaissGpu<T>::BuildParam;
struct BuildParam : public FaissGpu<T>::BuildParam {
bool use_raft;
};

FaissGpuIVFFlat(Metric metric, int dim, const BuildParam& param) : FaissGpu<T>(metric, dim, param)
{
faiss::gpu::GpuIndexIVFFlatConfig config;
config.device = this->device_;
this->index_ = std::make_shared<faiss::gpu::GpuIndexIVFFlat>(
config.device = this->device_;
config.use_raft = param.use_raft;
this->index_ = std::make_shared<faiss::gpu::GpuIndexIVFFlat>(
this->gpu_resource_.get(), dim, param.nlist, this->metric_type_, config);
}

@@ -320,21 +409,25 @@ class FaissGpuIVFPQ : public FaissGpu<T> {
int M;
bool useFloat16;
bool usePrecomputed;
bool use_raft;
int bitsPerCode;
};

FaissGpuIVFPQ(Metric metric, int dim, const BuildParam& param) : FaissGpu<T>(metric, dim, param)
{
faiss::gpu::GpuIndexIVFPQConfig config;
config.useFloat16LookupTables = param.useFloat16;
config.usePrecomputedTables = param.usePrecomputed;
config.use_raft = param.use_raft;
config.interleavedLayout = param.use_raft;
config.device = this->device_;

this->index_ =
std::make_shared<faiss::gpu::GpuIndexIVFPQ>(this->gpu_resource_.get(),
dim,
param.nlist,
param.M,
8, // FAISS only supports bitsPerCode=8
param.bitsPerCode,
this->metric_type_,
config);
}
@@ -354,7 +447,14 @@ class FaissGpuIVFPQ : public FaissGpu<T> {
this->index_refine_ =
std::make_shared<faiss::IndexRefineFlat>(this->index_.get(), this->dataset_);
this->index_refine_.get()->k_factor = search_param.refine_ratio;
faiss::IndexRefineSearchParameters faiss_refine_search_params;
faiss_refine_search_params.k_factor = this->index_refine_.get()->k_factor;
faiss_refine_search_params.base_index_params = this->search_params_.get();
this->refine_search_params_ =
std::make_unique<faiss::IndexRefineSearchParameters>(faiss_refine_search_params);
}
this->raft_refinement_ = search_param.raft_refinement;
RAFT_LOG_INFO("refine_ratio %f raft_refinement %d", this->refine_ratio_, this->raft_refinement_);
}

void save(const std::string& file) const override
@@ -410,6 +510,11 @@ class FaissGpuIVFSQ : public FaissGpu<T> {
this->index_refine_ =
std::make_shared<faiss::IndexRefineFlat>(this->index_.get(), this->dataset_);
this->index_refine_.get()->k_factor = search_param.refine_ratio;
faiss::IndexRefineSearchParameters faiss_refine_search_params;
faiss_refine_search_params.k_factor = this->index_refine_.get()->k_factor;
faiss_refine_search_params.base_index_params = this->search_params_.get();
this->refine_search_params_ =
std::make_unique<faiss::IndexRefineSearchParameters>(faiss_refine_search_params);
}
}

4 changes: 2 additions & 2 deletions cpp/bench/ann/src/raft/raft_ivf_pq_wrapper.h
@@ -78,7 +78,7 @@ class RaftIvfPQ : public ANN<T> {
{
AlgoProperty property;
property.dataset_memory_type = MemoryType::Host;
property.query_memory_type = MemoryType::Device;
property.query_memory_type = MemoryType::Host;
Contributor: I am surprised that this does not break the code. RAFT ivf_pq::search API expects queries on device.

Contributor Author (tarang-jain): It does break, I think. These changes and the debug (print) statements were all part of some experiments I was doing. Updating all of those now.
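
For reference, keeping query_memory_type = MemoryType::Host would have required staging the queries on device before the ivf_pq::search call shown further down. A sketch of that copy (illustrative, not the PR's fix):

    // Sketch: copy host-resident queries to device for ivf_pq::search.
    auto queries_dev =
      raft::make_device_matrix<T, IdxT>(handle_, batch_size, index_->dim());
    raft::copy(queries_dev.data_handle(), queries, queries_dev.size(),
               raft::resource::get_cuda_stream(handle_));
    auto queries_v = raft::make_device_matrix_view<const T, IdxT>(
      queries_dev.data_handle(), batch_size, index_->dim());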

return property;
}
void save(const std::string& file) const override;
@@ -209,7 +209,7 @@ void RaftIvfPQ<T, IdxT>::search(const T* queries,
raft::make_device_matrix_view<const T, IdxT>(queries, batch_size, index_->dim());
auto neighbors_v = raft::make_device_matrix_view<IdxT, IdxT>((IdxT*)neighbors, batch_size, k);
auto distances_v = raft::make_device_matrix_view<float, IdxT>(distances, batch_size, k);

raft::runtime::neighbors::ivf_pq::search(
handle_, search_params_, *index_, queries_v, neighbors_v, distances_v);
handle_.stream_wait(stream); // RAFT stream -> bench stream
7 changes: 5 additions & 2 deletions cpp/cmake/thirdparty/get_faiss.cmake
@@ -50,6 +50,7 @@ function(find_and_configure_faiss)
EXCLUDE_FROM_ALL ${PKG_EXCLUDE_FROM_ALL}
OPTIONS
"FAISS_ENABLE_GPU ${PKG_ENABLE_GPU}"
"FAISS_ENABLE_RAFT ON"
"FAISS_ENABLE_PYTHON OFF"
"FAISS_OPT_LEVEL ${RAFT_FAISS_OPT_LEVEL}"
"FAISS_USE_CUDA_TOOLKIT_STATIC ${CUDA_STATIC_RUNTIME}"
@@ -90,14 +91,16 @@ endfunction()
if(NOT RAFT_FAISS_GIT_TAG)
# TODO: Remove this once faiss supports FAISS_USE_CUDA_TOOLKIT_STATIC
# (https://github.com/facebookresearch/faiss/pull/2446)
set(RAFT_FAISS_GIT_TAG fea/statically-link-ctk)
set(RAFT_FAISS_GIT_TAG rmm-pool-alloc)
# set(RAFT_FAISS_GIT_TAG fea/statically-link-ctk)
# set(RAFT_FAISS_GIT_TAG bde7c0027191f29c9dadafe4f6e68ca0ee31fb30)
endif()

if(NOT RAFT_FAISS_GIT_REPOSITORY)
# TODO: Remove this once faiss supports FAISS_USE_CUDA_TOOLKIT_STATIC
# (https://github.com/facebookresearch/faiss/pull/2446)
set(RAFT_FAISS_GIT_REPOSITORY https://github.com/cjnolet/faiss.git)
set(RAFT_FAISS_GIT_REPOSITORY https://github.com/tarang-jain/faiss.git)
# set(RAFT_FAISS_GIT_REPOSITORY https://github.com/cjnolet/faiss.git)
# set(RAFT_FAISS_GIT_REPOSITORY https://github.com/facebookresearch/faiss.git)
endif()
