-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEA] Support reconstruction of original vectors from IVF-Flat / IVF-PQ Index #1205
Comments
Referencing the corresponding method in FAISS' CPU IVF index: https://github.com/facebookresearch/faiss/blob/main/faiss/IndexIVF.cpp#L918 |
Update on the reconstruction. At this moment, we have two draft implementations of reconstruction. My IVF-PQ implementation takes the (cluster_id, [in_cluster_indices]) instead of the user-defined indices, and the IVF-Flat implementation seems to be sub-optimal in that it does not support non-contiguous user-defined indices and iterates through the whole index multiple times. The ANN indices are one-directional by design. Thus, supporting the search in the opposite direction (find vectors by indices) requires either an exhaustive search through the whole DB for every query, or the introduction of a helper struct for a faster search (an attempt to make such is in ivf-flat PR). I suggest we do the latter, but decouple the helper struct from the model. Something like the following: template <typename ListT>
struct index_mapping_t {
device_vector<ListT::index_type> user_indices;
device_vector<std::Tuple<uint32_t, uint32_t>> clusters_and_offsets;
template <typename = ivf::enable_if_valid_list<ListT>>
index_mapping_t(std::vector<std::shared_ptr<ListT>> lists) {
...
cub::DeviceRadixSort::SortPairs(user_indices, clusters_and_offsets);
...
}
};
template <typename ListT>
auto get_lists_ands_offsets(const index_mapping_t<ListT>& mapping, device_vector_view<ListT::index_type> queries) ->
std::vector<std::Tuple<uint32_t, device_vector<uint32_t>>> {
// find the cluster & in-cluster-offsets, and group them by cluster ids
} Then, we could calculate |
In addition to that, we could make a separate version of |
Add public functions for reading and writing into individual ivf-pq lists (clusters), in the input space (reconstructed data) and in flat PQ codes. Partially solves (IVF-PQ) #1205 Authors: - Artem M. Chirkin (https://github.com/achirkin) - Corey J. Nolet (https://github.com/cjnolet) - Tamas Bela Feher (https://github.com/tfeher) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #1298
Add public functions for reading and writing into individual ivf-pq lists (clusters), in the input space (reconstructed data) and in flat PQ codes. Partially solves (IVF-PQ) rapidsai#1205 Authors: - Artem M. Chirkin (https://github.com/achirkin) - Corey J. Nolet (https://github.com/cjnolet) - Tamas Bela Feher (https://github.com/tfeher) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1298
FAISS currently supports two functions (on CPU) for reconstructing original vectors from the IVF-Flat/IVF-PQ indexes:
reconstruct_n
andreconstruct_batch
. FAISS' existing IVF-Flat GPU implementation does supportreconstruct
andreconstruct_n
(a linear set of indices from a specific start to specific end index) butreconstruct_batch
would be more efficient so that we can reconstruct a specific set of vectors given their indexes, which don't even need to be in order. It's been requested that we support this in our GPU implementations.The reconstruct API is also very useful when a GPU coarse quantizer (of any index type) is combined w/ a CPU IVF index for very large indexes which may not fit into GPU memory.
The text was updated successfully, but these errors were encountered: