feat(indexer): allow user switch indexer in query time #592
Scenario explained
In most examples and tutorials, we use `NumpyIndexer` for the vector index. At index time, `NumpyIndexer` stores vectors as compressed binary; at query time, it uses exhaustive search (dot products).

The motivation behind this PR is to let users switch to a more advanced indexer at query time, without re-indexing, e.g.:
- `annoy`
- `annoy` is still slow, I need to check `faiss`
- `faiss` is fine, but I need to tune the parameters here and there.

This feature is possible because all vector indexers implemented so far inherit from `NumpyIndexer`. In #589, I made a preparation PR that refactors `NumpyIndexer` into `BaseNumpyIndexer`.

Results
Say the user uses the following YAML for indexing.

With the new feature enabled in this PR, the user can switch to `AnnoyIndexer` at query time simply by quoting the above YAML in `ref_indexer`:

or to `NMSLibIndexer` via:

None of the binary dumps need to be touched.
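To make the idea concrete, here is a minimal, self-contained sketch of how a query-time indexer could borrow the binary dump of an index-time indexer through a `ref_indexer` argument. It is an illustration under stated assumptions, not the code in this PR: `BaseVectorIndexer`, `ExhaustiveIndexer`, `BucketedIndexer`, the record layout, and the bucketing rule are all hypothetical stand-ins, and the "advanced" indexer is a toy rather than a real ANN library such as annoy or faiss.

```python
import gzip
import os
import struct
import tempfile


class BaseVectorIndexer:
    """Hypothetical stand-in for a shared base class that owns the dump format."""

    def __init__(self, index_filename=None, dim=None, ref_indexer=None):
        if ref_indexer is not None:
            # Borrow the dump location and vector size from the referenced
            # indexer; the binary file itself is only read, never rewritten.
            index_filename = ref_indexer.index_filename
            dim = ref_indexer.dim
        self.index_filename = index_filename
        self.dim = dim

    def add(self, keys, vectors):
        # Append (int64 key, float32 vector) records to a gzip-compressed file.
        with gzip.open(self.index_filename, "ab") as f:
            for key, vec in zip(keys, vectors):
                f.write(struct.pack(f"<q{self.dim}f", key, *vec))

    def load(self):
        # Decode every record back into (key, vector) pairs.
        record = struct.Struct(f"<q{self.dim}f")
        with gzip.open(self.index_filename, "rb") as f:
            raw = f.read()
        return [(r[0], r[1:]) for r in record.iter_unpack(raw)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class ExhaustiveIndexer(BaseVectorIndexer):
    """Brute force: score every stored vector by dot product."""

    def query(self, vec, top_k):
        scored = [(dot(vec, v), k) for k, v in self.load()]
        return [k for _, k in sorted(scored, reverse=True)[:top_k]]


class BucketedIndexer(BaseVectorIndexer):
    """Toy 'advanced' indexer: groups vectors by their largest component
    and scans only the bucket that matches the query."""

    @staticmethod
    def _bucket(vec):
        return max(range(len(vec)), key=lambda i: vec[i])

    def build(self):
        # The in-memory structure is derived from the existing dump.
        self.buckets = {}
        for key, vec in self.load():
            self.buckets.setdefault(self._bucket(vec), []).append((key, vec))

    def query(self, vec, top_k):
        candidates = self.buckets.get(self._bucket(vec), [])
        scored = [(dot(vec, v), k) for k, v in candidates]
        return [k for _, k in sorted(scored, reverse=True)[:top_k]]


# Index time: write the dump once with the simple indexer.
path = os.path.join(tempfile.mkdtemp(), "vec.gz")
index_time = ExhaustiveIndexer(index_filename=path, dim=2)
index_time.add([1, 2, 3], [(1.0, 0.0), (0.0, 1.0), (0.9, 0.1)])
size_after_indexing = os.path.getsize(path)

# Query time: switch indexers by pointing at the old one, no re-indexing.
query_time = BucketedIndexer(ref_indexer=index_time)
query_time.build()
print(query_time.query((1.0, 0.0), top_k=2))
```

Because the reference is resolved in the shared base class, the same argument also works in the other direction in this sketch: `ExhaustiveIndexer(ref_indexer=query_time)` reads the same file and falls back to brute-force search.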
Generalization
`ref_indexer` is implemented at the `BaseNumpyIndexer` level, so all indexers that inherit from it are switchable. For example, you can also "downgrade" an `AnnoyIndexer` to a `NumpyIndexer` by creating a `!NumpyIndexer` YAML and putting `!AnnoyIndexer` inside `ref_indexer`.

One can also use it in a compound manner, e.g. in `ChunkIndexer`: