feat(indexer): allow user switch indexer in query time #592

Merged 2 commits on Jun 28, 2020

@hanxiao hanxiao commented Jun 28, 2020

Scenario explained

In most examples and tutorials, we use NumpyIndexer as the vector index. At index time, NumpyIndexer stores the vectors in a compressed binary file; at query time, it runs an exhaustive search (dot products) over them.
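The store-then-scan behaviour described above can be sketched roughly as follows. This is an illustration only, not NumpyIndexer's actual code: it uses the standard library instead of NumPy, and `DIM`, `dump_vectors`, `load_vectors` and `exhaustive_search` are hypothetical names invented for the example. It scores by dot product, as in the description above.

```python
import gzip
import heapq
import os
import struct
import tempfile

DIM = 3  # hypothetical vector dimensionality, fixed for this sketch


def dump_vectors(path, vectors, compress_level=1):
    """Index time: write float32 vectors into a gzip-compressed binary file."""
    with gzip.open(path, "wb", compresslevel=compress_level) as f:
        for vec in vectors:
            f.write(struct.pack(f"<{DIM}f", *vec))


def load_vectors(path):
    """Query time: read the whole dump back into memory as tuples."""
    with gzip.open(path, "rb") as f:
        raw = f.read()
    row_size = 4 * DIM  # 4 bytes per float32
    return [struct.unpack_from(f"<{DIM}f", raw, i) for i in range(0, len(raw), row_size)]


def exhaustive_search(vectors, query, top_k=1):
    """Score every stored vector against the query and keep the best top_k ids."""
    scores = (
        (sum(a * b for a, b in zip(vec, query)), idx)
        for idx, vec in enumerate(vectors)
    )
    return [idx for _, idx in heapq.nlargest(top_k, scores)]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "vec.gz")
    dump_vectors(path, [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.5, 0.5, 0.0)])
    print(exhaustive_search(load_vectors(path), (0.0, 1.0, 0.0), top_k=2))
```

Every query touches every stored vector, which is why the cost grows linearly with the index size and why swapping in an approximate indexer becomes attractive.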

The motivation behind this PR is to let users switch to a more advanced indexer at query time without re-indexing, e.g.:

  • exhaustive dot product is slow, I want to switch to annoy
  • annoy is still slow, I need to check faiss
  • faiss is fine but I need to tune the parameter here and there.

This feature is possible because all vector indexers implemented so far inherit from NumpyIndexer. #589 was a preparatory PR that refactored NumpyIndexer into BaseNumpyIndexer.

Results

Say the user indexes with the following YAML:

    !NumpyIndexer
    metas:
      name: wrap-npidx
    with:
      backend: numpy
      compress_level: 1
      index_filename: wrap-npidx
      metric: euclidean

With this PR, the user can switch to AnnoyIndexer at query time simply by nesting the above YAML under ref_indexer:

!AnnoyIndexer
with:
  ref_indexer:
    !NumpyIndexer
    metas:
      name: wrap-npidx
    with:
      backend: numpy
      compress_level: 1
      index_filename: wrap-npidx
      metric: euclidean

or to NmslibIndexer via:

!NmslibIndexer
with:
  ref_indexer:
    !NumpyIndexer
    metas:
      name: wrap-npidx
    with:
      backend: numpy
      compress_level: 1
      index_filename: wrap-npidx
      metric: euclidean
  space: l2

The existing binary dumps do not need to be touched.

Generalization

ref_indexer is implemented at the BaseNumpyIndexer level, so all indexers that inherit from it are switchable. For example, you can also "downgrade" an AnnoyIndexer to a NumpyIndexer by creating a !NumpyIndexer YAML and putting !AnnoyIndexer inside ref_indexer.
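To illustrate why putting ref_indexer in the base class makes every subclass switchable in both directions, here is a minimal sketch. It is not the code from this PR: `BaseVectorIndexer`, `ExhaustiveIndexer`, `BucketIndexer` and the `dim` argument are hypothetical stand-ins, and the "approximate" indexer is a toy that only scans one of two buckets.

```python
import gzip
import os
import struct
import tempfile


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class BaseVectorIndexer:
    """Hypothetical stand-in for BaseNumpyIndexer: owns the dump and ref_indexer."""

    def __init__(self, index_filename=None, dim=None, ref_indexer=None):
        if ref_indexer is not None:
            # Reuse the referenced indexer's dump instead of writing a new one.
            index_filename, dim = ref_indexer.index_filename, ref_indexer.dim
        self.index_filename = index_filename
        self.dim = dim

    def add(self, vectors):
        """Index time: append float32 vectors to the compressed dump."""
        with gzip.open(self.index_filename, "ab") as f:
            for vec in vectors:
                f.write(struct.pack(f"<{self.dim}f", *vec))

    def load_vectors(self):
        with gzip.open(self.index_filename, "rb") as f:
            raw = f.read()
        step = 4 * self.dim
        return [struct.unpack_from(f"<{self.dim}f", raw, i) for i in range(0, len(raw), step)]

    def build_query_handler(self, vectors):
        raise NotImplementedError  # each subclass supplies its own search structure

    def query(self, key, top_k):
        return self.build_query_handler(self.load_vectors())(key, top_k)


class ExhaustiveIndexer(BaseVectorIndexer):
    """Stand-in for NumpyIndexer: scans every vector."""

    def build_query_handler(self, vectors):
        def handler(key, top_k):
            order = sorted(range(len(vectors)), key=lambda i: _sq_dist(vectors[i], key))
            return order[:top_k]
        return handler


class BucketIndexer(BaseVectorIndexer):
    """Toy stand-in for an approximate indexer: scans only the matching bucket."""

    def build_query_handler(self, vectors):
        buckets = {}
        for i, vec in enumerate(vectors):
            buckets.setdefault(vec[0] >= 0, []).append(i)

        def handler(key, top_k):
            candidates = buckets.get(key[0] >= 0, [])
            return sorted(candidates, key=lambda i: _sq_dist(vectors[i], key))[:top_k]
        return handler


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "wrap-npidx.gz")
    indexed = ExhaustiveIndexer(index_filename=path, dim=2)
    indexed.add([(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-2.0, -2.0)])
    switched = BucketIndexer(ref_indexer=indexed)        # "upgrade" at query time
    downgraded = ExhaustiveIndexer(ref_indexer=switched)  # and back again
    print(switched.query((1.2, 0.9), top_k=1), downgraded.query((1.2, 0.9), top_k=1))
```

Because the constructor logic lives in the shared base class, neither subclass needs to know about the other; each only decides how to search the vectors it loads from the same dump.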

It can also be used inside a compound indexer, e.g. in ChunkIndexer:

!ChunkIndexer
components:
  - !AnnoyIndexer
    with:
      ref_indexer:
        !NumpyIndexer
        with:
          index_filename: vec.gz
        metas:
          name: vecidx  # a customized name
          workspace: $WORKDIR
  - !ChunkPbIndexer
    with:
      index_filename: chunk.gz
    metas:
      name: chunkidx
      workspace: $WORKDIR
metas:
  name: chunk_compound_indexer
  workspace: $WORKDIR

github-actions bot commented Jun 28, 2020

Jina CLA check ✅ All Contributors have signed the CLA.

@github-actions
This PR closes: #587

@jina-bot jina-bot added the size/M, area/core, area/docs, area/testing, component/executor and executor/indexer labels Jun 28, 2020
@hanxiao hanxiao added this to the v0.4.0 New Features milestone Jun 28, 2020
@hanxiao hanxiao linked an issue Jun 28, 2020 that may be closed by this pull request
@hanxiao hanxiao requested review from nan-wang and fhaase2 June 28, 2020 19:18
@hanxiao hanxiao merged commit 4fdb44c into master Jun 28, 2020
@hanxiao hanxiao deleted the feat-wrap-587 branch June 28, 2020 19:38
@github-actions github-actions bot locked and limited conversation to collaborators Jun 28, 2020
@nan-wang nan-wang left a comment

LGTM👍

Linked issue that may be closed by this pull request: change indexer at query time
3 participants