HNSW Support for Vector Search #1415

Closed
loretoparisi opened this issue Sep 2, 2023 · 3 comments

@loretoparisi

Is your feature request related to a problem? Please describe.
Indexing embedding vectors along with the document. Optionally, support multiple vectors per document and retrieval over them.

Describe the solution you'd like
Add HNSW-based vector similarity search.

Describe alternatives you've considered

  • OpenSearch >= 8
  • Vespa
  • Vector Stores (Pinecone, ChromaDB, etc.)

Additional context
Integration of semantic and similarity search with keyword-based search.

@sanikolaev
Collaborator

The following SQL syntax is proposed for the new field:

<field name> 
  float_vector 
    [knn_type='hnsw'
      knn_dims='int'
      knn_similarity={l2|ip|cosine}
      [hnsw_m='int']
      [hnsw_ef_construction='int']
    ]
  • knn_type is not mandatory. If no knn* is specified, the field remains just an array of floats
  • knn_type is turned on automatically if knn_similarity or knn_dims is specified. The default is hnsw.
  • knn_dims and knn_similarity are required if knn_type='hnsw'
  • hnsw_m and hnsw_ef_construction are optional

Examples:

  • create table t(a float_vector) - just an array of floats
  • create table t(a float_vector knn_dims='128' knn_similarity='l2') - simplest syntax to enable knn
  • create table t(a float_vector knn_type='hnsw' knn_dims='128' knn_similarity='l2') - alternative syntax mostly for the future when knn_type can be e.g. annoy
  • create table t(a float_vector knn_type='hnsw' knn_dims='16' knn_similarity='ip' hnsw_m='16') - fine-tuning
  • create table t(a float_vector knn_type='hnsw' knn_dims='16' knn_similarity='ip' hnsw_m='20' hnsw_ef_construction='90') - more fine-tuning
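
To make the defaulting and required-option rules above easier to check, here is a minimal Python sketch of how they could be resolved. It is only an illustration of the proposal, not the actual implementation: the function name resolve_float_vector_options is hypothetical, and two behaviours are my assumptions because the proposal does not state them (hnsw_* options without any knn_* option are rejected, and knn_dims must be positive).

```python
# Hypothetical helper, NOT part of the engine: it only models the proposed rules.
ALLOWED_SIMILARITY = {"l2", "ip", "cosine"}
KNN_KEYS = {"knn_type", "knn_dims", "knn_similarity"}
HNSW_KEYS = {"hnsw_m", "hnsw_ef_construction"}


def resolve_float_vector_options(options):
    """Normalize the options of a float_vector field or raise ValueError."""
    opts = dict(options)
    unknown = set(opts) - KNN_KEYS - HNSW_KEYS
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")

    # No knn_* option at all: the field stays a plain array of floats.
    if not (KNN_KEYS & set(opts)):
        if HNSW_KEYS & set(opts):
            # Assumption: the proposal does not cover this case.
            raise ValueError("hnsw_* options require knn to be enabled")
        return {}

    # knn_dims or knn_similarity turns knn on; the default type is hnsw.
    knn_type = opts.get("knn_type", "hnsw")
    if knn_type != "hnsw":
        raise ValueError(f"unsupported knn_type: {knn_type}")

    # knn_dims and knn_similarity are required for hnsw.
    missing = [k for k in ("knn_dims", "knn_similarity") if k not in opts]
    if missing:
        raise ValueError(f"missing required options: {missing}")

    dims = int(opts["knn_dims"])  # values are quoted in the SQL, e.g. '128'
    if dims <= 0:
        # Assumption: a vector needs at least one dimension.
        raise ValueError("knn_dims must be positive")
    if opts["knn_similarity"] not in ALLOWED_SIMILARITY:
        raise ValueError(f"unsupported knn_similarity: {opts['knn_similarity']}")

    resolved = {
        "knn_type": knn_type,
        "knn_dims": dims,
        "knn_similarity": opts["knn_similarity"],
    }
    # hnsw_m and hnsw_ef_construction are optional and passed through if given.
    for key in ("hnsw_m", "hnsw_ef_construction"):
        if key in opts:
            resolved[key] = int(opts[key])
    return resolved
```

With this sketch, an empty option set resolves to a plain float array, knn_dims='128' knn_similarity='l2' resolves to an hnsw field, and knn_type='hnsw' on its own is rejected because the two required options are missing.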

@glookka pls review and let me know if it looks good or if I'm missing something and there are better options.

@glookka
Contributor

glookka commented Oct 26, 2023

knn_similarity={l2|ip|cosine} option is specific to HNSW. E.g. annoy has "angular", "euclidean", "manhattan", "hamming", or "dot". So it probably makes sense to name the option hnsw_similarity.
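
For reference, the three names in {l2|ip|cosine} can be read as the textbook definitions sketched below in Python. This is an assumption about what the names mean mathematically, not a statement of what any particular library returns: implementations may report a transformed value instead (for example a squared distance, or 1 minus the cosine). The function names are only illustrative.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ip(a, b):
    """Inner (dot) product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return ip(a, b) / (math.sqrt(ip(a, a)) * math.sqrt(ip(b, b)))
```

Note that smaller is better for l2 while larger is better for ip and cosine, which is one reason metric names and conventions do not map one-to-one across libraries.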

@sanikolaev
Collaborator
