Extend elastic_vector_search.py to allow for kNN indexing/searching #5346

Closed
jeffvestal opened this issue May 27, 2023 · 4 comments

@jeffvestal
Contributor

Feature request

Extend `langchain/vectorstores/elastic_vector_search.py` to support kNN indexing and searching.
The high-level objectives will be:

  1. Allow creating an index with the correct mapping to store documents, including `dense_vector` fields, so they can be used for kNN search
  2. Store embeddings in Elasticsearch in the `dense_vector` field type
  3. Perform kNN search
  4. Perform hybrid BM25 (`query`) + kNN search
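A minimal sketch of objectives 1 and 2, assuming the Elasticsearch 8.x `dense_vector` mapping options (`dims`, `index`, `similarity`). The field names `text`, `vector`, and `metadata` and both helper names are hypothetical, not necessarily what the final class uses; `dims` must match the output size of the embedding model.

```python
from typing import Any, Dict, List, Optional


def build_knn_index_body(dims: int, similarity: str = "cosine",
                         vector_field: str = "vector",
                         text_field: str = "text") -> Dict[str, Any]:
    """Return a create-index body whose dense_vector field is indexed for kNN."""
    return {
        "mappings": {
            "properties": {
                text_field: {"type": "text"},
                # "index": True asks Elasticsearch to build the HNSW graph used
                # by approximate kNN; "similarity" picks the distance function.
                vector_field: {
                    "type": "dense_vector",
                    "dims": dims,
                    "index": True,
                    "similarity": similarity,
                },
                "metadata": {"type": "object"},
            }
        }
    }


def build_document(text: str, vector: List[float],
                   metadata: Optional[Dict[str, Any]] = None,
                   vector_field: str = "vector",
                   text_field: str = "text") -> Dict[str, Any]:
    """Shape one document so its embedding lands in the dense_vector field."""
    return {text_field: text, vector_field: vector, "metadata": metadata or {}}


body = build_knn_index_body(dims=3)
doc = build_document("hello world", [0.1, 0.2, 0.3], {"source": "demo"})
```

The `mappings` value is what would be supplied when creating the index (for example through the Python client's `indices.create`), after which each document shaped by `build_document` could be indexed as-is.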

Motivation

Elasticsearch supports approximate k-nearest neighbor (kNN) search with dense vectors. The current module only supports `script_score` (exact, brute-force) vector search.
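To make the contrast concrete, here is a sketch of the two request bodies, assuming Elasticsearch 8.x search syntax and a hypothetical `vector` field. The `script_score` version computes cosine similarity against every document matched by `match_all`, so it is exact but its cost grows with the index; the top-level `knn` option walks the HNSW graph instead, trading a little recall for speed, with `num_candidates` as the tuning knob.

```python
from typing import Any, Dict, List


def exact_script_score_body(query_vector: List[float], size: int = 10,
                            vector_field: str = "vector") -> Dict[str, Any]:
    """Brute-force search: the script scores every document matched by match_all."""
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # +1.0 keeps scores non-negative, as script_score requires
                    "source": f"cosineSimilarity(params.query_vector, '{vector_field}') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }


def approximate_knn_body(query_vector: List[float], k: int = 10,
                         num_candidates: int = 100,
                         vector_field: str = "vector") -> Dict[str, Any]:
    """Approximate search: uses the HNSW graph instead of scoring every doc."""
    return {
        "knn": {
            "field": vector_field,
            "query_vector": query_vector,
            "k": k,
            # candidates considered per shard; higher means better recall, slower
            "num_candidates": num_candidates,
        }
    }


exact = exact_script_score_body([0.1, 0.2, 0.3])
approx = approximate_knn_body([0.1, 0.2, 0.3], k=5)
```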

Your contribution

I will work on the code and create the pull request

@ManZzup

ManZzup commented May 31, 2023

Had this need today and put together a crude version of it. Let me know if this makes sense, and I can open a PR with it.
I also added batching for bulk add, since it was failing for larger datasets.

https://gist.github.com/ManZzup/1109d4c1f6b8bc48b60a67983dfbd0fd
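The gist's batching code is not reproduced here; as a generic illustration, chunking bulk actions might look like the minimal sketch below. The index name and document shape are hypothetical. The official Python client's `elasticsearch.helpers.bulk` also accepts a `chunk_size` argument, which may be simpler than hand-rolled batching.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def bulk_index_actions(index: str, docs: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Wrap each document as a bulk 'index' action."""
    for doc in docs:
        yield {"_op_type": "index", "_index": index, "_source": doc}


def chunked(actions: Iterable[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of at most `size` actions so no single bulk request grows too large."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(actions)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Hypothetical corpus: 1050 small documents with 3-dimensional vectors.
docs = ({"text": f"doc {i}", "vector": [float(i), 0.0, 1.0]} for i in range(1050))
batches = list(chunked(bulk_index_actions("my-knn-index", docs), 500))
```

Each batch would then be sent as its own bulk request, so a failure or timeout affects at most one chunk rather than the whole dataset.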

@jeffvestal
Contributor Author

I'm almost done adding a new class, `ElasticKnnSearch`.
Working code is here if you want to take a look.

I've been busy with a couple of different projects, but I'm hoping to have the PR submitted in the next day or two. Just debugging and cleaning up right now.

If you do take a look, feel free to comment here if you have any suggestions.

@ManZzup

ManZzup commented May 31, 2023

@jeffvestal this looks much better and structured! Will wait for this to merge to use this instead.

A couple of things:

  • Would it be good to let the user define the boosts? For my use case, for instance, 0.5 and 0.5 made more sense.
  • I added a source filter to prevent the vector field from coming back in the response. Since we only use the text and metadata fields from the response, do we still need to return the vector? (I saw you have `fields` commented out; I think the newer API supports `_source` filtering instead.)
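Both suggestions map onto standard search-body options. A sketch, assuming Elasticsearch 8.x and hypothetical `text`/`vector` field names: the `boost` on the `match` query and on the `knn` clause are caller-supplied (defaulting here to the 0.5/0.5 split mentioned above), and `_source.excludes` drops the embedding from each hit. The function name is illustrative only.

```python
from typing import Any, Dict, List


def hybrid_body(query_text: str, query_vector: List[float], k: int = 10,
                num_candidates: int = 100, query_boost: float = 0.5,
                knn_boost: float = 0.5, text_field: str = "text",
                vector_field: str = "vector") -> Dict[str, Any]:
    """BM25 match + approximate kNN in one request, with caller-chosen boosts."""
    return {
        "size": k,
        # BM25 side of the hybrid query, weighted by query_boost
        "query": {"match": {text_field: {"query": query_text, "boost": query_boost}}},
        # kNN side, weighted by knn_boost
        "knn": {
            "field": vector_field,
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
            "boost": knn_boost,
        },
        # Leave the (large) embedding out of each hit; only text/metadata are used.
        "_source": {"excludes": [vector_field]},
    }


body = hybrid_body("vector search", [0.1, 0.2, 0.3], k=5)
```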

dev2049 added a commit that referenced this issue Jun 2, 2023
# Create elastic_vector_search.ElasticKnnSearch class

This extends `langchain/vectorstores/elastic_vector_search.py` by adding
a new class `ElasticKnnSearch`

Features:
- Allow creating an index with a `dense_vector` mapping compatible
with kNN search
- Store embeddings in the index for use with kNN search (the correct mapping
creates the HNSW data structure)
- Perform approximate kNN search
- Perform hybrid BM25 (`query{}`) + kNN (`knn{}`) search
- Perform kNN search by either providing a `query_vector` or passing a
hosted `model_id`, so that `query_vector_builder` automatically generates
a query vector at search time
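The two ways of supplying the query vector correspond to two forms of the `knn` clause. A sketch, assuming a recent Elasticsearch 8.x release where `query_vector_builder` with `text_embedding` is available, and a hypothetical `vector` field; the function name, the model ID, and the "exactly one of" validation rule are illustrative, not the class's actual code.

```python
from typing import Any, Dict, List, Optional


def knn_clause(k: int, num_candidates: int,
               query_vector: Optional[List[float]] = None,
               model_id: Optional[str] = None,
               query_text: Optional[str] = None,
               vector_field: str = "vector") -> Dict[str, Any]:
    """Build a knn clause from either a ready-made vector or a deployed model."""
    clause: Dict[str, Any] = {"field": vector_field, "k": k, "num_candidates": num_candidates}
    if query_vector is not None and model_id is None:
        # Caller already embedded the query client-side.
        clause["query_vector"] = query_vector
    elif model_id is not None and query_vector is None:
        if not query_text:
            raise ValueError("query_text is required when model_id is given")
        # Elasticsearch embeds model_text with the deployed model at search time.
        clause["query_vector_builder"] = {
            "text_embedding": {"model_id": model_id, "model_text": query_text}
        }
    else:
        raise ValueError("pass exactly one of query_vector or model_id")
    return clause


by_vector = knn_clause(5, 50, query_vector=[0.1, 0.2, 0.3])
by_model = knn_clause(5, 50, model_id="my-embedding-model", query_text="what is kNN?")
```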

Connection options:
- Using `cloud_id` from Elastic Cloud
- Passing an Elasticsearch client object

Search options:
- query
- k
- query_vector
- model_id
- size
- source
- knn_boost (hybrid search)
- query_boost (hybrid search)
- fields
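With `knn_boost` and `query_boost`, Elasticsearch scores a hybrid hit as the sum of its boosted query score and boosted kNN score, and a hit found by only one side gets only that side's contribution. The sketch below reproduces that arithmetic in plain Python on made-up scores; it illustrates the scoring rule and is not client code.

```python
from typing import Dict


def combine_hybrid_scores(query_scores: Dict[str, float], knn_scores: Dict[str, float],
                          query_boost: float = 1.0, knn_boost: float = 1.0) -> Dict[str, float]:
    """Sum boosted scores per doc; a doc missing from one side contributes 0 there."""
    ids = set(query_scores) | set(knn_scores)
    return {
        doc_id: query_boost * query_scores.get(doc_id, 0.0)
        + knn_boost * knn_scores.get(doc_id, 0.0)
        for doc_id in ids
    }


# Made-up scores purely for illustration: "b" is found by both sides.
combined = combine_hybrid_scores({"a": 8.0, "b": 2.0}, {"b": 0.9, "c": 0.8},
                                 query_boost=0.5, knn_boost=0.5)
```

Because BM25 scores are unbounded while kNN scores under `cosine` similarity fall between 0 and 1, equal boosts do not mean equal influence (here the BM25-only hit `a` still ranks first), which is one reason to make the boosts user-configurable.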


This also adds examples to
`docs/modules/indexes/vectorstores/examples/elasticsearch.ipynb`


Fixes #5346

cc: @dev2049


---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
@jeffvestal
Contributor Author

@ManZzup My PR was merged. I think I covered most or all of the argument options you mentioned.
I have a couple of things I'm going to tweak, so feel free to create an issue and tag me if you find anything that would make the class more usable.

Undertone0809 pushed a commit to Undertone0809/langchain that referenced this issue Jun 19, 2023