Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancing Elasticsearch vector store implementation #592

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

l-trotta
Copy link

This PR provides a more performant search function for the Elasticsearch vector store and removes the bean autoconfiguration.

In depth

Autoconfiguration

The current implementation of afterPropertiesSet() automatically creates a new index with a set of properties that only works with OpenAI, or any other model that works with vectors with a dimension of 1536; users adopting other models would currently have to manually delete the index. By default, Elasticsearch automatically creates the correct index settings when it receives the first PUT request for vectors, so in our opinion there's no need for such autoconfiguration.

Search

The function used now is script_score, which can be slow for large data samples since it does a brute force comparison with all vectors using the similarity function. This PR replaces it with the approximate knn search, more performant because it only scans the closest neighbours. The similarity functions available for knn can be easily configured in the index mapping by setting the correct name.

We would also like to contribute to the documentation, should that be done in a different PR?

Thank you @JM-Lab for the original implementation, would you like to review these changes?

(Disclosure: I work for Elastic)

@JM-Lab
Copy link
Contributor

JM-Lab commented Apr 17, 2024

Hi @l-trotta,

Thanks for the update. I liked how script_score was quick to implement due to available examples, but I'm pleased we're switching to KNN for better performance.

I find the automatic creation of index mappings convenient; however, for my projects, I typically need to define these mappings in advance. Could we consider adding an option to provide index mappings directly in the constructor?

Additionally, could you check if the code at ElasticsearchAiSearchFilterExpressionConverter.java needs any improvements?

I'm also looking forward to the official documentation for the Elasticsearch vector store of the Spring AI project, which I think @tzolov will address.

@tzolov tzolov added enhancement New feature or request vectors store labels Apr 20, 2024
@l-trotta
Copy link
Author

Hi @JM-Lab,

Since the index mapping can be added with createIndexMapping(), and in most cases the autoconfiguration is enough, we chose not to add the mapping to the constructor to make it easier to configure it at the start.

The filter converter looked fine to me!
Thanks again for your work.

@tzolov tzolov self-assigned this Apr 26, 2024
@tzolov tzolov added this to the 1.0.0-M1 milestone Apr 26, 2024
@tzolov
Copy link
Collaborator

tzolov commented Apr 30, 2024

@l-trotta , thank you for the improvements.
Could you please rebase and update your PR after the #633 merge.

@l-trotta
Copy link
Author

l-trotta commented May 6, 2024

Updated! I'd like to explain why I removed the dense-vector-indexing property: quoting the documentation,

If true, you can search this field using the kNN search API. Defaults to true.

And since this implementation uses kNN search, setting this to false would make the data unsearchable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request vectors store
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants