Enhancing Elasticsearch vector store implementation #592

l-trotta · 2024-04-16T13:17:36Z

This PR provides a more performant search function for the Elasticsearch vector store and removes the bean autoconfiguration.

In depth

Autoconfiguration

The current implementation of afterPropertiesSet() automatically creates a new index whose settings only work with OpenAI, or with any other model that produces 1536-dimensional vectors; users adopting other models currently have to delete the index manually. By default, Elasticsearch creates suitable index settings itself when it receives the first PUT request for vectors, so in our opinion this autoconfiguration is unnecessary.
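For readers who do want to define the mapping up front, here is a minimal sketch of an explicit dense_vector mapping with a configurable dimension. It is not code from this PR: the class name, the method name, and the "embedding" field name are assumptions made for illustration, and the resulting JSON is meant as the body of an index creation request.

```java
/** Builds an explicit index mapping for embeddings of a given dimension (illustrative only). */
final class DenseVectorMapping {

    // Assumed field name; the PR does not state which field holds the vectors.
    static final String VECTOR_FIELD = "embedding";

    /**
     * @param dims       vector length produced by the embedding model (for example 1536)
     * @param similarity a dense_vector similarity name: "cosine", "dot_product" or "l2_norm"
     * @return JSON body for creating an index with an indexed dense_vector field
     */
    static String mappingJson(int dims, String similarity) {
        if (dims <= 0) {
            throw new IllegalArgumentException("dims must be positive: " + dims);
        }
        return "{\n"
            + "  \"mappings\": {\n"
            + "    \"properties\": {\n"
            + "      \"" + VECTOR_FIELD + "\": {\n"
            + "        \"type\": \"dense_vector\",\n"
            + "        \"dims\": " + dims + ",\n"
            + "        \"index\": true,\n"
            + "        \"similarity\": \"" + similarity + "\"\n"
            + "      }\n"
            + "    }\n"
            + "  }\n"
            + "}";
    }

    public static void main(String[] args) {
        // Example: a model that produces 384-dimensional vectors instead of 1536.
        System.out.println(mappingJson(384, "cosine"));
    }
}
```

Keeping dims a parameter rather than a constant avoids the situation described above, where an index created for 1536-dimensional vectors has to be deleted before a different model can be used.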

Search

The current search uses script_score, which can be slow on large data sets because it performs a brute-force comparison against every vector using the similarity function. This PR replaces it with approximate kNN search, which is faster because it examines only candidate vectors close to the query instead of every vector in the index. The similarity function used by kNN is configured in the index mapping by setting its name.
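To make the difference concrete, the sketch below builds the two kinds of search request bodies side by side: an exact script_score query and an approximate knn search. It is an illustration under assumptions, not the PR's implementation: the class and method names, the "embedding" field name, and the choice of cosineSimilarity in the script are all hypothetical.

```java
import java.util.Arrays;

/** Builds example request bodies for exact and approximate vector search (illustrative only). */
final class KnnVsScriptScore {

    // Assumed field name; not taken from the PR.
    static final String VECTOR_FIELD = "embedding";

    /** Exact search: the script is evaluated for every document matched by match_all. */
    static String scriptScoreBody(float[] queryVector, int size) {
        return "{\n"
            + "  \"size\": " + size + ",\n"
            + "  \"query\": {\n"
            + "    \"script_score\": {\n"
            + "      \"query\": { \"match_all\": {} },\n"
            + "      \"script\": {\n"
            + "        \"source\": \"cosineSimilarity(params.query_vector, '" + VECTOR_FIELD + "') + 1.0\",\n"
            + "        \"params\": { \"query_vector\": " + Arrays.toString(queryVector) + " }\n"
            + "      }\n"
            + "    }\n"
            + "  }\n"
            + "}";
    }

    /** Approximate search: only num_candidates vectors per shard are considered. */
    static String knnBody(float[] queryVector, int k, int numCandidates) {
        if (k <= 0 || numCandidates < k) {
            throw new IllegalArgumentException("need 0 < k <= numCandidates");
        }
        return "{\n"
            + "  \"knn\": {\n"
            + "    \"field\": \"" + VECTOR_FIELD + "\",\n"
            + "    \"query_vector\": " + Arrays.toString(queryVector) + ",\n"
            + "    \"k\": " + k + ",\n"
            + "    \"num_candidates\": " + numCandidates + "\n"
            + "  }\n"
            + "}";
    }

    public static void main(String[] args) {
        float[] query = {0.1f, 0.2f, 0.3f};
        System.out.println(scriptScoreBody(query, 5));
        System.out.println(knnBody(query, 5, 50));
    }
}
```

Note that the knn body names no similarity function: as described above, it comes from the similarity configured on the field in the index mapping, whereas the script_score body spells the function out in the script itself.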

We would also like to contribute to the documentation. Should that be done in a different PR?

Thank you @JM-Lab for the original implementation. Would you like to review these changes?

(Disclosure: I work for Elastic)

JM-Lab · 2024-04-17T15:38:25Z

Hi @l-trotta,

Thanks for the update. I liked how script_score was quick to implement due to available examples, but I'm pleased we're switching to KNN for better performance.

I find the automatic creation of index mappings convenient; however, for my projects, I typically need to define these mappings in advance. Could we consider adding an option to provide index mappings directly in the constructor?

Additionally, could you check if the code at ElasticsearchAiSearchFilterExpressionConverter.java needs any improvements?

I'm also looking forward to the official documentation for the Elasticsearch vector store of the Spring AI project, which I think @tzolov will address.

l-trotta · 2024-04-22T16:44:01Z

Hi @JM-Lab,

Since the index mapping can be added with createIndexMapping(), and in most cases the autoconfiguration is enough, we chose not to add the mapping to the constructor, which keeps the initial configuration simple.

The filter converter looked fine to me!
Thanks again for your work.

tzolov · 2024-04-30T07:02:35Z

@l-trotta, thank you for the improvements.
Could you please rebase and update your PR after the #633 merge?

l-trotta · 2024-05-06T15:31:19Z

Updated! I'd like to explain why I removed the dense-vector-indexing property: quoting the documentation,

If true, you can search this field using the kNN search API. Defaults to true.

And since this implementation uses kNN search, setting this to false would make the data unsearchable.

tzolov added the enhancement (New feature or request) and vectors store labels Apr 20, 2024

tzolov self-assigned this Apr 26, 2024

tzolov added this to the 1.0.0-M1 milestone Apr 26, 2024

l-trotta added 8 commits May 6, 2024 17:17

knn instead of script_score, removed initialization

a79bae4

only using normalized similarities, adjusted unit test

b444358

import clean

f61f377

making l2norm's distances consistent with others

4bae295

refactor unit test

8d10c45

rebase

27100c8

format

ea47f84

dependency version, docs

fe12aae

l-trotta force-pushed the main branch from bf71a49 to fe12aae May 6, 2024 15:24
