Skip to content

ANN search performance with multiple segments #90700

@jtibshirani

Description

@jtibshirani

To support ANN search, Elasticsearch indexes the vector values of each segment as an HNSW graph. Usually an index has multiple segments, which means ANN must check several HNSW graphs as it searches one segment after another. With fewer segments, the ANN search is more efficient, since we check a smaller number of graphs. Another way to describe it: searching an HNSW graph is sub-linear in the number of vectors, so it's better to search larger graphs vs. several smaller ones.

It's certainly possible to force merge the index to one segment, but force merge can take substantial time and has some downsides (explained here).

This meta issue explores ideas to improve performance that don't require force merge. These are early ideas and may change.

  • Improvements to Elasticsearch's flush logic
    • Fix the surprising behavior in Elasticsearch flush where we sometimes create tiny segments (#90776)
    • Enhance IndexMemoryController to no longer flush all pending segments when hitting the indices.memory.index_buffer_size threshold, and instead only flush the largest pending segment (#34553)
    • When uploading a large set of vectors in bulk, the segment size is usually limited by index.translog.flush_threshold_size (512MB by default). Are there ways to avoid bumping up against this limit?
  • Enable merge-on-refresh to avoid publishing tiny segments in point-in-time views of shards.
  • Consider a dedicated merge policy optimized for vector search that encourages larger segments.
    • Decrease index.merge.policy.segments_per_tier to encourage the merge policy to merge segments more aggressively.
    • Increase index.merge.policy.max_merged_segment to not stop merging segments at 5GB.
    • Increase index.merge.policy.floor_segment to further bias the merge policy against small segments.
  • Improve search speed in presence of multiple segments.
    • Maybe we don't need to gather the full num_candidates from each segment?
  • Perhaps we could have a special "bulk vector upload" operation that does not allow concurrent searches, and makes sure to create a single graph by tweaking settings under-the-hood.

Metadata

Metadata

Assignees

No one assigned

    Labels

    :Distributed Indexing/EngineAnything around managing Lucene and the Translog in an open shard.:Search Relevance/VectorsVector searchMetaTeam:Distributed (Obsolete)Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.Team:Search RelevanceMeta label for the Search Relevance team in Elasticsearch

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions