@mayya-sharipova
Contributor

  • For flush, vectors are now reordered according to sortMap before building the GPU index, ensuring that HNSW graph node ordinals match the sorted document order.
  • Merge, on the other hand, doesn't require explicit sortMap handling, since Lucene's MergedVecto utilities apply docMaps internally.
  • Enhanced tests with both approximate and exact KNN searches to validate sorting correctness.
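
To illustrate the flush-time reordering described above, here is a minimal, self-contained sketch. It is not the code from this PR: `OrdMap`, `reorder`, and the class name are hypothetical stand-ins, and it assumes every document has exactly one vector, so that vector ordinals and document IDs coincide. The idea is only that the vectors are laid out in sorted order before the index is built, so that ordinal i in the graph refers to sorted document i.

```java
import java.util.Arrays;

public class SortMapReorderSketch {

    /** Hypothetical stand-in for a flush-time sort map (not a Lucene type). */
    interface OrdMap {
        /** Returns the original (buffered) ordinal for a given sorted ordinal. */
        int newToOld(int newOrd);
    }

    /**
     * Returns the vectors arranged in sorted order: position i of the result
     * holds the vector that belongs to the i-th document after sorting.
     * Assumes dense ordinals (one vector per document).
     */
    static float[][] reorder(float[][] vectors, OrdMap sortMap) {
        float[][] sorted = new float[vectors.length][];
        for (int newOrd = 0; newOrd < vectors.length; newOrd++) {
            sorted[newOrd] = vectors[sortMap.newToOld(newOrd)];
        }
        return sorted;
    }

    public static void main(String[] args) {
        // Vectors in the order they were buffered during indexing.
        float[][] buffered = { {3f, 3f}, {1f, 1f}, {2f, 2f} };
        // Assumed sort result: sorted doc 0 was buffered doc 1, and so on.
        int[] newToOld = {1, 2, 0};
        float[][] sorted = reorder(buffered, ord -> newToOld[ord]);
        // An index built over 'sorted' has node ordinals in sorted document order.
        System.out.println(Arrays.deepToString(sorted));
    }
}
```

Building the index over the reordered array, rather than remapping ordinals at search time, keeps the search path unchanged; whether the actual change takes this exact approach is not shown in this conversation.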

@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine added the Team:Search Relevance label (Meta label for the Search Relevance team in Elasticsearch) Nov 16, 2025
@elasticsearchmachine
Collaborator

Hi @mayya-sharipova, I've created a changelog YAML for you.

Contributor

@ldematte ldematte left a comment

The change looks good to me. I'm not a Lucene expert, so I can't say whether it's the right way to do it; I'll trust you and Chris on this.
You probably want to merge in changes from #138155 and add the test-gpu flag to be sure tests pass/are OK.

Contributor

@ChrisHegarty ChrisHegarty left a comment

The changes look good to me. I guess I'm surprised not to see some changes to ES92GpuHnswVectorsFormatTests! Are there no sorting tests there?

@mayya-sharipova
Contributor Author

> The changes look good to me. I guess I'm surprised not to see some changes to ES92GpuHnswVectorsFormatTests! Are there no sorting tests there?

These tests rely on the tests inside BaseKnnVectorsFormatTestCase, such as testSortedIndex, but testSortedIndex doesn't exercise KNN searches that use graphs; it only iterates over vector values.

@ChrisHegarty Are you suggesting we add tests to ES92GpuHnswVectorsFormatTests? I can do that.
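
For readers following the testing point, here is a rough sketch of the kind of check an exact KNN search allows on a sorted index. It is not taken from the PR's tests: the class, `exactKnn`, and the data are hypothetical, and it assumes dense ordinals and squared Euclidean distance. The vectors are constructed so that the expected nearest neighbors are known in sorted order, which gives a reference that graph-based (approximate) results could be compared against.

```java
import java.util.Arrays;
import java.util.Comparator;

public class SortedKnnCheckSketch {

    /** Squared Euclidean distance between two vectors of equal length. */
    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Exact (brute-force) search: returns the k closest ordinals, nearest first. */
    static int[] exactKnn(float[][] vectors, float[] query, int k) {
        Integer[] ords = new Integer[vectors.length];
        for (int i = 0; i < ords.length; i++) {
            ords[i] = i;
        }
        Arrays.sort(ords, Comparator.comparingDouble(
            (Integer o) -> squaredDistance(vectors[o], query)));
        int[] top = new int[Math.min(k, ords.length)];
        for (int i = 0; i < top.length; i++) {
            top[i] = ords[i];
        }
        return top;
    }

    public static void main(String[] args) {
        // Vectors already in sorted document order: sorted doc i holds the vector {i}.
        float[][] sortedVectors = { {0f}, {1f}, {2f}, {3f}, {4f} };
        // For a query at the origin, the nearest sorted docs should be 0, 1, 2.
        int[] top = exactKnn(sortedVectors, new float[] {0f}, 3);
        if (!Arrays.equals(top, new int[] {0, 1, 2})) {
            throw new AssertionError("unexpected neighbors: " + Arrays.toString(top));
        }
        System.out.println(Arrays.toString(top));
    }
}
```

If the graph were built over unsorted ordinals, a graph search would return ordinals that point at the wrong sorted documents, and a comparison against this exact result would expose the mismatch.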

@ChrisHegarty
Contributor

First, I think that this PR is good to be merged as-is.

> These tests rely on the tests inside BaseKnnVectorsFormatTestCase, such as testSortedIndex, but testSortedIndex doesn't exercise KNN searches that use graphs; it only iterates over vector values.

Right. It surprises me that sorting was completely unimplemented, and that no scenarios in BaseKnnVectorsFormatTestCase needed to be bypassed (and then re-enabled) or something similar. This seems like a gap in BaseKnnVectorsFormatTestCase, no?

@mayya-sharipova
Contributor Author

Thanks, Chris. I will merge this PR and look into adding more sorted-index tests to BaseKnnVectorsFormatTestCase.

@mayya-sharipova added the auto-backport label (Automatically create backport pull requests when merged) Nov 18, 2025
@mayya-sharipova mayya-sharipova merged commit 6e34147 into elastic:main Nov 18, 2025
34 checks passed
@mayya-sharipova mayya-sharipova deleted the gpu-sort-map branch November 18, 2025 13:46
@elasticsearchmachine
Collaborator

💚 Backport successful

Branch: 9.2

mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request Nov 18, 2025
elasticsearchmachine pushed a commit that referenced this pull request Nov 19, 2025

Labels

auto-backport, >bug, :Search Relevance/Vectors, Team:Search Relevance, v9.2.2, v9.3.0

4 participants