Support kNN vectors in disk usage action #88785

Merged
jtibshirani merged 18 commits into elastic:main from vector-disk-usage on Jul 26, 2022

jtibshirani
Contributor

@jtibshirani jtibshirani commented Jul 25, 2022

This change adds support for kNN vector fields to the _disk_usage API. The
strategy:

  • Iterate the vector values (using the same strategy as for doc values) to
    estimate the vector data size
  • Run some random vector searches to estimate the vector index size
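
The two steps above can be illustrated with a toy model. This is only a sketch and not the Elasticsearch or Lucene implementation: it assumes the estimate comes from counting the bytes a reader touches, and every name in it (`TrackingFile`, `read_vector`, `greedy_search`, the flat file layouts) is hypothetical. The random entry point is a toy choice to widen coverage; a real HNSW search starts from a fixed entry node.

```python
import random
import struct


class TrackingFile:
    """Hypothetical wrapper that records which bytes of a buffer were read."""

    def __init__(self, data: bytes):
        self._data = data
        self._touched = bytearray(len(data))  # 1 where a byte has been read

    def read(self, offset: int, length: int) -> bytes:
        end = min(offset + length, len(self._data))
        for i in range(offset, end):
            self._touched[i] = 1
        return self._data[offset:end]

    def bytes_touched(self) -> int:
        return sum(self._touched)


DIMS, NUM_VECTORS, NUM_NEIGHBORS = 8, 200, 4
rng = random.Random(0)

# Toy "vector data" file: float32 vectors stored back to back.
vectors = [[rng.random() for _ in range(DIMS)] for _ in range(NUM_VECTORS)]
vec_file = TrackingFile(b"".join(struct.pack(f"<{DIMS}f", *v) for v in vectors))

# Toy "vector index" file: each node stores a fixed number of int32 neighbor ids.
neighbors = [[rng.randrange(NUM_VECTORS) for _ in range(NUM_NEIGHBORS)]
             for _ in range(NUM_VECTORS)]
graph_file = TrackingFile(
    b"".join(struct.pack(f"<{NUM_NEIGHBORS}i", *n) for n in neighbors))


def read_vector(doc: int):
    size = DIMS * 4
    return struct.unpack(f"<{DIMS}f", vec_file.read(doc * size, size))


def read_neighbors(node: int):
    size = NUM_NEIGHBORS * 4
    return struct.unpack(f"<{NUM_NEIGHBORS}i", graph_file.read(node * size, size))


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_search(query, entry: int, max_hops: int = 50) -> int:
    """Walk the toy graph toward the query, reading neighbor lists as we go."""
    current = entry
    best = squared_distance(query, read_vector(current))
    for _ in range(max_hops):
        improved = False
        for n in read_neighbors(current):
            d = squared_distance(query, read_vector(n))
            if d < best:
                best, current, improved = d, n, True
        if not improved:
            break
    return current


# Step 1: iterate every vector; the bytes touched estimate the vector data size.
for doc in range(NUM_VECTORS):
    read_vector(doc)
vector_data_estimate = vec_file.bytes_touched()

# Step 2: run random searches; the graph bytes touched estimate the index size.
for _ in range(20):
    query = [rng.random() for _ in range(DIMS)]
    greedy_search(query, entry=rng.randrange(NUM_VECTORS))
vector_index_estimate = graph_file.bytes_touched()

print(vector_data_estimate, vector_index_estimate)
```

In this toy, the data estimate is exact because every vector is read once, while the index estimate is a lower bound that depends on how much of the graph the random searches happen to reach.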

Co-authored-by: Yannick Welsch <yannick@welsch.lu>

Closes #84801

@jtibshirani added the >enhancement, :Search/Search, and v8.4.0 labels on Jul 25, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine added the Team:Search label on Jul 25, 2022
@elasticsearchmachine
Collaborator

Hi @jtibshirani, I've created a changelog YAML for you.

@jtibshirani
Contributor Author

I tested this tool on the dense_vector rally track, with 1 million dense vectors indexed across 2 shards. The size estimate was quite close to the actual size of the index files:

disk usage action: 927MB
| vectors vector total | | 927.298 | MB |

index files: 929MB

275M	0/index/_8_lucene92HnswVectorsFormat_0.vec
96K	0/index/_8_lucene92HnswVectorsFormat_0.vem
189M	0/index/_8_lucene92HnswVectorsFormat_0.vex

275M	1/index/_6_lucene92HnswVectorsFormat_0.vec
96K	1/index/_6_lucene92HnswVectorsFormat_0.vem
190M	1/index/_6_lucene92HnswVectorsFormat_0.vex
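
As a quick check of the comparison above, the listed file sizes can be summed and set against the reported estimate. This sketch assumes the `M` and `K` suffixes in the listing and the `MB` in the disk usage output use the same base unit, and the listed sizes are already rounded, so the result is approximate.

```python
# File sizes copied from the listing above, converted to MB (K taken as 1/1024 MB).
KB = 1 / 1024
file_sizes_mb = [
    275, 96 * KB, 189,  # shard 0: .vec, .vem, .vex
    275, 96 * KB, 190,  # shard 1: .vec, .vem, .vex
]

on_disk_mb = sum(file_sizes_mb)
estimate_mb = 927.298  # "vectors vector total" reported by the disk usage action

relative_gap = (on_disk_mb - estimate_mb) / on_disk_mb
print(f"{on_disk_mb:.1f} MB on disk, gap {relative_gap:.2%}")
```

Under those assumptions the estimate falls short of the on-disk total by roughly 0.2%.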

@elasticsearchmachine
Collaborator

Hi @jtibshirani, I've updated the changelog YAML for you.

Member

@dnhatn dnhatn left a comment

LGTM. Thanks @jtibshirani.

@jtibshirani
Contributor Author

Thanks @dnhatn for the review! I plan to merge tomorrow (unless there are other comments) so that we're sure to make 8.4.

Contributor

@mayya-sharipova mayya-sharipova left a comment

@jtibshirani Thanks for making progress on this. LGTM as well!

@jtibshirani jtibshirani merged commit abd561a into elastic:main Jul 26, 2022
@jtibshirani jtibshirani deleted the vector-disk-usage branch July 26, 2022 14:57
Development

Successfully merging this pull request may close these issues.

Add support for vectors to the disk usage API
6 participants