
Conversation

@thecoop (Member) commented Nov 7, 2025

Add docs for `bfloat16` and `on_disk_rescore` for the changes in elastic/elasticsearch#138492

@thecoop force-pushed the new-vector-options-docs branch from 7f600ab to ea5d10d on November 24, 2025 12:07
@thecoop marked this pull request as ready for review on November 24, 2025 12:07
@thecoop requested review from a team as code owners on November 24, 2025 12:07
@github-actions bot commented:

Vale Linting Results

Summary: 3 warnings, 12 suggestions found

⚠️ Warnings (3)

| File | Line | Rule | Message |
| --- | --- | --- | --- |
| deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md | 131 | Elastic.DontUse | Don't use 'very'. |
| deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md | 133 | Elastic.DontUse | Don't use 'Note that'. |
| solutions/search/vector/knn.md | 1247 | Elastic.DontUse | Don't use 'Note that'. |

💡 Suggestions (12)

| File | Line | Rule | Message |
| --- | --- | --- | --- |
| deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md | 131 | Elastic.WordChoice | Consider using 'can, might' instead of 'may', unless the term is in the UI. |
| deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md | 131 | Elastic.WordChoice | Consider using 'refer to (if it's a document), view (if it's a UI element)' instead of 'see', unless the term is in the UI. |
| deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md | 131 | Elastic.Acronyms | 'HNSW' has no definition. |
| deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md | 133 | Elastic.FutureTense | 'will need' might be in future tense. Write in the present tense to describe the state of the product as it is now. |
| solutions/search/vector/knn.md | 328 | Elastic.Capitalization | 'BFloat16 vector encoding [knn-search-bfloat16]' should use sentence-style capitalization. |
| solutions/search/vector/knn.md | 333 | Elastic.FutureTense | 'will automatically' might be in future tense. Write in the present tense to describe the state of the product as it is now. |
| solutions/search/vector/knn.md | 335 | Elastic.WordChoice | Consider using 'can, might' instead of 'may', unless the term is in the UI. |
| solutions/search/vector/knn.md | 1245 | Elastic.FutureTense | 'will read' might be in future tense. Write in the present tense to describe the state of the product as it is now. |
| solutions/search/vector/knn.md | 1245 | Elastic.WordChoice | Consider using 'can, might' instead of 'may', unless the term is in the UI. |
| solutions/search/vector/knn.md | 1245 | Elastic.FutureTense | 'will read' might be in future tense. Write in the present tense to describe the state of the product as it is now. |
| solutions/search/vector/knn.md | 1247 | Elastic.FutureTense | 'will only' might be in future tense. Write in the present tense to describe the state of the product as it is now. |
| solutions/search/vector/knn.md | 1247 | Elastic.Semicolons | Use semicolons judiciously. |

@shainaraskas (Collaborator) commented:
Taking a look at this, but I poked quickly into elastic/elasticsearch#138492 first.

Your doc change here makes it seem like bfloat16 has always been an option, even though you mention it only went GA in 9.3. Those lists need to be refactored so it's clear when bfloat16 became available. Let me know if you need a hand with that.

@shainaraskas (Collaborator) left a review:

Generally looks good. Provided some recommended style edits for you.



- ## Use Direct IO when the vector data does not fit in RAM
+ ## Use on-disk rescoring when the vector data does not fit in RAM
@shainaraskas (Collaborator):

is the old direct IO guidance still valid in 9.1? did your team plan to not move forward with it (i.e. it is going from preview to removed)? Wonder if we need to keep it visible, but if it is going from preview to removed, your approach of just editing it out is ok.

@thecoop (Member, Author):

The guidance is the same, but how you use it has changed. The JVM option has been replaced with an index setting.

```{applies_to}
serverless: unavailable
```
- If your indices are of type `bbq_hnsw` and your nodes don't have enough off-heap RAM to store all vector data in memory, you may see very high query latencies. Vector data includes the HNSW graph, quantized vectors, and raw float32 vectors.
+ If you use quantized indices and your nodes don't have enough off-heap RAM to store all vector data in memory, you may see very high query latencies. Vector data includes the HNSW graph, quantized vectors, and raw float vectors.
@shainaraskas (Collaborator):

Suggested change
- If you use quantized indices and your nodes don't have enough off-heap RAM to store all vector data in memory, you may see very high query latencies. Vector data includes the HNSW graph, quantized vectors, and raw float vectors.
+ If you use quantized indices and your nodes don't have enough off-heap RAM to store all vector data in memory, then you might experience high query latencies. Vector data includes the HNSW graph, quantized vectors, and raw float vectors.

- In these scenarios, direct IO can significantly reduce query latency. Enable it by setting the JVM option `vector.rescoring.directio=true` on all vector search nodes in your cluster.
- Only use this option if you're experiencing very high query latencies on indices of type `bbq_hnsw`. Otherwise, enabling direct IO may increase your query latencies.
+ In these scenarios, on-disk rescoring can significantly reduce query latency. Enable it by setting the `on_disk_rescore: true` option on your vector indices. Note that your data will need to be re-indexed or force-merged to use the new setting in subsequent searches.
@shainaraskas (Collaborator):

Suggested change
- In these scenarios, on-disk rescoring can significantly reduce query latency. Enable it by setting the `on_disk_rescore: true` option on your vector indices. Note that your data will need to be re-indexed or force-merged to use the new setting in subsequent searches.
+ In these scenarios, on-disk rescoring can significantly reduce query latency. Enable it by setting the `on_disk_rescore: true` option on your vector indices. Your data must be re-indexed or force-merged to use the new setting in subsequent searches.
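
Not part of the suggested text, but for orientation: a minimal sketch of applying the option to an existing index, assuming the setting is set through the index settings API under exactly the name used in the doc text. The index name is a placeholder, and whether the setting can be changed on a live index is an assumption here.

```console
# Placeholder index name; assumes `on_disk_rescore` is updatable via _settings.
PUT my-vector-index/_settings
{
  "index": {
    "on_disk_rescore": true
  }
}
```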

```{applies_to}
stack: ga 9.3
```
Instead of storing raw vectors as 4-byte values, you can use `element_type: bfloat16` to store each dimension as a 2-byte value. This can be useful if your indexed vectors are at bfloat16 precision already, or if you wish to reduce the disk space required to store vector data. Elasticsearch will automatically truncate 4-byte float values to 2-byte bfloat16 values when indexing vectors.
@shainaraskas (Collaborator):

Suggested change
- Instead of storing raw vectors as 4-byte values, you can use `element_type: bfloat16` to store each dimension as a 2-byte value. This can be useful if your indexed vectors are at bfloat16 precision already, or if you wish to reduce the disk space required to store vector data. Elasticsearch will automatically truncate 4-byte float values to 2-byte bfloat16 values when indexing vectors.
+ Instead of storing raw vectors as 4-byte values, you can use `element_type: bfloat16` to store each dimension as a 2-byte value. This can be useful if your indexed vectors are at bfloat16 precision already, or if you want to reduce the disk space required to store vector data. When this element type is used, {{es}} automatically truncates 4-byte float values to 2-byte bfloat16 values when indexing vectors.

Due to the reduced precision of bfloat16, any vectors retrieved from the index may have slightly different values to those originally indexed.
@shainaraskas (Collaborator):

Suggested change
- Due to the reduced precision of bfloat16, any vectors retrieved from the index may have slightly different values to those originally indexed.
+ Due to the reduced precision of bfloat16, any vectors retrieved from the index might have slightly different values to those originally indexed.
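
For orientation, a sketch of what a mapping using the documented `element_type: bfloat16` value might look like; the index name, field name, and dimension count are illustrative placeholders, not from the PR.

```console
# Index name, field name, and dims are illustrative placeholders.
PUT bfloat16-demo
{
  "mappings": {
    "properties": {
      "embedding": {
        "type": "dense_vector",
        "dims": 384,
        "element_type": "bfloat16"
      }
    }
  }
}
```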

```{applies_to}
serverless: unavailable
```

By default, Elasticsearch will read raw vector data into memory to perform rescoring. This may have an effect on performance if the vector data is too large to all fit in off-heap memory at once. By specifying the `on_disk_rescore: true` index setting, Elasticsearch will read vector data from disk directly during rescoring.
@shainaraskas (Collaborator):

Suggested change
- By default, Elasticsearch will read raw vector data into memory to perform rescoring. This may have an effect on performance if the vector data is too large to all fit in off-heap memory at once. By specifying the `on_disk_rescore: true` index setting, Elasticsearch will read vector data from disk directly during rescoring.
+ By default, {{es}} reads raw vector data into memory to perform rescoring. This can have an effect on performance if the vector data is too large to all fit in off-heap memory at once. When the `on_disk_rescore: true` index setting is set, {{es}} reads vector data directly from disk during rescoring.


Note that this setting will only apply to newly indexed vectors; to apply the option to all vectors in the index, the vectors must be re-indexed or force-merged after changing the setting.
@shainaraskas (Collaborator):

Suggested change
- Note that this setting will only apply to newly indexed vectors; to apply the option to all vectors in the index, the vectors must be re-indexed or force-merged after changing the setting.
+ This setting only applies to newly indexed vectors. To apply the option to all vectors in the index, the vectors must be re-indexed or force-merged after changing the setting.
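
A hedged sketch of the two paths described above, with placeholder index names. `_forcemerge` and `_reindex` are standard {{es}} APIs, but whether a force merge alone suffices here follows the doc text rather than anything verified independently.

```console
# Option 1: rewrite existing segments so all vectors pick up the new setting.
POST my-vector-index/_forcemerge?max_num_segments=1

# Option 2: reindex into a new index created with the setting already applied.
POST _reindex
{
  "source": { "index": "my-vector-index" },
  "dest": { "index": "my-vector-index-v2" }
}
```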

@thecoop (Member, Author) commented Nov 25, 2025

@shainaraskas I'm not sure of the best way to handle the docs in elastic/elasticsearch#138492 to make it clear bfloat16 is in 9.3+, other than specifying "and, from 9.3, bfloat16" in every single reference to it, which seems overkill.
