
Conversation

@thecoop (Member) commented Nov 7, 2025

Add docs for `bfloat16` and `on_disk_rescore` for the changes in elastic/elasticsearch#138492

@thecoop force-pushed the new-vector-options-docs branch from 7f600ab to ea5d10d on November 24, 2025 12:07
@thecoop marked this pull request as ready for review on November 24, 2025 12:07
@thecoop requested review from a team as code owners on November 24, 2025 12:07
@github-actions bot commented:

Vale Linting Results

Summary: 3 warnings, 12 suggestions found

⚠️ Warnings (3)

| File | Line | Rule | Message |
| --- | --- | --- | --- |
| deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md | 131 | Elastic.DontUse | Don't use 'very'. |
| deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md | 133 | Elastic.DontUse | Don't use 'Note that'. |
| solutions/search/vector/knn.md | 1247 | Elastic.DontUse | Don't use 'Note that'. |

💡 Suggestions (12)

| File | Line | Rule | Message |
| --- | --- | --- | --- |
| deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md | 131 | Elastic.WordChoice | Consider using 'can, might' instead of 'may', unless the term is in the UI. |
| deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md | 131 | Elastic.WordChoice | Consider using 'refer to (if it's a document), view (if it's a UI element)' instead of 'see', unless the term is in the UI. |
| deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md | 131 | Elastic.Acronyms | 'HNSW' has no definition. |
| deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md | 133 | Elastic.FutureTense | 'will need' might be in future tense. Write in the present tense to describe the state of the product as it is now. |
| solutions/search/vector/knn.md | 328 | Elastic.Capitalization | 'BFloat16 vector encoding [knn-search-bfloat16]' should use sentence-style capitalization. |
| solutions/search/vector/knn.md | 333 | Elastic.FutureTense | 'will automatically' might be in future tense. Write in the present tense to describe the state of the product as it is now. |
| solutions/search/vector/knn.md | 335 | Elastic.WordChoice | Consider using 'can, might' instead of 'may', unless the term is in the UI. |
| solutions/search/vector/knn.md | 1245 | Elastic.FutureTense | 'will read' might be in future tense. Write in the present tense to describe the state of the product as it is now. |
| solutions/search/vector/knn.md | 1245 | Elastic.WordChoice | Consider using 'can, might' instead of 'may', unless the term is in the UI. |
| solutions/search/vector/knn.md | 1245 | Elastic.FutureTense | 'will read' might be in future tense. Write in the present tense to describe the state of the product as it is now. |
| solutions/search/vector/knn.md | 1247 | Elastic.FutureTense | 'will only' might be in future tense. Write in the present tense to describe the state of the product as it is now. |
| solutions/search/vector/knn.md | 1247 | Elastic.Semicolons | Use semicolons judiciously. |

@shainaraskas (Collaborator) commented:
Taking a look at this, but I poked quickly into elastic/elasticsearch#138492 first.

Your doc change here makes it seem like bfloat16 has always been an option, even though you mention it only went GA in 9.3. Those lists need to be refactored so it's clear when bfloat16 became available. Let me know if you need a hand with that.

@shainaraskas (Collaborator) left a review:

Generally looks good. Provided some recommended style edits for you.



- ## Use Direct IO when the vector data does not fit in RAM
+ ## Use on-disk rescoring when the vector data does not fit in RAM
@shainaraskas (Collaborator):

is the old direct IO guidance still valid in 9.1? did your team plan to not move forward with it (i.e. it is going from preview to removed)? Wonder if we need to keep it visible, but if it is going from preview to removed, your approach of just editing it out is ok.

@thecoop (Member, Author):

The guidance is the same, but how you use it has changed. The JVM option has been replaced with an index setting.

```{applies_to}
serverless: unavailable
```
- If your indices are of type `bbq_hnsw` and your nodes don't have enough off-heap RAM to store all vector data in memory, you may see very high query latencies. Vector data includes the HNSW graph, quantized vectors, and raw float32 vectors.
+ If you use quantized indices and your nodes don't have enough off-heap RAM to store all vector data in memory, you may see very high query latencies. Vector data includes the HNSW graph, quantized vectors, and raw float vectors.
@shainaraskas (Collaborator):

Suggested change
- If you use quantized indices and your nodes don't have enough off-heap RAM to store all vector data in memory, you may see very high query latencies. Vector data includes the HNSW graph, quantized vectors, and raw float vectors.
+ If you use quantized indices and your nodes don't have enough off-heap RAM to store all vector data in memory, then you might experience high query latencies. Vector data includes the HNSW graph, quantized vectors, and raw float vectors.

- In these scenarios, direct IO can significantly reduce query latency. Enable it by setting the JVM option `vector.rescoring.directio=true` on all vector search nodes in your cluster.
- Only use this option if you're experiencing very high query latencies on indices of type `bbq_hnsw`. Otherwise, enabling direct IO may increase your query latencies.
+ In these scenarios, on-disk rescoring can significantly reduce query latency. Enable it by setting the `on_disk_rescore: true` option on your vector indices. Note that your data will need to be re-indexed or force-merged to use the new setting in subsequent searches.
@shainaraskas (Collaborator):

Suggested change
- In these scenarios, on-disk rescoring can significantly reduce query latency. Enable it by setting the `on_disk_rescore: true` option on your vector indices. Note that your data will need to be re-indexed or force-merged to use the new setting in subsequent searches.
+ In these scenarios, on-disk rescoring can significantly reduce query latency. Enable it by setting the `on_disk_rescore: true` option on your vector indices. Your data must be re-indexed or force-merged to use the new setting in subsequent searches.
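
Not part of the suggested text, but for orientation: a minimal sketch of applying the option to an existing index, assuming the setting is set through the index settings API under exactly the name used in the doc text. The index name is a placeholder, and whether the setting can be changed on a live index is an assumption here.

```console
# Placeholder index name; assumes `on_disk_rescore` is updatable via _settings.
PUT my-vector-index/_settings
{
  "index": {
    "on_disk_rescore": true
  }
}
```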

```{applies_to}
stack: ga 9.3
```
Instead of storing raw vectors as 4-byte values, you can use `element_type: bfloat16` to store each dimension as a 2-byte value. This can be useful if your indexed vectors are at bfloat16 precision already, or if you wish to reduce the disk space required to store vector data. Elasticsearch will automatically truncate 4-byte float values to 2-byte bfloat16 values when indexing vectors.
@shainaraskas (Collaborator):

Suggested change
- Instead of storing raw vectors as 4-byte values, you can use `element_type: bfloat16` to store each dimension as a 2-byte value. This can be useful if your indexed vectors are at bfloat16 precision already, or if you wish to reduce the disk space required to store vector data. Elasticsearch will automatically truncate 4-byte float values to 2-byte bfloat16 values when indexing vectors.
+ Instead of storing raw vectors as 4-byte values, you can use `element_type: bfloat16` to store each dimension as a 2-byte value. This can be useful if your indexed vectors are at bfloat16 precision already, or if you want to reduce the disk space required to store vector data. When this element type is used, {{es}} automatically truncates 4-byte float values to 2-byte bfloat16 values when indexing vectors.

Due to the reduced precision of bfloat16, any vectors retrieved from the index may have slightly different values to those originally indexed.
@shainaraskas (Collaborator):

Suggested change
- Due to the reduced precision of bfloat16, any vectors retrieved from the index may have slightly different values to those originally indexed.
+ Due to the reduced precision of bfloat16, any vectors retrieved from the index might have slightly different values to those originally indexed.
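
For orientation, a sketch of what a mapping using the documented `element_type: bfloat16` value might look like; the index name, field name, and dimension count are illustrative placeholders, not from the PR.

```console
# Index name, field name, and dims are illustrative placeholders.
PUT bfloat16-demo
{
  "mappings": {
    "properties": {
      "embedding": {
        "type": "dense_vector",
        "dims": 384,
        "element_type": "bfloat16"
      }
    }
  }
}
```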

```{applies_to}
serverless: unavailable
```

By default, Elasticsearch will read raw vector data into memory to perform rescoring. This may have an effect on performance if the vector data is too large to all fit in off-heap memory at once. By specifying the `on_disk_rescore: true` index setting, Elasticsearch will read vector data from disk directly during rescoring.
@shainaraskas (Collaborator):

Suggested change
- By default, Elasticsearch will read raw vector data into memory to perform rescoring. This may have an effect on performance if the vector data is too large to all fit in off-heap memory at once. By specifying the `on_disk_rescore: true` index setting, Elasticsearch will read vector data from disk directly during rescoring.
+ By default, {{es}} reads raw vector data into memory to perform rescoring. This can have an effect on performance if the vector data is too large to all fit in off-heap memory at once. When the `on_disk_rescore: true` index setting is set, {{es}} reads vector data directly from disk during rescoring.


Note that this setting will only apply to newly indexed vectors; to apply the option to all vectors in the index, the vectors must be re-indexed or force-merged after changing the setting.
@shainaraskas (Collaborator):

Suggested change
- Note that this setting will only apply to newly indexed vectors; to apply the option to all vectors in the index, the vectors must be re-indexed or force-merged after changing the setting.
+ This setting only applies to newly indexed vectors. To apply the option to all vectors in the index, the vectors must be re-indexed or force-merged after changing the setting.
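
A hedged sketch of the two paths described above, with placeholder index names. `_forcemerge` and `_reindex` are standard {{es}} APIs, but whether a force merge alone suffices here follows the doc text rather than anything verified independently.

```console
# Option 1: rewrite existing segments so all vectors pick up the new setting.
POST my-vector-index/_forcemerge?max_num_segments=1

# Option 2: reindex into a new index created with the setting already applied.
POST _reindex
{
  "source": { "index": "my-vector-index" },
  "dest": { "index": "my-vector-index-v2" }
}
```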

@thecoop (Member, Author) commented Nov 25, 2025

@shainaraskas I'm not sure of the best way to handle the docs in elastic/elasticsearch#138492 to make it clear bfloat16 is in 9.3+, other than specifying "and, from 9.3, bfloat16" in every single reference to it, which seems overkill.
