From bb909bc0e00dc6d420203fc3e208faa3f04b0faa Mon Sep 17 00:00:00 2001
From: kosabogi <105062005+kosabogi@users.noreply.github.com>
Date: Wed, 13 Nov 2024 14:14:56 +0100
Subject: [PATCH] Updates chunk settings documentation (#116719)

(cherry picked from commit bada2a60ed8561d80cdfd61b28883b6a7002b023)
---
 docs/reference/mapping/types/semantic-text.asciidoc | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/docs/reference/mapping/types/semantic-text.asciidoc b/docs/reference/mapping/types/semantic-text.asciidoc
index ac23c153e01a3..684ad7c369e7d 100644
--- a/docs/reference/mapping/types/semantic-text.asciidoc
+++ b/docs/reference/mapping/types/semantic-text.asciidoc
@@ -87,7 +87,7 @@ Trying to <> that is used on a
 
 [discrete]
 [[auto-text-chunking]]
-==== Automatic text chunking
+==== Text chunking
 
 {infer-cap} endpoints have a limit on the amount of text they can process.
 To allow for large amounts of text to be used in semantic search, `semantic_text` automatically generates smaller passages if needed, called _chunks_.
@@ -95,8 +95,7 @@ To allow for large amounts of text to be used in semantic search, `semantic_text
 Each chunk will include the text subpassage and the corresponding embedding generated from it.
 When querying, the individual passages will be automatically searched for each document, and the most relevant passage will be used to compute a score.
 
-Documents are split into 250-word sections with a 100-word overlap so that each section shares 100 words with the previous section.
-This overlap ensures continuity and prevents vital contextual information in the input text from being lost by a hard break.
+For more details on chunking and how to configure chunking settings, see <> in the Inference API documentation.
 
 [discrete]
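The lines removed by this patch describe the default chunking behavior: documents are split into 250-word sections that share a 100-word overlap with the preceding section. That sliding-window scheme can be sketched as follows. This is an illustrative approximation only, not Elasticsearch's actual implementation; the function name and parameters are hypothetical.

```python
def chunk_words(text, size=250, overlap=100):
    """Split text into windows of `size` words, where each window
    shares `overlap` words with the previous one (illustrative sketch,
    not the Elasticsearch implementation)."""
    words = text.split()
    step = size - overlap  # advance 150 words per chunk by default
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the final window already reaches the end of the text
    return chunks
```

With the default parameters, a 400-word document yields two chunks, and the last 100 words of the first chunk are exactly the first 100 words of the second, so no hard break loses the surrounding context.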