diff --git a/docs/reference/inference/inference-apis.asciidoc b/docs/reference/inference/inference-apis.asciidoc index 1206cb02ba89a..38afc7c416f18 100644 --- a/docs/reference/inference/inference-apis.asciidoc +++ b/docs/reference/inference/inference-apis.asciidoc @@ -35,7 +35,6 @@ Elastic –, then create an {infer} endpoint by the <>. Now use <> to perform <> on your data. - [discrete] [[default-enpoints]] === Default {infer} endpoints @@ -53,6 +52,67 @@ For these models, the minimum number of allocations is `0`. If there is no {infer} activity that uses the endpoint, the number of allocations will scale down to `0` automatically after 15 minutes. +[discrete] +[[infer-chunking-config]] +=== Configuring chunking + +{infer-cap} endpoints have a limit on the amount of text they can process at once, determined by the model's input capacity. +Chunking is the process of splitting the input text into pieces that remain within these limits. +It occurs when ingesting documents into <>. +Chunking also helps produce sections that are digestible for humans. +Returning a long document in search results is less useful than providing the most relevant chunk of text. + +Each chunk will include the text subpassage and the corresponding embedding generated from it. + +By default, documents are split into sentences and grouped in sections up to 250 words with 1 sentence overlap so that each chunk shares a sentence with the previous chunk. +Overlapping ensures continuity and prevents vital contextual information in the input text from being lost by a hard break. + +{es} uses the https://unicode-org.github.io/icu-docs/[ICU4J] library to detect word and sentence boundaries for chunking. +https://unicode-org.github.io/icu/userguide/boundaryanalysis/#word-boundary[Word boundaries] are identified by following a series of rules, not just the presence of a whitespace character. +For written languages that do use whitespace such as Chinese or Japanese dictionary lookups are used to detect word boundaries. + + +[discrete] +==== Chunking strategies + +Two strategies are available for chunking: `sentence` and `word`. + +The `sentence` strategy splits the input text at sentence boundaries. +Each chunk contains one or more complete sentences ensuring that the integrity of sentence-level context is preserved, except when a sentence causes a chunk to exceed a word count of `max_chunk_size`, in which case it will be split across chunks. +The `sentence_overlap` option defines the number of sentences from the previous chunk to include in the current chunk which is either `0` or `1`. + +The `word` strategy splits the input text on individual words up to the `max_chunk_size` limit. +The `overlap` option is the number of words from the previous chunk to include in the current chunk. + +The default chunking strategy is `sentence`. + +NOTE: The default chunking strategy for {infer} endpoints created before 8.16 is `word`. + + +[discrete] +==== Example of configuring the chunking behavior + +The following example creates an {infer} endpoint with the `elasticsearch` service that deploys the ELSER model by default and configures the chunking behavior. + +[source,console] +------------------------------------------------------------ +PUT _inference/sparse_embedding/small_chunk_size +{ + "service": "elasticsearch", + "service_settings": { + "num_allocations": 1, + "num_threads": 1 + }, + "chunking_settings": { + "strategy": "sentence", + "max_chunk_size": 100, + "sentence_overlap": 0 + } +} +------------------------------------------------------------ +// TEST[skip:TBD] + + include::delete-inference.asciidoc[] include::get-inference.asciidoc[] include::post-inference.asciidoc[] diff --git a/docs/reference/inference/inference-shared.asciidoc b/docs/reference/inference/inference-shared.asciidoc index 2eafa3434e89e..da497c6581e5d 100644 --- a/docs/reference/inference/inference-shared.asciidoc +++ b/docs/reference/inference/inference-shared.asciidoc @@ -31,4 +31,36 @@ end::task-settings[] tag::task-type[] The type of the {infer} task that the model will perform. -end::task-type[] \ No newline at end of file +end::task-type[] + +tag::chunking-settings[] +Chunking configuration object. +Refer to <> to learn more about chunking. +end::chunking-settings[] + +tag::chunking-settings-max-chunking-size[] +Specifies the maximum size of a chunk in words. +Defaults to `250`. +This value cannot be higher than `300` or lower than `20` (for `sentence` strategy) or `10` (for `word` strategy). +end::chunking-settings-max-chunking-size[] + +tag::chunking-settings-overlap[] +Only for `word` chunking strategy. +Specifies the number of overlapping words for chunks. +Defaults to `100`. +This value cannot be higher than the half of `max_chunking_size`. +end::chunking-settings-overlap[] + +tag::chunking-settings-sentence-overlap[] +Only for `sentence` chunking strategy. +Specifies the numnber of overlapping sentences for chunks. +It can be either `1` or `0`. +Defaults to `1`. +end::chunking-settings-sentence-overlap[] + +tag::chunking-settings-strategy[] +Specifies the chunking strategy. +It could be either `sentence` or `word`. +end::chunking-settings-strategy[] + + diff --git a/docs/reference/inference/service-alibabacloud-ai-search.asciidoc b/docs/reference/inference/service-alibabacloud-ai-search.asciidoc index 0607b56b528ea..c3ff40a39cd86 100644 --- a/docs/reference/inference/service-alibabacloud-ai-search.asciidoc +++ b/docs/reference/inference/service-alibabacloud-ai-search.asciidoc @@ -34,6 +34,26 @@ Available task types: [[infer-service-alibabacloud-ai-search-api-request-body]] ==== {api-request-body-title} +`chunking_settings`:: +(Optional, object) +include::inference-shared.asciidoc[tag=chunking-settings] + +`max_chunking_size`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size] + +`overlap`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-overlap] + +`sentence_overlap`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap] + +`strategy`::: +(Optional, string) +include::inference-shared.asciidoc[tag=chunking-settings-strategy] + `service`:: (Required, string) The type of service supported for the specified task type. In this case, @@ -108,7 +128,6 @@ To modify this, set the `requests_per_minute` setting of this object in your ser include::inference-shared.asciidoc[tag=request-per-minute-example] -- - `task_settings`:: (Optional, object) include::inference-shared.asciidoc[tag=task-settings] diff --git a/docs/reference/inference/service-amazon-bedrock.asciidoc b/docs/reference/inference/service-amazon-bedrock.asciidoc index dbffd5c26fbcc..761777e32f8e0 100644 --- a/docs/reference/inference/service-amazon-bedrock.asciidoc +++ b/docs/reference/inference/service-amazon-bedrock.asciidoc @@ -32,6 +32,26 @@ Available task types: [[infer-service-amazon-bedrock-api-request-body]] ==== {api-request-body-title} +`chunking_settings`:: +(Optional, object) +include::inference-shared.asciidoc[tag=chunking-settings] + +`max_chunking_size`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size] + +`overlap`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-overlap] + +`sentence_overlap`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap] + +`strategy`::: +(Optional, string) +include::inference-shared.asciidoc[tag=chunking-settings-strategy] + `service`:: (Required, string) The type of service supported for the specified task type. In this case, diff --git a/docs/reference/inference/service-anthropic.asciidoc b/docs/reference/inference/service-anthropic.asciidoc index 41419db7a6069..7fb3d1d5bea34 100644 --- a/docs/reference/inference/service-anthropic.asciidoc +++ b/docs/reference/inference/service-anthropic.asciidoc @@ -32,6 +32,26 @@ Available task types: [[infer-service-anthropic-api-request-body]] ==== {api-request-body-title} +`chunking_settings`:: +(Optional, object) +include::inference-shared.asciidoc[tag=chunking-settings] + +`max_chunking_size`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size] + +`overlap`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-overlap] + +`sentence_overlap`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap] + +`strategy`::: +(Optional, string) +include::inference-shared.asciidoc[tag=chunking-settings-strategy] + `service`:: (Required, string) The type of service supported for the specified task type. In this case, diff --git a/docs/reference/inference/service-azure-ai-studio.asciidoc b/docs/reference/inference/service-azure-ai-studio.asciidoc index 0d711a0d6171f..dd13a3e59aae5 100644 --- a/docs/reference/inference/service-azure-ai-studio.asciidoc +++ b/docs/reference/inference/service-azure-ai-studio.asciidoc @@ -33,6 +33,26 @@ Available task types: [[infer-service-azure-ai-studio-api-request-body]] ==== {api-request-body-title} +`chunking_settings`:: +(Optional, object) +include::inference-shared.asciidoc[tag=chunking-settings] + +`max_chunking_size`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size] + +`overlap`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-overlap] + +`sentence_overlap`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap] + +`strategy`::: +(Optional, string) +include::inference-shared.asciidoc[tag=chunking-settings-strategy] + `service`:: (Required, string) The type of service supported for the specified task type. In this case, diff --git a/docs/reference/inference/service-azure-openai.asciidoc b/docs/reference/inference/service-azure-openai.asciidoc index 6f03c5966d9e6..b134e2b687f6c 100644 --- a/docs/reference/inference/service-azure-openai.asciidoc +++ b/docs/reference/inference/service-azure-openai.asciidoc @@ -33,6 +33,26 @@ Available task types: [[infer-service-azure-openai-api-request-body]] ==== {api-request-body-title} +`chunking_settings`:: +(Optional, object) +include::inference-shared.asciidoc[tag=chunking-settings] + +`max_chunking_size`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size] + +`overlap`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-overlap] + +`sentence_overlap`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap] + +`strategy`::: +(Optional, string) +include::inference-shared.asciidoc[tag=chunking-settings-strategy] + `service`:: (Required, string) The type of service supported for the specified task type. In this case, diff --git a/docs/reference/inference/service-cohere.asciidoc b/docs/reference/inference/service-cohere.asciidoc index 84eae6e880617..1a815e3c45f36 100644 --- a/docs/reference/inference/service-cohere.asciidoc +++ b/docs/reference/inference/service-cohere.asciidoc @@ -34,6 +34,26 @@ Available task types: [[infer-service-cohere-api-request-body]] ==== {api-request-body-title} +`chunking_settings`:: +(Optional, object) +include::inference-shared.asciidoc[tag=chunking-settings] + +`max_chunking_size`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size] + +`overlap`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-overlap] + +`sentence_overlap`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap] + +`strategy`::: +(Optional, string) +include::inference-shared.asciidoc[tag=chunking-settings-strategy] + `service`:: (Required, string) The type of service supported for the specified task type. In this case, diff --git a/docs/reference/inference/service-elasticsearch.asciidoc b/docs/reference/inference/service-elasticsearch.asciidoc index 259779a12134d..0103b425faefe 100644 --- a/docs/reference/inference/service-elasticsearch.asciidoc +++ b/docs/reference/inference/service-elasticsearch.asciidoc @@ -36,6 +36,26 @@ Available task types: [[infer-service-elasticsearch-api-request-body]] ==== {api-request-body-title} +`chunking_settings`:: +(Optional, object) +include::inference-shared.asciidoc[tag=chunking-settings] + +`max_chunking_size`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size] + +`overlap`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-overlap] + +`sentence_overlap`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap] + +`strategy`::: +(Optional, string) +include::inference-shared.asciidoc[tag=chunking-settings-strategy] + `service`:: (Required, string) The type of service supported for the specified task type. In this case, diff --git a/docs/reference/inference/service-elser.asciidoc b/docs/reference/inference/service-elser.asciidoc index 521fab0375584..273d743e47a4b 100644 --- a/docs/reference/inference/service-elser.asciidoc +++ b/docs/reference/inference/service-elser.asciidoc @@ -36,6 +36,26 @@ Available task types: [[infer-service-elser-api-request-body]] ==== {api-request-body-title} +`chunking_settings`:: +(Optional, object) +include::inference-shared.asciidoc[tag=chunking-settings] + +`max_chunking_size`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size] + +`overlap`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-overlap] + +`sentence_overlap`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap] + +`strategy`::: +(Optional, string) +include::inference-shared.asciidoc[tag=chunking-settings-strategy] + `service`:: (Required, string) The type of service supported for the specified task type. In this case, diff --git a/docs/reference/inference/service-google-ai-studio.asciidoc b/docs/reference/inference/service-google-ai-studio.asciidoc index 25aa89cd49110..738fce3d53e9b 100644 --- a/docs/reference/inference/service-google-ai-studio.asciidoc +++ b/docs/reference/inference/service-google-ai-studio.asciidoc @@ -33,6 +33,26 @@ Available task types: [[infer-service-google-ai-studio-api-request-body]] ==== {api-request-body-title} +`chunking_settings`:: +(Optional, object) +include::inference-shared.asciidoc[tag=chunking-settings] + +`max_chunking_size`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size] + +`overlap`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-overlap] + +`sentence_overlap`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap] + +`strategy`::: +(Optional, string) +include::inference-shared.asciidoc[tag=chunking-settings-strategy] + `service`:: (Required, string) The type of service supported for the specified task type. In this case, diff --git a/docs/reference/inference/service-google-vertex-ai.asciidoc b/docs/reference/inference/service-google-vertex-ai.asciidoc index 640553ab74626..34e14e05e072a 100644 --- a/docs/reference/inference/service-google-vertex-ai.asciidoc +++ b/docs/reference/inference/service-google-vertex-ai.asciidoc @@ -33,6 +33,26 @@ Available task types: [[infer-service-google-vertex-ai-api-request-body]] ==== {api-request-body-title} +`chunking_settings`:: +(Optional, object) +include::inference-shared.asciidoc[tag=chunking-settings] + +`max_chunking_size`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size] + +`overlap`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-overlap] + +`sentence_overlap`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap] + +`strategy`::: +(Optional, string) +include::inference-shared.asciidoc[tag=chunking-settings-strategy] + `service`:: (Required, string) The type of service supported for the specified task type. In this case, diff --git a/docs/reference/inference/service-hugging-face.asciidoc b/docs/reference/inference/service-hugging-face.asciidoc index 177a15177d21f..6d8667351a6b4 100644 --- a/docs/reference/inference/service-hugging-face.asciidoc +++ b/docs/reference/inference/service-hugging-face.asciidoc @@ -32,6 +32,26 @@ Available task types: [[infer-service-hugging-face-api-request-body]] ==== {api-request-body-title} +`chunking_settings`:: +(Optional, object) +include::inference-shared.asciidoc[tag=chunking-settings] + +`max_chunking_size`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size] + +`overlap`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-overlap] + +`sentence_overlap`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap] + +`strategy`::: +(Optional, string) +include::inference-shared.asciidoc[tag=chunking-settings-strategy] + `service`:: (Required, string) The type of service supported for the specified task type. In this case, diff --git a/docs/reference/inference/service-mistral.asciidoc b/docs/reference/inference/service-mistral.asciidoc index 077e610191705..244381d107161 100644 --- a/docs/reference/inference/service-mistral.asciidoc +++ b/docs/reference/inference/service-mistral.asciidoc @@ -32,6 +32,26 @@ Available task types: [[infer-service-mistral-api-request-body]] ==== {api-request-body-title} +`chunking_settings`:: +(Optional, object) +include::inference-shared.asciidoc[tag=chunking-settings] + +`max_chunking_size`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size] + +`overlap`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-overlap] + +`sentence_overlap`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap] + +`strategy`::: +(Optional, string) +include::inference-shared.asciidoc[tag=chunking-settings-strategy] + `service`:: (Required, string) The type of service supported for the specified task type. In this case, diff --git a/docs/reference/inference/service-openai.asciidoc b/docs/reference/inference/service-openai.asciidoc index 075e76dc7d741..21643133553e1 100644 --- a/docs/reference/inference/service-openai.asciidoc +++ b/docs/reference/inference/service-openai.asciidoc @@ -33,6 +33,26 @@ Available task types: [[infer-service-openai-api-request-body]] ==== {api-request-body-title} +`chunking_settings`:: +(Optional, object) +include::inference-shared.asciidoc[tag=chunking-settings] + +`max_chunking_size`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size] + +`overlap`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-overlap] + +`sentence_overlap`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap] + +`strategy`::: +(Optional, string) +include::inference-shared.asciidoc[tag=chunking-settings-strategy] + `service`:: (Required, string) The type of service supported for the specified task type. In this case,