From 33e4aee58bcb306dbc23772c7b86b91023c7bbb9 Mon Sep 17 00:00:00 2001
From: kosabogi
Date: Mon, 15 Sep 2025 15:17:25 +0200
Subject: [PATCH 1/5] Adds chunking strategy examples to the Inference
 integrations page

---
 .../elastic-inference/inference-api.md | 91 ++++++++++++++++---
 1 file changed, 76 insertions(+), 15 deletions(-)

diff --git a/explore-analyze/elastic-inference/inference-api.md b/explore-analyze/elastic-inference/inference-api.md
index 0ae5bffa0b..501ac852f3 100644
--- a/explore-analyze/elastic-inference/inference-api.md
+++ b/explore-analyze/elastic-inference/inference-api.md
@@ -107,37 +107,98 @@ By default, documents are split into sentences and grouped in sections up to 250
 
 Several strategies are available for chunking:
 
-`sentence`
-: The `sentence` strategy splits the input text at sentence boundaries. Each chunk contains one or more complete sentences ensuring that the integrity of sentence-level context is preserved, except when a sentence causes a chunk to exceed a word count of `max_chunk_size`, in which case it will be split across chunks. The `sentence_overlap` option defines the number of sentences from the previous chunk to include in the current chunk which is either `0` or `1`.
+#### `sentence`
 
-`word`
-: The `word` strategy splits the input text on individual words up to the `max_chunk_size` limit. The `overlap` option is the number of words from the previous chunk to include in the current chunk.
+The `sentence` strategy splits the input text at sentence boundaries. Each chunk contains one or more complete sentences, ensuring that sentence-level context is preserved, except when a sentence causes a chunk to exceed the `max_chunk_size` word limit, in which case it is split across chunks. The `sentence_overlap` option defines the number of sentences from the previous chunk to include in the current chunk, which is either `0` or `1`.
 
-`recursive`{applies_to}`stack: ga 9.1`
-: The `recursive` strategy splits the input text based on a configurable list of separator patterns (for example, newlines or Markdown headers). The chunker applies these separators in order, recursively splitting any chunk that exceeds the `max_chunk_size` word limit. If no separator produces a small enough chunk, the strategy falls back to sentence-level splitting.
+The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures the chunking behavior with the `sentence` strategy.
 
-`none` {applies_to}`stack: ga 9.1`
-
-: The `none` strategy disables chunking and processes the entire input text as a single block, without any splitting or overlap. When using this strategy, you can instead [pre-chunk](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/semantic-text#auto-text-chunking) the input by providing an array of strings, where each element acts as a separate chunk to be sent directly to the inference service without further chunking.
+```console
+PUT _inference/sparse_embedding/sentence_chunks
+{
+  "service": "elasticsearch",
+  "service_settings": {
+    "model_id": ".elser_model_2",
+    "num_allocations": 1,
+    "num_threads": 1
+  },
+  "chunking_settings": {
+    "strategy": "sentence",
+    "max_chunk_size": 100,
+    "sentence_overlap": 0
+  }
+}
+```
 
 The default chunking strategy is `sentence`.
 
-#### Example of configuring the chunking behavior
+#### `word`
 
-The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model by default and configures the chunking behavior.
+The `word` strategy splits the input text on individual words up to the `max_chunk_size` limit. The `overlap` option is the number of words from the previous chunk to include in the current chunk.
+
+The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures the chunking behavior with the `word` strategy, setting a maximum of 120 words per chunk and an overlap of 40 words between chunks.
 
 ```console
-PUT _inference/sparse_embedding/small_chunk_size
+PUT _inference/sparse_embedding/word_chunks
 {
   "service": "elasticsearch",
   "service_settings": {
+    "model_id": ".elser_model_2",
     "num_allocations": 1,
     "num_threads": 1
   },
   "chunking_settings": {
-    "strategy": "sentence",
-    "max_chunk_size": 100,
-    "sentence_overlap": 0
+    "strategy": "word",
+    "max_chunk_size": 120,
+    "overlap": 40
+  }
+}
+```
+
+#### `recursive`
+
+{applies_to}`stack: ga 9.1`
+
+The `recursive` strategy splits the input text based on a configurable list of separator patterns (for example, newlines or Markdown headers). The chunker applies these separators in order, recursively splitting any chunk that exceeds the `max_chunk_size` word limit. If no separator produces a small enough chunk, the strategy falls back to sentence-level splitting.
+
+The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures chunking with the `recursive` strategy using the markdown separator group and a maximum of 200 words per chunk.
+
+```console
+PUT _inference/sparse_embedding/recursive_markdown_chunks
+{
+  "service": "elasticsearch",
+  "service_settings": {
+    "model_id": ".elser_model_2",
+    "num_allocations": 1,
+    "num_threads": 1
+  },
+  "chunking_settings": {
+    "strategy": "recursive",
+    "max_chunk_size": 200,
+    "separator_group": "markdown"
+  }
+}
+```
+
+#### `none`
+
+{applies_to}`stack: ga 9.1`
+
+The `none` strategy disables chunking and processes the entire input text as a single block, without any splitting or overlap. When using this strategy, you can instead [pre-chunk](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/semantic-text#auto-text-chunking) the input by providing an array of strings, where each element acts as a separate chunk to be sent directly to the inference service without further chunking.
+
+The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the `ELSER` model and disables chunking by setting the strategy to `none`.
+
+```console
+PUT _inference/sparse_embedding/none_chunking
+{
+  "service": "elasticsearch",
+  "service_settings": {
+    "model_id": ".elser_model_2",
+    "num_allocations": 1,
+    "num_threads": 1
+  },
+  "chunking_settings": {
+    "strategy": "none"
   }
 }
 ```

From 687df2ab21fc13ecd852dd8d64e37f91ef36b90b Mon Sep 17 00:00:00 2001
From: kosabogi
Date: Wed, 17 Sep 2025 10:56:48 +0200
Subject: [PATCH 2/5] Applies suggestions from code review

---
 .../elastic-inference/inference-api.md | 38 ++++++++++++++++++-
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/explore-analyze/elastic-inference/inference-api.md b/explore-analyze/elastic-inference/inference-api.md
index 501ac852f3..d400df10ce 100644
--- a/explore-analyze/elastic-inference/inference-api.md
+++ b/explore-analyze/elastic-inference/inference-api.md
@@ -157,10 +157,14 @@ PUT _inference/sparse_embedding/word_chunks
 
 #### `recursive`
 
-{applies_to}`stack: ga 9.1`
+```{applies_to}
+stack: ga 9.1
+```
 
 The `recursive` strategy splits the input text based on a configurable list of separator patterns (for example, newlines or Markdown headers). The chunker applies these separators in order, recursively splitting any chunk that exceeds the `max_chunk_size` word limit. If no separator produces a small enough chunk, the strategy falls back to sentence-level splitting.
 
+##### Markdown separator group
+
 The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures chunking with the `recursive` strategy using the markdown separator group and a maximum of 200 words per chunk.
 
 ```console
@@ -180,9 +184,39 @@ PUT _inference/sparse_embedding/recursive_markdown_chunks
 }
 ```
 
+##### Custom separator group
+
+The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures chunking with the `recursive` strategy. It uses a custom list of separators to split plain text into chunks of up to 180 words.
+
+```console
+PUT _inference/sparse_embedding/recursive_custom_chunks
+{
+  "service": "elasticsearch",
+  "service_settings": {
+    "model_id": ".elser_model_2",
+    "num_allocations": 1,
+    "num_threads": 1
+  },
+  "chunking_settings": {
+    "strategy": "recursive",
+    "max_chunk_size": 180,
+    "separators": [
+      "^(#{1,6})\\s",
+      "\\n\\n",
+      "\\n[-*]\\s",
+      "\\n\\d+\\.\\s",
+      "\\n"
+    ]
+  }
+}
+```
+
 #### `none`
 
-{applies_to}`stack: ga 9.1`
+```{applies_to}
+stack: ga 9.1
+```
 
 The `none` strategy disables chunking and processes the entire input text as a single block, without any splitting or overlap. When using this strategy, you can instead [pre-chunk](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/semantic-text#auto-text-chunking) the input by providing an array of strings, where each element acts as a separate chunk to be sent directly to the inference service without further chunking.
From e449e542fffce18d7de3dbe2ecfb20bd6f46a862 Mon Sep 17 00:00:00 2001
From: kosabogi <105062005+kosabogi@users.noreply.github.com>
Date: Thu, 18 Sep 2025 12:33:51 +0200
Subject: [PATCH 5/5] Update explore-analyze/elastic-inference/inference-api.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: István Zoltán Szabó

---
 explore-analyze/elastic-inference/inference-api.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/explore-analyze/elastic-inference/inference-api.md b/explore-analyze/elastic-inference/inference-api.md
index d400df10ce..a185fe9d04 100644
--- a/explore-analyze/elastic-inference/inference-api.md
+++ b/explore-analyze/elastic-inference/inference-api.md
@@ -220,7 +220,7 @@ stack: ga 9.1
 
 The `none` strategy disables chunking and processes the entire input text as a single block, without any splitting or overlap. When using this strategy, you can instead [pre-chunk](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/semantic-text#auto-text-chunking) the input by providing an array of strings, where each element acts as a separate chunk to be sent directly to the inference service without further chunking.
 
-The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the `ELSER` model and disables chunking by setting the strategy to `none`.
+The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and disables chunking by setting the strategy to `none`.
 
 ```console
 PUT _inference/sparse_embedding/none_chunking
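Reviewer note: the patches above describe the `recursive` strategy only in prose. As a rough mental model of the recursion (not Elastic's implementation: the function name and the final word-based hard split are illustrative assumptions, whereas the real chunker treats `separators` as regular expressions and falls back to sentence-level splitting), the behavior can be sketched as:

```python
import re

def recursive_chunks(text, max_chunk_size, separators):
    """Recursively split text so no chunk exceeds max_chunk_size words.

    Separators are tried in order; any piece that is still too large is
    re-split with the remaining separators.
    """
    if len(text.split()) <= max_chunk_size:
        return [text] if text.strip() else []
    for i, sep in enumerate(separators):
        pieces = [p for p in re.split(sep, text) if p.strip()]
        if len(pieces) > 1:
            out = []
            for piece in pieces:
                out.extend(recursive_chunks(piece, max_chunk_size, separators[i:]))
            return out
    # No separator made progress: hard-split on words as a stand-in for
    # the real strategy's sentence-level fallback.
    words = text.split()
    return [" ".join(words[j:j + max_chunk_size])
            for j in range(0, len(words), max_chunk_size)]

chunks = recursive_chunks("a b c d e\n\nf g h\ni j", 3, ["\n\n", "\n"])
# → ["a b c", "d e", "f g h", "i j"]
```

This is also why separator order matters in the examples: coarser patterns (paragraph breaks, headings) are tried before finer ones, so chunks follow document structure whenever that structure yields pieces under the word limit.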