From 2cd0ba5306bec3d259bfd3afa43120a36de68e80 Mon Sep 17 00:00:00 2001 From: James Baiera Date: Wed, 17 Sep 2025 11:49:53 -0400 Subject: [PATCH 1/6] Add docs for ingest pipeline field access pattern --- .../transform-enrich/ingest-pipelines.md | 224 +++++++++++++++++- 1 file changed, 223 insertions(+), 1 deletion(-) diff --git a/manage-data/ingest/transform-enrich/ingest-pipelines.md b/manage-data/ingest/transform-enrich/ingest-pipelines.md index de027f23b3..e128fa96d2 100644 --- a/manage-data/ingest/transform-enrich/ingest-pipelines.md +++ b/manage-data/ingest/transform-enrich/ingest-pipelines.md @@ -388,7 +388,7 @@ PUT _ingest/pipeline/my-pipeline Use dot notation to access object fields. ::::{important} -If your document contains flattened objects, use the [`dot_expander`](elasticsearch://reference/enrich-processor/dot-expand-processor.md) processor to expand them first. Other ingest processors cannot access flattened objects. +If your document contains flattened objects, you can use the [`dot_expander`](elasticsearch://reference/enrich-processor/dot-expand-processor.md) processor to expand them. If you wish to maintain your document structure, you can use the [`flexible`](ingest-pipelines.md#access-source-flexible) access pattern in your pipeline definition. Ingest processors cannot access dotted field names otherwise. :::: @@ -431,6 +431,228 @@ PUT _ingest/pipeline/my-pipeline } ``` +## Flexible field access pattern [access-source-flexible] +```{applies_to} +serverless: ga +stack: ga 9.2 +``` + +The default ingest pipeline access pattern does not recognize dotted field names in documents. Retrieving flattened and dotted field names from an ingest document requires a different field retrieval algorithm that does not have this limitation. We know that some pipelines have come to rely on these dotted field name limitations in their logic. 
In order to continue supporting the original behavior while still adding support for dotted field names, ingest pipelines now support configuring an access pattern to use for all processors in the pipeline. + +The `field_access_pattern` property on an ingest pipeline defines how ingest document fields are read and written for all processors in the current pipeline. It accepts two values: `classic` and `flexible`. + +```console +PUT _ingest/pipeline/my-pipeline +{ + "field_access_pattern": "classic", <1> + "processors": [ + { + "set": { + "description": "Set some searchable tags in our document's flattened field", + "field": "event.tags.ingest.processed_by", <2> + "value": "my-pipeline" + } + } + ] +} +``` +1. All processors in this pipeline will use the `classic` access pattern. +2. The logic for resolving field paths used by processors to read and write values to ingest documents is based on the access pattern. + +The `classic` access pattern is the default access pattern that has been around since ingest node first released. Field paths given to processors (e.g. `event.tags.ingest.processed_by`) are split on the dot character (`.`). The processor then uses the resulting field names to traverse the document until a value is found. When writing a value to a document, if its parent fields do not exist in the source, the processor will create nested objects for the missing fields. + +```console +POST /_ingest/pipeline/_simulate +{ + "pipeline" : { + "description": "example pipeline", + "field_access_pattern": "classic", <1> + "processors": [ + { + "set" : { + "description" : "Copy the foo.bar field into the a.b.c.d field if it exists", + "copy_from" : "foo.bar", <2> + "field" : "a.b.c.d", <3> + "ignore_empty_value": true + } + } + ] + }, + "docs": [ + { + "_index": "index", + "_id": "id", + "_source": { + "foo": { + "bar": "baz" <4> + } + } + }, + { + "_index": "index", + "_id": "id", + "_source": { + "foo.bar": "baz" <5> + } + } + ] +} +``` +1. 
Explicitly declares the `classic` access pattern in the pipeline. This is the default value. +2. We are reading a value from the field `foo.bar`. +3. We are writing its value to the field `a.b.c.d`. +4. This document uses nested JSON objects in its structure. +5. This document uses dotted field names in its structure. + +```console-result +{ + "docs": [ + { + "doc": { + "_id": "id", + "_index": "index", + "_version": "-3", + "_source": { + "foo": { + "bar": "baz" <1> + }, + "a": { + "b": { + "c": { + "d": "baz" <2> + } + } + } + }, + "_ingest": { + "timestamp": "2017-05-04T22:30:03.187Z" + } + } + }, + { + "doc": { + "_id": "id", + "_index": "index", + "_version": "-3", + "_source": { + "foo.bar": "baz" <3> + }, + "_ingest": { + "timestamp": "2017-05-04T22:30:03.188Z" + } + } + } + ] +} +``` +1. The first document's `foo.bar` field is located, because it uses nested JSON. The processor looks for a `foo` field, and then a `bar` field. +2. The value from the `foo.bar` field is written to a nested JSON structure at field `a.b.c.d`. The processor creates objects for each field in the path. +3. The second document uses a dotted field name for `foo.bar`. The `classic` access pattern does not recognize dotted field names, and so nothing is copied. + +If the documents you are ingesting contain dotted field names, to read them with the `classic` access pattern, you must use the [`dot_expander`](elasticsearch://reference/enrich-processor/dot-expand-processor.md) processor. This approach is not always reasonable though. Consider the following document: + +```json +{ + "event": { + "tags": { + "http.host": "localhost:9200", + "http.host.name": "localhost", + "http.host.port": 9200 + } + } +} +``` +If the `event.tags` field was processed with the [`dot_expander`](elasticsearch://reference/enrich-processor/dot-expand-processor.md) processor, the field values would collide. The `http.host` field cannot be a text value and an object value at the same time. 
+ +The `flexible` access pattern allows for ingest pipelines to access both nested and dotted field names without using the [`dot_expander`](elasticsearch://reference/enrich-processor/dot-expand-processor.md) processor. Additionally, when writing a value to a field that does not exist, any parent fields that are missing are concatenated to the start of the new key. Use the `flexible` access pattern if your documents have dotted field names, and also if you prefer to write missing fields to the document with dotted names. + +```console +POST /_ingest/pipeline/_simulate +{ + "pipeline" : { + "description": "example pipeline", + "field_access_pattern": "flexible", <1> + "processors": [ + { + "set" : { + "description" : "Copy the foo.bar field into the a.b.c.d field if it exists", + "copy_from" : "foo.bar", <2> + "field" : "a.b.c.d", <3> + "ignore_empty_value": true + } + } + ] + }, + "docs": [ + { + "_index": "index", + "_id": "id", + "_source": { + "foo": { + "bar": "baz" <4> + }, + "a": {} <5> + } + }, + { + "_index": "index", + "_id": "id", + "_source": { + "foo.bar": "baz" <6> + } + } + ] +} +``` +1. Using the `flexible` access pattern in the pipeline. +2. We are reading a value from the field `foo.bar`. +3. We are writing its value to the field `a.b.c.d`. +4. The first document uses nested JSON objects in its structure. +5. The first document has an existing `a` field in the root. +6. The second document uses a dotted field name. + +```console-result +{ + "docs": [ + { + "doc": { + "_id": "id", + "_index": "index", + "_version": "-3", + "_source": { + "foo": { + "bar": "baz" <1> + }, + "a": { + "b.c,d": "baz" <2> + } + }, + "_ingest": { + "timestamp": "2017-05-04T22:30:03.187Z" + } + } + }, + { + "doc": { + "_id": "id", + "_index": "index", + "_version": "-3", + "_source": { + "foo.bar": "baz", <3> + "a.b.c.d": "baz" <4> + }, + "_ingest": { + "timestamp": "2017-05-04T22:30:03.188Z" + } + } + } + ] +} +``` +1. 
The `flexible` access pattern supports nested object fields. The processor looks for a `foo` field, and then a `bar` field. +2. The value from the `foo.bar` field is written to the dotted field name `b.c.d` underneath the field `a`. The processor concatenates the missing field names together as a prefix on the key. +3. The `flexible` access pattern also supports dotted field names. The processor looks for a field named `foo`, and after not finding it, looks for a field named `foo.bar`. +4. The value from the `foo.bar` field is written to the dotted field name `a.b.c.d`. Since none of those fields exist in the document yet, they are concatenated together into a dotted field name. ## Access metadata fields in a processor [access-metadata-fields] From 37a7447e239075a1a135be78847ba6329ebbe775 Mon Sep 17 00:00:00 2001 From: James Baiera Date: Wed, 17 Sep 2025 11:51:23 -0400 Subject: [PATCH 2/6] cleanup --- manage-data/ingest/transform-enrich/ingest-pipelines.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/manage-data/ingest/transform-enrich/ingest-pipelines.md b/manage-data/ingest/transform-enrich/ingest-pipelines.md index e128fa96d2..3d4be50c62 100644 --- a/manage-data/ingest/transform-enrich/ingest-pipelines.md +++ b/manage-data/ingest/transform-enrich/ingest-pipelines.md @@ -388,7 +388,7 @@ PUT _ingest/pipeline/my-pipeline Use dot notation to access object fields. ::::{important} -If your document contains flattened objects, you can use the [`dot_expander`](elasticsearch://reference/enrich-processor/dot-expand-processor.md) processor to expand them. If you wish to maintain your document structure, you can use the [`flexible`](ingest-pipelines.md#access-source-flexible) access pattern in your pipeline definition. Ingest processors cannot access dotted field names otherwise. +If your document contains flattened objects, use the [`dot_expander`](elasticsearch://reference/enrich-processor/dot-expand-processor.md) processor to expand them. 
If you wish to maintain your document structure, use the [`flexible`](ingest-pipelines.md#access-source-flexible) access pattern in your pipeline definition. Ingest processors cannot access dotted field names otherwise. :::: From 15b27e7be6dd68ec1dc7d97e293f6cb2c28c0996 Mon Sep 17 00:00:00 2001 From: James Baiera Date: Fri, 26 Sep 2025 13:56:21 -0400 Subject: [PATCH 3/6] Update manage-data/ingest/transform-enrich/ingest-pipelines.md Co-authored-by: Lee Hinman --- manage-data/ingest/transform-enrich/ingest-pipelines.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/manage-data/ingest/transform-enrich/ingest-pipelines.md b/manage-data/ingest/transform-enrich/ingest-pipelines.md index 3d4be50c62..4386a094da 100644 --- a/manage-data/ingest/transform-enrich/ingest-pipelines.md +++ b/manage-data/ingest/transform-enrich/ingest-pipelines.md @@ -388,7 +388,7 @@ PUT _ingest/pipeline/my-pipeline Use dot notation to access object fields. ::::{important} -If your document contains flattened objects, use the [`dot_expander`](elasticsearch://reference/enrich-processor/dot-expand-processor.md) processor to expand them. If you wish to maintain your document structure, use the [`flexible`](ingest-pipelines.md#access-source-flexible) access pattern in your pipeline definition. Ingest processors cannot access dotted field names otherwise. +If your document contains flattened objects, use the [`dot_expander`](elasticsearch://reference/enrich-processor/dot-expand-processor.md) processor to expand them. If you wish to maintain your document structure, use the [`flexible`](ingest-pipelines.md#access-source-flexible) access pattern in your pipeline definition. Otherwise Ingest processors cannot access dotted field names. 
:::: From 3a65271937f6b5415c81835d1327b1f5648f15d4 Mon Sep 17 00:00:00 2001 From: James Baiera Date: Fri, 26 Sep 2025 14:27:38 -0400 Subject: [PATCH 4/6] Update manage-data/ingest/transform-enrich/ingest-pipelines.md Co-authored-by: Lee Hinman --- manage-data/ingest/transform-enrich/ingest-pipelines.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/manage-data/ingest/transform-enrich/ingest-pipelines.md b/manage-data/ingest/transform-enrich/ingest-pipelines.md index 4386a094da..e402b0f1a7 100644 --- a/manage-data/ingest/transform-enrich/ingest-pipelines.md +++ b/manage-data/ingest/transform-enrich/ingest-pipelines.md @@ -624,7 +624,7 @@ POST /_ingest/pipeline/_simulate "bar": "baz" <1> }, "a": { - "b.c,d": "baz" <2> + "b.c.d": "baz" <2> } }, "_ingest": { From 5a1fc8b21c3b8cf7eb40ff2acf95417b2dedccac Mon Sep 17 00:00:00 2001 From: James Baiera Date: Fri, 26 Sep 2025 14:37:24 -0400 Subject: [PATCH 5/6] Feedback --- manage-data/ingest/transform-enrich/ingest-pipelines.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/manage-data/ingest/transform-enrich/ingest-pipelines.md b/manage-data/ingest/transform-enrich/ingest-pipelines.md index e402b0f1a7..aec1b3a49d 100644 --- a/manage-data/ingest/transform-enrich/ingest-pipelines.md +++ b/manage-data/ingest/transform-enrich/ingest-pipelines.md @@ -388,7 +388,7 @@ PUT _ingest/pipeline/my-pipeline Use dot notation to access object fields. ::::{important} -If your document contains flattened objects, use the [`dot_expander`](elasticsearch://reference/enrich-processor/dot-expand-processor.md) processor to expand them. If you wish to maintain your document structure, use the [`flexible`](ingest-pipelines.md#access-source-flexible) access pattern in your pipeline definition. Otherwise Ingest processors cannot access dotted field names. 
+If your document contains flattened objects, use the [`dot_expander`](elasticsearch://reference/enrich-processor/dot-expand-processor.md) processor to expand them. If you wish to maintain your document structure, use the [`flexible`](ingest-pipelines.md#access-source-pattern-flexible) access pattern in your pipeline definition. Otherwise Ingest processors cannot access dotted field names. :::: @@ -431,7 +431,7 @@ PUT _ingest/pipeline/my-pipeline } ``` -## Flexible field access pattern [access-source-flexible] +## Ingest field access pattern [access-source-pattern] ```{applies_to} serverless: ga stack: ga 9.2 @@ -439,7 +439,7 @@ stack: ga 9.2 The default ingest pipeline access pattern does not recognize dotted field names in documents. Retrieving flattened and dotted field names from an ingest document requires a different field retrieval algorithm that does not have this limitation. We know that some pipelines have come to rely on these dotted field name limitations in their logic. In order to continue supporting the original behavior while still adding support for dotted field names, ingest pipelines now support configuring an access pattern to use for all processors in the pipeline. -The `field_access_pattern` property on an ingest pipeline defines how ingest document fields are read and written for all processors in the current pipeline. It accepts two values: `classic` and `flexible`. +The `field_access_pattern` property on an ingest pipeline defines how ingest document fields are read and written for all processors in the current pipeline. It accepts two values: `classic` (which is the default) and `flexible`. ```console PUT _ingest/pipeline/my-pipeline @@ -459,6 +459,7 @@ PUT _ingest/pipeline/my-pipeline 1. All processors in this pipeline will use the `classic` access pattern. 2. The logic for resolving field paths used by processors to read and write values to ingest documents is based on the access pattern. 
+### Classic field access pattern [access-source-pattern-classic] The `classic` access pattern is the default access pattern that has been around since ingest node first released. Field paths given to processors (e.g. `event.tags.ingest.processed_by`) are split on the dot character (`.`). The processor then uses the resulting field names to traverse the document until a value is found. When writing a value to a document, if its parent fields do not exist in the source, the processor will create nested objects for the missing fields. ```console @@ -564,6 +565,8 @@ If the documents you are ingesting contain dotted field names, to read them with ``` If the `event.tags` field was processed with the [`dot_expander`](elasticsearch://reference/enrich-processor/dot-expand-processor.md) processor, the field values would collide. The `http.host` field cannot be a text value and an object value at the same time. +### Flexible field access pattern [access-source-pattern-flexible] + The `flexible` access pattern allows for ingest pipelines to access both nested and dotted field names without using the [`dot_expander`](elasticsearch://reference/enrich-processor/dot-expand-processor.md) processor. Additionally, when writing a value to a field that does not exist, any parent fields that are missing are concatenated to the start of the new key. Use the `flexible` access pattern if your documents have dotted field names, and also if you prefer to write missing fields to the document with dotted names. 
```console From 02bf5c1aab83bc975ed7521d9e9bbd969998c38e Mon Sep 17 00:00:00 2001 From: James Baiera Date: Fri, 26 Sep 2025 14:41:32 -0400 Subject: [PATCH 6/6] Whitespace --- manage-data/ingest/transform-enrich/ingest-pipelines.md | 1 + 1 file changed, 1 insertion(+) diff --git a/manage-data/ingest/transform-enrich/ingest-pipelines.md b/manage-data/ingest/transform-enrich/ingest-pipelines.md index aec1b3a49d..2c9e71af14 100644 --- a/manage-data/ingest/transform-enrich/ingest-pipelines.md +++ b/manage-data/ingest/transform-enrich/ingest-pipelines.md @@ -460,6 +460,7 @@ PUT _ingest/pipeline/my-pipeline 2. The logic for resolving field paths used by processors to read and write values to ingest documents is based on the access pattern. ### Classic field access pattern [access-source-pattern-classic] + The `classic` access pattern is the default access pattern that has been around since ingest node first released. Field paths given to processors (e.g. `event.tags.ingest.processed_by`) are split on the dot character (`.`). The processor then uses the resulting field names to traverse the document until a value is found. When writing a value to a document, if its parent fields do not exist in the source, the processor will create nested objects for the missing fields. ```console