From 37f94b3b7ac2a5268d902b12b6f463352e6420e8 Mon Sep 17 00:00:00 2001
From: kosabogi
Date: Thu, 20 Mar 2025 08:32:33 +0100
Subject: [PATCH 1/3] Adds explanation on the delay parameter

---
 docs/reference/transform/checkpoints.asciidoc | 2 ++
 docs/reference/transform/usage.asciidoc       | 9 +++++++++
 2 files changed, 11 insertions(+)

diff --git a/docs/reference/transform/checkpoints.asciidoc b/docs/reference/transform/checkpoints.asciidoc
index 77e1eae318327..62875d0f7d890 100644
--- a/docs/reference/transform/checkpoints.asciidoc
+++ b/docs/reference/transform/checkpoints.asciidoc
@@ -21,6 +21,8 @@ Using a simple periodic timer, the {transform} checks for changes to the source
 indices. This check is done based on the interval defined in the transform's
 `frequency` property.
 +
+If new data is ingested with a slight delay, it might not be immediately available when the transform runs. To prevent missing documents, you can use the `delay` parameter in the `sync` configuration. This shifts the search window backward, ensuring that late-arriving data is included before a checkpoint processes it. Adjusting this value based on your data ingestion patterns can help ensure completeness.
++
 If the source indices remain unchanged or if a checkpoint is already in progress
 then it waits for the next timer.
 +
diff --git a/docs/reference/transform/usage.asciidoc b/docs/reference/transform/usage.asciidoc
index 2153ee63aa510..1dec88d3dc5e6 100644
--- a/docs/reference/transform/usage.asciidoc
+++ b/docs/reference/transform/usage.asciidoc
@@ -53,3 +53,12 @@ have a high level dashboard that is accessed by a large number of users and it
 uses a complex aggregation over a large dataset, it may be more efficient to
 create a {transform} to cache results. Thus, each user doesn't need to run the
 aggregation query.
+
+* You need to account for late-arriving data.
++
+In some cases, data might not be immediately available when a transform runs, leading to missing records in the destination index. This can happen due to ingestion delays, where documents take a few seconds or minutes to become searchable after being indexed.
+To handle this, the `delay` parameter in the transform's sync configuration allows you to postpone processing new data. Instead of always querying the most recent records, the transform will skip a short period of time (e.g., 60 seconds) to ensure all relevant data has arrived before processing.
+For example, if a transform runs every 5 minutes, it usually processes data from 5 minutes ago up to the current time. However, if you set `delay` to 60 seconds, the transform will instead process data from 6 minutes ago up to 1 minute ago, making sure that any documents that arrived late are included.
+By adjusting the `delay` parameter, you can improve the accuracy of transformed data while still maintaining near real-time results.
+
+

From 2fe32b312271a2f8eb359063ffdbe43f5fbf49d2 Mon Sep 17 00:00:00 2001
From: kosabogi
Date: Mon, 24 Mar 2025 11:45:29 +0100
Subject: [PATCH 2/3] Attribute fixes

---
 docs/reference/transform/checkpoints.asciidoc | 2 +-
 docs/reference/transform/usage.asciidoc       | 7 ++++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/docs/reference/transform/checkpoints.asciidoc b/docs/reference/transform/checkpoints.asciidoc
index 62875d0f7d890..8a08f483f94ff 100644
--- a/docs/reference/transform/checkpoints.asciidoc
+++ b/docs/reference/transform/checkpoints.asciidoc
@@ -21,7 +21,7 @@ Using a simple periodic timer, the {transform} checks for changes to the source
 indices. This check is done based on the interval defined in the transform's
 `frequency` property.
 +
-If new data is ingested with a slight delay, it might not be immediately available when the transform runs. To prevent missing documents, you can use the `delay` parameter in the `sync` configuration. This shifts the search window backward, ensuring that late-arriving data is included before a checkpoint processes it. Adjusting this value based on your data ingestion patterns can help ensure completeness.
+If new data is ingested with a slight delay, it might not be immediately available when the {transform} runs. To prevent missing documents, you can use the `delay` parameter in the `sync` configuration. This shifts the search window backward, ensuring that late-arriving data is included before a checkpoint processes it. Adjusting this value based on your data ingestion patterns can help ensure completeness.
 +
 If the source indices remain unchanged or if a checkpoint is already in progress
 then it waits for the next timer.
diff --git a/docs/reference/transform/usage.asciidoc b/docs/reference/transform/usage.asciidoc
index 1dec88d3dc5e6..e917596e07f88 100644
--- a/docs/reference/transform/usage.asciidoc
+++ b/docs/reference/transform/usage.asciidoc
@@ -56,9 +56,10 @@ aggregation query.
 
 * You need to account for late-arriving data.
 +
-In some cases, data might not be immediately available when a transform runs, leading to missing records in the destination index. This can happen due to ingestion delays, where documents take a few seconds or minutes to become searchable after being indexed.
-To handle this, the `delay` parameter in the transform's sync configuration allows you to postpone processing new data. Instead of always querying the most recent records, the transform will skip a short period of time (e.g., 60 seconds) to ensure all relevant data has arrived before processing.
-For example, if a transform runs every 5 minutes, it usually processes data from 5 minutes ago up to the current time. However, if you set `delay` to 60 seconds, the transform will instead process data from 6 minutes ago up to 1 minute ago, making sure that any documents that arrived late are included.
+In some cases, data might not be immediately available when a {transform} runs, leading to missing records in the destination index. This can happen due to ingestion delays, where documents take a few seconds or minutes to become searchable after being indexed.
+To handle this, the `delay` parameter in the {transform}'s sync configuration allows you to postpone processing new data. Instead of always querying the most recent records, the {transform} will skip a short period of time (e.g., 60 seconds) to ensure all relevant data has arrived before processing.
++
+For example, if a {transform} runs every 5 minutes, it usually processes data from 5 minutes ago up to the current time. However, if you set `delay` to 60 seconds, the {transform} will instead process data from 6 minutes ago up to 1 minute ago, making sure that any documents that arrived late are included.
 By adjusting the `delay` parameter, you can improve the accuracy of transformed data while still maintaining near real-time results.
 
 

From b34749c31095e206828d5b3ce37ca561c8d1a23b Mon Sep 17 00:00:00 2001
From: kosabogi <105062005+kosabogi@users.noreply.github.com>
Date: Mon, 24 Mar 2025 13:48:08 +0100
Subject: [PATCH 3/3] Update docs/reference/transform/usage.asciidoc
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: István Zoltán Szabó
---
 docs/reference/transform/usage.asciidoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/reference/transform/usage.asciidoc b/docs/reference/transform/usage.asciidoc
index e917596e07f88..0fd3822a22fc3 100644
--- a/docs/reference/transform/usage.asciidoc
+++ b/docs/reference/transform/usage.asciidoc
@@ -57,7 +57,7 @@ aggregation query.
 * You need to account for late-arriving data.
 +
 In some cases, data might not be immediately available when a {transform} runs, leading to missing records in the destination index. This can happen due to ingestion delays, where documents take a few seconds or minutes to become searchable after being indexed.
-To handle this, the `delay` parameter in the {transform}'s sync configuration allows you to postpone processing new data. Instead of always querying the most recent records, the {transform} will skip a short period of time (e.g., 60 seconds) to ensure all relevant data has arrived before processing.
+To handle this, the `delay` parameter in the {transform}'s sync configuration allows you to postpone processing new data. Instead of always querying the most recent records, the {transform} will skip a short period of time (for example, 60 seconds) to ensure all relevant data has arrived before processing.
 +
 For example, if a {transform} runs every 5 minutes, it usually processes data from 5 minutes ago up to the current time. However, if you set `delay` to 60 seconds, the {transform} will instead process data from 6 minutes ago up to 1 minute ago, making sure that any documents that arrived late are included.
 By adjusting the `delay` parameter, you can improve the accuracy of transformed data while still maintaining near real-time results.
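Illustration (not part of the patch): the window arithmetic in the late-arriving data example can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the {transform} implementation: `checkpoint_window` is a hypothetical helper, and it assumes each run covers exactly one `frequency` interval, as the example in the added text does.

```python
from datetime import datetime, timedelta, timezone


def checkpoint_window(run_time, frequency, delay=timedelta(0)):
    """Return the (start, end) time range that one run would cover.

    Hypothetical helper: models the example in the docs text, where each
    run covers one `frequency` interval and `delay` shifts both ends of
    that interval into the past.
    """
    end = run_time - delay      # newest timestamp the run will consider
    start = end - frequency     # oldest timestamp the run will consider
    return start, end


# A run that fires at 12:00 UTC with a 5 minute frequency.
run_time = datetime(2025, 3, 20, 12, 0, 0, tzinfo=timezone.utc)
frequency = timedelta(minutes=5)

# Without a delay: 5 minutes ago up to the current time.
no_delay = checkpoint_window(run_time, frequency)

# With a 60 second delay: 6 minutes ago up to 1 minute ago.
with_delay = checkpoint_window(run_time, frequency, delay=timedelta(seconds=60))

print("no delay:  ", no_delay[0].time(), "to", no_delay[1].time())
print("60s delay: ", with_delay[0].time(), "to", with_delay[1].time())
```

The window keeps the same length in both cases. Only its position moves, which is why a larger `delay` trades freshness for completeness.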