aggregators should not re-run processors #7993

ssoroka · 2020-08-16T21:03:20Z

Relevant telegraf.conf:

[[inputs.file]]
  files = ["file.txt"] # contains `data 1`
  data_format = "influx"

[[processors.starlark]]
  # Multiply any float fields by 10
  source = '''
def apply(metric):
    for k, v in metric.fields.items():
        if type(v) == "float":
            metric.fields[k] = v * 10
    return metric
'''

# Keep the aggregate min/max of each metric passing through.
[[aggregators.minmax]]
  period = "30s"
  drop_original = false

Steps to reproduce:

Run a config like the one above, or run any aggregator together with a non-idempotent processor.

Expected behavior:

The processor runs once, outputting data 10.

Actual behavior:

The processor runs twice, outputting data 100.

Additional info:

This is long-standing behavior of aggregators: they re-run all processors. It was probably considered a good idea because you may want to modify the output of aggregators, but it is (1) wasteful and (2) breaks any processor that can't be applied twice to the same field. For example, x * 10 with x = 1 gives 10 after one run but 100 after two runs; it's not idempotent. Adding an aggregator will break any such processor.
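To make the arithmetic concrete, here is a minimal Python sketch of the double application. It is not Telegraf code: `multiply_floats` and `run_pipeline` are hypothetical names that only mimic the Starlark processor above and the second processor pass described in this issue.

```python
def multiply_floats(fields):
    """Non-idempotent processor: multiply every float field by 10."""
    return {k: v * 10 if isinstance(v, float) else v for k, v in fields.items()}


def run_pipeline(fields, rerun_after_aggregator):
    # First pass: processors run before the aggregator sees the metric.
    fields = multiply_floats(fields)
    # The aggregator passes the original metric through (drop_original = false).
    if rerun_after_aggregator:
        # Behavior reported in this issue: processors run a second time.
        fields = multiply_floats(fields)
    return fields


expected = run_pipeline({"data": 1.0}, rerun_after_aggregator=False)
actual = run_pipeline({"data": 1.0}, rerun_after_aggregator=True)
print(expected)  # {'data': 10.0}
print(actual)    # {'data': 100.0}
```

An idempotent processor (for example, one that sets a field to a constant) would give the same result in both cases, which is why the problem only shows up with processors like this one.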

The solution here would be to convert aggregators to be processors themselves. This would let you order processors and aggregators freely relative to one another, which resolves the problem cleanly. Aggregators are a special case of processors, and with the new streaming processor support they fit naturally into the processor model.
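A rough sketch of what "aggregators as processors" could look like, written in Python rather than Telegraf's Go. Everything here is an assumption made for illustration: the `process`/`flush` interface, the class names, and `run_chain` are hypothetical and do not reflect Telegraf's actual plugin API. The point is only that each stage's output feeds the stages after it, so nothing is processed twice.

```python
class MultiplyBy10:
    """Hypothetical non-idempotent processor stage."""

    def process(self, metric):
        yield {**metric, "value": metric["value"] * 10}

    def flush(self):
        return []  # plain processors hold no state to emit


class MinMax:
    """Hypothetical aggregator written as just another stage."""

    def __init__(self):
        self.lo = None
        self.hi = None

    def process(self, metric):
        v = metric["value"]
        self.lo = v if self.lo is None else min(self.lo, v)
        self.hi = v if self.hi is None else max(self.hi, v)
        yield metric  # pass the original through (drop_original = false)

    def flush(self):
        if self.lo is None:
            return
        yield {"name": "data_min", "value": self.lo}
        yield {"name": "data_max", "value": self.hi}


def run_chain(stages, metrics):
    """Feed metrics through the stages in order, then flush each stage once."""
    out = []

    def feed(start, metric):
        # Hand the metric to stage `start`; its output goes to later stages only.
        if start == len(stages):
            out.append(metric)
            return
        for m in stages[start].process(metric):
            feed(start + 1, m)

    for m in metrics:
        feed(0, m)
    # Aggregates emitted by stage i continue at stage i + 1, never at stage 0.
    for i, stage in enumerate(stages):
        for m in stage.flush():
            feed(i + 1, m)
    return out


result = run_chain(
    [MultiplyBy10(), MinMax()],
    [{"name": "data", "value": 1.0}, {"name": "data", "value": 3.0}],
)
print(result)
```

With this ordering the multiplier touches each input exactly once, and the min/max aggregates are computed from the already-multiplied values without passing through the multiplier again. Placing a second processor after `MinMax` in the list would apply it to both the originals and the aggregates.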

iot-monitor-net · 2020-12-28T23:35:14Z

Took me some time to figure out what was wrong, as I ran into this problem as well. Would be great to have this resolved.

ssoroka · 2021-03-01T21:31:22Z

This will be part of Telegraf 2.0

jimmyseto · 2021-03-14T16:50:01Z

@ssoroka - is there a workaround for this in 1.17.x by any chance? We can use metric filtering to bypass processors the second time around. But if we are using execd, for example, and we don't want a second instance of that program to spin up, is there a way to prevent that from happening?

ssoroka · 2021-07-02T19:51:15Z

If you're using aggregators, you can't prevent a second copy from spinning up in 1.x.

sjwang90 · 2021-08-13T18:50:11Z

Behavior when fix is implemented

 +------------+                     Processors and aggregators can be ordered                     +--------+
 | Input      +---+                        and chained arbitrarily                           +--->+ Output |
 +------------+   |                                                                          |    +--------+
                  |                                                                          |
 +------------+   |   +-----------+    +------------+      +-----------+    +------------+   |    +--------+
 | Input      +------>+ Processor +--->+ Aggregator +----->+ Processor +--->+ Aggregator +-->---->+ Output |
 +------------+   |   +-----------+    +------------+      +-----------+    +------------+   |    +--------+
                  |                                                                          |
 +------------+   |                                                                          |    +--------+
 | Input      +---+                                                                          +--->+ Output |
 +------------+   |                                                                          |    +--------+
                  |                                                                          |
 +------------+   |                                                                          |    +--------+
 | Input      +---+                                                                          +--->+ Output |
 +------------+                                                                                   +--------+

redbaron · 2023-05-06T10:54:57Z

> @ssoroka - is there a workaround for this in 1.17.x by any chance? We can use metric filtering to bypass processors the second time around. But if we are using execd, for example, and we don't want a second instance of that program to spin up, is there a way to prevent that from happening?

I haven't tried this myself, but adding a marker tag in the aggregator, then using tagdrop in the processor and tagexclude in the output, should prevent loops.

ssoroka added the bug (unexpected problem or unintended behavior) and area/agent labels Aug 16, 2020

ssoroka mentioned this issue Aug 16, 2020

Output buffer persistence #802

Open

ssoroka self-assigned this Aug 19, 2020

sjwang90 added this to the 1.16.0 milestone Sep 14, 2020

sjwang90 modified the milestones: 1.16.0, 1.15.4, Planned Oct 19, 2020

ssoroka added the area/execd (issues related to execd or plugins that would be better suited to be used through execd) and area/starlark labels Nov 3, 2020

sjwang90 removed this from the Planned milestone Jan 29, 2021

sjwang90 assigned sspaink Feb 1, 2021

sjwang90 added this to the 2.0.0 milestone Apr 14, 2021

ssoroka mentioned this issue Jul 27, 2021

feat: Config api 2 #9546

Closed

reimda unassigned ssoroka Dec 6, 2021

powersj mentioned this issue Dec 13, 2021

aggregators.merge causes processing loop #10261

Closed

sspaink mentioned this issue Mar 9, 2022

Make breaking changes that require a major version update (2.0) #9478

Closed

75 tasks

hassanbabaie mentioned this issue Aug 12, 2022

Ability for processors and aggregators can be ordered and chained arbitrarily does not appear to be documented? #11668

Closed

sspaink added the size/l (1 week or more effort) label Sep 28, 2022

sspaink removed their assignment Nov 17, 2022

LarsStegman mentioned this issue Jan 19, 2023

Output from any point in the pipeline #12523

Open

powersj removed this from the 2.0.0 milestone Feb 2, 2023

powersj mentioned this issue Feb 10, 2023

processors.starlark and aggregators.basicstats will be core when used together #12649

Closed

powersj self-assigned this Feb 21, 2024

powersj added a commit to powersj/telegraf that referenced this issue Feb 22, 2024

feat(agent): Add option to skip re-running processors after aggregators

7170928

fixes: influxdata#7993

powersj mentioned this issue Feb 22, 2024

feat(agent): Add option to skip re-running processors after aggregators #14882

Merged

1 task

DStrand1 closed this as completed in #14882 Feb 23, 2024
