Arbitrary rule can cause all Graphite templates to be ignored #5894

Closed
piotrp opened this issue May 22, 2019 · 2 comments · Fixed by #6135


piotrp commented May 22, 2019

Relevant telegraf.conf:

[agent]
  omit_hostname = true

[[inputs.socket_listener]]
  service_address = "tcp://:2000"
  data_format = "graphite"
  templates = [
    "taskmanagerTask.alarm-detector.Assign.alarmDefinitionId metricsType.process.nodeId.x.alarmDefinitionId.measurement.field rule=1",
    "taskmanagerTask.*.*.*.*                                 metricsType.process.nodeId.measurement rule=2"
  ]
[[outputs.file]]
  files = ["stdout"]
  data_format = "influx"
  flush_interval = "1s"

System info:

Telegraf 1.10.4

Steps to reproduce:

  1. telegraf --config conf-from-bug-report.conf
  2. echo "taskmanagerTask.alarm-detector.Assign.alarmDefinitionId.timeout_errors.duration.p75 100 $(date +%s)" | nc 127.0.0.1 2000
     echo "taskmanagerTask.alarm-detector.Assign.numRecordsInPerSecond.m5_rate 100 $(date +%s)" | nc 127.0.0.1 2000


Expected behavior:

Telegraf output:

duration,alarmDefinitionId=timeout_errors,metricsType=taskmanagerTask,nodeId=Assign,process=alarm-detector,rule=1,x=alarmDefinitionId p75=100 1558538449000000000
numRecordsInPerSecond,metricsType=taskmanagerTask,nodeId=Assign,process=alarm-detector,rule=2 value=100 1558537634000000000
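
For readers unfamiliar with Graphite templates, here is a minimal Python sketch of how a template maps a dotted name onto a measurement, tags, and a field, which is what produces the two expected lines above. It is a simplification for illustration only, not Telegraf's Go implementation; `apply_template` and its parameters are hypothetical names, and wildcard keywords such as `measurement*` are not handled.

```python
def apply_template(template, extra_tags, name):
    """Map a dotted Graphite name onto (measurement, tags, field).

    Hypothetical helper: each template position names a tag, or is the
    keyword "measurement" or "field". Name parts beyond the template
    length are dropped (zip stops at the shorter sequence).
    """
    tags = dict(extra_tags)
    measurement, field = [], []
    for key, part in zip(template.split("."), name.split(".")):
        if key == "measurement":
            measurement.append(part)
        elif key == "field":
            field.append(part)
        elif key:  # an empty key means "skip this position"
            tags[key] = part
    # Assumption: with no field position, the field name defaults to "value".
    return ".".join(measurement), tags, ".".join(field) or "value"


measurement, tags, field = apply_template(
    "metricsType.process.nodeId.measurement",
    {"rule": "2"},
    "taskmanagerTask.alarm-detector.Assign.numRecordsInPerSecond.m5_rate",
)
print(measurement, tags, field)
```

Applied to the second input line with the rule=2 template, the trailing `m5_rate` part falls outside the template and is dropped, so the field name falls back to `value`.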

Actual behavior:

Telegraf output:

duration,alarmDefinitionId=timeout_errors,metricsType=taskmanagerTask,nodeId=Assign,process=alarm-detector,rule=1,x=alarmDefinitionId p75=100 1558538449000000000
taskmanagerTask.alarm-detector.Assign.numRecordsInPerSecond.m5_rate value=100 1558537531000000000

Additional info:

I get the expected behavior when I remove the first template, but I need both templates to work.

@danielnelson danielnelson added the bug unexpected problem or unintended behavior label May 22, 2019
@GeorgeMac GeorgeMac self-assigned this Jul 16, 2019

GeorgeMac commented Jul 16, 2019

After some digging into this problem, I have found some behavior that needs clarifying.

What I found is hard to express in a PR, so bear with me; I am happy to explain in more detail.

The parse tree for these filters looks like this:

[taskmanagerTask]
       |            \
[alarm-detector]    [*]
       |             |
   [Assign]         [*]
       |             |
[alarmDefinitionId] [*]
                     |
                    [*]

Precedence runs from left to right, since we prioritise exact matches over wildcards.
As soon as an incoming line starts to match, we descend one branch for as long as matches occur. If at any point we stop making matches, we bail and fall back to the default template. The default template produces the output you are observing.

So your line taskmanagerTask.alarm-detector.Assign.numRecordsInPerSecond.m5_rate matches down the search tree as far as taskmanagerTask.alarm-detector.Assign, and then alarmDefinitionId != numRecordsInPerSecond. The search bails at this point rather than going back to other branches that could match, because it forked away from the wildcard filter early on, when it exactly matched alarm-detector.
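
The behavior described above can be reproduced with a small Python sketch. This is a simplified model under stated assumptions, not the actual Telegraf parser code: `Node`, `build_tree`, and `greedy_search` are hypothetical names, a filter is assumed to match any line it is a prefix of, and a result of `None` stands for "fall back to the default template".

```python
class Node:
    """One path segment in the filter tree (hypothetical structure)."""

    def __init__(self):
        self.children = {}
        self.template = None  # set only where a filter ends


def build_tree(filters):
    """Build the tree from a mapping of filter -> template string."""
    root = Node()
    for filt, template in filters.items():
        node = root
        for part in filt.split("."):
            node = node.children.setdefault(part, Node())
        node.template = template
    return root


def greedy_search(root, name):
    """Descend one branch only: exact match first, then wildcard, never back."""
    node = root
    for part in name.split("."):
        nxt = node.children.get(part) or node.children.get("*")
        if nxt is None:
            break  # stop descending; other branches are never revisited
        node = nxt
    return node.template  # None models the default template


filters = {
    "taskmanagerTask.alarm-detector.Assign.alarmDefinitionId":
        "metricsType.process.nodeId.x.alarmDefinitionId.measurement.field rule=1",
    "taskmanagerTask.*.*.*.*":
        "metricsType.process.nodeId.measurement rule=2",
}
tree = build_tree(filters)
line = "taskmanagerTask.alarm-detector.Assign.numRecordsInPerSecond.m5_rate"
print(greedy_search(tree, line))
```

In this model the search for the second line stops at the `Assign` node, which carries no template, so the result is `None` even though the wildcard filter would have matched the whole line.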

@danielnelson @glinton Can you confirm or deny that the desired behavior in this scenario is to continue searching the other candidates that also matched the early parts of the line?

danielnelson commented

It seems to me that we should backtrack and try the next pattern.
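
As a rough illustration of what backtracking could look like, here is a Python sketch. It is not the change that was eventually merged and not Telegraf's Go code: `Node`, `build_tree`, and `backtracking_search` are hypothetical names, and a filter is assumed to match any line it is a prefix of.

```python
class Node:
    """One path segment in the filter tree (hypothetical structure)."""

    def __init__(self):
        self.children = {}
        self.template = None  # set only where a filter ends


def build_tree(filters):
    """Build the tree from a mapping of filter -> template string."""
    root = Node()
    for filt, template in filters.items():
        node = root
        for part in filt.split("."):
            node = node.children.setdefault(part, Node())
        node.template = template
    return root


def backtracking_search(node, parts):
    """Try the exact child first, then the wildcard child; if a branch
    yields no template, return to this node and try the next candidate."""
    if parts:
        head, rest = parts[0], parts[1:]
        for key in (head, "*"):
            child = node.children.get(key)
            if child is not None:
                found = backtracking_search(child, rest)
                if found is not None:
                    return found
    return node.template  # None means no filter matched along this path


filters = {
    "taskmanagerTask.alarm-detector.Assign.alarmDefinitionId":
        "metricsType.process.nodeId.x.alarmDefinitionId.measurement.field rule=1",
    "taskmanagerTask.*.*.*.*":
        "metricsType.process.nodeId.measurement rule=2",
}
tree = build_tree(filters)
line = "taskmanagerTask.alarm-detector.Assign.numRecordsInPerSecond.m5_rate"
print(backtracking_search(tree, line.split(".")))
```

Exact matches still take precedence, so the first line from the report keeps resolving to the rule=1 template, while the second line now falls through to the wildcard branch and resolves to rule=2.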
