telegraf fails to start if its wavefront output fails to resolve hostname #4357

Closed
dimas opened this issue Jun 28, 2018 · 2 comments

dimas commented Jun 28, 2018

Note that I already reported a similar issue a while ago (#3545), which has since been resolved.

Relevant telegraf.conf:

[[outputs.wavefront]]
  host = "no-such-host"
  port = 2878
  # Do not convert underscores we have in metrics to dots
  convert_paths = false

System info:

$ uname -a
Linux ip-10-0-164-199 3.13.0-149-generic #199-Ubuntu SMP Thu May 17 10:12:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

$ dpkg -l | grep telegraf
ii  telegraf                                  1.6.2-1                                    amd64        Plugin-driven server agent for reporting metrics into InfluxDB.

Expected behavior:

$ sudo service telegraf start
Starting the process telegraf [ OK ]
telegraf process was started [ OK ]

This should start the daemon, which should then keep retrying the Wavefront connection indefinitely.

Actual behavior:

The /var/log/telegraf/telegraf.log shows:

2018-06-28T12:24:11Z E! Failed to connect to output wavefront, retrying in 15s, error was 'Wavefront: TCP address cannot be resolved lookup no-such-host on 10.0.0.2:53: no such host' 
2018-06-28T12:24:26Z E! Wavefront: TCP address cannot be resolved lookup no-such-host on 10.0.0.2:53: no such host

and then the process terminates.
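
For what it's worth, the error text reads as if the hostname is resolved up front at connect time and the resolution failure is treated as fatal. A minimal sketch of that failure mode, assuming the plugin resolves the address with net.ResolveTCPAddr (an assumption, not a reading of the plugin source):

package main

import (
    "fmt"
    "net"
)

func main() {
    // With an unresolvable host the lookup fails immediately; if the
    // caller treats this error as fatal, the agent exits instead of
    // retrying.
    addr, err := net.ResolveTCPAddr("tcp", "no-such-host:2878")
    if err != nil {
        fmt.Printf("Wavefront: TCP address cannot be resolved %v\n", err)
        return
    }
    fmt.Println("resolved:", addr)
}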

@glinton
Contributor

glinton commented Jun 28, 2018

I worry that retrying indefinitely without explicitly configuring telegraf to do so could create a false sense of it "just working." I think a connection retry config option would address this: if set to a value less than 0 it would retry forever, otherwise it would retry up to that number of times. It would also be beneficial for this retrier to use a connection backoff that waits a bit longer on each retry (up to a point) so it doesn't end up spamming the connection/logs.
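
A minimal sketch of what such a retry helper could look like; this is not telegraf's implementation, and connectWithRetry, the dial timeout, and the backoff values are all hypothetical:

package main

import (
    "errors"
    "fmt"
    "net"
    "time"
)

// connectWithRetry: maxRetries < 0 means retry forever; otherwise give up
// after maxRetries additional attempts. The wait doubles after each failure,
// capped at maxBackoff, so a dead endpoint doesn't flood the logs.
func connectWithRetry(addr string, maxRetries int, maxBackoff time.Duration) (net.Conn, error) {
    backoff := time.Second
    for attempt := 0; maxRetries < 0 || attempt <= maxRetries; attempt++ {
        conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
        if err == nil {
            return conn, nil
        }
        fmt.Printf("E! connect attempt %d failed: %v, retrying in %s\n", attempt+1, err, backoff)
        time.Sleep(backoff)
        backoff *= 2
        if backoff > maxBackoff {
            backoff = maxBackoff
        }
    }
    return nil, errors.New("gave up connecting to " + addr)
}

func main() {
    // Example usage with a finite retry budget against the host from the
    // issue's config.
    conn, err := connectWithRetry("no-such-host:2878", 3, time.Minute)
    if err != nil {
        fmt.Println("E!", err)
        return
    }
    defer conn.Close()
}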

@danielnelson
Contributor

We have an existing issue open for this in #3723.
