Ping input does not report readings with duration of 0 correctly #3503

Closed
xkilian opened this issue Nov 25, 2017 · 9 comments
Labels
bug unexpected problem or unintended behavior
@xkilian

xkilian commented Nov 25, 2017

Directions

When using the ping plugin, the output is inconsistent when used in conjunction with the basicstats aggregator.

Bug report

Relevant telegraf.conf:

System info:

Windows 10 and windows server 2008 (that I tested)
Telegraf master from two weeks ago.

Steps to reproduce:

  1. Configure Telegraf with ping input, basicstats and graphite output

Expected behavior:

Receive all series from the graphite output.

Actual behavior:

Systems are pinged as expected but...
Some series end up missing altogether (not always the same ones); some appear and disappear. Some are only missing the metrics associated with basicstats (e.g. mean).

Additional info:

I will try without basicstats to see if it is related.

Feature request

1 - Have it work consistently.
2 - Do not depend on OS ping output (use a native Go pinger), or create a win_ping plugin that correctly handles the various particularities of the Windows ping output.

@danielnelson
Contributor

Can you add your config file and some example output in line protocol format, just enough to show the issue? You can use a file output like:

[[outputs.file]]
  files = ["ping.log"]
  namepass = ["ping"]
  data_format = "influx"

Adding native Go ping is #2833

@xkilian
Author

xkilian commented Nov 29, 2017

I will get the information today, I will have access to the servers.
Issue is seen with Windows server 2008, not in Windows 10.

@xkilian
Author

xkilian commented Nov 29, 2017

  • The file output data_format : influxdb
    all expected data is there. CORRECTION : Nope I am blind, influxdb is also missing average/max...

  • The file output data_format : graphite
    average, max response times are missing or show up only occasionally. Count, number of packets sent/received, etc are all present.

I will send an example output in a couple hours.
I was not running any aggregator.
I have a single Graphite output destination.

@danielnelson
Contributor

Very strange. I look forward to seeing the output. Please make sure to include some "good" influxdb output so I can feed it back into Telegraf and hopefully reproduce the missing output.

@pstatho

pstatho commented Nov 29, 2017

Not entirely sure if this is the same issue that @xkilian is having, but my issue is that I get no response-time metrics when the latency is low, i.e. 0ms. That is because of a check in the ping plugin's code.

On Windows, when pinging internal network devices the response time may be 0ms such as:

C:\Windows\System32>ping 10.11.1.1
Pinging 10.11.1.1 with 32 bytes of data:
Reply from 10.11.1.1: bytes=32 time<1ms TTL=64
Reply from 10.11.1.1: bytes=32 time<1ms TTL=64
Reply from 10.11.1.1: bytes=32 time<1ms TTL=64
Reply from 10.11.1.1: bytes=32 time<1ms TTL=64

Ping statistics for 10.11.1.1:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 0ms, Average = 0ms

So when that value is parsed as a value of 0, no response time metrics (average_response_ms, minimum_response_ms, and maximum_response_ms) are returned. Those values should be returned even if the value is 0.

I recommend initializing the values to -1 and, if there is an issue parsing, returning -1 to inform the user that there is a problem. Returning no values at all for those metrics is not a good idea.
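To illustrate the suggestion above, here is a minimal sketch in Python (not the plugin's actual Go code). The function name `parse_windows_rtt` and the regular expression are hypothetical; the field names and the -1 sentinel come from this thread. It treats 0 as a valid reading and reserves -1 for output that could not be parsed, instead of dropping the fields when the value is 0.

```python
import re

# Sentinel proposed in this thread: -1 means "could not parse".
# 0 is a legitimate reading (Windows prints "time<1ms" and a 0ms summary).
PARSE_ERROR = -1

# Hypothetical pattern for the summary line of English-locale Windows ping output.
_RTT_RE = re.compile(r"Minimum = (\d+)ms, Maximum = (\d+)ms, Average = (\d+)ms")


def parse_windows_rtt(output: str) -> dict:
    """Return the three response-time fields, always present."""
    fields = {
        "minimum_response_ms": PARSE_ERROR,
        "maximum_response_ms": PARSE_ERROR,
        "average_response_ms": PARSE_ERROR,
    }
    match = _RTT_RE.search(output)
    if match is None:
        # No summary line found: keep the sentinel so the problem is visible.
        return fields

    minimum, maximum, average = (int(group) for group in match.groups())
    # Assign unconditionally. A guard such as `if average > 0:` (assumed here
    # to resemble the problematic check) would silently drop 0ms readings.
    fields["minimum_response_ms"] = minimum
    fields["maximum_response_ms"] = maximum
    fields["average_response_ms"] = average
    return fields


if __name__ == "__main__":
    sample = "    Minimum = 0ms, Maximum = 0ms, Average = 0ms"
    print(parse_windows_rtt(sample))
```

With this approach a dashboard can distinguish "fast reply" (0) from "parse failure" (-1), and the series always exists in the output.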

The reason I want to plot a zero (0) value is that congestion may occur at some point, so I want to see low values, including zero, and anything over 20ms can trigger an alert. As it stands, with no metric, I can't even create a graph, since InfluxDB doesn't have that value at all.

@xkilian
Author

xkilian commented Nov 30, 2017

That is exactly it. Which is why the only values I would sometimes see are 1 and 2. Never 0.

@xkilian
Author

xkilian commented Nov 30, 2017

graphite output:

telegraf.10_116.ping.reply_received 2 1511992562
telegraf.10_116.ping.packets_received 2 1511992562
telegraf.10_116.ping.percent_packet_loss 0 1511992562
telegraf.10_116.ping.percent_reply_loss 0 1511992562
telegraf.10_116.ping.maximum_response_ms 1 1511992562
telegraf.10_116.ping.result_code 0 1511992562
telegraf.10_116.ping.packets_transmitted 2 1511992562
telegraf.10_103.ping.packets_transmitted 2 1511992562
telegraf.10_103.ping.reply_received 2 1511992562
telegraf.10_103.ping.packets_received 2 1511992562
telegraf.10_103.ping.percent_packet_loss 0 1511992562
telegraf.10_103.ping.percent_reply_loss 0 1511992562
telegraf.10_103.ping.result_code 0 1511992562
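A quick way to see which response-time series are absent from output like the above is to group the Graphite lines by their metric prefix. This is a rough Python sketch; the helper name `missing_rtt_fields` is hypothetical, and it assumes the plain-text Graphite format `path value timestamp` with the field name as the last path component.

```python
from collections import defaultdict

# Response-time fields named earlier in this thread.
EXPECTED = {"minimum_response_ms", "maximum_response_ms", "average_response_ms"}


def missing_rtt_fields(lines):
    """Map each metric prefix to the response-time fields it lacks."""
    seen = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip anything that is not "path value timestamp"
        prefix, _, field = parts[0].rpartition(".")
        seen[prefix].add(field)
    return {
        prefix: sorted(EXPECTED - fields)
        for prefix, fields in seen.items()
        if EXPECTED - fields
    }


if __name__ == "__main__":
    sample = [
        "telegraf.10_116.ping.maximum_response_ms 1 1511992562",
        "telegraf.10_116.ping.result_code 0 1511992562",
        "telegraf.10_103.ping.result_code 0 1511992562",
    ]
    for prefix, missing in missing_rtt_fields(sample).items():
        print(prefix, "is missing", ", ".join(missing))
```

Run against the lines above, this would flag the 10_116 host as lacking the average and minimum fields and the 10_103 host as lacking all three.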

@xkilian
Author

xkilian commented Nov 30, 2017

Both the influxdb and graphite outputs are the same as pstatho described. Not sure why I thought I was seeing average and max in the influxdb output. I guess I was tired, and it doesn't help that the influx output data is unsorted, which is kind of annoying, but I guess that is another issue.

@danielnelson changed the title from "Ping output behaviour in windows is erratic" to "Ping input does not report readings with duration of 0 correctly" on Nov 30, 2017
@danielnelson added the "bug" (unexpected problem or unintended behavior) label and removed the "need more info" label on Nov 30, 2017
@danielnelson added this to the 1.5.3 milestone on Feb 9, 2018
@danielnelson
Contributor

Closed by #3503, thanks to @efficks
