Ping input does not report readings with duration of 0 correctly #3503

xkilian · 2017-11-25T04:10:24Z

Directions

The output of the ping plugin is inconsistent when it is used in conjunction with the basicstats aggregator.

Bug report

Relevant telegraf.conf:

System info:

Windows 10 and Windows Server 2008 (the ones I tested)
Telegraf master from two weeks ago.

Steps to reproduce:

Configure Telegraf with ping input, basicstats and graphite output

Expected behavior:

Receive all series from the graphite output.

Actual behavior:

Systems are pinged as expected but...
Some series end up missing altogether (not always the same ones), and some appear and disappear. Some are missing only the metrics associated with basicstats (e.g. mean).

Additional info:

I will try without basicstats to see if it is related.

Feature request

1 - Have it work consistently.
2 - Do not depend on OS ping output (native Go pinger), or create a win_ping plugin that correctly handles the particularities of the Windows ping output.

danielnelson · 2017-11-27T20:55:24Z

Can you add your config file and some example output in line protocol format, just enough to show the issue? You can use a file output like:

[[outputs.file]]
  files = ["ping.log"]
  namepass = ["ping"]
  data_format = "influx"

Adding a native Go ping is tracked in #2833.

xkilian · 2017-11-29T15:38:47Z

I will get the information today, when I have access to the servers.
The issue is seen on Windows Server 2008, not on Windows 10.

xkilian · 2017-11-29T22:12:57Z

File output with data_format = "influx":
all expected data is there. CORRECTION: no, I misread it, the influx output is also missing average/max.
File output with data_format = "graphite":
average and max response times are missing or show up only occasionally. Count, number of packets sent/received, etc. are all present.

I will send an example output in a couple hours.
I was not running any aggregator.
I have a single Graphite output destination.

danielnelson · 2017-11-29T22:40:18Z

Very strange. I look forward to seeing the output. Please make sure to include some "good" influx output so I can feed it back into Telegraf and hopefully reproduce the missing output.

pstatho · 2017-11-29T23:36:57Z

Not entirely sure this is the same issue that @xkilian is having, but my issue is that I get no response-time metrics when the latency is low, i.e. 0ms. That is because of the check done here.

On Windows, when pinging internal network devices the response time may be 0ms such as:

C:\Windows\System32>ping 10.11.1.1
Pinging 10.11.1.1 with 32 bytes of data:
Reply from 10.11.1.1: bytes=32 time<1ms TTL=64
Reply from 10.11.1.1: bytes=32 time<1ms TTL=64
Reply from 10.11.1.1: bytes=32 time<1ms TTL=64
Reply from 10.11.1.1: bytes=32 time<1ms TTL=64

Ping statistics for 10.11.1.1:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 0ms, Average = 0ms

So when that value is parsed as a value of 0, no response time metrics (average_response_ms, minimum_response_ms, and maximum_response_ms) are returned. Those values should be returned even if the value is 0.

I recommend initializing the values to -1 and, if there is an issue parsing, returning -1 to inform the user that there is a problem. Returning no values at all for those metrics is not a good idea.

The reason I want to plot a zero (0) value is that congestion may occur at some point, and I want to see low values, including zero, so that anything over 20ms can raise an alert. As it stands, with no metric, I can't even create a graph since InfluxDB doesn't have that value.
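To make the suggestion above concrete, here is a minimal Go sketch of the idea, not the actual Telegraf code. The function names `parseRTT` and `rttFields` and the regular expression are assumptions made for illustration, and the pattern only covers the English-locale summary line shown above. The values start at -1, and a field is emitted whenever the parsed value is `>= 0`, so a genuine 0ms reading is still reported.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// rttRe matches the summary line of English-locale Windows ping output, e.g.
// "    Minimum = 0ms, Maximum = 0ms, Average = 0ms".
// (Hypothetical pattern for this sketch; other locales differ.)
var rttRe = regexp.MustCompile(`Minimum = (\d+)ms, Maximum = (\d+)ms, Average = (\d+)ms`)

// parseRTT returns the minimum, maximum and average round-trip times in ms.
// Each value is -1 when the summary line is absent or cannot be parsed,
// which keeps a real 0ms reading distinguishable from "no reading".
func parseRTT(out string) (minMs, maxMs, avgMs int) {
	minMs, maxMs, avgMs = -1, -1, -1
	m := rttRe.FindStringSubmatch(out)
	if m == nil {
		return
	}
	if v, err := strconv.Atoi(m[1]); err == nil {
		minMs = v
	}
	if v, err := strconv.Atoi(m[2]); err == nil {
		maxMs = v
	}
	if v, err := strconv.Atoi(m[3]); err == nil {
		avgMs = v
	}
	return
}

// rttFields builds the response-time fields. The check is ">= 0" rather
// than "> 0", so a 0ms value is included instead of being dropped.
func rttFields(minMs, maxMs, avgMs int) map[string]interface{} {
	fields := map[string]interface{}{}
	if minMs >= 0 {
		fields["minimum_response_ms"] = minMs
	}
	if maxMs >= 0 {
		fields["maximum_response_ms"] = maxMs
	}
	if avgMs >= 0 {
		fields["average_response_ms"] = avgMs
	}
	return fields
}

func main() {
	out := "Approximate round trip times in milli-seconds:\n" +
		"    Minimum = 0ms, Maximum = 0ms, Average = 0ms\n"
	// All three response-time fields are present, each with value 0.
	fmt.Println(rttFields(parseRTT(out)))
}
```

Whether an unparsed value should be omitted (as in this sketch) or written out as -1 is a design choice; the sentinel only needs to be something that a real reading can never produce.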

xkilian · 2017-11-30T02:34:08Z

That is exactly it. Which is why the only values I would sometimes see are 1 and 2. Never 0.

xkilian · 2017-11-30T02:37:02Z

graphite output:
telegraf.10_116.ping.reply_received 2 1511992562
telegraf.10_116.ping.packets_received 2 1511992562
telegraf.10_116.ping.percent_packet_loss 0 1511992562
telegraf.10_116.ping.percent_reply_loss 0 1511992562
telegraf.10_116.ping.maximum_response_ms 1 1511992562
telegraf.10_116.ping.result_code 0 1511992562
telegraf.10_116.ping.packets_transmitted 2 1511992562
telegraf.10_103.ping.packets_transmitted 2 1511992562
telegraf.10_103.ping.reply_received 2 1511992562
telegraf.10_103.ping.packets_received 2 1511992562
telegraf.10_103.ping.percent_packet_loss 0 1511992562
telegraf.10_103.ping.percent_reply_loss 0 1511992562
telegraf.10_103.ping.result_code 0 1511992562

xkilian · 2017-11-30T02:41:43Z

Both the influx and graphite outputs match what pstatho described. Not sure why I thought I was seeing average and max in the influx output. I guess I was tired, and it doesn't help that the influx output data is unsorted, which is kind of annoying, but I guess that is another issue.

danielnelson · 2018-02-09T20:14:57Z

Closed by #3778, thanks to @efficks

danielnelson added the need more info label Nov 27, 2017

danielnelson changed the title ~~Ping output behaviour in windows is erratic~~ Ping input does not report readings with duration of 0 correctly Nov 30, 2017

danielnelson added bug unexpected problem or unintended behavior and removed need more info labels Nov 30, 2017

efficks mentioned this issue Feb 9, 2018

Fixes #3503 -- Ping can return a response time of 0 #3778

Merged

danielnelson added this to the 1.5.3 milestone Feb 9, 2018

danielnelson closed this as completed Feb 9, 2018
