Update readme for ping input to clarify supported platforms/ping versions #5665

Closed
victorhooi opened this issue Apr 2, 2019 · 22 comments
Labels
docs Issues related to Telegraf documentation and configuration descriptions

Comments

@victorhooi

The current ping plugin appears to depend on iputils-ping, per the README.

However, this package is Linux-specific and is not supported on any other platform (e.g. FreeBSD, Windows).

Is there a specific reason we require the Linux version of ping? Could there be some mode or functionality that also works on, say, FreeBSD?

(My specific use case is for use on a pfSense box, with Telegraf installed in order to provide latency stats).

@sawo1337

sawo1337 commented Apr 2, 2019

I use it on Windows without issues? Windows Server 2016 box.

@glinton
Contributor

glinton commented Apr 2, 2019

@victorhooi What pfSense version are you using? From what I can see in pfSense 2.4.4, ping has the same options and output that Telegraf expects. Did you try it and see it fail?

@glinton
Contributor

glinton commented Apr 2, 2019

I just confirmed the ping input plugin works on pfSense 2.4.4. Maybe we need to update the README.

@glinton changed the title from "Telegraf ping plugin only works on Linux - but not FreeBSD, Windows etc." to "Update readme for ping input to clarify supported platforms/ping versions" on Apr 2, 2019
@glinton added the "docs" label (Issues related to Telegraf documentation and configuration descriptions) and removed the "need more info" label on Apr 2, 2019
@victorhooi
Author

Oh - that's fantastic news!

Sorry, yes, I was going off the Telegraf README which seemed to suggest I needed the Linux-only version of ping. (I was actually curious why this was).

Are you able to share your pfSense telegraf config? I can test it on one of my instances today.

@glinton
Contributor

glinton commented Apr 4, 2019

I just ran the ping plugin on it, so it's purely a POC config, nothing useful whatsoever:

[agent]
  interval="1s"
  flush_interval="1s"
  omit_hostname=true

[[inputs.ping]]
  urls = ["someurl.com"]
  count = 3

[[outputs.file]]
  files = ["stdout"]

@victorhooi
Author

I added the following as a custom directive for Telegraf in pfSense:

[[inputs.ping]]
  urls = ["example.org"]
  count = 3

(My Telegraf package is already configured to output to InfluxDB, and I can confirm that works. For the agent config, I believe pfSense Telegraf already defaults to an interval of 1.0 second, and I think ping should still work with the default flush_interval, and with setting hostnames?)

However, when I check InfluxDB, the ping plugin only seems to return a single field (result_code):

> select * FROM ping
name: ping
time                host                   result_code url
----                ----                   ----------- ---
1554363432000000000 ang-router.localdomain 0           example.org
1554363440000000000 ang-router.localdomain 0           example.org
1554363450000000000 ang-router.localdomain 0           example.org
1554363460000000000 ang-router.localdomain 0           example.org
1554363470000000000 ang-router.localdomain 0           example.org
1554363480000000000 ang-router.localdomain 0           example.org
1554363490000000000 ang-router.localdomain 0           example.org
1554363500000000000 ang-router.localdomain 0           example.org
1554363510000000000 ang-router.localdomain 0           example.org
1554363520000000000 ang-router.localdomain 0           example.org
1554363530000000000 ang-router.localdomain 0           example.org
1554363540000000000 ang-router.localdomain 0           example.org
1554363550000000000 ang-router.localdomain 0           example.org
1554363560000000000 ang-router.localdomain 0           example.org
1554363570000000000 ang-router.localdomain 0           example.org
1554363580000000000 ang-router.localdomain 0           example.org

Why is it not writing the other fields (e.g. packets_transmitted, packets_received, percent_packet_loss)?

Super confused...

@victorhooi
Author

Looking at #4613 - could it be that the output format is different somehow?

@victorhooi
Author

I saw this post and there was a suggestion to try the following command line:

ping -c 8 -n -s 16 -i 1.0 -W 1.0 8.8.8.8

I ran this on an Ubuntu host:

victorhooi@unifi-monitoring:~$ ping -c 8 -n -s 16 -i 1.0 -W 1.0 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 16(44) bytes of data.
24 bytes from 8.8.8.8: icmp_seq=1 ttl=59 time=0.965 ms
24 bytes from 8.8.8.8: icmp_seq=2 ttl=59 time=0.567 ms
24 bytes from 8.8.8.8: icmp_seq=3 ttl=59 time=0.575 ms
24 bytes from 8.8.8.8: icmp_seq=4 ttl=59 time=0.527 ms
24 bytes from 8.8.8.8: icmp_seq=5 ttl=59 time=0.659 ms
24 bytes from 8.8.8.8: icmp_seq=6 ttl=59 time=0.789 ms
24 bytes from 8.8.8.8: icmp_seq=7 ttl=59 time=0.677 ms
24 bytes from 8.8.8.8: icmp_seq=8 ttl=59 time=0.743 ms

--- 8.8.8.8 ping statistics ---
8 packets transmitted, 8 received, 0% packet loss, time 142ms
rtt min/avg/max/mdev = 0.527/0.687/0.965/0.138 ms

I then ran this on pfSense:

[2.4.4-RELEASE][admin@ang-router.localdomain]/root: ping -c 8 -n -s 16 -i 1.0 -W 1.0 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 16 data bytes

--- 8.8.8.8 ping statistics ---
8 packets transmitted, 8 packets received, 0.0% packet loss, 8 packets out of wait time
round-trip min/avg/max/stddev = 11.254/23.166/41.044/10.736 ms

Does that tell us anything useful?

Also, it might be nice to add to the ping plugin README the default command line it runs under the hood, in case that helps diagnosis.
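To make the difference between the two summaries above concrete, here is a minimal Python sketch that parses both. It is only an illustration of how the formats differ and is not how Telegraf parses ping output; the names `parse_summary`, `COUNTS`, and `RTT` are made up for this example, and the sample text is copied from the outputs above.

```python
import re

# Summary lines copied from the Ubuntu and pfSense outputs above.
LINUX_SUMMARY = """\
8 packets transmitted, 8 received, 0% packet loss, time 142ms
rtt min/avg/max/mdev = 0.527/0.687/0.965/0.138 ms"""

FREEBSD_SUMMARY = """\
8 packets transmitted, 8 packets received, 0.0% packet loss, 8 packets out of wait time
round-trip min/avg/max/stddev = 11.254/23.166/41.044/10.736 ms"""

# "packets" before "received" is optional: the pfSense output has it,
# the Ubuntu output does not.
COUNTS = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received, ([\d.]+)% packet loss"
)
# The label differs (rtt/mdev vs round-trip/stddev); the four numbers
# after "=" have the same layout in both outputs.
RTT = re.compile(r"= ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")


def parse_summary(text):
    """Return counters and round-trip times from a ping summary (hypothetical helper)."""
    counts = COUNTS.search(text)
    rtt = RTT.search(text)
    if counts is None or rtt is None:
        raise ValueError("unrecognised ping summary")
    return {
        "transmitted": int(counts.group(1)),
        "received": int(counts.group(2)),
        "loss_percent": float(counts.group(3)),
        "min_ms": float(rtt.group(1)),
        "avg_ms": float(rtt.group(2)),
        "max_ms": float(rtt.group(3)),
        "dev_ms": float(rtt.group(4)),
    }
```

Calling `parse_summary(LINUX_SUMMARY)` and `parse_summary(FREEBSD_SUMMARY)` both succeed here, which suggests the summary wording alone is workable as long as the parser tolerates the extra "packets" and the different label.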

@glinton
Contributor

glinton commented Apr 4, 2019

How did you install Telegraf on your pfSense box? The built-in package manager version is 1.6.3. I'm going to assume it's that version, as 1.10.x works fine. Until that gets updated, you can (and should) manually install a newer version. It's as simple as dropping the newer binary over the top of the old one.

@victorhooi
Author

On FreeBSD, "-W" sets the timeout in milliseconds.

On Linux, "-W" sets the timeout in seconds.

https://unix.stackexchange.com/questions/63651/what-is-the-difference-between-ping-w-and-ping-w
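As a rough illustration of that unit difference, the sketch below builds the -W argument from a timeout given in seconds. It is not taken from the Telegraf source; `wait_flag` and its `platform` labels are hypothetical names used only for this example.

```python
def wait_flag(timeout_seconds, platform):
    """Build the -W argument for a per-reply timeout given in seconds.

    `platform` is a hypothetical label ("linux" or "freebsd"), not a
    Telegraf setting.
    """
    if platform == "linux":
        # Linux ping: -W takes seconds.
        value = timeout_seconds
    elif platform == "freebsd":
        # FreeBSD ping: -W takes milliseconds.
        value = timeout_seconds * 1000
    else:
        raise ValueError(f"unknown platform: {platform}")
    # "g" formatting drops a trailing ".0", so 1000.0 becomes "1000".
    return ["-W", f"{value:g}"]
```

With a one-second timeout, `wait_flag(1.0, "linux")` gives `["-W", "1"]` and `wait_flag(1.0, "freebsd")` gives `["-W", "1000"]`, so passing the same number on both platforms would mean very different waits.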

Also, I realised you can run telegraf with --test on the pfSense box to help debug some issues:

/usr/local/bin/telegraf -config=/usr/local/etc/telegraf.conf --test

Anyhow, I tried to set the ping timeout in my telegraf.conf:

[[inputs.ping]]
  urls = ["example.org"]
  count = 3
  timeout = 1000.0

but I get an error:

* Plugin: inputs.ping, Collection 1
2019-04-04T17:35:07Z E! Error in plugin [inputs.ping]: host example.org: ping: illegal option -- w
usage: ping [-AaDdfnoQqRrv] [-c count] [-G sweepmaxsize] [-g sweepminsize]
            [-h sweepincrsize] [-i wait] [-l preload] [-M mask | time] [-m ttl]
            [-P policy] [-p pattern] [-S src_addr] [-s packetsize] [-t timeout]
            [-W waittime] [-z tos] host
       ping [-AaDdfLnoQqRrv] [-c count] [-I iface] [-i wait] [-l preload]
            [-M mask | time] [-m ttl] [-P policy] [-p pattern] [-S src_addr]
            [-s packetsize] [-T ttl] [-t timeout] [-W waittime]
            [-z tos] mcast-group, exit status 64
> ping,host=ang-router.localdomain,url=example.org result_code=0i 1554399307000000000

I thought timeout was meant to use -W (uppercase), not -w (lowercase)?

Sorry for the number of comments, trying to include all the steps taken and information as I try to troubleshoot this myself, in case it helps.
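Since --test prints each point in line protocol, a missing field can also be spotted mechanically. The sketch below is a deliberately simplistic parser: it assumes no escaped spaces or commas in the line, which holds for the point shown above but not for line protocol in general. The helper names `field_names` and `missing_fields` are made up for this example.

```python
# Point copied from the --test output above.
SAMPLE = "> ping,host=ang-router.localdomain,url=example.org result_code=0i 1554399307000000000"


def field_names(line):
    """Return the set of field keys in one line-protocol point.

    Simplification: assumes exactly three space-separated parts
    (series, fields, timestamp) and no escaped characters.
    """
    if line.startswith("> "):
        line = line[2:]  # drop the "> " prefix seen in the --test output
    _series, fields, _timestamp = line.strip().split(" ")
    return {pair.split("=", 1)[0] for pair in fields.split(",")}


def missing_fields(line, expected):
    """Return the expected field keys that the point does not carry, sorted."""
    return sorted(set(expected) - field_names(line))
```

For the point above, `missing_fields(SAMPLE, ["packets_transmitted", "packets_received", "result_code"])` returns `["packets_received", "packets_transmitted"]`, matching what shows up in InfluxDB: only result_code is being written.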

@victorhooi
Author

Ah yes - I am using the inbuilt pfSense package, which is indeed 1.6.3. This is running on a Netgate XG-7100 (amd64).

I downloaded the latest 1.10.2 binary from here using curl.

I had to stop the Telegraf service on pfSense, as FreeBSD complained about /usr/local/bin/telegraf being busy:

[2.4.4-RELEASE][admin@ang-router.localdomain]/usr/local/bin: cp /tmp/telegraf/usr/bin/telegraf .
cp: ./telegraf: Text file busy

I then restarted the service, and can confirm it all works now!

Was it just a bug with the ping plugin in the older Telegraf?

Also - I noticed there's no arm package for FreeBSD on releases. Is that intentional? (This is for devices like the Netgate SG-3100, which I think is arm64 - there's this PR for pfSense to build the package, but if Influx also provides a binary, it means I can drop-in replace as I did above).

@glinton
Contributor

glinton commented Apr 4, 2019

no arm package for FreeBSD on releases. Is that intentional?

I'm not certain. It could be, though, due to lack of demand.

@danielnelson
Contributor

You may want to consider using the packages from FreeBSD ports https://www.freshports.org/net-mgmt/telegraf/

@victorhooi
Author

I believe pfSense pulls from FreeBSD ports - but they're usually delayed by a few months (or longer, in some cases, I believe).

Using the drop-in replacement binary, as @glinton suggested above, worked well on the Netgate XG-7100 (x64) hardware.

Netgate also makes several ARM-based devices. It would be super useful if there were arm64 binaries available as well, to use while we wait for pfSense to update their packages.

@glinton
Contributor

glinton commented Apr 8, 2019

You have to configure pfSense to pull from the ports. If configuration isn't for you, you can manually add the package using:

pkg add http://pkg0.cyb.freebsd.org/FreeBSD:11:amd64/latest/All/telegraf-1.10.1.txz

@danielnelson
Contributor

I updated the documentation to be less confusing. 90593a0

@girgen Maybe we could add arm support to the build.py file. I looked at this patch but I must be missing something, it doesn't seem like this is enough to support tgz and the right ARM flags. Is there an additional patch?

@glinton
Contributor

glinton commented Apr 9, 2019

Closed in 90593a0

@glinton closed this as completed Apr 9, 2019
@girgen

girgen commented Apr 10, 2019

@girgen Maybe we could add arm support to the build.py file. I looked at this patch but I must be missing something, it doesn't seem like this is enough to support tgz and the right ARM flags. Is there an additional patch?

Yes, you also need to copy the files according to https://svnweb.freebsd.org/ports/head/net-mgmt/telegraf/Makefile?r1=485905&r2=490433

cp src/github.com/shirou/gopsutil/disk/disk_freebsd_386.go  \
     src/github.com/shirou/gopsutil/disk/disk_freebsd_arm.go
cp src/github.com/shirou/gopsutil/cpu/cpu_freebsd_386.go  \
     src/github.com/shirou/gopsutil/cpu/cpu_freebsd_arm.go

@danielnelson
Contributor

I'm confused about the patch to build.py specifically. If I make the same changes and run it, I get:

./scripts/build.py --package --platform=freebsd --arch=all
...
[ERROR] build: Invalid ARM architecture specified: armv6
[ERROR] build: Please specify either 'armel', 'armhf', or 'arm64'.

Now looking at your Makefile I think perhaps you aren't using build.py at all, so perhaps this patch isn't needed. Also, in Telegraf 1.10 and later those two files should already be included in gopsutil.

@danielnelson
Contributor

@victorhooi Could you create a new feature request issue for a FreeBSD ARM package?

@girgen

girgen commented Apr 11, 2019

The two "additional" files, created by the cp commands, are necessary. I cannot say off the top of my head whether the build.py patch really makes a difference.

@victorhooi
Author

Done - created FR #5714 to add ARM binaries for FreeBSD
