
Update go-ping to latest version #8771

Merged: 3 commits merged into influxdata:master on Feb 1, 2021

Conversation

@sspaink (Contributor) commented Jan 28, 2021

DON'T MERGE, for testing only to create temporary artifacts to give community a chance to test changes.

Fix #8704

A pull request by @ssoroka has been merged into go-ping that should resolve concurrency issues with the inputs.ping plugin.

@telegraf-tiger (bot) left a comment

🤝 ✅ CLA has been signed. Thank you!

@sspaink (Contributor, Author) commented Jan 29, 2021

@redrumdk If you would like to try out this change, here are the Telegraf executables: https://app.circleci.com/pipelines/github/influxdata/telegraf/3092/workflows/755bf8fc-366a-4d4a-9228-c692611efb8a/jobs/75778/artifacts

If you do get a chance to try it, let us know. We would love to see if the pending changes to go-ping help with the problem you are facing in #8704. Thanks!

@redrumdk commented Feb 1, 2021

> @redrumdk If you would like to try out this change, here are the Telegraf executables: https://app.circleci.com/pipelines/github/influxdata/telegraf/3092/workflows/755bf8fc-366a-4d4a-9228-c692611efb8a/jobs/75778/artifacts
>
> If you do get a chance to try it, let us know. We would love to see if the pending changes to go-ping help with the problem you are facing in #8704. Thanks!

Yeah, just tried the pull request version.
Now all of my 114 IPs are showing 100% packet loss :(
Running telegraf-1.18.0_8b6a3bce_windows_amd64 on Windows 2019.
https://ibb.co/LdpbcdW

With config:

    [[inputs.ping]]
      ## List of urls to ping
      urls = [
        "X.X.X.X",   # (114 IPs in this list)
      ]
      method = "native"
      count = 4

@sspaink changed the title from "FOR TESTING: Use forked version of go-ping to resolve strange ping output" to "Update go-ping to latest version" on Feb 1, 2021
@ssoroka merged commit f2cf447 into influxdata:master on Feb 1, 2021
@sspaink (Contributor, Author) commented Feb 1, 2021

@redrumdk thank you for trying out the change. Over the weekend @ssoroka made progress on the upstream change and got it working and merged. If you are able to, can you try the newest package? https://app.circleci.com/pipelines/github/influxdata/telegraf/3119/workflows/6b0a7e09-29f4-467c-9dcd-0a634fe791e3/jobs/75947/artifacts

@redrumdk commented Feb 1, 2021

> @redrumdk thank you for trying out the change. Over the weekend @ssoroka made progress on the upstream change and got it working and merged. If you are able to, can you try the newest package? https://app.circleci.com/pipelines/github/influxdata/telegraf/3119/workflows/6b0a7e09-29f4-467c-9dcd-0a634fe791e3/jobs/75947/artifacts

Just tried it. The exec method still gives me errors with 114 test IPs and count set to 4:
[inputs.ping] Error in plugin: , exit status 3221225794:

Switching to native, my permanently down IPs now show 100 percent_packet_loss, which is great.
But now around 40 of my unique IPs all show 50 percent_packet_loss.
https://ibb.co/vk9ctVq
It also seems like the count parameter is ignored when using native?

@ssoroka (Contributor) commented Feb 2, 2021

Is it possible they've actually got 50% packet loss? A side effect of this update is that we match the exact packets. E.g., send ICMP ping packets 1, 2, 3, 4; if you receive back 1, 1, 3, 3, 3, 3, we count that as 2 packets received, at 50% loss.
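
As an illustration of the counting rule described above (a minimal sketch, not the actual go-ping code): duplicate replies only count once, so six replies carrying only two distinct sequence numbers still amount to 2 of 4 packets received.

```go
package main

import "fmt"

// countUniqueSeq returns how many distinct ICMP sequence numbers appear in the replies.
func countUniqueSeq(replies []int) int {
	seen := make(map[int]bool)
	for _, seq := range replies {
		seen[seq] = true
	}
	return len(seen)
}

func main() {
	sent := 4
	replies := []int{1, 1, 3, 3, 3, 3} // duplicated replies for packets 1 and 3
	received := countUniqueSeq(replies)
	loss := 100 * float64(sent-received) / float64(sent)
	fmt.Printf("received %d of %d packets, %.0f%% loss\n", received, sent, loss) // 2 of 4, 50% loss
}
```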

@redrumdk commented Feb 2, 2021

> Is it possible they've actually got 50% packet loss? A side effect of this update is that we match the exact packets. E.g., send ICMP ping packets 1, 2, 3, 4; if you receive back 1, 1, 3, 3, 3, 3, we count that as 2 packets received, at 50% loss.

It's a constant 50% packet loss, so no, it's not possible :)
I'm also pinging manually with ping.exe from Windows,
pinging with the Telegraf exec method from another server,
and with a tool called Servers Alive.
Only Telegraf in native mode shows packet loss.

If you look at my screenshot, I have crossed out the IPs, but they are all unique IPs, all showing 50% packet loss with 2 transmitted packets and 1 received, constantly, as long as Telegraf with native is running.

@ivorybilled (Contributor) commented

@redrumdk are you specifying a timeout in your config for the plugin while running in native mode? If so, what is it?

@redrumdk commented Feb 10, 2021 via email

@ivorybilled (Contributor) commented Feb 10, 2021

@redrumdk so I was able to replicate this behavior on Windows. I used 100 URLs and had random 50% packet loss for a decent chunk of them. However, after I bumped up the timeout the issues went away. I'd suggest configuring a high timeout and seeing if that helps. For me, a 10 second timeout resulted in 0% packet loss. This is for native mode and the latest Telegraf (including this pull request).

I believe (hopefully) what's happening here is that the default timeout of 1 second just isn't enough time to do 4 sends/receives, for whatever reason.
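
For reference, a minimal sketch of the suggested configuration (illustrative values only; timeout is in seconds, and per this thread it needs to cover the whole series of count pings in native mode):

```toml
[[inputs.ping]]
  urls = ["X.X.X.X"]   # full list of hosts goes here
  method = "native"
  count = 4
  ## Raise the timeout so all 4 sends/receives can finish;
  ## the 1 s default was not enough in this report.
  timeout = 10.0
```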

@redrumdk commented Feb 10, 2021 via email

@ivorybilled (Contributor) commented

@redrumdk yes. I was using a count of 4, so 4 packets were sent/received before reporting metrics. Note that the timeout needs to cover the amount of time it will take to send and receive all 4 of them.
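
As a rough illustration of why (illustrative numbers, not measurements from this thread): if the 4 pings are sent one after another and each send/receive takes around 300 ms, the whole series needs roughly 4 × 0.3 s = 1.2 s, which already exceeds the 1 second default timeout.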

@redrumdk commented

> @redrumdk yes. I was using a count of 4, so 4 packets were sent/received before reporting metrics. Note that the timeout needs to cover the amount of time it will take to send and receive all 4 of them.

Damn, seems like your timeout settings worked!
Thanks, must be some kind of bug/undocumented feature.
If, for whatever reason, the collection can't finish within the default timeout, it just gets counted as packet loss...
Now it seems I can collect 114 IPs with a count of 10 and an interval of 60 using native, as long as the timeout is high enough.

Tried it for the exec method as well, but that didn't help.

@ivorybilled (Contributor) commented

@redrumdk curiously I was able to run it with exec with no issues, so that still might be something to look into. But glad to hear native is working! And that's correct; I'm thinking go-ping might want to change so that the timeout covers only one packet send/receive as opposed to all of them.

@redrumdk commented Feb 10, 2021

> @redrumdk curiously I was able to run it with exec with no issues, so that still might be something to look into. But glad to hear native is working! And that's correct; I'm thinking go-ping might want to change so that the timeout covers only one packet send/receive as opposed to all of them.

It worked for you with 100 IPs and a count of 4+ without getting errors in the debug log for exec? Or errors = 100 in the database?
Try increasing the count; I'm also on the latest build we got from here.

@redrumdk commented

I tested today using the latest official download and the nightly build (1.18), and they still have problems.
It seems to only be fixed on that pull request build :)

@ivorybilled (Contributor) commented

@redrumdk you're referring to the exec method, correct? And yes, I had 100 URLs with a count of 4 and didn't have issues, but perhaps I need to increase that. I saw your issue about running in exec mode; I haven't gotten a chance to dive into it yet.

ssoroka pushed a commit that referenced this pull request Feb 17, 2021
arstercz pushed a commit to arstercz/telegraf that referenced this pull request Mar 5, 2023
Successfully merging this pull request may close these issues.

[inputs.ping] Error in plugin: , exit status 3221225794: X.X.X.X
4 participants