check_ntp_time periodically returns "Socket timeout" when one of several NTP servers doesn't respond #150

Closed
lvasiliev opened this issue Apr 13, 2016 · 8 comments

Comments

@lvasiliev

Hi!

Periodically (it may depend on the server's load average), check_ntp_time returns:
CRITICAL - Socket timeout after 45 seconds
when one of several NTP servers doesn't respond, even though AVG_NUM responses have already been received from the other servers.

In the attachment you will find our debug output from running this command (nagios-plugins-2.1.1):
/usr/local/libexec/nagios/check_ntp_time -H pool.ntp.org -w 1 -c 5 -t 45 -vv

check_ntp_time.txt

We believe that commit 1afc22d reduced the time left for the post-processing function best_offset_server, so check_ntp_time is interrupted by the alarm (timeout_interval).

A possible patch to prevent this situation:

--- plugins/check_ntp_time.c.orig       2016-04-11 18:16:54.214032043 +0300
+++ plugins/check_ntp_time.c    2016-04-13 16:09:19.725436017 +0300
@@ -415,6 +415,9 @@
                        }
                }
                /* lather, rinse, repeat. */
+               /* break if we have one response but other ntp servers doesn't response */
+               /* greater than timeout_interval/2 */
+               if (servers_completed && now_time-start_ts > timeout_interval/2) break;
        }

        if (one_read == 0) {
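
The condition added by the patch can be looked at in isolation. The following is only a minimal sketch, not code from check_ntp_time.c: the helper name `should_stop_waiting` is hypothetical, and the parameters simply mirror the variables that appear in the diff (`servers_completed`, `now_time`, `start_ts`, `timeout_interval`). It assumes the times are in whole seconds, as the 45-second timeout in the report suggests.

```c
#include <time.h>

/*
 * Hypothetical helper (not part of check_ntp_time.c).
 * Returns 1 when the receive loop should stop waiting for slow servers:
 * at least one server has completed AND more than half of the timeout
 * has already elapsed. Returns 0 otherwise.
 */
int should_stop_waiting(int servers_completed, time_t now_time,
                        time_t start_ts, int timeout_interval)
{
    if (servers_completed == 0)
        return 0;               /* nothing usable yet, keep waiting */

    /* Integer division, as in the patch: 45 / 2 == 22 seconds. */
    return (now_time - start_ts) > (time_t)(timeout_interval / 2);
}
```

With `-t 45`, this would stop waiting once 23 or more seconds have passed and at least one server has answered, which leaves the remaining time for post-processing before the alarm fires.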
@digitalprecision

Amazing that this is still an issue... I am running into the same exact mess. Going to try this patch.

@jfrickson jfrickson added the Bug label Oct 31, 2016
@jfrickson jfrickson self-assigned this Oct 31, 2016
@tmartinfr

I'm probably affected by the same issue. However, switching back to the old check_ntp didn't solve the problem, so I'm confused.

@digitalprecision

The patch must have worked, as I haven't had the issue since. It's been so long I forgot about it. :/

@dominikborkowski

The following is an observation of what happened to us; it may help others. TL;DR version: this plugin will time out in 10 seconds on hosts that have only a link-local IPv6 address, when querying an NTP pool that includes IPv6 addresses.

We saw the same problem when EPEL switched from nagios-plugins 2.0.3 to 2.1.4 for both CentOS 6 and 7. We suddenly saw timeouts across tons of systems, but with varying degrees of randomness (our NTP service has four A records and four AAAA ones). Checking against individual NTP IPs was very fast, so at first it was a bit confusing.

It would be nice if there were a better way of handling this, rather than having to force the plugin to use IPv4 in those situations.
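
To illustrate what forcing IPv4 amounts to at the resolver level, here is a minimal sketch using the standard `getaddrinfo` call with the address family restricted to `AF_INET`. It is not the plugin's own code: the function name `resolve_ipv4_only` is hypothetical, and whether the plugin does exactly this internally is an assumption. With this hint, AAAA records are never returned, so a host without usable IPv6 would not try them.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/*
 * Hypothetical helper (not part of nagios-plugins).
 * Resolves host/port for UDP, returning IPv4 addresses only.
 * Returns 0 on success or a getaddrinfo() error code; on success the
 * caller must release *result with freeaddrinfo().
 */
int resolve_ipv4_only(const char *host, const char *port,
                      struct addrinfo **result)
{
    struct addrinfo hints;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;      /* IPv4 only: skip AAAA records */
    hints.ai_socktype = SOCK_DGRAM; /* NTP runs over UDP */

    return getaddrinfo(host, port, &hints, result);
}
```

On the command line, the equivalent workaround is the plugin's IPv4-only mode mentioned above (to my knowledge the `-4` option, but check `--help` for your version).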

@jfrickson
Contributor

I'm looking into the patch, and will probably integrate it fairly soon.

@jfrickson jfrickson modified the milestones: next bugfix release, 2.2.0, 2.2.2 Apr 5, 2017
jfrickson pushed a commit that referenced this issue Apr 7, 2017
several ntp server doesn't response

Fix for issue #150

Thanks to Leonid Vasiliev for the patch. We may want to make a
better fix at some point, but this should do the job for now.
@jfrickson
Contributor

Fixed in branch maint via commit df485c7.

Will be released in version 2.2.2.

jfrickson pushed a commit that referenced this issue Apr 7, 2017
@LorenzBischof

What's the timeline for 2.2.2? We have currently deactivated this plugin because of the false positives and would like to re-enable it.

jfrickson pushed a commit that referenced this issue May 17, 2017
jfrickson pushed a commit that referenced this issue Jun 12, 2017
@amontalban

It would be great to have a new release of this plugin. Is there a planned release date? Thanks!
