check_ntp_time: NTP CRITICAL: Offset unknown #329

Closed
sbuchb opened this issue Oct 6, 2017 · 16 comments
@sbuchb

sbuchb commented Oct 6, 2017

Hi,

I am trying to monitor NTP synchronization with check_ntp_time, but I am getting the error "NTP CRITICAL: Offset unknown". The plugin works well most of the time, but sometimes the service check goes critical for a while on some hosts. I have already recompiled nagios-plugins and modified check_ntp_time as suggested here: https://serverfault.com/questions/625027/nagios-check-ntp-time-offset-unknown
The verbose output of the check command is shown below:

[root@monitoring]# ./check_ntp_time -H 192.168.10.223 -vvv
Found 1 peers to check
sending request to peer 0
response from peer 0: packet contents:
	flags: 0xe4
	  li=3 (0xc0)
	  vn=4 (0x20)
	  mode=4 (0x04)
	stratum = 4
	poll = 16
	precision = 5.96046e-08
	rtdelay = 0.0001983642578125
	rtdisp = 7.941909790039062
	refid = b6e0a0a
	refts = 1507279467.201385
	origts = 1507279609.642747
	rxts = 1507279609.896963
	txts = 1507279609.897002
offset 0.2541623116
sending request to peer 0
response from peer 0: packet contents:
	flags: 0xe4
	  li=3 (0xc0)
	  vn=4 (0x20)
	  mode=4 (0x04)
	stratum = 4
	poll = 16
	precision = 5.96046e-08
	rtdelay = 0.0001983642578125
	rtdisp = 7.941909790039062
	refid = b6e0a0a
	refts = 1507279467.201385
	origts = 1507279609.642948
	rxts = 1507279609.89715
	txts = 1507279609.897158
offset 0.2541673183
sending request to peer 0
response from peer 0: packet contents:
	flags: 0xe4
	  li=3 (0xc0)
	  vn=4 (0x20)
	  mode=4 (0x04)
	stratum = 4
	poll = 16
	precision = 5.96046e-08
	rtdelay = 0.0001983642578125
	rtdisp = 7.941909790039062
	refid = b6e0a0a
	refts = 1507279467.201385
	origts = 1507279609.643054
	rxts = 1507279609.89728
	txts = 1507279609.897286
offset 0.254180789
sending request to peer 0
response from peer 0: packet contents:
	flags: 0xe4
	  li=3 (0xc0)
	  vn=4 (0x20)
	  mode=4 (0x04)
	stratum = 4
	poll = 16
	precision = 5.96046e-08
	rtdelay = 0.0001983642578125
	rtdisp = 7.941909790039062
	refid = b6e0a0a
	refts = 1507279467.201385
	origts = 1507279609.643179
	rxts = 1507279609.89738
	txts = 1507279609.897387
offset 0.2541663647
discarding peer 0: flags=3
no peers meeting synchronization criteria :(
overall average offset: 0
NTP CRITICAL: Offset unknown|
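
For reference, each `offset` line in the output above is derived from the three timestamps in the reply (origts, rxts, txts) plus the time the reply arrived at the client, which the verbose output does not print. A minimal sketch of that calculation, assuming the standard four-timestamp formula from RFC 5905; the function and parameter names are illustrative and not taken from check_ntp_time.c:

```c
/* Clock offset of the server relative to the client, in seconds.
 *   t1: client transmit time (origts in the verbose output)
 *   t2: server receive time  (rxts)
 *   t3: server transmit time (txts)
 *   t4: client receive time  (not printed by the plugin)
 * Hypothetical helper, shown only to illustrate the formula. */
double ntp_offset(double t1, double t2, double t3, double t4)
{
    return ((t2 - t1) + (t3 - t4)) / 2.0;
}

/* Round-trip delay: total elapsed time minus server processing time. */
double ntp_delay(double t1, double t2, double t3, double t4)
{
    return (t4 - t1) - (t3 - t2);
}
```

Note that the offset is computed even when the reply carries li=3; the peer is only rejected afterwards, which is why four plausible offsets are followed by "Offset unknown".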

Thank you for your help!

Best regards,
Stefan

@drnarahari

I have a question: is the -H argument here the address of the time server, or of localhost?

@linuxmail

linuxmail commented Jan 31, 2018

hi,

I have the same problem all the time, at "random" intervals:

# ntpq -c rv
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.8p10@1.3728-o Sat Sep 23 19:02:38 UTC 2017 (1)",
processor="x86_64", system="Linux/4.13.13-4-pve", leap=00, stratum=2,
precision=-24, rootdelay=22.994, rootdisp=5.299, refid=193.175.73.151,
reftime=de1be30d.0c50cc60  Wed, Jan 31 2018  7:39:09.048,
clock=de1be364.b2cc7ecc  Wed, Jan 31 2018  7:40:36.698, peer=33763, tc=6,
mintc=3, offset=0.034698, frequency=5.808, sys_jitter=0.033374,
clk_jitter=0.096, clk_wander=0.023
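
The status=0615 value in the output above packs the same information that ntpq spells out next to it (leap_none, sync_ntp, 1 event, clock_sync). A small sketch of that decoding, assuming the system status word layout described in the ntpq documentation (2-bit leap indicator, 6-bit sync source, 4-bit event count, 4-bit event code); the struct and function names are illustrative:

```c
#include <stdint.h>

/* Fields of the 16-bit system status word printed by `ntpq -c rv`. */
struct ntp_sys_status {
    unsigned leap;    /* bits 15-14: leap indicator (0 = leap_none, 3 = alarm) */
    unsigned source;  /* bits 13-8:  sync source (6 = sync_ntp) */
    unsigned count;   /* bits 7-4:   number of events since the last read */
    unsigned code;    /* bits 3-0:   most recent event (5 = clock_sync) */
};

struct ntp_sys_status decode_sys_status(uint16_t word)
{
    struct ntp_sys_status s;
    s.leap   = (word >> 14) & 0x3;
    s.source = (word >> 8) & 0x3f;
    s.count  = (word >> 4) & 0xf;
    s.code   = word & 0xf;
    return s;
}
```

Decoding 0x0615 this way gives leap 0, source 6, count 1 and code 5, so at the moment of this query ntpd itself reported a synchronized clock.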

@rwaffen

rwaffen commented Feb 20, 2018

Same problem here. I have a host which every now and then goes critical with "offset unknown", only to be OK again within the next few minutes.

@huguley

huguley commented Mar 9, 2018

Having the same problem on Red Hat 7. Red Hat 6 does not seem to have the problem, and it only happens in one of our environments. It seems that li=3 (0xc0) is a special value of the leap indicator that says the clock is unsynchronized. Why it's unsynchronized is what I have been trying to sort out. A seemingly random machine reports NTP CRITICAL: Offset unknown for 15 or 20 minutes, then goes back to normal.
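
To make the flag values in the verbose output easier to read: the first byte of an NTP packet holds the leap indicator, version and mode. A minimal sketch of splitting it apart, assuming the header layout from RFC 5905; the struct and function names are illustrative, not the plugin's own:

```c
#include <stdint.h>

struct ntp_flags {
    unsigned li;    /* leap indicator: 3 means "clock not synchronized" */
    unsigned vn;    /* protocol version */
    unsigned mode;  /* 4 = server reply */
};

/* Split the first header byte into its three bit fields. */
struct ntp_flags decode_flags(uint8_t flags)
{
    struct ntp_flags f;
    f.li   = (flags >> 6) & 0x3;  /* top two bits */
    f.vn   = (flags >> 3) & 0x7;  /* next three bits */
    f.mode = flags & 0x7;         /* low three bits */
    return f;
}
```

With this, 0xe4 decodes to li=3, vn=4, mode=4, matching the failing replies, while 0x24 decodes to li=0, vn=4, mode=4.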

@rapha-dev

rapha-dev commented Mar 13, 2018

Similar here on Debian 7; it remains in the failed state.

# ntpq -c rv ntp.local
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.8p10@1.3728-o Sun Oct  8 22:12:06 UTC 2017 (1)",
processor="amd64", system="FreeBSD/11.1-RELEASE-p6", leap=00, stratum=2,
precision=-20, rootdelay=13.231, rootdisp=4.556, refid=192.53.103.103,
reftime=de521ff7.b5342e86  Tue, Mar 13 2018 11:01:27.707,
clock=de522028.38595d38  Tue, Mar 13 2018 11:02:16.220, peer=37475, tc=6,
mintc=3, offset=0.320, frequency=-16.529, sys_jitter=0.051299,
clk_jitter=0.106, clk_wander=0.001
# /usr/lib/nagios/plugins/check_ntp_time -H ntp.local -vv
Found 1 peers to check
sending request to peer 0
response from peer 0: packet contents:
        flags: 0x24
          li=0 (0x00)
          vn=4 (0x20)
          mode=4 (0x04)
        stratum = 2
        poll = 16
        precision = 9,53674e-07
        rtdelay = 0,013092041015625
        rtdisp = 0,003692626953125
        refid = 6c6735c0
        refts = 1520935694,681659
        origts = 1520935721,335445
        rxts = 1520935721,335743
        txts = 1520935721,335827
offset 0,0001443624496
sending request to peer 0
response from peer 0: packet contents:
        flags: 0xe4
          li=3 (0xc0)
          vn=4 (0x20)
          mode=4 (0x04)
        stratum = 0
        poll = 16
        precision = 0,015625
        rtdelay = 1
        rtdisp = 1
        refid = 45544152
        refts = 0
        origts = 1520935721,338597
        rxts = 1520935721,338597
        txts = 1520935721,338597
offset -0,0001389980316
sending request to peer 0
re-sending request to peer 0
re-sending request to peer 0
re-sending request to peer 0
re-sending request to peer 0
re-sending request to peer 0
re-sending request to peer 0
discarding peer 0: stratum=0
no peers meeting synchronization criteria :(
overall average offset: 0
NTP CRITICAL: Offset unknown|

It only happens when querying a pfSense (FreeBSD 11.1) NTP server; some Windows ADS servers seem to be OK.

I am unsure about the li flag: the first query returns li=0 and the second li=3. Further queries time out (or are maybe discarded).
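
One possible reading of the second reply: when stratum is 0, RFC 5905 defines the refid field as a four-character ASCII "kiss code" rather than an address. The sketch below assumes the plugin printed the field without converting it from network byte order on a little-endian host, so the characters appear reversed in the hex dump; that assumption, and the helper name, are not confirmed by the plugin source.

```c
#include <stdint.h>

/* Write the four refid bytes as ASCII into out[5], reading the value
 * least-significant byte first. Non-printable bytes become '.'. */
void refid_to_ascii_le(uint32_t refid, char out[5])
{
    for (int i = 0; i < 4; i++) {
        unsigned char c = (refid >> (8 * i)) & 0xff;
        out[i] = (c >= 0x20 && c < 0x7f) ? (char)c : '.';
    }
    out[4] = '\0';
}
```

Under that assumption, refid 45544152 reads as "RATE", the kiss code a server sends when a client is polling too fast. The refid 54494e49 that appears in a later comment in this thread would read as "INIT", meaning the server has not yet synchronized for the first time.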

@helmo

helmo commented Apr 20, 2018

There is a near-duplicate issue in monitoring-plugins/monitoring-plugins#1142

@saiteamf1

saiteamf1 commented Nov 30, 2018

We had the same issue on Ubuntu 14.04. We tried the following:

  1. Checked with the latest plugins (2.x).
  2. Applied the fix from https://serverfault.com/questions/625027/nagios-check-ntp-time-offset-unknown.
  3. Set discard minimum 1 in ntp.conf.

The issue still recurs every few hours, every day.

@MarkYSA

MarkYSA commented Jun 11, 2019

Are there any updates on this issue?

@sawolf
Member

sawolf commented Jun 12, 2019

Hi @MarkYSA,

The serverfault link mentions that if you need immediate resolution, you can try removing lines 254-257 and recompiling.

In terms of an "official" solution, I don't see anyone else writing patches for this. I will most likely address this in the next nagios-plugins release, which I plan to do after the next Nagios Core release.

@espey

espey commented Jun 21, 2019

@Madlohe When do you think there will be a patch for this? We are seeing similar issues with this check in our environment.

@sawolf
Member

sawolf commented Jun 24, 2019

@espey We're looking at probably 5-6 weeks before I start on the next nagios-plugins release.

@davidford365

davidford365 commented Aug 15, 2019

@Madlohe - looks like there is something else to look at. The offset in the response is now formatted differently: rather than being pure decimal, it now uses E-notation for some of the responses.

Found 1 peers to check
sending request to peer 0
response from peer 0: packet contents:
flags: 0xe4
li=3 (0xc0)
vn=4 (0x20)
mode=4 (0x04)
stratum = 0
poll = 256
precision = 5.96046e-08
rtdelay = 0
rtdisp = 7.62939453125e-05
refid = 54494e49
refts = 0
origts = 1565895280.537717
rxts = 1565895280.538023
txts = 1565895280.538121
offset 7.654039655e-05
sending request to peer 0
response from peer 0: packet contents:
flags: 0xe4
li=3 (0xc0)
vn=4 (0x20)
mode=4 (0x04)
stratum = 0
poll = 256
precision = 5.96046e-08
rtdelay = 0
rtdisp = 7.62939453125e-05
refid = 54494e49
refts = 0
origts = 1565895280.53844
rxts = 1565895280.538494
txts = 1565895280.538507
offset -1.897977199e-05
sending request to peer 0
response from peer 0: packet contents:
flags: 0xe4
li=3 (0xc0)
vn=4 (0x20)
mode=4 (0x04)
stratum = 0
poll = 256
precision = 5.96046e-08
rtdelay = 0
rtdisp = 7.62939453125e-05
refid = 54494e49
refts = 0
origts = 1565895280.538666
rxts = 1565895280.538722
txts = 1565895280.538756
offset -2.216885332e-05
sending request to peer 0
response from peer 0: packet contents:
flags: 0xe4
li=3 (0xc0)
vn=4 (0x20)
mode=4 (0x04)
stratum = 0
poll = 256
precision = 5.96046e-08
rtdelay = 0
rtdisp = 7.62939453125e-05
refid = 54494e49
refts = 0
origts = 1565895280.538908
rxts = 1565895280.538953
txts = 1565895280.538984
offset -2.671667608e-05
discarding peer 0: stratum=0
no peers meeting synchronization criteria :(
overall average offset: 0
NTP CRITICAL: Offset unknown|


sawolf added this to the 2.3.0 milestone Aug 15, 2019
@sawolf
Member

sawolf commented Aug 15, 2019

Hi @davidford365 - it looks like that message is formatted using %g, which will print in either standard or exponential notation "depending on what is appropriate". I don't have a rigorous definition for that last part, but the code for that hasn't changed recently.
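
For what it's worth, the C standard does define the rule: with %g and precision P, a value whose decimal exponent X satisfies P > X >= -4 is printed in fixed notation, and anything else is printed in exponential notation. The sketch below assumes the plugin formats the offset with "%.10g", which is inferred from the ten significant digits in the outputs in this thread and not checked against the source; the function names are illustrative.

```c
#include <stdio.h>
#include <string.h>

/* Format a value with "%.10g" (assumed precision, see above).
 * Returns the number of characters snprintf() produced. */
int format_offset(double value, char *buf, size_t len)
{
    return snprintf(buf, len, "%.10g", value);
}

/* Returns 1 if the formatted text uses exponential notation. */
int is_exponential(const char *text)
{
    return strchr(text, 'e') != NULL;
}
```

With these helpers, 0.0001443624496 (exponent -4) formats as "0.0001443624496", while 7.654039655e-05 (exponent -5) formats as "7.654039655e-05". Both forms appear in this thread, so the change in notation follows from how small the offset is, not from a change in the plugin.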

Are you able to show output from ntpq -c rv as @rapha-dev has done? It seems to me like both of your issues (peer discarded due to stratum=0) are different from @sbuchb's problem (NTP server reports errors with flags=3), but it would be good to verify that yours are at least similar to each other.

EDIT: I did manage to get an NTP server set up, but I haven't recreated the issue so far:

[root@localhost nagios-plugins]# ntpq -c rv 192.168.4.243
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.6p5@1.2349-o Fri Jul 22 17:24:22 UTC 2016 (1)",
processor="x86_64", system="Linux/3.2.0-4-amd64", leap=00, stratum=2,
precision=-23, rootdelay=54.267, rootdisp=32.362, refid=74.117.214.3,
reftime=e100527d.f730599e  Thu, Aug 15 2019 16:51:57.965,
clock=e10052c1.1c707472  Thu, Aug 15 2019 16:53:05.111, peer=21130, tc=6,
mintc=3, offset=0.376, frequency=4.188, sys_jitter=1.914,
clk_jitter=0.235, clk_wander=0.016
[root@localhost nagios-plugins]# ./plugins/check_ntp_time -H 192.168.4.243
NTP OK: Offset 0.0005514621735 secs, stratum best:2 worst:2|offset=0.000551s;60.000000;120.000000; stratum_best=2 stratum_worst=2 num_warn_stratum=0 num_crit_stratum=0

If anyone is able to reproduce consistently and can point me to a public NTP server that has the issue, I'm happy to debug the root cause. Otherwise, I'll probably just add an option to ignore the stratum check.
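
The two failure modes reported in this thread can be summarized in a small sketch. It is based only on the "discarding peer 0: stratum=0" and "discarding peer 0: flags=3" messages shown above; the names, the order of the checks and the allow_zero_stratum parameter are assumptions for illustration, not the plugin's actual code.

```c
#include <stdint.h>

#define LI_ALARM 3  /* leap indicator value meaning "clock not synchronized" */

enum peer_verdict { PEER_OK, PEER_STRATUM_ZERO, PEER_UNSYNCHRONIZED };

/* Decide whether a reply may be used for the offset average.
 * allow_zero_stratum is a hypothetical switch standing in for an
 * option that skips the stratum check. */
enum peer_verdict check_peer(uint8_t flags, uint8_t stratum, int allow_zero_stratum)
{
    if (stratum == 0 && !allow_zero_stratum)
        return PEER_STRATUM_ZERO;      /* "discarding peer N: stratum=0" */
    if (((flags >> 6) & 0x3) == LI_ALARM)
        return PEER_UNSYNCHRONIZED;    /* "discarding peer N: flags=3" */
    return PEER_OK;
}
```

In this sketch a reply with flags 0xe4 and stratum 4 is rejected as unsynchronized, and a reply with stratum 0 is rejected for its stratum unless the switch is set. Whether a real option should also relax the leap indicator check is a separate design question that this sketch leaves unchanged.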

sawolf added a commit to sawolf/nagios-plugins that referenced this issue Aug 19, 2019
@Gehirn-Mag-Net

A tcpdump shows that requests repeated too quickly are not answered. Waiting one second between requests solves the problem. The timeout may need to be increased accordingly.

That works - the question remains whether that is the right solution.

This is what would have to change:
plugins/check_ntp_time.c:

    /* read from any sockets with pending data */
    for(i=0; servers_readable && i<num_hosts; i++){
        sleep(1);    // <<-- INSERT THIS LINE
        if(ufds[i].revents&POLLIN && servers[i].num_responses < AVG_NUM){
            if(verbose) {

Maybe you have to increase the timeout with -t or just increase the default timeout:
plugins/common.h: Search for the line DEFAULT_SOCKET_TIMEOUT and increase that value.
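
An unconditional sleep(1) in the read loop adds a full second to every iteration, even when the previous request was sent long ago. A gentler variant is to wait only as long as needed to keep a minimum gap between requests. The helper below is a hypothetical sketch of that calculation and is not part of the plugin; the millisecond clock values are supplied by the caller.

```c
/* Milliseconds still to wait so that at least min_interval_ms separates
 * two consecutive requests; 0 if enough time has already passed.
 * Hypothetical helper: all three arguments come from the caller. */
long remaining_wait_ms(long last_send_ms, long now_ms, long min_interval_ms)
{
    long elapsed = now_ms - last_send_ms;

    if (elapsed >= min_interval_ms)
        return 0;
    return min_interval_ms - elapsed;
}
```

A caller would sleep for the returned number of milliseconds before sending the next request. With four requests per peer and a one-second gap, the check then takes at least three seconds, so the timeout still has to cover that.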

@sawolf
Member

sawolf commented Sep 18, 2019

@Gehirn-Mag-Net Thanks for investigating!

If you look at @eskyuu's pull request to monitoring-plugins above, that looks like a pretty good solution to me. I'll be cherry-picking it into a PR for this repository if no one has any objections.

sawolf added a commit that referenced this issue Sep 19, 2019
check_ntp: add --allow-zero-stratum flag to resolve several commenters' issues on #329
@sawolf
Member

sawolf commented Sep 19, 2019

Okay, I've merged 2 PRs which address the different issues discussed in this thread. If anyone's still having issues after compiling from maint, please let me know.
