Time is not synced on (re)boot automatically after a sysupgrade to 22.03.0 #10843

birdie-github · 2022-09-29T09:57:45Z

I don't want to file a bug report under packages because I've not been able to find the package for ntpd. I'm running OpenWrt 22.03.0 r19685-512e76967f . My device is TP-Link TL-WDR3600 v1. After a sysupgrade the date was set to September 3rd, 2022. Today it's the 29th of September.

Of course opkg update failed with wget returned 5.

I couldn't find a command to sync the time in a SSH session, so I simply ran /etc/init.d/sysntpd restart which fixed the issue. I waited at the very least 10 minutes for the time to sync automatically. Didn't happen.

Please make sure after a sysupgrade time is indeed getting synced.

There are zero ntp related messages in the System Log.

I've never touched any NTP related settings, everything is AFAIK by default. My Internet connection is working perfectly.

The text was updated successfully, but these errors were encountered:

Hurricos · 2022-11-22T17:52:51Z

I've had the same issue for a few months now. This seems to have been a post-21 regression of some sort. It's definitely a dependency management issue, where some network access dependency is missing from /etc/init.d/sysntpd; I adjusted /etc/init.d/sysntpd:

 start_service() {                                   
         . /lib/functions/network.sh                 
+        sleeptime=20                                 
+        sleep $sleeptime                                   
+        logger "slept $sleeptime"                         
         validate_ntp_section ntp start_ntpd_instance      
 }

... and rebooted, getting:

[mkennedy@imp ~]$ ssh root@ap3825i0.lan "logread | grep slept; date;"; date
Tue Nov 22 17:44:27 UTC 2022
Tue Nov 22 12:46:35 PM EST 2022
[mkennedy@imp ~]$ ssh root@ap3825i0.lan "logread | grep slept; date;"; date
Tue Nov 22 17:44:27 2022 user.notice root: slept 20
Tue Nov 22 17:44:28 UTC 2022
Tue Nov 22 12:46:36 PM EST 2022
[mkennedy@imp ~]$ ssh root@ap3825i0.lan "logread | grep slept; date;"; date
Tue Nov 22 17:44:27 2022 user.notice root: slept 20
Tue Nov 22 17:44:28 UTC 2022
Tue Nov 22 12:46:36 PM EST 2022
[mkennedy@imp ~]$ ssh root@ap3825i0.lan "logread | grep slept; date;"; date
Tue Nov 22 17:44:27 2022 user.notice root: slept 20
Tue Nov 22 17:46:38 UTC 2022 ## <-- sysntpd successful
Tue Nov 22 12:46:38 PM EST 2022

... but when I set sleeptime=1:

[mkennedy@imp ~]$ ssh root@ap3825i0.lan "logread | grep slept; date;"; date
Tue Nov 22 17:48:41 2022 user.notice root: slept 1
Tue Nov 22 17:48:48 2022 user.notice root: slept 1
Tue Nov 22 17:48:50 UTC 2022
Tue Nov 22 12:49:31 PM EST 2022
[mkennedy@imp ~]$ ssh root@ap3825i0.lan "logread | grep slept; date;"; date
Tue Nov 22 17:48:41 2022 user.notice root: slept 1
Tue Nov 22 17:48:48 2022 user.notice root: slept 1
Tue Nov 22 17:48:56 UTC 2022
Tue Nov 22 12:49:37 PM EST 2022
[mkennedy@imp ~]$ ssh root@ap3825i0.lan "logread | grep slept; date;"; date
Tue Nov 22 17:48:41 2022 user.notice root: slept 1
Tue Nov 22 17:48:48 2022 user.notice root: slept 1
Tue Nov 22 17:49:02 UTC 2022 ## <-- sysntpd never succeeds in updating the time
Tue Nov 22 12:49:43 PM EST 2022

... sysntpd consistently does not successfully update.

Hurricos · 2022-11-22T18:12:26Z

Two problems:

systntpd does not wait for network access to attempt to get the current time
sysntpd's procd triggers are sufficient to invoke start_service, but this does not actually cause the service to re-attempt an NTP query (either by restarting the daemon or sending it a signal, etc.).

Hurricos · 2022-11-22T20:21:54Z

There's a third problem here: sysntpd's procd triggers are not always sufficient to invoke start_service in the case of a network state change if use_dhcp is off -- that is, we only have a network-state-trigger to reload if use_dhcp is on.

Unrelatedly, there's a hotplug race-condition in the network -> sysntpd /etc/init.d/network reload (at least when the DHCP server provides an NTP server for us).

Edit: To note, this third problem is most easily packaged as, "network updates don't always trigger a change in the procd line and thus restart the daemon upon reload". It is not a good expectation, however, to have the "reload" trigger a new attempt at connecting if the problem is not with a change of configuration at all. So, 'problems' 2 and 3 really don't matter here; detection of changed procd command line works fine and has its own specific purpose.

Hurricos · 2022-11-22T21:09:21Z

A fourth problem might be that although busybox ntpd does not produce errors if it fails to act as an NTP client. Looking into what exactly fails in ntpd now.

Of the two sets (1 vs 4) of valid problems, only one need be solved.

The OpenWrt config with which ntpd doesn't succeed on boot involves an external DNS resolver, so I'm going to guess the issue is some sort of bug with refreshing DNS resolution after a failure that busybox hasn't yet addressed, which would be easiest solved by just waiting until DNS is back, but probably best solved by a patch to busybox.

Hurricos · 2022-11-22T21:32:50Z

OK, it's a ujail porting issue combined. If you use an external resolver, you get:

clock_gettime64(CLOCK_REALTIME, {tv_sec=1669151348, tv_nsec=377491500}) = 0
open("/etc/hosts", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/etc/resolv.conf", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
fcntl64(3, F_SETFD, FD_CLOEXEC)         = 0
read(3, "", 248)                        = 0
close(3)                                = 0
clock_gettime64(CLOCK_REALTIME, {tv_sec=1669151348, tv_nsec=391753460}) = 0
clock_gettime64(CLOCK_REALTIME, {tv_sec=1669151348, tv_nsec=392073940}) = 0
socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 3
bind(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
clock_gettime64(CLOCK_REALTIME, {tv_sec=1669151348, tv_nsec=403804320}) = 0
sendto(3, "\306M\1\0\0\1\0\0\0\0\0\0\0010\7openwrt\4pool\3ntp\3"..., 40, MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 40
sendto(3, "\2522\1\0\0\1\0\0\0\0\0\0\0010\7openwrt\4pool\3ntp\3"..., 40, MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 40
poll([{fd=3, events=POLLIN}], 1, 2500)  = 0 (Timeout)
clock_gettime64(CLOCK_REALTIME, {tv_sec=1669151350, tv_nsec=917659980}) = 0
sendto(3, "\306M\1\0\0\1\0\0\0\0\0\0\0010\7openwrt\4pool\3ntp\3"..., 40, MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 40
sendto(3, "\2522\1\0\0\1\0\0\0\0\0\0\0010\7openwrt\4pool\3ntp\3"..., 40, MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 40
poll([{fd=3, events=POLLIN}], 1, 2500)  = 0 (Timeout)
clock_gettime64(CLOCK_REALTIME, {tv_sec=1669151353, tv_nsec=422786220}) = 0
close(3)                                = 0
write(2, "ntpd: bad address '0.openwrt.poo"..., 43) = 43
clock_gettime64(CLOCK_REALTIME, {tv_sec=1669151353, tv_nsec=423631820}) = 0
poll([], 0, 143000n)                     = 0 (Timeout)
clock_gettime64(CLOCK_REALTIME, {tv_sec=1669151496, tv_nsec=524269380}) = 0
fork()                                  = 2
clock_gettime64(CLOCK_REALTIME, {tv_sec=1669151496, tv_nsec=526236600}) = 0
poll([], 0, 109000)                     = 0 (Timeout)

What I don't understand is this bit:

open("/etc/resolv.conf", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
fcntl64(3, F_SETFD, FD_CLOEXEC)         = 0
read(3, "", 248)                        = 0
close(3)                                = 0

Why do I get a successful open, but nevertheless read 0 after attempting to read 248B?

Hurricos · 2022-11-22T22:06:19Z

OK. The problem is that for systems which do not use dnsmasq for DNS caching, sysntpd starts (and its ujail is set up) before /tmp/resolv.conf.d/resolv.conf.auto (the ultimate target of /etc/resolv.conf) is finalized.

The first run of sysntpd after booting:

root@ap3825i0:/proc/2931# grep resolv mountinfo
43 37 0:16 /resolv.conf.d/resolv.conf.auto//deleted /etc/resolv.conf ro,noatime - tmpfs tmpfs rw

After a restart of sysntpd:

root@ap3825i0:/proc/3540# grep resolv mountinfo
43 37 0:16 /resolv.conf.d/resolv.conf.auto /etc/resolv.conf ro,noatime - tmpfs tmpfs rw

This affects all OpenWrt versions post-ujail-transition, but it's not necessarily showing up on all of them, usually because dnsmasq is running and ntpd.c will default to querying 127.0.0.1:53 should it fail elsewhere.

Hurricos · 2022-11-22T22:11:06Z

... and that is this: #10389

@birdie-github I'd appreciate if you can confirm that on first boot, if you find the process ID $PID of your ntpd with e.g. ps www | grep ntpd | grep -v ujail, that grep resolv /proc/$PID/mountinfo gives you roughly 43 37 0:16 /resolv.conf.d/resolv.conf.auto//deleted /etc/resolv.conf ro,noatime - tmpfs tmpfs rw and not 43 37 0:16 /resolv.conf.d/resolv.conf.auto /etc/resolv.conf ro,noatime - tmpfs tmpfs rw; that way, we can move on to merging this with #10389.

Hurricos · 2022-11-23T00:10:51Z

The fix for now for this would be to provide DHCP option 42 on your network. When you have a network reload of any kind, the loss and regain of the additional IP address in option 42 will cause two reloads of ntpd in tandem with the recreation of /tmp/resolv.conf.d/resolv.conf.auto, thus avoiding the issue of the file being deleted.

Hurricos · 2022-11-23T00:13:26Z

Correction: two restarts. Reload only causes a restart when the resulting configuration is different.

odon-noda · 2023-01-06T19:04:58Z

Workaround

Start ntpd directly, without jail

# mv /etc/capabilities/ntpd.json /etc/capabilities/ntpd.json-disabled
# service sysntpd restart

Details:
When will sysntpd sync time?

By default, ntpd is executed in a sandbox. ujail uses bind mounts to give ntpd access to specific files, including resolv.conf. /etc/resolv.conf is a symlink which ultimately points to /tmp/resolv.conf.d/resolv.conf.auto. The bind mount is created on a single file, therefore the mount breaks as soon as the file gets deleted. The bind mount still refers to the deleted source file, even if a new file is created in the same location. The problem can be avoided by not running ujail in a sandbox. The sandbox is disabled by removing the ntpd capabilities file. Upstream issues: - openwrt/openwrt#10389 - openwrt/openwrt#10843 Fixes: #291 Signed-off-by: Fabian Bläse <fabian@blaese.de>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Time is not synced on (re)boot automatically after a sysupgrade to 22.03.0 #10843

Time is not synced on (re)boot automatically after a sysupgrade to 22.03.0 #10843

birdie-github commented Sep 29, 2022 •

edited

Loading

Hurricos commented Nov 22, 2022 •

edited

Loading

Hurricos commented Nov 22, 2022

Hurricos commented Nov 22, 2022 •

edited

Loading

Hurricos commented Nov 22, 2022

Hurricos commented Nov 22, 2022

Hurricos commented Nov 22, 2022 •

edited

Loading

Hurricos commented Nov 22, 2022

Hurricos commented Nov 23, 2022

Hurricos commented Nov 23, 2022

odon-noda commented Jan 6, 2023

Time is not synced on (re)boot automatically after a sysupgrade to 22.03.0 #10843

Time is not synced on (re)boot automatically after a sysupgrade to 22.03.0 #10843

Comments

birdie-github commented Sep 29, 2022 • edited Loading

Hurricos commented Nov 22, 2022 • edited Loading

Hurricos commented Nov 22, 2022

Hurricos commented Nov 22, 2022 • edited Loading

Hurricos commented Nov 22, 2022

Hurricos commented Nov 22, 2022

Hurricos commented Nov 22, 2022 • edited Loading

Hurricos commented Nov 22, 2022

Hurricos commented Nov 23, 2022

Hurricos commented Nov 23, 2022

odon-noda commented Jan 6, 2023

Workaround

birdie-github commented Sep 29, 2022 •

edited

Loading

Hurricos commented Nov 22, 2022 •

edited

Loading

Hurricos commented Nov 22, 2022 •

edited

Loading

Hurricos commented Nov 22, 2022 •

edited

Loading