Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Time is not synced on (re)boot automatically after a sysupgrade to 22.03.0 #10843

Open
birdie-github opened this issue Sep 29, 2022 · 10 comments

Comments

@birdie-github
Copy link

birdie-github commented Sep 29, 2022

I don't want to file a bug report under packages because I've not been able to find the package for ntpd. I'm running OpenWrt 22.03.0 r19685-512e76967f . My device is TP-Link TL-WDR3600 v1. After a sysupgrade the date was set to September 3rd, 2022. Today it's the 29th of September.

Of course opkg update failed with wget returned 5.

I couldn't find a command to sync the time in a SSH session, so I simply ran /etc/init.d/sysntpd restart which fixed the issue. I waited at the very least 10 minutes for the time to sync automatically. Didn't happen.

Please make sure after a sysupgrade time is indeed getting synced.

There are zero ntp related messages in the System Log.

I've never touched any NTP related settings, everything is AFAIK by default. My Internet connection is working perfectly.

ntp

@Hurricos
Copy link
Contributor

Hurricos commented Nov 22, 2022

I've had the same issue for a few months now. This seems to have been a post-21 regression of some sort. It's definitely a dependency management issue, where some network access dependency is missing from /etc/init.d/sysntpd; I adjusted /etc/init.d/sysntpd:

 start_service() {                                   
         . /lib/functions/network.sh                 
+        sleeptime=20                                 
+        sleep $sleeptime                                   
+        logger "slept $sleeptime"                         
         validate_ntp_section ntp start_ntpd_instance      
 }

... and rebooted, getting:

[mkennedy@imp ~]$ ssh root@ap3825i0.lan "logread | grep slept; date;"; date
Tue Nov 22 17:44:27 UTC 2022
Tue Nov 22 12:46:35 PM EST 2022
[mkennedy@imp ~]$ ssh root@ap3825i0.lan "logread | grep slept; date;"; date
Tue Nov 22 17:44:27 2022 user.notice root: slept 20
Tue Nov 22 17:44:28 UTC 2022
Tue Nov 22 12:46:36 PM EST 2022
[mkennedy@imp ~]$ ssh root@ap3825i0.lan "logread | grep slept; date;"; date
Tue Nov 22 17:44:27 2022 user.notice root: slept 20
Tue Nov 22 17:44:28 UTC 2022
Tue Nov 22 12:46:36 PM EST 2022
[mkennedy@imp ~]$ ssh root@ap3825i0.lan "logread | grep slept; date;"; date
Tue Nov 22 17:44:27 2022 user.notice root: slept 20
Tue Nov 22 17:46:38 UTC 2022 ## <-- sysntpd successful
Tue Nov 22 12:46:38 PM EST 2022

... but when I set sleeptime=1:

[mkennedy@imp ~]$ ssh root@ap3825i0.lan "logread | grep slept; date;"; date
Tue Nov 22 17:48:41 2022 user.notice root: slept 1
Tue Nov 22 17:48:48 2022 user.notice root: slept 1
Tue Nov 22 17:48:50 UTC 2022
Tue Nov 22 12:49:31 PM EST 2022
[mkennedy@imp ~]$ ssh root@ap3825i0.lan "logread | grep slept; date;"; date
Tue Nov 22 17:48:41 2022 user.notice root: slept 1
Tue Nov 22 17:48:48 2022 user.notice root: slept 1
Tue Nov 22 17:48:56 UTC 2022
Tue Nov 22 12:49:37 PM EST 2022
[mkennedy@imp ~]$ ssh root@ap3825i0.lan "logread | grep slept; date;"; date
Tue Nov 22 17:48:41 2022 user.notice root: slept 1
Tue Nov 22 17:48:48 2022 user.notice root: slept 1
Tue Nov 22 17:49:02 UTC 2022 ## <-- sysntpd never succeeds in updating the time
Tue Nov 22 12:49:43 PM EST 2022

... sysntpd consistently does not successfully update.

@Hurricos
Copy link
Contributor

Two problems:

  • systntpd does not wait for network access to attempt to get the current time
  • sysntpd's procd triggers are sufficient to invoke start_service, but this does not actually cause the service to re-attempt an NTP query (either by restarting the daemon or sending it a signal, etc.).

@Hurricos
Copy link
Contributor

Hurricos commented Nov 22, 2022

There's a third problem here: sysntpd's procd triggers are not always sufficient to invoke start_service in the case of a network state change if use_dhcp is off -- that is, we only have a network-state-trigger to reload if use_dhcp is on.

Unrelatedly, there's a hotplug race-condition in the network -> sysntpd /etc/init.d/network reload (at least when the DHCP server provides an NTP server for us).

Edit: To note, this third problem is most easily packaged as, "network updates don't always trigger a change in the procd line and thus restart the daemon upon reload". It is not a good expectation, however, to have the "reload" trigger a new attempt at connecting if the problem is not with a change of configuration at all. So, 'problems' 2 and 3 really don't matter here; detection of changed procd command line works fine and has its own specific purpose.

@Hurricos
Copy link
Contributor

A fourth problem might be that although busybox ntpd does not produce errors if it fails to act as an NTP client. Looking into what exactly fails in ntpd now.

Of the two sets (1 vs 4) of valid problems, only one need be solved.

The OpenWrt config with which ntpd doesn't succeed on boot involves an external DNS resolver, so I'm going to guess the issue is some sort of bug with refreshing DNS resolution after a failure that busybox hasn't yet addressed, which would be easiest solved by just waiting until DNS is back, but probably best solved by a patch to busybox.

@Hurricos
Copy link
Contributor

OK, it's a ujail porting issue combined. If you use an external resolver, you get:

clock_gettime64(CLOCK_REALTIME, {tv_sec=1669151348, tv_nsec=377491500}) = 0
open("/etc/hosts", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/etc/resolv.conf", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
fcntl64(3, F_SETFD, FD_CLOEXEC)         = 0
read(3, "", 248)                        = 0
close(3)                                = 0
clock_gettime64(CLOCK_REALTIME, {tv_sec=1669151348, tv_nsec=391753460}) = 0
clock_gettime64(CLOCK_REALTIME, {tv_sec=1669151348, tv_nsec=392073940}) = 0
socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 3
bind(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
clock_gettime64(CLOCK_REALTIME, {tv_sec=1669151348, tv_nsec=403804320}) = 0
sendto(3, "\306M\1\0\0\1\0\0\0\0\0\0\0010\7openwrt\4pool\3ntp\3"..., 40, MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 40
sendto(3, "\2522\1\0\0\1\0\0\0\0\0\0\0010\7openwrt\4pool\3ntp\3"..., 40, MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 40
poll([{fd=3, events=POLLIN}], 1, 2500)  = 0 (Timeout)
clock_gettime64(CLOCK_REALTIME, {tv_sec=1669151350, tv_nsec=917659980}) = 0
sendto(3, "\306M\1\0\0\1\0\0\0\0\0\0\0010\7openwrt\4pool\3ntp\3"..., 40, MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 40
sendto(3, "\2522\1\0\0\1\0\0\0\0\0\0\0010\7openwrt\4pool\3ntp\3"..., 40, MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 40
poll([{fd=3, events=POLLIN}], 1, 2500)  = 0 (Timeout)
clock_gettime64(CLOCK_REALTIME, {tv_sec=1669151353, tv_nsec=422786220}) = 0
close(3)                                = 0
write(2, "ntpd: bad address '0.openwrt.poo"..., 43) = 43
clock_gettime64(CLOCK_REALTIME, {tv_sec=1669151353, tv_nsec=423631820}) = 0
poll([], 0, 143000n)                     = 0 (Timeout)
clock_gettime64(CLOCK_REALTIME, {tv_sec=1669151496, tv_nsec=524269380}) = 0
fork()                                  = 2
clock_gettime64(CLOCK_REALTIME, {tv_sec=1669151496, tv_nsec=526236600}) = 0
poll([], 0, 109000)                     = 0 (Timeout)

What I don't understand is this bit:

open("/etc/resolv.conf", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
fcntl64(3, F_SETFD, FD_CLOEXEC)         = 0
read(3, "", 248)                        = 0
close(3)                                = 0

Why do I get a successful open, but nevertheless read 0 after attempting to read 248B?

@Hurricos
Copy link
Contributor

Hurricos commented Nov 22, 2022

OK. The problem is that for systems which do not use dnsmasq for DNS caching, sysntpd starts (and its ujail is set up) before /tmp/resolv.conf.d/resolv.conf.auto (the ultimate target of /etc/resolv.conf) is finalized.

The first run of sysntpd after booting:

root@ap3825i0:/proc/2931# grep resolv mountinfo
43 37 0:16 /resolv.conf.d/resolv.conf.auto//deleted /etc/resolv.conf ro,noatime - tmpfs tmpfs rw

After a restart of sysntpd:

root@ap3825i0:/proc/3540# grep resolv mountinfo
43 37 0:16 /resolv.conf.d/resolv.conf.auto /etc/resolv.conf ro,noatime - tmpfs tmpfs rw

This affects all OpenWrt versions post-ujail-transition, but it's not necessarily showing up on all of them, usually because dnsmasq is running and ntpd.c will default to querying 127.0.0.1:53 should it fail elsewhere.

@Hurricos
Copy link
Contributor

... and that is this: #10389

@birdie-github I'd appreciate if you can confirm that on first boot, if you find the process ID $PID of your ntpd with e.g. ps www | grep ntpd | grep -v ujail, that grep resolv /proc/$PID/mountinfo gives you roughly 43 37 0:16 /resolv.conf.d/resolv.conf.auto//deleted /etc/resolv.conf ro,noatime - tmpfs tmpfs rw and not 43 37 0:16 /resolv.conf.d/resolv.conf.auto /etc/resolv.conf ro,noatime - tmpfs tmpfs rw; that way, we can move on to merging this with #10389.

@Hurricos
Copy link
Contributor

The fix for now for this would be to provide DHCP option 42 on your network. When you have a network reload of any kind, the loss and regain of the additional IP address in option 42 will cause two reloads of ntpd in tandem with the recreation of /tmp/resolv.conf.d/resolv.conf.auto, thus avoiding the issue of the file being deleted.

@Hurricos
Copy link
Contributor

Correction: two restarts. Reload only causes a restart when the resulting configuration is different.

@odon-noda
Copy link

Workaround

Start ntpd directly, without jail

# mv /etc/capabilities/ntpd.json /etc/capabilities/ntpd.json-disabled
# service sysntpd restart

Details:
When will sysntpd sync time?

fblaese added a commit to FreifunkFranken/firmware that referenced this issue Sep 17, 2024
By default, ntpd is executed in a sandbox. ujail uses bind mounts to
give ntpd access to specific files, including resolv.conf.

/etc/resolv.conf is a symlink which ultimately points to
/tmp/resolv.conf.d/resolv.conf.auto. The bind mount is created on a
single file, therefore the mount breaks as soon as the file gets
deleted. The bind mount still refers to the deleted source file, even
if a new file is created in the same location.

The problem can be avoided by not running ujail in a sandbox. The
sandbox is disabled by removing the ntpd capabilities file.

Upstream issues:
- openwrt/openwrt#10389
- openwrt/openwrt#10843

Fixes: #291

Signed-off-by: Fabian Bläse <fabian@blaese.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants