manager: always store watchdog timeout, even if watchdog is not ready #17460

danc86 · 2020-10-27T07:47:54Z

Commit 986935c ("pid1: update manager settings on reload too")
introduced this new logic, which discards any attempt to set the
runtime watchdog timeout if watchdog_set_timeout() reported an error.
That can happen if /dev/watchdog does not exist or is not working.

But there is an important situation where /dev/watchdog may not exist:
if it is provided by a kernel module that is loaded late in
boot, after pid 1 has loaded its configuration, the
watchdog_set_timeout() call may return an error but the watchdog still
would have been opened successfully on the next ping.

When RuntimeWatchdogSec= is loaded on first boot, or set via DBus, we
need to store the new timeout regardless whether /dev/watchdog exists
and is operational, or not. This restores the previous behaviour from
systemd < v246.

Commit 986935c ("pid1: update manager settings on reload too") introduced this new logic, which discards any attempt to set the runtime watchdog timeout if watchdog_set_timeout() reported an error. That can happen if /dev/watchdog does not exist or is not working. But there is an important situation where /dev/watchdog may not exist: if it is provided by a kernel module that is loaded late in boot, after pid 1 has loaded its configuration, the watchdog_set_timeout() call may return an error but the watchdog still would have been opened successfully on the next ping. When RuntimeWatchdogSec= is loaded on first boot, or set via DBus, we need to store the new timeout regardless whether /dev/watchdog exists and is operational, or not. This restores the previous behaviour from systemd < v246.

poettering

So this doesn't look right. Watchdog hw/drivers typically don't support all possible timeouts, but just a few discrete choices. Hence we have to set the timeout to some value, which the driver then finds a close approximation for, and read it back to see what the actually configured timeout was. The resulting timeout we then need to take into account for pinging the device regularly enough. That's why watchdog_set_timeout() takes a pointer to a timeout instead of a literal value, it is both an input and output parameter. The updated timeout is stored in m->watchdog[] and m->overriden_watchdog[].

This propagation step from the hw back to the two array fields is also necessary when we start using a device later on, that wasn't available at first. And that's missing right now.

poettering · 2020-10-27T08:08:46Z

src/core/manager.c

@@ -3409,18 +3407,15 @@ void manager_set_watchdog(Manager *m, WatchdogType t, usec_t timeout) {
        if (t == WATCHDOG_RUNTIME)
                if (!timestamp_is_set(m->watchdog_overridden[WATCHDOG_RUNTIME])) {
                        if (timestamp_is_set(timeout))
-                                r = watchdog_set_timeout(&timeout);
+                                watchdog_set_timeout(&timeout);


given that we actively ignore the return code, please cast result explicitly to (void)

poettering · 2020-10-27T08:12:24Z

src/core/manager.c

@@ -3434,13 +3429,12 @@ int manager_override_watchdog(Manager *m, WatchdogType t, usec_t timeout) {

                p = timestamp_is_set(timeout) ? &timeout : &m->watchdog[t];
                if (timestamp_is_set(*p))
-                        r = watchdog_set_timeout(p);
+                        watchdog_set_timeout(p);


keszybz · 2020-12-07T10:09:42Z

@danc86 another round?

@danc86

…r loaded When manager_{set|override}_watchdog is called, set the watchdog timeout regardless of whether the hardware watchdog was successfully initialized. If the watchdog was requested but could not be initialized, then instead of pinging it, attempt to initialize it again. This ensures that the hardware watchdog is initialized even if the kernel module for it isn't loaded when systemd starts (which is quite likely, unless it is compiled in). This builds on work by @danc86 in systemd#17460, but fixes the issue of not updating the watchdog timeout with the actual value from the hardware. Co-authored-by: Dan Callaghan <djc@djc.id.au> Co-authored-by: Michael Marley <michael@michaelmarley.com>

@danc86

…r loaded When manager_{set|override}_watchdog is called, set the watchdog timeout regardless of whether the hardware watchdog was successfully initialized. If the watchdog was requested but could not be initialized, then instead of pinging it, attempt to initialize it again. This ensures that the hardware watchdog is initialized even if the kernel module for it isn't loaded when systemd starts (which is quite likely, unless it is compiled in). This builds on work by @danc86 in systemd#17460, but fixes the issue of not updating the watchdog timeout with the actual value from the hardware. Fixes systemd#17838 Co-authored-by: Dan Callaghan <djc@djc.id.au> Co-authored-by: Michael Marley <michael@michaelmarley.com>

@danc86

When manager_{set|override}_watchdog is called, set the watchdog timeout regardless of whether the hardware watchdog was successfully initialized. If the watchdog was requested but could not be initialized, then instead of pinging it, attempt to initialize it again. This ensures that the hardware watchdog is initialized even if the kernel module for it isn't loaded when systemd starts (which is quite likely, unless it is compiled in). This builds on work by @danc86 in systemd#17460, but fixes the issue of not updating the watchdog timeout with the actual value from the hardware. Fixes systemd#17838 Co-authored-by: Dan Callaghan <djc@djc.id.au> Co-authored-by: Michael Marley <michael@michaelmarley.com>

@danc86

When manager_{set|override}_watchdog is called, set the watchdog timeout regardless of whether the hardware watchdog was successfully initialized. If the watchdog was requested but could not be initialized, then instead of pinging it, attempt to initialize it again. This ensures that the hardware watchdog is initialized even if the kernel module for it isn't loaded when systemd starts (which is quite likely, unless it is compiled in). This builds on work by @danc86 in systemd#17460, but fixes the issue of not updating the watchdog timeout with the actual value from the hardware. Fixes systemd#17838 Co-authored-by: Dan Callaghan <djc@djc.id.au> Co-authored-by: Michael Marley <michael@michaelmarley.com>

@danc86

When manager_{set|override}_watchdog is called, set the watchdog timeout regardless of whether the hardware watchdog was successfully initialized. If the watchdog was requested but could not be initialized, then instead of pinging it, attempt to initialize it again. This ensures that the hardware watchdog is initialized even if the kernel module for it isn't loaded when systemd starts (which is quite likely, unless it is compiled in). This builds on work by @danc86 in systemd#17460, but fixes the issue of not updating the watchdog timeout with the actual value from the hardware. Fixes systemd#17838 Co-authored-by: Dan Callaghan <djc@djc.id.au> Co-authored-by: Michael Marley <michael@michaelmarley.com>

yuwata · 2020-12-08T04:42:38Z

Replaced by #17895.

@danc86

When manager_{set|override}_watchdog is called, set the watchdog timeout regardless of whether the hardware watchdog was successfully initialized. If the watchdog was requested but could not be initialized, then instead of pinging it, attempt to initialize it again. This ensures that the hardware watchdog is initialized even if the kernel module for it isn't loaded when systemd starts (which is quite likely, unless it is compiled in). This builds on work by @danc86 in systemd#17460, but fixes the issue of not updating the watchdog timeout with the actual value from the hardware. Fixes systemd#17838 Co-authored-by: Dan Callaghan <djc@djc.id.au> Co-authored-by: Michael Marley <michael@michaelmarley.com>

@danc86

When manager_{set|override}_watchdog is called, set the watchdog timeout regardless of whether the hardware watchdog was successfully initialized. If the watchdog was requested but could not be initialized, then instead of pinging it, attempt to initialize it again. This ensures that the hardware watchdog is initialized even if the kernel module for it isn't loaded when systemd starts (which is quite likely, unless it is compiled in). This builds on work by @danc86 in systemd#17460, but fixes the issue of not updating the watchdog timeout with the actual value from the hardware. Fixes systemd#17838 Co-authored-by: Dan Callaghan <djc@djc.id.au> Co-authored-by: Michael Marley <michael@michaelmarley.com>

@danc86

When manager_{set|override}_watchdog is called, set the watchdog timeout regardless of whether the hardware watchdog was successfully initialized. If the watchdog was requested but could not be initialized, then instead of pinging it, attempt to initialize it again. This ensures that the hardware watchdog is initialized even if the kernel module for it isn't loaded when systemd starts (which is quite likely, unless it is compiled in). This builds on work by @danc86 in #17460, but fixes the issue of not updating the watchdog timeout with the actual value from the hardware. Fixes #17838 Co-authored-by: Dan Callaghan <djc@djc.id.au> Co-authored-by: Michael Marley <michael@michaelmarley.com>

@danc86

When manager_{set|override}_watchdog is called, set the watchdog timeout regardless of whether the hardware watchdog was successfully initialized. If the watchdog was requested but could not be initialized, then instead of pinging it, attempt to initialize it again. This ensures that the hardware watchdog is initialized even if the kernel module for it isn't loaded when systemd starts (which is quite likely, unless it is compiled in). This builds on work by @danc86 in systemd/systemd#17460, but fixes the issue of not updating the watchdog timeout with the actual value from the hardware. Fixes systemd/systemd#17838 Co-authored-by: Dan Callaghan <djc@djc.id.au> Co-authored-by: Michael Marley <michael@michaelmarley.com> (cherry picked from commit 61927b9)

This basically reverts commit 61927b9 and relies on the fact that watchdog_ping() will open and setup the watchdog for us in case the device appears later on. Also unlike what is said in comment systemd#17460 (review), both m->watchdog[] and m->overriden_watchdog[] are not supposed to store the actual timeout used by the watchdog device but stores the value defined by the user. If the HW timeout value is really needed by the manager then it's probably better to read it via an helper defined in watchdog.c instead. However the HW timeout value is currently only needed by the watchdog code itself mainly when it calculates the time for the next ping.

@danc86

When manager_{set|override}_watchdog is called, set the watchdog timeout regardless of whether the hardware watchdog was successfully initialized. If the watchdog was requested but could not be initialized, then instead of pinging it, attempt to initialize it again. This ensures that the hardware watchdog is initialized even if the kernel module for it isn't loaded when systemd starts (which is quite likely, unless it is compiled in). This builds on work by @danc86 in systemd/systemd#17460, but fixes the issue of not updating the watchdog timeout with the actual value from the hardware. Fixes systemd/systemd#17838 Co-authored-by: Dan Callaghan <djc@djc.id.au> Co-authored-by: Michael Marley <michael@michaelmarley.com> (cherry picked from commit 61927b9) [fbui: fixes bsc#1189446]

This basically reverts commit 61927b9 and relies on the fact that watchdog_ping() will open and setup the watchdog for us in case the device appears later on. Also unlike what is said in comment systemd#17460 (review), both m->watchdog[] and m->overriden_watchdog[] are not supposed to store the actual timeout used by the watchdog device but stores the value defined by the user. If the HW timeout value is really needed by the manager then it's probably better to read it via an helper defined in watchdog.c instead. However the HW timeout value is currently only needed by the watchdog code itself mainly when it calculates the time for the next ping.

This basically reverts commit 61927b9 and relies on the fact that watchdog_ping() will open and setup the watchdog for us in case the device appears later on. Also unlike what is said in comment systemd#17460 (review), both m->watchdog[] and m->overriden_watchdog[] are not supposed to store the actual timeout used by the watchdog device but stores the value defined by the user. If the HW timeout value is really needed by the manager then it's probably better to read it via an helper defined in watchdog.c instead. However the HW timeout value is currently only needed by the watchdog code itself mainly when it calculates the time for the next ping. (cherry picked from commit ae4a0ec) Conflicts: src/core/manager.c

poettering requested changes Oct 27, 2020

View reviewed changes

poettering added pid1 reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks labels Oct 27, 2020

poettering mentioned this pull request Dec 3, 2020

Hardware watchdog does not work if kernel module was not loaded by the time systemd starts #17838

Closed

poettering linked an issue Dec 3, 2020 that may be closed by this pull request

Hardware watchdog does not work if kernel module was not loaded by the time systemd starts #17838

Closed

mamarley mentioned this pull request Dec 8, 2020

manager: Fix HW watchdog when systemd starts before driver loaded #17895

Merged

yuwata closed this Dec 8, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

manager: always store watchdog timeout, even if watchdog is not ready #17460

manager: always store watchdog timeout, even if watchdog is not ready #17460

danc86 commented Oct 27, 2020

poettering left a comment

poettering Oct 27, 2020

poettering Oct 27, 2020

keszybz commented Dec 7, 2020

yuwata commented Dec 8, 2020

manager: always store watchdog timeout, even if watchdog is not ready #17460

manager: always store watchdog timeout, even if watchdog is not ready #17460

Conversation

danc86 commented Oct 27, 2020

poettering left a comment

Choose a reason for hiding this comment

poettering Oct 27, 2020

Choose a reason for hiding this comment

poettering Oct 27, 2020

Choose a reason for hiding this comment

keszybz commented Dec 7, 2020

yuwata commented Dec 8, 2020