New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
manager: always store watchdog timeout, even if watchdog is not ready #17460
Conversation
Commit 986935c ("pid1: update manager settings on reload too") introduced this new logic, which discards any attempt to set the runtime watchdog timeout if watchdog_set_timeout() reported an error. That can happen if /dev/watchdog does not exist or is not working. But there is an important situation where /dev/watchdog may not exist: if it is provided by a kernel module that is loaded late in boot, after pid 1 has loaded its configuration, the watchdog_set_timeout() call may return an error but the watchdog still would have been opened successfully on the next ping. When RuntimeWatchdogSec= is loaded on first boot, or set via DBus, we need to store the new timeout regardless whether /dev/watchdog exists and is operational, or not. This restores the previous behaviour from systemd < v246.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this doesn't look right. Watchdog hw/drivers typically don't support all possible timeouts, but just a few discrete choices. Hence we have to set the timeout to some value, which the driver then finds a close approximation for, and read it back to see what the actually configured timeout was. The resulting timeout we then need to take into account for pinging the device regularly enough. That's why watchdog_set_timeout() takes a pointer to a timeout instead of a literal value, it is both an input and output parameter. The updated timeout is stored in m->watchdog[]
and m->overriden_watchdog[]
.
This propagation step from the hw back to the two array fields is also necessary when we start using a device later on, that wasn't available at first. And that's missing right now.
@@ -3409,18 +3407,15 @@ void manager_set_watchdog(Manager *m, WatchdogType t, usec_t timeout) { | |||
if (t == WATCHDOG_RUNTIME) | |||
if (!timestamp_is_set(m->watchdog_overridden[WATCHDOG_RUNTIME])) { | |||
if (timestamp_is_set(timeout)) | |||
r = watchdog_set_timeout(&timeout); | |||
watchdog_set_timeout(&timeout); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given that we actively ignore the return code, please cast result explicitly to (void)
@@ -3434,13 +3429,12 @@ int manager_override_watchdog(Manager *m, WatchdogType t, usec_t timeout) { | |||
|
|||
p = timestamp_is_set(timeout) ? &timeout : &m->watchdog[t]; | |||
if (timestamp_is_set(*p)) | |||
r = watchdog_set_timeout(p); | |||
watchdog_set_timeout(p); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as above
@danc86 another round? |
…r loaded When manager_{set|override}_watchdog is called, set the watchdog timeout regardless of whether the hardware watchdog was successfully initialized. If the watchdog was requested but could not be initialized, then instead of pinging it, attempt to initialize it again. This ensures that the hardware watchdog is initialized even if the kernel module for it isn't loaded when systemd starts (which is quite likely, unless it is compiled in). This builds on work by @danc86 in systemd#17460, but fixes the issue of not updating the watchdog timeout with the actual value from the hardware. Co-authored-by: Dan Callaghan <djc@djc.id.au> Co-authored-by: Michael Marley <michael@michaelmarley.com>
…r loaded When manager_{set|override}_watchdog is called, set the watchdog timeout regardless of whether the hardware watchdog was successfully initialized. If the watchdog was requested but could not be initialized, then instead of pinging it, attempt to initialize it again. This ensures that the hardware watchdog is initialized even if the kernel module for it isn't loaded when systemd starts (which is quite likely, unless it is compiled in). This builds on work by @danc86 in systemd#17460, but fixes the issue of not updating the watchdog timeout with the actual value from the hardware. Fixes systemd#17838 Co-authored-by: Dan Callaghan <djc@djc.id.au> Co-authored-by: Michael Marley <michael@michaelmarley.com>
When manager_{set|override}_watchdog is called, set the watchdog timeout regardless of whether the hardware watchdog was successfully initialized. If the watchdog was requested but could not be initialized, then instead of pinging it, attempt to initialize it again. This ensures that the hardware watchdog is initialized even if the kernel module for it isn't loaded when systemd starts (which is quite likely, unless it is compiled in). This builds on work by @danc86 in systemd#17460, but fixes the issue of not updating the watchdog timeout with the actual value from the hardware. Fixes systemd#17838 Co-authored-by: Dan Callaghan <djc@djc.id.au> Co-authored-by: Michael Marley <michael@michaelmarley.com>
When manager_{set|override}_watchdog is called, set the watchdog timeout regardless of whether the hardware watchdog was successfully initialized. If the watchdog was requested but could not be initialized, then instead of pinging it, attempt to initialize it again. This ensures that the hardware watchdog is initialized even if the kernel module for it isn't loaded when systemd starts (which is quite likely, unless it is compiled in). This builds on work by @danc86 in systemd#17460, but fixes the issue of not updating the watchdog timeout with the actual value from the hardware. Fixes systemd#17838 Co-authored-by: Dan Callaghan <djc@djc.id.au> Co-authored-by: Michael Marley <michael@michaelmarley.com>
When manager_{set|override}_watchdog is called, set the watchdog timeout regardless of whether the hardware watchdog was successfully initialized. If the watchdog was requested but could not be initialized, then instead of pinging it, attempt to initialize it again. This ensures that the hardware watchdog is initialized even if the kernel module for it isn't loaded when systemd starts (which is quite likely, unless it is compiled in). This builds on work by @danc86 in systemd#17460, but fixes the issue of not updating the watchdog timeout with the actual value from the hardware. Fixes systemd#17838 Co-authored-by: Dan Callaghan <djc@djc.id.au> Co-authored-by: Michael Marley <michael@michaelmarley.com>
Replaced by #17895. |
When manager_{set|override}_watchdog is called, set the watchdog timeout regardless of whether the hardware watchdog was successfully initialized. If the watchdog was requested but could not be initialized, then instead of pinging it, attempt to initialize it again. This ensures that the hardware watchdog is initialized even if the kernel module for it isn't loaded when systemd starts (which is quite likely, unless it is compiled in). This builds on work by @danc86 in systemd#17460, but fixes the issue of not updating the watchdog timeout with the actual value from the hardware. Fixes systemd#17838 Co-authored-by: Dan Callaghan <djc@djc.id.au> Co-authored-by: Michael Marley <michael@michaelmarley.com>
When manager_{set|override}_watchdog is called, set the watchdog timeout regardless of whether the hardware watchdog was successfully initialized. If the watchdog was requested but could not be initialized, then instead of pinging it, attempt to initialize it again. This ensures that the hardware watchdog is initialized even if the kernel module for it isn't loaded when systemd starts (which is quite likely, unless it is compiled in). This builds on work by @danc86 in systemd#17460, but fixes the issue of not updating the watchdog timeout with the actual value from the hardware. Fixes systemd#17838 Co-authored-by: Dan Callaghan <djc@djc.id.au> Co-authored-by: Michael Marley <michael@michaelmarley.com>
When manager_{set|override}_watchdog is called, set the watchdog timeout regardless of whether the hardware watchdog was successfully initialized. If the watchdog was requested but could not be initialized, then instead of pinging it, attempt to initialize it again. This ensures that the hardware watchdog is initialized even if the kernel module for it isn't loaded when systemd starts (which is quite likely, unless it is compiled in). This builds on work by @danc86 in #17460, but fixes the issue of not updating the watchdog timeout with the actual value from the hardware. Fixes #17838 Co-authored-by: Dan Callaghan <djc@djc.id.au> Co-authored-by: Michael Marley <michael@michaelmarley.com>
When manager_{set|override}_watchdog is called, set the watchdog timeout regardless of whether the hardware watchdog was successfully initialized. If the watchdog was requested but could not be initialized, then instead of pinging it, attempt to initialize it again. This ensures that the hardware watchdog is initialized even if the kernel module for it isn't loaded when systemd starts (which is quite likely, unless it is compiled in). This builds on work by @danc86 in systemd/systemd#17460, but fixes the issue of not updating the watchdog timeout with the actual value from the hardware. Fixes systemd/systemd#17838 Co-authored-by: Dan Callaghan <djc@djc.id.au> Co-authored-by: Michael Marley <michael@michaelmarley.com> (cherry picked from commit 61927b9)
This basically reverts commit 61927b9 and relies on the fact that watchdog_ping() will open and setup the watchdog for us in case the device appears later on. Also unlike what is said in comment systemd#17460 (review), both m->watchdog[] and m->overriden_watchdog[] are not supposed to store the actual timeout used by the watchdog device but stores the value defined by the user. If the HW timeout value is really needed by the manager then it's probably better to read it via an helper defined in watchdog.c instead. However the HW timeout value is currently only needed by the watchdog code itself mainly when it calculates the time for the next ping.
This basically reverts commit 61927b9 and relies on the fact that watchdog_ping() will open and setup the watchdog for us in case the device appears later on. Also unlike what is said in comment systemd#17460 (review), both m->watchdog[] and m->overriden_watchdog[] are not supposed to store the actual timeout used by the watchdog device but stores the value defined by the user. If the HW timeout value is really needed by the manager then it's probably better to read it via an helper defined in watchdog.c instead. However the HW timeout value is currently only needed by the watchdog code itself mainly when it calculates the time for the next ping.
This basically reverts commit 61927b9 and relies on the fact that watchdog_ping() will open and setup the watchdog for us in case the device appears later on. Also unlike what is said in comment systemd#17460 (review), both m->watchdog[] and m->overriden_watchdog[] are not supposed to store the actual timeout used by the watchdog device but stores the value defined by the user. If the HW timeout value is really needed by the manager then it's probably better to read it via an helper defined in watchdog.c instead. However the HW timeout value is currently only needed by the watchdog code itself mainly when it calculates the time for the next ping.
When manager_{set|override}_watchdog is called, set the watchdog timeout regardless of whether the hardware watchdog was successfully initialized. If the watchdog was requested but could not be initialized, then instead of pinging it, attempt to initialize it again. This ensures that the hardware watchdog is initialized even if the kernel module for it isn't loaded when systemd starts (which is quite likely, unless it is compiled in). This builds on work by @danc86 in systemd/systemd#17460, but fixes the issue of not updating the watchdog timeout with the actual value from the hardware. Fixes systemd/systemd#17838 Co-authored-by: Dan Callaghan <djc@djc.id.au> Co-authored-by: Michael Marley <michael@michaelmarley.com> (cherry picked from commit 61927b9) [fbui: fixes bsc#1189446]
This basically reverts commit 61927b9 and relies on the fact that watchdog_ping() will open and setup the watchdog for us in case the device appears later on. Also unlike what is said in comment systemd#17460 (review), both m->watchdog[] and m->overriden_watchdog[] are not supposed to store the actual timeout used by the watchdog device but stores the value defined by the user. If the HW timeout value is really needed by the manager then it's probably better to read it via an helper defined in watchdog.c instead. However the HW timeout value is currently only needed by the watchdog code itself mainly when it calculates the time for the next ping.
This basically reverts commit 61927b9 and relies on the fact that watchdog_ping() will open and setup the watchdog for us in case the device appears later on. Also unlike what is said in comment systemd#17460 (review), both m->watchdog[] and m->overriden_watchdog[] are not supposed to store the actual timeout used by the watchdog device but stores the value defined by the user. If the HW timeout value is really needed by the manager then it's probably better to read it via an helper defined in watchdog.c instead. However the HW timeout value is currently only needed by the watchdog code itself mainly when it calculates the time for the next ping.
This basically reverts commit 61927b9 and relies on the fact that watchdog_ping() will open and setup the watchdog for us in case the device appears later on. Also unlike what is said in comment systemd#17460 (review), both m->watchdog[] and m->overriden_watchdog[] are not supposed to store the actual timeout used by the watchdog device but stores the value defined by the user. If the HW timeout value is really needed by the manager then it's probably better to read it via an helper defined in watchdog.c instead. However the HW timeout value is currently only needed by the watchdog code itself mainly when it calculates the time for the next ping. (cherry picked from commit ae4a0ec) Conflicts: src/core/manager.c
This basically reverts commit 61927b9 and relies on the fact that watchdog_ping() will open and setup the watchdog for us in case the device appears later on. Also unlike what is said in comment systemd#17460 (review), both m->watchdog[] and m->overriden_watchdog[] are not supposed to store the actual timeout used by the watchdog device but stores the value defined by the user. If the HW timeout value is really needed by the manager then it's probably better to read it via an helper defined in watchdog.c instead. However the HW timeout value is currently only needed by the watchdog code itself mainly when it calculates the time for the next ping. (cherry picked from commit ae4a0ec) Conflicts: src/core/manager.c
This basically reverts commit 61927b9 and relies on the fact that watchdog_ping() will open and setup the watchdog for us in case the device appears later on. Also unlike what is said in comment systemd#17460 (review), both m->watchdog[] and m->overriden_watchdog[] are not supposed to store the actual timeout used by the watchdog device but stores the value defined by the user. If the HW timeout value is really needed by the manager then it's probably better to read it via an helper defined in watchdog.c instead. However the HW timeout value is currently only needed by the watchdog code itself mainly when it calculates the time for the next ping. (cherry picked from commit ae4a0ec) Conflicts: src/core/manager.c
Commit 986935c ("pid1: update manager settings on reload too")
introduced this new logic, which discards any attempt to set the
runtime watchdog timeout if watchdog_set_timeout() reported an error.
That can happen if /dev/watchdog does not exist or is not working.
But there is an important situation where /dev/watchdog may not exist:
if it is provided by a kernel module that is loaded late in
boot, after pid 1 has loaded its configuration, the
watchdog_set_timeout() call may return an error but the watchdog still
would have been opened successfully on the next ping.
When RuntimeWatchdogSec= is loaded on first boot, or set via DBus, we
need to store the new timeout regardless whether /dev/watchdog exists
and is operational, or not. This restores the previous behaviour from
systemd < v246.