Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

manager: always store watchdog timeout, even if watchdog is not ready #17460

Closed
wants to merge 1 commit into from

Conversation

danc86
Copy link
Contributor

@danc86 danc86 commented Oct 27, 2020

Commit 986935c ("pid1: update manager settings on reload too")
introduced this new logic, which discards any attempt to set the
runtime watchdog timeout if watchdog_set_timeout() reported an error.
That can happen if /dev/watchdog does not exist or is not working.

But there is an important situation where /dev/watchdog may not exist:
if it is provided by a kernel module that is loaded late in
boot, after pid 1 has loaded its configuration, the
watchdog_set_timeout() call may return an error but the watchdog still
would have been opened successfully on the next ping.

When RuntimeWatchdogSec= is loaded on first boot, or set via DBus, we
need to store the new timeout regardless whether /dev/watchdog exists
and is operational, or not. This restores the previous behaviour from
systemd < v246.

Commit 986935c ("pid1: update manager settings on reload too")
introduced this new logic, which discards any attempt to set the
runtime watchdog timeout if watchdog_set_timeout() reported an error.
That can happen if /dev/watchdog does not exist or is not working.

But there is an important situation where /dev/watchdog may not exist:
if it is provided by a kernel module that is loaded late in
boot, after pid 1 has loaded its configuration, the
watchdog_set_timeout() call may return an error but the watchdog still
would have been opened successfully on the next ping.

When RuntimeWatchdogSec= is loaded on first boot, or set via DBus, we
need to store the new timeout regardless whether /dev/watchdog exists
and is operational, or not. This restores the previous behaviour from
systemd < v246.
Copy link
Member

@poettering poettering left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this doesn't look right. Watchdog hw/drivers typically don't support all possible timeouts, but just a few discrete choices. Hence we have to set the timeout to some value, which the driver then finds a close approximation for, and read it back to see what the actually configured timeout was. The resulting timeout we then need to take into account for pinging the device regularly enough. That's why watchdog_set_timeout() takes a pointer to a timeout instead of a literal value, it is both an input and output parameter. The updated timeout is stored in m->watchdog[] and m->overriden_watchdog[].

This propagation step from the hw back to the two array fields is also necessary when we start using a device later on, that wasn't available at first. And that's missing right now.

@@ -3409,18 +3407,15 @@ void manager_set_watchdog(Manager *m, WatchdogType t, usec_t timeout) {
if (t == WATCHDOG_RUNTIME)
if (!timestamp_is_set(m->watchdog_overridden[WATCHDOG_RUNTIME])) {
if (timestamp_is_set(timeout))
r = watchdog_set_timeout(&timeout);
watchdog_set_timeout(&timeout);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given that we actively ignore the return code, please cast result explicitly to (void)

@@ -3434,13 +3429,12 @@ int manager_override_watchdog(Manager *m, WatchdogType t, usec_t timeout) {

p = timestamp_is_set(timeout) ? &timeout : &m->watchdog[t];
if (timestamp_is_set(*p))
r = watchdog_set_timeout(p);
watchdog_set_timeout(p);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above

@keszybz
Copy link
Member

keszybz commented Dec 7, 2020

@danc86 another round?

mamarley added a commit to mamarley/systemd that referenced this pull request Dec 8, 2020
…r loaded

When manager_{set|override}_watchdog is called, set the watchdog timeout
regardless of whether the hardware watchdog was successfully initialized.  If
the watchdog was requested but could not be initialized, then instead of
pinging it, attempt to initialize it again.  This ensures that the hardware
watchdog is initialized even if the kernel module for it isn't loaded when
systemd starts (which is quite likely, unless it is compiled in).

This builds on work by @danc86 in systemd#17460,
but fixes the issue of not updating the watchdog timeout with the actual value
from the hardware.

Co-authored-by: Dan Callaghan <djc@djc.id.au>
Co-authored-by: Michael Marley <michael@michaelmarley.com>
mamarley added a commit to mamarley/systemd that referenced this pull request Dec 8, 2020
…r loaded

When manager_{set|override}_watchdog is called, set the watchdog timeout
regardless of whether the hardware watchdog was successfully initialized.  If
the watchdog was requested but could not be initialized, then instead of
pinging it, attempt to initialize it again.  This ensures that the hardware
watchdog is initialized even if the kernel module for it isn't loaded when
systemd starts (which is quite likely, unless it is compiled in).

This builds on work by @danc86 in systemd#17460,
but fixes the issue of not updating the watchdog timeout with the actual value
from the hardware.

Fixes systemd#17838

Co-authored-by: Dan Callaghan <djc@djc.id.au>
Co-authored-by: Michael Marley <michael@michaelmarley.com>
mamarley added a commit to mamarley/systemd that referenced this pull request Dec 8, 2020
When manager_{set|override}_watchdog is called, set the watchdog timeout
regardless of whether the hardware watchdog was successfully initialized.  If
the watchdog was requested but could not be initialized, then instead of
pinging it, attempt to initialize it again.  This ensures that the hardware
watchdog is initialized even if the kernel module for it isn't loaded when
systemd starts (which is quite likely, unless it is compiled in).

This builds on work by @danc86 in systemd#17460,
but fixes the issue of not updating the watchdog timeout with the actual value
from the hardware.

Fixes systemd#17838

Co-authored-by: Dan Callaghan <djc@djc.id.au>
Co-authored-by: Michael Marley <michael@michaelmarley.com>
mamarley added a commit to mamarley/systemd that referenced this pull request Dec 8, 2020
When manager_{set|override}_watchdog is called, set the watchdog timeout
regardless of whether the hardware watchdog was successfully initialized.  If
the watchdog was requested but could not be initialized, then instead of
pinging it, attempt to initialize it again.  This ensures that the hardware
watchdog is initialized even if the kernel module for it isn't loaded when
systemd starts (which is quite likely, unless it is compiled in).

This builds on work by @danc86 in systemd#17460,
but fixes the issue of not updating the watchdog timeout with the actual value
from the hardware.

Fixes systemd#17838

Co-authored-by: Dan Callaghan <djc@djc.id.au>
Co-authored-by: Michael Marley <michael@michaelmarley.com>
mamarley added a commit to mamarley/systemd that referenced this pull request Dec 8, 2020
When manager_{set|override}_watchdog is called, set the watchdog timeout
regardless of whether the hardware watchdog was successfully initialized.  If
the watchdog was requested but could not be initialized, then instead of
pinging it, attempt to initialize it again.  This ensures that the hardware
watchdog is initialized even if the kernel module for it isn't loaded when
systemd starts (which is quite likely, unless it is compiled in).

This builds on work by @danc86 in systemd#17460,
but fixes the issue of not updating the watchdog timeout with the actual value
from the hardware.

Fixes systemd#17838

Co-authored-by: Dan Callaghan <djc@djc.id.au>
Co-authored-by: Michael Marley <michael@michaelmarley.com>
@yuwata
Copy link
Member

yuwata commented Dec 8, 2020

Replaced by #17895.

@yuwata yuwata closed this Dec 8, 2020
mamarley added a commit to mamarley/systemd that referenced this pull request Dec 8, 2020
When manager_{set|override}_watchdog is called, set the watchdog timeout
regardless of whether the hardware watchdog was successfully initialized.  If
the watchdog was requested but could not be initialized, then instead of
pinging it, attempt to initialize it again.  This ensures that the hardware
watchdog is initialized even if the kernel module for it isn't loaded when
systemd starts (which is quite likely, unless it is compiled in).

This builds on work by @danc86 in systemd#17460,
but fixes the issue of not updating the watchdog timeout with the actual value
from the hardware.

Fixes systemd#17838

Co-authored-by: Dan Callaghan <djc@djc.id.au>
Co-authored-by: Michael Marley <michael@michaelmarley.com>
mamarley added a commit to mamarley/systemd that referenced this pull request Dec 8, 2020
When manager_{set|override}_watchdog is called, set the watchdog timeout
regardless of whether the hardware watchdog was successfully initialized.  If
the watchdog was requested but could not be initialized, then instead of
pinging it, attempt to initialize it again.  This ensures that the hardware
watchdog is initialized even if the kernel module for it isn't loaded when
systemd starts (which is quite likely, unless it is compiled in).

This builds on work by @danc86 in systemd#17460,
but fixes the issue of not updating the watchdog timeout with the actual value
from the hardware.

Fixes systemd#17838

Co-authored-by: Dan Callaghan <djc@djc.id.au>
Co-authored-by: Michael Marley <michael@michaelmarley.com>
bluca pushed a commit that referenced this pull request Dec 9, 2020
When manager_{set|override}_watchdog is called, set the watchdog timeout
regardless of whether the hardware watchdog was successfully initialized.  If
the watchdog was requested but could not be initialized, then instead of
pinging it, attempt to initialize it again.  This ensures that the hardware
watchdog is initialized even if the kernel module for it isn't loaded when
systemd starts (which is quite likely, unless it is compiled in).

This builds on work by @danc86 in #17460,
but fixes the issue of not updating the watchdog timeout with the actual value
from the hardware.

Fixes #17838

Co-authored-by: Dan Callaghan <djc@djc.id.au>
Co-authored-by: Michael Marley <michael@michaelmarley.com>
keszybz pushed a commit to systemd/systemd-stable that referenced this pull request Dec 16, 2020
When manager_{set|override}_watchdog is called, set the watchdog timeout
regardless of whether the hardware watchdog was successfully initialized.  If
the watchdog was requested but could not be initialized, then instead of
pinging it, attempt to initialize it again.  This ensures that the hardware
watchdog is initialized even if the kernel module for it isn't loaded when
systemd starts (which is quite likely, unless it is compiled in).

This builds on work by @danc86 in systemd/systemd#17460,
but fixes the issue of not updating the watchdog timeout with the actual value
from the hardware.

Fixes systemd/systemd#17838

Co-authored-by: Dan Callaghan <djc@djc.id.au>
Co-authored-by: Michael Marley <michael@michaelmarley.com>
(cherry picked from commit 61927b9)
fbuihuu added a commit to fbuihuu/systemd that referenced this pull request Sep 6, 2021
This basically reverts commit 61927b9 and
relies on the fact that watchdog_ping() will open and setup the watchdog for us
in case the device appears later on.

Also unlike what is said in comment
systemd#17460 (review), both
m->watchdog[] and m->overriden_watchdog[] are not supposed to store the actual
timeout used by the watchdog device but stores the value defined by the user.

If the HW timeout value is really needed by the manager then it's probably
better to read it via an helper defined in watchdog.c instead. However the HW
timeout value is currently only needed by the watchdog code itself mainly when
it calculates the time for the next ping.
fbuihuu added a commit to fbuihuu/systemd that referenced this pull request Sep 15, 2021
This basically reverts commit 61927b9 and
relies on the fact that watchdog_ping() will open and setup the watchdog for us
in case the device appears later on.

Also unlike what is said in comment
systemd#17460 (review), both
m->watchdog[] and m->overriden_watchdog[] are not supposed to store the actual
timeout used by the watchdog device but stores the value defined by the user.

If the HW timeout value is really needed by the manager then it's probably
better to read it via an helper defined in watchdog.c instead. However the HW
timeout value is currently only needed by the watchdog code itself mainly when
it calculates the time for the next ping.
fbuihuu added a commit to fbuihuu/systemd that referenced this pull request Sep 15, 2021
This basically reverts commit 61927b9 and
relies on the fact that watchdog_ping() will open and setup the watchdog for us
in case the device appears later on.

Also unlike what is said in comment
systemd#17460 (review), both
m->watchdog[] and m->overriden_watchdog[] are not supposed to store the actual
timeout used by the watchdog device but stores the value defined by the user.

If the HW timeout value is really needed by the manager then it's probably
better to read it via an helper defined in watchdog.c instead. However the HW
timeout value is currently only needed by the watchdog code itself mainly when
it calculates the time for the next ping.
fbuihuu pushed a commit to openSUSE/systemd that referenced this pull request Sep 22, 2021
When manager_{set|override}_watchdog is called, set the watchdog timeout
regardless of whether the hardware watchdog was successfully initialized.  If
the watchdog was requested but could not be initialized, then instead of
pinging it, attempt to initialize it again.  This ensures that the hardware
watchdog is initialized even if the kernel module for it isn't loaded when
systemd starts (which is quite likely, unless it is compiled in).

This builds on work by @danc86 in systemd/systemd#17460,
but fixes the issue of not updating the watchdog timeout with the actual value
from the hardware.

Fixes systemd/systemd#17838

Co-authored-by: Dan Callaghan <djc@djc.id.au>
Co-authored-by: Michael Marley <michael@michaelmarley.com>
(cherry picked from commit 61927b9)

[fbui: fixes bsc#1189446]
peckato1 pushed a commit to peckato1/systemd that referenced this pull request Oct 4, 2021
This basically reverts commit 61927b9 and
relies on the fact that watchdog_ping() will open and setup the watchdog for us
in case the device appears later on.

Also unlike what is said in comment
systemd#17460 (review), both
m->watchdog[] and m->overriden_watchdog[] are not supposed to store the actual
timeout used by the watchdog device but stores the value defined by the user.

If the HW timeout value is really needed by the manager then it's probably
better to read it via an helper defined in watchdog.c instead. However the HW
timeout value is currently only needed by the watchdog code itself mainly when
it calculates the time for the next ping.
peckato1 pushed a commit to peckato1/systemd that referenced this pull request Oct 7, 2021
This basically reverts commit 61927b9 and
relies on the fact that watchdog_ping() will open and setup the watchdog for us
in case the device appears later on.

Also unlike what is said in comment
systemd#17460 (review), both
m->watchdog[] and m->overriden_watchdog[] are not supposed to store the actual
timeout used by the watchdog device but stores the value defined by the user.

If the HW timeout value is really needed by the manager then it's probably
better to read it via an helper defined in watchdog.c instead. However the HW
timeout value is currently only needed by the watchdog code itself mainly when
it calculates the time for the next ping.
bluca pushed a commit to bluca/systemd that referenced this pull request Mar 4, 2022
This basically reverts commit 61927b9 and
relies on the fact that watchdog_ping() will open and setup the watchdog for us
in case the device appears later on.

Also unlike what is said in comment
systemd#17460 (review), both
m->watchdog[] and m->overriden_watchdog[] are not supposed to store the actual
timeout used by the watchdog device but stores the value defined by the user.

If the HW timeout value is really needed by the manager then it's probably
better to read it via an helper defined in watchdog.c instead. However the HW
timeout value is currently only needed by the watchdog code itself mainly when
it calculates the time for the next ping.

(cherry picked from commit ae4a0ec)

Conflicts:
	src/core/manager.c
bluca pushed a commit to bluca/systemd that referenced this pull request Oct 3, 2022
This basically reverts commit 61927b9 and
relies on the fact that watchdog_ping() will open and setup the watchdog for us
in case the device appears later on.

Also unlike what is said in comment
systemd#17460 (review), both
m->watchdog[] and m->overriden_watchdog[] are not supposed to store the actual
timeout used by the watchdog device but stores the value defined by the user.

If the HW timeout value is really needed by the manager then it's probably
better to read it via an helper defined in watchdog.c instead. However the HW
timeout value is currently only needed by the watchdog code itself mainly when
it calculates the time for the next ping.

(cherry picked from commit ae4a0ec)

Conflicts:
	src/core/manager.c
bluca pushed a commit to bluca/systemd that referenced this pull request Apr 21, 2023
This basically reverts commit 61927b9 and
relies on the fact that watchdog_ping() will open and setup the watchdog for us
in case the device appears later on.

Also unlike what is said in comment
systemd#17460 (review), both
m->watchdog[] and m->overriden_watchdog[] are not supposed to store the actual
timeout used by the watchdog device but stores the value defined by the user.

If the HW timeout value is really needed by the manager then it's probably
better to read it via an helper defined in watchdog.c instead. However the HW
timeout value is currently only needed by the watchdog code itself mainly when
it calculates the time for the next ping.

(cherry picked from commit ae4a0ec)

Conflicts:
	src/core/manager.c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pid1 reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks
Development

Successfully merging this pull request may close these issues.

Hardware watchdog does not work if kernel module was not loaded by the time systemd starts
4 participants