watchdog support doesn't support legacy watchdog interface /dev/watchdog anymore, but ipmi_watchdog driver only supports that #24661

Closed
ctheune opened this issue Sep 13, 2022 · 19 comments · Fixed by #24664
Labels: bug 🐛 (Programming errors, that need preferential fixing), pid1, watchdog

Comments

@ctheune

ctheune commented Sep 13, 2022

systemd version the issue has been seen with

250.4

Used distribution

NixOS

Linux kernel version used

5.15.67

CPU architectures issue was seen on

x86_64

Component

No response

Expected behaviour you didn't see

I configured the explicit device:

[root@wendy00:/etc/nixos]# grep "WatchdogDevice" /etc/systemd/system.conf
WatchdogDevice=/dev/watchdog

However, systemd keeps trying /dev/watchdog0:

Sep 13 19:35:44 wendy00 systemd[1]: Failed to open watchdog device /dev/watchdog0, ignoring: No such file or directory

When I symlink /dev/watchdog0 to /dev/watchdog (this is an IPMI watchdog), systemd picks up the device properly.

However, I'm also confused because the documented option does not appear in any "show" query:

[root@wendy00:/etc/nixos]# systemctl show --property=WatchdogDevice

[root@wendy00:/etc/nixos]# echo $?
0

I haven't found any other errors that would indicate me misspelling the property name.

Unexpected behaviour you saw

No response

Steps to reproduce the problem

No response

Additional program output to the terminal or log subsystem illustrating the issue

Sep 13 19:35:44 wendy00 systemd[1]: Failed to open watchdog device /dev/watchdog0, ignoring: No such file or directory
@ctheune ctheune added the bug 🐛 Programming errors, that need preferential fixing label Sep 13, 2022
@yuwata yuwata added the pid1 label Sep 13, 2022
yuwata added a commit to yuwata/systemd that referenced this issue Sep 13, 2022
yuwata added a commit to yuwata/systemd that referenced this issue Sep 13, 2022
@yuwata
Member

yuwata commented Sep 13, 2022

A fix is pending in #24664.

yuwata added a commit to yuwata/systemd that referenced this issue Sep 13, 2022
yuwata added a commit to yuwata/systemd that referenced this issue Sep 13, 2022
@bluca
Member

bluca commented Sep 14, 2022

@ctheune do you have some custom kernel patches or kconfig? AFAIK /dev/watchdog0 should be available since kernel 3.5, why is it not there on your system?

@bluca bluca added the needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer label Sep 14, 2022
@jcpunk
Contributor

jcpunk commented Sep 14, 2022

I'm seeing the same thing on CentOS Stream9. I'm not sure why I don't have a /dev/watchdog0, but my IPMI device is showing up as /dev/watchdog.

@yuwata
Member

yuwata commented Sep 14, 2022

@ctheune and @jcpunk Could you provide more details about the watchdog device? For example, the output of wdctl for the device.

@jcpunk
Contributor

jcpunk commented Sep 14, 2022

[root@hostname ~]# ls -l /dev/watch*
crw-------. 1 root root 10, 130 Sep 13 15:58 /dev/watchdog
[root@hostname ~]# wdctl /dev/watchdog 
Device:        /dev/watchdog
Identity:      IPMI [version 1]
Timeout:       300 seconds
Pre-timeout:    0 seconds
[root@hostname ~]# rpm -q  systemd
systemd-250-11.el9.x86_64

@yuwata
Member

yuwata commented Sep 14, 2022

Thanks. Could you also provide udevadm info --name=watchdog or udevadm info /dev/watchdog?

@jcpunk
Contributor

jcpunk commented Sep 14, 2022

[root@hostname ~]# udevadm info --name=watchdog
P: /devices/virtual/misc/watchdog
N: watchdog
L: 0
E: DEVPATH=/devices/virtual/misc/watchdog
E: DEVNAME=/dev/watchdog
E: MAJOR=10
E: MINOR=130
E: SUBSYSTEM=misc

[root@hostname ~]# udevadm info /dev/watchdog
P: /devices/virtual/misc/watchdog
N: watchdog
L: 0
E: DEVPATH=/devices/virtual/misc/watchdog
E: DEVNAME=/dev/watchdog
E: MAJOR=10
E: MINOR=130
E: SUBSYSTEM=misc

[root@hostname ~]# dmidecode
Handle 0x0014, DMI type 38, 18 bytes
IPMI Device Information
        Interface Type: KCS (Keyboard Control Style)
        Specification Version: 2.0
        I2C Slave Address: 0x10
        NV Storage Device: Not Present
        Base Address: 0x0000000000000CA2 (I/O)
        Register Spacing: Successive Byte Boundaries

@yuwata yuwata removed the needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer label Sep 14, 2022
@yuwata
Member

yuwata commented Sep 14, 2022

Thank you.

@poettering
Member

/dev/watchdog is a legacy interface; /dev/watchdog0 is what the first watchdog device is called.

It appears you are using a really old kernel or old/broken driver?

@jcpunk
Contributor

jcpunk commented Sep 14, 2022

I'm running kernel 5.14.0-160.el9.x86_64, which should have reasonably current drivers...

(edit, added modinfo)

[root@hostname ~]# lsmod |grep -i dog
ipmi_watchdog          32768  0
ipmi_msghandler       131072  5 ipmi_devintf,ipmi_si,ipmi_watchdog,acpi_ipmi,ipmi_ssif
[root@hostname ~]# modinfo ipmi_watchdog
filename:       /lib/modules/5.14.0-160.el9.x86_64/kernel/drivers/char/ipmi/ipmi_watchdog.ko.xz
description:    watchdog timer based upon the IPMI interface.
author:         Corey Minyard <minyard@mvista.com>
license:        GPL
rhelversion:    9.1
srcversion:     68A9D8683664E7933D64761
depends:        ipmi_msghandler
retpoline:      Y
intree:         Y
name:           ipmi_watchdog
vermagic:       5.14.0-160.el9.x86_64 SMP preempt mod_unload modversions 
sig_id:         PKCS#7
signer:         CentOS Stream kernel signing key
sig_key:        1A:0C:FE:64:6D:46:38:C1:98:E3:10:1C:F0:7F:5D:E9:FC:51:3C:4D
sig_hashalgo:   sha256
signature:      36:54:B9:9E:E2:93:AD:C8:06:56:E0:A3:AD:2F:93:DF:7A:7D:78:25:
		CF:AA:EB:F0:2F:00:14:C8:E8:53:CE:39:84:DC:1B:FF:FA:C7:8B:5B:
		57:41:E2:A7:7E:E5:2A:AF:E6:F7:1E:FA:DF:77:EA:40:F4:A3:CA:2B:
		5E:22:BC:F0:FA:AB:FE:48:B8:64:01:21:1A:61:0E:32:7F:D4:A3:86:
		EE:05:00:ED:EC:EC:AE:D8:91:0B:0F:A0:C6:E5:FE:41:5D:7E:33:E2:
		8A:C9:0F:9B:BE:F8:B9:8D:E8:B1:80:EC:8B:6A:89:37:A7:AC:C9:A0:
		B1:73:D1:87:63:4D:07:3C:03:B8:BC:2F:A2:8C:28:21:57:36:4B:A6:
		AF:79:0A:B1:D0:A4:68:31:3E:D2:22:AB:CF:F7:F4:FC:E8:41:70:C9:
		E4:72:60:E8:B7:F3:12:3D:1E:70:6B:7B:64:DB:2D:94:3C:1E:BE:FE:
		F0:62:38:38:55:A1:93:F6:07:B8:E6:8C:B7:6A:8D:15:6D:5A:AE:DB:
		D7:3B:57:95:70:91:07:FF:88:97:30:B8:AA:DE:E1:FF:AC:2E:AC:78:
		CB:CE:B1:A4:F5:89:34:92:41:0A:3E:27:2F:93:A3:F6:0E:C8:98:81:
		88:66:BB:CC:1D:71:85:31:A5:9C:DB:C4:2D:04:90:6D:0A:53:9E:B6:
		10:F9:D9:3E:7E:5F:E7:B7:B6:AA:3A:54:90:D1:E3:ED:C9:40:3D:A8:
		56:69:3A:61:AD:CB:65:E2:63:46:8F:89:DA:9E:6C:19:AE:25:2C:A5:
		38:9B:3B:40:F4:A0:69:3C:F4:AB:2F:23:B4:D1:82:A5:E9:81:7E:34:
		A4:EA:06:30:16:10:1F:A9:B1:46:5C:61:F6:D8:60:AF:83:AB:BC:A6:
		DC:53:C8:73:21:1E:D2:A6:67:AC:AA:4F:08:83:6D:E4:1E:EA:9D:35:
		9D:ED:3B:56:55:F4:5D:E4:46:1B:12:3D:78:67:30:C9:1D:FB:EE:52:
		4D:57:72:D4
parm:           ifnum_to_use:The interface number to use for the watchdog timer.  Setting to -1 defaults to the first registered interface (wdog_ifnum)
parm:           timeout:Timeout value in seconds. (timeout)
parm:           pretimeout:Pretimeout value in seconds. (timeout)
parm:           panic_wdt_timeout:Timeout value on kernel panic in seconds. (timeout)
parm:           action:Timeout action. One of: reset, none, power_cycle, power_off.
parm:           preaction:Pretimeout action.  One of: pre_none, pre_smi, pre_nmi, pre_int.
parm:           preop:Pretimeout driver operation.  One of: preop_none, preop_panic, preop_give_data.
parm:           start_now:Set to 1 to start the watchdog assoon as the driver is loaded. (int)
parm:           nowayout:Watchdog cannot be stopped once started (default=CONFIG_WATCHDOG_NOWAYOUT) (bool)
[root@hostname ~]# modinfo ipmi_msghandler
filename:       /lib/modules/5.14.0-160.el9.x86_64/kernel/drivers/char/ipmi/ipmi_msghandler.ko.xz
softdep:        post: ipmi_devintf
version:        39.2
description:    Incoming and outgoing message routing for an IPMI interface.
author:         Corey Minyard <minyard@mvista.com>
license:        GPL
rhelversion:    9.1
srcversion:     AC829DDEEF222504A83CE24
depends:        
retpoline:      Y
intree:         Y
name:           ipmi_msghandler
vermagic:       5.14.0-160.el9.x86_64 SMP preempt mod_unload modversions 
sig_id:         PKCS#7
signer:         CentOS Stream kernel signing key
sig_key:        1A:0C:FE:64:6D:46:38:C1:98:E3:10:1C:F0:7F:5D:E9:FC:51:3C:4D
sig_hashalgo:   sha256
signature:      38:40:DE:CA:D5:FD:55:28:A0:15:AC:FC:7F:DE:92:AA:9C:1C:28:90:
		E9:81:C6:7F:7C:AA:71:EE:C3:7B:47:2E:D2:ED:66:3D:07:BC:16:9C:
		AC:E5:14:92:3F:FD:48:FF:65:04:A0:ED:04:69:71:AB:6E:C6:20:08:
		67:B8:25:54:03:A8:DC:CF:11:5F:3A:03:28:E1:B6:A6:38:09:5A:27:
		AC:14:30:8F:C7:08:F1:6C:7F:CF:30:1B:9F:F5:18:96:30:65:ED:E7:
		54:72:CA:1B:F1:73:D3:EB:58:C7:A7:0F:65:1B:CB:37:8B:D6:D1:39:
		BF:06:E2:E6:58:AC:19:4D:8E:0B:A3:B2:C4:EC:2B:61:DF:F0:5C:0F:
		20:D2:9C:CF:02:12:23:36:F3:2B:A0:36:93:2B:F0:6D:3C:3A:73:47:
		B9:2D:FA:D6:BE:93:7C:95:9E:14:C3:5D:6C:96:7C:54:DF:AC:00:53:
		01:D9:3C:87:8D:DB:A3:C7:1D:23:37:D5:57:0A:71:3C:E4:06:73:45:
		AD:EF:BC:E0:93:0A:DF:17:69:6B:F2:AF:D8:32:DF:DD:BD:4D:77:FD:
		B3:89:0C:30:A3:31:0E:17:E7:4F:9B:DE:82:96:30:76:FC:87:86:E1:
		E4:EC:2E:CD:D1:54:A5:25:12:EA:5B:B1:85:D4:AE:E3:19:6F:6E:FE:
		A0:42:6A:8A:A1:07:3A:94:1F:4B:F1:0A:1A:8A:E0:36:3B:B2:61:16:
		1F:90:C9:7B:12:B5:E2:5F:AA:33:C4:73:F0:63:77:0D:34:62:BC:49:
		21:48:AC:0F:68:82:66:35:D2:3E:1B:39:19:40:FF:91:DF:8E:DF:D3:
		8F:D5:A6:96:A5:41:77:AA:C2:EF:94:32:C4:82:38:30:ED:A8:4B:DB:
		CC:DD:9F:01:47:F8:79:78:5F:0B:B4:68:4C:B1:D2:76:09:9E:07:B9:
		2B:BD:A2:D3:75:2A:31:7D:8A:28:90:66:6F:46:21:83:AF:B2:02:4D:
		11:81:6B:97
parm:           panic_op:Sets if the IPMI driver will attempt to store panic information in the event log in the event of a panic.  Set to 'none' for no, 'event' for a single event, or 'string' for a generic event and the panic string in IPMI OEM events.
parm:           maintenance_mode_timeout_ms:The time (milliseconds) after the last maintenance message that the connection stays in maintenance mode. (ulong)
parm:           default_retry_ms:The time (milliseconds) between retry sends (ulong)
parm:           default_maintenance_retry_ms:The time (milliseconds) between retry sends in maintenance mode (ulong)
parm:           default_max_retries:The time (milliseconds) between retry sends in maintenance mode (uint)

@yuwata
Member

yuwata commented Sep 14, 2022

Hm, looking at the kernel source, most(?) watchdog drivers register their watchdog devices through watchdog_register_device() in drivers/watchdog/watchdog_core.c, but ipmi_watchdog does not. It seems ipmi_watchdog is a legacy watchdog device.
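
One way to observe this difference from userspace is sketched below. It is not from the thread and rests on an assumption: that drivers registered through the watchdog core appear as watchdogN entries under /sys/class/watchdog, while a driver that bypasses the core leaves that directory empty even though /dev/watchdog exists. The function names are hypothetical.

```python
import os
from typing import List


def core_watchdogs(sysfs_class_dir: str = "/sys/class/watchdog") -> List[str]:
    """List devices registered with the kernel watchdog core, sorted by name."""
    try:
        entries = os.listdir(sysfs_class_dir)
    except FileNotFoundError:
        # The class directory is absent when no core watchdog support is loaded.
        return []
    return sorted(name for name in entries if name.startswith("watchdog"))


def looks_legacy_only(
    dev_dir: str = "/dev",
    sysfs_class_dir: str = "/sys/class/watchdog",
) -> bool:
    """True if /dev/watchdog exists but no core-registered device does."""
    has_legacy_node = os.path.exists(os.path.join(dev_dir, "watchdog"))
    return has_legacy_node and not core_watchdogs(sysfs_class_dir)
```

The check only inspects paths and never opens the device, since opening a watchdog node may arm the timer.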

@poettering
Member

hmpf, is ipmi_watchdog still maintained? Is this some legacy driver that's on the chopping block? Replaced by something else? Does anyone know the backstory of this?

iirc the watchdog subsystem was added to the linux kernel 10y ago or so, and the drivers converted. Why was the ipmi stuff not covered?

@poettering poettering changed the title WatchdogDevice seems to not be properly used watchdog support doesn't support legacy watchdog interface /dev/watchdog anymore, but ipmi_watchdog driver only supports that Sep 15, 2022
@yuwata
Member

yuwata commented Sep 15, 2022

I do not know the history, but the source code of ipmi_watchdog driver is located in drivers/char/ipmi/ipmi_watchdog.c, while other watchdog drivers are in drivers/watchdog.

@yuwata
Member

yuwata commented Sep 15, 2022

> ipmi_watchdog still maintained?

Not sure, but the last commit for ipmi_watchdog.c is dated Tue Apr 12 15:49:47 2022 -0500, so it does not seem outdated.

@jcpunk
Contributor

jcpunk commented Sep 15, 2022

I can't speak to the conversion, but the IPMI spec itself requires a watchdog, so having it alongside the IPMI drivers makes sense to me.

yuwata added a commit to yuwata/systemd that referenced this issue Sep 16, 2022
yuwata added a commit to yuwata/systemd that referenced this issue Sep 16, 2022
@ctheune
Author

ctheune commented Sep 20, 2022

Thanks for fixing - I have no idea why the ipmi driver maintenance was stuck. I'd be happy to finance the conversion if I get in touch with the maintainers, though.

@ctheune
Author

ctheune commented Sep 20, 2022

It seems like @cminyard is the maintainer here. I'll try to get in touch.

@ctheune
Author

ctheune commented Sep 26, 2022

I was in touch with @cminyard and he allowed me to post his response about why the IPMI watchdog isn't using the new interface. For posterity:

Yeah, this has been a long-standing issue. The IPMI watchdog driver has
a lot of capabilities not in the main watchdog interface, and it
predates the main watchdog interface by years. So it was never moved
over. I'm not going to get the capabilities added to the main watchdog
interface, and it's against the kernel rules to drop functionality.

So I'm kind of stuck. Funding is not really the issue.

@henning-schild

I am now coming across this as well. Supermicro servers with two watchdogs ... in theory. In practice iTCO has been made to not work, forcing people to use IPMI. And the IPMI watchdog is that horribly outdated driver which does not auto-load and uses the legacy interface. That is all not very nice.

In the kernel, the fact that it does not auto-load could in fact be a chance to write a modern version with fewer features that would auto-load and be mutually exclusive with "ipmi_watchdog". That would effectively add a second driver which wins by default and is more modern, possibly dropping features that hardly anyone uses. One would not need to remove functionality: people who prefer the old driver could load it manually and blacklist the modern one while they are writing special config files for their machine.

So now about systemd ... it just has to support both the new and that legacy interface it seems. Maybe not nice, but there is at least still one legacy driver in any modern kernel, and IPMI is likely to be the only thing you can ever get working in a server.
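
Supporting both interfaces could look roughly like the sketch below. This is an illustration only and is not claimed to be what systemd or the fix in #24664 actually does; the function name, the candidate order, and the rule that an explicitly configured path is never rewritten are all assumptions made for the example.

```python
import os
from typing import Callable, Iterable, Optional

# Hypothetical preference order: modern node first, legacy node second.
DEFAULT_CANDIDATES = ("/dev/watchdog0", "/dev/watchdog")


def pick_watchdog_device(
    configured: Optional[str] = None,
    candidates: Iterable[str] = DEFAULT_CANDIDATES,
    exists: Callable[[str], bool] = os.path.exists,
) -> Optional[str]:
    """Return the device path to use, or None if nothing suitable exists.

    An explicitly configured path always wins and is never replaced by
    another node. Paths are only checked for existence, not opened,
    because opening a watchdog device may arm the timer.
    """
    if configured:
        return configured if exists(configured) else None
    for path in candidates:
        if exists(path):
            return path
    return None
```

On a machine like the ones described in this thread, where only /dev/watchdog exists, the sketch would return /dev/watchdog both with no configuration and with WatchdogDevice=/dev/watchdog.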

I am on Debian 11 with systemd 250, and despite

# cat /etc/systemd/system.conf.d/99-wd.conf 
[Manager]
RuntimeWatchdogSec=60s
ShutdownWatchdogSec=60s
WatchdogDevice=/dev/watchdog
# cat /proc/cmdline 
root=/dev/nvme0n1p4 rootwait rw console=ttyS0,115200 console=tty0 default_hugepagesz=1G hugepagesz=1G hugepages=10 intel_iommu=on iommu=pt vt.handoff=7 earlyprintk debug systemd.watchdog-device=/dev/watchdog modules_load=ipmi_watchdog

I am seeing

systemd[1]: Failed to open watchdog device /dev/watchdog0, ignoring: No such file or directory

and will have to use watchdogd because systemd does not work

From what I read, that has been fixed, but there are non-rolling distros out there that might be slow to update things. Maybe again a server-world thing ;)
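
When debugging a setup like the drop-in shown above, it can help to confirm what the file actually sets. The following is a rough sketch using Python's configparser as a stand-in; it is not systemd's parser and ignores drop-in merging and kernel command line overrides. The helper name is hypothetical.

```python
import configparser
from typing import Optional


def read_manager_option(conf_text: str, option: str) -> Optional[str]:
    """Return the value of `option` in the [Manager] section, or None."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    # Keep option names case-sensitive; configparser lowercases them by default.
    parser.optionxform = str
    parser.read_string(conf_text)
    return parser.get("Manager", option, fallback=None)
```

For the drop-in quoted in this comment, `read_manager_option(text, "WatchdogDevice")` would yield /dev/watchdog, which supports the point that the configuration itself was correct.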

valentindavid pushed a commit to valentindavid/systemd that referenced this issue Aug 8, 2023

6 participants