ncm-systemd: alias sys-device-pci doesn't match expected pattern (EL9) #1677

Open
jouvin opened this issue Apr 2, 2024 · 19 comments

@jouvin
Contributor

jouvin commented Apr 2, 2024

I'm trying to install an EL9 server (Alma 9.3) and when running ncm-systemd, I get the following errors:

2024/04/02-11:37:57 [VERB] [INFO] running component: systemd
2024/04/02-11:37:57 [VERB] ---------------------------------------------------------
2024/04/02-11:37:59 [VERB] [ERROR] Found alias "sys-devices-pci0000:00-0000:00:1a.0-usb1-1\\x2d1-1\\x2d1.6-1\\x2d1.6.3.device" for unit sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device that doesn't match expected pattern '^(.*)\.(automount|device|mount|path|scope|service|slice|snapshot|socket|swap|target|timer)$'. Skipping.
2024/04/02-11:37:59 [VERB] [ERROR] get_unit_show: no alias for unit sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device defined. (Forgot to update cache?)
2024/04/02-11:38:00 [VERB] [ERROR] Found alias "sys-devices-pci0000:00-0000:00:1a.0-usb1-1\\x2d1-1\\x2d1.6-1\\x2d1.6.3.device" for unit sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device that doesn't match expected pattern '^(.*)\.(automount|device|mount|path|scope|service|slice|snapshot|socket|swap|target|timer)$'. Skipping.
2024/04/02-11:38:00 [VERB] [ERROR] get_unit_show: no alias for unit sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device defined. (Forgot to update cache?)
2024/04/02-11:38:01 [VERB] [INFO] Configure on component systemd executed, 4 errors, 0 warnings

When looking at the details in component/systemd.log, I find:

2024/04/02-11:15:20 [VERB] Getting output of command: /usr/bin/systemctl --all --no-pager --no-legend --full list-units
2024/04/02-11:15:20 [VERB] Getting output of command: /usr/bin/systemctl --all --no-pager --no-legend --full list-unit-files
2024/04/02-11:15:20 [VERB] Getting output of command: /usr/bin/systemctl --no-pager --all show -- sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device
2024/04/02-11:15:20 [ERROR] Found alias "sys-devices-pci0000:00-0000:00:1a.0-usb1-1\\x2d1-1\\x2d1.6-1\\x2d1.6.3.device" for unit sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device that doesn't match expected pattern '^(.*)\.(automount|device|mount|path|scope|service|slice|snapshot|socket|swap|target|timer)$'. Skipping.
2024/04/02-11:15:20 [VERB] make_cache_alias completed with 328 cached units 37 alias units
2024/04/02-11:15:20 [ERROR] get_unit_show: no alias for unit sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device defined. (Forgot to update cache?)
2024/04/02-11:15:20 [VERB] Undefined UnitFileState for unit sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device

Environment used:

  • AlmaLinux 9.3
  • Quattor 23.6.0 (ncm-systemd-23.6.0-1.noarch)
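
As a quick sanity check on the error message, the following Python sketch applies the pattern quoted in the log to the unit name and the alias exactly as they are printed. It is only an illustration, not the component's Perl code; the `quoted_alias` variant (the alias wrapped in literal double quotes) is a hypothetical guess that is not confirmed anywhere in this thread.

```python
import re

# Pattern copied from the error message above.
UNIT_PATTERN = re.compile(
    r'^(.*)\.(automount|device|mount|path|scope|service|slice|snapshot'
    r'|socket|swap|target|timer)$'
)

# Unit name and alias exactly as printed in the log
# (raw strings keep the backslashes untouched).
unit = r'sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device'
alias = r'sys-devices-pci0000:00-0000:00:1a.0-usb1-1\\x2d1-1\\x2d1.6-1\\x2d1.6.3.device'

# Hypothetical variant: the same alias with literal double quotes around it.
quoted_alias = '"' + alias + '"'


def matches(name):
    """Return True when the name satisfies the expected unit pattern."""
    return UNIT_PATTERN.match(name) is not None


for label, value in (('unit', unit), ('alias', alias), ('quoted alias', quoted_alias)):
    print(label, matches(value))
```

Both printed strings satisfy the pattern, with single or doubled backslashes, so the backslashes alone would not explain the mismatch. A trailing character after the unit type, such as a closing quote, would.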
@jouvin jouvin assigned jouvin and stdweird and unassigned jouvin Apr 2, 2024
@jouvin
Contributor Author

jouvin commented Apr 2, 2024

I just installed another server with a different hardware model and I didn't see the problem initially but it appeared after adding a couple of RPMs related to NFS client. Not sure whether it is related or just that it appeared after the initial configuration/ncm-systemd run...

The HW is also a server from Dell, not the same model but the same generation, so an HW-related issue cannot be excluded...

@stdweird
Member

stdweird commented Apr 4, 2024

@jouvin is there a way for you to find out when this unit was discovered/added? maybe journalctl will tell.
i suspect that it pops up during the ncm-systemd run.

if you run it a second time, is the error gone?

@jouvin
Contributor Author

jouvin commented Apr 4, 2024

@stdweird no, it's the opposite. During first run the problem is not there but after that it appears and never disappears. I reinstalled my test box to better assess when it happens and this time it happened at the very first run of ncm-systemd. I may have missed it during my previous checks... From journalctl, I get this with journalctl|grep pci|grep usb :

Apr 04 10:16:45 psonar1.ijclab.in2p3.fr kernel: usb 1-1: new high-speed USB device number 2 using ehci-pci
Apr 04 10:16:45 psonar1.ijclab.in2p3.fr kernel: usb 2-1: new high-speed USB device number 2 using ehci-pci
Apr 04 10:16:45 psonar1.ijclab.in2p3.fr kernel: usb 1-1.6: new high-speed USB device number 3 using ehci-pci
Apr 04 10:16:45 psonar1.ijclab.in2p3.fr kernel: usb 1-1.6.3: new high-speed USB device number 4 using ehci-pci
Apr 04 10:17:19 psonar1.ijclab.in2p3.fr component-systemd[1266]: Found alias "sys-devices-pci0000:00-0000:00:1a.0-usb1-1\\x2d1-1\\x2d1.6-1\\x2d1.6.3.device" for unit sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device that doesn't match expected pattern '^(.*)\.(automount|device|mount|path|scope|service|slice|snapshot|socket|swap|target|timer)$'. Skipping.
Apr 04 10:17:19 psonar1.ijclab.in2p3.fr component-systemd[1266]: get_unit_show: no alias for unit sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device defined. (Forgot to update cache?)
Apr 04 10:17:20 psonar1.ijclab.in2p3.fr component-systemd[1266]: Found alias "sys-devices-pci0000:00-0000:00:1a.0-usb1-1\\x2d1-1\\x2d1.6-1\\x2d1.6.3.device" for unit sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device that doesn't match expected pattern '^(.*)\.(automount|device|mount|path|scope|service|slice|snapshot|socket|swap|target|timer)$'. Skipping.
Apr 04 10:17:20 psonar1.ijclab.in2p3.fr component-systemd[1266]: get_unit_show: no alias for unit sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device defined. (Forgot to update cache?)

@stdweird
Member

stdweird commented Apr 4, 2024

ok, my next guess is that the device uses some utf8 chars in the device name, and the regex doesn't match it because it's not properly dealing with utf8. can you locate the file Systemd/Service/Unit.pm and add use utf8; after the use 5.10.1; and see if this works?

@jouvin
Contributor Author

jouvin commented Apr 4, 2024

@stdweird unfortunately it doesn't help. But I think you are right: the name contains some hexadecimal characters that may be a unicode one. Doing systemctl|grep usb, it seems to be the Virtual NIC created for the management port/card of the server:

  sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3-1\x2d1.6.3:1.0-net-idrac.device loaded active     plugged   iDRAC Virtual NIC
  sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device                          loaded active     plugged   iDRAC Virtual NIC

According to https://www.compart.com/en/unicode/U+02D1, it may be a "half triangular colon"...

@stdweird
Member

stdweird commented Apr 4, 2024

@jouvin hmm, next try: replace the use 5.10.1 with use 5.12 (you can try with or without use utf8, but that should not matter)

@stdweird
Member

stdweird commented Apr 4, 2024

@jouvin or do a systemctl show strangeunit.device > output and mail me that. i'll have a look what i can do to make it work

@stdweird
Member

stdweird commented Apr 4, 2024

ok, next guess: it has nothing to do with utf8

there is a method in Unit.pm called _handle_bug_wrong_escaped_unit. it does something similar and i think it needs to be extended with support for \x2d :

[root@test2819 ~]# systemd-escape '-'
\x2d
[root@test2819 ~]# systemd-escape -u 'sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device'
sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.6/1-1.6.3.device
[root@test2819 ~]# systemd-escape 'sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.6/1-1.6.3.device'
sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device

so in that method add last line

...
my $newid = join("\\", split(/\\x5c/, $id));
$newid = join("-", split(/\\x2d/, $newid)); # add this line: turn each \x2d escape back into "-"
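
For reference, the unescaping that `systemd-escape -u` performs in the session above can be sketched in a few lines of Python. This is a simplified illustration for ASCII names only, not the component's code; `unescape_unit_name` is a hypothetical helper name.

```python
import re

# A C-style escape is a backslash, "x" and exactly two hex digits.
_HEX_ESCAPE = re.compile(r'\\x([0-9a-fA-F]{2})')


def unescape_unit_name(name):
    """Rough equivalent of `systemd-escape -u` for ASCII-only names."""
    # Step 1: every literal "-" stands for a path separator.
    with_slashes = name.replace('-', '/')
    # Step 2: decode the \xXX escapes back into the characters they encode.
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), with_slashes)


escaped = r'sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device'
print(unescape_unit_name(escaped))
```

The order of the two steps matters: decoding the escapes first would turn `\x2d` into `-`, which the next step would then wrongly convert to `/`. Note also that the escape is only the four characters `\x2d`; the digit that follows it in `1\x2d1` belongs to the device path.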

@jouvin
Contributor Author

jouvin commented Apr 4, 2024

@stdweird still not working unfortunately: no message about the wrong escaping found in component-systemd.log. My guess is that the unit name itself contains the \x2d sequences, so the test done in the method _handle_bug_wrong_escaped_unit before the escaping doesn't match (unit name different from the id). What about doing the systemd-escape -u for each unit? It is harmless if there are no escaped characters...

@jouvin
Contributor Author

jouvin commented Apr 16, 2024

@stdweird I have been busy deploying our first EL9 systems and had no time to troubleshoot this problem further and come up with a fix... I can only say that I started to deploy servers from a different vendor (HP) where the problem doesn't appear... Seems somewhat HW-related...

@stdweird
Member

@jouvin i just tried to set up idrac with virtual media attached. i see a bunch of devices pop up in dmesg, but nothing going wrong in systemd units.
if you can mail me the output of systemctl show strangeunit.device > output, i'll be able to investigate further

@jouvin
Contributor Author

jouvin commented Apr 17, 2024

@stdweird here it is:
idrac_unit.out.txt

@stdweird
Member

can you also do

systemctl list-units | grep pci-device
systemctl list-units | grep pci-device | cat -v

and paste that here. in the Id in the output, there is no escaping; i guess that is the issue.

@jouvin
Contributor Author

jouvin commented Apr 17, 2024

@stdweird here it is:

[root@quattorsrv ~]# systemctl list-units | grep sys-devices-pci0000:00-0000:00:1a.0-usb1-1
  sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3-1\x2d1.6.3:1.0-net-idrac.device loaded    active     plugged   iDRAC Virtual NIC
  sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device                          loaded    active     plugged   iDRAC Virtual NIC
[root@quattorsrv ~]# systemctl list-units | grep sys-devices-pci0000:00-0000:00:1a.0-usb1-1|cat -v
  sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3-1\x2d1.6.3:1.0-net-idrac.device loaded    active     plugged   iDRAC Virtual NIC
  sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device                          loaded    active     plugged   iDRAC Virtual NIC

@jrha
Member

jrha commented Apr 24, 2024

https://www.freedesktop.org/software/systemd/man/latest/systemd.unit.html#String%20Escaping%20for%20Inclusion%20in%20Unit%20Names

all other characters which are not ASCII alphanumerics, ":", "_" or "." are replaced by C-style "\x2d" escapes.

@jrha
Member

jrha commented Apr 24, 2024

e.g.

> systemd-escape 'abc/123'
abc-123
> systemd-escape 'abc:123'
abc:123
> systemd-escape 'abc-123'
abc\x2d123
> systemd-escape 'abc#123'
abc\x23123
> systemd-escape 'abc?123'
abc\x3f123
> systemd-escape 'abc^123'
abc\x5e123
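
The rule quoted from the man page and the examples above can be reproduced with a short Python sketch. It covers only the plain string mode shown here (not the path mode of `systemd-escape -p`), and `escape_unit_name` is a hypothetical name chosen for illustration.

```python
import string

# Characters that are kept as-is: ASCII alphanumerics, ":", "_" and ".".
_SAFE = set(string.ascii_letters + string.digits + ':_.')


def escape_unit_name(text):
    """Escape a string roughly the way plain `systemd-escape` does."""
    out = []
    for index, byte in enumerate(text.encode('utf-8')):
        char = chr(byte)
        if char == '/':
            # Path separators become dashes.
            out.append('-')
        elif char in _SAFE and not (index == 0 and char == '.'):
            # Safe characters are copied, except a leading ".".
            out.append(char)
        else:
            # Everything else, including "-" itself, becomes a \xXX escape.
            out.append('\\x%02x' % byte)
    return ''.join(out)


for sample in ('abc/123', 'abc:123', 'abc-123', 'abc#123', 'abc?123', 'abc^123'):
    print(sample, '->', escape_unit_name(sample))
```

This makes it clear why a device path such as `usb1/1-1` ends up as `usb1-1\x2d1`: the `/` becomes `-`, and the original `-` has to be escaped so the two stay distinguishable.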

@jouvin
Contributor Author

jouvin commented Apr 24, 2024

Why don't we use systemd-escape to process the names we receive from systemd? I gave it a try in my original tests but failed to complete the change... The unescape function in the component mentions in the comments that it could be an approach...
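
Delegating to the tool could look like the Python sketch below. It is an illustration of the idea only, not the component's Perl code; `unescape_with_systemd` and its `run` parameter are hypothetical names, and the `run` hook exists only so the function can be exercised without systemd installed.

```python
import subprocess


def unescape_with_systemd(name, run=subprocess.run):
    """Return the unescaped form of a unit name as printed by systemd-escape."""
    result = run(
        # Passing a list avoids the shell, so backslashes reach the tool intact.
        ['systemd-escape', '--unescape', '--', name],
        check=True,
        capture_output=True,
        text=True,
    )
    # The tool terminates its output with a newline.
    return result.stdout.rstrip('\n')
```

One caveat with this approach: unescaping is not an identity on names without `\x` escapes, because every plain `-` is turned into `/`. Both sides of any comparison would therefore need the same treatment.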

@wdpypere
Contributor

for reference, we also get a similar error while configuring qemu-guest-agent in systemd:

2024/04/25-13:12:59 [VERB] Getting output of command: /usr/bin/systemctl --all --no-pager --no-legend --full list-units
2024/04/25-13:12:59 [VERB] Getting output of command: /usr/bin/systemctl --all --no-pager --no-legend --full list-unit-files
2024/04/25-13:13:00 [VERB] Getting output of command: /usr/bin/systemctl --no-pager --all show -- sys-devices-pci0000:00-0000:00:06.0-virtio3-virtio\x2dports-vport3p1.device
2024/04/25-13:13:00 [ERROR] Found alias "sys-devices-pci0000:00-0000:00:06.0-virtio3-virtio\\x2dports-vport3p1.device" for unit sys-devices-pci0000:00-0000:00:06.0-virtio3-virtio\x2dports-vport3p1.device that doesn't match expected pattern '^(.*)\.(automount|device|mount|path|scope|service|slice|snapshot|socket|swap|target|timer)$'. Skipping.
2024/04/25-13:13:00 [VERB] Getting output of command: /usr/bin/systemctl --no-pager --all show -- dev-virtio\x2dports-org.qemu.guest_agent.0.device
2024/04/25-13:13:00 [ERROR] Found alias "dev-virtio\\x2dports-org.qemu.guest_agent.0.device" for unit dev-virtio\x2dports-org.qemu.guest_agent.0.device that doesn't match expected pattern '^(.*)\.(automount|device|mount|path|scope|service|slice|snapshot|socket|swap|target|timer)$'. Skipping.
2024/04/25-13:13:00 [VERB] make_cache_alias completed with 358 cached units 46 alias units
2024/04/25-13:13:00 [ERROR] get_unit_show: no alias for unit dev-virtio\x2dports-org.qemu.guest_agent.0.device defined. (Forgot to update cache?)
2024/04/25-13:13:00 [VERB] Undefined UnitFileState for unit dev-virtio\x2dports-org.qemu.guest_agent.0.device
2024/04/25-13:13:00 [ERROR] get_unit_show: no alias for unit sys-devices-pci0000:00-0000:00:06.0-virtio3-virtio\x2dports-vport3p1.device defined. (Forgot to update cache?)
2024/04/25-13:13:00 [VERB] Undefined UnitFileState for unit sys-devices-pci0000:00-0000:00:06.0-virtio3-virtio\x2dports-vport3p1.device

@stdweird
Member

not sure this is the same.
output of the command is the same on el8 and el9

# /usr/bin/systemctl --no-pager --all show -- dev-virtio\x2dports-org.qemu.guest_agent.0.device|grep -e 'Names=\|Id='
Id=dev-virtiox2dports-org.qemu.guest_agent.0.device
Names=dev-virtiox2dports-org.qemu.guest_agent.0.device

this smells like another escaping bug, you clearly see here that the backslash from \x2d from the command line unitname is not in the output of systemctl show Id or Names. so the component is confused
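
One possible source of the missing backslash, offered here only as an assumption, is the shell itself: when the unit name is typed unquoted, a POSIX shell consumes the backslash before systemctl ever sees it. The Python sketch below uses `shlex`, which approximates POSIX shell word splitting, to show the difference. Whether the component's own invocation goes through a shell is not something the thread establishes.

```python
import shlex

# The command as typed above, without quotes around the unit name.
unquoted = r"systemctl show -- dev-virtio\x2dports-org.qemu.guest_agent.0.device"
# The same command with the unit name protected by single quotes.
quoted = r"systemctl show -- 'dev-virtio\x2dports-org.qemu.guest_agent.0.device'"


def last_argument(command_line):
    """Return the final argument after POSIX-style word splitting."""
    return shlex.split(command_line)[-1]


print(last_argument(unquoted))
print(last_argument(quoted))
```

With the unquoted form the last argument loses its backslash and becomes `dev-virtiox2dports-...`, which is the spelling seen in the `Id=` and `Names=` lines. With single quotes the `\x2d` sequence is passed through unchanged.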

i also spotted another bug: the regex of the list-units parser needs the extra (?:(?:.|[?]{3})\s)?

Ouptut from /usr/bin/systemctl --all --no-pager --no-legend --full list-units does not match pattern (?^:^(?:.\s)?(?<name>(?<shortname>\S+)\.(?<type>\w+))\s+(?<loaded>\S+)\s+(?<active>\S+)\s+(?<running>\S+)(?:\s+|$)): ??? syslog.target 

(that is with partial fix)
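
The effect of the extra prefix alternative can be checked with a small Python sketch. The pattern is transcribed from the message above into Python syntax (`(?P<name>...)` instead of `(?<name>...)`). The sample `line` is hypothetical: only its beginning, `??? syslog.target`, is visible in the thread, and the remaining columns are made up for illustration.

```python
import re

# Common part of the list-units pattern, transcribed to Python named groups.
TAIL = (r'(?P<name>(?P<shortname>\S+)\.(?P<type>\w+))\s+(?P<loaded>\S+)\s+'
        r'(?P<active>\S+)\s+(?P<running>\S+)(?:\s+|$)')

# Original prefix: an optional single marker character followed by whitespace.
ORIGINAL = re.compile(r'^(?:.\s)?' + TAIL)
# Extended prefix: also accept a literal "???" marker.
EXTENDED = re.compile(r'^(?:(?:.|[?]{3})\s)?' + TAIL)

# Hypothetical full line; only "??? syslog.target" appears in the thread.
line = '??? syslog.target not-found inactive dead syslog.target'
# Hypothetical ordinary line without any marker.
plain = 'sshd.service loaded active running OpenSSH server daemon'

print(ORIGINAL.match(line))
print(EXTENDED.match(line).group('name', 'loaded', 'active', 'running'))
print(EXTENDED.match(plain).group('name'))
```

The original prefix allows only one marker character, so a three-character `???` marker makes the whole line fail, while the extended prefix accepts it and still matches lines that have no marker at all.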
