Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failed pstore modules with core22 20220706 #72

Open
plars opened this issue Jul 20, 2022 · 3 comments
Open

Failed pstore modules with core22 20220706 #72

plars opened this issue Jul 20, 2022 · 3 comments

Comments

@plars
Copy link

plars commented Jul 20, 2022

We are testing core22 20220706 in beta and hitting this issue frequently on both amd64 and arm devices. It doesn't happen 100% of the time, but it seems to be pretty easy to hit, and easy to reproduce by simply installing core22 from beta channel and rebooting. After rebooting, it looks like there are quite a few failed systemd units if you run this:

$ systemctl --system --no-ask-password --no-pager list-units --state=failed
Failed units:
  UNIT                             LOAD   ACTIVE SUB    DESCRIPTION
● modprobe@chromeos_pstore.service loaded failed failed Load Kernel Module chromeos_pstore
● modprobe@efi_pstore.service      loaded failed failed Load Kernel Module efi_pstore
● modprobe@mtdpstore.service       loaded failed failed Load Kernel Module mtdpstore
● modprobe@pstore_blk.service      loaded failed failed Load Kernel Module pstore_blk
● modprobe@pstore_zone.service     loaded failed failed Load Kernel Module pstore_zone
● modprobe@ramoops.service         loaded failed failed Load Kernel Module ramoops

This persists across future reboots, but if I revert back to the previous stable core22, the problem goes away.

@alfonsosanchezbeato
Copy link
Member

In latest systemd update in jammy, these two lines (that come from systemd/systemd@70e74a5) are added to systemd-pstore.service:

After=modprobe@efi_pstore.service modprobe@mtdpstore.service modprobe@chromeos_pstore.service modprobe@ramoops.service modprobe@pstore_zone.service modprobe@pstore_blk.service
Wants=modprobe@efi_pstore.service modprobe@mtdpstore.service modprobe@chromeos_pstore.service modprobe@ramoops.service modprobe@pstore_zone.service modprobe@pstore_blk.service

The intention is to load any possible pstore provider before trying to start systemd-pstore. Note that actually if the loading of the kernel module fails, that should be fine as - is used as a prefix in modprobe@.service (this makes the status successful even if modprobe returns with error):

...
[Service]
Type=oneshot
ExecStart=-/sbin/modprobe -abq %I

The interesting thing is that these services do not fail because of a failure to load the module, but because an excessive number of retries:

$ sudo journalctl -u modprobe@chromeos_pstore.service
Jul 20 09:02:37 ubuntu systemd[1]: modprobe@chromeos_pstore.service: Deactivated successfully.
Jul 20 09:02:37 ubuntu systemd[1]: Finished Load Kernel Module chromeos_pstore.
Jul 20 09:02:38 ubuntu systemd[1]: Starting Load Kernel Module chromeos_pstore...
Jul 20 09:02:38 ubuntu systemd[1]: modprobe@chromeos_pstore.service: Deactivated successfully.
Jul 20 09:02:38 ubuntu systemd[1]: Finished Load Kernel Module chromeos_pstore.
Jul 20 09:02:38 ubuntu systemd[1]: Starting Load Kernel Module chromeos_pstore...
Jul 20 09:02:39 ubuntu systemd[1]: modprobe@chromeos_pstore.service: Deactivated successfully.
Jul 20 09:02:39 ubuntu systemd[1]: Finished Load Kernel Module chromeos_pstore.
Jul 20 09:02:39 ubuntu systemd[1]: Starting Load Kernel Module chromeos_pstore...
Jul 20 09:02:39 ubuntu systemd[1]: modprobe@chromeos_pstore.service: Deactivated successfully.
Jul 20 09:02:39 ubuntu systemd[1]: Finished Load Kernel Module chromeos_pstore.
Jul 20 09:02:39 ubuntu systemd[1]: Starting Load Kernel Module chromeos_pstore...
Jul 20 09:02:39 ubuntu systemd[1]: modprobe@chromeos_pstore.service: Deactivated successfully.
Jul 20 09:02:39 ubuntu systemd[1]: Finished Load Kernel Module chromeos_pstore.
Jul 20 09:02:39 ubuntu systemd[1]: modprobe@chromeos_pstore.service: Start request repeated too quickly.
Jul 20 09:02:39 ubuntu systemd[1]: modprobe@chromeos_pstore.service: Failed with result 'start-limit-hit'.
Jul 20 09:02:39 ubuntu systemd[1]: Failed to start Load Kernel Module chromeos_pstore.
Jul 20 09:02:39 ubuntu systemd[1]: modprobe@chromeos_pstore.service: Start request repeated too quickly.
Jul 20 09:02:39 ubuntu systemd[1]: modprobe@chromeos_pstore.service: Failed with result 'start-limit-hit'.
Jul 20 09:02:39 ubuntu systemd[1]: Failed to start Load Kernel Module chromeos_pstore.
Jul 20 09:02:39 ubuntu systemd[1]: modprobe@chromeos_pstore.service: Start request repeated too quickly.
Jul 20 09:02:39 ubuntu systemd[1]: modprobe@chromeos_pstore.service: Failed with result 'start-limit-hit'.
Jul 20 09:02:39 ubuntu systemd[1]: Failed to start Load Kernel Module chromeos_pstore.
Jul 20 09:02:39 ubuntu systemd[1]: modprobe@chromeos_pstore.service: Start request repeated too quickly.
Jul 20 09:02:39 ubuntu systemd[1]: modprobe@chromeos_pstore.service: Failed with result 'start-limit-hit'.
Jul 20 09:02:39 ubuntu systemd[1]: Failed to start Load Kernel Module chromeos_pstore.

And this happens because systemd-pstore.service is being called a few consecutive times (each of these lines is a restart):

$ sudo journalctl -u systemd-pstore.service
Jul 20 09:02:37 ubuntu systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
Jul 20 09:02:39 ubuntu systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
Jul 20 09:02:39 ubuntu systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
Jul 20 09:02:39 ubuntu systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
Jul 20 09:02:39 ubuntu systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
Jul 20 09:02:39 ubuntu systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.

Interestingly, I see the same on my laptop with jammy:

$ sudo journalctl -b 0 -u systemd-pstore.service
jul 20 07:49:39 numancia systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
jul 20 07:49:40 numancia systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
jul 20 07:49:40 numancia systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
jul 20 07:49:41 numancia systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.

It is just that the number of restarts is smaller so it does not trigger errors in the services loading kernel modules because of excessive number of retries.

To summarize, the problem here is that systemd-pstore.service is being tried too many times, we need to understand why that happens. It should be tried once, and as the condition ConditionDirectoryNotEmpty=/sys/fs/pstore fails, it should just stop there. (Trying to load modules is to try to have something in /sys/fs/pstore before starting the service.)

@valentindavid
Copy link
Contributor

In latest systemd update in jammy, these two lines (that come from systemd/systemd@70e74a5) are added to systemd-pstore.service:

This commit has not yet been released upstream. On jammy, it is applied downstream.

https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1978079

Unfortunately this one is not applied: systemd/systemd@9625350

I suggest we try with this patch added.

@alfonsosanchezbeato
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants