Devices transition plugged -> dead -> plugged is still not completely fixed #12953
Comments
I think the regression is severe enough to deserve the v243 milestone.
I've reopened the previous issue page...
Damn, I missed that and, strangely enough, I haven't received any notification about your comment in the other bug although I'm subscribed to it. Maybe we should continue the discussion here?
OK, the old one is closed again.
@yuwata I started working on a possible fix for this, please have a look at: https://github.com/fbuihuu/systemd/commits/fix-inconsistent-device-state-transition
Rather than assuming that the device enumeration can be trusted based on the manager state (reloading vs reexecuting), we now simply look at the udev DB itself, i.e. if not a single device could be enumerated, we assume that the DB is not yet initialized [1] and therefore don't trust the enumeration phase. In this case we restore the serialized state and rely on event retriggering to update the view of the devices PID1 had before it reexecuted. [1] Maybe there's a better way to detect this state. Fixes: systemd#12953
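The heuristic described in this commit message can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the linked branch: `ModelDevice`, `enumeration_is_trustworthy()` and `model_state_after_reexec()` are hypothetical names, and "the udev DB is not yet initialized" is modelled simply as "the enumeration returned zero devices".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { DEVICE_DEAD, DEVICE_TENTATIVE, DEVICE_PLUGGED } DeviceState;

/* Hypothetical, reduced view of a device unit across a reexec. */
typedef struct {
    DeviceState serialized_state;  /* what PID1 knew before it reexecuted */
    DeviceState enumerated_state;  /* what the enumeration suggests now */
} ModelDevice;

/* Assumption: an enumeration that found no device at all means the udev DB
 * has not been populated yet, so its result says nothing about the devices. */
static bool enumeration_is_trustworthy(size_t n_enumerated) {
    return n_enumerated > 0;
}

/* Pick the state to continue with after the reexec. */
static DeviceState model_state_after_reexec(const ModelDevice *d, size_t n_enumerated) {
    assert(d);

    if (!enumeration_is_trustworthy(n_enumerated))
        /* Keep the serialized view; retriggered events update it later. */
        return d->serialized_state;

    return d->enumerated_state;
}
```

With an empty enumeration a previously plugged device stays plugged; once the enumeration returns at least one device, its result wins.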
@fbuihuu Do you have a reproducer of this issue?
@yuwata No, sorry. I tried to run "systemctl daemon-reload" right before
Since this isn't clear yet, let's move it to v244.
See #13002 for more details.
@yuwata, it seems that commit c6e892b also introduced a regression. IIRC the commit removed the "plugged -> dead -> plugged" transition after switching root, so devices that are in "plugged" state in the initramfs stay in "plugged" state after switching root. But this prevents device dependencies that are only defined in the root filesystem (i.e. not included in the initramfs) from being pulled in and started.
It would be nice if all of these (new) issues could be fixed for good.
@keszybz, IMHO we shouldn't postpone this (again).
Another user affected by this issue: https://bugzilla.opensuse.org/show_bug.cgi?id=1155170
It would be great if this bug could be fixed now.
While the manager is reloaded, the device_monitor socket is kept open; outstanding uevents will be received sooner or later. Thus if a device isn't found during udev enumeration / coldplug, we shouldn't blindly assume it's gone. Possibly the reload operation took place before udev started, or had the time to re-probe previously discovered devices (this matters mainly during boot / coldplug). So if the device actually disappeared, leave it to the outstanding uevent to make this happen in systemd. Fixes systemd#12953.
See the comments in the code. This is based on the work by Martin Wilck. Fixes systemd#12953 and systemd#23208. Replaces systemd#23215.
See the comments in the code. Fixes systemd#12953 and systemd#23208. Replaces systemd#23215. Co-authored-by: Martin Wilck <mwilck@suse.com>
The issue systemd#12953 is caused by the following: On switching root,
- deserialized_found == DEVICE_FOUND_UDEV | DEVICE_FOUND_MOUNT,
- deserialized_state == DEVICE_PLUGGED,
- enumerated_found == DEVICE_FOUND_MOUNT,

On switching root, most devices are not found by the enumeration process. Hence, the device state is set to plugged by device_coldplug(), and then changed to the dead state in device_catchup(). So the corresponding mount point is unmounted. Later when the device is processed by udevd, it will be changed to plugged state again.

The issue systemd#23208 is caused by the fact that generated udev database in initramfs and the main system are often different.

So, the two issues have the same root; we should not honor DEVICE_FOUND_UDEV bit in the deserialized_found on switching root.

This partially reverts c6e892b.

Fixes systemd#12953 and systemd#23208. Replaces systemd#23215.

Co-authored-by: Martin Wilck <mwilck@suse.com>
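To make the sequence above easier to follow, here is a small self-contained model of it. It is a sketch under assumptions, not the code from the commit: the `model_*` functions and `ModelDevice` are hypothetical stand-ins for device_coldplug() and device_catchup(), the transition rule in `model_state_for()` is one reading of the description (a device that udev used to know and that is now only referenced by a mount is treated as gone), and falling back to a tentative state when the udev bit is dropped is likewise an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the flags and states named in the commit message. */
enum {
    DEVICE_NOT_FOUND   = 0,
    DEVICE_FOUND_UDEV  = 1 << 0,
    DEVICE_FOUND_MOUNT = 1 << 1,
};

typedef enum { DEVICE_DEAD, DEVICE_TENTATIVE, DEVICE_PLUGGED } DeviceState;

typedef struct {
    unsigned deserialized_found;
    DeviceState deserialized_state;
    unsigned enumerated_found;
    unsigned found;
    DeviceState state;
} ModelDevice;

/* Assumed rule: known to udev -> plugged; only referenced elsewhere and never
 * seen by udev -> tentative; previously seen by udev but not anymore -> dead. */
static DeviceState model_state_for(unsigned previous, unsigned now) {
    if (now & DEVICE_FOUND_UDEV)
        return DEVICE_PLUGGED;
    if (now != DEVICE_NOT_FOUND && !(previous & DEVICE_FOUND_UDEV))
        return DEVICE_TENTATIVE;
    return DEVICE_DEAD;
}

/* Step 1: put the deserialized view into effect. With drop_udev_bit set, the
 * DEVICE_FOUND_UDEV bit of the deserialized mask is not honored. */
static void model_coldplug(ModelDevice *d, bool drop_udev_bit) {
    assert(d);
    d->found = d->deserialized_found;
    d->state = d->deserialized_state;
    if (drop_udev_bit && (d->found & DEVICE_FOUND_UDEV)) {
        d->found &= ~(unsigned) DEVICE_FOUND_UDEV;
        d->state = d->found != DEVICE_NOT_FOUND ? DEVICE_TENTATIVE : DEVICE_DEAD;
    }
}

/* Step 2: reconcile with what the enumeration actually found. */
static void model_catchup(ModelDevice *d) {
    assert(d);
    if (d->enumerated_found == d->found)
        return;
    d->state = model_state_for(d->found, d->enumerated_found);
    d->found = d->enumerated_found;
}

/* Replay the switch-root scenario from the commit message. */
static DeviceState model_replay_switch_root(bool drop_udev_bit) {
    ModelDevice d = {
        .deserialized_found = DEVICE_FOUND_UDEV | DEVICE_FOUND_MOUNT,
        .deserialized_state = DEVICE_PLUGGED,
        .enumerated_found = DEVICE_FOUND_MOUNT,
    };
    model_coldplug(&d, drop_udev_bit);
    model_catchup(&d);
    return d.state;
}
```

In this model, `model_replay_switch_root(false)` ends in the dead state, which is the transition that gets the mount point unmounted, while `model_replay_switch_root(true)` never passes through the dead state and waits for udevd to report the device.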
This is a slightly different approach from the one taken by commit 75d7b59 to fix issues systemd#12953 and systemd#23208. This patch forces PID1 to forget all devices (except those with the "db_persist" option, see below) that were known by PID1 before switching root, by pretending that the devices were in DEAD state before being serialized. Hence no artificial "plugged->dead" state transitions happen when PID1 is reexecuting from a switch root, followed by "dead->plugged" state transitions when all devices are coldplugged with the new set of udev rules from the host. As mentioned previously, devices with the "db_persist" option are exceptions to the mechanism described above. Since these devices remain in the udev DB even after the DB has been cleared, they continue to be deserialized in plugged state and remain in this state, thus following the description of the option. This should fix the regression introduced by 75d7b59. Fixes: systemd#23429 Replaces: systemd#23218
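The serialization-side idea can be illustrated with a few lines of C. This is a hedged sketch only: `ModelDevice`, its `db_persist` field and `model_state_to_serialize()` are hypothetical names chosen for the example, and the actual patch may decide this differently.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { DEVICE_DEAD, DEVICE_TENTATIVE, DEVICE_PLUGGED } DeviceState;

/* Hypothetical, reduced view of a device unit at serialization time. */
typedef struct {
    DeviceState state;
    bool db_persist;  /* assumption: set when the device carries the "db_persist" option */
} ModelDevice;

/* State to write out before PID1 reexecutes. When switching root, devices
 * without "db_persist" are recorded as dead, so PID1 forgets them and sees
 * only a dead -> plugged transition once they are coldplugged again. */
static DeviceState model_state_to_serialize(const ModelDevice *d, bool switching_root) {
    assert(d);

    if (switching_root && !d->db_persist)
        return DEVICE_DEAD;

    return d->state;
}
```

A plain reload or reexec without a switch root keeps the real state, as does any device marked with "db_persist".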
The issue systemd#12953 is caused by the following: On switching root, - deserialized_found == DEVICE_FOUND_UDEV | DEVICE_FOUND_MOUNT, - deserialized_state == DEVICE_PLUGGED, - enumerated_found == DEVICE_FOUND_MOUNT, On switching root, most devices are not found by the enumeration process. Hence, the device state is set to plugged by device_coldplug(), and then changed to the dead state in device_catchup(). So the corresponding mount point is unmounted. Later when the device is processed by udevd, it will be changed to plugged state again. The issue systemd#23208 is caused by the fact that generated udev database in initramfs and the main system are often different. So, the two issues have the same root; we should not honor DEVICE_FOUND_UDEV bit in the deserialized_found on switching root. This partially reverts c6e892b. Fixes systemd#12953 and systemd#23208. Replaces systemd#23215. Co-authored-by: Martin Wilck <mwilck@suse.com> (cherry picked from commit 75d7b59)
The issue systemd#12953 is caused by the following: On switching root, - deserialized_found == DEVICE_FOUND_UDEV | DEVICE_FOUND_MOUNT, - deserialized_state == DEVICE_PLUGGED, - enumerated_found == DEVICE_FOUND_MOUNT, On switching root, most devices are not found by the enumeration process. Hence, the device state is set to plugged by device_coldplug(), and then changed to the dead state in device_catchup(). So the corresponding mount point is unmounted. Later when the device is processed by udevd, it will be changed to plugged state again. The issue systemd#23208 is caused by the fact that generated udev database in initramfs and the main system are often different. So, the two issues have the same root; we should not honor DEVICE_FOUND_UDEV bit in the deserialized_found on switching root. This partially reverts c6e892b. Fixes systemd#12953 and systemd#23208. Replaces systemd#23215. Co-authored-by: Martin Wilck <mwilck@suse.com> (cherry picked from commit 75d7b59) [mwilck: fixes bsc#1137373] [mwilck: fixes bsc#1181658] [mwilck: fixes bsc#1194708] [mwilck: fixes bsc#1195157] [mwilck: fixes bsc#1197570]
This happened on Tumbleweed which ships systemd v242.
For the complete debug logs, please consult https://bugzilla.suse.com/show_bug.cgi?id=1137373.
Excerpt from the logs:
This was already reported in #11997 but the fix was incorrect, see #12013 (comment)