Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix inconsistent device state transition #13002

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

fbuihuu
Copy link
Contributor

@fbuihuu fbuihuu commented Jul 10, 2019

No description provided.

@fbuihuu fbuihuu added the pid1 label Jul 10, 2019
@lgtm-com

This comment has been minimized.

@fbuihuu
Copy link
Contributor Author

fbuihuu commented Jul 10, 2019

Please hold on, I gave it more testing and there is still an issue, investigating...

@fbuihuu fbuihuu force-pushed the fix-inconsistent-device-state-transition branch from 74eea20 to f6b5399 Compare July 10, 2019 07:57
src/core/device.c Outdated Show resolved Hide resolved
src/core/device.c Outdated Show resolved Hide resolved
src/core/device.c Outdated Show resolved Hide resolved
@@ -176,6 +176,7 @@ static int device_coldplug(Unit *u) {
}

static void device_catchup(Unit *u) {
Manager *m = u->manager;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, can you elaborate why you need to know at all whether the udev db is trusted or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's supposed to be describe in the commit message. Which part of it should be improved ?

@poettering poettering added the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Jul 11, 2019
@fbuihuu fbuihuu force-pushed the fix-inconsistent-device-state-transition branch from f6b5399 to 319ac3e Compare July 12, 2019 10:19
@fbuihuu fbuihuu removed the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Jul 12, 2019
@fbuihuu
Copy link
Contributor Author

fbuihuu commented Jul 12, 2019

@poettering new version force pushed, hopefully I haven't missed anything this time.

@fbuihuu
Copy link
Contributor Author

fbuihuu commented Jul 12, 2019

BTW this can be tested by reloading PID1 right before udevd is executed:

# cat /etc/systemd/system/systemd-udevd.service.d/daemon-reload.conf 
[Service]
ExecStartPre=/usr/bin/systemctl daemon-reload

Without this patchset:

# journalctl -b | grep -F "plugged -> dead" | wc -l
98

With this patchset:

# journalctl -b | grep -F "plugged -> dead" | wc -l
0

@yuwata
Copy link
Member

yuwata commented Jul 14, 2019

Please force-push this to re-trigger CIs.

@fbuihuu fbuihuu force-pushed the fix-inconsistent-device-state-transition branch from 319ac3e to d4cae70 Compare July 15, 2019 09:02
d->enumerated_found was used to store the new 'found' value recalculated during
the enumeration phase while d->found was keeping the previous value.

However there's no such needs as the previous value is already kept by
d->deserialized_found, so let's use d->found instead to store the new one.
Rather than assuming that the device enumeration can be trusted based on the
manager state (reloading vs reexecuting), we now simply look at the udev DB
itself - ie if no single devices couldn't have been enumerated then we assume
that the DB is not yet initialized [1] and thereby don't trust the enumeration
phase.

In this case we restore the serialized state and rely on events retriggering to
update the view on device PID1 had before it reexecuted.

[1] Maybe there's a better way to detect this state.

Fixes: systemd#12953
@fbuihuu fbuihuu force-pushed the fix-inconsistent-device-state-transition branch from 319ac3e to 5b4341b Compare July 15, 2019 09:06
@fbuihuu
Copy link
Contributor Author

fbuihuu commented Jul 15, 2019

@yuwata rebased and force pushed.

* because reload of PID1 can happen at anytime especially early during the boot
* process. So in order to figure this out, we simply assume that if the DB is
* fully populated, we should at least enumerate one device otherwise we consider
* that the device enumeration hasn't happened yet hence we shouldn't trust it */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the assumption true even if there exist a rule which has db_persistent?

Copy link
Member

@yuwata yuwata Jul 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more question. What happens when udev is processing devices? That is, some devices are initialized, but some are not.

@yuwata
Copy link
Member

yuwata commented Jul 17, 2019

What happens if daemon-reload is triggered by ExecStartPost=, rather than ExecStartPre=? You can slow down udev e.g., by running with valgrind or using single child process with --children-max=1. Then, I guess you can introduce the situation such that 'some devices are initialized but some are not'.

@fbuihuu
Copy link
Contributor Author

fbuihuu commented Jul 17, 2019

@yuwata yeah, you're right it's still racy :(

And the whole sequence "cleanup DB/retrigger events" is completely asynchronous... I can't see any way to know when the DB is full repopulated.

Copy link
Member

@poettering poettering left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the late review.

FOREACH_DEVICE(e, dev) {
const char *sysfs;

m->honor_device_enumeration = true;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, i find this a tiny bit sloppy. I wonder if we can find a better approach here.

maybe simply make systemd-udev-trigger.service a magic service: we trust udev events exactly as long as it is up and running.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I don't remember all the details so I'm not sure to get your idea...

@fbuihuu
Copy link
Contributor Author

fbuihuu commented Jan 24, 2020

@poettering, I think the whole mess is due to the fact that udevd needs to be stopped while switching root.

This implies its DB can be outdated and I guess that's the reason why the DB needs to be cleared (to not miss any "REMOVE" uevents) and we have to retrigger all events (to not miss any "ADD" uevents) before starting udevd again.

Another thing, also part of the problem, is that we don't know when the DB is fully updated after systemd-udev-trigger.service has been fired up and I don't think that the fact that either
systemd-udev-trigger.service or systemd-udevd.service are up and running helps.

It would be much simpler if udevd could survive from the switch root somehow. This way no event would be lost and all events that PID1 is monitoring would be queued. Hence reexecuting (and regardless if it was after switching root) would work like reloading, PID1 would just need to read the socket and process events (if any) as usual. In fact this might make the device enumeration step no more necessary... (BTW why do we need it when PID1 is simply reloading or reexecuting without
switching root ?).

Hmm or maybe rather than trying to make the whole udevd process survive from the switch root, we could try to leave the udev's netlink socket opened somehow and reuse it after udevd is restarted later. This way, both "ADD" and "REMOVE" device uevents could still be queued while udevd is temporary unavailable ?

We have already an similar mechanism to temporary store fds in PID1 but it only works for service restarting currently. Maybe we could extend it a bit and make it possible to store fds "permanently" ?

Base automatically changed from master to main January 21, 2021 11:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

4 participants