New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core/device: honor deserialized state and check device existence on switching-root #23507
Conversation
The unit name may be hashed. Hence, we cannot obtain the original path from the unit name. The path will be used in the later commits.
2c4928d
to
fffc7bc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
d->found = found; | ||
device_set_state(d, state); | ||
d->found = d->deserialized_found; | ||
device_set_state(d, d->deserialized_state); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll call device_set_state(DEVICE_PLUGGED)
here if deserialized_state == DEVICE_PLUGGED
. This will cause #23208 to occur again. Note that you also set DEVICE_FOUND_UDEV
in d->found
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll call
device_set_state(DEVICE_PLUGGED)
here ifdeserialized_state == DEVICE_PLUGGED
. Note that you also setDEVICE_FOUND_UDEV
ind->found
.
Right.
This will cause #23208 to occur again.
Ah, maybe. I will re-consider the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More seriously, I noticed that this re-introduces #12953. I will update this soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still debugging. But you're generating code faster than me catching up 😀
/* We aren't running yet, let's apply the new mask to the shadow variable instead, which we'll apply as | ||
* soon as we catch-up with the state. */ | ||
d->enumerated_found = (d->enumerated_found & ~mask) | (found & mask); | ||
d->enumerated_mask |= mask; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code path is only used if device_update_found_one()
is called while MANAGER_IS_RUNNING()
is false.
The DEVICE_FOUND_UDEV
bit in enumerated_mask
will be set in the call chain device_enumerate()→device_update_found_by_sysfs()→device_update_found_one()
if and only if the device was encountered by device_enumerate()
, in which case the bit will be set in d->enumerated_found
, too.
Tracking enumerated_mask
separately would only make sense if the DEVICE_FOUND_UDEV
bit could be cleared in enumerated_found
later in a different code path. I don't see how this could happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code path is only used if
device_update_found_one()
is called whileMANAGER_IS_RUNNING()
is false.
Yes.
The
DEVICE_FOUND_UDEV
bit inenumerated_mask
will be set in the call chaindevice_enumerate()→device_update_found_by_sysfs()→device_update_found_one()
if and only if the device was encountered bydevice_enumerate()
, in which case the bit will be set ind->enumerated_found
, too.Tracking
enumerated_mask
separately would only make sense if theDEVICE_FOUND_UDEV
bit could be cleared inenumerated_found
later in a different code path. I don't see how this could happen.
Currently, that's right. But in general, that DEVICE_FOUND_UDEV flag set in xxx_found
means the device is ready, and it is more than the fact that the device exists. Here, I'd like to make it clear if the device has udev database or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, I'd like to make it clear if the device has udev database or not.
But that's exactly the meaning of FLAG_SET(d->enumerated_found, DEVICE_FOUND_UDEV)
, or what am I missing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. If FLAG_SET(d->enumerated_found, DEVICE_FOUND_UDEV)
is true, then device_is_ready()
for the device is true.
If the flag is not set, there are multiple possible situations.
- the device is removed from sysfs,
- the udev database for the device has been removed,
- the device is not ready for PID1, e.g. SYSTEMD_READY=0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. If
FLAG_SET(d->enumerated_found, DEVICE_FOUND_UDEV)
is true, thendevice_is_ready()
for the device is true. If the flag is not set, there are multiple possible situations.* the device is removed from sysfs,
This is unrelated to d->enumerated_found
, because udevd isn't running at this point in time. The device may have disappeared from sysfs, but udev wouldn't have noticed, so the device would still be in the udev db (provided it had the db_persist
flag before switching root — otherwise the device hadn't been enuerated anyway).
* the udev database for the device has been removed, * the device is not ready for PID1, e.g. SYSTEMD_READY=0.
Right, these two can't be distinguished, but AFAICS they wouldn't be distinguished with enumerated_mask
, either??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. If
FLAG_SET(d->enumerated_found, DEVICE_FOUND_UDEV)
is true, thendevice_is_ready()
for the device is true. If the flag is not set, there are multiple possible situations.* the device is removed from sysfs,
This is unrelated to
d->enumerated_found
, because udevd isn't running at this point in time.
This is related.
The device may have disappeared from sysfs, but udev wouldn't have noticed, so the device would still be in the udev db (provided it had the
db_persist
flag before switching root — otherwise the device hadn't been enuerated anyway).
sd_device_enumerator
checks existence of both syspath and udev database. Hence, if a device is not enumerated, then it is removed from sysfs, or udev database is removed.
* the udev database for the device has been removed, * the device is not ready for PID1, e.g. SYSTEMD_READY=0.
Right, these two can't be distinguished, but AFAICS they wouldn't be distinguished with
enumerated_mask
, either??
Right, and this is why we need to check the existence of syspath in device_found_mask_on_catchup()
.
@@ -295,6 +311,9 @@ static int device_serialize(Unit *u, FILE *f, FDSet *fds) { | |||
assert(f); | |||
assert(fds); | |||
|
|||
if (d->path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, I have no idea to split it. To check if the device exists, we need syspath or path to the device node, and they are not explicitly stored previously. Of course, in most cases, it is encoded in the unit name, but not always, as the syspath may be too long.
/* Not switching root. Let's use the all bits. */ | ||
return DEVICE_FOUND_MASK; | ||
|
||
if (FLAGS_SET(d->enumerated_mask, DEVICE_FOUND_UDEV)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICS you might as well test for FLAGS_SET(d->enumerated_found, DEVICE_FOUND_UDEV)
here. enumerated_mask
is not needed. See my comment in device_update_found_one()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See mt reply in the above. We can optimize it. But I cannot find any merit to do that, except for suppressing a tiny amount of memory usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it confusing to introduce a redundant new variable. The code is difficult enough to understand. I guess it's a matter of taste.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I will drop enumerated_mask
in the next push.
src/core/device.c
Outdated
/* The device does not exist any more. */ | ||
return DEVICE_FOUND_MASK; | ||
|
||
/* The device still exists, and we will receive uevent for the device when the device is re-processed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC we'd get here if FLAGS_SET(d->enumerated_found, DEVICE_FOUND_UDEV)
is false (i.e. the device has been "forgotten" while switching root) but still exists in sysfs. 1
In this case you'll call device_update_found_one()
with DEVICE_FOUND_UDEV
cleared in the mask. In device_found_changed()
, you'll end up setting the device state to DEVICE_DEAD
because the previous found
value has DEVICE_FOUND_UDEV
set (see me remark for device_coldplug()
), and the new found
value has not.
Unless I'm completely on the wrong track, this is not what we want.
I like the basic idea to check device existence in sysfs before udev's coldplug processing occurs. But IMO we need to apply a different logic (roughly speaking, we should check sysfs if serialized_found
indicates the devices is present but enumerated_found
does not — I haven't thought it through yet).
Footnotes
-
the device must have existed before switching root, otherwise we'd never know about it. While it can have been removed on the kernel side, udev can hardly have processed a REMOVE event yet at this point in time, when we're switching root. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC we'd get here if
FLAGS_SET(d->enumerated_found, DEVICE_FOUND_UDEV)
is false (i.e. the device has been "forgotten" while switching root) but still exists in sysfs.
Right.
In this case you'll call
device_update_found_one()
withDEVICE_FOUND_UDEV
cleared in the mask. Indevice_found_changed()
, you'll end up setting the device state toDEVICE_DEAD
because the previousfound
value hasDEVICE_FOUND_UDEV
set (see me remark fordevice_coldplug()
), and the newfound
value has not.Unless I'm completely on the wrong track, this is not what we want.
No, new found
still has the DEVICE_FOUND_UDEV flag, as the mask does not have the bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, sorry. I misread the code.
This means in the case described, you wouldn't clear the DEVICE_FOUND_UDEV
bit. Thus in the case at hand, device state would only change if DEVICE_FOUND_MOUNT
or DEVICE_FOUND_SWAP
changed. Which basically can't happen because we wouldn't have (de)activated any mounts or swaps at the point in time when enumerated_found
is initialized. IOW, the device state remains the same that had been set in device_coldplug()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means in the case described, you wouldn't clear the
DEVICE_FOUND_UDEV
bit. Thus in the case at hand, device state would only change ifDEVICE_FOUND_MOUNT
orDEVICE_FOUND_SWAP
changed. Which basically can't happen because we wouldn't have (de)activated any mounts or swaps at the point in time whenenumerated_found
is initialized.
Right.
IOW, the device state remains the same that had been set in
device_coldplug()
.
No. device_dispatch_io()
will drop the DEVICE_FOUND_UDEV
flag on remove uevent or SYSTEMD_READY=0
is set on add/change uevent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can that happen before device_catchup()
is called? (Sorry for bothering you here. I think it's important we clarify everything).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately no. As mentioned in the above, I have no idea about fixing #23208 now...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Sorry for bothering you here. I think it's important we clarify everything).
No problem. I am enjoying this whole discussion.
On switching root, the database for the device may be cleared, or the device itself may be unplugged. The function device_enumerate() cannot distinguish the two cases. Let's explicitly check if the device is still around in device_catchup(). This mostly reverts 75d7b59. Fixes systemd#23429. Unfortunately, this re-introduces systemd#23208.
fffc7bc
to
77aecd9
Compare
Still I like this approach, but I cannot find a way to fix #23208. Closing. |
This reverts 75d7b59.
Fixes #23429.
Replaces #23436, #23437, #23489.