-
-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
udev: fragile handling of uevent actions breaks with kernel 4.12+ #8221
Comments
Make sure that udev doesn't lose our properties when bind events come in, as implemented in kernels 4.12+. See systemd/systemd#8221 and systemd/systemd#7109
|
Hacked an patch together which let udev ignore bind/unbind uevents. |
|
@buczek: Why not if it fits better. My patch is, as mentioned, only an hack/quick fix to resolve the bug for me. Because this bug also breaks mtp access support in kde-framework: https://bugs.kde.org/show_bug.cgi?id=387454 |
This requires auditing of every udev rule to make sure it works properly. E.g. for
|
|
Looks like this issue is fully breaking all device-level udev tags managed in ModemManager, e.g. to blacklist non-modem TTY ports... https://lists.freedesktop.org/archives/modemmanager-devel/2018-May/006417.html |
It's not clear what "unbind" events precisely mean, they appear to be entirely undocumented and broke all kinds of userspace, but they do exist: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1455cf8dbfd06aa7651dcfccbadb7a093944ca65 Let's treat them like "remove", since apparently they make stuff unavailable (?). This is at least along the lines of what people discussed on systemd#7587 and systemd#8221. See: systemd#7587 systemd#8221
Not sure what the best approach is here, but given the kernel introduced a new concept here, we have to follow up, hence let's skip most rules for "unbind" events too, like we already skip them for "remove". See: systemd#7587 systemd#8221
|
Quite frankly @aleksander0m, mm should invert those rules files checks: instead of listing "positive" events in your rules file ("add", "change", …), list only the negative event of "remove". This should be more robust, and is what udev does for all its own rules files and code. |
|
@poettering thanks for the hint, I'll get to do that for the next release. |
|
So, while the kernel additoin is problematic I think we can actually close this specific issue here: rules files should always use a logic where instead of listing positives, they should list negatives. i.e. create your SYMLINKS= props in all cases but in "remove". This is what udev's/systemd's own rules files do, and is what should be done in 3rd party files too. This bug is hence of mishandling of "bind" events, #7587 is a different kind of bug though, it is about mishandling of "unbind" events. I think "bind" events are easy and safe to handle, as described above. It's just key that everybody does that correctly, which is not the case right now. I don't think we need to change anything about the handling of "bind" in udev/systemd. It sucks it got added in the kernel, and that it broke so much stuff, but it's unlikely to be reverted now, and systemd/udev is not the place to hide/supress the event, or fake compat where the kernel broke it. anyway, closing this one hence. in systemd/udev/logind/… itself "bind" is treated like "change" or "add" everywhere, which means we are good on this. If 3rd party rules files need updating, I am sorry, but please blame the kernel for that, not udev, udev is literally just the messenger there. |
|
I might be missing something, but the original rules posted for bitbabbler does use negatives? |
|
@mbiebl AFAICS it uses |
|
Ok, then I don't understand what's meant here. |
|
@mbiebl the pasted rules files does this: It lists the 'positive' events "add" and "change" (and thus misses all the others, one of them being "bind"). It should instead list negatives, i.e. "remove": This is what all our own rules files do, if you look closely. |
|
And now it should say "remove|unbind" I assume? @poettering when you say "rules files should always use a logic where instead of listing positives, they should list negatives", is that a general suggestion always? how is that more robust, if e.g. now we also need to consider possible negatives that may be missing? The fact that you need to consider "unbind" in your list of negative events (#10998) is the same reason we had to consider "bind" in our list of positive events (this issue #8221). |
|
Don't take me wrong, I'm fine with you closing this issue, udev just follows the new kernel features. It's just been very unfortunate that we all had to change udev rules everywhere due to this new bind/unbind events. |
That's what #10998 does for everything in systemd. But it's not clear that that's the right approach. This needs more discussion, that PR is marked "dont-merge" hence. I have pinged the kernel guys now. But nobody appears to know anything about this. I have never seen any hw needing this, I don't even know what "unbind" really means, as there's no documentation about it. Do you happen to know? |
Means a kernel driver was disconnected from the device, usually to disconnect a generic driver from a device to assign it a more specialised one. For example, for a HID driver: |
|
@aleksander0m See 4b06c40 which did the transition back in 2010 for all files systemd/udev upstream ships. The commit msgs doesn't say much, but yeah, it's more robust to consider all events "positive" except for the ones which are known "negative". |
Yeah, this sucks, but blame the kernel guys for that. udev is literally just the messenger there... |
So what precisely happens on "unbind"? do the various sysfs attr previously exported disappear? |
Any chance you can get me a diff of such a device's |
btw, that's directed to @hadess as much as to @aleksander0m I guess ;-) |
All the sysfs attributes added by that driver get removed, and the driver link is also removed. |
I wouldn't be so righteous "Anyway, if we come to an agreement on this I will look into getting systemd handle this nicely." - @dtor 2/14/2017 [1] The frustration being expressed surrounding this breakage is authentic and arguably justified. Had you followed through on your commitment quoted above, it all could have been prevented. Sometimes things fall through the cracks, people get busy, etc. It's understandable. But for them to then appear nearly two years later acting like they didn't drop the ball, not so much. |
If the device is already removed, we will not be able to load identifying information from the device :-). So we will fail to calculate the correct tag. Abort processing before we save the incorrect data to the DB! Also don't broadcast the incomplete event or run any of the collected RUN programs. This avoids losing tags in the DB, which we need for the "remove" event. On the remove event, we will broadcast the event with the tags (and other properties) loaded from the DB. My test case is to plug and unplug a USB printer, while running on kernel version 4.12+. Before this fix, the device would remain active in `systemctl list-devices *usb*.device`. After this fix, the device unit is removed correctly. This fixes the systemd part of issue systemd#7587. If there are third-party rules which add tags (are there any of these??), they can match our behaviour by handling all unknown events the same as "change" events. This is *already required* on kernels 4.12+ due to the addition of "bind" and "unbind" events, combined with a change to systemd-udev which allowed "bind" and "unbind" events to be executed with all of the same features as "change" events). If your rules skip unknown events, you will lose symlinks and tags, for example. Losing tags is *not allowed for* by the libudev filter design. (Again, I don't know if anything really uses tags outside of the systemd project).If the device is already removed, we will not be able to load identifying information from the device :-). So we will fail to calculate the correct tag. Abort processing before we save the incorrect data to the DB! Also don't broadcast the incomplete event or run any of the collected RUN programs. This avoids losing tags in the DB, which we need for the "remove" event. On the remove event, we will broadcast the event with the tags (and other properties) loaded from the DB. My test case is to plug and unplug a USB printer, while running on kernel version 4.12+. Before this fix, the device would remain active in `systemctl list-devices *usb*.device`. After this fix, the device unit is removed correctly. This fixes the systemd part of systemd#7587 / systemd#8221. Third-party rules which add tags (e.g. to create systemd device units), they can match our behaviour by handling all unknown events the same as "change" events. This is *already required* on kernels 4.12+ due to the addition of "bind" and "unbind" events, combined with a change to systemd-udev which allowed "bind" and "unbind" events to be executed with all of the same features as "change" events). If your rules skip unknown events, you will lose symlinks and tags, for example. Note, removing tags from devices is *not allowed for* by the libudev filter design. (I don't know if anything really consumes tags outside of the systemd project though).
|
On Sat, Dec 8, 2018 at 1:11 PM Vito Caputo ***@***.***> wrote:
Nice comments we have here :)
I wouldn't be so righteous
"Anyway, if we come to an agreement on this I will look into getting
systemd handle this nicely." - @dtor <https://github.com/dtor> 2/14/2017
[1]
The frustration being expressed surrounding this breakage is authentic and
arguably justified. Had you followed through on your commitment quoted
above, it all could have been prevented.
Sometimes things fall through the cracks, people get busy, etc. It's
understandable. But for them to then appear nearly two years later acting
like they didn't drop the ball, not so much.
I did follow up on my promise (as far as having udev handling it nicely by
no longer dropping the events):
5e09a23
The patch was merged to udev, unfortunately even at that time the fact that
udev flushes state on each new event was overlooked (not only by yours
truly, but by the maintainers as well).
|
|
So what's the short version for dumbos like me? Was this fixed in udev, or are downstream changes required? Or are we still debating the fix? |
|
debating the fix... it's not clear at this point if the fix is on the kernel side, udev side, udev-consumers side, multiple or none of the above... |
|
Something like this: #11101 It will simply cause systemd to ignore the bind/unbind events and hopefully prevent it from flushing the old state. Note that while this compiles I have not run it ;) Once we change systemd/udv to accumulate the state instead of resetting it (save for change events I guess) we can revert this patch. |
We need something reliable, something we can count on to be stable kernel API. Something with fewer question marks and ideally some documentation in the kernel tree that this behaviour is official behaviour. |
Tags currently never survive uevents: whenever a uevent is seen they need to be re-applied, or they'll vanish. That's kinda the problem here... So you remove them, by not having TAGS+= in the rules files for a specific uevent. |
|
On Mon, Dec 10, 2018 at 1:28 AM Lennart Poettering ***@***.***> wrote:
Can we remove tags? For practical purposes i doubt "bind" actions would
add any, but we could ask for "unbind" actions to clear them. Another
option is to "tag" the tags with the action that produced the
tag/attribute, and clear them when matching event is received. I.e. clear
tags that were added by "add" on "remove", and clear tags from "bind" on
"unbind". Assume that "change" == "add" for tagging purposes here.
Tags currently never survive uevents: whenever a uevent is seen they need
to be re-applied, or they'll vanish. That's kinda the problem here... So
you remove them, by not having TAGS+= in the rules files for a specific
uevent.
Yes, this assumes that you buy in into my opinion that we should not clear
device state on each and every uevent, but only on some. Then you will
likely need a syntax to clear some tags.
|
Since commit 1455cf8dbfd0 ("driver core: emit uevents when
device is bound to a driver") the kernel started emitting
"bind" and "unbind" uevents which confuse the libgphoto2
udev rules.
This caused ID_GPHOTO2 and GPHOTO2_DRIVER udev properties not being set
on devices, causing them not to be visible to user-space that uses those
properties (such as gvfs' gphoto2 backend).
See systemd/systemd#8221
This tries to address the "bind"/"unbind" uevent kernel API breakage, by changing the semantics of device tags. Previously, tags would be applied on uevents (and the database entries they result in) only depending on the immediate context. This means that if one uevent causes the tag to be set and the next to be unset, this would immediately effect what apps would see and the database entries would contain each time. This is problematic however, as tags are a filtering concept, and if tags vanish then clients won't hence notice when a device stops being relevant to them since not only the tags disappear but immediately also the uevents for it are filtered including the one necessary for the app to notice that the device lost its tag and hence relevance. With this change tags become "eternal". If a tag is applied is once applied to a device it will stay in place forever, until the device is removed. Tags can never be removed again. This means that an app watching a specific set of devices by filtering for a tag is guaranteed to not only see the events where the tag is set but also all follow-up events where the tags might be removed again. This change of behaviour is unfortunate, but is required due to the kernel introducing new "bind" and "unbind" uevents that generally have the effect that tags and properties disappear and apps hence don't notice when a device looses relevance to it. "bind"/"unbind" events were introduced in kernel 4.12, and are now used in more and more subsystems. The introduction broke userspace widely, and this commit is an attempt to provide a way for apps to deal with it. While tags are now "eternal" a new automatic device property CURRENT_TAGS is introduced (matching the existing TAGS property) that always reflects the precise set of tags applied on the most recent events. Thus, when subscribing to devices through tags, all devices that ever had the tag put on them will be be seen, and by CURRENT_TAGS it may be checked whether the device right at the moment matches the tag requirements. See: systemd#7578 systemd#7018 systemd#8221
I see this was left without apparent response. And the same word was later repeated in a comment on one of the other associated issues. @poettering Please do not use that word any more. It reads as intended to provoke. It encourages antagonistic behaviour in other discussions. Both are undesirable. I know everyone has been frustrated by the problem, and maybe also from the userspace breakage around mknod() behaviour. But I see this word as inappropriate, both times it was used. It was antagonistic towards kernel developers, after you had invited kernel developers to look at the discussions. This makes things harder for everyone. Whether or not a kernel developer happens to take offence in any one case. I am not sure why this needed to be pointed out. I am not very practised at doing so. It made things harder for me, at least. |
|
I'm afraid to be a lonely voice here. I relied on bind event to re-configure my Elantech touchpad. The default config didn't work for me, so I fired an echo command to change the protocol when device bound. After upgrade to systemd-240, this no longer works for me and I'm lost. Trying to do the same on add event doesn't work. My rule always fire before the default rule that loads the module - i.e. I have nothing in sysfs yet to respond to my write. I can't understand the details of the discussion. I suffer from unsolicited removal of an important feature. Since I'm on a source-based Gentoo distrib, I reversed your PR for me. Can I raise a humble request? When you remove a function, announce it in the change log. |
The udev rules are handled in lexical order, regardless of the directory in which they live. Try to put e.g. 99 as prefix to your udev rule name. Then it should be parsed after the udev rule which loads the kernel modul |
|
What can we do from driver side to overcome this issue. I have written a driver which uses only "add" and "remove" but with this case sometimes, I am receiving "bind" instead of "add". My driver is not detecting the device. is it good to consider the behaviour is equivalent for both "bind" and "add" ? |
Since commit 1455cf8dbfd0 ("driver core: emit uevents when
device is bound to a driver") the kernel started emitting
"bind" and "unbind" uevents which confuse the libgphoto2
udev rules.
This caused ID_GPHOTO2 and GPHOTO2_DRIVER udev properties not being set
on devices, causing them not to be visible to user-space that uses those
properties (such as gvfs' gphoto2 backend).
See systemd/systemd#8221
This tries to address the "bind"/"unbind" uevent kernel API breakage, by changing the semantics of device tags. Previously, tags would be applied on uevents (and the database entries they result in) only depending on the immediate context. This means that if one uevent causes the tag to be set and the next to be unset, this would immediately effect what apps would see and the database entries would contain each time. This is problematic however, as tags are a filtering concept, and if tags vanish then clients won't hence notice when a device stops being relevant to them since not only the tags disappear but immediately also the uevents for it are filtered including the one necessary for the app to notice that the device lost its tag and hence relevance. With this change tags become "sticky". If a tag is applied is once applied to a device it will stay in place forever, until the device is removed. Tags can never be removed again. This means that an app watching a specific set of devices by filtering for a tag is guaranteed to not only see the events where the tag is set but also all follow-up events where the tags might be removed again. This change of behaviour is unfortunate, but is required due to the kernel introducing new "bind" and "unbind" uevents that generally have the effect that tags and properties disappear and apps hence don't notice when a device looses relevance to it. "bind"/"unbind" events were introduced in kernel 4.12, and are now used in more and more subsystems. The introduction broke userspace widely, and this commit is an attempt to provide a way for apps to deal with it. While tags are now "sticky" a new automatic device property CURRENT_TAGS is introduced (matching the existing TAGS property) that always reflects the precise set of tags applied on the most recent events. Thus, when subscribing to devices through tags, all devices that ever had the tag put on them will be be seen, and by CURRENT_TAGS it may be checked whether the device right at the moment matches the tag requirements. See: systemd#7578 systemd#7018 systemd#8221 sd-device
This tries to address the "bind"/"unbind" uevent kernel API breakage, by changing the semantics of device tags. Previously, tags would be applied on uevents (and the database entries they result in) only depending on the immediate context. This means that if one uevent causes the tag to be set and the next to be unset, this would immediately effect what apps would see and the database entries would contain each time. This is problematic however, as tags are a filtering concept, and if tags vanish then clients won't hence notice when a device stops being relevant to them since not only the tags disappear but immediately also the uevents for it are filtered including the one necessary for the app to notice that the device lost its tag and hence relevance. With this change tags become "sticky". If a tag is applied is once applied to a device it will stay in place forever, until the device is removed. Tags can never be removed again. This means that an app watching a specific set of devices by filtering for a tag is guaranteed to not only see the events where the tag is set but also all follow-up events where the tags might be removed again. This change of behaviour is unfortunate, but is required due to the kernel introducing new "bind" and "unbind" uevents that generally have the effect that tags and properties disappear and apps hence don't notice when a device looses relevance to it. "bind"/"unbind" events were introduced in kernel 4.12, and are now used in more and more subsystems. The introduction broke userspace widely, and this commit is an attempt to provide a way for apps to deal with it. While tags are now "sticky" a new automatic device property CURRENT_TAGS is introduced (matching the existing TAGS property) that always reflects the precise set of tags applied on the most recent events. Thus, when subscribing to devices through tags, all devices that ever had the tag put on them will be be seen, and by CURRENT_TAGS it may be checked whether the device right at the moment matches the tag requirements. See: systemd#7587 systemd#7018 systemd#8221
This tries to address the "bind"/"unbind" uevent kernel API breakage, by changing the semantics of device tags. Previously, tags would be applied on uevents (and the database entries they result in) only depending on the immediate context. This means that if one uevent causes the tag to be set and the next to be unset, this would immediately effect what apps would see and the database entries would contain each time. This is problematic however, as tags are a filtering concept, and if tags vanish then clients won't hence notice when a device stops being relevant to them since not only the tags disappear but immediately also the uevents for it are filtered including the one necessary for the app to notice that the device lost its tag and hence relevance. With this change tags become "sticky". If a tag is applied is once applied to a device it will stay in place forever, until the device is removed. Tags can never be removed again. This means that an app watching a specific set of devices by filtering for a tag is guaranteed to not only see the events where the tag is set but also all follow-up events where the tags might be removed again. This change of behaviour is unfortunate, but is required due to the kernel introducing new "bind" and "unbind" uevents that generally have the effect that tags and properties disappear and apps hence don't notice when a device looses relevance to it. "bind"/"unbind" events were introduced in kernel 4.12, and are now used in more and more subsystems. The introduction broke userspace widely, and this commit is an attempt to provide a way for apps to deal with it. While tags are now "sticky" a new automatic device property CURRENT_TAGS is introduced (matching the existing TAGS property) that always reflects the precise set of tags applied on the most recent events. Thus, when subscribing to devices through tags, all devices that ever had the tag put on them will be be seen, and by CURRENT_TAGS it may be checked whether the device right at the moment matches the tag requirements. See: systemd#7587 systemd#7018 systemd#8221
This script was expecting only add/remove events which has not been the case since Kernel 4.12 (which added bind/unbind). Bind events were getting treated as remove events which would cause hotplugged 3g modems to not work. More info: https://lkml.org/lkml/2018/12/23/128 systemd/systemd#8221 Signed-off-by: Arjun AK <arjunak234@gmail.com>
This script was expecting only add/remove events which has not been the case since Kernel 4.12 (which added bind/unbind). Bind events were getting treated as remove events which would cause hotplugged 3g modems to not work. More info: https://lkml.org/lkml/2018/12/23/128 systemd/systemd#8221 Signed-off-by: Arjun AK <arjunak234@gmail.com> (cherry picked from commit 89ef883)
This script was expecting only add/remove events which has not been the case since Kernel 4.12 (which added bind/unbind). Bind events were getting treated as remove events which would cause hotplugged 3g modems to not work. More info: https://lkml.org/lkml/2018/12/23/128 systemd/systemd#8221 Signed-off-by: Arjun AK <arjunak234@gmail.com>
This script was expecting only add/remove events which has not been the case since Kernel 4.12 (which added bind/unbind). Bind events were getting treated as remove events which would cause hotplugged 3g modems to not work. More info: https://lkml.org/lkml/2018/12/23/128 systemd/systemd#8221 Signed-off-by: Arjun AK <arjunak234@gmail.com>
This script was expecting only add/remove events which has not been the case since Kernel 4.12 (which added bind/unbind). Bind events were getting treated as remove events which would cause hotplugged 3g modems to not work. More info: https://lkml.org/lkml/2018/12/23/128 systemd/systemd#8221 Signed-off-by: Arjun AK <arjunak234@gmail.com>
This script was expecting only add/remove events which has not been the case since Kernel 4.12 (which added bind/unbind). Bind events were getting treated as remove events which would cause hotplugged 3g modems to not work. More info: https://lkml.org/lkml/2018/12/23/128 systemd/systemd#8221 Signed-off-by: Arjun AK <arjunak234@gmail.com>
This script was expecting only add/remove events which has not been the case since Kernel 4.12 (which added bind/unbind). Bind events were getting treated as remove events which would cause hotplugged 3g modems to not work. More info: https://lkml.org/lkml/2018/12/23/128 systemd/systemd#8221 Signed-off-by: Arjun AK <arjunak234@gmail.com> (cherry picked from commit 0e087be)
Submission type
systemd version the issue has been seen with
v237
Used distribution
Debian unstable
Filed as downstream bug report at
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=890641
The following is copied from there.
Hi,
I'm filing this against udev, because that's where the fallout of this
first became obvious to me, though looking at the code there are other
places in systemd effected by it too (pretty much everywhere that the
udev_device_get_action() function is used). And given we'll release
with a kernel that breaks the assumptions this code makes, it really
should be RC - but I'm not going to niggle over severity right now.
The symptom that first got my attention was that setting the SYMLINK
key in a rule for a USB device was no longer creating that symlink in
Buster, despite working as expected in Stretch and several releases
prior.
After fixing the broken udev logging in Buster which made it impossible
to see why that was happening, it appears that the answer is Linux 4.12
added the actions "bind" and "unbind" for notification of devices being
bound to, or released from, a driver. And udev handles that poorly, so
the symlink gets created as expected for the "add" event, and then
promptly gets deleted again when the "bind" event is processed.
Bind events were added in 1455cf8dbfd06aa7651dcfccbadb7a093944ca65 of
Linus' tree. There was a followup commit two months later (Sep 2017)
6878e7de6af726de47f9f3bec649c3f49e786586, to work around a different
symptom of the same brand of problems in udev when MODALIAS is included
in an unbind event - but this one is still a live breakage. And there
may well be others that still just haven't been noticed yet.
The code in systemd/udev (and all other users of these events too)
really should be explicit about exactly which actions are expected to
be processed in any given code path, with "default" paths not doing
anything more than reporting an unknown event was received. It can't
possibly assume safely that any unknown future event should just be
treated the same as "add" or "change" would be. And looking at the
current code it's almost impossible to tell whether some of the less
common action types really were expected to follow the path they will,
or if they are just rare enough that nobody has both seen it explode
and understood why it did so ...
This is a minimal debug log from udev that demonstrates the problem
as seen from a rule setting SYMLINK+="bitbabbler/$attr{serial}", it
was logged with 50-udev-default.rules and 80-drivers.rules disabled
(while trying to figure out if it was some bad interaction with them
at fault), but that makes no difference to the end result here, it
just eliminates the distracting noise they'd interleave in the log.
has worked exactly as we normally expect it to.
This is what 'udevadm monitor' reports for the events which do what is
seen above. It's not from the same test run, but aside from the USB
device enumeration changing, it's the same sequence of events from a
hotplug of the device.
And when it is unplugged, there are likewise new "unbind" events to
deal with, along with the normal "remove" set.
And that should be enough of what I collected while getting to the
bottom of this to make the problem clear, and reasonably easy to
reproduce with any handy device. Looking at the code the problem
should be obvious enough anyway though. I was tempted to also include
a preliminary patch for it - but it really all needs careful auditing
by as many of the eyes responsible for the existing assumptions as
possible, at least in the first pass through.
The udev rules file for bit babbler looks like this:
The text was updated successfully, but these errors were encountered: