udev: make algorithm that selects highest priority devlink less susceptible to race conditions #17431
Conversation
@mwilck I rebased your […]
I love where this is going (obviously), some comments though
Force-pushed from c02230d to f9e765c
Updated. @poettering PTAL. As agreed offline we now run the algorithm […]
Allow testing cases where multiple devices are added and removed. This implies a change to the data structure: every test allows for multiple devices to be added, and "exp_name" etc. are now properties of the device, not of the test.
It's not necessary to write the rules for every udev run, as we may now have many (rather than just 2) per test.
Allow testing cases where multiple devices are added and removed simultaneously. Tests are started as synchronously as possible using a semaphore, in order to test possible race conditions. If this isn't desired, the test parameter "sleep_us" can be set to the number of microseconds to wait between udev invocations.
More often than not, the created devnode is the basename of the sysfs entry. The "devnode" property may be used to override the auto-detected node name. Permissions and major/minor number are now verified on the devnode itself, not on symlinks. For those tests where exp_name is set to the computed devnode name, the explicit "exp_name" can be removed. "exp_name" is only required for symlinks. This allows separate testing for devnodes and symlinks in a follow-up patch.
Test if symlinks are created correctly by comparing the symlink targets to the devnode path. This implies (for the symlink) that major/minor numbers and permissions are correct, as we have tested that on the devnode already.
Instead of testing the existence or non-existence of just a single symlink, allow testing of several links per device. Change the test definitions accordingly.
udev hasn't supported renaming device nodes for some time.
The "last_rule" option hasn't been supported for some time. Therefore this test fails if a "not_exp_links" attribute is added, as it should be. Mark it appropriately.
Add some rules that make it a bit harder to pass, mainly the non-existence checks.
These rules have survived from an ancient version of the code and serve no purpose any more.
As we can check multiple links in a single test now, these 3 tests can be merged into one.
As we can test multiple devices and multiple links per device in one test now, these two tests can be merged into one.
This is helpful to catch possible regressions in the test. Also, don't count wait() errors; they are likely not udev errors.
Add 4 new tests using multiple devices. Tests 2-4 use many devices claiming the same symlink, where only one device has a higher priority than the others. They fail sporadically with the current code, if a race condition causes the symlink to point to the wrong device. Test 4 is like test 2 with sleeps in between; it's much less likely to fail.
for easier reproduction of sporadic test failures.
Manually listing all devices in the test definition becomes cumbersome with lots of devices. Add a function that scans all block devices in the test sysfs and generates a list of devices to test.
Script for generating LOTS of SCSI disk and partition devices in the fake sysfs we use for udev testing. This script is supposed to be run after sys-script.py. It uses code from sys-script.py as a template to generate additional SCSI disk data structures. Together with the "generator" code in udev-test.pl added in the previous patch, it allows running udev tests with almost arbitrarily many devices, and thus doing performance scaling tests.
umount emits an error message "no mount point specified" if the tmpfs isn't mounted yet, which is the normal case. Suppress that by redirecting stderr.
Since 'test/udev-test.pl: count "good" results', we know how many checks succeeded. Add an "expected good" count to make that number more meaningful.
Hi Michael, I finally got around to testing your code. First I made sure the issue reproduced with current master (it does, easily). Then I tested your tree with 50 devices […]
IIRC this is not worse than what I saw previously with my code; hard to compare anyway because the hardware has changed in the meantime. Anyway, great job!
Some numbers from my test system (24 x Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz, 64 GB RAM), from […]
It scales roughly quadratically with the number of devices. Maybe the change in the directory structure I made in 6b2db24 would improve matters somewhat. Anyway, in practice there are rarely more than ~10 contenders for a symlink.
I didn't review the Perl stuff, I am not enough of a Perl guru for this.
Force-pushed from f9e765c to 27cbccb
@poettering @keszybz thanks for the reviews so far! PTAL at the newest version.
@keszybz I have no idea why hwdb-test fails. AFAICT, hwdb doesn't use any code that I've touched in this PR.
Force-pushed from 01a7d8a to aaf0ef7
@keszybz updated, PTAL. I removed initialisation of […]
Note that the st_mtime member of struct stat is defined as follows: #define st_mtime st_mtim.tv_sec. Hence we omitted checking the nanosecond part of the timestamp (struct timespec) and would possibly miss modifications that happened within the same second.
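The fix is to compare both fields of struct timespec instead of only the seconds. A minimal sketch of such a comparison (the function name is illustrative, not taken from systemd):

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Return true if a is strictly newer than b, comparing seconds first
 * and falling back to nanoseconds when the seconds are equal.
 * Comparing only tv_sec (what st_mtime gives you) would call two
 * timestamps within the same second "equal" and miss a modification. */
static bool timespec_newer(const struct timespec *a, const struct timespec *b) {
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec > b->tv_sec;
    return a->tv_nsec > b->tv_nsec;
}
```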
084f427
to
52edc45
Compare
udev: make algorithm that selects highest priority devlink less susceptible to race conditions

Previously it was very likely, when multiple contenders for the symlink appeared in parallel, that the algorithm would select the wrong symlink (i.e. one with lower priority). Now the algorithm is much more defensive: when we detect a change in the set of contenders for the symlink, we reevaluate the selection. The same happens when a new symlink replaces an already existing symlink that points to a different device node.
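The defensive reevaluation can be sketched as follows. All names, the generation-stamp mechanism, and the fake stamp reader are simplified assumptions for illustration, not the actual src/udev code: pick the best contender, then check whether the contender set changed while we were deciding, and retry a bounded number of times if it did.

```c
#include <assert.h>
#include <stddef.h>

#define LINK_UPDATE_MAX_RETRIES 4

/* One contender for a symlink: device name and its link priority. */
struct contender { const char *name; int prio; };

/* Returns the index of the highest-priority contender, or -1 if none. */
static int pick_best(const struct contender *c, size_t n) {
    int best = -1;
    for (size_t i = 0; i < n; i++)
        if (best < 0 || c[i].prio > c[best].prio)
            best = (int)i;
    return best;
}

/* Retry loop: read_generation() models re-reading a change stamp for the
 * symlink's set of contenders (e.g. the mtime of its db directory).  If the
 * stamp moved while we were selecting, another contender raced us, so the
 * selection is repeated rather than trusted. */
static int select_with_retries(const struct contender *c, size_t n,
                               unsigned (*read_generation)(void)) {
    for (unsigned attempt = 0; attempt < LINK_UPDATE_MAX_RETRIES; attempt++) {
        unsigned before = read_generation();
        int best = pick_best(c, n);
        if (read_generation() == before)
            return best;          /* set was stable: selection is trustworthy */
    }
    return pick_best(c, n);       /* give up retrying, use the latest view */
}

/* Test stub standing in for the real stamp source. */
static unsigned fake_generation;
static unsigned read_fake_generation(void) { return fake_generation; }
```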
Force-pushed from 52edc45 to 30f6dce
@poettering @keszybz updated. Thank you for all the suggestions and help.
@mwilck, @msekletar, good to see this done!
Eventually, we had the customer who originally reported this race (and made me suggest #9551 in the first place) test this code. Results are mixed, unfortunately. First of all, we see […]
This message is printed if one path loses the race for the symlink. The message "Too many levels of symbolic links" comes from the -ELOOP return value of […]. Note that this message is printed in a situation where only SCSI devices are contending for the same symlink. They all have the same priority. The issue here is, AFAICS, that every contender puts itself first in the list and will only give up if some other device claims a higher priority. It might be better to just use e.g. alphabetical ordering to avoid ping-pong between contenders.
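The deterministic ordering suggested above could look like the following comparison function. This is a sketch of the idea from the comment, not code from udev, and the struct and function names are made up: priority decides first, and equal priorities fall back to alphabetical device-name order, so every contender computes the same winner regardless of arrival order.

```c
#include <assert.h>
#include <string.h>

struct devlink_contender { const char *name; int prio; };

/* Total order over contenders: higher priority sorts first; equal
 * priorities are broken alphabetically by device name.  Because the
 * tie-break is a total order shared by all contenders, none of them
 * can "ping-pong" the symlink back and forth with an equal peer. */
static int contender_cmp(const struct devlink_contender *a,
                         const struct devlink_contender *b) {
    if (a->prio != b->prio)
        return b->prio - a->prio;        /* higher prio wins */
    return strcmp(a->name, b->name);     /* stable alphabetical tie-break */
}
```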
In PR systemd#17431 we introduced a retry loop in link_update() in order to maximize the chance that we end up with the correct target when there are multiple contenders for a given symlink. The number of iterations in the retry loop is either 1 or LINK_UPDATE_MAX_RETRIES, depending on the value of the 'initialized' db flag. When a device appears for the first time we need to set the flag before calling link_update() via update_devnode() for the second time, to make sure we run the second invocation with the higher retry count.