internal/exec/stages/disks: prevent races with udev #35
Merged
This was referenced Feb 11, 2022
pothos force-pushed the kai/udev-race branch 4 times, most recently from 3ef74a4 to efa39f9 (February 16, 2022 10:30)
pothos changed the title from "wip: try to prevent races by issuing and waiting for tagged uevents" to "internal/exec/stages/disks: prevent races with udev" (Feb 16, 2022)
pothos added a commit to flatcar-archive/coreos-overlay that referenced this pull request (Feb 16, 2022)
This pulls in flatcar/ignition#35 to prevent boot failures such as fsck running while udev is still processing the disk changes and failing because the /dev/disk/ symlink is briefly gone.
jepio approved these changes (Feb 16, 2022)
Looks OK to me, but would appreciate a second pair of eyes looking at this.
krnowak approved these changes (Feb 16, 2022)
Looks good.
pothos force-pushed the kai/udev-race branch from 4e52e2e to 69eb3a8 (February 21, 2022 10:10)
pothos added a commit to flatcar-archive/coreos-overlay that referenced this pull request (Feb 21, 2022)
This pulls in flatcar/ignition#35 to prevent boot failures such as fsck running while udev is still processing the disk changes and failing because the /dev/disk/ symlink is briefly gone.
t-lo pushed a commit to flatcar/scripts that referenced this pull request (Apr 13, 2023)
This pulls in flatcar/ignition#35 to prevent boot failures such as fsck running while udev is still processing the disk changes and failing because the /dev/disk/ symlink is briefly gone.
The "udevadm settle" command, used to wait for udev to process the disk
changes and recreate the entries under /dev, was still prone to races
where udev had not yet been notified of the final event to wait for.
This caused the boot with a btrfs root filesystem created by Ignition
to fail almost every time on certain hardware.

Issue tagged events and wait for udev to process them. This is
meaningful in all stages, for two reasons. First, other parts of the
initramfs may be surprised by device nodes briefly disappearing, as was
the case with systemd's fsck service. Second, the inter-stage
dependencies currently use the waiter for systemd device units, which
does not really prevent races with udev's recreation of device nodes.
These changes are therefore complementary to the existing waiter, whose
main purpose is to wait for unmodified devices. For newly created RAIDs
we can wait for the new node to become available, as udev will not
recreate it.

Note: This is a port for Ignition 0.35; for 2.2 this should also be
done for LUKS.
How to use
Upstream it, but consider merging here early to fix the noisy test failures.
Testing done
http://jenkins.infra.kinvolk.io:8080/job/os/job/manifest/4850/cldsv/
changelog/ directory (user-facing change, bug fix, security fix, update) ↑ TODO in coreos-overlay