internal/exec/stages/disks: prevent races with udev #35

Merged (1 commit) on Feb 21, 2022

@pothos pothos commented Feb 11, 2022

The "udevadm settle" command, used to wait for udev to process the disk
changes and recreate the entries under /dev, was still prone to races
where udev had not yet been notified of the final event to wait for.
This caused boots with a btrfs root filesystem created by Ignition
to fail almost every time on certain hardware.

Issue tagged events and wait for udev to process them. This is
meaningful in all stages, for two reasons. First, other parts of the
initramfs may be surprised by device nodes briefly disappearing, as
was the case with systemd's fsck service. Second, the inter-stage
dependencies currently use the waiter for systemd device units, which
does not actually prevent races with udev's device node recreation.
These changes are therefore complementary to the existing waiter,
whose main purpose is to wait for unmodified devices. For newly
created RAIDs we can simply wait for the new node to become available,
because udev will not recreate it.

Note: This is a port for Ignition 0.35; for 2.2 this should also be
done for LUKS.

How to use

Upstream it but maybe merge here early to solve the noisy test failures

Testing done

http://jenkins.infra.kinvolk.io:8080/job/os/job/manifest/4850/cldsv/

  • Changelog entries added in the respective changelog/ directory (user-facing change, bug fix, security fix, update)
    ↑ TODO in coreos-overlay

@pothos pothos force-pushed the kai/udev-race branch 4 times, most recently from 3ef74a4 to efa39f9, February 16, 2022 10:30
@pothos pothos changed the title from "wip: try to prevent races by issuing and waiting for tagged uevents" to "internal/exec/stages/disks: prevent races with udev" Feb 16, 2022
@pothos pothos marked this pull request as ready for review February 16, 2022 10:32
@pothos pothos requested a review from a team February 16, 2022 10:32
pothos added a commit to flatcar-archive/coreos-overlay that referenced this pull request Feb 16, 2022
This pulls in
flatcar/ignition#35
to prevent boot failures such as fsck running while udev was still
processing the disk changes, and thus failing when the /dev/disk/
symlink is shortly gone.
@jepio jepio left a comment

Looks OK to me, but would appreciate a second pair of eyes looking at this.

@krnowak krnowak left a comment

Looks good.

Review thread on internal/exec/stages/disks/disks.go (outdated, resolved)
@pothos pothos merged commit de4e6cc into flatcar-master Feb 21, 2022
@pothos pothos deleted the kai/udev-race branch February 21, 2022 10:10
pothos added a commit to flatcar-archive/coreos-overlay that referenced this pull request Feb 21, 2022
This pulls in
flatcar/ignition#35
to prevent boot failures such as fsck running while udev was still
processing the disk changes, and thus failing when the /dev/disk/
symlink is shortly gone.
t-lo pushed a commit to flatcar/scripts that referenced this pull request Apr 13, 2023
This pulls in
flatcar/ignition#35
to prevent boot failures such as fsck running while udev was still
processing the disk changes, and thus failing when the /dev/disk/
symlink is shortly gone.