osd: handle device name change and device removal correctly #11567
This PR will be ready for review once I finish verifying that it covers the following matrix.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.
7ced084 to 144a3e3
I finished testing, so this PR is ready for review. Test details:
environment
- device name: kernel name
  - device becomes missing => OK
  - flip device names => OK
- device name: persistent device name (/dev/disk/by-id/wwn-0x60022480e20a2701d91006b946030922)
  - device becomes missing => OK
  - flip device names => OK
The original problem happened in host-based clusters. A similar problem exists in PVC-based clusters: if a PV that corresponds to an existing OSD points to a missing block device file, the OSD pod fails to consume this PV. Although this behavior is undesirable, I don't think Rook should handle it. Handling it would require re-creating the existing PV, and Rook shouldn't do such work.
If a kernel device name change happens and a block device file in the OSD directory becomes a dangling link, this OSD repeatedly fails to start. This problem can be resolved by confirming the validity of the device file and recreating it if necessary.

The original problem happened in host-based clusters. A similar problem exists in PVC-based clusters: if a PV that corresponds to an existing OSD points to a missing block device file, the OSD pod fails to consume this PV. Although this behavior is undesirable, I don't think Rook should handle it. Handling it would require re-creating the existing PV, and Rook shouldn't do such work.

Closes: rook#10860
Signed-off-by: Satoru Takeuchi <satoru.takeuchi@gmail.com>
144a3e3 to 3a44a39
The multi-cluster-mirroring test fails consistently, as described in #11742.
# If a kernel device name change happens and a block device file
# in the OSD directory becomes missing, this OSD fails to start
# continuously. This problem can be resolved by confirming
# the validity of the device file and recreating it if necessary.
The path is re-created by ceph-volume, correct?
It's correct.
osd: handle device name change and device removel correctly (backport #11567)
Description of your changes:
If a kernel device name change happens and a block device file in the OSD directory becomes a dangling link, this OSD repeatedly fails to start. This problem can be resolved by confirming the validity of the device file and recreating it if necessary.
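The check described above can be sketched in shell as follows. This is only an illustration, not the code in this PR: per the review discussion, the actual re-creation of the path is done by ceph-volume. The function name `ensure_device_link` and its two arguments are hypothetical, and the sketch tests for existence with `-e` so that it can be tried on ordinary files; a real check would likely use `-b` to require a block device.

```shell
# Hypothetical helper (not part of Rook or ceph-volume).
# $1: path of the link inside the OSD directory
# $2: device path the link should point to
# Prints "kept" if the link still resolves, "recreated" otherwise.
ensure_device_link() {
    link=$1
    target=$2
    # -e follows symlinks, so it is false for a missing or dangling link.
    if [ -e "$link" ]; then
        echo "kept"
    else
        # Replace the stale link (or create it) so it points at the target.
        ln -sfn "$target" "$link"
        echo "recreated"
    fi
}
```

Passing a persistent name such as a `/dev/disk/by-id/...` path as the second argument, rather than a kernel name, would keep the link valid across kernel device name changes.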
Which issue is resolved by this Pull Request:
Resolves #10860