Automated cherry pick of #102059: Bump k8s.io/utils #101862: Retry detaching FibreChannel volume few times #102656

jsafrane · 2021-06-07T09:26:29Z

Cherry pick of #102059 #101862 on release-1.21.

#102059: Retry reading /proc/mounts when unable to get a consistent read
#101862: Retry detaching FibreChannel volume few times

For details on the cherry pick process, see the cherry pick requests page.

Note:

Both PRs are needed to fix a corrupted FibreChannel volume in a very rare corner case.
This brings k8s.io/utils master to the k/k release-1.21 branch, which sounds scary, but in the end only one extra file, k8s.io/utils/pointer/pointer.go, changed.
Note especially the commit "Regenerate vendor/": I had to run hack/update-vendor.sh to get everything from k8s.io/utils.
Aiming at 1.21; older releases would have more problematic vendor changes.

Release note manually copied from the PRs above:

Fixed very rare volume corruption when a pod is deleted while kubelet is offline.
Retry FibreChannel device cleanup after an error to ensure the FC device is detached before it can be used on another node.

k8s-ci-robot · 2021-06-07T09:26:31Z

@jsafrane: This cherry pick PR is for a release branch and has not yet been approved by Release Managers.
Adding the do-not-merge/cherry-pick-not-approved label.

To merge this cherry pick, it must first be approved (/lgtm + /approve) by the relevant OWNERS.

AFTER it has been approved by code owners, please ping the kubernetes/release-managers team in a comment to request a cherry pick review.

(For details on the patch release process and schedule, see the Patch Releases page.)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

jsafrane · 2021-06-07T09:32:17Z

/kind bug

jsafrane · 2021-06-10T13:55:26Z

Updated k8s.io/utils to the same SHA as today's k/k master.

gnufied · 2021-06-10T14:30:27Z

/lgtm

dims · 2021-06-14T08:34:51Z

/assign @liggitt @thockin

(for root approval)

liggitt · 2021-06-14T14:47:16Z

I can confirm this change is limited to fc and iscsi volume plugins.

/approve
for dep update

k8s-ci-robot · 2021-06-14T14:47:43Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jsafrane, liggitt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [liggitt]
~~staging/src/k8s.io/legacy-cloud-providers/OWNERS~~ [liggitt]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

thockin

/lgtm

To get io/ConsistentRead updates.

We've seen clusters where 3 attempts were not enough. Bumping to 10. The slowdown should be negligible and it will reduce retry attempts in the upper layers of kubelet.
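
The idea behind the attempt count can be sketched as follows. This is a minimal illustration, not the k8s.io/utils code: the helper `consistentReadWith` and its injected `read` function are hypothetical names introduced here so the loop can be exercised without a real file. The sketch re-reads the content until two consecutive reads are identical and gives up after the given number of attempts.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// consistentReadWith is a hypothetical helper for illustration only.
// It calls read once, then re-reads up to `attempts` times, returning as
// soon as two consecutive reads return identical bytes.
func consistentReadWith(read func() ([]byte, error), attempts int) ([]byte, error) {
	prev, err := read()
	if err != nil {
		return nil, err
	}
	for i := 0; i < attempts; i++ {
		cur, err := read()
		if err != nil {
			return nil, err
		}
		if bytes.Equal(prev, cur) {
			return cur, nil
		}
		// Content changed between reads; compare against the newer copy next time.
		prev = cur
	}
	return nil, fmt.Errorf("no consistent read after %d attempts", attempts)
}

// consistentRead applies the loop above to a file such as /proc/mounts.
func consistentRead(filename string, attempts int) ([]byte, error) {
	return consistentReadWith(func() ([]byte, error) {
		return os.ReadFile(filename)
	}, attempts)
}

func main() {
	// 10 attempts mirrors the number mentioned in the commit message.
	data, err := consistentRead("/proc/mounts", 10)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("read %d bytes consistently\n", len(data))
}
```

With a low attempt count, a file that changes frequently can exhaust the loop and push the retry to the caller, which is the situation the bump to 10 is meant to reduce.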

…ruction

iSCSI and FC volume plugins do not implement real 3rd party attach/detach. If reconstruction fails with an error on a FC or iSCSI volume, it will not be unmounted from the volume global dir and at the same time it will be marked as unused, to be available to be mounted on another node. The volume can then be mounted on several nodes, resulting in volume corruption.

The other block based volume plugins implement attach/detach that either makes the volume stuck (can't be detached) or will be force-detached from a node before attaching it somewhere else.

Move reporting of GetReliableMountRefs error to the volume plugins that have more context about severity of the error.

Run hack/update-vendor.sh to get the new file.

When UnmountDevice() of a FibreChannel volume fails after unmounting the device and before the device is fully cleaned up, a subsequent UnmountDevice() retry won't find the device mounted and will return without retrying the device cleanup. Therefore implement a retry inside UnmountDevice() itself to make sure that the volume devices are either fully cleaned up or the error is serious enough that even 1 minute of retrying does not help.
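
A rough sketch of such an in-place retry, using only the Go standard library. The names `retryUntil` and `errDeviceBusy` and the interval are assumptions made for illustration; this is not the code from the FC plugin, which may use different helpers and timings. The cleanup function is called repeatedly until it succeeds or the time budget runs out, and the last error is returned in the latter case.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errDeviceBusy is a hypothetical error standing in for a failed device cleanup.
var errDeviceBusy = errors.New("device busy")

// retryUntil is a hypothetical helper: it calls cleanup immediately and then
// every interval until cleanup succeeds or the timeout would be exceeded.
func retryUntil(timeout, interval time.Duration, cleanup func() error) error {
	deadline := time.Now().Add(timeout)
	for {
		err := cleanup()
		if err == nil {
			return nil
		}
		// Stop if the next attempt would start after the deadline.
		if time.Now().Add(interval).After(deadline) {
			return fmt.Errorf("cleanup still failing after %s: %w", timeout, err)
		}
		time.Sleep(interval)
	}
}

func main() {
	attempts := 0
	// One minute matches the budget mentioned in the commit message;
	// the short interval only keeps this demo fast.
	err := retryUntil(time.Minute, 10*time.Millisecond, func() error {
		attempts++
		if attempts < 3 {
			return errDeviceBusy // simulate two failed cleanups
		}
		return nil
	})
	fmt.Println("attempts:", attempts, "err:", err)
}
```

Keeping the loop inside the unmount path means the cleanup is retried even though a later call would no longer see the device as mounted.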

gnufied · 2021-11-05T13:52:12Z

/lgtm

k8s-ci-robot added this to the v1.21 milestone Jun 7, 2021

k8s-ci-robot requested review from andrewsykim, andyzhangx and a team June 7, 2021 09:27

k8s-ci-robot added the kind/bug label (categorizes issue or PR as related to a bug) and removed the do-not-merge/needs-kind label (indicates a PR lacks a `kind/foo` label and requires one) Jun 7, 2021

k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) Jun 10, 2021

k8s-ci-robot assigned thockin Jun 14, 2021

k8s-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) Jun 14, 2021

thockin reviewed Jun 14, 2021


jsafrane mentioned this pull request Jun 29, 2021

[3.11] Bug 1970977: UPSTREAM: multiple: Fix corruption of FibreChannel volumes openshift/origin#26222

Merged

k8s-ci-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Jul 9, 2021

jsafrane mentioned this pull request Jul 23, 2021

WIP: [3.11] Bug 1970977: UPSTREAM: multiple: Fix corruption of FibreChannel volumes openshift/origin#26346

Closed

jsafrane force-pushed the automated-cherry-pick-of-#102059-#101862-upstream-release-1.21 branch from aec7814 to ba5f038 September 15, 2021 15:50

k8s-ci-robot removed the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) and the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Sep 15, 2021

k8s-ci-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Oct 7, 2021

jsafrane force-pushed the automated-cherry-pick-of-#102059-#101862-upstream-release-1.21 branch from ba5f038 to 385024a October 14, 2021 12:18

jsafrane added 6 commits November 2, 2021 13:24

Bump k8s.io/utils

38dfca8

To get io/ConsistentRead updates.

ConsistentRead tries 10 times

270d98a

We've seen clusters where 3 attempts were not enough. Bumping to 10. The slowdown should be negligible and it will reduce retry attempts in the upper layers of kubelet.

Move error reporting to volume plugins

455f159

Move reporting of GetReliableMountRefs error to the volume plugins that have more context about severity of the error.

Regenerate vendor/

89b4987

Run hack/update-vendor.sh to get the new file.

jsafrane force-pushed the automated-cherry-pick-of-#102059-#101862-upstream-release-1.21 branch from 385024a to b973d29 November 2, 2021 12:50

k8s-ci-robot removed the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Nov 2, 2021

k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) Nov 5, 2021

justaugustus added the cherry-pick-approved label (indicates a cherry-pick PR into a release branch has been approved by the release branch manager) Nov 9, 2021

k8s-ci-robot removed the do-not-merge/cherry-pick-not-approved label (indicates that a PR is not yet approved to merge into a release branch) Nov 9, 2021

k8s-ci-robot merged commit c96b0a2 into kubernetes:release-1.21 Nov 9, 2021
