
Fix race condition between actual and desired state in kubelet volume manager #75458

Merged

Conversation

jingxu97
Contributor

This PR fixes issue #75345. The fix modifies the check against the actual state that decides whether a volume can be removed from the desired state: a volume may only be removed from the desired state once it is already recorded as mounted in the actual state.
The case where mounting always fails still works, because the check also validates whether the pod still exists in the pod manager. When the mount fails, the pod can be removed from the pod manager, and the volume can then be removed from the desired state as well.
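
As a minimal illustration of that guard, here is a self-contained Go sketch. The type and helper names (volumeEntry, pruneDesiredState) are hypothetical and are not the kubelet volume manager's real API; the actual diff excerpt is quoted in the review discussion below.

package main

import "fmt"

// volumeEntry is a hypothetical stand-in for the kubelet's per-volume
// bookkeeping; it is illustrative only, not the real volume manager API.
type volumeEntry struct {
	name      string
	mounted   bool // already recorded as mounted in the actual state
	podExists bool // pod still present in the pod manager
}

// pruneDesiredState mirrors the guard described above: a volume is dropped
// from the desired state only once the actual state reports it as mounted,
// or once the pod itself is gone from the pod manager.
func pruneDesiredState(volumes []volumeEntry) []volumeEntry {
	kept := volumes[:0]
	for _, v := range volumes {
		if !v.mounted && v.podExists {
			// Mount not recorded yet: keep the volume so the eventual
			// unmount is not skipped.
			kept = append(kept, v)
			continue
		}
		fmt.Printf("removing %s from desired state\n", v.name)
	}
	return kept
}

func main() {
	vols := []volumeEntry{
		{name: "vol-a", mounted: true, podExists: false}, // mount recorded, pod gone: safe to remove
		{name: "vol-b", mounted: false, podExists: true}, // race window: keep for now
	}
	fmt.Println("remaining:", pruneDesiredState(vols))
}

In this toy run, vol-a is removed because its mount is already recorded, while vol-b stays in the desired state, which is what gives the actual state time to record the mount so that the unmount can later run.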

@k8s-ci-robot
Contributor

@jingxu97: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jingxu97 jingxu97 added kind/bug Categorizes issue or PR as related to a bug. sig/storage Categorizes an issue or PR as relevant to SIG Storage. labels Mar 19, 2019
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 19, 2019
@jingxu97
Contributor Author

cc @msau42

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 19, 2019
@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. labels Mar 19, 2019
if !exists && podExists {
	klog.V(4).Infof(
		volumeToMount.GenerateMsgDetailed(fmt.Sprintf("Actual state has not yet has this volume mounted information and pod (%q) still exists in pod manager, skip removing volume from desired state",
			format.Pod(volumeToMount.Pod)), ""))
	continue
Member

I have been thinking about how this interacts with the reconstruction process. The reconstruction process runs after the reconciliation loop has run and all pod sources have been synced. It considers volumes that are not mounted in ASOW and not present in DSOW.

Member

So my main point is: I am thinking that, ideally, the reconstruction process should have added the volume back to ASOW as mounted, and that could have caused proper cleanup. @jingxu97 do you think that is accurate?

Contributor Author

The full reconstruction process works as follows:

  1. Wait until all sources are ready and the desired state has synced once (that means all pods that already existed before the kubelet restart should be added to the desired state by the end of this stage).
  2. The reconciler starts the reconstruction process once. It scans the directory and tries to recover information about each volume. It first uses the volume name to check whether the desired state already has it. If the desired state already has the volume information, reconstruction is skipped so that the volume can be handled later by the reconciler through the desired and actual states.
  3. After the reconstruction process finishes, the reconciler starts its loops, and the volume should be added to the actual state through the normal volume operations (the mount operation should succeed since the mount already exists).

The problem is that if a pod is deleted immediately during or after a kubelet restart, the desired state holds the volume information for a moment, but it can be deleted very soon, before the actual state has the mount information. The previous check in this code only verified whether the volume exists in the actual state. That is not enough, because a volume can exist in the actual state as soon as it passes the verify-attach check, even though it has not gone through the mount operation yet. So if the actual state does not have the mount information and the desired state deletes the volume, unmount will never be triggered.

This PR changes the check to verify whether the volume's mount information already exists in the actual state. If not, it prevents the volume from being removed from the desired state, giving the actual state more time to update.
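
A rough before/after contrast of that check, sketched in Go. The actualState interface below is a simplified stand-in for the kubelet's actual state of world, and both method names are assumptions made for this sketch rather than verified kubelet interfaces.

package sketch

// actualState is a simplified, hypothetical stand-in for the kubelet's
// actual-state-of-world cache.
type actualState interface {
	// VolumeExists becomes true as soon as the volume passes the
	// verify-attach check, before any mount has happened.
	VolumeExists(volumeName string) bool
	// PodMountedVolume becomes true only once the mount for this pod has
	// been recorded in the actual state.
	PodMountedVolume(podName, volumeName string) bool
}

// canRemoveOld sketches the previous behavior: mere presence of the volume in
// the actual state allowed removal from the desired state, even though the
// mount might not have happened yet.
func canRemoveOld(asw actualState, volumeName string, podInPodManager bool) bool {
	return asw.VolumeExists(volumeName) || !podInPodManager
}

// canRemoveNew sketches the behavior after this PR: removal waits until the
// mount is recorded, unless the pod is already gone from the pod manager.
func canRemoveNew(asw actualState, podName, volumeName string, podInPodManager bool) bool {
	return asw.PodMountedVolume(podName, volumeName) || !podInPodManager
}

The shape of the check stays the same; only what counts as "exists" moves from "known to the actual state" to "recorded as mounted", which is exactly the window in which the race lives.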

Member

Yeah, I understand what we are fixing. I am just thinking that we may have a potential race between the regular reconciliation and the reconstruction process.

  1. If regular reconciliation has the volume in DSW but it is not mounted in ASOW, then unless reconstruction and the removal of the volume from DSW race with each other, the reconstruction process should have fixed the volume.
  2. Even if regular reconciliation runs and removes the volume from DSW while the volume is not mounted in ASW, the reconstruction process should at least have cleaned up the mount point.

@mariantalla
Contributor

/milestone v1.14

@xmudrii
Member

xmudrii commented Mar 19, 2019

@jingxu97 @gnufied Hello. I'd like to remind you that Code Thaw starts today at EOD PST, and we'd like to merge this before then. Is it possible to approve this PR soon?

@spiffxp
Member

spiffxp commented Mar 19, 2019

/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 19, 2019
@spiffxp
Member

spiffxp commented Mar 19, 2019

/milestone clear
Discussed offline with @saad-ali and @msau42, this has been around for a while, it's not pressing that this land as part of v1.14.0

@k8s-ci-robot k8s-ci-robot removed this from the v1.14 milestone Mar 19, 2019
Member

@saad-ali saad-ali left a comment

/approve

I agree with @gnufied that reconstruction should add to ASOW, not DSOW. But I spoke with @jingxu97, and she explained that the reason the reconstruction logic adds to DSOW instead of ASOW is that reconstruction has incomplete information about the volumes. If the volume hasn't been deleted, it relies on the normal reconciler to eventually reconstruct the volume into ASOW. This of course can take time, and the volume can get deleted from DSOW before the mount operation happens, which results in the mount not getting cleaned up. Her fix here is to delay the removal from DSOW until the mount happens in ASOW.

It fixes the race by allowing more time for ASOW to do the mount (it will wait until the pod is actually deleted -- which I believe is the pod termination timeout). So I think the fix is fine.

Taking a step back, the volume reconstruction logic really should be fixed. I opened #75484 to track that.

Furthermore, since this regression existed before the 1.14 release, it can wait to merge until after the 1.14.0 release.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jingxu97, saad-ali

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 19, 2019
@mariantalla
Contributor

mariantalla commented Mar 19, 2019

Discussed offline with @saad-ali and @msau42, this has been around for a while, it's not pressing that this land as part of v1.14.0

Sure; however, if the work is done, shall we let it merge anyway? My angle is that it will help stabilize the upgrade dashboards, which are the flakiest at the moment.

@mariantalla
Contributor

👋 Poking on this again. Are you happy for it to be merged while #75484 is worked on?

cc @msau42 @jingxu97

@nikopen
Contributor

nikopen commented Mar 21, 2019

this can be merged to master no problemo, 1.14 is branched

@nikopen
Contributor

nikopen commented Mar 21, 2019

/lgtm

@mariantalla
Contributor

@jingxu97 any release notes for this one? I think it's the only bit missing for it to be merged.

@jingxu97 jingxu97 added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Mar 25, 2019
@jingxu97
Contributor Author

@mariantalla no release note is needed. I added the tag. Thanks!

@cofyc
Member

cofyc commented Apr 21, 2019

Can this be cherry-picked into 1.13 and 1.14?

@tpepper
Member

tpepper commented May 2, 2019

Cherry pick on 1.14 is #75458

I don't see one on 1.13.

@jingxu97 are you using the cherry pick automation as per: https://git.k8s.io/community/contributors/devel/sig-release/cherry-picks.md

@jingxu97
Contributor Author

jingxu97 commented May 2, 2019

cherrypick 1.14: #76980
cherrypick 1.13: #77351
