
Fix race condition between actual and desired state in kubelet volume manager #75458

Merged

Conversation

jingxu97
Contributor

This PR fixes issue #75345. The fix modifies the check against the actual state that decides whether a volume can be removed from the desired state: a volume may only be removed from the desired state once it is already recorded as mounted in the actual state.
The case where mounting always fails still works, because the check also validates whether the pod still exists in the pod manager. When the mount fails, the pod can be removed from the pod manager, and the volume can then be removed from the desired state as well.
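
As a minimal illustration of that guard, here is a self-contained Go sketch. The type and helper names (volumeEntry, pruneDesiredState) are hypothetical and are not the kubelet volume manager's real API; the actual diff excerpt is quoted in the review discussion below.

package main

import "fmt"

// volumeEntry is a hypothetical stand-in for the kubelet's per-volume
// bookkeeping; it is illustrative only, not the real volume manager API.
type volumeEntry struct {
	name      string
	mounted   bool // already recorded as mounted in the actual state
	podExists bool // pod still present in the pod manager
}

// pruneDesiredState mirrors the guard described above: a volume is dropped
// from the desired state only once the actual state reports it as mounted,
// or once the pod itself is gone from the pod manager.
func pruneDesiredState(volumes []volumeEntry) []volumeEntry {
	kept := volumes[:0]
	for _, v := range volumes {
		if !v.mounted && v.podExists {
			// Mount not recorded yet: keep the volume so the eventual
			// unmount is not skipped.
			kept = append(kept, v)
			continue
		}
		fmt.Printf("removing %s from desired state\n", v.name)
	}
	return kept
}

func main() {
	vols := []volumeEntry{
		{name: "vol-a", mounted: true, podExists: false}, // mount recorded, pod gone: safe to remove
		{name: "vol-b", mounted: false, podExists: true}, // race window: keep for now
	}
	fmt.Println("remaining:", pruneDesiredState(vols))
}

In this toy run, vol-a is removed because its mount is already recorded, while vol-b stays in the desired state, which is what gives the actual state time to record the mount so that the unmount can later run.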

@k8s-ci-robot
Contributor

@jingxu97: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jingxu97 jingxu97 added kind/bug Categorizes issue or PR as related to a bug. sig/storage Categorizes an issue or PR as relevant to SIG Storage. labels Mar 19, 2019
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 19, 2019
@jingxu97
Contributor Author

cc @msau42

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 19, 2019
@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. labels Mar 19, 2019
if !exists && podExists {
	klog.V(4).Infof(
		volumeToMount.GenerateMsgDetailed(fmt.Sprintf("Actual state has not yet has this volume mounted information and pod (%q) still exists in pod manager, skip removing volume from desired state",
			format.Pod(volumeToMount.Pod)), ""))
	continue
Member

I have been thinking about how this interacts with the reconstruction process. The reconstruction process runs after the reconciliation loop has run and all pod sources have been synced. It considers volumes that are not mounted in ASOW and not present in DSOW.

Member

So my main point is: I am thinking that, ideally, the reconstruction process should have added the volume back to ASOW as mounted, and that could have caused proper cleanup. @jingxu97 do you think that is accurate?

Contributor Author

The full reconstruction process works as follows:

  1. Wait until all sources are ready and the desired state has synced once (that means all pods that already existed before the kubelet restart should be added to the desired state by the end of this stage).
  2. The reconciler starts the reconstruction process once. It scans the directory and tries to recover information about each volume. It first uses the volume name to check whether the desired state already has it. If the desired state already has the volume information, reconstruction is skipped so that the volume can be handled later by the reconciler through the desired and actual states.
  3. After the reconstruction process finishes, the reconciler starts its loops, and the volume should be added to the actual state through the normal volume operations (the mount operation should succeed since the mount already exists).

The problem is that if a pod is deleted immediately during or after a kubelet restart, the desired state holds the volume information for a moment, but it can be deleted very soon, before the actual state has the mount information. The previous check in this code only verified whether the volume exists in the actual state. That is not enough, because a volume can exist in the actual state as soon as it passes the verify-attach check, even though it has not gone through the mount operation yet. So if the actual state does not have the mount information and the desired state deletes the volume, unmount will never be triggered.

This PR changes the check to verify whether the volume's mount information already exists in the actual state. If not, it prevents the volume from being removed from the desired state, giving the actual state more time to update.
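
A rough before/after contrast of that check, sketched in Go. The actualState interface below is a simplified stand-in for the kubelet's actual state of world, and both method names are assumptions made for this sketch rather than verified kubelet interfaces.

package sketch

// actualState is a simplified, hypothetical stand-in for the kubelet's
// actual-state-of-world cache.
type actualState interface {
	// VolumeExists becomes true as soon as the volume passes the
	// verify-attach check, before any mount has happened.
	VolumeExists(volumeName string) bool
	// PodMountedVolume becomes true only once the mount for this pod has
	// been recorded in the actual state.
	PodMountedVolume(podName, volumeName string) bool
}

// canRemoveOld sketches the previous behavior: mere presence of the volume in
// the actual state allowed removal from the desired state, even though the
// mount might not have happened yet.
func canRemoveOld(asw actualState, volumeName string, podInPodManager bool) bool {
	return asw.VolumeExists(volumeName) || !podInPodManager
}

// canRemoveNew sketches the behavior after this PR: removal waits until the
// mount is recorded, unless the pod is already gone from the pod manager.
func canRemoveNew(asw actualState, podName, volumeName string, podInPodManager bool) bool {
	return asw.PodMountedVolume(podName, volumeName) || !podInPodManager
}

The shape of the check stays the same; only what counts as "exists" moves from "known to the actual state" to "recorded as mounted", which is exactly the window in which the race lives.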

Member

Yeah, I understand what we are fixing. I am just thinking that we may have a potential race between the regular reconciliation and the reconstruction process.

  1. If regular reconciliation has the volume in DSW but it is not mounted in ASOW, then unless reconstruction and the removal of the volume from DSW race with each other, the reconstruction process should have fixed the volume.
  2. Even if regular reconciliation runs and removes the volume from DSW while the volume is not mounted in ASW, the reconstruction process should at least have cleaned up the mount point.

@mariantalla
Contributor

/milestone v1.14

@xmudrii
Member

xmudrii commented Mar 19, 2019

@jingxu97 @gnufied Hello. I'd like to remind you that Code Thaw starts today at EOD PST, and we'd like to merge this before then. Is it possible to approve this PR soon?

@spiffxp
Member

spiffxp commented Mar 19, 2019

/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 19, 2019
@spiffxp
Member

spiffxp commented Mar 19, 2019

/milestone clear
Discussed offline with @saad-ali and @msau42, this has been around for a while, it's not pressing that this land as part of v1.14.0

@k8s-ci-robot k8s-ci-robot removed this from the v1.14 milestone Mar 19, 2019
Member

@saad-ali saad-ali left a comment

/approve

I agree with @gnufied that reconstruction should add to ASOW, not DSOW. But I spoke with @jingxu97, and she explained that the reason the reconstruction logic adds to DSOW instead of ASOW is that reconstruction has incomplete information about the volumes. If the volume hasn't been deleted, it relies on the normal reconciler to eventually reconstruct the volume into ASOW. This of course can take time, and the volume can get deleted from DSOW before the mount operation happens, which results in the mount not getting cleaned up. Her fix here is to delay the removal from DSOW until the mount happens in ASOW.

It fixes the race by allowing more time for ASOW to do the mount (it will wait until the pod is actually deleted -- which I believe is the pod termination timeout). So I think the fix is fine.

Taking a step back, the volume reconstruction logic really should be fixed. I opened #75484 to track that.

Furthermore, since this regression existed before the 1.14 release, it can wait to merge until after the 1.14.0 release.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jingxu97, saad-ali

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 19, 2019
@mariantalla
Contributor

mariantalla commented Mar 19, 2019

Discussed offline with @saad-ali and @msau42, this has been around for a while, it's not pressing that this land as part of v1.14.0

Sure; however, if the work is done, shall we let it merge anyway? My angle is that it will help stabilize the upgrade dashboards, which are the flakiest at the moment.

@mariantalla
Contributor

👋 Poking on this again. Are you happy for it to be merged while #75484 is worked on?

cc @msau42 @jingxu97

@nikopen
Contributor

nikopen commented Mar 21, 2019

this can be merged to master no problemo, 1.14 is branched

@nikopen
Contributor

nikopen commented Mar 21, 2019

/lgtm

@mariantalla
Contributor

@jingxu97 any release notes for this one? I think it's the only bit missing for it to be merged.

@jingxu97 jingxu97 added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Mar 25, 2019
@jingxu97
Contributor Author

@mariantalla no release note is needed. I added the tag. Thanks!

@cofyc
Member

cofyc commented Apr 21, 2019

Can this be cherry-picked into 1.13 and 1.14?

@tpepper
Member

tpepper commented May 2, 2019

Cherry pick on 1.14 is #75458

I don't see one on 1.13.

@jingxu97 are you using the cherry pick automation as per: https://git.k8s.io/community/contributors/devel/sig-release/cherry-picks.md

@jingxu97
Contributor Author

jingxu97 commented May 2, 2019

cherrypick 1.14: #76980
cherrypick 1.13: #77351
