Fix race condition between actual and desired state in kubelet volume manager #75458
Conversation
This PR fixes kubernetes#75345. The fix modifies the check against the actual state when validating whether a volume can be removed from the desired state: only if the volume is already marked as mounted in the actual state can it be removed from the desired state. The case where mounting always fails still works, because the check also validates whether the pod still exists in the pod manager; if mounting fails, the pod can be removed from the pod manager, so the volume can also be removed from the desired state.
@jingxu97: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
cc @msau42
// Skip removal from the desired state while the volume is not yet recorded
// as mounted in the actual state and the pod still exists in the pod manager.
if !exists && podExists {
	klog.V(4).Infof(
		volumeToMount.GenerateMsgDetailed(fmt.Sprintf("Actual state has not yet has this volume mounted information and pod (%q) still exists in pod manager, skip removing volume from desired state",
			format.Pod(volumeToMount.Pod)), ""))
	continue
}
I have been thinking about how this interacts with the reconstruction process. The reconstruction process runs after the reconciliation loop has run and all pod sources have been synced. It considers volumes that are not mounted in ASOW and not present in DSOW.
So my main point is: I am thinking the reconstruction process ideally should have added the volume back to ASOW as mounted, and that could have caused proper cleanup. @jingxu97 do you think that is accurate?
The full reconstruction process works as follows (a toy sketch of this ordering follows the list):
- Wait until all sources are ready and the desired state has synced once (that means all pods that existed before the kubelet restart should be in the desired state after this stage).
- The reconciler runs the reconstruction process once. It scans the volume directories and tries to recover information about each volume. It first uses the volume name to check whether the desired state already has it. If the desired state already has the volume information, reconstruction is skipped so that this volume can be handled later by the reconciler through the desired and actual state.
- After the reconstruction process finishes, the reconciler starts its loops, and the volume should be added into the actual state through normal volume operations (the mount operations should succeed since the mount already exists).
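To make that ordering concrete, here is a minimal, self-contained Go sketch of the decision the reconstruction pass makes. Everything in it (desiredState, diskVolumes, reconstruct) is invented for illustration and is not the kubelet's real API:

package main

import "fmt"

// Toy model of the reconstruction ordering described in the list above. It only
// mirrors the "skip if already in desired state" decision, not real kubelet code.
var desiredState = map[string]bool{"vol-from-api": true}   // volumes for pods re-synced from sources
var diskVolumes = []string{"vol-from-api", "vol-orphaned"} // mount directories found on disk

func reconstruct(vol string) { fmt.Println("reconstructing", vol) }

func main() {
	// Step 1 (not modeled): wait until all pod sources are ready, so the desired
	// state already holds every pod that existed before the kubelet restart.

	// Step 2: one-time reconstruction pass over the volume directories.
	for _, vol := range diskVolumes {
		if desiredState[vol] {
			// Already in the desired state: skip reconstruction and let the normal
			// reconciler add it to the actual state (the existing mount on disk
			// makes the mount operation succeed trivially).
			continue
		}
		reconstruct(vol)
	}

	// Step 3 (not modeled): the reconciler loop starts and mounts/unmounts volumes
	// by comparing the desired state with the actual state.
}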
The problem is that if a pod is deleted immediately during or after a kubelet restart, the desired state holds the volume information for a moment, but it could be deleted very soon, before the actual state has the mount information. The previous check in this code only verified whether the volume exists in the actual state. That is not enough, because the volume can exist in the actual state once it passes the verify-attach check, even though it has not gone through the mount operation yet. So if the actual state does not have the mount information and the desired state deletes the volume, unmount will not be triggered.
This PR changes the check to whether the volume's mount information already exists in the actual state. If not, the volume is prevented from being deleted from the desired state, which gives the actual state more time to update (a rough sketch of the difference follows).
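For illustration only, here is a small, self-contained Go sketch of the guard being discussed; the actualState type, its fields, and canRemoveFromDesiredState are invented for this example and are not the actual kubelet code:

package main

import "fmt"

// actualState is a toy stand-in for the kubelet's actual state of the world (ASW).
type actualState struct {
	attached map[string]bool // volume has passed the attach/verify step
	mounted  map[string]bool // mount operation has completed for the pod
}

// canRemoveFromDesiredState models the guard discussed above. The old check only
// asked whether the volume existed in ASW at all (already true after verify-attach);
// the new check requires the mount information to be present, unless the pod is
// already gone from the pod manager.
func canRemoveFromDesiredState(asw actualState, volume string, podExistsInPodManager bool) bool {
	if asw.mounted[volume] {
		return true // mount info is in ASW; normal unmount/cleanup will handle it
	}
	// Not mounted yet: keep the volume in the desired state while the pod is still
	// known to the pod manager, giving the actual state time to record the mount.
	return !podExistsInPodManager
}

func main() {
	asw := actualState{attached: map[string]bool{"vol-1": true}, mounted: map[string]bool{}}
	fmt.Println(canRemoveFromDesiredState(asw, "vol-1", true))  // false: attached but not mounted, pod still exists
	fmt.Println(canRemoveFromDesiredState(asw, "vol-1", false)) // true: pod is gone, safe to remove
}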
Yeah, I understand what we are fixing. I am just thinking we may have a potential race between regular reconciliation and the reconstruction process.
- If regular reconciliation has the volume in DSW but it is not mounted in ASOW, then unless reconstruction and removal of the volume from DSW race with each other, the reconstruction process should have fixed the volume.
- Even if the regular reconciliation process runs and removes the volume from DSW while the volume is not mounted in ASW, the reconstruction process should have at least cleaned up the mount point.
/milestone v1.14
/priority important-soon
/approve
I agree with @gnufied that reconstruction should add to ASOW, not DSOW. But I spoke with @jingxu97 and she explained that the reason the reconstruction logic adds to DSOW instead of ASOW is that reconstruction has incomplete information about the volumes. If the volume hasn't been deleted, it relies on the normal reconciler to eventually reconstruct the volume into ASOW. This, of course, can take time, and the volume can get deleted from DSOW before the mount operation happens, which results in the mount not getting cleaned up. Her fix here is to delay the removal from DSOW until the mount happens in ASOW.
It fixes the race by allowing more time for the ASOW to do the mount (it will wait until the pod is actually deleted -- which I believe is the pod termination timeout). So I think the fix is fine.
Taking a step back, the volume reconstruction logic really should be fixed. I opened #75484 to track that.
Furthermore, since this regression existed before the 1.14 release, the fix can wait to merge until after the 1.14.0 release.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jingxu97, saad-ali The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
this can be merged to master no problemo, 1.14 is branched
/lgtm
@jingxu97 any release notes for this one? I think it's the only bit missing for it to be merged.
@mariantalla no release note is needed. I added the tag. Thanks!
Can this be cherry-picked into 1.13 and 1.14?
Cherry pick on 1.14 is #75458. I don't see one on 1.13. @jingxu97 are you using the cherry pick automation as per https://git.k8s.io/community/contributors/devel/sig-release/cherry-picks.md?