[ws-manager] Don't stop workspace too early #5688
Conversation
@geropl: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected. Please follow our release note process to remove it.
Codecov Report
@@ Coverage Diff @@
## main #5688 +/- ##
===========================================
+ Coverage 19.04% 38.39% +19.34%
===========================================
Files 2 12 +10
Lines 168 3594 +3426
===========================================
+ Hits 32 1380 +1348
- Misses 134 2097 +1963
- Partials 2 117 +115
} else {
	// add an additional wait time on top of a deletionGracePeriod
	// to make sure the changes propagate on the data plane.
	var gracePeriod int64 = 30
@csweichel Is there a reason why we no longer wait for terminated but instead have a timeout here? To me this feels overly brittle, and the go func... time.Sleep might cause out-of-order problems, if I'm not mistaken. Might this even be the cause of #5689?
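To illustrate the out-of-order concern, here is a minimal sketch. It is not the ws-manager code: the `workspace` type, its `generation` counter, and `finalizeIfUnchanged` are hypothetical names invented for this example. A delayed goroutine remembers the state version it saw when it was scheduled and skips its action if the state has moved on in the meantime. The `fire` channel stands in for the `time.Sleep` so that the run is deterministic.

```go
package main

import (
	"fmt"
	"sync"
)

// workspace is a hypothetical stand-in for the state a manager tracks.
type workspace struct {
	mu         sync.Mutex
	phase      string
	generation int // bumped on every phase change
}

// setPhase records a new phase and returns the new generation.
func (w *workspace) setPhase(p string) int {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.phase = p
	w.generation++
	return w.generation
}

// finalizeIfUnchanged applies the delayed action only if nothing changed
// since the caller observed seenGeneration. It reports whether it acted.
func (w *workspace) finalizeIfUnchanged(seenGeneration int) bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.generation != seenGeneration {
		return false // state moved on; acting now would be out of order
	}
	w.phase = "finalizing"
	w.generation++
	return true
}

func main() {
	ws := &workspace{phase: "stopping"}
	seen := ws.generation

	fire := make(chan struct{}) // stands in for time.Sleep(gracePeriod)
	done := make(chan bool)
	go func() {
		<-fire
		done <- ws.finalizeIfUnchanged(seen)
	}()

	// The workspace reaches another phase before the delay elapses.
	ws.setPhase("stopped")
	close(fire)

	fmt.Println("delayed finalize applied:", <-done)
	fmt.Println("phase:", ws.phase)
}
```

Without the generation check, the delayed goroutine would overwrite the newer phase, which is the kind of ordering problem raised above.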
This might indeed be the cause. The underlying issue is that the unmount of the mark mount in ws-daemon happens during content finalization. Without this unmount, the workspace container never stops because containerd cannot unmount the rootfs, so the finalization is never triggered in the first place.
There are two ways we can break this circle:
- the solution that's currently implemented where ws-manager forces workspace content finalisation if we've exceeded the grace period
- a ws-daemon dispatch-based method that functions much like the former containerd workaround, except that it does the unmount rather than forcing a workspace stop on the API plane.
We opted for the former because it was easier/less complex to implement. It would seem that it had other adverse side effects though.
If we merged the change you're proposing here, we'd be re-introducing the "stuck in stopping" issue if ws-daemon restarts.
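The first option can be sketched roughly as follows. This is an illustration under assumptions, not the actual ws-manager implementation: `decideStopAction` and the `stopAction` values are hypothetical names, and the unit of the grace periods is assumed to be seconds (the diff above only shows `var gracePeriod int64 = 30`).

```go
package main

import "fmt"

// stopAction is a hypothetical enum describing what the manager does next.
type stopAction int

const (
	waitForTermination   stopAction = iota // container still running, keep waiting
	finalizeContent                        // container terminated, finalize normally
	forceFinalizeContent                   // grace period exceeded, finalize anyway
)

// decideStopAction is a hypothetical helper. All durations are assumed to
// be in seconds. extraWait is the additional wait added on top of the
// deletion grace period (30 in the diff under review).
func decideStopAction(containerTerminated bool, elapsed, deletionGracePeriod, extraWait int64) stopAction {
	if containerTerminated {
		return finalizeContent
	}
	if elapsed > deletionGracePeriod+extraWait {
		// Forcing finalization triggers the unmount in ws-daemon, which in
		// turn lets containerd stop the container. This breaks the circle.
		return forceFinalizeContent
	}
	return waitForTermination
}

func main() {
	const deletionGracePeriod, extraWait int64 = 30, 30
	for _, elapsed := range []int64{10, 45, 61} {
		action := decideStopAction(false, elapsed, deletionGracePeriod, extraWait)
		fmt.Printf("elapsed=%ds action=%d\n", elapsed, action)
	}
}
```

The trade-off discussed above is visible here: the forced path depends only on elapsed time, so it also works if ws-daemon restarts, but it no longer waits for the container to actually reach a terminated state.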
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Superseded by: #5897