[ws-daemon] Properly handle mark unmount #5897
Conversation
Force-pushed from 78dcf4a to 5e17564
Codecov Report
```diff
@@            Coverage Diff             @@
##             main    #5897      +/-   ##
==========================================
+ Coverage   19.04%   22.40%   +3.35%
==========================================
  Files           2       11       +9
  Lines         168     1933    +1765
==========================================
+ Hits           32      433     +401
- Misses        134     1442    +1308
- Partials        2       58      +56
```
Flags with carried forward coverage won't be shown.
```diff
@@ -363,22 +363,9 @@ func actOnPodEvent(ctx context.Context, m actingManager, status *api.WorkspaceSt
 	_, gone := wso.Pod.Annotations[wsk8s.ContainerIsGoneAnnotation]

 	if terminated || gone {
```
❤️
I might mix this up, but: this is basically the mechanism we had in place before the "stuck in stopping" weeks, just with "unmountMarkMount" instead. Just trying to completely understand this 🤔
Indeed, the mechanism is the same. Actually I went back in the Git history and pulled out the containerd workaround code :)
The key difference between then and now is that containerd behaves differently in this situation. It continuously tries to unmount the root filesystem, and eventually (once this mechanism has kicked in) succeeds.
💯
Force-pushed from 5e17564 to c49dd22
Not sure this helps, but: LGTM 🙃
LGTM label has been added. Git tree hash: 7e8e635de31c41d0b4a65276bdb33ec9ff0d6665
Force-pushed from c49dd22 to 61b283d
/werft run
👍 started the job as gitpod-build-cw-fix-5689.5
@csweichel I see two things with this change:
That makes sense, as we now wait for containerd to realise that the rootfs was unmounted.
Did it increment by two for one workspace stop?
Yes
Testing this, I saw only a single increase. Beware that preview environments have a single ghost workspace running as well, whose stop would also be affected by this behaviour.
/lgtm
LGTM label has been added. Git tree hash: 7b03f9fcb851847126a83209fa202093976f1dde
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: aledbf, geropl
Associated issue: #5689
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Description
This PR moves the mark unmount fallback back to ws-daemon. Prior to this change we'd try to finalise workspace content even before the pod was stopped. During content finalisation we'd try to unmount the mark mount that might have been propagated to ws-daemon during a restart. If that happened, the pod would never actually stop, which is why we attempted finalisation even though the pod had not stopped yet.
This change pushes all this mark mount business back into ws-daemon using the `dispatch` mechanism. If a pod lingers around for longer than its termination grace period, we'll try to unmount the mark mount, eventually causing the pod to stop. A rough sketch of the idea follows.
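To make that concrete, here is a minimal sketch of what such a dispatch-registered fallback could look like. This is not the actual ws-daemon code: the handler signature, the mark mount path, the default grace period, and the counter wiring are assumptions for illustration only.

```go
// A minimal sketch of the fallback described above, not the real implementation:
// the hook signature, the mount path layout, and the 30s default grace period
// are assumptions made for illustration.
package daemon

import (
	"context"
	"path/filepath"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"golang.org/x/sys/unix"
	corev1 "k8s.io/api/core/v1"
)

// markUnmountFallback counts how often the fallback kicked in (see "How to test").
var markUnmountFallback = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "gitpod_ws_daemon_markunmountfallback_active_total",
	Help: "number of times the mark unmount fallback was triggered",
})

// handlePodUpdate would be registered with ws-daemon's dispatch mechanism and
// invoked whenever a workspace pod on this node changes.
func handlePodUpdate(ctx context.Context, pod *corev1.Pod) error {
	if pod.DeletionTimestamp == nil {
		// the pod is not being deleted - nothing to do
		return nil
	}

	grace := 30 * time.Second // assumed default
	if pod.Spec.TerminationGracePeriodSeconds != nil {
		grace = time.Duration(*pod.Spec.TerminationGracePeriodSeconds) * time.Second
	}

	// Only act once the pod has lingered past its termination grace period.
	if time.Now().Before(pod.DeletionTimestamp.Add(grace)) {
		return nil
	}

	// The mark mount has likely propagated into our mount namespace and keeps the
	// rootfs busy; unmounting it lets containerd finish tearing down the pod.
	markUnmountFallback.Inc()
	mark := filepath.Join("/var/gitpod/workspaces", pod.Name, "mark") // hypothetical path
	return unix.Unmount(mark, 0)
}
```

The key point of the design, as described above, is that the fallback only fires after the pod's own grace period has elapsed, so containerd gets a full chance to unmount the rootfs itself first.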
Related Issue(s)
Fixes #5689
How to test
The `gitpod_ws_daemon_markunmountfallback_active_total` metric on ws-daemon should have incremented by one. A sketch of one way to check it is shown below.
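One way to verify this is to scrape ws-daemon's Prometheus metrics endpoint directly, for example after a `kubectl port-forward`. The port and path below are assumptions, not the documented endpoint; adjust them to the actual deployment.

```go
// Hypothetical helper for the test step above: scrape ws-daemon's metrics
// endpoint and print the fallback counter. Port 9500 and the /metrics path
// are assumptions; adjust to the actual deployment.
package daemon

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

func printMarkUnmountFallbackCounter() error {
	resp, err := http.Get("http://localhost:9500/metrics")
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	sc := bufio.NewScanner(resp.Body)
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "gitpod_ws_daemon_markunmountfallback_active_total") {
			fmt.Println(line) // expect the value to have gone up by one per workspace stop
		}
	}
	return sc.Err()
}
```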