
[ws-manager] Don't stop workspace too early #5688

Closed
geropl wants to merge 1 commit into main from gpl/workspace-stopping

Conversation

@geropl (Member) commented Sep 14, 2021

Description

Related Issue(s)

Fixes #

How to test

Release Notes

@roboquat (Contributor):

@geropl: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@roboquat (Contributor):

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign csweichel after the PR has been reviewed.
You can assign the PR to them by writing /assign @csweichel in a comment when ready.

No associated issue. Update pull-request body to add a reference to an issue, or get approval with /approve no-issue

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

codecov bot commented Sep 14, 2021

Codecov Report

Merging #5688 (9066dc6) into main (b01132a) will increase coverage by 19.34%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff             @@
##             main    #5688       +/-   ##
===========================================
+ Coverage   19.04%   38.39%   +19.34%     
===========================================
  Files           2       12       +10     
  Lines         168     3594     +3426     
===========================================
+ Hits           32     1380     +1348     
- Misses        134     2097     +1963     
- Partials        2      117      +115     
Flag                               Coverage Δ
components-local-app-app-linux     ?
components-local-app-app-windows   ?
components-ws-manager-app          38.39% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files                                         Coverage Δ
components/ws-manager/pkg/manager/monitor.go           8.27% <ø> (ø)
components/local-app/pkg/auth/pkce.go
components/local-app/pkg/auth/auth.go
...s/ws-manager/pkg/manager/internal/grpcpool/pool.go  74.46% <0.00%> (ø)
components/ws-manager/pkg/manager/status.go            73.33% <0.00%> (ø)
components/ws-manager/pkg/manager/annotations.go       65.11% <0.00%> (ø)
components/ws-manager/pkg/manager/manager.go           23.80% <0.00%> (ø)
...-manager/pkg/manager/internal/workpool/workpool.go  100.00% <0.00%> (ø)
components/ws-manager/pkg/manager/manager_ee.go        0.00% <0.00%> (ø)
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b01132a...9066dc6. Read the comment docs.

} else {
	// add an additional wait time on top of a deletionGracePeriod
	// to make sure the changes propagate on the data plane.
	var gracePeriod int64 = 30
geropl (Member, Author):

@csweichel Is there a reason why we no longer wait for terminated but instead have a timeout here? To me this feels overly brittle, and the go func ... time.Sleep might cause out-of-order problems if I'm not mistaken. Might this even be the cause of #5689?
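
To illustrate the concern, here is a minimal, self-contained Go sketch (the workspace type and phase strings are illustrative, not the actual ws-manager code): a fire-and-forget goroutine that sleeps for the grace period and then forces a stop is not ordered with respect to the regular stop path, so the forced transition can arrive after the workspace has already stopped cleanly.

package main

import (
	"fmt"
	"sync"
	"time"
)

// workspace is a stand-in for a workspace status record; only the
// phase matters for demonstrating the ordering problem.
type workspace struct {
	mu    sync.Mutex
	phase string
}

func (ws *workspace) setPhase(p string) {
	ws.mu.Lock()
	defer ws.mu.Unlock()
	fmt.Printf("transition: %s -> %s\n", ws.phase, p)
	ws.phase = p
}

func main() {
	ws := &workspace{phase: "running"}

	// Delayed forced stop, scheduled fire-and-forget as in the
	// "go func ... time.Sleep" pattern under discussion.
	gracePeriod := 50 * time.Millisecond
	go func() {
		time.Sleep(gracePeriod)
		// Nothing orders this with the regular stop path below; it may
		// fire after the workspace has already stopped cleanly.
		ws.setPhase("stopped (forced after grace period)")
	}()

	// Meanwhile the regular path observes the pod terminating and finishes first.
	time.Sleep(10 * time.Millisecond)
	ws.setPhase("stopping")
	time.Sleep(20 * time.Millisecond)
	ws.setPhase("stopped (terminated observed)")

	// Give the delayed goroutine time to fire; the output shows the
	// forced "stopped" transition arriving out of order.
	time.Sleep(100 * time.Millisecond)
}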

Contributor:

This might indeed be the cause. The underlying issue is that the unmount of the mark mount in ws-daemon happens during content finalization. Without this unmount the workspace container never stops, because containerd cannot unmount the rootfs; hence we won't trigger the finalization in the first place.

There are two ways we can break this circle:

  1. the solution that's currently implemented, where ws-manager forces workspace content finalisation once we've exceeded the grace period
  2. a ws-daemon dispatch-based method that functions much like the former containerd workaround, except that it does the unmount rather than forcing a workspace stop on the API plane.

We opted for the former because it was easier/less complex to implement. It would seem that it had other adverse side effects, though.

If we merged the change you're proposing here, we'd be re-introducing the "stuck in stopping" issue if ws-daemon restarts.
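
For illustration, a minimal sketch of option 1 under stated assumptions (the workspace struct, its field names, and maybeFinalize are hypothetical, not the real ws-manager API): prefer the clean path of waiting for the container to report terminated, but force content finalisation once the deletion grace period is exceeded, so a stuck mark-mount unmount cannot keep the workspace in stopping forever.

package main

import (
	"fmt"
	"time"
)

type workspace struct {
	id                  string
	deletionTimestamp   time.Time
	deletionGracePeriod time.Duration
	terminated          bool // whether the container reported "terminated"
	finalized           bool
}

// finalizeContent stands in for triggering ws-daemon's content backup,
// which would also release the mark mount.
func finalizeContent(ws *workspace) {
	ws.finalized = true
	fmt.Printf("workspace %s: content finalized\n", ws.id)
}

// maybeFinalize prefers the clean path (container terminated), but forces
// finalization once the grace period has elapsed so that a rootfs that
// cannot be unmounted does not keep the workspace stuck in stopping.
func maybeFinalize(ws *workspace, now time.Time) {
	switch {
	case ws.finalized:
		return
	case ws.terminated:
		finalizeContent(ws) // happy path: the container stopped cleanly
	case now.After(ws.deletionTimestamp.Add(ws.deletionGracePeriod)):
		fmt.Printf("workspace %s: grace period exceeded, forcing finalization\n", ws.id)
		finalizeContent(ws)
	}
}

func main() {
	ws := &workspace{
		id:                  "ws-1",
		deletionTimestamp:   time.Now(),
		deletionGracePeriod: 30 * time.Second,
	}
	maybeFinalize(ws, ws.deletionTimestamp.Add(10*time.Second)) // within grace period: keep waiting
	maybeFinalize(ws, ws.deletionTimestamp.Add(45*time.Second)) // past grace period: forced
}

The trade-off matches the comment above: forcing from the API plane guarantees progress even if ws-daemon restarts mid-unmount, at the cost of the timing-based behaviour this PR questions.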

stale bot commented Sep 28, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the "meta: stale" label (This issue/PR is stale and will be closed soon) on Sep 28, 2021
@geropl (Member, Author) commented Sep 29, 2021

Superseded by: #5897

geropl closed this on Sep 29, 2021
geropl deleted the gpl/workspace-stopping branch on December 7, 2021