wcow: support graceful termination of servercore containers #1416
Could you clarify what the problem with this is? I would think getting stats on an exited container (so long as it's not cleaned up) would be supported.
I wonder if this is a bug in our cri? Do you know if that's the case?
Did you mean 30 seconds? Do we know if this will impact container users if they rely on this 5 second timeout?
8be7adb to 3d06063
A few notes on Git commit style:
Taken together, you might consider a commit description like the following:
There's a lot of guidance on the internet for how to write a good Git commit. One guide I like is here. Note that here we are not super prescriptive on the exact line length to use. :)
This is probably worth explicitly calling out in the PR/commit description. If a customer (for whatever reason) explicitly depended on this
Actually, @jsturtevant or @marosset, I'm interested in your opinions on whether this will be a problem for customers. This change will fix graceful termination for servercore containers, but could have a negative impact if a customer is relying on the default termination timeout in servercore.
IMO there are a lot of downsides w/ relying on the default termination timeout in servercore (especially for K8s scenarios) and users are unhappy with the current solution.
I am not sure I understand why this is a breaking change. Currently, when crictl.exe stop is specified, the target pid is directly killed (that is, SIGKILL is the only signal sent). The
Sure, I'll keep this in mind, thanks Kevin :) Also, do we follow a template for the PR description section? Does that section look ok?
In Kubernetes it is common for pods to be stopped with a termination grace period.
Discussed this with Kevin today and I understand how this could be a breaking change. I will make sure to call this out specifically.
546f0b0 to 2bc36c2
a92326d to 5127108
5127108 to 8143ca8
…OW containers Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com>
Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com>
Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com>
Fixed lint errors caused by spelling mistakes in hcsdoc_wcow.go and stopcontainer_test.go Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com>
0634782 to 23d8608
fmt.Println("Waiting for OS signal...")

signalChannel := make(chan os.Signal)
wait := make(chan int, 1)
If you are just using a channel to wait on once, you can do `chan struct{}`, and `close(wait)` instead of sending to the channel. If you want to send/receive multiple times, you can still do `chan struct{}` and use `wait <- struct{}{}` to send. This is probably a little more efficient than sending an `int`.
Not a big deal to change it here if you don't want to, just letting you know.
Sure will keep this in mind
23d8608 to 0e7a600
Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com> Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com>
0e7a600 to 26f72d0
LGTM
I've signed off on this PR. If there's any additional testing we can do before merging that would be ideal, though. Maybe you can run the upstream containerd integration tests with an updated shim binary?
Sure will do
A small question, otherwise LGTM
…t#1416) * This commit includes the changes to enable graceful termination of WCOW containers Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com> * Added regression tests for nanoserver and servercore base images Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com> * Worked on Kevin's review comments Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com> * Fixed lint failures Fixed lint errors caused by spelling mistakes in hcsdoc_wcow.go and stopcontainer_test.go Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com> * Addresses Kevin's review comments Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com> Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com> Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com> Co-authored-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com> (cherry picked from commit 5cfbc2a) Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com>
…t#1416) * This commit includes the changes to enable graceful termination of WCOW containers Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com> Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com> Co-authored-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com> (cherry picked from commit 5cfbc2a) Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com>
…t#1416) * This commit includes the changes to enable graceful termination of WCOW containers Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com> * Added regression tests for nanoserver and servercore base images Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com> * Worked on Kevin's review comments Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com> * Fixed lint failures Fixed lint errors caused by spelling mistakes in hcsdoc_wcow.go and stopcontainer_test.go Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com> * Addresses Kevin's review comments Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com> Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com> Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com> Co-authored-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com>
Graceful termination was already supported for nanoserver containers,
but those based on servercore still use a 5 second timeout which is
controlled by Windows. This behavior stems from the different
implementation of SrvEndTask() in the servercore image, which does the
following:
(1) Deliver CTRL_SHUTDOWN_EVENT to the appropriate process.
(2) Wait for 'WaitToKillServiceTimeout' (5 seconds by default) and kill the process before returning to the caller.
As a result, containers with servercore base images are killed after 5 seconds, regardless of the timeout specified with the container stop command.
We fix this by sending the termination signal for WCOW on a
background goroutine and returning immediately, without waiting
for 'WaitToKillServiceTimeout' to expire.
In the future, if the underlying issue in Windows is fixed, this change could be reverted.