Graceful termination fails on terminationGracePeriodSeconds > 2 minutes #31219
Comments
Same issue here on Kubernetes 1.3.3. We have terminationGracePeriodSeconds set to several hours so that data processing can complete successfully. The pod's IP field becomes empty after the pod enters the Terminating state, and the ifconfig command lists only the loopback interface within the container. In Kubernetes 1.2.0, networking remains available while the pod is in the Terminating state.
Yes, the request timeout should be greater than the termination grace period. We should fix this. /cc @kubernetes/sig-node
@pracucci Thanks for reporting the issue with such detailed debugging information. @Random-Liu Could you please take a look?
@yujuhong @dchen1107 @Random-Liu May I take a stab at it?
When terminationGracePeriodSeconds is set to > 2 minutes (which is the default request timeout), ContainerStop() times out at 2 minutes. We should check the timeout being passed in and bump up the request timeout if needed. Fixes kubernetes#31219
@dims no objection from me.
Yeah, this is a bug. @dims Thanks a lot for fixing it! :)
@yujuhong @dchen1107 This was introduced in v1.3. Do we need a cherry-pick?
@dims thanks for fixing the issue. Based on our patch policy, I don't think we are going to cut another patch solely for this bug. But @Random-Liu or @dims, could one of you send a cherry-pick PR against the 1.3 branch, so that we can include the fix if we cut another patch release for 1.3? Also @Random-Liu, can we add a node-e2e test to make sure we don't have such a regression in the future? Thanks!
@dchen1107 @Random-Liu: here's the PR for 1.3: #31279
Automatic merge from submit-queue
Increase request timeout based on termination grace period
When terminationGracePeriodSeconds is set to > 2 minutes (which is the default request timeout), ContainerStop() times out at 2 minutes. We should check the timeout being passed in and bump up the request timeout if needed. Fixes #31219
Kubernetes version: 1.3.5
I have a pod with terminationGracePeriodSeconds set to 600 (10 minutes). When I delete the pod, if it takes more than 2 minutes to shut down, then weird things happen (for example, networking stops working once the pod has been in the Terminating state for 2 minutes). After digging a bit in the Kubernetes sources, I believe I've identified the root cause. Please see my report below.
Extract from the node's syslog:
When a container should be killed, killContainer() (manager.go) is called. At some point it does:
Looking at StopContainer() (kube_docker_client.go) you can see it does:
Now, d.client.ContainerStop() blocks until it completes execution or the input timeout expires (the input timeout is the grace period, set to 600 seconds in my test).
However, the d.client instance has a defaultTimeout of 2 minutes, so if the grace period is > 2 minutes then the ContainerStop() request times out before the grace period elapses. If my analysis is correct, we should set a client timeout a bit higher than the input timeout (grace period) if the latter is > 2 minutes.
What's your take?