
terminationGracePeriodSeconds greater than 10 minutes not working as expected #94435

Closed
wingman-chakra opened this issue Sep 2, 2020 · 19 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@wingman-chakra

What happened:
Pod with termination grace period of 3 hours is getting killed 10 minutes after SIGTERM

What you expected to happen:
I was expecting the pod to get the full 3 hours before SIGKILL is sent.

How to reproduce it (as minimally and precisely as possible):
Run a long-running process with a termination grace period greater than 20 minutes, then delete the pod; it gets killed after 10 minutes.

Anything else we need to know?:
From kubectl get events:
33m Normal ScaleDown pod/jarvis-6f9c9c79d6-d7vhr deleting pod for node scale down
33m Normal Killing pod/jarvis-6f9c9c79d6-d7vhr Stopping container jarvis
23m Warning FailedKillPod pod/jarvis-6f9c9c79d6-d7vhr error killing pod: failed to "KillContainer" for "jarvis" with KillContainerError: "rpc error: code = Unknown desc = operation timeout: context deadline exceeded"

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.0", GitCommit:"e19964183377d0ec2052d1f1fa930c4d7575bd50", GitTreeState:"clean", BuildDate:"2020-08-26T14:30:33Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.10-gke.42", GitCommit:"42bef28c2031a74fc68840fce56834ff7ea08518", GitTreeState:"clean", BuildDate:"2020-06-02T16:07:00Z", GoVersion:"go1.12.12b4", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
    GKE
  • OS (e.g: cat /etc/os-release):
    NAME="Ubuntu"
    VERSION="18.04.5 LTS (Bionic Beaver)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 18.04.5 LTS"
    VERSION_ID="18.04"
    HOME_URL="https://www.ubuntu.com/"
    SUPPORT_URL="https://help.ubuntu.com/"
    BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
    PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
    VERSION_CODENAME=bionic
    UBUNTU_CODENAME=bionic
  • Kernel (e.g. uname -a):
    Linux chakradarraju-lenovo 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:
@wingman-chakra wingman-chakra added the kind/bug Categorizes issue or PR as related to a bug. label Sep 2, 2020
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Sep 2, 2020
@wingman-chakra
Author

/sig cluster-lifecycle

@k8s-ci-robot k8s-ci-robot added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 2, 2020
@neolit123
Member

/remove-sig cluster-lifecycle
/sig scheduling

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. labels Sep 2, 2020
@ahg-g
Member

ahg-g commented Sep 3, 2020

/remove-sig scheduling
/sig node

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Sep 3, 2020
@k8s-ci-robot
Contributor

@ahg-g: The label(s) sig/ cannot be applied, because the repository doesn't have them

In response to this:

/remove-sig scheduling
/sig node

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot removed the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Sep 3, 2020
@ahg-g
Member

ahg-g commented Sep 3, 2020

/sig node

@pacoxu
Member

pacoxu commented Sep 3, 2020

Can you provide the whole pod yaml?

My pod with terminationGracePeriodSeconds: 1800 (30 minutes) is still terminating after 24 minutes.

Also, please check whether your pod is being evicted or deleted by another process. Can you test a pod (command: sleep 3600 with terminationGracePeriodSeconds: 1800)? Is it related to your image (command)?
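For reference, a minimal manifest for the test described above might look like this (the pod and container names and the busybox image are illustrative, not from this thread):

```yaml
# Minimal sketch: a pod that just sleeps, with a 30-minute grace period.
# If this pod also dies ~10 minutes after deletion, the problem is not the
# application image; if it gets the full period, the image/command is suspect.
apiVersion: v1
kind: Pod
metadata:
  name: grace-period-test   # illustrative name
spec:
  terminationGracePeriodSeconds: 1800
  containers:
  - name: sleeper           # illustrative name
    image: busybox
    command: ["sleep", "3600"]
```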

@wingman-chakra
Author

Here is the YAML for the production service that is hitting this issue:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: jarvis
  labels:
    name: jarvis
spec:
  type: ClusterIP
  sessionAffinity: None
  ports:
  - name: grpc
    protocol: TCP
    port: 50051
    targetPort: 50051
  - name: http
    port: 6653
  selector:
    app: jarvis
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jarvis
spec:
  progressDeadlineSeconds: 10800
  replicas: 3
  selector:
    matchLabels:
      app: jarvis
  template:
    metadata:
      name: jarvis
      labels:
        app: jarvis
    spec:
      nodeSelector:
        pool: default
      terminationGracePeriodSeconds: 10800
      containers:
      - name: jarvis
        image: <insert_python_long_running_image>
        imagePullPolicy: Always
        env:
        - name: ENV
          value: k8s-prod
        resources:
          requests:
            cpu: "900m"
            memory: "4000Mi"
          limits:
            cpu: "1"
            memory: "5000Mi"
---
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: jarvis
spec:
  minReplicas: 3
  maxReplicas: 25
  metrics:
  - type: External
    external:
      metricName: kubernetes.io|node|cpu|total_cores
      metricSelector:
        matchLabels:
          metadata.user_labels.pool: large-recorder
          resource.labels.cluster_name: prod
      targetAverageValue: "12"
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: jarvis
```

I tried to recreate this with minimal config, but this seems to work fine:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kill-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: kill-test
  template:
    metadata:
      name: kill-test
      labels:
        app: kill-test
    spec:
      nodeSelector:
        pool: default
      containers:
      - name: kill-test
        image: node
        command:
        - node
        - -e
        - "process.on('SIGTERM', () => { console.log('caught'); setTimeout(() => {console.log('Done'); process.exit(0);}, 11 * 60 * 1000); }); setInterval(() => console.log('beep'), 60 * 1000);"
      terminationGracePeriodSeconds: 1800
```

I did see "Warning FailedKillPod" in my minimal setup too, so that is a red herring; sorry for that.

Let me know how I can go about debugging why the pod is getting killed 10 minutes after SIGTERM, even though I've configured it to wait 3 hours.

@pacoxu
Member

pacoxu commented Sep 3, 2020

Was the pod killed by the HPA with a grace period of 10 minutes?
@wingman-chakra

I will test with HPA later.

@wingman-chakra
Author

Yes, with HPA it gets killed in 10 minutes. I just noticed it does not happen consistently.

@srikaratstrings

We did some digging. It looks like the nodes were being removed by the cluster autoscaler, which only allows a maximum of 10 minutes for pods to shut down gracefully. From reading through, it sounds like there is no way to configure this limit.
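(For readers hitting the same limit: the open-source cluster-autoscaler exposes this timeout as the --max-graceful-termination-sec flag, which defaults to 600 seconds; GKE's built-in autoscaler is managed and does not let you set it, which matches the observation above. The sketch below assumes a self-managed autoscaler Deployment; the image tag and surrounding fields are illustrative.)

```yaml
# Hypothetical args snippet for a self-managed cluster-autoscaler Deployment.
spec:
  containers:
  - name: cluster-autoscaler
    image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.19.0  # illustrative tag
    command:
    - ./cluster-autoscaler
    - --cloud-provider=gce
    - --max-graceful-termination-sec=10800   # default is 600 (10 minutes)
```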

@pacoxu
Member

pacoxu commented Sep 4, 2020

Not sure whether https://predictive-horizontal-pod-autoscaler.readthedocs.io/en/latest/user-guide/downscale-stabilization/ would help.

HPA supports a cooldown delay:
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-cooldown-delay

Note: When tuning these parameter values, a cluster operator should be aware of the possible consequences. If the delay (cooldown) value is set too long, there could be complaints that the Horizontal Pod Autoscaler is not responsive to workload changes. However, if the delay value is set too short, the scale of the replicas set may keep thrashing as usual.
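(A hedged illustration of the downscale stabilization the linked docs describe, not taken from this thread: on clusters that offer autoscaling/v2beta2 it can be set per-HPA via the behavior field; on older clusters it is the kube-controller-manager flag --horizontal-pod-autoscaler-downscale-stabilization. Values below are examples only.)

```yaml
# Example only: delay scale-down decisions by 10 minutes for this HPA.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: jarvis
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: jarvis
  minReplicas: 3
  maxReplicas: 25
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600   # wait 10 minutes before scaling down
```

Note that this only delays the scale-down decision; it does not extend the grace period once a pod is actually being deleted.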

@shapeofarchitect

shapeofarchitect commented Oct 9, 2020

@wingman-chakra @pacoxu I experienced the exact same issue in our clusters too. We run on Azure AKS and run our automated builds in pods. One of our QA long-running tests was set to run for 3 hours, so I set the pods' grace period to 3 hours (10800), but at roughly 1 hour and 7 or 8 minutes Kubernetes sends SIGTERM and the pod is gracefully terminated.

In my case we are using the Selenium Chrome image from https://github.com/SeleniumHQ/docker-selenium. It runs supervisord as the main process, which receives the SIGTERM issued by the container runtime (kubelet). Despite the grace period I set, the process gets terminated; below is the script that runs at that time.

https://github.com/SeleniumHQ/docker-selenium/blob/1a3b0e1cd6d9eb3f2d3b91a5f26e160ab50fcd6b/Video/entry_point.sh

We could never run this for longer than 1 hour and 10 minutes at most either. We also don't use HPA in our setup, so I assume that even in the normal case pods can't run for more than an hour and some minutes? Is that the correct assumption here?

Appreciate your thoughts!

13:24:39.860 INFO [ActiveSessionFactory.lambda$apply$11] - Matched factory org.openqa.selenium.grid.session.remote.ServicedSession$Factory (provider: org.openqa.selenium.chrome.ChromeDriverService)
Starting ChromeDriver 85.0.4183.83 (94abc2237ae0c9a4cb5f035431c8adfb94324633-refs/branch-heads/4183@{#1658}) on port 12330
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
[1602249879.880][SEVERE]: bind() failed: Cannot assign requested address (99)
ChromeDriver was started successfully.
13:24:40.501 INFO [ProtocolHandshake.createSession] - Detected dialect: W3C
13:24:40.503 INFO [RemoteSession$Factory.lambda$performHandshake$0] - Started new session 090f50045c92aa047ff05f3bc26bcebc (org.openqa.selenium.chrome.ChromeDriverService)
Trapped SIGTERM/SIGINT/x so shutting down supervisord...
2020-10-09 13:25:03,568 WARN received SIGTERM indicating exit request
2020-10-09 13:25:03,569 INFO waiting for xvfb, selenium-standalone to die
2020-10-09 13:25:03,570 INFO stopped: selenium-standalone (terminated by SIGTERM)
2020-10-09 13:25:03,570 INFO stopped: xvfb (terminated by SIGTERM)
Shutdown complete

@srikaratstrings

srikaratstrings commented Oct 10, 2020 via email

@pacoxu
Member

pacoxu commented Oct 10, 2020

@shapeofarchitect It seems that supervisord sends the kill signal to all its subprocesses, and all processes are killed gracefully after 70 minutes.

If you want more time, you could try adding a preStop hook with 'sleep infinity' so that the 3-hour termination grace period takes effect.
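A sketch of that suggestion (the container name and image placeholder come from the earlier manifest; the hook itself is not in the original thread and assumes the image has a shell and a sleep that accepts "infinity"):

```yaml
# preStop runs before SIGTERM is delivered to the main process; an effectively
# endless sleep makes the kubelet wait out the full terminationGracePeriodSeconds
# before killing the container.
spec:
  terminationGracePeriodSeconds: 10800
  containers:
  - name: jarvis
    image: <insert_python_long_running_image>
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep infinity"]
```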

@shapeofarchitect

shapeofarchitect commented Oct 10, 2020

Unfortunately, in my case I can't directly use a preStop hook, as I have to use this as a services container in our GitLab pipelines. It also seems that I can't edit the pod YAML of the running container to add lifecycle hooks.

But I am still unsure why the API server/kubelet even sends SIGTERM. What if I just don't want the container to die at all?

So I'd appreciate any other suggestions to move this forward.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 8, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 7, 2021
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
