Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

should not override KillPodOptions.PodTerminationGracePeriodSecondsOverride by pod spec.TerminationGracePeriodSeconds #109412

Closed
wants to merge 1 commit into from

Conversation

249043822
Copy link
Member

What type of PR is this?

/kind bug

What this PR does / why we need it:

Since the gracePeriodOverride in KillContainer is only used for killing container, not contains the preStop time, so we should not always set this param in podworker, because we may also lose the probe.TerminationGracePeriodSeconds.

if gracePeriodOverride != nil {

  1. pod deletion, TerminationGracePeriodSeconds is not correct
apiVersion: v1
kind: ReplicationController
metadata:
  name: busytest
  namespace: default
  labels:
    app: rctest
spec:
  replicas: 1
  selector:
    app: busytest
  template:
    metadata:
      labels:
        app: busytest
    spec:
      containers:
      - name: busy01
        image: docker.artsz.zte.com.cn/cci/usee/test/busybox:v1.0
        command:
        - sh
        - -c
        - sleep 3600s
        lifecycle:
            preStop:
              exec:
                command: [ "/bin/sleep", "10" ]
      terminationGracePeriodSeconds: 30
I0410 23:13:46.482969   25118 kuberuntime_container.go:614] "PreStop hook completed" pod="default/busytest-b6cbn" podUID=d9f1f5e4-5b3e-4661-8702-2f2f94cd70a5 containerName="busy01" containerID="containerd://87db50c7e467afff05744465b5f7fc036d7c1d6b0861c2480160eff053be5a8a"
I0410 23:13:46.483034   25118 kuberuntime_container.go:724] "Killing container with a grace period override" pod="default/busytest-b6cbn" podUID=d9f1f5e4-5b3e-4661-8702-2f2f94cd70a5 containerName="busy01" containerID="containerd://87db50c7e467afff05744465b5f7fc036d7c1d6b0861c2480160eff053be5a8a" gracePeriod=30
I0410 23:13:46.483060   25118 kuberuntime_container.go:728] "Killing container with a grace period" pod="default/busytest-b6cbn" podUID=d9f1f5e4-5b3e-4661-8702-2f2f94cd70a5 containerName="busy01" containerID="containerd://87db50c7e467afff05744465b5f7fc036d7c1d6b0861c2480160eff053be5a8a" gracePeriod=30

terminationGracePeriodSeconds is 30, and the hook takes 10 seconds to complete, and the Container should take 20 seconds to stop, but the log shows 30

  1. pod eviction, gracePeriodOverride is not correct
kind: Pod
apiVersion: v1
metadata:
  name: evictme
spec:
  restartPolicy: Never
  terminationGracePeriodSeconds: 10
  containers:
  - name: busybox
    image: k8s.gcr.io/e2e-test-images/nginx:1.14-2
    command: ["sh", "-c", "fallocate -l 4G file; sleep 100000"]
I0411 17:47:31.238192   23726 eviction_manager.go:350] "Eviction manager: must evict pod(s) to reclaim" resourceName="ephemeral-storage"
I0411 17:47:31.238827   23726 eviction_manager.go:368] "Eviction manager: pods ranked for eviction" pods=[default/evictme default/csi-hostpathplugin-0 default/nfs-server-674bcc7996-hsq7z kube-system/kube-controller-manager-10.235.30.82 kube-system/kube-apiserver-10.235.30.82 kube-system/kube-proxy-10.235.30.82 kube-system/kube-scheduler-10.235.30.82 default/my-csi-app kube-system/calico-node-7sdl5 kube-system/calico-kube-controllers-6cdf7cf687-89q4d kube-system/coredns-86c46b7bb8-9jv7x]
I0411 17:47:31.239343   23726 kuberuntime_container.go:728] "Killing container with a grace period override" pod="default/evictme" podUID=486791a9-9d06-4c08-8930-3583b5f34192 containerName="busybox" containerID="containerd://934833bdd381ad34f6a3bf6f25360c0888649cb1abef1496dd39db40b7e7edca" gracePeriod=10
I0411 17:47:31.239359   23726 kuberuntime_container.go:728] "Killing container with a grace period" pod="default/evictme" podUID=486791a9-9d06-4c08-8930-3583b5f34192 containerName="busybox" containerID="containerd://934833bdd381ad34f6a3bf6f25360c0888649cb1abef1496dd39db40b7e7edca" gracePeriod=10

pod is evicted, it should be killed with PodTerminationGracePeriodSecondsOverride 0, but the log shows 10

Which issue(s) this PR fixes:

Fixes ##109352

Special notes for your reviewer:

/cc @smarterclayton @rphillips

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


…erride by pod spec.TerminationGracePeriodSeconds
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 11, 2022
@249043822
Copy link
Member Author

/retest

return gracePeriod, status.gracePeriod != 0 && status.gracePeriod != gracePeriod
}
}
// this value is bedrock truth - the apiserver owns telling us this value calculated by apiserver
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the apiserver owners tell us?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Contributor

@yangjunmyfm192085 yangjunmyfm192085 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 12, 2022
@ehashman
Copy link
Member

/hold

There have been a bunch of issues in flight in this area over the past few releases... wanted to link #98507 #107893 #102025

/cc @smarterclayton

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 12, 2022
@ehashman ehashman added this to Triage in SIG Node PR Triage Apr 12, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: 249043822, yanghesong
To complete the pull request process, please assign derekwaynecarr after the PR has been reviewed.
You can assign the PR to them by writing /assign @derekwaynecarr in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@endocrimes
Copy link
Member

/cc @rphillips

@pacoxu pacoxu moved this from Triage to Waiting on Author in SIG Node PR Triage Jun 20, 2022
@249043822 249043822 moved this from Waiting on Author to Needs Reviewer in SIG Node PR Triage Jun 24, 2022
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 2, 2022
@249043822
Copy link
Member Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 3, 2022
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 1, 2022
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 1, 2022
@tgoodsell-tempus
Copy link

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Dec 15, 2022
@tgoodsell-tempus
Copy link

/cc @249043822

Do you have any insight on where this is sitting with the sig-node team?

@k8s-ci-robot
Copy link
Contributor

@tgoodsell-tempus: GitHub didn't allow me to request PR reviews from the following users: 249043822.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @249043822

Do you have any insight on where this is sitting with the sig-node team?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bart0sh bart0sh moved this from Needs Approver to Triage in SIG Node PR Triage Dec 16, 2022
@bart0sh
Copy link
Contributor

bart0sh commented Dec 20, 2022

/priority important-longterm
/triage accepted
/assign @bobbypage

@k8s-ci-robot k8s-ci-robot added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 20, 2022
@bart0sh bart0sh moved this from Triage to Needs Reviewer in SIG Node PR Triage Dec 20, 2022
@aoxn
Copy link
Contributor

aoxn commented Mar 3, 2023

@smarterclayton GracePeriod was override by default and executePreStopHook time was not respected. Thus preStopHook execution time is not included in the gracefulTerminationSeconds. It is unexpected.
image

image

@bart0sh
Copy link
Contributor

bart0sh commented Mar 10, 2023

/assign @smarterclayton

@SergeyKanzhelev SergeyKanzhelev moved this from Needs Reviewer to Needs Approver in SIG Node PR Triage Mar 10, 2023
// we should keep options.PodTerminationGracePeriodSecondsOverride as it is
// since if options.PodTerminationGracePeriodSecondsOverride is nil, the killContainer interface
// of kuberuntime_container will still calculate a effective grace period for this container termination
return gracePeriod, status.gracePeriod != 0 && status.gracePeriod != gracePeriod
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check for gracePeriod < 1 here?

@@ -681,9 +681,6 @@ func (p *podWorkers) UpdatePod(options UpdatePodOptions) {

wasGracePeriodShortened = gracePeriodShortened
status.gracePeriod = gracePeriod
// always set the grace period for syncTerminatingPod so we don't have to recalculate,
// will never be zero.
options.KillPodOptions.PodTerminationGracePeriodSecondsOverride = &gracePeriod
Copy link

@srl11 srl11 Jun 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep this ?
Correct me if I'm wrong: It seems that the gracePeriod in killContainer is valued by options.KillPodOptions.PodTerminationGracePeriodSecondsOverride.
Is that beeter to keep having options.KillPodOptions.PodTerminationGracePeriodSecondsOverride set by the effective gracePeriod here?

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 20, 2024
@249043822 249043822 closed this Feb 1, 2024
SIG Node PR Triage automation moved this from Needs Approver to Done Feb 1, 2024
@robert-heinzmann-logmein

Is this issue addressed somewhere / somehow else ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet