
Force deleting pods not working after deletionGracePeriodSeconds set to a negative value #98506

Closed · Fixed by #98866
jqmichael opened this issue Jan 28, 2021 · 14 comments
Labels: kind/bug, lifecycle/rotten, sig/api-machinery, triage/accepted

jqmichael (Contributor) commented Jan 28, 2021

What happened:
Some k8s client tried to evict a pod by setting a negative value for deletionGracePeriodSeconds. All subsequent delete calls (including force-delete calls) made by the kubelet, the pod garbage collector, etc. were then unable to delete that pod, because the apiserver does not allow deletionGracePeriodSeconds to be increased once it is set. A force-delete (which sets it to 0) therefore has no effect when the persisted value is already negative. Not even kubectl can issue a delete with a negative grace period, but a curl request with an even more negative grace-period value worked (see the example below).
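
Roughly, that workaround looked like the request below (server address, namespace, pod name, the exact negative value, and the omitted auth flags are all placeholders, not a transcript of the actual session):

# A force delete (gracePeriodSeconds: 0) is ignored because 0 is not lower than the
# persisted negative value, but an even more negative grace period goes through.
curl -k -X DELETE \
  -H "Content-Type: application/json" \
  -d '{"gracePeriodSeconds": -100}' \
  "https://<apiserver>/api/v1/namespaces/<namespace>/pods/<pod-name>"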

What you expected to happen:
force-delete works.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:
I think the issue is in the code below:

	if objectMeta.GetDeletionTimestamp() != nil {
		// if we are already being deleted, we may only shorten the deletion grace period
		// this means the object was gracefully deleted previously but deletionGracePeriodSeconds was not set,
		// so we force deletion immediately
		// IMPORTANT:
		// The deletion operation happens in two phases.
		// 1. Update to set DeletionGracePeriodSeconds and DeletionTimestamp
		// 2. Delete the object from storage.
		// If the update succeeds, but the delete fails (network error, internal storage error, etc.),
		// a resource was previously left in a state that was non-recoverable.  We
		// check if the existing stored resource has a grace period as 0 and if so
		// attempt to delete immediately in order to recover from this scenario.
		if objectMeta.GetDeletionGracePeriodSeconds() == nil || *objectMeta.GetDeletionGracePeriodSeconds() == 0 {
			return false, false, nil
		}
		// only a shorter grace period may be provided by a user
		if options.GracePeriodSeconds != nil {
			period := int64(*options.GracePeriodSeconds)
			if period >= *objectMeta.GetDeletionGracePeriodSeconds() {
				return false, true, nil
			}

https://github.com/kubernetes/apiserver/blame/d4c9a195921609cf81e3e950beaf246f934e0f4c/pkg/registry/rest/delete.go#L96-L110

As the comment suggests, in phase 1 the DeletionGracePeriodSeconds was set to a negative value (meaning the DeletionTimestamp was in the past). In phase 2, options.GracePeriodSeconds then has to be set even lower; otherwise the request satisfies period >= *objectMeta.GetDeletionGracePeriodSeconds() (e.g. 0 >= -1 for a force delete), is treated as a pending graceful deletion, and the immediate delete never happens.

I think there are two possible fixes.

  1. Do not allow setting DeletionTimestamp in the past (i.e. DeletionGracePeriodSeconds must be non-negative). In that case, we could convert a negative options.GracePeriodSeconds to 0.
  2. Accept that DeletionTimestamp can be set in the past. In that case, in phase 2, treat a negative objectMeta.GetDeletionGracePeriodSeconds() the same as 0 and allow the immediate delete (see the sketch below).
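
A minimal sketch of option 2, assuming the change stays inside the early-return of the snippet quoted above (illustrative only, not a claim about how the real fix is written):

	// Treat a persisted negative grace period the same as 0: the object is already
	// past its deletion deadline, so a later force delete proceeds immediately.
	if objectMeta.GetDeletionGracePeriodSeconds() == nil || *objectMeta.GetDeletionGracePeriodSeconds() <= 0 {
		return false, false, nil
	}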

Thoughts?

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:
jqmichael added the kind/bug label on Jan 28, 2021
k8s-ci-robot added the needs-sig and needs-triage labels on Jan 28, 2021
jqmichael (Contributor, Author) commented Jan 28, 2021

/sig api-machinery

@smarterclayton

k8s-ci-robot (Contributor) commented:

@jqmichael: The label(s) sig/ cannot be applied, because the repository doesn't have them

In response to this:

/sig api-machinery
@smarterclayton

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added the sig/api-machinery label and removed the needs-sig label on Jan 28, 2021
wzshiming (Member) commented:

/assign

jqmichael changed the title from "Force deleting pods not working with deletionGracePeriodSeconds set to negative value previously" to "Force deleting pods not working after deletionGracePeriodSeconds set to a negative value" on Jan 28, 2021
fedebongio (Contributor) commented:

/cc @jiahuif
/triage accepted

k8s-ci-robot added the triage/accepted label and removed the needs-triage label on Feb 4, 2021
smarterclayton (Contributor) commented:

No component in the system should ever treat the timestamp of DeletionTimestamp as authoritative. In Kube, components are required to calculate the time period from when they observe the event in their local clock (i.e. 30s from now), not use the deletion timestamp.
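
For illustration, a component following that convention computes its own deadline from the local clock (sketch only; pod here is assumed to be a *v1.Pod already observed as deleting, and the 30-second fallback is an arbitrary assumption):

	// Derive the deadline from this component's clock plus the observed grace period,
	// rather than trusting the object's DeletionTimestamp.
	grace := 30 * time.Second // fallback when no grace period is recorded
	if pod.DeletionGracePeriodSeconds != nil {
		grace = time.Duration(*pod.DeletionGracePeriodSeconds) * time.Second
	}
	deadline := time.Now().Add(grace)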

A negative value of deletion grace period should be rejected by the apiserver as an invalid value (bad request / invalid field).

liggitt (Member) commented Feb 7, 2021

A negative value of deletion grace period should be rejected by the apiserver as an invalid value (bad request / invalid field).

attempts to take a missing or positive deletion grace period negative should be rejected, I agree.

if a negative value is already persisted, I'd be inclined to allow a negative value in an update and clip to 0

michaelgugino (Contributor) commented:

As I mentioned in the PR, eviction is passing negatives through to the store: https://github.com/kubernetes/kubernetes/blob/master/pkg/registry/core/pod/storage/eviction.go#L160

In kubectl, we specify -1 as the default value for GracePeriod. In the case of delete, we simply don't set the value in the deleteOptions we send to the API. In the case of eviction, we send the -1 value, and that gets passed along to the store.

liggitt (Member) commented Feb 23, 2021

In kubectl, we specify -1 as the default value for GracePeriod. In the case of delete, we simply don't set the value in the deleteOptions we send to the API. In the case of eviction, we send the -1 value, and that gets passed along to the store.

I don't see the -1 getting sent to the eviction API. It looks like values < 0 result in a nil value being sent to the API:

deleteOptions := metav1.DeleteOptions{}
if d.GracePeriodSeconds >= 0 {
	gracePeriodSeconds := int64(d.GracePeriodSeconds)
	deleteOptions.GracePeriodSeconds = &gracePeriodSeconds
}

I would not expect the API to accept a negative value as-is.

michaelgugino (Contributor) commented:

I would not expect the API to accept a negative value as-is.

Looks like you are correct. Do the deletionOptions go through their own validation or through some kind of validation for the evict API? The original bug report mentions 'evict' though it's not clear if the eviction API was used or not.

liggitt (Member) commented Feb 23, 2021

Do the deletionOptions go through their own validation or through some kind of validation for the evict API? The original bug report mentions 'evict' though it's not clear if the eviction API was used or not.

ValidateDeleteOptions is called inside pod deletion, but did not check that the field was non-negative. I don't see any eviction-specific validation.
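
For illustration only, the missing check would look roughly like this (a hypothetical helper, not the change that actually landed; assumes metav1 = k8s.io/apimachinery/pkg/apis/meta/v1 and field = k8s.io/apimachinery/pkg/util/validation/field):

// Reject a negative grace period at validation time instead of persisting it.
func validateGracePeriodSeconds(options *metav1.DeleteOptions, fldPath *field.Path) field.ErrorList {
	allErrs := field.ErrorList{}
	if options.GracePeriodSeconds != nil && *options.GracePeriodSeconds < 0 {
		allErrs = append(allErrs, field.Invalid(fldPath.Child("gracePeriodSeconds"),
			*options.GracePeriodSeconds, "must be a non-negative integer"))
	}
	return allErrs
}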

fejta-bot commented:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on May 24, 2021
fejta-bot commented:

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Jun 23, 2021
wzshiming (Member) commented:

/lifecycle rotten

caozhuozi commented Jan 3, 2024

/lifecycle rotten

@wzshiming I'm investigating deletionGracePeriodSeconds and deletionTimestamp. Surprised to see you here...
