Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that DaemonSet respects termination #51279

Merged

Conversation

kow3ns
Copy link
Member

@kow3ns kow3ns commented Aug 24, 2017

What this PR does / why we need it:
#43077 correctly prevents the DaemonSet controller from adopting deleted Pods, but, as pointed out in #50477, the controller now has no sensitivity to the termination lifecycle (i.e TerminationGracePeriodSeconds) of the Pods it creates. This PR attempts to balance the two. DaemonSet controller will now consider deleted Pods owned by a DaemonSet during creation, but it will not consider deleted Pods as targets for adoption.

fixes #50477

Fix a problem of not respecting TerminationGracePeriodSeconds of the Pods created by DaemonSet controller.

@kow3ns kow3ns requested a review from janetkuo August 24, 2017 17:02
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 24, 2017
@kow3ns kow3ns requested a review from enisoc August 24, 2017 17:02
@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Aug 24, 2017
@kow3ns kow3ns changed the title Ensure that DaemonSet respect termination Ensure that DaemonSet respects termination Aug 24, 2017
@kow3ns
Copy link
Member Author

kow3ns commented Aug 24, 2017

/retest

@kow3ns kow3ns force-pushed the daemonset-respects-termination branch from 2fc0eae to f2c3153 Compare August 24, 2017 17:51
@enisoc
Copy link
Member

enisoc commented Aug 24, 2017

The purpose of #43077 was not to prevent adoption of terminating Pods. That was accomplished independently with #44150, which means you shouldn't need to make any changes to getDaemonPods.

@kow3ns
Copy link
Member Author

kow3ns commented Aug 24, 2017

@enisoc #43077 doesn't allow any Pods with a DeletionTimestamp to be considered with by ControllerRefManager. #44150 filters Pods with a DeletionTimestamp, both are referenced in #44144, which is the actual issue. #43077 introduces the code that makes the test in this PR fail. This line makes it impossible for the DaemonSet Controller to be responsive to terminated Pods. If you're claiming this is unnecessary, I can buy that, but we most certainly need to change getDaemonPods to ensure that main loop actually sees terminated Pods.

@kow3ns
Copy link
Member Author

kow3ns commented Aug 24, 2017

/retest

@kow3ns
Copy link
Member Author

kow3ns commented Aug 24, 2017

/test pull-kubernetes-unit

@enisoc
Copy link
Member

enisoc commented Aug 24, 2017

The logic added in #44150 only applies to orphans. ClaimPods() will already include terminating Pods in its result, as long as you own them:

if controllerRef != nil {
if controllerRef.UID != m.Controller.GetUID() {
// Owned by someone else. Ignore.
return false, nil
}
if match(obj) {
// We already own it and the selector matches.
// Return true (successfully claimed) before checking deletion timestamp.
// We're still allowed to claim things we already own while being deleted
// because doing so requires taking no actions.
return true, nil
}

So your change to add back ownedAndDeleted Pods is unnecessary. They are already in claimed.

@kow3ns
Copy link
Member Author

kow3ns commented Aug 24, 2017

I see so the code introduced in #43077 is completely made useless by #44150. After this PR ControllerRefManager does the right thing. That makes this PR shorter.

@kow3ns kow3ns force-pushed the daemonset-respects-termination branch from f2c3153 to 4248f2a Compare August 24, 2017 21:09
@enisoc
Copy link
Member

enisoc commented Aug 24, 2017

Do we also need to bring back protection against double deletion?

https://github.com/kubernetes/kubernetes/pull/43077/files#diff-cb9055723bef8b239f9df77ac704ffacL556

Your PR is now basically a straight revert of #43077. I still think they intended that to have some effect unrelated to the problem of adopting terminating orphans (which they themselves fixed separately in #44150). We should make sure we understand what problem #43077 was trying to fix before we revert it.

cc @lukaszo @Kargakis

@@ -1816,12 +1831,6 @@ func TestGetNodesToDaemonPods(t *testing.T) {
newPod("non-matching-owned-0-", "node-0", simpleDaemonSetLabel2, ds),
newPod("non-matching-orphan-1-", "node-1", simpleDaemonSetLabel2, nil),
newPod("matching-owned-by-other-0-", "node-0", simpleDaemonSetLabel, ds2),
func() *v1.Pod {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move this up to wantedPods so we are clear to future readers that this behavior is required.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

@enisoc
Copy link
Member

enisoc commented Aug 25, 2017

@kow3ns It seems like the intent of #43077 was to omit terminating-but-owned Pods from the "available" count. However, for some reason no test case was added for such a Pod:

https://github.com/kubernetes/kubernetes/pull/43077/files#diff-7f51010c4c3f3050dcce5c9d916fc3eeR138

Could you add a test case for a terminating-but-owned Pod in that function? I expect it will fail now that you've reverted the change from #43077. To avoid regressing on the bug that was meant to fix, we may have to add a new, more targeted fix in getUnavailableNumbers().

Also, you mentioned offline that you brought back the protection against double deletion. Did you not push that commit yet?

@janetkuo janetkuo self-assigned this Aug 25, 2017
manager.dsStore.Add(ds)

syncAndValidateDaemonSets(t, manager, ds, podControl, 1, 0, 0)
}
}

// DaemonSet should launch a pod on a not ready node with taint notReady:NoExecute.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment needs to be updated

@janetkuo
Copy link
Member

janetkuo commented Aug 25, 2017

I suggest we revert #43077 but keep the test cases. If we want to count terminating pods as unavailable (what #43077 originally tried to fix), we should update below code to not increment numberAvailable when the pod is terminating:

if podutil.IsPodAvailable(pod, ds.Spec.MinReadySeconds, metav1.Now()) {
numberAvailable++
}

I originally suggested that IsPodAvailable should count terminating pods as unavailable, but the decision made then was to let controllers to decide whether to filter out terminating pods or not:
https://github.com/kubernetes/kubernetes/pull/32771/files#diff-473b59d03b4c3dc229240ad9fb29c1edR84

We should probably revisit IsPodAvailable if we think it's a common requirement for all controllers to determine availability of pods.

@kow3ns
Copy link
Member Author

kow3ns commented Aug 29, 2017

/retest

@lukaszo
Copy link
Contributor

lukaszo commented Aug 29, 2017

The goal of #43077 was to make daemon controller behavior similar to replicaset controller where pods marked for deletion are also ignored.
If the behavior is wrong then probably ReplicaSet controller also should be fixed.

@lukaszo
Copy link
Contributor

lukaszo commented Aug 29, 2017

However, for some reason no test case was added for such a Pod:

@enisoc it's handled here https://github.com/kubernetes/kubernetes/pull/43077/files#diff-0dc431d36cd3969089646d1bd4640d1eR1270

@janetkuo
Copy link
Member

janetkuo commented Aug 29, 2017

The goal of #43077 was to make daemon controller behavior similar to replicaset controller where pods marked for deletion are also ignored.
If the behavior is wrong then probably ReplicaSet controller also should be fixed.

For DaemonSets:

@kow3ns
Copy link
Member Author

kow3ns commented Aug 29, 2017

/retest

1 similar comment
@kow3ns
Copy link
Member Author

kow3ns commented Aug 30, 2017

/retest

@enisoc
Copy link
Member

enisoc commented Aug 30, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 30, 2017
// Skip terminating pods
if pod.DeletionTimestamp != nil {
continue
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add these back to dsc.manage() (where it's deleted)?

… while waiting for a Pod that it has previously created to terminate.
@kow3ns kow3ns force-pushed the daemonset-respects-termination branch from e892999 to 8ad18bf Compare August 31, 2017 17:29
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 31, 2017
@kow3ns
Copy link
Member Author

kow3ns commented Aug 31, 2017

@janetkuo @enisoc @Kargakis are we good with this?

@enisoc
Copy link
Member

enisoc commented Aug 31, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 31, 2017
@0xmichalis
Copy link
Contributor

/approve

@janetkuo
Copy link
Member

/lgtm

@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enisoc, janetkuo, kargakis, kow3ns

Associated issue: 43077

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 31, 2017
@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 51628, 51637, 51490, 51279, 51302)

@janetkuo
Copy link
Member

janetkuo commented Dec 6, 2017

Since the regression was introduced in 1.7, we should probably cherrypick this one into 1.7

k8s-github-robot pushed a commit that referenced this pull request Dec 15, 2017
…79-upstream-release-1.7

Automatic merge from submit-queue.

Automated cherry pick of #51279 upstream release 1.7

Cherrypick #51279 to 1.7 
Fixes #56653
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DaemonSet controller start new pod when the old pod is still in Terminating state
8 participants