Keep track of remaining pods when a node is deleted #93938

alculquicondor · 2020-08-12T17:30:45Z

What type of PR is this?

/kind bug

What this PR does / why we need it:

The apiserver is expected to send pod deletion events, which might arrive at a different time than the node deletion event. Moreover, a node could sometimes be recreated without its pods being deleted.
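A minimal sketch of the bookkeeping described above, assuming simplified stand-in types: `Node`, `Pod`, `nodeInfo` and `cache` are hypothetical and are not the real `v1.Node`, `v1.Pod` or `framework.NodeInfo`. Removing a node clears its node object but keeps the entry while pods remain, so a node recreated with the same name inherits those pods. The sketch also assumes the leftover entry is dropped once its last pod is removed.

```go
package main

// Hypothetical stand-ins for v1.Node and v1.Pod.
type Node struct{ Name string }

type Pod struct {
	Name     string
	NodeName string
}

// nodeInfo aggregates what the cache knows about one node name.
// node is nil after the Node object is deleted while pods remain.
type nodeInfo struct {
	node *Node
	pods map[string]*Pod
}

type cache struct {
	nodes map[string]*nodeInfo
}

func newCache() *cache { return &cache{nodes: map[string]*nodeInfo{}} }

// getOrCreate returns the entry for name, creating an empty one if needed.
func (c *cache) getOrCreate(name string) *nodeInfo {
	n, ok := c.nodes[name]
	if !ok {
		n = &nodeInfo{pods: map[string]*Pod{}}
		c.nodes[name] = n
	}
	return n
}

// AddNode reuses an existing entry, so a recreated node keeps the pods
// whose deletion events have not arrived yet.
func (c *cache) AddNode(node *Node) {
	c.getOrCreate(node.Name).node = node
}

// RemoveNode clears the node object but only drops the entry when no pods
// reference it: pod events come from a different watch and may lag behind.
func (c *cache) RemoveNode(name string) {
	n, ok := c.nodes[name]
	if !ok {
		return
	}
	n.node = nil
	if len(n.pods) == 0 {
		delete(c.nodes, name)
	}
}

func (c *cache) AddPod(p *Pod) {
	c.getOrCreate(p.NodeName).pods[p.Name] = p
}

// RemovePod deletes the pod and garbage-collects a "phantom" entry whose
// node object is gone and which has no pods left.
func (c *cache) RemovePod(p *Pod) {
	n, ok := c.nodes[p.NodeName]
	if !ok {
		return
	}
	delete(n.pods, p.Name)
	if len(n.pods) == 0 && n.node == nil {
		delete(c.nodes, p.NodeName)
	}
}
```

In this sketch, deleting node `n1` and recreating it before the pod deletion event arrives leaves the pod accounted for on the new `n1`.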

Special notes for your reviewer:

This is a partial revert of #86964 and #89908

Since then, we have been more careful about direct usage of the map of nodes. In particular:

- there are no remaining uses of GetNodeInfo
- we skip nil node objects during snapshotting
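Skipping nil node objects during snapshotting could look roughly like this. It is a sketch assuming a hypothetical `nodeInfo` with a `node` pointer that is nil for a deleted node whose pods are still pending deletion; the function name `snapshotNodeList` is illustrative, not the scheduler's API.

```go
package main

// Hypothetical stand-in for v1.Node.
type Node struct{ Name string }

// nodeInfo is a simplified cache entry; node is nil for a "phantom" entry.
type nodeInfo struct {
	node     *Node
	podCount int
}

// snapshotNodeList returns only entries that still have a Node object,
// so scheduling code never receives a nil node.
func snapshotNodeList(nodes map[string]*nodeInfo) []*nodeInfo {
	out := make([]*nodeInfo, 0, len(nodes))
	for _, n := range nodes {
		if n.node == nil {
			continue // node deleted, pods not yet removed: skip it
		}
		out = append(out, n)
	}
	return out
}
```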

Fixed Dump, which still uses the node map (phantom nodes are still useful to know about, so we don't want to skip them). Also switched the only remaining list method that uses the map to return a count instead, and marked it as test-only.
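A hedged sketch of the two map-based helpers mentioned above, assuming simplified hypothetical types rather than the real cache: a test-only count that includes pods on phantom entries, and a dump that lists phantom entries instead of skipping them. The output format of `Dump` here is invented for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Hypothetical stand-in for v1.Node.
type Node struct{ Name string }

type nodeInfo struct {
	node *Node    // nil for a "phantom" entry whose Node was deleted
	pods []string // names of pods still assigned to this node name
}

// PodCount returns the number of pods in the cache, including pods on
// phantom entries. Intended for tests only.
func PodCount(nodes map[string]*nodeInfo) int {
	count := 0
	for _, n := range nodes {
		count += len(n.pods)
	}
	return count
}

// Dump renders every entry, phantom ones included, for debugging.
func Dump(nodes map[string]*nodeInfo) string {
	names := make([]string, 0, len(nodes))
	for name := range nodes {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic output order
	var b strings.Builder
	for _, name := range names {
		n := nodes[name]
		fmt.Fprintf(&b, "%s phantom=%t pods=%d\n", name, n.node == nil, len(n.pods))
	}
	return b.String()
}
```

Returning a count rather than a pod list keeps callers from depending on the map's contents, which matches the intent of limiting direct map usage.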

The PR consists of 2 commits:

1. Removing direct usages of cache's map.
2. Actual fix.

Does this PR introduce a user-facing change?:

Scheduler bugfix: Scheduler doesn't lose pod information when nodes are quickly recreated. This could happen when nodes are restarted or quickly recreated with the same node name.

pkg/scheduler/framework/v1alpha1/types.go

pkg/scheduler/internal/cache/cache.go

alculquicondor · 2020-08-12T18:42:58Z

/retest

alculquicondor · 2020-08-12T19:50:17Z

/assign @Huang-Wei

jkaniuk · 2020-08-12T23:17:28Z

/retest

ahg-g · 2020-08-13T01:44:23Z

/milestone v1.19
/lgtm
/hold

One thing we could do to reduce the chance of accessing nodeinfo.Node() without checking for nil is to return an error when the node object is nil (n, err := nodeinfo.Node()). We are less likely to forget checking an error than to forget checking for nil. Just an idea that we could do in 1.20.

Huang-Wei · 2020-08-13T15:38:15Z

pkg/scheduler/internal/cache/interface.go

@@ -57,8 +57,9 @@ import (
 // - Both "Expired" and "Deleted" are valid end states. In case of some problems, e.g. network issue,
 //   a pod might have changed its state (e.g. added and deleted) without delivering notification to the cache.
 type Cache interface {
-	// ListPods lists all pods in the cache.
-	ListPods(selector labels.Selector) ([]*v1.Pod, error)
+	// ListPods returns the number of pods in the cache (including those from deleted nodes).


Suggested change

// ListPods returns the number of pods in the cache (including those from deleted nodes).

// PodCount returns the number of pods in the cache (including those from deleted nodes).

ahg-g · 2020-08-13T15:41:20Z

it would be nice if we could keep the actual fix in a commit separate from all the other updates to the tests.

justaugustus · 2020-08-13T17:49:00Z

/test pull-kubernetes-verify
/test pull-kubernetes-e2e-kind

Signed-off-by: Aldo Culquicondor <acondor@google.com>
Change-Id: Iebb22fc816926aaa1ddd1e4b2e52f335a275ffaa
Signed-off-by: Aldo Culquicondor <acondor@google.com>

The apiserver is expected to send pod deletion events that might arrive at a different time. However, sometimes a node could be recreated without its pods being deleted.

Partial revert of kubernetes#86964

Signed-off-by: Aldo Culquicondor <acondor@google.com>
Change-Id: I51f683e5f05689b711c81ebff34e7118b5337571

alculquicondor · 2020-08-13T18:25:01Z

it would be nice if we could keep the actual fix in a commit separate from all the other updates to the tests.

Done

alculquicondor · 2020-08-13T18:37:05Z

/retest

Huang-Wei · 2020-08-13T19:29:59Z

pkg/scheduler/internal/cache/cache.go

 func (cache *schedulerCache) removePod(pod *v1.Pod) error {
 	n, ok := cache.nodes[pod.Spec.NodeName]
 	if !ok {
+		klog.Errorf("node %v not found when trying to remove pod %v", pod.Spec.NodeName, pod.Name)


I recall the original logic returned an error?

It did. But returning nil is actually safer in the case of extraneous update events that might arrive before a node is created, and after the original node was completely removed.

+1 to returning nil and just logging an error.

I checked the usages of removePod(); a number of callers still rely on the returned value. So I'd suggest reverting to the original behavior.

Details for each caller:

- ForgetPod: We actually want to proceed and clear the assumedPods and podStates.
- expirePod: Same as above.
- AddPod: It just logs the returned error, so the effect is the same.
- RemovePod: We want to clear podStates.
- updatePod: This is the case where we want to prevent losing information.

That said, for expirePod and ForgetPod, the node shouldn't have been removed because it still had pods assigned.

Thanks. That's fair.
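The behavior agreed on in this thread can be sketched as below, under simplified hypothetical types (`cache`, `nodeInfo`, `podStates` as a plain map, and `errPodNotOnNode` are illustrative). The diff under review logs with `klog.Errorf`; the sketch uses the standard library's `log.Printf` as a stand-in. When the node entry is missing, `removePod` logs and returns nil, so a caller such as `RemovePod` still proceeds to clear its per-pod state.

```go
package main

import (
	"errors"
	"log"
)

type Pod struct{ Name, NodeName string }

type nodeInfo struct {
	hasNode bool            // false once the Node object was deleted
	pods    map[string]*Pod // pods still assigned to this node name
}

type cache struct {
	nodes     map[string]*nodeInfo
	podStates map[string]*Pod // simplified stand-in for per-pod state
}

// errPodNotOnNode is a hypothetical error for a pod missing from its node.
var errPodNotOnNode = errors.New("pod not found on node")

// removePod logs and returns nil when the node entry is missing, e.g. for
// stray events arriving after the node and its pods were fully removed.
func (c *cache) removePod(p *Pod) error {
	n, ok := c.nodes[p.NodeName]
	if !ok {
		log.Printf("node %v not found when trying to remove pod %v", p.NodeName, p.Name)
		return nil
	}
	if _, ok := n.pods[p.Name]; !ok {
		return errPodNotOnNode
	}
	delete(n.pods, p.Name)
	if len(n.pods) == 0 && !n.hasNode {
		delete(c.nodes, p.NodeName) // drop an empty phantom entry
	}
	return nil
}

// RemovePod only clears podStates when removePod succeeds, so returning
// nil for a missing node lets the state cleanup go ahead.
func (c *cache) RemovePod(p *Pod) error {
	if err := c.removePod(p); err != nil {
		return err
	}
	delete(c.podStates, p.Name)
	return nil
}
```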

Huang-Wei · 2020-08-13T19:31:10Z

Just a nit ^^. LGTM otherwise.

ahg-g · 2020-08-13T19:39:10Z

it would be nice if we could keep the actual fix in a commit separate from all the other updates to the tests.

Done

Thanks.

LGTM, will leave it to Wei to officially lgtm.

alculquicondor · 2020-08-13T19:53:13Z

/retest

Huang-Wei · 2020-08-13T20:23:23Z

/lgtm

Huang-Wei · 2020-08-13T20:23:42Z

/hold cancel

alculquicondor · 2020-08-13T21:18:06Z

This is actually already merged, but the GitHub UI is outdated.

/shrug

k8s-ci-robot · 2020-08-13T21:18:13Z

@alculquicondor: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

alculquicondor · 2020-08-14T13:34:51Z

/unshrug

k8s-ci-robot · 2020-08-14T13:34:52Z

@alculquicondor: ¯\_(ツ)_/¯

In response to this:

/unshrug


alculquicondor force-pushed the revert-node-delete branch from 6267ad3 to da97b22 August 12, 2020 17:31

k8s-ci-robot added the approved label Aug 12, 2020

k8s-ci-robot requested review from liu-cong and wgliang August 12, 2020 17:31

k8s-ci-robot added the sig/scheduling label and removed the needs-sig label Aug 12, 2020

alculquicondor changed the title ~~Keep track of remaining pods when a node is deleted~~ WIP: Keep track of remaining pods when a node is deleted Aug 12, 2020

k8s-ci-robot added the do-not-merge/work-in-progress label Aug 12, 2020

ahg-g reviewed Aug 12, 2020


pkg/scheduler/framework/v1alpha1/types.go (outdated, resolved)

pkg/scheduler/internal/cache/cache.go (outdated, resolved)

pkg/scheduler/internal/cache/cache.go (outdated, resolved)

alculquicondor force-pushed the revert-node-delete branch from da97b22 to aeeaa70 August 12, 2020 19:47

alculquicondor changed the title ~~WIP: Keep track of remaining pods when a node is deleted~~ Keep track of remaining pods when a node is deleted Aug 12, 2020

k8s-ci-robot removed the do-not-merge/work-in-progress label Aug 12, 2020

k8s-ci-robot assigned Huang-Wei Aug 12, 2020

alculquicondor force-pushed the revert-node-delete branch from aeeaa70 to aee640d August 12, 2020 20:20

k8s-ci-robot added the size/L label and removed the size/M label Aug 12, 2020

k8s-ci-robot added the do-not-merge/hold label Aug 13, 2020

k8s-ci-robot added this to the v1.19 milestone Aug 13, 2020

k8s-ci-robot assigned ahg-g Aug 13, 2020

k8s-ci-robot added the lgtm label Aug 13, 2020

Huang-Wei reviewed Aug 13, 2020


alculquicondor force-pushed the revert-node-delete branch from 64e5cbe to 465bf95 August 13, 2020 15:44

alculquicondor added 2 commits August 13, 2020 14:22

Remove direct accesses to cache's node map

16d7ecf


alculquicondor force-pushed the revert-node-delete branch from 465bf95 to dfe9e41 August 13, 2020 18:24

Huang-Wei reviewed Aug 13, 2020


k8s-ci-robot added the lgtm label Aug 13, 2020

k8s-ci-robot removed the do-not-merge/hold label Aug 13, 2020

k8s-ci-robot added the ¯\_(ツ)_/¯ label Aug 13, 2020

k8s-ci-robot added the needs-rebase label Aug 13, 2020

k8s-ci-robot merged commit 3647766 into kubernetes:master Aug 13, 2020

k8s-ci-robot removed the ¯\_(ツ)_/¯ label Aug 14, 2020

github-actions bot mentioned this pull request Aug 18, 2020

Week Ending August 16, 2020 dev-obs/actus#209

Open

This was referenced Sep 28, 2020

Node Snapshot can become out-of-sync when Node is deleted before its Pods #95124

Closed

Fix UpdateSnapshot when Node is partially removed #95130

Merged

alculquicondor mentioned this pull request Jan 5, 2021

fix nil pointer dereference when NodeInfo.RemovePod #97609

Closed

liggitt added the kind/regression label Apr 26, 2022
