Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep track of remaining pods when a node is deleted #93938

Merged
merged 2 commits into from
Aug 13, 2020

Conversation

alculquicondor
Copy link
Member

@alculquicondor alculquicondor commented Aug 12, 2020

What type of PR is this?

/kind bug

What this PR does / why we need it:

The apiserver is expected to send pod deletion events that might arrive at a different time. However, sometimes a node could be recreated without its pods being deleted.

Special notes for your reviewer:

This is a partial revert of #86964 and #89908

Since then, we have been more careful about direct usage of the map of nodes. In particular:

  • there were no uses left of GetNodeInfo
  • we skip nil node objects during snapshotting

Fixed Dump that still uses the node map (still useful to know, so we don't want to skip phantom nodes). And switched the only remaining list method that uses the map to return a count, and marked to only be used for tests.

The PR consists of 2 commits:

  1. Removing direct usages of cache's map.
  2. Actual fix.

Does this PR introduce a user-facing change?:

Scheduler bugfix: Scheduler doesn't lose pod information when nodes are quickly recreated. This could happen when nodes are restarted or quickly recreated reusing a nodename.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Aug 12, 2020
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 12, 2020
@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 12, 2020
@alculquicondor alculquicondor changed the title Keep track of remaining pods when a node is deleted WIP: Keep track of remaining pods when a node is deleted Aug 12, 2020
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 12, 2020
pkg/scheduler/framework/v1alpha1/types.go Outdated Show resolved Hide resolved
pkg/scheduler/internal/cache/cache.go Outdated Show resolved Hide resolved
pkg/scheduler/internal/cache/cache.go Outdated Show resolved Hide resolved
@alculquicondor
Copy link
Member Author

/retest

@alculquicondor alculquicondor changed the title WIP: Keep track of remaining pods when a node is deleted Keep track of remaining pods when a node is deleted Aug 12, 2020
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 12, 2020
@alculquicondor
Copy link
Member Author

/assign @Huang-Wei

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 12, 2020
@jkaniuk
Copy link
Contributor

jkaniuk commented Aug 12, 2020

/retest

@ahg-g
Copy link
Member

ahg-g commented Aug 13, 2020

/milestone v1.19
/lgtm
/hold

One thing we could do to reduce the chance of accessing nodeinfo.Node() without checking for nil is to return an error when the node object is nil (n, err := nodeinfo.Node()). It is less likely we miss checking the error than missing to check for nil. Just an idea that we could do in 1.20.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 13, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.19 milestone Aug 13, 2020
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 13, 2020
@@ -57,8 +57,9 @@ import (
// - Both "Expired" and "Deleted" are valid end states. In case of some problems, e.g. network issue,
// a pod might have changed its state (e.g. added and deleted) without delivering notification to the cache.
type Cache interface {
// ListPods lists all pods in the cache.
ListPods(selector labels.Selector) ([]*v1.Pod, error)
// ListPods returns the number of pods in the cache (including those from deleted nodes).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// ListPods returns the number of pods in the cache (including those from deleted nodes).
// PodCount returns the number of pods in the cache (including those from deleted nodes).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@ahg-g
Copy link
Member

ahg-g commented Aug 13, 2020

it would be nice if we can keep the actual fix in a separate commit separate from all the other updates to the tests.

@justaugustus
Copy link
Member

/test pull-kubernetes-verify
/test pull-kubernetes-e2e-kind

Signed-off-by: Aldo Culquicondor <acondor@google.com>
Change-Id: Iebb22fc816926aaa1ddd1e4b2e52f335a275ffaa
Signed-off-by: Aldo Culquicondor <acondor@google.com>
The apiserver is expected to send pod deletion events that might arrive at a different time. However, sometimes a node could be recreated without its pods being deleted.

Partial revert of kubernetes#86964

Signed-off-by: Aldo Culquicondor <acondor@google.com>
Change-Id: I51f683e5f05689b711c81ebff34e7118b5337571
@alculquicondor
Copy link
Member Author

it would be nice if we can keep the actual fix in a separate commit separate from all the other updates to the tests.

Done

@alculquicondor
Copy link
Member Author

/retest

func (cache *schedulerCache) removePod(pod *v1.Pod) error {
n, ok := cache.nodes[pod.Spec.NodeName]
if !ok {
klog.Errorf("node %v not found when trying to remove pod %v", pod.Spec.NodeName, pod.Name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recalled the original logic returned an error?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It did. But returning nil is actually safer in the case of extraneous update events that might arrive before a node is created, and after the original node was completely removed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to returning nil and just logging an error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the usage of removePod(), there are still a number of callers rely on the returned value. So I'd suggest to revert to the original state.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Detail for each caller:

  • ForgetPod: We actually want to proceed and clear the assumedPods and podStates.
  • expirePod: Same as above.
  • AddPod: it just logs the error returned, so same effect.
  • RemovePod: We want to clear podStates.
  • updatePod: This is the case where we want to prevent losing information.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, for expirePod and ForgetPod, the node shouldn't have been removed because it still had pods assigned.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. That's fair.

@Huang-Wei
Copy link
Member

Just a nit ^^. LGTM otherwise.

@ahg-g
Copy link
Member

ahg-g commented Aug 13, 2020

it would be nice if we can keep the actual fix in a separate commit separate from all the other updates to the tests.

Done

Thanks.

LGTM, will leave it to Wei to officially lgtm.

@alculquicondor
Copy link
Member Author

/retest

@Huang-Wei
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 13, 2020
@Huang-Wei
Copy link
Member

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 13, 2020
@alculquicondor
Copy link
Member Author

This is actually already merged, but the github UI is outdated.

/shrug

@k8s-ci-robot k8s-ci-robot added the ¯\_(ツ)_/¯ ¯\\\_(ツ)_/¯ label Aug 13, 2020
@k8s-ci-robot
Copy link
Contributor

@alculquicondor: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 13, 2020
@k8s-ci-robot k8s-ci-robot merged commit 3647766 into kubernetes:master Aug 13, 2020
@alculquicondor
Copy link
Member Author

/unshrug

@k8s-ci-robot
Copy link
Contributor

@alculquicondor: ¯\_(ツ)_/¯

In response to this:

/unshrug

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/regression Categorizes issue or PR as related to a regression from a prior release. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants