New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use pod UID as cache key instead of namespace/name #61069

Merged
merged 1 commit into from Mar 13, 2018

Conversation

@anfernee
Member

anfernee commented Mar 12, 2018

UID uniquely identifies pods across lifecycles, while namespace/name
could be 2 different pods across lifecycles. This could result in
tricky scheduler bugs.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #60966

Special notes for your reviewer: @bsalamat

Release note:

Fix a bug in scheduler cache by using Pod UID as the cache key instead of namespace/name
@anfernee

This comment has been minimized.

Member

anfernee commented Mar 13, 2018

/retest

@@ -495,7 +494,7 @@ func (n *NodeInfo) FilterOutPods(pods []*v1.Pod) []*v1.Pod {
// getPodKey returns the string key of a pod.
func getPodKey(pod *v1.Pod) (string, error) {
return clientcache.MetaNamespaceKeyFunc(pod)
return string(pod.UID), nil

This comment has been minimized.

@bsalamat

bsalamat Mar 13, 2018

Contributor

Could you please add a check to return an error when pod.UID is empty?

This comment has been minimized.

@anfernee

anfernee Mar 13, 2018

Member

sure. I thought it is always non-empty? can I ask when it's empty?

This comment has been minimized.

@bsalamat

bsalamat Mar 13, 2018

Contributor

It shouldn't be empty, but a bug in the system could leave it empty. It could also be left empty in our tests. So, it would be good to check and return an error.

This comment has been minimized.

@anfernee

anfernee Mar 13, 2018

Member

make sense. just curious.

@bsalamat

This comment has been minimized.

Contributor

bsalamat commented Mar 13, 2018

We should include this in 1.10 as this is a bug fix.

@dims

This comment has been minimized.

Member

dims commented Mar 13, 2018

/kind bug
/priority important-soon
/sig scheduling

Throwing the labels for @bsalamat

@dims

This comment has been minimized.

Member

dims commented Mar 13, 2018

/status approved-for-milestone

@dims

This comment has been minimized.

Member

dims commented Mar 13, 2018

@bsalamat @jayunit100 All it needs now is a lgtm from one of you

@ravisantoshgudimetla

This comment has been minimized.

Contributor

ravisantoshgudimetla commented Mar 13, 2018

Shouldn't we update the filter code as well which checks if the pod exists in the pod list? https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/schedulercache/node_info.go#L509

@jdumars

This comment has been minimized.

Member

jdumars commented Mar 13, 2018

@sig-scheduling-maintainers PTAL ASAP

@bsalamat

This comment has been minimized.

Contributor

bsalamat commented Mar 13, 2018

Shouldn't we update the filter code as well which checks if the pod exists in the pod list?

It would be better to change the filter function as well. It is not necessary for this PR to work correctly though.

Yongkun Anfernee Gui
Use pod UID as cache key instead of namespace/name
UID uniquely identifies pods across lifecycles, while namespace/name
could be 2 different pods across lifecycles. This could result in
tricky scheduler bugs.

Fixes #60966
@bsalamat

This comment has been minimized.

Contributor

bsalamat commented Mar 13, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Mar 13, 2018

@k8s-ci-robot

This comment has been minimized.

Contributor

k8s-ci-robot commented Mar 13, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: anfernee, bsalamat

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-merge-robot

This comment has been minimized.

Contributor

k8s-merge-robot commented Mar 13, 2018

[MILESTONENOTIFIER] Milestone Pull Request: Up-to-date for process

@anfernee @bsalamat

Pull Request Labels
  • sig/scheduling: Pull Request will be escalated to these SIGs if needed.
  • priority/important-soon: Escalate to the pull request owners and SIG owner; move out of milestone after several unsuccessful escalation attempts.
  • kind/bug: Fixes a bug discovered during the current release.
Help
@anfernee

This comment has been minimized.

Member

anfernee commented Mar 13, 2018

/retest

@k8s-merge-robot

This comment has been minimized.

Contributor

k8s-merge-robot commented Mar 13, 2018

Automatic merge from submit-queue (batch tested with PRs 61111, 61069). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-merge-robot k8s-merge-robot merged commit 34001d8 into kubernetes:master Mar 13, 2018

14 checks passed

Submit Queue Queued to run github e2e tests a second time.
Details
cla/linuxfoundation anfernee authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gke Skipped
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce Job succeeded.
Details
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
@resouer

This comment has been minimized.

Member

resouer commented May 7, 2018

This bug actually exists in previous releases, we need to cherry pick it in 1.9 at least.

@bsalamat

This comment has been minimized.

Contributor

bsalamat commented May 14, 2018

@resouer I agree. Could you please create a cherry pick PR for 1.9?

@resouer

This comment has been minimized.

Member

resouer commented May 16, 2018

@bsalamat Sure, will do!

k8s-merge-robot added a commit that referenced this pull request May 19, 2018

Merge pull request #63995 from resouer/automated-cherry-pick-of-#61069-…
…upstream-release-1.9

Automatic merge from submit-queue.

Automated cherry pick of #61069: Use pod UID as cache key instead of namespace/name

Cherry pick of #61069 on release-1.9.

#61069: Use pod UID as cache key instead of namespace/name
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment