Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1.27] kubelet: devices: skip allocation for running pods #118635 #119432

Conversation

ffromani
Copy link
Contributor

@ffromani ffromani commented Jul 19, 2023

What type of PR is this?

/kind bug
/kind regression

What this PR does / why we need it:

Cherry-pick of #118635 to branch release-1.27. Cherry pick per se done using hack/cherry_pick_pull.sh

Original description

When kubelet initializes, runs admission for pods and possibly allocate requested resources. We need to distinguish between node reboot (no containers running) versus kubelet restart (containers potentially running).

Running pods should always survive kubelet restart. This means that device allocation on admission should not be attempted, because if a container requires devices and is still running when kubelet is restarting, that container already has devices allocated and working.

Thus, we need to properly detect this scenario in the allocation step and handle it explicitely. We need to inform the devicemanager about which pods are already running.

Which issue(s) this PR fixes:

Fixes #118559

Special notes for your reviewer:

Implements the first approach proposed in the thread, so we make the devicemanager treat running pod differently.

This approach was chosen because it seems simpler to make self-contained and easier to backport.

The devicemanager already tracks (with the help of the checkpoint files) which containers got devices assigned to them, which by definition means these containers passed its admission. The missing bit is safely learning which container are already running when initializing, and for that we extend the existing buildContainerMapFromRuntime

Does this PR introduce a user-facing change?

Fixes regression in 1.27.2 causing running pods with devices to be terminated if kubelet is restarted

@k8s-ci-robot k8s-ci-robot added this to the v1.27 milestone Jul 19, 2023
@k8s-ci-robot k8s-ci-robot added do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. kind/regression Categorizes issue or PR as related to a regression from a prior release. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jul 19, 2023
@ffromani
Copy link
Contributor Author

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 19, 2023
@ffromani
Copy link
Contributor Author

/test

@k8s-ci-robot
Copy link
Contributor

@ffromani: The /test command needs one or more targets.
The following commands are available to trigger required jobs:

  • /test pull-cadvisor-e2e-kubernetes
  • /test pull-kubernetes-conformance-kind-ga-only-parallel
  • /test pull-kubernetes-coverage-unit
  • /test pull-kubernetes-dependencies
  • /test pull-kubernetes-dependencies-go-canary
  • /test pull-kubernetes-e2e-gce
  • /test pull-kubernetes-e2e-gce-100-performance
  • /test pull-kubernetes-e2e-gce-canary
  • /test pull-kubernetes-e2e-gce-cos
  • /test pull-kubernetes-e2e-gce-cos-canary
  • /test pull-kubernetes-e2e-gce-cos-no-stage
  • /test pull-kubernetes-e2e-gce-network-proxy-http-connect
  • /test pull-kubernetes-e2e-kind
  • /test pull-kubernetes-e2e-kind-ipv6
  • /test pull-kubernetes-integration
  • /test pull-kubernetes-integration-go-canary
  • /test pull-kubernetes-integration-go-compatibility
  • /test pull-kubernetes-kubemark-e2e-gce-scale
  • /test pull-kubernetes-node-e2e-containerd
  • /test pull-kubernetes-typecheck
  • /test pull-kubernetes-unit
  • /test pull-kubernetes-unit-go-canary
  • /test pull-kubernetes-unit-go-compatibility
  • /test pull-kubernetes-update
  • /test pull-kubernetes-verify
  • /test pull-kubernetes-verify-go-canary

The following commands are available to trigger optional jobs:

  • /test check-dependency-stats
  • /test pull-e2e-gce-cloud-provider-disabled
  • /test pull-kubernetes-conformance-kind-ipv6-parallel
  • /test pull-kubernetes-cross
  • /test pull-kubernetes-e2e-autoscaling-hpa-cm
  • /test pull-kubernetes-e2e-autoscaling-hpa-cpu
  • /test pull-kubernetes-e2e-capz-azure-disk
  • /test pull-kubernetes-e2e-capz-azure-disk-vmss
  • /test pull-kubernetes-e2e-capz-azure-file
  • /test pull-kubernetes-e2e-capz-azure-file-vmss
  • /test pull-kubernetes-e2e-capz-conformance
  • /test pull-kubernetes-e2e-capz-windows-1-27
  • /test pull-kubernetes-e2e-containerd-gce
  • /test pull-kubernetes-e2e-gce-correctness
  • /test pull-kubernetes-e2e-gce-device-plugin-gpu
  • /test pull-kubernetes-e2e-gce-network-proxy-grpc
  • /test pull-kubernetes-e2e-gce-serial
  • /test pull-kubernetes-e2e-gci-gce-autoscaling
  • /test pull-kubernetes-e2e-kind-dual-canary
  • /test pull-kubernetes-e2e-kind-ipvs-dual-canary
  • /test pull-kubernetes-e2e-kind-kms
  • /test pull-kubernetes-e2e-kind-multizone
  • /test pull-kubernetes-e2e-kops-aws
  • /test pull-kubernetes-kubemark-e2e-gce-big
  • /test pull-kubernetes-node-e2e-containerd-kubetest2
  • /test pull-kubernetes-node-kubelet-serial-pod-disruption-conditions
  • /test pull-kubernetes-verify-strict-lint
  • /test pull-publishing-bot-validate

Use /test all to run the following jobs that were automatically triggered:

  • pull-kubernetes-conformance-kind-ga-only-parallel
  • pull-kubernetes-conformance-kind-ipv6-parallel
  • pull-kubernetes-dependencies
  • pull-kubernetes-e2e-capz-windows-1-27
  • pull-kubernetes-e2e-gce
  • pull-kubernetes-e2e-kind
  • pull-kubernetes-e2e-kind-ipv6
  • pull-kubernetes-integration
  • pull-kubernetes-integration-go-compatibility
  • pull-kubernetes-node-e2e-containerd
  • pull-kubernetes-typecheck
  • pull-kubernetes-unit
  • pull-kubernetes-unit-go-compatibility
  • pull-kubernetes-verify

In response to this:

/test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added area/kubelet area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Jul 19, 2023
@ffromani
Copy link
Contributor Author

/test all

@ffromani
Copy link
Contributor Author

/test pull-kubernetes-e2e-gce-device-plugin-gpu
/test pull-kubernetes-e2e-gce-serial

@ffromani
Copy link
Contributor Author

/test pull-kubernetes-unit-go-compatibility

unrelated failure: k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/util/flowcontrol/fairqueuing/queueset: TestTooWide expand_less

@ffromani
Copy link
Contributor Author

/test pull-kubernetes-e2e-gce-device-plugin-gpu

setup failure?

@ffromani
Copy link
Contributor Author

/test pull-kubernetes-unit-go-compatibility

@ffromani
Copy link
Contributor Author

/test pull-kubernetes-e2e-capz-windows-1-27

unrelated failure: /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/test/e2e/conformance_test.go

@ffromani
Copy link
Contributor Author

/retest-required

apiserver is not affected by this change

@SergeyKanzhelev SergeyKanzhelev moved this from Triage to Archive-it in SIG Node CI/Test Board Jul 19, 2023
@ffromani
Copy link
Contributor Author

@ffromani sorry for the issues with the windows job. Thanks for reporting. It is running and passing now

no problem at all! it took me a while to get a clue about what was broken, but appy it was resolved and we have better windows jobs

Copy link
Contributor

@swatisehgal swatisehgal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Re-applying as the label was removed due to rebase.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 15, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 01900654b0c6f23d2ddc1ff343f0eb6fe97a9fcd

@bobbypage
Copy link
Member

regarding #119432 (comment)

I'm inclined to do the debugging of the jobs in parallel and not block this PR because of that, please reviewers share your thoughts

+1, it looks like the test is failing on presubmits on your empty PR (#119590), so I agree we should not block this PR on that. I believe I found the issue there with the config (see kubernetes/test-infra#30450), which we can address separately.

@bobbypage
Copy link
Member

/lgtm

@bobbypage
Copy link
Member

/test pull-kubernetes-e2e-gce-device-plugin-gpu

@ffromani
Copy link
Contributor Author

regarding #119432 (comment)

I'm inclined to do the debugging of the jobs in parallel and not block this PR because of that, please reviewers share your thoughts

+1, it looks like the test is failing on presubmits on your empty PR (#119590), so I agree we should not block this PR on that. I believe I found the issue there with the config (see kubernetes/test-infra#30450), which we can address separately.

Thanks @bobbypage ! it seems your fix was merged and worked like a charm, awesome!
Now I'll review myself that change carefully because I was not aware of the interdependencies and of the settings you used. TIL!

@retr0h
Copy link

retr0h commented Aug 22, 2023

@ffromani any idea when this will make it into 1.27?

We have been hitting this for a while now, and encounter UnexpectedAdmissionError on pods which use devices made available by the generic device plugin when the kubelet is restarted.

@ffromani
Copy link
Contributor Author

@ffromani any idea when this will make it into 1.27?

We have been hitting this for a while now, and encounter UnexpectedAdmissionError on pods which use devices made available by the generic device plugin when the kubelet is restarted.

Hi, please note that we (or, at very least, I :) ) want to keep the existing behavior added in #109595 , whose partial fix lead to the bug I'm fixing here. In other words, some pods WILL still be hitting UnexpectedAdmissionError in some cases.
That said, I'm aiming for the september patch day. All blockers have been removed, but the backports have not been approved either.

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 22, 2023
@bobbypage
Copy link
Member

/cc @kubernetes/release-managers

Release managers, can you please take a look at this cherrypick? Thank you!

@k8s-ci-robot k8s-ci-robot requested a review from a team August 22, 2023 19:44
@SergeyKanzhelev SergeyKanzhelev moved this from Needs Approver to Done in SIG Node PR Triage Aug 23, 2023
Copy link
Member

@xmudrii xmudrii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For RelEng:
/lgtm
/approve

@xmudrii xmudrii added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Aug 30, 2023
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. label Aug 30, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani, mrunalp, xmudrii

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ffromani
Copy link
Contributor Author

/test pull-kubernetes-node-e2e-containerd

failure unrelated to this PR

@ffromani
Copy link
Contributor Author

/test pull-kubernetes-unit

unrelated failures
image

@ffromani
Copy link
Contributor Author

/test pull-kubernetes-unit-go-compatibility

unrelated failure

@k8s-ci-robot k8s-ci-robot merged commit 4737b87 into kubernetes:release-1.27 Aug 30, 2023
17 checks passed
@ffromani ffromani deleted the automated-cherry-pick-of-#118635-upstream-release-1.27-1689697264 branch August 30, 2023 14:43
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Sep 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet area/test cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/regression Categorizes issue or PR as related to a regression from a prior release. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Development

Successfully merging this pull request may close these issues.

None yet