Fix kubelet panic when allocate resource for pod. #119561


payall4u
Contributor

@payall4u payall4u commented Jul 25, 2023

What type of PR is this?

/kind bug

What this PR does / why we need it:

Fix a kubelet panic that occurs when allocating resources for a pod.

Which issue(s) this PR fixes:

Fixes #119560

Does this PR introduce a user-facing change?

None

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 25, 2023
@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 25, 2023
@k8s-ci-robot
Contributor

Hi @payall4u. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Jul 25, 2023
@payall4u payall4u force-pushed the fix-kubelet-panic-when-allocate-device branch from 6a11677 to 06a7c0b Compare July 25, 2023 13:24
@gjkim42
Member

gjkim42 commented Jul 25, 2023

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 25, 2023
@gjkim42
Member

gjkim42 commented Jul 25, 2023

Fixes #119560

I think it would be better to make a test to reproduce the bug and make sure it is fixed by this patch.
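As a rough illustration of what such a reproducing test could look like, here is a minimal sketch. It is not the kubelet's code and does not claim to reproduce the exact failure from #119560; `didPanic`, `recordBuggy`, and `recordFixed` are hypothetical names, and the nil inner map is only an assumed example of a panic source.

```go
package main

import "fmt"

// didPanic runs fn and reports whether it panicked.
func didPanic(fn func()) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	fn()
	return false
}

// recordBuggy (hypothetical) writes into the per-resource map without
// checking that it exists. Writing to a nil map panics in Go.
func recordBuggy(allocated map[string]map[string]bool, resource, device string) {
	allocated[resource][device] = true
}

// recordFixed (hypothetical) creates the per-resource map on first use.
func recordFixed(allocated map[string]map[string]bool, resource, device string) {
	if allocated[resource] == nil {
		allocated[resource] = map[string]bool{}
	}
	allocated[resource][device] = true
}

func main() {
	buggy := didPanic(func() {
		recordBuggy(map[string]map[string]bool{}, "example.com/gpu", "dev-0")
	})
	fixed := didPanic(func() {
		recordFixed(map[string]map[string]bool{}, "example.com/gpu", "dev-0")
	})
	fmt.Println("buggy panicked:", buggy, "fixed panicked:", fixed)
}
```

In a real regression test the same idea would be expressed with the `testing` package: first show that the unpatched path panics, then assert that the patched path does not.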

Comment on lines 635 to 636
	if m.allocatedDevices[resource] == nil {
		m.allocatedDevices[resource] = sets.NewString()
Contributor

I think I'm missing something here. Which flows could possibly set this to nil?
Once allocatedDevices[resource] is initialized, it is never explicitly reset to nil; the only explicit assignment is the return value of podDevices.devices(), which can never be nil.

Personally, I like updating the allocation once everything has completed, but I'm wondering whether this approach plays nicely with the overall design (see the comment at lines 834-845). Perhaps we should review that part, but that is a much larger endeavour.
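To make the "update the allocation once everything has completed" idea concrete, here is a minimal sketch under stated assumptions. It is not the device manager's implementation: `stringSet` is a hypothetical stand-in for the set type, and `commitAllocation` and its parameters are invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// stringSet is a hypothetical stand-in for a string set type.
type stringSet map[string]struct{}

// commitAllocation (hypothetical) picks `needed` devices from `available`
// and records them under `resource` only when the whole request can be met.
func commitAllocation(allocated map[string]stringSet, resource string, available []string, needed int) error {
	if needed < 0 || needed > len(available) {
		// Fail before touching shared state, so nothing has to be rolled back.
		return errors.New("cannot satisfy device request")
	}
	staged := available[:needed]

	// Commit step: create the per-resource set lazily, then record the picks.
	// A missing key yields a nil set, and writing to a nil map would panic.
	if allocated[resource] == nil {
		allocated[resource] = stringSet{}
	}
	for _, dev := range staged {
		allocated[resource][dev] = struct{}{}
	}
	return nil
}

func main() {
	allocated := map[string]stringSet{}

	// Request that cannot be met: the shared map stays untouched.
	err := commitAllocation(allocated, "example.com/gpu", []string{"dev-0"}, 2)
	fmt.Println("failed:", err != nil, "resources tracked:", len(allocated))

	// Request that can be met: devices are recorded in one step.
	err = commitAllocation(allocated, "example.com/gpu", []string{"dev-0", "dev-1"}, 2)
	fmt.Println("error:", err, "devices:", len(allocated["example.com/gpu"]))
}
```

The point of the design is that a failed request leaves no partial state behind, so there is no half-initialized entry for a later code path to trip over.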

@bart0sh bart0sh added this to Triage in SIG Node PR Triage Jul 25, 2023
@bart0sh
Contributor

bart0sh commented Jul 26, 2023

/triage accepted
/priority important-soon

@payall4u please provide a release note

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jul 26, 2023
@bart0sh bart0sh moved this from Triage to Needs Reviewer in SIG Node PR Triage Jul 26, 2023
@bart0sh
Contributor

bart0sh commented Jul 26, 2023

/assign @ffromani @klueska
/cc

@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 3b1107227de026e682257c25396fea5cf9165944

@bart0sh bart0sh moved this from Needs Reviewer to Needs Approver in SIG Node PR Triage Nov 10, 2023
@bart0sh
Contributor

bart0sh commented Nov 10, 2023

@payall4u I don't think we still need this code on lines 629-632

	// Needs to allocate additional devices.
	if m.allocatedDevices[resource] == nil {
		m.allocatedDevices[resource] = sets.New[string]()
	}

@payall4u
Contributor Author

@payall4u I don't think we still need this code on lines 629-632

	// Needs to allocate additional devices.
	if m.allocatedDevices[resource] == nil {
		m.allocatedDevices[resource] = sets.New[string]()
	}

Right.

Signed-off-by: payall4u <payall4u@qq.com>
@payall4u payall4u force-pushed the fix-kubelet-panic-when-allocate-device branch from 0a8bb44 to d6b8a66 Compare November 12, 2023 02:59
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 12, 2023
@payall4u
Contributor Author

Sorry, I need you to give it another review. @bart0sh

@payall4u
Contributor Author

ping @bart0sh

@bart0sh
Contributor

bart0sh commented Nov 24, 2023

/lgtm
@payall4u sorry for the review delay, I was on vacation.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 24, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 76076e6e067890f1a34d9d2fc078de520007b351

@k8s-ci-robot
Contributor

@payall4u: You must be a member of the kubernetes/milestone-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact your Milestone Maintainers Team and have them propose you as an additional delegate for this responsibility.

In response to this:

/milestone v1.30

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@payall4u
Contributor Author

Hi @bart0sh @SergeyKanzhelev

How about fixing this in 1.30 and cherry-picking it to 1.29?

@bart0sh
Contributor

bart0sh commented Nov 29, 2023

How about fixing this in 1.30 and cherry-picking it to 1.29?

sounds good to me

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 27, 2024
@pacoxu
Member

pacoxu commented Feb 28, 2024

/remove-lifecycle stale
/lgtm
/approve
/assign @mrunalp @SergeyKanzhelev
for approval

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 28, 2024
@payall4u
Contributor Author

payall4u commented Feb 29, 2024

Ping @mrunalp @SergeyKanzhelev @klueska

@klueska
Contributor

klueska commented Feb 29, 2024

/approve
/lgtm

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: klueska, pacoxu, payall4u

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 29, 2024
@k8s-ci-robot k8s-ci-robot merged commit 70383f3 into kubernetes:master Feb 29, 2024
14 checks passed
SIG Node PR Triage automation moved this from Needs Approver to Done Feb 29, 2024
@k8s-ci-robot k8s-ci-robot added this to the v1.30 milestone Feb 29, 2024
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Development

Successfully merging this pull request may close these issues.

Kubelet panic when allocates device for pods.