fix nil pointer dereference when EventedPLEG is enabled #122475

Merged
k8s-ci-robot merged 1 commit into kubernetes:master on Jan 4, 2024

Conversation

pacoxu
Member

@pacoxu pacoxu commented Dec 25, 2023

What type of PR is this?

/kind bug

What this PR does / why we need it:

My test environment is v1.29.0 with `AllBeta: true`, including Evented PLEG.

featureGates:
  AllBeta: true
  EventedPLEG: true

Panic log

12月 25 16:02:37 daocloud kubelet[22154]: E1225 16:02:37.233805   22154 kuberuntime_manager.go:1456] "PodSandboxStatus of sandbox for pod" err="rpc error: code = Unknown desc = failed to get sandbox ip: check network namespace closed: remove netns: unlinkat /var/run/netns/cni-69bc2682-2d1a-694c-2a43-276d3e693ea6: device or resource busy" podSandboxID="313ce8e172c9d0de6da0c25a9a05e4b4703445ee30159237d6c07ac577cdb03e" pod="calico-system/calico-kube-controllers-57c5f9d79c-nqdxx"
12月 25 16:02:37 daocloud kubelet[22154]: E1225 16:02:37.233866   22154 generic.go:453] "PLEG: Write status" err="rpc error: code = Unknown desc = failed to get sandbox ip: check network namespace closed: remove netns: unlinkat /var/run/netns/cni-69bc2682-2d1a-694c-2a43-276d3e693ea6: device or resource busy" pod="calico-system/calico-kube-controllers-57c5f9d79c-nqdxx"
12月 25 16:02:37 daocloud kubelet[22154]: E1225 16:02:37.234031   22154 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
12月 25 16:02:37 daocloud kubelet[22154]: goroutine 419 [running]:
12月 25 16:02:37 daocloud kubelet[22154]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x3c742e0?, 0x6e6ad40})
12月 25 16:02:37 daocloud kubelet[22154]:         vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85
12月 25 16:02:37 daocloud kubelet[22154]: k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x6eb7920?})
12月 25 16:02:37 daocloud kubelet[22154]:         vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b
12月 25 16:02:37 daocloud kubelet[22154]: panic({0x3c742e0?, 0x6e6ad40?})
12月 25 16:02:37 daocloud kubelet[22154]:         /usr/local/go/src/runtime/panic.go:914 +0x21f
12月 25 16:02:37 daocloud kubelet[22154]: k8s.io/kubernetes/pkg/kubelet/pleg.(*GenericPLEG).updateCache(0xc000a86750, {0x4c40048, 0x6ee85c0}, 0xc0006c6380, {0xc00141e330, 0x24})
12月 25 16:02:37 daocloud kubelet[22154]:         pkg/kubelet/pleg/generic.go:479 +0x7ac
12月 25 16:02:37 daocloud kubelet[22154]: k8s.io/kubernetes/pkg/kubelet/pleg.(*GenericPLEG).Relist(0xc000a86750)
12月 25 16:02:37 daocloud kubelet[22154]:         pkg/kubelet/pleg/generic.go:283 +0x815
12月 25 16:02:37 daocloud kubelet[22154]: k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
12月 25 16:02:37 daocloud kubelet[22154]:         vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
12月 25 16:02:37 daocloud kubelet[22154]: k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x4c0f620, 0xc000fd5b90}, 0x1, 0xc00020cfc0)
12月 25 16:02:37 daocloud kubelet[22154]:         vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
12月 25 16:02:37 daocloud kubelet[22154]: k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000095fb0?, 0x45d964b800, 0x0, 0x0?, 0x4475bc?)
12月 25 16:02:37 daocloud kubelet[22154]:         vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
12月 25 16:02:37 daocloud kubelet[22154]: k8s.io/apimachinery/pkg/util/wait.Until(0xa946c5?, 0xa8f505?, 0xc000b11390?)
12月 25 16:02:37 daocloud kubelet[22154]:         vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 +0x1e
12月 25 16:02:37 daocloud kubelet[22154]: created by k8s.io/kubernetes/pkg/kubelet/pleg.(*GenericPLEG).Start in goroutine 234
12月 25 16:02:37 daocloud kubelet[22154]:         pkg/kubelet/pleg/generic.go:146 +0x185
12月 25 16:02:37 daocloud kubelet[22154]: panic: runtime error: invalid memory address or nil pointer dereference [recovered]
12月 25 16:02:37 daocloud kubelet[22154]:         panic: runtime error: invalid memory address or nil pointer dereference
12月 25 16:02:37 daocloud kubelet[22154]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x78 pc=0x350b6ec]
12月 25 16:02:37 daocloud kubelet[22154]: goroutine 419 [running]:
12月 25 16:02:37 daocloud kubelet[22154]: k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x6eb7920?})
12月 25 16:02:37 daocloud kubelet[22154]:         vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:56 +0xcd
12月 25 16:02:37 daocloud kubelet[22154]: panic({0x3c742e0?, 0x6e6ad40?})
12月 25 16:02:37 daocloud kubelet[22154]:         /usr/local/go/src/runtime/panic.go:914 +0x21f
12月 25 16:02:37 daocloud kubelet[22154]: k8s.io/kubernetes/pkg/kubelet/pleg.(*GenericPLEG).updateCache(0xc000a86750, {0x4c40048, 0x6ee85c0}, 0xc0006c6380, {0xc00141e330, 0x24})
12月 25 16:02:37 daocloud kubelet[22154]:         pkg/kubelet/pleg/generic.go:479 +0x7ac
12月 25 16:02:37 daocloud kubelet[22154]: k8s.io/kubernetes/pkg/kubelet/pleg.(*GenericPLEG).Relist(0xc000a86750)
12月 25 16:02:37 daocloud kubelet[22154]:         pkg/kubelet/pleg/generic.go:283 +0x815
12月 25 16:02:37 daocloud kubelet[22154]: k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
12月 25 16:02:37 daocloud kubelet[22154]:         vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
12月 25 16:02:37 daocloud kubelet[22154]: k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x4c0f620, 0xc000fd5b90}, 0x1, 0xc00020cfc0)
12月 25 16:02:37 daocloud kubelet[22154]:         vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
12月 25 16:02:37 daocloud kubelet[22154]: k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000095fb0?, 0x45d964b800, 0x0, 0x0?, 0x4475bc?)
12月 25 16:02:37 daocloud kubelet[22154]:         vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
12月 25 16:02:37 daocloud kubelet[22154]: k8s.io/apimachinery/pkg/util/wait.Until(0xa946c5?, 0xa8f505?, 0xc000b11390?)
12月 25 16:02:37 daocloud kubelet[22154]:         vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 +0x1e
12月 25 16:02:37 daocloud kubelet[22154]: created by k8s.io/kubernetes/pkg/kubelet/pleg.(*GenericPLEG).Start in goroutine 234
12月 25 16:02:37 daocloud systemd[1]: kubelet.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
12月 25 16:02:37 daocloud kubelet[22154]:         pkg/kubelet/pleg/generic.go:146 +0x185
12月 25 16:02:37 daocloud systemd[1]: kubelet.service: Unit entered failed state.
12月 25 16:02:37 daocloud systemd[1]: kubelet.service: Failed with result 'exit-code'.

Which issue(s) this PR fixes:

Fixes None

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fix a kubelet panic during start-up when Evented PLEG is enabled.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Dec 25, 2023
@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 25, 2023
@pacoxu
Member Author

pacoxu commented Dec 25, 2023

I may add an e2e test later.

@pacoxu
Member Author

pacoxu commented Dec 25, 2023

/cc @harche @swghosh

I found this issue when I ran a DaemonSet on a node and restarted the kubelet, which already had EventedPLEG turned on, with v1.29.0.

  • I still need to confirm the reproduction steps.

@k8s-ci-robot
Contributor

@pacoxu: GitHub didn't allow me to request PR reviews from the following users: swghosh.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc harche swghosh

I found this issue when I ran a DaemonSet on a node and restarted the kubelet, which already had EventedPLEG turned on, with v1.29.0.

  • I still need to confirm the reproduction steps.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bart0sh bart0sh added this to Triage in SIG Node PR Triage Dec 25, 2023
@bart0sh
Contributor

bart0sh commented Dec 25, 2023

/triage accepted
/priority important-soon
/lgtm

/assign @mrunalp @derekwaynecarr @dchen1107
for a final approval

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Dec 25, 2023
@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Dec 25, 2023
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 25, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 5b8e436f7562bae8f554e7708788bd4821272287

@bart0sh bart0sh moved this from Triage to Needs Approver in SIG Node PR Triage Dec 25, 2023
Contributor

@kannon92 kannon92 left a comment

Code looks good. Could you add a unit test for this?

Contributor

@harche harche left a comment

/lgtm

@dims
Member

dims commented Jan 4, 2024

One-liner with a defensive nil check before using a pointer.

/approve
/lgtm
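
For readers following along, the kind of guard described here can be sketched as below. This is a minimal, self-contained illustration and not the actual diff: `podStatus`, `getPodStatus`, and `cacheTimestamp` are hypothetical names. The assumption that the status pointer is nil when the runtime call fails is inferred from the "PLEG: Write status" error that immediately precedes the panic in the log above.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// podStatus is a hypothetical stand-in for the kubelet's pod status type.
type podStatus struct {
	TimeStamp time.Time
}

// getPodStatus is a hypothetical runtime call. On failure it returns a nil
// status together with the error (an assumption made for this sketch).
func getPodStatus(fail bool) (*podStatus, error) {
	if fail {
		return nil, errors.New("failed to get sandbox ip")
	}
	return &podStatus{TimeStamp: time.Now()}, nil
}

// cacheTimestamp chooses the timestamp to store alongside the status.
// The status timestamp is used only when Evented PLEG is in use AND the
// status pointer is non-nil; otherwise the relist time is used.
// Without the nil check, status.TimeStamp would panic on a nil status.
func cacheTimestamp(status *podStatus, relistTime time.Time, eventedPLEGInUse bool) time.Time {
	if eventedPLEGInUse && status != nil {
		return status.TimeStamp
	}
	return relistTime
}

func main() {
	relistTime := time.Now()

	// Simulate the failing runtime call seen in the log.
	status, err := getPodStatus(true)
	if err != nil {
		fmt.Println("PLEG: Write status:", err)
	}

	// status is nil here; the guard avoids dereferencing it.
	ts := cacheTimestamp(status, relistTime, true)
	fmt.Println("fell back to relist time:", ts.Equal(relistTime))
}
```

The point of the sketch is only that the error path logs and continues, so any later use of the status pointer needs its own nil check.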

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dims, pacoxu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 4, 2024
@k8s-ci-robot k8s-ci-robot merged commit 0babde6 into kubernetes:master Jan 4, 2024
14 checks passed
SIG Node PR Triage automation moved this from Needs Approver to Done Jan 4, 2024
@k8s-ci-robot k8s-ci-robot added this to the v1.30 milestone Jan 4, 2024
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
9 participants