Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check kubeClient against nil for reconciler #86795

Closed
wants to merge 1 commit into from

Conversation

tedyu
Copy link
Contributor

@tedyu tedyu commented Jan 3, 2020

What type of PR is this?
/kind bug

What this PR does / why we need it:
See discussion over #86722
When Kubelet is constructed, kubeDeps.KubeClient may be nil.
We should check kubeClient against nil in reconciler#updateDevicePath.

Which issue(s) this PR fixes:
Fixes #86722

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jan 3, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tedyu
To complete the pull request process, please assign jsafrane
You can assign the PR to them by writing /assign @jsafrane in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 3, 2020
Copy link
Contributor

@mattjmcnaughton mattjmcnaughton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

This change sounds wise to me - returning an informative error message is definitely better than panicking :)

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 3, 2020
@mattjmcnaughton
Copy link
Contributor

/retest

@mattjmcnaughton
Copy link
Contributor

/assign @jsafrane

@tedyu
Copy link
Contributor Author

tedyu commented Jan 3, 2020

/test pull-kubernetes-e2e-gce-csi-serial

@jingxu97
Copy link
Contributor

jingxu97 commented Jan 3, 2020

In kubelet, the following controllers uses KubeClient

  1. desired_state_of_world_populator
  2. reconciler
  3. volumePluginMgr

I think all of three could not work correctly if KubeClient is nil. Seems like only volumePluginMgr checks whether client is nil or not https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/volumemanager/volume_manager.go#L270

Maybe we should double check the logic around it.

@tedyu
Copy link
Contributor Author

tedyu commented Jan 4, 2020

@jingxu97
Are you suggesting that reconciler#Run() check rc.kubeClient != nil and only run if the kubeClient is not nil ?

@msau42
Copy link
Member

msau42 commented Jan 6, 2020

@kubernetes/sig-storage-pr-reviews

@k8s-ci-robot k8s-ci-robot added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label Jan 6, 2020
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 6, 2020
@tedyu tedyu changed the title Check kubeClient against nil in updateDevicePath Do not run reconciler loop if kubeClient is nil Jan 6, 2020
@jingxu97
Copy link
Contributor

jingxu97 commented Jan 6, 2020

@tedyu, one concern about kubelet standalone mode. I am not sure how it works in standalone mode? Desired_state_pupulator could not fetch any PVC/PV through kubeClient since it is nil?

@jingxu97
Copy link
Contributor

i am ok with current pr. maybe create an issue mentioning the kubeclient nil issue with those managers so we could at least document that things that are not working under standalone mode
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 13, 2020
@tedyu
Copy link
Contributor Author

tedyu commented Jan 14, 2020

/assign @msau42

Copy link
Member

@msau42 msau42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add unit tests that test the kubeclient nil case, so we can make sure that we don't regress in this area in the future?

@tedyu
Copy link
Contributor Author

tedyu commented Jan 14, 2020

@msau42
I have added a test which would cause panic without the fix:

	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0xb0 pc=0x2160510]

goroutine 68 [running]:
testing.tRunner.func1(0xc0004ac300)
	/Users/yute/homebrew/Cellar/go/1.13.4/libexec/src/testing/testing.go:874 +0x3a3
panic(0x22fc620, 0x342cc40)
	/Users/yute/homebrew/Cellar/go/1.13.4/libexec/src/runtime/panic.go:679 +0x1b2
k8s.io/kubernetes/pkg/kubelet/volumemanager/populator.(*desiredStateOfWorldPopulator).getPVCExtractPV(0xc0002700f0, 0x251c38e, 0x9, 0x251d4b9, 0xa, 0x0, 0xc000339498, 0xc000339548)
	/Users/yute/go-workspace/src/k8s.io/kubernetes/pkg/kubelet/volumemanager/populator/desired_state_of_world_populator.go:579 +0x30
k8s.io/kubernetes/pkg/kubelet/volumemanager/populator.(*desiredStateOfWorldPopulator).createVolumeSpec(0xc0002700f0, 0x252b39a, 0x15, 0x0, 0x0, 0xc0005426f0, 0x0, 0x0, 0x0, 0x0, ...)
	/Users/yute/go-workspace/src/k8s.io/kubernetes/pkg/kubelet/volumemanager/populator/desired_state_of_world_populator.go:478 +0x142
k8s.io/kubernetes/pkg/kubelet/volumemanager/populator.(*desiredStateOfWorldPopulator).processPodVolumes(0xc0002700f0, 0xc000468c00, 0xc000339bc8, 0xc000339b98)
	/Users/yute/go-workspace/src/k8s.io/kubernetes/pkg/kubelet/volumemanager/populator/desired_state_of_world_populator.go:318 +0x5ab
k8s.io/kubernetes/pkg/kubelet/volumemanager/populator.(*desiredStateOfWorldPopulator).findAndAddNewPods(0xc0002700f0)
	/Users/yute/go-workspace/src/k8s.io/kubernetes/pkg/kubelet/volumemanager/populator/desired_state_of_world_populator.go:211 +0x1d7

@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 14, 2020
@k8s-ci-robot
Copy link
Contributor

New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jan 14, 2020
@tedyu
Copy link
Contributor Author

tedyu commented Jan 14, 2020

/test pull-kubernetes-e2e-gce

@tedyu
Copy link
Contributor Author

tedyu commented Jan 15, 2020

@jingxu97 @msau42
I have updated the PR.

Please take another look.

@tedyu
Copy link
Contributor Author

tedyu commented Jan 15, 2020

/test pull-kubernetes-e2e-gce

@tedyu
Copy link
Contributor Author

tedyu commented Jan 16, 2020

@msau42
Do you have more comment ?

thanks

@tedyu
Copy link
Contributor Author

tedyu commented Jan 17, 2020

/test pull-kubernetes-kubemark-e2e-gce-big

@tedyu
Copy link
Contributor Author

tedyu commented Jan 17, 2020

@msau42
All tests passed.

@tedyu
Copy link
Contributor Author

tedyu commented Jan 21, 2020

@msau42
Please take another look.

@@ -599,6 +599,10 @@ func (rc *reconciler) reconstructVolume(volume podVolume) (*reconstructedVolume,

// updateDevicePath gets the node status to retrieve volume device path information.
func (rc *reconciler) updateDevicePath(volumesNeedUpdate map[v1.UniqueVolumeName]*reconstructedVolume) {
if rc.kubeClient == nil {
klog.Errorf("kubeClient is nil")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to add more context to this error message so we can search for the source more easily.

How common is this error in normal operation? Are we going to start flooding errors? cc @jingxu97 is this only called during reconstruction that happens once when kubelet starts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updateStates() is called by syncStates() which is called from reconciler#reconciliationLoopFunc() which is called from reconciler#Run()

This error shouldn't be common in normal operation.

kubeletPodsDir)
rec := recon.(*reconciler)
rec.kubeClient = nil
rec.updateDevicePath(nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't really testing the reconstruction path, but I guess we currently don't have any unit tests for reconstruction? @jingxu97 do you have any ideas?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - trying to find good way to test it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msau42
Can the test be added in a follow-on PR ?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 13, 2020
@k8s-ci-robot
Copy link
Contributor

@tedyu: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tedyu
Copy link
Contributor Author

tedyu commented Feb 13, 2020

In favor of #88136

@tedyu tedyu closed this Feb 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. priority/backlog Higher priority than priority/awaiting-more-evidence. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

reconciler: updateDevicePath() panic:invalid memory address or nil pointer dereference
7 participants