
[Flake] unit test TestNewManagerImplStartProbeMode #121451

Closed
aojea opened this issue Oct 23, 2023 · 3 comments · Fixed by #121494
Assignees
Labels
kind/flake: Categorizes issue or PR as related to a flaky test.
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
sig/node: Categorizes an issue or PR as relevant to SIG Node.

Comments

aojea (Member) commented Oct 23, 2023

Seen in #121445 (comment) and in #121407

No hits on https://storage.googleapis.com/k8s-triage/index.html?pr=1&test=TestNewManagerImplStartProbeMode

Stack trace:

E1023 13:40:22.589536   43369 goroutinemap.go:150] Operation for "/tmp/device_plugin1067413082/server.sock" failed. No retries permitted until 2023-10-23 13:40:23.0894569 +0000 UTC m=+0.753948700 (durationBeforeRetry 500ms). Error: RegisterPlugin error -- failed to get plugin info using RPC GetInfo at socket /tmp/device_plugin1067413082/server.sock, err: rpc error: code = Unimplemented desc = unknown service pluginregistration.Registration
--- FAIL: TestNewManagerImplStartProbeMode (0.01s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x28fb0ac]

goroutine 238 [running]:
testing.tRunner.func1.2({0x2b14fa0, 0x452d660})
	/usr/local/go/src/testing/testing.go:1545 +0x366
testing.tRunner.func1()
	/usr/local/go/src/testing/testing.go:1548 +0x630
panic({0x2b14fa0?, 0x452d660?})
	/usr/local/go/src/runtime/panic.go:920 +0x270
k8s.io/kubernetes/pkg/kubelet/cm/devicemanager.(*ManagerImpl).PluginDisconnected(0xc00068fd40, {0xc000157470, 0x14})
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/cm/devicemanager/manager.go:244 +0x46c
k8s.io/kubernetes/pkg/kubelet/cm/devicemanager/plugin/v1beta1.(*client).Disconnect(0xc0000d7ae0)
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/cm/devicemanager/plugin/v1beta1/client.go:106 +0x2d9
k8s.io/kubernetes/pkg/kubelet/cm/devicemanager/plugin/v1beta1.(*server).disconnectClient(0xc000714270?, {0xc000157470, 0x14}, {0x3232508, 0xc0000d7ae0})
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/cm/devicemanager/plugin/v1beta1/handler.go:86 +0x55
k8s.io/kubernetes/pkg/kubelet/cm/devicemanager/plugin/v1beta1.(*server).Stop.func1({0xc000157470, 0x14}, {0x3232508, 0xc0000d7ae0})
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/cm/devicemanager/plugin/v1beta1/server.go:120 +0x7a
k8s.io/kubernetes/pkg/kubelet/cm/devicemanager/plugin/v1beta1.(*server).visitClients(0xc000374d90, 0xc000763d38)
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/cm/devicemanager/plugin/v1beta1/server.go:186 +0x11f
k8s.io/kubernetes/pkg/kubelet/cm/devicemanager/plugin/v1beta1.(*server).Stop(0xc000374d90)
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/cm/devicemanager/plugin/v1beta1/server.go:119 +0x73
k8s.io/kubernetes/pkg/kubelet/cm/devicemanager.(*ManagerImpl).Stop(...)
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/cm/devicemanager/manager.go:315
k8s.io/kubernetes/pkg/kubelet/cm/devicemanager.cleanup(0x322a6f0?, {0x3251ca0, 0xc0006fb480}, 0x0?)
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/cm/devicemanager/manager_test.go:335 +0x42
k8s.io/kubernetes/pkg/kubelet/cm/devicemanager.TestNewManagerImplStartProbeMode(0x0?)
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/cm/devicemanager/manager_test.go:119 +0x126
testing.tRunner(0xc0007081a0, 0x2ef5a08)
	/usr/local/go/src/testing/testing.go:1595 +0x239

Nothing evident in the git history of the test itself, but it seems @pohly added it to detect races, so most probably something in the device manager has changed?

/sig node
/kind flake
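The trace shows PluginDisconnected panicking on a nil pointer while Stop iterates over connected clients, right after a plugin's GetInfo RPC failed during registration. A minimal sketch of that failure pattern and a guarded fix, using hypothetical names (manager, endpoint, pluginDisconnected) that are illustrative only, not the actual kubelet code:

```go
package main

import (
	"fmt"
	"sync"
)

// endpoint stands in for a per-plugin connection record; a nil entry
// models a plugin whose registration never completed (e.g. GetInfo failed).
type endpoint struct{ resourceName string }

type manager struct {
	mu        sync.Mutex
	endpoints map[string]*endpoint
}

// pluginDisconnectedUnsafe mirrors the crashing pattern: the map lookup
// returns the zero value (nil) for an unknown socket, and dereferencing
// it panics with a nil pointer dereference.
func (m *manager) pluginDisconnectedUnsafe(socket string) string {
	m.mu.Lock()
	defer m.mu.Unlock()
	ep := m.endpoints[socket] // nil if registration never succeeded
	return ep.resourceName    // would panic for an unknown socket
}

// pluginDisconnected guards the lookup: a missing or nil endpoint is
// reported via the bool instead of being dereferenced.
func (m *manager) pluginDisconnected(socket string) (string, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	ep, ok := m.endpoints[socket]
	if !ok || ep == nil {
		return "", false
	}
	return ep.resourceName, true
}

func main() {
	m := &manager{endpoints: map[string]*endpoint{
		"/tmp/ok.sock": {resourceName: "example.com/dev"},
	}}
	name, ok := m.pluginDisconnected("/tmp/ok.sock")
	fmt.Println(name, ok) // example.com/dev true
	name, ok = m.pluginDisconnected("/tmp/missing.sock")
	fmt.Printf("%q %v\n", name, ok) // "" false
}
```

The guarded variant matches the shape of fix one would expect for this trace: tolerate a disconnect callback for a plugin that never finished registering, rather than assuming per-plugin state always exists.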

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. kind/flake Categorizes issue or PR as related to a flaky test. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 23, 2023
k8s-ci-robot (Contributor) commented:

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

wzshiming (Member) commented:

/assign

pacoxu (Member) commented Oct 24, 2023

/cc @kubernetes/ci-signal
as this flakes a lot on https://testgrid.k8s.io/sig-release-master-blocking#ci-kubernetes-unit
