Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic in TestCancelAndReadd #45375

Closed
enj opened this issue May 4, 2017 · 7 comments
Closed

Panic in TestCancelAndReadd #45375

enj opened this issue May 4, 2017 · 7 comments
Assignees

Comments

@enj
Copy link
Member

enj commented May 4, 2017

TestCancelAndReadd has a race condition that causes a panic due to:

  1. A very short wait-to-start-time of 100 * time.Millisecond which causes wg.Done() to be called too many times
  2. Not using a fake clock such as the code in https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/util/queue/work_queue_test.go
=== RUN   TestCancelAndReadd
panic: sync: negative WaitGroup counter

goroutine 320 [running]:
panic(0x23a2620, 0xc420240440)
	/usr/local/go/src/runtime/panic.go:500 +0x1ae
sync.(*WaitGroup).Add(0xc420218230, 0xffffffffffffffff)
	/usr/local/go/src/sync/waitgroup.go:75 +0x27c
sync.(*WaitGroup).Done(0xc420218230)
	/usr/local/go/src/sync/waitgroup.go:100 +0x42
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/node.TestCancelAndReadd.func1(0xc4202bfa40, 0x1, 0xc400000020)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/node/timed_workers_test.go:118 +0x65
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/node.(*TimedWorkerQueue).getWrappedWorkerFunc.func1(0xc4202bfa40, 0x0, 0x0)
	github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/node/_test/_obj_test/timed_workers.go:90 +0xb6
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/node.CreateWorker.func1()
	github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/node/_test/_obj_test/timed_workers.go:49 +0x72
created by time.goFunc
	/usr/local/go/src/time/sleep.go:154 +0x78

cc @gmarek @davidopp @timothysc @liggitt @stevekuznetsov

xref: #41311

@gmarek
Copy link
Contributor

gmarek commented May 4, 2017

cc @smarterclayton

@erictune
Copy link
Member

erictune commented May 4, 2017

@yujuhong you wrote this test. Can you take a look?
@kubernetes/sig-node-bugs

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label May 4, 2017
@gmarek
Copy link
Contributor

gmarek commented May 4, 2017

No, I think it's mine. I'll take a look tomorrow. The quickest "fix" would be to increase the timeout.

@yujuhong yujuhong removed the sig/node Categorizes an issue or PR as relevant to SIG Node. label May 4, 2017
@grodrigues3
Copy link
Contributor

@yujuhong we're trying to make sure all issues have associated sigs. If you remove a sig label, can you add the correct sig/label.

@gmarek
Copy link
Contributor

gmarek commented May 4, 2017

I'll fix this tomorrow - no need to apply labels here.

@yujuhong
Copy link
Contributor

yujuhong commented May 4, 2017

@yujuhong we're trying to make sure all issues have associated sigs. If you remove a sig label, can you add the correct sig/label.

I honestly don't know what the correct sig is for controllers.

k8s-github-robot pushed a commit that referenced this issue May 5, 2017
Automatic merge from submit-queue (batch tested with PRs 43732, 45413)

Extend timeouts in timed_workers_test

Fix #45375

If it won't be enough I'll rewrite it to allow injectable timers.
@enj
Copy link
Member Author

enj commented Aug 31, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants