-
Notifications
You must be signed in to change notification settings - Fork 38.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
unit flake: k8s.io/kubernetes/pkg/storage (specifically storage.TestList) #19254
Comments
Just ran this 10k times. Could not reproduce. Ran like: > godep go test -c -race
> stress -p 20 ./storage.test -test.run TestList |
It's possible that #19187 fixed this, but sadly running tests locally doesn't seem to be a good way to reproduce failures. Fact that tests were passing 100% when run locally was the sad reality with other tests as well. Sometimes adding |
This looks like golang/go#12262. Fixed in go1.6 |
@mikedanese nice sleuthing! It sounds like we should just add a temporary work-around to the test (eliminate timeouts) until we upgrade to go 1.6 (which might take quite some time). Q |
Or should we downgrade to 1.4 and then upgrade directly from 1.4 to 1.6? |
It's a test only bug. I'd still want 1.5 coverage as it exposed many real Perhaps we build our own toolchain with a patch for our Jenkins instance?
|
@ixdy would it be possible to run our jenkins jobs with a patched go 1.5 toolchain? |
@krousey I don't think that it's a good idea to build and run a custom go platform. Other developers will be using a standard go distro, and we want their tests to pass too :-) |
My apologies, I'm wrong. Go 1.6 is scheduled for release on Feb 1 2016. https://groups.google.com/forum/#!topic/golang-dev/vNboccLL95c But even so, I think that getting things working right on go 1.5 makes sense. The required test workaround seems pretty trivial. |
@quinton-hoole as it would only affect a test package, and therefore only our tests, I don't see the harm in it. We could even provide the patch if they wanted less flaky tests too. But if there's a simple workaround (I didn't see one), that would definitely be preferred. |
The workaround would be to comment out all the calls to ts.Close() and leave a note to uncomment them when we stop supporting 1.5. The patch would not require rebuilding the toolchain, it's would only require applying against $GO_ROOT. It's a decently sized patch though. https://go-review.googlesource.com/#/c/15151/15/src/net/http/httptest/server.go Workaround sounds easier |
I can do the workaround. Here are all the files where we call
|
@davidopp. I'd suggest using go oracle |
golang/go#12262 . See kubernetes#19254 for more details. This change should be reverted when we upgrade to Go 1.6.
Not sure if it's the same cause, but my PR just ran into a unit test failure where
The complete log: https://gist.github.com/yujuhong/28a12bbaf5a1e27c757f |
Most recent |
We're still being hit by this. Most recently, k8s.io/kubernetes/contrib/mesos/pkg/scheduler/integration:
|
@ixdy those don't look like the race in httptest. Those look like other issues. |
@nikhiljindal any more thoughts considering that whatever this was, #19458 didn't fix it? |
No. Maybe @wojtek-t has a guess on any recent change that could have affected this. |
@nikhiljindal it looks like it is stuck here: https://github.com/kubernetes/kubernetes/blob/master/pkg/storage/watch_cache.go#L199 Which doesn't look like the httptest.Close race at all to me. It may not be the case that anything has changed here recently. I'm worried that we might have a rare deadlock, that locking is tricky. Have you tried running with the 'stress' program @krousey told us about? Maybe the entire test suite needs to be run, maybe another test in it messes something up? It would be good to figure this out. |
I ran
and it hasnt failed in the last 1k runs. Will run it a bit longer and try running the whole suite as well (to see if some other test is affecting this). |
Thanks for trying! |
I ran it more over the weekend Just the TestList test:
The whole suite:
So the test seems to have been fixed for sure. |
OK. I think your desktop is not going to be able to reproduce the failures, but I'm not convinced another system won't... ;) Let's close this, and if we see another instance we will just have to put our detective hats on and study the log/stack trace. |
golang/go#12262 . See kubernetes#19254 for more details. This change should be reverted when we upgrade to Go 1.6.
Automatic merge from submit-queue (batch tested with PRs 56308, 54304, 56364, 56388, 55853). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. httptest server should be close since Close issue has been fixed **What this PR does / why we need it**: per #19254, the issue seem to be fix for a long time and `server.Close` is no longer a issue in current related golang version, so it's time to uncomment the server.Close(). **Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*: Fixes # None **Special notes for your reviewer**: **Release note**: ```release-note None ```
Looks like a dupe of #18928
cc @kubernetes/goog-control-plane
The text was updated successfully, but these errors were encountered: