"kubectl exec" sometimes incorrectly returns empty string causing tests to flake #34256
Comments
I have previously investigated and fixed multiple issues where If anyone can provide steps for how to reproduce this easily, or an actual cluster where this is flaking, I will be extremely happy and willing to debug. |
@ncdc I'm pasting the steps that @jingxu97 gave me to reproduce it. They work for me.
|
Can you reproduce without the PD? On Friday, October 7, 2016, Mengqi Yu notifications@github.com wrote:
|
I was finally able to get this to reproduce. It took close to 5000 |
Actually, we haven't tried that yet. Mengqi, could you also try creating a pod Jing On Fri, Oct 7, 2016 at 10:16 AM, Andy Goldstein notifications@github.com
|
This flake (#27023) that checks for the contents of In this case, the test POSTs an |
I am running in an endless loop with spdystream debugging enabled, but so far it hasn't failed. I'm amazed this seems to happen as frequently as it does in Jenkins tests but I'm not able to make it fail quickly. |
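A reproduction loop of the kind described here can be sketched as follows. This is only an illustration, not the script used in this thread: `iterations_until_empty` is a hypothetical helper, and the command under test is passed in as a callable so that the loop itself does not depend on any particular cluster, pod, or file.

```python
from typing import Callable, Optional


def iterations_until_empty(run: Callable[[], str], max_iterations: int) -> Optional[int]:
    """Call run() repeatedly and report when it first returns an empty string.

    Returns the 1-based iteration number of the first empty result, or None
    if every one of the max_iterations calls returned some output.
    """
    for iteration in range(1, max_iterations + 1):
        if run() == "":
            return iteration
    return None
```

In practice `run` would wrap something like `subprocess.run(["kubectl", "exec", pod, "--", "cat", path], capture_output=True, text=True).stdout`, where `pod` and `path` are placeholders for a pod and a file that is known to be non-empty. A bounded `max_iterations` is used instead of a truly endless loop so that the run always terminates.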
Finally got it to flake with spdystream debugging on. Here's normal behavior in the kubelet:
And here's the flake:
Stream 1 is an "error" stream where the kubelet can report internal errors. Stream 3 is stdout from the container. Stream 5 is stderr from the container. In the normal log, we see Now to figure out why stdout from the container is getting lost. Could be a bug in docker. Could be a bug in how we invoke |
hmm, exec has a lot of moving parts making it hard to debug, can you avoid the flake in the test? If you're execing something, do it in a retry loop. If we need better guarantees, we should replace kubectl exec with something like ssh+nsenter (random idea, ssh will also flake but fail cleanly with an error). I don't see how we can provide stability guarantees when it's built on docker exec which the docker people recommend only using for debug. |
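The retry-loop suggestion above could look roughly like the sketch below. It assumes that the command is expected to produce non-empty output, so an empty string is treated as a flake and retried; that assumption does not hold for every test. The names `run_command` and `exec_with_retry` are hypothetical and are not part of the Kubernetes test framework.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_command(argv: Sequence[str]) -> str:
    """Run a command and return its stdout; raises on a non-zero exit code."""
    result = subprocess.run(list(argv), capture_output=True, text=True, check=True)
    return result.stdout


def exec_with_retry(
    run: Callable[[], str],
    attempts: int = 5,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call run() until it returns non-empty output or attempts run out.

    Both exceptions and empty (whitespace-only) output count as failures.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            output = run()
        except Exception as err:  # a failed exec is retried like an empty one
            last_error = err
        else:
            if output.strip():
                return output
            last_error = RuntimeError("empty output on attempt %d" % attempt)
        if attempt < attempts:
            sleep(delay_seconds)
    raise RuntimeError("no output after %d attempts" % attempts) from last_error
```

A call might look like `exec_with_retry(lambda: run_command(["kubectl", "exec", "my-pod", "--", "cat", "/some/file"]))`, where the pod name and file path are placeholders. Note that this only hides the flake from the test; it does not help tests whose purpose is to verify exec itself.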
@bprashanth while you could do that for the PD tests, we have tests specifically for The big news which I just established is this appears to be a bug in docker in the gci image. I ran the following and it flaked on me:
|
@runcom are you aware of anything in upstream docker 1.11.2 where |
@kubernetes/sig-node for clarity on the suspect docker issue mentioned in #34256 (comment) |
@ncdc I'm not sure, I remember something but can't check right now (afk). I will later today. |
/cc @vishh @mtaufen @dchen1107 |
sweet |
@saad-ali, I commented upstream. Is it correct that we'll need a backport to the 1.11.x and 1.12.x series unless we plan to bump to docker 1.13? That bump seems somewhat risky, but we should probably do something about all these CI flakes that are possibly related to this issue. |
Good question. @kubernetes/sig-node @vishh @dchen1107 what's the plan for this? Is it a 1.5 blocker? |
Docker 1.13 isn't out yet. Unfortunately the best fix for this particular issue at this time is probably to ensure that all the PD tests that use |
Guess we need to punt this until docker is ready with the fixes we need |
@ncdc It seems that the upstream issue moby/moby#27289 has been fixed. Please confirm. Can we close this issue? |
@ymqytw no, we can't close this issue until everyone moves to docker 1.13+. |
I see. Then we should remove this issue from the 1.6 milestone. |
Automatic merge from submit-queue
Retry calls to ReadFileViaContainer in PD tests
**What this PR does / why we need it**: kubectl exec occasionally fails to return a valid output string. It seems to be an issue with docker #34256. This PR retries the 'kubectl exec' call to workaround the issue. This should fix the flaky PD test issues.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #28283
**Release note**: NONE
Issues go stale after 90d of inactivity. Prevent issues from auto-closing with an If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or |
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or |
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
@jingxu97 noticed (#28081 (comment)) that a lot of the `Pod disks should...` PD E2E tests are very flaky on GCI because `kubectl exec` is incorrectly returning an empty string. Jing's report:
CC @pwittrock
Step two is a good way to repro this.