
infrequent docker exec .... signal: broken pipe during kubeadm init | kubeadm join #949

Closed
BenTheElder opened this issue Oct 14, 2019 · 21 comments
Labels: kind/bug, lifecycle/active, priority/important-soon

@BenTheElder
Member

BenTheElder commented Oct 14, 2019

So far I've not once seen this locally, but we hit it occasionally in CI, maybe a few times a day across many, many runs.

/assign
/lifecycle active

Examples

https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/83914/pull-kubernetes-e2e-kind/1183813161560576002

#928 (comment)

kubernetes/kubernetes#83903

@BenTheElder BenTheElder added the kind/bug label Oct 14, 2019
@k8s-ci-robot k8s-ci-robot added the lifecycle/active label Oct 14, 2019
@BenTheElder
Member Author

per the stack trace, this is definitely failing in LocalCmd.Run(), so we're seeing a signal: broken pipe from docker exec ....

So far I've found these exclusively with kubeadm init.

however in this example: https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-kind-conformance-ipv6/1182717569350504448

you can also see ERROR: command "docker exec --privileged kind-control-plane tar --hard-dereference -C /var/log -chf - ." failed with error: write |1: broken pipe from kind export logs

everything else I can find involves kubeadm init
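
For reference, a minimal Go sketch of what a "signal: broken pipe" error from a wrapped command means (this is not kind's code; `yes` is just a stand-in for the chatty docker exec child): the child was killed by SIGPIPE because whoever was reading its output went away, rather than exiting with a failure of its own.

```go
// Minimal sketch, not kind's code: closing the read end of the child's stdout
// pipe while it is still writing kills it with SIGPIPE, and cmd.Wait()
// reports exactly the "signal: broken pipe" string seen in these failures.
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"syscall"
)

func main() {
	cmd := exec.Command("yes") // stand-in for "docker exec ... kubeadm init"
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		panic(err)
	}
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	stdout.Close() // the reader disappears while the child is still writing

	waitErr := cmd.Wait()
	fmt.Println(waitErr) // "signal: broken pipe"

	// The error is a signal, not a normal non-zero exit from the child.
	var exitErr *exec.ExitError
	if errors.As(waitErr, &exitErr) {
		ws := exitErr.Sys().(syscall.WaitStatus)
		fmt.Println("killed by signal:", ws.Signaled(), ws.Signal())
	}
}
```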

@BenTheElder
Member Author

another example kubernetes/kubernetes#83956 (comment)

@BenTheElder
Member Author

Our Docker is fairly old; see kubernetes/test-infra#14784.

@BenTheElder
Member Author

Still trying to get us onto newer Docker. A couple of test-infra PRs are pending.

@BenTheElder
Member Author

Not much progress today -- infra WG this morning and then mitigating, debugging, and dealing with kubernetes/test-infra#14812.

I now have kubernetes/test-infra#14820 to experiment with newer Docker in a slightly streamlined, kind-CI-specific image.

@BenTheElder
Member Author

https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-kind-conformance-parallel/1184482589986000898

Here's an example with kubeadm join:

W1016 15:02:01.370] ERROR: failed to create cluster: failed to join node with kubeadm: command "docker exec --privileged kind-worker kubeadm join --config /kind/kubeadm.conf --ignore-preflight-errors=all --v=6" failed with error: signal: broken pipe

@BenTheElder
Member Author

https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/46662/pull-kubernetes-e2e-kind-canary/1184732113933438982

kind is now experimentally on Docker 19.03.X on Debian buster; will follow up in the morning.
We can see whether these flakes continue under the more recent Docker...

@BenTheElder BenTheElder changed the title infrequent docker exec .... signal: broken pipe during kubeadm init infrequent docker exec .... signal: broken pipe during kubeadm init | kubeadm join Oct 17, 2019
@BenTheElder
Member Author

~everything in kubernetes CI should be on 19.03.X now; we'll have to wait and see whether we continue to get these.

@neolit123
Member

~everything in kubernetes CI should be on 19.03.X now

did test-infra move to 19.03 already?

@BenTheElder
Member Author

I moved the test-infra images, yes.

@neolit123
Member

ok, thanks for letting me know.
we held off on moving the kubeadm validators and k/k/build/dependencies.yaml because, according to TimSC, not all the popular distros have 19.x in their package managers yet.

but i guess we should add it as "verified" soon.

@BenTheElder
Member Author

@neolit123 we're not running kubeadm / docker against that. The DinD image is on 19.03 (kubekins-e2e and kind's KRTE), but the hosts are on whatever the hosts are on, and the kind nodes run whatever the kind nodes run.

@BenTheElder
Member Author

too many layers :-)

@neolit123
Member

too many layers :-)

indeed. :)

just wondering when...

kinder doesn't have kind node images with docker 19.03 yet; maybe that's the switching point for k/k/build/dependencies.yaml and updating the kubeadm validators.

@BenTheElder
Member Author

@neolit123 I wouldn't read much into us using 19.03 to host the nodes; someday it might be podman or ignite, so it won't reflect much on qualifying with kubeadm. It was mostly a shot in the dark regarding the stability issues.

I think there's a small chance the root cause of #971 is related here: the Go program would get a broken pipe signal if the internal pipe is closed after the internal io.Copy hits an error. So far I've not identified a path where we'd be triggering this, though (versus the panic).
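
A hedged sketch of that hypothesized mechanism (assumptions only; `failingWriter` and the use of `yes` are illustrative stand-ins, not kind's implementation): if the writer that io.Copy targets errors, the copy stops, the pipe is closed, and the still-running child takes SIGPIPE on its next write, so the wrapper reports a broken pipe even though the real trigger was the copy error.

```go
// Hedged sketch of the hypothesized #971 failure mode, not kind's code.
package main

import (
	"errors"
	"fmt"
	"io"
	"os/exec"
)

// failingWriter is a hypothetical stand-in for an output sink that breaks mid-stream.
type failingWriter struct{}

func (failingWriter) Write(p []byte) (int, error) {
	return 0, errors.New("simulated sink failure")
}

func main() {
	cmd := exec.Command("yes") // stand-in for the real "docker exec ..." command
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		panic(err)
	}
	if err := cmd.Start(); err != nil {
		panic(err)
	}

	_, copyErr := io.Copy(failingWriter{}, stdout) // fails on the first write
	stdout.Close()                                 // reader goes away; child now hits EPIPE

	fmt.Println("copy error:", copyErr)    // simulated sink failure
	fmt.Println("wait error:", cmd.Wait()) // signal: broken pipe
}
```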

@BenTheElder
Member Author

We haven't had one since the patch for #971 went in. However, I want to wait a bit longer before calling this fixed.

@BenTheElder BenTheElder added this to the v0.6.0 milestone Oct 21, 2019
@BenTheElder BenTheElder added the priority/important-soon label Oct 22, 2019
@BenTheElder
Member Author

BenTheElder commented Oct 22, 2019

We are, however, seeing this with kind export logs:

W1021 15:48:19.198] ERROR: command "docker exec --privileged kind-worker sh -c 'tar --hard-dereference -C /var/log -chf - . || (r=$?; [ $r -eq 1 ] || exit $r)'" failed with error: write |1: broken pipe

https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-kind-conformance-latest-1-12/1186286022304993281

Hopefully unrelated; need to investigate.

EDIT: traced the code; we prefer returning the error from the process over the error from the reader, so it's likely we're seeing this because the reader errored, which would not surprise me given the current untar routine... filed #992 to debug.
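
To illustrate that precedence, here is a minimal sketch (again an assumption-laden stand-in with a hypothetical `runPiped` helper and `badConsumer`, not kind's exec code): when both the child process and the output consumer fail, the process error is returned, so a consumer failure can surface as a broken-pipe error from the command while the original cause stays hidden.

```go
// Minimal sketch of process-error-over-reader-error precedence; not kind's code.
package main

import (
	"errors"
	"fmt"
	"io"
	"os/exec"
)

// runPiped streams cmd's stdout into consume and prefers the command's error.
func runPiped(cmd *exec.Cmd, consume func(io.Reader) error) error {
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return err
	}
	if err := cmd.Start(); err != nil {
		return err
	}
	readErr := consume(stdout)
	stdout.Close()        // consumer is done (or failed); the child may now hit EPIPE
	procErr := cmd.Wait() // e.g. "signal: broken pipe"

	if procErr != nil { // the process error wins and hides readErr
		return procErr
	}
	return readErr
}

func main() {
	// Hypothetical consumer that gives up mid-stream, like a failing untar routine.
	badConsumer := func(r io.Reader) error {
		buf := make([]byte, 1024)
		if _, err := r.Read(buf); err != nil {
			return err
		}
		return errors.New("consumer failed mid-stream")
	}
	err := runPiped(exec.Command("yes"), badConsumer)
	fmt.Println(err) // broken pipe from the process, not "consumer failed mid-stream"
}
```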

@BenTheElder
Member Author

We've not had any of these creation failures since #971, granted it hasn't been a particularly long time.

Tentatively closing, but still monitoring.

Will file a new issue for the untar issues; they don't appear to be related.

@BenTheElder
Member Author

still haven't identified another one since that fix.

@BenTheElder
Member Author

Still no signs of this; I think we're in the clear on this one.
#999 with the log export is hopefully fixed now, but I'm still tracking it to ensure the fix indeed works as expected.
