Build error: dial tcp: i/o timeout #5796
Comments
Can you please provide the full build logs? Even better would be verbose build logs. If you docker run an image on your node, does it have network connectivity? Can you also provide the logs from your failed deploy? Is this a multi-node or single-node deployment?
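For reference, a quick way to check outbound connectivity from a container on the node looks something like this (a minimal sketch; busybox and centos:7 are just example images, and github.com is just an example target):
# check DNS resolution from a throwaway container on the node
docker run --rm busybox nslookup github.com
# check outbound HTTPS from a container (should print an HTTP status line)
docker run --rm centos:7 curl -sI https://github.com | head -n 1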
This is what I get; I use 1 master and 1 node. The registry and the router were fine during the deployment.
So, I've tried a minimal install with CentOS 7 and Fedora 21 server images using the "advanced installation" method, and both fail. This morning I gave the fedora-cloud-21 image a try and everything works fine.
Thanks for the update, sounds like an issue (or config/usage issue) in the installer then.
Also seeing this behaviour when following the Advanced Install method to spin up a dev cluster.
@bparees Any idea what the builder is trying to communicate with at the moment it gets the TCP timeout error? Without knowing that, tracking down the installation/configuration issue may be difficult.
It's generally trying to do the git clone from the repo, so probably an HTTPS call to GitHub.
This is often indicative of a networking issue with the SDN that is preventing external access.
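One way to confirm whether that clone path is reachable from inside the cluster (a sketch; <pod> is a placeholder for any running pod in the affected project, and the image used must have curl and getent available):
# does the git host resolve from inside a pod?
oc exec <pod> -- getent hosts github.com
# does an HTTPS request make it out through the SDN?
oc exec <pod> -- curl -sI https://github.com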
Is there any more information about this? I'm seeing it for the first time on a vanilla installation with RHEL 7.
Piling on here. Receiving this error when following the OpenShift training, specifically on the Sinatra lab:
Same error when trying to create a phpcake app.
Since we migrated our test cluster (3 nodes, 2 etcd) from 1.0.7 to 1.1.0.1 we cannot build because of this issue:
@eparis can someone from the networking team help out here?
I often have the same issue, both from a fresh install and on a running platform hosted on AWS. I tried:
systemctl stop atomic-openshift-node
rm -rf /run/openshift-sdn
systemctl stop docker
systemctl restart iptables
systemctl restart openvswitch
systemctl start atomic-openshift-node
But that did not work. I was thinking it might be related to the MTU, so I tried to decrease the MTU, but the issue appears again. The only workaround is to reboot my instances each time it appears. If someone has an idea, please share ;)
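If the MTU is a suspect, the value the SDN uses is set in the node configuration rather than on the host interface; a sketch, assuming the config lives at /etc/origin/node/node-config.yaml (the path differs between Origin and OSE installs):
# show the SDN network settings; the overlay default is mtu: 1450
grep -A3 networkConfig: /etc/origin/node/node-config.yaml
# after lowering the mtu value, restart the node service so openshift-sdn picks it up
systemctl restart atomic-openshift-node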
@danwinship Can you take a look here?
Can someone seeing this bug try running https://raw.githubusercontent.com/openshift/openshift-sdn/master/hack/debug.sh and then upload the output somewhere? You need to run it from a host that has a valid KUBECONFIG and that can ssh as root to the master and each node.
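For anyone gathering that data, something along these lines should work (a sketch; the admin.kubeconfig path is an assumption, and the script may take additional arguments, so check its header):
# fetch the SDN debug script and run it with a valid admin kubeconfig
curl -O https://raw.githubusercontent.com/openshift/openshift-sdn/master/hack/debug.sh
chmod +x debug.sh
KUBECONFIG=/etc/origin/master/admin.kubeconfig ./debug.sh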
@danwinship seeing the same error with OSE 3.1. Made sure pods have external connectivity by pinging GitHub and the like.
@liggitt @danwinship do you think #6418 is related to this problem?
No, #6418 was shortening an excessively long dial timeout solely for the image import controller (used to import image stream tags from a Docker registry). It has no bearing on builds or any other dial timeouts.
Same problem here, using the latest Origin build deployed with Ansible on CentOS 7.1. The git clone command ended in a somewhat strange way:
bash-4.2$ git clone https://github.com/openshift/training.git
I also noticed that I have no SkyDNS pod installed by default, which I was expecting to be present; should I? Name resolution takes about 10s. Could that be a problem? What is also strange: the log says it is a timeout, but there are only a few milliseconds since the previous log lines, whereas in case of a timeout I would expect the log entry to appear after some pause.
Ignore the two "removing directory" lines; they presumably happen after the timeout occurs but before it gets logged. So the timeout is 16 seconds... if it needed to do two DNS lookups in there, and DNS lookups are taking 10 seconds each, then that might be the problem.
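An easy way to see whether slow lookups are eating that budget is to time resolution from inside a pod (a sketch; <pod> is a placeholder and the image needs bash and nslookup):
# two lookups at ~10s each would blow a ~16s dial timeout
oc exec <pod> -- bash -c 'time nslookup github.com'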
Can master reach the pods by both name and IP? Can pods reach other pods by name and IP? If so, then yeah, definitely seems non-SDN-related.
I have the same issue here and I noticed that the DNS configuration of the container is weird:
If I quickly exec into the container and try a curl, DNS resolution does not work.
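To see whether it is the configured resolver itself that is unreachable, it can help to dump the container's resolver config and then query past it (a sketch; <pod> and the 8.8.8.8 fallback are just examples):
# what nameservers and search domains did the container get?
oc exec <pod> -- cat /etc/resolv.conf
# query via the configured nameserver, then bypass it with an external one
oc exec <pod> -- nslookup github.com
oc exec <pod> -- nslookup github.com 8.8.8.8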
So service IP addresses are failing; this is probably the same bug as openshift/openshift-sdn#231
@xelfe can you check your openshift-master logs:
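In case it helps, on an OSE install those logs are usually available via journald (a sketch; the unit name atomic-openshift-master is an assumption and differs on Origin):
# recent master logs
journalctl -u atomic-openshift-master --since "1 hour ago"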
I realised that I was facing the following error:
In my setup the issue was related to DNS. The Kubernetes DNS was not starting correctly because of another dnsmasq instance on the master that was being run to serve the nodes of the cluster. After moving that DNS server to another (non-cluster) node, things started working well.
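For anyone hitting the same conflict, a quick way to check whether something other than the master's built-in DNS already owns port 53 on the master (a sketch):
# who is listening on port 53?
ss -tulpn | grep ':53'
# is a stray dnsmasq running on the master?
systemctl status dnsmasq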
Actually it looks like that is something different; this bug was openshift/openshift-sdn#236, which is now fixed in Origin via #6532, so this can be closed.
I have gotten this on any build/deploy since my fresh install this weekend:
build:
cleanup.go:23] Removing temporary directory /tmp/s2i-build539997888
fs.go:99] Removing directory '/tmp/s2i-build539997888'
builder.go:55] Build error: dial tcp: i/o timeout
event:
Error syncing pod, skipping: failed to delete containers ([exit status 1])
I tried with CentOS 7.1 and Fedora 21 as the host and I always get the same result. I've used the Ansible deployment and followed each step of the "advanced installation". I never had this problem before last week.