Dashboard "dial tcp i/o timeout" error _possibly_ due to Weave networking #1246

Closed
itskingori opened this Issue Dec 23, 2016 · 7 comments


itskingori commented Dec 23, 2016

Versions of kops:

Installed from HEAD because I want the private networking changes.

$ kops version
Version git-c15466f

How I set my cluster up:

I ran the command below and waited a while for all instances behind the ELBs to be InService.

$ kops create cluster \
  --name=kubernetes.example.com \
  --state=s3://example.production.kubernetes \
  --bastion=true \
  --cloud=aws \
  --dns-zone=kubernetes.example.com \
  --image=293135079892/k8s-1.4-debian-jessie-amd64-hvm-ebs-2016-11-16 \
  --kubernetes-version=v1.4.6 \
  --master-zones=us-east-1c,us-east-1d,us-east-1e \
  --master-size=m4.large \
  --networking=weave \
  --node-size=m4.large \
  --node-count=3 \
  --network-cidr=192.168.0.0/16 \
  --out=. \
  --ssh-public-key=~/.ssh/kubernetes_rsa.pub \
  --target=terraform \
  --topology=private \
  --zones=us-east-1c,us-east-1d,us-east-1e
$ terraform plan
$ terraform apply
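
For the record, the "instances behind the ELBs are InService" check can be done roughly like this from the AWS CLI (the load balancer name below is a placeholder; kops generates its own names):

$ aws elb describe-load-balancers \
    --query 'LoadBalancerDescriptions[].LoadBalancerName'
$ aws elb describe-instance-health \
    --load-balancer-name api-kubernetes-example-com \
    --query 'InstanceStates[].[InstanceId,State]' \
    --output table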

How I set up the dashboard and the version:

Obviously I did the kubectl proxy --port=8080 thing and ran the below command (which installs v1.4.2 of the Kubernetes dashboard).

$ kubectl create -f https://raw.githubusercontent.com/kubernetes/dashboard/v1.4.2/src/deploy/kubernetes-dashboard.yaml
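
With the proxy running, the dashboard is then reachable via the apiserver's service proxy path (the same path that shows up in the error and the apiserver log further down):

$ curl -s http://localhost:8080/api/v1/proxy/namespaces/kube-system/services/kubernetes-dashboard/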

Problem:

Dashboard works but I intermittently get the below error:

Error: 'dial tcp 10.43.128.0:9090: i/o timeout'
Trying to reach: 'http://10.43.128.0:9090/'

Ps: kubectl never times out. Ever. Just the dashboard.

Ps 2: I intend to use this for stuff that's semi-serious, haven't gotten round to putting other services on there. Yet. I want to solve this first.

Ps 3: I didn't have this issue when using the old networking (kubenet with public subnets) on the older version of kops.

What do other people say about this issue?

Nothing stands out to me except this comment and this one. The commenter mentions that "Source/Dest check was enabled on my minions, even though it shouldn't have been" (which we do, by the way). Some of the other results in my search don't seem worth looking at (IMHO, of course).
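
For reference, here's roughly how one could verify (and, if necessary, clear) the Source/Dest check on a node from the AWS CLI (the instance ID below is a placeholder):

$ aws ec2 describe-instance-attribute \
    --instance-id i-0123456789abcdef0 \
    --attribute sourceDestCheck
$ aws ec2 modify-instance-attribute \
    --instance-id i-0123456789abcdef0 \
    --no-source-dest-check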

What do I think?

The networking that kops generates looks 👌 (I thought it had something to do with networking and security groups) ... but what I'm not sure about is the Weave part.

To add to the Weave theory, the timeout is on 10.43.128.0, which I presume is an IP allocated by Weave (conjecture). Admittedly, I don't know how to debug Weave. 😢
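
One way to sanity-check that conjecture (a sketch; the pod name will differ) is to compare the address in the error against the dashboard pod's IP and the service's endpoints:

$ kubectl get pods --namespace=kube-system -o wide | grep kubernetes-dashboard
$ kubectl get endpoints kubernetes-dashboard --namespace=kube-system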

Looking forward to your opinions. Feel free to close the issue if it's not a kops fault.

What do the logs say?

Every time I hit the error I get this (see below) in one of the masters' /var/log/kube-apiserver.log (I tailed the sucker).

I1223 06:01:46.565035       6 handlers.go:162] GET /api/v1/proxy/namespaces/kube-system/services/kubernetes-dashboard/: (30.00136179s) 503
goroutine 408448 [running]:
k8s.io/kubernetes/pkg/httplog.(*respLogger).recordStatus(0xc82200ae00, 0x1f7)
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/httplog/log.go:219 +0xa3
k8s.io/kubernetes/pkg/httplog.(*respLogger).WriteHeader(0xc82200ae00, 0x1f7)
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/httplog/log.go:198 +0x2b
k8s.io/kubernetes/pkg/apiserver/metrics.(*responseWriterDelegator).WriteHeader(0xc823a5a8a0, 0x1f7)
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/metrics/metrics.go:117 +0x4c
net/http/httputil.(*ReverseProxy).ServeHTTP(0xc8239db340, 0x7f3f7ce25400, 0xc821119e50, 0xc822c77b20)
	/usr/local/go/src/net/http/httputil/reverseproxy.go:233 +0xcd1
k8s.io/kubernetes/pkg/apiserver.(*ProxyHandler).ServeHTTP(0xc8204c80c0, 0x7f3f7ce25400, 0xc821119e50, 0xc822c77a40)
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/proxy.go:205 +0x2984
k8s.io/kubernetes/pkg/apiserver.routeFunction.func1(0xc823a5a7b0, 0xc821da82a0)
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/api_installer.go:886 +0x57
k8s.io/kubernetes/pkg/apiserver/metrics.InstrumentRouteFunc.func1(0xc823a5a7b0, 0xc821da82a0)
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/metrics/metrics.go:101 +0x25f
k8s.io/kubernetes/vendor/github.com/emicklei/go-restful.(*Container).dispatch(0xc820269710, 0x7f3f7ce25318, 0xc82200ae00, 0xc822c77a40)
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/emicklei/go-restful/container.go:272 +0xf30
k8s.io/kubernetes/vendor/github.com/emicklei/go-restful.(*Container).(k8s.io/kubernetes/vendor/github.com/emicklei/go-restful.dispatch)-fm(0x7f3f7ce25318, 0xc82200ae00, 0xc822c77a40)
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/emicklei/go-restful/container.go:120 +0x3e
net/http.HandlerFunc.ServeHTTP(0xc820fea610, 0x7f3f7ce25318, 0xc82200ae00, 0xc822c77a40)
	/usr/local/go/src/net/http/server.go:1618 +0x3a
net/http.(*ServeMux).ServeHTTP(0xc82025b920, 0x7f3f7ce25318, 0xc82200ae00, 0xc822c77a40)
	/usr/local/go/src/net/http/server.go:1910 +0x17d
k8s.io/kubernetes/pkg/apiserver.WithAuthorizationCheck.func1(0x7f3f7ce25318, 0xc82200ae00, 0xc822c77a40)
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/handlers.go:476 +0x36c
net/http.HandlerFunc.ServeHTTP(0xc8200c58c0, 0x7f3f7ce25318, 0xc82200ae00, 0xc822c77a40)
	/usr/local/go/src/net/http/server.go:1618 +0x3a
k8s.io/kubernetes/pkg/apiserver.WithImpersonation.func1(0x7f3f7ce25318, 0xc82200ae00, 0xc822c77a40)
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/handler_impersonation.go:44 +0x2b4
net/http.HandlerFunc.ServeHTTP(0xc8200c5900, 0x7f3f7ce25318, 0xc82200ae00, 0xc822c77a40)
	/usr/local/go/src/net/http/server.go:1618 +0x3a
k8s.io/kubernetes/pkg/auth/handlers.NewRequestAuthenticator.func1(0x7f3f7ce25318, 0xc82200ae00, 0xc822c77a40)
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/auth/handlers/handlers.go:70 +0x2f3
net/http.HandlerFunc.ServeHTTP(0xc82016cc30, 0x7f3f7ce25318, 0xc82200ae00, 0xc822c77a40)
	/usr/local/go/src/net/http/server.go:1618 +0x3a
k8s.io/kubernetes/pkg/api.NewRequestContextFilter.func1(0x7f3f7ce25318, 0xc82200ae00, 0xc822c77a40)
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/api/requestcontext.go:101 +0x157
net/http.HandlerFunc.ServeHTTP(0xc820238fc0, 0x7f3f7ce25318, 0xc82200ae00, 0xc822c77a40)
	/usr/local/go/src/net/http/server.go:1618 +0x3a
k8s.io/kubernetes/pkg/api.NewRequestContextFilter.func1(0x7f3f7ce25318, 0xc82200ae00, 0xc822c77a40)
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/api/requestcontext.go:101 +0x157
net/http.HandlerFunc.ServeHTTP(0xc820238fe0, 0x7f3f7ce25318, 0xc82200ae00, 0xc822c77a40)
	/usr/local/go/src/net/http/server.go:1618 +0x3a
k8s.io/kubernetes/pkg/apiserver.RecoverPanics.func1(0x7f3f7ce25318, 0xc82200ae00, 0xc822c77a40)
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/handlers.go:161 +0x1f3
net/http.HandlerFunc.ServeHTTP(0xc821fb7f20, 0x7f3f7ce1bac8, 0xc82206cf68, 0xc822c77a40)
	/usr/local/go/src/net/http/server.go:1618 +0x3a
k8s.io/kubernetes/pkg/apiserver.(*timeoutHandler).ServeHTTP(0xc821fb7f40, 0x7f3f7ce1bac8, 0xc82206cf68, 0xc822c77a40)
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/handlers.go:187 +0xb4
k8s.io/kubernetes/pkg/apiserver.MaxInFlightLimit.func1(0x7f3f7ce1bac8, 0xc82206cf68, 0xc822c77a40)
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/handlers.go:113 +0x86
net/http.HandlerFunc.ServeHTTP(0xc82201a2d0, 0x7f3f7ce1bac8, 0xc82206cf68, 0xc822c77a40)
	/usr/local/go/src/net/http/server.go:1618 +0x3a
net/http.serverHandler.ServeHTTP(0xc821feef80, 0x7f3f7ce1bac8, 0xc82206cf68, 0xc822c77a40)
	/usr/local/go/src/net/http/server.go:2081 +0x19e
net/http.initNPNRequest.ServeHTTP(0xc821bdd200, 0xc821feef80, 0x7f3f7ce1bac8, 0xc82206cf68, 0xc822c77a40)
	/usr/local/go/src/net/http/server.go:2489 +0x221
net/http.(*initNPNRequest).ServeHTTP(0xc822d2b160, 0x7f3f7ce1bac8, 0xc82206cf68, 0xc822c77a40)
	<autogenerated>:253 +0xb6
net/http.(Handler).ServeHTTP-fm(0x7f3f7ce1bac8, 0xc82206cf68, 0xc822c77a40)
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/master/master.go:223 +0x50
net/http.(*http2serverConn).runHandler(0xc822873900, 0xc82206cf68, 0xc822c77a40, 0xc822d1dea0)
	/usr/local/go/src/net/http/h2_bundle.go:4060 +0x9f
created by net/http.(*http2serverConn).processHeaderBlockFragment
	/usr/local/go/src/net/http/h2_bundle.go:3853 +0x55e

logging error output: "Error: 'dial tcp 10.43.128.0:9090: i/o timeout'\nTrying to reach: 'http://10.43.128.0:9090/'"
 [[Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/602.2.14 (KHTML, like Gecko) Version/10.0.1 Safari/602.2.14] 192.168.3.133:36730]

itskingori commented Dec 23, 2016

I thought these would be helpful for checking the state of the cluster 👇 ... nothing looks off.

NAMESPACE     NAME                                                         READY     STATUS    RESTARTS   AGE
kube-system   po/dns-controller-687312726-673em                            1/1       Running   0          12h
kube-system   po/etcd-server-events-ip-192-168-114-157.ec2.internal        1/1       Running   0          12h
kube-system   po/etcd-server-events-ip-192-168-51-111.ec2.internal         1/1       Running   0          12h
kube-system   po/etcd-server-events-ip-192-168-87-157.ec2.internal         1/1       Running   0          12h
kube-system   po/etcd-server-ip-192-168-114-157.ec2.internal               1/1       Running   0          12h
kube-system   po/etcd-server-ip-192-168-51-111.ec2.internal                1/1       Running   0          12h
kube-system   po/etcd-server-ip-192-168-87-157.ec2.internal                1/1       Running   0          12h
kube-system   po/kube-apiserver-ip-192-168-114-157.ec2.internal            1/1       Running   4          12h
kube-system   po/kube-apiserver-ip-192-168-51-111.ec2.internal             1/1       Running   4          12h
kube-system   po/kube-apiserver-ip-192-168-87-157.ec2.internal             1/1       Running   4          12h
kube-system   po/kube-controller-manager-ip-192-168-114-157.ec2.internal   1/1       Running   0          12h
kube-system   po/kube-controller-manager-ip-192-168-51-111.ec2.internal    1/1       Running   1          12h
kube-system   po/kube-controller-manager-ip-192-168-87-157.ec2.internal    1/1       Running   0          12h
kube-system   po/kube-dns-v20-3531996453-evrld                             3/3       Running   0          12h
kube-system   po/kube-dns-v20-3531996453-xi8ic                             3/3       Running   0          12h
kube-system   po/kube-proxy-ip-192-168-111-56.ec2.internal                 1/1       Running   0          12h
kube-system   po/kube-proxy-ip-192-168-114-157.ec2.internal                1/1       Running   0          12h
kube-system   po/kube-proxy-ip-192-168-51-111.ec2.internal                 1/1       Running   0          12h
kube-system   po/kube-proxy-ip-192-168-52-212.ec2.internal                 1/1       Running   0          12h
kube-system   po/kube-proxy-ip-192-168-87-157.ec2.internal                 1/1       Running   0          12h
kube-system   po/kube-proxy-ip-192-168-90-251.ec2.internal                 1/1       Running   0          12h
kube-system   po/kube-scheduler-ip-192-168-114-157.ec2.internal            1/1       Running   1          12h
kube-system   po/kube-scheduler-ip-192-168-51-111.ec2.internal             1/1       Running   1          12h
kube-system   po/kube-scheduler-ip-192-168-87-157.ec2.internal             1/1       Running   0          12h
kube-system   po/kubernetes-dashboard-1872324879-rhnvv                     1/1       Running   0          12h
kube-system   po/weave-net-99hw4                                           2/2       Running   0          12h
kube-system   po/weave-net-9ssqy                                           2/2       Running   0          12h
kube-system   po/weave-net-ll754                                           2/2       Running   0          12h
kube-system   po/weave-net-nl20v                                           2/2       Running   0          12h
kube-system   po/weave-net-phihh                                           2/2       Running   0          12h
kube-system   po/weave-net-z7z3p                                           2/2       Running   0          12h

NAMESPACE     NAME                          DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kube-system   deploy/dns-controller         1         1         1            1           12h
kube-system   deploy/kube-dns-v20           2         2         2            2           12h
kube-system   deploy/kubernetes-dashboard   1         1         1            1           12h

NAMESPACE     NAME                       CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
default       svc/kubernetes             100.64.0.1      <none>        443/TCP         12h
kube-system   svc/kube-dns               100.64.0.10     <none>        53/UDP,53/TCP   12h
kube-system   svc/kubernetes-dashboard   100.65.83.208   <nodes>       80:31587/TCP    12h
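
For reference, output like the above can be gathered with something along these lines, with kubectl pointed at the cluster:

$ kubectl get po,deploy,svc --all-namespaces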

chrislovecnm commented Dec 23, 2016

Dude, you file the BEST issues. I will ping weave folks. We are having some challenges in-house as well. Can you recreate this?


chrislovecnm commented Dec 23, 2016

@smwebber we needed a template on how to file an issue. This is how everyone should file issues.


itskingori commented Dec 23, 2016

@chrislovecnm Thanks! 😊

I will ping weave folks.

Awesome. The closest I could get to Weave logs was getting onto the master nodes and running docker logs on the containers based on the weaveworks/weave-npc:1.8.1 and weaveworks/weave-kube:1.8.1 images while poking around (dunno how to use kubectl logs on pods outside the default namespace). The only output I got was ...

# For the weaveworks/weave-npc:1.8.1 based container
root@ip-192-168-114-157:/home/admin# docker logs --tail=all c899dfe5fdbd

# For weaveworks/weave-kube:1.8.1 based container
root@ip-192-168-114-157:/home/admin# docker logs --tail=all 546f9a65d9f9
INFO: 2016/12/23 02:09:01.030002 Weave version 1.8.2 is available; please update at https://github.com/weaveworks/weave/releases/download/v1.8.2/weave
INFO: 2016/12/23 06:46:44.867987 Weave version 1.8.2 is available; please update at https://github.com/weaveworks/weave/releases/download/v1.8.2/weave
INFO: 2016/12/23 12:36:56.095176 Weave version 1.8.2 is available; please update at https://github.com/weaveworks/weave/releases/download/v1.8.2/weave
INFO: 2016/12/23 17:08:51.197403 Weave version 1.8.2 is available; please update at https://github.com/weaveworks/weave/releases/download/v1.8.2/weave

Looks like we are using 1.8.1 ... I can see 1.8.2 has a bunch of bug fixes and minor improvements. It's possible that an upgrade would solve the problem we are having here ... in fact, in the changelog I just spotted this: "Fixed a bug where Kubernetes master could not contact pods weaveworks/weave#2673, weaveworks/weave#2683" 😍
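
A quick way to confirm which Weave images the cluster is actually running (a sketch; the DaemonSet name matches the weave-net pods listed above):

$ kubectl get daemonset weave-net --namespace="kube-system" \
    -o jsonpath='{.spec.template.spec.containers[*].image}'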

Can you recreate this?

Yes. I still have this cluster up and can consistently replicate the error by refreshing the dashboard a couple of times ... after a few tries it will eventually take a while to load, then result in the aforementioned error (with a stack trace in the kube-apiserver logs on one of the masters).

I've tried tearing down the cluster and bringing it back up, on the off chance that something wasn't configured correctly the first time round ... the issue still persisted.
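
For what it's worth, the intermittent failure should also be reproducible without the browser by hitting the same proxy path the error mentions (a sketch, assuming kubectl proxy is running on port 8080):

$ for i in $(seq 1 20); do \
  curl -s -o /dev/null -w "%{http_code}\n" --max-time 35 \
    http://localhost:8080/api/v1/proxy/namespaces/kube-system/services/kubernetes-dashboard/; \
done

A healthy run should print 200 every time; the failure should show up as occasional 503s, matching the apiserver log above.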


itskingori commented Dec 24, 2016

[...] when I was poking around (dunno how to use kubectl logs on pods not on the default namespace) [...]

Figured it out. There's an easier way to check all 6 weave-net pods (3 masters, 3 workers); I didn't have to use the very manual docker logs ... approach 👇

Check 'weave-net' pod 'weave-npc' container logs:

$ for i in $(kubectl get pods --all-namespaces | grep weave-net | awk '{print $2}'); do \
echo "=> '$i' pod, 'weave-npc' container"; \
kubectl logs --namespace="kube-system" --container="weave-npc" $i; \
done

=> 'weave-net-99hw4' pod, 'weave-npc' container
=> 'weave-net-9ssqy' pod, 'weave-npc' container
=> 'weave-net-ll754' pod, 'weave-npc' container
=> 'weave-net-nl20v' pod, 'weave-npc' container
=> 'weave-net-phihh' pod, 'weave-npc' container
=> 'weave-net-z7z3p' pod, 'weave-npc' container

Check 'weave-net' pod 'weave' container logs:

$ for i in $(kubectl get pods --all-namespaces | grep weave-net | awk '{print $2}'); do \
echo "=> '$i' pod, 'weave' container"; \
kubectl logs --namespace="kube-system" --container="weave" $i; \
done

=> 'weave-net-99hw4' pod, 'weave' container
INFO: 2016/12/24 03:40:02.759749 Weave version 1.8.2 is available; please update at https://github.com/weaveworks/weave/releases/download/v1.8.2/weave
INFO: 2016/12/24 08:59:58.693383 Weave version 1.8.2 is available; please update at https://github.com/weaveworks/weave/releases/download/v1.8.2/weave
=> 'weave-net-9ssqy' pod, 'weave' container
INFO: 2016/12/24 06:29:46.973421 Weave version 1.8.2 is available; please update at https://github.com/weaveworks/weave/releases/download/v1.8.2/weave
=> 'weave-net-ll754' pod, 'weave' container
INFO: 2016/12/24 00:32:58.513736 Weave version 1.8.2 is available; please update at https://github.com/weaveworks/weave/releases/download/v1.8.2/weave
INFO: 2016/12/24 06:31:11.942044 Weave version 1.8.2 is available; please update at https://github.com/weaveworks/weave/releases/download/v1.8.2/weave
=> 'weave-net-nl20v' pod, 'weave' container
INFO: 2016/12/24 02:55:19.414400 Weave version 1.8.2 is available; please update at https://github.com/weaveworks/weave/releases/download/v1.8.2/weave
INFO: 2016/12/24 08:24:21.073552 Weave version 1.8.2 is available; please update at https://github.com/weaveworks/weave/releases/download/v1.8.2/weave
=> 'weave-net-phihh' pod, 'weave' container
INFO: 2016/12/24 02:31:12.315479 Weave version 1.8.2 is available; please update at https://github.com/weaveworks/weave/releases/download/v1.8.2/weave
INFO: 2016/12/24 07:07:01.693316 Weave version 1.8.2 is available; please update at https://github.com/weaveworks/weave/releases/download/v1.8.2/weave
=> 'weave-net-z7z3p' pod, 'weave' container
INFO: 2016/12/24 00:43:11.160956 Weave version 1.8.2 is available; please update at https://github.com/weaveworks/weave/releases/download/v1.8.2/weave
INFO: 2016/12/24 05:16:41.830963 Weave version 1.8.2 is available; please update at https://github.com/weaveworks/weave/releases/download/v1.8.2/weave
INFO: 2016/12/24 10:22:24.808919 Weave version 1.8.2 is available; please update at https://github.com/weaveworks/weave/releases/download/v1.8.2/weave

The above is a demonstration of their recommended way of troubleshooting connections, but I'm afraid it doesn't tell us much. Upgrading to Weave 1.8.2, as the logs suggest, seems to be the best option at this point because of the bug fixes and minor improvements in that release.
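
If it's useful for future debugging: since the Weave router runs on the host network, its status endpoint can also be queried directly from any node (assuming Weave's default status port of 6784 and that curl is present on the host):

root@ip-192-168-114-157:/home/admin# curl -s http://127.0.0.1:6784/status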


chrislovecnm commented Dec 24, 2016

1.8.2 is merged


itskingori commented Dec 24, 2016

@chrislovecnm I can confirm this issue is solved by the changes in PR #1250. My basic test is that I've refreshed the dashboard over 50 times now without a single timeout. Previously, 1 out of 5 attempts would have resulted in a timeout.

Thanks for your time! 🙌
