Investigate etcd unavailability leading to 300s cluster up timeout #22819
Comments
@lavalamp could you please take a look at this or delegate? Thanks!
@dchen1107 something bad happened to the node, maybe to docker. I see lots of messages in the master node's kubelet.log. It looks like it wasn't able to run much. The etcd log is empty, and there is nothing obvious in the docker log. In any case, the problem is clearly that etcd never started. Sorry, but I have to give this back to you :)
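A minimal way to confirm on the master whether etcd ever came up, assuming docker is the runtime and etcd serves its client API on the default 127.0.0.1:2379 (both assumptions, not taken from the logs above):

```sh
# List any etcd container, running or exited, that docker knows about.
docker ps -a --filter "name=etcd"

# If a container is up, ask etcd itself whether it is healthy
# (assumes the default client endpoint 127.0.0.1:2379).
curl -s http://127.0.0.1:2379/health

# Look for pull/start/failure events mentioning etcd in the kubelet log.
grep -i "etcd" /var/log/kubelet.log | grep -iE "pulling|pulled|failed|error"
```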
I looked at the logs of the master node.
I am wondering whether the etcd image tar file is copied to the master node properly through the PR builder. I tried to reproduce the failure through kube-up but could not.
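A quick check on the master node for whether an etcd image tar was delivered and loaded might look like this (the tarball location is a guess; the actual path depends on the kube-up scripts in use):

```sh
# Was an etcd image tar shipped to the master at all? (path is illustrative)
ls -l /srv/kubernetes/ 2>/dev/null | grep -i etcd

# Is the etcd image already present in the local docker image store?
docker images | grep -i etcd
```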
Together with @bprashanth, we double-checked our master component tarball, and it does not include the etcd:2.2.1 image tar file. It looks like etcd:2.2.1 is the only master component image that is pulled from gcr.io. I checked kubelet.log on the master node again: there is only an image pulling event for etcd:2.2.1, with no corresponding pulled event. Based on my understanding, the reason we want to keep image side-loading for master components is that the gcr.io repository is not very reliable. We used to include the etcd image in the master component tarball too. cc @roberthbailey and @zmerlynn
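For context, the two delivery modes being compared reduce to something like the following (the paths and the gcr.io image name are illustrative, not the exact kube-up code paths):

```sh
# Side-loading: the image tar ships inside the master component tarball
# and is loaded into the local docker daemon, so no registry access is needed.
docker load -i /srv/kubernetes/etcd-2.2.1.tar    # illustrative path

# Pull-based: the image is fetched from gcr.io during cluster bring-up,
# so startup depends on registry availability and pull latency.
docker pull gcr.io/google_containers/etcd:2.2.1
```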
I don't recall a time when we were side-loading the etcd image. We load it from gcr.io in both the release-1.1 and release-1.0 branches, and we haven't seen it have a significant impact on cluster creation reliability. /cc @jlowdermilk
Do we have more instances of the same failure here? The problem is that this particular docker pull of etcd:2.2.1 took a long time.
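One way to check whether the pull itself is the slow step on an affected node is to time a single attempt with a bound comparable to the cluster-up budget (a sketch; the image name is an assumption, and the 300s value is carried over from the issue title):

```sh
# Time one pull attempt and abort it if it exceeds the 300s budget.
time timeout 300 docker pull gcr.io/google_containers/etcd:2.2.1
echo "exit status: $?"    # 124 indicates the pull hit the timeout
```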
Since there are no other instances of the same failure, I am closing this one for now.
#20931 (comment)
Apiserver logs:
Didn't find anything immediately suspicious in the master kubelet or docker logs.