Marking the master by adding the taints 'error marking master: timed out waiting for the condition' #1227
/assign @timothysc
I fixed this problem by disabling etcd TLS.
docker 18.06.1-ce
@joshuacox a long shot, but can you try explicitly adding the port to the endpoint in the ClusterConfig?
Or alternatively, if you're using 1.12, try InitConfig+ClusterConfig? For example:
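A minimal sketch of what such a split config could look like, assuming the v1alpha3 API that ships with kubeadm 1.12; the addresses, Kubernetes version, and cert paths below are placeholders rather than values from this thread:

```sh
# Sketch only: write an InitConfiguration + ClusterConfiguration pair and
# feed it to kubeadm init. Addresses, version, and cert paths are placeholders.
cat << EOF > kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1alpha3
kind: InitConfiguration
apiEndpoint:
  advertiseAddress: 10.0.0.9
  bindPort: 6443
---
apiVersion: kubeadm.k8s.io/v1alpha3
kind: ClusterConfiguration
kubernetesVersion: v1.12.2
controlPlaneEndpoint: "10.0.0.9:6443"
apiServerCertSANs:
- 10.0.0.9
etcd:
  external:
    endpoints:
    - https://10.0.0.2:2379
    - https://10.0.0.3:2379
    - https://10.0.0.4:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
EOF
kubeadm init --config kubeadm-config.yaml
```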
@rdodev which port? 6443? And for the InitConfig+ClusterConfig, is that all in one file, e.g.
Hey @joshuacox yes. First, to triage with minimal changes, just add the 6443 to that config param and run. If that still doesn't work, then yes, take that snippet, replace with pertinent variables all in one kubeadm-config.yaml, then
while I haven't duplicated the entire run from baremetal yet, I can quickly provision a new cluster on KVM hosts.
I'll try with the init_cluster stuff next.
same results with the init_cluster config
@joshuacox however the warnings do reveal
Which would explain the inability to bootstrap the master.
@rdodev that is new with the init cluster config, but it might explain some of the successful clusters in the past: my google fiber router eventually learns the names of the VMs and will answer DNS queries once the VMs have been around long enough for whatever event triggers the router to learn the name of that particular MAC address. Spawning a fresh cluster exposes that problem. I was under the impression that kubernetes had its own internal DNS?
@joshuacox to clarify: so is the master in your home network and the etcd servers elsewhere? Perhaps I misunderstood the scenario.
@rdodev they are all VMs on my home network and can communicate with each other just fine; I'm still waiting on the google router to learn the hostnames. I guess I need to set up an internal DNS server, or assign them publicly available hostnames that resolve to internal addresses, but that seems excessive for just a test cluster.
@joshuacox instead of DNS for the etcd cluster, can you just use IPs?
@rdodev I'm not really certain where that is set? Is it the name: line in the init cluster stuff?
@rdodev that was a mistake, that indeed was not
and the corrected config:
@joshuacox it isn't clear to me from your original post: did you set up the external cluster using these instructions? https://kubernetes.io/docs/setup/independent/setup-ha-etcd-with-kubeadm/
yes, I converted them into a single script: https://gist.github.com/joshuacox/4505fbeceb2e394900a24c3cae14131c. In addition to that, I am integrating them into kubash, of which I have a branch here. Both of these allow me to repeat the entire procedure pretty quickly with something like:
or instead of the last step, using the smaller bash script:
or for even less typing:
will tear down the
@joshuacox Thanks a bunch for all the setup info. Let me look into this/repro and will get back to you.
@joshuacox are you in K8s Slack? Might be easier for quick comm.
Not sure if this is helpful or related, but I ran into this same issue when using kubernetes on Clearlinux with VMs created using virt-manager. The issue was that the hostname was not resolving.
Adding the hosts to /etc/hosts and ensuring nsswitch.conf uses it did not help; the DNS server (dnsmasq) that handles the VMs had to provide the resolution. Once I made sure name resolution worked properly by having the upstream DNS server resolve the hostnames, things started working.
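If the VMs sit on a libvirt-managed network (the virt-manager default), one way to make that dnsmasq answer for the node names is to add dns-host records to the network definition; a sketch, with the network name, IP, and hostname as placeholders:

```sh
# Sketch: register a static DNS entry with libvirt's dnsmasq so fresh VMs
# resolve immediately (network name, IP, and hostname are placeholders).
virsh net-update default add dns-host \
  "<host ip='192.168.122.10'><hostname>etcd1</hostname></host>" \
  --live --config
```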
@mcastelino not entirely unrelated, especially with the discussion about hostnames. Of note, in my situation I am using bridged networking, so it is the router providing resolution in my home setup, not dnsmasq from KVM/libvirt/virt-manager.
just thought I'd make sure that all certs worked and networking was good, so I ran the docker test command from the primary master (the one that failed to mark itself):
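Such a check, run from the master, looks roughly like the following; the etcd image tag, endpoint IP, and cert paths are assumptions and need to match what kubeadm actually laid down on the host:

```sh
# Sketch: verify the master can reach etcd over TLS using the client certs
# kubeadm expects (image tag, endpoint IP, and paths are assumptions).
docker run --rm -it \
  --net host \
  -v /etc/kubernetes:/etc/kubernetes k8s.gcr.io/etcd:3.2.24 etcdctl \
  --cert-file /etc/kubernetes/pki/apiserver-etcd-client.crt \
  --key-file /etc/kubernetes/pki/apiserver-etcd-client.key \
  --ca-file /etc/kubernetes/pki/etcd/ca.crt \
  --endpoints https://10.0.0.2:2379 cluster-health
```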
looks like maybe a permission issue? here are the logs from a scheduler container running on a master instance after it fails to mark itself master:
finally have a successful method here: prep the etcd nodes by running this script on the primary etcd node: https://gist.github.com/joshuacox/9df2a029b04e63443b62c2824cf5fb95
and then initialize a master; this script can be run on any host that has been keyed for ssh access to both the master and the primary etcd node: https://gist.github.com/joshuacox/f0f0b25e51df5638f3778d80d4af8c63
EDIT: leaving this open while I do some testing to ensure that this is not anomalous
I've repeated this a few times now on bare metal and in VMs.
We have plans to improve both the way etcd is handled and the way an HA setup is created, removing some of the manual steps. This is on the roadmap for future releases.
I am running into a similar issue bootstrapping a cluster with kubeadm. Can you further elaborate on how you resolved it? All other tickets related to this issue were closed and pointed to this one. With a working external etcd cluster, my kubeadm configuration is as follows:
failure
environment details
@blieberman
Don't forget to test the master connection to the etcd stack:
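For example, something along these lines (etcdctl v3 syntax; the endpoint address and cert paths are assumptions based on where the HA docs put the client certs on the master):

```sh
# Sketch: confirm the master can authenticate to the external etcd cluster
# (endpoint address and cert paths are assumptions for this layout).
ETCDCTL_API=3 etcdctl \
  --endpoints https://10.0.0.2:2379 \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/apiserver-etcd-client.crt \
  --key /etc/kubernetes/pki/apiserver-etcd-client.key \
  endpoint health
```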
here's the final script I was using to provision etcd, master, and node: https://gist.github.com/joshuacox/95aad9bee0c7e49e735ec3ec553b24ca or, in a more robust manner, my full script:
Hi, I'm having a similar problem with k8s version 1.13.4
cluster nodes
node info
nginx lb config
kubeadm config on etcd nodes
etcd check from master
kubeadm config on master
kubeadm output
docker ps
systemctl
journalctl -xeu kubelet
Any ideas?
I did just release kubash 1.13.4 and I have tested both stacked and extetcd methods using 1.13.4. I'd gladly gather any other info from a running cluster if you'd like.
Hi, I updated the kubeadm output, as I found out that my load balancer, which is running in docker, was not configured to use the host network_mode. Not sure if that mattered, but better safe than sorry. @rdodev @timothysc any idea what my problem is here? Should I open up a new issue for this?
@joshuacox can you share what exactly fixed your issue?
@hreidar I fixed it by changing the final script to this: https://gist.github.com/joshuacox/95aad9bee0c7e49e735ec3ec553b24ca. I suggest you script out everything so you can reproduce the error consistently; then, if we can reproduce your error as well, we are much more likely to be able to identify the issue.
Ok, here are the steps I have written down so far...
node preparation
config creation for kubelet and etcd
generate and distribute certs
config and init master nodes
... and I'm stuck in the master init step :-)
Is this an external etcd setup? Why don't you include that flag? https://gist.github.com/joshuacox/95aad9bee0c7e49e735ec3ec553b24ca#file-final_node-sh-L42
I was not aware of its existence. Is this the exact command?
It gives me an error:
are you certain this system is clean? Those ports being in use indicates you already have a (partially?) running cluster. EDIT: perhaps
further EDIT: also, it appears that was old code, and you are correct about the command now, at least in the docs; sounds like they implemented a switch on the
You are right, I did forget to reset, but I'm using the external block in my manifest. I'm trying to follow the official documentation as closely as I can, but I'm stuck on initializing a master node as shown in my previous posts.
What is this error telling me?
Is kubeadm not able to talk to docker via /var/run/dockershim.sock? @joshuacox how is your docker set up? Which cgroup driver are you using for docker and kubelet?
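As a quick check, something like this works (a sketch; the paths are the usual kubeadm defaults, and switching docker to the systemd driver is only one common fix, not necessarily the right one here):

```sh
# Sketch: see which cgroup driver docker and kubelet are using, and switch
# docker to systemd if they disagree (paths are the usual kubeadm defaults).
docker info 2>/dev/null | grep -i cgroup
grep -r cgroup-driver /var/lib/kubelet/kubeadm-flags.env /etc/systemd/system/kubelet.service.d/ 2>/dev/null
cat << EOF > /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
systemctl restart docker
```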
Ok, it seems to be a similar setup, but I'm going to try your version of docker.
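In case it helps, pinning that exact release on Ubuntu looks roughly like this; the package version string follows the docker-ce repo's usual format and may need adjusting for your distro and release:

```sh
# Sketch: install and hold docker-ce 18.06.1 from the official repo
# (version string format may differ per distro/release).
apt-get update
apt-get install -y docker-ce=18.06.1~ce~3-0~ubuntu
apt-mark hold docker-ce
```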
No luck. It seems that etcd is empty, and the only resource I can list from the k8s API after this failed step is a ClusterIP.
I think I need to open a new issue to try to get a developer to look at this. The info in the logs is not making any sense to me.
What keywords did you search in kubeadm issues before filing this one?
error marking master: timed out waiting for the condition
#1092
#937
#1087
#715
kubernetes/kubernetes#45727
Is this a BUG REPORT or FEATURE REQUEST?
/kind bug
Versions
kubeadm version (use kubeadm version):
Environment:
Kubernetes version (use kubectl version):
OS: ubuntu xenial on baremetal
Kernel (e.g. uname -a):
What happened?
What you expected to happen?
Master to initialize without issue.
How to reproduce it (as minimally and precisely as possible)?
https://gist.github.com/joshuacox/4505fbeceb2e394900a24c3cae14131c
run the above like so:
at this point you should have a healthy etcd cluster running on three hosts
then on a separate host (10.0.0.9) run the steps detailed here:
https://kubernetes.io/docs/setup/independent/high-availability/#external-etcd
with this config:
Anything else we need to know?
journalctl -xeu kubelet
https://gist.github.com/joshuacox/3c0b4aa2b66d1172067a32e6e064f948
docker logs cbc9036b0675
the kube api container logs: https://gist.github.com/joshuacox/ab29412c1653e2b1fd2fa06cdd0ae2e2