Implement kube-master HA for multiple masters #761
Conversation
@adamschaub if this is WIP, can you describe what your plan is and what is left to do to achieve a K8s HA deployment?
@@ -4,4 +4,4 @@
 # default config should be adequate

 # Add your own!
-KUBE_SCHEDULER_ARGS="--kubeconfig={{ kube_config_dir }}/scheduler.kubeconfig"
+KUBE_SCHEDULER_ARGS="--kubeconfig={{ kube_config_dir }}/scheduler.kubeconfig {% if groups['masters']|length > 1 %}--leader-elect=true{% endif %}"
Does it break things to set '--leader-elect=true' even if there is only one master?
Very good point, I'll look it up and test it out. I naively assumed that it being 'off' by default had some purpose.
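For reference, the alternative being discussed would drop the Jinja conditional and pass the flag unconditionally; with a single master, the lone scheduler instance simply acquires the lock immediately. A sketch of that variant of the template line (an illustration of the suggestion, not what this PR currently does):

```
# Jinja2-templated sysconfig entry -- sketch of the unconditional variant
# discussed above; --leader-elect is harmless with a single master because
# the only instance wins the election right away.
KUBE_SCHEDULER_ARGS="--kubeconfig={{ kube_config_dir }}/scheduler.kubeconfig --leader-elect=true"
```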
Related issue: kubernetes/kubernetes#18174
@rutsky keep in mind that @adamschaub's work will include a load-balancer role within contrib/ansible, so kubernetes/kubernetes#18174 should not be an issue, as kubelet/proxy will connect to a VIP. Let me know if I missed anything with the related issue.
Sorry for the messy commits, but I wanted to get my changes out there for visibility. I've separated the master tasks into single-master and HA-master groups, based on the number of nodes in the 'masters' group. There's no load balancing yet, and the HA tasks are CoreOS-dependent at the moment. Each master node runs an instance of kube-apiserver (via hyperkube), and kube-scheduler/kube-controller-manager are leader-elected using podmaster. I'm finishing up the load-balancing configuration with haproxy+keepalived pods.
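To illustrate the single-master vs. HA-master split described above, a rough sketch of how a playbook could select between the two task groups based on the size of the 'masters' group; the role names and layout are illustrative, not taken from this PR:

```yaml
# Illustrative playbook snippet (role names are placeholders):
- hosts: masters
  roles:
    # classic systemd-managed control plane when only one master is defined
    - { role: master-single, when: "groups['masters']|length == 1" }
    # kubelet + static pods + podmaster leader election for multiple masters
    - { role: master-ha, when: "groups['masters']|length > 1" }
```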
@@ -14,12 +14,15 @@

 - name: restart apiserver
   service: name=kube-apiserver state=restarted
+  when: groups['masters']|length == 1
I would prefer this syntax if it works b/c it's used elsewhere in the project:
groups['masters'][0]
I don't follow. groups['masters'][0] is used elsewhere to delegate a task to just one master (generating tokens and certs is one example). In this case, we only want to restart the apiserver service when there is only one master.
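To make the distinction concrete, a minimal sketch of the two patterns; the task names and the cert-generation command are placeholders, not taken from this PR:

```yaml
# Pattern 1: pin a task to a single host, e.g. generating tokens/certs once
# for the whole cluster on the first master.
- name: generate certs once
  command: "{{ cert_gen_command }}"   # hypothetical variable, for illustration
  run_once: true
  delegate_to: "{{ groups['masters'][0] }}"

# Pattern 2: run a task on every host, but only when the cluster has exactly
# one master -- which is what the handler in this hunk does.
- name: restart apiserver
  service: name=kube-apiserver state=restarted
  when: groups['masters']|length == 1
```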
got it. thanks.
@adamschaub looks like you're making good progress. @eparis @rutsky I would appreciate you taking time to review this WIP PR to add Master HA based on the k8s HA guide: http://kubernetes.io/docs/admin/high-availability/
(force-pushed from 02f2e90 to a0dab87)
@danehans @eparis @rutsky I've removed the haproxy/keepalived bits. After discussion with @danehans, we've decided to add haproxy/keepalived under a separate PR. I've tested this with CentOS 7 and CoreOS (stable) in a VirtualBox environment. If you add more than one master node, kubelet is installed on all masters and the master components run as pods and are leader-elected as required. Each master node runs an apiserver pod. They are not currently load balanced or tied to a floating IP, so clobbering the first master in the list will cause communication issues for the other pieces.
(force-pushed from 0248a4a to 535ff74)
Note that the separate haproxy/keepalived PR will address the limitation @adamschaub describes: each master node runs an apiserver pod, and they are not currently load balanced or tied to a floating IP, so clobbering the first master in the list will cause communication issues for the other pieces.
Sorry guys for being my victim again :)
@redhat-k8s-bot test this
Can one of the admins verify this patch?
@redhat-k8s-bot test please
@ingvagabund all looks good with our testing of this patch. @atosatto since you're interested in this work, have you tested it?
I'll give it a spin this evening and let you know if it works for me. Can someone share how they are testing the PR? How many masters are you running? I'm asking all these questions because I want to find out which is the recommended setup for testing this.

Andrea Tosatto
@atosatto I've been testing with the vagrant+virtualbox provider. I either modify the Vagrantfile directly or pull in your multi-master addition to include at least 2 masters, 3 etcds, and 1 node. Once the install is complete, I hop onto a master node and verify that the master service pods are running.
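The exact command used for that check isn't captured above; for reference, a minimal sketch of one way to verify from a master node (pod names depend on the manifests this PR templates):

```sh
# List the control-plane pods the kubelet started from static manifests.
kubectl get pods --namespace=kube-system -o wide

# Sanity-check that the scheduler and controller-manager report healthy.
kubectl get componentstatuses
```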
@atosatto, any updates?
@adamschaub sorry Adam for my very late response, but I had no time to check this until yesterday. I've been able to make things work with your setup, but I had issues with a greater number of master nodes. I will try to dig into this. Maybe we can ping each other on HipChat to drill down into the issue together.
Hey @adamschaub, do you have any update on this PR? After pulling down this PR and incorporating the changes from #711, I am still getting the error below. This is a CoreOS cluster in AWS with 5 hosts (2 masters).
I'll dig into this more and submit a PR if I locate the issue.
@smugcloud Thanks. Haven't touched this in a while...
Labelling this PR as size/XL
@adamschaub PR needs rebase
recomputing cla status...
[CLA-PING] @adamschaub Thanks for your pull request. It looks like this may be your first contribution to a CNCF open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://identity.linuxfoundation.org/projects/cncf to sign. Once you've signed, please reply here (e.g. "I signed it!") and we'll verify. Thanks.
[APPROVALNOTIFIER] The Following OWNERS Files Need Approval:
We suggest the following people:
Issues go stale after 30d of inactivity. Prevent issues from auto-closing with an /lifecycle frozen comment. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
We need to cover HA for the kube-master components; see the Kubernetes HA cluster guide [1]. #725
User specifies multiple nodes under the master role [2] (a sample inventory sketch is included after the references below). Ansible should seamlessly create a cluster with a collection of masters, with the API services appropriately load balanced and the other kube-master components leader-elected.
(optional, good to have) source_type: docker (run Kubernetes parts in containers, #673)

[1] http://kubernetes.io/docs/admin/high-availability/
[2] https://github.com/kubernetes/contrib/blob/master/ansible/inventory.example.ha
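For reference, a rough sketch of a multi-master inventory in the shape of the example linked in [2]; hostnames and group sizes are placeholders:

```ini
[masters]
master-01.example.com
master-02.example.com

[etcd]
etcd-01.example.com
etcd-02.example.com
etcd-03.example.com

[nodes]
node-01.example.com
node-02.example.com
```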