kubeadm2ha - Automatic setup of HA clusters using kubeadm

A set of scripts and documentation for adding redundancy (etcd cluster, multiple masters) to a cluster set up with kubeadm 1.8. This code is intended to demonstrate and simplify the creation of redundant-master setups while still using kubeadm, which is still lacking this functionality. See kubernetes/kubeadm/issues/546 for the discussion on this.

This code largely follows the instructions published in cookeem/kubeadm-ha and contributes only minor changes for K8s 1.8 compatibility and automation.

Overview

This repository contains a set of ansible scripts to do this. The following playbooks are available:

  1. playbook-01-cluster-setup.yaml sets up a complete cluster including the HA setup. See below for more details.
  2. playbook-51-cluster-uninstall.yaml removes data and configuration files to a point where playbook-01-cluster-setup.yaml can be used again.
  3. playbook-02-dashboard.yaml sets up the dashboard including influxdb/grafana.
  4. playbook-03-local-access.yaml creates a patched admin.conf file in /tmp/-admin.conf. After copying it to ~/.kube/config, remote kubectl access via the VIP / load balancer can be tested.
  5. playbook-04-prometheus-operator.yaml sets up the prometheus-operator.
  6. playbook-05-efk-stack.yaml sets up an EFK stack for centralised logging.
  7. playbook-00-cluster-images.yaml prefetches all images needed for Kubernetes operations and transfers them to the target hosts.
  8. playbook-52-uninstall-dashboard.yaml removes the dashboard.
  9. playbook-53-uninstall-efk-stack.yaml removes the EFK stack including Fluentd cache and Elasticsearch data files.
  10. playbook-31-cluster-upgrade.yaml upgrades a cluster.

Due to the frequent upgrades to both Kubernetes and kubeadm, these scripts cannot support all possible versions. For both fresh installs and upgrades, please refer to the value of KUBERNETES_VERSION in ansible/group_vars/all.yaml to find out which target version they were developed against. Other versions may work, too, but you may turn out to be the first to try this. Please refer to the following documents for compatibility information:

Prerequisites

Ansible version 2.4 or higher is required. Older versions will not work.

On the target environment, some settings are necessary for a successful installation of Kubernetes. The "Before you begin" section in the official Kubernetes documentation applies; nevertheless, here is a convenience list of things to take care of:

  1. Set the value of /proc/sys/net/bridge/bridge-nf-call-iptables to 1. There may be different, distro-dependent ways to accomplish this persistently, however most people will get away with editing /etc/sysctl.conf.
  2. Load the ip_vs kernel module. Most people will want to create a file in /etc/modprobe.d for this.
  3. Disable swap. Make sure to edit /etc/fstab to remove the swap mount from it. (A sketch of steps 1-3 follows this list.)
  4. If you want to use the EFK stack, you'll also have to define the following groups: elasticsearch_hot (hosts with faster hardware and more RAM, on which the "hot" ES instances will run; mutually exclusive with elasticsearch_warm), elasticsearch_warm (hosts with more disk space, on which the "warm" ES instances will run; mutually exclusive with elasticsearch_hot), and elasticsearch (all Elasticsearch hosts). See the inventory md-kubernetes.inventory for an example. The Elasticsearch data nodes require a nofile limit of at least 65536, which is well above the defaults on some systems. If using the JSON-based configuration file /etc/docker/daemon.json, you may have to add this: "default-ulimits":{"nofile":{"Name":"nofile","Hard":65536,"Soft":65536}}.
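
A minimal sketch of steps 1-3 for a typical systemd-based distribution; paths and persistence mechanisms vary between distros, so treat this as an illustration rather than an authoritative recipe:

# run as root
# 1. let bridged traffic pass through iptables (br_netfilter provides the sysctl key)
modprobe br_netfilter
echo 'net.bridge.bridge-nf-call-iptables = 1' >> /etc/sysctl.conf
sysctl -p

# 2. load ip_vs now and at boot (on systemd-based distros /etc/modules-load.d is one option)
modprobe ip_vs
echo 'ip_vs' > /etc/modules-load.d/ip_vs.conf

# 3. disable swap now and comment out the swap entry in /etc/fstab
swapoff -a
sed -i '/\sswap\s/s/^/#/' /etc/fstab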

Configuration

In order to use the ansible scripts, at least two files need to be configured:

  1. Either edit my-cluster.inventory or create your own. The inventory must define the following groups: primary_master (the single machine on which kubeadm will be run), secondary_masters (the other masters), masters (all masters), minions (the worker nodes), nodes (all nodes), etcd (all machines on which etcd is installed, usually the masters). A minimal example follows this list.
  2. Create a file in group_vars named after the group defined in your inventory file, overriding the defaults from all.yaml where necessary. Note that if you set SETUP_DOCKER to yes, the device for /var/lib/docker must exist and be empty; it will be formatted and mounted automatically. You may also decide to change some of the defaults for your environment: LOAD_BALANCING (kube-vip, haproxy or nginx), NETWORK_PLUGIN (weavenet, flannel or calico) and ETCD_HOSTING (stacked if running on the masters, else external).
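
A minimal inventory illustrating these groups might look like the following (all host names are placeholders; whether you list hosts explicitly or via :children groups is up to you):

cat > my-cluster.inventory <<'EOF'
[primary_master]
master1.example.com

[secondary_masters]
master2.example.com
master3.example.com

[masters:children]
primary_master
secondary_masters

[minions]
worker1.example.com
worker2.example.com

[nodes:children]
masters
minions

[etcd:children]
masters
EOF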

What the cluster setup does

  1. Set up an etcd cluster with self-signed certificates on all hosts in group etcd.
  2. Set up a virtual IP and load balancing: either using a static pod for kube-vip or a keepalived cluster with nginx on all hosts in group masters.
  3. Set up a master instance on the host in group primary_master using kubeadm.
  4. Set up master instances on all hosts in group secondary_masters by copying and patching (replace the primary master's host name and IP) the configuration created by kubeadm and have them join the cluster.
  5. Use kubeadm to join all hosts in the group minions.
  6. Set up a service account 'admin-user' and a cluster role binding for the role 'cluster-admin' for remote access (if wanted). A quick way to verify the result is sketched after this list.
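
After the playbook has completed, a quick sanity check might look like this (assuming kubectl has been configured, e.g. on the primary master via /etc/kubernetes/admin.conf):

# all masters and minions should eventually report Ready
kubectl get nodes -o wide

# control-plane pods and the chosen network plugin should be Running
kubectl -n kube-system get pods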

What the images setup does

  1. Pull all required images locally (hence docker must be installed on the host from which you run ansible). A rough manual equivalent of the whole sequence is sketched after this list.
  2. Export the images to tar files.
  3. Copy the tar files over to the target hosts.
  4. Import the images from the tar files on the target hosts.
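
This corresponds roughly to the following manual steps for a single image (image name, tag and target host are placeholders; the playbook derives the real image list from the configured Kubernetes version):

# on the ansible host (docker must be installed locally)
docker pull k8s.gcr.io/kube-apiserver:v1.13.0
docker save k8s.gcr.io/kube-apiserver:v1.13.0 -o kube-apiserver.tar

# copy the tar file to a target host and import it there
scp kube-apiserver.tar root@node1.example.com:/tmp/
ssh root@node1.example.com docker load -i /tmp/kube-apiserver.tar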

What the prometheus-operator setup does

  1. Install prometheus-operator, so that applications can use it for creating their own prometheus instances, service monitors etc.

Setting up the dashboard

The playbook-02-dashboard.yaml playbook does the following:

  1. Install the dashboard and metrics-server components.
  2. Scale the number of instances to the number of master nodes.

For accessing the dashboard, run kubectl proxy on your local host (this requires kubectl to be configured for your local host; see Configuring local access below for automating this), then access it via http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/#/login
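
For example, with kubectl configured locally:

# start a local proxy to the cluster's API server
kubectl proxy &

# then open the login page in a browser (or paste the URL manually)
xdg-open 'http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/#/login'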

The dashboard will ask you to authenticate. There are several options:

  1. Use the token of an existing service account with sufficient privileges. On many clusters this command works for root-like access:

    kubectl -n kube-system describe secrets `kubectl -n kube-system get secrets | awk '/clusterrole-aggregation-controller/ {print $1}'` | awk '/token:/ {print $2}'
    
  2. Use the token of the 'admin-user' service account (if it exists):

    kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep admin-user | awk '{print $1}')
    
  3. Use the playbook-03-local-access.yaml playbook to generate a configuration file. That file can be copied to ~/.kube/config for local kubectl access. It can also be uploaded as kubeconfig file in the dashboard's login dialogue.

Setting up the EFK stack for centralised logging

  1. Elasticsearch is installed as "hot/warm", i.e. all indices older than 3 days are moved from the "hot" to the "warm" instances automatically. It is assumed that the "hot" instances run on machines with faster hardware and probably less disk space. If kubectl proxy is running, the ES instance can be accessed through this URL (see also the curl example after this list): http://localhost:8001/api/v1/namespaces/kube-system/services/elasticsearch-logging-client:9200/proxy/_search?q=*

  2. Kibana can be used for accessing the logs from Elasticsearch. If kubectl proxy is running it can be accessed through this URL: http://localhost:8001/api/v1/namespaces/kube-system/services/kibana-logging:http/proxy/app/kibana#/

  3. Fluentd collects the logs from the running pods. In order to normalise log files using different formats, you will most likely want to edit the configmap fluentd uses (see the file fluentd-es-configmap.yaml in roles/fluentd/files). A working, if not terribly efficient, way to manage this is to use the rewrite_tag_filter as a multiplexer for the different log formats and then the parser filter with one section per (rewritten) tag for the actual parsing and normalising (for normalising different time/date formats, the record_transformer filter can be used).
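
With kubectl proxy running, both services can also be checked from the command line, e.g.:

# query Elasticsearch through the API server proxy
curl 'http://localhost:8001/api/v1/namespaces/kube-system/services/elasticsearch-logging-client:9200/proxy/_search?q=*'

# verify that Kibana responds
curl -sI 'http://localhost:8001/api/v1/namespaces/kube-system/services/kibana-logging:http/proxy/app/kibana' | head -n 1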

Configuring local access

Running the playbook-03-local-access.yaml playbook creates a file /tmp/-admin.conf that can be used as ~/.kube/config. If the dashboard has been installed (see above), the file will contain the 'admin-user' service account's token, so that root-like access is possible for both kubectl and the dashboard. If that service account does not exist, the client-side certificate will be used instead, which is OK for testing environments but generally not recommended because client-side certificates are not supposed to leave their master host.
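
Typical usage of the generated file might look like this (the placeholder <generated> stands for whatever file name the playbook reports under /tmp):

# copy the generated configuration into place
cp /tmp/<generated>-admin.conf ~/.kube/config

# verify remote access via the virtual IP / load balancer
kubectl cluster-info
kubectl get nodes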

Upgrading a cluster

For upgrading a cluster several steps are needed:

  1. Find out which software versions to upgrade to.
  2. Set the ansible variables to the new software versions.
  3. Run the playbook-00-cluster-images.yaml playbook if the cluster has no Internet access.
  4. Run the playbook-31-cluster-upgrade.yaml playbook.

Note: Never upgrade a productive cluster without having tried it on a reference system before.

Preparation

The first thing to do is to find out which version you want to upgrade to. We only support systems where all Kubernetes-related components run the same version: the native packages (kubelet, kubectl, kubeadm) and whatever they run in containers once installed (API Server, Controller Manager, Scheduler, Kube Proxy). Hence, after having determined the version to upgrade to, update the variable KUBERNETES_VERSION either in group_vars/all.yaml (global) or in group_vars/.yaml (your environment only).

Next, you need to be able to upgrade kubelet, kubectl, kubeadm and - if it is upgraded, too - kubernetes-cni on your cluster's machines using their package manager (yum, apt or whatever). If you are connected to the internet, this is a no-brainer. However, in an isolated environment without internet access you will need to download these packages elsewhere and then make them available to your nodes, so that they can be installed using their package managers. This will most likely mean creating local repos on the nodes or on a server in the same network and configuring the package managers to use them. Since the steps required for this differ between the various Linux distributions, they are not covered here. Once you can simply run something like yum install -y kubeadm-<KUBERNETES_VERSION> (on a Redhat platform) on your nodes, you should be fine.
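
On a Redhat-like node this boils down to something like the following (the version number is a placeholder; use the value of KUBERNETES_VERSION from your group_vars):

KUBERNETES_VERSION=1.13.0
yum install -y kubeadm-${KUBERNETES_VERSION} kubelet-${KUBERNETES_VERSION} kubectl-${KUBERNETES_VERSION}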

Note that upgrading etcd is only supported if it is installed on the master nodes (ETCD_HOSTING is stacked). Otherwise you will have to upgrade etcd manually, which is beyond the scope of this document.

Having configured this, you may now want to fetch and install the new images for your to-be-upgraded cluster. If your cluster has no internet access this is required; if it has, you may want to do it anyway to make the upgrade more seamless.

To do so, run the following command:

ansible-playbook -f <good-number-of-concurrent-processes> -i <your-environment>.inventory playbook-00-cluster-images.yaml

I usually set the number of concurrent processes manually because if a cluster consists of more than 5 nodes (the default), picking a higher value here significantly speeds up the process.

Perform the upgrade

You may want to back up /etc/kubernetes on all your master machines before running the upgrade, for example:
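
One way to do this for all masters at once is an ansible ad-hoc command (the archive path is arbitrary):

ansible -u root -f <good-number-of-concurrent-processes> -i <your-environment>.inventory masters -m command -a "tar czf /root/kubernetes-backup.tar.gz /etc/kubernetes"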

The actual upgrade is automated. Run the following command:

ansible-playbook -f <good-number-of-concurrent-processes> -i <your-environment>.inventory playbook-31-cluster-upgrade.yaml

See the comment above on setting the number of concurrent processes.

The upgrade is not fully free of disruptions:

  • while kubeadm applies the changes on a master, it restarts a number of services, hence they may be unavailable for a short time
  • if containers running on the minions keep local data, they have to take care of rebuilding it when they are relocated to different minions during the upgrade process (i.e. local data is ignored)

If any of these is unacceptable, a fully automated upgrade process does not really make any sense because deep knowledge of the application running in a respective cluster is required to work around this. Hence in that case a manual upgrade process is recommended.

If something goes wrong

If the upgrade fails, the situation afterwards depends on the phase in which things went wrong.

If kubeadm failed to upgrade the cluster, it will try to perform a rollback. Hence, if that happened on the first master, chances are pretty good that the cluster is still intact. In that case all you need to do is start docker, kubelet and keepalived on the secondary masters and then uncordon them (kubectl uncordon <secondary-master-fqdn>) to be back where you started from.
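
For example (assuming the keepalived/nginx load balancing variant; with kube-vip there is no keepalived service to start):

# on each secondary master
systemctl start docker kubelet keepalived

# from a host with working kubectl access
kubectl uncordon <secondary-master-fqdn>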

If kubeadm on one of the secondary masters failed, you still have a working, upgraded cluster, but without the secondary masters, which may be in a somewhat undefined condition. In some cases kubeadm fails because the cluster is still busy after the previous master node has been upgraded, so that waiting a bit and re-running kubeadm upgrade apply v<VERSION> may even succeed. Otherwise you will have to find out what went wrong and join the secondaries manually. Once this has been done, finish the automatic upgrade process by running the second half of the playbook only:

ansible-playbook -f <good-number-of-concurrent-processes> -i <your-environment>.inventory playbook-31-cluster-upgrade.yaml --tags nodes

If upgrading the software packages (i.e. the second half of the playbook) failed, you still have a working cluster. You may try to fix the problems and continue manually. See the .yaml files under roles/upgrade-nodes/tasks for what you need to do.

If you are trying out the upgrade on a reference system, you may have to downgrade at some point to start again. See the sequence for reinstalling a cluster below for instructions on how to do this (hint: it is important to erase some base software packages before setting up a new cluster based on a lower Kubernetes version).

Examples

To run one of the playbooks (e.g. to set up a cluster), run ansible like this:

ansible-playbook -i <your-inventory-file>.inventory playbook-01-cluster-setup.yaml

You might want to adapt the number of parallel processes to your number of hosts using the -f option.

A sane sequence of playbooks for a complete setup would be:

  • playbook-00-cluster-images.yaml
  • playbook-01-cluster-setup.yaml
  • playbook-02-cluster-dashboard.yaml

The following playbooks can be used as needed:

  • playbook-51-cluster-uninstall.yaml
  • playbook-03-local-access.yaml
  • playbook-52-uninstall-dashboard.yaml

Sequence for reinstalling a cluster:

INVENTORY=<your-inventory-file> 
NODES=<number-of-nodes>
ansible-playbook -f $NODES -i $INVENTORY playbook-51-cluster-uninstall.yaml 
# if you want to downgrade your kubelet, kubectl, ... packages you need to uninstall them first
# if this is not the issue here, you can skip the following line
ansible -u root -f $NODES -i $INVENTORY nodes -m command -a "rpm -e kubelet kubectl kubeadm kubernetes-cni"
for i in playbook-01-cluster-setup.yaml playbook-02-cluster-dashboard.yaml; do 
    ansible-playbook -f $NODES -i $INVENTORY $i || break
    sleep 15s
done

Known limitations

This is a preview in order to obtain early feedback. It is not done yet. Known limitations are:

  • Not yet finished: support for EFK stack (will need to see whether this will work at all)
  • The setup with 'kube-vip' as load balancer / VIP manager does not work completely on some systems. In particular using LB ports other than 6443 often fails.
  • The code has been tested almost exclusively in a Redhat-like (RHEL) environment. More testing on other distros is needed.
