KEP: kubeadm config HA support
Matt Kelly committed Mar 14, 2018
1 parent f22c1d0 commit 60fe0e8
Showing 1 changed file with 148 additions and 0 deletions.
148 changes: 148 additions & 0 deletions keps/sig-cluster-lifecycle/kubeadm-config-ha-support.md
@@ -0,0 +1,148 @@
---
kep-number: TBD
title: Augment Kubeadm Config to Enable Upgrades of HA Clusters
authors:
- "@mattkelly"
owning-sig: sig-cluster-lifecycle
reviewers:
- "@timothysc"
approvers:
- TBD
editor:
- "@mattkelly"
creation-date: 2018-03-08
last-updated: 2018-03-09
status: provisional
see-also:
- [KEP kubeadm join --master workflow](https://github.com/kubernetes/community/pull/1707) (in progress)
- [Upgrading kubeadm HA clusters from 1.9.x to 1.9.y](https://kubernetes.io/docs/tasks/administer-cluster/kubeadm-upgrade-ha)
---

# Augment Kubeadm Config to Enable Upgrades of HA Clusters

## Table of Contents

* [Augment Kubeadm Config to Enable Upgrades of HA Clusters](#augment-kubeadm-config-to-enable-upgrades-of-ha-clusters)
  * [Table of Contents](#table-of-contents)
  * [Summary](#summary)
  * [Motivation](#motivation)
    * [Goals](#goals)
    * [Non-Goals](#non-goals)
    * [Challenges and Open Questions](#challenges-and-open-questions)
  * [Proposal](#proposal)
    * [Implementation Details](#implementation-details)
      * [Background](#background)
      * [Adding Additional Master-Specific ConfigMaps](#adding-additional-master-specific-configmaps)
        * [Key Design Considerations and Benefits](#key-design-considerations-and-benefits)
          * [Parallel Node Creation](#parallel-node-creation)
          * [Guaranteed Consistent kubeadm-config](#guaranteed-consistent-kubeadm-config)
    * [Risks and Mitigations](#risks-and-mitigations)
  * [Implementation History](#implementation-history)
  * [Drawbacks](#drawbacks)
  * [Alternatives](#alternatives)

## Summary

Currently, `kubeadm upgrade` of a master in a multi-master cluster will fail without workarounds because of a lack of node-specific master configuration in the `kubeadm-config` ConfigMap that is created at `init` time and later referenced during an upgrade.
In particular, node-specific information is required in order for kubeadm to identify which control plane static pods belong to the current node during an upgrade.
In the non-HA case, having a single `nodeName` property in `kubeadm-config` that corresponds to the single master is sufficient because there is no ambiguity.
As we move towards supporting HA natively in kubeadm, a new approach is required to uniquely identify master nodes.

## Motivation

Kubeadm is driving towards natively supporting highly available clusters.
As part of HA support, a clean upgrade path is required.
The purpose of this KEP is simply to introduce support for multiple masters in the kubeadm configuration that is stored in-cluster in order to enable that clean upgrade path.

### Goals

Enable `kubeadm upgrade` of highly available clusters by augmenting the existing persistent kubeadm configuration.

### Non-Goals

This proposal does not aim to solve the entire problem of upgrading HA clusters.
This KEP specifically tackles the persistent configuration problem so that the information required at upgrade time is available.

### Challenges and Open Questions

The final implementation of this KEP will require deciding exactly what "master node-specific information" means.
Currently, the `nodeName` of the master is the only entry that is unquestionably node-specific.
However, additional config entries could potentially be split out into the node-specific area(s) of the config.
This could result in asymmetric configuration across the masters, which may or may not be something that we wish to support.

## Proposal

### Implementation Details

#### Background

Currently, the `kubeadm-config` ConfigMap in the `kube-system` namespace serves as the single source of truth for how kubeadm has been used to create and modify a cluster.
Because kubeadm is not a process that runs on the cluster (it is only run to perform operations, e.g. `init` and `upgrade`), this config is not modified during normal operation.
In the non-HA case today, it is guaranteed to be an accurate representation of the kubeadm configuration.

If kubeadm is used to create an HA cluster today, e.g. using the workarounds described in [kubeadm #546](https://github.com/kubernetes/kubeadm/issues/546) and/or @mbert's [document](https://docs.google.com/document/d/1rEMFuHo3rBJfFapKBInjCqm2d7xGkXzh0FpFO0cRuqg), then the `kubeadm-config` ConfigMap will be an accurate representation except for any master node-specific information.
As explained in [Challenges and Open Questions](#challenges-and-open-questions), such node-specific information is not yet well-defined but minimally consists of the master's `nodeName`.
The `nodeName` in `kubeadm-config` will correspond to the last master that happened to write to the ConfigMap.
In the case of parallel node creation, this may not be well-defined.
When `kubeadm upgrade` is run on a master and this `nodeName` is fetched, it may be incorrect and the upgrade process will fail.
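The failure mode described above can be illustrated with a small, self-contained Go sketch. It is not kubeadm code: `sharedConfig`, `writeConfig`, and `upgradeTargetMatches` are hypothetical names, and the shared ConfigMap is reduced to its single node-specific field. The sketch only shows that, with one shared `nodeName`, at most one master can find itself in the stored configuration.

```go
package main

import "fmt"

// sharedConfig is a hypothetical stand-in for the single kubeadm-config
// ConfigMap. Only the node-specific field is modeled.
type sharedConfig struct {
	NodeName string
}

// writeConfig mimics a master overwriting the shared ConfigMap while it is
// being set up. Whichever master writes last wins.
func writeConfig(cfg *sharedConfig, nodeName string) {
	cfg.NodeName = nodeName
}

// upgradeTargetMatches reports whether the stored nodeName refers to the
// master on which the upgrade is being run.
func upgradeTargetMatches(cfg *sharedConfig, currentNode string) bool {
	return cfg.NodeName == currentNode
}

func main() {
	masters := []string{"master-0", "master-1", "master-2"}

	cfg := &sharedConfig{}
	for _, m := range masters {
		writeConfig(cfg, m)
	}

	// Only the last writer sees a nodeName that matches itself.
	for _, m := range masters {
		fmt.Printf("%s: stored nodeName matches = %t\n", m, upgradeTargetMatches(cfg, m))
	}
}
```

With sequential writes the last writer is predictable. With parallel node creation the write order is not, which is why the stored `nodeName` cannot be relied upon.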

#### Adding Additional Master-Specific ConfigMaps

The proposed solution is to add additional kubeadm ConfigMaps that are specific to each master (one ConfigMap for each master).
Each master-specific ConfigMap will be created as part of the to-be-implemented [`kubeadm join --master` process](https://github.com/kubernetes/community/pull/1707).
Any master-specific information in the main `kubeadm-config` ConfigMap will be removed.

The names of these new ConfigMaps will be `kubeadm-config-<machine_UID>` where `machine_UID` is an identifier that is guaranteed to be unique for each node in the cluster.
There is a precedent for using such a `machine_UID`, and in fact kubeadm already has a [prerequisite](https://kubernetes.io/docs/setup/independent/install-kubeadm/#verify-the-mac-address-and-product_uuid-are-unique-for-every-node) that such machine identifiers be unique for every node.
For the purpose of this KEP, let us assume that `machine_UID` is the full `product_uuid` of the machine that the master node is running on.
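As a rough sketch of the naming scheme, the following Go program derives a ConfigMap name from the machine's `product_uuid`. The function name `MasterConfigMapName` is hypothetical, and lowercasing the UUID is an assumption made here rather than something this KEP specifies: Kubernetes object names must be lowercase DNS subdomain names, while `product_uuid` is often reported in uppercase. The path `/sys/class/dmi/id/product_uuid` is the Linux location referenced by the kubeadm installation prerequisites and usually requires root to read.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// productUUIDPath is where Linux exposes the DMI product UUID.
const productUUIDPath = "/sys/class/dmi/id/product_uuid"

// MasterConfigMapName builds the per-master ConfigMap name proposed in this
// KEP. Trimming and lowercasing are assumptions of this sketch, made so the
// result is a valid Kubernetes object name.
func MasterConfigMapName(machineUID string) string {
	return "kubeadm-config-" + strings.ToLower(strings.TrimSpace(machineUID))
}

func main() {
	raw, err := os.ReadFile(productUUIDPath)
	if err != nil {
		// Reading the file typically needs root and a Linux host.
		fmt.Fprintln(os.Stderr, "could not read product_uuid:", err)
		return
	}
	fmt.Println(MasterConfigMapName(string(raw)))
}
```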

Kubeadm operations that require master-specific information, such as `upgrade`, should now also fetch the ConfigMap that corresponds to the node on which they run.

##### Key Design Considerations and Benefits

There are a few key benefits to the approach of adding additional ConfigMaps over an approach which would augment the existing `kubeadm-config` with master-specific information:

###### Parallel Node Creation

Node creation in parallel is a valid use-case that works today.
By adding additional ConfigMaps instead of requiring each master to modify the existing `kubeadm-config`, we avoid the need to lock on that ConfigMap.

###### Guaranteed Consistent kubeadm-config

This approach allows us to continue to guarantee that the main `kubeadm-config` is consistent with the actual cluster configuration.
If we put master-specific information into `kubeadm-config` itself, then we would require either a yet-to-be-defined `kubeadm leave` workflow or active reconciliation of `kubeadm-config` in order to ensure accuracy.
This may not be critical, but it is a consideration.

With this proposal, if a node unexpectedly leaves a cluster, then at worst a stale ConfigMap will be left in the cluster.
For the case where a node is explicitly deleted, we can leverage garbage collection to automatically delete the master-specific ConfigMap by listing the node as an `ownerReference` when the ConfigMap is created.
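To make the `ownerReference` idea concrete, here is a minimal Go sketch that builds and prints a manifest for a master-specific ConfigMap owned by its Node. To stay self-contained it uses small local structs that mirror only the relevant fields of the Kubernetes API objects instead of the real client libraries. The function `newMasterConfigMap`, the `nodeName` data key, and the example values are all hypothetical. Note that the owner's `uid` must be the UID of the Node object in the API server, which is a different value from the machine's `product_uuid`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// The structs below mirror only the fields of the Kubernetes objects that
// this sketch needs. They are not the real API types.
type ownerReference struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Name       string `json:"name"`
	UID        string `json:"uid"`
}

type objectMeta struct {
	Name            string           `json:"name"`
	Namespace       string           `json:"namespace"`
	OwnerReferences []ownerReference `json:"ownerReferences"`
}

type configMap struct {
	APIVersion string            `json:"apiVersion"`
	Kind       string            `json:"kind"`
	Metadata   objectMeta        `json:"metadata"`
	Data       map[string]string `json:"data"`
}

// newMasterConfigMap builds a per-master ConfigMap whose owner is the Node
// object, so that deleting the Node lets garbage collection remove it.
// The "nodeName" data key is a hypothetical placeholder.
func newMasterConfigMap(machineUID, nodeName, nodeUID string) configMap {
	return configMap{
		APIVersion: "v1",
		Kind:       "ConfigMap",
		Metadata: objectMeta{
			Name:      "kubeadm-config-" + machineUID,
			Namespace: "kube-system",
			OwnerReferences: []ownerReference{{
				APIVersion: "v1",
				Kind:       "Node",
				Name:       nodeName,
				UID:        nodeUID, // UID of the Node API object, not product_uuid
			}},
		},
		Data: map[string]string{"nodeName": nodeName},
	}
}

func main() {
	cm := newMasterConfigMap("example-machine-uid", "master-0", "example-node-uid")
	out, err := json.MarshalIndent(cm, "", "  ")
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(out))
}
```

Because the Node is cluster-scoped and the ConfigMap is namespaced, this owner relationship is of a kind that Kubernetes garbage collection permits.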

### Risks and Mitigations

There will be situations in which a kubeadm operation (e.g. upgrade) that requires the new master-specific ConfigMap is run and finds that the expected ConfigMap does not exist.
For example, this will happen for users who are upgrading HA clusters that were created using the aforementioned workarounds required before kubeadm HA support is available.
In this case, there are two options:

1. Kubeadm can fall back to looking at the main `kubeadm-config`.
The user would have to manually modify `kubeadm-config` for each master to set the `nodeName` to the current master.
This is the [recommended workaround](https://github.com/kubernetes/kubeadm/issues/546#issuecomment-365063404) today.

2. Kubeadm can provide an additional command similar to (or an extension of) the existing [`kubeadm config upload`](https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-config/) command.
This command would (minimally) create the `kubeadm-config-<product_uuid>` ConfigMap for the current master, which would allow a subsequent `kubeadm upgrade` to succeed.
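The first option can be sketched as a lookup with a fallback. The Go program below is illustrative only: ConfigMaps are modeled as an in-memory map, and `resolveNodeName`, the `configMaps` type, and the `nodeName` data key are hypothetical names rather than kubeadm APIs. It prefers the master-specific ConfigMap, falls back to the main `kubeadm-config`, and returns an actionable error when neither yields a `nodeName`.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	mainConfigMap = "kubeadm-config"
	nodeNameKey   = "nodeName" // hypothetical data key
)

// configMaps models the kube-system ConfigMaps as name -> data.
type configMaps map[string]map[string]string

// resolveNodeName looks up the nodeName for the master identified by
// machineUID. usedFallback is true when the value came from the main
// kubeadm-config rather than the master-specific ConfigMap.
func resolveNodeName(store configMaps, machineUID string) (nodeName string, usedFallback bool, err error) {
	if data, ok := store[mainConfigMap+"-"+machineUID]; ok {
		if name, ok := data[nodeNameKey]; ok {
			return name, false, nil
		}
	}
	if data, ok := store[mainConfigMap]; ok {
		if name, ok := data[nodeNameKey]; ok {
			return name, true, nil
		}
	}
	return "", false, errors.New("no master-specific ConfigMap and no nodeName in kubeadm-config; create the per-master ConfigMap before upgrading")
}

func main() {
	store := configMaps{
		"kubeadm-config":       {"nodeName": "master-2"},
		"kubeadm-config-uid-0": {"nodeName": "master-0"},
	}

	for _, uid := range []string{"uid-0", "uid-1"} {
		name, fallback, err := resolveNodeName(store, uid)
		if err != nil {
			fmt.Println(uid, "error:", err)
			continue
		}
		fmt.Printf("%s -> nodeName=%s (fallback=%t)\n", uid, name, fallback)
	}
}
```

When the fallback path is taken, the returned `nodeName` may belong to a different master, so a real implementation would likely warn the user rather than proceed silently.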

It seems reasonable to assume that any user who created an HA cluster using kubeadm before the `kubeadm join --master` workflow existed will be aware that workarounds are required for upgrading (since they presumably applied many workarounds to create the cluster in the first place).
In any case, useful documentation and error messages are critical to a good user experience.

## Implementation History

- [Issue #546: Workarounds for the time before kubeadm HA becomes available](https://github.com/kubernetes/kubeadm/issues/546)
- [Adding HA to kubeadm-deployed clusters](https://docs.google.com/document/d/1rEMFuHo3rBJfFapKBInjCqm2d7xGkXzh0FpFO0cRuqg)
- [Issue #706: Make kubeadm upgrade HA ready](https://github.com/kubernetes/kubeadm/issues/706)

## Drawbacks

This KEP introduces additional ConfigMaps for kubeadm to use (one for each master).

## Alternatives

An alternative approach would be to augment the existing `kubeadm-config` ConfigMap with master-specific information.
The advantages of the proposed approach over this alternative are detailed in the [Proposal](#proposal) section.
