kubernetes · k8s-github-robot · Aug 31, 2017 · Aug 17, 2017 · Aug 18, 2017 · Aug 18, 2017
diff --git a/contributors/design-proposals/apiserver-count-fix.md b/contributors/design-proposals/apiserver-count-fix.md
@@ -0,0 +1,86 @@
+# apiserver-count fix proposal
+
+Authors: @rphillips
+
+## Table of Contents
+
+1. [Overview](#overview)
+2. [Known Issues](#known-issues)
+3. [Proposal](#proposal)
+4. [Alternate Proposals](#alternate-proposals)
+    1. [Custom Resource Definitions and ConfigMaps](#custom-resource-definitions-and-configmaps)
+    2. [Refactor Old Reconciler](#refactor-old-reconciler)
+
+## Overview
+
+Proposal to fix Issue [#22609](https://github.com/kubernetes/kubernetes/issues/22609)
+
+`kube-apiserver` currently has a command-line argument `--apiserver-count`
+specifying the number of API servers. This value, the masterCount, is used by
+the MasterCountEndpointReconciler on a 10-second interval to clean up stale
+API Endpoints. Problems arise when the number of running kube-apiserver
+instances falls below or rises above the masterCount. In the former case,
+stale instances in the Endpoints list are never cleaned up; in the latter
+case, the Endpoints start to flap.
+
+## Known Issues
+
+Each apiserver’s reconciler only cleans up for its own IP. If a new
+server is spun up at a new IP, then the old IP in the Endpoints list is
+only reclaimed if the number of apiservers becomes greater than or equal
+to the masterCount. For example:
+
+* The masterCount is 3, and there are 3 API servers running (named A, B, and C)
+* API server ‘B’ is terminated for any reason
+* The IP for endpoint ‘B’ is not removed from the Endpoints list, because
+  A and C only clean up for their own IPs
+
+There is logic within the
+[MasterCountEndpointReconciler](https://github.com/kubernetes/kubernetes/blob/68814c0203c4b8abe59812b1093844a1f9bdac05/pkg/master/controller.go#L293)
+to attempt to make the Endpoints eventually consistent, but the code relies on
+the Endpoints count becoming equal to or greater than masterCount. When the
+number of apiservers exceeds the masterCount, the Endpoints tend to flap.
+
+If the number of endpoints were scaled down by automation, then the
+Endpoints would never become consistent.
+
+## Proposal
+
+### Create New Reconciler
+
+| Kubernetes Release  | Quality | Description |
+| ------------- | ------------- | ----------- |
+| 1.9           | alpha         | <ul><li>Add a new reconciler</li><li>Add a command-line flag `--alpha-apiserver-endpoint-reconciler-type` with the values:<ul><li>storage</li><li>default</li></ul></li></ul> |
+| 1.10          | beta          | <ul><li>Turn on the `storage` type by default</li></ul> |
+| 1.11          | stable        | <ul><li>Remove code for old reconciler</li><li>Remove `--apiserver-count`</li></ul> |
+
+The MasterCountEndpointReconciler does not meet the current needs for durability
+of API Endpoint creation, deletion, or failure cases.
+
+Custom Resource Definitions were proposed, but they do not have clean layering.
+Additionally, liveness and locking would be nice-to-have features for a
+long-term solution.
+
+ConfigMaps were proposed, but since they are watched globally, liveness
+updates could be overly chatty.
+
+By porting OpenShift's
+[LeaseEndpointReconciler](https://github.com/openshift/origin/blob/master/pkg/cmd/server/election/lease_endpoint_reconciler.go)
+to Kubernetes, we can use the Storage API directly to store Endpoints
+dynamically within the system.
+
+## Alternate Proposals
+
+#### Custom Resource Definitions and ConfigMaps
+
+CRDs and ConfigMaps were considered for this proposal. The community did not
+adopt them due to the technical issues explained earlier.
+
+#### Refactor Old Reconciler
+
+| Release | Quality |                         Description                          |
+| ------- | ------- | ------------------------------------------------------------ |
+| 1.9     | stable  | Change the logic in the current reconciler                   |
+
+We could potentially reuse the old reconciler by changing it to count the
+endpoints and set the `masterCount` (guarded by an RWLock) to that count.
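
The failure mode described under Known Issues, and the reason a lease-style reconciler avoids it, can be illustrated with a small self-contained Go sketch. This is not the code in `pkg/master/controller.go` or OpenShift's LeaseEndpointReconciler: `countReconcile`, `leaseReconcile`, the in-memory lease map, the integer clock ticks, and the IP addresses are all hypothetical names and simplifications chosen for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// countReconcile is a simplified stand-in for a count-based reconciler
// (an assumption, not the real implementation): each server adds its own
// IP if missing and trims the list only when it exceeds masterCount.
func countReconcile(endpoints []string, self string, masterCount int) []string {
	for _, ip := range endpoints {
		if ip == self {
			return endpoints // own IP present: stale peers are left alone
		}
	}
	out := append(append([]string{}, endpoints...), self)
	sort.Strings(out)
	// Trim entries other than our own until the list fits masterCount.
	for len(out) > masterCount && len(out) > 1 {
		for i, ip := range out {
			if ip != self {
				out = append(out[:i], out[i+1:]...)
				break
			}
		}
	}
	return out
}

// leaseReconcile is a hypothetical lease-style reconciler: each server
// renews its own lease, and the endpoint list is rebuilt from leases
// that have not expired. leases maps IP -> expiry tick.
func leaseReconcile(leases map[string]int, self string, now, ttl int) []string {
	leases[self] = now + ttl // renew our own lease
	var out []string
	for ip, expiry := range leases {
		if expiry > now {
			out = append(out, ip)
		} else {
			delete(leases, ip) // expired: the server stopped renewing
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Count-based: B (10.0.0.2) has terminated, but A and C only check
	// for their own IPs, so B's entry is never removed.
	eps := []string{"10.0.0.1", "10.0.0.2", "10.0.0.3"} // A, B, C
	eps = countReconcile(eps, "10.0.0.1", 3)
	eps = countReconcile(eps, "10.0.0.3", 3)
	fmt.Println("count-based:", eps)

	// Lease-based: B last renewed early (expiry 10), A and C later
	// (expiry 20). At tick 15 B's lease has lapsed and is dropped.
	leases := map[string]int{"10.0.0.1": 20, "10.0.0.2": 10, "10.0.0.3": 20}
	leaseReconcile(leases, "10.0.0.1", 15, 10)
	fmt.Println("lease-based:", leaseReconcile(leases, "10.0.0.3", 15, 10))
}
```

Run as written, the count-based list still contains `10.0.0.2`, while the lease-based list contains only `10.0.0.1` and `10.0.0.3`. The design point is that liveness is derived from what each server actively renews rather than from a statically configured count, so no `--apiserver-count` value needs to match the running fleet.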