kube-apiserver endpoint cleanup when --apiserver-count>1 #22609

Closed
Calpicow opened this issue Mar 6, 2016 · 40 comments

@Calpicow
Contributor

commented Mar 6, 2016

I'm using v1.2.0-beta.0 and running with --apiserver-count=2 in my cluster. I would expect the kubernetes service endpoints to be cleaned up when one of the apiservers goes offline. That doesn't happen, though, and it causes ~50% of apiserver requests to fail.

@dchen1107

Member

commented Mar 8, 2016

I haven't tried to reproduce the issue yet, but I suspect you might be hitting #22625 here. There is a pending PR, #22626, to resolve #22625. Could you please try patching your cluster to see if you still have the issue? Thanks!

@Calpicow

Contributor Author

commented Mar 9, 2016

I don't believe that issue is related. When I take a master down, I can see the node and its pods being removed. What's not being removed is the second apiserver IP in the kubernetes endpoints.

@dchen1107

Member

commented Mar 9, 2016

cc @mikedanese @lavalamp, who might have seen this issue before.

@mikedanese

Member

commented Mar 9, 2016

This isn't exactly a bug: it's by design, but the design could likely be better.

@lavalamp

Member

commented Mar 10, 2016

Yes, we don't expect an apiserver to go down and stay down without a replacement.

@lavalamp

Member

commented Mar 11, 2016

We could make this more robust by having the apiservers count themselves, e.g. by each one separately making an entry in etcd somewhere.

For now, if you change the number of apiservers that are running, you must update the --apiserver-count= flag and restart all apiservers.
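
To make the etcd idea concrete, here is a minimal Go sketch of an apiserver registering itself under a leased key using the etcd v3 client. The key path, TTL, and wiring are illustrative assumptions rather than anything kube-apiserver did at the time, and the client import path varies with the etcd release:

```go
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/clientv3" // "go.etcd.io/etcd/client/v3" on newer releases
)

// registerAPIServer writes a per-apiserver key attached to a lease, so the
// key disappears automatically if this apiserver stops refreshing it.
// The key layout and TTL are illustrative.
func registerAPIServer(cli *clientv3.Client, ip string, ttlSeconds int64) error {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	lease, err := cli.Grant(ctx, ttlSeconds)
	if err != nil {
		return err
	}
	if _, err := cli.Put(ctx, "/registry/apiservers/"+ip, ip, clientv3.WithLease(lease.ID)); err != nil {
		return err
	}

	// Keep the lease alive for as long as this process is healthy.
	ch, err := cli.KeepAlive(context.Background(), lease.ID)
	if err != nil {
		return err
	}
	go func() {
		for range ch { // drain keep-alive responses
		}
	}()
	return nil
}

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	if err := registerAPIServer(cli, "10.0.3.1", 15); err != nil {
		log.Fatal(err)
	}
	time.Sleep(time.Minute) // in a real apiserver this would run for the process lifetime
}
```

If a process dies, its lease expires and the key disappears, so the surviving apiservers could derive the live set by listing the key prefix instead of trusting a static --apiserver-count.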

@hanikesn


commented Apr 15, 2016

What happens if one of them crashes or its machine goes down? The kubernetes service endpoints would not be usable, as 50% of requests would fail.

@victorgp

Contributor

commented Apr 21, 2016

This is actually an important issue that can break the high availability of the whole cluster.
The DNS service relies on the default kubernetes service to connect to the API. If an apiserver goes down then, as @hanikesn says, 50% of requests will fail; the DNS service will then fail, and with it the whole cluster might fail. Is there any plan to change the design around --apiserver-count?

@javefang


commented Jul 7, 2016

Agreed with @victorgp. We are building a high-availability cluster, and some services like DNS and Traefik do rely on the default kubernetes endpoint. I can work around this by forcing them to use the load balancer's URL directly (basically not using the default endpoint), but I feel the kubernetes service endpoints should behave consistently with the rest?

@timothysc

Member

commented Jul 28, 2016

Does this still exist, @ncdc, after the recent change to endpoint updates?

@hanikesn


commented Oct 15, 2016

This issue still exists with 1.4.1.

@ncdc

Member

commented Oct 17, 2016

@timothysc yes, it still exists. We have an endpoint reconciler for the kubernetes service in OpenShift that uses etcd 2's key TTL mechanism to maintain a lease. If an apiserver goes down, one of the remaining members will remove the dead backend IP from the list of endpoints. If we can agree on a mechanism to do this in Kube (there were some concerns about etcd key TTLs), I'd be happy to put together a PR.

@cristifalcas


commented Oct 18, 2016

👍

@fgrzadkowski

Member

commented Oct 20, 2016

This problem came up when discussing the HA master design, and it made me think that maybe there's a better solution. As you say, we should be using a TTL for each IP. What we can do is:

  1. In the Endpoints object annotations we'd keep a TTL for each IP. Each annotation would hold the IP it corresponds to and a TTL.
  2. Each apiserver, when updating the kubernetes service, would do two things:
    1. Add its own IP if it's not there and add/update the TTL for it
    2. Remove all the IPs whose TTL is too old

This should be a very simple change and hopefully would solve this issue. WDYT?
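
For illustration, a minimal Go sketch of the reconciliation step proposed above, assuming one annotation per IP that stores an RFC 3339 expiry time. The annotation prefix and helper names are hypothetical, not the eventual implementation:

```go
package main

import (
	"fmt"
	"strings"
	"time"

	v1 "k8s.io/api/core/v1"
)

// annPrefix is a hypothetical annotation key prefix; one annotation per apiserver IP.
const annPrefix = "endpoints.kubernetes.io/apiserver-ttl-"

// reconcileSelf performs the two steps from the proposal: refresh this
// apiserver's own annotation, then drop every IP whose annotation has
// expired. It returns the IPs that should remain in the endpoints list.
func reconcileSelf(ep *v1.Endpoints, selfIP string, ttl time.Duration, now time.Time) []string {
	if ep.Annotations == nil {
		ep.Annotations = map[string]string{}
	}
	// Step 1: add/update our own IP and its expiry time.
	ep.Annotations[annPrefix+selfIP] = now.Add(ttl).Format(time.RFC3339)

	// Step 2: keep only IPs whose annotation has not yet expired.
	var live []string
	for key, val := range ep.Annotations {
		if !strings.HasPrefix(key, annPrefix) {
			continue
		}
		expiry, err := time.Parse(time.RFC3339, val)
		if err != nil || now.After(expiry) {
			delete(ep.Annotations, key) // stale or unparsable: evict
			continue
		}
		live = append(live, strings.TrimPrefix(key, annPrefix))
	}
	return live
}

func main() {
	ep := &v1.Endpoints{}
	ep.Annotations = map[string]string{
		// a peer that stopped refreshing a minute ago
		annPrefix + "10.0.3.2": time.Now().Add(-time.Minute).Format(time.RFC3339),
	}
	fmt.Println(reconcileSelf(ep, "10.0.3.1", 30*time.Second, time.Now())) // [10.0.3.1]
}
```

The caller would then rewrite Subsets[0].Addresses from the returned list and retry the whole step on a write conflict.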

@ncdc

Member

commented Oct 20, 2016

@fgrzadkowski that's essentially what we're doing in OpenShift here, although we use a separate path in etcd to store the data, instead of using the existing endpoints path.

@fgrzadkowski

Member

commented Oct 20, 2016

Do you think that baking the logic I described above into the apiserver here would make sense?

@roberthbailey @jszczepkowski @lavalamp @krousey @nikhiljindal

@fgrzadkowski

Member

commented Oct 24, 2016

Slight modification after discussion with @thockin and @jszczepkowski.

We believe that a reasonable approach would be to:

  1. Add a ConfigMap that keeps the list of active apiservers with their expiration times; each apiserver would update its own entry separately.
  2. Change the EndpointsReconciler in the apiserver to update the Endpoints list to match the active apiservers from the ConfigMap.

That way we get dynamic configuration and at the same time we will not be updating Endpoints too often, since the expiration times are stored in a dedicated ConfigMap.

@ncdc

Member

commented Oct 25, 2016

I assume you'd add retries in the event that 2 apiservers tried to update the ConfigMap simultaneously?


@luxas

Member

commented Jun 20, 2017

I think this would be a great and necessary feature as Kube grows and starts utilizing HA for real (without an external LB like everybody does today).

@yifan-gu Is it possible that you could join the SIG Cluster Lifecycle meeting today at 9am PT? I'd love to talk about this there.

cc @jbeda @timothysc ^

@aaronlevy

Member

commented Jun 20, 2017

If @yifan-gu isn't able to make it, I'll be on the call and can discuss.

@aaronlevy

Member

commented Jun 20, 2017

Briefly discussed in the sig-cluster-lifecycle sync:

We're exploring whether we could come up with a proposal/implementation as part of a summer internship project, so the goal would be to have something within the v1.8 release cycle.

Also, @yifan-gu and @abhinavdahiya were planning to join the sig-api-machinery meeting this week to discuss a possible direction / proposal. @liggitt, does that seem like the right place to start this discussion?

@rphillips

Member

commented Aug 3, 2017

There is a proposal from OpenShift [1] implementing a new registry for endpoints, but it accesses etcd directly, which we want to avoid. This proposal keeps the logic local to the module.

There was some concern about clock skew, but lots of things (like leader election) are already affected by large skews.

Proposal

  • Create an EndpointEvictionTimeout within the config (default: 5 minutes)
  • Add an annotation for each API server IP that acts as a lease on that IP:

    Key                                             Value
    endpoints.k8s.io/api-controller/ip/[IP string]  formatted time value

  • Each iteration of ReconcileEndpoints would loop through the annotations and compare each timestamp against the eviction delay:
    • If the timestamp is too old and the IP exists in Addresses: remove the IP from Addresses and remove the annotation
    • If the timestamp is too old and the IP does not exist in Addresses: remove the annotation
  • Update the annotation for the current IP with a fresh timestamp
  • If the update fails due to a consistency constraint, re-read the Endpoints object, go back to step 1, and attempt the write again; fail after 5 tries

Example

API Controller IPs: 10.0.3.1, 10.0.3.2

Key                               Value
k8.io/api-controller-ip/10-0-3-1  2009-11-10T23:00:00Z
k8.io/api-controller-ip/10-0-3-2  2009-11-10T23:00:00Z

Questions

  • Are there technical concerns with implementing a fix in this manner?

@smarterclayton

Contributor

commented Aug 4, 2017

Updating endpoints is bad, in that every write fans out to the entire cluster (all kube-proxies, any network plugins, all ingress controllers that watch endpoints, and any custom firewall controllers). At 5k nodes that is a lot of traffic. Updating another resource like a ConfigMap is less bad. Least bad would be updating a resource that no one watches globally and that is designed for this purpose.

@liggitt

Member

commented Aug 4, 2017

5 minutes seems really long... I'd expect an unresponsive master to get its IP pulled out of the kubernetes service endpoints way sooner (more like a 15-30 second response time). The masters contending on a single resource to heartbeat also seems like it could be problematic.

@szuecs

Member

commented Aug 8, 2017

We run 2 masters in AWS with an ASG. If we tear down one master, it takes about 5-10 minutes to get a replacement, which is fine for all load-balanced applications.

If you want HA Kubernetes you have to set --apiserver-count=N, where N>1, but this means the "kubernetes" endpoints will not clean up the torn-down master from above. This is not how normal load balancers work, and I think it is much worse than an unavailable control plane!

% kubectl get nodes -l master
NAME                                            STATUS                     AGE       VERSION
ip-172-31-15-39.eu-central-1.compute.internal   Ready,SchedulingDisabled   18h       v1.6.7+coreos.0

% kubectl get endpoints
NAME               ENDPOINTS                                            AGE
kubernetes         172.31.15.37:443,172.31.15.38:443,172.31.15.39:443   21h
@rphillips

Member

commented Aug 16, 2017

Thank you for the feedback.

Proposal

Add a kube-apiserver-endpoints ConfigMap in the kube-system namespace.

Populate the ConfigMap with the following:

kind: ConfigMap
apiVersion: v1
metadata:
  creationTimestamp: 2016-02-18T19:14:38Z
  name: kube-apiserver-endpoints
  namespace: kube-system
data:
  192.168.0.3: update.timestamp=2016-02-18T19:14:38Z
  192.168.0.4: update.timestamp=2016-02-18T19:14:38Z

In the reconcile endpoints loop, expire the endpoint after a configured period of time (~1 minute?)

/cc @smarterclayton @liggitt
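
For illustration, a minimal Go sketch of that expiry check, assuming the ConfigMap layout above where each data key is an apiserver IP and the value carries an update.timestamp. The function name and the one-minute window are illustrative:

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// activeAPIServers filters the kube-apiserver-endpoints ConfigMap data
// (IP -> "update.timestamp=<RFC3339>") down to the IPs refreshed within
// the expiry window; everything else is treated as expired.
func activeAPIServers(data map[string]string, expiry time.Duration, now time.Time) []string {
	var active []string
	for ip, val := range data {
		ts := strings.TrimPrefix(val, "update.timestamp=")
		updated, err := time.Parse(time.RFC3339, ts)
		if err != nil {
			continue // malformed entry: treat as expired
		}
		if now.Sub(updated) <= expiry {
			active = append(active, ip)
		}
	}
	return active
}

func main() {
	data := map[string]string{
		"192.168.0.3": "update.timestamp=" + time.Now().Format(time.RFC3339),
		"192.168.0.4": "update.timestamp=" + time.Now().Add(-5*time.Minute).Format(time.RFC3339),
	}
	fmt.Println(activeAPIServers(data, time.Minute, time.Now())) // only 192.168.0.3 survives
}
```

The reconcile loop would then rewrite the kubernetes Endpoints object to exactly this active set, with each apiserver refreshing its own timestamp on every iteration.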

@rphillips rphillips referenced this issue Aug 17, 2017

Merged

add apiserver-count fix proposal #939

1 of 2 tasks complete

olavmrk added a commit to Uninett/kubernetes-terraform that referenced this issue Aug 22, 2017

Remove session affinity for API server service.
This session affinity causes problems if one of the API servers is
down. If a client has a connection to the API server that fails, it
will continue to connect to that node, because the session affinity
tries to steer connections back to the failed node.

(There is a related issue that causes a failed API server to never be removed from the list of valid service endpoints. See: kubernetes/kubernetes#22609)

rphillips pushed a commit to rphillips/kubernetes that referenced this issue Aug 31, 2017

Ryan Phillips
add lease endpoint reconciler
fixes kubernetes/community#939
fixes kubernetes#22609

diff --git a/pkg/election/doc.go b/pkg/election/doc.go
new file mode 100644
index 0000000000..d61d49d7bb
--- /dev/null
+++ b/pkg/election/doc.go
@@ -0,0 +1,2 @@
+// Package election provides objects for managing the list of active masters via leases.
+package election
diff --git a/pkg/election/lease_endpoint_reconciler.go b/pkg/election/lease_endpoint_reconciler.go
new file mode 100644
index 0000000000..397a174010
--- /dev/null
+++ b/pkg/election/lease_endpoint_reconciler.go
@@ -0,0 +1,228 @@
+package election
+
+import (
+	"fmt"
+	"net"
+
+	"github.com/golang/glog"
+	"k8s.io/apimachinery/pkg/api/errors"
+	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+	kruntime "k8s.io/apimachinery/pkg/runtime"
+	apirequest "k8s.io/apiserver/pkg/endpoints/request"
+	"k8s.io/apiserver/pkg/storage"
+	"k8s.io/kubernetes/pkg/api"
+	"k8s.io/kubernetes/pkg/api/endpoints"
+	"k8s.io/kubernetes/pkg/registry/core/endpoint"
+)
+
+// Leases is an interface which assists in managing the set of active masters
+type Leases interface {
+	// ListLeases retrieves a list of the current master IPs
+	ListLeases() ([]string, error)
+
+	// UpdateLease adds or refreshes a master's lease
+	UpdateLease(ip string) error
+}
+
+type storageLeases struct {
+	storage   storage.Interface
+	baseKey   string
+	leaseTime uint64
+}
+
+var _ Leases = &storageLeases{}
+
+// ListLeases retrieves a list of the current master IPs from storage
+func (s *storageLeases) ListLeases() ([]string, error) {
+	ipInfoList := &api.EndpointsList{}
+	if err := s.storage.List(apirequest.NewDefaultContext(), s.baseKey, "0", storage.Everything, ipInfoList); err != nil {
+		return nil, err
+	}
+
+	ipList := make([]string, len(ipInfoList.Items))
+	for i, ip := range ipInfoList.Items {
+		ipList[i] = ip.Subsets[0].Addresses[0].IP
+	}
+
+	glog.V(6).Infof("Current master IPs listed in storage are %v", ipList)
+
+	return ipList, nil
+}
+
+// UpdateLease resets the TTL on a master IP in storage
+func (s *storageLeases) UpdateLease(ip string) error {
+	return s.storage.GuaranteedUpdate(apirequest.NewDefaultContext(), s.baseKey+"/"+ip, &api.Endpoints{}, true, nil, func(input kruntime.Object, respMeta storage.ResponseMeta) (kruntime.Object, *uint64, error) {
+		// just make sure we've got the right IP set, and then refresh the TTL
+		existing := input.(*api.Endpoints)
+		existing.Subsets = []api.EndpointSubset{
+			{
+				Addresses: []api.EndpointAddress{{IP: ip}},
+			},
+		}
+
+		leaseTime := s.leaseTime
+
+		// NB: GuaranteedUpdate does not perform the store operation unless
+		// something changed between load and store (not including resource
+		// version), meaning we can't refresh the TTL without actually
+		// changing a field.
+		existing.Generation += 1
+
+		glog.V(6).Infof("Resetting TTL on master IP %q listed in storage to %v", ip, leaseTime)
+
+		return existing, &leaseTime, nil
+	})
+}
+
+// NewLeases creates a new etcd-based Leases implementation.
+func NewLeases(storage storage.Interface, baseKey string, leaseTime uint64) Leases {
+	return &storageLeases{
+		storage:   storage,
+		baseKey:   baseKey,
+		leaseTime: leaseTime,
+	}
+}
+
+type leaseEndpointReconciler struct {
+	endpointRegistry endpoint.Registry
+	masterLeases     Leases
+}
+
+func NewLeaseEndpointReconciler(endpointRegistry endpoint.Registry, masterLeases Leases) *leaseEndpointReconciler {
+	return &leaseEndpointReconciler{
+		endpointRegistry: endpointRegistry,
+		masterLeases:     masterLeases,
+	}
+}
+
+// ReconcileEndpoints lists keys in a special etcd directory.
+// Each key is expected to have a TTL of R+n, where R is the refresh interval
+// at which this function is called, and n is some small value.  If an
+// apiserver goes down, it will fail to refresh its key's TTL and the key will
+// expire. ReconcileEndpoints will notice that the endpoints object is
+// different from the directory listing, and update the endpoints object
+// accordingly.
+func (r *leaseEndpointReconciler) ReconcileEndpoints(serviceName string, ip net.IP, endpointPorts []api.EndpointPort, reconcilePorts bool) error {
+	ctx := apirequest.NewDefaultContext()
+
+	// Refresh the TTL on our key, independently of whether any error or
+	// update conflict happens below. This makes sure that at least some of
+	// the masters will add our endpoint.
+	if err := r.masterLeases.UpdateLease(ip.String()); err != nil {
+		return err
+	}
+
+	// Retrieve the current list of endpoints...
+	e, err := r.endpointRegistry.GetEndpoints(ctx, serviceName, &metav1.GetOptions{})
+	if err != nil {
+		if !errors.IsNotFound(err) {
+			return err
+		}
+
+		e = &api.Endpoints{
+			ObjectMeta: metav1.ObjectMeta{
+				Name:      serviceName,
+				Namespace: api.NamespaceDefault,
+			},
+		}
+	}
+
+	// ... and the list of master IP keys from etcd
+	masterIPs, err := r.masterLeases.ListLeases()
+	if err != nil {
+		return err
+	}
+
+	// Since we just refreshed our own key, assume that zero endpoints
+	// returned from storage indicates an issue or invalid state, and thus do
+	// not update the endpoints list based on the result.
+	if len(masterIPs) == 0 {
+		return fmt.Errorf("no master IPs were listed in storage, refusing to erase all endpoints for the kubernetes service")
+	}
+
+	// Next, we compare the current list of endpoints with the list of master IP keys
+	formatCorrect, ipCorrect, portsCorrect := checkEndpointSubsetFormatWithLease(e, masterIPs, endpointPorts, reconcilePorts)
+	if formatCorrect && ipCorrect && portsCorrect {
+		return nil
+	}
+
+	if !formatCorrect {
+		// Something is egregiously wrong, just re-make the endpoints record.
+		e.Subsets = []api.EndpointSubset{{
+			Addresses: []api.EndpointAddress{},
+			Ports:     endpointPorts,
+		}}
+	}
+
+	if !formatCorrect || !ipCorrect {
+		// repopulate the addresses according to the expected IPs from etcd
+		e.Subsets[0].Addresses = make([]api.EndpointAddress, len(masterIPs))
+		for ind, ip := range masterIPs {
+			e.Subsets[0].Addresses[ind] = api.EndpointAddress{IP: ip}
+		}
+
+		// Lexicographic order is retained by this step.
+		e.Subsets = endpoints.RepackSubsets(e.Subsets)
+	}
+
+	if !portsCorrect {
+		// Reset ports.
+		e.Subsets[0].Ports = endpointPorts
+	}
+
+	glog.Warningf("Resetting endpoints for master service %q to %v", serviceName, masterIPs)
+	return r.endpointRegistry.UpdateEndpoints(ctx, e)
+}
+
+// checkEndpointSubsetFormatWithLease determines if the endpoint is in the
+// format ReconcileEndpoints expects when the controller is using leases.
+//
+// Return values:
+// * formatCorrect is true if exactly one subset is found.
+// * ipsCorrect when the addresses in the endpoints match the expected addresses list
+// * portsCorrect is true when endpoint ports exactly match provided ports.
+//     portsCorrect is only evaluated when reconcilePorts is set to true.
+func checkEndpointSubsetFormatWithLease(e *api.Endpoints, expectedIPs []string, ports []api.EndpointPort, reconcilePorts bool) (formatCorrect bool, ipsCorrect bool, portsCorrect bool) {
+	if len(e.Subsets) != 1 {
+		return false, false, false
+	}
+	sub := &e.Subsets[0]
+	portsCorrect = true
+	if reconcilePorts {
+		if len(sub.Ports) != len(ports) {
+			portsCorrect = false
+		} else {
+			for i, port := range ports {
+				if port != sub.Ports[i] {
+					portsCorrect = false
+					break
+				}
+			}
+		}
+	}
+
+	ipsCorrect = true
+	if len(sub.Addresses) != len(expectedIPs) {
+		ipsCorrect = false
+	} else {
+		// check the actual content of the addresses
+		// present addrs is used as a set (the keys) and to indicate if a
+		// value was already found (the values)
+		presentAddrs := make(map[string]bool, len(expectedIPs))
+		for _, ip := range expectedIPs {
+			presentAddrs[ip] = false
+		}
+
+		// uniqueness is assumed amongst all Addresses.
+		for _, addr := range sub.Addresses {
+			if alreadySeen, ok := presentAddrs[addr.IP]; alreadySeen || !ok {
+				ipsCorrect = false
+				break
+			}
+
+			presentAddrs[addr.IP] = true
+		}
+	}
+
+	return true, ipsCorrect, portsCorrect
+}
diff --git a/pkg/election/lease_endpoint_reconciler_test.go b/pkg/election/lease_endpoint_reconciler_test.go
new file mode 100644
index 0000000000..f5cdb2a675
--- /dev/null
+++ b/pkg/election/lease_endpoint_reconciler_test.go
@@ -0,0 +1,510 @@
+package election
+
+import (
+	"net"
+	"reflect"
+	"testing"
+
+	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+	"k8s.io/kubernetes/pkg/api"
+	"k8s.io/kubernetes/pkg/registry/registrytest"
+)
+
+type fakeLeases struct {
+	keys map[string]bool
+}
+
+var _ Leases = &fakeLeases{}
+
+func newFakeLeases() *fakeLeases {
+	return &fakeLeases{make(map[string]bool)}
+}
+
+func (f *fakeLeases) ListLeases() ([]string, error) {
+	res := make([]string, 0, len(f.keys))
+	for ip := range f.keys {
+		res = append(res, ip)
+	}
+	return res, nil
+}
+
+func (f *fakeLeases) UpdateLease(ip string) error {
+	f.keys[ip] = true
+	return nil
+}
+
+func (f *fakeLeases) SetKeys(keys []string) {
+	for _, ip := range keys {
+		f.keys[ip] = false
+	}
+}
+
+func (f *fakeLeases) GetUpdatedKeys() []string {
+	res := []string{}
+	for ip, updated := range f.keys {
+		if updated {
+			res = append(res, ip)
+		}
+	}
+	return res
+}
+
+func TestLeaseEndpointReconciler(t *testing.T) {
+	ns := api.NamespaceDefault
+	om := func(name string) metav1.ObjectMeta {
+		return metav1.ObjectMeta{Namespace: ns, Name: name}
+	}
+	reconcile_tests := []struct {
+		testName      string
+		serviceName   string
+		ip            string
+		endpointPorts []api.EndpointPort
+		endpointKeys  []string
+		endpoints     *api.EndpointsList
+		expectUpdate  *api.Endpoints // nil means none expected
+	}{
+		{
+			testName:      "no existing endpoints",
+			serviceName:   "foo",
+			ip:            "1.2.3.4",
+			endpointPorts: []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+			endpoints:     nil,
+			expectUpdate: &api.Endpoints{
+				ObjectMeta: om("foo"),
+				Subsets: []api.EndpointSubset{{
+					Addresses: []api.EndpointAddress{{IP: "1.2.3.4"}},
+					Ports:     []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+				}},
+			},
+		},
+		{
+			testName:      "existing endpoints satisfy",
+			serviceName:   "foo",
+			ip:            "1.2.3.4",
+			endpointPorts: []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+			endpoints: &api.EndpointsList{
+				Items: []api.Endpoints{{
+					ObjectMeta: om("foo"),
+					Subsets: []api.EndpointSubset{{
+						Addresses: []api.EndpointAddress{{IP: "1.2.3.4"}},
+						Ports:     []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+					}},
+				}},
+			},
+		},
+		{
+			testName:      "existing endpoints satisfy + refresh existing key",
+			serviceName:   "foo",
+			ip:            "1.2.3.4",
+			endpointPorts: []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+			endpointKeys:  []string{"1.2.3.4"},
+			endpoints: &api.EndpointsList{
+				Items: []api.Endpoints{{
+					ObjectMeta: om("foo"),
+					Subsets: []api.EndpointSubset{{
+						Addresses: []api.EndpointAddress{{IP: "1.2.3.4"}},
+						Ports:     []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+					}},
+				}},
+			},
+		},
+		{
+			testName:      "existing endpoints satisfy but too many",
+			serviceName:   "foo",
+			ip:            "1.2.3.4",
+			endpointPorts: []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+			endpoints: &api.EndpointsList{
+				Items: []api.Endpoints{{
+					ObjectMeta: om("foo"),
+					Subsets: []api.EndpointSubset{{
+						Addresses: []api.EndpointAddress{{IP: "1.2.3.4"}, {IP: "4.3.2.1"}},
+						Ports:     []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+					}},
+				}},
+			},
+			expectUpdate: &api.Endpoints{
+				ObjectMeta: om("foo"),
+				Subsets: []api.EndpointSubset{{
+					Addresses: []api.EndpointAddress{{IP: "1.2.3.4"}},
+					Ports:     []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+				}},
+			},
+		},
+		{
+			testName:      "existing endpoints satisfy but too many + extra masters",
+			serviceName:   "foo",
+			ip:            "1.2.3.4",
+			endpointPorts: []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+			endpointKeys:  []string{"1.2.3.4", "4.3.2.2", "4.3.2.3", "4.3.2.4"},
+			endpoints: &api.EndpointsList{
+				Items: []api.Endpoints{{
+					ObjectMeta: om("foo"),
+					Subsets: []api.EndpointSubset{{
+						Addresses: []api.EndpointAddress{
+							{IP: "1.2.3.4"},
+							{IP: "4.3.2.1"},
+							{IP: "4.3.2.2"},
+							{IP: "4.3.2.3"},
+							{IP: "4.3.2.4"},
+						},
+						Ports: []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+					}},
+				}},
+			},
+			expectUpdate: &api.Endpoints{
+				ObjectMeta: om("foo"),
+				Subsets: []api.EndpointSubset{{
+					Addresses: []api.EndpointAddress{
+						{IP: "1.2.3.4"},
+						{IP: "4.3.2.2"},
+						{IP: "4.3.2.3"},
+						{IP: "4.3.2.4"},
+					},
+					Ports: []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+				}},
+			},
+		},
+		{
+			testName:      "existing endpoints satisfy but too many + extra masters + delete first",
+			serviceName:   "foo",
+			ip:            "4.3.2.4",
+			endpointPorts: []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+			endpointKeys:  []string{"4.3.2.1", "4.3.2.2", "4.3.2.3", "4.3.2.4"},
+			endpoints: &api.EndpointsList{
+				Items: []api.Endpoints{{
+					ObjectMeta: om("foo"),
+					Subsets: []api.EndpointSubset{{
+						Addresses: []api.EndpointAddress{
+							{IP: "1.2.3.4"},
+							{IP: "4.3.2.1"},
+							{IP: "4.3.2.2"},
+							{IP: "4.3.2.3"},
+							{IP: "4.3.2.4"},
+						},
+						Ports: []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+					}},
+				}},
+			},
+			expectUpdate: &api.Endpoints{
+				ObjectMeta: om("foo"),
+				Subsets: []api.EndpointSubset{{
+					Addresses: []api.EndpointAddress{
+						{IP: "4.3.2.1"},
+						{IP: "4.3.2.2"},
+						{IP: "4.3.2.3"},
+						{IP: "4.3.2.4"},
+					},
+					Ports: []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+				}},
+			},
+		},
+		{
+			testName:      "existing endpoints current IP missing",
+			serviceName:   "foo",
+			ip:            "4.3.2.2",
+			endpointPorts: []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+			endpointKeys:  []string{"4.3.2.1"},
+			endpoints: &api.EndpointsList{
+				Items: []api.Endpoints{{
+					ObjectMeta: om("foo"),
+					Subsets: []api.EndpointSubset{{
+						Addresses: []api.EndpointAddress{
+							{IP: "4.3.2.1"},
+						},
+						Ports: []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+					}},
+				}},
+			},
+			expectUpdate: &api.Endpoints{
+				ObjectMeta: om("foo"),
+				Subsets: []api.EndpointSubset{{
+					Addresses: []api.EndpointAddress{
+						{IP: "4.3.2.1"},
+						{IP: "4.3.2.2"},
+					},
+					Ports: []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+				}},
+			},
+		},
+		{
+			testName:      "existing endpoints wrong name",
+			serviceName:   "foo",
+			ip:            "1.2.3.4",
+			endpointPorts: []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+			endpoints: &api.EndpointsList{
+				Items: []api.Endpoints{{
+					ObjectMeta: om("bar"),
+					Subsets: []api.EndpointSubset{{
+						Addresses: []api.EndpointAddress{{IP: "1.2.3.4"}},
+						Ports:     []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+					}},
+				}},
+			},
+			expectUpdate: &api.Endpoints{
+				ObjectMeta: om("foo"),
+				Subsets: []api.EndpointSubset{{
+					Addresses: []api.EndpointAddress{{IP: "1.2.3.4"}},
+					Ports:     []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+				}},
+			},
+		},
+		{
+			testName:      "existing endpoints wrong IP",
+			serviceName:   "foo",
+			ip:            "1.2.3.4",
+			endpointPorts: []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+			endpoints: &api.EndpointsList{
+				Items: []api.Endpoints{{
+					ObjectMeta: om("foo"),
+					Subsets: []api.EndpointSubset{{
+						Addresses: []api.EndpointAddress{{IP: "4.3.2.1"}},
+						Ports:     []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+					}},
+				}},
+			},
+			expectUpdate: &api.Endpoints{
+				ObjectMeta: om("foo"),
+				Subsets: []api.EndpointSubset{{
+					Addresses: []api.EndpointAddress{{IP: "1.2.3.4"}},
+					Ports:     []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+				}},
+			},
+		},
+		{
+			testName:      "existing endpoints wrong port",
+			serviceName:   "foo",
+			ip:            "1.2.3.4",
+			endpointPorts: []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+			endpoints: &api.EndpointsList{
+				Items: []api.Endpoints{{
+					ObjectMeta: om("foo"),
+					Subsets: []api.EndpointSubset{{
+						Addresses: []api.EndpointAddress{{IP: "1.2.3.4"}},
+						Ports:     []api.EndpointPort{{Name: "foo", Port: 9090, Protocol: "TCP"}},
+					}},
+				}},
+			},
+			expectUpdate: &api.Endpoints{
+				ObjectMeta: om("foo"),
+				Subsets: []api.EndpointSubset{{
+					Addresses: []api.EndpointAddress{{IP: "1.2.3.4"}},
+					Ports:     []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+				}},
+			},
+		},
+		{
+			testName:      "existing endpoints wrong protocol",
+			serviceName:   "foo",
+			ip:            "1.2.3.4",
+			endpointPorts: []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+			endpoints: &api.EndpointsList{
+				Items: []api.Endpoints{{
+					ObjectMeta: om("foo"),
+					Subsets: []api.EndpointSubset{{
+						Addresses: []api.EndpointAddress{{IP: "1.2.3.4"}},
+						Ports:     []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "UDP"}},
+					}},
+				}},
+			},
+			expectUpdate: &api.Endpoints{
+				ObjectMeta: om("foo"),
+				Subsets: []api.EndpointSubset{{
+					Addresses: []api.EndpointAddress{{IP: "1.2.3.4"}},
+					Ports:     []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+				}},
+			},
+		},
+		{
+			testName:      "existing endpoints wrong port name",
+			serviceName:   "foo",
+			ip:            "1.2.3.4",
+			endpointPorts: []api.EndpointPort{{Name: "baz", Port: 8080, Protocol: "TCP"}},
+			endpoints: &api.EndpointsList{
+				Items: []api.Endpoints{{
+					ObjectMeta: om("foo"),
+					Subsets: []api.EndpointSubset{{
+						Addresses: []api.EndpointAddress{{IP: "1.2.3.4"}},
+						Ports:     []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+					}},
+				}},
+			},
+			expectUpdate: &api.Endpoints{
+				ObjectMeta: om("foo"),
+				Subsets: []api.EndpointSubset{{
+					Addresses: []api.EndpointAddress{{IP: "1.2.3.4"}},
+					Ports:     []api.EndpointPort{{Name: "baz", Port: 8080, Protocol: "TCP"}},
+				}},
+			},
+		},
+		{
+			testName:    "existing endpoints extra service ports satisfy",
+			serviceName: "foo",
+			ip:          "1.2.3.4",
+			endpointPorts: []api.EndpointPort{
+				{Name: "foo", Port: 8080, Protocol: "TCP"},
+				{Name: "bar", Port: 1000, Protocol: "TCP"},
+				{Name: "baz", Port: 1010, Protocol: "TCP"},
+			},
+			endpoints: &api.EndpointsList{
+				Items: []api.Endpoints{{
+					ObjectMeta: om("foo"),
+					Subsets: []api.EndpointSubset{{
+						Addresses: []api.EndpointAddress{{IP: "1.2.3.4"}},
+						Ports: []api.EndpointPort{
+							{Name: "foo", Port: 8080, Protocol: "TCP"},
+							{Name: "bar", Port: 1000, Protocol: "TCP"},
+							{Name: "baz", Port: 1010, Protocol: "TCP"},
+						},
+					}},
+				}},
+			},
+		},
+		{
+			testName:    "existing endpoints extra service ports missing port",
+			serviceName: "foo",
+			ip:          "1.2.3.4",
+			endpointPorts: []api.EndpointPort{
+				{Name: "foo", Port: 8080, Protocol: "TCP"},
+				{Name: "bar", Port: 1000, Protocol: "TCP"},
+			},
+			endpoints: &api.EndpointsList{
+				Items: []api.Endpoints{{
+					ObjectMeta: om("foo"),
+					Subsets: []api.EndpointSubset{{
+						Addresses: []api.EndpointAddress{{IP: "1.2.3.4"}},
+						Ports:     []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+					}},
+				}},
+			},
+			expectUpdate: &api.Endpoints{
+				ObjectMeta: om("foo"),
+				Subsets: []api.EndpointSubset{{
+					Addresses: []api.EndpointAddress{{IP: "1.2.3.4"}},
+					Ports: []api.EndpointPort{
+						{Name: "foo", Port: 8080, Protocol: "TCP"},
+						{Name: "bar", Port: 1000, Protocol: "TCP"},
+					},
+				}},
+			},
+		},
+	}
+	for _, test := range reconcile_tests {
+		fakeLeases := newFakeLeases()
+		fakeLeases.SetKeys(test.endpointKeys)
+		registry := &registrytest.EndpointRegistry{
+			Endpoints: test.endpoints,
+		}
+		r := NewLeaseEndpointReconciler(registry, fakeLeases)
+		err := r.ReconcileEndpoints(test.serviceName, net.ParseIP(test.ip), test.endpointPorts, true)
+		if err != nil {
+			t.Errorf("case %q: unexpected error: %v", test.testName, err)
+		}
+		if test.expectUpdate != nil {
+			if len(registry.Updates) != 1 {
+				t.Errorf("case %q: unexpected updates: %v", test.testName, registry.Updates)
+			} else if e, a := test.expectUpdate, &registry.Updates[0]; !reflect.DeepEqual(e, a) {
+				t.Errorf("case %q: expected update:\n%#v\ngot:\n%#v\n", test.testName, e, a)
+			}
+		}
+		if test.expectUpdate == nil && len(registry.Updates) > 0 {
+			t.Errorf("case %q: no update expected, yet saw: %v", test.testName, registry.Updates)
+		}
+		if updatedKeys := fakeLeases.GetUpdatedKeys(); len(updatedKeys) != 1 || updatedKeys[0] != test.ip {
+			t.Errorf("case %q: expected the master's IP to be refreshed, but the following IPs were refreshed instead: %v", test.testName, updatedKeys)
+		}
+	}
+
+	non_reconcile_tests := []struct {
+		testName      string
+		serviceName   string
+		ip            string
+		endpointPorts []api.EndpointPort
+		endpointKeys  []string
+		endpoints     *api.EndpointsList
+		expectUpdate  *api.Endpoints // nil means none expected
+	}{
+		{
+			testName:    "existing endpoints extra service ports missing port no update",
+			serviceName: "foo",
+			ip:          "1.2.3.4",
+			endpointPorts: []api.EndpointPort{
+				{Name: "foo", Port: 8080, Protocol: "TCP"},
+				{Name: "bar", Port: 1000, Protocol: "TCP"},
+			},
+			endpoints: &api.EndpointsList{
+				Items: []api.Endpoints{{
+					ObjectMeta: om("foo"),
+					Subsets: []api.EndpointSubset{{
+						Addresses: []api.EndpointAddress{{IP: "1.2.3.4"}},
+						Ports:     []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+					}},
+				}},
+			},
+			expectUpdate: nil,
+		},
+		{
+			testName:    "existing endpoints extra service ports, wrong ports, wrong IP",
+			serviceName: "foo",
+			ip:          "1.2.3.4",
+			endpointPorts: []api.EndpointPort{
+				{Name: "foo", Port: 8080, Protocol: "TCP"},
+				{Name: "bar", Port: 1000, Protocol: "TCP"},
+			},
+			endpoints: &api.EndpointsList{
+				Items: []api.Endpoints{{
+					ObjectMeta: om("foo"),
+					Subsets: []api.EndpointSubset{{
+						Addresses: []api.EndpointAddress{{IP: "4.3.2.1"}},
+						Ports:     []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+					}},
+				}},
+			},
+			expectUpdate: &api.Endpoints{
+				ObjectMeta: om("foo"),
+				Subsets: []api.EndpointSubset{{
+					Addresses: []api.EndpointAddress{{IP: "1.2.3.4"}},
+					Ports:     []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+				}},
+			},
+		},
+		{
+			testName:      "no existing endpoints",
+			serviceName:   "foo",
+			ip:            "1.2.3.4",
+			endpointPorts: []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+			endpoints:     nil,
+			expectUpdate: &api.Endpoints{
+				ObjectMeta: om("foo"),
+				Subsets: []api.EndpointSubset{{
+					Addresses: []api.EndpointAddress{{IP: "1.2.3.4"}},
+					Ports:     []api.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}},
+				}},
+			},
+		},
+	}
+	for _, test := range non_reconcile_tests {
+		fakeLeases := newFakeLeases()
+		fakeLeases.SetKeys(test.endpointKeys)
+		registry := &registrytest.EndpointRegistry{
+			Endpoints: test.endpoints,
+		}
+		r := NewLeaseEndpointReconciler(registry, fakeLeases)
+		err := r.ReconcileEndpoints(test.serviceName, net.ParseIP(test.ip), test.endpointPorts, false)
+		if err != nil {
+			t.Errorf("case %q: unexpected error: %v", test.testName, err)
+		}
+		if test.expectUpdate != nil {
+			if len(registry.Updates) != 1 {
+				t.Errorf("case %q: unexpected updates: %v", test.testName, registry.Updates)
+			} else if e, a := test.expectUpdate, &registry.Updates[0]; !reflect.DeepEqual(e, a) {
+				t.Errorf("case %q: expected update:\n%#v\ngot:\n%#v\n", test.testName, e, a)
+			}
+		}
+		if test.expectUpdate == nil && len(registry.Updates) > 0 {
+			t.Errorf("case %q: no update expected, yet saw: %v", test.testName, registry.Updates)
+		}
+		if updatedKeys := fakeLeases.GetUpdatedKeys(); len(updatedKeys) != 1 || updatedKeys[0] != test.ip {
+			t.Errorf("case %q: expected the master's IP to be refreshed, but the following IPs were refreshed instead: %v", test.testName, updatedKeys)
+		}
+	}
+}
diff --git a/pkg/master/master.go b/pkg/master/master.go
index 97c5c5357b..d140b179a0 100644
--- a/pkg/master/master.go
+++ b/pkg/master/master.go
@@ -51,12 +51,17 @@ import (
 	genericapiserver "k8s.io/apiserver/pkg/server"
 	"k8s.io/apiserver/pkg/server/healthz"
 	serverstorage "k8s.io/apiserver/pkg/server/storage"
+	storagefactory "k8s.io/apiserver/pkg/storage/storagebackend/factory"
 	corev1client "k8s.io/client-go/kubernetes/typed/core/v1"
 	"k8s.io/kubernetes/cmd/kube-apiserver/app/options"
 	"k8s.io/kubernetes/pkg/api"
+	kapi "k8s.io/kubernetes/pkg/api"
 	coreclient "k8s.io/kubernetes/pkg/client/clientset_generated/internalclientset/typed/core/internalversion"
+	election "k8s.io/kubernetes/pkg/election"
 	kubeletclient "k8s.io/kubernetes/pkg/kubelet/client"
 	"k8s.io/kubernetes/pkg/master/tunneler"
+	"k8s.io/kubernetes/pkg/registry/core/endpoint"
+	endpointsstorage "k8s.io/kubernetes/pkg/registry/core/endpoint/storage"
 	"k8s.io/kubernetes/pkg/routes"
 	nodeutil "k8s.io/kubernetes/pkg/util/node"

@@ -87,6 +92,16 @@ const (
 	DefaultEndpointReconcilerInterval = 10 * time.Second
 )

+// EndpointReconcilerEnum selects which reconciler to use
+type EndpointReconcilerEnum int
+
+const (
+	// DefaultMasterCountReconciler will select the original reconciler
+	DefaultMasterCountReconciler = 0
+	// LeaseEndpointReconciler will select a storage based reconciler
+	LeaseEndpointReconciler = iota
+)
+
 type Config struct {
 	GenericConfig *genericapiserver.Config

@@ -135,6 +150,12 @@ type Config struct {
 	// Number of masters running; all masters must be started with the
 	// same value for this field. (Numbers > 1 currently untested.)
 	MasterCount int
+
+	// MasterEndpointReconcileTTL sets the lease TTL (in seconds) used to reconcile expired masters out of the kubernetes service record. It is not recommended to set this value below 15s.
+	MasterEndpointReconcileTTL int
+
+	// Selects which reconciler to use
+	EndpointReconcilerEnum int
 }

 // EndpointReconcilerConfig holds the endpoint reconciler and endpoint reconciliation interval to be
@@ -155,6 +176,49 @@ type completedConfig struct {
 	*Config
 }

+func (c *Config) createMasterCountReconciler() EndpointReconciler {
+	// use a default endpoint reconciler if nothing is set
+	endpointClient := coreclient.NewForConfigOrDie(c.GenericConfig.LoopbackClientConfig)
+	return NewMasterCountEndpointReconciler(c.MasterCount, endpointClient)
+}
+
+func (c *Config) createLeaseReconciler() EndpointReconciler {
+	ttl := c.MasterEndpointReconcileTTL
+	config, err := c.StorageFactory.NewConfig(kapi.Resource("apiServerIPInfo"))
+	if err != nil {
+		glog.Fatalf("Error determining service IP ranges: %v", err)
+	}
+	leaseStorage, _, err := storagefactory.Create(*config)
+	if err != nil {
+		glog.Fatalf("Error creating storage factory: %v", err)
+	}
+	endpointConfig, err := c.StorageFactory.NewConfig(kapi.Resource("endpoints"))
+	if err != nil {
+		glog.Fatalf("Error getting storage config: %v", err)
+	}
+	endpointsStorage := endpointsstorage.NewREST(generic.RESTOptions{
+		StorageConfig:           endpointConfig,
+		Decorator:               generic.UndecoratedStorage,
+		DeleteCollectionWorkers: 0,
+		ResourcePrefix:          c.StorageFactory.ResourcePrefix(kapi.Resource("endpoints")),
+	})
+	endpointRegistry := endpoint.NewRegistry(endpointsStorage)
+	masterLeases := election.NewLeases(leaseStorage, "/masterleases/", uint64(ttl))
+	return election.NewLeaseEndpointReconciler(endpointRegistry, masterLeases)
+}
+
+func (c *Config) createEndpointReconciler() EndpointReconciler {
+	switch c.EndpointReconcilerEnum {
+	case DefaultMasterCountReconciler:
+		return c.createMasterCountReconciler()
+	case LeaseEndpointReconciler:
+		return c.createLeaseReconciler()
+	default:
+		glog.Fatalf("Reconciler not implemented: %v", c.EndpointReconcilerEnum)
+	}
+	return nil
+}
+
 // Complete fills in any fields not set that are required to have valid data. It's mutating the receiver.
 func (c *Config) Complete() completedConfig {
 	c.GenericConfig.Complete()
@@ -169,6 +233,9 @@ func (c *Config) Complete() completedConfig {
 	if c.APIServerServiceIP == nil {
 		c.APIServerServiceIP = apiServerServiceIP
 	}
+	if c.MasterEndpointReconcileTTL == 0 {
+		c.MasterEndpointReconcileTTL = 15
+	}

 	discoveryAddresses := discovery.DefaultAddresses{DefaultAddress: c.GenericConfig.ExternalAddress}
 	discoveryAddresses.CIDRRules = append(discoveryAddresses.CIDRRules,
@@ -192,9 +259,7 @@ func (c *Config) Complete() completedConfig {
 	}

 	if c.EndpointReconcilerConfig.Reconciler == nil {
-		// use a default endpoint reconciler if nothing is set
-		endpointClient := coreclient.NewForConfigOrDie(c.GenericConfig.LoopbackClientConfig)
-		c.EndpointReconcilerConfig.Reconciler = NewMasterCountEndpointReconciler(c.MasterCount, endpointClient)
+		c.EndpointReconcilerConfig.Reconciler = c.createEndpointReconciler()
 	}

 	// this has always been hardcoded true in the past

pass down the endpoint reconciler type

fixup bazel

lint

add headers

rphillips pushed a commit to rphillips/kubernetes that referenced this issue Aug 31, 2017

rphillips pushed a commit to rphillips/kubernetes that referenced this issue Aug 31, 2017

k8s-github-robot pushed a commit to kubernetes/community that referenced this issue Aug 31, 2017

Kubernetes Submit Queue
Merge pull request #939 from rphillips/fixes/apiserver-count-fix
Automatic merge from submit-queue

add apiserver-count fix proposal

This is a proposal to fix the apiserver-count issue at kubernetes/kubernetes#22609. I would appreciate a review on the proposal.

- [x] Add ConfigMap for configurable options
- [ ] Find out dependencies on the Endpoints API and add them to the proposal

rphillips pushed a commit to rphillips/kubernetes that referenced this issue Sep 11, 2017

hh pushed a commit to ii/kubernetes that referenced this issue Sep 23, 2017

Kubernetes Submit Queue
Merge pull request kubernetes#51698 from rphillips/feat/lease_endpoin…
…t_reconciler

Automatic merge from submit-queue (batch tested with PRs 52240, 48145, 52220, 51698, 51777). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

add lease endpoint reconciler

**What this PR does / why we need it**: Adds OpenShift's LeaseEndpointReconciler to register kube-apiserver endpoints within the storage registry.

Adds a command-line argument `alpha-endpoint-reconciler-type` to the kube-apiserver.

Defaults to the old MasterCount reconciler.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes kubernetes/community#939 fixes kubernetes#22609

**Release note**:
```release-note
Adds a command-line argument to kube-apiserver called
--alpha-endpoint-reconciler-type=(master-count, lease, none) (default
"master-count"). The original reconciler is 'master-count'. The 'lease'
reconciler uses the storageapi and a TTL to keep alive an endpoint within the
`kube-apiserver-endpoint` storage namespace. The 'none' reconciler is a noop
reconciler that does not do anything. This is useful for self-hosted
environments.
```

/cc @lavalamp @smarterclayton @ncdc

gurvindersingh added a commit to Uninett/daas-kube that referenced this issue Jan 11, 2018

Remove session affinity for API server service.
This session affinity causes problems if one of the API servers is
down. If a client has a connection to the API server that fails, it
will continue to connect to that node, because the session affinity
tries to steer connections back to the failed node.

(There is a related issue that causes a failed API server to never be removed from the list of valid service endpoints. See: kubernetes/kubernetes#22609)

justaugustus pushed a commit to justaugustus/enhancements that referenced this issue Sep 3, 2018

Kubernetes Submit Queue
Merge pull request kubernetes#939 from rphillips/fixes/apiserver-coun…
…t-fix

Automatic merge from submit-queue

add apiserver-count fix proposal

This is a proposal to fix the apiserver-count issue at kubernetes/kubernetes#22609. I would appreciate a review on the proposal.

- [x] Add ConfigMap for configurable options
- [ ] Find out dependencies on the Endpoints API and add them to the proposal

bergmannf added a commit to bergmannf/salt that referenced this issue Feb 27, 2019

Automatically update the kubernetes-service endpoint.
When a cluster is bootstrapped with multiple kube-apiservers, the `kubernetes`
service contains a list of all of these endpoints.

By default, this list of endpoints will *not* be updated if one of the
apiservers goes down. This can leave the API unresponsive and break the
cluster. To have the endpoints automatically keep track of the apiservers
that are available, the `--endpoint-reconciler-type` option `lease` needs to be
added.

(The default option for 1.10 `master-count` only changes the endpoint when the
count changes: apprenda/kismatic#987)

See:

kubernetes/kubernetes#22609
kubernetes/kubernetes#56584
kubernetes/kubernetes#51698

bergmannf added a commit to bergmannf/salt that referenced this issue Feb 27, 2019

bergmannf added a commit to bergmannf/salt that referenced this issue Feb 27, 2019