
give the kubernetes service client ip session affinity #23129

Merged

Conversation

mikedanese
Member

No description provided.

@k8s-github-robot

Labelling this PR as size/M

@k8s-github-robot k8s-github-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Mar 17, 2016
@k8s-bot

k8s-bot commented Mar 17, 2016

GCE e2e build/test passed for commit 127eaf26fa164f264688576d3a120536ffe45248.

@ncdc
Member

ncdc commented Mar 17, 2016

FYI @kubernetes/rh-cluster-infra @kubernetes/rh-platform-management @kubernetes/rh-networking

@k8s-bot

k8s-bot commented Mar 17, 2016

GCE e2e build/test passed for commit e21ebbc.

@pweil-
Contributor

pweil- commented Mar 17, 2016

@mikedanese - what's the motivation for this? I don't object to the change, just curious

@mikedanese
Member Author

@pweil- we talked about this in the sig HA meeting with @smarterclayton and @lavalamp. The idea is that if we start to run controllers like the scheduler and controller-manager on the cluster in HA configurations, they should talk to the same API server if possible. Each apiserver serves from a cache (unless the linearized-read flag is passed explicitly), and each apiserver's cache could be a different degree behind etcd. This could cause bugs in naive clients that don't handle it. We can avoid these bugs by adding clientIP session affinity.
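The pinning property being described can be illustrated with a toy sketch. This is not kube-proxy's actual mechanism (which tracks connection state in iptables); a simple hash of the source IP is assumed purely to show why one client always sees one apiserver's cache:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickEndpoint is a toy model of ClientIP session affinity: the backend is
// a pure function of the client's source IP, so every request from that IP
// lands on the same apiserver. Real kube-proxy uses iptables state rather
// than a hash like this; the sketch only illustrates the pinning property.
func pickEndpoint(clientIP string, endpoints []string) string {
	h := fnv.New32a()
	h.Write([]byte(clientIP))
	return endpoints[int(h.Sum32())%len(endpoints)]
}

func main() {
	apiservers := []string{"10.0.0.1", "10.0.0.2", "10.0.0.3"}
	// All requests from one controller-manager hit a single apiserver,
	// so it never observes two caches at different points behind etcd.
	for i := 0; i < 3; i++ {
		fmt.Println(pickEndpoint("10.244.1.7", apiservers))
	}
}
```

The same IP always maps to the same endpoint, which is exactly the consistency property the naive clients rely on.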

@pweil-
Contributor

pweil- commented Mar 17, 2016

👍 thanks

@lavalamp
Member

LGTM. Makes it a bit safer to run multi-apiserver without turning on consistent reads.

@lavalamp lavalamp added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 17, 2016
@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot

k8s-bot commented Mar 17, 2016

GCE e2e build/test passed for commit e21ebbc.

@k8s-github-robot

Automatic merge from submit-queue

k8s-github-robot pushed a commit that referenced this pull request Mar 17, 2016
@k8s-github-robot k8s-github-robot merged commit 35d7469 into kubernetes:master Mar 17, 2016
@mikedanese mikedanese deleted the kube-service-affinity branch March 17, 2016 23:15
@jsravn
Contributor

jsravn commented Jun 14, 2017

Is there a way to disable this? We ran into an issue where a single apiserver had an outage in a multi-master setup. Because of this change, some of the clients were unable to reconnect because they kept retrying the same apiserver. We enable quorum reads (because we had transient problems with them disabled), so we don't care about isolating connections. All our client connections to the apiserver are long-lived anyway.

@jsravn
Contributor

jsravn commented Jun 14, 2017

I suppose we'll just apply our own service to override it. But I think this change is wrong in multi-master setups (it sort of defeats the point of having multiple masters if your clients can't reconnect to the other apiservers...).

@mikedanese
Member Author

mikedanese commented Jun 14, 2017

What we should fix is the latency of a bad endpoint being removed from the master service. This change is orthogonal. Right now they are not removed. Is having 1/3 of all requests fail for all clients worse than having all requests fail for 1/3 of clients? The controller-manager will be able to continue working in case 1 but not in case 2.

Regardless we should just fix the kubernetes service to remove down endpoints.

@jsravn
Contributor

jsravn commented Jun 15, 2017

Considering client-go auto-retries failed connections, I think it's better to have requests fail 1/3rd of the time rather than always for 1/3rd of clients. In the latter case, there is no way to recover on the client.
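The trade-off in the two comments above can be made concrete with a small sketch, using hypothetical numbers: three apiservers, one of them down, and a client that retries up to five times (as client-go would):

```go
package main

import "fmt"

// tryRequests models a client retrying failed requests. pick chooses the
// backend for a given attempt; the function reports whether any attempt
// reached a live apiserver.
func tryRequests(attempts int, pick func(attempt int) int, down int) bool {
	for i := 0; i < attempts; i++ {
		if pick(i) != down {
			return true // reached a live apiserver
		}
	}
	return false
}

func main() {
	const down = 0 // apiserver 0 is unavailable

	// With ClientIP affinity, an unlucky client is pinned to the dead
	// endpoint: every retry goes to the same backend, so it never recovers.
	pinned := tryRequests(5, func(int) int { return 0 }, down)
	fmt.Println("pinned client recovers:", pinned) // false

	// Without affinity, retries spread over the endpoints, so the client
	// recovers as soon as a retry hits a live backend.
	spread := tryRequests(5, func(i int) int { return i % 3 }, down)
	fmt.Println("round-robin client recovers:", spread) // true
}
```

The pinned client fails every attempt; the unpinned one succeeds on its first retry against a live endpoint.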

@tatsuhiro-t
Contributor

I completely agree with @jsravn.
We have been suffering from this issue; a failure of just a single master makes some workers unusable, which is really not HA. I think this fix should be reverted until we "fix the kubernetes service to remove down endpoints", or just give us a knob to disable this.

@tatsuhiro-t
Contributor

I just found #51698
I hope it fixes the down-endpoints issue and will be available in the v1.8 release.

@liggitt
Member

liggitt commented Sep 26, 2017

That is targeted for 1.9. It’s a significant change to add at the tail end of the release.

@tatsuhiro-t
Contributor

Thank you for the reply. We'll wait for v1.9. For another 3 or 4 months, we have to work around the endpoints issue. For some reason we cannot use an LB, so we'll probably either 1) use --apiserver-count=1 and let the masters fight, with somewhat more frequent iptables updates, or 2) build our own hyperkube without this change, whichever is less painful.

@redbaron
Contributor

Just been hit by this :( The worst thing is that there is no way to switch back to the old behavior without recompiling.

Per the comments above, this change's intent is to help the scheduler and controller-manager in HA setups, but its effect applies to every user of 'kubernetes.svc' in the cluster :( Not to mention that in every HA setup I've heard of, controller managers run on the same nodes as the apiservers, so they can use $(NODE_IP) to connect, practically without losing any resilience (if the local apiserver is down, the lock would be released and another controller-manager would take over); in fact it is even more reliable, as it doesn't depend on kube-proxy being healthy.

@jsravn
Contributor

jsravn commented Dec 1, 2017

As a workaround you can apply your own service file for the apiserver:

```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    component: apiserver
    provider: kubernetes
  name: kubernetes
  namespace: default
spec:
  clusterIP: {{ apiserver_serviceip }}
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: 443
  sessionAffinity: None
  type: ClusterIP
```

@redbaron
Contributor

redbaron commented Dec 1, 2017

That is ugly. If the service resource is managed by something, going in and interfering with it, even if it works today, is bad. There should be no easy way to paint yourself into a corner. The only thing that saves numerous kubernetes deployments from hitting this deadlock is that apiserver-count is 1 by default.

@jsravn
Contributor

jsravn commented Dec 1, 2017

It's ugly, sure, but it works in all versions of k8s until it gets fixed. I don't agree with this PR (it flat-out broke HA resiliency), but I'm not sure how much luck you'll have reverting it; it seems like the plan is to wait on apiserver endpoint updates.

@tatsuhiro-t
Contributor

At the moment, our workaround is running our own reconciler controller which watches the master Node resources and then synchronizes the default/kubernetes Endpoints with them.
Hopefully we can get rid of it when k8s v1.9 with the lease reconciler is released.

@jsravn
Contributor

jsravn commented Dec 1, 2017

What happens if the reconciler controller itself can't connect to apiserver due to session affinity?

@tatsuhiro-t
Contributor

The reconciler controller runs on a master node, and it can connect to the apiserver on the same node without using the service IP, so there is no session affinity involved.

k8s-github-robot pushed a commit that referenced this pull request May 10, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Disable session affinity for internal kubernetes service

Under the following conditions, session affinity leads to a deadlock:
  - self-hosted controller-manager, which talks to the API servers
    via the kubernetes service ClusterIP
  - the default master-count reconciler is used
  - --apiserver-count is set to >1 according to the help message
  - the number of responsive API servers goes below `apiserver-count`
  - all controller-managers happen to be hashed to API servers which
    are down

What then happens is that the controller-managers are never able to
contact an API server, despite a correctly working API server being
available.

Less serious outages are also possible for other consumers of the
kubernetes service, such as operators, kube-dns, flannel & calico, etc.
There is always a non-zero chance that a given consumer is hashed to an
apiserver which is down.

This reverts PR #23129

/sig api-machinery
CCing:
  -  author and approver of reverted PR: @mikedanese, @lavalamp 
  - other affected users which spoke up: @jsravn, @tatsuhiro-t 


```release-note
NONE
```
k8s-github-robot pushed a commit that referenced this pull request Jun 22, 2018
…y_for_1_10

Automatic merge from submit-queue.

[backport] Disable session affinity for internal kubernetes service


Revert "give the kubernetes service client ip session affinity"
This reverts commit e21ebbc.

**What this PR does / why we need it**:
Backporting #56690 to 1.10 release branch.

**Which issue(s) this PR fixes** 
Fixes #23129

**Release note**:
```release-note
Disable session affinity for internal kubernetes service - Backport of #56690 to 1.10 release branch
```