
give the kubernetes service client ip session affinity #23129

Merged

Conversation

mikedanese
Member

No description provided.

@k8s-github-robot

Labelling this PR as size/M

@k8s-github-robot k8s-github-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Mar 17, 2016
@k8s-bot

k8s-bot commented Mar 17, 2016

GCE e2e build/test passed for commit 127eaf26fa164f264688576d3a120536ffe45248.

@ncdc
Member

ncdc commented Mar 17, 2016

FYI @kubernetes/rh-cluster-infra @kubernetes/rh-platform-management @kubernetes/rh-networking

@k8s-bot

k8s-bot commented Mar 17, 2016

GCE e2e build/test passed for commit e21ebbc.

@pweil-
Contributor

pweil- commented Mar 17, 2016

@mikedanese - what's the motivation for this? I don't object to the change, just curious

@mikedanese
Member Author

@pweil- we talked about this in the sig HA meeting with @smarterclayton and @lavalamp. The idea is that if we start to run controllers like the scheduler and controller-manager on the cluster in HA configurations, they should talk to the same API server if possible. Each apiserver serves from a cache (unless the linearized-read flag is passed explicitly), and each apiserver's cache could be a different degree behind etcd. This could cause bugs in naive clients that don't handle it. We can avoid these bugs by adding clientIP session affinity.
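The pinning property being described can be illustrated with a toy sketch. This is not kube-proxy's actual mechanism (which tracks connection state in iptables); a simple hash of the source IP is assumed purely to show why one client always sees one apiserver's cache:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickEndpoint is a toy model of ClientIP session affinity: the backend is
// a pure function of the client's source IP, so every request from that IP
// lands on the same apiserver. Real kube-proxy uses iptables state rather
// than a hash like this; the sketch only illustrates the pinning property.
func pickEndpoint(clientIP string, endpoints []string) string {
	h := fnv.New32a()
	h.Write([]byte(clientIP))
	return endpoints[int(h.Sum32())%len(endpoints)]
}

func main() {
	apiservers := []string{"10.0.0.1", "10.0.0.2", "10.0.0.3"}
	// All requests from one controller-manager hit a single apiserver,
	// so it never observes two caches at different points behind etcd.
	for i := 0; i < 3; i++ {
		fmt.Println(pickEndpoint("10.244.1.7", apiservers))
	}
}
```

The same IP always maps to the same endpoint, which is exactly the consistency property the naive clients rely on.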

@pweil-
Contributor

pweil- commented Mar 17, 2016

👍 thanks

@lavalamp
Member

LGTM. Makes it a bit safer to run multi-apiserver without turning on consistent reads.

@lavalamp lavalamp added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 17, 2016
@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot

k8s-bot commented Mar 17, 2016

GCE e2e build/test passed for commit e21ebbc.

@k8s-github-robot

Automatic merge from submit-queue

k8s-github-robot pushed a commit that referenced this pull request Mar 17, 2016
@k8s-github-robot k8s-github-robot merged commit 35d7469 into kubernetes:master Mar 17, 2016
@mikedanese mikedanese deleted the kube-service-affinity branch March 17, 2016 23:15
@jsravn
Contributor

jsravn commented Jun 14, 2017

Is there a way to disable this? We ran into an issue where a single apiserver had an outage in a multi-master setup. Because of this change, some of the clients were unable to reconnect because they kept retrying the same apiserver. We enable quorum reads (because we had transient problems with them disabled), so we don't care about isolating connections. All our client connections to the apiserver are long-lived anyway.

@jsravn
Contributor

jsravn commented Jun 14, 2017

I suppose we'll just apply our own service to override it. But I think this change is wrong in multi-master setups (it sort of defeats the point of having multiple masters if your clients can't reconnect to the other apiservers...).

@mikedanese
Member Author

mikedanese commented Jun 14, 2017

What we should fix is the latency of a bad endpoint being removed from the master service. This change is orthogonal. Right now they are not removed. Is having 1/3 of all requests fail for all clients worse than having all requests fail for 1/3 of clients? The controller-manager will be able to continue working in case 1 but not in case 2.

Regardless we should just fix the kubernetes service to remove down endpoints.

@jsravn
Contributor

jsravn commented Jun 15, 2017

Considering client-go auto-retries failed connections, I think it's better to have requests fail 1/3rd of the time rather than always for 1/3rd of clients. In the latter case, there is no way to recover on the client.
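The trade-off in the two comments above can be made concrete with a small sketch, using hypothetical numbers: three apiservers, one of them down, and a client that retries up to five times (as client-go would):

```go
package main

import "fmt"

// tryRequests models a client retrying failed requests. pick chooses the
// backend for a given attempt; the function reports whether any attempt
// reached a live apiserver.
func tryRequests(attempts int, pick func(attempt int) int, down int) bool {
	for i := 0; i < attempts; i++ {
		if pick(i) != down {
			return true // reached a live apiserver
		}
	}
	return false
}

func main() {
	const down = 0 // apiserver 0 is unavailable

	// With ClientIP affinity, an unlucky client is pinned to the dead
	// endpoint: every retry goes to the same backend, so it never recovers.
	pinned := tryRequests(5, func(int) int { return 0 }, down)
	fmt.Println("pinned client recovers:", pinned) // false

	// Without affinity, retries spread over the endpoints, so the client
	// recovers as soon as a retry hits a live backend.
	spread := tryRequests(5, func(i int) int { return i % 3 }, down)
	fmt.Println("round-robin client recovers:", spread) // true
}
```

The pinned client fails every attempt; the unpinned one succeeds on its first retry against a live endpoint.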

@tatsuhiro-t
Contributor

I completely agree with @jsravn.
We have been suffering from this issue; a failure of just a single master makes some workers unusable, which is really not HA. I think this fix should be reverted until we "fix the kubernetes service to remove down endpoints", or just give us a knob to disable this.

@tatsuhiro-t
Contributor

I just found #51698
I hope it fixes the down-endpoints issue and will be available in the v1.8 release.

@liggitt
Member

liggitt commented Sep 26, 2017

That is targeted for 1.9. It’s a significant change to add at the tail end of the release.

@tatsuhiro-t
Contributor

Thank you for the reply. We'll wait for v1.9. For another 3 or 4 months, we have to work around the endpoints issue. For some reason we cannot use an LB, so we'll probably either 1) use --apiserver-count=1 and let the masters fight, with somewhat more frequent iptables updates, or 2) build our own hyperkube without this change, whichever is less painful.

@redbaron
Contributor

Just been hit by this :( The worst thing is that there is no way to switch back to the old behavior without recompiling.

Per the comments above, this change's intent is to help the scheduler and controller-manager in HA setups, but its effect applies to every user of 'kubernetes.svc' in the cluster :( Not to mention that in every HA setup I've heard of, controller managers run on the same nodes as the apiservers, so they can use $(NODE_IP) to connect, practically without losing any resilience (if the local apiserver is down, the lock would be released and another controller-manager would take over); in fact it is even more reliable, as it doesn't depend on kube-proxy being healthy.

@jsravn
Contributor

jsravn commented Dec 1, 2017

As a workaround you can apply your own service file for the apiserver:

```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    component: apiserver
    provider: kubernetes
  name: kubernetes
  namespace: default
spec:
  clusterIP: {{ apiserver_serviceip }}
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: 443
  sessionAffinity: None
  type: ClusterIP
```

@redbaron
Contributor

redbaron commented Dec 1, 2017

That is ugly. If the service resource is managed by something, going in and interfering with it, even if it works today, is bad. There should be no easy way to paint yourself into a corner. The only thing that saves numerous kubernetes deployments from hitting this deadlock is that apiserver-count is 1 by default.

@jsravn
Contributor

jsravn commented Dec 1, 2017

It's ugly, sure, but it works in all versions of k8s until it gets fixed. I don't agree with this PR (it flat-out broke HA resiliency), but I'm not sure how much luck you'll have reverting it; it seems like the plan is to wait on apiserver endpoint updates.

@tatsuhiro-t
Contributor

At the moment, our workaround is running our own reconciler controller which watches the master Node resources and then synchronizes the default/kubernetes Endpoints with them.
Hopefully we can get rid of it when k8s v1.9 with the lease reconciler is released.

@jsravn
Contributor

jsravn commented Dec 1, 2017

What happens if the reconciler controller itself can't connect to apiserver due to session affinity?

@tatsuhiro-t
Contributor

The reconciler controller runs on a master node, and it can connect to the apiserver on the same node without using the service IP, so there is no session affinity involved.

k8s-github-robot pushed a commit that referenced this pull request May 10, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Disable session affinity for internal kubernetes service

Under the following conditions, session affinity leads to a deadlock:
  - self-hosted controller-manager, which talks to the API servers
    via the kubernetes service ClusterIP
  - the default master-count reconciler is used
  - --apiserver-count is set to >1 according to the help message
  - the number of responsive API servers goes below `apiserver-count`
  - all controller-managers happen to be hashed to API servers which
    are down

What then happens is that the controller-managers are never able to
contact an API server, despite a correctly working API server being
available.

Less serious outages are also possible for other consumers of the
kubernetes service, such as operators, kube-dns, flannel & calico, etc.
There is always a non-zero chance that a given consumer is hashed to an
apiserver which is down.

This reverts PR #23129

/sig api-machinery
CCing:
  -  author and approver of reverted PR: @mikedanese, @lavalamp 
  - other affected users which spoke up: @jsravn, @tatsuhiro-t 


```release-note
NONE
```
k8s-github-robot pushed a commit that referenced this pull request Jun 22, 2018
…y_for_1_10

Automatic merge from submit-queue.

[backport] Disable session affinity for internal kubernetes service


Revert "give the kubernetes service client ip session affinity"
This reverts commit e21ebbc.

**What this PR does / why we need it**:
Backporting #56690 to 1.10 release branch.

**Which issue(s) this PR fixes** 
Fixes #23129

**Release note**:
```release-note
Disable session affinity for internal kubernetes service - Backport of #56690 to 1.10 release branch
```