Proposal: Service connection affinity #15675

Closed
eliaslevy opened this issue Oct 14, 2015 · 18 comments
Labels
area/federation
lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
priority/backlog: Higher priority than priority/awaiting-more-evidence.
sig/multicluster: Categorizes an issue or PR as relevant to SIG Multicluster.

Comments

@eliaslevy

As of now, connections to Services are either distributed in a round-robin fashion (the default) or can use client-IP-based affinity, in which the endpoint for a client's initial connection to the Service is selected round-robin and subsequent connections from that client attempt to reuse the previously selected endpoint.
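
For reference, the existing knob looks roughly like this when expressed with today's k8s.io/api Go types; the Service name, selector, and port are illustrative only:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Today's only affinity option: pin a client IP to the endpoint it was
	// first routed to, instead of the default round-robin behaviour.
	svc := corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "backend"}, // illustrative name
		Spec: corev1.ServiceSpec{
			Selector:        map[string]string{"app": "backend"},
			Ports:           []corev1.ServicePort{{Port: 80}},
			SessionAffinity: corev1.ServiceAffinityClientIP,
		},
	}
	fmt.Println(svc.Spec.SessionAffinity)
}
```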

In some environments there is a need for topological or administrative prioritization of Service endpoint selection. For instance, a cluster may be composed of multiple zones, with Service endpoints spread across them. Latencies across zones may be low, yet still higher than intra-zone latencies, and there may also be financial costs associated with inter-zone traffic. Thus, it could be desirable for a number of reasons to prioritize endpoints for a Service within the same zone as the Pods connecting to it.

Note that this proposal is distinct from that of #14484. #14484 proposes priority affinity when scheduling Pods (colocation). I am proposing priority affinity of network connections to Services.

This could be implemented by allowing the admin to specify a tag selector within the session affinity spec. kube-proxy could then look up the tags on the Pod that initiated a connection and on any Pods that are endpoints for the Service, and prioritize endpoint Pods that have a tag whose value matches the value of the same tag on the client Pod. Partial matches would be ranked by the number of tags that match.

If multiple endpoints match at the same priority, the selection among them can be performed in a round-robin fashion, or ClientIP affinity can optionally be applied.
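
To make the intent concrete, here is a minimal sketch of that selection logic; the types, label keys, and addresses are illustrative only and not a proposed API or kube-proxy change:

```go
package main

import "fmt"

// Endpoint is a simplified, illustrative view of a Service endpoint:
// its address plus the labels of the Pod backing it.
type Endpoint struct {
	Addr   string
	Labels map[string]string
}

// score counts how many of the configured affinity tags match between
// the client Pod's labels and an endpoint Pod's labels.
func score(affinityKeys []string, client, endpoint map[string]string) int {
	n := 0
	for _, k := range affinityKeys {
		if v, ok := client[k]; ok && endpoint[k] == v {
			n++
		}
	}
	return n
}

// pickEndpoints keeps only the endpoints with the highest tag-match
// score; the caller would then round-robin (or apply ClientIP affinity)
// among the survivors.
func pickEndpoints(affinityKeys []string, client map[string]string, eps []Endpoint) []Endpoint {
	best := -1
	var out []Endpoint
	for _, ep := range eps {
		switch s := score(affinityKeys, client, ep.Labels); {
		case s > best:
			best, out = s, []Endpoint{ep}
		case s == best:
			out = append(out, ep)
		}
	}
	return out
}

func main() {
	affinityKeys := []string{"zone", "rack"} // hypothetical affinity tag keys
	client := map[string]string{"zone": "us-east1-b", "rack": "r7"}
	eps := []Endpoint{
		{"10.0.1.5:8080", map[string]string{"zone": "us-east1-b", "rack": "r7"}}, // full match
		{"10.0.2.9:8080", map[string]string{"zone": "us-east1-b", "rack": "r3"}}, // zone only
		{"10.0.3.2:8080", map[string]string{"zone": "us-east1-c", "rack": "r1"}}, // no match
	}
	fmt.Println(pickEndpoints(affinityKeys, client, eps))
}
```

With these example labels only the full zone-and-rack match is returned; if that endpoint disappeared, the zone-only endpoint would become the highest-scoring candidate.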

@mikedanese
Member

Is there a use case for this beyond a multi-zone cluster? Currently we recommend against multi-zone clusters for this reason (and others); see https://github.com/kubernetes/kubernetes/blob/master/docs/admin/multi-cluster.md#scope-of-a-single-cluster. It should be possible in theory, but instead we recommend a single cluster per zone (a federated model).

cc @quinton-hoole @kubernetes/goog-control-plane

@davidopp
Member

Could you just create one service per zone, and have the pods in a zone use the service local to that zone? If they are all in the same zone, it seems unlikely that all the pods providing the service would go down while the pods consuming it stayed up, so I'm not sure I see a lot of value in configuring the proxy with endpoints in remote zones.

Now, if some service is only available in another zone, that's another story, though the premise of your question seemed to be about a service that was available both in the local zone and a remote zone.
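
For concreteness, a sketch of what that per-zone workaround might look like using the k8s.io/api Go types. It assumes the Pod templates carry an explicit "zone" label (zone is ordinarily a node label, not a Pod label, so that label is an added assumption), and the names backend and zoneService are hypothetical:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// zoneService builds a Service named "<app>-<zone>" that selects only the
// Pods of that app labelled with that zone. The "zone" Pod label is an
// assumption: the per-zone Deployment templates would have to set it.
func zoneService(app, zone string) *corev1.Service {
	return &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: fmt.Sprintf("%s-%s", app, zone)},
		Spec: corev1.ServiceSpec{
			Selector: map[string]string{
				"app":  app,
				"zone": zone, // hypothetical per-zone Pod label
			},
			Ports: []corev1.ServicePort{{Port: 80}},
		},
	}
}

func main() {
	// One Service per zone; consumers in a zone use the Service local to it.
	for _, z := range []string{"us-east1-b", "us-east1-c"} {
		svc := zoneService("backend", z)
		fmt.Println(svc.Name, svc.Spec.Selector)
	}
}
```

Clients in a zone would then be pointed at the "backend-&lt;zone&gt;" Service local to them, for example through configuration or DNS.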

@eliaslevy
Author

@mikedanese A zone is just one particular networking boundary that one may wish to use for prioritization purposes. One may also wish to prioritize endpoints within the same node, rack, subnet, placement group, or VPC as the client.

The proposal allows for prioritized routing of connections based on administrative policy through tags. The tags need not be related to network boundaries. They could specify a version number with clients preferentially connecting to endpoints with a matching version number.

@davidopp You presume there is more than one Pod behind the service per network boundary, or that all Pods within such a boundary won't fail concurrently. We are paid to take "unlikely" events into account and avoid them where possible. Also note my comments above regarding boundaries other than zones; surely you are not suggesting creating a Service per node if I wish to prioritize a node-local endpoint.
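
As a sketch of the kind of tiered preference this would enable, the following illustrative code (not proposed kube-proxy code, and all label keys are hypothetical) prefers a node-local endpoint, then rack-local, then zone-local, and finally falls back to any endpoint; this strict-fallback variant is close in spirit to the alpha topologyKeys Service field that appeared in later Kubernetes releases and was since removed:

```go
package main

import "fmt"

// Ordered affinity tiers: prefer an endpoint on the same node, else the
// same rack, else the same zone, else fall back to any endpoint.
var tiers = []string{"node", "rack", "zone"}

// Endpoint pairs an address with the topology labels of its backing Pod.
type Endpoint struct {
	Addr string
	Topo map[string]string
}

// selectTier returns the endpoints matching the most specific tier that
// has any match with the client's labels, falling back to all endpoints.
func selectTier(client map[string]string, eps []Endpoint) []Endpoint {
	for _, key := range tiers {
		var matched []Endpoint
		for _, ep := range eps {
			if v, ok := client[key]; ok && ep.Topo[key] == v {
				matched = append(matched, ep)
			}
		}
		if len(matched) > 0 {
			return matched
		}
	}
	return eps
}

func main() {
	client := map[string]string{"node": "n12", "rack": "r7", "zone": "us-east1-b"}
	eps := []Endpoint{
		{"10.0.1.5:8080", map[string]string{"node": "n3", "rack": "r7", "zone": "us-east1-b"}},
		{"10.0.2.9:8080", map[string]string{"node": "n8", "rack": "r2", "zone": "us-east1-b"}},
	}
	// No node-local endpoint exists, so the rack-local one is chosen.
	fmt.Println(selectTier(client, eps))
}
```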

@davidopp
Member

@kubernetes/huawei

@davidopp
Member

@kubernetes/sig-cluster-federation

@timothysc
Member

Doesn't this touch on the longstanding topic of wanting real load balancers with configuration and policy, since that is where you would typically adjust these types of configuration parameters?

/cc @thockin

@irfanurrehman
Contributor

This issue was labelled only for sig/multicluster and has thus been moved over to kubernetes-retired/federation#36.
If this does not seem right, please reopen it and notify us at @kubernetes/sig-multicluster-misc.
/close

@craigbox

craigbox commented Nov 24, 2017

I think this should be reopened here, as this is a single-cluster ("Ubernetes Lite", remember that?) issue, not a multi-cluster issue.

/cc @kubernetes/sig-multicluster-misc
/open

@davidopp davidopp reopened this Nov 24, 2017
@irfanurrehman
Contributor

This was done by automation that looked for issues exclusively marked with the sig/multicluster label. This sounds valid to me; I could not notice it earlier as I was out of office. I am closing the linked issue in kubernetes/federation.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 3, 2019
@craigbox

craigbox commented Aug 3, 2019

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Aug 3, 2019
@thockin
Member

thockin commented Aug 20, 2022

Closing this in favor of all the ongoing work around service topology and traffic policy.

@thockin thockin closed this as completed Aug 20, 2022