Proposal: Service connection affinity #15675
Comments
Is there a use case for this beyond a multi-zone cluster? Currently we recommend against multi-zone clusters for this reason (and others). See https://github.com/kubernetes/kubernetes/blob/master/docs/admin/multi-cluster.md#scope-of-a-single-cluster — however it should be possible in theory. Instead we recommend a single cluster per zone (a federated model). cc @quinton-hoole @kubernetes/goog-control-plane |
Could you just create one service per zone, and have the pods in a zone use the service local to that zone? It seems unlikely that all the pods providing the service would go down, but the pods consuming the service would not go down, if they are all in the same zone, so I'm not sure I see a lot of value to configuring the proxy with endpoints in remote zones. Now, if some service is only available in another zone, that's another story, though the premise of your question seemed to be about a service that was available both in the local zone and a remote zone. |
@mikedanese A zone is just one particular networking boundary that one may wish to use for prioritization purposes. One may also wish to prioritize endpoints within the same node, rack, subnet, placement group, or VPC as the client. The proposal allows for prioritized routing of connections based on administrative policy through tags. The tags need not be related to network boundaries. They could specify a version number with clients preferentially connecting to endpoints with a matching version number. @davidopp You presume there is more than one Pod behind the service per network boundary or that all Pods within such boundary won't fail concurrently. We are paid to take into account "unlikely" events and avoid them if possible. Also note my comments above regarding boundaries other than zones. Surely you are not suggesting creating a Service per node if I wish to prioritize a node-local endpoint. |
@kubernetes/huawei |
@kubernetes/sig-cluster-federation |
Doesn't this touch the longstanding topic of wanting real load balancers with configuration and policy? These are exactly the kinds of configuration parameters one typically adjusts there. /cc @thockin |
This issue was labelled only for sig/multicluster and is thus moved over to kubernetes-retired/federation#36. |
I think this should be reopened here, as this is a single-cluster ("Ubernetes Lite", remember that?) issue, not a multi-cluster issue. /cc @kubernetes/sig-multicluster-misc |
This was done by automation which looked for issues exclusively marked with the label sig-multicluster. This sounds valid to me; I could not notice it earlier as I was out of office. I am closing the linked issue in kubernetes/federation. |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/lifecycle frozen |
Closing this in favor of all the ongoing work around service topology and traffic policy. |
As of now, connections to Services are handled either in a round-robin fashion (the default) or with client-IP-based affinity, where the endpoint for a client's initial connection to the Service is selected round robin and subsequent connections from that client attempt to reuse the previously selected endpoint.
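For reference, the existing ClientIP behavior described above is enabled per Service via the `sessionAffinity` field. A minimal sketch, with illustrative Service and selector names:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service        # illustrative name
spec:
  selector:
    app: my-app           # illustrative Pod selector
  ports:
    - port: 80
      targetPort: 8080
  # Default is "None" (round-robin endpoint selection by kube-proxy);
  # "ClientIP" pins subsequent connections from a client to the same endpoint.
  sessionAffinity: ClientIP
```

Neither mode considers where the client or the endpoints are located, which is the gap this proposal addresses.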
In some environments there is a need for a topological or administrative prioritization of Service endpoint selection. For instance, a cluster may be composed of multiple zones with Service endpoints spread across zones. Latencies across zones may be low, yet still higher than intra-zone latencies. There may also be financial costs associated with inter-zone traffic. Thus, it could be desirable for a number of reasons to prioritize endpoints for a Service within the same zone as the Pods making connections to the Service.
Note that this proposal is distinct from that of #14484. #14484 proposes priority affinity when scheduling Pods (colocation). I am proposing priority affinity of network connections to Services.
This could be implemented by allowing the admin to specify a tag selector within the session affinity spec. kube-proxy could then look up the tags on the Pod that initiated a connection and on any Pods that are endpoints for the Service. It would prioritize endpoint Pods that have a tag whose value matches the value of the same tag on the client Pod. Partial matches would be prioritized by the number of matching tags.
If multiple endpoints match at the same priority, one of them can be selected in a round-robin fashion, or ClientIP affinity could optionally be applied.
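The matching and tie-breaking logic above can be sketched as follows. This is a minimal Python sketch rather than kube-proxy's actual Go implementation; all function names, the endpoint representation (a list of `(name, tags)` pairs), and the external round-robin counter are assumptions for illustration:

```python
import itertools

def match_score(client_tags, endpoint_tags):
    """Count tag key/value pairs shared between client and endpoint Pods."""
    return sum(1 for k, v in client_tags.items()
               if endpoint_tags.get(k) == v)

def pick_endpoint(client_tags, endpoints, rr_counter):
    """Pick the best-matching endpoint; round-robin among ties.

    endpoints  -- list of (name, tags) pairs
    rr_counter -- shared iterator of integers (e.g. itertools.cycle)
    """
    # Sort endpoints by descending number of matching tags (stable sort).
    ranked = sorted(endpoints,
                    key=lambda e: match_score(client_tags, e[1]),
                    reverse=True)
    best = match_score(client_tags, ranked[0][1])
    # Keep only endpoints tied at the highest score, then round-robin.
    tied = [e for e in ranked if match_score(client_tags, e[1]) == best]
    return tied[next(rr_counter) % len(tied)]

# Example: a client in zone us-east-1a running v2 prefers the endpoint
# matching both tags over endpoints matching only one.
client = {"zone": "us-east-1a", "version": "v2"}
endpoints = [("pod-a", {"zone": "us-east-1a", "version": "v1"}),
             ("pod-b", {"zone": "us-east-1b", "version": "v2"}),
             ("pod-c", {"zone": "us-east-1a", "version": "v2"})]
rr = itertools.cycle(range(1000))
print(pick_endpoint(client, endpoints, rr)[0])  # pod-c
```

Note that the tags need not describe network boundaries: as discussed above, a `version` tag prioritizes version-matched endpoints with exactly the same mechanism.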