traffic splitting with session affinity #8167

Closed
vadimeisenbergibm opened this issue Sep 6, 2019 · 42 comments
Labels: enhancement, help wanted

Comments

@vadimeisenbergibm
Contributor

Can Envoy perform traffic splitting while preserving session affinity? For example, could it use Ring Hash or Maglev to select the weighted cluster according to its weight? That way, when route.RouteAction.HashPolicy is specified, the same weighted cluster and the same endpoint would always be used, by applying Ring Hash/Maglev twice: once to select the weighted cluster and once to select the endpoint.
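
For concreteness, a rough sketch of the configuration being asked about (cluster and header names are illustrative; hash_policy currently only influences endpoint selection within the chosen cluster, while the weighted-cluster choice itself is random):

route:
  weighted_clusters:
    clusters:
    - name: app_v1
      weight: 90
    - name: app_v2
      weight: 10
  # The ask: derive the weighted-cluster choice from this same hash,
  # so a given x-session-id always lands on the same version.
  hash_policy:
  - header:
      header_name: x-session-id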

@zuercher
Member

zuercher commented Sep 6, 2019

This isn't currently possible. Weighted clusters are chosen based on a random value selected at request time. I'll go ahead and label this as an enhancement request.

@zuercher added the enhancement and help wanted labels and removed the question label Sep 6, 2019
@vadimeisenbergibm
Contributor Author

@zuercher Thanks!

@vadimeisenbergibm changed the title from "Question: traffic splitting with session affinity" to "traffic splitting with session affinity" Sep 6, 2019
@temporafugiunt

temporafugiunt commented Sep 12, 2019

@zuercher Is the behavior I outline below expected when using weighted traffic splitting, or am I doing something wrong? Here is a real-world example of what I am seeing on a Kubernetes cluster using Istio/Envoy with weighted traffic splitting turned on, attempting to perform "canary testing" of a new version of a web application.

I have two different versions of my web app, which "compile" random names for JavaScript files using WebPack to defeat caching between versions. If my route changes mid-download of a JS file (which appears to be happening, given the behavior I am seeing), the file may or may not exist in the version I have just switched to, because it might have a different name; the download then terminates with only a partial file. The call succeeds with a 200, but the contents are incomplete.

Even if it did exist, the contents of the file could change between versions. So if Envoy can't choose a service subset on the first request and stick to that subset during subsequent calls within the same page load, I could essentially be downloading a file whose contents change mid-stream.

Would you expect this given the nature of what I am trying to do, and given that Envoy does not support session affinity with weight-based traffic splitting? Or am I neglecting something, or is Istio perhaps not supporting something yet?

@zuercher
Member

The weighted cluster choices are made per-request even if the requests are sent on the same connection.

> The call succeeds with a 200, but the contents are incomplete.

That doesn't sound like a weighted cluster routing problem. It sounds like either the upstream server itself is truncating the data or something about how you're updating Envoy's config is causing a problem. I would look at Envoy's log to see what's going on.

> Even if it did exist, the contents of the file could change between versions. So if Envoy can't choose a service subset on the first request and stick to that subset during subsequent calls within the same page load, I could essentially be downloading a file whose contents change mid-stream.

Envoy does not have support for session affinity built in. Imagine a downstream's first request retrieves v1 of an HTML page from cluster A (as a result of a weighted cluster choice). If that HTML contains a reference to app.js, it is entirely possible that the request for app.js will be sent to cluster B (based on a second weighted cluster choice), resulting in a mismatch. If the JavaScript's path changes from version to version, you'd get a 404 (proxied from the upstream) when the mismatch occurs.

In the past, I've achieved session affinity in Envoy with 3 routes (a config sketch follows this list):

  1. A route that matches on a Cookie header/value for v1 and routes to only the v1 cluster.
  2. A route that matches on a Cookie header/value for v2 and routes to only the v2 cluster.
  3. A route that makes a weighted cluster choice between v1 and v2 and uses response_headers_to_add to set a Set-Cookie header for the selected cluster.

This means that x% of your new sessions will see the canary version, which is a bit different than x% of requests. Also, when you're done with the canary (completely deployed or rolled back) you'll need a way to get the clients on the other version to reload (or keep it running until that happens organically).
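
A minimal sketch of those three routes in raw Envoy route config (cookie name, values, clusters, and weights are all illustrative, not a tested configuration):

virtual_hosts:
- name: app
  domains: ["*"]
  routes:
  # 1. Clients that already carry the v1 cookie stay on the v1 cluster.
  - match:
      prefix: "/"
      headers:
      - name: cookie
        string_match:
          contains: "app_version=v1"
    route:
      cluster: app_v1
  # 2. Likewise for v2.
  - match:
      prefix: "/"
      headers:
      - name: cookie
        string_match:
          contains: "app_version=v2"
    route:
      cluster: app_v2
  # 3. Everyone else gets a weighted choice, plus a cookie recording it.
  - match:
      prefix: "/"
    route:
      weighted_clusters:
        clusters:
        - name: app_v1
          weight: 90
          response_headers_to_add:
          - header:
              key: Set-Cookie
              value: "app_version=v1; Max-Age=3600; Path=/"
        - name: app_v2
          weight: 10
          response_headers_to_add:
          - header:
              key: Set-Cookie
              value: "app_version=v2; Max-Age=3600; Path=/"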

@kyessenov
Contributor

I think the request is to use a normalized hash function instead of a random coin flip. As long as the hash function's output is uniformly distributed, it's the same thing. We could probably compose a cryptographic function on top of the hash policy to guarantee one-way properties.

@zuercher
Member

That seems like a reasonable feature to add.

@temporafugiunt

@zuercher Thank you, that is a good idea and I will try that!

> That doesn't sound like a weighted cluster routing problem. It sounds like either the upstream server itself is truncating the data or something about how you're updating Envoy's config is causing a problem. I would look at Envoy's log to see what's going on.

Sorry for the n00b question... But can you recommend any good documentation resources on how to debug Envoy to determine the root cause of an issue like this? If not, I will have a discussion with Dr. Google for a while to see if I can find something.

@zuercher
Member

I would enable debug logging (-l debug). You should be able to see the request and response headers for both the downstream and upstream requests. Beyond that, you might have better luck asking in the Envoy slack channel (with logs & config snippets) or else opening a separate issue.

@vietwow

vietwow commented Dec 15, 2019

Hi, any news on this feature?

@Jeskz0rd

Any suggestions for a workaround? 🤔

@kholisrag

Any update about this?
I'm trying to use WebSockets with weighted routing in Istio to enable Flagger canary on my deployment, but I'm blocked by this...
😭

@rgs1
Member

rgs1 commented May 1, 2020

> Can Envoy perform traffic splitting while preserving session affinity? For example, could it use Ring Hash or Maglev to select the weighted cluster according to its weight? That way, when route.RouteAction.HashPolicy is specified, the same weighted cluster and the same endpoint would always be used, by applying Ring Hash/Maglev twice: once to select the weighted cluster and once to select the endpoint.

We actually have an internal filter that does something like that... More info here:

https://medium.com/pinterest-engineering/simplifying-web-deploys-19244fe13737

If there's interest, we could chat a bit more and I might be able to get some cycles to upstream our filter.

@vadimeisenbergibm
Contributor Author

@rgs1 Thanks. I switched to work in another domain, but maybe other folks who are interested in this feature could carry this issue forward.

@rafaeldasilva

rafaeldasilva commented Aug 8, 2020

As I understand it, the point is to apply the weight rule only on the first call.
Once the client has connected, subsequent calls go through the sticky rule.

It would be an "IF" before applying the weight: if there is a sticky match, the call bypasses the weight rule and goes to that destination.
Is this feasible?
Does anyone see any drawback in this logic?

@kholisrag

> As I understand it, the point is to apply the weight rule only on the first call.
> Once the client has connected, subsequent calls go through the sticky rule.
>
> It would be an "IF" before applying the weight: if there is a sticky match, the call bypasses the weight rule and goes to that destination.
> Is this feasible?
> Does anyone see any drawback in this logic?

Personally, I don't, but I'm not sure about the maintainers or others.

By the way, why don't we detect it through the Upgrade and/or Connection: Upgrade headers?

@benpoulson

Nginx currently splits traffic by computing a MurmurHash2 number (0-4294967295) from specific variables (e.g. the remote IP).

See: http://nginx.org/en/docs/http/ngx_http_split_clients_module.html

This allows users to be consistently sent to the same upstream.

@defool

defool commented Feb 19, 2021

Any update on this feature?

@sschepens
Contributor

Maybe #14875 will fix this.

@arungeorge101

arungeorge101 commented Apr 9, 2021

Is this feature being worked on, and is there any timeline for when it will be available?

@jcrugzz

jcrugzz commented Apr 16, 2021

@rgs1 Would definitely be interested in chatting sometime if you wanted.

@rgs1
Member

rgs1 commented Apr 16, 2021

> @rgs1 Would definitely be interested in chatting sometime if you wanted.

Sure, ping me on Slack and we can coordinate. I actually gave some extra thought this week to what it'd take to open source what's described in the blog post above... but I couldn't allocate the cycles yet.

@howardjohn
Contributor

@snowp does #14875 solve this case or is it not applicable?

@snowp
Contributor

snowp commented Jun 9, 2021

That PR adds support for a consistent matcher to the new matching framework, but it hasn't yet been wired up to the router to allow using it to construct the route table (and I don't think anyone has planned this work yet).

You could probably get something working by using a simple filter that uses the consistent matcher to inject a cluster header that can then inform the router, or even by using the new setRoute API to influence route selection directly, without having to work through the relatively large effort of integrating the new matching framework with the router.
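
As a rough illustration of the cluster-header idea (every name here is hypothetical, and the byte-sum bucketing is a crude stand-in for a real consistent hash): a Lua filter stamps a header deterministically from a request attribute, and a route with cluster_header forwards to whichever cluster that header names.

# Inside the HttpConnectionManager config:
http_filters:
- name: envoy.filters.http.lua
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
    inline_code: |
      function envoy_on_request(request_handle)
        local uid = request_handle:headers():get("x-user-id") or ""
        local sum = 0
        for i = 1, #uid do sum = sum + uid:byte(i) end
        -- Stable 90/10 split for a given x-user-id value.
        local cluster = (sum % 100 < 90) and "app_v1" or "app_v2"
        request_handle:headers():replace("x-selected-cluster", cluster)
      end
- name: envoy.filters.http.router
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
route_config:
  virtual_hosts:
  - name: app
    domains: ["*"]
    routes:
    - match:
        prefix: "/"
      route:
        # Defer the cluster choice to the header the Lua filter set above.
        cluster_header: x-selected-cluster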

@mksha

mksha commented Jul 14, 2021

Any updates on it?

@mksha

mksha commented Jul 14, 2021

@vadimeisenbergibm did you find any workaround for it?

@vadimeisenbergibm
Contributor Author

@mksha No, I did not, sorry.

@mksha

mksha commented Jul 15, 2021

@zuercher are we planning to add this feature in the near future?

@timvan

timvan commented Jul 28, 2021

Hi, just commenting to show interest in this feature. The use case is experimentation: traffic should be split between control and variant models/apps, and users must be directed to a consistent variant. Bucketing is based on a user-id header.
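
For reference, hashing on a user-id header is already expressible as a route hash policy (the header name below is illustrative), but note that hash_policy only pins the endpoint within the selected cluster; it does not make the weighted-cluster split itself sticky, which is what this issue asks for:

route:
  cluster: app
  hash_policy:
  - header:
      header_name: x-user-id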

@gongzh

gongzh commented Sep 15, 2021

> Hi, just commenting to show interest in this feature. The use case is experimentation: traffic should be split between control and variant models/apps, and users must be directed to a consistent variant. Bucketing is based on a user-id header.

Hello, is there any progress or a plan for this feature? Thanks!

@mksha

mksha commented Oct 6, 2021

@alyssawilk is anyone working on it?

@alyssawilk
Contributor

No one is assigned, which is a pretty good sign that no one is working on this :-)

@mksha

mksha commented Oct 6, 2021

Is there a plan to work on it? Because it's a very common use case for everyone.

@alyssawilk
Contributor

The Envoy project doesn't have paid developers; it is an open source community where anyone can contribute. So largely, features don't happen until someone wants them enough to implement them or to pay someone like Tetrate to do so.

@mksha

mksha commented Feb 7, 2022

@rootsongjc Do we have this feature in upstream now?

@rootsongjc
Member

@mksha I don't think so.

@jcometki

I had this same issue; I solved it with the following configuration:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: sample-virtual-service
spec:
  gateways:
    - default
  hosts:
    - sample-host
  http:
    - match:
        - headers:
            Cookie:
              regex: .*userAffinity=v1.*
      route:
        - destination:
            host: sample-host.svc.cluster.local
            port:
              number: 80
            subset: v1
    - match:
        - headers:
            Cookie:
              regex: .*userAffinity=v2.*
      route:
        - destination:
            host: sample-host.svc.cluster.local
            port:
              number: 80
            subset: v2
    - route:
        - destination:
            host: sample-host.svc.cluster.local
            port:
              number: 80
            subset: v1
          weight: 75
        - destination:
            host: sample-host.svc.cluster.local
            port:
              number: 80
            subset: v2
          headers:
            response:
              add:
                Set-Cookie: userAffinity=v2; Max-Age=3600000
          weight: 25
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: sample-destination-rule
spec:
  host: sample-host.svc.cluster.local
  subsets:
    - labels:
        app: sample-app-v1
      name: v1
      trafficPolicy:
        loadBalancer:
          consistentHash:
            httpCookie:
              name: userAffinity=v1
              ttl: 1h
    - labels:
        app: sample-app-v2
      name: v2

@mksha

mksha commented Feb 22, 2022

@jcometki We have something similar in place, but I have a question about your DestinationRule. May I ask why you use name: userAffinity=v1 rather than just name: userAffinity?
Ideally the name property represents the name of the cookie, not the name=value pair. Maybe I am missing something, but I would love to understand your reasoning.

@itnazeer

itnazeer commented Apr 27, 2022

Did anyone get this working?

@4meepo

4meepo commented Apr 27, 2022

Any updates or workarounds? :)

@zuercher
Member

I'm going to close this ticket. Many of the recent comments have been about projects that build on Envoy, and as such aren't really appropriate here.

See https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route_components.proto.html#config-route-v3-routeaction-hashpolicy-cookie and https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/stateful_session_filter#config-http-filters-stateful-session for various types of session affinity that are currently supported. The former implements the feature originally requested.
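
For reference, a minimal sketch of the cookie hash policy from the first link (cookie name and TTL are arbitrary); when a TTL is set and the cookie is absent, Envoy generates the cookie, so subsequent requests from that client hash to the same endpoint:

route:
  cluster: app
  hash_policy:
  - cookie:
      name: session-affinity
      ttl: 3600s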

@rixongary

@jcometki Hello, thanks for the config you posted above. I know this was two years ago now, but I am wondering if you could provide any more information on how this worked out for you?

I think there would be a problem: once a user hits the v2 subset and gets the cookie, all of their subsequent requests would be sent to v2, but those requests would not be taken into account in the weighting calculations. Therefore, you would end up with more than the intended 5% of requests (for a 5% canary, say) going to the v2 subset, and this problem would get worse and worse the higher the canary weight was set.

If anyone has a solution for this, I'd love to hear. Thanks!
