traffic splitting with session affinity #8167

Closed
vadimeisenbergibm opened this issue Sep 6, 2019 · 42 comments
Labels: enhancement, help wanted

Comments

@vadimeisenbergibm
Contributor

Can Envoy perform traffic splitting while preserving session affinity? For example, could it use Ring Hash or Maglev to select the weighted cluster according to its weight? That way, when route.RouteAction.HashPolicy is specified, the same weighted cluster and the same endpoint would always be used, by applying Ring Hash/Maglev twice: once to select the weighted cluster and once to select the endpoint.
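
For concreteness, a rough sketch of the configuration being asked about (cluster and header names are illustrative; hash_policy currently only influences endpoint selection within the chosen cluster, while the weighted-cluster choice itself is random):

route:
  weighted_clusters:
    clusters:
    - name: app_v1
      weight: 90
    - name: app_v2
      weight: 10
  # The ask: derive the weighted-cluster choice from this same hash,
  # so a given x-session-id always lands on the same version.
  hash_policy:
  - header:
      header_name: x-session-id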

@zuercher
Member

zuercher commented Sep 6, 2019

This isn't currently possible. Weighted clusters are chosen based on a random value selected at request time. I'll go ahead and label this as an enhancement request.

@zuercher added the enhancement and help wanted labels and removed the question label Sep 6, 2019
@vadimeisenbergibm
Contributor Author

@zuercher Thanks!

@vadimeisenbergibm changed the title from "Question: traffic splitting with session affinity" to "traffic splitting with session affinity" Sep 6, 2019
@temporafugiunt

temporafugiunt commented Sep 12, 2019

@zuercher Is the behavior I outline below expected when using weighted traffic splitting, or am I doing something wrong? Here is a real-world example of what I am seeing on a Kubernetes cluster using Istio/Envoy with weighted traffic splitting turned on, attempting to perform "canary testing" of a new version of a web application.

I have two different versions of my web app, which "compile" random names for JavaScript files using WebPack to defeat caching between versions. If my route changes mid-download of a JS file (which appears to be happening, given the behavior I am seeing), the file may or may not exist in the version I have just switched to, because it might have a different name; the download then terminates with only a partial file. The call succeeds with a 200, but the contents are incomplete.

Even if it did exist, the contents of the file could change between versions. So if Envoy can't choose a service subset on the first request and stick to that subset during subsequent calls within the same page load, I could essentially be downloading a file whose contents change mid-stream.

Would you expect this given the nature of what I am trying to do, and given that Envoy does not support session affinity with weight-based traffic splitting? Or am I neglecting something, or is Istio perhaps not supporting something yet?

@zuercher
Member

The weighted cluster choices are made per-request even if the requests are sent on the same connection.

> The call succeeds with a 200, but the contents are incomplete.

That doesn't sound like a weighted cluster routing problem. It sounds like either the upstream server itself is truncating the data or something about how you're updating Envoy's config is causing a problem. I would look at Envoy's log to see what's going on.

> Even if it did exist, the contents of the file could change between versions. So if Envoy can't choose a service subset on the first request and stick to that subset during subsequent calls within the same page load, I could essentially be downloading a file whose contents change mid-stream.

Envoy does not have support for session affinity built in. Imagine a downstream's first request retrieves v1 of an HTML page from cluster A (as a result of a weighted cluster choice). If that HTML contains a reference to app.js, it is entirely possible that the request for app.js will be sent to cluster B (based on a second weighted cluster choice), resulting in a mismatch. If the JavaScript's path changes from version to version, you'd get a 404 (proxied from the upstream) when the mismatch occurs.

In the past, I've achieved session affinity in Envoy with 3 routes (a config sketch follows this list):

  1. A route that matches on a Cookie header/value for v1 and routes to only the v1 cluster.
  2. A route that matches on a Cookie header/value for v2 and routes to only the v2 cluster.
  3. A route that makes a weighted cluster choice between v1 and v2 and uses response_headers_to_add to set a Set-Cookie header for the selected cluster.

This means that x% of your new sessions will see the canary version, which is a bit different than x% of requests. Also, when you're done with the canary (completely deployed or rolled back) you'll need a way to get the clients on the other version to reload (or keep it running until that happens organically).
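
A minimal sketch of those three routes in raw Envoy route config (cookie name, values, clusters, and weights are all illustrative, not a tested configuration):

virtual_hosts:
- name: app
  domains: ["*"]
  routes:
  # 1. Clients that already carry the v1 cookie stay on the v1 cluster.
  - match:
      prefix: "/"
      headers:
      - name: cookie
        string_match:
          contains: "app_version=v1"
    route:
      cluster: app_v1
  # 2. Likewise for v2.
  - match:
      prefix: "/"
      headers:
      - name: cookie
        string_match:
          contains: "app_version=v2"
    route:
      cluster: app_v2
  # 3. Everyone else gets a weighted choice, plus a cookie recording it.
  - match:
      prefix: "/"
    route:
      weighted_clusters:
        clusters:
        - name: app_v1
          weight: 90
          response_headers_to_add:
          - header:
              key: Set-Cookie
              value: "app_version=v1; Max-Age=3600; Path=/"
        - name: app_v2
          weight: 10
          response_headers_to_add:
          - header:
              key: Set-Cookie
              value: "app_version=v2; Max-Age=3600; Path=/"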

@kyessenov
Contributor

I think the request is to use a normalized hash function instead of a random coin flip. As long as the hash function's output is uniformly distributed, it's the same thing. We could probably compose a cryptographic function on top of the hash policy to guarantee one-way properties.

@zuercher
Member

That seems like a reasonable feature to add.

@temporafugiunt

@zuercher Thank you, that is a good idea and I will try that!

> That doesn't sound like a weighted cluster routing problem. It sounds like either the upstream server itself is truncating the data or something about how you're updating Envoy's config is causing a problem. I would look at Envoy's log to see what's going on.

Sorry for the n00b question... But can you recommend any good documentation resources on how to debug Envoy to determine the root cause of an issue like this? If not, I will have a discussion with Dr. Google for a while to see if I can find something.

@zuercher
Member

I would enable debug logging (-l debug). You should be able to see the request and response headers for both the downstream and upstream requests. Beyond that, you might have better luck asking in the Envoy slack channel (with logs & config snippets) or else opening a separate issue.

@vietwow

vietwow commented Dec 15, 2019

Hi, any news on this feature?

@Jeskz0rd

Any suggestions for a workaround? 🤔

@kholisrag

Any update about this?
I'm trying to use WebSockets with weighted routing in Istio to enable Flagger canary on my deployment, but I'm blocked by this...
😭

@rgs1
Member

rgs1 commented May 1, 2020

> Can Envoy perform traffic splitting while preserving session affinity? For example, could it use Ring Hash or Maglev to select the weighted cluster according to its weight? That way, when route.RouteAction.HashPolicy is specified, the same weighted cluster and the same endpoint would always be used, by applying Ring Hash/Maglev twice: once to select the weighted cluster and once to select the endpoint.

We actually have an internal filter that does something like that... More info here:

https://medium.com/pinterest-engineering/simplifying-web-deploys-19244fe13737

If there's interest, we could chat a bit more and I might be able to get some cycles to upstream our filter.

@vadimeisenbergibm
Contributor Author

@rgs1 Thanks. I switched to work in another domain, but maybe other folks who are interested in this feature could carry this issue forward.

@rafaeldasilva

rafaeldasilva commented Aug 8, 2020

As I understand it, the point is to apply the weight rule only on the first call.
Once the client has connected, subsequent calls go through the sticky rule.

It would be an "IF" before applying the weight: if there is a sticky match, the call bypasses the weight rule and goes to that destination.
Is this feasible?
Does anyone see any drawback in this logic?

@kholisrag

> As I understand it, the point is to apply the weight rule only on the first call.
> Once the client has connected, subsequent calls go through the sticky rule.
>
> It would be an "IF" before applying the weight: if there is a sticky match, the call bypasses the weight rule and goes to that destination.
> Is this feasible?
> Does anyone see any drawback in this logic?

Personally, I don't, but I'm not sure about the maintainers or others.

By the way, why don't we detect it through the Upgrade and/or Connection: Upgrade headers?

@benpoulson

Nginx currently splits traffic by computing a MurmurHash2 number (0-4294967295) from specific variables (e.g. the remote IP).

See: http://nginx.org/en/docs/http/ngx_http_split_clients_module.html

This allows users to be consistently sent to the same upstream.

@defool

defool commented Feb 19, 2021

Any update on this feature?

@sschepens
Contributor

Maybe #14875 will fix this.

@arungeorge101

arungeorge101 commented Apr 9, 2021

Is this feature being worked on, and is there any timeline for when it will be available?

@jcrugzz

jcrugzz commented Apr 16, 2021

@rgs1 Would definitely be interested in chatting sometime if you wanted.

@rgs1
Member

rgs1 commented Apr 16, 2021

> @rgs1 Would definitely be interested in chatting sometime if you wanted.

Sure, ping me on Slack and we can coordinate. I actually gave some extra thought this week to what it'd take to open source what's described in the blog post above... but I couldn't allocate the cycles yet.

@howardjohn
Contributor

@snowp does #14875 solve this case or is it not applicable?

@snowp
Contributor

snowp commented Jun 9, 2021

That PR adds support for a consistent matcher to the new matching framework, but it hasn't yet been wired up to the router to allow using it to construct the route table (and I don't think anyone has planned this work yet).

You could probably get something working by using a simple filter that uses the consistent matcher to inject a cluster header that can then inform the router, or even by using the new setRoute API to influence route selection directly, without having to work through the relatively large effort of integrating the new matching framework with the router.
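
As a rough illustration of the cluster-header idea (every name here is hypothetical, and the byte-sum bucketing is a crude stand-in for a real consistent hash): a Lua filter stamps a header deterministically from a request attribute, and a route with cluster_header forwards to whichever cluster that header names.

# Inside the HttpConnectionManager config:
http_filters:
- name: envoy.filters.http.lua
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
    inline_code: |
      function envoy_on_request(request_handle)
        local uid = request_handle:headers():get("x-user-id") or ""
        local sum = 0
        for i = 1, #uid do sum = sum + uid:byte(i) end
        -- Stable 90/10 split for a given x-user-id value.
        local cluster = (sum % 100 < 90) and "app_v1" or "app_v2"
        request_handle:headers():replace("x-selected-cluster", cluster)
      end
- name: envoy.filters.http.router
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
route_config:
  virtual_hosts:
  - name: app
    domains: ["*"]
    routes:
    - match:
        prefix: "/"
      route:
        # Defer the cluster choice to the header the Lua filter set above.
        cluster_header: x-selected-cluster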

@mksha

mksha commented Jul 14, 2021

Any updates on it?

@mksha

mksha commented Jul 14, 2021

@vadimeisenbergibm did you find any workaround for it?

@vadimeisenbergibm
Contributor Author

@mksha No, I did not, sorry.

@mksha

mksha commented Jul 15, 2021

@zuercher are we planning to add this feature in the near future?

@timvan

timvan commented Jul 28, 2021

Hi, just commenting to show interest in this feature. The use case is experimentation: traffic should be split between control and variant models/apps, and users must be directed to a consistent variant. Bucketing is based on a user-id header.
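
For reference, hashing on a user-id header is already expressible as a route hash policy (the header name below is illustrative), but note that hash_policy only pins the endpoint within the selected cluster; it does not make the weighted-cluster split itself sticky, which is what this issue asks for:

route:
  cluster: app
  hash_policy:
  - header:
      header_name: x-user-id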

@gongzh

gongzh commented Sep 15, 2021

> Hi, just commenting to show interest in this feature. The use case is experimentation: traffic should be split between control and variant models/apps, and users must be directed to a consistent variant. Bucketing is based on a user-id header.

Hello, is there any progress or a plan for this feature? Thanks!

@mksha

mksha commented Oct 6, 2021

@alyssawilk is anyone working on it?

@alyssawilk
Contributor

No one is assigned, which is a pretty good sign that no one is working on this :-)

@mksha

mksha commented Oct 6, 2021

Is there a plan to work on it? Because it's a very common use case for everyone.

@alyssawilk
Contributor

The Envoy project doesn't have paid developers; it is an open source community where anyone can contribute. So largely, features don't happen until someone wants them enough to implement them or to pay someone like Tetrate to do so.

@mksha

mksha commented Feb 7, 2022

@rootsongjc Do we have this feature in upstream now?

@rootsongjc
Member

@mksha I don't think so.

@jcometki

I had this same issue; I solved it with the following configuration:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: sample-virtual-service
spec:
  gateways:
    - default
  hosts:
    - sample-host
  http:
    - match:
        - headers:
            Cookie:
              regex: .*userAffinity=v1.*
      route:
        - destination:
            host: sample-host.svc.cluster.local
            port:
              number: 80
            subset: v1
    - match:
        - headers:
            Cookie:
              regex: .*userAffinity=v2.*
      route:
        - destination:
            host: sample-host.svc.cluster.local
            port:
              number: 80
            subset: v2
    - route:
        - destination:
            host: sample-host.svc.cluster.local
            port:
              number: 80
            subset: v1
          weight: 75
        - destination:
            host: sample-host.svc.cluster.local
            port:
              number: 80
            subset: v2
          headers:
            response:
              add:
                Set-Cookie: userAffinity=v2; Max-Age=3600000
          weight: 25
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: sample-destination-rule
spec:
  host: sample-host.svc.cluster.local
  subsets:
    - labels:
        app: sample-app-v1
      name: v1
      trafficPolicy:
        loadBalancer:
          consistentHash:
            httpCookie:
              name: userAffinity=v1
              ttl: 1h
    - labels:
        app: sample-app-v2
      name: v2

@mksha

mksha commented Feb 22, 2022

@jcometki We have something similar in place, but I have a question about your DestinationRule. May I ask why you use name: userAffinity=v1 rather than just name: userAffinity?
Ideally the name property represents the name of the cookie, not the name=value pair. Maybe I am missing something, but I would love to understand your reasoning.

@itnazeer

itnazeer commented Apr 27, 2022

Did anyone get this working?

@4meepo

4meepo commented Apr 27, 2022

Any updates or workarounds? :)

@zuercher
Member

I'm going to close this ticket. Many of the recent comments have been about projects that build on Envoy, and as such aren't really appropriate here.

See https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route_components.proto.html#config-route-v3-routeaction-hashpolicy-cookie and https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/stateful_session_filter#config-http-filters-stateful-session for various types of session affinity that are currently supported. The former implements the feature originally requested.
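
For reference, a minimal sketch of the cookie hash policy from the first link (cookie name and TTL are arbitrary); when a TTL is set and the cookie is absent, Envoy generates the cookie, so subsequent requests from that client hash to the same endpoint:

route:
  cluster: app
  hash_policy:
  - cookie:
      name: session-affinity
      ttl: 3600s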

@rixongary

@jcometki Hello, thanks for the config you posted above. I know this was two years ago now, but I am wondering if you could provide any more information on how this worked out for you?

I think there would be a problem: once a user hits the v2 subset and gets the cookie, all of their subsequent requests would be sent to v2, but those requests would not be taken into account in the weighting calculations. Therefore, you would end up with more than the intended 5% of requests (for a 5% canary, say) going to the v2 subset, and this problem would get worse and worse the higher the canary weight was set.

If anyone has a solution for this, I'd love to hear. Thanks!
