traffic splitting with session affinity #8167
This isn't currently possible. Weighted clusters are chosen based on a random value selected at request time. I'll go ahead and label this as an enhancement request.
@zuercher Thanks!
@zuercher Is the behavior I outline below expected when using weighted traffic splitting, or am I doing something wrong? Here is my real-world example: a Kubernetes cluster using Istio/Envoy with weighted traffic splitting turned on to perform "canary testing" of a new version of a web application. I have two different versions of my web app, which "compile" random names for JavaScript files using webpack to bust caching between versions. If my route changes mid-download of a JS file (which it appears to, as this is the behavior I am seeing), the file may or may not exist in the version I have just switched to, because it might have a different name, so the download terminates with only a partial file. The call succeeds with a 200, but the contents are incomplete. Even if it didn't, the contents of the file could change between versions, so if Envoy can't choose a service subset at the first request and stick to that subset during subsequent calls in the same page load, I could essentially be downloading a file whose contents change mid-stream. Would you expect this given what I am trying to do, and given that Envoy does not support session affinity with weight-based traffic splitting? Or am I neglecting something, or is Istio perhaps not supporting something yet?
The weighted cluster choices are made per-request even if the requests are sent on the same connection.
That doesn't sound like a weighted cluster routing problem. It sounds like either the upstream server itself is truncating the data or something about how you're updating Envoy's config is causing a problem. I would look at Envoy's log to see what's going on.
Envoy does not have support for session affinity built in. Imagine a downstream's first request retrieves v1 of an HTML page from cluster A (as a result of a weighted cluster choice). If that HTML contains a reference to app.js, it is entirely possible that the request for app.js will be sent to cluster B (based on a second weighted cluster choice), resulting in a mismatch. If the JavaScript's path changes from version to version, you'd get a 404 (proxied from the upstream) when the mismatch occurs. In the past, I've achieved session affinity in Envoy with 3 routes:
This means that x% of your new sessions will see the canary version, which is a bit different than x% of requests. Also, when you're done with the canary (completely deployed or rolled back) you'll need a way to get the clients on the other version to reload (or keep it running until that happens organically).
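One plausible reading of this cookie-based scheme, sketched in Python (the cookie name `affinity`, cluster names, and 25% weight are all illustrative, not Envoy configuration or anything stated in the thread):

```python
import random

def route_request(cookies, rng, canary_weight=0.25):
    """Sketch of cookie-pinned routing with a weighted fallback:
      1. cookie pins the session to v1 -> cluster A
      2. cookie pins the session to v2 -> cluster B
      3. no cookie -> weighted choice at request time, then set the cookie
    Returns (cluster, cookie_value_to_set_or_None)."""
    if cookies.get("affinity") == "v1":
        return "cluster-a", None
    if cookies.get("affinity") == "v2":
        return "cluster-b", None
    chosen = "v2" if rng.random() < canary_weight else "v1"
    cluster = "cluster-b" if chosen == "v2" else "cluster-a"
    return cluster, chosen  # caller emits Set-Cookie: affinity=<chosen>
```

Only the first cookieless request rolls the dice, which is why x% of *sessions*, not x% of *requests*, see the canary.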
I think the request is for using a normalized hash function instead of the random coin flip. As long as the hash function's output is uniformly distributed, it's equivalent. We could probably compose a cryptographic function on top of the hash policy to guarantee one-way properties.
That seems like a reasonable feature to add.
@zuercher Thank you, that is a good idea and I will try that!
Sorry for the n00b question... But can you recommend any good documentation resources on how to debug Envoy to determine the root cause of an issue like this? If not, I will have a discussion with Dr. Google for a while to see if I can find something.
I would enable debug logging (
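For reference (not from the thread), Envoy's log verbosity can be raised either at startup or at runtime through the admin interface; the config filename and admin port 9901 below are just common examples:

```shell
# At startup: global debug logging, or per-component levels
envoy -c envoy.yaml -l debug
envoy -c envoy.yaml --component-log-level router:debug,upstream:debug

# At runtime, via the admin endpoint (assuming it listens on 9901)
curl -X POST 'http://localhost:9901/logging?level=debug'
```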
Hi, any news on this feature?
Any suggestions for a workaround? 🤔
Any update on this?
We actually have an internal filter that does something like that... More info here: https://medium.com/pinterest-engineering/simplifying-web-deploys-19244fe13737 If there's interest, we could chat a bit more and I might be able to get some cycles to upstream our filter.
@rgs1 Thanks. I switched to work on another domain, but maybe other folks who are interested in this feature could proceed with this issue.
As I understand it, the point is to apply the weight rule only on the first call. There would be an "if" before applying the weight: if there is a sticky match, the call bypasses the weight rule and goes straight to its destination.
Personally, no, but I'm not sure about the maintainers or others. By the way, why don't we detect it through the Upgrade and/or Connection headers?
Nginx currently splits traffic by computing a MurmurHash2 value (0-4294967295) from specific variables (e.g. the remote IP). See: http://nginx.org/en/docs/http/ngx_http_split_clients_module.html This allows users to be consistently sent to the same upstream.
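A rough illustration of this style of hash-based splitting in Python (nginx uses MurmurHash2; md5 stands in here purely for illustration, and the bucket names are made up):

```python
import hashlib

def split_client(key, buckets):
    """Deterministically map a client key (e.g. remote IP) to a bucket,
    in the spirit of nginx's split_clients. Buckets are (name, percent)."""
    # Hash the key into an integer in [0, 2**32), like the module's 0-4294967295 range
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")
    point = h * 100.0 / 2**32  # position in [0, 100)
    cumulative = 0.0
    for name, percent in buckets:
        cumulative += percent
        if point < cumulative:
            return name
    return buckets[-1][0]  # any rounding remainder falls to the last bucket
```

The same key always lands in the same bucket, so a client is consistently sent to the same upstream while the population still splits according to the percentages.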
Any update on this feature?
Maybe #14875 will fix this.
Is this feature being worked on, and is there any timeline for when it will be available?
@rgs1 Would definitely be interested in chatting sometime if you wanted.
Sure, ping me on Slack and we can coordinate. I actually gave some extra thought this week to what it'd take to open source what's described in the above blog post, but couldn't allocate the cycles yet.
That PR adds support for a consistent matcher to the new matching framework, but that hasn't yet been wired up to work with the router to allow its usage to construct the route table (and I don't think anyone has planned this work yet). You could probably get something working by using a simple filter that uses the consistent matcher to inject a cluster header that can then inform the router, or even using the new
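As a sketch of the cluster-header idea (the header name `x-chosen-cluster` and the surrounding structure are illustrative, not taken from the thread), Envoy's route action can defer cluster selection to a request header set by an earlier filter:

```yaml
route_config:
  virtual_hosts:
    - name: app
      domains: ["*"]
      routes:
        - match: { prefix: "/" }
          route:
            # An earlier filter computes a consistent hash of the session
            # and injects this header; the router then forwards the request
            # to whatever cluster the header names.
            cluster_header: x-chosen-cluster
```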
Any updates on it?
@vadimeisenbergibm did you find any workaround for it?
@mksha No, I did not, sorry.
@zuercher are we planning to add this feature in the near future?
Hi, just commenting to show interest in this feature. The use case is experimentation: traffic should be split between control and variant models/apps, and users must be directed to a consistent variant. Bucketing is based on a user id header.
Hello, is there any progress or plan for this feature? Thanks!
@alyssawilk is anyone working on it?
No one is assigned, which is a pretty good sign that no one is working on this :-)
Is there a plan to work on it? Because it's a very generic use case for everyone.
The Envoy project doesn't have paid developers; it is an open source community where anyone can contribute. So largely, features don't happen until someone wants them enough to implement them or to pay someone like Tetrate to do so.
@rootsongjc Do we have this feature in upstream now?
@mksha I don't think so.
I had this same issue; I solved it with the following configuration:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: sample-virtual-service
spec:
  gateways:
    - default
  hosts:
    - sample-host
  http:
    - match:
        - headers:
            Cookie:
              regex: .*userAffinity=v1.*
      route:
        - destination:
            host: sample-host.svc.cluster.local
            port:
              number: 80
            subset: v1
    - match:
        - headers:
            Cookie:
              regex: .*userAffinity=v2.*
      route:
        - destination:
            host: sample-host.svc.cluster.local
            port:
              number: 80
            subset: v2
    - route:
        - destination:
            host: sample-host.svc.cluster.local
            port:
              number: 80
            subset: v1
          weight: 75
        - destination:
            host: sample-host.svc.cluster.local
            port:
              number: 80
            subset: v2
          headers:
            response:
              add:
                Set-Cookie: "userAffinity=v2; Max-Age=3600000"
          weight: 25
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: sample-destination-rule
spec:
  host: sample-host.svc.cluster.local
  subsets:
    - labels:
        app: sample-app-v1
      name: v1
      trafficPolicy:
        loadBalancer:
          consistentHash:
            httpCookie:
              name: userAffinity=v1
              ttl: 1h
    - labels:
        app: sample-app-v2
      name: v2
```
@jcometki We have something similar in place, but I have a question about the DestinationRule you have. May I know what is the need of having
Did anyone get this working?
Any updates or workarounds? : )
I'm going to close this ticket. Many of the recent comments have been for projects that build on Envoy and as such aren't really appropriate here. See https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route_components.proto.html#config-route-v3-routeaction-hashpolicy-cookie and https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/stateful_session_filter#config-http-filters-stateful-session for various types of session affinity that are currently supported. The former implements the feature originally requested. |
@jcometki Hello, thanks for the config you posted above. I know this was 2 years ago now, but I'm wondering if you could provide any more information on how this worked out for you? I think there would be a problem: once a user hits the v2 subset and gets the cookie, all of their subsequent requests are sent to v2, but those requests are not taken into account in the weighting calculations. Therefore you end up with more than 5% of requests going to the v2 subset, and the problem gets worse the higher the canary weight is set. If anyone has a solution for this, I'd love to hear it. Thanks!
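To make the concern concrete, here is a rough simulation of the VirtualService posted above, under the assumption that only the v2 route sets the affinity cookie: v1 users re-roll the 75/25 split on every cookieless request, while v2 is "absorbing" (the user counts and request counts are arbitrary):

```python
import random

def fraction_on_v2(num_users=10000, requests_per_user=10,
                   v2_weight=0.25, seed=1):
    """Share of users pinned to v2 after a session of several requests."""
    rng = random.Random(seed)
    on_v2 = 0
    for _ in range(num_users):
        has_cookie = False
        for _ in range(requests_per_user):
            # No cookie yet: hit the weighted route; only v2 sets the cookie.
            if not has_cookie and rng.random() < v2_weight:
                has_cookie = True
        on_v2 += has_cookie
    return on_v2 / num_users
```

With 10 requests per user, roughly 1 - 0.75**10 ≈ 94% of users end up pinned to v2, far above the intended 25%: the weighted route only ever moves users toward the cookied subset and never back.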
Can Envoy perform traffic splitting while preserving session affinity? For example, use Ring Hash or Maglev to select the weighted cluster according to its weight? This way, when route.RouteAction.HashPolicy is specified, the same weighted cluster and the same endpoint will be used, by applying Ring Hash/Maglev twice: once for selecting the weighted cluster and once for selecting the endpoint.
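A minimal Python sketch of this two-stage idea (one consistent hash to pick the weighted cluster, a second to pick the endpoint; rendezvous hashing and md5 stand in for Ring Hash/Maglev, and all names and weights are illustrative):

```python
import hashlib

def stable_hash(s):
    # Stable, roughly uniform 64-bit hash (md5 as a stand-in).
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

def pick_cluster(key, clusters):
    # Stage 1: consistent weighted-cluster choice instead of a random coin.
    total = sum(weight for _, weight, _ in clusters)
    point = stable_hash("cluster|" + key) % total
    acc = 0
    for name, weight, endpoints in clusters:
        acc += weight
        if point < acc:
            return name, endpoints
    return clusters[-1][0], clusters[-1][2]

def pick_endpoint(key, endpoints):
    # Stage 2: rendezvous hashing within the chosen cluster.
    return max(endpoints, key=lambda e: stable_hash(key + "|" + e))
```

Because both stages hash the same session key, a given key always maps to the same cluster and the same endpoint, while the population as a whole still splits according to the weights.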