New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(egress): routing using MeshHTTPRoute and VirtualOutbound #7536
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Jakub Dyszkiewicz <jakub.dyszkiewicz@gmail.com>
Signed-off-by: Jakub Dyszkiewicz <jakub.dyszkiewicz@gmail.com>
Signed-off-by: Jakub Dyszkiewicz <jakub.dyszkiewicz@gmail.com>
Signed-off-by: Jakub Dyszkiewicz <jakub.dyszkiewicz@gmail.com>
Signed-off-by: Jakub Dyszkiewicz <jakub.dyszkiewicz@gmail.com>
Signed-off-by: Jakub Dyszkiewicz <jakub.dyszkiewicz@gmail.com>
Signed-off-by: Jakub Dyszkiewicz <jakub.dyszkiewicz@gmail.com>
jakubdyszkiewicz
requested review from
lobkovilya and
lukidzi
and removed request for
a team
August 18, 2023 09:59
Signed-off-by: Jakub Dyszkiewicz <jakub.dyszkiewicz@gmail.com>
Signed-off-by: Jakub Dyszkiewicz <jakub.dyszkiewicz@gmail.com>
lahabana
reviewed
Aug 18, 2023
lahabana
reviewed
Aug 18, 2023
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to make sense to me but it's a hard code change. I want to read it again later :)
lobkovilya
reviewed
Aug 21, 2023
michaelbeaumont
approved these changes
Aug 21, 2023
lobkovilya
approved these changes
Aug 21, 2023
6 tasks
5 tasks
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fix #5871
Fix #7527
The first problem
When we do cross-zone routing using TrafficRoute, MeshHTTPRoute, VirtualOutbound etc. we configure ZoneIngress with FilterChainMatch based on SNI to route the traffic to the proper destination without TLS termination. We implemented
ingress_generator_destination.go
to generate all possible destinations from the policies.When we add ZoneEgress to this picture, ZoneEgress passes the request to ZoneIngress without TLS termination as well. It has to be configured in the same way as ZoneIngress, but with endpoints of ZoneIngresses instead of actual services.
As we were adding new ways of routing, we were adding this to ZoneIngress (in
ingress_generator_destination.go
) without adding it to ZoneEgress.The solution to the first problem
I refactored the ZoneEgress code, because I noticed that a lot of it has to be almost the same between ZoneIngress. I created
zoneproxy
package where I moved code to build destinations and to build clusters, endpoints, and filter chains. This way we remove code duplication and we avoid the problem of not configuring ZoneEgress and ZoneIngress in the same way.The second problem
Instead of creating new clusters for every split, Zone proxies are using LB subset.
For example, there is
service:a
withversion:1
andversion:2
ZoneIngress has a cluster
service:a
with endpoint and metadataversion:1
andversion:2
and splits based on this metadata. This works well for ZoneIngress, but if we do this for ZoneEgress we very easily end up in a situation where both endpoints forversion:1
andversion:2
point to the same ZoneIngress. Envoy then does the endpoint deduplication based on IP:port and we end up only with one endpoint instead of two.The solution to the second problem
When we build the LB subset, we need to check if selecting a subset of endpoints actually selects different endpoints. If not, it means that this subset is useless. For example (from Egress perspective):
Ingress1 (10.0.0.1:1000) supports
service:a,version:1
andservice:a,version:2
Ingress2 (10.0.0.2:1000) supports
service:a,version:1
andservice:a,version:2
and we have a destination of
service:a,version:1
. It means that we don't have to do subset because both ingresses supportservice:a,version:1
.However, ZoneIngress still needs to do routing to a
version:1
which has a different endpoint thanversion:2
. All we have to do is send proper SNI regardless if we place LB subset or not.The good news here is that ZoneEgress passes SNI through to ZoneIngress.
This solution is implemented in
AddFilterChains#relevantTags
Tests
The problems existed because we did not have good enough coverage of ZoneEgress.
We could add yet another entry in a test matrix to test multizone suite with and without
Mesh#zoneEgress:true
on the mesh. However, this adds CI runs which are already expensive.The alternative is to run routing-related tests with
Mesh#zoneEgress:true/false
. Those tests should work exactly the same with or without egress. I implemented it for MeshHTTPRoute and VirtualOutbound.I think we need a bit more coverage for E2E tests, but I want to add this as a follow-up to not expand already big PR.
Additionally, I ported Mike's MeshHTTPRoute unit tests to check if this solution works.
This PR is a superset of #7516
Backport
Should we backport this? Technically it's a fix, but it's pretty big change.
syscall.Mkfifo
have equivalent implementation on the other OS --UPGRADE.md
? --> Changelog:
entry here or add aci/
label to run fewer/more tests?