fix(egress): routing using MeshHTTPRoute and VirtualOutbound #7536

Merged
merged 14 commits into from Aug 21, 2023

@jakubdyszkiewicz jakubdyszkiewicz commented Aug 17, 2023

Fix #5871
Fix #7527

The first problem

When we do cross-zone routing using TrafficRoute, MeshHTTPRoute, VirtualOutbound, etc., we configure ZoneIngress with a FilterChainMatch based on SNI to route the traffic to the proper destination without TLS termination. We implemented ingress_generator_destination.go to generate all possible destinations from the policies.

When we add ZoneEgress to this picture, ZoneEgress passes the request to ZoneIngress without TLS termination as well. It has to be configured in the same way as ZoneIngress, but with endpoints of ZoneIngresses instead of actual services.
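To make the symmetry concrete, here is a minimal sketch of SNI-keyed filter chains built once and fed different endpoints for each proxy type. It is not the code from this PR: `sniFor`, `FilterChain`, `buildChains`, the SNI encoding, and all addresses are hypothetical names and values chosen for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Tags describes a destination, e.g. {"kuma.io/service": "a", "version": "1"}.
type Tags map[string]string

// sniFor is a hypothetical encoding of destination tags into an SNI string.
// The real encoding used by the control plane may differ.
func sniFor(mesh string, tags Tags) string {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		if k != "kuma.io/service" {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys) // deterministic output regardless of map order
	pairs := []string{"mesh=" + mesh}
	for _, k := range keys {
		pairs = append(pairs, k+"="+tags[k])
	}
	return tags["kuma.io/service"] + "{" + strings.Join(pairs, ",") + "}"
}

// FilterChain pairs an SNI match with the endpoints that traffic is
// forwarded to without TLS termination.
type FilterChain struct {
	SNI       string
	Endpoints []string
}

// buildChains produces one chain per destination. Only endpointsFor differs
// between the two zone proxies: ZoneIngress resolves to service instances,
// ZoneEgress resolves to ZoneIngress addresses.
func buildChains(mesh string, destinations []Tags, endpointsFor func(Tags) []string) []FilterChain {
	chains := make([]FilterChain, 0, len(destinations))
	for _, dest := range destinations {
		chains = append(chains, FilterChain{SNI: sniFor(mesh, dest), Endpoints: endpointsFor(dest)})
	}
	return chains
}

func main() {
	dests := []Tags{
		{"kuma.io/service": "a", "version": "1"},
		{"kuma.io/service": "a", "version": "2"},
	}
	ingress := buildChains("default", dests, func(t Tags) []string {
		return []string{"192.168.0." + t["version"] + ":8080"} // made-up service addresses
	})
	egress := buildChains("default", dests, func(Tags) []string {
		return []string{"10.0.0.1:1000", "10.0.0.2:1000"} // made-up ZoneIngress addresses
	})
	for i := range dests {
		fmt.Println(ingress[i].SNI, "ingress ->", ingress[i].Endpoints, "egress ->", egress[i].Endpoints)
	}
}
```

Because both proxies derive their chains from the same destination list, a new routing policy only has to extend that list once.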

As we were adding new ways of routing, we were adding this to ZoneIngress (in ingress_generator_destination.go) without adding it to ZoneEgress.

The solution to the first problem

I refactored the ZoneEgress code because I noticed that a lot of it has to be almost the same as in ZoneIngress. I created a zoneproxy package and moved the code that builds destinations, clusters, endpoints, and filter chains into it. This removes code duplication and avoids the problem of ZoneEgress and ZoneIngress being configured differently.

The second problem

Instead of creating new clusters for every split, zone proxies use LB subsets.
For example, there is service:a with version:1 and version:2.
ZoneIngress has a cluster service:a with endpoint and metadata version:1 and version:2 and splits based on this metadata. This works well for ZoneIngress, but if we do this for ZoneEgress we very easily end up in a situation where both endpoints for version:1 and version:2 point to the same ZoneIngress. Envoy then does the endpoint deduplication based on IP:port and we end up only with one endpoint instead of two.
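The collapse can be illustrated with a simplified model of address-based deduplication. This is not Envoy's code; `Endpoint` and `dedupeByAddress` are hypothetical, and keeping the first entry per address is an assumption made only for this sketch.

```go
package main

import "fmt"

// Endpoint is a simplified cluster endpoint with LB metadata.
type Endpoint struct {
	Address string // IP:port
	Tags    map[string]string
}

// dedupeByAddress keeps one endpoint per IP:port. Which duplicate survives
// is an assumption of this model (here: the first one seen).
func dedupeByAddress(eps []Endpoint) []Endpoint {
	seen := map[string]bool{}
	var out []Endpoint
	for _, ep := range eps {
		if seen[ep.Address] {
			continue
		}
		seen[ep.Address] = true
		out = append(out, ep)
	}
	return out
}

func main() {
	// ZoneIngress view: each version has its own service instance address.
	ingressView := []Endpoint{
		{Address: "192.168.0.1:8080", Tags: map[string]string{"version": "1"}},
		{Address: "192.168.0.2:8080", Tags: map[string]string{"version": "2"}},
	}
	// ZoneEgress view: both versions are reachable through the same ZoneIngress.
	egressView := []Endpoint{
		{Address: "10.0.0.1:1000", Tags: map[string]string{"version": "1"}},
		{Address: "10.0.0.1:1000", Tags: map[string]string{"version": "2"}},
	}
	fmt.Println(len(dedupeByAddress(ingressView)), len(dedupeByAddress(egressView)))
}
```

In the egress view only one endpoint is left, so a subset selector for the other version has nothing to match.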

The solution to the second problem

When we build the LB subset, we need to check whether selecting a subset of endpoints actually selects different endpoints. If it does not, the subset is useless. For example (from the Egress perspective):
Ingress1 (10.0.0.1:1000) supports service:a,version:1 and service:a,version:2
Ingress2 (10.0.0.2:1000) supports service:a,version:1 and service:a,version:2
and we have a destination of service:a,version:1. We don't need a subset here, because both ingresses support service:a,version:1.
However, ZoneIngress still needs to route to version:1, which has a different endpoint than version:2. All we have to do is send the proper SNI, regardless of whether we configure an LB subset.
The good news is that ZoneEgress passes the SNI through to ZoneIngress.
This solution is implemented in AddFilterChains#relevantTags.
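The check can be sketched roughly as follows. This is not the actual relevantTags code; `Ingress`, `matches`, and `subsetNarrows` are hypothetical names, and the model assumes each ZoneIngress advertises the tag sets it can serve.

```go
package main

import "fmt"

// Tags describes a destination or an advertised service instance.
type Tags map[string]string

// Ingress is a ZoneIngress as seen from ZoneEgress: one address plus the
// tag sets it can route to.
type Ingress struct {
	Address  string
	Supports []Tags
}

// matches reports whether the offered tags satisfy every destination tag.
func matches(offered, dest Tags) bool {
	for k, v := range dest {
		if offered[k] != v {
			return false
		}
	}
	return true
}

// supports reports whether this ingress can serve the destination.
func (i Ingress) supports(dest Tags) bool {
	for _, offered := range i.Supports {
		if matches(offered, dest) {
			return true
		}
	}
	return false
}

// subsetNarrows reports whether an LB subset for dest would exclude at least
// one ingress. If it excludes none, the subset is useless and can be skipped.
func subsetNarrows(dest Tags, ingresses []Ingress) bool {
	for _, ing := range ingresses {
		if !ing.supports(dest) {
			return true
		}
	}
	return false
}

func main() {
	v1 := Tags{"kuma.io/service": "a", "version": "1"}
	v2 := Tags{"kuma.io/service": "a", "version": "2"}
	both := []Ingress{
		{Address: "10.0.0.1:1000", Supports: []Tags{v1, v2}},
		{Address: "10.0.0.2:1000", Supports: []Tags{v1, v2}},
	}
	mixed := []Ingress{
		{Address: "10.0.0.1:1000", Supports: []Tags{v1, v2}},
		{Address: "10.0.0.2:1000", Supports: []Tags{v2}},
	}
	fmt.Println(subsetNarrows(v1, both), subsetNarrows(v1, mixed))
}
```

With the two ingresses from the example, the subset for service:a,version:1 excludes nothing, so the SNI alone carries the routing decision. In the second, made-up topology one ingress lacks version:1, so the subset is still needed.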

Tests

The problems existed because we did not have good enough coverage of ZoneEgress.
We could add yet another entry to the test matrix to run the multizone suite with and without Mesh#zoneEgress:true on the mesh. However, this adds CI runs, which are already expensive.
The alternative is to run routing-related tests with Mesh#zoneEgress:true/false. Those tests should work exactly the same with or without egress. I implemented it for MeshHTTPRoute and VirtualOutbound.
I think we need a bit more E2E coverage, but I want to add it as a follow-up to avoid expanding an already big PR.

Additionally, I ported Mike's MeshHTTPRoute unit tests to check if this solution works.

This PR is a superset of #7516

Backport

Should we backport this? Technically it's a fix, but it's a pretty big change.

  • Link to relevant issue as well as docs and UI issues --
  • This will not break child repos: it doesn't hardcode values (e.g. "kumahq" as an image registry) and it will work on Windows; system-specific functions like syscall.Mkfifo have equivalent implementations on the other OS --
  • Tests (Unit test, E2E tests, manual test on universal and k8s) --
  • Do you need to update UPGRADE.md? --
  • Does it need to be backported according to the backporting policy? (this GH action will add "backport" label based on these file globs, if you want to prevent it from adding the "backport" label use no-backport-autolabel label) --
  • Do you need to explicitly set a > Changelog: entry here or add a ci/ label to run fewer/more tests?

Signed-off-by: Jakub Dyszkiewicz <jakub.dyszkiewicz@gmail.com>
@jakubdyszkiewicz jakubdyszkiewicz marked this pull request as ready for review August 18, 2023 09:59
@jakubdyszkiewicz jakubdyszkiewicz requested a review from a team as a code owner August 18, 2023 09:59
@jakubdyszkiewicz jakubdyszkiewicz requested review from lobkovilya and lukidzi and removed request for a team August 18, 2023 09:59
@lahabana lahabana left a comment

Seems to make sense to me but it's a hard code change. I want to read it again later :)

pkg/xds/generator/zoneproxy/generator.go (outdated, resolved)
pkg/xds/generator/zoneproxy/generator.go (resolved)
test/e2e_env/multizone/meshhttproute/test.go (resolved)
@jakubdyszkiewicz jakubdyszkiewicz enabled auto-merge (squash) August 21, 2023 15:00
@jakubdyszkiewicz jakubdyszkiewicz merged commit 5b823b7 into kumahq:master Aug 21, 2023
5 checks passed
@jakubdyszkiewicz jakubdyszkiewicz deleted the egress-zo branch August 22, 2023 08:24
Development

Successfully merging this pull request may close these issues.

Cross zone virtual outbound with zone egress
MeshHTTPRoute cross zone support
4 participants