New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use masquerade default gateway when no default gw is found #4122
Conversation
Changes unknown |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall lgtm just a question around the warning print
also wider question, I wasn't involved in the design of AllowNoUplink
option where we deploy LGW without interfacing with host NIC, but currently this is really used only in bridgeForInterface
anyways just checking this option is still relevant in OVNK considering we decided to add the default route irrespective of whether this option is set or not
} | ||
if len(defaultGatewayNextHops) == 0 { | ||
klog.Warning("No default route identified in the host. Egress features and pod egress traffic in " + | ||
"shared gateway mode may not function correctly!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to print this warning in local gateway mode but without the allowNoUplink set as well which is what we are doing now? in which case I'd remove the shared gateway (or routingViaHost=false) from the log..? But I think these features are expected to work on LGW mode (though maybe not EIPs..), with allowNoUplink disabled in which case we shouldn't be printing this warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree for allowNoUplink, we can probably skip printing it.
I guess I am also missing a comma. It should be:
Egress features, and pod egress traffic in shared gateway mode may not function correctly
Which I meant to say only egress pod traffic in shared gateway mode might not work correctly. Egress pod traffic in LGW will still work. Egress features may not work in both modes.
Should I reword it to clarify they are 2 separate things?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes let's reword to clarify, thanks!
// For local gw, if not default gateway is available or the provide gateway interface is not the host gateway interface | ||
// use nexthop masquerade IP as GR default gw to steer traffic to the gateway bridge | ||
// use nexthop masquerade IP as GR default gw to steer traffic to the gateway bridge, and then the host for routing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self-note: so what happens is:
- host->kapi clusterIP
- sent to br-ex and then to OVN via GR using service routes on host
- GR DNATs to secondary network backend pod IPs
- no default route on GR so we drop
This change makes the masqueradeIP be the default route so - Traffic instead of dropping comes back to host
- then host routes and iptable masquerade takes this to the secondary nic
thanks @trozet for the nice PR description which made it easy to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right!
good question. So I debated removing allowNoUplink as well. I think the one key difference is this option still allows for no uplink at all to be found/attached to the bridge: So without this option, if we cannot find an interface to attach to the bridge, then we consider it as a failed state unless explicitly told with allowNoUplink. I'm not sure how useful this is or not, but I decided to leave it alone since it isnt broke :) |
@trozet : wanna repush here? |
1ccb958
to
4e3e430
Compare
addressed the comments in last push, will now push another with just rebase |
When no default gateway is found, OVNK will still come up. This behavior was intentional: 3eeb930 The goal was that with design changes, host->service traffic would work without the need for a default gateway to be detected. This works, in most cases, but there is a case we did not consider. If a user has deployed a cluster with local gateway mode, and is using a different interface/network for kapi traffic (not the gateway bridge interface), then endpoints to the kubernetes service will reside on this other network. In this case when a host tries to talk to kube API service, the traffic would go to the GR, it would be DNAT'ed to an IP address not on any known network, and be dropped by OVN routing when there is no default gateway route. When the configuration parameter AllowNoUplink is set, we will set the default route to be the masquerade IP, which solves this problem. This commit changes the behavior to not require AllowNoUplink to be set in order to achieve the same behavior, as services should work in this scenario even without a default gw. Signed-off-by: Tim Rozet <trozet@redhat.com>
4e3e430
to
7072d98
Compare
@@ -184,10 +184,19 @@ func getGatewayNextHops() ([]net.IP, string, error) { | |||
} | |||
gatewayIntf = defaultGatewayIntf | |||
} else { | |||
if gatewayIntf != defaultGatewayIntf { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
too much if's and else's for my taste :) but we can clean code later, I see value in printing different warnings, will help us debug better in the future.
@@ -1857,7 +1857,6 @@ var _ = Describe("Gateway unit tests", func() { | |||
gwIPs := []net.IP{config.Gateway.MasqueradeIPs.V4DummyNextHopMasqueradeIP} | |||
config.Gateway.Interface = dummyBridgeName | |||
config.Gateway.Mode = config.GatewayModeLocal | |||
config.Gateway.AllowNoUplink = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self-note: there is another UT testing with allowNoUplink to true so we do have coverage.
Fix included in accepted release 4.16.0-0.nightly-2024-03-21-152650 |
When no default gateway is found, OVNK will still come up. This behavior was intentional: 3eeb930
The goal was that with design changes, host->service traffic would work without the need for a default gateway to be detected. This works, in most cases, but there is a case we did not consider. If a user has deployed a cluster with local gateway mode, and is using a different interface/network for kapi traffic (not the gateway bridge interface), then endpoints to the kubernetes service will reside on this other network. In this case when a host tries to talk to kube API service, the traffic would go to the GR, it would be DNAT'ed to an IP address not on any known network, and be dropped by OVN routing when there is no default gateway route.
When the configuration parameter AllowNoUplink is set, we will set the default route to be the masquerade IP, which solves this problem. This commit changes the behavior to not require AllowNoUplink to be set in order to achieve the same behavior, as services should work in this scenario even without a default gw.