Fix inbound traffic routing with Istio for VMI with explicitly specified ports #5200
Conversation
Force-pushed bd1674a to 17291c3
Force-pushed 17291c3 to a24ada6
/retest
1 similar comment
/retest
Force-pushed a24ada6 to 8856e81
Thank you Radim!
I know I lgtmed already, but I thought of something. You hit this issue when there was no inbound traffic to the VM, right?
That's a good point, there is no test in place atm.
Please add a release note.
```diff
@@ -1122,6 +1130,15 @@ func (b *MasqueradeBindMechanism) createNatRulesUsingNftables(proto iptables.Protocol) error {
 			if port.Protocol == "" {
 				port.Protocol = "tcp"
 			}
+			err = Handler.NftablesAppendRule(proto, "nat", "KUBEVIRT_POSTINBOUND",
```
Shouldn't this rule be added outside the ports loop (only if no ports were declared on the vm)?
AFAIU the scenario. If the port is declared on the pod container (in our case the virt-launcher compute container inherits ports declared on the vm/i) then 127.0.0.1 is used. Otherwise, 127.0.0.6 is used. In our case, we want to pass to the vm only ports that are declared on the vm or any port if nothing is declared. So from my understanding we should add the 127.0.0.6 rule only in case no ports are declared on the vm and it should forward any port.
> AFAIU the scenario. If the port is declared on the pod container (in our case the virt-launcher compute container inherits ports declared on the vm/i) then 127.0.0.1 is used. Otherwise, 127.0.0.6 is used. In our case, we want to pass to the vm only ports that are declared on the vm or any port if nothing is declared.
It doesn't depend on what ports are specified in our VMI/virt-launcher pod; it's the ports specified in an associated k8s service resource that make the difference. If you look at this comment in that issue istio/istio#29603 (comment), you'll see it describes the following scenario:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: vmi-cirros
spec:
  selector:
    app: vmi-cirros
  ports:
  - port: 8080
```
Let's assume a VMI with the following interface:

```yaml
interfaces:
- name: default
  masquerade: {}
  ports:
  - port: 8080
  - port: 8081
```
curl <virt-launcher-pod_IP>:8080 => Envoy proxies the packet from 127.0.0.1 to 127.0.0.1:8080
curl <virt-launcher-pod_IP>:8081 => Envoy proxies the packet from 127.0.0.6 to <virt-launcher-pod_IP>:8081
> So from my understanding we should add the 127.0.0.6 rule only in case no ports are declared on the vm and it should forward any port.
We could add these two rules outside the ports loop as well, where we wouldn't match the destination port. That would fix this problem for a VM without an explicit ports list too. I opted for adding rules that are as specific as possible, with the intention to add the rules for a VM without explicit ports in a follow-up PR.
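For a VMI declaring ports 8080 and 8081, the port-specific approach described here might produce nat rules roughly like the following sketch (chain names and addresses come from this PR's discussion; the exact rule text is illustrative, not copied from the implementation):

```
chain output {
    # Envoy passthrough traffic (port not declared in the k8s service)
    ip saddr 127.0.0.6 counter dnat to 10.0.2.2
    # Traffic proxied over localhost (port declared in the k8s service)
    ip daddr 127.0.0.1 tcp dport 8080 counter dnat to 10.0.2.2
    ip daddr 127.0.0.1 tcp dport 8081 counter dnat to 10.0.2.2
}

chain KUBEVIRT_POSTINBOUND {
    ip saddr 127.0.0.6 counter snat to 10.0.2.1
    tcp dport { 8080, 8081 } counter snat to 10.0.2.1
}
```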
Understood, thanks for the explanation.
Since this new rule and the next one are practically the same, I would add them in a for loop (iterating over the possible loopback addresses) instead of copy-pasting.
> I would add them in a for loop
Done
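The suggested consolidation might look roughly like this (a sketch only: `buildSnatRules` and the rule fields are illustrative stand-ins, not KubeVirt's actual API):

```go
package main

import "fmt"

// buildSnatRules sketches the reviewer's suggestion: rather than
// copy-pasting two nearly identical SNAT rules, iterate over the possible
// loopback source addresses (127.0.0.1 for regular localhost traffic,
// 127.0.0.6 for Envoy passthrough traffic) and emit one rule per address.
func buildSnatRules(loopbackAddresses []string) [][]string {
	var rules [][]string
	for _, addr := range loopbackAddresses {
		rules = append(rules, []string{
			"nat", "KUBEVIRT_POSTINBOUND",
			"ip", "saddr", addr,
			"counter", "snat", "to", "10.0.2.1",
		})
	}
	return rules
}

func main() {
	for _, rule := range buildSnatRules([]string{"127.0.0.1", "127.0.0.6"}) {
		fmt.Println(rule)
	}
}
```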
I agree with Alon; an e2e test for Istio is missing in general and therefore for this specific use case as well.
Force-pushed 8856e81 to 83ed698
@AlonaKaplan Added e2e tests for inbound traffic scenarios. I tested them locally, but since provider 1.20 or higher is required for them to run, these tests are currently not executed in CI.
Force-pushed 83ed698 to 146aef8
```diff
@@ -1113,6 +1113,18 @@ func (b *MasqueradeBindMechanism) getVifIpByProtocol(proto iptables.Protocol) string {
 	}
 }
 
+func (b *MasqueradeBindMechanism) getDstAddressesToDnat(proto iptables.Protocol) (string, error) {
```
Could you define this function after the caller?
Moved both of these, thanks.
Force-pushed 658811e to af0422b
```diff
 		}
 	}
 	return nil
 }
 
+func getSrcAddressesToSnat(vmi *v1.VirtualMachineInstance, proto iptables.Protocol) string {
```
Is there a reason why this is not a method of this particular bind mech? For instance, `getDstAddressesToDnat` (you're adding it a bit later on) is a method. (Not saying it must be.)
No reason in particular; I didn't add the masqueradeBind receiver because it's not used. Similarly, `getLoopbackAddress` doesn't have it, although I see that this method is more specific to MasqueradeBind. Do you think it should be added to express the relation?
Well, you don't use the receiver, but you could - instead of using the vmi input parameter, you could use the member of the receiver struct.
Anyway. These are peanuts.
That's a good point actually, I missed that. I like the idea of these methods having the same parameters.
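The shape being discussed can be sketched like this (all types and fields here are simplified stand-ins for KubeVirt's real ones, shown only to illustrate reading the VMI from the receiver instead of a parameter):

```go
package main

import "fmt"

// Simplified stand-ins for the real KubeVirt types.
type VirtualMachineInstance struct {
	Annotations map[string]string
}

type MasqueradeBindMechanism struct {
	vmi *VirtualMachineInstance
}

// As a method, getSrcAddressesToSnat can read b.vmi directly instead of
// taking the VMI as an input parameter, mirroring getDstAddressesToDnat.
func (b *MasqueradeBindMechanism) getSrcAddressesToSnat() []string {
	// With Istio injected, Envoy passthrough traffic comes from 127.0.0.6,
	// so both loopback addresses need to be SNATed.
	if b.vmi.Annotations["sidecar.istio.io/inject"] == "true" {
		return []string{"127.0.0.1", "127.0.0.6"}
	}
	return []string{"127.0.0.1"}
}

func main() {
	b := &MasqueradeBindMechanism{vmi: &VirtualMachineInstance{
		Annotations: map[string]string{"sidecar.istio.io/inject": "true"},
	}}
	fmt.Println(b.getSrcAddressesToSnat())
}
```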
Force-pushed af0422b to beac357
Thanks again.
/retest
2 similar comments
/retest
/retest
This commit introduces changes to the masquerade binding method that allow correct Istio proxy sidecar functionality. Only VMIs with an explicitly specified list of ports are considered in this commit. There are two issues preventing the Istio proxy from working:

1) The dnat rule that forwards traffic to the VM is matched at prerouting, which causes the inbound traffic to bypass the proxy. To avoid that, the prerouting dnat rule is not appended if Istio is being used, which is decided based on the VMI having the "sidecar.istio.io/inject" annotation set to "true".

2) The second issue is related to how the Istio proxy treats traffic targeting a port that is declared in an associated k8s service vs traffic targeting a port which is not. Traffic that targets a port declared in a k8s service is sent by the proxy from 127.0.0.1 to 127.0.0.1:port, which is matched by the existing rules. However, if the traffic targets a port not declared in an associated k8s service, the Istio proxy sends the traffic from 127.0.0.6 to podIP:port. For these situations we don't have appropriate rules, and so this traffic is not forwarded to the VM. This commit adds 2 additional rules matching traffic that originates from 127.0.0.6 in the output and KUBEVIRT_POSTINBOUND chains.

As per discussion kubevirt#5200 (comment), there was an attempt to match the destination address instead, to avoid potentially matching external traffic, but then we realized that this comes with a consequence: the KUBEVIRT_POSTINBOUND snat rule would match too, causing the source address of traffic that reaches the VM from the pod network to be 10.0.2.1, which is not desired.
To illustrate the inbound routing process with the Istio proxy involved, the following diagram depicts proxying a packet that targets a port declared in a k8s service for the VMI:

```
Inbound traffic destined to port 8080, declared in k8s svc

k8s network
    │
    ▼  (prerouting hook)
prerouting ── iifname "eth0" counter jump KUBEVIRT_PREINBOUND
    │
    ▼
KUBEVIRT_PREINBOUND          <empty>

PREROUTING ── meta l4proto tcp counter jump ISTIO_INBOUND   (prerouting hook)
    │
    ▼
ISTIO_INBOUND ── meta l4proto tcp counter jump ISTIO_IN_REDIRECT
    │
    ▼
ISTIO_IN_REDIRECT            tcp counter redirect to :15006
    │
    ▼
ENVOY PROCESS                sends the packet from 127.0.0.1 to 127.0.0.1:8080
    │
    ▼
output                       ip daddr 127.0.0.1 tcp dport 8080 dnat to 10.0.2.2
    │
    ▼
postrouting ── oifname "k6t-eth0" counter jump KUBEVIRT_POSTINBOUND
    │
    ▼
KUBEVIRT_POSTINBOUND         tcp dport 8080 ip daddr 10.0.2.2 snat to 10.0.2.1
```

As you can see, in KUBEVIRT_PREINBOUND we don't dnat, which allows us to get to ISTIO_IN_REDIRECT, which redirects the packet to port 15006, where the proxy listens for inbound traffic. The proxy sends the packet from localhost to localhost:8080. (Information about the original dport is retrieved from the socket option SO_ORIGINAL_DST.) After that, the src and dst IP addresses are changed at the output and postrouting chains to forward the proxied packet to the VM.

Another scenario, where the inbound traffic targets a port undeclared in a k8s service (8081, for example), is quite similar. The difference is that the proxy sends the packet from 127.0.0.6 to the pod IP address:

```
ENVOY PROCESS                sends the packet from 127.0.0.6 to podIP:8081
```

This traffic sent from 127.0.0.6 is matched by the rules added to output and KUBEVIRT_POSTINBOUND.

Signed-off-by: Radim Hrazdil <rhrazdil@redhat.com>
Force-pushed beac357 to 2f31699
OK; I now see that my comments about exporting the SNAT/DNAT operations into functions no longer apply, since you now handle them differently. You can ignore those comments and focus on the matchers (gomega) one instead.
```go
err = b.handler.NftablesAppendRule(proto, "nat", "KUBEVIRT_PREINBOUND",
	strings.ToLower(port.Protocol),
	"dport",
	strconv.Itoa(int(port.Port)),
	"counter", "dnat", "to", b.getVifIpByProtocol(proto))
if err != nil {
	return err
}
```
Would you export this to a function? `performDNATOnPreInbound` or something like that?
```go
err = b.handler.NftablesAppendRule(proto, "nat", "output",
	b.handler.GetNFTIPString(proto), "saddr", getEnvoyLoopbackAddress(),
```
Same here; would you export this to a function? `performDNATForEnvoyTrafficOnOutputChain` or something similar. Every time you rebase, I have to read this again and re-learn what it does. I think encapsulating this in a function with a meaningful name would be helpful. Also for the SNAT performed below.
tests/libnet/namespace.go (outdated)
```go
. "github.com/onsi/gomega"
```
This import is fishy; helpers shouldn't use gomega/ginkgo. Why would you need matchers in this file? I see that you remove the import in the next commit; please squash that 'removal' into this one.
There were forgotten `Expect` calls left in by mistake, which is why the import remained. I have squashed the e2e test commit with the commit that refactored the namespace helpers; there's no need to have that commit separated.
Force-pushed 2f31699 to 622b1e4
Good to go!
Added test scenarios that cover the following cases for VMIs with an explicitly specified list of ports:

1) Inbound traffic
- Test traffic is correctly routed to the VM
  - svc running on a port declared in k8s svc is reachable
  - svc running on a port NOT declared in k8s svc is reachable
- With PeerAuthentication (Istio resource) set to only accept mTLS traffic (that is, only from clients within the mesh). These tests verify that traffic is routed through the proxy and non-TLS-encrypted traffic is dropped and doesn't reach the VMI
  - svc running on a port declared in k8s svc is not reachable
  - svc running on a port NOT declared in k8s svc is not reachable

2) Outbound traffic
- Test that an HTTP server outside of the mesh is reachable from the VMI
- With OutboundTrafficPolicy set to REGISTRY_ONLY (that is, external services (anything outside the mesh) must have an explicit ServiceEntry allowing outbound traffic to them). This test verifies that outbound traffic is proxied and doesn't get through:
  - Test that an HTTP server outside of the mesh is not reachable from the VMI

Tests require provider 1.20 or higher because the option to deploy Istio was added in kubevirtci 1.20. The KUBEVIRT_DEPLOY_ISTIO env var must be set to true to have Istio deployed; otherwise the tests are skipped.

Signed-off-by: Radim Hrazdil <rhrazdil@redhat.com>

Refactor Namespace utility functions

Move helper methods for handling nmstate labels to a separate file in libnet.

Signed-off-by: Radim Hrazdil <rhrazdil@redhat.com>
Instead of appending multiple identical rules for snatting that differ only in source address, add all addresses to match in a single rule. Signed-off-by: Radim Hrazdil <rhrazdil@redhat.com>
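Under nftables, such a consolidation can use an anonymous set of source addresses, roughly like this (illustrative rule text, not copied from the implementation):

```
# one SNAT rule matching both loopback sources instead of two rules
ip saddr { 127.0.0.1, 127.0.0.6 } counter snat to 10.0.2.1
```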
Currently the receiver struct for this method is podNIC. To have this method accessible from MasqueradeBindMechanism, I'm moving it to the NetworkHandler. Signed-off-by: Radim Hrazdil <rhrazdil@redhat.com>
This commit adds destination addresses to match for dnat in a single rule. By default, traffic destined to 127.0.0.1 is DNATed. With Istio, we additionally need to take care of passthrough traffic (traffic destined to ports not declared in a k8s service), which is sent from 127.0.0.6 to the pod IP. Since we're using the destination address to match traffic for dnatting, this commit reads the pod IP of the virt-launcher and adds it to the set of matched IP addresses if Istio is used. Signed-off-by: Radim Hrazdil <rhrazdil@redhat.com>
Force-pushed 622b1e4 to a6f38e0
Signed-off-by: Radim Hrazdil <rhrazdil@redhat.com>
What this PR does / why we need it:
Add specific NAT rules for inbound Envoy routing.

This PR adds nftables rules that correctly route traffic when an Istio sidecar is injected into the VMI. Iptables rules are intentionally not being added, for testability reasons: iptables is not tested in KubeVirt CI anymore, and nftables is required for this feature.

The Envoy proxy treats inbound traffic in the following manner:
- port declared in an associated k8s service: after proxying, the traffic is sent from 127.0.0.1 to 127.0.0.1:port
- port NOT declared in an associated k8s service: after proxying, the traffic is sent from 127.0.0.6 to podIP:port
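The behaviour in the two bullets above can be modeled in a few lines (a sketch for illustration only; `envoySourceAddress` is a made-up helper, not Istio or KubeVirt code):

```go
package main

import "fmt"

// envoySourceAddress models the inbound proxying behaviour described in
// this PR: Envoy forwards traffic destined to a port declared in the
// associated k8s Service from 127.0.0.1, and all other (passthrough)
// traffic from 127.0.0.6.
func envoySourceAddress(port int, servicePorts []int) string {
	for _, p := range servicePorts {
		if p == port {
			return "127.0.0.1"
		}
	}
	return "127.0.0.6"
}

func main() {
	servicePorts := []int{8080} // ports declared in the k8s Service
	fmt.Println(envoySourceAddress(8080, servicePorts)) // declared port
	fmt.Println(envoySourceAddress(8081, servicePorts)) // passthrough port
}
```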
Changes:
- e2e tests for VMI with Istio sidecar. Added test scenarios that cover the following cases for VMIs with an explicitly specified list of ports:

1) Inbound traffic
- Test traffic is correctly routed to the VM
- With PeerAuthentication (Istio resource) set to only accept mTLS traffic (that is, only from clients within the mesh). These tests verify that traffic is routed through the proxy and non-TLS-encrypted traffic is dropped and doesn't reach the VMI

2) Outbound traffic
- With OutboundTrafficPolicy set to REGISTRY_ONLY (that is, external services (anything outside the mesh) must have an explicit ServiceEntry allowing outbound traffic to them). This test verifies that outbound traffic is proxied and doesn't get through.

Tests require provider 1.20 or higher because the option to deploy Istio was added in kubevirtci 1.20. The KUBEVIRT_DEPLOY_ISTIO env var must be set to true to have Istio deployed; otherwise the tests are skipped.
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged): Fixes #
Special notes for your reviewer:
Design doc:
https://docs.google.com/document/d/1aBBCFBNxnOEQn3NH1ZiOcVnhHuiWmo3w451agmZxi_Q/edit
Important comments from the long conversation:
#5200 (comment)
Release note: