Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor iptables rules for NodePort and ExternalIP services #2002

Conversation

alexanderConstantinescu
Copy link
Collaborator

- What this PR does and why is it needed

This patch refactors the iptables rules setup for NodePort and ExternalIP specific services. Mainly it does the following three things:

  • It adds an additional filter on the node IP for NodePort type services, this fixes In local gw mode, creating a nodeport service inhibit north-south traffic on the same port #1981
  • Fixes small issues in the iptables setup w.r.t. dual-stack clusters
  • Removes the filter FORWARD setup for NodePort and ExternalIP services. That chain is used for routing packets which do not have the node IP as source nor destination IP, essentially just forwarding it to another node. However, for NodePort and ExternalIP services we just DNAT the packets on the gateway IP / Cluster IP (depending on the mode we're running in). The problem with adding NodePort and ExternalIP services to filter FORWARD is that for every packet that traverses a node, a lookup will be performed in the underlying iptables implementation in the kernel, which depending on the cluster: can have severe performance impacts on networking, see: https://bugzilla.redhat.com/show_bug.cgi?id=1923157 .

/assign @trozet

- Special notes for reviewers

- How to verify it

- Description for the changelog

@coveralls
Copy link

coveralls commented Feb 3, 2021

Coverage Status

Coverage decreased (-0.2%) to 55.023% when pulling 1ba1ce8 on alexanderConstantinescu:bugfix/restrictive-node-port-rules into 3d18f6c on ovn-org:master.

@danielmellado
Copy link
Collaborator

/assign danielmellado

@danielmellado danielmellado self-requested a review February 4, 2021 16:19
@alexanderConstantinescu alexanderConstantinescu force-pushed the bugfix/restrictive-node-port-rules branch 2 times, most recently from bcc5af6 to fc33441 Compare February 4, 2021 23:44
@alexanderConstantinescu
Copy link
Collaborator Author

/retest

@JacobTanenbaum
Copy link
Contributor

JacobTanenbaum commented Feb 5, 2021

not sure if this is the best place to ask this but there are a bunch of functions that return []*net.IIPNet when they really return an ipv4 and ipv6 address, are there any plans to have multiple interfaces in an ip family? should we be looking into just returning (ipv4, ipv6 *net..IPNet)? -- should an issue be opened for this?

@alexanderConstantinescu
Copy link
Collaborator Author

not sure if this is the best place to ask this but there are a bunch of functions that return []*net.IIPNet when they really return an ipv4 and ipv6 address, are there any plans to have multiple interfaces in an ip family? should we be looking into just returning (ipv4, ipv6 *net..IPNet)? -- should an issue be opened for this?

The convention in most places all throughout our code base is to return []*net.IIPNet containing an IPv4 and IPv6 version, and later use util.MatchIPFamily or util.MatchIPNetFamily to retrieve the specific IP version needed further down the execution. This is an active approach this repo has taken when it comes to handling dual-stack arguments, so it's not an issue.

go-controller/pkg/node/gateway_iptables.go Outdated Show resolved Hide resolved
go-controller/pkg/node/gateway_iptables.go Outdated Show resolved Hide resolved
go-controller/pkg/node/gateway_iptables.go Outdated Show resolved Hide resolved
go-controller/pkg/node/gateway_iptables.go Show resolved Hide resolved
go-controller/pkg/node/gateway_init.go Outdated Show resolved Hide resolved
@alexanderConstantinescu alexanderConstantinescu force-pushed the bugfix/restrictive-node-port-rules branch 3 times, most recently from 8827918 to 2cc7145 Compare February 24, 2021 18:36
Copy link
Contributor

@trozet trozet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall lgtm. Question about upgrade inline and wondering why dual stack conversion failed for shared gw mode

go-controller/pkg/node/gateway_localnet.go Show resolved Hide resolved
@alexanderConstantinescu
Copy link
Collaborator Author

overall lgtm. Question about upgrade inline and wondering why dual stack conversion failed for shared gw mode

So the dual-stack conversion passed, the dual-stack tests executed right after didn't (I didn't understand how to check the output and didn't have time to really dig in)....but I don't understand how dual-stack can even work for node port and external IP services without my second commit in this patch, i.e: how did that ever pass?

@aojea
Copy link
Contributor

aojea commented Feb 25, 2021

So the dual-stack conversion passed, the dual-stack tests executed right after didn't (I didn't understand how to check the output and didn't have time to really dig in)....but I don't understand how dual-stack can even work for node port and external IP services without my second commit in this patch, i.e: how did that ever pass?

the failing dual-stack job conversion failed because the apiserver pod didn't come up , is unrelated to the PR and to OVN ... I´ll have to see if is a real bug or just that the conversion script is too agressive

@alexanderConstantinescu alexanderConstantinescu force-pushed the bugfix/restrictive-node-port-rules branch 2 times, most recently from c22d9c9 to 9de9696 Compare February 25, 2021 17:58
@alexanderConstantinescu
Copy link
Collaborator Author

Last failed build is unrelated to this PR, I opened #2075 as a general tracker for flaking unit tests.

@alexanderConstantinescu
Copy link
Collaborator Author

/retest

1 similar comment
@alexanderConstantinescu
Copy link
Collaborator Author

/retest

go-controller/pkg/node/gateway_iptables.go Outdated Show resolved Hide resolved
go-controller/pkg/node/gateway_iptables.go Outdated Show resolved Hide resolved
Fixes issue: ovn-org#1981

This patch simplifies ovnkube-node to remove the notion of node IP.
This notion only existed for the iptables rules to be configured
correctly. By utilizing `--dst-type LOCAL` we however don't need
that anymore and can simplify our logic.

Signed-off-by: Alexander Constantinescu <aconstan@redhat.com>
While working on the previous commits in this patch, I realized
getGatewayIPTRules wasn't "dual-stack correct". This fixes that.

Signed-off-by: Alexander Constantinescu <aconstan@redhat.com>
Our iptables rules for NodePort and ExternalIP services were setting
up rules in filter FORWARD, which is a table used for routing packets
which don't have the node IP as neither source nore destination IP.

This is not needed for NodePort and ExternalIP. For NodePort services we
currently DNAT directly to the gatewayIP/clusterIP when a packet hits
a node, and this happens on every node, which thus, does not warrant
setting up filter FORWARD rules. In no case will node port packets
just be forwarded to another node.

For ExternalIP there is one case where routing external IP packets to
another node does occur, but we use routing table rules to do this. Which
means the filter FORWARD rules are not needed in that case either.

The problem with having NodePort and ExternalIP iptables rules in filter
FORWARD is that: every packet on the cluster going from one node to another
will be looked up in those chains. They will never match, but the lookup
can be exteremely costly on cluster with a lot of NodePort services defined,
see: https://bugzilla.redhat.com/show_bug.cgi?id=1923157

Also, remove NAT PREROUTING in shared gateway mode, since we configure OVS
flows on the shared gateway bridge to steer traffic into OVN thus rendering
the iptables rules superflous.

Signed-off-by: Alexander Constantinescu <aconstan@redhat.com>

Delete stale iptables jump rules for updates

On updates we need to remove the legacy jump rules for each
mode. This will handle that. Unfortunately there's no generic
way to do so depending on the services that we find, so we
still need them hard-coded. For posteriority's sake we should
really always have them, since we don't release ovn-kubernetes
and we never know when someone deploying ovn-kubernetes reaches
N+2 and can remove these.

Signed-off-by: Alexander Constantinescu <aconstan@redhat.com>
Signed-off-by: Alexander Constantinescu <aconstan@redhat.com>
@trozet trozet merged commit dad3799 into ovn-org:master Mar 3, 2021
OVN-K-Platform PR Reviews automation moved this from Tim Rozet to Merged or Closed Mar 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

In local gw mode, creating a nodeport service inhibit north-south traffic on the same port
7 participants