
Add support for EgressIP and ExternalTrafficPolicy=Local to coexist #4265

Merged

merged 9 commits into ovn-org:master from tssurya:fix-EIP-ETP-local on May 16, 2024

Conversation

tssurya (Member) commented Apr 9, 2024

- What this PR does and why is it needed

This PR fixes the broken interaction between the EgressIP and ETP=local features.
Currently, if an EIP-served pod is on a different node than the egress node and is
also a backend of an ETP=local service, then the reply packets for ingress service
traffic served by the EIP pod get rerouted to the egress node by the reroute
policies we add for EIPs. This breaks the service traffic.

This PR implements the solution we designed in https://issues.redhat.com/browse/FDP-42
to ensure these features work together seamlessly.

Solution:

Step 1: Add a QoS rule to the node switches:

from-lport 103 (ip4.src == $a4548040316634674295 && ct.trk && ct.rpl) mark=42
where a4548040316634674295 is the address set that contains all EIP-served pods.

Step 2: Add a new LRP at priority 102 that matches on the mark and simply "allows" the packet, thus skipping the reroute LRPs.

sample:

102 pkt.mark == 42 allow

However, we must make the mark changes ONLY for local-zone pods, not for remote-zone pods; otherwise traffic breaks, since the mark does not persist outside the local node.
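For illustration, here is a minimal Go sketch of the two rows described above. The struct types are stand-ins that only mirror the NB DB fields visible in the transaction logs further down in this PR; they are not the real generated nbdb models.

package main

import "fmt"

// Stand-in types mirroring the NB DB fields seen in this PR's logs;
// illustrative only, not the real ovn-kubernetes nbdb models.
type QoS struct {
	Direction string
	Priority  int
	Match     string
	Action    map[string]int
}

type LogicalRouterPolicy struct {
	Priority int
	Match    string
	Action   string
}

func main() {
	// The hashed address-set name from the sample above; it holds the
	// local-zone EIP-served pod IPs.
	const eipPodAddrSet = "$a4548040316634674295"
	const replyTrafficMark = 42

	// Step 1: node-switch QoS rule marking tracked reply packets
	// (ct.rpl) sourced from EIP-served pods.
	qos := QoS{
		Direction: "from-lport",
		Priority:  103,
		Match:     fmt.Sprintf("ip4.src == %s && ct.trk && ct.rpl", eipPodAddrSet),
		Action:    map[string]int{"mark": replyTrafficMark},
	}

	// Step 2: router policy at 102 that allows marked packets before
	// they can hit the 100-priority EIP reroute policies.
	lrp := LogicalRouterPolicy{
		Priority: 102,
		Match:    fmt.Sprintf("pkt.mark == %d", replyTrafficMark),
		Action:   "allow",
	}
	fmt.Printf("QoS: %+v\nLRP: %+v\n", qos, lrp)
}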

- Special notes for reviewers

  • The logic to add the QoS rules has been added to the syncNamespaces/initClusterPolicies function so that on startup all nodes get this rule.
  • It has also been added to the addNode event handler so that when a new node is added it gets this rule (see the sketch after this list).
  • It has not been added to the updateNode event handler since there is nothing to do during node updates.
  • It has also not been added to the deleteNode event handler since the switch goes away and its rules go away with it, unless a rule is still referenced by another switch.
  • The EgressSVC feature has not been added to this mix because there the reply traffic should go out with the SVC VIP, so it is unclear what the expected behaviour for that feature is; if that needs fixing in the future, it should be easy to do the same way.
  • Since this is a new LRP/QoS addition, it takes effect on upgrades.
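A minimal sketch of those hook points, assuming a hypothetical ensureReplyTrafficQoS helper; the real wiring lives in syncNamespaces/initClusterPolicies and the node event handlers.

package main

import "fmt"

// controller is a stand-in for the real network controller; it tracks
// which node switches already carry the mark QoS rule.
type controller struct {
	switchHasQoS map[string]bool
}

// ensureReplyTrafficQoS is idempotent, so it is safe to call both from
// the startup sync (all existing nodes) and from the addNode handler.
func (c *controller) ensureReplyTrafficQoS(nodeSwitch string) {
	if c.switchHasQoS[nodeSwitch] {
		return // nothing to do on repeat calls (e.g. node updates)
	}
	c.switchHasQoS[nodeSwitch] = true
	fmt.Println("added EgressIP-Mark-Reply-Traffic QoS rule to", nodeSwitch)
}

func main() {
	c := &controller{switchHasQoS: map[string]bool{}}
	// startup sync: every node present at startup gets the rule
	for _, sw := range []string{"node1", "node2"} {
		c.ensureReplyTrafficQoS(sw)
	}
	// addNode event for a new node
	c.ensureReplyTrafficQoS("node3")
	// deleteNode needs no handling: the switch and its rules go away
}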

- How to verify it

  1. Changes to the existing unit tests already cover the unit testing around this.
  2. An e2e test has been added for cross-feature testing between services and egress IPs.

The e2e test "Load Balancer Service Tests with MetalLB: Should ensure load balancer service works when ETP=local and backend pods are also egressIP served pods" runs on both the v4 and v6 lanes in LGW gateway mode, and here the egressNode is different from the node where the pod lives.

AND "should work on secondary node interfaces for ETP=local and ETP=cluster when backend pods are also served by EgressIP" runs in SGW mode on both v4 and v6.

- Description for the changelog
Support ETP=local with EIP

/label ci

netlify bot commented Apr 9, 2024

Deploy Preview for subtle-torrone-bb0c84 ready!

🔨 Latest commit: 7301aeb
🔍 Latest deploy log: https://app.netlify.com/sites/subtle-torrone-bb0c84/deploys/662a7cef61943b00087a4636
😎 Deploy Preview: https://deploy-preview-4265--subtle-torrone-bb0c84.netlify.app

@tssurya added the labels feature/egress-ip (Issues related to EgressIP feature), kind/feature (All issues/PRs that are new features), and feature/services&endpoints (All issues related to the Services/Endpoints API) Apr 9, 2024
@tssurya tssurya added this to the v1.0.0 milestone Apr 9, 2024
tssurya (Member, Author) commented Apr 9, 2024

TODO: Add tests.

@tssurya force-pushed the fix-EIP-ETP-local branch 2 times, most recently from c9f861d to c635ee0 on April 15, 2024 13:19
coveralls commented Apr 15, 2024

Coverage Status

coverage: 52.738% (+0.07%) from 52.671% when pulling 7527fcf on tssurya:fix-EIP-ETP-local into f5d4dfb on ovn-org:master.

@tssurya force-pushed the fix-EIP-ETP-local branch 2 times, most recently from 271d731 to 28fb5ed on April 15, 2024 21:08
tssurya (Member, Author) commented Apr 15, 2024

TODO: Add tests.

Done! e2e and unit tests are in.
Let's see if the CI passes; aside from fixing upgrade handling, we are looking good here. Since we don't have indexes for LRPs and we rely on the match, I'd need to sort of sync-cleanup all stale pod routes during upgrades to update them to the new ones, ugh.

tssurya (Member, Author) commented Apr 16, 2024

@trozet PTAL, this one is ready for review!
Just one more thing to fix, as mentioned in the comment above, which I'll come back to in a second iteration as a new commit.

tssurya (Member, Author) commented Apr 22, 2024

I tested the secondary-network EIP test case on LGW, and it works as expected with this PR with no changes:

service traffic via eth1:

root@ovn-worker:/# conntrack -E | grep 36363                                                                                                          
    [NEW] tcp      6 120 SYN_SENT src=172.20.0.2 dst=172.20.0.3 sport=36363 dport=30432 [UNREPLIED] src=169.254.169.3 dst=172.20.0.2 sport=30432 dport=36363
    [NEW] tcp      6 120 SYN_SENT src=172.20.0.2 dst=169.254.169.3 sport=36363 dport=30432 [UNREPLIED] src=10.244.1.3 dst=172.20.0.2 sport=80 dport=36363 mark=2 zone=20
    [NEW] tcp      6 120 SYN_SENT src=172.20.0.2 dst=10.244.1.3 sport=36363 dport=80 [UNREPLIED] src=10.244.1.3 dst=172.20.0.2 sport=80 dport=36363 zone=24
 [UPDATE] tcp      6 60 SYN_RECV src=172.20.0.2 dst=172.20.0.3 sport=36363 dport=30432 src=169.254.169.3 dst=172.20.0.2 sport=30432 dport=36363    
 [UPDATE] tcp      6 432000 ESTABLISHED src=172.20.0.2 dst=172.20.0.3 sport=36363 dport=30432 src=169.254.169.3 dst=172.20.0.2 sport=30432 dport=36363 [ASSURED]
    [NEW] tcp      6 300 ESTABLISHED src=172.20.0.3 dst=172.20.0.2 sport=30432 dport=36363 [UNREPLIED] src=172.20.0.2 dst=172.20.0.3 sport=36363 dport=30432 mark=2 zone=64000
 [UPDATE] tcp      6 120 FIN_WAIT src=172.20.0.2 dst=172.20.0.3 sport=36363 dport=30432 src=169.254.169.3 dst=172.20.0.2 sport=30432 dport=36363 [ASSURED]
 [UPDATE] tcp      6 30 LAST_ACK src=172.20.0.2 dst=172.20.0.3 sport=36363 dport=30432 src=169.254.169.3 dst=172.20.0.2 sport=30432 dport=36363 [ASSURED]
 [UPDATE] tcp      6 120 TIME_WAIT src=172.20.0.2 dst=172.20.0.3 sport=36363 dport=30432 src=169.254.169.3 dst=172.20.0.2 sport=30432 dport=36363 [ASSURED]
[DESTROY] tcp      6 TIME_WAIT src=172.20.0.3 dst=172.20.0.2 sport=30432 dport=36363 src=172.20.0.2 dst=172.20.0.3 sport=36363 dport=30432 mark=2 zone=64000
[DESTROY] tcp      6 TIME_WAIT src=172.20.0.2 dst=172.20.0.3 sport=36363 dport=30432 src=169.254.169.3 dst=172.20.0.2 sport=30432 dport=36363 [ASSURED]
[DESTROY] tcp      6 TIME_WAIT src=172.20.0.2 dst=169.254.169.3 sport=36363 dport=30432 src=10.244.1.3 dst=172.20.0.2 sport=80 dport=36363 [ASSURED] mark=2 zone=20
[DESTROY] tcp      6 TIME_WAIT src=172.20.0.2 dst=10.244.1.3 sport=36363 dport=80 src=10.244.1.3 dst=172.20.0.2 sport=80 dport=36363 [ASSURED] zone=24

EIP traffic from pod to npclient:

root@ovn-worker:/# conntrack -E | grep icmp 
    [NEW] icmp     1 30 src=10.244.1.3 dst=172.20.0.2 type=8 code=0 id=7424 [UNREPLIED] src=172.20.0.2 dst=10.244.1.3 type=0 code=0 id=7424 zone=24
    [NEW] icmp     1 30 src=10.244.1.3 dst=172.20.0.2 type=8 code=0 id=7424 [UNREPLIED] src=172.20.0.2 dst=10.244.1.3 type=0 code=0 id=7424 zone=20
    [NEW] icmp     1 30 src=10.244.1.3 dst=172.20.0.2 type=8 code=0 id=7424 [UNREPLIED] src=172.20.0.2 dst=172.20.0.10 type=0 code=0 id=7424
    [NEW] icmp     1 30 src=172.20.0.10 dst=172.20.0.2 type=8 code=0 id=7424 [UNREPLIED] src=172.20.0.2 dst=172.20.0.10 type=0 code=0 id=7424 mark=2 zone=64000
 [UPDATE] icmp     1 30 src=10.244.1.3 dst=172.20.0.2 type=8 code=0 id=7424 src=172.20.0.2 dst=172.20.0.10 type=0 code=0 id=7424

I need to add e2e test coverage for this.

@tssurya force-pushed the fix-EIP-ETP-local branch 3 times, most recently from d48969d to f75c32d on April 23, 2024 19:11
@tssurya force-pushed the fix-EIP-ETP-local branch 4 times, most recently from d5105cb to ba0ef00 on April 24, 2024 16:33
@tssurya force-pushed the fix-EIP-ETP-local branch 2 times, most recently from f5533f4 to 7301aeb on April 25, 2024 15:55
tssurya (Member, Author) commented Apr 25, 2024

yayyyy I finally fixed all those 72 unit tests and my e2es are looking good

Review threads: go-controller/pkg/ovn/default_network_controller.go (outdated, resolved), go-controller/pkg/ovn/egressip.go (outdated, resolved), go-controller/pkg/types/const.go (resolved)
tssurya (Member, Author) commented May 14, 2024

@trozet I had to add client-side DB indexes for QoS rules because, when two nodes were added at the same time in a non-IC env, we ended up inserting duplicate QoS rules: nodes are handled in multiple threads, so two threads were trying to insert QoS rules at the same time, leading to a "multiple predicates found" libovsdb error.

2024/05/14 09:14:19 database/transaction/cache: "caller"={"file"="cache.go" "line"=1189} "level"=5 "msg"="inserting model" "table"="QoS" "uuid"="e3c08adb-fa2d-49d7-81e1-28915436db3e" "model"={"UUID"="e3c08adb-fa2d-49d7-81e1-28915436db3e" "Action"={"mark"=42} "Bandwidth"={} "Direction"="from-lport" "ExternalIDs"={"k8s.ovn.org/name"="EgressIP-Mark-Reply-Traffic" "priority"="103" "ip-family"="ip4" "k8s.ovn.org/id"="default-network-controller:EgressIP:103:EgressIP-Mark-Reply-Traffic:ip4" "k8s.ovn.org/owner-controller"="default-network-controller" "k8s.ovn.org/owner-type"="EgressIP"} "Match"="ip4.src == $a4548040316634674295 && ct.trk && ct.rpl" "Priority"=103}

2024/05/14 09:14:19 database/transaction/cache: "caller"={"file"="cache.go" "line"=1196} "level"=5 "msg"="updating model" "table"="Logical_Switch" "uuid"="74916f3a-7f7a-4701-b1c5-90212493ad1b" "old"={"UUID"="74916f3a-7f7a-4701-b1c5-90212493ad1b" "ACLs"=[] "Copp"=null "DNSRecords"=[] "ExternalIDs"={} "ForwardingGroups"=[] "LoadBalancer"=[] "LoadBalancerGroup"=[] "Name"="node2" "OtherConfig"={} "Ports"=[] "QOSRules"=[]} "new"={"UUID"="74916f3a-7f7a-4701-b1c5-90212493ad1b" "ACLs"=[] "Copp"=null "DNSRecords"=[] "ExternalIDs"={} "ForwardingGroups"=[] "LoadBalancer"=[] "LoadBalancerGroup"=[] "Name"="node2" "OtherConfig"={} "Ports"=[] "QOSRules"=["e3c08adb-fa2d-49d7-81e1-28915436db3e"]}

2024/05/14 09:14:19 cache: "caller"={"file"="cache.go" "line"=1189} "level"=5 "msg"="inserting model" "table"="QoS" "uuid"="e3c08adb-fa2d-49d7-81e1-28915436db3e" "model"={"UUID"="e3c08adb-fa2d-49d7-81e1-28915436db3e" "Action"={"mark"=42} "Bandwidth"={} "Direction"="from-lport" "ExternalIDs"={"ip-family"="ip4" "k8s.ovn.org/id"="default-network-controller:EgressIP:103:EgressIP-Mark-Reply-Traffic:ip4" "k8s.ovn.org/owner-controller"="default-network-controller" "k8s.ovn.org/owner-type"="EgressIP" "k8s.ovn.org/name"="EgressIP-Mark-Reply-Traffic" "priority"="103"} "Match"="ip4.src == $a4548040316634674295 && ct.trk && ct.rpl" "Priority"=103}

2024/05/14 09:14:19 cache: "caller"={"file"="cache.go" "line"=1196} "level"=5 "msg"="updating model" "table"="Logical_Switch" "uuid"="74916f3a-7f7a-4701-b1c5-90212493ad1b" "old"={"UUID"="74916f3a-7f7a-4701-b1c5-90212493ad1b" "ACLs"=[] "Copp"=null "DNSRecords"=[] "ExternalIDs"={} "ForwardingGroups"=[] "LoadBalancer"=[] "LoadBalancerGroup"=[] "Name"="node2" "OtherConfig"={} "Ports"=[] "QOSRules"=[]} "new"={"UUID"="74916f3a-7f7a-4701-b1c5-90212493ad1b" "ACLs"=[] "Copp"=null "DNSRecords"=[] "ExternalIDs"={} "ForwardingGroups"=[] "LoadBalancer"=[] "LoadBalancerGroup"=[] "Name"="node2" "OtherConfig"={} "Ports"=[] "QOSRules"=["e3c08adb-fa2d-49d7-81e1-28915436db3e"]}

2024/05/14 09:14:19 database/transaction/cache: "caller"={"file"="cache.go" "line"=1189} "level"=5 "msg"="inserting model" "table"="QoS" "uuid"="9d5d02e7-d269-4d4f-bfb4-6610d668d935" "model"={"UUID"="9d5d02e7-d269-4d4f-bfb4-6610d668d935" "Action"={"mark"=42} "Bandwidth"={} "Direction"="from-lport" "ExternalIDs"={"ip-family"="ip4" "k8s.ovn.org/name"="EgressIP-Mark-Reply-Traffic" "k8s.ovn.org/id"="default-network-controller:EgressIP:103:EgressIP-Mark-Reply-Traffic:ip4" "k8s.ovn.org/owner-controller"="default-network-controller" "k8s.ovn.org/owner-type"="EgressIP" "priority"="103"} "Match"="ip4.src == $a4548040316634674295 && ct.trk && ct.rpl" "Priority"=103}

2024/05/14 09:14:19 database/transaction/cache: "caller"={"file"="cache.go" "line"=1196} "level"=5 "msg"="updating model" "table"="Logical_Switch" "uuid"="a886b025-3d2f-4b6e-a198-5b04a1ba5301" "old"={"UUID"="a886b025-3d2f-4b6e-a198-5b04a1ba5301" "ACLs"=[] "Copp"=null "DNSRecords"=[] "ExternalIDs"={} "ForwardingGroups"=[] "LoadBalancer"=[] "LoadBalancerGroup"=[] "Name"="node1" "OtherConfig"={} "Ports"=[] "QOSRules"=[]} "new"={"UUID"="a886b025-3d2f-4b6e-a198-5b04a1ba5301" "ACLs"=[] "Copp"=null "DNSRecords"=[] "ExternalIDs"={} "ForwardingGroups"=[] "LoadBalancer"=[] "LoadBalancerGroup"=[] "Name"="node1" "OtherConfig"={} "Ports"=[] "QOSRules"=["9d5d02e7-d269-4d4f-bfb4-6610d668d935"]}

2024/05/14 09:14:19 cache: "caller"={"file"="cache.go" "line"=1189} "level"=5 "msg"="inserting model" "table"="QoS" "uuid"="9d5d02e7-d269-4d4f-bfb4-6610d668d935" "model"={"UUID"="9d5d02e7-d269-4d4f-bfb4-6610d668d935" "Action"={"mark"=42} "Bandwidth"={} "Direction"="from-lport" "ExternalIDs"={"k8s.ovn.org/name"="EgressIP-Mark-Reply-Traffic" "k8s.ovn.org/id"="default-network-controller:EgressIP:103:EgressIP-Mark-Reply-Traffic:ip4" "k8s.ovn.org/owner-controller"="default-network-controller" "k8s.ovn.org/owner-type"="EgressIP" "priority"="103" "ip-family"="ip4"} "Match"="ip4.src == $a4548040316634674295 && ct.trk && ct.rpl" "Priority"=103}

2024/05/14 09:14:19 cache: "caller"={"file"="cache.go" "line"=1196} "level"=5 "msg"="updating model" "table"="Logical_Switch" "uuid"="a886b025-3d2f-4b6e-a198-5b04a1ba5301" "old"={"UUID"="a886b025-3d2f-4b6e-a198-5b04a1ba5301" "ACLs"=[] "Copp"=null "DNSRecords"=[] "ExternalIDs"={} "ForwardingGroups"=[] "LoadBalancer"=[] "LoadBalancerGroup"=[] "Name"="node1" "OtherConfig"={} "Ports"=[] "QOSRules"=[]} "new"={"UUID"="a886b025-3d2f-4b6e-a198-5b04a1ba5301" "ACLs"=[] "Copp"=null "DNSRecords"=[] "ExternalIDs"={} "ForwardingGroups"=[] "LoadBalancer"=[] "LoadBalancerGroup"=[] "Name"="node1" "OtherConfig"={} "Ports"=[] "QOSRules"=["9d5d02e7-d269-4d4f-bfb4-6610d668d935"]}

The only other feature besides EgressIP using QoS rules on switches is EgressQoS. So, introducing client-side indexes means that from this version onwards the EgressQoS feature will start creating QoS rules in OVN with the new external IDs that are unique per QoS rule. That means when users upgrade from older versions with EgressQoS, there will be stale leftovers which I don't plan to clean up in this PR. Since the EgressQoS feature is not GA yet and is only Dev Preview, to the point where even the API may change in a breaking way (including the name), I think it's OK to start using DB indexes for EQoS without worrying about updating older QoS rules. In other words, upgrades for QoS are not handled at the moment, so stale QoS rules without the new external IDs will be left behind, but that should be fine.
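For reference, a sketch of what "unique per QoS rule" means here: the "k8s.ovn.org/id" external ID seen in the logs above is built from the owning controller, owner type, priority, rule name, and IP family. buildQoSIndexKey below is a hypothetical helper, not the real dbIDs API.

package main

import (
	"fmt"
	"strings"
)

// buildQoSIndexKey is a hypothetical helper showing how a per-row unique
// index key can be derived; it reproduces the "k8s.ovn.org/id" value
// visible in the transaction logs above.
func buildQoSIndexKey(controller, ownerType string, priority int, name, ipFamily string) string {
	return strings.Join([]string{
		controller, ownerType, fmt.Sprint(priority), name, ipFamily,
	}, ":")
}

func main() {
	fmt.Println(buildQoSIndexKey("default-network-controller", "EgressIP",
		103, "EgressIP-Mark-Reply-Traffic", "ip4"))
	// prints: default-network-controller:EgressIP:103:EgressIP-Mark-Reply-Traffic:ip4
}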

tssurya (Member, Author) commented May 14, 2024

Adding indexes didn't work, actually. The race was at the cache level: since two threads try to insert at the same time, we hit some hiccups. I have instead used a lock now (I kept the new commit for dbindexes, but we can also move that out of this PR if that's too much).
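A minimal sketch of the lock-based fix, under the assumption that the lookup-then-insert pair must be serialized because the libovsdb cache can lag behind in-flight transactions; the names here are stand-ins for the real nodeUpdateMutex wiring.

package main

import (
	"fmt"
	"sync"
)

// qosStore stands in for the libovsdb cache plus transaction layer.
type qosStore struct {
	mu   sync.Mutex // plays the role of the nodeUpdateMutex
	rows map[string]bool
}

// ensure serializes the cache lookup and the insert, so two concurrent
// node-add threads cannot both miss the row and create duplicates.
func (s *qosStore) ensure(key string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.rows[key] { // cache lookup before insert
		return false
	}
	s.rows[key] = true
	return true
}

func main() {
	s := &qosStore{rows: map[string]bool{}}
	key := "default-network-controller:EgressIP:103:EgressIP-Mark-Reply-Traffic:ip4"
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ { // two node adds racing in a non-IC env
		wg.Add(1)
		go func() {
			defer wg.Done()
			if s.ensure(key) {
				fmt.Println("inserted the QoS row exactly once")
			}
		}()
	}
	wg.Wait()
}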

This commit renames the AddressSetIPFamilyKey to
IPFamilyKey so that in the next commit we can
use it beyond just AddressSets.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
This commit introduces a new LRP at priority 102
for EgressIP that matches on the 42-marked reply
packets and simply "allows" them, so that reply
traffic from egressIP pods never hits the reroute
policies at priority 100.

sample:
102 pkt.mark == 42  allow

libovsdb updates/creates LRPs on startup, so
upgrades are handled automatically. Also,
creating the policy is done one time at
startup via initClusterPolicies.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
@tssurya tssurya requested a review from a team as a code owner May 14, 2024 13:17
@tssurya tssurya requested a review from cathy-zhou May 14, 2024 13:17
This commit introduces DB indexing for the
QoS table. Note that I have not added a
qos_syncer or anything to clean up older
EQoS rules: this feature is not GA yet,
so I don't see a need to support upgrades,
and anyway we are planning major changes
to this feature going forward.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
tssurya (Member, Author) commented May 14, 2024

@trozet: I have addressed your comments, PTAL.
As I was addressing comment 1 (#4265 (comment)), where I was creating the QoS rules globally on startup and am now only doing it from node-add events, I ended up with a race condition: for two node adds we were trying to create the QoS rules at the same time from two different threads, and due to libovsdb cache lag we ended up with multiple objects. So I am now using the nodeUpdateMutex to lock and perform the updates. I have also added the logic for the cache lookup like we discussed yesterday. Those are the new changes in the recent push, in commit a2d55e4.

Unintentionally I added the dbindexes commit for EQoS (07fa4d1), which is probably not really needed ATM; if you prefer to keep that commit as a separate PR, I can cut it out.

This commit introduces QoS rules on every node's switch
to mark packets emerging from EIP pods if they are
reply packets. These are the packets that should not be
rerouted to another node in the cluster. So basically we
mark the SYNACK packets that are matched on by the LRP we
saw in the previous commit; SYN packets are not marked,
since those are egress traffic from EIP pods.

sample:
from-lport   103 (ip4.src==$a4548040316634674295 && ct.trk && ct.rpl) mark=42

This lets these two features (ETP=local + EIP)
be used at the same time.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
This commit adds an e2e test for primary EIPs
and ETP=local cross-feature testing.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
This commit adds cross-feature testing for
ETP=local and ETP=cluster with secondary-NIC
EIPs.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
@trozet trozet merged commit 13c333a into ovn-org:master May 16, 2024
35 checks passed