openvswitch: ovs-system: deferred action limit reached, drop recirc action #153

Closed
syedammad83 opened this issue Sep 21, 2022 · 11 comments

@syedammad83

Hi,

I am using OVN 22.03 with OVS 2.17.2 on Ubuntu 22.04. OVN is integrated with OpenStack Neutron, using the 20.2 Yoga release.

I am getting the following messages in the dmesg output of the gateway chassis.

[Wed Sep 21 16:54:59 2022] openvswitch: ovs-system: deferred action limit reached, drop recirc action
[Wed Sep 21 16:55:05 2022] openvswitch: ovs-system: deferred action limit reached, drop recirc action
[Wed Sep 21 16:55:09 2022] openvswitch: ovs-system: deferred action limit reached, drop recirc action
[Wed Sep 21 16:55:14 2022] openvswitch: ovs-system: deferred action limit reached, drop recirc action
[Wed Sep 21 16:55:17 2022] openvswitch: ovs-system: deferred action limit reached, drop recirc action
[Wed Sep 21 16:55:19 2022] openvswitch: ovs-system: deferred action limit reached, drop recirc action
[Wed Sep 21 16:55:22 2022] openvswitch: ovs-system: deferred action limit reached, drop recirc action
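
To get a feel for how often the message fires, the log lines can be tallied per minute. The following is a minimal sketch, assuming the human-readable timestamp format shown in the log above (as printed by `dmesg -T`); the function name `count_per_minute` is illustrative and not part of any OVS tooling.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the kernel message quoted in this report, preceded by a
# human-readable timestamp such as "[Wed Sep 21 16:54:59 2022]".
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+openvswitch: (?P<dp>\S+): "
    r"deferred action limit reached, drop recirc action$"
)


def count_per_minute(lines):
    """Return a Counter mapping 'YYYY-MM-DD HH:MM' to the number of matching lines."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # ignore unrelated kernel messages
        ts = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")
        counts[ts.strftime("%Y-%m-%d %H:%M")] += 1
    return counts


sample = [
    "[Wed Sep 21 16:54:59 2022] openvswitch: ovs-system: deferred action limit reached, drop recirc action",
    "[Wed Sep 21 16:55:05 2022] openvswitch: ovs-system: deferred action limit reached, drop recirc action",
    "[Wed Sep 21 16:55:09 2022] openvswitch: ovs-system: deferred action limit reached, drop recirc action",
]
for minute, n in sorted(count_per_minute(sample).items()):
    print(minute, n)
```

In practice the input would be the lines of saved dmesg output rather than the hard-coded sample.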

I am seeing a lot of these messages. Digging further, the issue can be reproduced by sending any kind of traffic to the logical router's external gateway port IP address (which is the router's SNAT IP address). Whenever I enter the SNAT IP in a browser and press Enter, the messages start to show up.

Is this a bug, and will it cause any trouble for traffic?

I am happy to provide any further details needed.

Ammad

@zhanrox2

Hi @syedammad83, I have run into the same problem. Has yours been solved, and if so, how?

@hzhou8
Collaborator

hzhou8 commented Sep 23, 2022

Could you provide output of ovs-appctl dpctl/dump-flows when this is happening?

@hzhou8
Collaborator

hzhou8 commented Sep 23, 2022

It seems this problem was discussed before and never resolved? https://mail.openvswitch.org/pipermail/ovs-discuss/2021-August/051353.html
I also recall a problem that's fixed in OVS even earlier that had the same symptom. Not sure if it is something similar to that problem: openvswitch/ovs@29b1dd9
The DP flow output should tell more information. At the same time, it would be helpful if you could provide the NB-DB dumps and SB lflow dumps (ovn-sbctl lflow-list). ovn-trace output for the problematic trigger packet would be even better.

@zhanrox2

@hzhou8 The problem looks like this one: #134.
The main problem is that TCP/UDP traffic sent to the address of an LRP port, and not part of any SNAT/DNAT conversation, keeps recirculating in the OVS data plane until its TTL reaches 0.
The message is shown in the kernel log due to the size of the FIFO "DEFERRED_ACTION_FIFO_SIZE", but this is a consequence of the packets not matching the flow tables of the datapath. See kernel - net/openvswitch/actions.c
Basically, it only happens when there is a SNAT rule to translate an entire network (masquerade) and the return traffic does not have an open port.
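
The looping behaviour described above can be pictured with a toy model. This is only a sketch and not OVS or OVN code: `simulate_loop`, `recirc_budget`, and `recirc_per_pass` are hypothetical names, and the budget of 10 in the usage lines is an assumed value chosen for illustration, not a figure taken from this thread. The model assumes that every pass through the router pipeline decrements the TTL and consumes some recirculations, and that the packet stops either when the TTL can no longer be decremented or when the recirculation budget is exhausted.

```python
def simulate_loop(ttl, recirc_budget, recirc_per_pass=1):
    """Toy model of a packet that keeps re-entering the router pipeline.

    ttl             -- IP TTL of the packet when it first arrives
    recirc_budget   -- hypothetical cap on recirculations for one packet
    recirc_per_pass -- hypothetical number of recirculations per pipeline pass
    Returns (passes, reason), where reason names what finally stopped the packet.
    """
    passes = 0
    recircs = 0
    while True:
        if ttl <= 1:
            return passes, "ttl expired"
        if recircs + recirc_per_pass > recirc_budget:
            return passes, "recirc limit reached"
        ttl -= 1                    # routing decrements the TTL on every pass
        recircs += recirc_per_pass  # each pass costs some recirculations
        passes += 1


# A packet with a typical initial TTL runs out of recirculation budget first.
print(simulate_loop(ttl=64, recirc_budget=10))
# A packet with a very small TTL expires before the budget is used up.
print(simulate_loop(ttl=3, recirc_budget=10))
```

Either way the packet is eventually dropped, but only after it has been processed several times for no useful result.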

@syedammad83
Author

flow1.txt
flow3.txt
flow2.txt

Above is the output of ovs-appctl dpctl/dump-flows | grep 175.107.193.122

I grepped the dump for my test public IP address. I simply entered this IP in my browser and then dumped the flows.

@syedammad83
Author

# ovn-trace --no-friendly-names --ovs neutron-6cf49a42-ae18-4e2b-ae53-52e398025b6f 'inport == "lrp-020ebe69-fd1b-446f-8f02-29b88914184d" && ip4.dst == 175.107.193.12'
# ip,reg14=0x1,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=0.0.0.0,nw_dst=175.107.193.12,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0

ingress(dp="d02fa4ae-8845-41e8-b65f-51ddcd4f4478", inport="lrp-020ebe69-fd1b-446f-8f02-29b88914184d")
-----------------------------------------------------------------------------------------------------
 0. lr_in_admission: no match (implicit drop)

I am seeing nothing in the ovn-trace output.

@hzhou8

@srelf-eschercloud

I am seeing the same issue, and it may also be affecting performance. Tagging myself. Please let me know if there is any debug info I can provide.

@r-acosta
Contributor

r-acosta commented Nov 9, 2022 via email

@hzhou8
Collaborator

hzhou8 commented Nov 9, 2022

Hi folks, thanks for your valuable information! Sorry that I didn't have time to look at this recently. I will try to reproduce it locally according to what you provided, and debug it next week or so.

ovsrobot pushed a commit to ovsrobot/ovn that referenced this issue Dec 20, 2022
When a packet enters the LR pipeline from a distributed gateway port
with the destination IP being a SNAT IP, it goes through the unSNAT stage,
and the unSNAT may fail to convert the dst IP when no conntrack entries
are associated with the packet. In this case, the packet is rerouted to
the same DGP, which results in a recirc loop in the datapath. The packet
would finally be dropped either due to ttl or the recirc limit, but it
would have created unnecessary cost.

To reproduce the problem, simply configure SNAT on a LR with the SNAT IP
being the DGP's IP, and then send a packet from external (DGP's LS) to
the SNAT IP. Kernel logs like below will be seen:

openvswitch: ovs-system: deferred action limit reached, drop recirc action

DP flow dump would also show plenty of flows related to this packet,
each with a different ttl match, indicating the packet has been looped
many times.

Commit 802f927 (ovn-northd: Drop IP packets destined to router owned
IPs (after NAT)) already added flows to drop packets that failed unSNAT
for Gateway Routers. It added flows with a low priority (2) to drop the
packets that fail ARP resolve, to avoid triggering ARP requests for the
SNAT IPs. However, for the DGP case, to support E/W NAT, ARP resolve
flows are added for those NAT IPs so that the packets can continue the
pipeline and possibly be redirected to the redirect chassis. So, because
of these ARP resolve flows, even packets that failed unSNAT would
continue the pipeline and would not hit the low priority (2) flows, and
thus would not get dropped.

To fix the problem, for each ARP resolve flow added for the DGP NAT
IPs, a higher priority (150) flow is added that checks whether the
packet's inport is the DGP (same as the outport) and, if so, drops the
packet directly.

Test cases are updated to cover both Gateway Router and DGP scenarios,
with packets from both directions (uplink and downlink).

Reported-by: Krzysztof Klimonda <kklimonda@syntaxhighlighted.com>
Reported-at: https://patchwork.ozlabs.org/project/ovn/patch/20210816085206.69170-1-kklimonda@syntaxhighlighted.com/
Reported-by: Frode Nordahl <frode.nordahl@canonical.com>
Reported-at: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967718
Reported-by: Roberto Bartzen Acosta <rbartzen@gmail.com>
Reported-at: ovn-org#134
Reported-by: Syed Ammad Ali <syedammad83@gmail.com>
Reported-at: ovn-org#153
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2021-August/051340.html
Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: 0-day Robot <robot@bytheb.org>
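
The fix described in the commit message above relies on flow priority: a priority-150 drop flow takes precedence over the ARP resolve flow when the packet's inport equals its outport. The sketch below is a toy priority lookup and not OVN code. The port names, the example SNAT IP (from the documentation range 203.0.113.0/24), the `nexthop` field, and the priority of 100 for the ARP resolve flow are assumptions made for illustration; only the priorities 150 and 2 come from the commit message.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

DGP = "lrp-ext"           # hypothetical name of the distributed gateway port
SNAT_IP = "203.0.113.10"  # hypothetical SNAT IP (documentation address range)


@dataclass
class Flow:
    priority: int
    match: Callable[[Dict[str, str]], bool]
    action: str


def lookup(table: List[Flow], pkt: Dict[str, str]) -> str:
    """Return the action of the highest-priority flow that matches pkt."""
    for flow in sorted(table, key=lambda f: f.priority, reverse=True):
        if flow.match(pkt):
            return flow.action
    return "drop (no match)"


# ARP resolve flow for the NAT IP (priority 100 is an assumed value).
arp_resolve = Flow(
    100,
    lambda p: p["outport"] == DGP and p["nexthop"] == SNAT_IP,
    "resolve MAC and continue",
)
# Low-priority (2) drop for packets that fail ARP resolve.
low_prio_drop = Flow(2, lambda p: p["nexthop"] == SNAT_IP, "drop")
# The added priority-150 flow: inport is the DGP, same as the outport.
dgp_drop = Flow(
    150,
    lambda p: p["inport"] == DGP and p["outport"] == DGP and p["nexthop"] == SNAT_IP,
    "drop",
)

before_fix = [arp_resolve, low_prio_drop]
after_fix = before_fix + [dgp_drop]

from_outside = {"inport": DGP, "outport": DGP, "nexthop": SNAT_IP}
from_inside = {"inport": "lrp-int", "outport": DGP, "nexthop": SNAT_IP}

print(lookup(before_fix, from_outside))  # ARP resolve flow wins, packet loops
print(lookup(after_fix, from_outside))   # priority-150 flow drops the packet
print(lookup(after_fix, from_inside))    # E/W NAT traffic is unaffected
```

In this model the low-priority drop is never reached for the DGP case because the ARP resolve flow always matches first, which mirrors the reasoning in the commit message.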
@hzhou8
Collaborator

hzhou8 commented Dec 20, 2022

ovsrobot pushed a commit to ovsrobot/ovn that referenced this issue Jan 11, 2023
dceara pushed a commit to dceara/ovn that referenced this issue Jan 19, 2023
hzhou8 added a commit that referenced this issue Jan 19, 2023
Acked-by: Dumitru Ceara <dceara@redhat.com>
hzhou8 added a commit that referenced this issue Jan 19, 2023
(cherry picked from commit 8c341b9)
@hzhou8
Collaborator

hzhou8 commented Jan 19, 2023

The above fix is merged to main and branch-22.12. Closing the issue.

hzhou8 closed this as completed Jan 19, 2023
putnopvut pushed a commit to putnopvut/ovn that referenced this issue Sep 13, 2023
putnopvut pushed a commit that referenced this issue Sep 26, 2023
putnopvut pushed a commit that referenced this issue Sep 26, 2023
putnopvut pushed a commit that referenced this issue Sep 26, 2023
putnopvut pushed a commit that referenced this issue Sep 26, 2023