New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ipf memleak #226
Comments
|
No - this leak looks to be a different issue - the ipf expiry timer is taking extremely long to kick in. There is a purge timer, and it should be running at 15s to purge the free list. However, look at what happens: 2021-09-29T17:41:07.491Z|00093|unixctl|DBG|replying with success, id=0: " Fragmentation Module Status Almost 1m later: Finally purged: Even worse, this timer is not user configurable - so the only way to fix is to adjust the code. I suspect that you see it look worse because now we clone a packet, so under high packet conditions, we hold even bigger buffers for frags. This isn't a leak, but it is a problem (you should see after quiescent period, the number of outstanding frags should go down). I will look into the purge timeout - it isn't configurable by the user and that would help here as well. |
|
No. I did another test which use iperf to send udp packet over 2000 bytes. My vm's MTU is 1500. After a while all frags have be completed. But the mempool did not get released. And the result is like The |
|
What is the output from ovs-appctl netdev-dpdk/get-mempool-info [netdev] ? |
|
Perhaps the issue is here: Please try with this patch and let me know |
|
Note - I applied this to a pre-liang version of the code, because it seems this leak has been present for a very long time (and is independent of the dnsteal flag - it is a completely missed branch and would happen anyway). |
|
https://patchwork.ozlabs.org/project/openvswitch/patch/20211005181844.734362-1-aconole@redhat.com/ This patch should address the outstanding buffer. Please confirm it works in your environment. |
Since 640d4db ("ipf: Fix a use-after-free error, ...") the ipf framework unconditionally allocates a new dp_packet to track individual fragments. This prevents a use-after-free. However, an additional issue was present - even when the packet buffer is cloned, if the ip fragment handling code keeps it, the original buffer is leaked during the refill loop. Even in the original processing code, the hardcoded dnsteal branches would always leak a packet buffer from the refill loop. This can be confirmed with valgrind: ==717566== 16,672 (4,480 direct, 12,192 indirect) bytes in 8 blocks are definitely lost in loss record 390 of 390 ==717566== at 0x484086F: malloc (vg_replace_malloc.c:380) ==717566== by 0x537BFD: xmalloc__ (util.c:137) ==717566== by 0x537BFD: xmalloc (util.c:172) ==717566== by 0x46DDD4: dp_packet_new (dp-packet.c:153) ==717566== by 0x46DDD4: dp_packet_new_with_headroom (dp-packet.c:163) ==717566== by 0x550AA6: netdev_linux_batch_rxq_recv_sock.constprop.0 (netdev-linux.c:1262) ==717566== by 0x5512AF: netdev_linux_rxq_recv (netdev-linux.c:1511) ==717566== by 0x4AB7E0: netdev_rxq_recv (netdev.c:727) ==717566== by 0x47F00D: dp_netdev_process_rxq_port (dpif-netdev.c:4699) ==717566== by 0x47FD13: dpif_netdev_run (dpif-netdev.c:5957) ==717566== by 0x4331D2: type_run (ofproto-dpif.c:370) ==717566== by 0x41DFD8: ofproto_type_run (ofproto.c:1768) ==717566== by 0x40A7FB: bridge_run__ (bridge.c:3245) ==717566== by 0x411269: bridge_run (bridge.c:3310) ==717566== by 0x406E6C: main (ovs-vswitchd.c:127) The fix is to delete the original packet when it isn't able to be reinserted into the packet batch. Subsequent valgrind runs show that the packets are not leaked from the batch any longer. Fixes: 640d4db ("ipf: Fix a use-after-free error, and remove the 'do_not_steal' flag.") Fixes: 4ea9669 ("Userspace datapath: Add fragmentation handling.") Reported-by: Wan Junjie <wanjunjie@bytedance.com> Reported-at: openvswitch/ovs-issues#226 Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: 0-day Robot <robot@bytheb.org>
|
Hi Aaron, |
|
Hi Aaron, Close it now. |
|
Thanks so much - would you consider replying upstream to the mailing list with a "tested-by" tag? |
Ah yes. Done. |
Since 640d4db ("ipf: Fix a use-after-free error, ...") the ipf framework unconditionally allocates a new dp_packet to track individual fragments. This prevents a use-after-free. However, an additional issue was present - even when the packet buffer is cloned, if the ip fragment handling code keeps it, the original buffer is leaked during the refill loop. Even in the original processing code, the hardcoded dnsteal branches would always leak a packet buffer from the refill loop. This can be confirmed with valgrind: ==717566== 16,672 (4,480 direct, 12,192 indirect) bytes in 8 blocks are definitely lost in loss record 390 of 390 ==717566== at 0x484086F: malloc (vg_replace_malloc.c:380) ==717566== by 0x537BFD: xmalloc__ (util.c:137) ==717566== by 0x537BFD: xmalloc (util.c:172) ==717566== by 0x46DDD4: dp_packet_new (dp-packet.c:153) ==717566== by 0x46DDD4: dp_packet_new_with_headroom (dp-packet.c:163) ==717566== by 0x550AA6: netdev_linux_batch_rxq_recv_sock.constprop.0 (netdev-linux.c:1262) ==717566== by 0x5512AF: netdev_linux_rxq_recv (netdev-linux.c:1511) ==717566== by 0x4AB7E0: netdev_rxq_recv (netdev.c:727) ==717566== by 0x47F00D: dp_netdev_process_rxq_port (dpif-netdev.c:4699) ==717566== by 0x47FD13: dpif_netdev_run (dpif-netdev.c:5957) ==717566== by 0x4331D2: type_run (ofproto-dpif.c:370) ==717566== by 0x41DFD8: ofproto_type_run (ofproto.c:1768) ==717566== by 0x40A7FB: bridge_run__ (bridge.c:3245) ==717566== by 0x411269: bridge_run (bridge.c:3310) ==717566== by 0x406E6C: main (ovs-vswitchd.c:127) The fix is to delete the original packet when it isn't able to be reinserted into the packet batch. Subsequent valgrind runs show that the packets are not leaked from the batch any longer. Fixes: 640d4db ("ipf: Fix a use-after-free error, and remove the 'do_not_steal' flag.") Fixes: 4ea9669 ("Userspace datapath: Add fragmentation handling.") Reported-by: Wan Junjie <wanjunjie@bytedance.com> Reported-at: openvswitch/ovs-issues#226 Signed-off-by: Aaron Conole <aconole@redhat.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Tested-by: Wan Junjie <wanjunjie@bytedance.com> Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Since 640d4db ("ipf: Fix a use-after-free error, ...") the ipf framework unconditionally allocates a new dp_packet to track individual fragments. This prevents a use-after-free. However, an additional issue was present - even when the packet buffer is cloned, if the ip fragment handling code keeps it, the original buffer is leaked during the refill loop. Even in the original processing code, the hardcoded dnsteal branches would always leak a packet buffer from the refill loop. This can be confirmed with valgrind: ==717566== 16,672 (4,480 direct, 12,192 indirect) bytes in 8 blocks are definitely lost in loss record 390 of 390 ==717566== at 0x484086F: malloc (vg_replace_malloc.c:380) ==717566== by 0x537BFD: xmalloc__ (util.c:137) ==717566== by 0x537BFD: xmalloc (util.c:172) ==717566== by 0x46DDD4: dp_packet_new (dp-packet.c:153) ==717566== by 0x46DDD4: dp_packet_new_with_headroom (dp-packet.c:163) ==717566== by 0x550AA6: netdev_linux_batch_rxq_recv_sock.constprop.0 (netdev-linux.c:1262) ==717566== by 0x5512AF: netdev_linux_rxq_recv (netdev-linux.c:1511) ==717566== by 0x4AB7E0: netdev_rxq_recv (netdev.c:727) ==717566== by 0x47F00D: dp_netdev_process_rxq_port (dpif-netdev.c:4699) ==717566== by 0x47FD13: dpif_netdev_run (dpif-netdev.c:5957) ==717566== by 0x4331D2: type_run (ofproto-dpif.c:370) ==717566== by 0x41DFD8: ofproto_type_run (ofproto.c:1768) ==717566== by 0x40A7FB: bridge_run__ (bridge.c:3245) ==717566== by 0x411269: bridge_run (bridge.c:3310) ==717566== by 0x406E6C: main (ovs-vswitchd.c:127) The fix is to delete the original packet when it isn't able to be reinserted into the packet batch. Subsequent valgrind runs show that the packets are not leaked from the batch any longer. Fixes: 640d4db ("ipf: Fix a use-after-free error, and remove the 'do_not_steal' flag.") Fixes: 4ea9669 ("Userspace datapath: Add fragmentation handling.") Reported-by: Wan Junjie <wanjunjie@bytedance.com> Reported-at: openvswitch/ovs-issues#226 Signed-off-by: Aaron Conole <aconole@redhat.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Tested-by: Wan Junjie <wanjunjie@bytedance.com> Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Since 640d4db ("ipf: Fix a use-after-free error, ...") the ipf framework unconditionally allocates a new dp_packet to track individual fragments. This prevents a use-after-free. However, an additional issue was present - even when the packet buffer is cloned, if the ip fragment handling code keeps it, the original buffer is leaked during the refill loop. Even in the original processing code, the hardcoded dnsteal branches would always leak a packet buffer from the refill loop. This can be confirmed with valgrind: ==717566== 16,672 (4,480 direct, 12,192 indirect) bytes in 8 blocks are definitely lost in loss record 390 of 390 ==717566== at 0x484086F: malloc (vg_replace_malloc.c:380) ==717566== by 0x537BFD: xmalloc__ (util.c:137) ==717566== by 0x537BFD: xmalloc (util.c:172) ==717566== by 0x46DDD4: dp_packet_new (dp-packet.c:153) ==717566== by 0x46DDD4: dp_packet_new_with_headroom (dp-packet.c:163) ==717566== by 0x550AA6: netdev_linux_batch_rxq_recv_sock.constprop.0 (netdev-linux.c:1262) ==717566== by 0x5512AF: netdev_linux_rxq_recv (netdev-linux.c:1511) ==717566== by 0x4AB7E0: netdev_rxq_recv (netdev.c:727) ==717566== by 0x47F00D: dp_netdev_process_rxq_port (dpif-netdev.c:4699) ==717566== by 0x47FD13: dpif_netdev_run (dpif-netdev.c:5957) ==717566== by 0x4331D2: type_run (ofproto-dpif.c:370) ==717566== by 0x41DFD8: ofproto_type_run (ofproto.c:1768) ==717566== by 0x40A7FB: bridge_run__ (bridge.c:3245) ==717566== by 0x411269: bridge_run (bridge.c:3310) ==717566== by 0x406E6C: main (ovs-vswitchd.c:127) The fix is to delete the original packet when it isn't able to be reinserted into the packet batch. Subsequent valgrind runs show that the packets are not leaked from the batch any longer. Fixes: 640d4db ("ipf: Fix a use-after-free error, and remove the 'do_not_steal' flag.") Fixes: 4ea9669 ("Userspace datapath: Add fragmentation handling.") Reported-by: Wan Junjie <wanjunjie@bytedance.com> Reported-at: openvswitch/ovs-issues#226 Signed-off-by: Aaron Conole <aconole@redhat.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Tested-by: Wan Junjie <wanjunjie@bytedance.com> Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Since 640d4db ("ipf: Fix a use-after-free error, ...") the ipf framework unconditionally allocates a new dp_packet to track individual fragments. This prevents a use-after-free. However, an additional issue was present - even when the packet buffer is cloned, if the ip fragment handling code keeps it, the original buffer is leaked during the refill loop. Even in the original processing code, the hardcoded dnsteal branches would always leak a packet buffer from the refill loop. This can be confirmed with valgrind: ==717566== 16,672 (4,480 direct, 12,192 indirect) bytes in 8 blocks are definitely lost in loss record 390 of 390 ==717566== at 0x484086F: malloc (vg_replace_malloc.c:380) ==717566== by 0x537BFD: xmalloc__ (util.c:137) ==717566== by 0x537BFD: xmalloc (util.c:172) ==717566== by 0x46DDD4: dp_packet_new (dp-packet.c:153) ==717566== by 0x46DDD4: dp_packet_new_with_headroom (dp-packet.c:163) ==717566== by 0x550AA6: netdev_linux_batch_rxq_recv_sock.constprop.0 (netdev-linux.c:1262) ==717566== by 0x5512AF: netdev_linux_rxq_recv (netdev-linux.c:1511) ==717566== by 0x4AB7E0: netdev_rxq_recv (netdev.c:727) ==717566== by 0x47F00D: dp_netdev_process_rxq_port (dpif-netdev.c:4699) ==717566== by 0x47FD13: dpif_netdev_run (dpif-netdev.c:5957) ==717566== by 0x4331D2: type_run (ofproto-dpif.c:370) ==717566== by 0x41DFD8: ofproto_type_run (ofproto.c:1768) ==717566== by 0x40A7FB: bridge_run__ (bridge.c:3245) ==717566== by 0x411269: bridge_run (bridge.c:3310) ==717566== by 0x406E6C: main (ovs-vswitchd.c:127) The fix is to delete the original packet when it isn't able to be reinserted into the packet batch. Subsequent valgrind runs show that the packets are not leaked from the batch any longer. Fixes: 640d4db ("ipf: Fix a use-after-free error, and remove the 'do_not_steal' flag.") Fixes: 4ea9669 ("Userspace datapath: Add fragmentation handling.") Reported-by: Wan Junjie <wanjunjie@bytedance.com> Reported-at: openvswitch/ovs-issues#226 Signed-off-by: Aaron Conole <aconole@redhat.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Tested-by: Wan Junjie <wanjunjie@bytedance.com> Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Since 640d4db ("ipf: Fix a use-after-free error, ...") the ipf framework unconditionally allocates a new dp_packet to track individual fragments. This prevents a use-after-free. However, an additional issue was present - even when the packet buffer is cloned, if the ip fragment handling code keeps it, the original buffer is leaked during the refill loop. Even in the original processing code, the hardcoded dnsteal branches would always leak a packet buffer from the refill loop. This can be confirmed with valgrind: ==717566== 16,672 (4,480 direct, 12,192 indirect) bytes in 8 blocks are definitely lost in loss record 390 of 390 ==717566== at 0x484086F: malloc (vg_replace_malloc.c:380) ==717566== by 0x537BFD: xmalloc__ (util.c:137) ==717566== by 0x537BFD: xmalloc (util.c:172) ==717566== by 0x46DDD4: dp_packet_new (dp-packet.c:153) ==717566== by 0x46DDD4: dp_packet_new_with_headroom (dp-packet.c:163) ==717566== by 0x550AA6: netdev_linux_batch_rxq_recv_sock.constprop.0 (netdev-linux.c:1262) ==717566== by 0x5512AF: netdev_linux_rxq_recv (netdev-linux.c:1511) ==717566== by 0x4AB7E0: netdev_rxq_recv (netdev.c:727) ==717566== by 0x47F00D: dp_netdev_process_rxq_port (dpif-netdev.c:4699) ==717566== by 0x47FD13: dpif_netdev_run (dpif-netdev.c:5957) ==717566== by 0x4331D2: type_run (ofproto-dpif.c:370) ==717566== by 0x41DFD8: ofproto_type_run (ofproto.c:1768) ==717566== by 0x40A7FB: bridge_run__ (bridge.c:3245) ==717566== by 0x411269: bridge_run (bridge.c:3310) ==717566== by 0x406E6C: main (ovs-vswitchd.c:127) The fix is to delete the original packet when it isn't able to be reinserted into the packet batch. Subsequent valgrind runs show that the packets are not leaked from the batch any longer. Fixes: 640d4db ("ipf: Fix a use-after-free error, and remove the 'do_not_steal' flag.") Fixes: 4ea9669 ("Userspace datapath: Add fragmentation handling.") Reported-by: Wan Junjie <wanjunjie@bytedance.com> Reported-at: openvswitch/ovs-issues#226 Signed-off-by: Aaron Conole <aconole@redhat.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Tested-by: Wan Junjie <wanjunjie@bytedance.com> Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Since 640d4db ("ipf: Fix a use-after-free error, ...") the ipf framework unconditionally allocates a new dp_packet to track individual fragments. This prevents a use-after-free. However, an additional issue was present - even when the packet buffer is cloned, if the ip fragment handling code keeps it, the original buffer is leaked during the refill loop. Even in the original processing code, the hardcoded dnsteal branches would always leak a packet buffer from the refill loop. This can be confirmed with valgrind: ==717566== 16,672 (4,480 direct, 12,192 indirect) bytes in 8 blocks are definitely lost in loss record 390 of 390 ==717566== at 0x484086F: malloc (vg_replace_malloc.c:380) ==717566== by 0x537BFD: xmalloc__ (util.c:137) ==717566== by 0x537BFD: xmalloc (util.c:172) ==717566== by 0x46DDD4: dp_packet_new (dp-packet.c:153) ==717566== by 0x46DDD4: dp_packet_new_with_headroom (dp-packet.c:163) ==717566== by 0x550AA6: netdev_linux_batch_rxq_recv_sock.constprop.0 (netdev-linux.c:1262) ==717566== by 0x5512AF: netdev_linux_rxq_recv (netdev-linux.c:1511) ==717566== by 0x4AB7E0: netdev_rxq_recv (netdev.c:727) ==717566== by 0x47F00D: dp_netdev_process_rxq_port (dpif-netdev.c:4699) ==717566== by 0x47FD13: dpif_netdev_run (dpif-netdev.c:5957) ==717566== by 0x4331D2: type_run (ofproto-dpif.c:370) ==717566== by 0x41DFD8: ofproto_type_run (ofproto.c:1768) ==717566== by 0x40A7FB: bridge_run__ (bridge.c:3245) ==717566== by 0x411269: bridge_run (bridge.c:3310) ==717566== by 0x406E6C: main (ovs-vswitchd.c:127) The fix is to delete the original packet when it isn't able to be reinserted into the packet batch. Subsequent valgrind runs show that the packets are not leaked from the batch any longer. Fixes: 640d4db ("ipf: Fix a use-after-free error, and remove the 'do_not_steal' flag.") Fixes: 4ea9669 ("Userspace datapath: Add fragmentation handling.") Reported-by: Wan Junjie <wanjunjie@bytedance.com> Reported-at: openvswitch/ovs-issues#226 Signed-off-by: Aaron Conole <aconole@redhat.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Tested-by: Wan Junjie <wanjunjie@bytedance.com> Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Hi,
This commit introduce a mempool memleak.
640d4db788eda96bb904abcfc7de2327107bafe1
If I keep sending first fragment as an attack, the mempool would be exhausted.
I think revert it and respect the dnsteal flag as @ Wang Liang mentioned in
https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/382098.html
would be better.
The text was updated successfully, but these errors were encountered: