[ten64 branch] Dataflow stops with multiple port traffic #17

Closed
mcbridematt opened this issue Mar 21, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@mcbridematt

Commit: 5f6b8b3

In my test suite, I run iperf3 through the FreeBSD host, which functions as a router.

For example:
iperf3 server <-> dpniX (FreeBSD) dpniX+1 <-> iperf3 client

The test system is another Ten64 running Linux which runs each iperf3 instance in a container with one of the ethX ports transferred into it.

The ports map one-to-one: dpni0 on FreeBSD <-> eth0 on the test system, dpni1 <-> eth1, dpni2 <-> eth2, etc.

(I'll publish the scripts another time, they need a bit of cleanup)

cat /etc/rc.conf
hostname="freebsd-ten64"
ifconfig_dpni0="192.168.13.1 netmask 255.255.255.0"
ifconfig_dpni1="192.168.14.1 netmask 255.255.255.0"
ifconfig_dpni2="192.168.15.1 netmask 255.255.255.0"
ifconfig_dpni3="192.168.16.1 netmask 255.255.255.0"
ifconfig_dpni6="DHCP inet6 accept_rtadv"
growfs_enable="YES"
dhcpd_enable="YES"                          # dhcpd enabled?
dhcpd_flags="-q"                            # command option(s)
dhcpd_conf="/usr/local/etc/dhcpd.conf"      # configuration file
dhcpd_ifaces="dpni1 dpni3"                  # ethernet interface(s)
dhcpd_withumask="022"                       # file creation mask
gateway_enable="YES"
sshd_enable="YES"

dpni6 is the interface to my LAN for management

Server 1 is attached to dpni0 on 192.168.13.2.
Client 1 is on dpni1, gets an IP via DHCP, and initiates an iperf3 -R -c 192.168.13.2.
Server 2 is on 192.168.15.2, Client 2 is on 192.168.16.X, and so on.

For this initial test, I will run just one flow.

On this branch, the dataflow completely stops almost immediately:

udhcpc: started, v1.34.1
udhcpc: broadcasting discover                                                   
udhcpc: broadcasting select for 192.168.14.10, server 192.168.14.1
udhcpc: lease of 192.168.14.10 obtained from 192.168.14.1, lease time 600
Connecting to host 192.168.13.2, port 5201
Reverse mode, remote host 192.168.13.2 is sending
[  5] local 192.168.14.10 port 53100 connected to 192.168.13.2 port 5201
[ ID] Interval           Transfer     Bitrate                                   
[  5]   0.00-1.00   sec  14.1 KBytes   116 Kbits/sec
[  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec
[  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec
[  5]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec
[  5]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec

In this case, dpni1 does not receive any traffic destined for the iperf3 server (192.168.13.2), but it does receive other frames:


root@freebsd-ten64:/dev # tcpdump -i dpni1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on dpni1, link-type EN10MB (Ethernet), capture size 262144 bytes
07:03:08.616665 ARP, Request who-has 192.168.14.1 tell 192.168.14.10, length 46
07:03:32.278905 IP 192.168.14.10 > 192.168.13.1: ICMP echo request, id 9, seq 0, length 64
# 192.168.13.2 missing
07:03:34.073692 IP 192.168.14.10 > 192.168.13.3: ICMP echo request, id 11, seq 0, length 64
07:03:34.858869 IP 192.168.14.10 > 192.168.13.4: ICMP echo request, id 12, seq 0, length 64
07:03:35.533436 IP 192.168.14.10 > 192.168.13.5: ICMP echo request, id 13, seq 0, length 64
07:03:36.293349 IP 192.168.14.10 > 192.168.13.6: ICMP echo request, id 14, seq 0, length 64
07:03:37.348211 IP 192.168.14.10 > 192.168.13.7: ICMP echo request, id 15, seq 0, length 64
07:03:38.298136 IP 192.168.14.10 > 192.168.13.8: ICMP echo request, id 16, seq 0, length 64
07:03:39.223962 IP 192.168.14.10 > 192.168.13.9: ICMP echo request, id 17, seq 0, length 64
# 192.168.13.10 MISSING
07:03:41.163779 IP 192.168.14.10 > 192.168.13.11: ICMP echo request, id 19, seq 0, length 64
07:03:42.098482 IP 192.168.14.10 > 192.168.13.12: ICMP echo request, id 20, seq 0, length 64
07:03:42.853497 IP 192.168.14.10 > 192.168.13.13: ICMP echo request, id 21, seq 0, length 64
# Other IPs not being received: 192.168.13.17, .26, .29, so it looks like one of the queues is not being processed
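
The gaps in the capture above can be picked out mechanically. This is only a minimal sketch: it assumes the client pinged consecutive addresses in 192.168.13.0/24 (the ICMP ids in the capture skip 10 and 18, which is consistent with that), and the names `CAPTURE`, `ECHO_RE` and `missing_hosts` are made up for illustration.

```python
import re

# Echo-request lines copied from the dpni1 capture above.
CAPTURE = """\
07:03:32.278905 IP 192.168.14.10 > 192.168.13.1: ICMP echo request, id 9, seq 0, length 64
07:03:34.073692 IP 192.168.14.10 > 192.168.13.3: ICMP echo request, id 11, seq 0, length 64
07:03:34.858869 IP 192.168.14.10 > 192.168.13.4: ICMP echo request, id 12, seq 0, length 64
07:03:35.533436 IP 192.168.14.10 > 192.168.13.5: ICMP echo request, id 13, seq 0, length 64
07:03:36.293349 IP 192.168.14.10 > 192.168.13.6: ICMP echo request, id 14, seq 0, length 64
07:03:37.348211 IP 192.168.14.10 > 192.168.13.7: ICMP echo request, id 15, seq 0, length 64
07:03:38.298136 IP 192.168.14.10 > 192.168.13.8: ICMP echo request, id 16, seq 0, length 64
07:03:39.223962 IP 192.168.14.10 > 192.168.13.9: ICMP echo request, id 17, seq 0, length 64
07:03:41.163779 IP 192.168.14.10 > 192.168.13.11: ICMP echo request, id 19, seq 0, length 64
07:03:42.098482 IP 192.168.14.10 > 192.168.13.12: ICMP echo request, id 20, seq 0, length 64
07:03:42.853497 IP 192.168.14.10 > 192.168.13.13: ICMP echo request, id 21, seq 0, length 64
"""

# Captures the /24 prefix and the last octet of the destination address.
ECHO_RE = re.compile(r"IP \S+ > (\d+\.\d+\.\d+)\.(\d+): ICMP echo request")


def missing_hosts(capture, prefix):
    """Return last octets absent between the lowest and highest one seen."""
    seen = set()
    for line in capture.splitlines():
        match = ECHO_RE.search(line)
        if match and match.group(1) == prefix:
            seen.add(int(match.group(2)))
    if not seen:
        return []
    return [h for h in range(min(seen), max(seen) + 1) if h not in seen]


print(missing_hosts(CAPTURE, "192.168.13"))  # prints [2, 10]
```

Only gaps inside the observed range are reported, so a destination missing at either end of the sweep would not show up.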

vmstat:

vmstat -i | grep dpaa2
its0,140: dpaa2_io0                                      17          0
its0,141: dpaa2_io1                                     798          2
its0,142: dpaa2_io2                                      13          0
its0,143: dpaa2_io3                                      53          0
its0,144: dpaa2_io4                                       4          0
its0,145: dpaa2_io5                                       4          0
its0,146: dpaa2_io6                                      21          0
its0,147: dpaa2_io7                                      38          0
its0,148: dpaa2_ni0                                       1          0
its0,149: dpaa2_ni1                                       1          0
its0,150: dpaa2_ni2                                       1          0
its0,151: dpaa2_ni3                                       1          0
its0,154: dpaa2_ni6                                       1          0

After running vmstat a few more times:

its0,140: dpaa2_io0                                      17          0
its0,141: dpaa2_io1                                    1429          1
its0,142: dpaa2_io2                                      25          0
its0,143: dpaa2_io3                                     104          0
its0,144: dpaa2_io4                                      11          0
its0,145: dpaa2_io5                                      16          0
its0,146: dpaa2_io6                                      35          0
its0,147: dpaa2_io7                                      56          0
its0,148: dpaa2_ni0                                       1          0
its0,149: dpaa2_ni1                                       1          0
its0,150: dpaa2_ni2                                       1          0
its0,151: dpaa2_ni3                                       1          0
its0,154: dpaa2_ni6                                       1          0

The its0,140: dpaa2_io0 counter has not changed (still 17). Is it stuck?
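
The comparison between the two snapshots can be scripted. The following is a rough sketch, assuming the four-column line shape shown above (source, name, total, rate); `parse_vmstat` and `stalled` are hypothetical helper names. An unchanged counter over a short interval is a hint, not proof, that the interrupt is stuck.

```python
# The dpaa2_io lines from the two `vmstat -i | grep dpaa2` runs above.
BEFORE = """\
its0,140: dpaa2_io0                                      17          0
its0,141: dpaa2_io1                                     798          2
its0,142: dpaa2_io2                                      13          0
its0,143: dpaa2_io3                                      53          0
its0,144: dpaa2_io4                                       4          0
its0,145: dpaa2_io5                                       4          0
its0,146: dpaa2_io6                                      21          0
its0,147: dpaa2_io7                                      38          0
"""

AFTER = """\
its0,140: dpaa2_io0                                      17          0
its0,141: dpaa2_io1                                    1429          1
its0,142: dpaa2_io2                                      25          0
its0,143: dpaa2_io3                                     104          0
its0,144: dpaa2_io4                                      11          0
its0,145: dpaa2_io5                                      16          0
its0,146: dpaa2_io6                                      35          0
its0,147: dpaa2_io7                                      56          0
"""


def parse_vmstat(text):
    """Map interrupt name -> total count for lines like 'its0,140: dpaa2_io0 17 0'."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0].endswith(":"):
            counts[fields[1]] = int(fields[2])
    return counts


def stalled(before, after, prefix="dpaa2_io"):
    """Names with the given prefix whose total did not move between snapshots."""
    return sorted(
        name for name, total in before.items()
        if name.startswith(prefix) and after.get(name) == total
    )


print(stalled(parse_vmstat(BEFORE), parse_vmstat(AFTER)))  # prints ['dpaa2_io0']
```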

dpaa2 niX counters:

sysctl dev.dpaa2_ni.0
dev.dpaa2_ni.0.stats.in_all_frames: 66
dev.dpaa2_ni.0.stats.in_all_bytes: 62368
dev.dpaa2_ni.0.stats.in_multi_frames: 0
dev.dpaa2_ni.0.stats.eg_all_frames: 36
dev.dpaa2_ni.0.stats.eg_all_bytes: 2471
dev.dpaa2_ni.0.stats.eg_multi_frames: 0
dev.dpaa2_ni.0.stats.in_filtered_frames: 0
dev.dpaa2_ni.0.stats.in_discarded_frames: 0
dev.dpaa2_ni.0.stats.in_nobuf_discards: 0
dev.dpaa2_ni.0.stats.buf_free: 0
dev.dpaa2_ni.0.stats.buf_num: 2800
dev.dpaa2_ni.0.%parent: dpaa2_rc0
dev.dpaa2_ni.0.%pnpinfo:
dev.dpaa2_ni.0.%location:
dev.dpaa2_ni.0.%driver: dpaa2_ni
dev.dpaa2_ni.0.%desc: DPAA2 Network Interface
root@freebsd-ten64:/dev # sysctl dev.dpaa2_ni.1
dev.dpaa2_ni.1.stats.in_all_frames: 584
dev.dpaa2_ni.1.stats.in_all_bytes: 56208
dev.dpaa2_ni.1.stats.in_multi_frames: 0
dev.dpaa2_ni.1.stats.eg_all_frames: 195
dev.dpaa2_ni.1.stats.eg_all_bytes: 32414
dev.dpaa2_ni.1.stats.eg_multi_frames: 0
dev.dpaa2_ni.1.stats.in_filtered_frames: 0
dev.dpaa2_ni.1.stats.in_discarded_frames: 0
dev.dpaa2_ni.1.stats.in_nobuf_discards: 0
dev.dpaa2_ni.1.stats.buf_free: 0
dev.dpaa2_ni.1.stats.buf_num: 2800
dev.dpaa2_ni.1.%parent: dpaa2_rc0
dev.dpaa2_ni.1.%pnpinfo:
dev.dpaa2_ni.1.%location:
dev.dpaa2_ni.1.%driver: dpaa2_ni
dev.dpaa2_ni.1.%desc: DPAA2 Network Interface
root@freebsd-ten64:/dev # sysctl dev.dpaa2_ni.6
dev.dpaa2_ni.6.stats.in_all_frames: 1103
dev.dpaa2_ni.6.stats.in_all_bytes: 95670
dev.dpaa2_ni.6.stats.in_multi_frames: 667
dev.dpaa2_ni.6.stats.eg_all_frames: 10
dev.dpaa2_ni.6.stats.eg_all_bytes: 978
dev.dpaa2_ni.6.stats.eg_multi_frames: 0
dev.dpaa2_ni.6.stats.in_filtered_frames: 2
dev.dpaa2_ni.6.stats.in_discarded_frames: 0
dev.dpaa2_ni.6.stats.in_nobuf_discards: 0
dev.dpaa2_ni.6.stats.buf_free: 0
dev.dpaa2_ni.6.stats.buf_num: 2800
dev.dpaa2_ni.6.%parent: dpaa2_rc0
dev.dpaa2_ni.6.%pnpinfo:
dev.dpaa2_ni.6.%location:
dev.dpaa2_ni.6.%driver: dpaa2_ni
dev.dpaa2_ni.6.%desc: DPAA2 Network Interface
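
For anyone comparing these counters across interfaces, here is a small sketch that pulls the `stats` values out of the `sysctl` output and lists any non-zero discard counters. The sample is the dpni1 output from above; `parse_sysctl` and `nonzero_discards` are illustrative names, not part of any tool.

```python
# Stats lines copied from `sysctl dev.dpaa2_ni.1` above.
SAMPLE = """\
dev.dpaa2_ni.1.stats.in_all_frames: 584
dev.dpaa2_ni.1.stats.in_all_bytes: 56208
dev.dpaa2_ni.1.stats.in_multi_frames: 0
dev.dpaa2_ni.1.stats.eg_all_frames: 195
dev.dpaa2_ni.1.stats.eg_all_bytes: 32414
dev.dpaa2_ni.1.stats.eg_multi_frames: 0
dev.dpaa2_ni.1.stats.in_filtered_frames: 0
dev.dpaa2_ni.1.stats.in_discarded_frames: 0
dev.dpaa2_ni.1.stats.in_nobuf_discards: 0
dev.dpaa2_ni.1.stats.buf_free: 0
dev.dpaa2_ni.1.stats.buf_num: 2800
dev.dpaa2_ni.1.%parent: dpaa2_rc0
dev.dpaa2_ni.1.%driver: dpaa2_ni
"""


def parse_sysctl(text):
    """Map the short counter name -> integer value for '.stats.' entries only."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep and ".stats." in name:
            stats[name.rsplit(".", 1)[1]] = int(value)
    return stats


def nonzero_discards(stats):
    """Counters whose name mentions 'discard' and whose value is above zero."""
    return {k: v for k, v in stats.items() if "discard" in k and v > 0}


stats = parse_sysctl(SAMPLE)
print(stats["in_all_frames"], nonzero_discards(stats))  # prints 584 {}
```

In the output pasted above, the discard counters are zero on all three interfaces, so they do not by themselves account for the missing frames.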
dsalychev added the bug label Mar 21, 2023
@dsalychev

@mcbridematt Same as #8 (comment)

@mcbridematt
Author

Excellent! I've tested with four ports active (GENERIC-NODEBUG) and it has gone 24 hours without issues.

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-86400.00 sec  7.46 TBytes   760 Mbits/sec  360862             sender
[  5]   0.00-86400.00 sec  7.46 TBytes   760 Mbits/sec                  receiver
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-86400.00 sec  6.78 TBytes   690 Mbits/sec  95783             sender
[  5]   0.00-86400.00 sec  6.78 TBytes   690 Mbits/sec                  receiver
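
As a sanity check on the summary lines, the bitrate can be recomputed from the transfer totals. This sketch assumes iperf3 reports "TBytes" in binary units (1024^4 bytes) and "Mbits/sec" in decimal units (10^6 bits); `TIB` and `mbits_per_sec` are names introduced here.

```python
TIB = 1024 ** 4  # assumption: one iperf3 "TByte" is 1024^4 bytes


def mbits_per_sec(tbytes, seconds):
    """Average rate in decimal megabits per second for a transfer total in TBytes."""
    return tbytes * TIB * 8 / seconds / 1e6


for tbytes in (7.46, 6.78):
    rate = mbits_per_sec(tbytes, 86400)
    print(f"{tbytes} TBytes over 86400 s -> {rate:.1f} Mbits/sec")
```

Under that assumption the two flows come out at roughly 759.5 and 690.2 Mbits/sec, which agrees with the reported 760 and 690 Mbits/sec to within the rounding of the transfer totals.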

I'll move my 'production' FreeBSD machine to this version and see how it goes.

@dsalychev

> I'll move my 'production' FreeBSD machine to this version and see how it goes.

I hope I'll still have a chance to commit those changes before 14.0. Thanks for testing and for the help!

dsalychev pushed a commit that referenced this issue Sep 9, 2023
netlink(4) calls back into the driver during detach and it attempts to
start an internal synchronized op recursively, causing an interruptible
hang.  Fix it by failing the ioctl if the VI has been marked as DOOMED
by cxgbe_detach.

Here's the stack for the hang for reference.
 #6  begin_synchronized_op
 #7  cxgbe_media_status
 #8  ifmedia_ioctl
 #9  cxgbe_ioctl
 #10 if_ioctl
 #11 get_operstate_ether
 #12 get_operstate
 #13 dump_iface
 #14 rtnl_handle_ifevent
 #15 rtnl_handle_ifnet_event
 #16 rt_ifmsg
 #17 if_unroute
 #18 if_down
 #19 if_detach_internal
 #20 if_detach
 #21 ether_ifdetach
 #22 cxgbe_vi_detach
 #23 cxgbe_detach
 #24 DEVICE_DETACH

MFC after:	3 days
Sponsored by:	Chelsio Communications