
iperf fails (hangs) in TCP tests #1060

Open
cpaelzer opened this issue Mar 17, 2021 · 10 comments
@cpaelzer

cpaelzer commented Mar 17, 2021

Expected/Desired Behavior:

TCP-based iperf tests should work, e.g. net.iperf( ( h1, h4 ), l4Type='TCP' )

Actual Behavior:

UDP-based tests work, but once l4Type is set to TCP (or left unset, since TCP is the default) the test hangs.

Detailed Steps to Reproduce the Behavior:

These tests were done on Ubuntu 21.04 Hirsute.
If you need the same OS to test, you can either use a cloud image or use multipass to spawn one easily.

You can modify the provided example simpleperf.py to show this:

--- simpleperf.py.orig	2021-03-17 08:53:12.618482061 +0000
+++ simpleperf.py	2021-03-17 08:53:26.130543704 +0000
@@ -52,6 +52,8 @@
     info( "Testing bandwidth between h1 and h4\n" )
     h1, h4 = net.getNodeByName('h1', 'h4')
     net.iperf( ( h1, h4 ), l4Type='UDP' )
+    net.iperf( ( h1, h4 ), l4Type='TCP' )
+    net.iperf( ( h1, h4 ) )
     net.stop()

On execution that will behave like:

$ sudo python3 simpleperf.py
*** Creating network
*** Adding controller
*** Adding hosts:
h1 h2 h3 h4 
*** Adding switches:
s1 
*** Adding links:
(10.00Mbit 5ms delay 10.00000% loss) (10.00Mbit 5ms delay 10.00000% loss) (h1, s1) (10.00Mbit 5ms delay 10.00000% loss) (10.00Mbit 5ms delay 10.00000% loss) (h2, s1) (10.00Mbit 5ms delay 10.00000% loss) (10.00Mbit 5ms delay 10.00000% loss) (h3, s1) (10.00Mbit 5ms delay 10.00000% loss) (10.00Mbit 5ms delay 10.00000% loss) (h4, s1) 
*** Configuring hosts
h1 (cfs 12500/100000us) h2 (cfs 12500/100000us) h3 (cfs 12500/100000us) h4 (cfs 12500/100000us) 
*** Starting controller
c0 
*** Starting 1 switches
s1 ...(10.00Mbit 5ms delay 10.00000% loss) (10.00Mbit 5ms delay 10.00000% loss) (10.00Mbit 5ms delay 10.00000% loss) (10.00Mbit 5ms delay 10.00000% loss) 
Dumping host connections
h1 h1-eth0:s1-eth1
h2 h2-eth0:s1-eth2
h3 h3-eth0:s1-eth3
h4 h4-eth0:s1-eth4
Testing bandwidth between h1 and h4
*** Iperf: testing UDP bandwidth between h1 and h4 
*** Results: ['10M', '8.40 Mbits/sec', '10.5 Mbits/sec']
*** Iperf: testing TCP bandwidth between h1 and h4 

That is seen with iperf 2.0.14a in Debian/Ubuntu.
I thought it was due to the new version, which has quite a few changes that eventually became v2.1.
But that wasn't true; at least in my test environment it happened the same way when I downgraded to iperf 2.0.13.
OTOH, if I do a full test reset and run with 2.0.13 it works, so I'd recommend debugging against 2.0.14a (or later), as the hang still seems somehow related to that version.

Downgrading mininet to the former (Ubuntu) version 2.2.2-5ubuntu1 (the last known working setup) is not an option, as
a) it then triggers a different issue: Exception: Error creating interface pair (h1-eth0,s1-eth1)
b) the 2.3 python3 version is what I'd eventually want to reach anyway

Additional Information:

When stuck these are the processes left running:

$ps axlf
...
4     0   16392     866  20   0  11536  4652 -      S+   pts/0      0:00  |           \_ sudo python3 simpleperf.py
4     0   16393   16392  20   0  19696 12884 -      S+   pts/0      0:00  |               \_ python3 simpleperf.py
0     0   16404   16393  20   0   8920  3520 -      Ss+  pts/1      0:00  |                   \_ bash --norc --noediting -is mininet:c0
0     0   16564   16404  20   0  12296  5240 -      S+   pts/1      0:00  |                   |   \_ /usr/bin/ovs-testcontroller ptcp:6653
4     0   16411   16393  20   0   8920  3640 -      Ss+  pts/3      0:00  |                   \_ bash --norc --noediting -is mininet:h1
4     0   16418   16393  20   0   8920  3600 -      Ss+  pts/4      0:00  |                   \_ bash --norc --noediting -is mininet:h2
4     0   16422   16393  20   0   8920  3448 -      Ss+  pts/5      0:00  |                   \_ bash --norc --noediting -is mininet:h3
4     0   16426   16393  20   0   8920  3568 -      Ss+  pts/6      0:00  |                   \_ bash --norc --noediting -is mininet:h4
0     0   16607   16426  20   0 227552  2436 -      Sl+  pts/6      0:00  |                   |   \_ iperf -p 5001 -s
0     0   16433   16393  20   0   8920  3596 -      Ss+  pts/7      0:00  |                   \_ bash --norc --noediting -is mininet:s1
@lantz
Member

lantz commented Mar 17, 2021

Can you confirm that this issue occurs in the Ubuntu 20.04 VM for Mininet 2.3.0?

We don't currently run the CI tests on 21.04 because GitHub Actions doesn't support it as a target. However, iperf 2.0.14+ might cause issues on Ubuntu and Debian in the future, so it would be nice to have a fix for it if needed.

Another test you could run is simply sudo mn --test iperf.

@cpaelzer
Author

Hi @lantz, and thanks for your response.
So far I had just used the packages that are in Ubuntu 21.04; per your request I've slowly tried to narrow down what triggers the hang ...

TL;DR

  • iperf >= 2.0.14a is the trigger
  • yes it also happens in Ubuntu 20.04
  • sudo mn --test iperf is enough to trigger it

--- details ---

My debug started with:

  • mininet 2.3 from git as of today
  • New VMs for Ubuntu 20.04 and Ubuntu 21.04
# maybe not even needed with the installer pulling in so much
$ sudo apt install openvswitch-switch openvswitch-testcontroller ubuntu-dev-tools python3-pip
$ sudo systemctl stop openvswitch-testcontroller.service
# mininet from git using the installer script
$ git clone git://github.com/mininet/mininet
$ ./mininet/util/install.sh
$ cd mininet/examples/
$ vim simpleperf.py
# add TCP arg
$ which mn
/usr/local/bin/mn

Now from here some iperf seems to work on both just fine:

  • sudo mn --test iperf
  • sudo python3 simpleperf.py

As the next test step I bumped iperf up to 2.0.14a by switching from the version in Ubuntu to a build from the upstream tarball.

$ wget https://sourceforge.net/projects/iperf2/files/iperf-2.0.14a.tar.gz/download
$ mv download iperf-2.0.14a.tar.gz
$ tar xzf iperf-2.0.14a.tar.gz
$ cd iperf-2.0.14a/
$ ./configure
$ make
$ sudo make install
$ which iperf
/usr/local/bin/iperf
$ iperf -v
iperf version 2.0.14a (3 Jan 2021) pthreads

Now it hangs, so it really is the new iperf (as assumed before):

$ sudo mn --test iperf
*** Creating network
*** Adding controller
*** Adding hosts:
h1 h2 
*** Adding switches:
s1 
*** Adding links:
(h1, s1) (h2, s1) 
*** Configuring hosts
h1 h2 
*** Starting controller
c0 
*** Starting 1 switches
s1 ...
*** Waiting for switches to connect
s1 
*** Iperf: testing TCP bandwidth between h1 and h2

<hang for ~5 minutes - then aborted>

^C

Keyboard Interrupt. Shutting down and cleaning up...

*** Removing excess controllers/ofprotocols/ofdatapaths/pings/noxes
killall controller ofprotocol ofdatapath ping nox_core lt-nox_core ovs-openflowd ovs-controller ovs-testcontroller udpbwtest mnexec ivs ryu-manager 2> /dev/null
killall -9 controller ofprotocol ofdatapath ping nox_core lt-nox_core ovs-openflowd ovs-controller ovs-testcontroller udpbwtest mnexec ivs ryu-manager 2> /dev/null
pkill -9 -f "sudo mnexec"
*** Removing junk from /tmp
rm -f /tmp/vconn* /tmp/vlogs* /tmp/*.out /tmp/*.log
*** Removing old X11 tunnels
*** Removing excess kernel datapaths
ps ax | egrep -o 'dp[0-9]+' | sed 's/dp/nl:/'
***  Removing OVS datapaths
ovs-vsctl --timeout=1 list-br
ovs-vsctl --if-exists del-br s1
ovs-vsctl --timeout=1 list-br
*** Removing all links of the pattern foo-ethX
ip link show | egrep -o '([-_.[:alnum:]]+-eth[[:digit:]]+)'
( ip link del s1-eth1;ip link del s1-eth2 ) 2> /dev/null
ip link show
*** Killing stale mininet node processes
pkill -9 -f mininet:
*** Shutting down stale tunnels
pkill -9 -f Tunnel=Ethernet
pkill -9 -f .ssh/mn
rm -f ~/.ssh/mn/*
*** Cleanup complete.

Since upstream already has a newer iperf, let me try that as well:

$ cd iperf-2.0.14a/
$ sudo make uninstall
$ wget https://sourceforge.net/projects/iperf2/files/iperf-2.1.1-dev.tar.gz/download
$ mv download iperf-2.1.1-dev.tar.gz
$ tar xzf iperf-2.1.1-dev.tar.gz
$ cd iperf-2.1.1-dev/
$ ./configure
$ make
$ sudo make install
$ which iperf
/usr/local/bin/iperf
$ iperf -v
iperf version 2.1.1-dev (28 Feb 2021) pthreads

And unfortunately 2.1.1-dev hangs just like 2.0.14a did.

The hanging processes on sudo mn --test iperf look very similar to what I reported before for sudo python3 simpleperf.py:

4     0   75796    1308  20   0  11572  4920 -      S+   pts/0      0:00          \_ sudo mn --test iperf
4     0   75797   75796  20   0  20888 14236 -      S+   pts/0      0:00              \_ /usr/bin/python3 /usr/local/bin/mn --test iperf
0     0   75802   75797  20   0   8916  3568 -      Ss+  pts/1      0:00                  \_ bash --norc --noediting -is mininet:c0
0     0   75841   75802  20   0   2660   984 -      S+   pts/1      0:00                  |   \_ controller ptcp:6653
4     0   75809   75797  20   0   8916  3612 -      Ss+  pts/2      0:00                  \_ bash --norc --noediting -is mininet:h1
4     0   75811   75797  20   0   8916  3548 -      Ss+  pts/3      0:00                  \_ bash --norc --noediting -is mininet:h2
0     0   75883   75811  20   0 227632  2040 -      Sl+  pts/3      0:02                  |   \_ iperf -p 5001 -s
0     0   75816   75797  20   0   8916  3580 -      Ss+  pts/4      0:00                  \_ bash --norc --noediting -is mininet:s1

@lantz
Member

lantz commented Mar 28, 2021

Have you tried mn -v debug --test iperf? I think it reveals the issue.

Old iperf:

...
Client output: ------------------------------------------------------------
Client connecting to 10.0.0.2, TCP port 5001
TCP window size: 1.54 MByte (default)
------------------------------------------------------------
[  3] local 10.0.0.1 port 50094 connected with 10.0.0.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 5.0 sec  23.5 GBytes  40.4 Gbits/sec

sendInt: writing chr(3)
Server output: recvn abort failed: Success
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 10.0.0.2 port 5001 connected with 10.0.0.1 port 50092
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0- 0.0 sec  0.00 Bytes  0.00 bits/sec
[  5] local 10.0.0.2 port 5001 connected with 10.0.0.1 port 50094
[  5]  0.0- 5.0 sec  23.5 GBytes  40.3 Gbits/sec

*** Results: ['40.3 Gbits/sec', '40.4 Gbits/sec']

new iperf:

...
Client output: ------------------------------------------------------------
Client connecting to 10.0.0.2, TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  1] local 10.0.0.1 port 50130 connected with 10.0.0.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-5.00 sec  22.6 GBytes  38.9 Gbits/sec

It appears that the new iperf server may not be printing out results for the initial connection made by waitListening().

I'm not sure of the best way to fix this. waitListening() makes a real connection because that's the only way to determine that you have a working route from the client to the server.

As an alternative, we could use fuser on the server and then add a timeout to the iperf connection itself (which we probably should do anyway...)

Or perhaps we could throw away intervals less than a second.

Perhaps you have some thoughts or good ideas.
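The "throw away intervals less than a second" option could be sketched as a small parsing helper. This is a hedged illustration, not Mininet's actual code; `iperf_rates` and its regex are my own assumptions based on the report formats shown above:

```python
import re

def iperf_rates( servout ):
    """Pick bandwidth results out of iperf server output, skipping the
    zero-length interval produced by the waitListening() probe.

    Hypothetical helper (not Mininet's actual parser): it keeps only
    report lines whose measured interval is at least one second.
    """
    # Matches e.g. "[  5]  0.0- 5.0 sec  23.5 GBytes  40.3 Gbits/sec"
    # and the newer "[  1] 0.00-5.00 sec  22.6 GBytes  38.9 Gbits/sec"
    pattern = re.compile(
        r'([\d.]+)\s*-\s*([\d.]+) sec.*?([\d.]+ [KMG]?bits/sec)' )
    rates = []
    for line in servout.splitlines():
        m = pattern.search( line )
        if m:
            start, end, rate = m.groups()
            if float( end ) - float( start ) >= 1.0:
                rates.append( rate )
    return rates
```

With such a filter, the "0.0- 0.0 sec  0.00 Bytes  0.00 bits/sec" line from the probe connection would simply never count toward the results, whether or not the server prints it.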

@cpaelzer
Author

Hi @lantz
I wonder if that is indeed it. I've just ignored Mininet entirely and issued iperf runs against three versions:
the old 2.0.13 (working), the new 2.0.14a (failing), and 2.1.1-dev (unknown with regard to Mininet).
On the other side I just ran the default client: ./src/iperf -c 127.0.0.1.

Server output:

2.0.13 (working)

root@h:~# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  128 KByte (default)
------------------------------------------------------------
[  4] local 127.0.0.1 port 5001 connected with 127.0.0.1 port 48872
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  53.2 GBytes  45.7 Gbits/sec

2.0.14a (bad)

root@h:~/iperf-2.0.14a# ./src/iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  128 KByte (default)
------------------------------------------------------------
[  1] local 127.0.0.1 port 5001 connected with 127.0.0.1 port 48938
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-10.00 sec  49.5 GBytes  42.5 Gbits/sec

2.1.1-dev

root@h:~/iperf-2.1.1-dev# ./src/iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  128 KByte (default)
------------------------------------------------------------
[  1] local 127.0.0.1 port 5001 connected with 127.0.0.1 port 48894
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-10.00 sec  54.7 GBytes  47.0 Gbits/sec

Maybe the output changed not for the defaults, but for the set of arguments mininet is calling it with?

@cpaelzer
Author

Since you hinted at mn -v debug --test iperf, I had a look at that.

I see those two calls:

 iperf -p 5001 -t 5 -c 10.0.0.2
 iperf -p 5001 -s

Trying those with the three versions (but via 127.0.0.1) ...
Interestingly, the initial line [  4]  0.0- 0.0 sec  0.00 Bytes  0.00 bits/sec does not show up in ANY of the iperf versions when calling them directly.

But OTOH I can confirm that running mn -v debug --test iperf against the system-installed 2.0.13 does show this line.

2.0.13

*** h1 : ('iperf -p 5001 -t 5 -c 10.0.0.2 ',)
------------------------------------------------------------
Client connecting to 10.0.0.2, TCP port 5001
TCP window size: 1.20 MByte (default)
------------------------------------------------------------
[  3] local 10.0.0.1 port 51840 connected with 10.0.0.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 5.0 sec  23.4 GBytes  40.2 Gbits/sec
Client output: ------------------------------------------------------------
Client connecting to 10.0.0.2, TCP port 5001
TCP window size: 1.20 MByte (default)
------------------------------------------------------------
[  3] local 10.0.0.1 port 51840 connected with 10.0.0.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 5.0 sec  23.4 GBytes  40.2 Gbits/sec

sendInt: writing chr(3)
Server output: recvn abort failed: Success
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  128 KByte (default)
------------------------------------------------------------
[  4] local 10.0.0.2 port 5001 connected with 10.0.0.1 port 51838
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0- 0.0 sec  0.00 Bytes  0.00 bits/sec
[  5] local 10.0.0.2 port 5001 connected with 10.0.0.1 port 51840
[  5]  0.0- 5.0 sec  23.4 GBytes  40.2 Gbits/sec

*** Results: ['40.2 Gbits/sec', '40.2 Gbits/sec']

2.0.14a

*** h1 : ('iperf -p 5001 -t 5 -c 10.0.0.2 ',)
------------------------------------------------------------
Client connecting to 10.0.0.2, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  1] local 10.0.0.1 port 51854 connected with 10.0.0.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-5.00 sec  23.2 GBytes  39.9 Gbits/sec
Client output: ------------------------------------------------------------
Client connecting to 10.0.0.2, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  1] local 10.0.0.1 port 51854 connected with 10.0.0.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-5.00 sec  23.2 GBytes  39.9 Gbits/sec
*hung*

2.1.1-dev

*** h1 : ('iperf -p 5001 -t 5 -c 10.0.0.2 ',)
------------------------------------------------------------
Client connecting to 10.0.0.2, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  1] local 10.0.0.1 port 51866 connected with 10.0.0.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-5.00 sec  23.3 GBytes  40.0 Gbits/sec
Client output: ------------------------------------------------------------
Client connecting to 10.0.0.2, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  1] local 10.0.0.1 port 51866 connected with 10.0.0.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-5.00 sec  23.3 GBytes  40.0 Gbits/sec
*hung*

Thereby I can confirm your finding of different output, and it isn't any different/better in the newest iperf version :-/
But OTOH this output difference seems to depend on something else as well, maybe whether there is a terminal attached or something like that? Because, as seen above, in "manual" runs this line was never there.

@cpaelzer
Author

The "extra line" comes from waitListening as you said, so that my manual runs all are the same is ok.
Let us go down this initial connection a bit ...

That initial connection is made by mininet via echo A | telnet -e A 127.0.0.1 5001
Trying that I get the discussed behavior.
The old iperf versions have reported this

recvn abort failed: Resource temporarily unavailable
[  4] local 127.0.0.1 port 5001 connected with 127.0.0.1 port 49152
[  4]  0.0- 0.0 sec  0.00 Bytes  0.00 bits/sec

While the new versions recognize that it isn't a real test and now print a line like this:

(null)local 127.0.0.1 port 5001 connected with 127.0.0.1 port 49162

OK, I'm now with you; I can follow what you said in your last post and confirm that behavior change :-)
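For reference, the telnet probe above can be reproduced without telnet at all; a minimal Python sketch (my own hypothetical stand-in, not Mininet's implementation) of what that connectivity check amounts to:

```python
import socket

def can_connect( host, port, timeout=2.0 ):
    """Probe a TCP server roughly the way the
    `echo A | telnet -e A 127.0.0.1 5001` check does: open a
    connection, send one throwaway byte, and close. It only proves
    that a route exists and the server is accepting; this is the
    connection that old iperf reports as a '0.0- 0.0 sec' result and
    that new iperf logs as '(null)local ...' instead.
    """
    try:
        with socket.create_connection( ( host, port ), timeout=timeout ) as s:
            s.sendall( b'A' )
        return True
    except OSError:
        return False
```

Since the probe data is garbage from iperf's point of view, it is entirely up to the server version whether and how it reports that short-lived connection, which matches the behavior change observed here.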

@cpaelzer
Author

The pre-check for the routing via telnet still works fine as-is.

An alternative could be to parse only the client output and mostly ignore the server output?
But OK, so far you have used both results, client and server, and reported both back up.
So let us stick to that for now ...

How about changing net.py:iperf to just pick "the latest" result, whatever it is?
As far as I read your code, you had to "wait for the result to show up", and usually such code is added after there were issues without such waits in the past :-)
Thereby just parsing "the last entry" might return the wrong (all-zeros) line with the old iperf versions.

We only want results from real measurements, and 0.0- 0.0 sec is never a real measurement.
So by modifying the find just a bit to ignore 0.0 sec intervals, we could always get the right result and still wait for it.

The following works well for me with all three iperf versions that I tested:

--- net.py.orig	2021-03-29 07:28:33.912880284 +0000
+++ net.py	2021-03-29 07:43:39.402620616 +0000
@@ -836,8 +836,7 @@
         servout = ''
         # We want the last *b/sec from the iperf server output
         # for TCP, there are two of them because of waitListening
-        count = 2 if l4Type == 'TCP' else 1
-        while len( re.findall( '/sec', servout ) ) < count:
+        while len( re.findall( '[1-9]*[0-9]+.0 sec.*/sec', servout ) ):
             servout += server.monitor( timeoutms=5000 )
         server.sendInt()
         servout += server.waitOutput()

But OTOH with the new versions I no longer see any difference between the client and the server result anyway.
So the benefit of the original intention to return both might be lost.
But dropping that entirely has more implications, and I'd leave that to you ...
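To sanity-check the "ignore 0.0 sec" idea in isolation, the wait condition can be reduced to a single regex and exercised against the output styles quoted earlier. A sketch only; `REAL_RESULT` is my own tightened variant (dot escaped, explicit non-zero interval end), not the exact expression from the diff, and it assumes the summary line always starts at 0.0:

```python
import re

# A report line only counts as a real measurement if its interval ends
# at a non-zero time; the waitListening() probe yields '0.0- 0.0 sec'.
REAL_RESULT = re.compile( r'0\.0+\s*-\s*[1-9][0-9]*\.[0-9]+ sec.*/sec' )

probe_line = '[  4]  0.0- 0.0 sec  0.00 Bytes  0.00 bits/sec'
old_style  = '[  5]  0.0- 5.0 sec  23.5 GBytes  40.3 Gbits/sec'
new_style  = '[  1] 0.00-5.00 sec  22.6 GBytes  38.9 Gbits/sec'

for label, line in ( ( 'probe', probe_line ),
                     ( 'old', old_style ),
                     ( 'new', new_style ) ):
    print( label, bool( REAL_RESULT.search( line ) ) )
```

In a monitor loop, a condition along the lines of `while not REAL_RESULT.search( servout ):` would keep reading until a real measurement appears, independent of whether the probe line was printed; pairing it with a deadline (the timeout mentioned earlier) would avoid a new hang if no result ever arrives.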

@cpaelzer
Author

FYI, this issue still remains in the upcoming Ubuntu 21.10 (test result), where it tries to run with the newer iperf 2.1.2.

I'll try to apply the suggestion I made in the last update above to mininet and check whether that works with old & new iperf in all our cases. But I'd appreciate it if we could rekindle this discussion.

@cpaelzer
Author

I have been testing the fix that I suggested two comments above; for the Ubuntu tests based on mininet it works fine, produces valid results, and no longer hangs.

@timhealltheway

Hi, I followed the comment above and changed the code in net.py, but it still fails in the TCP test (sudo mn --test iperf). Any suggestions?
