slow udp stream generation with message size 1400B #899

olichtne · 2019-07-16T14:45:44Z

Context

Version of iperf3: master
Hardware: 10G ixgbe nic, other parts are probably not relevant
Operating system (and distribution, if any): RHEL8.0 but should be reproducible on other Linux distros
Other relevant information (for example, non-default compilers,
libraries, cross-compiling, etc.):

Bug Report

Expected Behavior

when running iperf3 -c <dst_ip> -B <src_ip> -u -b 0 -t 10 -l 1400 -A <cpuid>

I'd expect the iperf generator to either reach line-rate throughput (close to 10 Gbps) or to fully utilize a single CPU (i.e. the processor isn't fast enough to generate more packets per second).

Actual Behavior

The generator only reaches 4.3 Gbps while utilization of the single CPU core is only ~40%.

Running netperf in a similar configuration (a basic IPv4 UDP stream with 1400-byte messages) does reach line rate without issues.

Steps to Reproduce

configure a simple network of 2 hosts with a connection capable of 10Gbps
start an iperf3 server
run the client with the following command iperf3 -c <dst_ip> -B <src_ip> -u -b 0 -t 10 -l 1400 -A <cpuid> replacing the <> values with whatever you configured

Possible Solution

I investigated this issue with a colleague who has experience with UDP kernel development. We found that this seems to be caused by an unfortunate combination of UDP stream generation burstiness and the fact that after every 10 write() calls on the test stream, select() is called to check on the control connection to the server.

What seems to happen is that a number of writes to the test socket fill its buffer for a short time, at which point the select() call suspends the iperf client process, and re-waking the process takes a bit of time. Since this happens every 10 writes, we think it can affect generator performance.
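A minimal sketch of the loop shape described above, written as a counting model rather than the real iperf_send code. The names `loop_stats` and `model_send_loop` are hypothetical, and the increments only stand in for the actual write() and select() calls:

```c
#include <assert.h>

/* Hypothetical counters for illustration; not part of iperf3. */
struct loop_stats {
    unsigned long writes;          /* datagrams handed to the test socket */
    unsigned long control_checks;  /* polls of the control connection */
};

/*
 * Model of the sender loop: send up to `multisend` datagrams back to
 * back, then check the control connection once, and repeat until
 * `total_packets` datagrams have been sent.
 */
static struct loop_stats
model_send_loop(unsigned long total_packets, unsigned int multisend)
{
    struct loop_stats st = {0, 0};

    assert(multisend > 0);
    while (st.writes < total_packets) {
        for (unsigned int i = 0; i < multisend && st.writes < total_packets; i++)
            st.writes++;          /* stands in for write() on the UDP socket */
        st.control_checks++;      /* stands in for select() on the control socket */
    }
    return st;
}
```

For 1,000,000 datagrams this model counts 100,000 control checks with multisend = 10 and 1,000 with multisend = 1000. As a back-of-envelope figure, assuming the reported 4.3 Gbps is payload throughput, 1400-byte datagrams work out to roughly 384,000 writes per second, so about 38,000 select() calls per second at the 10:1 ratio.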

We reached this conclusion after:

- comparing iperf and netperf strace output:
  - both create and configure the socket in comparable ways
  - one difference is that iperf uses write and netperf uses sendto, but this shouldn't have the measured impact
  - netperf doesn't interleave sendto with select calls, which means that when the socket buffer fills up for a moment the sendto call blocks; we suspect that re-waking from this kind of blocking might be faster than from the select call in iperf
- finding that the value 10 comes from https://github.com/esnet/iperf/blob/master/src/iperf_api.c#L2331 and that the issue disappears when it is changed to a larger value (e.g. 1000)

When looking for possible solutions I found that using a larger send buffer size (via the -w argument) also works around the issue. However, since netperf doesn't configure this and uses the default send buffer size, I don't consider this a valid solution.
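For reference, a sketch of what enlarging the send buffer looks like at the socket level, assuming the -w option ends up as an SO_SNDBUF request on the test socket. The helper name `request_send_buffer` is hypothetical and not an iperf3 function:

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Ask the kernel for a send buffer of `bytes` bytes and return the size
 * it actually reports afterwards, or -1 on error. The kernel may adjust
 * the request (Linux, for example, doubles it for bookkeeping and caps
 * it at a system-wide limit), so the effective value is read back.
 */
static int
request_send_buffer(int fd, int bytes)
{
    int effective = 0;
    socklen_t len = sizeof(effective);

    assert(bytes > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &effective, &len) < 0)
        return -1;
    return effective;
}
```

A bigger buffer gives the sender more room before a write has to wait, which would fit the observation that it hides the problem without changing the 10:1 write/select ratio.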

Configuring a burst packet value using -b 0/1000 can override the multisend variable. This currently doesn't work, so I submitted pull request #898 to fix it. That said, I'm not sure whether this is a good and intended way to configure iperf to be faster for UDP streams; maybe the "multisend" variable should also be configurable via a separate CLI argument?

I'm willing to look into implementing a solution and sending a pull request after discussing it here.

bmah888 · 2019-08-23T22:25:09Z

So with respect to the burst mode, I'm wondering if it would make sense to have something in the argument processing to force the burst size to zero if we're doing an "unlimited" bitrate UDP test. That would keep the special-casing out of the time-critical code. I still need to respond to your other points, but I was actually staring at, and thinking about, your pull request several times this week.

olichtne · 2019-08-26T11:28:11Z

I think that makes sense as long as the argument processing logic for these values can be overridden somehow.

Currently the logic that decides how big a burst to send is part of the iperf_send function. Moving it out and expanding it to recognize an "unlimited" UDP test is a good idea, but IMO it still just implements a "default/recommended value" that the tester should be able to control directly, which would mean either introducing a new argument or splitting the -b argument into two.

bmah888 · 2020-04-28T14:56:59Z

Having lost track of this dialog (apologies), are we done here and can I close this issue? It looks like the pull request that I merged mostly fixes the problem you were seeing. Thanks.

olichtne · 2020-04-29T13:24:24Z

I think that depends on whether you view "burst mode" and "multisend" as the same thing...

To simplify, the original problem is that a single iteration of the generator loop (implemented in the iperf_send function) calls the snd function (i.e. a write to a socket) ten times, then calls select() on the control socket. This 10:1 ratio means that flow generation is suboptimal when packet sizes are smaller. Suboptimal here means that the generated flow is slower than the line rate of the NIC while the CPU core is not fully utilized: if iperf could run faster, it could generate more packets.

The 10:1 ratio can be changed by specifying the burst value; however, it is originally defined by the multisend attribute, which is set in iperf_defaults as:

 testp->multisend = 10;	/* arbitrary */

The pull request that was merged a year ago was about a bug related to manipulating the multisend value via the burst parameter. I didn't bother creating an issue for that as I was pretty sure that it was an actual bug and I already had a pretty easy fix.

On the other hand I opened this issue because:

- the burst mode resolves our one use case, but it's limited to a maximum value of 1000 (defined as MAX_BURST in the iperf.h file)
- the multisend value is marked as "arbitrary" where it is initialized, which made me question whether burst mode is the correct way to resolve the full problem of the 10:1 ratio, or whether there should be a way to actually configure the multisend value as well.
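The relationship between the two values, as described in this thread, can be sketched like this. It is an illustration under stated assumptions, not the actual iperf3 code: the helper names are hypothetical, the constants simply restate the values quoted above, and the rule that a burst of 0 falls back to multisend is an assumption about how the override works:

```c
#include <assert.h>

#define MAX_BURST         1000  /* upper limit quoted from iperf.h in this thread */
#define DEFAULT_MULTISEND 10    /* the "arbitrary" default from iperf_defaults */

/* Hypothetical helper: is a user-supplied burst value accepted? */
static int
burst_in_range(int burst)
{
    return burst >= 0 && burst <= MAX_BURST;
}

/*
 * Hypothetical helper: number of writes per loop iteration before the
 * control connection is checked. A non-zero burst overrides the
 * default; zero is assumed to mean "no burst given".
 */
static int
sends_per_iteration(int burst)
{
    assert(burst >= 0);
    return burst != 0 ? burst : DEFAULT_MULTISEND;
}
```

Under this model the write/select ratio can only be tuned between 1:1 and 1000:1 through the burst value, and there is no way to change the fallback of 10 without editing the source.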

In summary, if you think that burst mode and multisend are the same thing and that using burst mode is the correct way to change the 10:1 ratio then this issue can be closed.

Hopefully I explained what the issue is; if it's still somehow confusing, feel free to ask more questions :).

olichtne mentioned this issue Aug 9, 2019

fix burst mode throttle checking #898

Merged

davidBar-On mentioned this issue Dec 2, 2020

iperf3 single-stream low bandwidth with small message sizes (1KB, 1500B, 2000B, etc.) #1078

Open
