Bad VXLAN Performance #1315

Closed
lephisto opened this issue Jan 21, 2018 · 5 comments

@lephisto

While testing the current master, I noticed a significant performance drop when using VXLAN for wired meshing instead of the old-fashioned way, which binds the wired mesh interface directly to bat0:

Test setup: TP-Link TL-WDR4300 v1 vs. Siemens Futro S550 (offloader), both connected to the same Gigabit Ethernet switch, VPN turned off on the WDR4300:

  1. Test with legacy wired mesh:

Initiated from WDR4300:

root@ffnoc12:~# batctl tp -t 10000 a6:07:f9:5a:71:db
Test duration 10010ms.
Sent 382825692 Bytes.
Throughput: 36.47 MB/s (305.95 Mbps)

Initiated from offloader:

root@ffnocoffloader:~# batctl tp -t 10000 46:3d:e1:c3:7f:63
Test duration 10020ms.
Sent 457679556 Bytes.
Throughput: 43.56 MB/s (365.41 Mbps)

  2. Same test with VXLAN:

Initiated from WDR4300:

root@ffnoc12:~# uci set network.mesh_wan.legacy='0'
root@ffnoc12:~# uci commit network
root@ffnoc12:~# /etc/init.d/network restart
root@ffnoc12:~# batctl tp -t 10000 a6:07:f9:5a:71:db
Test duration 10020ms.
Sent 30967956 Bytes.
Throughput: 2.95 MB/s (24.72 Mbps)

Initiated from offloader:

root@ffnocoffloader:~# batctl tp -t 10000 46:3d:e1:c3:7f:63
Test duration 10110ms.
Sent 47189196 Bytes.
Throughput: 4.45 MB/s (37.34 Mbps)
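The throughput lines printed by `batctl tp` follow from the byte count and the test duration. The sketch below assumes, based only on the numbers above and not on batctl's source, that "MB/s" counts 2^20 bytes per second and "Mbps" counts 10^6 bits per second; `tp_throughput` is a hypothetical helper name.

```python
def tp_throughput(sent_bytes: int, duration_ms: int) -> tuple[float, float]:
    """Return (MB/s, Mbit/s) for a throughput test.

    Assumption: "MB/s" counts 2**20 bytes and "Mbps" counts 10**6 bits;
    this is inferred from the reported values, not from batctl's source.
    """
    bytes_per_s = sent_bytes / (duration_ms / 1000)
    return bytes_per_s / 2**20, bytes_per_s * 8 / 1e6


# (description, bytes sent, test duration in ms) from the runs above.
RUNS = [
    ("legacy, from WDR4300", 382825692, 10010),
    ("legacy, from offloader", 457679556, 10020),
    ("VXLAN, from WDR4300", 30967956, 10020),
    ("VXLAN, from offloader", 47189196, 10110),
]

for label, sent, ms in RUNS:
    mb, mbit = tp_throughput(sent, ms)
    print(f"{label}: {mb:.2f} MB/s ({mbit:.2f} Mbps)")
```

With these unit definitions, the script reproduces the MB/s and Mbps values shown above for all four runs.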

For preventing domain shortcuts, VXLAN (or any option to authenticate the wired mesh) is a good idea, but for RF backbone purposes this is too slow.

@rotanid added the "0. type: bug" label Jan 21, 2018
@neocturne
Member

neocturne commented Jan 25, 2018

I have tested this with both iperf and batctl tp. I could reproduce the extreme performance drop with batctl, but not with iperf (or rather, with iperf the performance was fairly poor even in legacy mode). The reason is fragmentation: the packet size used by batctl's throughput meter is chosen so that it passes through a 1500-byte link without fragmentation, but it must be fragmented over the 1430-byte VXLAN link.
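The 1430-byte figure is consistent with VXLAN over an IPv6 underlay on a 1500-byte Ethernet link: the outer IPv6 (40 bytes), UDP (8) and VXLAN (8) headers plus the encapsulated Ethernet header (14) add 70 bytes. The sketch below does that arithmetic and counts how many link-layer packets a frame needs. The helper names and the `frag_header` parameter are hypothetical; the parameter stands in for a per-fragment header and is not batman-adv's actual fragment header size.

```python
import math

# VXLAN encapsulation overhead in bytes (outer headers + inner Ethernet).
OUTER_IPV4 = 20
OUTER_IPV6 = 40
UDP = 8
VXLAN = 8
INNER_ETHERNET = 14


def vxlan_mtu(underlay_mtu: int, ipv6: bool = True) -> int:
    """MTU left for the inner payload of a VXLAN interface."""
    outer_ip = OUTER_IPV6 if ipv6 else OUTER_IPV4
    return underlay_mtu - (outer_ip + UDP + VXLAN + INNER_ETHERNET)


def packets_needed(packet_len: int, link_mtu: int, frag_header: int = 0) -> int:
    """Number of packets a frame of packet_len bytes needs on a link.

    frag_header is a hypothetical per-fragment header size; 0 means the
    frame is simply split at the MTU.
    """
    if packet_len <= link_mtu:
        return 1  # fits as-is, no fragmentation needed
    return math.ceil(packet_len / (link_mtu - frag_header))


mtu = vxlan_mtu(1500)
print(mtu, packets_needed(1500, 1500), packets_needed(1500, mtu))  # 1430 1 2
```

A frame sized for exactly 1500 bytes thus turns into two packets on the VXLAN link, which then have to be reassembled on the receiving side.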

Therefore, the iperf numbers are more representative of real-world scenarios, since fragmentation is usually necessary in both VXLAN and legacy mode (or in neither, with proper MSS clamping). I have pushed a few optimizations for wired meshing (2950cc3 affects both legacy and VXLAN mode; a9edd43 and e54b37d slightly improve VXLAN performance). There will also be a follow-up to 7ae8a51 in a few days.
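MSS clamping avoids fragmentation by lowering the TCP maximum segment size so that full-size segments fit through the narrowest link. As a rough sketch: MSS = link MTU minus the IP and TCP headers (20 + 20 bytes for IPv4, 40 + 20 for IPv6, without options). The `encap_overhead` argument of the hypothetical `clamped_mss` helper stands in for any extra mesh encapsulation between client and link; its value is an assumption, not taken from Gluon or batman-adv.

```python
IP_HEADER = {4: 20, 6: 40}  # minimum IP header sizes in bytes
TCP_HEADER = 20             # TCP header without options


def clamped_mss(link_mtu: int, ip_version: int = 4, encap_overhead: int = 0) -> int:
    """Largest TCP MSS whose segments fit through link_mtu unfragmented.

    encap_overhead is a hypothetical allowance for encapsulation added
    inside the mesh (e.g. batman-adv headers) before the packet hits the link.
    """
    return link_mtu - encap_overhead - IP_HEADER[ip_version] - TCP_HEADER


print(clamped_mss(1500), clamped_mss(1500, 6))  # 1460 1440
print(clamped_mss(1430))                        # 1390, ignoring mesh overhead
```

Any real deployment would have to subtract its actual encapsulation overhead; the point is only that the clamp must be derived from the smallest MTU on the path.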

With all these patches applied (including the follow-up), I have measured the following numbers with iperf on a WDR3600 (using my notebook on the other side) without MSS clamping:

  • legacy RX: 96.9 Mbits/sec
  • legacy TX: 107 Mbits/sec
  • VXLAN RX: 56.5 Mbits/sec
  • VXLAN TX: 46.2 Mbits/sec

Reducing the MSS to avoid fragmentation, I get the following numbers:

  • legacy RX: 164 Mbits/sec
  • legacy TX: 149 Mbits/sec
  • VXLAN RX: 88.5 Mbits/sec
  • VXLAN TX: 74.3 Mbits/sec

So VXLAN undoubtedly costs performance, but it is by no means as bad as batctl tp might suggest. I will look into further optimization options (e.g. ip6tables, which is responsible for a considerable part of the performance drop).
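To put "by no means as bad" into numbers, the sketch below divides the VXLAN throughput by the legacy throughput for each measurement quoted in this thread (batctl tp from the original report, iperf from the WDR3600 runs). It is plain arithmetic on those figures; nothing is re-measured, and `vxlan_share` is a hypothetical helper name.

```python
# (legacy, VXLAN) throughput in Mbit/s, as quoted in this issue.
MEASUREMENTS = {
    "batctl tp, from WDR4300": (305.95, 24.72),
    "batctl tp, from offloader": (365.41, 37.34),
    "iperf RX, no MSS clamping": (96.9, 56.5),
    "iperf TX, no MSS clamping": (107.0, 46.2),
    "iperf RX, reduced MSS": (164.0, 88.5),
    "iperf TX, reduced MSS": (149.0, 74.3),
}


def vxlan_share(legacy: float, vxlan: float) -> float:
    """VXLAN throughput as a fraction of legacy throughput."""
    return vxlan / legacy


for name, (legacy, vxlan) in MEASUREMENTS.items():
    print(f"{name}: VXLAN reaches {vxlan_share(legacy, vxlan):.0%} of legacy")
```

This gives roughly 8 to 10% with batctl tp versus 43 to 58% with iperf. The two sets are only loosely comparable: the batctl runs used a WDR4300 before the optimizations, while the iperf runs used a WDR3600 with the patches applied and iperf running on the router itself.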

@rotanid
Member

rotanid commented Jan 26, 2018

Did you run iperf ON the WDR3600? If so, wasn't this test limited by the CPU load of iperf itself?
We always ran iperf only with real x86 machines on both ends.
If I'm on the wrong track, please ignore :-D

@neocturne
Member

This test was run with iperf on the WDR3600. This obviously lowered the measured throughput, but I only wanted to compare relative performance with and without VXLAN, not the maximum achievable throughput.

@neocturne
Member

With d87a798, all easily achievable throughput optimizations have been made. Firewall performance will be revisited after the next release.

@rotanid
Member

rotanid commented Mar 18, 2018

Maybe we could document how much the performance has improved compared to the measurements @lephisto and @NeoRaider did in January?
Or, the other way around, how much the performance still suffers compared to legacy meshing.


3 participants