Replies: 15 comments
-
Do you have a NIC that supports NIC offload?
… On Jul 1, 2019, at 23:43, jeffguorg ***@***.***> wrote:
System
OS: CentOS 7
Kernel: 3.10(stock) / 5.1.15(elrepo)
Libreswan Version: 3.10(stock)/3.25(built with spec)
Target & Current status
Make full use of the 10Gbps NIC.
But currently the bandwidth via IPsec is limited to 1.8Gbps, while the app and ksoftirqd are running at 100% CPU.
Attempted methods
Use a simple config with one SA. (Bandwidth is limited to 1.8 Gbps.)
Start multiple SAs and load-balance across them. (Works without problems, but we want a simpler solution.)
(current): trying to make nic-offload work.
I see Red Hat has already back-ported the nic-offload code to RHEL 7.5, and CentOS should also be patched. But the stock kernel config (/boot/config-xxxxx) has no ESP_OFFLOAD option, and ip xfrm state showed no "offload dev xxx" option.
So I installed the latest kernel from elrepo, which builds the offload part as a kernel module, and upgraded libreswan to 3.25. But still no luck, whether I modprobe esp4_offload or not.
-
Sure. We checked the product brief and datasheets. Intel X520 10Gbps NICs should support NIC offload and ESP encapsulation, and we have an X520 (82599ES). Refs on the Intel X520:
-
On Mon, 1 Jul 2019, jeffguorg wrote:
Sure. We checked the product brief and datasheets. Intel X520 10Gbps NICs should support NIC offload and ESP encapsulation, and we have an X520 (82599ES).
Is there any way I can check whether it supports that in the system? I scanned through dmesg but didn't recognize anything useful.
ref on Intel X520:
https://www.intel.com/content/www/us/en/embedded/products/networking/82599-10-gbe-controller-datasheet.html?asset=2377
https://www.kernel.org/doc/html/latest/networking/device_drivers/intel/ixgbe.html
Please check with libreswan 3.28 or 3.29, as those versions print the support
detected from the NIC or kernel more clearly.
You should see a message:
Kernel supports NIC esp-hw-offload
And for the nic you should see a message about esp-hw-offload as well.
It should also be visible in "ipsec status".
with plutodebug=all you should also see a message:
dbg("NIC esp-hw-offload offload for connection '%s' enabled on interface %s",
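Beyond libreswan's own log messages, kernel and driver support can also be checked directly. A diagnostic sketch (the interface name ens1f0 is taken from output elsewhere in this thread; adjust for your system):

```shell
# Kernel side: ESP offload support must be built in or as a module (y or m)
grep -E 'CONFIG_XFRM_OFFLOAD|CONFIG_INET_ESP_OFFLOAD' "/boot/config-$(uname -r)"

# Driver side: ethtool should list esp-hw-offload among the device features
ethtool -k ens1f0 | grep esp-hw-offload

# If the offload glue is built as a module, load it explicitly
modprobe esp4_offload
```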
-
Weird... I pulled the source code from the master branch and did make install, and checked ip xfrm state. After I set nic-offload to auto, there are two lines that say
I did some research and found the NIC supports only aes_gcm128-null for ESP encapsulation, and that is exactly what we use in the config.
-
Try setting replay-window=0
Paul
… On Jul 2, 2019, at 22:50, jeffguorg ***@***.***> wrote:
Weird...
I pulled the source code from the master branch and did make install.
With nic-offload set to no, the bandwidth gets to 4Gbps, nearer to our goal.
But the bandwidth drops to 27M when I set nic-offload to yes or auto.
I did some research and found the NIC supports only aes_gcm128-null for ESP encapsulation, and that is exactly what we use in the config.
-
Just a minor improvement: 27.1M -> 28.4M. /proc/net/xfrm_state is all zeros.
-
On Tue, 2 Jul 2019, jeffguorg wrote:
just minor improvement: 27.1M -> 28.4M
Odd, there must be other issues. Do you have a proper max_cpu_qlen set?
See dmesg. It should be at least 1000.
does "ip xfrm state" show the offload is part of the IPsec SA ?
-
Hi, we tested some configurations yesterday but did not make any progress. We just confirmed that the issue occurs when nic-offload is enabled on the sending side, and collected some logs and output.
Sure.
Yes, it looks like this:
Another weird thing I found is that in pluto.log, it says:
but actually ethtool recognizes esp-hw-offload:
One of my colleagues guessed it's related to the igb driver. Is that possible? Here are the pluto debug log and other system info you may need for diagnosis; tell me if you need anything else:
-
On Thu, 4 Jul 2019, jeffguorg wrote:
does "ip xfrm state" show the offload is part of the IPsec SA ?
yes. it looks like this:
src 10.3.0.2 dst 10.3.0.1
proto esp spi 0x262e4754 reqid 16389 mode tunnel
replay-window 32 flag af-unspec
aead rfc4106(gcm(aes)) 0xed2897c7401393ea4fb79d72b838729ddbc22bd4 128
anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
crypto offload parameters: dev ens1f0 dir in
src 10.3.0.1 dst 10.3.0.2
proto esp spi 0x777af742 reqid 16389 mode tunnel
replay-window 32 flag af-unspec
aead rfc4106(gcm(aes)) 0xe84b019facd525d9cf0b91ae7da8f3968db877f3 128
anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
crypto offload parameters: dev ens1f0 dir out
Okay, so that looks to be working. So it seems this is either an OS
tuning matter or a driver issue.
another weird thing i found is that in pluto.log, it says:
...
9:48:26.832458: adding interface ens1f0/ens1f0 (esp-hw-offload not supported by kernel) 10.3.0.1:500
That was a bug in our printing, you can ignore that.
Paul
-
I still see the replay window being 32. I'm not sure whether that is a display error in the kernel or not. If you set it to 0, it should be disabled. Alternatively, perhaps set it to 1024? If you get out-of-order packets and only have a window of 32, things could go very slowly indeed. Although if you test with iperf or something, it should show you all the packet drops.
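For reference, both knobs are per-conn settings in ipsec.conf (a sketch; the conn name is hypothetical):

```
conn mytunnel
    replay-window=0      # disable anti-replay checking entirely
    # or widen the window instead:
    # replay-window=1024
```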
-
Hi. The two machines (actually it's a Supermicro Twin server, two servers in a single chassis) are now directly connected by optical fiber, without any intermediate devices like switches or routers. It's a testing environment in our office, so there should not be out-of-order packets, at least not many. Still, I set the replay-window to 1024, but it did not change anything.
-
You are using "mode tunnel"? I noticed it in your outputs. I recall Shannon mentioning these cards only support transport mode. Note "transport" in the blog you mentioned; your "ref on Intel X520" also mentions only transport mode. He mentioned the card may support tunnel mode on receive, but never mentioned sending. Here is his Netdev 2017 talk; he also gave an update at the Linux IPsec workshop the following year, and there is no tunnel mode there either. https://youtu.be/AOEpCRsBTIo?t=1507 (~25:00) I guess better error reporting would be nice. As of 2017/2018 tunnel mode was a "todo" item, with low probability of being finished. https://netdevconf.org/2.2/slides/klassert_ipsec_workshop02.pdf #17 PS: No operational experience here, just slideware and listening to talks.
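In libreswan this is controlled per-conn (a sketch; the conn name is hypothetical):

```
conn mytunnel
    type=transport       # default is type=tunnel; the X520 offload
                         # reportedly only handles transport mode
```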
-
Hey man, THANKS A LOT, you saved my ass. This problem looked like voodoo to me. When I set type=transport and checked that ip xfrm state shows transport mode, the bandwidth directly runs at 9.1Gbps or a little higher. I'd better figure out what the difference between them is. @antonyantony @letoams Edit: and yes... it's weird that neither libreswan nor the kernel reported any errors. I will read the kernel docs and see if I can find anything.
-
The weird issue just came back after I set up a GRE tunnel. It runs 5.x Gbps without nic-offload and 9.x Gbps with nic-offload; this part of the setup is perfect. But transport mode doesn't support routing, right? I need to route packets between two subnets, so I set up a GRE tunnel over IPsec, with NetworkManager. Then the bandwidth on the tunnel dropped to 20~30 Mbps again. 😭 The config looks like this:
LOCAL and REMOTE are the IPsec gateways. The MTU on GRE is automatically set to 1280, while it's around 1480 or 1500 on IPsec. That looks normal to me. top doesn't show any busy process or thread.
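For comparison, the same GRE-over-IPsec layering with plain iproute2 commands (a sketch; the gateway addresses are the ones from this thread, while the tunnel name, inner addresses, and routed subnet are hypothetical):

```shell
# GRE tunnel between the two IPsec gateways; transport-mode ESP then
# protects the GRE packets exchanged between 10.3.0.1 and 10.3.0.2
ip tunnel add gre1 mode gre local 10.3.0.1 remote 10.3.0.2 ttl 64
ip addr add 192.168.100.1/30 dev gre1
ip link set gre1 up mtu 1400          # leave headroom for GRE + ESP overhead
ip route add 172.16.2.0/24 dev gre1   # send the remote subnet via the tunnel
```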
-
On Tue, 16 Jul 2019, jeffguorg wrote:
it runs 5.x Gbps without nic-offload and 9.x Gbps with nic-offload. this part of setup is perfect.
but transport mode doesn't support routing, right? I need to route packets between two subnets, so I set up a GRE tunnel over IPsec, with
NetworkManager. Then the bandwidth on the tunnel dropped to 20~30 Mbps again.
That's right, and people tend to use GRE on top to handle it. Did you
slightly lower your MTU to account for the smaller packet size
due to the GRE layer?
Paul
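A back-of-the-envelope budget for the inner MTU (a sketch; exact ESP overhead varies with padding and alignment):

```shell
OUTER_IP=20    # outer IPv4 header
ESP_HDR=8      # SPI + sequence number
ESP_IV=8       # AES-GCM IV
ESP_TRAILER=2  # pad-length + next-header bytes (plus 0-3 alignment pad)
ESP_ICV=16     # AES-GCM integrity tag
GRE=4          # basic GRE header without key/checksum options
echo $((1500 - OUTER_IP - ESP_HDR - ESP_IV - ESP_TRAILER - ESP_ICV - GRE))
# prints 1442; rounding down further (e.g. mtu 1400) also absorbs ESP padding
```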
-
System
OS: CentOS 7
Kernel: 3.10 (stock) / 5.1.15 (elrepo)
Libreswan Version: 3.10 (stock) / 3.25 (built with spec)
Hardware: Xeon 2620v4 + X520-SR2
Target & Current status
Make full use of the 10Gbps NIC. But currently the bandwidth via IPsec is limited to 1.8Gbps, while the app and ksoftirqd are running at 100% CPU. And I just can't figure out how to reach the
5.25 Gbits/sec IPsec AES_GCM128 (esp=aes_gcm128-null)
figure in the wiki.
Attempted methods
I see Red Hat has already back-ported the nic-offload code to RHEL 7.5, and CentOS should also be patched. But the stock kernel config (/boot/config-xxxxx) has no ESP_OFFLOAD option, and
ip xfrm state
showed no "offload dev xxx" option. So I installed the latest kernel from elrepo, which builds the offload part as a kernel module, and upgraded libreswan to 3.25. But still no luck, whether I modprobe esp4_offload or not. The config used here is
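For completeness, a minimal ipsec.conf sketch matching the setup described above (the conn name and authby= are assumptions; the addresses and esp= line come from this thread):

```
conn offloadtest
    left=10.3.0.1
    right=10.3.0.2
    authby=secret        # assumption; any supported authby works
    esp=aes_gcm128-null
    nic-offload=auto     # requires kernel ESP offload + driver support
    auto=start
```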