[Windows] Incorrect checksum after DNAT + SNAT #231

Closed
lzhecheng opened this issue Oct 22, 2021 · 1 comment

Comments

@lzhecheng

The OVS pipeline: When Antrea implements the Kubernetes ClusterIP Service (hostNetwork mode) on Windows, it adds DNAT and SNAT flow entries to the OVS pipeline.

The bug can be reproduced when a client on a Windows Node curls a Service whose endpoint is on a Linux Node. The client sends a packet into the OVS pipeline through a port and receives the reply from that port. The request entering the OVS pipeline goes through DNAT and SNAT, and the reply goes through the reverse translations (d-SNAT and d-DNAT). We found that the reply packet on the Windows host has an incorrect checksum (the field holds the pseudo checksum), while the checksum is correct on the Linux Node.

I attached the Wireshark capture of the packets taken on the OVS port of the Windows host.

Request:
192.168.251.1 -> 10.110.225.146
(DNAT + SNAT)
169.254.169.253 -> 10.176.26.107

Reply:
10.176.26.107 -> 169.254.169.253
(d-SNAT + d-DNAT)
10.110.225.146 -> 192.168.251.1 <==== checksum is incorrect
clusterip2.pcapng.zip
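
To illustrate why a receiver rejects a segment whose checksum field holds only the pseudo checksum, here is a minimal Python sketch. It is not taken from OVS or Antrea: the ports, sequence numbers, and payload are made up, the helper names are hypothetical, and it assumes the offload convention in which the sender leaves the uncomplemented pseudo-header sum (addresses, protocol, TCP length) in the checksum field for the NIC to complete.

```python
import ipaddress
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of data, zero-padded to an even length."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src: str, dst: str, tcp_len: int) -> bytes:
    """IPv4 pseudo header: source, destination, zero byte, protocol 6, TCP length."""
    return (ipaddress.IPv4Address(src).packed
            + ipaddress.IPv4Address(dst).packed
            + struct.pack("!BBH", 0, 6, tcp_len))


def full_checksum(src: str, dst: str, segment: bytes) -> int:
    """Complete TCP checksum; segment must carry a zeroed checksum field."""
    data = pseudo_header(src, dst, len(segment)) + segment
    return ~ones_complement_sum(data) & 0xFFFF


def pseudo_only(src: str, dst: str, tcp_len: int) -> int:
    """Partial value left in the field when checksum offload is expected."""
    return ones_complement_sum(pseudo_header(src, dst, tcp_len))


def verify(src: str, dst: str, segment: bytes) -> bool:
    """Receiver-side check: the sum over pseudo header and segment must be 0xFFFF."""
    data = pseudo_header(src, dst, len(segment)) + segment
    return ones_complement_sum(data) == 0xFFFF


def build_segment(checksum: int, payload: bytes = b"hello") -> bytes:
    """Hypothetical 20-byte TCP header (ports 80 -> 50000) followed by payload."""
    header = struct.pack("!HHIIBBHHH", 80, 50000, 1, 1, 5 << 4, 0x18,
                         8192, checksum, 0)
    return header + payload


# Addresses of the reply after the reverse NAT, as listed in the report.
SRC, DST = "10.110.225.146", "192.168.251.1"
zeroed = build_segment(0)
good = build_segment(full_checksum(SRC, DST, zeroed))
partial = build_segment(pseudo_only(SRC, DST, len(zeroed)))

print(verify(SRC, DST, good))     # True
print(verify(SRC, DST, partial))  # False
```

A segment in the second state is only valid if some later stage (normally the transmitting NIC) finishes the computation, which does not happen for a packet that is being delivered to the local host.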

ovsrobot pushed a commit to ovsrobot/ovs that referenced this issue Oct 22, 2021
…load case

While testing OVS-Windows flows for the DNAT/SNAT actions, the checksum in the
TCP header is set incorrectly when TCP offload is enabled (the default). As a
result, the Windows VM drops a packet received from the Linux VM even though
that packet arrived with a correct checksum. On the Windows VM the packet goes
through two NAT actions, and the OVS Windows kernel resets the checksum to the
PseudoChecksum, losing the original correct checksum value that was set
outside.

As for the NAT TCP/UDP checksum reset logic, it should reset the TCP checksum
to the PseudoChecksum value only in the Tx direction for the TCP offload case.
For a packet from the outside, the OVS Windows kernel does not need to reset
the TCP/UDP checksum, as it is the job of the receiving network driver to
produce a correct checksum value.

>>> Sample flow with the default configuration on both the Windows VM and the Linux VM:
(src=192.168.252.1, dst=10.110.225.146) --> dnat/snat --> (src=169.254.169.253,
dst=10.176.26.107). Without the fix, the returning packet (src=10.176.26.107,
dst=169.254.169.253) arrives with the correct TCP checksum. After the reverse
NAT actions it becomes the packet (src=10.110.225.146, dst=192.168.252.1), but
with the incorrect TCP checksum 0xa97a, which is the PseudoChecksum. The
related packet capture is attached to the reported issue below.

Reported-at:openvswitch/ovs-issues#231
Signed-off-by: Wilson Peng <pweisong@vmware.com>
Signed-off-by: 0-day Robot <robot@bytheb.org>
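
The distinction the commit message draws between the Tx and Rx directions can be sketched in Python as follows. This is an illustration under stated assumptions, not the OVS Windows datapath code: `nat_checksum` and its parameters are hypothetical names, and the Rx branch uses the generic incremental update from RFC 1624, a standard NAT technique that the commit message itself does not describe. The TCP header fields and payload are made up.

```python
import ipaddress
import struct


def fold(total: int) -> int:
    """Fold carries back in until the value fits in 16 bits."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def words(addr: str):
    """Split an IPv4 address into two 16-bit big-endian words."""
    return struct.unpack("!HH", ipaddress.IPv4Address(addr).packed)


def incremental_update(checksum: int, old_addr: str, new_addr: str) -> int:
    """RFC 1624 equation 3: HC' = ~(~HC + ~m + m') for each rewritten word."""
    total = ~checksum & 0xFFFF
    for old, new in zip(words(old_addr), words(new_addr)):
        total += (~old & 0xFFFF) + new
    return ~fold(total) & 0xFFFF


def full_checksum(src: str, dst: str, segment: bytes) -> int:
    """Reference TCP checksum; segment must carry a zeroed checksum field."""
    pseudo = (ipaddress.IPv4Address(src).packed
              + ipaddress.IPv4Address(dst).packed
              + struct.pack("!BBH", 0, 6, len(segment)))
    data = pseudo + segment + (b"\x00" if len(segment) % 2 else b"")
    return ~fold(sum(w for (w,) in struct.iter_unpack("!H", data))) & 0xFFFF


def nat_checksum(checksum, rewrites, is_tx, offload_enabled, pseudo_sum):
    """Hypothetical policy: leave the pseudo sum only when the NIC will finish it."""
    if is_tx and offload_enabled:
        return pseudo_sum  # Tx with offload: hardware completes the checksum
    for old, new in rewrites:  # Rx (or no offload): keep the checksum valid
        checksum = incremental_update(checksum, old, new)
    return checksum


# Reply from the sample flow, with a made-up TCP header and payload.
segment = struct.pack("!HHIIBBHHH", 80, 50000, 1, 1, 5 << 4, 0x18,
                      8192, 0, 0) + b"hello"
received = full_checksum("10.176.26.107", "169.254.169.253", segment)
rewrites = [("10.176.26.107", "10.110.225.146"),
            ("169.254.169.253", "192.168.252.1")]
fixed = nat_checksum(received, rewrites, is_tx=False,
                     offload_enabled=True, pseudo_sum=0)

print(fixed == full_checksum("10.110.225.146", "192.168.252.1", segment))  # True
```

The incremental form touches only the rewritten address words, so it does not need the payload; the full recomputation is included here only to check the result.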
aserdean pushed a commit to openvswitch/ovs that referenced this issue Oct 28, 2021
…load case

Reported-at:openvswitch/ovs-issues#231
Signed-off-by: Wilson Peng <pweisong@vmware.com>
Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
@aserdean
Member

Fixed with: openvswitch/ovs@56c3de3

lzhecheng added a commit to lzhecheng/antrea that referenced this issue Nov 8, 2021
It includes AntreaProxy related fixes.
openvswitch/ovs-issues#229
openvswitch/ovs-issues#231

Signed-off-by: Zhecheng Li <lzhecheng@vmware.com>
tnqn pushed a commit to antrea-io/antrea that referenced this issue Nov 9, 2021
It includes AntreaProxy related fixes.
openvswitch/ovs-issues#229
openvswitch/ovs-issues#231

Signed-off-by: Zhecheng Li <lzhecheng@vmware.com>
n-sandeep pushed a commit to ipdk-io/ovs that referenced this issue Jul 12, 2022
…load case

Reported-at:openvswitch/ovs-issues#231
Signed-off-by: Wilson Peng <pweisong@vmware.com>
Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>