Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UDP6 checksum problem #1832

Closed
sconemad opened this issue Feb 7, 2017 · 32 comments
Closed

UDP6 checksum problem #1832

sconemad opened this issue Feb 7, 2017 · 32 comments

Comments

@sconemad
Copy link

sconemad commented Feb 7, 2017

I've found a problem in the way that the UDP checksum is calculated for outgoing UDP packets over IPv6 (I'm using the latest up-to-date Raspbian on a Pi 3 if that makes any difference)

I'm testing an application on the Pi which uses UDP transport over IPv6, sending datagrams to another machine. I've noticed that occasionally a datagram is sent from the Pi which is never received by the application on the other machine, even when repeatedly re-transmitted. Using wireshark, I was able to see the UDP packets arriving on the other machine, and noticed that the "missing" packets had a UDP checksum of 0.

According to RFC 2460, checksums are not optional for UDP packets over IPv6 like they are for IPv4, so the value 0 is invalid (in the case where the checksum is calculated to be 0, it's supposed to be replaced with 0xFFFF). I believe this is why the other machine is discarding these packets. Since the checksum is a 16 bit number, this would happen on average for 1 in every 65536 packets.

I've written a small test program in c to demonstrate this issue (attached as udpcs.c.txt).
You'll need a Pi with Raspbian, and another machine (e.g. Linux x86), connected over Ethernet with IPv6 supported.
Copy the test program onto both machines and build with:
gcc -o udpcs udpcs.c

On the other (non-pi) machine, run ./udpcs. This will start a simple UDP6 echo server.
On the Pi, run ./udpcs SOURCE DEST

Where SOURCE is the IPv6 address of the Pi, and DEST is the IPv6 address of the other machine. This will send test UDP6 datagrams to the server, consisting of a sequence number and random data, and wait for the datagram to be echoed back, retrying if there is no reply within 1 second. It also calculates and reports the UDP checksum for each packet.

When I run this, I get a steady stream of datagrams being sent and received, but at some point it will stop and keep retrying, where the packet checksum is 0xFFFF. Looking at the output on the other machine, you can see the packet in question is never received.

Here's an example of what I get on the client:

...
SEND(seq=18103 checksum=0d62 size=128) ... RECV(seq=18103 checksum=0d62 size=128)
SEND(seq=18104 checksum=6c9c size=128) ... RECV(seq=18104 checksum=6c9c size=128)
SEND(seq=18105 checksum=f37e size=128) ... RECV(seq=18105 checksum=f37e size=128)
SEND(seq=18106 checksum=bee2 size=128) ... RECV(seq=18106 checksum=bee2 size=128)
SEND(seq=18107 checksum=ffff size=128) ... retrying due to timeout
SEND(seq=18107 checksum=ffff size=128) ... retrying due to timeout
SEND(seq=18107 checksum=ffff size=128) ... retrying due to timeout
...repeats...

And on the server:

...
RECV(seq=18103 size=128) ... SEND(size=128)
RECV(seq=18104 size=128) ... SEND(size=128)
RECV(seq=18105 size=128) ... SEND(size=128)
RECV(seq=18106 size=128) ... SEND(size=128)
...stops here...

(obviously, the exact point where it stops will vary, since the UDP checksum depends on the source and destination IPv6 addresses and the random data).

By default on the Pi, UDP checksums are calculated on the NIC before sending (using checksum offloading), which can be disabled by installing "ethtool" and running sudo ethtool --offload eth0 rx off tx off. Doing this makes the problem go away, which would indicate that the NIC's checksum calculation algorithm is at fault. Does anyone know where this lives, and how to go about getting it fixed?

udpcs.c.txt

@P33M
Copy link
Contributor

P33M commented Feb 7, 2017

If it's a function of a setting of the onboard NIC, then have you tried using a different USB-ethernet adapter?

There are multiple chipsets available on the open market from different manufacturers. This would tell you for definite if it's a function of hardware.

@popcornmix
Copy link
Collaborator

Do you believe checksums are correct except the case where the checksum is 0x0000 which should be replaced by 0xffff? Or are you getting many spurious 0x0000 checksums?

@Ferroin
Copy link
Contributor

Ferroin commented Feb 7, 2017

@P33M if he fixed it using ethtool, it's either the hardware, the firmware for the NIC (I don't recall whether the chip the Pi's use has loadable firmware), or the driver. In any of the three cases, if he gets an adapter with a different chipset, I can almost guarantee it will work (the almost referring to the possibility that he happens to find a different chipset that has exactly the same behavior).

@popcornmix
Copy link
Collaborator

I suspect the issues lies in here:
https://github.com/raspberrypi/linux/blob/rpi-4.4.y/drivers/net/usb/smsc95xx.c#L1894

I'm not sure if it needs a check for 0x0000 and replacement with 0xffff.

@Ferroin
Copy link
Contributor

Ferroin commented Feb 7, 2017

Possibly. I hadn't noticed the ethtool command used before, but that implies heavily that it's hardware or firmware, because that particular ethtool command disables checksum offloading.

@sconemad Do things work when you only disable offloading for transmit, or do you have to disable both to get it to work?

@JamesH65
Copy link
Contributor

JamesH65 commented Feb 7, 2017

Changing the smsc driver there will only fix for packets < length 45 as they use a SW checksum, otherwise I presume the HW inserts the checksum on transmit, after this function. IIR the code correctly!

Might be worth hacking it to always use SW checksum to see if it fixes the problem.

@popcornmix
Copy link
Collaborator

I changed the if (skb->len <= 45) to if (1) and it still failed.

@Ferroin
Copy link
Contributor

Ferroin commented Feb 7, 2017

The driver seems to default to having TX checksum offloading enabled, which would suggest that the if (skb->len <= 45) may be pointless to begin with.

@popcornmix any chance you could try that modified kernel and then run sudo ethtool --offload eth0 rx off tx off before doing the actual test? If that works, then that points at a possibly separate issue in the driver where it's not preventing packets it's checksummed in SW from being re checksummed in HW.

@JamesH65
Copy link
Contributor

JamesH65 commented Feb 7, 2017

@dom presumably you also put in a if (sum == 0) sum = 0xffff thingy in the SW calc section as well?

@Ferroin If checksums are enabled, won't it always get to that if < 45 section?

@popcornmix
Copy link
Collaborator

I'm just testing the replacement of 0 with 0xffff and it is running longer.
Just testing the SW section currently - need to check if HW block needs the same.

@Ferroin
Copy link
Contributor

Ferroin commented Feb 7, 2017

@JamesH65 If however the hardware is still checksumming the packets that would match that section for some reason, then there's no point in having the section as-is, since the hardware would be stomping on the software checksum anyway. Looking at the code again, if that is the case, then it's another (possibly related, possibly unrelated) hardware bug.

I could of course be completely off-base on this, most of my expertise regarding kernel code is very high-level stuff, not drivers...

@sconemad
Copy link
Author

sconemad commented Feb 7, 2017

@popcornmix Yes, I think the problem is only with packets where the checksum is calculated to be 0, which should be replaced with 0xFFFF.

@popcornmix
Copy link
Collaborator

@Ferroin This patch fixes the problem with hardware checksums enabled (the default set up), but obviously wastes cpu always calculating crcs.
But it suggests this code is having an effect (i.e. presumably is called after HW checksum has occurred), or at least the HW is not overwriting this value.

Unfortunately I can't find the checksum in the not skb->len <= 45 code path. The same offset doesn't contain a value matching the software computed value, and trying to change 0->0xffff doesn't fix the bug.
Dumping out 64 bytes from skb->data doesn't allow me to spot the checksum, so I'm a bit stumped about exactly what is going on in this code.

With the patch present, running
sudo ethtool --offload eth0 rx off tx off
doesn't change behaviour - it is still happy.

@popcornmix
Copy link
Collaborator

BTW: I have emailed the author of this code and he has pinged microchip so perhaps we will get some advice from them.

@sconemad
Copy link
Author

sconemad commented Feb 7, 2017

@Ferroin Yes, I only need to disable transmit offloading to make it work. I guess this means that the receive offloading works, since the returned datagram is going to have the same checksum in my test code, since only the source and destination addresses are flipped.

Thanks for looking into this everyone. Please let me know if you'd like me to test anything.

@Electron752
Copy link
Contributor

@sconemad: Is this for personal use? I went to Microchip's website and they have a newer driver but you have to agree to some legal terms to download which I think includes non-disclosure. I did download it and briefly looked at it.

If you aren't planning to distribute the source code or this is for a closed source device, you can probably just use Microchip's driver. Or at least test it and see if it fixes the issue.

Otherwise, you will probably just have to wait for the original author to fix it. The product appears to be now called LAN9514.

http://www.microchip.com/wwwproducts/en/LAN9514
Just look at the software library tab.

@JamesH65
Copy link
Contributor

JamesH65 commented Feb 8, 2017

My understanding of the code is that the line 1934

if (csum)
tx_cmd_b |= TX_CMD_B_CSUM_ENABLE;

Adds a flag to the packet to enable the HW checksum to be added later. Which is why you cannot see it in this function, and cannot correct it in this function. I have no idea how to get round that - it would appear that this is a firmware issue and will need to be fixed there.

@JamesH65
Copy link
Contributor

JamesH65 commented Feb 8, 2017

I should clarify; if we force SW checksums all the time we can fix it here (as per Dom's change), but if we want to use HW checksums, I suspect that needs a firmware change.

If no firmware change, it's a trade off between added CPU burden, vs losing a packet and requiring a retry on average every 65k. Might be a sweet spot dependent on packet size. ie over a particular size, use HW anyway and take the chance of a bad checksum, because the CPU burden is too high the rest of the time.

@NWilson
Copy link

NWilson commented Feb 8, 2017 via email

@sconemad sconemad closed this as completed Feb 8, 2017
@sconemad sconemad reopened this Feb 8, 2017
@sconemad
Copy link
Author

sconemad commented Feb 8, 2017

(sorry, pressed the wrong button!)

I was basically going to say what @NWilson said above (he is the one who originally discovered this issue BTW).

This was found in testing RealVNC Server (included in Raspbian), so we're quite keen to see a general solution for this, although it's nice to know that there are ways to workaround it.

@Ferroin
Copy link
Contributor

Ferroin commented Feb 8, 2017

If someone could do some testing to get numbers both on how much forcing SW checksumming would impact network throughput and how much CPU load it would impose, that would be good.

@Electron752
Copy link
Contributor

@Ferroin - That kind of test may actually be a bunch of work to write. Lots of different configurations to test, etc. udp performance is hard to test anyway.

At this point, maybe it is best to wait and see what the driver author and Microchip say about this. I know Microchip is generally very open about hardware and firmware issues. They are also constantly tweaking the devices so this issue may in fact only happen on certain revisions or it may happen in only very specific cases which are easy to detect and correct in the driver. If it is indeed a firmware issue instead of a silicon issue, it may even be possible to update the firmware.

@sconemad
Copy link
Author

sconemad commented Feb 9, 2017

I'm not sure how helpful this is, but I was able to do some fairly simple tests by modifying my test program to send and receive 100,000 x 1000 byte datagrams, and measuring the run time - which came out to be around 1 minute. (Luckily I didn't get any packets where the 0 checksum)

Running it quite a few times with checksum offloading on and off, I wasn't able to detect any significant difference in any of the timings.

@JamesH65
Copy link
Contributor

JamesH65 commented Feb 9, 2017 via email

@popcornmix
Copy link
Collaborator

This is on a Pi1 at 700MHz

pi@domnfs:~ $ sudo ethtool --offload eth0 rx on tx on
pi@domnfs:~ $ dd if=/dev/zero bs=100M count=1 | nc -v -v -n 192.168.4.43 2000 & top -d30 -b -n2 | grep Cpu | tail -1
%Cpu(s):  0.9 us, 42.2 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 56.9 si,  0.0 st
104857600 bytes (105 MB) copied, 33.3049 s, 3.1 MB/s
pi@domnfs:~ $ 
pi@domnfs:~ $ dd if=/dev/zero bs=100M count=1 | nc -v -v -n 192.168.4.43 2000 & top -d30 -b -n2 | grep Cpu | tail -1
%Cpu(s):  1.4 us, 41.9 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 56.7 si,  0.0 st
104857600 bytes (105 MB) copied, 34.8077 s, 3.0 MB/s
pi@domnfs:~ $ sudo ethtool --offload eth0 rx off tx off
pi@domnfs:~ $ dd if=/dev/zero bs=100M count=1 | nc -v -v -n 192.168.4.43 2000 & top -d30 -b -n2 | grep Cpu | tail -1
%Cpu(s):  1.2 us, 41.2 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 57.7 si,  0.0 st
104857600 bytes (105 MB) copied, 34.7566 s, 3.0 MB/s
pi@domnfs:~ $ dd if=/dev/zero bs=100M count=1 | nc -v -v -n 192.168.4.43 2000 & top -d30 -b -n2 | %Cpu(s):  1.4 us, 41.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 57.3 si,  0.0 st
104857600 bytes (105 MB) copied, 34.9811 s, 3.0 MB/s

So with HW checksums on we get:
3.1 MB/s with CPU 42.2% (system) and 0.9% (user)
3.0 MB/s with CPU 41.9% (system) and 1.4% (user)

So with SW checksums on we get:
3.0 MB/s with CPU 41.2% (system) and 1.2% (user)
3.0 MB/s with CPU 41.3% (system) and 1.4% (user)

So I'm not measuring a difference with this test.

@njh
Copy link

njh commented Apr 2, 2017

I have just discovered a related issue that IPv6 UDP checksum calculation is failing for very small packets (see #1944). Switching off hardware offloading using ethtool fixes the problem.

Looks like the LAN9514 is a bit dodgy for IPv6...

@popcornmix
Copy link
Collaborator

From other issue:
Microchip has said they will be submitting a patch for #1832 (and a couple of other issues) upstream within about a month.

@njh
Copy link

njh commented May 3, 2017

Any news on a fix from Microchip?

@popcornmix
Copy link
Collaborator

No news. I've pinged them - I'll report here when there is any news.

@popcornmix
Copy link
Collaborator

Update: "The work is ongoing. Should see the patch next week."

popcornmix added a commit to raspberrypi/firmware that referenced this issue May 9, 2017
kernel: irq_bcm2836: Send event when onlining sleeping cores
kernel: ARM: dts: bcm283x: Reserve first page for firmware
See: raspberrypi/linux#1989

kernel: smsc95xx: Avoid HW TX CSUM for IPV6
See: raspberrypi/linux#1832
popcornmix added a commit to Hexxeh/rpi-firmware that referenced this issue May 9, 2017
kernel: irq_bcm2836: Send event when onlining sleeping cores
kernel: ARM: dts: bcm283x: Reserve first page for firmware
See: raspberrypi/linux#1989

kernel: smsc95xx: Avoid HW TX CSUM for IPV6
See: raspberrypi/linux#1832
@popcornmix
Copy link
Collaborator

Patch from microchip has been submitted to netdev mailing list.
I have cherry-picked it here: a9ad627
and latest rpi-update kernel includes this patch.
Please test this and let me know if you are happy. It appeared to fix sconemad's udpcs test for me.
My understanding is that is disables HW TX checksums for IPV6.

@sconemad
Copy link
Author

Tested with kernel 4.9.27-v7+ #997 SMP Tue May 9 19:58:37 BST 2017 obtained using rpi-update.

Both my udpcs test app and RealVNC server no longer encounter this problem, so this appears to have fixed it.

Thanks!

@P33M P33M closed this as completed May 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants