Raspberry Pi 4: bcmgenet ethernet does not send packets with Linux 5.6.2 #47

Closed
stapelberg opened this issue Apr 8, 2020 · 32 comments

@stapelberg

[Filing this as a separate issue to avoid derailing issue #43. Please forgive me if this is not the right place, but I figured y’all would know the most about ethernet on the Raspberry Pi 4 with the upstream linux kernel.]

I’m running upstream Linux 5.6.2 on a Raspberry Pi 4 Model B (in 64-bit mode) and I’m having some trouble getting ethernet to work.

The link comes up, and packets are received: I can see the packets in tcpdump, and the link gets an IPv6 address based on router advertisements.

However, any packets that are sent (I can see them in tcpdump) are not seen on the network by other devices.

Here’s what I have tried/checked so far:

  1. I’m using the official Raspberry Pi USB-C power supply.
  2. I tried swapping ethernet cables.
  3. I tried connecting the Raspberry Pi to my laptop directly. Running tcpdump on my laptop, I cannot see any packets from the Raspberry Pi (the other direction works).
  4. I tried applying kernel patch raspberrypi/linux@049c983, but that only resulted in “hw csum failure” kernel error messages: https://gist.github.com/stapelberg/4fb444d0b8460179ea99b336560afd44
  5. I tried applying the patch in point 4 and disabling tx/rx checksum offloading in the kernel. That eliminated the “hw csum failure” error messages, but the symptom of not being able to send packets remains the same.
  6. I have verified that 2020-02-13-raspbian-buster-lite.img works as expected with my setup: it can receive and send ethernet packets. I’m concluding it’s most likely a kernel driver issue, not a hardware issue.
  7. I tried just copying the *.[ch] files from raspberrypi/linux into linux 5.6.2, but the files have changed too much for me to get them to compile.
  8. I’m using the most recent firmware (raspberrypi/firmware@c2c6ce8)
  9. I’m using the most recent EEPROM (raspberrypi/rpi-eeprom@a5be2ff)
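
As a side note on item 5: the checksum offload state can also be inspected at runtime instead of in the kernel source. The following is a minimal sketch, not what was done in this report. It assumes `ethtool` is available on the system, and the helper name `csum_offload_state` is made up for illustration. It parses the output of `ethtool -k <interface>` and prints the two top-level checksumming settings.

```shell
#!/bin/sh
# csum_offload_state: read `ethtool -k IFACE` output on stdin and print the
# state of rx-checksumming and tx-checksumming as "name=state" lines.
# (Hypothetical helper; indented sub-features are deliberately ignored.)
csum_offload_state() {
    awk -F': *' '
        $1 == "rx-checksumming" || $1 == "tx-checksumming" {
            split($2, state, " ")   # drop a trailing "[fixed]" marker, if any
            print $1 "=" state[1]
        }'
}

# Example (needs a real interface, assumed here to be eth0):
#   ethtool -k eth0 | csum_offload_state
# Turning both off at runtime, if the driver allows it:
#   ethtool -K eth0 rx off tx off
```

If both values already read `off` and packets still do not leave the board, checksum offload is unlikely to be the cause.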

Any ideas for what might be wrong here, or what I could try to further diagnose this issue?

Thank you very much in advance!

cc @lategoodbye @pelwell

@pelwell

pelwell commented Apr 8, 2020

If you still have that Buster Lite image around, can you run sudo rpi-update on it and retest? If that works, try BRANCH=next sudo rpi-update to pull in the latest downstream 5.4 kernel.

Since you're up to building your own kernel, you can try reverting to the most recent upstream 5.5 kernel (tag v5.5.15) in an attempt to narrow down when the problem started.

I'll be looking at a checksum offload issue later, so I should be able to confirm that the GENET is vaguely functional on 5.6, but raspberrypi/linux#3523 suggests that it is.

@stapelberg
Author

Thanks for the tip, I’ll try that later today!

@nullr0ute

For reference, I'm not seeing issues with 5.6.2 on Fedora with the upstream genet driver (aarch64 only; ARMv7 has other issues I need to investigate).

@lategoodbye
Owner

lategoodbye commented Apr 8, 2020

@stapelberg Just guessing: could you please try the other RGMII PHY modes by changing your devicetree? I assume you are currently running "rgmii-rxid" with the upstream DTS.

@stapelberg
Author

> @stapelberg Just guessing: could you please try the other RGMII PHY modes by changing your devicetree? I assume you are currently running "rgmii-rxid" with the upstream DTS.

Yeah, you’re right:

% grep phy-mode /tmp/*.dts              
/tmp/gokrazy.dts:			phy-mode = "rgmii-rxid";
/tmp/raspbian.dts:			phy-mode = "rgmii";

After changing it to rgmii, the network seems to work!

Thank you so much!
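
The comparison above can be scripted. A minimal sketch, assuming `dtc` (the device tree compiler) is installed for the optional decompile step; the helper name `phy_mode_of` and the file names in the example comments are illustrative assumptions, not taken from this thread:

```shell
#!/bin/sh
# phy_mode_of: print every phy-mode value found in a device tree source file.
phy_mode_of() {
    sed -n 's/^[[:space:]]*phy-mode[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Example:
#   dtc -I dtb -O dts -o /tmp/board.dts bcm2711-rpi-4-b.dtb   # .dtb -> .dts
#   phy_mode_of /tmp/board.dts
```

Running it against the device trees of a working and a non-working image makes a `phy-mode` difference visible without reading the whole file.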

stapelberg added a commit to gokrazy/kernel that referenced this issue Apr 8, 2020
@lategoodbye
Owner

@stapelberg I prefer to fix it in the mainline kernel. Could you please confirm that you were using a mainline kernel + DTS?

@nullr0ute Does phy-mode = "rgmii" also work for you?

@stapelberg
Author

> Could you please confirm that you were using a mainline kernel + DTS?

Hereby confirmed, yes.

@nullr0ute

> @stapelberg I prefer to fix it in the mainline kernel. Could you please confirm that you were using a mainline kernel + DTS?
>
> @nullr0ute Does phy-mode = "rgmii" also work for you?

We're currently using whatever the upstream default is; looking upstream, that's rgmii-rxid. I can test changing it to rgmii in the DT when I get a moment.

@lategoodbye
Owner

@stapelberg Could you please double-check that Linux 5.5 has the same behavior?

@stapelberg
Author

Just checked with Linux 5.5.13. The issue is the same when not overriding phy-mode, and is fixed the same way when setting phy-mode=rgmii.

@lategoodbye
Owner

Thanks. I will take care of the upstream patch.

Is it okay to add you as a bug reporter to the patch?

@stapelberg
Author

Yes. Thanks for taking care of the upstream fix!

@lategoodbye
Owner

Looks like the same issue here:
raspberrypi/linux#3417

@lategoodbye
Owner

lategoodbye commented Apr 11, 2020

So I tested the change on 3 RPi 4 B boards against next-20200411 (multi_v7_defconfig), and it fails in most cases.

| MAC Address       | PHY mode   | Result |
|-------------------|------------|--------|
| DC:A6:32:23:54:85 | RGMII      | FAIL   |
| DC:A6:32:23:54:85 | RGMII-RXID | OKAY   |
| B8:27:EB:FB:D8:28 | RGMII      | FAIL   |
| B8:27:EB:FB:D8:28 | RGMII-RXID | OKAY   |
| DC:A6:32:3E:F2:35 | RGMII      | OKAY   |
| DC:A6:32:3E:F2:35 | RGMII-RXID | OKAY   |

Based on this result, I cannot send the suggested change as a patch.

@stapelberg Could you please try current linux-next?
Was RGMII the only PHY mode (there are 4) which worked for you?

lategoodbye reopened this Apr 11, 2020
@stapelberg
Author

> Was RGMII the only PHY mode (there are 4) which worked for you?

Can you clarify which 4 values are interesting here? Are the values rgmii, rgmii-rxid, rgmii-txid, rgmii-id, or did I read this wrong?

@stapelberg
Author

Okay, here are my test results:

Linux 5.6.3:

| MAC               | PHY mode   | dmesg                     | Result |
|-------------------|------------|---------------------------|--------|
| dc:a6:32:02:xx:yy | rgmii      | external RGMII (no delay) | OKAY   |
| dc:a6:32:02:xx:yy | rgmii-rxid | external RGMII (RX delay) | FAIL   |
| dc:a6:32:02:xx:yy | rgmii-txid | external RGMII (TX delay) | FAIL   |
| dc:a6:32:03:yy:zz | rgmii      | external RGMII (no delay) | OKAY   |
| dc:a6:32:03:yy:zz | rgmii-rxid | external RGMII (RX delay) | FAIL   |
| dc:a6:32:03:yy:zz | rgmii-txid | external RGMII (TX delay) | FAIL   |
| dc:a6:32:02:zz:aa | rgmii      | external RGMII (no delay) | OKAY   |
| dc:a6:32:02:zz:aa | rgmii-rxid | external RGMII (RX delay) | FAIL   |
| dc:a6:32:02:zz:aa | rgmii-txid | external RGMII (TX delay) | FAIL   |

linux-next-20200413:

| MAC               | PHY mode   | dmesg                     | Result |
|-------------------|------------|---------------------------|--------|
| dc:a6:32:02:xx:yy | rgmii      | external RGMII (no delay) | OKAY   |
| dc:a6:32:02:xx:yy | rgmii-rxid | external RGMII (RX delay) | FAIL   |
| dc:a6:32:02:xx:yy | rgmii-txid | external RGMII (TX delay) | FAIL   |
| dc:a6:32:03:yy:zz | rgmii      | external RGMII (no delay) | OKAY   |
| dc:a6:32:03:yy:zz | rgmii-rxid | external RGMII (RX delay) | FAIL   |
| dc:a6:32:03:yy:zz | rgmii-txid | external RGMII (TX delay) | FAIL   |
| dc:a6:32:02:zz:aa | rgmii      | external RGMII (no delay) | OKAY   |
| dc:a6:32:02:zz:aa | rgmii-rxid | external RGMII (RX delay) | FAIL   |
| dc:a6:32:02:zz:aa | rgmii-txid | external RGMII (TX delay) | FAIL   |

In summary: on my three different Raspberry Pi 4 devices (one with 4G, the others with 2G of memory), only phy-mode rgmii works, both with Linux 5.6.3 and with today’s next-20200413.
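
When repeating such a test matrix, it helps to confirm from the kernel log which mode the driver actually applied, rather than trusting the edited device tree. A small sketch that maps the dmesg strings listed in the tables above back to phy-mode values; the function name `genet_mode_from_dmesg` is hypothetical, and only the three strings observed here are covered:

```shell
#!/bin/sh
# genet_mode_from_dmesg: read kernel log lines on stdin and print the phy-mode
# implied by each RGMII description string seen in the test results above.
# Lines without one of the three known strings produce no output.
genet_mode_from_dmesg() {
    while IFS= read -r line; do
        case $line in
            *"external RGMII (no delay)"*) echo "rgmii" ;;
            *"external RGMII (RX delay)"*) echo "rgmii-rxid" ;;
            *"external RGMII (TX delay)"*) echo "rgmii-txid" ;;
        esac
    done
}

# Example: dmesg | genet_mode_from_dmesg
```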

@lategoodbye
Owner

> > Was RGMII the only PHY mode (there are 4) which worked for you?
>
> Can you clarify which 4 values are interesting here? Are the values rgmii, rgmii-rxid, rgmii-txid, rgmii-id, or did I read this wrong?

Correct
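
For readers unfamiliar with the four values: going by the generic Linux PHY documentation (not by anything stated in this thread), they differ in which internal clock delays the PHY is expected to add. The sketch below encodes that reading in a hypothetical helper, `phy_delays`. How a particular MAC driver such as bcmgenet reacts to each value may differ, so treat this as orientation only.

```shell
#!/bin/sh
# phy_delays: print which internal delays the PHY is expected to provide for
# a given RGMII phy-mode value (per the generic kernel PHY documentation).
phy_delays() {
    case $1 in
        rgmii)      echo "none" ;;
        rgmii-rxid) echo "rx" ;;
        rgmii-txid) echo "tx" ;;
        rgmii-id)   echo "rx+tx" ;;
        *)          echo "not an RGMII mode: $1" >&2; return 1 ;;
    esac
}

# Example:
#   for m in rgmii rgmii-rxid rgmii-txid rgmii-id; do
#       printf '%s: %s\n' "$m" "$(phy_delays "$m")"
#   done
```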

@lategoodbye
Owner

lategoodbye commented Apr 13, 2020

@stapelberg Which kernel configuration did you use for your tests?

@stapelberg
Author

> Correct

Okay. Do you want me to test rgmii-id as well, or are the results above regarding rgmii{,-rxid,-txid} enough to work with?

> @stapelberg Which kernel configuration did you use for your tests?

make defconfig + https://github.com/gokrazy/kernel/blob/c3e1e48e481e208f95a9304166e9e75956552587/cmd/gokr-build-kernel/build.go#L17

I also attached the resulting /proc/config.gz for your convenience: config.gz

@lategoodbye
Owner

Could you please retest Linux 5.6, but in 32-bit and with only multi_v7_defconfig (without any modifications)? Sorry, I currently don't have the time to set up a working 64-bit environment.

@stapelberg
Author

Sorry, testing 32-bit is too much effort for me. gokrazy was only ever targeting 64-bit.

I can test multi_v7_defconfig, but since I don’t use loadable modules, I’d need to do some modifications.

@lategoodbye
Owner

Okay, I will try to test with builtin on 32-bit.

Is gokrazy ready for RPi 4 yet?

@stapelberg
Author

> Is gokrazy ready for RPi 4 yet?

It works as far as I can tell, but I haven’t installed a Raspberry Pi 4 into the continuous integration setup yet. gokrazy/gokrazy#48 tracks these 2 remaining issues.

@lategoodbye
Owner

> Okay, I will try to test with builtin on 32-bit.

I tested it, but it didn't make any difference.

@lategoodbye
Owner

@stapelberg What is the minimum version of Go I need to install for gokrazy?

@stapelberg
Author

Not entirely sure. The current stable version (Go 1.14) definitely works and I’d recommend using it. We don’t usually test with older versions. The most likely failure scenario is that our code uses methods not yet available in your version of Go, which would result in a compile-time error. In other words: try it and see, if you’re adventurous :)

It’s quick & easy to install into your home dir (see https://golang.org/doc/install), in case your OS doesn’t provide Go 1.14.

@lategoodbye
Owner

lategoodbye commented May 3, 2020

Okay, I managed to get it working on my RPi 4. At least I can confirm that one of the Pis which required rgmii-rxid with multi_v7_defconfig / Raspbian works fine with rgmii under gokrazy:

[ 3.289489] bcmgenet fd580000.ethernet: configuring instance for external RGMII (no delay)

So we can definitely exclude a hardware issue.

@stapelberg It would be really nice to have access via debug UART / busybox.

@stapelberg
Author

> @stapelberg It would be really nice to have access via debug UART / busybox.

I filed gokrazy/gokrazy#54 just recently. For now, you can place https://t.zekjur.net/sh (statically compiled busybox) onto the permanent partition (4th partition), either from your computer with an SD card reader, or interactively via breakglass: https://github.com/gokrazy/breakglass

@lategoodbye
Owner

@stapelberg Sorry, I don't have the time for testing. But I think I've found the real issue. The MII PHY is not enabled in your config.

Please try to enable CONFIG_BROADCOM_PHY. Big thanks to Marek Szyprowski for finding this issue.

@stapelberg
Author

stapelberg commented May 8, 2020

Aha, thank you! Let me verify this real quick.

Yep:

breakglass # gunzip -c /proc/config.gz | grep BROADCOM_PHY
# CONFIG_BROADCOM_PHY is not set
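
The same check can be wrapped in a reusable helper that reports a symbol as `y`, `m`, or `n`. This is a minimal sketch with a hypothetical function name, `kconfig_state`; it reads a kernel configuration on stdin and treats an absent symbol the same as "is not set".

```shell
#!/bin/sh
# kconfig_state: read a kernel .config on stdin and print the value of SYMBOL
# (given without the CONFIG_ prefix): y, m, or n.
# "n" covers both "# CONFIG_X is not set" and a symbol that never appears.
kconfig_state() {
    awk -v sym="CONFIG_$1" '
        $0 == ("# " sym " is not set") { state = "n" }
        index($0, sym "=") == 1        { state = substr($0, length(sym) + 2) }
        END { print (state == "" ? "n" : state) }'
}

# Example on a running system that exposes /proc/config.gz:
#   gunzip -c /proc/config.gz | kconfig_state BROADCOM_PHY
```

In a kernel source tree, the symbol can then be switched on with the bundled `scripts/config --enable BROADCOM_PHY` helper before rebuilding.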

stapelberg added a commit to gokrazy/kernel that referenced this issue May 8, 2020
@stapelberg
Author

You’re right! Thanks very much. I pushed gokrazy/kernel@82e30a7 and verified it fixes it on my devices. Note that once I enabled CONFIG_BROADCOM_PHY, I had to also drop the phy-mode patch and go back to the default rgmii-rxid, otherwise the network would not be stable.

Should there be a dependency in the kernel build system which enforces this setup, if this is the desired state?

@lategoodbye
Owner

lategoodbye commented May 8, 2020

The problem is that the Ethernet PHY is board specific, so we cannot really enforce a dependency. But there is an ongoing discussion.
