Update TI bits in linux-iot2050 #411
Dear all, we tried the Siemens reference image (Chao's branch) with the eno1 interface configured for DHCP. After several power cycles, eno1 could no longer obtain an IP address. Based on the statistics, this looks like the well-known issue where RX packets get stuck at driver level and never reach the higher layers. In short: the DHCP responses (RX packets) are lost in the driver, so Linux cannot configure the IP address. The statistics in the ethtool dump show non-zero RX counters, whereas ifconfig / netstat show zero RX bytes. We will try Jan's branch variant next. |
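The symptom reported above (non-zero RX counters in the ethtool dump, zero RX in the network stack) could be checked automatically with something like the following minimal sketch. It parses the text of `ethtool -S <iface>` and compares one hardware counter against the stack's packet count from sysfs. The counter name `rx_good_frames` is an assumption for illustration; the names actually exposed by the icssg-prueth driver should be taken from the real dump.

```python
import re
from pathlib import Path


def parse_ethtool_stats(text):
    """Parse the text of `ethtool -S <iface>` into {counter_name: value}."""
    stats = {}
    for line in text.splitlines():
        # Counter lines look like "     name: 123"; the header line has no value.
        match = re.match(r"^\s*([^:]+):\s*(-?\d+)\s*$", line)
        if match:
            stats[match.group(1).strip()] = int(match.group(2))
    return stats


def read_stack_rx_packets(iface):
    """Read the kernel network stack's RX packet count for an interface."""
    path = Path(f"/sys/class/net/{iface}/statistics/rx_packets")
    return int(path.read_text())


def rx_stuck(hw_stats, stack_rx_packets, hw_counter="rx_good_frames"):
    """True if the hardware counted frames but the stack saw none.

    `hw_counter` is a hypothetical counter name; adjust it to the dump.
    """
    return hw_stats.get(hw_counter, 0) > 0 and stack_rx_packets == 0
```

On a target, the ethtool text could be captured with `subprocess.run(["ethtool", "-S", "eno1"], capture_output=True, text=True).stdout` and passed to `parse_ethtool_stats`, with `read_stack_rx_packets("eno1")` supplying the second argument of `rx_stuck`.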
I tried a clean build from the jan/kernel-update branch. The network interfaces don't work at all, so this version is even worse. Please double-check on your side; logs are attached. |
Thanks for reporting. I haven't seen this on any boards here so far. Same issue when going two commits backward (70ebcfa)? |
@attila-hannibal please also check whether a specific element of your network infrastructure contributes to this (e.g. a specific switch, compared to cross-links). |
Hello @jan-kiszka, I tried the mentioned commit 70ebcfa, but the behaviour is the same: the eno* interfaces don't work, and a kernel dump appears after 1-2 minutes. |
And if you leave out the firmware update completely? In my tests, our current firmware still worked. BTW, please also explore my other question if your network infrastructure influences this. I have no luck reproducing it, boxes run for hours with all versions. |
I used the binary artefact from: https://github.com/siemens/meta-iot2050/suites/10438058123/artifacts/516591580
The only change, besides the root password configuration at first startup, was to the /etc/network/interfaces file:
# interfaces(5) file used by ifup(8) and ifdown(8)
auto eno1
auto eno2
The interfaces do not work, and the kernel dumps after a while. |
Ok, we can rule out build issues on your side. Some more things to rule out still: please have a look at my other suggestions. |
Using Chao's branch as source:
ifconfig shows no RX data. dmesg contains the line "icssg-prueth icssg0-eth eno2: timeout waiting for command done". So the error is also present with a simple point-to-point connection. |
Please don't change two variables at the same time (here: sources AND network setup). Would still like to see
In addition, please confirm
|
After many clean recompilations, I realized that we may be using different software than "Chao's branch"; it is probably an older variant of your master branch. I guess that in the past the image to be written was iot2050-image-example-iot2050-debian-iot2050.wic.img (with .img suffix), whereas the current build creates a different filename without the .img suffix. I see the same issue of non-working Ethernet ports with Chao's branch, too. Let me explain our test setups: we have two iot2050 instances
I will provide the logs you requested later. |
We could try another approach. Based on the symptoms, it seems the TI firmware running on the PRU got stuck somehow and cannot handle commands. Do you know of any debug facility to verify whether the firmware running on the PRU is "healthy" or has issues? |
This is a command register dump taken when the problem occurred. The first value should be 0xffff0000 (EMAC_NONE) too, but it stays at 0xffbb0000.
|
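The check described in the comment above, comparing each command register against the idle value 0xffff0000 (EMAC_NONE), can be sketched as follows. How the register values are read from the hardware is not covered in this thread, so the sketch simply takes them as a list of integers; the function names are hypothetical.

```python
# Idle value of a command register, as named in the report above.
EMAC_NONE = 0xFFFF0000


def busy_registers(values):
    """Return (index, value) pairs for registers that are not at EMAC_NONE."""
    return [(i, v) for i, v in enumerate(values) if v != EMAC_NONE]


def format_report(values):
    """Render one line per register, flagging those still holding a command."""
    lines = []
    for i, v in enumerate(values):
        state = "idle" if v == EMAC_NONE else "PENDING"
        lines.append(f"cmd[{i}] = 0x{v:08x} ({state})")
    return "\n".join(lines)
```

With the values from the failing case, `busy_registers([0xffbb0000, 0xffff0000, 0xffff0000, 0xffff0000])` flags only the first register, which matches the description that the first value stays at 0xffbb0000 while the others are idle.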
I'm trying my contacts at TI. Maybe we will get some further hints on how best to analyze this. |
Some further test results. Test scenarios:
I think we can narrow down the issue: when eno1 is already connected during boot and configured for DHCP, some kind of deadlock seems to cause the problem. Only eno1 seems to show this behaviour (see scenario 5); when the interface is configured "late" (see scenario 6), it works. That may explain why our Linux image shows the non-persistent behaviour: something depends on timing and/or the boot step sequence. The 4 command registers for scenario 1 all seem fine
|
@attila-hannibal Have you tested the scenario where both eno1 and eno2 are static? I ran into issues with this scenario before. |
@jan-kiszka I found that commit 4a80b17 really helps with the rebase pain; however, it is unrelated to this issue, so I've extracted it into a separate PR, #414. |
Hi @attila-hannibal, we've tried these scenarios with the action build https://github.com/siemens/meta-iot2050/actions/runs/3948486240, but could not reproduce any of them. There might be a subtle difference between our setup and yours that prevents reproduction. It would therefore be helpful to have the information below (some of it is just a double confirmation to make sure we are on the same page):
|
Hello @BaochengSu, I used the iot2050-example-image.zip from the build you linked as the wic image and copied it to a 64 GB USB3 stick. I also replaced the SPI flash content with the iot2050-pg2-image-boot.bin from the build, so that I have the same FW/SW as you.
auto eno1
auto eno2
(no other change has been made to your reference image)
Best regards |
Wait, I missed that so far:
We are using Network Manager in the default image. I'm not sure whether /etc/network/interface.d is properly evaluated at all. And even if: there is also |
Hello @jan-kiszka, OK, I confirm that the "persistent network issue" is solved by using the correct network-manager. Now we have to go back to our original problem, where the network loss happens randomly. When it happens, we see "icssg-prueth icssg0-eth eno2: timeout waiting for command done" in the dmesg log. |
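Since the random network loss is identified by the "timeout waiting for command done" line in dmesg, a small scanner can help when collecting logs across many power cycles. The following is a minimal sketch that works on log text only; the regular expression is derived from the single message quoted in this thread and may need adjusting if the driver formats the line differently.

```python
import re

# Pattern built from the message quoted in this thread:
#   icssg-prueth icssg0-eth eno2: timeout waiting for command done
TIMEOUT_RE = re.compile(
    r"icssg-prueth\s+(?P<dev>\S+)\s+(?P<iface>\S+): "
    r"timeout waiting for command done"
)


def find_command_timeouts(log_text):
    """Return (interface, full_line) for every command timeout in the log."""
    hits = []
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            hits.append((match.group("iface"), line.strip()))
    return hits
```

The log text could come from a saved file or from `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`; an empty result means the message did not occur in that boot.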
Great to hear. Hope the image can now help validate whether that issue is gone as well. Please let us know when there is news or there are further findings/questions. |
@attila-hannibal |
b603787
to
dc1cfec
These patches have been selected from ti-linux tag 08.02.00.006 which corresponds to that SDK release. They roughly bring the icssg_prueth driver to that level but also catch up in other not-yet-upstreamed areas.
This is the full diffstat of the kernel changes:
 arch/arm64/boot/dts/ti/k3-am65-main.dtsi | 36 +
 drivers/dma/of-dma.c | 10 +
 drivers/dma/ti/dma-crossbar.c | 6 +-
 drivers/dma/ti/k3-udma-glue.c | 370 ++-
 drivers/dma/ti/k3-udma-private.c | 39 +
 drivers/dma/ti/k3-udma.c | 2538 +++++++++++++++++----
 drivers/dma/ti/k3-udma.h | 27 +-
 drivers/firmware/ti_sci.c | 1 +
 drivers/firmware/ti_sci.h | 2 +
 drivers/irqchip/irq-pruss-intc.c | 47 +-
 drivers/net/ethernet/ti/Kconfig | 2 +
 drivers/net/ethernet/ti/Makefile | 2 +-
 drivers/net/ethernet/ti/icss_iep.c | 4 +-
 drivers/net/ethernet/ti/icss_iep.h | 2 +-
 drivers/net/ethernet/ti/icss_mii_rt.h | 93 +-
 drivers/net/ethernet/ti/icssg_classifier.c | 8 +
 drivers/net/ethernet/ti/icssg_config.c | 485 +++-
 drivers/net/ethernet/ti/icssg_config.h | 80 +-
 drivers/net/ethernet/ti/icssg_ethtool.c | 84 +
 drivers/net/ethernet/ti/icssg_mii_cfg.c | 105 +
 drivers/net/ethernet/ti/icssg_prueth.c | 623 ++++-
 drivers/net/ethernet/ti/icssg_prueth.h | 78 +-
 drivers/net/ethernet/ti/icssg_qos.c | 476 ++++
 drivers/net/ethernet/ti/icssg_qos.h | 136 ++
 drivers/net/ethernet/ti/icssg_switch_map.h | 11 +
 drivers/net/ethernet/ti/icssg_switchdev.c | 494 ++++
 drivers/net/ethernet/ti/icssg_switchdev.h | 13 +
 drivers/net/phy/dp83867.c | 15 +-
 drivers/pci/controller/dwc/pci-keystone.c | 8 +-
 drivers/soc/ti/k3-ringacc.c | 325 ++-
 include/linux/dma/k3-event-router.h | 16 +
 include/linux/dma/k3-psil.h | 16 +
 include/linux/dma/k3-udma-glue.h | 8 +
 include/linux/dmaengine.h | 16 +
 include/linux/soc/ti/k3-ringacc.h | 17 +
 35 files changed, 5494 insertions(+), 699 deletions(-)
The SDK came with 2 issues: one missing config SELECT and a regression of half-duplex support for PG1. Related fixes are at the end of the series.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
This brings us to the level of SDK 08.02.00.02, aligned with the current kernel queue. Updating the firmware binaries separately from the kernel is fine as the old firmware still worked with the newer driver. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Now usable with the latest icssg_prueth driver. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
…0.001 This comes again primarily with fixes for the icssg-prueth and DMA bits but also has two SDHCI fixes and a feature enhancement on the timer side.
Diffstat:
 drivers/clocksource/timer-ti-dm.c | 8 ++
 drivers/dma/ti/k3-udma-private.c | 6 +-
 drivers/dma/ti/k3-udma.c | 100 +++++++++++++++--
 drivers/mmc/host/sdhci-cqhci.h | 24 +++++
 drivers/mmc/host/sdhci_am654.c | 183 ++++++++++++++++++++++++++++----
 drivers/net/ethernet/ti/icss_mii_rt.h | 1 +
 drivers/net/ethernet/ti/icssg_ethtool.c | 46 +++++++-
 drivers/net/ethernet/ti/icssg_mii_cfg.c | 16 +++
 drivers/net/ethernet/ti/icssg_prueth.c | 132 +++++++++++++++--------
 drivers/net/ethernet/ti/icssg_prueth.h | 5 +
 drivers/soc/ti/k3-ringacc.c | 2 +-
 11 files changed, 438 insertions(+), 85 deletions(-)
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
This aligns the firmware again to the ti-linux kernel. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Allows to drop 3 patches that were merged into stable meanwhile. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
…nux version 08.06.00.004 Brings AF_XDP support for icssg-prueth, though without zero-copy so far. Rename the ti-pruss-firmware recipe along with this. Nothing changed there, just reflect that the firmware is still in sync with the kernel driver. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
dc1cfec
to
e1de8e8
@BaochengSu, @AsuraZeng, can we proceed with the MR? It addresses what #374 was fixing, avoids related regressions, and aligns with latest ti-linux (now 08.06.004, as recommended by T). It may just not resolve all issues of the prueth, but that should be shared with ti-linux at this point. |
Given that TI is about to release the 8.6 SDK soon, I am ok to proceed with 8.6 catchup. |
Align the BSP kernel regarding its TI downstream or backport bits with latest ti-linux. Also update the prueth firmware along that. Should resolve #368 and supersedes #374.
This does not include a stable update of the underlying CIP kernel yet, as we are waiting for a recent cip-rt release.