ramips: fix "no traffic" problem and DMA kernel panic at startup on rt305x and mt7628 devices #14194
IIRC there was work to migrate mt7628 to the mainline ethernet driver. I wonder what happened.
The whole problem lies in the mt7628 switch. As long as we use swconfig, there are no problems. DSA is harder, because the hardware is architecturally incompatible with it: a DSA implementation is only possible via VLANs. Correct me if this is not true.
target/linux/ramips/files/drivers/net/ethernet/ralink/esw_rt3050.c
target/linux/ramips/files/drivers/net/ethernet/ralink/mtk_eth_soc.c
Fixes half of #14138.
@DragonBluep I'm still puzzled about those kernel changes, not the device tree ones. I just noticed that mt7620 already had the same reset handles that this PR now gives to rt3352 and mt7628. It makes full sense to move rt5350 and rt3050 too, but this also means that the other targets were already working with reset handles registered exactly like that. My PR doesn't really change that logic, so I'd like to keep it; I'm still digging into the inner workings of the reset framework.
Force-pushed from 3102ba5 to 55f874a.
Changes for RT3050 and RT5350 added as well, and tested again with my MPR-A2: the unit boots successfully and connects to the network, albeit very slowly.
Force-pushed from 55f874a to 91b4e41.
Should we maybe consider the function …? Perhaps resets can be added to the FE block without being removed from the ESW block? Or at least, it looks like the kernel supports that now...
According to the kernel docs, I think it should work, even with 5.10 in 22.03.
Force-pushed from 91b4e41 to 297a319.
@mpratt14 @DragonBluep I applied your suggestions, please review. I'm not entirely sure if the point at which
See if it works without the "optional" version of the function. Apparently the "optional" series of functions simply allows the reset to be missing from the device tree without causing an error, but we do have it, and there seems to be a strong functional dependency on having it.
Some good reading material, if you haven't found this already:
Around the kernel I see several uses of
If you do try
...it may need to be the atomic version of
Force-pushed from 297a319 to d7d7a11.
Makes perfect sense. I returned to the previous version, and then dropped the rst_esw-related code in esw_rt3050.c. In this case it is not worth playing with shared resets.
It also makes no sense to shave microseconds off 1 ms resets, since this is done once per boot.
It did work previously, so the only change needed here is adding proper error messages, which I have now done. Needless to say, I tested it again on two of my devices.
It was already working as shared though, right?
No, it was working as exclusive, just not as optional. The optional variant only makes the framework return NULL directly, instead of the respective error pointer, when the reset is missing.
The way I see it: the EPHY block is within the ESW block, which is within the FE block. So whenever the FE block needs a reset, everything downstream of it needs a reset too (to clean out leftover data that would otherwise reach DMA too early?). The ESW and EPHY, on the other hand, can be reset together without resetting FE, because the dependency goes the other way.
Have you tried …? Maybe this requires adjusting which function the other driver uses to get the reset. I don't mean to push this too much; it can always be done later. I'm mostly concerned that someone else will come along and do something similar to 60fadae again because the DTS "doesn't look right".
I tried devm_reset_control_array_get_shared (see the tree at 297a319), but I wasn't convinced that it's entirely correct.
I'm talking about testing the difference between … But yeah, it can wait until another time.
It really only differs in the return value in case of failure, see:
Right, and if I understand correctly, a NULL pointer for a reset_control means the reset doesn't actually happen. That could also be a case where everything functions OK, with almost the same effect as resetting both blocks together. In other words: reset only FE -> problems; reset both FE and ESW -> good; reset nothing because of a NULL pointer -> maybe good? But then we can't know whether it's actually being reset... If you try the … But for now this PR is fine as it is; save it for later.
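The difference between the exclusive and optional getters, and why a NULL control silently skips the reset, can be illustrated with a small userspace model. This is a sketch under stated assumptions: `err_ptr()`, `is_err()`, `ptr_err()` and the `model_*` functions are hypothetical stand-ins for the kernel's `ERR_PTR()`/`IS_ERR()`/`PTR_ERR()` and `reset_control_*` calls, and the error code for a missing line is modeled as `-ENOENT`.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for ERR_PTR()/IS_ERR()/PTR_ERR(): small negative errno
 * values are encoded in the topmost page of the address space. */
#define MAX_ERRNO 4095

static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static bool is_err(const void *p) { return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }

struct reset_control { int line; };

/* How many times the (simulated) hardware reset line was actually driven. */
static int hw_toggles;

/* Hypothetical non-optional getter: a line missing from the DT is an error. */
static struct reset_control *model_get_exclusive(struct reset_control *from_dt)
{
	return from_dt ? from_dt : err_ptr(-ENOENT);
}

/* Hypothetical optional getter: a missing line is accepted and yields NULL. */
static struct reset_control *model_get_optional_exclusive(struct reset_control *from_dt)
{
	return from_dt;
}

/* Hypothetical assert: NULL is a silent no-op, an error pointer is rejected. */
static int model_assert(struct reset_control *rstc)
{
	if (!rstc)
		return 0;      /* nothing is reset and no error is reported */
	if (is_err(rstc))
		return -EINVAL;
	hw_toggles++;      /* the real line would be driven here */
	return 0;
}
```

With the optional getter, a missing `resets` entry leads to `model_assert()` returning 0 while `hw_toggles` stays unchanged, which is exactly the "reset nothing, maybe good?" case above: success is reported, but nothing tells you whether a reset really happened.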
When I tried the "get_shared" variant without extra "deassert" calls right after getting the reset controls, I got the expected warning about calling "assert" without a prior "deassert", as documented. Then I added "deassert" calls immediately after getting the controls. But calling "deassert" twice during the probe of each module might indeed have caused nothing to actually get reset, because the use count would go through a sequence like this: 0, 1, 2, 1, 2, 1, 2.
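The counting argument above can be checked with a simplified userspace model of a shared reset control. It only approximates the `deassert_count` bookkeeping of the kernel's reset framework (`drivers/reset/core.c`); `struct shared_rst` and both functions are hypothetical names used for illustration, not the kernel's implementation.

```c
#include <stdbool.h>

/* Simplified model of a shared reset control's reference counting. */
struct shared_rst {
	int deassert_count;   /* consumers currently wanting the line released */
	bool hw_asserted;     /* state of the physical reset line */
	int hw_assert_events; /* times the line was actually driven into reset */
	int warnings;         /* unbalanced asserts (the kernel would WARN) */
};

/* Only the first deassert (count 0 -> 1) touches the hardware. */
static void shared_deassert(struct shared_rst *r)
{
	if (r->deassert_count++ == 0)
		r->hw_asserted = false;
}

/* Only the last assert (count 1 -> 0) touches the hardware; an assert
 * without a prior deassert is rejected and counted as a warning. */
static void shared_assert(struct shared_rst *r)
{
	if (r->deassert_count == 0) {
		r->warnings++;
		return;
	}
	if (--r->deassert_count == 0) {
		r->hw_asserted = true;
		r->hw_assert_events++;
	}
}
```

In one plausible (hypothetical) ordering, FE and ESW each call `shared_deassert()` right after getting the control and then run one assert/deassert pair during probe. The count then goes 0, 1, 2, 1, 2, 1, 2 and never returns to zero, so `hw_assert_events` stays 0: neither block is ever actually put into reset.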
I just remembered that I have a TP-Link TL-WR802N v4 available, which is MT7628N, so I tested that as well: success. And, pro forma, on a Nexx WT3020 (MT7620), because
Use devm_reset_control_array_get_exclusive to register multiple reset lines in the FE driver. This is required to reattach the ESW reset to the FE driver again, based on device tree bindings. While at it, remove the unused fe_priv.rst_ppe field, and add an error message if getting the reset fails.
Fixes: 60fadae ("ramips: ethernet: ralink: move reset of the esw into the esw instead of fe")
Co-developed-by: Maxim Anisimov <maxim.anisimov.ua@gmail.com>
Signed-off-by: Maxim Anisimov <maxim.anisimov.ua@gmail.com>
[Split out of the bigger commit, provide commit message, refactor error handling]
Signed-off-by: Lech Perczak <lech.perczak@gmail.com>
Enabling the FE core too early after the reset is released causes the system to hang unconditionally during boot. Increase the post-reset delay to the 1-1.2 ms range.
Fixes: 60fadae ("ramips: ethernet: ralink: move reset of the esw into the esw instead of fe")
Signed-off-by: Maxim Anisimov <maxim.anisimov.ua@gmail.com>
[Split previous commit, provide rationale]
Signed-off-by: Lech Perczak <lech.perczak@gmail.com>
Failing to do so will cause the DMA engine to not initialize properly and fail to forward packets between them, and in some cases will cause spurious transmissions with a size exceeding the allowed packet size, causing a kernel panic.
Fixes: 60fadae ("ramips: ethernet: ralink: move reset of the esw into the esw instead of fe")
Signed-off-by: Maxim Anisimov <maxim.anisimov.ua@gmail.com>
[Provide commit description, split into logical changes]
Signed-off-by: Lech Perczak <lech.perczak@gmail.com>
Failing to do so will cause the DMA engine to not initialize properly and fail to forward packets between them, and in some cases will cause spurious transmissions with a size exceeding the allowed packet size, causing a kernel panic. This is also the behaviour of the downstream driver. However, I haven't observed bug reports about this SoC in the wild, so this commit's purpose is to align this chip with all other SoCs; MT7620 was already using this arrangement.
Fixes: 60fadae ("ramips: ethernet: ralink: move reset of the esw into the esw instead of fe")
Signed-off-by: Lech Perczak <lech.perczak@gmail.com>
Failing to do so will cause the DMA engine to not initialize properly and fail to forward packets between them, and in some cases will cause spurious transmissions with a size exceeding the allowed packet size, causing a kernel panic. This is also the behaviour of the downstream driver. However, I haven't observed bug reports about this SoC in the wild, so this commit's purpose is to align this chip with all other SoCs; MT7620 was already using this arrangement.
Fixes: openwrt#9284
Fixes: 60fadae ("ramips: ethernet: ralink: move reset of the esw into the esw instead of fe")
Signed-off-by: Lech Perczak <lech.perczak@gmail.com>
Failing to do so will cause the DMA engine to not initialize properly and fail to forward packets between them, and in some cases will cause spurious transmissions with a size exceeding the allowed packet size, causing a kernel panic.
Fixes: 60fadae ("ramips: ethernet: ralink: move reset of the esw into the esw instead of fe")
Signed-off-by: Maxim Anisimov <maxim.anisimov.ua@gmail.com>
[Provide commit description, split into logical changes]
Signed-off-by: Lech Perczak <lech.perczak@gmail.com>
The ESW core needs to be reset together with the FE core, so after the relevant reset controller lines are moved under FE, drop rst_esw and all related code, which would not execute anyway, because rst_esw would be NULL. While at it, ensure that a proper error message is reported if the reset line for EPHY cannot be claimed.
Fixes: 60fadae ("ramips: ethernet: ralink: move reset of the esw into the esw instead of fe")
Co-developed-by: Maxim Anisimov <maxim.anisimov.ua@gmail.com>
Signed-off-by: Maxim Anisimov <maxim.anisimov.ua@gmail.com>
[Split out of the bigger commit, provide commit message, refactor error handling]
Signed-off-by: Lech Perczak <lech.perczak@gmail.com>
Force-pushed from d7d7a11 to f393ffc.
Could you please create pull requests for 23.05 and 22.03 with these changes?
On my way.
This series fixes bootup and Ethernet support on the rt305x subtarget. It combines the MAC core and switch core resets again, but using device tree properties rather than reverting the changes introduced in 60fadae. However, reverting those changes or moving resets around in the device tree wasn't enough: the wait duration after the reset needs to be increased to allow both cores to sync up after reset.
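The ordering this series relies on (reset FE and ESW together through one reset array, then wait 1-1.2 ms before enabling FE) can be sketched as a userspace mock. All `mock_*` functions and the event log are hypothetical; they only stand in for `reset_control_assert()`/`reset_control_deassert()` on a reset array and for `usleep_range()`, and are not the driver's actual code.

```c
#include <stddef.h>

/* Hypothetical event log so the ordering of steps can be inspected. */
enum event { EV_ASSERT, EV_DEASSERT, EV_SETTLE, EV_START };

#define MAX_EVENTS 8
static enum event events[MAX_EVENTS];
static size_t n_events;
static unsigned long settle_min_us, settle_max_us;

static void log_event(enum event e)
{
	if (n_events < MAX_EVENTS)
		events[n_events++] = e;
}

/* Stand-ins for asserting/deasserting a reset array holding FE and ESW. */
static void mock_array_assert(void)   { log_event(EV_ASSERT); }
static void mock_array_deassert(void) { log_event(EV_DEASSERT); }

/* Stand-in for usleep_range(): records the requested window only. */
static void mock_usleep_range(unsigned long min_us, unsigned long max_us)
{
	settle_min_us = min_us;
	settle_max_us = max_us;
	log_event(EV_SETTLE);
}

static void mock_fe_start(void) { log_event(EV_START); }

/* Reset both cores as one unit, let them settle, and only then start FE. */
static void mock_fe_reset_and_start(void)
{
	mock_array_assert();
	mock_array_deassert();
	mock_usleep_range(1000, 1200);
	mock_fe_start();
}
```

The point of the sketch is the ordering: because FE and ESW share one reset array, they leave reset at the same moment, and FE is only enabled after the settle window, which is the condition the commits above identify as necessary to avoid the boot hang.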
The final commit cleans up the way reset controller lines are accessed from the code, but is mostly aesthetic in nature.
As the issue was introduced before the v22.03 release, this series needs backporting to the 22.03 and 23.05 stable branches, possibly together with #14151, which also needs backporting to 23.05.
Build tested on: ramips/rt305x, ramips/mt76x8, ramips/mt7620
Run tested on: ZTE MF283+ (RT3352), Hame MPR-A2 (RT5350), TP-Link TL-WR802N v4 (MT7628N), Nexx WT3020 8M (MT7620)
Closes: #9284