Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ramips: mt7530 fix #2847

Closed
wants to merge 3 commits into from
Closed

ramips: mt7530 fix #2847

wants to merge 3 commits into from

Conversation

ptpt52
Copy link
Contributor

@ptpt52 ptpt52 commented Mar 20, 2020

  1. Fix the mt7530 tx timeout issue
    I think the tx timeout is caused by the wrong implement of fe_empty_txd() function
    for hours test no tx timeout

  2. Disable eee to fix the link down/up problem
    https://bugs.openwrt.org/index.php?do=details&task_id=1449
    tested ok and sure fixed

@981213
Copy link
Member

981213 commented Mar 20, 2020

For the first commit: You should make your commit message more specific. You should at least briefly describe what that error is.
As for the other two: This mt7530/mt7621 driver will be dropped after linux 5.4 support in ramips target is merged. There's no need to patch them anymore.

@adschm
Copy link
Member

adschm commented Mar 20, 2020

@981213 Maybe fixing the driver would still be relevant for backporting to 19.07, which is also on kernel 4.14?

@LearZhou
Copy link

It's a good thing if the issue is really fixed.

I'd suggest keep 4.14 kernel an option for ramips, since someone may want to enjoy the hw offloading that's only on 4.14 kernel so far.

@neheb
Copy link
Contributor

neheb commented Mar 21, 2020

The last commit is no good. The comment is incorrect (when flow control is disabled).

People on forums are reporting good results with https://patchwork.ozlabs.org/patch/1186211/ . Interrupt errors (as reported by /proc/interrupts) go down to 0.

@981213 981213 added needs changes target/ramips pull request/issue for ramips target labels Mar 21, 2020
@adschm
Copy link
Member

adschm commented Mar 21, 2020

I'd suggest keep 4.14 kernel an option for ramips, since someone may want to enjoy the hw offloading that's only on 4.14 kernel so far.

Since switching to 5.4 will introduce a lot of changes (switch driver, DTS, ...), maintaining 4.14 in parallel does not seem very desirable to me. It might create a lot of waste/duplicate work. Despite, any new release will be single-kernel anyway. If you want hw offloading, just rely on the 19.07 stable release with kernel 4.14 until it's EOL.

@981213
Copy link
Member

981213 commented Mar 21, 2020

@981213 Maybe fixing the driver would still be relevant for backporting to 19.07, which is also on kernel 4.14?

Good point.

@ptpt52
It's mainly commit message issues here then. I'll merge the first 2 commits after commit messages are fixed.
1st commit needs a description of the issue.
2nd commit needs a "Fixes: FS#1449" tag and I think we could remove the long reference URL in the commit message.

I'd need to check SDK driver behavior for the 3rd commit.

@ptpt52
Copy link
Contributor Author

ptpt52 commented Mar 21, 2020

@981213
hi, it seems something wrong in the fe_cal_txd_req() function
why it return DIV_ROUND_UP(nfrags, 2)
it should return nfrags

syb999 added a commit to syb999/openwrt-15.05 that referenced this pull request Mar 21, 2020
This disable eee for mt7530 ports, it causes the link down/up
issue, which happens when connecting to 100Mbit switch

Fixes: FS#1449

from openwrt/openwrt#2847
@981213
Copy link
Member

981213 commented Mar 22, 2020

@981213
hi, it seems something wrong in the fe_cal_txd_req() function
why it return DIV_ROUND_UP(nfrags, 2)
it should return nfrags

It should not. One Tx DMA descriptor can take up to two fragments.

@ptpt52
Copy link
Contributor Author

ptpt52 commented Mar 22, 2020

@981213
hi, it seems something wrong in the fe_cal_txd_req() function
why it return DIV_ROUND_UP(nfrags, 2)
it should return nfrags

It should not. One Tx DMA descriptor can take up to two fragments.

you're right..

…default

This revert c8f8e59
The TX/RX flow control is not the cause of the TX timeouts issue

Signed-off-by: Chen Minqiang <ptpt52@gmail.com>
fe_empty_txd() should return `tx_ring_size - 1` on ring empty, and
return 0 on ring full.

Signed-off-by: Chen Minqiang <ptpt52@gmail.com>
This disable eee for mt7530 ports, it causes the link down/up
issue, which happens when connecting to 100Mbit switch

Fixes: FS#1449

Signed-off-by: Chen Minqiang <ptpt52@gmail.com>
@mrkiko
Copy link
Contributor

mrkiko commented Mar 23, 2020

but out of curiosity - what's the direction we are going to take regarding hw offload?
I.e.: are we planning to implement it some day or are we letting go of it ? thanks!!

@LearZhou
Copy link

No practically possible for now, since the new DSA driver introduced non-compatible markers which made it impossible for hardware to handle it, as one of the developers said in other pull request comment.

So I wouldn't hold my breath just yet.

@resadent
Copy link

resadent commented Mar 27, 2020

Hi. I compiled latest master with your patches applied, for me the problem still persists. Wasn't happening with my other router I was using as a backup.

[ 4739.463088] mtk_soc_eth 1e100000.ethernet eth0: port 1 link down
[ 4742.352628] mtk_soc_eth 1e100000.ethernet eth0: port 1 link up
[ 4746.087402] mtk_soc_eth 1e100000.ethernet eth0: port 1 link down
[ 4748.759444] mtk_soc_eth 1e100000.ethernet eth0: port 1 link up

@LearZhou
Copy link

You might want to also apply https://github.com/openwrt/openwrt/pull/2843 and see what happened.

BTW, you didn't mention what router are you using for tests.

@resadent
Copy link

resadent commented Mar 27, 2020

@LearZhou I thought about doing that, but I thought the patches would conflict with each other since they modify the same files. Also thought about doing that by hand but didnt feel like it. Maybe Ill have a try.

I used a xiaomi r3g.

@yoonsikp
Copy link

@reasdent you should probably try forcing the disabled flow control by keeping the reverted commit 04622a3

@resadent
Copy link

@yoonsikp I could do that, but the idea of this commit is to fix the issue without that commit, or isnt it?

@ptpt52
Copy link
Contributor Author

ptpt52 commented Mar 28, 2020

Hi. I compiled latest master with your patches applied, for me the problem still persists. Wasn't happening with my other router I was using as a backup.

[ 4739.463088] mtk_soc_eth 1e100000.ethernet eth0: port 1 link down
[ 4742.352628] mtk_soc_eth 1e100000.ethernet eth0: port 1 link up
[ 4746.087402] mtk_soc_eth 1e100000.ethernet eth0: port 1 link down
[ 4748.759444] mtk_soc_eth 1e100000.ethernet eth0: port 1 link up

In my case, the link down up occurred frequently when connected to a 100 Mbps device, and then after eee was disabled, it no longer happened. I actually tested it.

@resadent
Copy link

resadent commented Mar 28, 2020

In my case, the link down up occurred frequently when connected to a 100 Mbps device, and then after eee was disabled, it no longer happened. I actually tested it.

I'll recompile then just in case. I compiled the build with -O2 instead of -Os, but I don't think that is causing the patch not to work.

Edit: fw in place, hopefully it will be fine.

@resadent
Copy link

I'm afraid to say that for me the problem still persists. Just had a random disconnect after 25h uptime :(.

@ptpt52
Copy link
Contributor Author

ptpt52 commented Mar 31, 2020

According to various feedback, the random down up issue still exists when connecting to 1000Mbps devices, but the thing that can be confirmed is that the down up issue that occurs when connecting to 100Mbps devices is fixed

@resadent
Copy link

resadent commented Mar 31, 2020

I am sorry to disagree once again, but my testing was performed with a 100mbit link.

Edit: actually I am not that sure. it may have been my (other) wan 1gbit connection to the modem that got disconnected. I guess I have no way to find out. And iirc this also happened when the connection to my switch was 1gbit so yeah.

@LearZhou
Copy link

Haven’t encountered one so far since applied your patch without revert part.

But the last session only run more than 5 days, before reboot due to network reconfiguration.

@ptpt52
Copy link
Contributor Author

ptpt52 commented Jul 10, 2020

outdated
close and I may post another PR if..

@ptpt52 ptpt52 closed this Jul 10, 2020
@ptpt52 ptpt52 deleted the mt7530-fix branch July 10, 2020 21:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs changes target/ramips pull request/issue for ramips target
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants