SSH stops working over Wi-Fi on Belkin RT3200/Linksys E8450 with -rc6 #10405
@dangowrt ideas? |
Oh dear. I forgot I made a report here, too. I have lots more info at the OpenWrt forum posting at: https://forum.openwrt.org/t/ssh-over-wifi-stops-working-on-rt3200-e8450-with-22-03-0-rc6/133911/18 There also may be a similar report at: https://forum.openwrt.org/t/dir-2660-admin-ssh-unstable-through-wifi-22-03-rc6/134243 |
You might edit your report (and title) here to clearly specify that the bug seems to be wifi-only, and wired is ok for you. That suggests that wifi connections occasionally break in a way that breaks the TCP stream session for SSH, or something similar. |
Good point. I fixed both the title and added the Wi-Fi vs Ethernet note to the original report. Thanks. |
More data collected on the forum (see the OP): included here for completeness... Tons of new evidence (no
My summary of the evidence:
What's the next experiment? |
Sounds like a problem with connection tracking or more likely flow offloading which results in a stale DROP entry in either hardware or software tables. Starting with WED I would try to one by one disable offloading features and see if the problem persists. |
Hi Daniel, Thanks for your thoughts. I am using a plain-vanilla config - just install the RC, and opkg install htop/nano - no other configs save the LAN subnet and the Wi-Fi credentials. I am not sure how to disable offloading. Could you give me a quick rundown? Thanks again. |
Is this related? #10422 |
Not merged yet. @hauke do you think it could be relevant? |
Confirm this. Various mt7621 devices running 22.03-rc4, rc5. |
Good news over on the Forum. The problem went away when tweaking My next step is to re-flash RC6, then try jow's nftables fix (https://forum.openwrt.org/t/ssh-over-wifi-stops-working-on-rt3200-e8450-with-22-03-0-rc6/133911/38) |
I hit the issue even with the firewall disabled (dumb AP). |
Update: The |
I do not see the problem on my Linksys E8450 running almost OpenWrt v22.03.0-rc6 (own build with additional packages from commit before tag) without using any mentioned workarounds. I am not using special offloading functionality. I have multiple SSIDs for multiple networks. I have some questions:
|
Hi Hauke,
I made the initial report. I summarized this report on the Forum: https://forum.openwrt.org/t/ssh-over-wifi-stops-working-on-rt3200-e8450-with-22-03-0-rc6/133911/63
**Update:** (I realize this text was hidden in the Github page) Other observations:
- I could reproduce the bug with 22.03.0-rc6. After installation, I used LuCI to install only nano and htop. I also reproduced on RC1, and RC4, I think. I might have tried a snapshot somewhere along the line - same failure. I could not try 21.02 - it does not support this router.
- My test procedure was:
- Using Ethernet cable, install a fresh copy of OpenWrt, don't keep settings.
- Use LuCI to install nano and htop.
- Use LuCI to enable Wi-Fi (I was using a 2.4GHz channel; 5GHz was disabled).
- Disconnect Ethernet, connect to the Wi-Fi
- ssh into the router, start htop and note the uptime
- Sometime later, I would come back to that terminal window and see that uptime was no longer counting up. The difference showed how long it ran.
- The freeze would occur between 5 and 90 minutes
- Even when the SSH session (over Wi-Fi) was frozen, Wi-Fi still worked: I could use the LuCI web GUI, and the router would pass traffic at full speed.
- An SSH connection over Ethernet worked fine when I turned off the laptop's Wi-Fi and connected the Ethernet cable.
- This seemed to be related to the duration of my laptop connection (not uptime). I could connect, disconnect after a few minutes, then re-connect hours later without problem. But if I stayed connected (running htop) on Wi-Fi the uptime shown on the screen would indicate how long the connection lasted before freezing.
- When I first noticed the problem, the connection would fail in 5-15 minutes. Around the time I posted the summary, it had run for over an hour before the SSH session froze. I have no explanation for this.
- I only saw it on my MacBook Pro (10.15.7). I didn't really try to make it happen with a Win10 or Linux laptop.
- Workarounds:
#1: https://forum.openwrt.org/t/ssh-over-wifi-stops-working-on-rt3200-e8450-with-22-03-0-rc6/133911/38
#2: https://forum.openwrt.org/t/ssh-over-wifi-stops-working-on-rt3200-e8450-with-22-03-0-rc6/133911/49
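The test procedure above (start a session, watch for the output to stop, note how long it ran) could be automated roughly as follows. This is only a sketch, not something used in this thread: the function name `seconds_until_stall`, the 30-second threshold, and the router address in the usage note are all assumptions chosen for illustration.

```python
import os
import select
import subprocess
import time


def seconds_until_stall(cmd, stall_timeout=30.0):
    """Run cmd and report how long its output kept flowing.

    Returns the number of seconds between start and the last output seen
    before a silence of stall_timeout seconds, or None if the command
    exited on its own without stalling.
    """
    # bufsize=0 keeps the pipe unbuffered so select() sees all pending data.
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, bufsize=0
    )
    fd = proc.stdout.fileno()
    start = time.monotonic()
    last_output = start
    try:
        while True:
            ready, _, _ = select.select([fd], [], [], stall_timeout)
            if not ready:
                # No output for stall_timeout seconds: treat as a freeze.
                return last_output - start
            if not os.read(fd, 4096):
                # End of file: the command finished normally.
                return None
            last_output = time.monotonic()
    finally:
        proc.kill()
        proc.wait()
        proc.stdout.close()
```

A possible invocation, assuming the router answers at the hypothetical address `192.168.1.1` and key-based login is set up: `seconds_until_stall(["ssh", "-tt", "root@192.168.1.1", "while true; do uptime; sleep 1; done"])`. The `-tt` option forces a terminal so the session resembles the interactive htop session; whether a session without a terminal would show the same freeze is not established here.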
Thanks for thinking about this. Let me know if you have other questions.
Rich
---
Summary of problem and workarounds
When connected to my 22.03.0-rc6 router via Wi-Fi, I would see an SSH connection freeze after a while (minutes to perhaps an hour). I would run htop, and the output from the router would freeze. Furthermore, I could no longer log into the router, even from another SSH session over Wi-Fi.
I was surprised to see that the router would continue to pass traffic from my computer to the internet, and I could continue to use the LuCI web interface even when the SSH sessions were hung up. And SSH connections via Ethernet worked just fine(!)
The symptoms seem to be caused by a combination of a bug in the MT76 driver and DSCP markings ("af21") recently introduced by dropbear. This group has proposed workarounds (#1 and #2 below), both of which seem to address the problem, but the real fix seems to require the MT76 driver update.
Another workaround seems to be simply turning Wi-Fi off and back on on my MacBook Pro - that immediately allowed me to SSH back in.
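For readers unfamiliar with the "af21" marking mentioned in the summary, here is a small sketch of what the value means on the wire. AF21 is Assured Forwarding class 2 with drop precedence 1, which is DSCP 18; the DSCP occupies the upper six bits of the IP TOS byte, so the byte is 0x48. The helper names below are made up for illustration, and this is not dropbear's code, only a demonstration of how any program could apply the same marking to a socket.

```python
import socket


def af_dscp(af_class: int, drop_precedence: int) -> int:
    """Return the DSCP code point for Assured Forwarding class AFxy."""
    if not (1 <= af_class <= 4 and 1 <= drop_precedence <= 3):
        raise ValueError("AF classes are 1-4, drop precedences are 1-3")
    return 8 * af_class + 2 * drop_precedence


def dscp_to_tos(dscp: int) -> int:
    """Shift a 6-bit DSCP into the upper six bits of the TOS byte."""
    return dscp << 2


def mark_socket(sock: socket.socket, dscp: int) -> None:
    """Ask the kernel to send this IPv4 socket's packets with the given DSCP."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))


AF21 = af_dscp(2, 1)          # DSCP 18
AF21_TOS = dscp_to_tos(AF21)  # TOS byte 0x48
```

Calling `mark_socket(sock, AF21)` on an IPv4 socket would produce traffic marked the same way, which may be useful for checking whether the freeze follows the marking rather than SSH itself.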
---
… On Aug 24, 2022, at 5:39 PM, Hauke Mehrtens ***@***.***> wrote:
I do not see the problem on my Linksys E8450 running almost OpenWrt v22.03.0-rc6 (own build with additional packages from commit before tag) without using any mentioned workarounds.
This is my production device so I prefer not to do debugging there.
The device has an uptime of 24 days and I can connect to it over ssh using wifi without problems.
The SSH connection stays up for at least 10 minutes of doing nothing.
I can also access OpenWrt devices behind this E8450.
I am not using special offloading functionality. I have multiple SSIDs for multiple networks.
I have some questions:
Could someone please provide a minimal configuration of a device where they see this problem and describe how to run into it with OpenWrt 22.03.0-rc6 or later? Preferably without the need of a WAN connection.
Did someone run into this problem on device combinations other than MT7622 + MT7915?
Did someone see this problem in OpenWrt master?
Maybe the problem is related to Mediatek WED offloading.
Could someone please try this patch on top of openwrt 22.03: ***@***.***
|
@csharper2005 can you try the following patch?
|
The nft commands work but are not a long-term solution. This issue happens with mt7622 and mt7915. It does not happen with mt7621 and mt7915 DBDC. The only difference I can see between both setups is Wireless Ethernet Dispatch, which is exclusive to mt7622. Maybe @nbd168 knows more. Edit: I just flashed 22.03-rc6 on my RT3200 and don't get this issue now. More investigation is needed. |
TODO openwrt#10405 Signed-off-by: Petr Štetiar <ynezz@true.cz>
@ynezz @hauke patch hauke@e480bf2 doesn't help. MTS WG420223 (mt7621, mt7615 dbdc). 22.03-HEAD. |
Was this issue reproduced on the 5 GHz wifi as well? |
I have not tried the 5GHz channel. (I did not enable it at all...) I have enabled a 5GHz SSID and will report back in an hour or two |
@nbd168 yeah, 2g and 5g both. |
@csharper2005 but only on a device with mt7615 DBDC, right? |
I managed to reproduce the issue on MT7615 by forcibly restarting aggregation on TID3 and quickly found the bug afterwards. Commit ec7d32f should fix it, please test. |
For someone who has not yet learned to build the OpenWrt software, how can I get an image (a snapshot?) that contains this commit? And is my procedure (install the firmware, install LuCI if necessary, install htop & nano, configure wireless, and then run |
@nbd168, It seems to me that the problem also was reproduced on the device with mt7613, but I'm not sure. I can't test it right now. WiFire S1500.NBN with mt7602, mt7612 is not affected. |
mt7613 is handled by the same driver as well. mt7602/mt7612 is a different one |
@richb-hanover when you're not building images for yourself, just wait for the next snapshot build and flash it. I think your test should work fine if it was able to reproduce the issue before. |
great, thanks for testing! |
Add new -z commandline option which when set, disables new IP TOS feature. References: openwrt/openwrt#10405 Signed-off-by: Petr Štetiar <ynezz@true.cz>
As reported in: https://forum.openwrt.org/t/ssh-stops-working-on-belkin-rt3200-linksys-e8450-with-rc6/133911
I installed OpenWrt 22.03.0-rc6 on a Belkin RT3200 router. The Wi-Fi and traffic all seem fine, and I can SSH in. But...
Shortly after a reboot, the SSH sessions freeze. At that point, I cannot establish any new SSH logins. This occurs within a few minutes of a reboot, or perhaps as long as 20 minutes. This is repeatable. (This also happened with -rc4 - I updated to -rc6 before reporting the problem.)
A reboot clears up the SSH problem - I can log in as expected (for a while). In the meantime, even when the SSH process is frozen, the LuCI GUI works as expected, and the router passes traffic normally.
Update: My current test is to ssh into the router and run htop. The Uptime: value shows how long the SSH session runs before freezing which tends to be about 3-10 minutes.
Update 2: I have some evidence that the problem is being caused by being SSH'd in. The router runs fine without any SSH connections, but if I wait a while (hours) then SSH in, the SSH session will hang shortly thereafter.
Update 3: When SSH sessions over Wi-Fi lock up/fail/cannot be reestablished, I can ssh in via Ethernet.
What other troubleshooting information could I provide? Thanks.