Router randomly halts after connect/disconnect to wireless AP #21
Thanks. We will check it. |
I reported this two months ago in the openwrt forum: https://forum.openwrt.org/viewtopic.php?pid=268305#p268305 This seemed to occur periodically while on kernel 3.8.x. Ever since moving to 4.0 I haven't experienced this.
|
Other people confirm that kernel version changes nothing.
Syslog and backtrace: https://forum.openwrt.org/viewtopic.php?pid=276340#p276340
Source: https://forum.openwrt.org/viewtopic.php?pid=276521#p276521
Syslog: https://forum.openwrt.org/viewtopic.php?pid=277217#p277217
Backtrace: https://forum.openwrt.org/viewtopic.php?pid=277240#p277240
Log: https://forum.openwrt.org/viewtopic.php?pid=282759#p282759
Backtrace: https://forum.openwrt.org/viewtopic.php?pid=283041#p283041 |
Thanks for your information. |
BTW, I've seen that error message before (it's back again): https://forum.openwrt.org/viewtopic.php?pid=261726#p261726 👎 |
Experienced a freeze here as well:
|
It looks like this issue affects latest stock firmware too. Reported multiple random halts for version 1.1.9 [1] and multiple random reboots [2]. [1] https://forum.openwrt.org/viewtopic.php?pid=279651#p279651 |
@yuhhaurlin not sure if this helps or not, but one of the DD-WRT devs said this is related to nvram corruption. See link: http://www.dd-wrt.com/phpBB2/viewtopic.php?t=256298&start=585 (3rd post from bottom) |
Any update on this? I just updated to trunk today and within 5 hours I received a stall message. Build: CHAOS CALMER (Bleeding Edge, r46364)
Tue Jul 14 16:11:10 2015 kern.err kernel: [ 7791.964027] INFO: rcu_sched self-detected stall on CPU { 0} (t=6000 jiffies g=82522 c=82521 q=18045) |
Hi, FYI: running OpenWrt RC2 on my WRT1900AC, I intermittently experience this issue; it actually just happened. Sometimes it's once a day, sometimes it happens a couple of times a day. I have a cron job which reboots the router every day at 05:00, and even with the reboot the router still hangs intermittently. Is there anything I can do to help? Thank you, |
Running r46314 from trunk, I've run into this issue as well. Hardware version 1. It's apparently triggered by calling "iwinfo" and "iw dev wlan0 station dump"; "wifi", "wifi down" and "wifi up" also seem to be problematic. Here are my logs:
|
Now that issue #20 is resolved, any plans to work on this one? I just got another stalled message on the ieee80211_iface_work process. This is on a recent trunk with the new wifi firmware 7.2.8.6.
|
Thanks for your information. |
I have been getting these since I bought the WRT1900ACv1. I actually got the same behavior with the stock firmware, where the WiFi stops working and only a hard reset fixes the issue. I installed OpenWrt in hopes that this would go away (and for some of my PhD research related to WiFi), but the problem persisted. I have just recently installed the latest version of OpenWrt CC RC3 and still observe this behavior. Though I haven't taken the time to log everything and trace the root cause, it seems to me that this occurs when I use the 802.11ac radio and transfer a decent amount of data at ac's high throughput. As a result, I have shortened the fan script interval to see if this was a heat-related issue. This hasn't seemed to help. I have also written a cron job that reboots the router once every other day. This may be masking the issue, and I think there are times when the system completely stalls so the script can't run. I am going to set up persistent logging to a syslog server so I can maybe better understand this issue and help resolve it. If anyone has any suggestions for additional logging to enable, I would be interested to hear them. |
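For anyone collecting logs on a syslog server, a small script can pick the stall reports out of the rest of the log. This is only a rough sketch in Python, assuming the lines look like the `rcu_sched self-detected stall` messages quoted in this thread. The names `parse_stall` and `find_stalls` and the dictionary keys are made up for illustration and are not part of any OpenWrt tool.

```python
import re

# Pattern modelled on the stall lines quoted in this thread.
# The exact format is an assumption and may differ between kernel versions.
STALL_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+INFO: rcu_sched self-detected stall on CPU"
    r".*?\(t=(?P<t>\d+) jiffies g=(?P<g>\d+) c=(?P<c>\d+) q=(?P<q>\d+)\)"
)


def parse_stall(line):
    """Return a dict of stall fields, or None if the line is not a stall report."""
    m = STALL_RE.search(line)
    if m is None:
        return None
    return {
        "uptime_s": float(m.group("uptime")),  # kernel timestamp in seconds
        "jiffies": int(m.group("t")),
        "g": int(m.group("g")),
        "c": int(m.group("c")),
        "q": int(m.group("q")),
    }


def find_stalls(lines):
    """Yield parsed stall reports from an iterable of syslog lines."""
    for line in lines:
        info = parse_stall(line)
        if info is not None:
            yield info
```

To use it, pass an open log file to `find_stalls`; the path depends on how your syslog server stores the router's messages. Comparing `uptime_s` across reports shows how long the router had been up before each stall.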
@RedShamilton it looks like the latest driver is not in RC3 yet if I read this correctly. |
@kevle yes, according to this post it does seem that way. Do you have a link to the errata or list of fixes this latest driver addresses? Also, I have begun collecting logs on my syslog server, so maybe I can see what is happening as well. However, if this latest driver does indeed fix this issue, I guess there is no point in my putting in effort to discover what is happening on the old driver version. |
I have tried the latest driver, basically works fine. Better than the version before.
I have tried to use different Channel and Transmit Power, same problems. |
@RedShamilton, I posted this question in the wrt1900ac forum thread as I am in the same boat as you, always experiencing a lockup regardless of the build: https://forum.openwrt.org/viewtopic.php?pid=287177#p287177 |
I was finally able to view my version of this occurrence (netconsole would have been very helpful) using dmesg -c and a loop via ssh... Is this the same issue or a different one? CHAOS CALMER (Bleeding Edge, r46584)
|
A better crash happened last night.
|
Has anybody tried building the kernel with lockdep, spinlock debugging, etc. enabled? It looks like it might be deadlocking on a driver-internal spinlock in the tx path (mwl_tx_xmit), possibly as a result of interrupting an existing lock holder (i.e., driver locking bugs?), though it's not clear, as the driver doesn't seem to show up in the call chain. Maybe an A->B/B->A deadlock with the other processor. |
@ozbenh They redid a bunch of that code in 10.3.0.8. Basically, waiting for the next version for more testing :( |
Instead of an RC, a full release: OpenWrt Chaos Calmer 15.05 / LuCI Master (git-15.233.47308-791ca8b). Let's see if this resolves the WiFi crash issues... |
Don't hold your breath. Nothing has changed with the wifi driver. The final release of Chaos Calmer has no bearing on the state of the wifi driver. |
Router locked up twice in a span of 12 hours... I've had 5 lockups since going to CC. Thinking about reverting back to RC3. Another complete lockup just happened:
Fri Sep 18 16:10:23 2015 kern.err kernel: [208175.526942] INFO: rcu_sched self-detected stall on CPU { 0} (t=6000 jiffies g=1751521 c=1751520 q=355) |
I disabled WMM mode on all the SSIDs and had 22 days of uptime. Then I switched WMM mode on again, and within a day the router locked up. I have many different clients connected: MacBooks, iMacs, Linux laptops, Android phones, iPhones, etc. I usually have around 40 clients connected at any one time during the day. |
@kylejvrsa Stock or Openwrt/DD-WRT? |
@kylejvrsa How about version 10.3.0.10? |
I am running CC final and I can definitely say that it has something to do with WMM mode. As I am running CC final, it is the 10.3.0.3 driver that I am using. Unfortunately the router is more or less in a production environment, so I am unable to test the latest driver: I have read that others still seem to have issues with it, and I only have the one router. However, disabling WMM mode causes a huge drop in wifi speed. I have traded off speed for stability and the people are happy. When WMM mode is enabled it doesn't take long to crash. What I have noted is that the people who let their devices go into sleep/hibernate are having the most issues with WMM (when coming back from sleep/hibernate they seem to be connected for a little while and then lose connection, but the device still says connected; toggling wifi off and on on the end device gets them connected again), and they might also be the ones causing the router to lock up... but I am speculating here now... |
Does anyone still encounter this problem with 10.3.0.14? |
@yuhhaurlin I can't test this as I am running CC final on a wrt1900ac v1 and need to get a driver compiled for the wrt1900ac with CC final. Will see if I can get hold of one and report back on my findings :) |
@kylejvrsa |
Using 10.3.0.14 firmware version on DD-WRT (BusyBox v1.24.1, 2015-11-10 00:30:38 CET). No problems here 👍 |
I think I have been the most vocal with this bug, as I have posted several kernel logs on this issue. I can say with confidence this issue appears to be behind us (knocks on wood). I've been running for over 6 days without an issue since using the new mwlwifi driver. Previously I would get stalls every day. |
@tusc |
It was on 10.3.0.13 with Openwrt trunk on 4.1.12. I have since upgraded to 10.3.0.14. |
Thanks, definitely useful to know. It would be good to also compare people's experiences with a 3.18.2x kernel in case there's a 15.05.1 release of CC. I also note that a recent driver was submitted upstream to the kernel devs, and comments were made about some of the locks and lock types used and the nearness of TX and RX variables, all of which might be contributing to various issues (and maybe the irq one too), so progress at least is being made. Credit to Marvell. |
This was just closed, was the issue resolved? If so, in what version of the driver/kernel? |
It looks like no one has reported this problem on 10.3.0.14, so I am closing it. If anyone still sees this problem, they can reopen it. |
I'm still running 15.05 final and experiencing this issue. Is it possible to upgrade only the kmod-mwlwifi to v10.3.0.14 without doing any other updates? |
@NemoAlex My opkg repos are still pointing at the 15.05 release URLs which still contains v10.3.0.3 and if I update my repo URL I'll get more than just this package (which is undesirable).
EDIT: Ignore this, I misread Nemo's reply and got it working by opkg installing the provided file. |
@PHLAK You can just extract the contents of the IPK file and copy the contents manually. That's what I did, b/c I'm still on RC3. |
@PHLAK - any luck with the new driver? I'd had pretty good luck so far, but had a couple of hard crashes in the last few days which make me think I might be suffering from this as well; wondering if I should give that a try. It's worth noting that non-committers can't reopen issues, so it's stuck closed. |
@glyph With the new driver I don't get any more hard lock ups that require restarting the router. However, it seems like I get "soft" crashes occasionally where my devices stop having access to the network/internet even though they never drop their WiFi connection. This usually lasts anywhere from 10 seconds to a minute or two then things go back to normal without any action on my part. It's not perfect, but it's better than where I was before. |
@PHLAK This, exactly! I'm on a WRT1200AC, and I never experienced hard crashes, but I have always received these "soft crashes" and still do on v10.3.0.14. I always reboot the router when we catch it, because for us, it seems to last longer than a minute or so, but connected devices stay connected to Wifi with no internet connectivity. Disconnected devices are at this point unable to connect. We catch it once every two days or so, and reboot the router to resolve it. I'm curious what logs would be liable to catch anything useful, because I feel that this should be reported. I've been lazy and just kept hoping someone else would bring it up, so thank you. If anybody can identify helpful logs, I'd be happy to post them and a new issue. |
I'll grab some logs as well if someone knows what I should grab. |
I'm not sure if this is the same problem or not, but at least one client on my network seems to "hang" and will only transfer at ridiculously slow speeds, effectively making it impossible to use without rebooting. I can't seem to reproduce the issue by doing anything in particular. I just have to wait for it to happen.
As of today, I've set vm.min_free_kbytes=16384, vm.swappiness=100 and vm.drop_caches=3, which seem to have limited the problems to just 2.4GHz, whereas before I was experiencing the problem on both bands. NB: I have a USB drive with a 1GB swap partition that I've never seen the router use, despite the "no memory" errors. |
@jbeagley52 Please test with latest driver. If you have memory problem, please check issue #52. |
"Halt" means the router doesn't respond to ping, wired or wireless. The only way to recover is a power reset. It is most often triggered after wireless clients connect to or disconnect from the router. I think this may be related to #20.
Environment: WRT1900AC v1, trunk default build r45601, kernel 3.18.11.
Syslog backtrace:
This trace repeated again at Mon May 18 23:22:42 2015 with some differences in the first lines:
Cross-post: https://forum.openwrt.org/viewtopic.php?pid=276687#p276687