Workqueue: events_freezable mmc_rescan crash with Raspbian kernel 4.14.79-v7+ #2810
Comments
|
Same problem on a 3B after upgrading to 4.14.79-v7+ |
|
Hmm, what are the Pis doing during that time? We haven't seen anything like this in the office, so we would be interested to know what they are doing that might cause this. |
|
Actually, this is a duplicate of an issue on our Linux tracker which is the correct place for it to be. Please continue conversation there...#2810 |
|
Bum, closed wrong one. Reopened. Will now attempt to close the one in firmware... |
|
So the machines in question are running LXDE, a Java application, and the onboard keyboard. |
|
That doesn't tell me anything about their level of activity - CPU, file system, network etc. |
|
Same issue here. Attached is my kernel log. |
|
Happened again mid movie. Attached is another kernel log |
|
I've so far found some advice from https://www.blackmoreops.com/2014/09/22/linux-kernel-panic-issue-fix-hung_task_timeout_secs-blocked-120-seconds-problem/ |
So far so good. |
|
Just going to say this thread saved my Pi 3B+. It was hanging terribly on any intense IO. The solution by Alexurb fixed the problem for me. |
|
Just ran into this with kernel 5.4.51-v8 aarch64 on Pi 4. I'll try the fix recommended above in this thread. Here are the dmesg errors: |
May fix a lockup observed against Pi 4 64 bit:

INFO: task kworker/1:0:1663 blocked for more than 120 seconds.
Workqueue: events_freezable mmc_rescan
Call trace:
__switch_to+0x110/0x180
__schedule+0x2f4/0x750
schedule+0x44/0xe0
__mmc_claim_host+0xb8/0x210
mmc_get_card+0x38/0x50
mmc_sd_detect+0x24/0x90
mmc_rescan+0xc8/0x390
process_one_work+0x1c0/0x470
worker_thread+0x50/0x430
kthread+0x100/0x130

Sets sysctl values for Pi boards:
+vm.dirty_background_ratio = 5
+vm.dirty_ratio = 10

Reference: raspberrypi/linux#2810
Signed-off-by: Christian Stewart <christian@paral.in>
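For anyone who wants to check whether their Pi already uses the two writeback values from the commit message above, here is a minimal Python sketch. It assumes the settings are exposed under `/proc/sys/vm` (the usual location on Linux); the helper names `read_vm_settings`, `diff_from_suggested` and `sysctl_conf_lines` are made up for illustration and are not part of any tool mentioned in this thread.

```python
from pathlib import Path

# Values taken from the commit message quoted above.
SUGGESTED = {"dirty_background_ratio": 5, "dirty_ratio": 10}


def read_vm_settings(names, base="/proc/sys/vm"):
    """Return {name: int or None} for each setting found under `base`."""
    values = {}
    for name in names:
        try:
            values[name] = int(Path(base, name).read_text().strip())
        except (OSError, ValueError):
            # Missing file or unexpected content: report it as unknown.
            values[name] = None
    return values


def diff_from_suggested(current, suggested=SUGGESTED):
    """Return {name: (current, suggested)} for settings that differ."""
    return {
        name: (current.get(name), want)
        for name, want in suggested.items()
        if current.get(name) != want
    }


def sysctl_conf_lines(suggested=SUGGESTED):
    """Format the suggested values as sysctl.conf style lines."""
    return [f"vm.{name} = {value}" for name, value in sorted(suggested.items())]
```

Calling `diff_from_suggested(read_vm_settings(SUGGESTED))` on the Pi lists only the settings that differ. The lines from `sysctl_conf_lines()` can be placed in a sysctl configuration file to persist them; whether that cures the lockup on a given board is not something this sketch can tell you.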
My card hasn't failed since I made the change on April 4 of this year. I've been using the same SanDisk microSD card I bought new since mid-2018. I think you only need to worry about microSD card life if you're using super cheap or no name cards. If you use a high tier SanDisk you should be just fine. |
|
Seeing the same, seems to be a memory leak in lxpanel causing it to run out of memory and start swapping until it dies. Straight after reboot: See also https://www.raspberrypi.org/forums/viewtopic.php?t=267015 |
|
I haven't noticed this problem on my 3B+ since July 21 of last year, which is just before The Foundation engineer announced a fix had been committed. Are you fully patched and updated on Buster and still seeing the issue? One thing I'm thinking based on both my repo issue thread and the Raspberry Pi Forum one you linked to (thanks!) is this could be triggered by having the CPU % lxpanel plugin AND a process that causes CPU % to spike for a significant length of time, e.g. |
|
This is on a 1GB 3B+ with a freshly installed and updated Raspbian image, so it should be Buster: It was crashing every ~6 hours from high lxpanel memory usage until I set up a script to restart lxpanel once an hour. I also added: Besides LXDE there's InfluxDB and an autostarted Chrome with Grafana running, plus I'm using the official 7" DSI touch display. |
|
Actually it's still crashing from lxpanel it seems, so the hourly restart isn't good enough to prevent whatever lxpanel is doing. htop excerpt: Kernel messages excerpt: |
|
Hmmm ... I would suggest disabling |
|
So I've enabled memory cgroups + swap on zram and put lxpanel into a limited cgroup. Time for some investigation: Let's try looking at that core file So I think it's likely that it's the bluetooth lxpanel plugin that is leaking memory in my case, I'll try disabling the bluetooth plugin and see how it behaves after that. |
Technically it looks more like a lockup from out-of-memory, not a crash. There are actually two separate issues here:
|
|
@kmark in earlier testing (vmstat / iostat), I never saw memory run out. I can crash it by simply exhausting memory but I don't think that's what's happening, at least in my case. Another interesting thing to note is that I can sometimes tell if it's going to happen as load starts to increase higher than usual. The interesting part is I can start killing off processes and that will often resolve it, load will go back down. This does make me think it could relate to your thought of hitting it with a fan. I may try getting a thermal camera on it and seeing if there's some external temp at which it seems to crash (though I'll likely need a blackbody / thermal reference). I'm also running Regardless I now have a test running spitting out |
|
|
Here are the last 500 samples. I was sampling 2 times per second, but the last 10 samples started slowing down and there was a 10 minute gap between the last 2 samples (simply due to load). Interestingly, temperature was pretty steady at 80C the entire time but went down during the issue (where load spikes), leading me to believe it's more related to a race condition or SD/IO issue. Here were some initial temp, load, and meminfo values: And here were the last values: |
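A sampler along the lines described above could look roughly like this in Python. This is only a sketch, not the script used for the samples in this comment: it assumes load comes from `/proc/loadavg`, memory figures from `/proc/meminfo`, and the SoC temperature from `/sys/class/thermal/thermal_zone0/temp` in millidegrees Celsius (the thermal zone path is an assumption and may differ on your board). All function names are hypothetical.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo content into {field: integer value}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages as floats."""
    return tuple(float(field) for field in text.split()[:3])


def parse_temp_millidegrees(text):
    """Convert a millidegree reading such as '80000' to degrees Celsius."""
    return int(text.strip()) / 1000.0


def take_sample(proc="/proc",
                thermal="/sys/class/thermal/thermal_zone0/temp"):
    """Read one temp/load/memory sample from the (assumed) kernel files."""
    meminfo = parse_meminfo(Path(proc, "meminfo").read_text())
    return {
        "load": parse_loadavg(Path(proc, "loadavg").read_text()),
        "temp_c": parse_temp_millidegrees(Path(thermal).read_text()),
        "mem_available_kb": meminfo.get("MemAvailable"),
        "swap_free_kb": meminfo.get("SwapFree"),
    }
```

To get two samples per second, call `take_sample()` in a loop with `time.sleep(0.5)` and print each result with `flush=True`, so the last lines are more likely to be written out before the system stalls.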
Hmmmm .... is anyone with this problem not running the OS on an SD card? Maybe that's the common denominator here. |
|
@jdrch the issue is when the SD card stops responding or otherwise has some kind of error, and mmc_rescan gets stuck. |
|
@ProactiveServices to confirm, you have NOT experienced this using a USB SSD? You are the only user who has stated using a USB SSD but I see you were discussing a different issue. Curious if you have had this issue using an SSD? Looks like this user resolved the issue by switching to USB SSD, but what is strange is that the issue seems kernel version dependent. raspberrypi/firmware#1522 |
I have not seen this issue since moving to SSDs. I mistook this issue for another. I moved away from SD cards because they were too unreliable! |
|
@jdrch Yup, running HASS with Unifi addon here on SD card, hitting it on 5.15.5-1-ARCH |
|
I have switched the Pi I've been hitting this with over to a USB SSD boot drive (no microSD present). I've also reinstalled Ubuntu 20.04.3 since I had low confidence the filesystem was not corrupt after all those deadlocks. I switched the rootfs to btrfs too but I don't expect that to impact anything as the issues we were seeing seemed below the filesystem level. I am planning on running a similar workload on it. If anything, it'll be used more. Will report back. Hopefully this is a usable workaround. |
|
I'm running homeassistant on Pi4, after 2 SD cards destroyed in relatively short time I changed to USB SSD, and zero issues after months. If the distro used does a lot of writing (log files, db updates, etc) expect the sd cards not to last (that's been my experience). |
|
A few observations from my side on the mmc_rescan event. Here is my setup: I started with a traditional SD card, but realized its speed would be too slow to run iobroker. So I decided to run the entire OS and everything else from an external USB SSD. I created a 1:1 copy of the SD card on the USB SSD; consequently, /boot was mounted via fstab to the boot partition on the SD card. Given this setup, there should be no or very low file activity on the SD card. When I initially created the USB SSD setup of my iobroker instance, I used the remaining space on the SD card as swap space. I assumed swap would be slow, but that keeping an eye on it should be sufficient. I just started to use watchdog in a test mode and, as you can see, watchdog did not run the test script after 17:59. The entry at 20:27 was created after the reboot of the Pi. But that might (or might not) be another issue. |
|
FWIW I haven't seen this problem on my 3 B+ since I updated to Raspberry Pi OS 11.x. Currently: I would encourage everyone to update and see what happens. And yes, I updated in place with no issues at all using this method. |
|
I still have this bug with 5.15.30-v8+ 64 bit on a Raspberry Pi 3B+. The system froze and resumed after 3 hours.
…mpound Huge vmalloc higher-order backing pages were allocated with __GFP_COMP in order to allow the sub-pages to be refcounted by callers such as "remap_vmalloc_page [sic]" (remap_vmalloc_range). However a similar problem exists for other struct page fields callers use, for example fb_deferred_io_fault() takes a vmalloc'ed page and not only refcounts it but uses ->lru, ->mapping, ->index. This is not compatible with compound sub-pages, and can cause bad page state issues like BUG: Bad page state in process swapper/0 pfn:00743 page:(____ptrval____) refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x743 flags: 0x7ffff000000000(node=0|zone=0|lastcpupid=0x7ffff) raw: 007ffff000000000 c00c00000001d0c8 c00c00000001d0c8 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: corrupted mapping in tail page Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.18.0-rc3-00082-gfc6fff4a7ce1-dirty #2810 Call Trace: dump_stack_lvl+0x74/0xa8 (unreliable) bad_page+0x12c/0x170 free_tail_pages_check+0xe8/0x190 free_pcp_prepare+0x31c/0x4e0 free_unref_page+0x40/0x1b0 __vunmap+0x1d8/0x420 ... The correct approach is to use split high-order pages for the huge vmalloc backing. These allow callers to treat them in exactly the same way as individually-allocated order-0 pages. Link: https://lore.kernel.org/all/14444103-d51b-0fb3-ee63-c3f182f0b546@molgen.mpg.de/ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Menzel <pmenzel@molgen.mpg.de> Cc: Song Liu <songliubraving@fb.com> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
... hardware issue, removed comment since it's not relevant. |
Welp, within the past 2 days I had my 1st recurrence of this issue. |
|
If this happens your SD card is probably bad. Best replace it. Even relatively new SD cards can go bad, because they are not designed for repeated continuous writes to the same location over & over, nor continuous 24/7 operation. I have had the best luck with The kernel should be updated nevertheless with better error handling, in my opinion. |
Seems you are right. Can't even clear the partition table on the darn thing now. Will order some new cards. Thanks for the advice on what works well for you. Will also update my comment. |
|
I did change the sysctl setting as described here: https://www.blackmoreops.com/2014/09/22/linux-kernel-panic-issue-fix-hung_task_timeout_secs-blocked-120-seconds-problem/ I noticed this when I used VS Code's SSH Remote Extension, which installs a bunch of stuff on the remote server. This causes HUGE CPU use by the "node" program, and then everything freezes; I couldn't even connect via SSH because the server was frozen. My remote server is a simple Raspberry Pi 3 B+, so not very powerful. Something curious: when all this happens, I noticed on both my host machine and the remote server a "kswapd0" process that consumes a lot of CPU too. Can this be related? Also, if the scheduler constantly sends interrupts to context switch to another process, why can a process that consumes too much CPU freeze the Pi? Shouldn't the scheduler carry on with the other processes equally? Why can it monopolize the CPU and freeze the Pi? |
|
@All3xJ I'm regularly running entire Gentoo builds on the pi itself that completely saturate all cpu cores, and yet never run into this error after replacing the faulty microsd cards. Try a brand new card. |
I believe you are running out of memory, which is why kswapd is going mad, rather than running out of CPU power. |
|
@JamesH65 is probably right, you're out of memory. My bad, I assumed that the comment was related to this issue and not some other unrelated thing :) |
All storage devices fail eventually ;) But yes, I have had the best luck with SanDisk, and their warranty (both length and replacement policy) is the best in the business. Last Friday I retired the 32 GB card I'd been using since I got the Pi in 2018 and replaced it with a 200 GB card. We'll see what happens from there. For those wondering about the process for doing so:
|
|
Use less memory? Use Pi4 with more memory? Use a USB SSD and put a swap file on it, that might be faster than swap on the SD card? What is the Pi doing that is using all the RAM? |
I noticed this when I used VS Code's SSH Remote Extension, which installs and runs a bunch of stuff on the remote server. This causes HUGE memory use, and then everything freezes; I couldn't even connect via SSH because the server was frozen. EDIT: I fixed it by increasing the swap partition size! |
|
I would like to thank everyone for all the comments I've found on this page. My system is a plain Debian 11, with /boot/firmware and / mounted on the SD card; /home, /var and /tmp are mounted on an SSD. |
FWIW replacing the microSD card has fixed the problem. I think I've experienced only 1 crash since. |

I have a few Raspberry Pi 3 B+ boards exhibiting the same problem. They crash after 2-3 days of uptime with the following error:
[169451.220021] INFO: task kworker/0:3:10949 blocked for more than 120 seconds.
[169451.220036] Tainted: G C 4.14.79-v7+ #1159
[169451.220041] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[169451.220048] kworker/0:3 D 0 10949 2 0x00000000
[169451.220077] Workqueue: events_freezable mmc_rescan
[169451.220110] [<8079ef70>] (__schedule) from [<8079f5d8>] (schedule+0x50/0xa8)
[169451.220130] [<8079f5d8>] (schedule) from [<8061a2d0>] (__mmc_claim_host+0xb8/0x1cc)
[169451.220147] [<8061a2d0>] (__mmc_claim_host) from [<8061a414>] (mmc_get_card+0x30/0x34)
[169451.220163] [<8061a414>] (mmc_get_card) from [<80623010>] (mmc_sd_detect+0x20/0x74)
[169451.220179] [<80623010>] (mmc_sd_detect) from [<8061ccdc>] (mmc_rescan+0x1c8/0x394)
[169451.220197] [<8061ccdc>] (mmc_rescan) from [<801379b4>] (process_one_work+0x158/0x454)
[169451.220212] [<801379b4>] (process_one_work) from [<80137d14>] (worker_thread+0x64/0x5b8)
[169451.220227] [<80137d14>] (worker_thread) from [<8013dd98>] (kthread+0x13c/0x16c)
[169451.220246] [<8013dd98>] (kthread) from [<801080ac>] (ret_from_fork+0x14/0x28)
The machines are running Raspbian Stretch.
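To spot reports like the one above across several machines, a small parser over saved `dmesg` output can help. The following Python sketch is an illustration only: the regular expressions are written against the log lines quoted in this issue, other kernel versions may format the message differently, and `find_hung_tasks` is a hypothetical helper name.

```python
import re

# Matches e.g. "INFO: task kworker/0:3:10949 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<task>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)
# Matches e.g. "Workqueue: events_freezable mmc_rescan"
WORKQUEUE_RE = re.compile(r"Workqueue: (?P<wq>\S+) (?P<fn>\S+)")


def find_hung_tasks(log_text):
    """Return one dict per hung-task report found in kernel log text."""
    reports = []
    for line in log_text.splitlines():
        hung = HUNG_RE.search(line)
        if hung:
            reports.append({
                "task": hung["task"],
                "pid": int(hung["pid"]),
                "seconds": int(hung["secs"]),
                "workqueue": None,
                "work_fn": None,
            })
            continue
        wq = WORKQUEUE_RE.search(line)
        # Attach the first Workqueue line that follows a report to that report.
        if wq and reports and reports[-1]["workqueue"] is None:
            reports[-1]["workqueue"] = wq["wq"]
            reports[-1]["work_fn"] = wq["fn"]
    return reports
```

Filtering the result for `work_fn == "mmc_rescan"` separates this issue from other blocked-task reports, such as the out-of-memory stalls also discussed in this thread.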