kswapd 100% CPU-usage #219

Closed
WereCatf opened this Issue Mar 9, 2016 · 36 comments

9 participants

@WereCatf
Contributor
WereCatf commented Mar 9, 2016

I don't know what's wrong here, but when testing a PSOne emulator on the OPi PC, for example, I noticed one CPU core pegged at 100% by kswapd. I have also seen it happen during some compiles. I hear jernej knows something about this; someone who can contact him should ask what's going on.

@igorpecovnik
Owner

Yes, it happened to me once while just playing around on the desktop, video... So far I have not been able to recreate it. Clearly something is wrong.

@WereCatf
Contributor
WereCatf commented Mar 9, 2016

It doesn't actually seem to be related to swap, despite it being kswapd that goes nuts -- every time I've seen it happen so far there has been literally 0 bytes in swap.
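To confirm the "0 bytes in swap" observation while kswapd is spinning, here is a minimal sketch that derives swap usage from /proc/meminfo as SwapTotal minus SwapFree. The helper name `swap_used_kb` is hypothetical; the meminfo path is a parameter only so the function can be tested against a copy.

```shell
#!/bin/sh
# Print swap currently in use, in kB, computed as SwapTotal - SwapFree.
# The meminfo path defaults to /proc/meminfo; passing a file allows testing.
swap_used_kb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END { print total - free }' "${1:-/proc/meminfo}"
}
```

Running `swap_used_kb` during a kswapd spike should print 0 if swap is untouched; `free` shows the same figure in its Swap row.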

@kubajar
kubajar commented Mar 9, 2016

Maybe it's related to https://bugzilla.kernel.org/show_bug.cgi?id=65201

I can see 100% kswapd CPU usage if I upload about 100 MB of data to the Orange Pi 2 over Samba.

Temporary solution:
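root@orangepi2mini:~# cat /bin/cpuload.sh
#!/bin/sh
# kswapd's %CPU from one batch-mode top snapshot (column 9 = %CPU)
CPU2=$(top -b -n 1 | grep kswapd | awk '{print $9}' | head -n 1)
# drop the fractional part so the shell can compare integers
CPU=${CPU2%.*}
if [ "${CPU:-0}" -gt 90 ]; then
    sync    # flush dirty pages first so more cache can be dropped
    echo 3 > /proc/sys/vm/drop_caches
    echo "$CPU"
fi

crontab -l
*/5 * * * * /bin/cpuload.sh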

I will test some suggestions from the mentioned thread and post the results.

Have a nice day.

JK

@WereCatf
Contributor
WereCatf commented Mar 9, 2016

@kubajar I already tested that, but I am not getting any results. Are you?

@kubajar
kubajar commented Mar 10, 2016

What I did:
sysctl.conf: vm.swappiness=0
fstab: #/var/swap none swap sw 0 0 (note #)
What is interesting: if I add vm.min_free_kbytes=67584 to sysctl.conf, then "echo 3 > /proc/sys/vm/drop_caches" doesn't work.
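The fstab change above (commenting out the /var/swap line) can be scripted. This is a minimal sketch, assuming a standard whitespace-separated fstab where the third field is the filesystem type; the function name `disable_swap_in_fstab` is hypothetical, and a `.bak` copy of the original file is kept.

```shell
#!/bin/sh
# Comment out every active fstab entry whose filesystem type (3rd field) is
# "swap", mirroring the "#/var/swap none swap sw 0 0" edit described above.
# Already-commented lines are left unchanged; a .bak copy is written first.
disable_swap_in_fstab() {
    fstab=${1:-/etc/fstab}
    cp "$fstab" "$fstab.bak" || return 1
    awk '!/^[ \t]*#/ && $3 == "swap" { print "#" $0; next } { print }' \
        "$fstab.bak" > "$fstab"
}
```

Editing fstab only affects the next boot; `swapoff -a` (as root) deactivates swap immediately.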

I analyzed some patches for kernel 3.7 (kswapd 100% CPU); it seems that some of them are partially applied to vmscan.c in 3.4.110. Maybe there is some piece of code in those patches that will help.

I also tried updating boot.scr using mkimage to add mem=somenumber, but after adding mem=968M the Orange Pi 2 is unable to boot.

Maybe some tuning of bootargs via setenv will help. Do you have any ideas?

Have a nice day.

JK

@WereCatf
Contributor

@kubajar Not at the moment, no. I'm busy improving desktop-extras and other things right now; I really need to get that stuff into proper shape and get some repos up. If you can't figure out what to do with kswapd, then I suppose we'll just have to let this issue linger a bit longer. It's not like we don't already have a bunch of issues; what's one more in the pile, eh? ;)

@kubajar
kubajar commented Mar 11, 2016

Setting vm.swappiness=0 and vm.min_free_kbytes=0 eliminates this problem completely for me. The higher the vm.min_free_kbytes value, the more often the kswapd problem occurs, so I tried disabling it completely, and that works.
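The two settings can also be applied at runtime, without a reboot, by writing to procfs. This is a minimal sketch; `apply_vm_settings` and the `VM_DIR` override are hypothetical names (the override exists only so the function can be exercised outside /proc/sys/vm), and on a real board it must run as root.

```shell
#!/bin/sh
# Apply the two vm settings from the post at runtime by writing to procfs.
# VM_DIR is a hypothetical override; it defaults to /proc/sys/vm.
apply_vm_settings() {
    vm_dir=${VM_DIR:-/proc/sys/vm}
    echo 0 > "$vm_dir/swappiness" || return 1
    echo 0 > "$vm_dir/min_free_kbytes" || return 1
}
```

`sysctl -w vm.swappiness=0` (and likewise for vm.min_free_kbytes) is equivalent, and the lines in /etc/sysctl.conf make it persistent. Note that the kernel's vm sysctl documentation warns that very low min_free_kbytes values can leave the system prone to deadlock under heavy load, so this is a workaround rather than a fix.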

I also have "tmpfs /tmp tmpfs defaults,noatime,nosuid,size=100m 0 0" in fstab. I don't think it affects this problem, but I haven't tested without it. /var/swap is removed from fstab.

I know that disabling swap is a bit dangerous, but swapping to an SD card is a bad idea too. What a pity that zram isn't present in 3.4.x...

Have a nice day.

JK

@WereCatf
Contributor

What are the downsides of setting it to 0, though?

Also, VanirAOSP/kernel_sony_msm8x27@a72a945 seems to have zswap backported; we could possibly add it to the Armbian kernel. There are a few more commits there related to zswap if you're feeling adventurous enough to try those too.

@ThomasKaiser
Collaborator

Any updates on this?

@jernejsk

Maybe it is the same (not very well understood) issue that I have on OpenELEC. You can try compiling the kernel with CONFIG_CMA disabled and see if that helps; it always works for me. The downside of this workaround is that 256MB (or whatever value is set in CONFIG_ION_SUNXI_RESERVE_LIST) will not be visible to the system.
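Disabling CMA before a rebuild amounts to one edit in the kernel .config. This is a minimal sketch using the standard "# CONFIG_FOO is not set" notation and GNU `sed -i`; the function name `disable_cma` is hypothetical, and only the exact `CONFIG_CMA=y` line is touched.

```shell
#!/bin/sh
# Turn CONFIG_CMA off in a kernel .config using the standard
# "# CONFIG_FOO is not set" notation. Options that depend on CMA
# (e.g. CONFIG_CMA_DEBUG) are left for Kconfig to resolve.
disable_cma() {
    config=${1:-.config}
    sed -i 's/^CONFIG_CMA=y$/# CONFIG_CMA is not set/' "$config"
}
```

Running `make oldconfig` afterwards lets Kconfig drop or re-prompt for options that depended on CMA; the kernel tree's own `scripts/config --disable CMA` achieves the same edit.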

@avinashga23

I'm following this issue very closely, as I'm also affected by it running 5.5.

@jernejsk

There is a chance that a newer version of Allwinner's kernel, provided by FriendlyARM, doesn't have this issue...

@jernejsk

Yes
On 12 Apr 2016 at 7:59 AM, Igor Pečovnik (notifications@github.com) wrote: This one?
https://github.com/friendlyarm/h3_lichee/tree/master/linux-3.4

@avinashga23

I built Armbian 5.7 headless with CONFIG_ION_SUNXI_RESERVE_LIST changed to 64MB, and now this issue is not observed even under heavy load (Cassandra and Zookeeper together 👍). Also, the total memory available to the system is 937 MB on the Orange Pi PC. I have built an Orange Pi Plus image with 32 MB. I will receive the Plus 2 board today and will be testing soon.
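The "937 MB available" figure corresponds to MemTotal in /proc/meminfo. A minimal sketch for reading it in whole MiB follows; the helper name `mem_total_mb` is hypothetical. MemTotal excludes memory reserved at boot, which should make it a quick way to compare kernels built with different reserve sizes.

```shell
#!/bin/sh
# Print MemTotal from /proc/meminfo in whole MiB (integer division).
# The meminfo path defaults to /proc/meminfo; passing a file allows testing.
mem_total_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}
```

For example, a MemTotal of 959488 kB is reported as 937.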

@igorpecovnik
Owner

@jernejsk
Working on it. I hope we will have an alternative H3 kernel by the end of the day.

@ThomasKaiser
Collaborator

@igorpecovnik If you really do rebase all the stuff on FriendlyARM's kernel, please drop the gc2035 patch entirely, since a new version has to be built anyway.

@igorpecovnik
Owner

OK, will try. Here is a kernel if you want to join the testing:
http://mirror.igorpecovnik.com/test/CMA-linux-image-sun8i_5.07_armhf.deb

@avinashga23

Hi @igorpecovnik, is the new kernel based on FriendlyARM's kernel?

@igorpecovnik
Owner

@avinashga23
Yes.

BTW: I have been running a Deluge download for almost two hours now. For half of that time I added extra stress with a cron job every 2 minutes: stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M --timeout 20s

Everything is normal.

@avinashga23

Great, I will try this on my Orange Pi Plus 2 and Orange Pi PC now.

@avinashga23

@igorpecovnik I installed the package with dpkg -i. I have run stressful applications like Cassandra and Zookeeper while monitoring htop and the temperature. So far it works OK without the kswapd problem (I'm sure that if it were still there, it would have shown signs by now). The good part is that the entire 1000 MB is available to the system. I would like to test this package on the Orange Pi Plus 2 (2 GB) board.

Shall I follow the same installation process for the Plus 2 too?

@igorpecovnik
Owner

My OPi+ is also working fine after 4 hours of hard work... I guess we can close this issue, with fingers crossed that we won't need to reopen it.

OPi+2 should work fine out of the box with this kernel.

@deltasigh

I ran the latest 5.07 to compile a package with make -j3, and kswapd0 shows up using 100% of a CPU!
I disabled the normal 128M swap with swapoff, inserted a USB flash drive with a 1.5 GB swap partition, and activated it with swapon.
Is there any information you'd like me to post?

@igorpecovnik
Owner

5.07 is not fixed yet. You need to either build the kernel yourself or use the kernel upgrade posted earlier in this thread.

@deltasigh

Igor, I downloaded 5.10, where kswapd0 is said to be "fixed"; unfortunately, it still shows up at 100% on one of the CPUs.

@avinashga23

I upgraded to 5.10 yesterday and my Pis have been running for the past 12 hours; I have not observed this issue so far. What's the output of your uname -a?

@igorpecovnik
Owner

Can you give me an example of how to reproduce this error? I have not been able to catch it since.

@deltasigh

For the last 5 days it has been my obsession to recreate the problem, but I haven't had much luck. It appears that if the system starts with a set of programs that do not wake up kswapd0, it will run fine for as long as I care to run it. If I start with another set of programs, kswapd0 will appear to run at 100%, but not for long! The most CPU time it accumulated, after 12 hours of operation and trying different actions, was under 00:02:00 in total. Bottom line: I think you took care of this annoying problem! Thank you.
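The accumulated kswapd0 CPU time quoted above can be read directly from procfs, which makes before/after comparisons easy to log. This is a minimal sketch, assuming the standard /proc/&lt;pid&gt;/stat layout where fields 14 and 15 are utime and stime in clock ticks; the function name `cpu_seconds` is hypothetical.

```shell
#!/bin/sh
# Print the total CPU time (user + system, whole seconds) a process has used,
# from fields 14 (utime) and 15 (stime) of /proc/<pid>/stat. Both are in
# clock ticks, so they are divided by CLK_TCK (overridable as 2nd argument).
# Assumes the comm field has no spaces, which holds for "kswapd0".
cpu_seconds() {
    stat_file=$1
    hz=${2:-$(getconf CLK_TCK)}
    awk -v hz="$hz" '{ print int(($14 + $15) / hz) }' "$stat_file"
}

# Usage on the board:
# pid=$(pgrep -x kswapd0) && cpu_seconds "/proc/$pid/stat"
```

`ps -o time= -p "$pid"` reports the same total in the hh:mm:ss form used in the comment above.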

@emullins
emullins commented Jun 3, 2016

I get this behavior, but I don't know how to reproduce it. It tends to revolve around Firefox (FF) on my OPi+. Usually the system grinds to a halt and the mouse becomes unresponsive, but the CPU monitor on the taskbar, while noticeably erratic, keeps updating. Switching to a console with Ctrl-Alt-F1 to log in as root doesn't work: I can enter 'root', but I'm not prompted for a password until the issue resolves itself. The issue does resolve itself if I'm patient. Yesterday I turned off swap from an open terminal as the symptoms began. That worked, crashing FF in the process. dmesg provides no useful information other than that FF crashed because it ran out of memory.

This box is usually left on 24/7, connected to my main TV via HDMI. I installed Armbian 5.10 and then ran nand-sata-install. Upgrading to 5.11 made this problem noticeably worse. I'm not sure I even had the issue before that, but after 5.11 it happened daily until I turned off swap.

What can I do after the system "comes back" to acquire useful information?
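One option is to snapshot memory-management state to a file as soon as the desktop responds again. This is a minimal sketch; the function name `collect_mm_state` and its default output directory (/root) are hypothetical choices, the /proc/vmstat counter names vary between kernel versions (so they are matched by prefix), and `dmesg` or `top` failures are tolerated.

```shell
#!/bin/sh
# Snapshot memory-management state into a timestamped file right after the
# system "comes back". Prints the path of the file it wrote.
collect_mm_state() {
    out_dir=${1:-/root}
    out="$out_dir/mmstate-$(date +%Y%m%d-%H%M%S).txt"
    {
        echo "== uname";   uname -a
        echo "== meminfo"; cat /proc/meminfo
        echo "== vmstat"
        grep -E '^(pgscan|pgsteal|kswapd|allocstall|compact)' /proc/vmstat
        echo "== swaps";   cat /proc/swaps
        echo "== top";     top -b -n 1 2>/dev/null | head -n 20
        echo "== dmesg";   dmesg 2>/dev/null | tail -n 50
    } > "$out"
    echo "$out"
}
```

Comparing two snapshots (one taken during normal use, one right after a stall) shows whether the kswapd scan counters jumped during the episode.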

@emullins
emullins commented Jun 3, 2016

Oh, I should mention that there's never any serious use of swap space, maybe 10 MB. It's not like it's running out of space; I use a 128 MB swap. When the system recovers, the load averages go back to normal, though the 5- and 15-minute numbers are still over 5.

@avinashga23

@emullins The kswapd issue only affects one CPU core; even with it present, you still have 3 CPU cores free for other tasks. I believe the issue you are facing is not kswapd-related. Try checking what htop shows.
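Besides htop, per-core load can be sampled from /proc/stat to tell a single pegged core (the kswapd pattern) from a system-wide stall. This is a minimal sketch, assuming the usual /proc/stat column order (user nice system idle iowait irq softirq steal ...); the function name `core_busy` is hypothetical, and idle plus iowait are counted as not busy.

```shell
#!/bin/sh
# Print per-core busy percentage over an interval (seconds, default 2) by
# sampling /proc/stat twice. Fields 2..9 (user..steal) are summed as total
# time; fields 5 and 6 (idle, iowait) are treated as not busy.
core_busy() {
    interval=${1:-2}
    a=$(grep '^cpu[0-9]' /proc/stat)
    sleep "$interval"
    b=$(grep '^cpu[0-9]' /proc/stat)
    printf '%s\n%s\n' "$a" "$b" | awk '
        {
            total = 0
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            idle = $5 + $6
            if ($1 in t0) {
                dt = total - t0[$1]; di = idle - i0[$1]
                pct = dt > 0 ? 100 * (dt - di) / dt : 0
                printf "%s %.0f%%\n", $1, pct
            } else {
                t0[$1] = total; i0[$1] = idle
            }
        }'
}
```

If one line sits near 100% while the others stay low during a stall, kswapd (or another single-threaded task) is the likely culprit; if all cores are busy or the figures are low while the desktop is frozen, the cause is probably elsewhere.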

@avinashga23

This issue was fixed

@emullins
emullins commented Jun 7, 2016

I'm not so sure it's fixed. Yes, it is limited to one core, but it still brings the desktop to its knees, mostly by making the mouse unresponsive. I have seen it since upgrading to 5.13, but not in the last couple of days. My use of the device has changed due to other issues, and I don't spend as much time on it. It does seem related to FF, possibly just because it's a memory hog compared to everything else I run.

Browsing on the device is excruciating anyway, even running from eMMC, so I now limit my FF use on the OPi+.

@jernejsk
jernejsk commented Jun 9, 2016

Igor, while preparing an H3 Linux repo with all the known fixes, I found that while you were fixing the update patches (3.4.39 -> 3.4.112), you removed a bit too much from mm/vmscan.c. Check these two commits:
jernejsk/linux@dc93269
jernejsk/linux@f82b435

Please be aware that there is an additional change in the balance_pgdat() function that must be included to make this patch useful (around line 2927):

if (!zone_balanced(zone, testorder, 0, end_zone)) {
    all_zones_ok = 0;
    /*
     * We are still under min water mark.  This
     * means that we have a GFP_ATOMIC allocation
     * failure risk. Hurry up!
     */
    if (!zone_watermark_ok_safe(zone, order,
            min_wmark_pages(zone), end_zone, 0))
        has_under_min_watermark_zone = 1;
} else {
...