Kernel 5.0.0-rc8: high cpu load #2881
Comments
|
Do you see anything suspicious? Does this load depend on the Ethernet link? |
No, everything else seems to be normal. I have not tested without Ethernet because it is needed: the root fs is on an iSCSI target. I will test it on a Pi3B today. |
|
Another test, Pi3B, no network, no sd-card (dtoverlay=sdtweak,poll_once in /boot/config.txt), root fs on usb disk: Average load is exactly 4.00. Hmmm, 4 cores == 4.00 ??? |
|
FYI I'm seeing the same effect, also with a load of 4 - no theory yet as to what is going on. |
|
Just to clarify you take the average load as the first number (left to right) from |
As already written in my first post: CPU is > 98% idle |
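As a side note for readers following along: on Linux, the three load figures that uptime prints can also be read from /proc/loadavg. The following is only a minimal sketch for checking the first (1-minute) number; the helper name parse_loadavg is hypothetical and not something used in this thread.

```python
def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict.

    The file looks like: "4.00 4.00 3.90 1/150 1234"
    (1, 5 and 15 minute load, runnable/total tasks, last PID).
    """
    fields = text.split()
    one, five, fifteen = (float(x) for x in fields[:3])
    running, total = (int(x) for x in fields[3].split("/"))
    return {
        "1min": one,
        "5min": five,
        "15min": fifteen,
        "running": running,
        "total": total,
        "last_pid": int(fields[4]),
    }
```

On a Linux system it could be called as parse_loadavg(open("/proc/loadavg").read())["1min"].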
|
The idle load average drops to 3 if three VCHIQ commits are reverted, but sadly this isn't the magic bullet I thought it might be. |
|
The idea was correct, it just didn't go far enough. The problem is caused by a switch from I'm in the middle of a messy revert, but I should have something to test later today or tomorrow. |
|
Here are my current test results for the following scenario: boot Raspberry Pi 3B, Ethernet connected, Wifi enabled but not connected, and wait until 2 min uptime: This would exclude a configuration change in 5.0.0 (no relevant changes in bcm2835_defconfig to 5.0.0 AFAIK). Looks like this regression has been introduced in mainline. Maybe I'm in the mood for bisecting ... |
|
It's the three patches from Nicholas and the one from Arnd (the messy reversion, due to the removal of the typedefs). |
|
@vianpl Could you please take a look at this? |
|
I've pushed the four reversions to rpi-5.0.y, and it has had the desired effect: |
|
@lategoodbye Thanks for pointing it out I'll look into it. |
|
Thank you very much @pelwell It seems to be resolved (I also had the problem with rc8): $ uptime
22:21:29 up 2 min, 1 user, load average: 1.20, 0.74, 0.29 |
I agree, the issue is not so much a performance regression as how Linux accounts its usage statistics. One could argue it's a cosmetic thing. We actually went for that family of wait functions as we were trying to mimic the original In the case of vchiq's implementation the issue didn't exist as the task was marked as Also, I had a look at what really wakes up a Linux killable task and it's just SIGKILL (see __fatal_signal_pending() in signal.h). So we were not mimicking the exact same behavior. I think the next step would be to revamp these patches using the interruptible family of functions. I already tested it in the past and there were no obvious issues. If there were any in the long run, we could go back to killable, or something in the middle, but with a proper justification for it. That justification would be nice to have if the driver is ever to come out of staging with strange concurrency primitives. That said, I'd like to hear your opinion first 😉 . |
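To make the accounting point concrete: the Linux load average counts tasks in uninterruptible sleep (state D in /proc/<pid>/stat) as well as runnable ones, so a handful of threads that wait this way can raise the load without using any CPU. The sketch below lists such tasks. The function names are hypothetical, and it only scans the top-level /proc/<pid> entries, not the per-thread entries under /proc/<pid>/task.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST closing parenthesis.
    left, right = stat_line.rsplit(")", 1)
    pid_str, comm = left.split("(", 1)
    state = right.split()[0]
    return int(pid_str), comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """Return [(pid, comm)] for tasks currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        path = os.path.join(proc_root, entry, "stat")
        try:
            with open(path, errors="replace") as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # the task exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling uninterruptible_tasks() on an affected system would be expected to show the waiting threads; which names appear depends on the kernel, so none are assumed here.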
|
I'm fairly pragmatic - I'd be happy with anything that satisfied the kernel maintainers' quest for cleaner code without compromising user experience. |
|
@vianpl Any progress on this? |
|
@lategoodbye I'll try to move things forward tomorrow. |
|
Currently running 5.0.0-1006-raspi2 on my Ubuntu 19.04 Beta system and seeing this - I apologize for the newbie question but when is this fix expected to be available? Or if available how can I get it? |
|
It is just a cosmetic issue. The high load reported doesn't actually consume any CPU. |
|
The splash screen is slightly affected, but understood, it is not performance impacting... Welcome to Ubuntu 19.04 (GNU/Linux 5.0.0-1006-raspi2 armv7l) |
|
The commits responsible are already reverted in the downstream rpi-5.0.y tree, but I think Ubuntu builds from upstream. |
|
Arch Linux ARM builds from upstream, and Debian does as well. This isn't "cosmetic" as others have noted, in my opinion. Monitoring software, k8s orchestrators, etc. are coded such that this would be tripping alarms/messing with scaling 24/7. It just goes against general Linux administration. E.g. A quad core system at a load of >4.0 indicates a fully loaded system (Which simply isn't the case here). So I've had to compile and patch this reversion in for my boxes. Anyways, to help some of the non-contributors out here; below is a link to the discussion that has occurred in upstream showing their thought processes. (Use "Next message (by thread):" to progress through.) |
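As an illustration of why this trips monitoring: many checks compare the load average with the number of cores. The sketch below is a hypothetical check of that kind, not taken from any particular monitoring tool; the function name and the default threshold of 1.0 per core are assumptions.

```python
def is_overloaded(load_1min, cores, threshold=1.0):
    """Return True when the load per core meets or exceeds the threshold.

    threshold=1.0 means "one waiting or running task per core", which is
    an assumed rule of thumb, not a value from a specific tool.
    """
    return load_1min / cores >= threshold
```

On a live system the inputs could come from os.getloadavg()[0] and os.cpu_count(). With the idle load of about 4 reported in this thread on a quad-core Pi, such a check would fire permanently even though the CPU is idle.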
|
I hope nobody gets offended by my reply, but such messages make me grumpy. So here are my wishes to the upstream users for the future:
Thanks for using the upstream kernel |
|
Have you gotten any traction on moving this forward into upstream kernel? |
|
Appears we did get a kernel update in the repo but it doesn't have this fix integrated yet. $ uname -a |
|
The fixes are now in staging-next and will be scheduled for Linux 5.3. |
|
Is there any safe way to roll back to the 4.x kernel? |
|
should get you back to stable firmware/kernel (currently 4.19). |
|
Unfortunately, I use arch os... I have not found a way to safely roll back to the old version of the kernel (I just installed the latest version of arch os, there is no cache of the previous version of the kernel installation package) 😂 |
|
I think you'll need to ask the arch maintainers for the recommended way. You can use |
|
Again, your issue is distribution specific. That said, I'm a fellow Arch user so: You're wanting the functionality of the Arch Linux Archive I would highly suggest treading carefully when dealing with package version mismatch on Arch though. It's very easy to end up in a non-booting state. So, you may want to focus on the "How to restore all packages to a specific date" part of that wiki page to get a sense of what may be different involving that kernel/libs at X time. For anything more than this you really need to seek out the Arch community at large, as moving to a different kernel version isn't this repo's forte/what-have-you. |
|
@westonmyers Thanks for your reply, I think I should wait for the 5.3 release of the kernel; or use the dd command to back up my arch system (I feel that there is a high probability of rollback failure😂)... |
|
@westonmyers you are correct, it's not purely a cosmetic issue; the device is running hot, implying that perhaps the GPU, or something else computational, is active. I haven't found any way to get the GPU load on the rpi; got
|
|
After a downgrade to a 4.19.* kernel, the issues with load and bad IO performance are resolved, but the perceived heat abnormality remains, suggesting that either the old consensus on an idle temperature of <50C does not apply in this case, or that there are invisible/unintentional operations active on the device at idle, even on 4.19.* kernels. Experiments suggest that the VCHIQ issue is interfering with kernel internals or IO operations. I experimented with setting up a raid5 LVM on USB 2.0 attached pen-drives and exporting it via NFS over LAN, reaching write speeds of a mere 0.03 MB/s and problems with slow synchronization on the 5.* kernel. On the 4.19.* kernel, the same setup has write speeds exceeding 1.0 MB/s and the LVM raid5 synchronization is seamless, suggesting the VCHIQ issues are hampering either kernel internal operations or USB IO. But even with the better performance, and 0 load at idle, the idle temperature is >60C, which is far above the historical (2016-2017) idle temperatures (<50C) reported elsewhere for Raspberry Pi 3. Now, given that my device is a Raspberry Pi 3B+, my idle temperatures might be higher due to differences in hardware. Contrary to this, my private logs state temperatures at <50C in January/February 2019, although I have yet to reproduce these idle temperatures. Only upon reproducing lower idle temperatures on the Raspberry Pi 3B+ can one be sure whether there are invisible/unintentional operations on the board.
The operating system used is the ArchLinuxARM distribution, two versions, provided from |
|
After testing a 5.2 kernel from the latest Manjaro I can see that, with all 4 cores at 100%, my Pi3B+ will not draw more than 2.77 watts (it idles at 2.46 watts) and will not go over 55 deg C. CPU benchmarks are much slower than with a 4.19 kernel in Raspbian. In Raspbian with 4 cores at 100% it will pull 5.3 watts and reach temps over 65 deg C. I am wondering if the false CPU load average is affecting the kernel scheduler allowing it to ramp up. OpenSSL 1.1.1c ~2x reduction in OpenSSL speed |
@sedlund what wattage does your board draw when idle on Raspbian with 4.19 kernel? If not possible in Raspbian, if you could try the ArchlinuxARM distro linked above, for rpi-2; that one is at least idling at 0 loads according to |
About 2.77 watts - but in that configuration it has an active USB WiFi device and the onboard WiFi is running hostapd. |
|
The initial regression should be fixed now upstream in 5.1 and 5.2. Everything else should be discussed in a separate issue. |
|
I've tested 5.2.1; it resolves the abnormally high load average and the vchiq processes in D state. It did not resolve the issue of the CPU frequency not ramping up to 1.4GHz though. It seems to be stuck at 700MHz, only drawing 2.8 watts under full 4 core 100% load. |
|
@sedlund CPU frequency scaling support will only be available in kernel version 5.3. So that was to be expected. |
|
@westonmyers the fix resolves the initial regression on ArchlinuxARM as well. @lategoodbye if you know the place, can you direct me to the separate issue discussing high idle temperatures? $ uname -a;echo;uptime;echo;sensors
Linux omega 5.2.1-1-ARCH #1 SMP Sun Jul 14 19:29:00 UTC 2019 aarch64 GNU/Linux
10:37:15 up 21 min, 3 users, load average: 0.00, 0.00, 0.03
cpu_thermal-virtual-0
Adapter: Virtual device
temp1: +60.1°C (crit = +80.0°C)
rpi_volt-isa-0000
Adapter: ISA adapter
in0: N/A
EDIT: the temperature is power consumption dependent - removing USB-devices lowered temperature 2-4 C. I'll read up on the matter, post my results in appropriate channels, and link it here for reference to my own comments - for the sake of clarity. |
|
@ropil I'm not aware of a discussion about high idle temperatures. This repository is dedicated to the Raspberry (downstream) kernel. Arch is using the mainline kernel for Aarch64. So there are two options:
It doesn't make sense to compare the downstream with the mainline kernel. |
|
Thanks for this info. I am troubleshooting this issue too, thanks for being on it: ~$ uname -a;echo;uptime;echo;sensors 15:47:45 up 7 days, 22:01, 1 user, load average: 4.19, 4.05, 4.02 cpu_thermal-virtual-0 |
|
Please close this issue, because the initial issue has been fixed up- and downstream. Btw a temperature with a kernel version is pretty pointless. I need at least the following information for both the good AND the bad case (assuming both use the mainline kernel tree): |
|
Please update to the latest kernel which may contain a fix for this issue. |
Describe the bug
Average load is always higher than 6 on idle. When running kernel 4.20 and below, the load average is less than 0.2 (CPU is > 98% idle)
To reproduce
Boot kernel 5.0.0-rc8
Which model of Raspberry Pi? e.g. Pi3B+, PiZeroW
Pi3B+
Which OS and version (cat /etc/rpi-issue)?
XBian GNU/Linux 9
Which firmware version (vcgencmd version)?
Jan 22 2019 16:54:23
Copyright (c) 2012 Broadcom
version 7bfabcecab2918f85a2a217b389e256eac696962 (clean) (release) (start_x)
I know kernel 5.0 is still RC, but it cannot be long before the final version is released