[RfC] Switch to conservative cpufreq governor on meson kernels #499

Closed
ThomasKaiser opened this Issue Oct 13, 2016 · 10 comments

@ThomasKaiser
Collaborator
ThomasKaiser commented Oct 13, 2016 edited

Some background information is available here.

The initscripts package contains a start script, /etc/init.d/ondemand, which is IMO broken by design and conflicts with the cpufrequtils package. This ondemand service sends itself to the background and, 60 seconds after it has been called for the first time, activates the interactive, ondemand or powersave governor (the first of these that the kernel in question provides). So cpufrequtils settings are only active for approx. 58 seconds.

BTW: this affects all our platforms, but it's not an issue elsewhere since with all the other kernels interactive is what we want to use anyway. In fact our cpufrequtils settings have been useless for setting the governor for a long time, maybe even forever.

Edit: Not entirely true: on sunxi-next, where we're already at 4.7 or above, we want to use schedutil, which should not work either because /etc/init.d/ondemand switches back to the ondemand governor 60 seconds after boot.

The code is:

AVAILABLE="/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors"
[ -f $AVAILABLE ] || exit 0
read governors < $AVAILABLE
case $governors in
        *interactive*)
                GOVERNOR="interactive"
                break
                ;;
        *ondemand*)
                GOVERNOR="ondemand"
                case $(uname -m) in
                        ppc64*)
                                SAMPLING=100
                        ;;
                esac
                break
                ;;
        *powersave*)
                GOVERNOR="powersave"
                break
                ;;
        *)
                exit 0
                ;;
esac
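
To make the order of preference in that case statement easier to see, here is a minimal sketch that reduces it to a function. The name pick_governor is hypothetical and not part of the upstream script; it only prints what the script would select for a given list of available governors.

```shell
#!/bin/sh
# Sketch only: mirrors the pattern order of the case statement in
# /etc/init.d/ondemand. pick_governor is a hypothetical name.
pick_governor() {
    case "$1" in
        *interactive*) echo interactive ;;
        *ondemand*)    echo ondemand ;;
        *powersave*)   echo powersave ;;
        *)             return 0 ;;   # nothing suitable: the script exits without changes
    esac
}

# A kernel offering schedutil still ends up with ondemand:
pick_governor "schedutil ondemand conservative performance"
```

On a real system the argument would be the content of scaling_available_governors. The first matching pattern wins, so conservative, schedutil and performance can never be the result.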

Unless this is fixed upstream we should check whether we can simply remove all the broken governors from the kernel config. Then powersave is gone too, but to be honest: buying an ODROID and then optimizing for lowest consumption is weird anyway.

With that change applied our users can choose between conservative and performance (the latter behaves like all the other, broken governors: it always keeps the CPU cores at the upper clockspeed and VDD_CPUX limit).

@ThomasKaiser
Collaborator
ThomasKaiser commented Oct 13, 2016 edited

We also need to switch to CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y here so that boot behaviour is not affected (since interactive is broken and acts like performance, ODROID-C2 already boots at constant maximum CPU clockspeed now).

We could also choose CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE=y instead, since on meson conservative behaves more or less like interactive does everywhere else. But then, between kernel start and cpufrequtils taking over, the S905 might sometimes clock at just 500 MHz, which might negatively impact boot performance. Based on my tests I really doubt that, but at least CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y in the kernel config ensures that we get behaviour identical to what the broken interactive governor gives us now.
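
To check which governors a given kernel config builds in and which one is the default, something like the following could be used. It is a rough sketch: list_governor_options is a hypothetical helper, and it only looks at =y entries, so governors built as modules are not shown.

```shell
#!/bin/sh
# Sketch only: print all built-in (=y) cpufreq governor options from a
# kernel config file. list_governor_options is a hypothetical name.
# Note: helper options such as CONFIG_CPU_FREQ_GOV_COMMON match as well.
list_governor_options() {
    grep -E '^CONFIG_CPU_FREQ_(DEFAULT_)?GOV_[A-Z_]+=y$' "$1"
}
```

Called with the path of the kernel config in question, it should list exactly one CONFIG_CPU_FREQ_DEFAULT_GOV_* line plus one CONFIG_CPU_FREQ_GOV_* line per built-in governor.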

@ThomasKaiser
Collaborator

Wow, after reading through https://bugs.launchpad.net/ubuntu/+source/sysvinit/+bug/1480320 and links from there I got the impression this will never be fixed upstream :\

What about a systemctl disable ondemand already during image creation?

@zador-blood-stained
Collaborator
zador-blood-stained commented Oct 13, 2016 edited

That's a very strange bug, now that I see that there is a separate script not related to cpufrequtils and it is enabled by default...

What about a systemctl disable ondemand already during image creation?

systemctl mask ondemand would be more reliable
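
For image creation, where no systemd is running inside the target tree, masking could look roughly like this. The helper name mask_unit_in_root and the rootfs path are hypothetical; the sketch relies on the fact that masking a unit means symlinking its name under /etc/systemd/system to /dev/null.

```shell
#!/bin/sh
# Sketch only: mask a unit inside an unpacked root filesystem by creating
# the same /dev/null symlink that "systemctl mask" creates on a live system.
# mask_unit_in_root is a hypothetical name.
mask_unit_in_root() {
    root=$1
    unit=$2
    mkdir -p "$root/etc/systemd/system"
    ln -sf /dev/null "$root/etc/systemd/system/$unit"
}

# Usage (the path is an assumption):
#   mask_unit_in_root /path/to/rootfs ondemand.service
```

systemctl also has a --root= option for operating on an unpacked tree, which may be the cleaner route if the build host has systemd.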

@ThomasKaiser
Collaborator

I had a look at a board where the installation hasn't been touched for months, and there I had added

(sleep 120 && echo performance >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor)

to /etc/rc.local (performance is also set in /etc/default/cpufrequtils). This is on sunxi-next, so obviously I already ran into the issue and found a workaround, but more or less forgot about it again.
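
The line above only touches cpu0. A slightly more general variant of the same workaround, as a sketch, writes the governor to every scaling_governor file it finds. set_governor is a hypothetical helper, and the optional second argument exists only so the function can be tried against a test directory.

```shell
#!/bin/sh
# Sketch only: write a governor name to all cpuN/cpufreq/scaling_governor
# files below a sysfs cpu directory. set_governor is a hypothetical name.
set_governor() {
    governor=$1
    cpu_dir=${2:-/sys/devices/system/cpu}
    for f in "$cpu_dir"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue    # glob did not match or cpufreq is absent
        echo "$governor" > "$f"
    done
}
```

In /etc/rc.local this could be called as `(sleep 120 && set_governor performance) &`; the trailing & is an addition here so that boot does not wait for the sleep.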

IMO it's important that we also get systemctl mask ondemand executed as part of the update process, since what we have now is absolutely non-deterministic behaviour on all platforms (it just doesn't matter currently because interactive is the governor of choice in most situations).

On ODROID-C2 I disabled the ondemand service months ago and all the testing since then was done with conservative. So IMO it's a good choice, but I have to admit that I never used a desktop image and don't know how conservative behaves there. At least interactive shows identical behaviour to performance, and if we wanted behaviour somewhat similar to interactive we would have to choose conservative on meson anyway (unless someone provides kernel patches to fix the behaviour).

BTW: Differences in consumption and temperature between 500 MHz and 1536 MHz on an otherwise idle system aren't that great (~2.5°C difference, can't measure consumption currently), but I haven't yet had a deeper look at how to tweak dvfs settings on this platform.

@ThomasKaiser
Collaborator

I played around a bit with various workloads and constantly checked /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state. With light loads the intermediate cpufreqs 1000000 and 1296000 are used (60 ms at 1.0 GHz, then the remaining 40 ms at 1.3 GHz before reaching 1.5 GHz), so switching from the lowest to the highest cpufreq happens within 100 ms, with 1.0 GHz being reached almost instantly. Very light tasks (e.g. a new user logging in) raise the cpufreq no higher than 1.0 GHz.
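
To turn time_in_state into percentages per frequency, a small awk wrapper is enough. This is a sketch and state_share is a hypothetical name; it expects the usual "<frequency in kHz> <time>" pairs, one per line.

```shell
#!/bin/sh
# Sketch only: print the share of time spent at each frequency for a file
# in time_in_state format. state_share is a hypothetical name.
state_share() {
    awk 'NF >= 2 { n++; freq[n] = $1; t[n] = $2; total += $2 }
         END {
             if (total == 0) exit
             for (i = 1; i <= n; i++)
                 printf "%d kHz: %.1f%%\n", freq[i], 100 * t[i] / total
         }' "$1"
}

# Usage:
#   state_share /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state
```

The counters are cumulative, so to judge a single workload the file has to be read before and after the run and the differences compared.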

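Seems ok to me. But there's more. I used iozone -e -I -a -s 100M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2 to check eMMC and USB performance (see below). The columns in all results are: file size in kB, record size in kB, then write, rewrite, read, reread, random read and random write throughput in kB/s.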

Testing eMMC (8GB) with conservative (cpufreq remains at 500 MHz):

      102400       4    21093    18990     9951     9961     9877    18569    
      102400      16    42788    52732    27654    27706    27500    50446    
      102400     512    42336    43374   107856   107912   107529    42571    
      102400    1024    43466    43260   104513   104452   104289    42558    
      102400   16384    43217    43272   107727   107656   107769    43359    

With ondemand cpufreq remains at 500 MHz most of the time, sometimes hitting 1 GHz and occasionally exceeding it:

      102400       4    18459    19656     9962     9952     9838    19173    
      102400      16    47203    55768    27805    27844    27632    50394    
      102400     512    43013    43421   107367   107470   107223    42579    
      102400    1024    44051    44132   104459   104592   104613    43191    
      102400   16384    43955    43970   107427   107167   107449    43628    

After switching from ondemand to interactive, cpufreq remained at 500 MHz until I started the test:

      102400       4    21189    21221     9974     9979     9863    20889    
      102400      16    66218    64361    28237    28138    27873    60690    
      102400     512    44137    44145   110271   111108   111165    43121    
      102400    1024    44019    44112   109181   109320   109387    43498    
      102400   16384    43924    43992   108987   108733   108828    43760    

Afterwards cpufreq remained at 1.5 GHz regardless of activity (so interactive almost always behaves like performance). And this is performance:

      102400       4    21245    21241     9966     9982     9878    20953    
      102400      16    62060    61701    27933    27869    27648    60719    
      102400     512    44208    44269   109932   111023   110404    43383    
      102400    1024    44129    44076   108886   109896   109202    43355    
      102400   16384    44075    44078   108888   108690   108815    43783    

Now the USB test with a Samsung PM851 SSD (SAMSUNG MZ7TE128HMGR-00004 according to smartctl -a):

powersave (all the time at 500 MHz):

      102400       4     5517     5494     7531     7531     7150     5583    
      102400      16    13183    13898    15981    15991    15376    13328    
      102400     512    28509    28585    36423    36512    36449    28583    
      102400    1024    29329    29235    36995    37051    37027    29158    
      102400   16384    36628    36317    40529    40610    40639    36355    

performance (all the time at 1.5GHz):

      102400       4     8977     9983    10657     9735    10312     9669    
      102400      16    17672    17373    21300    21307    21310    17361    
      102400     512    33785    33678    39277    39339    39300    33877    
      102400    1024    34565    34430    39490    39548    39548    34457    
      102400   16384    37733    36311    41193    41287    41279    37515    

And conservative again (mostly remaining at 500 MHz but 11% on 1.0 GHz, 3% on 1.3 GHz and 0.8% on 1.5GHz):

      102400       4     5670     5606     7677     8430     7240     8224    
      102400      16    16256    14968    16612    16240    15630    14936    
      102400     512    31683    28698    36423    36751    36711    28780    
      102400    1024    29332    29302    36996    37053    37024    29365    
      102400   16384    36528    36499    40627    40558    40275    36199    

Too bad since conservative sucks too and leads to decreased IO performance :(

@zador-blood-stained
Collaborator

Correction: This applies only to Ubuntu, as neither Debian Jessie nor Stretch has an "ondemand" service. And since Armbian was mostly focused on Jessie images, we didn't encounter this issue.

@ThomasKaiser
Collaborator

This applies only to Ubuntu

Yeah, at least there it should be fixed so that users can use their governor of choice. But this won't affect idle values that much. In the meantime I measured consumption too: 500 MHz vs. 1536 MHz results in a difference of 100 mW (2320 mW vs. 2420 mW) and 2°C with the default heatsink. So staying with the broken interactive or even switching to performance should not matter that much (switching from GbE to Fast Ethernet saves more, ~230 mW, but that is still close to negligible).

If I read this correctly then Hardkernel might provide new dvfs settings that are slightly overvolted but on the other hand maybe we can try to adjust some dvfs operating points to lower VDD_CPUX values. But since the average C2 user is interested in max performance and not low consumption we could already stop after implementing systemctl mask ondemand on Ubuntu variants?

@zador-blood-stained
Collaborator

Then this should resolve this issue: d5b2ec0

@ThomasKaiser
Collaborator
ThomasKaiser commented Oct 13, 2016 edited

Thanks, that will fix it for new images. What about already existing installations?

ATM the only platform really affected might be sunxi-next with mainline kernel running Xenial, since those installations end up with the ondemand governor instead of schedutil. Hmm... after checking http://image.armbian.com/stats.html and http://apt.armbian.com/stats.html not that many people seem to be affected. But do we have a way to fix such stuff later as part of an upgrade (e.g. executing systemctl from a package's post-install script)? And which package could this be part of?
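
As a starting point for the upgrade path, a maintainer-script fragment could look roughly like this. It is a sketch under assumptions: which package would carry it is still open, mask_ondemand_on_configure is a hypothetical name, and the SYSTEMCTL and ROOT variables exist only to allow dry runs.

```shell
#!/bin/sh
# Sketch only: postinst-style fragment that masks the ondemand service.
# mask_ondemand_on_configure, SYSTEMCTL and ROOT are hypothetical names.
mask_ondemand_on_configure() {
    action=$1
    systemctl_cmd=${SYSTEMCTL:-systemctl}
    # postinst is called with "configure" on both install and upgrade
    [ "$action" = "configure" ] || return 0
    # Only act where the init script exists (Ubuntu, per this thread)
    [ -e "${ROOT:-}/etc/init.d/ondemand" ] || return 0
    "$systemctl_cmd" mask ondemand.service
}

# In a real postinst the call would be:
#   mask_ondemand_on_configure "$1"
```

Setting SYSTEMCTL=echo prints the command instead of running it, which is handy for checking the logic on a build host.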

@ThomasKaiser
Collaborator

Addendum: C2 idle consumption/temperatures at 1536 MHz and 1656 MHz using our most recent settings are absolutely identical, so both consumption and temperature might depend only on the voltage settings of the various dvfs operating points. So while there might be a chance to lower idle consumption at 500 MHz by tweaking the relevant 500 MHz dvfs entry, I still doubt any C2 user is interested in this.

(Both cpufreqs were checked with sysbench --test=cpu --cpu-max-prime=100000 run --num-threads=4, and performance differs with the new Hardkernel settings: 48.1 vs 51.9 seconds.)
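
A quick sanity check on those two numbers, as a sketch: if the shorter runtime belongs to 1656 MHz (an assumption, the figures above are not labelled), the runtime ratio should roughly match the clock ratio. ratio is a hypothetical helper.

```shell
#!/bin/sh
# Sketch only: compare the sysbench runtime ratio with the clock ratio.
# Assumption: 48.1 s was measured at 1656 MHz and 51.9 s at 1536 MHz.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.3f\n", a / b }'
}

ratio 51.9 48.1    # runtime ratio, prints 1.079
ratio 1656 1536    # clock ratio, prints 1.078
```

Both ratios agree to within about 0.1%, so under that assumption the benchmark scales linearly with clockspeed.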
