
treewide: Don't diverge from upstream default HZ settings on 4.19 #1831

Closed
wants to merge 1 commit

Conversation

@diizzyy
Contributor

diizzyy commented Feb 18, 2019

Most targets upstream use 250Hz or even 1000Hz by default, while
100Hz is hardcoded in OpenWrt's default config. Use the upstream
defaults, except for apm821xx, which hard-sets 1000Hz instead of the
platform default of 250Hz.

Signed-off-by: Daniel Engberg <daniel.engberg.lists@pyret.net>

@diizzyy
Contributor Author

diizzyy commented Feb 18, 2019

Compile/Runtime tested on: EdgeRouter Lite with #1799

@chunkeey added the kernel (pull request/issue with Linux kernel related changes) label on Feb 18, 2019
@neheb
Contributor

neheb commented Feb 19, 2019

Compile/Runtime tested on: TP-Link Archer C7v2 with #1635

@hnyman
Contributor

hnyman commented Feb 19, 2019

@diizzyy

Is that PR title (and commit title) wrong?

The title says "Don't derive HZ from upstream", while the explanation and the code changes seem to delete OpenWrt's fixed values and use the upstream Linux values...

Most targets upstream use 250Hz or even 1000Hz by default while
100Hz is hardcoded in OpenWrt's default config. Use upstream default
except for apm821xx which hardsets 1000Hz instead of platform default of
250Hz.

Signed-off-by: Daniel Engberg <daniel.engberg.lists@pyret.net>
@diizzyy changed the title from "treewide: Don't derive HZ from upstream on 4.19" to "treewide: Don't diverge from upstream default HZ settings on 4.19" on Feb 19, 2019
@diizzyy
Contributor Author

diizzyy commented Feb 19, 2019

@hnyman
Fixed, thank you :-)

jow- pushed a commit that referenced this pull request Feb 28, 2019
This patch adds the disabled DRM_RADEON and DRM_AMDGPU
config symbols from the x86 config to the generic target
configs. The existing symbols in the x86 configs are kept
for now, until we know whether we want to remove such
symbols or not (see Github PR #1831, #1825, #1828).

Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
jow- pushed a commit that referenced this pull request Mar 2, 2019
This patch adds more disabled DRM config symbols from the
x86 config to the generic target configs. The existing
symbols in the x86 configs are kept for now, until we
know whether we want to remove such symbols or not
(see Github PR #1831, #1825, #1828).

This patch also contains a squashed patch from
Daniel Engberg <daniel.engberg.lists@pyret.net> titled
"kernel: Fix config for 4.14" which fixes a duplicated line
added by commit 8bdc241 ("x86: fix geode image builds")

Fixes: 8bdc241 ("x86: fix geode image builds")
Signed-off-by: Daniel Engberg <daniel.engberg.lists@pyret.net>
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
@diizzyy
Contributor Author

diizzyy commented Mar 23, 2019

@chunkeey
Any reason why apm821xx is 1000Hz instead of upstream platform default?

@chunkeey
Member

chunkeey commented Apr 3, 2019

Any reason why apm821xx is 1000Hz instead of upstream platform default?

Where are you getting that "upstream platform default" from? If it's just from the Kconfig, then this is probably not enough, since there are also defconfigs for the architectures.

As for the current HZ_1000: it comes from Chris Blake's initial apm821xx tree that he did for the MR24. It was the configuration that booted.

@diizzyy
Contributor Author

diizzyy commented Apr 3, 2019

"upstream platform default" refers to what the kernel (Kconfig) sets by default if you select platform X when you start from scratch without a config.

@chunkeey
Member

chunkeey commented Apr 3, 2019

"upstream platform default" refers to what the kernel (Kconfig) sets by default if you select platform X when you start from scratch without a config.

Sounds like you hit reply too soon.

@diizzyy
Contributor Author

diizzyy commented Apr 3, 2019

I'll admit I'm not very much into Linux kernel development, so what I'm trying to get at is the default value you end up with if you do (MIPS in this case):

export ARCH=mips
make defconfig
make menuconfig

with upstream kernel sources and a "clean" config.
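
A quick way to see what that default actually ends up being, after the steps above, is to grep the generated config (the output shown here is only illustrative and depends on the kernel version and selected platform):

grep -E '^CONFIG_HZ' .config
# e.g. CONFIG_HZ_250=y
#      CONFIG_HZ=250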

@chunkeey
Member

chunkeey commented Apr 3, 2019

Hm, do you know if the "upstream defaults" are any better for OpenWrt? I mean, at its core (i.e. in the generic target) OpenWrt still likes its HZ_PERIODIC, whereas upstream switched over to NO_HZ(_X) for many targets long ago.
If the HZ setting makes a difference, it should be measurable. Like, do you remember the ath9x caldata extraction thing? I only measured an order-of-magnitude improvement on the ipq40xx, whereas it was much, much more on ath79 (and according to wndr3700 owners it fixed the discovery bug). So, if you can measure that the HZ setting affects stuff like popular applications, LAN, WAN and WLAN throughput and latency in a positive way (you could also look at idle/load power consumption, or whatever tickles your fancy), then: sure! If there's something to be gained, I would definitely take the risk of potentially breaking some eggs.

@diizzyy
Contributor Author

diizzyy commented Apr 4, 2019

If that's the case we need to agree on how and what to test, and stick to it.
Preferably it should be something fairly easy to do and doable on most devices.

Suggestions:

OpenSSL (default benchmark tests)
time creation of tar.gz file using busybox
iperf3 tests over eth and wifi if possible (tx rx)
time transfer using ncat-ssl (nmap-suite)
timed transcoding of audio (multithreaded) using ffmpeg4 (package available)

Kernel latency (idlejitter) and softirqs (might be of interest?) can be monitored and measured using netdata.
Implement a simple ping (network latency) test using fping and netdata that runs in the background.

Not sure if we have more user-friendly tools available for performance monitoring or how deep we need to go; a rough command sketch for the iperf3/fping items follows below.
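
A minimal sketch of what the iperf3 and fping items could look like (the addresses, durations and intervals are placeholders, not tested values):

# on a LAN host: iperf3 -s
iperf3 -c 192.168.1.100 -t 60        # TCP throughput, tx
iperf3 -c 192.168.1.100 -t 60 -R     # TCP throughput, rx
fping -l -p 200 -Q 10 192.168.1.100  # continuous ping with a summary every 10 s, run in the background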

@diizzyy
Contributor Author

diizzyy commented Apr 13, 2019

Ping

@diizzyy
Contributor Author

diizzyy commented Apr 16, 2019

FWIW, tested with and without V=s

@diizzyy
Contributor Author

diizzyy commented Jul 8, 2019

Compile tested: ipq40xx, Linksys EA6350v3, OpenWrt master
Run tested: ipq40xx, Linksys EA6350v3, OpenWrt master

@diizzyy
Contributor Author

diizzyy commented Aug 5, 2019

@hauke
Any thoughts on this?

@adschm
Member

adschm commented Oct 31, 2019

@diizzyy Based on what chunkeey's comment says, the next step would be to provide performance measurements. I don't think there has to be a greater plan on what to test, just do some that are appropriate in your opinion.

@diizzyy
Contributor Author

diizzyy commented Oct 31, 2019

@adrianschmutzler
I could ask the exact same question the other way around; there's no information on why this value was chosen, from what I could find.

Looking at the commit history I could only find one commit that isn't very informative about the subject.
cbbfc5f

As for the value itself, it was hard-set for kernel 2.6.39 (I didn't bother to look further back), which is about 7 years ago. I think it's likely that it has been kept because kernel configs seem to have been reused across major version "bumps" and not re-evaluated.
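
For reference, one way to dig for where such a hard-coded value came from is a pickaxe search over the tree (the symbol and path below are illustrative, not a claim about where the setting actually lives):

git log -S 'CONFIG_HZ_100=y' --oneline -- target/linux/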

@adschm
Member

adschm commented Oct 31, 2019

@diizzyy I won't be able to review and merge this; I'm just commenting since I've seen this is stuck. The last comment from a core developer gave the clear direction to do performance measurements. From my side, I cannot judge what's better or worse in this case. But if you were to show that option A leads to an improvement of X over B, I would get the feeling that B should be changed into A.
So, why not just try that?

@diizzyy
Contributor Author

diizzyy commented Oct 31, 2019

I didn't even get any feedback, and given earlier attempts I doubt it's worth wasting time, as it'll most likely end up with "you didn't benchmark thing X". Perhaps if someone else posts a patch for the same thing it'll get merged, who knows...

@CodeFetch
Contributor

CodeFetch commented Feb 5, 2020

As OpenWrt devices are mostly used as routers, the HZ-value is important for the NAPI polling.

The NAPI polling interval is hard-coded to 2 jiffies (correct me if I'm wrong).

NAPI switches from interrupt-driven mode to polling mode when there is a high traffic demand (defined by the NAPI weight).

The time of one jiffy is defined by the HZ-value. 100 Hz means one jiffy is 10 ms long, so the NAPI polling occurs every 20 ms. Increasing the HZ-value to 1000 Hz would lower this to 2 ms.

The polling mode is only used when there are more packets per 2 jiffies than the NAPI weight (correct me if I'm wrong), which is 64 in ag71xx.

I remember I've read a commit message changing the NAPI weight, because Qualcomm said it was better, but actually someone should sit down and do the maths instead of relying on Qualcomm.
So we need someone to calculate which values make sense.
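
As a back-of-the-envelope starting point (assuming the 2-jiffy interval and the weight of 64 mentioned above; these numbers are illustrative, not measurements):

for hz in 100 250 1000; do
  awk -v hz=$hz 'BEGIN {
    jiffy = 1000 / hz          # jiffy length in ms
    poll  = 2 * jiffy          # 2-jiffy polling interval in ms
    rate  = 64 * 1000 / poll   # packet rate needed to keep NAPI in polling mode
    printf "HZ=%-4d jiffy=%.0f ms  poll interval=%.0f ms  ~%.0f pkt/s\n", hz, jiffy, poll, rate
  }'
done

So at 1000 Hz the polling mode only stays engaged at roughly ten times the packet rate that suffices at 100 Hz.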

Edit: Unfortunately I don't have the nerves to do this at the moment. Please have a look at https://lwn.net/Articles/687617/

@ldir-EDB0
Contributor

Thanks! Pulled into my staging tree at https://git.openwrt.org/openwrt/staging/ldir.git

@CodeFetch
Contributor

@ldir-EDB0 Didn't you read my comment about this?

@neheb
Contributor

neheb commented Mar 30, 2020

That's an old mirror. I changed ag71xx to use 32 for NAPI weight.

edit: in any case, changing upstream defaults should be justified.

@CodeFetch
Contributor

@neheb It is justified, I guess. Possibly it diverged from upstream in QSDK.

Increasing the HZ-value to 1000 Hz would lower this to 2 ms.

2 ms is far too low for polling mode, because it could create higher load on devices with low computation power (which most routers are). You need to imagine it like this: polling mode should be used on high traffic demand - latency is secondary then, because the router does not have enough computation power to guarantee both latency and throughput.

I don't want to say these values make sense, only that 2 ms might be too short, rendering the polling mode kind of useless.

@ldir-EDB0
Contributor

@CodeFetch Yes I did read the comment and decided that committing and waiting for the screams in order to revert if required was better than having a PR hang around for another year. If it turns out we DO need to override upstream defaults then the requirement can be documented in a new commit.

@ldir-EDB0 closed this Mar 30, 2020
@CodeFetch
Contributor

@ldir-EDB0 We have embedded routers here. Upstream defaults are for ordinary computers running a more efficient CPU architecture at a higher clock. If people notice their devices are unresponsive when there is high traffic demand, I doubt that they'll know why... When you set the HZ value too high, you might starve user space of computation time. 2 ms means that there are only 2 ms left for user space... I expect this to cripple everything but networking.

@diizzyy
Contributor Author

diizzyy commented Mar 30, 2020

Do note that not all targets set 1000Hz by default, and it doesn't seem to have much impact on userspace, looking at
https://passthroughpo.st/config_hz-how-does-it-affect-kvm/

@f00b4r0
Contributor

f00b4r0 commented Apr 10, 2020

So, I just stumbled upon this PR. TL;DR: I believe this proposal was wrong on all counts.

Some datapoints:

NAPI and HZ are related only in that NAPI, being a polling system, needs to track elapsed time and does this by measuring elapsed timer ticks. Timer ticks are controlled by HZ.

Increasing HZ to "improve NAPI" is a huge misunderstanding of NAPI's goals and implementation. In fact, it nearly entirely defeats its purpose. NAPI aims at reducing the CPU load of servicing IRQs during packet storms. NAPI only engages on high traffic, precisely when the CPU is under load. Increasing HZ raises the IRQ "noise floor", since a higher HZ directly translates into a higher timer IRQ rate, which "distracts" the CPU from its main tasks (a quick way to observe this is sketched after the list below).

On CPU-starved devices like low-end routers, a high HZ will result in higher CPU load:

  • further reducing CPU cycles available to process network packets and perform other tasks
  • increasing power usage for no reason
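
One way to eyeball that timer "noise floor" on a running device (the exact interrupt names differ per architecture, so treat this as a sketch):

grep -iE 'timer' /proc/interrupts; sleep 10; grep -iE 'timer' /proc/interrupts
# the per-CPU counter delta divided by 10 approximates the timer IRQ rate per second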

The first LWN article posted above also clearly explains why the typical load encountered on a router would warrant a low HZ setting. If NAPI results in perceived latency, I think it would first need to be measured (I'd be more inclined to believe that the latency isn't due to NAPI but rather to CPU hogging anyway), and there are tunables better suited to adjusting than fiddling with the timer tick.

This change was proposed without any material evidence supporting a benefit to the typical OpenWrt workload (and since I assume my opinion won't matter much, I will apply the same effort to providing evidence that the change is a bad choice, although I suspect that evidence will materialize soon on the bugtracker and forums 😜).

Assuming that upstream defaults are "better" or even "good for everyone" is absurd. These values need to be set; upstream offers some defaults that may be geared toward a specific workload (and most often for big metal). I certainly hope that the next proposal won't be to switch CONFIG_DEFAULT_IOSCHED back to upstream's CFQ, for instance 😛

HTH

@CodeFetch
Contributor

@f00b4r0 I totally agree with you.

@diizzyy
Contributor Author

diizzyy commented Apr 11, 2020

A lot of kernel code has changed in 10+ years, including compilers, so performance characteristics are most likely very different. Worth noting is that all arguments (so far) are theoretical, without any actual performance benchmarks. Also worth keeping in mind is that, depending on the platform, you may see very different performance characteristics, so what applies to MIPS may not apply to ARM or x86 and vice versa.

While you certainly can tune against specific workloads, I think you can safely assume that mainline is tuned for "general" performance, which may or may not apply to your specific workload.

I've used this on MIPS, PPC and ARM without any obvious performance regressions. I've actually seen a few improvements, but your mileage may vary; however, these haven't been proper isolated tests.

diizzyy referenced this pull request Apr 11, 2020
With the "real HZ" debate out of the way, let's actually
use the apm821xx's default upstream config file at
arch/powerpc/configs/44x/bluestone_defconfig. From what
I can tell, it sets NO_HZ (well, this platform was for
NAS, so this is a bit unexpected) and remove any specific
HZ_$VALUE symbol.

Sadly, Daniel Engberg didn't run any before/after netperf
tests, because it would have been such a slam dunk across
the boards. In case of the apm821xx the tcp tx/rx
performance improved ~14% (from 600Mbps to 700Mps).
This now causes the emac to drop frames too, so let's see
if this is causes more problems or not.

This patch includes a refresh of the configuration too.

Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
@f00b4r0
Contributor

f00b4r0 commented Apr 11, 2020

Thank you for making these completely unsubstantiated (and plain wrong) claims. It shows clearly that you had no idea what you were talking about when you made this PR. I'll rest my case.

@diizzyy
Contributor Author

diizzyy commented Apr 11, 2020

Again, no one (either way, if that wasn't obvious) has done any benchmarks of actual performance, so I honestly don't see why you need to get upset about it, especially since you "believe", which would imply that you aren't sure about your claims. Looking at the recent commit history, apm821xx did receive a performance boost from this change (it was hard-set at 1000Hz by default).

@f00b4r0
Contributor

f00b4r0 commented Apr 11, 2020

For the record, I don't "believe" the CPU impact of increasing HZ, I know it. I don't "believe" how NAPI works either. And I don't understand your last comment about the recent commit. Your change in this PR is a NO-OP for apm821xx, so if you claim your PR had a performance impact there, I'll assume it's thanks to a nice fairy.

The commit you reference apparently switched apm821xx to a full tickless kernel, which is a completely different thing (and might actually be a good thing on some hardware for some workloads: it has performance implications as well, which I won't detail here, it's not the place). Incidentally, full tickless operation means the timer tick can go as low as 1Hz. Are you arguing now that a lower timer frequency can be good for performance, like I did above?

Anyway, as I said, I'm now convinced you don't know what you're talking about, and I'm done feeding this troll. This is a moot case. I suggest next you offer to move the generic configs to the default upstream IOSCHED as well (CFQ), without providing any numbers like you did here, and see what kind of reaction you get. Might be tuned for "general performance" too, and as we all know, a router definitely falls into the "general" category 😛

Cheers

@diizzyy
Contributor Author

diizzyy commented Apr 11, 2020

As far as I can tell no one has ever claimed that CPU load alone isn't impacted by changing HZ? I guess I could have been clearer that it's the end result that's interesting, not whether top shows a few percent less or more. Most of the "trolling" goes back to the fact that there is no hard evidence (i.e. benchmarks) supporting either claim, rather than falling back on by-now very dated documentation. What you also seem to completely ignore is that I did offer to provide data, but no one ever bothered to reply about that; then again, you didn't even offer that, so I don't see how that's going to provide any useful information to move the discussion forward. Either way it's committed, and you're free to do a pull request if you think that it's inappropriate for this kind of application.

@f00b4r0
Contributor

f00b4r0 commented Apr 11, 2020

#1831 (comment)
You were asked to provide measurements. And then, instead of complying, you reversed the burden of proof - as you're doing again - by claiming there was no evidence the other setting was better. If that's not trolling, I don't know what is.

And if you don't understand that "top shows a few percent more" [wasted servicing timer interrupts] means that e.g. wifi throughput will suffer on hardware where the CPU is the bottleneck, then I guess I made my point. I'm done here.

@diizzyy
Contributor Author

diizzyy commented Apr 11, 2020

@f00b4r0
Perhaps you want to browse a bit further back.
#1831 (comment)
...to which I responded, as I didn't want to spend time doing benchmarks only to later find out that whatever applications were used weren't suitable for the task, and got no response.
#1831 (comment)

What I'm trying to say is that you're making assumptions which may not hold the same way as they did 10+ years ago.

Not that it matters much by now (since it's already committed), but I guess I could provide some data on the mvebu platform using iperf3 and netdata, as that's what I have at hand.

@adschm
Member

adschm commented Apr 12, 2020

Since this is merged and the discussion might be educating but pointless, I suggest we make the best out of it: what this commit effectively did was remove the global HZ setting for all targets. However, maybe it's better to deviate from that per target, like it's already done for several targets (and recently added for apm821xx by @chunkeey, who followed the same strategy, if I understood him correctly).

This will make it possible to aim at the specific architecture/performance of the target and might also make performance measurements more meaningful after all.
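
For illustration, such a per-target override is just the usual HZ symbols in the target's kernel config (the file name and value below are only an example, not a recommendation):

# target/linux/apm821xx/config-4.19 (example)
CONFIG_HZ_1000=y
CONFIG_HZ=1000

In practice a change like this would typically be made via make kernel_menuconfig so the dependent symbols stay consistent.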

Labels: kernel (pull request/issue with Linux kernel related changes), needs changes