Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IODeviceWeight= configures bfq.io.weight too #21795

Merged
merged 4 commits into from Apr 7, 2022

Conversation

Werkov
Copy link
Contributor

@Werkov Werkov commented Dec 16, 2021

This is a possible extension of the PR #21728. Opening separate PR since the per-device weights can't be configured when the particular device doesn't support (it's not configured) proportional control method.

Copy link
Member

@keszybz keszybz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable.

src/core/cgroup.c Outdated Show resolved Hide resolved
@Werkov
Copy link
Contributor Author

Werkov commented Jan 10, 2022

Force pushed on top of changes.

I'm emphasizing a comment from the commit message:

Since per-device weight of a particular device cannot use both io.cost
policy and BFQ scheduler, at least one warning is always expected with
this approach (the method that is unconfigured).

Copy link
Member

@yuwata yuwata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. One minor suggestion.

src/core/cgroup.c Outdated Show resolved Hide resolved
@yuwata
Copy link
Member

yuwata commented Jan 12, 2022

#21728 is merged. Please rebase.

src/test/test-cgroup-util.c Outdated Show resolved Hide resolved
@Werkov
Copy link
Contributor Author

Werkov commented Jan 12, 2022

Thanks for the review. Rebased and condition changed.

@bluca
Copy link
Member

bluca commented Jan 13, 2022

This patch extends to devices what is already done for default cgroup IO
weight in commit 2dbc45a ("cgroup: Also set io.bfq.weight"). This
is possible with kernels >= v5.4 with commit 795fe54c2a82 ("bfq: Add
per-device weight"). A warning is logged if the write fails on older kernels.

Is there any sensible and usable way to detect this? People tend to complain about warnings, especially if they are repeated. If there's an error, having something like static bool supported or such would make it already a bit nicer

@yuwata
Copy link
Member

yuwata commented Jan 14, 2022

This patch extends to devices what is already done for default cgroup IO
weight in commit 2dbc45a ("cgroup: Also set io.bfq.weight"). This
is possible with kernels >= v5.4 with commit 795fe54c2a82 ("bfq: Add
per-device weight"). A warning is logged if the write fails on older kernels.

Is there any sensible and usable way to detect this? People tend to complain about warnings, especially if they are repeated. If there's an error, having something like static bool supported or such would make it already a bit nicer

What errno we get from old kernels?

@Werkov
Copy link
Contributor Author

Werkov commented Jan 14, 2022

That should be good old -EINVAL. The warnings could be switched to debug level after a single warning with some limited reliability (similar to what is done with cgroup-compat logging).
(OTOH, this also means systems mixing kernel from 2019- with systemd from 2022+. So I don't know how much it's worth adding the compat code.)

@bluca
Copy link
Member

bluca commented Jan 14, 2022

if it's not too complicated I'd say it's worth it, 2019 is not that long ago

@Werkov
Copy link
Contributor Author

Werkov commented Jan 14, 2022

I've added two more commits to the branch with a draft of possible kernel version aware logging. I admit I don't like it, so I'm ready drop the two commits again :-)
WDYT?

@Werkov
Copy link
Contributor Author

Werkov commented Jan 14, 2022

And one more commit stacked to address @keszybz's testing concern.

#define LOG_LEVEL_CGROUP_WRITE(r, ...) (IN_SET(abs(r), ENOENT, EROFS, EACCES, EPERM, ##__VA_ARGS__) ? LOG_DEBUG : LOG_WARNING)

static const char *cgroup_jump_versions[_CGROUP_JUMP_MAX] = {
"5.4", /* 795fe54c2a82 ("bfq: Add per-device weight"). */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't usually do version comparison as enterprise kernels backport all the stuff - I'd say a static bool supported = true/false would be enough? log once, then skip it forever

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FTR, there's a precedent. I added the version check (not much excited about that neither) to properly handle EINVALs on the newer kernels.

OTOH (when checking the current kernel now), if we assume only the systemd well-formatted input, there should be possible only ENODEV, EOPNOTSUPP, ERANGE (avoided by scaling), rarely EBUSY or ENOMEM. I.e. EINVAL could be demoted to debug in any kernel.

I'll simplify the check then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, thanks

@poettering
Copy link
Member

hmm, I really dislike the idea of that kernel version table. Please let's not do that. Why?

Quite frankly, we should have never merged that BFQ supports. It's such a mess, requiring myriads of work-arounds everywhere, because it always behaves differently than the regular scheduler. Please let's not add major infra for dealing with apparently incomplete stuff like this, in particular not kernel version checks stuff which so far we always fended off.

@Werkov
Copy link
Contributor Author

Werkov commented Mar 10, 2022

Note that I share the sentiment towards kernel version checking and simpler solution should exist (not to pollute issue tracker, this should have likely been tagged as reviewed, needs rework before I get down to it).

There are two possible IO control mechanisms (io.cost and BFQ) that
expose separate weights APIs. In systemd units we configure only a
single device weight value so we need to write it to both attribute
files.
This patch extends to devices what is already done for default cgroup IO
weight in commit 2dbc45a ("cgroup: Also set io.bfq.weight"). This
is possible with kernels >= v5.4 with commit 795fe54c2a82 ("bfq: Add
per-device weight"). A warning is logged if the write fails on older kernels.

Since per-device weight of a particular device cannot use both io.cost
policy and BFQ scheduler, at least one warning is always expected with
this approach (the method that is unconfigured).
@poettering
Copy link
Member

Anyway, the kernel version infra this adds is way over the top. find a different way, let's not add something complex like that.

I don't really grok the issue here though, i.e. there are two distinct attributes? why not write both, and only complain if both fail?

Also, why doesn't the kernel report the feature set in /sys/kernel/cgroup/features? That's what that file is for...

(As before: bfq appears a totally incomplete mess, I am tmpted to just remove all support for it. We shouldn't support something that pretty obviously is chaos and incomplete).

@Werkov
Copy link
Contributor Author

Werkov commented Mar 24, 2022

Force pushed another iteration (it's also rebased):

  • kernel support is simplified to mere -EINVAL check,
  • error evaluation of two-path write to avoid unnecessary warnings,
  • (restored runtime tests of non-macro BFQ_WEIGHT by dropping a revert patch).

PTAL.

(I'm also not delighted by the state of BFQ APIs but that's what the world evolved into and doesn't seem to change. And since there are BFQ users, at some stage the mess must be harnessed. (I see no reason for not having the io.bfq.weight change captured in /sys/kernel/cgroup/features besides it was forgotten, which makes it less useful nowadays.))

src/core/cgroup.c Outdated Show resolved Hide resolved
src/core/cgroup.c Outdated Show resolved Hide resolved
src/core/cgroup.c Outdated Show resolved Hide resolved
src/core/cgroup.c Outdated Show resolved Hide resolved
src/core/cgroup.c Outdated Show resolved Hide resolved
@poettering poettering added reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks and removed please-review labels Mar 24, 2022
Use arithmetic over explicit list of existing property names.
Old kernels (795fe54c2a82 ("bfq: Add per-device weight") v5.4) won't
allow io.bfq.weight per-device configuration, don't get too excited
about it but give the user one warning message to be aware this settings
don't apply.
A single device cannot have more than a single proportional control
policy at the same time. So either io.bfq.weight or io.weight will
always fail. Process the resulting values and emit a warning message
only when none was really configured.

This patch also drops warnings when setting [blk]io.bfq.weight fails,
the code would be too convoluted otherwise.
@Werkov
Copy link
Contributor Author

Werkov commented Mar 29, 2022

Force pushed changes:

  • property name (squashable),
  • kernel version comment,
  • branch conditions simplified,
  • aggregation function simplified.

@bluca bluca added please-review and removed reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks labels Mar 29, 2022
@systemd systemd deleted a comment from lgtm-com bot Mar 29, 2022
@systemd systemd deleted a comment from lgtm-com bot Mar 29, 2022
@lgtm-com
Copy link

lgtm-com bot commented Mar 29, 2022

This pull request introduces 1 alert when merging 3e6eafd into b64f6d8 - view on LGTM.com

new alerts:

  • 1 for FIXME comment

@keszybz
Copy link
Member

keszybz commented Apr 7, 2022

Let's do this. The interface is so bad that no matter what we try to do, the code will need to be ugly.

@keszybz keszybz merged commit cbb6068 into systemd:main Apr 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

5 participants