Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cgroup-util: allow cg_read_pid() to return unmappable PIDs #32722

Closed
wants to merge 1 commit into from

Conversation

keszybz
Copy link
Member

@keszybz keszybz commented May 8, 2024

In some environments, namely WSL2, the cgroup.procs PID list for some reason contain a ton of zeros everywhere, most likely those are from other instances under the same WSL Kernel, which at least always hosts the system instance with the X/Wayland/PA/Pipe server.

Without this patch, whenever cg_read_pid encounters such a zero, it throws an error. This makes systemd near unusable inside of WSL. Change cg_read_pid() to return 0, and adjust all callers to handle that appropriately. In general, we cannot do anything with such processes, so most operations have to be refused. The only thing we can do with them is count them, in particular, we can answer the question whether the cgroup is empty in the negative.

On normal systems, where the list does not contain any zeros to begin with, this has no averse effects.

This replaces #32534. See also: microsoft/WSL#8879.

@github-actions github-actions bot added util-lib cgtop please-review PR is ready for (re-)review by a maintainer labels May 8, 2024
Copy link

github-actions bot commented May 8, 2024

Important

An -rc1 tag has been created and a release is being prepared, so please note that PRs introducing new features and APIs will be held back until the new version has been released.

src/basic/cgroup-util.c Outdated Show resolved Hide resolved
/* Un unmappable PID. We certainly cannot create a pidref for it,
* so return an error. (Also 0 would be interpreted as us by
* pidref_set_pid(), which we cannot allow to happen.) */
return -ESRCH;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning an error here would not solve the underlying issue the original patch attempted to solve.
It would now just fail with -ESRCH instead of -EIO, since the paths that fail most loudly on WSL all call cg_read_pidref.

Hence the change to skip these pids by default, and only return them when explicitly requested.

The failures there mostly happen in append_cgroup() in dbus-unit.c, but there is multiple functions that are candidates for such failures.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the original issue was about GetUnitProcesses returning error. So append_cgroup must ignore -ESRCH.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I changed it to ignore those PIDs. I think this is the right thing to do for all callers, who already sometimes ignore pids for various reasons.

src/shared/cgroup-show.c Outdated Show resolved Hide resolved
/* Un unmappable PID. We certainly cannot create a pidref for it,
* so return an error. (Also 0 would be interpreted as us by
* pidref_set_pid(), which we cannot allow to happen.) */
return -ESRCH;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the original issue was about GetUnitProcesses returning error. So append_cgroup must ignore -ESRCH.

@YHNdnzj
Copy link
Member

YHNdnzj commented May 8, 2024

The new approach does seem simpler, but still requires fixup.

@YHNdnzj YHNdnzj added cgroups reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks and removed please-review PR is ready for (re-)review by a maintainer labels May 8, 2024
@github-actions github-actions bot added please-review PR is ready for (re-)review by a maintainer and removed reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks labels May 9, 2024
src/basic/cgroup-util.c Outdated Show resolved Hide resolved
In some environments, namely WSL2, the cgroup.procs PID list for some reason
contain a ton of zeros everywhere, most likely those are from other instances
under the same WSL Kernel, which at least always hosts the system instance with
the X/Wayland/PA/Pipe server.

Without this patch, whenever cg_read_pid encounters such a zero, it throws an
error. This makes systemd near unusable inside of WSL. Change cg_read_pid()
to return 0, and adjust all callers to handle that appropriately. In general,
we cannot do anything with such processes, so most operations have to be refused.
The only thing we can do with them is count them, in particular, we can answer
the question whether the cgroup is empty in the negative.

On normal systems, where the list does not contain any zeros to begin with,
this has no averse effects.

This replaces systemd#32534.
See also: microsoft/WSL#8879.

Co-authored-by: Timo Rothenpieler <timo@rothenpieler.org>
@BtbN
Copy link
Contributor

BtbN commented May 9, 2024

The main potential future issue I see with this vs. the flags approach is that with cg_read_pid now just always returning zero-pids, it might cause a similar issue again in the future, because someone developed a feature that does not handle the zeros properly, but never encountered them in a normal environment.
Having cg_read_pid itself skip them by default avoids any such future issues.

Would a second cg_read_pid function be an option? i.e. cg_read_pid_ext or cg_read_pid_raw, which doesn't skip them and can be used in the few cases where the zeros actually need to be evaluated?

/* An unmappable PID. We certainly cannot create a pidref for it, so ignore
* it like other PIDs that we cannot find. (Also 0 would be interpreted as
* us by pidref_set_pid(), which we cannot allow to happen.) */
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, now cg_read_pidref skips over 0 pids. This implication seems somehow weirder compared to the flags approach.

E.g. I think unit_search_main_pid should refuse 0 pids, since for services the cgroup should really be fully controlled by us, and unmappable pids are inherently an error.

So ideally, when the flag is set for cg_read_pid, it should return 0 pid. But when set for cg_read_pidref, it should return an error. To make this controllable, I think flags are needed.

@bluca bluca added this to the v256 milestone May 10, 2024
@YHNdnzj YHNdnzj removed the please-review PR is ready for (re-)review by a maintainer label May 10, 2024
@YHNdnzj
Copy link
Member

YHNdnzj commented May 10, 2024

I think #32534 is good to go, and that approach is more correct.

@poettering
Copy link
Member

So I agree that an approach where the iteration API by default hides the zero pids, but there's a way to make them appear would be best. either via a flags param or via two functions with slightly different names i.e. (cg_read_pid() and cg_read_pid_raw() sounds fine to me) is OK to me.

I guess the zero pids only need to be considered for counting, and for empty/non-empty checks, hence I think returning them should be the exception, not the norm

@keszybz
Copy link
Member Author

keszybz commented May 14, 2024

OK, let's close this then.

@keszybz keszybz closed this May 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

5 participants