(sd-pam) process is a CoW trap. Garbage data puts unnecessary pressure on virtual memory. #8081

Open
sourcejedi opened this issue Feb 3, 2018 · 3 comments

Comments

@sourcejedi
Contributor

commented Feb 3, 2018

Submission type

  • Request for enhancement (RFE)

systemd version the issue has been seen with

code seen in master

Used distribution

Fedora

RFE

I've been overloading my RAM a bit lately.

StackOverflow has a one-liner to print per-process swap usage. I notice that (sd-pam) can end up with more swap accounted than the parent systemd currently has. Like, almost 2MB!

This is basically all going to be dead memory as far as (sd-pam) is concerned. I think it's got to be copy-on-write pages from PID 1.

If the parent is also modifying memory while the child runs, the child will be hosting copies of all the modified memory (copy-on-write). Often the child is performing a sub-task and only uses a subset of that data.

The overhead can become significant with a larger parent process and a longer-running child process.
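To make the copy-on-write effect concrete, here is a minimal Linux-only sketch, not taken from systemd. The names `private_kb`, `read_private_kb` and `cow_demo` are made up for illustration. A parent forks an idle child, then rewrites a buffer; the child's private memory, as reported in `/proc/<pid>/smaps`, would be expected to grow even though the child never touches the buffer.

```python
import os


def private_kb(smaps_text):
    """Sum Private_Clean + Private_Dirty (in kB) over all mappings in smaps text."""
    total = 0
    for line in smaps_text.splitlines():
        if line.startswith(("Private_Clean:", "Private_Dirty:")):
            total += int(line.split()[1])
    return total


def read_private_kb(pid):
    """Read /proc/<pid>/smaps and return the process's private memory in kB."""
    with open(f"/proc/{pid}/smaps") as f:
        return private_kb(f.read())


def cow_demo(size=32 * 1024 * 1024, page=4096):
    """Return the child's private kB (before, after) the parent rewrites its buffer."""
    buf = bytearray(size)
    for i in range(0, size, page):
        buf[i] = 1  # touch every page so it is really allocated before fork
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: never touches buf, just blocks until the parent closes the pipe.
        os.close(w)
        os.read(r, 1)
        os._exit(0)
    os.close(r)
    before = read_private_kb(pid)
    for i in range(0, size, page):
        buf[i] = 2  # parent write: the parent gets a fresh copy of each page,
                    # leaving the child as the sole owner of the original
    after = read_private_kb(pid)
    os.close(w)  # EOF wakes the child so it can exit
    os.waitpid(pid, 0)
    return before, after
```

Calling `cow_demo()` and comparing the two values should show the child holding roughly the buffer size in extra private memory, which is the same pattern as (sd-pam) holding stale copies of PID 1's pages. Exact numbers depend on the kernel and on transparent huge pages.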

$ (echo "COMM PID SWAP"; for file in /proc/*/status ; do awk '/^Pid|VmSwap|Name/{printf $2 " " $3}END{ print ""}' $file; done | grep kB | grep -wv "0 kB" | sort -k 3 -n -r) | column -t | grep -E 'sys|[(]'
systemd-udevd    586    4436   kB
(sd-pam)         2215   3900   kB
(sd-pam)         1023   3896   kB
(sd-pam)         17506  2488   kB
systemd          1      2260   kB
systemd-logind   840    960    kB
systemd          1007   748    kB
systemd-journal  560    656    kB
systemd          2210   652    kB
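For anyone who finds the one-liner hard to read, the same idea can be sketched in Python. This is an illustrative rewrite, not the StackOverflow original; `parse_status` and `swap_by_process` are names invented here. It reads the `Name`, `Pid` and `VmSwap` fields from each `/proc/<pid>/status` file and sorts by swap usage.

```python
import glob


def parse_status(text):
    """Return (name, pid, swap_kb) from the text of a /proc/<pid>/status file."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    # Kernel threads have no VmSwap line; treat that as zero swap.
    swap_kb = int(fields.get("VmSwap", "0 kB").split()[0])
    return fields.get("Name", "?"), int(fields.get("Pid", "0")), swap_kb


def swap_by_process():
    """List (name, pid, swap_kb) for processes with swap in use, largest first."""
    rows = []
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(path, errors="replace") as f:
                row = parse_status(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if row[2] > 0:
            rows.append(row)
    return sorted(rows, key=lambda r: r[2], reverse=True)
```

Printing the rows from `swap_by_process()` and filtering on names containing `sys` or `(` would give a listing comparable to the one above.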
@poettering

Member

commented Feb 5, 2018

Yes, this is a known problem, and there has been a TODO list item about this for a while, but it's really hard to fix this, and PAM isn't really making things easy here...

To fix this we'd have to split exec_child() in two, and the part from the setup_pam() invocation on would need to be compiled into a separate binary that we can execve() first here, in order to release the cow copies of PID1's memory... But to make that work we'd have to pass all that context info we need over the execve(), and we'd ideally do that in a fashion that for the non-PAM case we'd avoid this extra execve() thing... And that's frickin' awful...
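The general shape of that approach (fork, exec a small helper right away so the inherited address space is dropped, and hand the needed context across the exec) can be sketched as follows. This is a Python stand-in for illustration only, not systemd's code: the helper program, the `spawn_helper` name and the `pam_name`/`user` context keys are all hypothetical.

```python
import json
import os
import sys

# Stand-in for the hypothetical separate helper binary: it reads the serialized
# context from an inherited file descriptor whose number is passed in argv.
HELPER = (
    "import json, os, sys\n"
    "with os.fdopen(int(sys.argv[1])) as f:\n"
    "    ctx = json.load(f)\n"
    "sys.exit(0 if ctx.get('pam_name') else 1)\n"
)


def spawn_helper(context):
    """Fork, exec the helper immediately, and pass `context` over a pipe."""
    r, w = os.pipe()
    os.set_inheritable(r, True)  # the read end must survive the exec
    pid = os.fork()
    if pid == 0:
        # Child: exec straight away. The exec replaces the address space, so
        # the copy-on-write pages inherited from the parent are released.
        os.close(w)
        try:
            os.execv(sys.executable, [sys.executable, "-c", HELPER, str(r)])
        finally:
            os._exit(127)  # only reached if the exec itself failed
    os.close(r)
    with os.fdopen(w, "w") as f:
        json.dump(context, f)  # closing the pipe signals end of context
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

For example, `spawn_helper({"pam_name": "login", "user": "alice"})` should return 0 once the helper has read its context. The hard part described above is that the real context is far larger than a small JSON blob, and that the non-PAM path should not pay for the extra exec.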

@sourcejedi

Contributor Author

commented Feb 5, 2018

As you say, PAM doesn't fit into exec_child() very nicely. But PAMName seems fairly obscure, and IIRC there's at least one other caveat to it. I think we could stop using it if we wanted. E.g.

ExecStart=/usr/lib/systemd --user --pam-user=%i

and suggest anyone else consider runuser instead.

It's great to have service files declaratively dropping privileges. But maybe we can give up on fitting arbitrary plugins into the sequence.

@poettering

Member

commented Feb 6, 2018

well, PAM hooks need to be called with privs still, which means we'd either have to give up on all privs dropping in that case on the systemd side or do PAM stuff inside of PID1's pre-exec child...

It's a fucked situation.
