manager: always reap first the process for which we already got SIGCHLD #12919
Do note that I was not able to reproduce the race on my test system, so I couldn't verify that the patch actually fixes the issue. I think it might, but YMMV. Also, whoever is able to reproduce #10872: it would be cool if you could give it a try.
If we get a SIGCHLD, we enable and eventually dispatch sigchld_event_source, where we actually reap the process. Since we received SIGCHLD for a specific PID, wait for that process first.

The motivation is to prevent problems caused by our state machine for mount units, which relies on mountinfo notifications always being dispatched before the sigchld handler for the mount. Previously this was racy: we might have called manager_dispatch_sigchld() for a completely unrelated process but actually reaped the mount process, which had completed in the meantime. The sigchld handler for the mount unit would then fail the mount unit because we hadn't dispatched the mountinfo notification yet.

event | mount   | kernel          | PID 1
------+---------+-----------------+--------------------------------------
  1   |         |                 | forks off mount as PID x
  2   |         |                 | receives SIGCHLD for PID y
  3   |         |                 | enables sigchld_event_source
  4   |         |                 | dispatches sigchld_event_source
  5   | mount() | mountinfo_notif |
  6   | exit()  |                 |
  7   |         |                 | calls waitid() with P_ALL
  8   |         |                 | calls sigchld_handler for mount
  9   |         |                 | fails the mount unit since
      |         |                 | mountinfo_notif wasn't processed yet

Fixes systemd#10872
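A minimal sketch of the ordering described above, assuming Linux/POSIX `waitid()`. The helper `peek_child()` is hypothetical, not code from this PR or from systemd: it first asks about the PID carried by the received SIGCHLD (`P_PID`) and falls back to `P_ALL` only if that child is already gone or has not exited. `WNOWAIT` leaves the child reapable, so a caller could dispatch the unit's sigchld handler before actually reaping it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper, not systemd code. Look for an exited child without
 * reaping it (WNOWAIT), preferring hinted_pid, i.e. the PID carried by the
 * SIGCHLD we received. Returns 0 and fills *si if a child was found,
 * 1 if nothing is waitable, -1 on other errors. */
static int peek_child(pid_t hinted_pid, siginfo_t *si) {
    const int flags = WEXITED | WNOHANG | WNOWAIT;

    if (hinted_pid > 0) {
        memset(si, 0, sizeof *si);
        if (waitid(P_PID, (id_t) hinted_pid, si, flags) == 0) {
            if (si->si_pid != 0)
                return 0;             /* the signalled child is ready: use it */
        } else if (errno != ECHILD) {
            return -1;
        }
        /* Hinted child already reaped or not exited yet: try any child. */
    }

    memset(si, 0, sizeof *si);
    if (waitid(P_ALL, 0, si, flags) < 0)
        return errno == ECHILD ? 1 : -1;   /* ECHILD: no children at all */
    return si->si_pid != 0 ? 0 : 1;        /* si_pid == 0: none has exited */
}
```

The fallback to `P_ALL` is kept so that children whose SIGCHLD was never seen still get reaped eventually.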
This doesn't work... SIGCHLD is not a queue. I.e. if fifteen child processes die within a very short time window, you'd say that in theory we should get fifteen separate SIGCHLDs delivered. But unfortunately that's not how this works in the kernel: for each UNIX signal (excluding realtime signals, which are different) only a single field exists per process. If the field is empty, the SIGCHLD metadata is stored there. But if it is already set, then every new SIGCHLD just overrides the earlier data. This means that if fifteen children die at once, PID 1 might only process the SIGCHLD at a time when only the last process is actually still stored in the field, and all earlier ones have been overwritten. Yes, UNIX is stupid. But this means your patch unfortunately doesn't fix the bug: it might very well happen that the SIGCHLD we care about is actually one of the overwritten ones...
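The coalescing is easy to observe. A minimal Linux/POSIX sketch, assuming the caller has blocked SIGCHLD; `count_pending_sigchld()` and `reap_all_children()` are hypothetical helpers. Several children exiting while the signal is blocked leave a single pending SIGCHLD, which is why a handler has to loop on `waitid(P_ALL, ..., WNOHANG)` until nothing is left rather than assume one signal per child.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

/* Hypothetical helper: dequeue pending SIGCHLDs without blocking and count
 * them. The caller must have blocked SIGCHLD. Standard signals do not queue,
 * so several exits while SIGCHLD is blocked yield one pending SIGCHLD. */
static int count_pending_sigchld(void) {
    sigset_t set;
    const struct timespec zero = {0, 0};
    int n = 0;

    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);
    while (sigtimedwait(&set, NULL, &zero) == SIGCHLD)
        n++;
    return n;
}

/* Hypothetical helper: reap every child that has exited, regardless of which
 * PID the SIGCHLD mentioned. Stores up to max PIDs, returns number reaped. */
static size_t reap_all_children(pid_t *pids, size_t max) {
    size_t n = 0;
    for (;;) {
        siginfo_t si;
        memset(&si, 0, sizeof si);
        /* Stop on error (e.g. ECHILD: no children left) or when no child
         * has exited yet (si_pid stays 0 with WNOHANG). */
        if (waitid(P_ALL, 0, &si, WEXITED | WNOHANG) < 0 || si.si_pid == 0)
            break;
        if (n < max)
            pids[n] = si.si_pid;
        n++;
    }
    return n;
}
```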
As discussed elsewhere, I figure this should work, as long as we get a guarantee that the metadata we get with the SIGCHLD is indeed the oldest metadata around, i.e. the kernel drops any new SIGCHLD, and never the already pending one, if multiple are generated before being handled. @msekletar volunteered to prep a man page patch to document this kernel behaviour ;-)
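On Linux this matches what the signal(7) man page describes for standard signals: while one is already pending, later instances are discarded and the `siginfo_t` of the first instance is kept. A small check, assuming Linux and that SIGCHLD is blocked; `pending_sigchld_pid()` is a hypothetical helper that dequeues the pending SIGCHLD with `sigtimedwait()` and returns the PID it names.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: dequeue a pending SIGCHLD without blocking and
 * return the PID recorded in its siginfo, or 0 if none is pending.
 * SIGCHLD must already be blocked by the caller. */
static pid_t pending_sigchld_pid(void) {
    sigset_t set;
    siginfo_t si;
    const struct timespec zero = {0, 0};

    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);
    if (sigtimedwait(&set, &si, &zero) != SIGCHLD)
        return 0;
    return si.si_pid;
}
```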
Hmm, so I wonder: does the ordering thing really make this work? Let's say this happens:
i.e. the change to the man page is good, but the behaviour it describes is not sufficient to make this PR work. Or what am I missing?
Hmm, I assumed that we would dispatch the signalfd event source with higher priority if mount had already exited and we had a pending SIGCHLD, i.e. before the second round of waitid() with P_ALL. The idea is to always reap the process for which we got an explicit SIGCHLD and only then call waitid() with P_ALL. If that is not the case, the question is whether we can make it so without breaking anything.
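A sketch of the idea above, assuming Linux `signalfd()` with SIGCHLD blocked. `read_sigchld_pid()` and `reap_signalled_first()` are hypothetical helpers, not systemd's sd-event code: before every `P_ALL` step the loop checks the signalfd and, if a SIGCHLD is pending, reaps the child named in its siginfo first. For brevity it reaps directly instead of peeking with `WNOWAIT` and dispatching a unit handler.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: non-blocking read of one SIGCHLD from a signalfd
 * opened with SFD_NONBLOCK. Returns the PID from its siginfo, or 0. */
static pid_t read_sigchld_pid(int sfd) {
    struct signalfd_siginfo ssi;
    if (read(sfd, &ssi, sizeof ssi) != (ssize_t) sizeof ssi || ssi.ssi_signo != SIGCHLD)
        return 0;
    return (pid_t) ssi.ssi_pid;
}

/* Hypothetical helper: reap exited children, but before every P_ALL step give
 * priority to the child named by a pending SIGCHLD. Stores reaped PIDs in
 * order[] (up to max) and returns how many children were reaped. */
static size_t reap_signalled_first(int sfd, pid_t *order, size_t max) {
    size_t n = 0;
    for (;;) {
        siginfo_t si;
        pid_t hinted = read_sigchld_pid(sfd);

        /* Higher priority: the child the kernel explicitly told us about. */
        memset(&si, 0, sizeof si);
        int got = hinted > 0 &&
                  waitid(P_PID, (id_t) hinted, &si, WEXITED | WNOHANG) == 0 &&
                  si.si_pid != 0;
        if (!got) {
            /* Lower priority: whichever exited child P_ALL reports. */
            memset(&si, 0, sizeof si);
            if (waitid(P_ALL, 0, &si, WEXITED | WNOHANG) < 0 || si.si_pid == 0)
                break;                /* no children left, or none has exited */
        }
        if (n < max)
            order[n] = si.si_pid;
        n++;
    }
    return n;
}
```

In the scenario from the PR description (x is the mount process, y an unrelated child that exited first), this reaps y before x, whereas a plain `P_ALL` loop is free to return x first. It still cannot help when the kernel discarded the mount's SIGCHLD because another one was already pending.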
@poettering you are probably right. This patch would probably improve the situation a bit, but it is not sufficient on its own.