Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

manager: always reap first the process for which we already got SIGCHLD #12919

Closed
wants to merge 1 commit into from

Conversation

msekletar
Copy link
Contributor

If we get a SIGCHLD we enable and eventually dispatch
sigchld_event_source where we actually reap the process. We received
SIGCHLD for the specific PID so wait for that process first.

Motivation to do this is to prevent problem due to our state machine for
mount units relying on the fact that we always dispatch mountinfo
notifications before dispatching sigchld handler for the
mount. Previously, this was racy because we might have called
manager_dispatch_sigchld() for completely unrelated process but we would
actually reap the mount process which completed in the meantime. sigchld
handler for the mount unit would then fail the mount unit because we
haven't dispatched mountinfo notification yet.

event| mount         kernel              PID 1
------------------------------------------------------------------------
1    |                                   forks off mount as PID x
------------------------------------------------------------------------
2    |                                   receives SIGCHLD for PID y
------------------------------------------------------------------------
3    |                                   enables sigchld_event_source
------------------------------------------------------------------------
4    |                                   dispatches sigchld_event_source
------------------------------------------------------------------------
5    | mount()       mountinfo_notif
------------------------------------------------------------------------
6    | exit()
------------------------------------------------------------------------
7    |                                   calls waitid() with P_ALL
------------------------------------------------------------------------
8    |                                   calls sigchld_handler for mount
------------------------------------------------------------------------
9    |                                   fails the mount unit since
     |                                   mountinfo_notif wasn't
     |                                   processed yet
------------------------------------------------------------------------

Fixes #10872

@msekletar
Copy link
Contributor Author

Do note that I was not able to reproduce the race on my test system so I couldn't verify that patch actually fixes the issue. I think it might, but YMMV.

Also whoever is able to reproduce #10872, it would be cool if you could give it a try.

@yuwata yuwata added the pid1 label Jul 3, 2019
If we get a SIGCHLD we enable and eventually dispatch
sigchld_event_source where we actually reap the process. We received
SIGCHLD for the specific PID so wait for that process first.

Motivation to do this is to prevent problem due to our state machine for
mount units relying on the fact that we always dispatch mountinfo
notifications before dispatching sigchld handler for the
mount. Previously, this was racy because we might have called
manager_dispatch_sigchld() for completely unrelated process but we would
actually reap the mount process which completed in the meantime. sigchld
handler for the mount unit would then fail the mount unit because we
haven't dispatched mountinfo notification yet.

event| mount         kernel              PID 1
------------------------------------------------------------------------
1    |                                   forks off mount as PID x
------------------------------------------------------------------------
2    |                                   receives SIGCHLD for PID y
------------------------------------------------------------------------
3    |                                   enables sigchld_event_source
------------------------------------------------------------------------
4    |                                   dispatches sigchld_event_source
------------------------------------------------------------------------
5    | mount()       mountinfo_notif
------------------------------------------------------------------------
6    | exit()
------------------------------------------------------------------------
7    |                                   calls waitid() with P_ALL
------------------------------------------------------------------------
8    |                                   calls sigchld_handler for mount
------------------------------------------------------------------------
9    |                                   fails the mount unit since
     |                                   mountinfo_notif wasn't
     |                                   processed yet
------------------------------------------------------------------------

Fixes systemd#10872
@poettering
Copy link
Member

this doesn't work... SIGCHLD is not a queue. i.e. if fifteen child processes die all in a very short time window then in theory we should get fifteen seperate SIGCHLD delivered you'd say. But that's not how this works in the kernel, unfortunately: for each unix signal (excluding realtime signals, which are different) only a single field exists per process: if the field is empty, the SIGCHLD metadata is stored there. But if it is already set, then every new SIGCHLD just overrides the earlier data. This means if fifteen children die at once, then PID 1 might only process the SIGCHLD at a time where only the last process is actually still stored in the field, and all earlier ones have been overwritten.

Yes, UNIX is stupid.

But this means you patch doesn't fix the bug unfortunately: it might very well happen that the SIGCHLD we care for is actually on eof the overwritten ones...

@poettering
Copy link
Member

As discussed elsewhere, I figure this should work, as long as we get a guarantee that the metadata we get on the SIGCHLD is indeed the oldest metadata around, i.e. the kernel drops any new SIGCHLD, and never the already pending one if multiple are seen without them being handled.

@msekletar volunteered to prep a man page patch to document this kernel behaviour ;-)

@poettering
Copy link
Member

Hmm, so I wonder, does the ordering thing really make this work?

Let's say this this happens:

  1. Random process X dies, PID 1 gets SIGCHLD
  2. PID 1 enters waitid() event handler, in order to start to process process X
    3.1 PID 1 enters waitid() event handler a second time, to call waitid(P_ALL), to process pending processes whose SIGCHLD might have been dropped
    3.2 (parallel to 2.1, early on) Our /bin/mount process (M) dies, PID 1 gets another SIGCHLD
    3.3 Now the event handler that begin in 2.1 gets to the part where it actually calls waitid(P_ALL), and gets the result of our process M, and handles that first
  3. PID 1 processes /p/s/mi, too late

i.e. the change to the man page is good, but the behaviour it describes is not sufficient to make this PR here work, or what am I missing?

@msekletar
Copy link
Contributor Author

Hmm, I assumed that we would dispatch signalfd event source with higher priority if mount already exited and we have a pending SIGCHLD, i.e. before the second round of waitid() with P_ALL. The idea is to always reap the process for which we got explicit SIGCHLD and only then waitd() with P_ALL.

But if that is not the case, then the question is whether we can make it so w/o breaking anything?

@msekletar
Copy link
Contributor Author

@poettering you are probably right. This patch would probably improve the situation a bit but is not sufficient in itself.

@msekletar msekletar closed this Jul 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging this pull request may close these issues.

systemd: mount units fail with "Mount process finished, but there is no mount."
3 participants