Ensure that parent is alive after installing PR_SET_PDEATHSIG #141

Merged
merged 2 commits into from Sep 14, 2016
Jump to file or symbol
Failed to load files and symbols.
+20 −1
Split
View
@@ -317,6 +317,10 @@ void sc_create_or_join_ns_group(struct sc_ns_group *group)
die("cannot create eventfd for mount namespace capture");
}
debug("forking support process for mount namespace capture");
+ // Store the PID of the "parent" process. This done instead of calls to
+ // getppid() because then we can reliably track the PID of the parent even
+ // if the child process is re-parented.
+ pid_t parent = getpid();
// Glibc defines pid as a signed 32bit integer. There's no standard way to
// print pid's portably so this is the best we can do.
pid_t pid = fork();
@@ -345,6 +349,22 @@ void sc_create_or_join_ns_group(struct sc_ns_group *group)
if (prctl(PR_SET_PDEATHSIG, SIGINT, 0, 0, 0) < 0) {
die("cannot set parent process death notification signal to SIGINT");
}
+ // Check that parent process is still alive. If this is the case then
+ // we can *almost* reliably rely on the PR_SET_PDEATHSIG signal to wake
+ // us up from eventfd_read() below. In the rare case that the PID numbers
+ // overflow and the now-dead parent PID is recycled we will still hang
+ // forever on the read from eventfd below.
+ debug("ensuring that parent process is still alive");
+ if (kill(parent, 0) < 0) {
+ switch (errno) {
+ case ESRCH:
+ debug("parent process has already terminated");
+ abort();
+ default:
+ die("cannot ensure that parent process is still alive");
+ break;
+ }
+ }
@jdstrand

jdstrand Sep 14, 2016

Contributor

This is definitely better than no check, but this check is not perfect. The parent could die and the kernel could assign the parent pid to a new process. Perhaps it would be better to simply timeout. The parent is supposed to send that signal quite fast... if the child doesn't get that in say, 10 seconds, just debug("parent process did not signal with 10 seconds"); abort();

@zyga

zyga Sep 14, 2016

Collaborator

I have this in a separate branch and I agree we should do both.

@jdstrand

jdstrand Sep 14, 2016

Contributor

After talking on IRC, the intent of this is simply an early warning system if the parent dies right away and that an in progress PR will implement a timeout. In this light, this PR is fine to land as is-- in the normal case the pid won't be reused and we can bail fast but if the pid is reused, that process won't send the signal anyway and the timeout will expire and we can cleanup then.

+1 provided that a comment is added regarding pid reuse and that we'll use a timeout to handle that.

if (fchdir(group->dir_fd) < 0) {
die("cannot move process for mount namespace capture to namespace group directory");
}
@@ -354,7 +374,6 @@ void sc_create_or_join_ns_group(struct sc_ns_group *group)
if (eventfd_read(group->event_fd, &value) < 0) {
die("cannot read expected data from eventfd");
}
- pid_t parent = getppid();
debug
("capturing mount namespace of process %d in namespace group %s",
(int)parent, group->name);