Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reimplement Unix.create_process and related functions without Unix.fork #9573

Merged
merged 6 commits into from
May 19, 2020

Conversation

xavierleroy
Copy link
Contributor

@xavierleroy xavierleroy commented May 17, 2020

Currently, under POSIX, the high-level process creation functions from module Unix (create_process, system, open_process_*) are implemented with Unix.fork and Unix.execv*, in classic Unix manner.

This implementation will probably be a problem for Multicore OCaml (see ocaml-multicore/ocaml-multicore#241). Plus, it has some strange behaviors in today's multithreaded programs (e.g. a finalizer can run in the child process before the exec). Finally, a more efficient implementation is possible on systems that support posix_spawn.

This PR reimplements the high-level process creation functions on top of a new spawn C primitive, which is implemented with posix_spawn when available, or with fork() and execv*() as a fallback. Note that the fallback implementation guarantees that no OCaml code runs in the child process before the exec, hence should work fine with threads. The posix_spawn implementation should be noticeably more efficient under Linux/Glibc at least.

An unfortunate consequence of this work is that another part of the Unix library, currently implemented in OCaml, must be reimplemented in C: the emulation of the execvpe system call when it's not available. PR #1414 introduced the emulation written in OCaml. However, the new spawn C primitive needs to call execvpe from C, hence if it's not available, a C emulation must be used.

This PR is best reviewed commit per commit: the first commit (559c9bb) deals with the execvpe emulation, the following commits introduce spawn and use it in Unix.

The test suite already contained good tests for Unix.create_process and for Unix.execvpe. CI precheck testing is in progress.

@xavierleroy
Copy link
Contributor Author

CI precheck passed.

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks correct to me, modulo a small corner case on dup2/close ordering.

It is strange that by far the hardest thing in executing another process is to deal with file descriptors...

path = String_val(executable);
argv = cstringvect(args, "create_process");
if (Is_block(optenv)) {
envp = cstringvect(Field(optenv, 0), "create_process");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Advertisement: If we had already reviewed and merged #9569, this could use the new nice option-specific macros there.

r = posix_spawn_file_actions_adddup2(&act, src, dst);
if (r != 0) goto error;
r = posix_spawn_file_actions_addclose(&act, src);
if (r != 0) goto error;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a minor point, but I think it would be safer to perform all dup2 actions before all close actions, in case two of the source descriptors (typicall stdout and stderr) are themselves duplicates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well spotted! Indeed, the case >file 2>&1 was handled incorrectly. It's not enough to do all the close at the end, because we end up closing the same descriptor twice.

Commit c5d9ab3 fixes the issue and commit 4120482 adds a test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I reviewed the fix and I believe that (while it does not simplify the control flow of this piece of code) it is correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that's a lot of gotos, but for a good cause.

src = Int_val(Field(redirect, dst));
if (src != dst) {
if (dup2(src, dst) == -1) _exit(ERROR_EXIT_STATUS);
if (close(src) == -1) _exit(ERROR_EXIT_STATUS);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Same remark on dup2/close ordering.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar fix in c5d9ab3

}
cstringvect_free(argv);
if (envp != NULL) cstringvect_free(envp);
return Val_long(pid);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: I think the code would be easier to read if this parent cleanup&return logic was a if (pid != 0) right after the if (pid == -1) case, so that the early-exit cleanup logics are close together, with the exec-in-child case being the main body of the function. (One could even have a single case if (pid != 0) and then error on -1 before returning, as you do in the posix_spawn version, but that could also be confusing.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. I tried the alternate style in c2ab1c3 .


/* Try executing directly. Will not return if it succeeds. */
execve(path, argv, envp);
if (errno != ENOEXEC) return errno;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I compared this to deviations from the C execve here from the the OpenBSD libc execve code, and can't find any changes in errno responses. The OpenBSD control flow is convoluted due to removing a retry loop for ETXTBUSY and not refactoring the code.

The only change in behaviour I can spot is that argv[0] is passed as sh in OpenBSD, while /bin/sh is invoked by the exece. I don't think this changes the behaviour of any shells.

r = unix_execve_script(name, argv, envp);
} else {
/* Construct the string "directory/name" */
fullname = malloc(dirlen + 1 + namelen + 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, comparing to OpenBSD execvpe, a minor deviation is that you don't check that strlen(fullname) < PATH_MAX. A possible impact of this is that an attacker could cause a binary with a shorter name to be executed by unix_execve_script. Weirdly, OpenBSD just complains to stderr instead of returning an errno in this case. More questions than answers after looking at OpenBSD's implementation ;-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I noticed the hardcoded limit in the OpenBSD implementation. But I could not see anything like that in the Glibc implementation of execvpe.

Note that we could also treat ENAMETOOLONG as an error that stops the search in the path, avoiding the attack you outline.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think turning it into an error makes sense. It's hard to see how such a search could reliably continue if you are right at the edge of the name length and about to fall afoul of PATH_MAX or NAME_MAX anyway. I could see how (e.g.) truncating lspci to ls and controlling the arguments might allow for some interesting exploits.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, even more exciting: truncating sha256sum to sh

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit 9a498eb causes ENAMETOOLONG to stop the search.

I don't think truncation was possible with the original code: /very/long/path/sha256sum could cause ENAMETOOLONG, but would not be truncated to /very/long/path/sh then executed.

What could happen: say there is a restricted shell sh in /very/long/path and the full shell sh in /bin. If PATH is /very/long/path:/bin, and we try to execute sh, the restricted shell is not found and the full shell /bin/sh is executed instead (in my original code). In the current code, ENAMETOOLONG is returned.

@xavierleroy
Copy link
Contributor Author

I'll need to clean up the history before we merge.

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm out of review comments and I believe the code is correct, so here is an approval stamp.

A question left: you mention that this "should be more efficient under Linux/glibc". Did you try to actually measure the performance impact on a small example? (What should we expect? Will Dune magically get 5% faster?)

@xavierleroy
Copy link
Contributor Author

A question left: you mention that this "should be more efficient under Linux/glibc". Did you try to actually measure the performance impact on a small example?

Oh, you're making me run a micro benchmark! At my age!

As a matter of fact: on my Linux machine, running /bin/true 10000 times with Unix.create_process and Unix.waitpid takes 2.4s real time with the old implementation and 1.6s with the new implementation in this PR.

So, yes, the nontrivial implementation of posix_spawn in Glibc/Linux pays off.

(What should we expect? Will Dune magically get 5% faster?)

If Dune uses Unix.create_process, I would expect a small performance improvement. If it uses Unix.fork/Unix.exec* or has its own launcher in C, there will be no difference.

The `execvpe` function is not always provided by C standard libraries.

PR ocaml#1414 implemented an emulation of `execvpe` in terms of `execve`
written in OCaml.  This commit reimplements the emulation of `execvpe`
in C, so that it can be called from C.  It implements exactly the same
algorithm, with one small change detailed next.

Small change: error ENAMETOOLONG stops the search instead of
continuing with the next entry in PATH.  This seems more secure.
Unix.create_process was implemented in OCaml, using Unix.fork
and Unix.execvp.  This commit reimplements it in C, for several reasons:
- The implementation can take advantage of posix_spawn, if available,
  which is usually faster than fork + exec.
- The implementation is more atomic w.r.t. threading.
- Multicore OCaml will probably prohibit Unix.fork in multithreaded
  programs.
Avoid calling Unix.fork and use the new "spawn" primitive, for efficiency
and for improved compatibility with threads and Multicore OCaml.
@xavierleroy
Copy link
Contributor Author

Rebased and partially squashed, now merging.

@xavierleroy xavierleroy merged commit 41548a7 into ocaml:trunk May 19, 2020
@avsm
Copy link
Member

avsm commented May 19, 2020

For the record, dune uses the following spawn code, so it's Unix.fork on Linux/glibc and create_process on win32.

let spawn ?env ~prog ~argv ?(stdin = Unix.stdin) ?(stdout = Unix.stdout)
    ?(stderr = Unix.stderr) () =
  let argv = Array.of_list argv in
  Pid.of_int
    ( if Sys.win32 then
      match env with
      | None -> Unix.create_process prog argv stdin stdout stderr
      | Some env -> Unix.create_process_env prog argv env stdin stdout stderr
    else
      match Unix.fork () with
      | 0 -> (
        try
          ignore (Unix.sigprocmask SIG_SETMASK [] : int list);
          perform_redirections stdin stdout stderr;
          match env with
          | None -> Unix.execv prog argv
          | Some env -> Unix.execve prog argv env
        with _ -> sys_exit 127 )
      | pid -> pid )

@xavierleroy
Copy link
Contributor Author

For the record, dune uses the following spawn code, so it's Unix.fork on Linux/glibc and create_process on win32.

Too bad for Dune, they won't get the mad performance benefits of posix_spawn.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants