fix `fork()` usage in ocamltest #11111

gasche · 2022-03-13T22:31:22Z

The error-reporting code in ocamltest's run_unix.c is incorrect with respect to the new Multicore runtime, because it tries to use the OCaml runtime (to write to an OCaml channel) without calling caml_atfork_hook first.

reproduction information

Add

child_error("boom");

at the entry of run_command_child in ocamltest/run_unix.c, then run

make -C ocamltest
./ocamltest/ocamltest testsuite/tests/lib-queue/test.ml

Observed output contains:

 ... testing 'test.ml' with 2 (bytecode) => Fatal error: Fatal error during unlock: Operation not permitted
Process 1228576 got signal 6(Aborted), core dumped

On my system, the core dump has the following backtrace:

(gdb) #0  0x00007f18a664e2a2 in raise () from /lib64/libc.so.6
#1  0x00007f18a66378a4 in abort () from /lib64/libc.so.6
#2  0x00000000004377e1 in caml_fatal_error (msg=msg@entry=0x4496e8 "Fatal error during %s: %s\n") at misc.c:117
#3  0x00000000004389bc in caml_plat_fatal_error (action=action@entry=0x44a63b "unlock", err=<optimized out>) at platform.c:36
#4  0x0000000000424b0e in check_err (err=<optimized out>, action=0x44a63b "unlock") at caml/platform.h:130
#5  caml_plat_unlock (m=<optimized out>) at caml/platform.h:163
#6  caml_release_domain_lock () at domain.c:1322
#7  0x000000000043ddb5 in caml_write_fd (fd=3, flags=<optimized out>, buf=buf@entry=0x23c3dbc, n=n@entry=16) at unix.c:97
#8  0x000000000042feb7 in caml_flush_partial (channel=channel@entry=0x23c3d50) at io.c:257
#9  0x000000000042ff60 in caml_flush (channel=0x23c3d50) at io.c:272
#10 0x000000000041b2d7 in logToChannel ()
#11 0x000000000041ac59 in mylog ()
#12 0x000000000041ad29 in myperror_with_location.constprop ()
#13 0x000000000041b227 in run_command ()
#14 0x000000000041b4b8 in caml_run_command ()
#15 0x0000000000441bcd in caml_interprete (prog=<optimized out>, prog_size=<optimized out>) at interp.c:1012
#16 0x0000000000443572 in caml_main (argv=0x7ffce74a5da8) at startup_byt.c:578
#17 0x000000000041401c in main (argc=<optimized out>, argv=<optimized out>) at main.c:37

The problem comes from trying to use the OCaml runtime from a forked
child, without calling the new Multicore-OCaml function
caml_atfork_hook from domain.h. At this point the forked child has
its domain lock in an incoherent state, and calling caml_write_fd
fails. (Compare the ceremony around fork() in ocamltest/run_unix.c and
in otherlibs/unix/unix_fork.c)

Note: the backtrace also shows that C files in ocamltest were not
compiled with debug information. The PR also adds CFLAGS=-g in
ocamlest/Makefile to fix this.

gasche · 2022-03-13T22:40:28Z

Note: I wonder if there is a deeper problem here.

Can OCaml 5.0 users use caml_atfork_hook, is it exposed in the public API? (And, ideally, documented somewhere?)
Is it possible to write fork()-using C code that works with both 4.x and 5.0?

Maybe we can recommend to always use unix_fork from C code, and never the naked fork() call, to avoid this issue?

(Note: even with 4.x, there were reasons to recommend this, for example for Eventlog users, as unix_fork has the right cleanup logic there.)

xavierleroy · 2022-03-14T08:20:37Z

It's not obvious to me why the child process uses the OCaml runtime. If we were to implement spawning using posix_spawn instead of fork + exec (resulting in nice performance gains), there would be no opportunity for the child to use the OCaml runtime. For the same reason, the Win32 implementation of command spawning cannot use the OCaml runtime. So, what's going on here?

gasche · 2022-03-14T08:29:09Z

The ocamltest implementation of "running a command" takes as parameter an OCaml channel for logging. At the point where the child sets things up to call excevp, any error will call error-reporting functions that write to this logging channel by calling OCaml runtime functions. See run_command_child and logToChannel.

In the medium term, maybe we could reconsider the idea of maintaining yet another implementation of spawn. In the short term, the PR is a minimal fix for a bug introduced by the change of runtime assumptions around fork().

xavierleroy · 2022-03-14T08:35:47Z

At the point where the child sets things up to call excevp, any error will call error-reporting functions that write to this logging channel by calling OCaml runtime functions

Well, I find this questionable, even in OCaml 4. For example, there's a risk of double-flushing of OCaml channels (once in the parent, once in the child). resulting in duplicate output.

gasche · 2022-03-14T08:42:57Z

Good point. I did wonder about flushing before the fork() and then writing to the file descriptor directly, for example. But this also requires subtle and invasive changes to the implementation. I'm not currently planning to reimplement the run_command routine of ocamltest.

On the other hand, I think we do have OCaml programs that fork() and then use the OCaml runtime on both sides. I would expect those programs to call Unix.fork directly, but some could also possibly fork from C. We have decided that forking a multi-domain program would not be supported (and Unix.fork will error in this situation), but we intend to support forking a single-domain program, even under the Multicore runtime. So we can take this one example from the compiler distribution as a case study of compatibility issues that our users could encounter inthe wild. (Possibly as well in situations where the fork() usage is questionable, but that work fine in practice under 4.x.)

xavierleroy

OK, let's merge this as a bug fix and think about alternate implementations later. Can you please fix the Changes entry then squash-and-merge?

xavierleroy · 2022-03-16T14:45:06Z

On the other hand, I think we do have OCaml programs that fork() and then use the OCaml runtime on both sides. I would expect those programs to call Unix.fork directly, but some could also possibly fork from C.

I really hope this pattern (fork in C, then use the OCaml runtime system in the child process) is not to be found in the wild. If I understand correctly, this pattern causes a fatal error in 5.0, so we'll hear about it.

gasche · 2022-03-16T14:50:31Z

Thanks! I updated and rebased.

If I understand correctly, this pattern causes a fatal error in 5.0, so we'll hear about it.

It fails on my machine on the first time that we try to release the runtime lock / enter a blocking section. (I'm slightly nervous at the idea that our options to fix the issue in a backward-compatible ways will be very limited once 5.0 and 4.14 are release without a coherent support for forking, but oh well, let's assume that nobody does that indeed.)

enable debug info in ocamltest .c files

9ee4c07

gasche added the bug label Mar 13, 2022

xavierleroy approved these changes Mar 16, 2022

View reviewed changes

fix fork() usage in ocamltest

3a61327

gasche added the merge-me label Mar 16, 2022

gasche force-pushed the ocamltest-fork branch from c2e7301 to 3a61327 Compare March 16, 2022 14:48

gasche merged commit 81eae9c into ocaml:trunk Mar 16, 2022

gasche mentioned this pull request Mar 23, 2022

Clear error when the ocaml working directory path contains spaces #11113

Merged

shindere mentioned this pull request Apr 1, 2022

ocamltest: keep debug information for all C files #11153

Merged

gasche mentioned this pull request Jun 9, 2022

Restore basic functionality to the bytecode debugger #11065

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix `fork()` usage in ocamltest #11111

fix `fork()` usage in ocamltest #11111

gasche commented Mar 13, 2022

gasche commented Mar 13, 2022

xavierleroy commented Mar 14, 2022

gasche commented Mar 14, 2022 •

edited

xavierleroy commented Mar 14, 2022

gasche commented Mar 14, 2022

xavierleroy left a comment

xavierleroy commented Mar 16, 2022

gasche commented Mar 16, 2022

fix fork() usage in ocamltest #11111

fix fork() usage in ocamltest #11111

Conversation

gasche commented Mar 13, 2022

reproduction information

gasche commented Mar 13, 2022

xavierleroy commented Mar 14, 2022

gasche commented Mar 14, 2022 • edited

xavierleroy commented Mar 14, 2022

gasche commented Mar 14, 2022

xavierleroy left a comment

Choose a reason for hiding this comment

xavierleroy commented Mar 16, 2022

gasche commented Mar 16, 2022

fix `fork()` usage in ocamltest #11111

fix `fork()` usage in ocamltest #11111

gasche commented Mar 14, 2022 •

edited