Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C, threads, callbacks, and corrupted local_roots #8807

Open
rixed opened this issue Jul 16, 2019 · 2 comments

Comments

@rixed
Copy link
Contributor

commented Jul 16, 2019

As explained a few weeks ago on the ocaml mailling lists, I stumbled upon a bug (or at least a documentation issue) when including OCaml into a C multithreaded program (admittedly not the canonical case for OCaml programs).

So the entry point is in C (actually, C++), and spawns a few threads, before spawning a specific thread dedicated to OCaml code. Then, some of the C++ threads are calling back some OCaml code.
My understanding is that for this to work, one must only follow those rules (in addition to the usual "how to leave in peace with the GC" rules):

  1. call caml_startup(argv) before any OCaml code or any other function from the OCaml runtime is run;

  2. for each thread that will eventually run some OCaml code (anything that could call the GC (and also, possibly, do anything with signals)), call caml_c_thread_register before;

  3. before a thread terminates and another thread runs the OCaml GC, call (from that terminating thread) caml_c_thread_unregister (it is not clear exactly why to be honest, as a finished thread should have no root left)

What is a bit unclear is how thread registering and caml_startup are supposed to deal with the giant lock. From testing and
reading the code, it seems that caml_c_thread_(un)register expects the giant lock to be released, which caml_startup does not do, so calling caml_release_runtime_system right after caml_startup seems in order (and therefore reacquiring it before calling back to OCaml).

Neither is it clear to me how to declare local roots in the function registering the thread (more specifically: should caml_c_thread_register be called first, then a new block created and CAMLparam and friends be called now that the current root list is initialized?); But in the exemple below it does not matter as there are no values requiring such protection.

Yet, whatever I try, I get a crash in the GC when it's scaninng the roots in caml_do_local_roots, with a corrupted local_root pointer that is suspiciously small (few hundred bytes, frequently but not always just 0xfff).

I've managed to reproduce the exact same crash on the minimalistic C exemple below:

a.ml:

let call_me_from_c () =
  Printf.printf "Will I survive?\n%!" ;
  Gc.compact () ;
  Printf.printf "I survived!\n%!"

let init =
  ignore (Thread.self ()) ;
  ignore (Callback.register "call_me_from_c" call_me_from_c)

b.c:

#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>
#include <caml/mlvalues.h>
#include <caml/startup.h>
#include <caml/callback.h>
#include <caml/threads.h>
#include <caml/memory.h>

static void do_call_ocaml(void)
{
  CAMLparam0();
  value *cb = caml_named_value("call_me_from_c");
  caml_callback(*cb, Val_unit);
  CAMLreturn0;
}

static void *another_thread(void *wtv)
{
  (void)wtv;
  caml_c_thread_register();
  printf("in another_thread\n");
  caml_acquire_runtime_system();
  do_call_ocaml();
  caml_release_runtime_system();
  caml_c_thread_unregister();
  return NULL;
}

int main(int argc, char *argv[])
{
  (void)argc;
  caml_startup(argv);
  caml_release_runtime_system();

  pthread_t th;
  (void)pthread_create(&th, NULL, another_thread, NULL);

  (void)pthread_join(th, NULL);

  return 0;
}

Makefile:

OCAMLOPT = ocamlfind ocamlopt
OCAMLLIB = $(shell $(OCAMLOPT) -where)
CFLAGS += -g -O0 -Wall -W
CPPFLAGS += -I $(OCAMLLIB)
LDFLAGS += -L$(OCAMLLIB) -lunix -lthreads -lasmrun_shared

all: b

b: a_.o b.o
        $(CC) $(LDFLAGS) b.o a_.o -o b

a.cmx: a.ml
        $(OCAMLOPT) -thread -c $< -o $@

a_.o: a.cmx
        $(OCAMLOPT) -I +threads -linkall unix.cmxa threads.cmxa $< -output-obj -o $@

clean:
        $(RM) *.o *.cmx *.cmi b *.s

What I observe:

$ make
ocamlfind ocamlopt -thread -c a.ml -o a.cmx
ocamlfind ocamlopt -I +threads -linkall unix.cmxa threads.cmxa a.cmx -output-obj -o a_.o
cc -g -O0 -Wall -W -I /Users/rixed/.opam/ramen.4.07.1.flambda/lib/ocaml  -c -o b.o b.c
cc -L/Users/rixed/.opam/ramen.4.07.1.flambda/lib/ocaml -lunix -lthreads -lasmrun_shared b.o a_.o -o b
$ /usr/bin/lldb -- ./b
(lldb) target create "./b"
Current executable set to './b' (x86_64).
(lldb) r
Process 19109 launched: './b' (x86_64)
in another_thread
Will I survive?
libasmrun_shared.so was compiled with optimization - stepping may behave oddly; variables may not be available.
Process 19109 stopped
* thread #2, stop reason = EXC_BAD_ACCESS (code=1, address=0x1007)
    frame #0: 0x000000010028daa9 libasmrun_shared.so`caml_do_local_roots(f=(libasmrun_shared.so`caml_darken at major_gc.c:151), bottom_of_stack=0x0000000000000000, last_retaddr=<unavailable>, gc_regs=0x0000000000000000, local_roots=0x0000000000000fff) at roots.c:503 [opt]
   500    }
   501    /* Local C roots */
   502    for (lr = local_roots; lr != NULL; lr = lr->next) {
-> 503      for (i = 0; i < lr->ntables; i++){
   504        for (j = 0; j < lr->nitems; j++){
   505          root = &(lr->tables[i][j]);
   506          f (*root, root);
@stedolan

This comment has been minimized.

Copy link
Contributor

commented Jul 31, 2019

This turns out to be a linking issue - you're linking against the bytecode version of the OCaml threads library rather than the native code one. The bytecode and native code versions of the runtime have different signatures for the caml_do_local_roots function. In particular, the local roots pointer is the fourth argument in native code, but the fifth in bytecode.

You can fix the problem by replacing -lthreads with -lthreadsnat in the Makefile. I'll leave this issue open, though: there's probably a documentation or naming change we should make to make this problem easier to avoid.

rixed added a commit to rixed/ramen that referenced this issue Aug 1, 2019

@rixed

This comment has been minimized.

Copy link
Contributor Author

commented Aug 1, 2019

Thank you very much for having looked at this.

I assume that OCaml devs want to keep the byterun runtime as compatible as the native one but in this instance what's the benefit of having a single name for two functions that have different signatures?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.