freebsd/openbsd segfault in tests/lib-obj/test_reachable #824

Closed
avsm opened this issue Jan 1, 2022 · 1 comment · Fixed by #828

Comments

Collaborator

avsm commented Jan 1, 2022

This test fails in the testsuite on both FreeBSD and OpenBSD (bytecode only; native code is fine):

```
freebsd-dev-1:~/ocaml # lldb /root/ocaml/testsuite/tests/lib-obj/_ocamltest/tests/lib-obj/reachable_words/ocamlc.byte/reachable_words.byte
freebsd-dev-1:~/ocaml # lldb ./runtime/ocamlrun
(lldb) target create "./runtime/ocamlrun"
Current executable set to './runtime/ocamlrun' (x86_64).
(lldb) run  /root/ocaml/testsuite/tests/lib-obj/_ocamltest/tests/lib-obj/reachable_words/ocamlc.byte/reachable_words.byte
Process 10988 launching
Process 10988 launched: '/root/ocaml/runtime/ocamlrun' (x86_64)
Process 10988 stopped
* thread #1, name = 'Domain0', stop reason = signal SIGBUS: hardware error
    frame #0: 0x000000000023eaef ocamlrun`caml_obj_reachable_words [inlined] bitvect_test(bv=0x5f6e61745f6c6d61, i=14337349) at extern.c:253:10
   250
   251  Caml_inline uintnat bitvect_test(uintnat * bv, uintnat i)
   252  {
-> 253    return bv[i / Bits_word] & ((uintnat) 1 << (i & (Bits_word - 1)));
   254  }
   255
   256  Caml_inline void bitvect_set(uintnat * bv, uintnat i)
(lldb) bt
* thread #1, name = 'Domain0', stop reason = signal SIGBUS: hardware error
  * frame #0: 0x000000000023eaef ocamlrun`caml_obj_reachable_words [inlined] bitvect_test(bv=0x5f6e61745f6c6d61, i=14337349) at extern.c:253:10
    frame #1: 0x000000000023eae8 ocamlrun`caml_obj_reachable_words [inlined] extern_lookup_position(s=<unavailable>, obj=274879973976) at extern.c:328
    frame #2: 0x000000000023ead0 ocamlrun`caml_obj_reachable_words(v=274879973976) at extern.c:1220
    frame #3: 0x00000000002315df ocamlrun`caml_interprete(prog=<unavailable>, prog_size=<unavailable>) at interp.c:1016:14
    frame #4: 0x000000000022e89c ocamlrun`caml_main(argv=0x00007fffffffea30) at startup_byt.c:399:9
    frame #5: 0x000000000025195c ocamlrun`main(argc=<unavailable>, argv=<unavailable>) at main.c:37:3
    frame #6: 0x000000000022310f ocamlrun`_start(ap=<unavailable>, cleanup=<unavailable>) at crt1.c:76:7
```
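For reference, the faulting helper is a plain bit-vector test. The sketch below re-creates it outside the runtime; `uintnat` and `Bits_word` are defined locally as assumptions (a pointer-sized unsigned word and its bit width), and `bitvect_new` is a hypothetical allocator added only for the demo. Bit `i` lives in word `i / Bits_word` at position `i & (Bits_word - 1)`, so on a 64-bit build the crashing call with `i=14337349` reads word 224021 (bit 5) past `bv`. Because `bv=0x5f6e61745f6c6d61` is not a canonical x86-64 address, that load faults.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local stand-ins (assumptions): uintnat is a pointer-sized unsigned
   word and Bits_word is the number of bits in it. */
typedef uintptr_t uintnat;
#define Bits_word (8 * sizeof(uintnat))

/* Hypothetical helper: allocate a zeroed bit vector holding `size` bits. */
static uintnat *bitvect_new(uintnat size)
{
  return calloc((size + Bits_word - 1) / Bits_word, sizeof(uintnat));
}

/* Bit i is stored in word i / Bits_word; the low bits of i select it. */
static uintnat bitvect_test(const uintnat *bv, uintnat i)
{
  return bv[i / Bits_word] & ((uintnat) 1 << (i & (Bits_word - 1)));
}

static void bitvect_set(uintnat *bv, uintnat i)
{
  bv[i / Bits_word] |= (uintnat) 1 << (i & (Bits_word - 1));
}
```

With a zeroed 256-bit vector, `bitvect_test(bv, 5)` is 0 until `bitvect_set(bv, 5)` is called. None of this helps if `bv` itself is garbage, which is the situation in the backtrace.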
Collaborator Author

avsm commented Jan 5, 2022

Digging into this, the problem appears to be in `caml_obj_reachable_words`:

```c
CAMLprim value caml_obj_reachable_words(value v)
{
...
  extern_init_position_table(s);
...
  /* In Multicore OCaml, we don't distinguish between major heap blocks and
   * out-of-heap blocks, so we end up counting out-of-heap blocks too. */
  while (1) {
    if (Is_long(v)) {
      /* Tagged integers contribute 0 to the size, nothing to do */
    } else if (extern_lookup_position(s, v, &pos, &h)) {
 ...
```
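What this loop computes can be illustrated with a toy heap model, which is hypothetical and not the runtime's representation: every distinct reachable block contributes its header word plus its fields, immediate integers contribute nothing, and a block reachable along two paths must be counted only once. The real function relies on the position table for that last point; this sketch uses a `visited` flag instead, and `struct block`, `MAX_FIELDS` and `toy_reachable_words` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy heap: a NULL field stands for a tagged integer. */
#define MAX_FIELDS 4
struct block {
  size_t size;                       /* number of fields in use */
  struct block *fields[MAX_FIELDS];
  int visited;                       /* the runtime uses a position table instead */
};

/* Sum header + field words over all distinct blocks reachable from root,
   using an explicit stack; each block is pushed at most once. */
static size_t toy_reachable_words(struct block *root)
{
  struct block *stack[64];
  size_t sp = 0, words = 0;
  if (root == NULL) return 0;
  root->visited = 1;
  stack[sp++] = root;
  while (sp > 0) {
    struct block *b = stack[--sp];
    words += 1 + b->size;                      /* header + fields */
    for (size_t i = 0; i < b->size; i++) {
      struct block *f = b->fields[i];
      if (f == NULL || f->visited) continue;   /* integer, or already counted */
      f->visited = 1;
      assert(sp < 64);                         /* toy stack bound */
      stack[sp++] = f;
    }
  }
  return words;
}
```

For a two-field block whose fields both point at the same one-field block, the count is (1 + 2) + (1 + 1) = 5 words rather than 7, because the shared block is counted once.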

`extern_lookup_position` uses `s` to compute a hash index and test it against a bitvector in the position table, but that part of `s` is not initialised correctly, because:

```c
static void extern_init_position_table(struct caml_extern_state* s)
{
  if (s->extern_flags & NO_SHARING) return;
  s->pos_table.size = POS_TABLE_INIT_SIZE;
  s->pos_table.shift = 8 * sizeof(value) - POS_TABLE_INIT_SIZE_LOG2;
  ...
```

`NO_SHARING` is set, so `extern_init_position_table` returns immediately and the rest of the position table in `s` is left as uninitialised garbage. I'm still tracing through to see why this isn't a problem on Linux but shows up on FreeBSD/OpenBSD.
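To see why an uninitialised position table is fatal, here is a minimal sketch of multiplicative (Fibonacci) hashing, the kind of scheme a table with a `shift` field like the one above can use to map a block address to a slot. The value 8 for `POS_TABLE_INIT_SIZE_LOG2`, the multiplier `HASH_FACTOR` (Knuth's 64-bit golden-ratio constant) and the function `pos_hash` are assumptions for illustration. With `shift = 64 - POS_TABLE_INIT_SIZE_LOG2`, matching `8 * sizeof(value) - POS_TABLE_INIT_SIZE_LOG2` on a 64-bit build, the index always lands inside the table; with an unset shift and an unset bitvector pointer, both the index and the base address are meaningless.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 64-bit settings: a 2^8-entry initial table and Knuth's
   Fibonacci-hashing multiplier. */
#define POS_TABLE_INIT_SIZE_LOG2 8
#define POS_TABLE_INIT_SIZE (UINT64_C(1) << POS_TABLE_INIT_SIZE_LOG2)
#define HASH_FACTOR UINT64_C(0x9E3779B97F4A7C15)

/* Keep the top (64 - shift) bits of v * HASH_FACTOR (mod 2^64).
   With shift = 64 - log2(size), the result is always < size. */
static uint64_t pos_hash(uint64_t v, unsigned shift)
{
  assert(shift < 64);   /* shifting a 64-bit value by >= 64 is undefined */
  return (v * HASH_FACTOR) >> shift;
}
```

For example, hashing the block address `274879973976` from the backtrace with the correct shift gives an index below 256, whereas the crash reports `i=14337349`, consistent with a garbage shift.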

avsm added a commit to avsm/ocaml-multicore that referenced this issue Jan 5, 2022
Without this the value may potentially contain junk values and
is subsequently used in reachable_words, which results in
memory corruption. Fixes Obj.reachable_words on platforms which
return junk memory from malloc (e.g. FreeBSD/OpenBSD but also
potentially some libcs on Linux)

Fixes ocaml-multicore#824
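As a general illustration of the `malloc` point (not the exact change made in this commit): `malloc` makes no promise about the contents of the memory it returns, so a freshly allocated state struct has to be initialised explicitly. The struct `toy_state` and the function `toy_state_new` below are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-domain state; field names are illustrative only. */
struct toy_state {
  int flags;
  size_t table_size;
  void *table;
};

/* malloc() may return leftover data, so every field is reset explicitly.
   Assigning a zero-initialised compound literal sets integers to 0 and
   pointers to NULL portably. */
static struct toy_state *toy_state_new(void)
{
  struct toy_state *st = malloc(sizeof *st);
  if (st != NULL)
    *st = (struct toy_state){0};
  return st;
}
```

On a libc that happens to hand back zeroed pages, forgetting this step can go unnoticed, which fits the observation that the crash appears on FreeBSD/OpenBSD first.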
xavierleroy added a commit to xavierleroy/ocaml that referenced this issue Jan 6, 2022
A marshaling operation can leave `extern_flags` with the `NO_SHARING`
bit set.  In this context, `caml_obj_reachable_words` calls
`extern_init_position_table`, which does nothing, then proceeds to
access the position table, causing a crash.

The solution is trivial: initialize `extern_flags` before calling
`extern_init_position_table`.

First reported at ocaml-multicore/ocaml-multicore#824
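The ordering problem described in this commit can be reproduced with a cut-down, hypothetical state struct. `toy_extern_state`, `NO_SHARING = 1` and `TABLE_SIZE` are assumptions, not the runtime's definitions. A flag left over from an earlier marshaling call makes the initialiser skip the table. Clearing the flags first, as the fix does, guarantees the table exists before the traversal uses it.

```c
#include <assert.h>
#include <stdlib.h>

#define NO_SHARING 1     /* hypothetical flag value */
#define TABLE_SIZE 256   /* hypothetical table size */

/* Hypothetical stand-in for the reused per-domain extern state. */
struct toy_extern_state {
  int extern_flags;
  unsigned char *present;   /* NULL until the table is initialised */
};

static void toy_init_position_table(struct toy_extern_state *s)
{
  if (s->extern_flags & NO_SHARING) return;   /* same early exit as the runtime */
  s->present = calloc(TABLE_SIZE, 1);
}

/* The fix: reset flags left over from a previous marshaling call,
   then initialise the table the traversal is about to use. */
static int toy_reachable_words_prologue(struct toy_extern_state *s)
{
  assert(s != NULL);
  s->extern_flags = 0;
  toy_init_position_table(s);
  return s->present != NULL;
}
```

Calling `toy_init_position_table` on a state whose flags still contain `NO_SHARING` leaves `present` unset; calling the prologue instead allocates it.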
avsm pushed a commit to avsm/ocaml-multicore that referenced this issue Jan 6, 2022
xavierleroy added a commit to ocaml/ocaml that referenced this issue Jan 6, 2022
A marshaling operation can leave `extern_flags` with the `NO_SHARING`
bit set.  In this context, `caml_obj_reachable_words` calls
`extern_init_position_table`, which does nothing, then proceeds to
access the position table, causing a crash.

The solution is trivial: initialize `extern_flags` before calling
`extern_init_position_table`.

A regression test was added.

First reported at ocaml-multicore/ocaml-multicore#824
sadiqj pushed a commit to sadiqj/ocaml that referenced this issue Jan 7, 2022
sadiqj pushed a commit to sadiqj/ocaml that referenced this issue Jan 10, 2022
ctk21 pushed a commit to ctk21/ocaml that referenced this issue Jan 11, 2022