Segfault on MacOSX with trunk #11226
I've had a preliminary look at this. At Line 212 in 4bf5393
The other |
Use a global key to access the per-thread `caml_thread_t` rather than have a key per domain. This fixes the issue reported in ocaml#11226.
Here is what I suspect is the root cause of the bug. The test itself links against the ocaml/otherlibs/systhreads/st_stubs.c Lines 368 to 369 in 37dec39
One would expect the following assertion to trivially hold when placed after the lines above:
CAMLassert (new_thread == (caml_thread_t)st_tls_get(Thread_key))
However, this assertion fails intermittently on this testcase. Later, when the systhread state is restored using ocaml/otherlibs/systhreads/st_stubs.c Lines 211 to 220 in 37dec39
This promptly makes the next external call fail.
Fix
The PR #11250 fixes the issue by ensuring that the key is created exactly once per program rather than once every time a domain is created.
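To illustrate the difference between the two key lifetimes, here is a minimal sketch in plain POSIX threads. It is not the code from st_stubs.c or from PR #11250; all names (`thread_key`, `thread_key_init`, `racy_key`, `racy_key_init`, and so on) are hypothetical, and the interleaving with another domain is simulated by simply calling the init function a second time.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names throughout; this is NOT the code from st_stubs.c. */

/* Variant 1: one key for the whole program, created at most once. */
static pthread_key_t thread_key;
static pthread_once_t thread_key_once = PTHREAD_ONCE_INIT;

static void create_thread_key(void)
{
  /* pthread_once guarantees this body runs a single time per process. */
  int rc = pthread_key_create(&thread_key, NULL);
  assert(rc == 0);
  (void)rc;
}

/* Safe to call on every (hypothetical) domain start-up. */
int thread_key_init(void)
{
  return pthread_once(&thread_key_once, create_thread_key);
}

void thread_key_set(void *thread_info)
{
  pthread_setspecific(thread_key, thread_info);
}

void *thread_key_get(void)
{
  return pthread_getspecific(thread_key);
}

/* Variant 2: the problematic pattern. Every call creates a fresh key and
   overwrites the shared variable, so values stored under the previous key
   can no longer be reached through `racy_key`. */
static pthread_key_t racy_key;

int racy_key_init(void)
{
  return pthread_key_create(&racy_key, NULL);
}

void racy_key_set(void *thread_info)
{
  pthread_setspecific(racy_key, thread_info);
}

void *racy_key_get(void)
{
  return pthread_getspecific(racy_key);
}
```

With the second variant, a thread that stores its descriptor, and then reads it back after some other code has called `racy_key_init` again, gets `NULL` instead of the stored pointer. That is the same shape as the failing assertion above, under the assumption that a second domain re-creates the key in between. With the first variant, repeated calls to `thread_key_init` leave the stored value intact.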
I confirm that this is fixed by PR #11250. I just completed 20 reruns of
Running the full multicoretest suite on MacOS has also been a pretty consistent way to trigger this issue (in something like 19/20 runs): https://github.com/jmid/multicoretests/actions/workflows/macosx-500-workflow.yml
I've now run it fully 9 times without a segfault - each run takes 10-20min on my local machine. Thanks @kayceesrk!
I've been chasing a segfault that is triggered on MacOSX. To set up and reproduce:
opam install . --deps-only --with-test
I can pretty consistently reproduce by running the following (9/10 times or so):
An attempt at reducing the problem is also available. This does not crash as consistently - but the code is a bit simpler and has fewer dependencies:
What (I think) I know so far:
- `at_exit`: I followed a suggestion by @dra27 and tried on "Refine Domain.{at_exit,at_startup,at_first_spawn,at_each_spawn} callback semantics" #11213, where it also crashes.
- `Lazy` values: uncommenting the `Lazy` tests still crashed our CI tests.

Here's first the output of an `lldb` run without the debug runtime, which stops with an `EXC_BAD_ACCESS`:

and here's another one with the debug runtime, which stops with `EXC_BREAKPOINT`: