Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

in native code, make Printexc.get_callstack work in new threads #1895

Merged
merged 1 commit into from Jul 12, 2018

Conversation

Projects
None yet
6 participants
@sliquister
Copy link
Contributor

commented Jul 11, 2018

One would expect this code to behave the same regardless of whether THREAD is set:

(* ocamlopt -I +threads unix.cmxa threads.cmxa -g a.ml *)
let f0 () = Printexc.print_raw_backtrace stdout (Printexc.get_callstack 100); ()
let f1 () = f0 (); ()
let f2 () = f1 (); ()
let f3 () = f2 (); ()
let () =
  match Sys.getenv "THREAD" with
  | exception Not_found -> f3 ()
  | (_ : string) -> Thread.join (Thread.create f3 ())

But the observed behavior is:

   + OCAMLRUNPARAM=b=1 ./a.out
   Raised by primitive operation at file "a.ml", line 1, characters 48-76
   Called from file "a.ml" (inlined), line 2, characters 12-17
   Called from file "a.ml" (inlined), line 3, characters 12-17
   Called from file "a.ml" (inlined), line 4, characters 12-17
   Called from file "a.ml", line 7, characters 27-32
   + OCAMLRUNPARAM=b=1 THREAD= ./a.out
   Raised by primitive operation at file "a.ml", line 1, characters 48-76

ie in the new thread, the callstack is one frame.

This is because caml_top_of_stack is NULL in new threads, presumably resulting Printexc.get_callstack stopping after the first (unconditional) frame. caml_top_of_stack is NULL because the th->top_of_stack for new thread starts as NULL and is initialized to a proper value too late (after caml_leave_blocking_section, which does caml_top_of_stack = th->top_of_stack). So the fix is to initialize to a proper value before the call to caml_leave_blocking_section.

With this pull request, the observed behavior is:

+ OCAMLRUNPARAM=b=1 ./a.out
Raised by primitive operation at file "a.ml", line 1, characters 48-76
Called from file "a.ml" (inlined), line 2, characters 12-17
Called from file "a.ml" (inlined), line 3, characters 12-17
Called from file "a.ml" (inlined), line 4, characters 12-17
Called from file "a.ml", line 7, characters 27-32
+ OCAMLRUNPARAM=b=1 THREAD= ./a.out
Raised by primitive operation at file "a.ml", line 1, characters 48-76
Called from file "a.ml" (inlined), line 2, characters 12-17
Called from file "a.ml" (inlined), line 3, characters 12-17
Called from file "a.ml", line 4, characters 12-17
Called from file "thread.ml", line 39, characters 8-14
@gasche

This comment has been minimized.

Copy link
Member

commented Jul 11, 2018

I can't comment on whether this is a desirable change (it seems fine to me), but it would be nice if you added a repro-test in the testsuite, for example with potential/future multicore-based implementations of Thread in mind.

@xavierleroy
Copy link
Contributor

left a comment

The handling of stack-related global variables is a bit of a mess, and it took me a while to understand what's going on. Clearly, the current code is not good and the proposed fix is much better.

I hope but am not sure this is enough to get 100% reliable backtraces in the presence of threads.

Another review by another core developer would be welcome, in case I missed something.

@sliquister

This comment has been minimized.

Copy link
Contributor Author

commented Jul 11, 2018

To perhaps clarify, I think this doesn't impact exception backtraces, only calls to Printexn.get_callstack. And apparently this caml_stack_usage function which seems to be used purely for the gc stats.

@sliquister sliquister force-pushed the sliquister:fix-get_callstack branch 2 times, most recently from 65c6760 to 8df2bff Jul 11, 2018

@sliquister

This comment has been minimized.

Copy link
Contributor Author

commented Jul 12, 2018

I added a test, there were none for Printexc.get_callstack. There are other tests that check exception backtraces, so I suppose this will work on all platforms.

@gasche

This comment has been minimized.

Copy link
Member

commented Jul 12, 2018

This test fails on the flambda-enabled CI machine:

 ... testing 'callstack.ml' with 1 (native) => failed (program output /home/travis/build/ocaml/ocaml/testsuite/_ocamltest/tests/backtrace/callstack/ocamlopt.byte/callstack.opt.output differs from reference /home/travis/build/ocaml/ocaml/testsuite/tests/backtrace/callstack.reference: 
--- /home/travis/build/ocaml/ocaml/testsuite/tests/backtrace/callstack.reference	2018-07-12 01:00:19.664842856 +0000
+++ /home/travis/build/ocaml/ocaml/testsuite/_ocamltest/tests/backtrace/callstack/ocamlopt.byte/callstack.opt.output	2018-07-12 01:10:06.430115788 +0000
@@ -1,11 +1,13 @@
 main thread:
-Raised by primitive operation at file "callstack.ml", line 8, characters 38-66
+Raised by primitive operation at file "printexc.ml" (inlined), line 251, characters 0-75
+Called from file "callstack.ml", line 8, characters 38-66
 Called from file "callstack.ml", line 9, characters 27-32
 Called from file "callstack.ml", line 10, characters 27-32
 Called from file "callstack.ml", line 11, characters 27-32
 Called from file "callstack.ml", line 13, characters 9-14
 new thread:
-Raised by primitive operation at file "callstack.ml", line 8, characters 38-66
+Raised by primitive operation at file "printexc.ml" (inlined), line 251, characters 0-75
+Called from file "callstack.ml", line 8, characters 38-66
 Called from file "callstack.ml", line 9, characters 27-32
 Called from file "callstack.ml", line 10, characters 27-32
 Called from file "callstack.ml", line 11, characters 27-32

)
@gasche

This comment has been minimized.

Copy link
Member

commented Jul 12, 2018

Maybe a fix would be to mark the Printexc.get_callstack declaration with [@inline never], but I don't know if this works on externals. (ping @let-def who knows about traces+inlining, and may have a recommendation here.)

@sliquister

This comment has been minimized.

Copy link
Contributor Author

commented Jul 12, 2018

That probably wouldn't help, judging from the extra frame. Maybe it's the coercion from external in .ml to val in the .mli that changes something.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Jul 12, 2018

We can't merge with a failing test. If there is no obvious fix, the test should be skipped in flambda mode (at the minimum), or removed entirely (at the maximum).

@sliquister sliquister force-pushed the sliquister:fix-get_callstack branch from 8df2bff to 70e7ee7 Jul 12, 2018

@sliquister

This comment has been minimized.

Copy link
Contributor Author

commented Jul 12, 2018

The fixes I could see were not worth it, so I went with disabling the test with flambda.

@alainfrisch alainfrisch added the bug label Jul 12, 2018

@damiendoligez
Copy link
Member

left a comment

I had a good look at this patch and it looks correct, although I can't figure out why bottom_of_stack doesn't have the same kind of problem.

@let-def

This comment has been minimized.

Copy link
Contributor

commented Jul 12, 2018

@gasche Flambda and classic deals with externals slightly differently. One solution would be to not associate any location to the code that wraps an external into an OCaml function, though that's not really a good option. The test is failing, but this behavior is fine for me.

@gasche

This comment has been minimized.

Copy link
Member

commented Jul 12, 2018

The consensus to disable the test under flambda is fine with me.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Jul 12, 2018

The current code and test look fine to me, thanks! Now merging...

@xavierleroy xavierleroy merged commit 2d445f5 into ocaml:trunk Jul 12, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@sliquister sliquister deleted the sliquister:fix-get_callstack branch Jul 12, 2018

@sliquister

This comment has been minimized.

Copy link
Contributor Author

commented Jul 12, 2018

Thanks!

@damiendoligez caml_bottom_of_stack is indeed also NULL when starting a new thread, however it's handled like caml_young_pointer: in a register until an external call happens, at which point caml_c_call saves the register into the variable. So it should be set by the time it's needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.