Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Print function names in backtraces #9096

Merged
merged 6 commits into from Apr 27, 2020
Merged

Conversation

stedolan
Copy link
Contributor

@stedolan stedolan commented Nov 6, 2019

This PR adds function names to OCaml backtraces. As an example, here's what I see when I add a 'assert false' randomly somewhere in the middle of ctype.ml:

Fatal error: exception File "typing/ctype.ml", line 728, characters 5-11: Assertion failed
Raised at Ctype.generalize_structure in file "typing/ctype.ml", line 728, characters 5-17
Called from Typecore.type_expect_ in file "typing/typecore.ml", line 2859, characters 6-29
Called from Builtin_attributes.warning_scope in file "parsing/builtin_attributes.ml", line 236, characters 14-18
Re-raised at Builtin_attributes.warning_scope in file "parsing/builtin_attributes.ml", line 241, characters 10-13
Called from Typecore.type_expect in file "typing/typecore.ml", line 2327, characters 4-156
Called from Typecore.type_expect_ in file "typing/typecore.ml", line 3305, characters 17-39
[...]

I have two motivations:

  1. I want function names in backtraces
  2. In Statmemprof, it's natural to group backtraces of allocations that share a suffix (statmemprof-emacs does this). But the backtraces have precise locations, so two callsites in the same function are distinct backtrace entries, making the grouping too fine-grained to be useful.

Spacetime has some code to deal with 2 (which lives in spacetime_lib), which associates backtrace entries (i.e. callsites) with functions by parsing the ELF symbol table and looking up function names via DWARF (using owee). This works, but with a couple of disadvantages: it's fairly complicated, ELF-specific, and exposes the user to unpalatable generated names. (It also doesn't help with 1).

Instead, this patch extends the existing debug info with function names, which are looked up using the same code that currently finds source locations.

Names and scopes

In OCaml, not all functions have names, and not all names are for functions.

This PR assigns names by dividing the source file into "scopes", which may nest but not overlap. Locations are named according to their stack of enclosing scopes, and new scopes are introduced by:

  • named functions
  • anonymous functions
  • modules
  • values bindings in modules
  • classes
  • methods

(See the type Lambda.scoped_location for how these are represented)

As an example, here's a simple program and the backtrace it prints:

let f () =
  List.iter (fun _ -> assert false) [1;2;3];
  print_string "done!"
let () = f ()
Fatal error: exception Assert_failure("raise_iter.ml", 2, 22)
Raised at Raise_iter.f.(fun) in file "raise_iter.ml", line 2, characters 22-34
Called from Stdlib__list.iter in file "list.ml", line 110, characters 12-15
Called from Raise_iter.f in file "raise_iter.ml", line 2, characters 2-43
Called from Raise_iter in file "raise_iter.ml", line 4, characters 9-13

The names printed above (intentionally) look like module paths. However, these are not in general paths: for instance, some mention (fun), and it is possible to have locations reported as e.g. Foo.f.Bar.g (by using let module to nest a module inside an expression), which is not a valid path.

For a more comprehensive example, see names.ml in the testsuite

Since patterns need not bind only a single name, it can be ambiguous what the name of a scope is. In such cases, this patch arbitrarily picks one of the names: let (f, g) = assert false will be reported as an exception in f. If no names are bound by a pattern, then no scope is introduced. So, if a module M contains let () = assert false, the exception is reported as simply coming from M.

Binary size impact

In native code, the extra debug information makes ocamlopt.opt about 2.5% bigger than current trunk, and similar for Flambda. Strings are shared when they occur multiple times. (An earlier version of this patch included a rather complicated compression scheme for sharing prefixes. I can revisit it later if the debug info size becomes a problem).

Bytecode uses a simpler format, and the impact there is much larger (about 10%). I note that debug info is already very large on bytecode (it includes a typing environment for ocamldebug's benefit) and
binary sizes are larger than native (bytecode ocamlopt is about 2x bigger than ocamlopt.opt). From this I conclude that nobody cares very much about bytecode binary size, so I did the simplest possible thing there and wrote strings directly. (More optimised versions are of course possible).

Reviewing notes

This is a large patch, but I've arranged the commit history to make it easier to review. The commits are:

  1. 888ce57634 Introduce the Lambda.scoped_location type
  2. 541d829457 Replace Location.t with Lambda.scoped_location in Lambda code
  3. 07da0a18e4 Print function names (derived from Lambda.scoped_location) in backtraces
  4. b4550dd1b3 Generate function names during translation from Typedtree to Lambda
  5. dde9b1e471 Update testsuite with new backtrace format, and add test for names
  6. 88f157703a  Bootstrap

Commits 1, 3 and 4 contain the actual work, and require review. They're relatively small.

Commit 2 is very large but very simple. The only thing it does is thread a new ~scopes parameter around all of the transl_foo functions that translate Typedtree to Lambda, so that the raw locations of Typedtree can be turned into scoped_locations in Lambda.

Commit 5 is also large, because it updates all of the backtraces in the testsuite. (Commit 6 is a bootstrap).

@gasche
Copy link
Member

gasche commented Nov 6, 2019

Note to reviewers: this patch looks large only because it's on top of the not-yet-merged #8805. There's only one new commit in this PR, currently at +275 -140 lines.

I solved that for you.

I haven't looked at the code. The description of the design is sensible and I agree it's a general feature. I would be interested in rephrasing the current wording of the backtrace lines, Called from <function> at <location>, to suggest less strongly that <function> is the caller, given the approximations introduced by the naming scheme. What about Called within <function> at <location>, for example?

@stedolan
Copy link
Contributor Author

stedolan commented Nov 6, 2019

I solved that for you.

Thanks!

I haven't looked at the code. The description of the design is sensible and I agree it's a general feature. I would be interested in rephrasing the current wording of the backtrace lines, Called from <function> at <location>, to suggest less strongly that <function> is the caller, given the approximations introduced by the naming scheme. What about Called within <function> at <location>, for example?

I'd claim that it's not an approximation, but that Called from <foo> ... means "called from within the definition of foo" rather than "called from within the body of the function named foo" :)

Pedantry aside, I'm not attached to the current wording (I didn't put much thought into it), but I'm not really seeing the same subtle distinction between "called from" and "called within" - they mean the same to me, but the latter sounds a bit more awkward.

The functions vs. scopes thing is a bit subtle, though (cf. the wall of text above), and I'm not really sure the best way to get it across here.

@lpw25
Copy link
Contributor

lpw25 commented Nov 6, 2019

I think that using the statically enclosing bindings like this is likely to be very confusing for users. It seems almost certain to be misread by people trying to track down a failure and to lead to wasted time looking in the wrong function.

I've tackled this issue before as part of work (not yet upstreamed) to improve flambda's inlining reports. I found that the nicest "name" for a function was along the lines of:

type parent_ =
  | Unit of string
  | Submodule of parent_ * string
  | Class of parent_ * string
  | Function of function_ * string

and function_name =
  | Name of string
  | Anonymous of Location.t

and function_ = parent_ * function_name

Which allows you to print things like: the anonymous function at foo.ml:3,1-10 within Foo.bar or Bar.Baz.foo.

That might be a bit too detailed for use in backtraces, but at the very least I think you can get them to distinguish: an anonymous function within Foo.bar from Foo.bar.

@alainfrisch
Copy link
Contributor

If we have a nice symbolic path to point to a location, it would be nice to include it in the payload for Assert_failure (for cases where the stacktrace is not available, or not stored in logs, etc). Unfortunately, this exception constructor takes a tuple argument, and extending it could break some code (do people explicitly actually pattern-match on it, in practice?). One could always add the symbolic path to the filename component of the payload, but it's a bit ugly. Switching to an (inline?) record would be better, if we can convince ourselves the amount of code breakage is small enough.

@alainfrisch
Copy link
Contributor

A new scope is introduced by a module binding or a module-level let binding (but not an expression-level let ... in ... binding)

What's the argument for keeping local module binding names but not local value names? If the location is within a local function (or even within a non-functional local binding) with a clear name, it seems useful to keep that name in the path.

@stedolan
Copy link
Contributor Author

@lpw25

I found that the nicest "name" for a function was along the lines of: [...]

This sounds like a solution to a slightly different problem.

In an inlining report you need to be able to refer to particular functions, since inlining is something that happens to functions. The problem being solved is "name this function", and writing M.foo for two separate functions (the one bound to M.foo and an anonymous function inside it) would indeed be confusing.

In a backtrace, the problem being solved is "describe this location", where locations in question are not functions but callsites (or raises, etc.). In particular, text like the anonymous function at foo.ml:3,1-10 is wrong: there's no function at the described location.

The goal here is to describe locations approximately. Half the point is that multiple nearby locations get the same description. I'm open to suggestions about what should contribute to a description (classes should definitely be in, as @alainfrisch points out that it's weird to include local modules but not local functions, and I'm still on the fence about marking anonymous functions). But I think it's important to realise that this is a different problem from naming functions for an inlining report.

@lpw25
Copy link
Contributor

lpw25 commented Nov 11, 2019

That difference is why I suggested that the full description might be too detailed.

In particular, text like the anonymous function at foo.ml:3,1-10 is wrong: there's no function at the described location.

To clarify, the full backtrace slot with the full description would be something like:

Called from the anonymous function at foo.ml:3,1-10 within Foo.bar in file "list.ml", line 110, characters 12-15

i.e. the location I was mentioning above was an additional location, not to be confused with the location of the call site.

I'm open to suggestions about what should contribute to a description

I think the distinction between:

Called from Typecore.type_expect in file "typing/typecore.ml", line 2327, characters 4-156

and

Called from an anonymous function within Typecore.type_expect in file "typing/typecore.ml", line 2327, characters 4-156

is the most important thing to distinguish.

@stedolan stedolan force-pushed the backtrace-names branch 2 times, most recently from b5632e7 to 52f9743 Compare November 20, 2019 16:31
@stedolan
Copy link
Contributor Author

Update: This patch got rewritten, which made it both longer and less hacky.

Instead of adding Levent nodes to Lambda which get converted to debuginfo separately by each of the backends (native, bytecode, flambda), the updated PR replaces the Location.t type in Lambda with a new scoped_location (which is a pair of a Location.t and some information about scopes). This means the scoping information is only computed in one place.

To the reviewer: read this commit-by-commit. Commits 1 and 4 are large and boring (1 threads a scopes parameter through transl{obj,class,mod} and 4 updates tests), while 2 and 3 are small and interesting (2 is the backend support for printing names, 3 specifies how the names are generated)

@stedolan
Copy link
Contributor Author

stedolan commented Jan 9, 2020

ping?

stedolan added a commit to janestreet/ocaml that referenced this pull request Apr 1, 2020
  - Introduce the Lambda.scoped_location type
  - Replace Location.t with Lambda.scoped_location in Lambda code
  - Print function names (derived from Lambda.scoped_location) in backtraces
  - Generate function names during translation from Typedtree to Lambda
  - Update testsuite with new backtrace format, and add test for names
@shindere
Copy link
Contributor

shindere commented Apr 9, 2020

I'm not able to review the code, sorry about that. I just wanted to show
appreciation for the work accomplished here and all the efforts put in
making it understandable even to a non-expert like me.

Not sure this will receive enough reviews to be ready before the 4.11
freeze, but hoping for the best.

@avsm avsm added this to the 4.11 milestone Apr 16, 2020
@avsm
Copy link
Member

avsm commented Apr 16, 2020

Developer meeting discussion: @gasche: is there consensus on the interface here? @lpw25 / @stedolan both agree on the interface now (but PR description has not been updated). Decision: consider this for merge into 4.11.

@gasche
Copy link
Member

gasche commented Apr 16, 2020

I think the most practical thing to do for this PR is to accept to merge it after the branch/freeze date, once it is reviewed. I'm not in favor of forcing people to work at full speed before next Monday, because this risks decreasing the quality of the result, increasing stress, and I don't see any advantage.

@stedolan
Copy link
Contributor Author

(but PR description has not been updated)

This is done now.

@stedolan stedolan changed the title [WIP] Print function names in backtraces Print function names in backtraces Apr 16, 2020
@gasche
Copy link
Member

gasche commented Apr 16, 2020

For me the missing piece was that @lpw25 is happy with the new messages that are proposed. He said so during the meeting, but there is no sign of this in the PR that I can find. I shamefully admit that this decreased my own motivation/effectiveness to do a review: it's a lot of work to get into these low-level details, and it's less exciting to do so when one has the feeling that the PR may linger forever because people disagree (for reasonable reasons) about how to print the result.

P.S.: thinking about this again, you mentioned during the meeting that you have tools that depend on this PR. Again, maybe this could have been emphasized in the PR discussion, to help prioritize review work. (Also, are the tools available somewhere for the curious?)

Copy link
Contributor

@lpw25 lpw25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reviewed the compiler parts up to cmm and they look pretty good. A few parts could be tidied up. In particular, I think that the scope types should drop the lambda_ from their name and move to Debuginfo, and I would like the locations generated in matching.ml to also have scopes.

lambda/debuginfo.mli Show resolved Hide resolved
middle_end/flambda/closure_conversion.ml Outdated Show resolved Hide resolved
asmcomp/cmm_helpers.ml Show resolved Hide resolved
lambda/debuginfo.ml Outdated Show resolved Hide resolved
lambda/lambda.ml Outdated Show resolved Hide resolved
lambda/lambda.mli Outdated Show resolved Hide resolved
lambda/lambda.mli Outdated Show resolved Hide resolved
lambda/matching.ml Show resolved Hide resolved
lambda/translprim.ml Outdated Show resolved Hide resolved
type debug_event =
{ mutable ev_pos: int; (* Position in bytecode *)
ev_module: string; (* Name of defining module *)
ev_loc: Location.t; (* Location in source file *)
ev_kind: debug_event_kind; (* Before/after event *)
ev_defname: string; (* Enclosing definition *)
ev_info: debug_event_info; (* Extra information *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't need to be in this PR, but it might be worth considering how this information could be used in ocamldebug. For example, I think updating the code in Show_information.show_one_frame would allow changing the result of ocamldebug's backtrace command to match the new backtrace printing.

@gasche
Copy link
Member

gasche commented Apr 18, 2020

Thanks @lpw25 for your review; I like the style of your review comments, some I could have written myself. (Yay for complaining about optional arguments.) Let me know I can help for the Matching/Switch part. I worked on having more locations in Switch in trunk...gasche:switch-loc when reviewing #2294. There the purpose was to have more precise locations for the tests; I'm not sure you need this here (I haven't looked at the code), @lpw25 suggests that just parametrization is enough.

@mshinwell
Copy link
Contributor

@lpw25 I will read the backend parts, probably today -- can you confirm that I need to read the runtime changes, Cmm_helpers and Emitaux?

@stedolan stedolan force-pushed the backtrace-names branch 2 times, most recently from bd7a725 to a348b2d Compare April 20, 2020 14:29
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Jul 16, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Jul 20, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Jul 20, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Jul 21, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Jul 21, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Jul 30, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Jul 30, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Aug 3, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Aug 4, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Aug 5, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Aug 7, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Aug 10, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Aug 10, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Aug 17, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Aug 18, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Aug 19, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Aug 20, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Aug 28, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Sep 2, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Sep 2, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Sep 2, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Sep 7, 2020
gasche added a commit to gasche/ocaml that referenced this pull request Dec 22, 2020
The regression was introduced by ocaml#9096 in this part of the diff:

             | Tcf_method (name, _, Tcfk_concrete (_, exp)) ->
-                let met_code = msubst true (transl_exp exp) in
+                let scopes = enter_method_definition ~scopes name.txt in
+                let met_code =
+                  msubst true (transl_scoped_exp ~scopes exp) in

transl_exp would use Translobj.oo_wrap on functions, while the new
transl_scoped_expr does not.

The regression was first reported by Sacha Ayoun in ocaml#10092, as it
results in a large slowdown (2x slower) on the Gillian codebase, which
uses an object-oriented visitor in a performance-critical way.
dbuenzli pushed a commit to dbuenzli/ocaml that referenced this pull request Mar 25, 2021
The regression was introduced by ocaml#9096 in this part of the diff:

             | Tcf_method (name, _, Tcfk_concrete (_, exp)) ->
-                let met_code = msubst true (transl_exp exp) in
+                let scopes = enter_method_definition ~scopes name.txt in
+                let met_code =
+                  msubst true (transl_scoped_exp ~scopes exp) in

transl_exp would use Translobj.oo_wrap on functions, while the new
transl_scoped_expr does not.

The regression was first reported by Sacha Ayoun in ocaml#10092, as it
results in a large slowdown (2x slower) on the Gillian codebase, which
uses an object-oriented visitor in a performance-critical way.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants