Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allowing displaying multi-line backtrace locations in the same way as errors #10111

Merged
merged 7 commits into from
Jun 8, 2023

Conversation

dra27
Copy link
Member

@dra27 dra27 commented Jan 1, 2021

This PR paves the way for changing how backtrace locations are printed to be the same as #8541, so with locations spanning multiple lines reporting the line and column of the end location, rather than simply the number of characters offset from the start line. The current format is fine for machines, but a nuisance for the occasions when humans have to look at the output!

As it happens, though, my primary motivation for this is a series of (smaller) PRs which eliminate the special handling needed for CRLF-checkouts of the compiler's codebase. I've quietly maintained the .gitattributes entries for these for a few years (we all have our foibles) with the intention of one day sorting it out properly. That one day arrived in September, because there are recent changes to our codebase which mean that it's not just some testsuite files which need forcing to use LF-endings but some of the compiler's sources too, it's just taken me a few months to get to tidy up the code.

There are three parts:

  1. Extending struct ev_info to record the extra fields in bytecode (simple change)
  2. Altering the encoding of debuginfo to record the extra data in native code (less simple - more below)
  3. Threading this through to Printexc and adding two new fields to Printexc.location

Actually displaying this information will be in a future PR.

At present, the 64-bit words for debug info allow 20 bits for the line number, 8 bits for the first character location and 10 bits for the end character location. In our tree, that at present leads to just over 1000 locations which overflow the representation (not that they're necessarily ever displayed). Now I steal 1 bit from the line number (so we technically can no longer display locations from lines 524288-1048575 in a file) which is used to indicate two possible formats for the 64-bit word. The first adds no further overhead by using 12 bits for the line number, 6 bits for the character offset of the start, 3 bits to encode the offset to the end-line (so 0 when the location is on the same line), 7 bits to encode the character offset of the end (relative to the end-line) and - in order to keep the information we had before - 9 bits to store the end character offset as before. This encoding captures 92% of the locations in a build of the compiler tree. When that overflows, the representation instead stores an extra 64bits in the name_info struct containing the defname (thus adding 8 bytes overhead), allowing 16bits each for the start and end character offset (the other int is used for the original end character offset relative to the start). I haven't shared defname strings as when I checked the tree there were only 50 locations where the strings were being duplicated, so the overhead (at least for the compiler sources) is small (indeed, the extra 4 byte offset in the other locations would dwarf the saving on the strings).

With these two representations, no locations in the compiler's tree are now unrepresentable.

Having stored that information, it clearly makes sense to be able to access it in Printexc - in this version I've simply added two fields to the end of the Printexc.location. The alternative would be an additional type with a different retrieval function.

@dra27 dra27 force-pushed the enhanced-debug-locs branch 2 times, most recently from 2fc86ed to e718e31 Compare January 4, 2021 09:39
runtime/backtrace_nat.c Outdated Show resolved Hide resolved
@stedolan
Copy link
Contributor

stedolan commented Jan 4, 2021

This sounds good! I've definitely grown tired of seeing -1023 in memtrace files: the end_char offset can overflow quite quickly when it's referring to a whole multiline function.

Is the optimisation of partially packed debuginfos important? Naively I would have thought that since most debuginfos are fully packed, we don't lose much by not bothering to do any bitpacking in the overflow case.

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks beneficial but technical work. It needs a reviewer. Do we have a volunteer? Would one of the people investing their time in debugging/tracing tools be willing to look at this?

(Naive question from just reading the PR description: have you kept some space in the format for eventual future extension? What do we do for locations, possibly in generated code, that actually have huge line numbers, or large character positions? Should we have a "safe/robust" mode where everything is 64bits, instead of making various assumptions?)

runtime/backtrace_nat.c Outdated Show resolved Hide resolved
asmcomp/emitaux.ml Show resolved Hide resolved
@dra27
Copy link
Member Author

dra27 commented Apr 14, 2021

This is also partly waiting for me to update the code to address @stedolan's comment!

@dra27 dra27 force-pushed the enhanced-debug-locs branch 2 times, most recently from 285992a to bc71853 Compare April 14, 2021 20:16
@dra27
Copy link
Member Author

dra27 commented Apr 14, 2021

Rebased to get rid of the conflict and also updated to use a pair of uint16_ts instead of a manually packed uint32_t.

@stedolan and I had an offlineonline (but not on GitHub) discussion about this in January. This PR seeks maximum compatibility with the what was there before by maintaining the old end of character range field to be computed. The intention is that the next PR would do away with this compatibility (at the cost of a visible change in Printexc) where virtually all locations would be encoded in the packed format and we could indeed simplify the code slightly by doing away with the "partially packed" bit format because there would be so few locations overflowing the packed format that we wouldn't care at the slight increase in size of the struct.

Copy link
Contributor

@abbysmal abbysmal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a look at the review, I am not extremely familiar with this part of the codebase, but it does look correct to me.
@stedolan did you get a chance to take another look at it?

runtime/backtrace_nat.c Outdated Show resolved Hide resolved
@dra27 dra27 force-pushed the enhanced-debug-locs branch 2 times, most recently from da02bb2 to 85141ec Compare January 6, 2023 15:27
@dra27
Copy link
Member Author

dra27 commented Jan 6, 2023

Thanks, @Engil! I updated the @since markings, and also put this through precheck#812

runtime/backtrace_nat.c Outdated Show resolved Hide resolved
asmcomp/emitaux.ml Outdated Show resolved Hide resolved
@dra27 dra27 merged commit 10ff471 into ocaml:trunk Jun 8, 2023
9 checks passed
@dra27 dra27 deleted the enhanced-debug-locs branch June 8, 2023 18:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants