New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Produce debug entries for inlined methods #2880
Conversation
Examples of DIEs referenced above: Partial
|
Looking into travis failures... |
2deb19c
to
93bfa4e
Compare
@adinn can you please review this? |
@zakkak This looks very interesting but it's going to need to wait on some large, upcoming changes to the current debuginfo code that I have queued up. I have almost completed adding types to the generated info model and that includes reorganizing the method debug info so that it is associated with class info records. It also involves following the gcc model of splitting it into separate declarations and definitions for the methods. Your changes will need to be reworked into that patch and I think it would be best to wait for it to arrive before doing so. |
OK, let's put it on hold then :) (I was actually expecting this, knowing that you are working on something that big) |
0373461
to
b8477f0
Compare
b8477f0
to
8340c27
Compare
substratevm/src/com.oracle.objectfile/src/com/oracle/objectfile/debugentry/Range.java
Outdated
Show resolved
Hide resolved
...tevm/src/com.oracle.objectfile/src/com/oracle/objectfile/elf/dwarf/DwarfInfoSectionImpl.java
Outdated
Show resolved
Hide resolved
@adinn can you please have a look? :) |
Note: The current implementation uses "pseudo-method" entries for the nested inline methods. So instead of using |
8340c27
to
5feb1fc
Compare
5feb1fc
to
986e088
Compare
cc06434
to
df42dd2
Compare
@adinn can you please review this? This PR associates ranges corresponding to inlined methods with their "caller" ranges and then using that info generates the corresponding concrete instance trees. Note the implementation still generates more concrete instance trees than we actually need (see "Known Issues" in #2880 (comment)). My understanding is that in order to solve this we would need to change the way we represent subranges, currently as a list of contiguous memory addresses, to a list of subtrees representing the inlined call trees. I would like to work on this on a different PR if you agree. Regarding perfomance and debug info size please see the following up-to-date table:
|
I don't agree that this is the only way to solve the problem or even the best one. As the comment in #2880 makes clear this problem arises because there are methods, and sometimes even owning classes for those methods, mentioned in the inline info that are not notified via the DebugTypeInfo traversal. As a result the set of ClassEntry and MethodEntry instances is incomplete when inline method processing is started. This is best fixed by completing the set before doing this inline method processing. That will ensure that whenever an inline info record for method This requires introducing an intermediate processing step preceding the current DebugCodeInfo traversal. A prior traversal or DebugCodeInfo can identify missing methods and classes and create entries for them. This requires that the ClassEntry and MethodEntry instances introduced during this completion step are able to be distinguished as being incomplete types/methods for which, respectively, no class structure or method implementation is defined (n.b. that might be best achieved by refactoring the Entry class hierarchy to introduce special subtypes of TypeEntry and MemberEntry that share some behavior with ClassEntry and MethodEntry). I think it would be much better to implement this prior step and modify the current patch to profit from it than commit it as is. I will be happy to provide an implementation unless you wish to attempt it. |
@adinn @zakkak I ran a js-image build with and without
I also did the same test with GraalVM 21.2.0.1 Java 11 EE (using EE debuginfo generation) GraalVM 21.2.0.1 Java 11 EE with debuginfo:
GraalVM 21.2.0.1 Java 11 EE without debuginfo
The difference is worrying. With the new implementation building with debuginfo more than doubles total build time. With these numbers I'm not sure anymore if setting One big difference in EE debuginfo generation is that we only emit a single CU for the whole image. Could it be that the use of one CU per class is a major factor in creating all that overhead compared to EE debuginfo generation? Are there any plans to further improve CE debuginfo generation build time? |
Intuitively I would say no. If I had to guess (given that I don't have access to the EE implementation) I would attribute the difference to something like using "pseudo-methods" for inline methods as I did initially in this PR (#2880 (comment)). @adinn might be able to give a better explanation since the difference could also be due to the type info we generate in CE (which I guess EE doesn't generate given the single CU, but I might be wrong).
I am certainly interested in doing some profiling to figure out where we spent most of the time and see how we could optimize things. |
@olpaw I just noticed that the figures I provided were skewed by configuring logging during debug generation. The corrected timings and sizings are as follows:
Individual runs are as follows:
|
@olpaw wrote:
I don't expect collapsing all CUs into one CU will make a great deal of difference to either space or to generate time if the CU still contains the same content. None of the CUs is very small so the cost of writing an extra header is small. A more likely thing that could improve generation time/size is to be more careful about what content we actually generate. We should definitely look for opportunities to improve both on space and generate time. Clearly the way to proceed is to profile and then concentrate on the stuff that happens most frequently or takes the longest time to generate. One specific thing we should definitely look at is the two-pass model for generating the DWARF sections. The first one figures out sizing and compute offsets for inter-DIE references and a second one to assemble the content. If we swapped to use resizeable content buffers supplemented with reference resolution patching we might well cut down on execution time by a significant amount. Of course, it's jumping the gun to assume this is the most important issue. It may well be that the most significant portion of the time is needed to assemble the generic code & file model that drives DWARF section generation or in creating/checking validity/updating the source file cache. |
@adinn wrote
Full disclosure: having just one CU would make a small difference to the line info size -- probably with a comparable reduction in writing time -- because it would mean constructing and writing only one global file table and one global dir table. That will avoid a certain amount of repetition in indexing/processing file names and paths that are referenced from more than one class. For Hello.java the UTF8 bytes for the file and dir names take ~890Kb when separate file and dir tables are written per CU (i.e. per class). After sorting these names and removing duplicates the UTF8 byte count came down to ~27Kb. However, saving on assembling and writing 860K of string bytes is not going to make a lot of difference given the total debuginfo size of ~34Mb. |
After some profiling it looks like the dual pass is not our main issue (at least for now). |
Sure, go ahead. |
Update: I have managed to bring the time consumed in See below for some rough numbers from 6 runs with inline methods included:
and from 6 runs with inline methods omitted:
|
69adb17
to
da64e03
Compare
Looks promising @zakkak I will give it a try with building the js image with |
This way we avoid the expensive calls to "toJavaName". This patch reduces the time spend in debug info generation from ~12s to 5s.
da64e03
to
6ffc908
Compare
Sure, go ahead. |
@fzakkak wrote:
I have not yet investigated the newer cases but in the original case concerning inlining into a substituted method the problem clearly arises during compilation. The list of ranges for the substituted method (all 2 of them :-) includes a range that covers the call to the innermost non-inlined method. The line number for this range is correctly placed at the source code call site for that called method. However, the call hierarchy for the range is empty. There is no mention of the intervening inlined methods. So it looks like we got there directly from the call to the substituted method. I am investigating the compiler code to see why inlining in a substituted method loses call hierarchy info. Once I understand what is going wrong there (and, if it is simple enough, identified a fix) I will look into the other cases you brought up. |
I have found the problem with missing inline frames in substituted calls. The problem happens when the original invoke node is copied. The fix is a simply copy across of the node source position in InvokeWithExceptionNode::replaceInvoke(). A patch to fix this and update the debuginfo test script to check for the correct nodes is here. n.b. note that the backtrace listing for the inlined methods (frames |
@olpaw if you could cherry pick the commit I mentioned above into this branch that would ensure that we get proper handling of inlined methods when there is a substitution |
If that is a problem then I can raise it as a follow-up PR. |
@adinn you should be able to commit to @zakkak's branch just fine. I have sucessfully done this on other github branches (!= my own) in the past. |
Ahh bummer. I'm afraid only @zakkak can merge that delta PR back into his repo. To fix this we have to use a different approach. @adinn please create new PR and cherry-pick @zakkak's commits and then put your commit on top. I will then use this new PR for internal testing. Sorry for the inconveniences & thanks for your patience. |
Superseded by #3745 |
Known Issues
we could create a "concrete inlined instance tree" (per the DWARF spec) like this:
instead we generate two separate trees, resulting in more DIEs (Debug Information Entries).
DW_TAG_inlined_subroutine
would reference (throughDW_AT_abstract_origin
) theDW_TAG_subprogram
defining the corresponding subprogram even if it's declared in a different compilation unit. There are cases however where when creating theDW_TAG_inlined_subroutine
the correspondingDW_TAG_subprogram
has not been created yet (even worse the corresponding compilation unit might not have been created yet). To overcome this issue our current approach creates partialDW_TAG_subprogram
entries in the compilation unit where the inlining happens. This however comes at the cost of having more DIEs.Before this PR hello.hello with debug symbols is 10MB
After this PR hello.hello with debug symbols becomes 38MB
Without debug symbols hello.hello is 7.9MB
Closes #2701