
Produce debug entries for inlined methods #2880

Closed · zakkak wants to merge 23 commits

Conversation

@zakkak commented Sep 30, 2020

Known Issues

  • Given two call-chains:
    1. foo->bar->baz
    2. foo->bar->buz
      we could create a single "concrete inlined instance tree" (per the DWARF spec) like this:
   foo
    |
   bar
    |
   / \
  /   \
baz   buz

Instead, we currently generate two separate trees, resulting in more DIEs (Debug Information Entries).

  • Ideally, a DW_TAG_inlined_subroutine would reference (through DW_AT_abstract_origin) the DW_TAG_subprogram defining the corresponding subprogram, even if it is declared in a different compilation unit. There are cases, however, where the corresponding DW_TAG_subprogram has not been created yet at the time the DW_TAG_inlined_subroutine is created (even worse, the corresponding compilation unit might not have been created yet). To overcome this, our current approach creates partial DW_TAG_subprogram entries in the compilation unit where the inlining happens. This, however, comes at the cost of more DIEs.

Before this PR, hello.hello with debug symbols is 10MB.
After this PR, hello.hello with debug symbols becomes 38MB.
Without debug symbols, hello.hello is 7.9MB.

Closes #2701

@zakkak commented Sep 30, 2020

Examples of DIEs referenced above:

Partial DW_TAG_subprogram

 <1><38>: Abbrev Number: 5 (DW_TAG_subprogram)
    <39>   DW_AT_name        : (indirect string, offset: 0x34686): com.oracle.svm.core.os.ImageHeapProvider::get
    <3d>   DW_AT_inline      : 1        (inlined)
    <3e>   DW_AT_decl_file   : 0x3
    <42>   DW_AT_decl_line   : 0x2c
    <46>   DW_AT_external    : 1

Complete DW_TAG_subprogram

 <1><47>: Abbrev Number: 4 (DW_TAG_subprogram)
    <48>   DW_AT_name        : (indirect string, offset: 0x6bae): com.oracle.svm.core.Isolates::create(org.graalvm.nativeimage.c.type.WordPointer, com.oracle.svm.core.c.function.CEntryPointCreateIsolateParameters)
    <4c>   DW_AT_low_pc      : 0x405000
    <54>   DW_AT_high_pc     : 0x405071
    <5c>   DW_AT_decl_file   : 0x1
    <60>   DW_AT_decl_line   : 0x48
    <64>   DW_AT_external    : 1

Concrete inlined instance trees

 <1><137>: Abbrev Number: 4 (DW_TAG_subprogram)
    <138>   DW_AT_name        : (indirect string, offset: 0x1ad5e): com.oracle.svm.core.Isolates::tearDownCurrent
    <13c>   DW_AT_low_pc      : 0x405080
    <144>   DW_AT_high_pc     : 0x4050f9
    <14c>   DW_AT_decl_file   : 0x1
    <150>   DW_AT_decl_line   : 0x5e
    <154>   DW_AT_external    : 1
 <2><155>: Abbrev Number: 8 (DW_TAG_inlined_subroutine)
    <156>   DW_AT_abstract_origin: <0xa1>
    <15a>   DW_AT_low_pc      : 0x405084
    <162>   DW_AT_high_pc     : 0x40508f
    <16a>   DW_AT_call_file   : 0x1
    <16e>   DW_AT_call_line   : 0x5e
 <3><172>: Abbrev Number: 8 (DW_TAG_inlined_subroutine)
    <173>   DW_AT_abstract_origin: <0xb0>
    <177>   DW_AT_low_pc      : 0x405084
    <17f>   DW_AT_high_pc     : 0x40508f
    <187>   DW_AT_call_file   : 0x1
    <18b>   DW_AT_call_line   : 0x65
 <4><18f>: Abbrev Number: 7 (DW_TAG_inlined_subroutine)
    <190>   DW_AT_abstract_origin: <0xbf>
    <194>   DW_AT_low_pc      : 0x405084
    <19c>   DW_AT_high_pc     : 0x40508f
    <1a4>   DW_AT_call_file   : 0x4
    <1a8>   DW_AT_call_line   : 0xfa
 <4><1ac>: Abbrev Number: 0
 <3><1ad>: Abbrev Number: 0
 <2><1ae>: Abbrev Number: 8 (DW_TAG_inlined_subroutine)
    <1af>   DW_AT_abstract_origin: <0xa1>
    <1b3>   DW_AT_low_pc      : 0x40508f
    <1bb>   DW_AT_high_pc     : 0x405093
    <1c3>   DW_AT_call_file   : 0x1
    <1c7>   DW_AT_call_line   : 0x5e
 <3><1cb>: Abbrev Number: 8 (DW_TAG_inlined_subroutine)
    <1cc>   DW_AT_abstract_origin: <0xb0>
    <1d0>   DW_AT_low_pc      : 0x40508f
    <1d8>   DW_AT_high_pc     : 0x405093
    <1e0>   DW_AT_call_file   : 0x1
    <1e4>   DW_AT_call_line   : 0x65
 <4><1e8>: Abbrev Number: 7 (DW_TAG_inlined_subroutine)
    <1e9>   DW_AT_abstract_origin: <0xce>
    <1ed>   DW_AT_low_pc      : 0x40508f
    <1f5>   DW_AT_high_pc     : 0x405093
    <1fd>   DW_AT_call_file   : 0x4
    <201>   DW_AT_call_line   : 0xfa
 <4><205>: Abbrev Number: 0
 <3><206>: Abbrev Number: 0

Note that this could become:

 <1><137>: Abbrev Number: 4 (DW_TAG_subprogram)
    <138>   DW_AT_name        : (indirect string, offset: 0x1ad5e): com.oracle.svm.core.Isolates::tearDownCurrent
    <13c>   DW_AT_low_pc      : 0x405080
    <144>   DW_AT_high_pc     : 0x4050f9
    <14c>   DW_AT_decl_file   : 0x1
    <150>   DW_AT_decl_line   : 0x5e
    <154>   DW_AT_external    : 1
 <2><155>: Abbrev Number: 8 (DW_TAG_inlined_subroutine)
    <156>   DW_AT_abstract_origin: <0xa1>
    <15a>   DW_AT_low_pc      : 0x405084
    <162>   DW_AT_high_pc     : 0x405093
    <16a>   DW_AT_call_file   : 0x1
    <16e>   DW_AT_call_line   : 0x5e
 <3><172>: Abbrev Number: 8 (DW_TAG_inlined_subroutine)
    <173>   DW_AT_abstract_origin: <0xb0>
    <177>   DW_AT_low_pc      : 0x405084
    <17f>   DW_AT_high_pc     : 0x405093
    <187>   DW_AT_call_file   : 0x1
    <18b>   DW_AT_call_line   : 0x65
 <4><18f>: Abbrev Number: 7 (DW_TAG_inlined_subroutine)
    <190>   DW_AT_abstract_origin: <0xbf>
    <194>   DW_AT_low_pc      : 0x405084
    <19c>   DW_AT_high_pc     : 0x40508f
    <1a4>   DW_AT_call_file   : 0x4
    <1a8>   DW_AT_call_line   : 0xfa
 <4><1e8>: Abbrev Number: 7 (DW_TAG_inlined_subroutine)
    <1e9>   DW_AT_abstract_origin: <0xce>
    <1ed>   DW_AT_low_pc      : 0x40508f
    <1f5>   DW_AT_high_pc     : 0x405093
    <1fd>   DW_AT_call_file   : 0x4
    <201>   DW_AT_call_line   : 0xfa
 <4><1ac>: Abbrev Number: 0
 <3><1ad>: Abbrev Number: 0

@zakkak commented Sep 30, 2020

Looking into Travis failures...

zakkak force-pushed the oracle/graal/fix-2701 branch 2 times, most recently from 2deb19c to 93bfa4e, on September 30, 2020
@zakkak commented Sep 30, 2020

@adinn can you please review this?

@adinn commented Oct 1, 2020

@zakkak This looks very interesting but it's going to need to wait on some large, upcoming changes to the current debuginfo code that I have queued up.

I have almost completed adding types to the generated info model and that includes reorganizing the method debug info so that it is associated with class info records. It also involves following the gcc model of splitting it into separate declarations and definitions for the methods. Your changes will need to be reworked into that patch and I think it would be best to wait for it to arrive before doing so.

@zakkak commented Oct 1, 2020

OK, let's put it on hold then :)

(I was actually expecting this, knowing that you are working on something that big)

zakkak marked this pull request as ready for review on March 16, 2021
@zakkak commented Mar 16, 2021

@adinn can you please have a look? :)

@zakkak commented Mar 16, 2021

Note: The current implementation uses "pseudo-method" entries for the nested inline methods. So, instead of using DW_AT_specification to link from an inlined method location to the corresponding method declaration, I use DW_AT_name, DW_AT_decl_file, and DW_AT_decl_line to provide enough info for gdb to produce the backtrace.
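
For illustration, such a pseudo-method entry might dump along these lines; the offsets, name, and file/line values below are hypothetical, not taken from an actual build:

 <2><9a>: Abbrev Number: 6 (DW_TAG_subprogram)
    <9b>   DW_AT_name        : com.example.Foo::bar
    <9f>   DW_AT_decl_file   : 0x2
    <a3>   DW_AT_decl_line   : 0x17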

zakkak marked this pull request as draft on March 17, 2021
zakkak force-pushed the oracle/graal/fix-2701 branch 6 times, most recently from cc06434 to df42dd2, on June 29, 2021
@zakkak commented Jun 29, 2021

@adinn can you please review this?

This PR associates the ranges corresponding to inlined methods with their "caller" ranges and then uses that info to generate the corresponding concrete inlined instance trees.

Note that the implementation still generates more concrete instance trees than we actually need (see "Known Issues" in #2880 (comment)). My understanding is that in order to solve this we would need to change the way we represent subranges, from a flat list of contiguous memory addresses to a list of subtrees representing the inlined call trees. I would like to work on this in a different PR if you agree.
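
For concreteness, here is a minimal sketch of the caller/callee association described above; the class shape and field names are illustrative, not the actual SubstrateVM Range implementation:

import java.util.ArrayList;
import java.util.List;

// Each inlined-method range records the range it was inlined into; the
// resulting parent/child links form the concrete inlined instance tree
// that DIE generation walks.
class Range {
    final String methodName;
    final long lo, hi;            // address interval covered by this range
    final Range caller;           // null for the top-level compiled method
    final List<Range> callees = new ArrayList<>();

    Range(String methodName, long lo, long hi, Range caller) {
        this.methodName = methodName;
        this.lo = lo;
        this.hi = hi;
        this.caller = caller;
        if (caller != null) {
            caller.callees.add(this);
        }
    }

    // Nesting depth, i.e. the <n> level shown in the DIE dumps above.
    int depth() {
        return caller == null ? 1 : caller.depth() + 1;
    }
}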

Regarding performance and debug-info size, please see the following up-to-date table:

                                     master        this branch
-H:+OmitInlinedMethodDebugLineInfo   ~82s / 14MB   ~82s / 14MB
-H:-OmitInlinedMethodDebugLineInfo   ~90s / 17MB   ~330s / 65MB

@adinn commented Jul 5, 2021

Note that the implementation still generates more concrete instance trees than we actually need (see "Known Issues" in #2880 (comment)). My understanding is that in order to solve this we would need to change the way we represent subranges, from a flat list of contiguous memory addresses to a list of subtrees representing the inlined call trees. I would like to work on this in a different PR if you agree.

I don't agree that this is the only way to solve the problem, or even the best one. As the comment in #2880 makes clear, this problem arises because there are methods, and sometimes even owning classes for those methods, mentioned in the inline info that are not notified via the DebugTypeInfo traversal. As a result, the set of ClassEntry and MethodEntry instances is incomplete when inline method processing starts.

This is best fixed by completing the set before doing this inline method processing. That will ensure that whenever an inline info record for method m of class C needs to refer to its specification method (an info record with tag DW_TAG_subprogram) there will be a compilation unit defining class C with a child method specification declaring method m for it to reference.

This requires introducing an intermediate processing step preceding the current DebugCodeInfo traversal. A prior traversal of DebugCodeInfo can identify missing methods and classes and create entries for them. This requires that the ClassEntry and MethodEntry instances introduced during this completion step can be distinguished as incomplete types/methods for which, respectively, no class structure or method implementation is defined (n.b. that might be best achieved by refactoring the Entry class hierarchy to introduce special subtypes of TypeEntry and MemberEntry that share some behavior with ClassEntry and MethodEntry, as sketched below).
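
A hedged sketch of the distinction being proposed; the subtype names are illustrative, not part of the existing debuginfo model:

// Entries created by the normal DebugCodeInfo/DebugTypeInfo traversals
// carry full information; entries created by the prior completion pass
// exist only so that DW_AT_abstract_origin has something to point at.
abstract class MethodEntry {
    final String methodName;
    MethodEntry(String methodName) { this.methodName = methodName; }
    abstract boolean hasCompiledCode();
}

final class CompiledMethodEntry extends MethodEntry {
    CompiledMethodEntry(String name) { super(name); }
    // Full DW_TAG_subprogram with DW_AT_low_pc/DW_AT_high_pc.
    @Override boolean hasCompiledCode() { return true; }
}

final class InlinedOnlyMethodEntry extends MethodEntry {
    InlinedOnlyMethodEntry(String name) { super(name); }
    // Declaration-only DW_TAG_subprogram: no code address range.
    @Override boolean hasCompiledCode() { return false; }
}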

I think it would be much better to implement this prior step and modify the current patch to profit from it than commit it as is. I will be happy to provide an implementation unless you wish to attempt it.

@olpaw commented Aug 24, 2021

@adinn @zakkak I ran a js-image build with and without -g on this PR on my local machine (Dell XPS 17 9700).

oracle/graal/fix-2701 with debuginfo:

$ mx native-image -g --macro:js-launcher
...
[js:7833]      compile:  33,453.86 ms,  6.67 GB
[js:7833]      dbginfo:  93,261.71 ms,  6.91 GB
[js:7833]        image: 103,866.72 ms,  7.33 GB
[js:7833]        write:  10,052.75 ms,  7.41 GB
[js:7833]      [total]: 205,244.03 ms,  7.41 GB
# Printing build artifacts to: /home/pwoegere/OLabs/issues/ce-debug/mandrel/sdk/mxbuild/linux-amd64/GRAALVM_E667026FC4_JAVA11/graalvm-e667026fc4-java11-21.3.0-dev/lib/svm/macros/js-launcher/../../../../languages/js/bin/js.build_artifacts.txt

oracle/graal/fix-2701 without debuginfo:

$ mx native-image --macro:js-launcher
...
[js:8807]      compile:  30,523.36 ms,  6.96 GB
[js:8807]        image:   5,942.89 ms,  6.96 GB
[js:8807]        write:     959.05 ms,  6.86 GB
[js:8807]      [total]:  85,194.27 ms,  6.86 GB
# Printing build artifacts to: /home/pwoegere/OLabs/issues/ce-debug/mandrel/sdk/mxbuild/linux-amd64/GRAALVM_E667026FC4_JAVA11/graalvm-e667026fc4-java11-21.3.0-dev/lib/svm/macros/js-launcher/../../../../languages/js/bin/js.build_artifacts.txt

I also did the same test with GraalVM 21.2.0.1 Java 11 EE (using EE debuginfo generation)

GraalVM 21.2.0.1 Java 11 EE with debuginfo:

$ ~/OLabs/graalvm-11/bin/native-image -g --macro:js-launcher
...
[js:10442]      compile:  65,240.21 ms,  8.72 GB
[js:10442]        image:  12,492.59 ms,  8.72 GB
[js:10442]    debuginfo:   6,792.39 ms,  8.24 GB
[js:10442]        write:  12,636.93 ms,  8.01 GB
[js:10442]      [total]: 162,886.10 ms,  8.01 GB
# Printing build artifacts to: /home/pwoegere/.local/graalvm/graalvm-ee-java11-21.2.0.1/lib/svm/macros/js-launcher/../../../../languages/js/bin/js.build_artifacts.txt

GraalVM 21.2.0.1 Java 11 EE without debuginfo:

$ ~/OLabs/graalvm-11/bin/native-image --macro:js-launcher -H:Debug=0
[js:11298]    classlist:   1,325.93 ms,  0.96 GB
--initialize-at-build-time without arguments has been deprecated and will be removed in GraalVM 22.0.
...
[js:11298]      compile:  65,930.98 ms,  8.50 GB
[js:11298]        image:  12,301.30 ms,  8.34 GB
[js:11298]        write:   1,426.32 ms,  8.34 GB
[js:11298]      [total]: 141,971.46 ms,  8.34 GB
# Printing build artifacts to: /home/pwoegere/.local/graalvm/graalvm-ee-java11-21.2.0.1/lib/svm/macros/js-launcher/../../../../languages/js/bin/js.build_artifacts.txt

The difference is worrying. With the new implementation, building with debuginfo more than doubles the total build time.

With these numbers I'm not sure anymore whether setting OmitInlinedMethodDebugLineInfo to false by default is a good idea.

One big difference in EE debuginfo generation is that we only emit a single CU for the whole image. Could it be that the use of one CU per class is a major factor in creating all that overhead compared to EE debuginfo generation? Are there any plans to further improve CE debuginfo generation build time?

@zakkak commented Aug 24, 2021

The difference is worrying. With the new implementation, building with debuginfo more than doubles the total build time.

With these numbers I'm not sure anymore whether setting OmitInlinedMethodDebugLineInfo to false by default is a good idea.

One big difference in EE debuginfo generation is that we only emit a single CU for the whole image. Could it be that the use of one CU per class is a major factor in creating all that overhead compared to EE debuginfo generation?

Intuitively, I would say no. If I had to guess (given that I don't have access to the EE implementation), I would attribute the difference to something like using "pseudo-methods" for inline methods, as I did initially in this PR (#2880 (comment)). @adinn might be able to give a better explanation, since the difference could also be due to the type info we generate in CE (which I guess EE doesn't generate, given the single CU, but I might be wrong).

Are there any plans to further improve CE debuginfo generation build time?

I am certainly interested in doing some profiling to figure out where we spend most of the time and see how we could optimize things.

@adinn commented Aug 24, 2021

@olpaw I just noticed that the figures I provided were skewed by configuring logging during debug generation. The corrected timings and sizes are as follows:

run   debuginfo   inlineinfo   isolates   size       dbginfo time   total time
1     omit        n/a          off        10180520   0              55,974.20
2     include     omit         off        12444456   1,914.40       56,157.45
3     include     include      off        32307072   11,217.54      63,919.24
4     omit        n/a          on         11192384   0              50,620.07
5     include     omit         on         13738400   1,733.80       53,313.88
6     include     include      on         34343088   11,821.70      65,512.47

Individual runs are as follows:

Run 1 (debuginfo omit, inlineinfo n/a, isolates off): size 10180520, dbginfo time 0, total time 55,974.20
[adinn@sputstik substratevm]$ bash do_native_image -debug -debuginfo -omitinline -isolates -log Hello
[hello:2388596]    classlist:   3,467.21 ms,  0.96 GB
[hello:2388596]        (cap):     599.38 ms,  0.96 GB
[hello:2388596]        setup:   2,606.94 ms,  0.96 GB
[hello:2388596]     (clinit):     231.79 ms,  2.30 GB
[hello:2388596]   (typeflow):   6,493.25 ms,  2.30 GB
[hello:2388596]    (objects):   7,559.47 ms,  2.30 GB
[hello:2388596]   (features):   1,383.11 ms,  2.30 GB
[hello:2388596]     analysis:  15,988.25 ms,  2.30 GB
[hello:2388596]     universe:   1,342.81 ms,  2.30 GB
[hello:2388596]      (parse):   2,176.86 ms,  3.19 GB
[hello:2388596]     (inline):   2,625.46 ms,  3.19 GB
[hello:2388596]    (compile):  22,552.72 ms,  4.38 GB
[hello:2388596]      compile:  30,392.48 ms,  4.38 GB
[hello:2388596]        image:   1,512.35 ms,  4.38 GB
[hello:2388596]        write:     337.22 ms,  4.38 GB
[hello:2388596]      [total]:  55,974.20 ms,  4.38 GB
# Printing build artifacts to: /home/adinn/redhat/openjdk/graal/graal/substratevm/hello.build_artifacts.txt
[adinn@sputstik substratevm]$ ls -l hello
-rwxrwxr-x. 1 adinn adinn 10180520 Aug 24 12:21 hello
Run 2 (debuginfo include, inlineinfo omit, isolates off): size 12444456, dbginfo time 1,914.40, total time 56,157.45
[adinn@sputstik substratevm]$ bash do_native_image -debug +debuginfo +omitinline -isolates -log Hello
[hello:2389139]    classlist:   3,317.36 ms,  0.96 GB
[hello:2389139]        (cap):     632.99 ms,  0.96 GB
[hello:2389139]        setup:   3,098.77 ms,  0.96 GB
[hello:2389139]     (clinit):     247.46 ms,  2.30 GB
[hello:2389139]   (typeflow):   6,048.36 ms,  2.30 GB
[hello:2389139]    (objects):   7,012.20 ms,  2.30 GB
[hello:2389139]   (features):   1,436.59 ms,  2.30 GB
[hello:2389139]     analysis:  15,062.28 ms,  2.30 GB
[hello:2389139]     universe:   1,762.90 ms,  2.30 GB
[hello:2389139]      (parse):   2,082.50 ms,  2.30 GB
[hello:2389139]     (inline):   2,569.36 ms,  3.13 GB
[hello:2389139]    (compile):  20,737.01 ms,  4.47 GB
[hello:2389139]      compile:  28,176.02 ms,  4.47 GB
[hello:2389139]      dbginfo:   1,914.40 ms,  4.47 GB
[hello:2389139]        image:   3,637.74 ms,  4.47 GB
[hello:2389139]        write:     828.11 ms,  4.47 GB
[hello:2389139]      [total]:  56,157.45 ms,  4.47 GB
# Printing build artifacts to: /home/adinn/redhat/openjdk/graal/graal/substratevm/hello.build_artifacts.txt
[adinn@sputstik substratevm]$ ls -l hello
-rwxrwxr-x. 1 adinn adinn 12444456 Aug 24 12:23 hello
Run 3 (debuginfo include, inlineinfo include, isolates off): size 32307072, dbginfo time 11,217.54, total time 63,919.24
[adinn@sputstik substratevm]$ bash do_native_image -debug +debuginfo -omitinline -isolates -log Hello
[hello:2389679]    classlist:   2,651.40 ms,  0.96 GB
[hello:2389679]        (cap):     672.29 ms,  0.96 GB
[hello:2389679]        setup:   3,441.30 ms,  0.96 GB
[hello:2389679]     (clinit):     197.86 ms,  2.31 GB
[hello:2389679]   (typeflow):   5,833.55 ms,  2.31 GB
[hello:2389679]    (objects):   6,887.53 ms,  2.31 GB
[hello:2389679]   (features):     800.76 ms,  2.31 GB
[hello:2389679]     analysis:  14,059.57 ms,  2.31 GB
[hello:2389679]     universe:   1,310.38 ms,  2.31 GB
[hello:2389679]      (parse):   1,975.55 ms,  2.31 GB
[hello:2389679]     (inline):   1,980.66 ms,  3.14 GB
[hello:2389679]    (compile):  19,761.21 ms,  4.47 GB
[hello:2389679]      compile:  26,639.39 ms,  4.48 GB
[hello:2389679]      dbginfo:  11,217.54 ms,  4.48 GB
[hello:2389679]        image:  12,724.97 ms,  4.48 GB
[hello:2389679]        write:   2,826.68 ms,  4.47 GB
[hello:2389679]      [total]:  63,919.24 ms,  4.47 GB
# Printing build artifacts to: /home/adinn/redhat/openjdk/graal/graal/substratevm/hello.build_artifacts.txt
[adinn@sputstik substratevm]$ ls -l hello
-rwxrwxr-x. 1 adinn adinn 32307072 Aug 24 12:25 hello
Run 4 (debuginfo omit, inlineinfo n/a, isolates on): size 11192384, dbginfo time 0, total time 50,620.07
[adinn@sputstik substratevm]$ bash do_native_image -debug -debuginfo -omitinline +isolates -log Hello
[hello:2390896]    classlist:   2,873.23 ms,  1.21 GB
[hello:2390896]        (cap):     671.07 ms,  1.21 GB
[hello:2390896]        setup:   3,099.24 ms,  1.21 GB
[hello:2390896]     (clinit):     200.87 ms,  2.71 GB
[hello:2390896]   (typeflow):   5,776.27 ms,  2.71 GB
[hello:2390896]    (objects):   6,233.92 ms,  2.71 GB
[hello:2390896]   (features):   1,550.34 ms,  2.71 GB
[hello:2390896]     analysis:  14,168.24 ms,  2.71 GB
[hello:2390896]     universe:   1,095.76 ms,  2.71 GB
[hello:2390896]      (parse):   1,460.66 ms,  2.71 GB
[hello:2390896]     (inline):   1,564.57 ms,  3.83 GB
[hello:2390896]    (compile):  21,595.04 ms,  4.42 GB
[hello:2390896]      compile:  27,511.83 ms,  4.42 GB
[hello:2390896]        image:   1,418.68 ms,  4.42 GB
[hello:2390896]        write:     187.53 ms,  4.42 GB
[hello:2390896]      [total]:  50,620.07 ms,  4.42 GB
# Printing build artifacts to: /home/adinn/redhat/openjdk/graal/graal/substratevm/hello.build_artifacts.txt
[adinn@sputstik substratevm]$ ls -l hello
-rwxrwxr-x. 1 adinn adinn 11192384 Aug 24 13:04 hello
Run 5 (debuginfo include, inlineinfo omit, isolates on): size 13738400, dbginfo time 1,733.80, total time 53,313.88
[adinn@sputstik substratevm]$ bash do_native_image -debug +debuginfo +omitinline +isolates -log Hello
[hello:2391439]    classlist:   2,851.83 ms,  0.96 GB
[hello:2391439]        (cap):     638.89 ms,  0.96 GB
[hello:2391439]        setup:   3,341.74 ms,  0.96 GB
[hello:2391439]     (clinit):     195.10 ms,  2.31 GB
[hello:2391439]   (typeflow):   5,925.15 ms,  2.31 GB
[hello:2391439]    (objects):   7,930.42 ms,  2.31 GB
[hello:2391439]   (features):     815.66 ms,  2.31 GB
[hello:2391439]     analysis:  15,215.00 ms,  2.31 GB
[hello:2391439]     universe:   1,295.75 ms,  2.31 GB
[hello:2391439]      (parse):   1,948.72 ms,  2.31 GB
[hello:2391439]     (inline):   2,194.23 ms,  3.14 GB
[hello:2391439]    (compile):  20,056.46 ms,  4.46 GB
[hello:2391439]      compile:  26,707.75 ms,  4.46 GB
[hello:2391439]      dbginfo:   1,733.80 ms,  4.46 GB
[hello:2391439]        image:   2,955.22 ms,  4.46 GB
[hello:2391439]        write:     677.07 ms,  4.46 GB
[hello:2391439]      [total]:  53,313.88 ms,  4.46 GB
# Printing build artifacts to: /home/adinn/redhat/openjdk/graal/graal/substratevm/hello.build_artifacts.txt
[adinn@sputstik substratevm]$ ls -l hello
-rwxrwxr-x. 1 adinn adinn 13738400 Aug 24 13:05 hello
Run 6 (debuginfo include, inlineinfo include, isolates on): size 34343088, dbginfo time 11,821.70, total time 65,512.47
[adinn@sputstik substratevm]$ bash do_native_image -debug +debuginfo -omitinline +isolates -log Hello
[hello:2391971]    classlist:   2,746.48 ms,  1.21 GB
[hello:2391971]        (cap):     585.69 ms,  1.21 GB
[hello:2391971]        setup:   3,173.74 ms,  1.21 GB
[hello:2391971]     (clinit):     213.85 ms,  2.71 GB
[hello:2391971]   (typeflow):   5,923.40 ms,  2.71 GB
[hello:2391971]    (objects):   7,767.70 ms,  2.71 GB
[hello:2391971]   (features):     880.01 ms,  2.71 GB
[hello:2391971]     analysis:  15,120.81 ms,  2.71 GB
[hello:2391971]     universe:   1,339.80 ms,  2.71 GB
[hello:2391971]      (parse):   1,892.37 ms,  2.71 GB
[hello:2391971]     (inline):   2,374.36 ms,  3.82 GB
[hello:2391971]    (compile):  20,252.40 ms,  4.42 GB
[hello:2391971]      compile:  27,376.33 ms,  4.42 GB
[hello:2391971]      dbginfo:  11,821.70 ms,  4.43 GB
[hello:2391971]        image:  13,040.10 ms,  4.44 GB
[hello:2391971]        write:   2,437.61 ms,  4.44 GB
[hello:2391971]      [total]:  65,512.47 ms,  4.44 GB
# Printing build artifacts to: /home/adinn/redhat/openjdk/graal/graal/substratevm/hello.build_artifacts.txt
[adinn@sputstik substratevm]$ ls -l hello
-rwxrwxr-x. 1 adinn adinn 34343088 Aug 24 13:06 hello

@adinn commented Aug 24, 2021

@olpaw wrote:

One big difference in EE debuginfo generation is that we only emit a single CU for the whole image. Could it be that the use of one CU per class is a major factor in creating all that overhead compared to EE debuginfo generation? Are there any plans to further improve CE debuginfo generation build time?

I don't expect collapsing all CUs into one CU will make a great deal of difference to either space or generation time if the single CU still contains the same content. None of the CUs is very small, so the cost of writing an extra header is small.

A more likely thing that could improve generation time/size is to be more careful about what content we actually generate. We should definitely look for opportunities to improve both on space and generate time. Clearly the way to proceed is to profile and then concentrate on the stuff that happens most frequently or takes the longest time to generate.

One specific thing we should definitely look at is the two-pass model for generating the DWARF sections. The first pass figures out sizes and computes offsets for inter-DIE references, and the second assembles the content. If we switched to resizeable content buffers supplemented with reference-resolution patching, we might well cut execution time by a significant amount. Of course, it's jumping the gun to assume this is the most important issue. It may well be that the most significant portion of the time is spent assembling the generic code & file model that drives DWARF section generation, or in creating/checking the validity of/updating the source file cache.
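
As a rough illustration of the alternative, here is a single-pass writer that records fixups for forward DIE references and patches them once target offsets are known; the buffer and fixup shapes are assumptions, not the current generator's API:

import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PatchingDwarfWriter {
    private final ByteArrayOutputStream buf = new ByteArrayOutputStream();
    private final List<long[]> fixups = new ArrayList<>();   // {bufferPosition, targetDieId}
    private final Map<Long, Integer> dieOffsets = new HashMap<>();

    // Record the section-relative offset at which a DIE starts.
    void markDieStart(long dieId) { dieOffsets.put(dieId, buf.size()); }

    // Write a 4-byte placeholder for a DIE reference whose target offset
    // may not be known yet, remembering where to patch it later.
    void writeDieRef(long targetDieId) {
        fixups.add(new long[]{buf.size(), targetDieId});
        for (int i = 0; i < 4; i++) buf.write(0);
    }

    // Resolve all recorded references in place and return the section bytes.
    byte[] finish() {
        byte[] bytes = buf.toByteArray();
        for (long[] f : fixups) {
            int pos = (int) f[0];
            int off = dieOffsets.get(f[1]);
            for (int i = 0; i < 4; i++) {
                bytes[pos + i] = (byte) (off >>> (8 * i));    // little-endian ref4-style slot
            }
        }
        return bytes;
    }
}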

@adinn commented Aug 24, 2021

@zakkak wrote:

@adinn inlined methods invoked by a substitute method appear to be omitted. I'll have a better look, but if you have any insight it would be much appreciated. See here.

I'm looking into this now.

@adinn commented Aug 24, 2021

@adinn wrote:

I don't expect collapsing all CUs into one CU will make a great deal of difference to either space or generation time if the single CU still contains the same content. None of the CUs is very small, so the cost of writing an extra header is small.

Full disclosure: having just one CU would make a small difference to the line info size (probably with a comparable reduction in writing time), because it would mean constructing and writing only one global file table and one global dir table. That would avoid a certain amount of repetition in indexing/processing file names and paths that are referenced from more than one class.

For Hello.java the UTF-8 bytes for the file and dir names take ~890KB when separate file and dir tables are written per CU (i.e. per class). After sorting these names and removing duplicates, the UTF-8 byte count came down to ~27KB. However, saving on assembling and writing ~860KB of string bytes is not going to make a lot of difference given the total debuginfo size of ~34MB.
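
The comparison above can be estimated with a few lines; this is a sketch assuming the per-CU file and dir name lists are available as plain strings:

import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.TreeSet;

class NameTableSizes {
    // Total UTF-8 byte count of a collection of names.
    static long utf8Bytes(Iterable<String> names) {
        long total = 0;
        for (String n : names) {
            total += n.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    // Per-CU tables repeat names across CUs; a single global table keeps
    // each name once, which is what the sorted de-duplicated set models.
    static long globalTableBytes(List<List<String>> perCuNames) {
        TreeSet<String> global = new TreeSet<>();
        perCuNames.forEach(global::addAll);
        return utf8Bytes(global);
    }
}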

@zakkak commented Aug 24, 2021

One specific thing we should definitely look at is the two-pass model for generating the DWARF sections. The first pass figures out sizes and computes offsets for inter-DIE references, and the second assembles the content. If we switched to resizeable content buffers supplemented with reference-resolution patching, we might well cut execution time by a significant amount. Of course, it's jumping the gun to assume this is the most important issue. It may well be that the most significant portion of the time is spent assembling the generic code & file model that drives DWARF section generation, or in creating/checking the validity of/updating the source file cache.

After some profiling it looks like the two-pass model is not our main issue (at least for now).
About 90% of the time spent in dbginfo is actually spent in the third phase, where we generate the Ranges.
This makes sense, since it matches the fact that when we do not omit inline methods (i.e. when we generate more Ranges) we see a steep increase in execution time.
Apart from the numerous data structures (e.g. class index, type index, etc.) involved in Range generation, a large portion of the time appears to be spent constructing Strings using string builders! We are constructing at least four Strings (className, methodName, paramSignature, returnType) for each DebugCodeInfo and DebugLineInfo.
If you agree, I will start by exploring how we could reduce these instances (either by caching names for known types or by avoiding the creation of new Strings where not strictly necessary).
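
A first-cut sketch of the caching idea; the key type and helper are assumptions, not existing SubstrateVM code:

import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Build each name string at most once per type/method, instead of
// re-running a StringBuilder for every DebugCodeInfo/DebugLineInfo.
class DebugNameCache {
    private final Map<Object, String> cache = new HashMap<>();

    // 'key' stands in for whatever identity the model keys names on
    // (e.g. a ResolvedJavaType or ResolvedJavaMethod).
    String cached(Object key, Supplier<String> build) {
        return cache.computeIfAbsent(key, k -> build.get());
    }
}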

@adinn commented Aug 24, 2021

If you agree, I will start by exploring how we could reduce these instances (either by caching names for known types or by avoiding the creation of new Strings where not strictly necessary).

Sure, go ahead.

@zakkak commented Aug 25, 2021

Update: I have managed to bring the time consumed in dbginfo down to ~7s from ~13-14s.

See below for some rough numbers from 6 runs with inline methods included:

$ grep -e dbginfo -e image -e total noomitinlined.txt                                                                                
[hello.hello:939716]      dbginfo:   7,068.08 ms,  4.63 GB
[hello.hello:939716]        image:   9,244.38 ms,  4.63 GB
[hello.hello:939716]      [total]:  66,139.50 ms,  5.07 GB
[hello.hello:940030]      dbginfo:   6,720.81 ms,  5.14 GB
[hello.hello:940030]        image:   8,835.13 ms,  5.14 GB
[hello.hello:940030]      [total]:  72,484.42 ms,  5.14 GB
[hello.hello:940345]      dbginfo:   7,425.84 ms,  4.64 GB
[hello.hello:940345]        image:   9,534.37 ms,  4.64 GB
[hello.hello:940345]      [total]:  74,840.29 ms,  5.09 GB
[hello.hello:940667]      dbginfo:   6,927.02 ms,  4.65 GB
[hello.hello:940667]        image:   9,588.95 ms,  4.65 GB
[hello.hello:940667]      [total]:  68,857.80 ms,  5.04 GB
[hello.hello:940985]      dbginfo:   7,438.26 ms,  4.65 GB
[hello.hello:940985]        image:  10,134.49 ms,  4.65 GB
[hello.hello:940985]      [total]:  69,913.16 ms,  5.05 GB
[hello.hello:941311]      dbginfo:   7,134.88 ms,  5.11 GB
[hello.hello:941311]        image:   9,053.28 ms,  5.11 GB
[hello.hello:941311]      [total]:  68,700.67 ms,  5.11 GB

and from 6 runs with inline methods omitted:

$ grep -e dbginfo -e image -e total omitinlined.txt  
[hello.hello:937744]      dbginfo:   2,300.81 ms,  4.18 GB
[hello.hello:937744]        image:   4,416.39 ms,  4.18 GB
[hello.hello:937744]      [total]:  58,817.67 ms,  4.18 GB
[hello.hello:938047]      dbginfo:   2,039.19 ms,  4.19 GB
[hello.hello:938047]        image:   4,247.20 ms,  4.19 GB
[hello.hello:938047]      [total]:  61,960.91 ms,  4.19 GB
[hello.hello:938352]      dbginfo:   1,959.37 ms,  4.62 GB
[hello.hello:938352]        image:   3,943.96 ms,  4.62 GB
[hello.hello:938352]      [total]:  64,410.12 ms,  4.62 GB
[hello.hello:938688]      dbginfo:   1,903.15 ms,  4.63 GB
[hello.hello:938688]        image:   3,948.00 ms,  4.63 GB
[hello.hello:938688]      [total]:  68,297.06 ms,  4.63 GB
[hello.hello:938997]      dbginfo:   2,322.57 ms,  4.19 GB
[hello.hello:938997]        image:   4,559.33 ms,  4.19 GB
[hello.hello:938997]      [total]:  67,075.78 ms,  4.19 GB
[hello.hello:939313]      dbginfo:   2,007.51 ms,  4.65 GB
[hello.hello:939313]        image:   4,276.47 ms,  4.65 GB
[hello.hello:939313]      [total]:  63,129.73 ms,  4.65 GB

@olpaw commented Aug 25, 2021

Update: I have managed to bring the time consumed in dbginfo down to ~7s from ~13-14s.

Looks promising @zakkak

I will give it a try building the js image with -g tomorrow. Thanks for your effort!

@adinn commented Aug 27, 2021

@zakkak wrote:

@adinn inlined methods invoked by a substitute method appear to be omitted. I'll have a better look, but if you have any insight it would be much appreciated. See here.

I have not yet investigated the newer cases, but in the original case, concerning inlining into a substituted method, the problem clearly arises during compilation. The list of ranges for the substituted method (all 2 of them :-) includes a range that covers the call to the innermost non-inlined method. The line number for this range is correctly placed at the source-code call site for that called method. However, the call hierarchy for the range is empty. There is no mention of the intervening inlined methods. So it looks like we got there directly from the call to the substituted method.

I am investigating the compiler code to see why inlining in a substituted method loses call hierarchy info. Once I understand what is going wrong there (and, if it is simple enough, have identified a fix) I will look into the other cases you brought up.

@adinn commented Aug 27, 2021

@zakkak

I have found the problem with the missing inline frames in substituted calls. The problem happens when the original invoke node is copied. The fix is simply to copy across the node source position in InvokeWithExceptionNode::replaceInvoke(). A patch to fix this and update the debuginfo test script to check for the correct nodes is here.
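
Sketched from the description above (not the verbatim patch), the essential change is a one-liner when the replacement invoke node is created; Graal graph nodes expose getNodeSourcePosition()/setNodeSourcePosition() for this, and the oldInvoke/newInvoke names here are illustrative:

// Inside InvokeWithExceptionNode::replaceInvoke (illustrative): the
// replacement node must inherit the original's source position, since
// that position carries the inlined-caller chain the debuginfo needs.
newInvoke.setNodeSourcePosition(oldInvoke.getNodeSourcePosition());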

N.b. the backtrace listing for the inlined methods (frames #2 and #3) lists them as hello.SubstituteHelperClass::staticInlineGreet() and hello.SubstituteHelperClass::inlineGreet(), i.e. it does not specify the argument lists as (void). That is because these methods only exist as inlined code and so are not detailed via a type notification through the DebugTypeInfo API. Hence the specification of parameters for these methods in the class is omitted. By contrast, in the other greet method backtrace lines (#1 and #4) we see hello.SubstituteHelperClass::nestedGreet(void) and hello.Hello$DefaultGreeter::greet(void), because these appear as top-level compiled methods and so have a full specification of their parameters.

@adinn commented Aug 27, 2021

@olpaw if you could cherry-pick the commit I mentioned above into this branch, that would ensure that we get proper handling of inlined methods when there is a substitution.

@adinn commented Aug 27, 2021

If that is a problem then I can raise it as a follow-up PR.

@olpaw commented Aug 27, 2021

@olpaw if you could cherry-pick the commit I mentioned above into this branch, that would ensure that we get proper handling of inlined methods when there is a substitution.

@adinn you should be able to commit to @zakkak's branch just fine. I have successfully done this on other GitHub branches (!= my own) in the past.

@adinn commented Aug 27, 2021

@adinn you should be able to commit to @zakkak's branch just fine. I have successfully done this on other GitHub branches (!= my own) in the past.

@olpaw Ok, I'll give it a try

@adinn commented Aug 31, 2021

@olpaw wrote:

@adinn you should be able to commit to @zakkak's branch just fine. I have successfully done this on other GitHub branches (!= my own) in the past.

I am not able to commit to zakkak's repo. I have raised a delta PR against his PR branch from my own updated copy of the branch.

@olpaw commented Aug 31, 2021

@olpaw wrote:

@adinn you should be able to commit to @zakkak's branch just fine. I have successfully done this on other GitHub branches (!= my own) in the past.

I am not able to commit to zakkak's repo. I have raised a delta PR against his PR branch from my own updated copy of the branch.

Ahh bummer. I'm afraid only @zakkak can merge that delta PR back into his repo. To fix this we have to use a different approach.

@adinn please create a new PR, cherry-pick @zakkak's commits, and then put your commit on top. I will then use this new PR for internal testing. Sorry for the inconvenience and thanks for your patience.

@adinn commented Sep 1, 2021

@olpaw I have raised PR #3745, which includes the necessary compiler patch plus updates to the debuginfotest driver script that verify that all relevant inline and non-inline call trees are displayed as expected. If you could switch your review to that PR we can close this one.

@olpaw commented Sep 1, 2021

Superseded by #3745

olpaw closed this on Sep 1, 2021
zakkak deleted the oracle/graal/fix-2701 branch on September 14, 2021

Successfully merging this pull request may close these issues: Inlined methods not referenceable from gdb when compiling with -g.