LLVM Source Level debug info support #4964
base: master
Conversation
…o in the IR. Tested on a sample helloworld program
…o in the IR. Tested on a sample helloworld program and added comments Checkstyle passes
…al into LLVMDebugInfoSupport
These look like cool changes, thanks a lot! Let me know when you've had the time to look into my comments.
@@ -470,7 +470,9 @@ public void build(String imageName, DebugContext debug) {
     if (SubstrateOptions.GenerateDebugInfo.getValue(HostedOptionValues.singleton()) > 0) {
         Timer timer = TimerCollection.singleton().get(TimerCollection.Registry.DEBUG_INFO);
         try (Timer.StopTimer t = timer.start()) {
-            ImageSingletons.add(SourceManager.class, new SourceManager());
+            if (ImageSingletons.contains(SourceManager.class) == false) {
This will try to add debug info the Native Image way, by generating it. I think what we want is to let the LLVM compiler handle this, so I think this if should be omitted altogether when using the LLVM backend.
The SourceManager singleton really ought to be registered in NativeImageDebugInfoFeature, conditional on debuginfo generation being enabled.
That seems to be a cleaner solution indeed.
@@ -49,5 +49,8 @@ public class LLVMOptions {
     @Option(help = "Enable LLVM bitcode optimizations")//
     public static final HostedOptionKey<Boolean> BitcodeOptimizations = new HostedOptionKey<>(false);

+    @Option(help = "Include source code level debug info in the output LLVM IR", type = OptionType.Debug)//
+    public static final HostedOptionKey<Boolean> IncludeLLVMSourceDebugInfo = new HostedOptionKey<>(false);
I wouldn't add a new option for this; LLVM debug info generation should be tied to the existing GenerateDebugInfo option.
Yes, I agree. Enabling debug info generation and selecting the generated target type are (must be) orthogonal choices.
Makes sense, I'll remove this option and just use GenerateDebugInfo
if (LLVMOptions.IncludeLLVMSourceDebugInfo.getValue()) {
    // TODO: Avoid this lock if possible
    imageSingletonesLock.lock();
    if (ImageSingletons.contains(SourceManager.class) == false) {
This should probably be added in LLVMFeature.beforeCompilation once, and SourceManager should be made thread-safe.
No, NativeImageDebugInfoFeature is the right place to do this, as it is relevant to all debug info generation code, not just the LLVM debug info generator.
Yes, SourceManager is definitely not thread-safe at the moment.
I agree, having all management related to SourceManager in NativeImageDebugInfoFeature makes sense. The handling of debug information on the LLVM backend has to happen in the compilation phase, so making SourceManager thread-safe can't be avoided, I think. It should be as simple as changing its data structures to thread-safe ones though, so not a big hurdle.
Well, this is a cache manager, so it is not quite so simple as just updating the data structures. The important thing to get right is to ensure that the check as to whether a source file is cached in the sources directory and any subsequent lookup and copy from some originating jar into the sources directory happen atomically. A secondary constraint is that a failed lookup must record the lack of any suitable source atomically, so that we don't waste time repeating lookups.
Like all concurrent caches, there are different choices available to enforce consistency, ranging from the sledgehammer solution of synchronizing on the whole cache at every lookup, through synchronizing on some proxy that exists per cached file, right down to a racy lookup and copy of sources that only synchronizes at the point of linking the copied file into the source tree.
I am happy to review any proposed fix. However, I am also happy to implement a solution, which may be quicker as I am already familiar with the code. @rishikeshdevsot @loicottet please confirm how you would like to proceed.
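Of the strategies listed above, the per-key synchronization option can be sketched with `ConcurrentHashMap.computeIfAbsent`, which makes each lookup-and-copy atomic per source key and also records failed lookups so they are never repeated. All names here are hypothetical; this is a sketch of the concurrency pattern, not the real SourceManager API:

```java
import java.nio.file.Path;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a thread-safe source cache; not the real SourceManager.
class SourceCacheSketch {
    // Maps a source key to its cached path, or Optional.empty() when a
    // previous lookup already failed (negative caching avoids repeat lookups).
    private final ConcurrentHashMap<String, Optional<Path>> cache = new ConcurrentHashMap<>();

    Optional<Path> findAndCacheSource(String sourceKey) {
        // computeIfAbsent runs the lookup at most once per key, atomically,
        // so concurrent callers never race on the copy into the sources dir.
        return cache.computeIfAbsent(sourceKey, this::lookupAndCopy);
    }

    private Optional<Path> lookupAndCopy(String sourceKey) {
        // Placeholder for the real work: probe the originating jar and copy
        // the source file into the sources directory. A failed probe returns
        // Optional.empty(), which is cached so the lookup is not retried.
        if (sourceKey.endsWith(".java")) {
            return Optional.of(Path.of("sources", sourceKey));
        }
        return Optional.empty();
    }
}
```

One trade-off of this middle route: the map entry acts as the per-file proxy, so two threads asking for different files never contend, while two threads asking for the same file serialize only on that key.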
@adinn I am okay with you implementing a solution since you are familiar with the problems and the code
Ok, I'll prepare that in a separate PR and you can rebase this PR on it.
@@ -796,6 +797,12 @@ public Value setResult(ValueNode node, Value operand) {
     "value type doesn't match node stamp (" + node.stamp(NodeView.DEFAULT).toString() + ")", llvmOperand.get());

+    gen.getDebugInfoPrinter().setValueName(llvmOperand, node);
+    if (LLVMOptions.IncludeLLVMSourceDebugInfo.getValue()) {
+        if (llvmOperand.get() instanceof LLVMValueRef) {
As far as I can tell llvmOperand.get() is always an LLVMValueRef.
Agreed, I'll remove this check
    return null;
}

protected static ResolvedJavaType getJavaType(HostedMethod hostedMethod, boolean wantOriginal) {
I think you can get the same result by looking up hostedMethod.getDeclaringClass().getJavaClass(). This will be cleaner than looking into the internals of HostedType.
The lookup is performed this way in NativeImageDebugInfoProvider for a good reason. There are specific cases related to the use of substitutions where looking into the internals of HostedType is required in order to give a consistent view of the code base to a debugger.
The problematic situation occurs when we have a class C and substitution class S with method m1 defined by C and m2 substituted by S. We can only have one class in the debug info and we need to ensure m1 and m2 are reported as methods of that class. The current code in NativeImageDebugInfoProvider ensures that the class name is always reported as C and that the methods are reported as C.m1 and C.m2.
Note however, that the source file for m1 needs to be C.java while the source file for m2 needs to be S.java.
I see, that makes sense. However, it would be cleaner to reuse those lookup methods from NativeImageDebugInfoProvider instead of copying them here.
public void computeFullFilePath() {
    ResolvedJavaType declaringClass = method.getDeclaringClass();
    Class<?> clazz = null;
    if (declaringClass instanceof OriginalClassProvider) {
Since you're generating debug info during compilation, method should always be a HostedMethod. This should allow you to use getJavaClass() directly here to get the class information.
That is true of all top level compiled methods. It does not turn out to be true of all inlined methods that are referenced in frame data derived from the CompilationResult. These inlined methods and their classes also need to be associated with a source file.
Note also that in this case the source file for a substituted method S.m1 needs to be the file for class S, not for the class C that it substitutes. That is why the lookup functions above take a flag describing whether or not the original is wanted.
That makes sense, I didn't consider inlined methods. As a general question related to that file, how is it different for LLVM? Couldn't we just reuse the one from the normal backend?
@@ -78,6 +98,11 @@ public LLVMIRBuilder(LLVMIRBuilder primary) {
     this.context = primary.context;
     this.builder = LLVM.LLVMCreateBuilderInContext(context);
     this.module = primary.module;
+    if (LLVMOptions.IncludeLLVMSourceDebugInfo.getValue()) {
This constructor is only used to create tiny helper methods that represent a single instruction in a main program method (e.g. a compare-and-swap between two objects). Does it make sense to emit debug info here as well? There wouldn't be anything more specific than the debug info attached to the call to this helper method in the main (primary) builder.
I see, I removed debug info emission for this constructor
// Maps filenames to a compile unit
public HashMap<String, LLVMMetadataRef> diFilenameToCU;
I think LLVM automatically merges identical metadata nodes when it encounters them, if that's the case we can probably get rid of those caches. Also, shouldn't all the debug information from a function have the same file name?
I'm not sure what the answer is to that last question. However, I'd just note that a compiled method can include code from inlined methods which belong to a different source file.
From my understanding, LLVM automatically merges identical nodes during linking ("...duplicate information is automatically merged by the linker", https://llvm.org/docs/SourceLevelDebugging.html). But that means that, without the caches, the individual bitcode files corresponding to the functions (e.g. fXX.bc) will have duplicate compile units and subprograms, based on how the code is written. So the user will have to deal with duplicate debug info if they wish to perform static analysis on the function-level bitcode files. At least for my use case, I would like to perform static analysis on individual bitcode files, which is why I added these caches when I saw that there were multiple compile units present in the fXX.bc files.

If you think that is not a general use case, then I can remove the caches.
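The caching being discussed can be sketched as a per-filename memo, so each source file maps to exactly one compile-unit node within a module. The type parameter stands in for LLVMMetadataRef and all names here are illustrative, not the PR's actual code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the filename -> compile unit cache; the real code
// would store LLVMMetadataRef values produced via the LLVM DIBuilder API.
class CompileUnitCacheSketch<CU> {
    private final Map<String, CU> diFilenameToCU = new HashMap<>();
    private final Function<String, CU> createCompileUnit;

    CompileUnitCacheSketch(Function<String, CU> createCompileUnit) {
        this.createCompileUnit = createCompileUnit;
    }

    // Returns the one compile unit for a file, creating it on first use, so a
    // per-function bitcode file never accumulates duplicate DICompileUnits.
    CU compileUnitFor(String filename) {
        return diFilenameToCU.computeIfAbsent(filename, createCompileUnit);
    }
}
```

The design choice here matches the comment above: the linker would deduplicate the metadata anyway, but memoizing at creation time keeps each pre-link fXX.bc file clean for standalone analysis.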
import static org.graalvm.compiler.debug.GraalError.unimplemented;

// Similar to NativeImageDebugLineInfo from previous releases
public class LLVMDebugLineInfo {
NativeImageDebugLineInfo has been replaced with NativeImageDebugLocationInfo. You should base your changes on the newer version to be able to merge the changes to master.
Also, is it necessary to reimplement this class? Can't you just reuse it as is?
There are a great number of other changes that have happened along with that change.
Then I would say to integrate into the current version of GraalVM we should base LLVM debug info emission on the current method used by the standard backend.
Okay.

I pushed some changes where I attempted to use the standard NativeImageDebugInfoProvider backend for the location information. I had to make the class public and add a new subclass NativeImageDebugLLVMLocationInfo.

I had one question regarding NativeImageBaseMethodInfo's constructor. When I tried using the default constructor, I got a NullPointerException from isPseudoObjectType() called from createParamInfo() inside the constructor. I am not sure what these functions do, so I don't know how to fix the error for the LLVM backend. I was hoping someone could let me know what they do. Currently, in the pushed commits, I have a workaround where I created another constructor inside NativeImageBaseMethodInfo that does not make these calls.

Let me know if this is similar to what you were expecting.
@rishikeshdevsot Method isPseudoObjectType checks whether a type presented as a Java object type is derived from class WordBase. If so, that means the object type is used as a proxy for a foreign numeric or pointer value. As far as the type info model is concerned, the debugger still needs to present these types as object types. However, as far as the compiled code is concerned, they get handled as 64-bit words. That only influences debug info generation for local vars and values. A parameter or local var of this type will use the object type as its type but will use long as its JavaKind. A constant value for these types will use a long, not an oop. This is important when processing frame data that identifies local values and when checking that repeated occurrences of vars and values actually alias each other.

I'm not sure what you mean when you say "When I tried using the default constructor" for NativeImageBaseMethodInfo. Do you mean you added a constructor which does not receive a ResolvedJavaMethod? If so, then the resulting NullPointerException is only going to be the start of your problems. If instead you used the existing constructor and it gave you an NPE, then it looks like something is very wrong with the ResolvedJavaMethod you passed it.
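The proxy behaviour described above can be illustrated with a small hypothetical sketch. These interfaces and the helper are invented for illustration only; they are not the real WordBase or the Native Image debug info classes:

```java
// Hypothetical sketch: a WordBase-like type is declared as an object type,
// but compiled code treats its values as raw 64-bit words (JavaKind long).
interface WordBaseSketch {
    long rawValue();
}

interface PointerSketch extends WordBaseSketch {
}

class FrameSketch {
    // Debug info would report "ptr" with object type PointerSketch, yet the
    // value stored in the frame slot is a plain long, not an oop the GC
    // could trace.
    static long localSlotFor(PointerSketch ptr) {
        return ptr.rawValue();
    }
}
```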
…mageDebugInfo provider doesn't have to be instantiated in LLVMIRBuilder
@rishikeshdevsot PR #5379 which implements a thread safe version of SourceManager is now under review. It would be helpful if you could merge the PR branch into a copy of your branch before I commit it in order to check that it correctly addresses the problem of multi-threaded access. |
@adinn I rebased your PR on the current master and then rebased my current changes on top of that. I compiled and ran
Rebasing my current changes onto the master without your changes generates the native image fine (without any errors), and the relevant source mappings are present in the LLVM bitcode files.
Force-pushed from 970cbd4 to e9bb6cf
I'm not at all sure how this has happened. The line where the NPE occurs is the call to
That can only mean field
Has your merge changed any of this code?
@adinn I don't think my code touches any of these parts. I have now pushed all my latest changes merged with the current master to this PR, it is able to build an image and output LLVM bitcode without errors. When I just try your PR without any of my changes, the native image is still built fine and the LLVM bitcode is also output as expected.
@rishikeshdevsot Well, looking into your PR, it seems neither my patch to make SourceManager thread safe nor your PR modifies these lines. So, I have no idea how the NPE you report can possibly happen with this code as it stands. The above problem represents an issue in the code that models the class base, which is almost totally neutral as far as the back end code generator is concerned. However, I am very surprised to see that you are generating DWARF info using the DWARF-specific ELF sections generator code that I wrote for Linux (for use with the SubstrateVM-configured variant of the GraalVM JIT back end) when compiling an image using the LLVM back end. Do you have a good reason to assume that the DWARF it generates will match the compiled program layout employed by LLVM? My assumption would be that this would be unlikely without at least some significant modifications. If your PR does make modifications to either the debug info provider code (DebugInfoProvider, NativeImageDebugInfoProvider and their many related sub-interfaces/classes), the modelling code (classes in package
@rishikeshdevsot I have looked into your code and I think there is a lot more work to do here. The presence of a significant problem is clearly foreshadowed in your header comment note:
There is a good reason why this error is being generated. It is deliberate and reflects the fact that the existing mechanism for generating debug enabled using

In order to fix this you need to do two things:
It looks like your code has done some of the work required for step 1, i.e. it labels individual instructions or instruction sequences with a file, line number and method name. The exception trace for the NPE that you printed above shows that you have not properly addressed step 2.

The above prescription assumes, optimistically, that LLVM provides a comprehensive API to drive debug info generation in total. The pessimistic alternative is that it does so only partially. In that case, in order to get a good debugging experience, you need to hope that the LLVM API operates in a way that allows the info it does support to be supplemented with whatever info the API does not cater for. You will also need to think about how you are going to allow for that extra information to be added. At this point, it is worth noting that interface

Clearly, you will have to do a lot more work to get full debug information into an LLVM image than simply add file, line and method name tags. If there are further hooks to allow this to be passed to LLVM during generation then you may be able to completely disable the existing debug info generation code. If not, then you will have to look into the current implementation, understand in exact detail how it works, and make changes to accommodate the steps that LLVM will take while somehow filling in the blanks for the steps it does not take. I may be able to help you with that latter path, or I may just be able to identify that it is not possible. Before I can provide that help I'll need you to identify and explain a lot more about what LLVM will do to install debug info in the generated image, what it will not do, and how easy it is to extend what it does to supplement the info it generates and cater for any omissions.

There are several things that are clear from looking at the code and the above exception trace.
Rather than continuing to implement something along these lines, I think you need to investigate what LLVM provides by way of an API to support debug info generation, see if it will provide all that is needed and, if not, see whether it can be supplemented by external code. If you report back to me with that, I can help you to identify whether any of the existing code can be reused and how best to approach doing that, including what possible refactoring might be appropriate.
@adinn Thanks a lot for your insightful comments on my code and sorry for the delay in my response. I can start off by looking at what LLVM provides in terms of debug info generation and let you know.
Hi @adinn, From my understanding of the source code, the output native image generated with or without the LLVM backend being enabled is the same. I think the native image generation code is decoupled from the LLVM IR generation code even when the LLVM backend is enabled. The only thing enabling the LLVM backend does is generate LLVM IR using the GraalVM AST during the compilation phase; the generated IR is then compiled and linked into an object file using LLVM's LLD, but this object file is different from the output native image. I don't think LLVM takes part in native image generation. The debug info generation code I wrote also only adds info to the LLVM IR; the debug info written to the native image should be the same as without the LLVM backend flag being enabled. So when you say the following
There is no LLVM-generated image per se. The native image generated when the LLVM backend is enabled is the same as when it is disabled. Correct me if I am wrong about the above, @loicottet.

Nonetheless, I have looked into the LLVM API for including debug info, and based on my reading of https://llvm.org/docs/SourceLevelDebugging.html I think LLVM provides an API for debug info generation in total. LLVM provides:
I can get started on including this debug information in the LLVM IR output by Graal, but I don't think this would affect the output native image. Let me know if I misunderstood something or need to provide more details.
I am not clear what these two statements are meant to be suggesting. It would help if you could provide a more detailed explanation of what the LLVM back end is doing. Maybe you could start by clarifying what input LLVM consumes. What does 'the GraalVM AST' mean? Which input (program code in some high level language? LLVM bitcode?) is this AST derived from? You go on to say "this object file is different from the output native image". A native image is a self-contained executable (I'm assuming you are not generating a shared library). If the image is really 'no different' when the LLVM back end is used then how does the object code generated by the LLVM back end relate to the executable? Is it a completely separate artifact? Alternatively, do you just mean that the object code that is derived from Java bytecode and generated by the standard back end is generated in the same way and that enabling the LLVM back end serves to handle some other (non-Java) input, generate extra object code which needs to be linked in to produce the final executable? If so then
@rishikeshdevsot At the moment the LLVM backend is only replacing the regular backend to emit the code section of the final Native Image executable from Graal's intermediate representation (IR). This is done by transforming this IR into LLVM bitcode and feeding it to the LLVM compiler to generate an object file containing the code of the program, with symbols pointing to the Graal-generated data sections. This LLVM-generated object file is then linked with the object file generated by Graal, which is identical to what it would be when not using the LLVM backend except it has an empty code section and exports symbols to individual objects in its data sections. These symbols are only used during linking, and do not end up in the final image. As a consequence, the final executable looks mostly the same as the executable produced by the standard backend, but the code section contents are different, since they were produced by different compilers.
@loicottet Thanks for that explanation. @rishikeshdevsot I believe this explains why the debug info currently generated for Linux does not work with an image generated using the LLVM back end. There are various places in the debug info generation scheme where the generator relies on information generated by the GraalVM compiler (more precisely, the compiler as configured for use with SubstrateVM) during compilation or, in some cases, makes assumptions about the layout and operation of the code that compiler generates. This info and the associated assumptions are closely tied to specific, key decisions taken by the compiler during processing of the IR and generation of the machine code. Those decisions don't just determine where the code starts and ends or which internal code address maps to which source line. They also determine things like the stack frame layouts, inline hierarchies, and local var, inlined parameter or constant locations. n.b. given what Loic states, I don't believe LLVM will take any different decisions regarding data layouts for Java types (otherwise the layout of non-code sections, such as the initial heap, would be affected).

Now, if generation of the code (.text) section is handed over to LLVM then this will have two consequences. Firstly, information needed to generate the debug info may no longer be created and passed to the generator. Secondly, LLVM may choose to lay out the compiled code, stack frames and var locations in ways that do not conform to the assumptions made by the current debug info generator. There are two ways to deal with this. One way would be to a) ensure LLVM provides comparable info to the current generator and b) modify the generator to revise its fixed assumptions so that it can accommodate the different decisions made when using LLVM.
The alternative is, as I suggested, to plug in a different debug info generator that understands how LLVM operates and produces the required debug sections using whatever debug info generation API LLVM provides.

Clearly, a major attraction of the first option is that the generator already knows how to produce complete debug information about the relevant Java types. Another less obvious (but, arguably, just as important) attraction is that the generator handles creation and installation of all the different ELF object debug sections needed to store the various different suites of debug information. It provides utility code for writing basic elements of any given section in the expected format (i.e. at the individual record level, below that where compiler decisions inform the structure). That said, it may not be easy to get LLVM to provide all the desired information needed by the current generator to produce the non-type related debug info, especially as we would prefer it to arrive in much the same format as the GraalVM compiler provides it. Likewise it may not be easy a) to identify and precisely define the assumptions LLVM makes about code layouts, stack frames, line numbering, inlining, local/param/constant placements etc. or b) to adapt the generator to accommodate them.

The attractiveness of the second option really depends on how much support LLVM provides for debug info generation. Clearly, one downside of using an LLVM API would involve redoing the work that the current generator does to write details of which types exist in the system. How much work that is really depends on how much help the LLVM API provides. Likewise, the nature of the API may determine how much work needs to be done in order to support creation and installation of the ELF debug sections and writing of individual records.
Finally, as a middle path, it might, perhaps, be possible to factor out some of the current generator code as one or more libraries and reuse it in a dedicated LLVM generator. I cannot really help you make a judgement as to which path to follow as I don't know any of the details of the LLVM API. If you want to investigate further and try to decide how to proceed I'll be happy to read any reports you can provide of what you think is involved in pursuing either path (or the middle route) and provide what advice I can.
Hi @adinn, I did some investigation and the way LLVM debug info generation works is as follows:
For example: creating the double type using the DIBuilder API:
where the arguments are name, size in bits, and encoding (DWARF encoding code). This shows up as metadata inside the LLVM IR as:
which further gets compiled to dwarf as
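A sketch of this sequence, based on the LLVM source-level debugging documentation (the metadata numbering and exact call shown are illustrative, not taken from the PR):

```
; C++ DIBuilder call:
;   DBuilder->createBasicType("double", 64, dwarf::DW_ATE_float);

; metadata emitted into the LLVM IR:
!5 = !DIBasicType(name: "double", size: 64, encoding: DW_ATE_float)

; DWARF DIE produced by the backend, roughly:
; DW_TAG_base_type
;   DW_AT_name      ("double")
;   DW_AT_encoding  (DW_ATE_float)
;   DW_AT_byte_size (0x08)
```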
Together steps 1 and 2 will drive debug info generation in total. I believe the approach we should take is to include the debug info metadata in the LLVM IR when the
This approach does have the disadvantage that it would involve re-implementing a lot of the existing debug info generator, but I think it is still worth the effort. Let me know if there are any questions about anything I've mentioned above. Also, can you point me to where all the debug info type definitions are currently being done for all the Java types?
I think that sounds like the correct approach.
I agree that it is not going to be possible to reuse much of the existing debug info generator code. The bulk of this code is designed to generate complete debug info after compilation has completed, based on the specific data currently collected by the compiler as a side-effect of compilation. The LLVM approach appears to require injecting metadata into the LLVM IR as an auxiliary part of the compilation process. However, the current debug info generator code is very tightly designed around consuming the auxiliary (non-IR/non-code) data structures the compiler outputs and transforming them into linker sections organized as DWARF or CodeView debug records. It is hard to see how much of that existing code will be of use when it comes to modifying the compilation process to inject LLVM metadata whose format is different to and independent from the target debug section formats.

What that implies is that when using the LLVM back end the current debug info generation step needs to be disabled. This is provided via an internal feature (class

After that, implementing LLVM debug info generation is going to require making modifications to

Obviously, you will also need to inform LLVM of details of the Java type model. That will very likely need to be done before you can inject debug metadata into the LLVM IR for some compiled method.
I'm not sure exactly what you mean by 'all the Java types', but let me clarify what the current generator caters for. The range of types for which type info is generated covers the whole Java type model, including primitive types, instance classes, interfaces and array classes.
The data that describes all these different types is constructed by a provider class that implements a suite of debug info interfaces. I think this API (suite of interfaces) is probably unlikely to suit your needs, and even more so the implementation. The API includes methods that you will not need or want to implement, and the current implementation can only be created by passing in data that you may well not have available at the point where you need to communicate the type info to LLVM. I don't really know how or where you would be able to notify LLVM of this type information in order to produce complete LLVM debug metadata, although I assume that you will be able to use the builder API to do so. You may be in a position to achieve this by iterating over the types known to the image builder.
…g the LLVM backend with debug enabled. Requires turning off the assertions for the llvm-link executable
Hi @adinn @loicottet I wanted to update you on my progress; I have pushed the code I have worked on so far. I've been able to set subprograms on functions and add line information to instructions inside functions, which then gets compiled to DWARF sections by LLVM's compiler. The debug_line section shows the location information and the debug_info section shows the function subprogram information. I've also added support for generating primitive type information, using the NativeImageHeap (which is available before LLVM compilation occurs) via a helper class.
I've used the LLVM DIBuilder API to set subprograms, create types and set debug locations. The API includes debug info metadata in the output LLVM IR. This LLVM IR is then compiled using LLVM's llc compiler, which generates the code section and the debug sections. There are a couple of things I wanted to bring to your attention/want advice on:
I think the remaining parts are to include all type information and live variable locations, and to verify that all the debug information is accurate.
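To illustrate the shape of the output described above, here is a trimmed, hypothetical example of such IR: the function carries a `!DISubprogram`, each instruction a `!DILocation`, and `llc` lowers these into the `.debug_info` and `.debug_line` sections respectively (all names and numbers are illustrative):

```llvm
define i32 @main() !dbg !4 {
  %1 = add i32 1, 2, !dbg !7        ; line info attached per instruction
  ret i32 %1, !dbg !8
}

!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!3}

!0 = distinct !DICompileUnit(language: DW_LANG_Java, file: !1, emissionKind: FullDebug)
!1 = !DIFile(filename: "HelloWorld.java", directory: "/src")
!3 = !{i32 2, !"Debug Info Version", i32 3}
!4 = distinct !DISubprogram(name: "main", file: !1, line: 3, unit: !0, spFlags: DISPFlagDefinition)
!7 = !DILocation(line: 4, scope: !4)
!8 = !DILocation(line: 5, scope: !4)
```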
…es in functions for the LLVM backend. Created debug type information by using the NativeImageHeap for the LLVM backend.
Hi @rishikeshdevsot, thanks for pursuing this. I am still busy with other changes to the debug info generator, but I will look into your refactoring of the debug info provider helper code as soon as I have time.
…utput for the llvm backend. Simplified the code for cycle checking when recursively generating types. Ignoring visiting fullPointNodes
…ider of the llvm backend to use the debug info interfaces.
Include unique name generation in the llvm backend as well when debug info is enabled
…eding to disable asserts on the llvm-link binary. The final executable has DWARF information generated using the llvm compiler present.
Hi @loicottet @adinn
There were a few things that I am not sure how to implement or deal with:
P.S. Do you have any tests for NativeImageDebugInfo that I can use to generate and verify the debug info being generated for the llvm backend?
Following up on this in case anything else is needed from me to move the PR forward.
(1) the change or feature is needed
Graal SubstrateVM has support for an LLVM backend, but it is not possible to associate source-level location information (e.g. line number, file name) with the output IR. That is what this commit aims to do: it adds an option called "IncludeLLVMSourceDebugInfo" which includes source-level debug information in the output IR, built using the LLVM DIBuilder interface. This information in the IR can be helpful for debugging purposes and for performing static analysis on the IR.
(2) how it is implemented
GraalVM SubstrateVM currently uses JavaCPP's JNI functions for LLVM to build the LLVM IR. I am using the JNI methods of the LLVM `DIBuilder` class to create location information with the corresponding line number, subprogram and file name. This location information is associated with the corresponding LLVM instruction, function or basic block using the `LLVMSetCurrentDebugLocation2` and `LLVMSetInstDebugLocation` functions. The line number, method name and file name information is derived using a new class called `LLVMDebugLineInfo`, which is analogous to the `NativeImageDebugLineInfo` class from previous GraalVM releases (21.1.0). The debug location is attached to an IR instruction by calling `buildDebugInfoForInstr()` from inside the `setResult()` method of the `NodeLLVMBuilder` class. `buildDebugInfoForInstr()` is defined inside the `LLVMIRBuilder` class and uses the methods of the LLVM `DIBuilder` to generate the source-level location information.

(3) Notes
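Concretely (a hypothetical fragment; numbering illustrative), the effect of these calls is a `!dbg` attachment on each instruction that references a `!DILocation` node scoped to the function's `!DISubprogram`:

```llvm
%3 = load i64, i64* %2, align 8, !dbg !15   ; location attached via LLVMSetInstDebugLocation
!15 = !DILocation(line: 42, scope: !9)      ; !9 is the enclosing DISubprogram
```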
I needed to use the `SourceManager` class to obtain the file and directory information associated with the node. Once `SourceManager` was defined, I added it to the `ImageSingletons` map inside `emitLLVM()`. But the problem is that `emitLLVM()` seems to be run by multiple threads, which would cause multiple additions of the `SourceManager` to `ImageSingletons`. I've currently sidestepped the problem by enclosing the addition to `ImageSingletons` within a lock, but ideally I would like to perform this addition before the multiple threads are launched; I am not sure where exactly that should happen.

When using the `-O0` flag to generate the native image with the LLVM backend, I get an error saying "the LLVM backend doesn't support debug info generation", raised from inside `visitFullInfoPointNode` in `NodeLLVMBuilder`, and I am not sure how to avoid this problem. This problem was not present when I implemented this feature in Graal 21.1.0.

I have tested the feature by compiling Hadoop to output LLVM IR with location information using GraalVM 21.1.0. For the latest build, I have only tested it with toy examples.
The flags required to obtain location information are `-g` and `-H:+IncludeLLVMSourceDebugInfo`.

This PR is hopefully just some startup code, so that I can receive feedback and comments to understand how the code needs to be structured and any other requirements that might be needed.