LLVM Source Level debug info support #4964

Open
wants to merge 16 commits into
base: master
from

Conversation

rishikeshdevsot

(1) the change or feature is needed

GraalVM SubstrateVM supports an LLVM backend, but it is not currently possible to associate source-level location information (e.g. line number, filename) with the output IR. This commit aims to fix that by adding an option called "IncludeLLVMSourceDebugInfo", which includes source-level debug information in the output IR, built using the LLVM DIBuilder interface. This information can be helpful for debugging and for performing static analysis on the IR.

(2) how it is implemented

GraalVM SubstrateVM currently uses JavaCPP's JNI functions for LLVM to build the LLVM IR. I am using the JNI methods of the LLVM DIBuilder class to create location information with the corresponding line number, subprogram and filename. This location information is associated with the corresponding LLVM instruction, function or basic block using the LLVMSetCurrentDebugLocation2 and LLVMSetInstDebugLocation functions. The line number, method name and file name are derived using a new class called LLVMDebugLineInfo, which is analogous to the NativeImageDebugLineInfo class from previous GraalVM releases (21.1.0).

The debug location is attached to an IR instruction by calling buildDebugInfoForInstr() from inside the setResult() method of the NodeLLVMBuilder class. buildDebugInfoForInstr() is defined inside the LLVMIRBuilder class and uses the methods of the LLVM DIBuilder to generate the source-level location information.

(3) Notes

  1. I needed to use the SourceManager class to obtain the file and directory information associated with the node. Once SourceManager was defined, I added it to the ImageSingletons map inside emitLLVM(). The problem is that emitLLVM() seems to be run by multiple threads, which would cause multiple additions of the SourceManager to ImageSingletons. I have currently sidestepped the problem by enclosing the addition to ImageSingletons within a lock. Ideally I would like to perform this addition before the multiple threads are launched, but I am not sure where exactly that happens.

  2. When using the -O0 flag to generate the native-image with an LLVM backend, I get an error saying "the LLVM backend doesn't support debug info generation" called from inside visitFullInfoPointNode in NodeLLVMBuilder and I am not sure how to avoid this problem. This problem was not present when I implemented this feature in Graal 21.1.0.

  3. I have tested the feature to compile Hadoop to output LLVM IR with location information using GraalVM 21.1.0. But for the latest build, I have only tested it with toy examples.

The flags required to obtain location information are -g and -H=+IncludeLLVMSourceDebugInfo.

This PR is hopefully just some startup code so that I can receive feedback and comments to understand how the code needs to be structured and any other requirements that might be needed.

…o in the IR. Tested on a sample helloworld program
…o in the IR. Tested on a sample helloworld program and added comments

Checkstyle passes
@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Sep 15, 2022
@rishikeshdevsot rishikeshdevsot marked this pull request as draft September 15, 2022 01:15
@rishikeshdevsot rishikeshdevsot marked this pull request as ready for review September 15, 2022 01:17
@loicottet loicottet self-requested a review September 20, 2022 09:57
Member

@loicottet loicottet left a comment

This looks like cool changes, thanks a lot! Let me know when you've had the time to look into my comments.

@@ -470,7 +470,9 @@ public void build(String imageName, DebugContext debug) {
if (SubstrateOptions.GenerateDebugInfo.getValue(HostedOptionValues.singleton()) > 0) {
Timer timer = TimerCollection.singleton().get(TimerCollection.Registry.DEBUG_INFO);
try (Timer.StopTimer t = timer.start()) {
ImageSingletons.add(SourceManager.class, new SourceManager());
if (ImageSingletons.contains(SourceManager.class) == false) {
Member

This will try to add debug info the Native Image way, by generating it. I think what we want is to let the LLVM compiler handle this, so I think this if should be omitted altogether when using the LLVM backend.

Collaborator

@adinn adinn Oct 26, 2022

The SourceManager singleton really ought to be registered in NativeImageDebugInfoFeature conditional on debuginfo generation being enabled.

Member

That seems to be a cleaner solution indeed.

@@ -49,5 +49,8 @@ public class LLVMOptions {
@Option(help = "Enable LLVM bitcode optimizations")//
public static final HostedOptionKey<Boolean> BitcodeOptimizations = new HostedOptionKey<>(false);

@Option(help = "Include source code level debug info in the output LLVM IR", type = OptionType.Debug)//
public static final HostedOptionKey<Boolean> IncludeLLVMSourceDebugInfo = new HostedOptionKey<>(false);
Member

I wouldn't add a new option for this, LLVM debug info generation should be tied to the existing GenerateDebugInfo option.

Collaborator

Yes, I agree. Enabling debug info generation and selecting the generated target type are (must be) orthogonal choices.

Author

Makes sense, I'll remove this option and just use GenerateDebugInfo

if (LLVMOptions.IncludeLLVMSourceDebugInfo.getValue()) {
//TODO: Avoid this lock if possible
imageSingletonesLock.lock();
if (ImageSingletons.contains(SourceManager.class) == false) {
Member

This should probably be added in LLVMFeature.beforeCompilation once, and SourceManager should be made thread-safe.

Collaborator

@adinn adinn Oct 26, 2022

No, NativeImageDebugInfoFeature is the right place to do this as it is relevant to all debug info generation code not just the LLVM debug info generator.

Collaborator

Yes, SourceManager is definitely not thread safe at the moment.

Member

I agree, having all management related to SourceManager in NativeImageDebugInfoFeature makes sense. The handling of debug information on the LLVM backend has to happen in the compilation phase, so making SourceManager thread-safe can't be avoided, I think. It should be as simple as changing its data structures to thread-safe ones though, so not a big hurdle.

Collaborator

Well, this is a cache manager so it is not quite so simple as just updating the data structures. The important thing to get right is to ensure that the check as to whether a source file is cached in the sources directory and any subsequent lookup and copy from some originating jar into the sources directory happens atomically. A secondary constraint is that a failed lookup must record the lack of any suitable source atomically so that we don't waste time repeating lookups.

Like all concurrent caches there are different choices available to enforce consistency, ranging from the sledgehammer solution of a synchronization on the whole cache at every lookup, through a synchronization on some proxy that exists per cached file, right down to a racy lookup and copy of sources that only synchronizes at the point of linking the copied file into the source tree.
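
To make the per-file option concrete, here is a minimal sketch. It is not the actual SourceManager code: `SourceCacheSketch` and its `locateAndCopy` callback are hypothetical names used only for illustration. It relies on `ConcurrentHashMap.computeIfAbsent`, which applies the mapping function at most once per key, and it stores `Optional.empty()` so that a failed lookup is remembered as well.

```java
import java.nio.file.Path;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for a source cache; not the real SourceManager API.
final class SourceCacheSketch {
    // Maps a source file name to its cached location.
    // Optional.empty() records that a lookup was tried and found nothing.
    private final ConcurrentHashMap<String, Optional<Path>> entries = new ConcurrentHashMap<>();

    // The locate-and-copy step, supplied by the caller (assumed, for illustration).
    private final Function<String, Optional<Path>> locateAndCopy;

    SourceCacheSketch(Function<String, Optional<Path>> locateAndCopy) {
        this.locateAndCopy = locateAndCopy;
    }

    // computeIfAbsent performs the "is it cached?" check and the population
    // as one atomic step per key, so two threads asking for the same file
    // cannot both run locateAndCopy for it.
    Optional<Path> find(String fileName) {
        return entries.computeIfAbsent(fileName, locateAndCopy);
    }
}
```

One caveat with this shape: the `ConcurrentHashMap` documentation advises keeping the computation short, because other updates to the map may be blocked while it runs. A slow copy from a jar may therefore argue for one of the other designs listed above.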

I am happy to review any proposed fix. However, I am also happy to implement a solution, which may be quicker as I am already familiar with the code. @rishikeshdevsot @loicottet please confirm how you would like to proceed.

Author

@adinn I am okay with you implementing a solution since you are familiar with the problems and the code

Collaborator

Ok, I'll prepare that in a separate PR and you can rebase this PR on it.

@@ -796,6 +797,12 @@ public Value setResult(ValueNode node, Value operand) {
"value type doesn't match node stamp (" + node.stamp(NodeView.DEFAULT).toString() + ")", llvmOperand.get());

gen.getDebugInfoPrinter().setValueName(llvmOperand, node);
if (LLVMOptions.IncludeLLVMSourceDebugInfo.getValue()) {
if (llvmOperand.get() instanceof LLVMValueRef) {
Member

As far as I can tell llvmOperand.get() is always an LLVMValueRef

Author

Agreed, I'll remove this check

return null;
}

protected static ResolvedJavaType getJavaType(HostedMethod hostedMethod, boolean wantOriginal) {
Member

I think you can get the same result by looking up hostedMethod.getDeclaringClass().getJavaClass(). This will be cleaner than looking into the internals of HostedType

Collaborator

The lookup is performed this way in NativeImageDebugInfoProvider for a good reason. There are specific cases related to the use of substitutions where looking into the internals of HostedType is required in order to give a consistent view of the code base to a debugger.

The problematic situation occurs when we have a class C and substitution class S with method m1 defined by C and m2 substituted by S. We can only have one class in the debug info and we need to ensure m1 and m2 are reported as methods of that class. The current code in NativeImageDebugInfoProvider ensures that the class name is always reported as C and that the methods are reported as C.m1 and C.m2.

Collaborator

Note however, that the source file for m1 needs to be C.java while the source file for m2 needs to be S.java.
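
The naming rule can be sketched as follows. This is a simplified model under stated assumptions, not the lookup code in NativeImageDebugInfoProvider: `MethodSketch` and its fields are hypothetical, and class names are plain strings rather than ResolvedJavaType instances.

```java
// Hypothetical model of a method that may be supplied by a substitution class.
final class MethodSketch {
    final String name;
    final String declaringClass; // class whose source contains the method (C or S)
    final String originalClass;  // class being substituted, or null if none

    MethodSketch(String name, String declaringClass, String originalClass) {
        this.name = name;
        this.declaringClass = declaringClass;
        this.originalClass = originalClass;
    }

    // wantOriginal == true: report the substituted class (always C).
    // wantOriginal == false: report the class that actually holds the code.
    String ownerClass(boolean wantOriginal) {
        return (wantOriginal && originalClass != null) ? originalClass : declaringClass;
    }

    // The debugger sees every method as a member of the original class.
    String qualifiedName() {
        return ownerClass(true) + "." + name;
    }

    // The source file is the one that really contains the method body.
    String sourceFile() {
        return ownerClass(false) + ".java";
    }
}
```

With `new MethodSketch("m1", "C", null)` and `new MethodSketch("m2", "S", "C")`, both methods are named under C, while the source files are C.java and S.java respectively.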

Member

I see, that makes sense. However it would be cleaner to reuse those lookup methods from NativeImageDebugInfoProvider instead of copying them here.

public void computeFullFilePath() {
ResolvedJavaType declaringClass = method.getDeclaringClass();
Class<?> clazz = null;
if (declaringClass instanceof OriginalClassProvider) {
Member

Since you're generating debug info during compilation, method should always be a HostedMethod. This should allow you to use getJavaClass() directly here to get the class information.

Collaborator

@adinn adinn Oct 26, 2022

That is true of all top level compiled methods. It does not turn out to be true of all inlined methods that are referenced in frame data derived from the CompilationResult. These inlined methods and their classes also need to be associated with a source file.

Note also that in this case the source file for a substituted method S.m1 needs to be the file for class S, not for the class C that it substitutes. That is why the lookup functions above take a flag describing whether or not the original is wanted.

Member

That makes sense, I didn't consider inlined methods. As a general question related to that file, how is it different for LLVM? Couldn't we just reuse the one from the normal backend?

@@ -78,6 +98,11 @@ public LLVMIRBuilder(LLVMIRBuilder primary) {
this.context = primary.context;
this.builder = LLVM.LLVMCreateBuilderInContext(context);
this.module = primary.module;
if (LLVMOptions.IncludeLLVMSourceDebugInfo.getValue()) {
Member

This constructor is only used to create tiny helper methods that represent a single instruction in a main program method (e.g. a compare-and-swap between two objects). Does it make sense to emit debug info here as well? There wouldn't be anything more specific than the debug info attached to the call to this helper method in the main (primary) builder.

Author

I see, I removed debug info emission for this constructor


// Maps filenames to a compile unit
public HashMap<String, LLVMMetadataRef> diFilenameToCU;
Member

I think LLVM automatically merges identical metadata nodes when it encounters them, if that's the case we can probably get rid of those caches. Also, shouldn't all the debug information from a function have the same file name?

Collaborator

I'm not sure what the answer is to that last question. However, I'd just note that a compiled method can include code from inlined methods which belong to a different source file.

Author

From my understanding, LLVM automatically merges identical nodes during linking ("...duplicate information is automatically merged by the linker", https://llvm.org/docs/SourceLevelDebugging.html). But that means that, without the caches, the individual bytecode files corresponding to the functions (e.g. fXX.bc) will have duplicate compile units and subprograms, given how the code is written. So users will have to deal with duplicate debug info if they wish to perform static analysis on the function-level bytecode files. At least for my use case, I would like to perform static analysis on individual bytecode files, which is why I added these caches when I saw that there were multiple compile units present in the fXX.bc files.

If you think that is not a general use case, then I can remove the caches.
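
For reference, the caching amounts to memoizing one handle per file name. The sketch below shows only that idea and makes no LLVM calls: `CompileUnitCacheSketch` is a hypothetical name, the type parameter `H` stands in for LLVMMetadataRef, and the `create` function stands in for whatever DIBuilder call produces the compile unit.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-builder cache: one compile-unit handle per source file name.
// Backed by a plain HashMap, so it is not safe to share between threads.
final class CompileUnitCacheSketch<H> {
    private final Map<String, H> byFileName = new HashMap<>();
    private final Function<String, H> create; // assumed factory for a new handle

    CompileUnitCacheSketch(Function<String, H> create) {
        this.create = create;
    }

    // Returns the existing handle for fileName, creating it on first use only.
    H get(String fileName) {
        return byFileName.computeIfAbsent(fileName, create);
    }

    int size() {
        return byFileName.size();
    }
}
```

Repeated requests for the same file name return the same handle, so a function whose inlined code spans two source files ends up with two compile units rather than one per instruction.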

import static org.graalvm.compiler.debug.GraalError.unimplemented;

// Similar to NativeImageDebugLineInfo from previous releases
public class LLVMDebugLineInfo {
Member

NativeImageDebugLineInfo has been replaced with NativeImageDebugLocationInfo. You should base your changes on the newer version to be able to merge the changes to master.

Member

Also, is it necessary to reimplement this class? Can't you just reuse it as is?

Collaborator

There are a great number of other changes that have happened along with that change.

Member

Then I would say to integrate into the current version of GraalVM we should base LLVM debug info emission on the current method used by the standard backend.

Author

Okay.

I pushed some changes where I attempted to use the standard NativeImageDebugInfoProvider backend for the location information. I had to make the class public and add a new subclass NativeImageDebugLLVMLocationInfo.

I had one question regarding NativeImageBaseMethodInfo's constructor. When I tried using the default constructor, I got a NullPointerException from isPseudoObjectType() called from createParamInfo() inside the constructor. I am not sure what these functions do, so I don't know how to fix the error for the LLVM backend. I was hoping someone could let me know what they do. Currently, in the pushed commits, I have a workaround where I created another constructor inside NativeImageBaseMethodInfo that does not make these calls.

Let me know if this is similar to what you were expecting.

Collaborator

@rishikeshdevsot Method isPseudoObjectType checks whether a type presented as a Java object type is derived from class WordBase. If so, that means the object type is used as a proxy for a foreign numeric or pointer value. As far as the type info model is concerned, the debugger still needs to present these types as object types. However, as far as the compiled code is concerned, they get handled as 64-bit words. That only influences debug info generation for local vars and values. A parameter or local var of this type will use the object type as its type but will use long as its JavaKind. A constant value for these types will use a long, not an oop. This is important when processing frame data that identifies local values and when checking that repeat occurrences of vars and values actually alias each other.
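
A minimal sketch of that distinction, using stand-in types only: `WordLike` plays the role of WordBase, `KindSketch` plays the role of JavaKind, and `storageKind` is a hypothetical helper. None of these are the real GraalVM classes.

```java
// Hypothetical marker interface standing in for WordBase.
interface WordLike {}

// Hypothetical pointer type derived from the marker, like a foreign pointer.
interface PointerLike extends WordLike {}

// Hypothetical stand-in for the two JavaKind values that matter here.
enum KindSketch { OBJECT, LONG }

final class PseudoObjectSketch {
    // True when a type that looks like an object type is really a word proxy.
    static boolean isPseudoObjectType(Class<?> type) {
        return WordLike.class.isAssignableFrom(type);
    }

    // Only meaningful for types presented as object types: word proxies are
    // stored as 64-bit values, everything else as an object reference.
    static KindSketch storageKind(Class<?> type) {
        return isPseudoObjectType(type) ? KindSketch.LONG : KindSketch.OBJECT;
    }
}
```

Here `storageKind(PointerLike.class)` yields `LONG` while `storageKind(String.class)` yields `OBJECT`, even though a debugger would list both as object types.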

I'm not sure what you mean when you say "When I tried using the default constructor" for NativeImageBaseMethodInfo. Do you mean you added a constructor which does not receive a ResolvedJavaMethod? If so then the resulting NullPointerException is only going to be the start of your problems.

If instead you used the existing constructor and it gave you an NPE then it looks like something is very wrong with the ResolvedJavaMethod you passed it.

@adinn
Collaborator

adinn commented Nov 9, 2022

@rishikeshdevsot PR #5379 which implements a thread safe version of SourceManager is now under review. It would be helpful if you could merge the PR branch into a copy of your branch before I commit it in order to check that it correctly addresses the problem of multi-threaded access.

@rishikeshdevsot
Author

rishikeshdevsot commented Nov 11, 2022

@adinn I rebased your PR on the current master and then rebased my current changes on top of that. I compiled and ran mx native-image with an LLVM backend for a simple HelloWorld Java program. Image generation failed with the following error:

Fatal error: com.oracle.svm.core.util.VMError$HostedError: java.lang.NullPointerException
	at org.graalvm.nativeimage.builder/com.oracle.svm.core.util.VMError.shouldNotReachHere(VMError.java:72)
	at org.graalvm.nativeimage.builder/com.oracle.svm.hosted.image.NativeImage.write(NativeImage.java:175)
	at org.graalvm.nativeimage.builder/com.oracle.svm.hosted.image.NativeImageViaCC.write(NativeImageViaCC.java:97)
	at org.graalvm.nativeimage.builder/com.oracle.svm.hosted.NativeImageGenerator.doRun(NativeImageGenerator.java:719)
	at org.graalvm.nativeimage.builder/com.oracle.svm.hosted.NativeImageGenerator.run(NativeImageGenerator.java:542)
	at org.graalvm.nativeimage.builder/com.oracle.svm.hosted.NativeImageGeneratorRunner.buildImage(NativeImageGeneratorRunner.java:403)
	at org.graalvm.nativeimage.builder/com.oracle.svm.hosted.NativeImageGeneratorRunner.build(NativeImageGeneratorRunner.java:580)
	at org.graalvm.nativeimage.builder/com.oracle.svm.hosted.NativeImageGeneratorRunner.main(NativeImageGeneratorRunner.java:128)
Caused by: java.lang.NullPointerException
	at org.graalvm.nativeimage.objectfile/com.oracle.objectfile.debugentry.ClassEntry.localFilesIdx(ClassEntry.java:235)
	at org.graalvm.nativeimage.objectfile/com.oracle.objectfile.debugentry.Range.getFileIndex(Range.java:284)
	at org.graalvm.nativeimage.objectfile/com.oracle.objectfile.elf.dwarf.DwarfInfoSectionImpl.writeInlineSubroutine(DwarfInfoSectionImpl.java:1565)
	at org.graalvm.nativeimage.objectfile/com.oracle.objectfile.elf.dwarf.DwarfInfoSectionImpl.generateConcreteInlinedMethods(DwarfInfoSectionImpl.java:1047)
	at org.graalvm.nativeimage.objectfile/com.oracle.objectfile.elf.dwarf.DwarfInfoSectionImpl.writeMethodLocation(DwarfInfoSectionImpl.java:1408)
	at org.graalvm.nativeimage.objectfile/com.oracle.objectfile.elf.dwarf.DwarfInfoSectionImpl.lambda$writeMethodLocations$11(DwarfInfoSectionImpl.java:1017)
	at java.base/java.util.stream.ReduceOps$1ReducingSink.accept(ReduceOps.java:80)
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:563)
	at org.graalvm.nativeimage.objectfile/com.oracle.objectfile.elf.dwarf.DwarfInfoSectionImpl.writeMethodLocations(DwarfInfoSectionImpl.java:1016)
	at org.graalvm.nativeimage.objectfile/com.oracle.objectfile.elf.dwarf.DwarfInfoSectionImpl.writeCompiledClassUnit(DwarfInfoSectionImpl.java:473)
	at org.graalvm.nativeimage.objectfile/com.oracle.objectfile.elf.dwarf.DwarfInfoSectionImpl.lambda$writeCompiledClasses$6(DwarfInfoSectionImpl.java:412)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
	at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
	at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
	at org.graalvm.nativeimage.objectfile/com.oracle.objectfile.elf.dwarf.DwarfInfoSectionImpl.writeCompiledClasses(DwarfInfoSectionImpl.java:411)
	at org.graalvm.nativeimage.objectfile/com.oracle.objectfile.elf.dwarf.DwarfInfoSectionImpl.generateContent(DwarfInfoSectionImpl.java:157)
	at org.graalvm.nativeimage.objectfile/com.oracle.objectfile.elf.dwarf.DwarfInfoSectionImpl.createContent(DwarfInfoSectionImpl.java:90)
	at org.graalvm.nativeimage.objectfile/com.oracle.objectfile.elf.dwarf.DwarfSectionImpl.getOrDecideSize(DwarfSectionImpl.java:608)
	at org.graalvm.nativeimage.objectfile/com.oracle.objectfile.elf.ELFUserDefinedSection.getOrDecideSize(ELFUserDefinedSection.java:105)
	at org.graalvm.nativeimage.objectfile/com.oracle.objectfile.ObjectFile.bake(ObjectFile.java:1625)
	at org.graalvm.nativeimage.objectfile/com.oracle.objectfile.ObjectFile.write(ObjectFile.java:1269)
	at org.graalvm.nativeimage.builder/com.oracle.svm.hosted.image.NativeImage.lambda$write$0(NativeImage.java:171)
	at org.graalvm.nativeimage.objectfile/com.oracle.objectfile.ObjectFile.withDebugContext(ObjectFile.java:1801)
	at org.graalvm.nativeimage.builder/com.oracle.svm.hosted.image.NativeImage.write(NativeImage.java:170

Rebasing my current changes onto master without your changes generates the native image fine (without any errors) and produces the relevant source mappings in the LLVM bytecode files.

@adinn
Collaborator

adinn commented Nov 11, 2022

@rishikeshdevsot

I'm not at all sure how this has happened. The line where the NPE occurs is the call to get in this method of ClassEntry:

public int localFilesIdx(@SuppressWarnings("hiding") FileEntry fileEntry) {
    return localFilesIndex.get(fileEntry);
}

That can only mean field localFilesIndex must be null. However, the field is always written in the constructor of ClassEntry:

public ClassEntry(String className, FileEntry fileEntry, int size) {
    super(className, size);
    this.interfaces = new ArrayList<>();
    . . .
    this.localFilesIndex = new HashMap<>();
    this.localDirs = new ArrayList<>();
    . . .

Has your merge changed any of this code?
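
One other possibility worth ruling out, assuming localFilesIndex maps a FileEntry to a boxed Integer (the snippets above do not show its declared type): because localFilesIdx returns a primitive int, a `get` on a key that was never registered returns null, and unboxing that null throws a NullPointerException on the same line even though the map itself is not null. The sketch below reproduces only that Java behaviour; `UnboxingNpeSketch` and its String keys are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical index with the same shape as a Map-backed file index.
final class UnboxingNpeSketch {
    private final Map<String, Integer> index = new HashMap<>();

    void register(String key, int idx) {
        index.put(key, idx);
    }

    // The Integer returned by get() is unboxed to int on return.
    // If the key is absent, get() returns null and the unboxing throws
    // NullPointerException, although the map field is non-null.
    int indexOf(String key) {
        return index.get(key);
    }
}
```

If that is what is happening here, the question becomes why a file entry for an inlined method was never added to its class's local file index.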

@rishikeshdevsot
Author

@adinn I don't think my code touches any of these parts. I have now pushed all my latest changes, merged with the current master, to this PR; it is able to build an image and output LLVM bytecode without errors. When I try your PR without any of my changes, the native image is still built fine and the LLVM bytecode is also output as expected.
So the cause should be some way in which your changes interact with mine. I can try replaying my changes, rebased on top of your PR, one at a time to see which one leads to the error.

@adinn
Collaborator

adinn commented Nov 12, 2022

@rishikeshdevsot Well, looking into your PR, it seems neither my patch to make SourceManager thread safe nor your PR modifies these lines. So, I have no idea how the NPE you report can possibly happen with this code as it stands.

The above problem represents an issue in the code that models the class base, which is almost totally neutral as far as the back end code generator is concerned. However, I am very surprised to see that, when compiling an image using the LLVM back end, you are generating DWARF info using the DWARF-specific ELF sections generator code that I wrote for Linux for use with the SubstrateVM-configured variant of the GraalVM JIT back end.

Do you have a good reason to assume that the DWARF it generates will match the compiled program layout employed by LLVM? My assumption would be that this would be unlikely without at least some significant modifications. If your PR does make modifications to either the debug info provider code (DebugInfoProvider, NativeImageDebugInfoProvider and their many related sub-interfaces/classes), the modelling code (classes in package com.oracle.objectfile.debugentry) or the ELF section generator code (in package com.oracle.objectfile.elf.dwarf) could you perhaps summarize them so I can have some idea where to look for the source of this strange error?

@adinn
Collaborator

adinn commented Nov 17, 2022

@rishikeshdevsot I have looked into your code and I think there is a lot more work to do here. The presence of a significant problem is clearly foreshadowed in your header comment note:

When using the -O0 flag to generate the native-image with an LLVM backend, I get an error saying "the LLVM backend doesn't support debug info generation" called from inside visitFullInfoPointNode in NodeLLVMBuilder and I am not sure how to avoid this problem. This problem was not present when I implemented this feature in Graal 21.1.0.

There is a good reason why this error is being generated. It is deliberate and reflects the fact that the existing mechanism for generating debug info, enabled using -g, is not compatible with an LLVM generated image.

In order to fix this you need to do two things:

  1. Make the LLVM compiler generate debug information that identifies: types and their methods and fields; file and line number info; information about parameter and local var names & types and their live value locations; etc which it will pass on to the ELF image writer for inclusion in the image.
  2. Disable or modify the code that is currently used to generate debug information and pass it to the ELF image writer for inclusion in the image.

It looks like your code has done some of the work required for step 1 i.e. it labels individual instructions or instruction sequences with a file, line number and method name. The exception trace for the NPE that you printed above shows that you have not properly addressed step 2.

The above prescription assumes optimistically that LLVM provides a comprehensive API to drive debug info generation in total. The pessimistic alternative is that it does so only partially. In that case, in order to get a good debugging experience, you need to hope that the LLVM API operates in a way that allows the info it does support to be supplemented with whatever info the API does not cater for. You will also need to think about how you are going to allow for that extra information to be added.

At this point, it is worth noting that interface DebugInfoProvider is the internal API that NativeImage provides for generating debug info in total and NativeImageDebugInfoProvider is the class that implements that API in total. So, if LLVM only provides partial support for generating debug info then you may be able to reuse some of this existing implementation. However, you may also need to do a lot of work to decouple the functionality that fills in the gaps from DebugInfoProvider. Depending on what LLVM does provide by way of API, this decoupling and reuse may or may not be possible. I am currently not sure how -- or even whether -- you ought to be using interface DebugInfoProvider at all.

Clearly, you will have to do a lot more work to get full debug information into an LLVM image than simply adding file, line and method name tags. If there are further hooks to allow this to be passed to LLVM during generation then you may be able to completely disable the existing debug info generation code. If not, then you will have to look into the current implementation, understand in exact detail how it works and make changes to accommodate the steps that LLVM will take while somehow filling in the blanks for the steps it does not take. I may be able to help you with that latter path, or I may just be able to identify that it is not possible. Before I can provide that help I'll need you to identify and explain a lot more about what LLVM will do to install debug info in the generated image, what it will not do, and how easy it is to extend what it does so as to supplement the info it generates and cater for any omissions.

There are several things that are clear from looking at the code and the above exception trace.

  1. Your attempt to reuse class NativeImageDebugLocationInfo as a super of class NativeImageDebugLLVMLocationInfo does not work. It is not simply that this reuse is resulting in the NPE. It is also a bad design that is confusing and likely to be difficult to maintain. You have been forced to add a spurious constructor to NativeImageDebugInfoProvider solely in order to be able to create a throwaway instance which you can then use to instantiate a NativeImageDebugLLVMLocationInfo instance. The only point of all this is to reuse a small amount of functionality from NativeImageDebugLocationInfo, NativeImageDebugBaseMethodInfo and NativeImageDebugFileInfo.
    This is a problem because it is incidental to your use of these types -- and hence to your type -- that they actually serve to implement the corresponding DebugInfoProvider interfaces and thereby drive the existing debug info generator implementations. That latter fact implies that changes to the DebugInfoProvider interface or implementation which arise during maintenance will likely cause unnecessary and avoidable problems for your implementation of NativeImageDebugLLVMLocationInfo.
    I think it would be much better for you to factor out the necessary functionality into some auxiliary (static) methods of NativeImageDebugInfoProvider on which your own class can depend. That avoids your code having to implement the DebugInfoProvider interfaces, subclass the associated NativeImageDebugInfoProvider inner classes and indirectly access the behaviour through NativeImageDebugInfoProvider, avoiding a lot of unnecessary complexity and confusion. Alternatively, you might want to introduce a separate utility class to house this functionality which your class and NativeImageDebugInfoProvider can both delegate to.
  2. If you want debug info generation to work when using the LLVM back end then you need to ensure that the current generation steps that happen when -g is passed either do not happen or are modified to avoid generating info that is inappropriate.
    Currently debug info generation occurs in two stages, one before image write and the next during image write. The point where you need to intercept the current implementation is the beforeImageWrite callback of NativeImageDebugInfoFeature, where stage 1 is initiated.
    At this point the Provider API is used by the object file code to build a generic model of the debug info content that needs to be included in the image. The object file also creates ELF sections into which the debug info can be written. This is done by creating an instance of NativeImageDebugInfoProvider and passing it to ObjectFile::installDebugInfo.
    If you are relying on LLVM to generate all the debug info then you simply need to bypass these two steps conditional on LLVM being in use. If you can only get LLVM to generate some of the section content and want to try to reuse the existing code to do the rest then you will need to leave this code in place. You may well need to leave it up to LLVM to create some of the target sections and you will likely need to modify the code that builds the model to incorporate information about the debug info elements that LLVM is managing (e.g., with your current code, files and line numbers).
    The second stage of debug info generation happens when the ELF file is being written. The sections created at stage 1 are requested to provide a byte array that contains the section content. Each ELF debug section implementation class iterates over the generic model created at stage 1 and encodes the relevant info into a byte array, whether that is file and line info, string info, type info or whatever. If LLVM generates everything then you don't need to create any of these sections which means the request to encode and return the content will not occur. However, if you need to retain just some of these sections -- because LLVM does not generate this sort of content -- then you will need to modify the methods which encode these retained sections so that the info they include is compatible with/cross-references the info LLVM generates.
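The "separate utility class" suggested in point 1 could look roughly like the sketch below. Every name in it (`DebugInfoHelperSketch`, `ExistingProviderSketch`, `LlvmLocationSketch`) is hypothetical and the two helper methods are simplified stand-ins, not the logic that currently lives in NativeImageDebugInfoProvider. The only point being illustrated is the shape: stateless static helpers that both the existing provider and the LLVM-specific location type call, so the LLVM type neither subclasses a provider inner class nor needs a throwaway provider instance.

```java
/** Hypothetical shared helper: stateless, so no provider instance is required. */
final class DebugInfoHelperSketch {
    private DebugInfoHelperSketch() {
    }

    /** Derives a relative source file name from a fully qualified class name. */
    static String sourceFileFor(String className) {
        // Nested classes (Outer$Inner) are declared in the outer class's file.
        int dollar = className.indexOf('$');
        String outer = dollar < 0 ? className : className.substring(0, dollar);
        return outer.replace('.', '/') + ".java";
    }

    /** Formats a method name identically for every consumer. */
    static String qualifiedMethodName(String className, String methodName) {
        return className + "::" + methodName;
    }
}

/** Stand-in for the existing provider: it delegates instead of owning the logic. */
final class ExistingProviderSketch {
    String describe(String className, String methodName, int line) {
        return DebugInfoHelperSketch.sourceFileFor(className) + ":" + line + " "
                + DebugInfoHelperSketch.qualifiedMethodName(className, methodName);
    }
}

/** Stand-in for the LLVM location type: no inheritance from provider classes. */
final class LlvmLocationSketch {
    final String fileName;
    final String methodName;
    final int line;

    LlvmLocationSketch(String className, String methodName, int line) {
        this.fileName = DebugInfoHelperSketch.sourceFileFor(className);
        this.methodName = DebugInfoHelperSketch.qualifiedMethodName(className, methodName);
        this.line = line;
    }
}
```

With this arrangement a change to the DebugInfoProvider interfaces would not ripple into the LLVM location type, because the two only share the helper.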

Rather than continuing to implement something along these lines I think you need to investigate what LLVM provides by way of API to support debug info generation, see if it will provide all that is needed and, if not, see whether it can be supplemented by external code. If you report back to me with that I can help you to identify whether any of the existing code can be reused and how best to approach doing that, including what possible refactoring might be appropriate.

@rishikeshdevsot
Author

@adinn Thanks a lot for your insightful comments on my code and sorry for the delay in my response. I can start off by looking at what LLVM provides in terms of debug info generation and let you know.

@rishikeshdevsot
Author

Hi @adinn,

From my understanding of the source code, the output native image generated with or without the LLVM backend being enabled is the same. I think the native image generation code is decoupled from the LLVM IR generation code even when the LLVM backend is enabled. The only thing enabling the LLVM backend does is generate LLVM IR using the GraalVM AST during the compilation phase. The generated IR is then compiled and linked into an object file using LLVM's LLD, but this object file is different from the output native image. I don’t think LLVM takes part in native image generation. The debug info generation code I wrote also only adds debug info to the LLVM IR; the debug info being written to the native image should be the same as without the LLVM backend flag being enabled. So when you say the following

There is a good reason why this error is being generated. It is deliberate and reflects the fact that the existing mechanism for generating debug enabled using -g is not compatible with an LLVM-generated image.

There is no LLVM-generated image per se. The native image generated when the LLVM backend is enabled is the same as when it is disabled.

Correct me if I am wrong about the above @loicottet

Nonetheless, I have looked into LLVM API for including debug info, and based on my reading of https://llvm.org/docs/SourceLevelDebugging.html I think LLVM provides an API for debug info generation in total. LLVM provides:

  1. A set of intrinsic functions to declare and track source local variables.
  2. API calls to set line number, file name, and scope information
  3. API calls to declare types, methods, and method parameters
  4. API calls to declare global variables
    etc

I can get started on including this debug information in the LLVM IR output by Graal but I don't think this would affect the output native image.

Let me know if I misunderstood something or need to provide more details.

@adinn
Collaborator

adinn commented Jan 12, 2023

Hi @rishikeshdevsot

The only thing enabling the LLVM backend does is generate LLVM IR using the GraalVM AST during the compilation phase, the generated IR is then compiled and linked into an object file using LLVM's LLD; but this object file is different from the output native image.

The native image generated when the LLVM backend is enabled is the same as when it is disabled.

I am not clear what these two statements are meant to be suggesting. It would help if you could provide a more detailed explanation of what the LLVM back end is doing.

Maybe you could start by clarifying what input LLVM consumes. What does 'the GraalVM AST' mean? Which input (program code in some high level language? LLVM bitcode?) is this AST derived from?

You go on to say "this object file is different from the output native image". A native image is a self-contained executable (I'm assuming you are not generating a shared library). If the image is really 'no different' when the LLVM back end is used then how does the object code generated by the LLVM back end relate to the executable? Is it a completely separate artifact?

Alternatively, do you just mean that the object code that is derived from Java bytecode and generated by the standard back end is generated in the same way, and that enabling the LLVM back end serves to handle some other (non-Java) input and generate extra object code which needs to be linked in to produce the final executable? If so then

  • what is this extra input?
  • what are the symbols this extra object exports?
  • how do those symbols relate to some input to the native image front end?
  • how are those symbols connected to the compiled Java code output by the standard back end?

@loicottet
Member

@rishikeshdevsot At the moment the LLVM backend is only replacing the regular backend to emit the code section of the final Native Image executable from Graal's intermediate representation (IR). This is done by transforming this IR into LLVM bitcode and feeding it to the LLVM compiler to generate an object file containing the code of the program, with symbols pointing to the Graal-generated data sections.

This LLVM-generated object file is then linked with the object file generated by Graal, which is identical to what it would be when not using the LLVM backend except it has an empty code section and exports symbols to individual objects in its data sections. These symbols are only used during linking, and do not end up in the final image.

As a consequence, the final executable looks mostly the same as the executable produced by the standard backend, but the code section contents are different, since they were produced by different compilers.

@adinn
Collaborator

adinn commented Jan 16, 2023

@loicottet Thanks for that explanation.

@rishikeshdevsot I believe this explains why the debug info currently generated for Linux does not work with an image generated using the LLVM back end.

There are various places in the debug info generation scheme where the generator relies on information generated by the GraalVM compiler (more precisely, the compiler as configured for use with SubstrateVM) during compilation or, in some cases, makes assumptions about the layout and operation of the code that compiler generates. This info and the associated assumptions are closely tied to specific, key decisions taken by the compiler during processing of the IR and generation of the machine code. Those decisions don't just determine where the code starts and ends or which internal code address maps to which source line. They also determine things like the stack frame layouts, inline hierarchies, and local var, inlined parameter or constant locations. n.b. given what Loic states, I don't believe LLVM will take any different decisions regarding data layouts for Java types (otherwise the layout of non-code sections, such as the initial heap, would be affected).

Now, if generation of the code (.text) section is handed over to LLVM then this will have two consequences. Firstly, information needed to generate the debug info may no longer be created and passed to the generator. Secondly, LLVM may choose to lay out the compiled code, stack frames and var locations in ways that do not conform to the assumptions made by the current debug info generator.

There are two ways to deal with this. One way would be to a) ensure LLVM provides comparable info to the current generator and b) modify the generator to revise its fixed assumptions so that it can accommodate the different decisions made when using LLVM.

The alternative is, as I suggested, to plug in a different debug info generator that understands how LLVM operates and produces the required debug sections using whatever debug info generation API LLVM provides.

Clearly, a major attraction of the first option is that the generator already knows how to produce complete debug information about the relevant Java types. Another less obvious (but, arguably, just as important) attraction is that the generator handles creation and installation of all the different ELF object debug sections needed to store the various different suites of debug information. It provides utility code for writing basic elements of any given section in the expected format (i.e. at the individual record level below that where compiler decisions inform the structure).

That said, it may not be easy to get LLVM to provide all the desired information needed by the current generator to produce the non-type related debug info -- especially as we would prefer it to arrive in much the same format as the GraalVM compiler does. Likewise it may not be easy a) to identify and precisely define the assumptions LLVM makes about code layouts, stack frames, line numbering, inlining, local/param/constant placements etc or b) to adapt the generator to accommodate them.

The attractiveness of the second option really depends on how much support LLVM provides for debug info generation. Clearly, one downside of using an LLVM API is that it would involve redoing the work that the current generator does to write details of which types exist in the system. How much work that is really depends on how much help the LLVM API provides. Likewise, the nature of the API may determine how much work needs to be done in order to support creation and installation of the ELF debug sections and writing of individual records.

Finally, as a middle path, it might, perhaps, be possible to factor out some of the current generator code as one or more libraries and reuse it in a dedicated LLVM generator.

I cannot really help you make a judgement as to which path to follow as I don't know any of the details of the LLVM API. If you want to investigate further and try to decide how to proceed I'll be happy to read any reports you can provide of what you think is involved in pursuing either path (or the middle route) and provide what advice I can.

@rishikeshdevsot
Author

rishikeshdevsot commented Jan 31, 2023

Hi @adinn,

I did some investigation and the way LLVM debug info generation works is as follows:

  1. The LLVM [DIBuilder API] is used to include debug information in the LLVM IR in the form of LLVM IR metadata and [intrinsic functions]. DIBuilder can be used to include type definitions, function type declarations, source mappings as metadata in the LLVM IR. Local variable declaration, live value locations are included as intrinsic functions in the LLVM IR.
  2. The same llvm compiler (llc) which is currently used to generate the code section of the ELF executable by taking LLVM IR as input, can also be used to perform DWARF emission and generate the debug section when the input LLVM IR has the debug info metadata from step 1.

For example: creating the double type using the DIBuilder API:

DIType DblTy = DBuilder->createBasicType("double", 64, dwarf::DW_ATE_float);

where the arguments are name, size in bits, and encoding (DWARF encoding code). This shows up as metadata inside the LLVM IR as:

!3 = {.} ; [DW_TAG_base_type] [double] [line 0, size 64, align 64, offset 0, enc DW_ATE_float]

which further gets compiled to dwarf as

0x00000097: TAG_base_type [5] AT_name( "double" ) AT_encoding( DW_ATE_float ) AT_byte_size( 0x08 )

Together steps 1 and 2 will drive debug info generation in total.

I believe the approach we should take is to include the debug info metadata in the LLVM IR when the -g flag is enabled and let the LLVM compiler handle the debug info generation. This has the following advantages:

  1. The debug information metadata included in the LLVM IR is designed to be target agnostic and hence supports multiple debug information formats (DWARF/Microsoft CodeView).
  2. “LLVM debug information always provides information to accurately read the source-level state of the program, regardless of which LLVM optimizations have been run” … “LLVM debug information is automatically optimized along with the rest of the program, using existing facilities. For example, duplicate information is automatically merged by the linker, and unused information is automatically removed.” (https://llvm.org/docs/SourceLevelDebugging.html#ccxx-frontend). So if we were to take a different approach and use a separate debug info generator instead of using llc, then we would not be able to perform any LLVM IR optimizations (for example inlining, basic block reordering/merging/cleanup, etc.) when -g is enabled. Or we would have to make the debug info generator aware of any LLVM optimizations being performed.
  3. It allows performing static analysis on the LLVM IR which requires mappings to the source code. This is my current use case: I need to be able to perform static analysis on the LLVM IR and then map certain LLVM IR instructions back to the Java source, which I can do using the debug info metadata in the LLVM IR.
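As a rough illustration of the use case in point 3, the sketch below resolves an instruction's `!dbg` attachment to a file name and line number by reading textual LLVM IR. The class `DbgLocationSketch` and the IR snippet are made up for illustration. The parsing is a simplistic regex pass that assumes every metadata node sits on one line and that the location's scope (for example a DISubprogram) directly names its DIFile; a real analysis would more likely use LLVM's own APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DbgLocationSketch {
    // Matches one-line metadata nodes such as: !12 = !DILocation(line: 5, scope: !7)
    private static final Pattern NODE =
            Pattern.compile("^!(\\d+) = (?:distinct )?!(\\w+)\\((.*)\\)$");
    private static final Pattern DBG_REF = Pattern.compile("!dbg !(\\d+)");

    /** Metadata id -> text between the parentheses of the node. */
    private final Map<Integer, String> bodies = new HashMap<>();

    DbgLocationSketch(String ir) {
        for (String line : ir.split("\n")) {
            Matcher m = NODE.matcher(line.trim());
            if (m.matches()) {
                bodies.put(Integer.parseInt(m.group(1)), m.group(3));
            }
        }
    }

    /** Extracts "key: value" from a node body, stripping quotes from strings. */
    private static String field(String body, String key) {
        if (body == null) {
            return null;
        }
        Matcher m = Pattern.compile("\\b" + key + ": (\"[^\"]*\"|[^,]+)").matcher(body);
        if (!m.find()) {
            return null;
        }
        String value = m.group(1);
        return value.startsWith("\"") ? value.substring(1, value.length() - 1) : value;
    }

    /** Turns a reference such as "!7" into 7, or null if it is not a reference. */
    private static Integer ref(String value) {
        return (value != null && value.startsWith("!")) ? Integer.valueOf(value.substring(1)) : null;
    }

    /** Returns "file:line" for an instruction, or null if it cannot be resolved. */
    String sourceOf(String instruction) {
        Matcher m = DBG_REF.matcher(instruction);
        if (!m.find()) {
            return null;
        }
        String location = bodies.get(Integer.parseInt(m.group(1)));
        String line = field(location, "line");
        Integer scope = ref(field(location, "scope"));
        Integer file = scope == null ? null : ref(field(bodies.get(scope), "file"));
        String fileName = file == null ? null : field(bodies.get(file), "filename");
        return (fileName == null || line == null) ? null : fileName + ":" + line;
    }

    public static void main(String[] args) {
        String ir = String.join("\n",
                "  %sum = add i32 %a, %b, !dbg !12",
                "!1 = !DIFile(filename: \"Calc.java\", directory: \"/src\")",
                "!7 = distinct !DISubprogram(name: \"add\", scope: !1, file: !1, line: 4, unit: !0)",
                "!12 = !DILocation(line: 5, column: 3, scope: !7)");
        DbgLocationSketch index = new DbgLocationSketch(ir);
        System.out.println(index.sourceOf("  %sum = add i32 %a, %b, !dbg !12")); // prints Calc.java:5
    }
}
```

Instructions without a `!dbg` attachment resolve to `null`, which is the case the later llvm-link assertion discussion is concerned with.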

This approach does have the disadvantage that it would involve re-implementing a lot of the existing debug info generator but I think it is still worth the effort.

Let me know if there are any questions about anything I’ve mentioned above.

Also, can you point me to where all the debug info type definitions are currently being done for all the Java types?

@adinn
Collaborator

adinn commented Feb 1, 2023

I believe the approach we should take is to include the debug info metadata in the LLVM IR when the -g flag is enabled and let the LLVM compiler handle the debug info generation

I think that sounds like the correct approach.

This approach does have the disadvantage that it would involve re-implementing a lot of the existing debug info generator but I think it is still worth the effort.

I agree that it is not going to be possible to reuse much of the existing debug info generator code. The bulk of this code is designed to generate complete debug info after compilation has completed based on the specific data currently collected by the compiler as a side-effect of compilation. The LLVM approach appears to require injecting metadata into the LLVM IR as an auxiliary part of the compilation process. However, the current debug info generator code is very tightly designed around consuming the auxiliary (non-IR/non-code) data structures the compiler outputs and transforming them into linker sections organized as DWARF or CodeView debug records. It is hard to see how much of that existing code will be of use when it comes to modifying the compilation process to inject LLVM metadata whose format is different to and independent from the target debug section formats.

What that implies is that when using the LLVM back end the current debug info generation step needs to be disabled. This is provided via an internal feature (class NativeImageDebugInfoFeature) so switching it off merely requires a tweak to method isInConfiguration.
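A minimal sketch of that tweak, under stated assumptions: `FeatureSketch` is a cut-down stand-in for the real feature contract (whose isInConfiguration method also receives an access argument, omitted here), and the two suppliers are hypothetical placeholders for however the build actually determines that -g was passed and that the LLVM back end is selected.

```java
import java.util.function.BooleanSupplier;

/** Cut-down stand-in for the feature contract; the real interface is richer. */
interface FeatureSketch {
    boolean isInConfiguration();
}

/** Stand-in for the existing debug info feature with the extra LLVM check. */
final class DebugInfoFeatureSketch implements FeatureSketch {
    private final BooleanSupplier debugInfoRequested; // hypothetical: true when -g is passed
    private final BooleanSupplier llvmBackendInUse;   // hypothetical: true for the LLVM back end

    DebugInfoFeatureSketch(BooleanSupplier debugInfoRequested, BooleanSupplier llvmBackendInUse) {
        this.debugInfoRequested = debugInfoRequested;
        this.llvmBackendInUse = llvmBackendInUse;
    }

    @Override
    public boolean isInConfiguration() {
        // The existing generator only runs when debug info is requested
        // AND code generation has not been handed over to LLVM.
        return debugInfoRequested.getAsBoolean() && !llvmBackendInUse.getAsBoolean();
    }
}
```

Because the feature is excluded from the configuration altogether, neither of its two generation stages would run for an LLVM build.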

After that, implementing LLVM debug info generation is going to require making modifications to SubstrateLLVMBackend so it injects debug metadata into the generated LLVM IR. If the information stored in a CompilationResult is not enough to produce all the debug metadata you need then you may well also have to modify some of the classes in package com.oracle.svm.core.graal.llvm to ensure it is collected during compilation. That could well involve overriding methods in class Backend. I don't know if it will also necessitate making changes in classes that belong to package org.graalvm.compiler but I hope that will not be necessary.

Obviously, you will also need to inform LLVM of details of the Java type model. That will very likely need to be done before you can inject debug metadata into the LLVM IR for some compiled method.

Also, can you point me to where all the debug info type definitions are currently being done for all the Java types?

I'm not sure exactly what you mean by 'all the Java types' but let me clarify what the current generator caters for. The range of types for which type info is generated is as follows:

  1. Java primitive types
  2. Special (non-Java) object header type (named _objhdr)
  3. Java instance types
  4. Java interface types
  5. Java array types

The data that describes all these different types is constructed in class NativeImageDebugInfoProvider which is an implementation of interface DebugInfoProvider. The main API method that retrieves this data is DebugInfoProvider#typeInfoProvider(). Most of the type records retrieved via this API method are derived from data attached to the heap (NativeImageHeap) and the code cache (NativeImageCodeCache). Some of it (primitive type/header) is synthesized from scratch.

Interface DebugInfoProvider allows the generator to retrieve this type information plus also info about compiled methods (DebugInfoProvider#codeInfoProvider()) and heap data (DebugInfoProvider#dataInfoProvider()) which are all used together to encode complete DWARF or CodeView records. This main interface defines a host of nested interfaces, such as DebugTypeInfo, DebugPrimitiveTypeInfo, DebugHeaderTypeInfo etc, which are implemented by NativeImageDebugInfoProvider using correspondingly named implementation types, NativeImageDebugPrimitiveTypeInfo, NativeImageDebugHeaderTypeInfo, etc. These nested interfaces declare further methods which expose related content like super info, fields, methods, local var info, etc.

I think this API (suite of interfaces) is probably unlikely to suit your needs and even more so the implementation. The API includes methods that you will not need/want to implement and the current implementation can only be created by passing in data that you may well not have available at the point where you need to communicate the type info to LLVM.

I don't really know how/where you would be able to notify LLVM of this type information in order to produce complete LLVM debug metadata, although I assume that you will be able to use the builder API to do so. You may be in a position to achieve this by iterating over the types in the NativeImageHeap as is done in NativeImageDebugInfoProvider. Alternatively, you may need to reach further back into the structures generated by the points-to analysis. I say that because I suspect you will probably need to identify what types will be used in the image before you can inject info into the LLVM IR. That's for the obvious reason that the metadata injected into the LLVM IR will need to refer to types (e.g. to define method owner, param, return and local var types). That might require you to notify type info before a NativeImageHeap has been created or is available to your code.

…g the LLVM backend with debug enabled. Requires turning off the assertions for the llvm-link executable
@rishikeshdevsot
Author

Hi @adinn @loicottet

I wanted to update you on my progress and I have pushed the code I have worked on so far. I’ve been able to set subprograms to functions and add line information to instructions inside functions which then gets compiled to DWARF sections by LLVM’s compiler. The debug_line section shows the location information and the debug_info section shows the function subprogram information. I’ve also added support to generate primitive type information by using the NativeImageHeap which is available before LLVM compilation occurs.
I have refactored the code to avoid the problem adinn highlighted

Your attempt to reuse class NativeImageDebugLocationInfo as a super of class NativeImageDebugLLVMLocationInfo does not work.

by using a helper class.

I’ve used the LLVM DIBuilder API to set subprograms, create types and set debug locations. The API includes debug info metadata in the output LLVM IR. This LLVM IR is then compiled using LLVM’s llc compiler which generates the code section and the debug section.

There are a couple of things I wanted to bring to your attention/want advice on:

  1. The llvm-link executable used in LLVMNativeImageCodeCache.java to link all the individual LLVM bitcode files into a single bitcode file seems to be from a Release+Asserts build. I am currently facing a problem with that: llvm-link has an assertion that ensures that call instructions have a debug location when they are present in functions with subprograms assigned to them. But there are several call instructions (mostly from buildStatepointCall) whose nodes don’t have an associated NodeSourcePosition, so I cannot get their surrounding functions or their source location. This means I will not be able to assign debug locations to these call instructions. But if the function enclosing such a call instruction has a subprogram set, then the llvm-link assertion "inlinable function call in a function with debug info must have a !dbg location" gets triggered. When I disable this assertion and use llvm-link and llc (LLVM’s compiler that generates the code section and the debug section), the compilation is successful and DWARF information is present in the output object file.
  2. I am thinking we should switch from using the llvm-link Release+Asserts build executable to just the Release executable. This problem was also brought up in the official llvm-project repository, where there was a comment about turning this assertion into a warning: "[ICP] Verifier failure: inlinable function call in a function with debug info must have a !dbg location", llvm/llvm-project#57727 (comment).
  3. I have also tried setting the no-inline attribute to the callsites but that did not help with the problem.
  4. Another thing I tried was just to provide a placeholder function name and file name to these call instructions, but that triggers another assertion which checks whether the information of the surrounding function’s subprogram matches the debug information in the call instruction.
  5. Let me know how I can approach this problem.

I think the remaining parts are to include all type information, live variable locations and verifying all the debug information is accurate.

…es in functions for the LLVM backend. Created debug type information by using the NativeImageHeap for the LLVM backend.
@adinn
Collaborator

adinn commented Apr 19, 2023

Hi @rishikeshdevsot Thanks for pursuing this. I am still busy with other changes to the debug info generator but I will look into your refactoring of the debug info provider helper code as soon as I have time.

…utput for the llvm backend.

Simplified the code for cycle checking when recursively generating types.
Ignoring visiting fullPointNodes
…ider of the llvm backend to use the debug info interfaces.
Include unique name generation in the llvm backend as well when debug info is enabled
…eding to disable asserts on the llvm-link binary.

The final executable has DWARF information generated using the llvm compiler present.
@rishikeshdevsot
Author

Hi @loicottet @adinn
I believe I have completed almost everything required for this PR. Debug information metadata for types, locations, variables, subprograms, and compile units is included in the LLVM IR, and the LLVM compiler compiles the metadata to DWARF in the final executable. Running dwarfdump on the final executable shows the debug information generated by LLVM. Since my last update, I have made the following changes:

  1. Added a workaround that allows the LLVM linker to pass without disabling asserts (continuing from my previous post): In LLVMGenerator::buildStatepointCall and LLVMGenerator::buildStatepointInvoke, I included generating placeholder debug information (method is the main method and line number is 0) using the LLVMIRBuilder::setPlaceholderInformation function. I have also added generating placeholder information when the node source position is null for a ValueNode for which debug information is being added.
  2. Generated debug information for function parameters and local variables: For function parameters, I created alloca instructions for each parameter as specified by the LLVM documentation and added declarations for the same. The variable declarations are at the end of each block. So when generating instructions, the variable information is obtained using the getLocalsBySlot method and stored in a linked list called localVarListPerBlock. When NodeLLVMBuilder::doBlock reaches the end of the block, the declarations for all the variables are added.
  3. Generated type information: Types are generated recursively as new type names are encountered when adding LLVM IR instructions or when creating function parameters. Whenever a type name is encountered, the LLVMIRBuilder::getDiType function is called on that type name. If debug information for that type has already been generated, then that is returned. Otherwise, the type is looked up in the NativeImageHeap, and the hostedType is used to generate debug type information using the createDebugTypeInfo function. I have added the class LLVMImageDebugTypeInfo inside the LLVMDebugInfoProvider, which implements DebugTypeInfo similar to the NativeImageDebugTypeInfo class and its child classes. In LLVM, the debug type information cannot be shared across modules because each debug type info is associated with an LLVM DIBuilder, which is module-specific. That is why instead of generating all the debug type information at once, I have opted for the recursive approach where type information is only generated as new types are encountered. There can be cases where a class can have members which are of the same type as the enclosing class (e.g., class A { A memberX = new A(); }). To deal with such cases so that we don’t have a stack overflow, I created an array typesVisitedRecursively which keeps track of already visited classes.
  4. Included location information: The location information and the subprograms for inlined methods are included if the NodeSourcePosition contains this information.
  5. Disabled the NativeImageDebugInfo feature when the LLVM backend is enabled with debug symbols as adinn mentioned

What that implies is that when using the LLVM back end the current debug info generation step needs to be disabled. This is provided via an internal feature (class NativeImageDebugInfoFeature) so switching it off merely requires a tweak to method isInConfiguration.

  6. Refactored the code so that the LLVMDebugInfoProvider implements the DebugInfoProvider interfaces similar to NativeImageDebugInfoProvider to make the code more readable. I have also moved functions common to both the DebugInfo providers to DebugInfoProviderHelper class.
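The recursive, per-module type generation described under "Generated type information" can be sketched as follows. This is not the PR's code: `TypeGraphSketch`, `TypeNode` and the name-to-fields map are hypothetical stand-ins for the DIBuilder-backed types and the NativeImageHeap lookup. The sketch also uses a different cycle-breaking technique from the typesVisitedRecursively array: it registers a type in the cache before descending into its fields, so a self-referential member such as `class A { A memberX; }` resolves to the entry that is still under construction instead of recursing forever.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TypeGraphSketch {
    /** Hypothetical stand-in for a generated debug type. */
    static final class TypeNode {
        final String name;
        final List<TypeNode> fieldTypes = new ArrayList<>();

        TypeNode(String name) {
            this.name = name;
        }
    }

    /** Hypothetical type model: type name -> names of its field types. */
    private final Map<String, List<String>> fieldsOf;

    /** One cache per instance, mirroring the one-DIBuilder-per-module constraint. */
    private final Map<String, TypeNode> cache = new LinkedHashMap<>();

    TypeGraphSketch(Map<String, List<String>> fieldsOf) {
        this.fieldsOf = fieldsOf;
    }

    /** Returns the type for a name, generating it (and its field types) on first use. */
    TypeNode getType(String name) {
        TypeNode cached = cache.get(name);
        if (cached != null) {
            return cached; // already generated, or currently being generated
        }
        TypeNode node = new TypeNode(name);
        cache.put(name, node); // register BEFORE recursing so cycles terminate
        for (String fieldType : fieldsOf.getOrDefault(name, Collections.emptyList())) {
            node.fieldTypes.add(getType(fieldType));
        }
        return node;
    }

    int generatedCount() {
        return cache.size();
    }
}
```

Types unknown to the model (here, names with no map entry) end up as field-less nodes, which corresponds loosely to the placeholder types mentioned for names missing from the NativeImageHeap. Whether the DIBuilder bindings in use allow an equivalent "register first, fill in later" step for real debug types is not established here.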

There were a few things that I am not sure how to implement or deal with:

  1. When a type name is encountered which is not present in the NativeImageHeap (e.g. DynamicHub), I am not sure what to do. For now I’ve just generated a basic placeholder type with the same name.
  2. Obtain enum values: I could not figure out how to obtain the value of an enum using its hostedField. I think I have to use readStorageValue() function but I don’t understand how it is supposed to be used.
  3. Obtain array length: I could not figure out how to obtain the number of elements present in an array given the hostedType of the array. So for now I have given them a constant value.
  4. For cases when the member of the class is of the same type as the enclosing class, I didn’t know what to do. I thought about creating a pointer to the class but that requires the class type to be generated and that in turn requires all the member types to be generated. So for now I have ignored generating a type for such members.

P.S. Do you have any tests for NativeImageDebugInfo that I can use to generate and verify the debug info being generated for llvm backend?

@rishikeshdevsot
Author

Following up on this in case anything else is needed from me to move the PR forward.

Labels
OCA Verified All contributors have signed the Oracle Contributor Agreement.
3 participants