[flang] Add design document for debug info generation. #86939

abidh · 2024-03-28T11:57:32Z

This document discusses some options where the debug metadata can be generated.
It also goes through various language constructs and explains how the debug metadata
will look like for that construct and how we can extract that information.

The real point of discussion is how and where to extract the information about various
language features to generate the debug metadata. The structure of the metadata itself
is mostly settled as that is dictated by the DWARF and structure of LLVM IR metadata.
The classic flang and gfortran generate quite similar DWARF for the various language constructs.

This document is based on what Kiran posted in https://reviews.llvm.org/D138534.

This document discusses some options where the debug metadata can be generated. It also goes through various language constructs and explains how the debug metadata will look like for that construct and how we can extract that information.

tblah

Thank you very much for writing this comprehensive design document.

I have a few questions but on the whole I like the approach here.

flang/docs/DebugGeneration.md

Co-authored-by: Tom Eccles <t@freedommail.info>

jeanPerier

Thanks for writing this down, this very useful!
I am in line with the approach. Few questions/notes below.

jeanPerier · 2024-04-02T12:46:28Z

flang/docs/DebugGeneration.md

+members of the derived type.
+
+2. Use of derived type causes the generation of many other global variables.
+Do they need to be retained in debug info?


They would be meaningless to the programmer, I do not think so.

jeanPerier · 2024-04-02T12:51:56Z

flang/docs/DebugGeneration.md

+still some open questions about derived types.
+
+1. What is the correct way to get the offset and alignment information for the
+members of the derived type.


We can generate that info from the FIR RecordType type directly + mlir datalayout (there is currently no utility generating this info since this is left-up to LLVM, but this is doable (similar logic as what happens in fir::getTypeSizeAndAlignment.

jeanPerier · 2024-04-02T12:53:30Z

flang/docs/DebugGeneration.md

+
+3. The location where a component of the derived type is declared is not
+available in RecordType or TypeInfoOp. This probably will need to be
+added.


Do you think it could be added as FusedLoc on the TypeInfoOp?

I think it can work. I looped over all components in createTypeInfoOp and added the locations as FusedLoc. Later in the AddDebugInfo pass, I can see the following.

fir.type_info @_QMhelperTt_pair noinit nodestroy nofinal : !fir.type<_QMhelperTt_pair{i:i32,i1:i32,x:f32}> loc(fused<loc("derived.f90":6:13)>[fused<loc("derived.f90":5:16)>[fused<loc("derived.f90":4:16)>["derived.f90":3:11]]])

jeanPerier · 2024-04-02T12:58:17Z

flang/docs/DebugGeneration.md

+In FIR, the `DeclareOp` points to a `ShapeOp` and we can walk the chain
+to get the value that represents the array bound in any dimension. We will
+create a `DILocalVariableAttr` that will point to that location. This
+variable will be used in the `DISubrangeAttr`. Note that this


Does this need to be a variable as in some memory location, or can a DILocalVariableAttr point to any operation producing an SSA value (like an arith.addi)?

Lowering does not put n+1 in memory when evaluating specification expressions of x(n+1).

The first argument to DebugDeclareOp needs to be an llvm pointer type so an SSA value may not work. I looked at the IR from the classic flang and it does seem to put such expressions to memory.

Could llvm.dbg.value be used instead of llvm.dbg.declare here?
If not, I guess the debug info pass will have to insert allocas for bounds that are non constant SSA values (even if lowering did lowered them in memory, we could not guarantee they would not be promoted to values by some MLIR optimization).

The llvm.dbg.value could work. As you mentioned, there is option of adding allocas during debug info generation. The issue may be to keep the value alive throughout the function.

jeanPerier · 2024-04-02T13:01:58Z

flang/docs/DebugGeneration.md

+#### Assumed Size
+
+This is treated as raw array. Debug information will not provide any bounds
+information.


will not provide any bounds information.

Is this true for any bounds or only the upper bound in the last dimension?

Upper bound in last dimension.

jeanPerier · 2024-04-02T13:20:16Z

flang/docs/DebugGeneration.md

+
+### Namelists
+
+FIR does not seem to have a way to extract information about namelists.


Namelists are handled by creating runtime data structures that describes them. These data structures are globals when possible, and locals otherwise (without fir.declare currently). They can be identified via their mangling using InternalNames.h.

It may be non trivial to generate the debug info from the global/local. Maybe we could abstract that more in FIR/Lowering.

Thanks for the information. Although when I tried a simple namelist example, I did not see a local or global with this name mangling scheme in fir. Ideally we will need something that represent namelist and then refer to DeclareOp of all the variables that are part of the given namelist.

These data structures are created lazily when encountering the namelist in an IO statement, so if your test had no IO statements using it, you would indeed not have seen it. I understand this is not convenient from a debug info point of view, again more abstraction might be needed in lowering here to generate the debug info.

jeanPerier · 2024-04-02T13:22:37Z

flang/docs/DebugGeneration.md

+
+They will be treated like normal variables. Although we may require to handle
+the case where the `DeclareOp` of one variable points to the `DeclareOp` of
+another variable (e.g. a => b).


Currently inlining is not done in FIR, but that would also create DeclareOp chains.

jeanPerier · 2024-04-02T13:31:37Z

flang/docs/DebugGeneration.md

+available in RecordType or TypeInfoOp. This probably will need to be
+added.
+
+### CommonBlocks


Do you know if something special is needed to deal with equivalence outside of Common Blocks, or can the output of the fir.coordinate_of be used for the DILocalVariableAttr?

e.g:

subroutine test_locals() real :: x(100), y equivalence (x(10), y) end subroutine

Thanks for pointing out this case. The DbgDeclareOp takes a 3rd expression argument which we can use to handle cases like this by using something like DIExpression(DW_OP_plus_uconst, 36) for the example case.

jiel-nv · 2024-04-03T20:22:40Z

flang/docs/DebugGeneration.md

+
+### Strings
+
+Fixed sized strings will be treated like fixed sizes arrays and


Starting from LLVM 12, the Fortran string types (fixed length, assumed length, deferred length) LLVM debug metadata are represented by DIStringType (with name, size, stringLength, stringLengthExpression fields).

Thanks for pointing that out. The DIStringType also has stringLocationExpression field which will result in very clean implementation for strings. I will update the document and add the example IR. We will also need to add DIStringTypeAttr in the MLIR.

1. Mention the upper bound of the last dimension in the 'Assumed Size' section. 2. Redo 'Strings' section using 'DIStringType'. 3. Mention 'DIStringType' in 'Missing metadata in MLIR'

jeanPerier

Thanks for the document and the answers to the comments. Looks great!

tblah

LGTM, thanks for all of your work on this!

kiranchandramohan

Nice work. Thanks for the writing this down and the initial work done for this. I have a few questions. But no need to wait for further replies from my side.

flang/docs/DebugGeneration.md

kiranchandramohan · 2024-04-08T09:57:00Z

flang/docs/DebugGeneration.md

+
+The AddDebugFoundationPass will be renamed to AddDebugInfo Pass. The
+information mentioned in the line info section above will be passed to it from
+the driver. This pass will run quite late in the pipeline but before


Does FIRDeclare have all the information? HLFIRDeclare probably has more.

Can you kindly give an example of thing which will be present in HLFIRDeclare and not in FIRDeclare. I don't mind using it but then we will have to run the AddDebugInfo pass early when hlfir is still there.

kiranchandramohan · 2024-04-08T10:00:08Z

flang/docs/DebugGeneration.md

+  - Break at functions.
+  - print type (ptype) of function names.
+  - print values and types (ptype) of various type of variables
+- Manually run `GDB`'s gdb.fortran testsuite with llvm-flang.


Can we add the gdb.fortran tests to the llvm-testsuite just like the gfortran tests are added?

The GDB tests require its tcl based framework to issue commands and look for results. It may not be feasible to add them to llvm-testsuite. Adding a bot that builds flang and then run gdb.fortran tests with it may be a possibility.

kiranchandramohan · 2024-04-08T10:11:30Z

flang/docs/DebugGeneration.md

+`DISubroutineTypeAttr` currently has a fixed type. This will be changed to
+match the signature of the actual function/subroutine.


How do you plan to get the actual signature since this is going to be different from what is there in FIR?

What is meant by actual signature and how this will be different to the information that is in the FIR. My plan was to create the DISubroutineTypeAttr based on the information about arguments and results in FuncOp. A few simple cases that I tried, the results of the ptype command of the GDB were similar for executable generated with flang-new and gfortran.

As discussed in https://reviews.llvm.org/D138534, for the following program, what would be the ptype for the function foo?

program mn integer :: arr(10), cnt call foo(arr, cnt) contains subroutine foo( a, n) integer :: a(:), n end subroutine foo end program

gfortran (Actual signature)

ptype foo type = void (integer(kind=4) (:), integer(kind=4))

Classic Flang (LLVM IR signature)

(gdb) ptype foo type = void (integer*8, integer, integer (187651372366496), uinteger*8)

I expect it to be similar to gfortran.

kiranchandramohan · 2024-04-08T10:19:47Z

flang/docs/DebugGeneration.md

+1. There will be a `datalocation` field which will be an expression. This will
+enable debugger to get the data pointer from array descriptor.
+
+2. The field in `DISubrangeAttr` for array bounds will be expression which will
+allow the debugger to get the bounds from descriptor.


Will all these be generated during FIRToLLVM conversion? How will you get the right offsets from the descriptor?

These will be generated during normal debug attribute generation in AddDebugInfoPass. I am not clear about the 2nd part of the question. If you mean how we will get the offsets of this information in descriptor, I think we know the layout of the descriptor in the memory and can tell where various things are located. If you mean how debugger will get this information, we will encode them in DWARF's expressions which can be decoded by the debuggers.

Please note that most of the examples that I included in the document are known to work. Here are the steps I followed to verify the example debug metadata.

Use flang-new to create LLVM IR file from the sample fortran file.

Manually edit the .ll file to add the example debug metadata

Build the edited file into an executable

Run the executable under GDB and check that debugger is able to extract the right information.

If you mean how we will get the offsets of this information in descriptor

It is this part. The offsets are only available when the descriptor is converted to a structure in the FIRToLLVM conversion pass. So how can you get this info in the AddDebugInfo pass?

I thought the layout was known. For example, the FIRToLLVM seems to rely on the hardcoded values like kDimsPosInBox. We could have similar thing in AddDebugInfo pass but with offsets.

I also inspected the layout of the descriptor at runtime in the debugger and then looked at the CFI_cdesc_t and it matched.

kiranchandramohan · 2024-04-08T10:20:01Z

flang/docs/DebugGeneration.md

+In assumed shape case, the rank can be determined from the FIR's `SequenceType`.
+This allows us to generate a `DISubrangeAttr` in each dimension.


Is this also available from the descriptor?

I mean to say that DISubrangeAttr can encode where the information about the array bounds is located in the descriptor for each dimension.

Review Suggestion. Co-authored-by: Kiran Chandramohan <kiranchandramohan@gmail.com>

@tblah

Currently, cg-rewrite removes the DeclareOp. As AddDebugInfo runs after that, it cannot process the DeclareOp. My initial plan was to make the AddDebugInfo pass run before the cg-rewrite but that has few issues. 1. Initially I was thinking to use the memref op to carry the variable attr. But as @tblah suggested in the llvm#86939, it makes more sense to carry that information on DeclareOp. It also makes it easy to handle it in codegen and there is no special handling needed for arguments. For that, we need to preserve the DeclareOp till the codegen. 2. Running earlier, we will miss the changes in passes that run between cg-rewrite and codegen. But not removing the DeclareOp in cg-rewrite has the issue that ShapeOp remains and it causes errors during codegen. To solve this problem, I convert DeclareOp to XDeclareOp in cg-rewrite instead of removing it. This was mentioned as possible solution by @jeanPerier in https://reviews.llvm.org/D136254 The conversion follows similar logic as used for other operators in that file. The FortranAttr and CudaAttr are currently not converted but left as TODO when the need arise. A later commit will use the XDeclareOp to extract the variable information.

@tblah

We need the information in the `DeclareOp` to generate debug information for variables. Currently, cg-rewrite removes the `DeclareOp`. As `AddDebugInfo` runs after that, it cannot process the `DeclareOp`. My initial plan was to make the `AddDebugInfo` pass run before the cg-rewrite but that has few issues. 1. Initially I was thinking to use the memref op to carry the variable attr. But as @tblah suggested in the #86939, it makes more sense to carry that information on `DeclareOp`. It also makes it easy to handle it in codegen and there is no special handling needed for arguments. For this reason, we need to preserve the `DeclareOp` till the codegen. 2. Running earlier, we will miss the changes in passes that run between cg-rewrite and codegen. But not removing the DeclareOp in cg-rewrite has the issue that ShapeOp remains and it causes errors during codegen. To solve this problem, I convert DeclareOp to XDeclareOp in cg-rewrite instead of removing it. This was mentioned as possible solution by @jeanPerier in https://reviews.llvm.org/D136254 The conversion follows similar logic as used for other operators in that file. The FortranAttr and CudaAttr are currently not converted but left as TODO when the need arise. Now `AddDebugInfo` pass can extracts information about local variables from `XDeclareOp` and creates `DILocalVariableAttr`. These are attached to `XDeclareOp` using `FusedLoc` approach. Codegen can use them to create `DbgDeclareOp`. I have added tests that checks the debug information in mlir from and also in llvm ir. Currently we only handle very limited types. Rest are given a place holder type. The previous placeholder type was basic type with `DW_ATE_address` encoding. When variables are added, it started causing assertions in the llvm debug info generation logic for some types. It has been changed to an interger type to prevent these issues until we handle those types properly.

@tblah

Currently, cg-rewrite removes the DeclareOp. As AddDebugInfo runs after that, it cannot process the DeclareOp. My initial plan was to make the AddDebugInfo pass run before the cg-rewrite but that has few issues. 1. Initially I was thinking to use the memref op to carry the variable attr. But as @tblah suggested in the llvm#86939, it makes more sense to carry that information on DeclareOp. It also makes it easy to handle it in codegen and there is no special handling needed for arguments. For that, we need to preserve the DeclareOp till the codegen. 2. Running earlier, we will miss the changes in passes that run between cg-rewrite and codegen. But not removing the DeclareOp in cg-rewrite has the issue that ShapeOp remains and it causes errors during codegen. To solve this problem, I convert DeclareOp to XDeclareOp in cg-rewrite instead of removing it. This was mentioned as possible solution by @jeanPerier in https://reviews.llvm.org/D136254 The conversion follows similar logic as used for other operators in that file. The FortranAttr and CudaAttr are currently not converted but left as TODO when the need arise. A later commit will use the XDeclareOp to extract the variable information.

abidh requested review from tblah, kiranchandramohan, jeanPerier, dpalermo, kiranktp and vzakhari March 28, 2024 11:57

llvmbot added the flang Flang issues not falling into any other category label Mar 28, 2024

[flang][docs] Add reference to DebugGeneration in toctree.

ab10241

tblah reviewed Mar 28, 2024

View reviewed changes

Apply suggestions from code review

96c4bff

Co-authored-by: Tom Eccles <t@freedommail.info>

jeanPerier reviewed Apr 2, 2024

View reviewed changes

jiel-nv reviewed Apr 3, 2024

View reviewed changes

Handle code review comments.

ef7e623

1. Mention the upper bound of the last dimension in the 'Assumed Size' section. 2. Redo 'Strings' section using 'DIStringType'. 3. Mention 'DIStringType' in 'Missing metadata in MLIR'

jeanPerier approved these changes Apr 5, 2024

View reviewed changes

tblah approved these changes Apr 8, 2024

View reviewed changes

kiranchandramohan approved these changes Apr 8, 2024

View reviewed changes

abidh and others added 2 commits April 11, 2024 14:29

Update flang/docs/DebugGeneration.md

fde384d

Review Suggestion. Co-authored-by: Kiran Chandramohan <kiranchandramohan@gmail.com>

Remove questions that have been answered during code review.

b5838b9

abidh merged commit 357f6c7 into llvm:main Apr 11, 2024
4 of 5 checks passed

abidh deleted the debug_design branch April 11, 2024 16:54

abidh mentioned this pull request May 2, 2024

[flang] Initial debug info support for local variables. #90905

Merged


		### Namelists

		FIR does not seem to have a way to extract information about namelists.


		### Strings

		Fixed sized strings will be treated like fixed sizes arrays and

		`DISubroutineTypeAttr` currently has a fixed type. This will be changed to
		match the signature of the actual function/subroutine.

		In assumed shape case, the rank can be determined from the FIR's `SequenceType`.
		This allows us to generate a `DISubrangeAttr` in each dimension.

[flang] Add design document for debug info generation. #86939

[flang] Add design document for debug info generation. #86939

Conversation

abidh commented Mar 28, 2024

tblah left a comment

Choose a reason for hiding this comment

jeanPerier left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

abidh Apr 4, 2024 • edited Loading

Choose a reason for hiding this comment

jeanPerier left a comment

Choose a reason for hiding this comment

tblah left a comment

Choose a reason for hiding this comment

kiranchandramohan left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kiranchandramohan Apr 11, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

abidh Apr 4, 2024 •

edited

Loading

kiranchandramohan Apr 11, 2024 •

edited

Loading