[DebugInfo] Enforce implicit constraints on distinct MDNodes
Add UNIQUED and DISTINCT properties in Metadata.def and use them to
implement restrictions on the `distinct` property of MDNodes:

* DIExpression can currently be parsed from IR or read from bitcode
  as `distinct`, but this property is silently dropped when printing
  to IR. This causes accepted IR to fail to round-trip. As DIExpression
  appears inline at each use in the canonical form of IR, it cannot
  actually be `distinct` anyway, as there is no syntax to describe it.
* Similarly, DIArgList is conceptually always uniqued. It is currently
  restricted to only appearing in contexts where there is no syntax for
  `distinct`, but for consistency it is treated equivalently to
  DIExpression in this patch.
* DICompileUnit is already restricted to always being `distinct`. Along
  with adding general support for the inverse restriction, this patch
  describes that requirement in Metadata.def and makes the parser check
  general, so future nodes with the same restriction can share the
  support.

The new UNIQUED property applies to DIExpression and DIArgList, and
forbids them from being `distinct`. It also implies they are canonically
printed inline at each use, rather than via MDNode ID.

The new DISTINCT property applies to DICompileUnit, and requires it to
be `distinct`.
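
One way to picture a per-node property like this is as an X-macro table, in
the spirit of a `.def` file. The following is a minimal C++ sketch only: the
`NODE_LIST` macro, the `Distinctness` enum, and `constraintFor` are
hypothetical names chosen for illustration and are not the contents of
Metadata.def. The `DILocation` entry is included only as an example of a node
with no restriction.

```cpp
#include <cassert>
#include <string>

// Hypothetical constraint kinds; the names are illustrative only.
enum class Distinctness { Uniquable, Uniqued, Distinct };

// Hypothetical stand-in for a .def file: one entry per node class,
// pairing the class name with its constraint.
#define NODE_LIST(X)                                                           \
  X(DILocation, Uniquable)                                                     \
  X(DIExpression, Uniqued)                                                     \
  X(DIArgList, Uniqued)                                                        \
  X(DICompileUnit, Distinct)

// Look up the constraint for a node class by name. Unknown names fall
// back to Uniquable, i.e. no restriction.
inline Distinctness constraintFor(const std::string &Name) {
#define X(CLASS, KIND)                                                         \
  if (Name == #CLASS)                                                          \
    return Distinctness::KIND;
  NODE_LIST(X)
#undef X
  return Distinctness::Uniquable;
}
```

With a table of this shape, a parser or printer can query one place, for
example `constraintFor("DICompileUnit")`, instead of special-casing each
class.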

A potential alternative change is to forbid the non-inline syntax for
DIExpression entirely, as is done implicitly with DIArgList by requiring
it to appear in the context of a function. For example, we would forbid:

    !named = !{!0}
    !0 = !DIExpression()

Instead we would only accept the equivalent inlined version:

    !named = !{!DIExpression()}

That alternative would make a `distinct` DIExpression impossible by
construction, as there is no syntax for `distinct` inline. With this
patch as-is, the non-canonical (non-inline) form is still accepted, but
the following is an error and produces a diagnostic:

    !named = !{!0}
    ; error: 'distinct' not allowed for !DIExpression()
    !0 = distinct !DIExpression()
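
The check behind that diagnostic can be sketched in C++ roughly as follows.
This is an illustration under assumptions, not the LLParser code:
`Constraint`, `checkDistinct`, and the wording of the "missing" message are
hypothetical. Only the "not allowed" message text is taken from the example
above.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical constraint kinds for a node class.
enum class Constraint { None, MustBeUniqued, MustBeDistinct };

// Returns a diagnostic message when the presence or absence of the
// `distinct` keyword conflicts with the node's constraint, or
// std::nullopt when the combination is acceptable.
inline std::optional<std::string>
checkDistinct(const std::string &NodeName, Constraint C, bool IsDistinct) {
  if (C == Constraint::MustBeUniqued && IsDistinct)
    return "'distinct' not allowed for !" + NodeName + "()";
  if (C == Constraint::MustBeDistinct && !IsDistinct)
    // Assumed wording; the source does not show this message.
    return "missing 'distinct', required for !" + NodeName + "()";
  return std::nullopt;
}
```

Under this sketch, `checkDistinct("DIExpression", Constraint::MustBeUniqued,
false)` yields no diagnostic, which corresponds to the non-canonical
`!0 = !DIExpression()` form still being accepted.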

Also update some documentation to consistently use the inline syntax for
DIExpression, and to describe the restrictions on `distinct` for nodes
where applicable.

Reviewed By: StephenTozer, t-tye

Differential Revision: https://reviews.llvm.org/D104827
slinder1 committed Nov 9, 2021
1 parent 181763d commit ee76525
Showing 18 changed files with 585 additions and 394 deletions.
60 changes: 32 additions & 28 deletions llvm/docs/LangRef.rst
@@ -5200,21 +5200,22 @@ metadata nodes are related to debug info.
DICompileUnit
"""""""""""""

``DICompileUnit`` nodes represent a compile unit. The ``enums:``,
``retainedTypes:``, ``globals:``, ``imports:`` and ``macros:`` fields are tuples
containing the debug info to be emitted along with the compile unit, regardless
of code optimizations (some nodes are only emitted if there are references to
them from instructions). The ``debugInfoForProfiling:`` field is a boolean
indicating whether or not line-table discriminators are updated to provide
more-accurate debug info for profiling results.
``DICompileUnit`` nodes represent a compile unit. ``DICompileUnit`` nodes must
be ``distinct``. The ``enums:``, ``retainedTypes:``, ``globals:``, ``imports:``
and ``macros:`` fields are tuples containing the debug info to be emitted along
with the compile unit, regardless of code optimizations (some nodes are only
emitted if there are references to them from instructions). The
``debugInfoForProfiling:`` field is a boolean indicating whether or not
line-table discriminators are updated to provide more-accurate debug info for
profiling results.

.. code-block:: text

!0 = !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang",
isOptimized: true, flags: "-O2", runtimeVersion: 2,
splitDebugFilename: "abc.debug", emissionKind: FullDebug,
enums: !2, retainedTypes: !3, globals: !4, imports: !5,
macros: !6, dwoId: 0x0abcd)
!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang",
isOptimized: true, flags: "-O2", runtimeVersion: 2,
splitDebugFilename: "abc.debug", emissionKind: FullDebug,
enums: !2, retainedTypes: !3, globals: !4, imports: !5,
macros: !6, dwoId: 0x0abcd)

Compile unit descriptors provide the root scope for objects declared in a
specific compilation unit. File descriptors are defined using this scope. These
@@ -5625,12 +5626,14 @@ DIExpression
""""""""""""

``DIExpression`` nodes represent expressions that are inspired by the DWARF
expression language. They are used in :ref:`debug intrinsics<dbg_intrinsics>`
(such as ``llvm.dbg.declare`` and ``llvm.dbg.value``) to describe how the
referenced LLVM variable relates to the source language variable. Debug
intrinsics are interpreted left-to-right: start by pushing the value/address
operand of the intrinsic onto a stack, then repeatedly push and evaluate
opcodes from the DIExpression until the final variable description is produced.
expression language. ``DIExpression`` nodes must not be ``distinct``, and are
canonically printed inline at each use. They are used in :ref:`debug
intrinsics<dbg_intrinsics>` (such as ``llvm.dbg.declare`` and
``llvm.dbg.value``) to describe how the referenced LLVM variable relates to the
source language variable. Debug intrinsics are interpreted left-to-right: start
by pushing the value/address operand of the intrinsic onto a stack, then
repeatedly push and evaluate opcodes from the DIExpression until the final
variable description is produced.

The current supported opcode vocabulary is limited:

@@ -5708,23 +5711,23 @@ The current supported opcode vocabulary is limited:

IR for "*ptr = 4;"
--------------
call void @llvm.dbg.value(metadata i32 4, metadata !17, metadata !20)
call void @llvm.dbg.value(metadata i32 4, metadata !17,
metadata !DIExpression(DW_OP_LLVM_implicit_pointer)))
!17 = !DILocalVariable(name: "ptr1", scope: !12, file: !3, line: 5,
type: !18)
!18 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !19, size: 64)
!19 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!20 = !DIExpression(DW_OP_LLVM_implicit_pointer))

IR for "**ptr = 4;"
--------------
call void @llvm.dbg.value(metadata i32 4, metadata !17, metadata !21)
call void @llvm.dbg.value(metadata i32 4, metadata !17,
metadata !DIExpression(DW_OP_LLVM_implicit_pointer,
DW_OP_LLVM_implicit_pointer)))
!17 = !DILocalVariable(name: "ptr1", scope: !12, file: !3, line: 5,
type: !18)
!18 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !19, size: 64)
!19 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !20, size: 64)
!20 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!21 = !DIExpression(DW_OP_LLVM_implicit_pointer,
DW_OP_LLVM_implicit_pointer))

DWARF specifies three kinds of simple location descriptions: Register, memory,
and implicit location descriptions. Note that a location description is
@@ -5765,12 +5768,13 @@ valid debug intrinsic.
DIArgList
""""""""""""

``DIArgList`` nodes hold a list of constant or SSA value references. These are
used in :ref:`debug intrinsics<dbg_intrinsics>` (currently only in
``DIArgList`` nodes hold a list of constant or SSA value references.
``DIArgList`` must not be ``distinct``, must only be used as an argument to a
function call, and must appear inline at each use. ``DIArgList`` may refer to
function-local values of the containing function. ``DIArgList`` nodes are used
in :ref:`debug intrinsics<dbg_intrinsics>` (currently only in
``llvm.dbg.value``) in combination with a ``DIExpression`` that uses the
``DW_OP_LLVM_arg`` operator. Because a DIArgList may refer to local values
within a function, it must only be used as a function argument, must always be
inlined, and cannot appear in named metadata.
``DW_OP_LLVM_arg`` operator.

.. code-block:: text

81 changes: 40 additions & 41 deletions llvm/docs/SourceLevelDebugging.rst
@@ -291,17 +291,17 @@ Compiled to LLVM, this function would be represented like this:
%X = alloca i32, align 4
%Y = alloca i32, align 4
%Z = alloca i32, align 4
call void @llvm.dbg.declare(metadata i32* %X, metadata !11, metadata !13), !dbg !14
store i32 21, i32* %X, align 4, !dbg !14
call void @llvm.dbg.declare(metadata i32* %Y, metadata !15, metadata !13), !dbg !16
store i32 22, i32* %Y, align 4, !dbg !16
call void @llvm.dbg.declare(metadata i32* %Z, metadata !17, metadata !13), !dbg !19
store i32 23, i32* %Z, align 4, !dbg !19
%0 = load i32, i32* %X, align 4, !dbg !20
store i32 %0, i32* %Z, align 4, !dbg !21
%1 = load i32, i32* %Y, align 4, !dbg !22
store i32 %1, i32* %X, align 4, !dbg !23
ret void, !dbg !24
call void @llvm.dbg.declare(metadata i32* %X, metadata !11, metadata !DIExpression()), !dbg !13
store i32 21, i32* %X, align 4, !dbg !13
call void @llvm.dbg.declare(metadata i32* %Y, metadata !14, metadata !DIExpression()), !dbg !15
store i32 22, i32* %Y, align 4, !dbg !15
call void @llvm.dbg.declare(metadata i32* %Z, metadata !16, metadata !DIExpression()), !dbg !18
store i32 23, i32* %Z, align 4, !dbg !18
%0 = load i32, i32* %X, align 4, !dbg !19
store i32 %0, i32* %Z, align 4, !dbg !20
%1 = load i32, i32* %Y, align 4, !dbg !21
store i32 %1, i32* %X, align 4, !dbg !22
ret void, !dbg !23
}
; Function Attrs: nounwind readnone
@@ -327,18 +327,17 @@ Compiled to LLVM, this function would be represented like this:
!10 = !{!"clang version 3.7.0 (trunk 231150) (llvm/trunk 231154)"}
!11 = !DILocalVariable(name: "X", scope: !4, file: !1, line: 2, type: !12)
!12 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
!13 = !DIExpression()
!14 = !DILocation(line: 2, column: 9, scope: !4)
!15 = !DILocalVariable(name: "Y", scope: !4, file: !1, line: 3, type: !12)
!16 = !DILocation(line: 3, column: 9, scope: !4)
!17 = !DILocalVariable(name: "Z", scope: !18, file: !1, line: 5, type: !12)
!18 = distinct !DILexicalBlock(scope: !4, file: !1, line: 4, column: 5)
!19 = !DILocation(line: 5, column: 11, scope: !18)
!20 = !DILocation(line: 6, column: 11, scope: !18)
!21 = !DILocation(line: 6, column: 9, scope: !18)
!22 = !DILocation(line: 8, column: 9, scope: !4)
!23 = !DILocation(line: 8, column: 7, scope: !4)
!24 = !DILocation(line: 9, column: 3, scope: !4)
!13 = !DILocation(line: 2, column: 9, scope: !4)
!14 = !DILocalVariable(name: "Y", scope: !4, file: !1, line: 3, type: !12)
!15 = !DILocation(line: 3, column: 9, scope: !4)
!16 = !DILocalVariable(name: "Z", scope: !17, file: !1, line: 5, type: !12)
!17 = distinct !DILexicalBlock(scope: !4, file: !1, line: 4, column: 5)
!18 = !DILocation(line: 5, column: 11, scope: !17)
!19 = !DILocation(line: 6, column: 11, scope: !17)
!20 = !DILocation(line: 6, column: 9, scope: !17)
!21 = !DILocation(line: 8, column: 9, scope: !4)
!22 = !DILocation(line: 8, column: 7, scope: !4)
!23 = !DILocation(line: 9, column: 3, scope: !4)
This example illustrates a few important details about LLVM debugging
@@ -349,21 +348,21 @@ variable definitions, and the code used to implement the function.

.. code-block:: llvm
call void @llvm.dbg.declare(metadata i32* %X, metadata !11, metadata !13), !dbg !14
call void @llvm.dbg.declare(metadata i32* %X, metadata !11, metadata !DIExpression()), !dbg !13
; [debug line = 2:7] [debug variable = X]
The first intrinsic ``%llvm.dbg.declare`` encodes debugging information for the
variable ``X``. The metadata ``!dbg !14`` attached to the intrinsic provides
variable ``X``. The metadata ``!dbg !13`` attached to the intrinsic provides
scope information for the variable ``X``.

.. code-block:: text
!14 = !DILocation(line: 2, column: 9, scope: !4)
!13 = !DILocation(line: 2, column: 9, scope: !4)
!4 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 1, type: !5,
isLocal: false, isDefinition: true, scopeLine: 1,
isOptimized: false, retainedNodes: !2)
Here ``!14`` is metadata providing `location information
Here ``!13`` is metadata providing `location information
<LangRef.html#dilocation>`_. In this example, scope is encoded by ``!4``, a
`subprogram descriptor <LangRef.html#disubprogram>`_. This way the location
information attached to the intrinsics indicates that the variable ``X`` is
@@ -373,20 +372,20 @@ Now lets take another example.

.. code-block:: llvm
call void @llvm.dbg.declare(metadata i32* %Z, metadata !17, metadata !13), !dbg !19
call void @llvm.dbg.declare(metadata i32* %Z, metadata !16, metadata !DIExpression()), !dbg !18
; [debug line = 5:9] [debug variable = Z]
The third intrinsic ``%llvm.dbg.declare`` encodes debugging information for
variable ``Z``. The metadata ``!dbg !19`` attached to the intrinsic provides
variable ``Z``. The metadata ``!dbg !18`` attached to the intrinsic provides
scope information for the variable ``Z``.

.. code-block:: text
!18 = distinct !DILexicalBlock(scope: !4, file: !1, line: 4, column: 5)
!19 = !DILocation(line: 5, column: 11, scope: !18)
!17 = distinct !DILexicalBlock(scope: !4, file: !1, line: 4, column: 5)
!18 = !DILocation(line: 5, column: 11, scope: !17)
Here ``!19`` indicates that ``Z`` is declared at line number 5 and column
number 11 inside of lexical scope ``!18``. The lexical scope itself resides
Here ``!18`` indicates that ``Z`` is declared at line number 5 and column
number 11 inside of lexical scope ``!17``. The lexical scope itself resides
inside of subprogram ``!4`` described above.

The scope information attached with each instruction provides a straightforward
@@ -802,14 +801,14 @@ presents several difficulties:
br label %exit, !dbg !26
truebr:
call void @llvm.dbg.value(metadata i32 %input, metadata !30, metadata !DIExpression()), !dbg !24
call void @llvm.dbg.value(metadata i32 1, metadata !23, metadata !DIExpression()), !dbg !24
call void @llvm.dbg.value(metadata i32 %input, metadata !30, metadata !DIExpression()), !dbg !23
call void @llvm.dbg.value(metadata i32 1, metadata !22, metadata !DIExpression()), !dbg !23
%value1 = add i32 %input, 1
br label %bb1
falsebr:
call void @llvm.dbg.value(metadata i32 %input, metadata !30, metadata !DIExpression()), !dbg !24
call void @llvm.dbg.value(metadata i32 2, metadata !23, metadata !DIExpression()), !dbg !24
call void @llvm.dbg.value(metadata i32 %input, metadata !30, metadata !DIExpression()), !dbg !23
call void @llvm.dbg.value(metadata i32 2, metadata !22, metadata !DIExpression()), !dbg !23
%value = add i32 %input, 2
br label %bb1
@@ -820,7 +819,7 @@ presents several difficulties:
Here the difficulties are:

* The control flow is roughly the opposite of basic block order
* The value of the ``!23`` variable merges into ``%bb1``, but there is no PHI
* The value of the ``!22`` variable merges into ``%bb1``, but there is no PHI
node

As mentioned above, the ``llvm.dbg.value`` intrinsics essentially form an
@@ -833,9 +832,9 @@ location, which would lead to a large number of debugging intrinsics being
generated.

Examining the example above, variable ``!30`` is assigned ``%input`` on both
conditional paths through the function, while ``!23`` is assigned differing
conditional paths through the function, while ``!22`` is assigned differing
constant values on either path. Where control flow merges in ``%bb1`` we would
want ``!30`` to keep its location (``%input``), but ``!23`` to become undefined
want ``!30`` to keep its location (``%input``), but ``!22`` to become undefined
as we cannot determine at runtime what value it should have in %bb1 without
inserting a PHI node. mem2reg does not insert the PHI node to avoid changing
codegen when debugging is enabled, and does not insert the other dbg.values
@@ -854,7 +853,7 @@ DbgEntityHistoryCalculator) to build a map of each instruction to every
valid variable location, without the need to consider control flow. From
the example above, it is otherwise difficult to determine that the location
of variable ``!30`` should flow "up" into block ``%bb1``, but that the location
of variable ``!23`` should not flow "down" into the ``%exit`` block.
of variable ``!22`` should not flow "down" into the ``%exit`` block.

.. _ccxx_frontend:

3 changes: 2 additions & 1 deletion llvm/include/llvm/AsmParser/LLParser.h
@@ -520,7 +520,8 @@ namespace llvm {
template <class ParserTy> bool parseMDFieldsImplBody(ParserTy ParseField);
template <class ParserTy>
bool parseMDFieldsImpl(ParserTy ParseField, LocTy &ClosingLoc);
bool parseSpecializedMDNode(MDNode *&N, bool IsDistinct = false);
bool parseSpecializedMDNode(MDNode *&N, bool IsDistinct = false,
LocTy DistinctLoc = LocTy());

#define HANDLE_SPECIALIZED_MDNODE_LEAF(CLASS) \
bool parse##CLASS(MDNode *&Result, bool IsDistinct);
