[LangRef] Clarify definition of fragments and debug intrinsics #82019

slinder1 · 2024-02-16T18:27:06Z

Try to give a more complete description of what we call a "fragment", and define "spatial overlap" between variables and fragments to make the details of when one intrinsic ends the effects of another more direct and complete.

Also change the description of the debug intrinsics to the pseudo-grammar style from LangRef.

Try to give a more complete description of what we call a "fragment", and define "spatial overlap" between variables and fragments to make the details of when one intrinsic ends the effects of another more direct and complete. Also change the description of the debug intrinsics to the pseudo-grammar style from LangRef.

llvmbot · 2024-02-16T18:27:26Z

@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-debuginfo

Author: Scott Linder (slinder1)

Changes

Try to give a more complete description of what we call a "fragment", and define "spatial overlap" between variables and fragments to make the details of when one intrinsic ends the effects of another more direct and complete.

Also change the description of the debug intrinsics to the pseudo-grammar style from LangRef.

Full diff: https://github.com/llvm/llvm-project/pull/82019.diff

1 Files Affected:

(modified) llvm/docs/SourceLevelDebugging.rst (+49-17)

diff --git a/llvm/docs/SourceLevelDebugging.rst b/llvm/docs/SourceLevelDebugging.rst
index e1df7c355ee092..ccd455a59f5a5b 100644
--- a/llvm/docs/SourceLevelDebugging.rst
+++ b/llvm/docs/SourceLevelDebugging.rst
@@ -130,6 +130,40 @@ debugging information influences optimization passes then it will be reported
 as a failure.  See :doc:`TestingGuide` for more information on LLVM test
 infrastructure and how to run various tests.
 
+.. _variables_and_variable_fragments:
+
+Variables and Variable Fragments
+================================
+
+Source language variables (or just "variables") are represented by `local
+variable <LangRef.html#dilocalvariable>`_ and `global variable
+<LangRef.html#diglobalvariable>`_ metadata nodes.
+
+When variables are not allocated to contiguous memory or to a single LLVM value
+the metadata must record enough information to describe each "piece" or
+"fragment" of the source variable. Each fragment is a contiguous span of bits
+of the variable it is a part of.
+
+This is achieved by encoding fragment information at the end of the
+``DIExpression`` with the ``DW_OP_LLVM_fragment`` operation, whose operands are
+the bit offset of the fragment relative to the start of the variable, and the
+fragment size in bits.
+
+.. note::
+
+   The ``DW_OP_LLVM_fragment`` operation acts only to encode the fragment
+   information, and does not have an effect on the semantics of the expression.
+
+A debug intrinsic which refers to a ``DIExpression`` ending with a fragment
+operation provides information about the fragment of the variable it refers to,
+rather than the whole variable.
+
+An equivalence relation over the set of all variables and variable fragments
+called "spatially overlapping" is defined in order to describe when intrinsics
+terminate the effects of other intrinsics. A variable spatially overlaps with
+itself and all fragments of itself. Fragments additionally spatially overlap
+with other fragments sharing one common bit of the same variable.
+
 .. _format:
 
 Debugging information format
@@ -180,7 +214,9 @@ track source local variables through optimization and code generation.
 
 .. code-block:: llvm
 
-  void @llvm.dbg.declare(metadata, metadata, metadata)
+  void @llvm.dbg.declare(metadata <type> <value>,
+                         metadata <!DILocalVariable>,
+                         metadata <!DIExpression>)
 
 This intrinsic provides information about a local element (e.g., variable).
 The first argument is metadata holding the address of variable, typically a
@@ -221,7 +257,9 @@ agree on the memory location.
 
 .. code-block:: llvm
 
-  void @llvm.dbg.value(metadata, metadata, metadata)
+  void @llvm.dbg.value(metadata <<type> <value>|!DIArgList>,
+                       metadata <!DILocalVariable>,
+                       metadata <!DIExpression>)
 
 This intrinsic provides information when a user source variable is set to a new
 value.  The first argument is the new value (wrapped as metadata).  The second
@@ -243,12 +281,12 @@ the complex expression derives the direct value.
 
 .. code-block:: llvm
 
-  void @llvm.dbg.assign(Value *Value,
-                        DIExpression *ValueExpression,
-                        DILocalVariable *Variable,
-                        DIAssignID *ID,
-                        Value *Address,
-                        DIExpression *AddressExpression)
+  void @llvm.dbg.assign(metadata <<type> <value>|!DIArgList>,
+                        metadata <!DIExpression>,
+                        metadata <!DILocalVariable>,
+                        metadata <!DIAssignID>,
+                        metadata <type> <value>,
+                        metadata <!DIExpression>)
 
 This intrinsic marks the position in IR where a source assignment occurred. It
 encodes the value of the variable. It references the store, if any, that
@@ -259,13 +297,6 @@ argument is a ``DIAssignID`` used to reference a store. The fifth is the
 destination of the store (wrapped as metadata), and the sixth is a `complex
 expression <LangRef.html#diexpression>`_ that modifies it.
 
-The formal LLVM-IR signature is:
-
-.. code-block:: llvm
-
-  void @llvm.dbg.assign(metadata, metadata, metadata, metadata, metadata, metadata)
-
-
 See :doc:`AssignmentTracking` for more info.
 
 Object lifetimes and scoping
@@ -417,8 +448,9 @@ values through compilation, when objects are promoted to SSA values an
 ``llvm.dbg.value`` intrinsic is created for each assignment, recording the
 variable's new location. Compared with the ``llvm.dbg.declare`` intrinsic:
 
-* A dbg.value terminates the effect of any preceding dbg.values for (any
-  overlapping fragments of) the specified variable.
+* A dbg.value terminates the effect of any preceding dbg.values for any
+  spatially overlapping variables or fragments of the specified variable or
+  fragment.
 * The dbg.value's position in the IR defines where in the instruction stream
   the variable's value changes.
 * Operands can be constants, indicating the variable is assigned a

felipepiovezan · 2024-02-16T18:37:41Z

llvm/docs/SourceLevelDebugging.rst

+<LangRef.html#diglobalvariable>`_ metadata nodes.
+
+When variables are not allocated to contiguous memory or to a single LLVM value
+the metadata must record enough information to describe each "piece" or


One minor concern here, we say that:

When variables are not allocated

And then one of two options:

to contiguous memory
or
to a single LLVM value

This second part feels off to me: "allocated to a single LLVM value"

One suggestion would be to avoid mentioning allocation, and instead combine the two possibilities in terms of "describing":

"When variables cannot be described by a single use of an LLVM value, ...".

Btw I really like the wording of everything else!

I agree that the wording could be improved - I think that your proposed wording is better, but might be ambiguous with respect to variadic debug values, which aren't necessarily fragments but may be "described" by many LLVM values. Perhaps "When variables are split up across multiple LLVM values...", and it may also be worthwhile to be direct and give an example, i.e. "...such as for structs that have been split into separate LLVM values for each member...". No particularly strong feelings on this part from my end though!

Could we say "cannot be described by a single expression" then? IIRC when we have multiple fragments, each fragment is described by its own expression (and its own intrinsic)

I agree, the wording I proposed is imprecise/ambiguous; I think I also bled over concepts from the "use fragments for everything" world I was originally thinking in terms of.

I like the framing in terms of the number of expressions for many cases, but we still have cases where we use a single fragment+expression to describe that only one piece of a variable is defined, right? Maybe rather than starting with the "reason" for fragments I can start with just exactly what they are, then in a note go into the various cases they are used and give some examples?

Maybe rather than starting with the "reason" for fragments

I think you make a good point here. We don't need to justify the reason for fragments here (arguably it is self evident); we can just state what fragments are:

"A variable may be described in terms of fragments, where a fragment is a contiguous span of bits. This is achieved by ... [rest of the second paragraph]".

If we want to provide motivation, we can do so later through an example or just state in high level terms one code transformation that may cause fragments to appear.

SLTozer

Aside from the comments above, this all LGTM - thanks for this!

adrian-prantl · 2024-02-17T00:17:15Z

llvm/docs/SourceLevelDebugging.rst

+Variables and Variable Fragments
+================================
+
+Source language variables (or just "variables") are represented by `local


should we add "and constants"?

Just imo, but I think we could skip mentioning constants, on the basis that the sentence as written isn't exclusive (it doesn't specify that only variables are represented by these classes), and I don't believe(?) that we use fragments for constants since we should always have a single whole definition for them, so the following text would only concern variables. No complaints from me about adding a mention of constants here, though.

I think the terminology gets a bit fuzzy, because "constants" can mean different things in different source languages anyway, right?

I didn't audit every other mention of "variable" in the document, but I imagine this same ambiguity comes up elsewhere, so it would be ideal to get a consistent definition to point back at. DWARF seems to call these "data objects" and says:

Chapter 4 Data Object and Object List Entries This section presents the debugging information entries that describe individual data objects: variables, parameters and constants, and lists of those objects that may be grouped in a single declaration, such as a common block.

Maybe we can just adopt "data object" here?

In the interest of not bike-shedding this patch forever, I'll avoid commenting too much more (it already looks "good enough" to me!), but I think using "data object" outside of the DWARF context may just be adding confusion. Personally I feel like "Variable" is a good term, since the information here explicitly applies to just "variables" and not really to constants, but ultimately I put my vote for either "variables" or "variables and constants".

adrian-prantl · 2024-02-17T00:20:45Z

llvm/docs/SourceLevelDebugging.rst

@@ -417,8 +448,9 @@ values through compilation, when objects are promoted to SSA values an
 ``llvm.dbg.value`` intrinsic is created for each assignment, recording the
 variable's new location. Compared with the ``llvm.dbg.declare`` intrinsic:

-* A dbg.value terminates the effect of any preceding dbg.values for (any
-  overlapping fragments of) the specified variable.
+* A dbg.value terminates the effect of any preceding dbg.values for any


should we be consistent and use llvm.dbg.value here?

adrian-prantl · 2024-02-17T00:24:16Z

llvm/docs/SourceLevelDebugging.rst

-* A dbg.value terminates the effect of any preceding dbg.values for (any
-  overlapping fragments of) the specified variable.
+* A dbg.value terminates the effect of any preceding dbg.values for any
+  spatially overlapping variables or fragments of the specified variable or


The way this is worded sounds as if in this example:

%1 = ... call llvm.dbg.value(metadata %1, DILocalVariable("x"), DIExpression()) %2 = ... call llvm.dbg.value(metadata %1, DILocalVariable("y"), DIExpression())

The second intrinsic ends the range of "x", but that's not the case. It only terminates it if it refers to the same variable or a fragment of it.

Suggested change

spatially overlapping variables or fragments of the specified variable or

spatially overlapping llvm.dbg.values of the specified variable or

Perhaps this - I think given the definition of spatial overlaps, it should be clear that it applies to both fragment and non-fragment variable definitions.

Right, it might be good to add a note here, and to make the term "spatially overlapping" be a link back to the definition, but the intention was to define "spatial overlap" such that this could just talk in terms of the universe of all variables and fragments and by definition still be restricted to the way LLVM currently works. In other words, a dbg.value for "x" cannot spatially overlap with a dbg.value for "y" by definition, so the wording here does not need to repeat that restriction.

Edit: at least it doesn't need to repeat it "normatively"; if a note helps the reader understand I'm happy to include one!

I tried to include more kinds of objects under the umbrella term "variable", but I'm not particularly happy with how circular it sounds. Any more thoughts on better approaches are appreciated. I also didn't understand initially that the current behavior in the most common cases of spatial overlap between fragments is not necessarily the intended semantics. Instead it is just a legal shortcut that balances the complexity of handling partial overlap with arbitrary expressions with the utility to the debugger for these rarer occurences. None of this is terribly exact, but I still think having something written about these aspects is better than having nothing, and might save future contributors a bit of the learning curve. Let me know what you think!

OCHyams

I tried to include more kinds of objects under the umbrella term
"variable", but I'm not particularly happy with how circular it sounds.
Any more thoughts on better approaches are appreciated.

IMO what you have now is good.

This all LGTM, thank you for working on this!

I think everyone's previous concerns have been addressed, but I'll hold off on Accepting to give the other reviewers a chance to have another look at this.

jryans

Thanks for working on this, seems like a good improvement to me. 😄

slinder1 added documentation debuginfo llvm:ir labels Feb 16, 2024

slinder1 requested review from jryans, jmorse, adrian-prantl, SLTozer, t-tye, epilk, OCHyams and kzhuravl February 16, 2024 18:27

felipepiovezan reviewed Feb 16, 2024

View reviewed changes

SLTozer reviewed Feb 16, 2024

View reviewed changes

adrian-prantl reviewed Feb 17, 2024

View reviewed changes

OCHyams reviewed Apr 8, 2024

View reviewed changes

jryans approved these changes Apr 8, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[LangRef] Clarify definition of fragments and debug intrinsics #82019

[LangRef] Clarify definition of fragments and debug intrinsics #82019

slinder1 commented Feb 16, 2024

llvmbot commented Feb 16, 2024 •

edited

felipepiovezan Feb 16, 2024

felipepiovezan Feb 16, 2024

SLTozer Feb 16, 2024

felipepiovezan Feb 16, 2024 •

edited

slinder1 Feb 16, 2024

felipepiovezan Feb 17, 2024

SLTozer left a comment

adrian-prantl Feb 17, 2024

SLTozer Feb 19, 2024

slinder1 Feb 19, 2024

SLTozer Feb 20, 2024

adrian-prantl Feb 17, 2024

adrian-prantl Feb 17, 2024

SLTozer Feb 19, 2024

slinder1 Feb 19, 2024 •

edited

OCHyams left a comment

jryans left a comment

	spatially overlapping variables or fragments of the specified variable or
	spatially overlapping llvm.dbg.values of the specified variable or

[LangRef] Clarify definition of fragments and debug intrinsics #82019

Are you sure you want to change the base?

[LangRef] Clarify definition of fragments and debug intrinsics #82019

Conversation

slinder1 commented Feb 16, 2024

llvmbot commented Feb 16, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

felipepiovezan Feb 16, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SLTozer left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

slinder1 Feb 19, 2024 • edited

Choose a reason for hiding this comment

OCHyams left a comment

Choose a reason for hiding this comment

jryans left a comment

Choose a reason for hiding this comment

llvmbot commented Feb 16, 2024 •

edited

felipepiovezan Feb 16, 2024 •

edited

slinder1 Feb 19, 2024 •

edited