Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LangRef] Clarify definition of fragments and debug intrinsics #82019

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

slinder1
Copy link
Contributor

Try to give a more complete description of what we call a "fragment", and define "spatial overlap" between variables and fragments to make the details of when one intrinsic ends the effects of another more direct and complete.

Also change the description of the debug intrinsics to the pseudo-grammar style from LangRef.

Try to give a more complete description of what we call a "fragment",
and define "spatial overlap" between variables and fragments to make
the details of when one intrinsic ends the effects of another more
direct and complete.

Also change the description of the debug intrinsics to the
pseudo-grammar style from LangRef.
@llvmbot
Copy link
Collaborator

llvmbot commented Feb 16, 2024

@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-debuginfo

Author: Scott Linder (slinder1)

Changes

Try to give a more complete description of what we call a "fragment", and define "spatial overlap" between variables and fragments to make the details of when one intrinsic ends the effects of another more direct and complete.

Also change the description of the debug intrinsics to the pseudo-grammar style from LangRef.


Full diff: https://github.com/llvm/llvm-project/pull/82019.diff

1 Files Affected:

  • (modified) llvm/docs/SourceLevelDebugging.rst (+49-17)
diff --git a/llvm/docs/SourceLevelDebugging.rst b/llvm/docs/SourceLevelDebugging.rst
index e1df7c355ee092..ccd455a59f5a5b 100644
--- a/llvm/docs/SourceLevelDebugging.rst
+++ b/llvm/docs/SourceLevelDebugging.rst
@@ -130,6 +130,40 @@ debugging information influences optimization passes then it will be reported
 as a failure.  See :doc:`TestingGuide` for more information on LLVM test
 infrastructure and how to run various tests.
 
+.. _variables_and_variable_fragments:
+
+Variables and Variable Fragments
+================================
+
+Source language variables (or just "variables") are represented by `local
+variable <LangRef.html#dilocalvariable>`_ and `global variable
+<LangRef.html#diglobalvariable>`_ metadata nodes.
+
+When variables are not allocated to contiguous memory or to a single LLVM value
+the metadata must record enough information to describe each "piece" or
+"fragment" of the source variable. Each fragment is a contiguous span of bits
+of the variable it is a part of.
+
+This is achieved by encoding fragment information at the end of the
+``DIExpression`` with the ``DW_OP_LLVM_fragment`` operation, whose operands are
+the bit offset of the fragment relative to the start of the variable, and the
+fragment size in bits.
+
+.. note::
+
+   The ``DW_OP_LLVM_fragment`` operation acts only to encode the fragment
+   information, and does not have an effect on the semantics of the expression.
+
+A debug intrinsic which refers to a ``DIExpression`` ending with a fragment
+operation provides information about the fragment of the variable it refers to,
+rather than the whole variable.
+
+An equivalence relation over the set of all variables and variable fragments
+called "spatially overlapping" is defined in order to describe when intrinsics
+terminate the effects of other intrinsics. A variable spatially overlaps with
+itself and all fragments of itself. Fragments additionally spatially overlap
+with other fragments sharing one common bit of the same variable.
+
 .. _format:
 
 Debugging information format
@@ -180,7 +214,9 @@ track source local variables through optimization and code generation.
 
 .. code-block:: llvm
 
-  void @llvm.dbg.declare(metadata, metadata, metadata)
+  void @llvm.dbg.declare(metadata <type> <value>,
+                         metadata <!DILocalVariable>,
+                         metadata <!DIExpression>)
 
 This intrinsic provides information about a local element (e.g., variable).
 The first argument is metadata holding the address of variable, typically a
@@ -221,7 +257,9 @@ agree on the memory location.
 
 .. code-block:: llvm
 
-  void @llvm.dbg.value(metadata, metadata, metadata)
+  void @llvm.dbg.value(metadata <<type> <value>|!DIArgList>,
+                       metadata <!DILocalVariable>,
+                       metadata <!DIExpression>)
 
 This intrinsic provides information when a user source variable is set to a new
 value.  The first argument is the new value (wrapped as metadata).  The second
@@ -243,12 +281,12 @@ the complex expression derives the direct value.
 
 .. code-block:: llvm
 
-  void @llvm.dbg.assign(Value *Value,
-                        DIExpression *ValueExpression,
-                        DILocalVariable *Variable,
-                        DIAssignID *ID,
-                        Value *Address,
-                        DIExpression *AddressExpression)
+  void @llvm.dbg.assign(metadata <<type> <value>|!DIArgList>,
+                        metadata <!DIExpression>,
+                        metadata <!DILocalVariable>,
+                        metadata <!DIAssignID>,
+                        metadata <type> <value>,
+                        metadata <!DIExpression>)
 
 This intrinsic marks the position in IR where a source assignment occurred. It
 encodes the value of the variable. It references the store, if any, that
@@ -259,13 +297,6 @@ argument is a ``DIAssignID`` used to reference a store. The fifth is the
 destination of the store (wrapped as metadata), and the sixth is a `complex
 expression <LangRef.html#diexpression>`_ that modifies it.
 
-The formal LLVM-IR signature is:
-
-.. code-block:: llvm
-
-  void @llvm.dbg.assign(metadata, metadata, metadata, metadata, metadata, metadata)
-
-
 See :doc:`AssignmentTracking` for more info.
 
 Object lifetimes and scoping
@@ -417,8 +448,9 @@ values through compilation, when objects are promoted to SSA values an
 ``llvm.dbg.value`` intrinsic is created for each assignment, recording the
 variable's new location. Compared with the ``llvm.dbg.declare`` intrinsic:
 
-* A dbg.value terminates the effect of any preceding dbg.values for (any
-  overlapping fragments of) the specified variable.
+* A dbg.value terminates the effect of any preceding dbg.values for any
+  spatially overlapping variables or fragments of the specified variable or
+  fragment.
 * The dbg.value's position in the IR defines where in the instruction stream
   the variable's value changes.
 * Operands can be constants, indicating the variable is assigned a

<LangRef.html#diglobalvariable>`_ metadata nodes.

When variables are not allocated to contiguous memory or to a single LLVM value
the metadata must record enough information to describe each "piece" or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor concern here, we say that:

When variables are not allocated

And then one of two options:

to contiguous memory
or
to a single LLVM value

This second part feels off to me: "allocated to a single LLVM value"

One suggestion would be to avoid mentioning allocation, and instead combine the two possibilities in terms of "describing":

"When variables cannot be described by a single use of an LLVM value, ...".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw I really like the wording of everything else!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the wording could be improved - I think that your proposed wording is better, but might be ambiguous with respect to variadic debug values, which aren't necessarily fragments but may be "described" by many LLVM values. Perhaps "When variables are split up across multiple LLVM values...", and it may also be worthwhile to be direct and give an example, i.e. "...such as for structs that have been split into separate LLVM values for each member...". No particularly strong feelings on this part from my end though!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we say "cannot be described by a single expression" then? IIRC when we have multiple fragments, each fragment is described by its own expression (and its own intrinsic)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, the wording I proposed is imprecise/ambiguous; I think I also bled over concepts from the "use fragments for everything" world I was originally thinking in terms of.

I like the framing in terms of the number of expressions for many cases, but we still have cases where we use a single fragment+expression to describe that only one piece of a variable is defined, right? Maybe rather than starting with the "reason" for fragments I can start with just exactly what they are, then in a note go into the various cases they are used and give some examples?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rather than starting with the "reason" for fragments

I think you make a good point here. We don't need to justify the reason for fragments here (arguably it is self evident); we can just state what fragments are:

"A variable may be described in terms of fragments, where a fragment is a contiguous span of bits. This is achieved by ... [rest of the second paragraph]".

If we want to provide motivation, we can do so later through an example or just state in high level terms one code transformation that may cause fragments to appear.

Copy link
Contributor

@SLTozer SLTozer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from the comments above, this all LGTM - thanks for this!

Variables and Variable Fragments
================================

Source language variables (or just "variables") are represented by `local
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add "and constants"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just imo, but I think we could skip mentioning constants, on the basis that the sentence as written isn't exclusive (it doesn't specify that only variables are represented by these classes), and I don't believe(?) that we use fragments for constants since we should always have a single whole definition for them, so the following text would only concern variables. No complaints from me about adding a mention of constants here, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the terminology gets a bit fuzzy, because "constants" can mean different things in different source languages anyway, right?

I didn't audit every other mention of "variable" in the document, but I imagine this same ambiguity comes up elsewhere, so it would be ideal to get a consistent definition to point back at. DWARF seems to call these "data objects" and says:

Chapter 4
Data Object and Object List Entries
This section presents the debugging information entries that describe individual
data objects: variables, parameters and constants, and lists of those objects that
may be grouped in a single declaration, such as a common block.

Maybe we can just adopt "data object" here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the interest of not bike-shedding this patch forever, I'll avoid commenting too much more (it already looks "good enough" to me!), but I think using "data object" outside of the DWARF context may just be adding confusion. Personally I feel like "Variable" is a good term, since the information here explicitly applies to just "variables" and not really to constants, but ultimately I put my vote for either "variables" or "variables and constants".

@@ -417,8 +448,9 @@ values through compilation, when objects are promoted to SSA values an
``llvm.dbg.value`` intrinsic is created for each assignment, recording the
variable's new location. Compared with the ``llvm.dbg.declare`` intrinsic:

* A dbg.value terminates the effect of any preceding dbg.values for (any
overlapping fragments of) the specified variable.
* A dbg.value terminates the effect of any preceding dbg.values for any
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we be consistent and use llvm.dbg.value here?

* A dbg.value terminates the effect of any preceding dbg.values for (any
overlapping fragments of) the specified variable.
* A dbg.value terminates the effect of any preceding dbg.values for any
spatially overlapping variables or fragments of the specified variable or
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way this is worded sounds as if in this example:

%1 = ...
call llvm.dbg.value(metadata %1, DILocalVariable("x"), DIExpression())
%2 = ...
call llvm.dbg.value(metadata %1, DILocalVariable("y"), DIExpression())

The second intrinsic ends the range of "x", but that's not the case. It only terminates it if it refers to the same variable or a fragment of it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
spatially overlapping variables or fragments of the specified variable or
spatially overlapping llvm.dbg.values of the specified variable or

Perhaps this - I think given the definition of spatial overlaps, it should be clear that it applies to both fragment and non-fragment variable definitions.

Copy link
Contributor Author

@slinder1 slinder1 Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it might be good to add a note here, and to make the term "spatially overlapping" be a link back to the definition, but the intention was to define "spatial overlap" such that this could just talk in terms of the universe of all variables and fragments and by definition still be restricted to the way LLVM currently works. In other words, a dbg.value for "x" cannot spatially overlap with a dbg.value for "y" by definition, so the wording here does not need to repeat that restriction.

Edit: at least it doesn't need to repeat it "normatively"; if a note helps the reader understand I'm happy to include one!

I tried to include more kinds of objects under the umbrella term
"variable", but I'm not particularly happy with how circular it sounds.
Any more thoughts on better approaches are appreciated.

I also didn't understand initially that the current behavior in the most
common cases of spatial overlap between fragments is not necessarily
the intended semantics. Instead it is just a legal shortcut
that balances the complexity of handling partial overlap with arbitrary
expressions with the utility to the debugger for these rarer occurences.

None of this is terribly exact, but I still think having something
written about these aspects is better than having nothing, and might
save future contributors a bit of the learning curve. Let me know what
you think!
Copy link
Contributor

@OCHyams OCHyams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to include more kinds of objects under the umbrella term
"variable", but I'm not particularly happy with how circular it sounds.
Any more thoughts on better approaches are appreciated.

IMO what you have now is good.

This all LGTM, thank you for working on this!

I think everyone's previous concerns have been addressed, but I'll hold off on Accepting to give the other reviewers a chance to have another look at this.

Copy link
Member

@jryans jryans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this, seems like a good improvement to me. 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants