-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LangRef] Clarify definition of fragments and debug intrinsics #82019
base: main
Are you sure you want to change the base?
Conversation
Try to give a more complete description of what we call a "fragment", and define "spatial overlap" between variables and fragments to make the details of when one intrinsic ends the effects of another more direct and complete. Also change the description of the debug intrinsics to the pseudo-grammar style from LangRef.
@llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-debuginfo Author: Scott Linder (slinder1) ChangesTry to give a more complete description of what we call a "fragment", and define "spatial overlap" between variables and fragments to make the details of when one intrinsic ends the effects of another more direct and complete. Also change the description of the debug intrinsics to the pseudo-grammar style from LangRef. Full diff: https://github.com/llvm/llvm-project/pull/82019.diff 1 Files Affected:
diff --git a/llvm/docs/SourceLevelDebugging.rst b/llvm/docs/SourceLevelDebugging.rst
index e1df7c355ee092..ccd455a59f5a5b 100644
--- a/llvm/docs/SourceLevelDebugging.rst
+++ b/llvm/docs/SourceLevelDebugging.rst
@@ -130,6 +130,40 @@ debugging information influences optimization passes then it will be reported
as a failure. See :doc:`TestingGuide` for more information on LLVM test
infrastructure and how to run various tests.
+.. _variables_and_variable_fragments:
+
+Variables and Variable Fragments
+================================
+
+Source language variables (or just "variables") are represented by `local
+variable <LangRef.html#dilocalvariable>`_ and `global variable
+<LangRef.html#diglobalvariable>`_ metadata nodes.
+
+When variables are not allocated to contiguous memory or to a single LLVM value
+the metadata must record enough information to describe each "piece" or
+"fragment" of the source variable. Each fragment is a contiguous span of bits
+of the variable it is a part of.
+
+This is achieved by encoding fragment information at the end of the
+``DIExpression`` with the ``DW_OP_LLVM_fragment`` operation, whose operands are
+the bit offset of the fragment relative to the start of the variable, and the
+fragment size in bits.
+
+.. note::
+
+ The ``DW_OP_LLVM_fragment`` operation acts only to encode the fragment
+ information, and does not have an effect on the semantics of the expression.
+
+A debug intrinsic which refers to a ``DIExpression`` ending with a fragment
+operation provides information about the fragment of the variable it refers to,
+rather than the whole variable.
+
+An equivalence relation over the set of all variables and variable fragments
+called "spatially overlapping" is defined in order to describe when intrinsics
+terminate the effects of other intrinsics. A variable spatially overlaps with
+itself and all fragments of itself. Fragments additionally spatially overlap
+with other fragments sharing one common bit of the same variable.
+
.. _format:
Debugging information format
@@ -180,7 +214,9 @@ track source local variables through optimization and code generation.
.. code-block:: llvm
- void @llvm.dbg.declare(metadata, metadata, metadata)
+ void @llvm.dbg.declare(metadata <type> <value>,
+ metadata <!DILocalVariable>,
+ metadata <!DIExpression>)
This intrinsic provides information about a local element (e.g., variable).
The first argument is metadata holding the address of variable, typically a
@@ -221,7 +257,9 @@ agree on the memory location.
.. code-block:: llvm
- void @llvm.dbg.value(metadata, metadata, metadata)
+ void @llvm.dbg.value(metadata <<type> <value>|!DIArgList>,
+ metadata <!DILocalVariable>,
+ metadata <!DIExpression>)
This intrinsic provides information when a user source variable is set to a new
value. The first argument is the new value (wrapped as metadata). The second
@@ -243,12 +281,12 @@ the complex expression derives the direct value.
.. code-block:: llvm
- void @llvm.dbg.assign(Value *Value,
- DIExpression *ValueExpression,
- DILocalVariable *Variable,
- DIAssignID *ID,
- Value *Address,
- DIExpression *AddressExpression)
+ void @llvm.dbg.assign(metadata <<type> <value>|!DIArgList>,
+ metadata <!DIExpression>,
+ metadata <!DILocalVariable>,
+ metadata <!DIAssignID>,
+ metadata <type> <value>,
+ metadata <!DIExpression>)
This intrinsic marks the position in IR where a source assignment occurred. It
encodes the value of the variable. It references the store, if any, that
@@ -259,13 +297,6 @@ argument is a ``DIAssignID`` used to reference a store. The fifth is the
destination of the store (wrapped as metadata), and the sixth is a `complex
expression <LangRef.html#diexpression>`_ that modifies it.
-The formal LLVM-IR signature is:
-
-.. code-block:: llvm
-
- void @llvm.dbg.assign(metadata, metadata, metadata, metadata, metadata, metadata)
-
-
See :doc:`AssignmentTracking` for more info.
Object lifetimes and scoping
@@ -417,8 +448,9 @@ values through compilation, when objects are promoted to SSA values an
``llvm.dbg.value`` intrinsic is created for each assignment, recording the
variable's new location. Compared with the ``llvm.dbg.declare`` intrinsic:
-* A dbg.value terminates the effect of any preceding dbg.values for (any
- overlapping fragments of) the specified variable.
+* A dbg.value terminates the effect of any preceding dbg.values for any
+ spatially overlapping variables or fragments of the specified variable or
+ fragment.
* The dbg.value's position in the IR defines where in the instruction stream
the variable's value changes.
* Operands can be constants, indicating the variable is assigned a
|
llvm/docs/SourceLevelDebugging.rst
Outdated
<LangRef.html#diglobalvariable>`_ metadata nodes. | ||
|
||
When variables are not allocated to contiguous memory or to a single LLVM value | ||
the metadata must record enough information to describe each "piece" or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor concern here, we say that:
When variables are not allocated
And then one of two options:
to contiguous memory
or
to a single LLVM value
This second part feels off to me: "allocated to a single LLVM value"
One suggestion would be to avoid mentioning allocation, and instead combine the two possibilities in terms of "describing":
"When variables cannot be described by a single use of an LLVM value, ...".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw I really like the wording of everything else!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the wording could be improved - I think that your proposed wording is better, but might be ambiguous with respect to variadic debug values, which aren't necessarily fragments but may be "described" by many LLVM values. Perhaps "When variables are split up across multiple LLVM values...", and it may also be worthwhile to be direct and give an example, i.e. "...such as for structs that have been split into separate LLVM values for each member...". No particularly strong feelings on this part from my end though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we say "cannot be described by a single expression" then? IIRC when we have multiple fragments, each fragment is described by its own expression (and its own intrinsic)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, the wording I proposed is imprecise/ambiguous; I think I also bled over concepts from the "use fragments for everything" world I was originally thinking in terms of.
I like the framing in terms of the number of expressions for many cases, but we still have cases where we use a single fragment+expression to describe that only one piece of a variable is defined, right? Maybe rather than starting with the "reason" for fragments I can start with just exactly what they are, then in a note go into the various cases they are used and give some examples?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rather than starting with the "reason" for fragments
I think you make a good point here. We don't need to justify the reason for fragments here (arguably it is self evident); we can just state what fragments are:
"A variable may be described in terms of fragments, where a fragment is a contiguous span of bits. This is achieved by ... [rest of the second paragraph]".
If we want to provide motivation, we can do so later through an example or just state in high level terms one code transformation that may cause fragments to appear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from the comments above, this all LGTM - thanks for this!
llvm/docs/SourceLevelDebugging.rst
Outdated
Variables and Variable Fragments | ||
================================ | ||
|
||
Source language variables (or just "variables") are represented by `local |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we add "and constants"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just imo, but I think we could skip mentioning constants, on the basis that the sentence as written isn't exclusive (it doesn't specify that only variables are represented by these classes), and I don't believe(?) that we use fragments for constants since we should always have a single whole definition for them, so the following text would only concern variables. No complaints from me about adding a mention of constants here, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the terminology gets a bit fuzzy, because "constants" can mean different things in different source languages anyway, right?
I didn't audit every other mention of "variable" in the document, but I imagine this same ambiguity comes up elsewhere, so it would be ideal to get a consistent definition to point back at. DWARF seems to call these "data objects" and says:
Chapter 4
Data Object and Object List Entries
This section presents the debugging information entries that describe individual
data objects: variables, parameters and constants, and lists of those objects that
may be grouped in a single declaration, such as a common block.
Maybe we can just adopt "data object" here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the interest of not bike-shedding this patch forever, I'll avoid commenting too much more (it already looks "good enough" to me!), but I think using "data object" outside of the DWARF context may just be adding confusion. Personally I feel like "Variable" is a good term, since the information here explicitly applies to just "variables" and not really to constants, but ultimately I put my vote for either "variables" or "variables and constants".
llvm/docs/SourceLevelDebugging.rst
Outdated
@@ -417,8 +448,9 @@ values through compilation, when objects are promoted to SSA values an | |||
``llvm.dbg.value`` intrinsic is created for each assignment, recording the | |||
variable's new location. Compared with the ``llvm.dbg.declare`` intrinsic: | |||
|
|||
* A dbg.value terminates the effect of any preceding dbg.values for (any | |||
overlapping fragments of) the specified variable. | |||
* A dbg.value terminates the effect of any preceding dbg.values for any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we be consistent and use llvm.dbg.value
here?
llvm/docs/SourceLevelDebugging.rst
Outdated
* A dbg.value terminates the effect of any preceding dbg.values for (any | ||
overlapping fragments of) the specified variable. | ||
* A dbg.value terminates the effect of any preceding dbg.values for any | ||
spatially overlapping variables or fragments of the specified variable or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way this is worded sounds as if in this example:
%1 = ...
call llvm.dbg.value(metadata %1, DILocalVariable("x"), DIExpression())
%2 = ...
call llvm.dbg.value(metadata %1, DILocalVariable("y"), DIExpression())
The second intrinsic ends the range of "x", but that's not the case. It only terminates it if it refers to the same variable or a fragment of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spatially overlapping variables or fragments of the specified variable or | |
spatially overlapping llvm.dbg.values of the specified variable or |
Perhaps this - I think given the definition of spatial overlaps, it should be clear that it applies to both fragment and non-fragment variable definitions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, it might be good to add a note here, and to make the term "spatially overlapping" be a link back to the definition, but the intention was to define "spatial overlap" such that this could just talk in terms of the universe of all variables and fragments and by definition still be restricted to the way LLVM currently works. In other words, a dbg.value
for "x" cannot spatially overlap with a dbg.value
for "y" by definition, so the wording here does not need to repeat that restriction.
Edit: at least it doesn't need to repeat it "normatively"; if a note helps the reader understand I'm happy to include one!
I tried to include more kinds of objects under the umbrella term "variable", but I'm not particularly happy with how circular it sounds. Any more thoughts on better approaches are appreciated. I also didn't understand initially that the current behavior in the most common cases of spatial overlap between fragments is not necessarily the intended semantics. Instead it is just a legal shortcut that balances the complexity of handling partial overlap with arbitrary expressions with the utility to the debugger for these rarer occurences. None of this is terribly exact, but I still think having something written about these aspects is better than having nothing, and might save future contributors a bit of the learning curve. Let me know what you think!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to include more kinds of objects under the umbrella term
"variable", but I'm not particularly happy with how circular it sounds.
Any more thoughts on better approaches are appreciated.
IMO what you have now is good.
This all LGTM, thank you for working on this!
I think everyone's previous concerns have been addressed, but I'll hold off on Accepting to give the other reviewers a chance to have another look at this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this, seems like a good improvement to me. 😄
Try to give a more complete description of what we call a "fragment", and define "spatial overlap" between variables and fragments to make the details of when one intrinsic ends the effects of another more direct and complete.
Also change the description of the debug intrinsics to the pseudo-grammar style from LangRef.