Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LangRef] Try to clarify some Metadata semantics #81948

Merged
merged 3 commits into from
Mar 28, 2024

Conversation

slinder1
Copy link
Contributor

General cleanup in LangRef (and two outdated comments in LLParser.cpp) with the aim of making it easier to understand some of the terminology and subtle idiosyncrasies related to metadata in the IR.

I'm still not happy with the fact that "node" is used both informally and with a particular category of metadata in mind, depending on the context. This also bleeds into the type names in the implementation.

There are also several places where names from the implementation appear in the document with no other context or definition. In some cases I added a parenthetical to section titles to tie the two together, but I don't think this is ideal.

I also think it might be useful to define the "abstract" metadata classes like "DIScope" in the document, so the hierarchy of metadata node kinds is direct, and so we can avoid repetitive descriptions of all of the members of on part of the hierarchy. This inheritance doesn't have to be in terms of C++ classes, but using the same names as the implementation seems helpful, and we already do it for many other things.

Finally I added sections for the specialized nodes which are implemented in the IR but didn't have documentation in LangRef yet. These could use some work, and I admit I didn't dig too deep into the specifics beyond enumerating the fields, but I think we would ideally always have a LangRef section for every kind of node.

General cleanup in LangRef (and two outdated comments in LLParser.cpp)
with the aim of making it easier to understand some of the terminology
and subtle idiosyncrasies related to metadata in the IR.

I'm still not happy with the fact that "node" is used both informally
and with a particular category of metadata in mind, depending on the
context. This also bleeds into the type names in the implementation.

There are also several places where names from the implementation appear
in the document with no other context or definition. In some cases I
added a parenthetical to section titles to tie the two together, but I
don't think this is ideal.

I also think it might be useful to define the "abstract" metadata
classes like "DIScope" in the document, so the hierarchy of metadata
node kinds is direct, and so we can avoid repetitive descriptions of all
of the members of on part of the hierarchy. This inheritance doesn't
have to be in terms of C++ classes, but using the same names as the
implementation seems helpful, and we already do it for many other
things.
@llvmbot
Copy link
Collaborator

llvmbot commented Feb 16, 2024

@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-debuginfo

Author: Scott Linder (slinder1)

Changes

General cleanup in LangRef (and two outdated comments in LLParser.cpp) with the aim of making it easier to understand some of the terminology and subtle idiosyncrasies related to metadata in the IR.

I'm still not happy with the fact that "node" is used both informally and with a particular category of metadata in mind, depending on the context. This also bleeds into the type names in the implementation.

There are also several places where names from the implementation appear in the document with no other context or definition. In some cases I added a parenthetical to section titles to tie the two together, but I don't think this is ideal.

I also think it might be useful to define the "abstract" metadata classes like "DIScope" in the document, so the hierarchy of metadata node kinds is direct, and so we can avoid repetitive descriptions of all of the members of on part of the hierarchy. This inheritance doesn't have to be in terms of C++ classes, but using the same names as the implementation seems helpful, and we already do it for many other things.

Finally I added sections for the specialized nodes which are implemented in the IR but didn't have documentation in LangRef yet. These could use some work, and I admit I didn't dig too deep into the specifics beyond enumerating the fields, but I think we would ideally always have a LangRef section for every kind of node.


Full diff: https://github.com/llvm/llvm-project/pull/81948.diff

2 Files Affected:

  • (modified) llvm/docs/LangRef.rst (+140-16)
  • (modified) llvm/lib/AsmParser/LLParser.cpp (+1-2)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index fd2e3aacd0169c..40ab9abe880604 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -5574,34 +5574,106 @@ occurs on.
 Metadata
 ========
 
-LLVM IR allows metadata to be attached to instructions and global objects in the
-program that can convey extra information about the code to the optimizers and
-code generator. One example application of metadata is source-level
-debug information. There are two metadata primitives: strings and nodes.
+LLVM IR allows metadata to be attached to instructions and global objects in
+the program that can convey extra information about the code to the optimizers
+and code generator.
 
-Metadata does not have a type, and is not a value. If referenced from a
-``call`` instruction, it uses the ``metadata`` type.
+There are two metadata primitives: strings and nodes. There are
+also specialized nodes which have a distinguished name and a set of named
+arguments.
+
+.. note::
+
+    One example application of metadata is source-level debug information,
+    which is currently the only use of specialized nodes.
+
+Metadata does not have a type, and is not a value.
+
+A value of non-\ ``metadata`` type can be used in a metadata context using the
+syntax '``<type> <value>``'.
+
+All other metadata is identified in syntax as starting with an exclamation
+point ('``!``').
+
+Metadata may be used in the following value contexts by using the ``metadata``
+type:
+
+- Arguments to certain intrinsic functions, as described in their specification.
+- Arguments to the ``catchpad``/``cleanuppad`` instructions.
+
+.. note::
+
+    Metadata can be "wrapped" in a ``MetadataAsValue`` so it can be referenced
+    in a value context: ``MetadataAsValue`` is-a ``Value``.
+
+    A typed value can be "wrapped" in ``ValueAsMetadata`` so it can be
+    referenced in a metadata context: ``ValueAsMetadata`` is-a ``Metadata``.
+
+    There is no explicit syntax for a ``ValueAsMetadata``, and instead
+    the fact that a type identifier cannot begin with an exclamation point
+    is used to resolve ambiguity.
+
+    A ``metadata`` type implies a ``MetadataAsValue``, and when followed with a
+    '``<type> <value>``' pair it wraps the typed value in a ``ValueAsMetadata``.
 
-All metadata are identified in syntax by an exclamation point ('``!``').
+    For example, the first argument
+    to this call is a ``MetadataAsValue(ValueAsMetadata(Value))``:
+
+    .. code-block:: llvm
+
+        call void @llvm.foo(metadata i32 1)
+
+    Whereas the first argument to this call is a ``MetadataAsValue(MDNode)``:
+
+    .. code-block:: llvm
+
+        call void @llvm.foo(metadata !0)
+
+    The first element of this ``MDTuple`` is a ``MDNode``:
+
+    .. code-block:: llvm
+
+        !{!0}
+
+    And the first element of this ``MDTuple`` is a ``ValueAsMetadata(Value)``:
+
+    .. code-block:: llvm
+
+        !{i32 1}
 
 .. _metadata-string:
 
-Metadata Nodes and Metadata Strings
------------------------------------
+Metadata Strings (``MDString``)
+-------------------------------
+
+.. FIXME Either fix all references to "MDString" in the docs, or make that
+   identifier a formal part of the document.
 
 A metadata string is a string surrounded by double quotes. It can
 contain any character by escaping non-printable characters with
 "``\xx``" where "``xx``" is the two digit hex code. For example:
 "``!"test\00"``".
 
-Metadata nodes are represented with notation similar to structure
-constants (a comma separated list of elements, surrounded by braces and
-preceded by an exclamation point). Metadata nodes can have any values as
+.. note::
+
+   A metadata string is metadata, but is not a meatdata node.
+
+.. _metadata-node:
+
+Metadata Nodes (``MDNode``)
+---------------------------
+
+.. FIXME Either fix all references to "MDNode" in the docs, or make that
+   identifier a formal part of the document.
+
+Metadata tuples are represented with notation similar to structure
+constants: a comma separated list of elements, surrounded by braces and
+preceded by an exclamation point. Metadata nodes can have any values as
 their operand. For example:
 
 .. code-block:: llvm
 
-    !{ !"test\00", i32 10}
+    !{!"test\00", i32 10}
 
 Metadata nodes that aren't uniqued use the ``distinct`` keyword. For example:
 
@@ -5628,6 +5700,12 @@ intrinsic is using three metadata arguments:
 
     call void @llvm.dbg.value(metadata !24, metadata !25, metadata !26)
 
+
+.. FIXME Attachments cannot be ValueAsMetadata, but we don't have a
+   particularly clear way to refer to ValueAsMetadata without getting into
+   implementation details. Ideally the restriction would be explicit somewhere,
+   though?
+
 Metadata can be attached to an instruction. Here metadata ``!21`` is attached
 to the ``add`` instruction using the ``!dbg`` identifier:
 
@@ -6261,7 +6339,7 @@ valid debug intrinsic.
     !5 = !DIExpression(DW_OP_constu, 42, DW_OP_stack_value)
 
 DIAssignID
-""""""""""""
+""""""""""
 
 ``DIAssignID`` nodes have no operands and are always distinct. They are used to
 link together `@llvm.dbg.assign` intrinsics (:ref:`debug
@@ -6276,7 +6354,13 @@ Assignment Tracking <AssignmentTracking.html>`_ for more info.
     !2 = distinct !DIAssignID()
 
 DIArgList
-""""""""""""
+"""""""""
+
+.. FIXME In the implementation this is not a "node", but as it can only appear
+   inline in a function context that distinction isn't observable anyway. Even
+   if it is not required, it would be nice to be more clear about what is a
+   "node", and what that actually means. The names in the implementation could
+   also be updated to mirror whatever we decide here.
 
 ``DIArgList`` nodes hold a list of constant or SSA value references. These are
 used in :ref:`debug intrinsics<dbg_intrinsics>` (currently only in
@@ -6292,7 +6376,7 @@ inlined, and cannot appear in named metadata.
                    metadata !DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_arg, 1, DW_OP_plus))
 
 DIFlags
-"""""""""""""""
+"""""""
 
 These flags encode various properties of DINodes.
 
@@ -6368,6 +6452,46 @@ within the file where the label is declared.
 
   !2 = !DILabel(scope: !0, name: "foo", file: !1, line: 7)
 
+DICommonBlock
+"""""""""""""
+
+``DICommonBlock`` nodes represent Fortran common blocks. The ``scope:`` field
+is mandatory and points to a :ref:`DILexicalBlockFile`, a
+:ref:`DILexicalBlock`, or a :ref:`DISubprogram`. The ``declaration:``,
+``name:``, ``file:``, and ``line:`` fields are optional.
+
+DIModule
+""""""""
+
+``DIModule`` nodes represent a source language module, for example, a Clang
+module, or a Fortran module. The ``scope:`` field is mandatory and points to a
+:ref:`DILexicalBlockFile`, a :ref:`DILexicalBlock`, or a :ref:`DISubprogram`.
+The ``name:`` field is mandatory. The ``configMacros:``, ``includePath:``,
+``apinotes:``, ``file:``, ``line:``, and ``isDecl:`` fields are optional.
+
+DIStringType
+""""""""""""
+
+``DIStringType`` nodes represent a Fortran ``CHARACTER(n)`` type, with a
+dynamic length and location encoded as an expression.
+The ``tag:`` field is optional and defaults to ``DW_TAG_string_type``. The ``name:``,
+``stringLength:``, ``stringLengthExpression``, ``stringLocationExpression:``,
+``size:``, ``align:``, and ``encoding:`` fields are optional.
+
+If not present, the ``size:`` and ``align:`` fields default to the value zero.
+
+The length in bits of the string is specified by the first of the following
+fields present:
+
+- ``stringLength:``, which points to a ``DIVariable`` whose value is the string
+  length in bits.
+- ``stringLengthExpression:``, which points to a ``DIExpression`` which
+  computes the length in bits.
+- ``size``, which contains the literal length in bits.
+
+The ``stringLocationExpression:`` points to a ``DIExpression`` which describes
+the "data location" of the string object, if present.
+
 '``tbaa``' Metadata
 ^^^^^^^^^^^^^^^^^^^
 
diff --git a/llvm/lib/AsmParser/LLParser.cpp b/llvm/lib/AsmParser/LLParser.cpp
index a91e2f690999e0..1a8400db72491e 100644
--- a/llvm/lib/AsmParser/LLParser.cpp
+++ b/llvm/lib/AsmParser/LLParser.cpp
@@ -8142,7 +8142,7 @@ int LLParser::parseInsertValue(Instruction *&Inst, PerFunctionState &PFS) {
 /// parseMDNodeVector
 ///   ::= { Element (',' Element)* }
 /// Element
-///   ::= 'null' | TypeAndValue
+///   ::= 'null' | Metadata
 bool LLParser::parseMDNodeVector(SmallVectorImpl<Metadata *> &Elts) {
   if (parseToken(lltok::lbrace, "expected '{' here"))
     return true;
@@ -8152,7 +8152,6 @@ bool LLParser::parseMDNodeVector(SmallVectorImpl<Metadata *> &Elts) {
     return false;
 
   do {
-    // Null is a special case since it is typeless.
     if (EatIfPresent(lltok::kw_null)) {
       Elts.push_back(nullptr);
       continue;

llvm/docs/LangRef.rst Outdated Show resolved Hide resolved
Copy link
Member

@jryans jryans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, it's great to have these details recorded for future readers. 😄

llvm/docs/LangRef.rst Outdated Show resolved Hide resolved
Copy link
Member

@jmorse jmorse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. It's hard to pin down the details of ValueAsMetadata and MetadataAsValue because they don't follow principled rules (AFAIUI) and exist to glue the Value and Metadata hierarchies together at various points. It's the portion of debug-info that isn't a first-class element of LLVM, so it's no surprise it's hard to document.

@slinder1 slinder1 merged commit eee8c61 into llvm:main Mar 28, 2024
4 of 5 checks passed
@slinder1 slinder1 deleted the langref-metadata-cleanup branch March 28, 2024 22:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants