[AsmParser] Don't require value numbers to be consecutive #78171

nikic · 2024-01-15T15:38:31Z

Currently, the IR parser requires that %n style numbered values are consecutive. This means that the IR becomes invalid as soon as you remove an instruction, argument or block. This makes it very annoying to modify IR without running it through instnamer first.

I don't think there is any good reason to impose this requirement. This PR relaxes it to allow value IDs to be non-consecutive, but it still keeps the requirement that they're increasing (i.e. you can't skip a value number and then assign it later).

I've kept the underlying representation the same and added a guard against skipping large ranges to guard against degenerate cases where the NumberedVals vector would become huge. If people prefer, I could change NumberedVals from a vector to a map + a counter instead, in which case we could support arbitrary numbers without large increases in memory usage.

This only implements support for skipping numbers for local values. We should extend this to global values in the future as well.

nikic · 2024-01-15T15:40:50Z

llvm/test/Assembler/call-nonzero-program-addrspace-2.ll

@@ -3,7 +3,7 @@

 ; Check that numbered variables in a nonzero program address space 200 can be used in a call instruction

-define i8 @test_unnamed(ptr, ptr addrspace(42) %0) {
+define i8 @test_unnamed(ptr, ptr addrspace(42) %1) {


This is a bug in the previous implementation: While %0 is written here, it was actually interpreted as %1.

github-actions · 2024-01-15T15:40:54Z

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:

git-clang-format --diff 480cc413b7f7e73f90646e5feeb598e36e4e9565 db7bcbb2ba2add13b0e39abd12c7b0baf45be7f4 -- llvm/include/llvm/AsmParser/LLParser.h llvm/lib/AsmParser/LLParser.cpp

View the diff from clang-format here.

diff --git a/llvm/lib/AsmParser/LLParser.cpp b/llvm/lib/AsmParser/LLParser.cpp
index ccce7c2392..f06e06c6d2 100644
--- a/llvm/lib/AsmParser/LLParser.cpp
+++ b/llvm/lib/AsmParser/LLParser.cpp
@@ -3290,7 +3290,7 @@ bool LLParser::parseTargetExtType(Type *&Result) {
 LLParser::PerFunctionState::PerFunctionState(LLParser &p, Function &f,
                                              int functionNumber,
                                              ArrayRef<unsigned> UnnamedArgNums)
-  : P(p), F(f), FunctionNumber(functionNumber) {
+    : P(p), F(f), FunctionNumber(functionNumber) {
 
   // Insert unnamed arguments into the NumberedVals list.
   auto It = UnnamedArgNums.begin();

Currently, the IR parser requires that %n style numbered values are consecutive. This means that the IR becomes invalid as soon as you remove an instruction, argument or block. This makes it very annoying to modify IR without running it through instnamer first. I don't think there is any good reason to impose this requirement. This PR relaxes it to allow value IDs to be non-consecutive, but it still keeps the requirement that they're increasing (i.e. you can't skip a value number and then assign it later). I've kept the underlying representation the same and added a guard against skipping large ranges to guard against degenerate cases where the NumberedVals vector would become huge. If people prefer, I could change NumberedVals from a vector to a map + a counter instead, in which case we could support arbitrary numbers without large increases in memory usage. This only implements support for skipping numbers for local values. We should extend this to global values in the future as well.

nikic · 2024-01-17T10:42:39Z

I've kept the underlying representation the same and added a guard against skipping large ranges to guard against degenerate cases where the NumberedVals vector would become huge. If people prefer, I could change NumberedVals from a vector to a map + a counter instead, in which case we could support arbitrary numbers without large increases in memory usage.

I went ahead and did this. The restriction on value skipping seemed pretty odd, and I don't think we really care about the performance difference between using a vector and a map when it comes to IR parsing.

dtcxzyw · 2024-01-17T12:16:58Z

I don't think there is any good reason to impose this requirement. This PR relaxes it to allow value IDs to be non-consecutive, but it still keeps the requirement that they're increasing (i.e. you can't skip a value number and then assign it later).

llvm-project/llvm/docs/LangRef.rst

Lines 131 to 135 in 2db9244

    
           #. Unnamed temporaries are numbered sequentially (using a per-function 
        
              incrementing counter, starting with 0). Note that basic blocks and unnamed 
        
              function parameters are included in this numbering. For example, if the 
        
              entry basic block is not given a label name and all function parameters are 
        
              named, then it will get number 0.

Could you please update the LangRef as well?

llvmbot · 2024-01-17T12:49:43Z

@llvm/pr-subscribers-llvm-ir

Author: Nikita Popov (nikic)

Changes

Currently, the IR parser requires that %n style numbered values are consecutive. This means that the IR becomes invalid as soon as you remove an instruction, argument or block. This makes it very annoying to modify IR without running it through instnamer first.

I don't think there is any good reason to impose this requirement. This PR relaxes it to allow value IDs to be non-consecutive, but it still keeps the requirement that they're increasing (i.e. you can't skip a value number and then assign it later).

I've kept the underlying representation the same and added a guard against skipping large ranges to guard against degenerate cases where the NumberedVals vector would become huge. If people prefer, I could change NumberedVals from a vector to a map + a counter instead, in which case we could support arbitrary numbers without large increases in memory usage.

This only implements support for skipping numbers for local values. We should extend this to global values in the future as well.

Full diff: https://github.com/llvm/llvm-project/pull/78171.diff

10 Files Affected:

(modified) llvm/docs/LangRef.rst (+7-5)
(modified) llvm/include/llvm/AsmParser/LLParser.h (+27-5)
(modified) llvm/lib/AsmParser/LLParser.cpp (+69-37)
(modified) llvm/test/Assembler/call-nonzero-program-addrspace-2.ll (+1-1)
(removed) llvm/test/Assembler/invalid-arg-num-1.ll (-6)
(removed) llvm/test/Assembler/invalid-arg-num-2.ll (-6)
(removed) llvm/test/Assembler/invalid-arg-num-3.ll (-6)
(removed) llvm/test/Assembler/invalid-block-label-num.ll (-7)
(added) llvm/test/Assembler/skip-value-numbers-invalid.ll (+31)
(added) llvm/test/Assembler/skip-value-numbers.ll (+79)

diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index d881deb30049a2..0e64ca88fc2f37 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -128,11 +128,13 @@ lexical features of LLVM:
 #. Comments are delimited with a '``;``' and go until the end of line.
 #. Unnamed temporaries are created when the result of a computation is
    not assigned to a named value.
-#. Unnamed temporaries are numbered sequentially (using a per-function
-   incrementing counter, starting with 0). Note that basic blocks and unnamed
-   function parameters are included in this numbering. For example, if the
-   entry basic block is not given a label name and all function parameters are
-   named, then it will get number 0.
+#. By default, unnamed temporaries are numbered sequentially (using a
+   per-function incrementing counter, starting with 0). However, when explicitly
+   specifying temporary numbers, it is allowed to skip over numbers.
+
+   Note that basic blocks and unnamed function parameters are included in this
+   numbering. For example, if the entry basic block is not given a label name
+   and all function parameters are named, then it will get number 0.
 
 It also shows a convention that we follow in this document. When
 demonstrating instructions, we will follow an instruction with a comment
diff --git a/llvm/include/llvm/AsmParser/LLParser.h b/llvm/include/llvm/AsmParser/LLParser.h
index 54bc3e582e01ae..b0d02eac8e672d 100644
--- a/llvm/include/llvm/AsmParser/LLParser.h
+++ b/llvm/include/llvm/AsmParser/LLParser.h
@@ -203,6 +203,9 @@ namespace llvm {
     bool error(LocTy L, const Twine &Msg) const { return Lex.Error(L, Msg); }
     bool tokError(const Twine &Msg) const { return error(Lex.getLoc(), Msg); }
 
+    bool checkValueID(LocTy L, StringRef Kind, StringRef Prefix,
+                      unsigned NextID, unsigned ID) const;
+
     /// Restore the internal name and slot mappings using the mappings that
     /// were created at an earlier parsing stage.
     void restoreParsingState(const SlotMapping *Slots);
@@ -448,19 +451,35 @@ namespace llvm {
     bool parseFunctionType(Type *&Result);
     bool parseTargetExtType(Type *&Result);
 
+    class NumberedValues {
+      DenseMap<unsigned, Value *> Vals;
+      unsigned NextUnusedID = 0;
+
+    public:
+      unsigned getNext() const { return NextUnusedID; }
+      Value *get(unsigned ID) const { return Vals.lookup(ID); }
+      void add(unsigned ID, Value *V) {
+        assert(ID >= NextUnusedID && "Invalid value ID");
+        Vals.insert({ID, V});
+        NextUnusedID = ID + 1;
+      }
+    };
+
     // Function Semantic Analysis.
     class PerFunctionState {
       LLParser &P;
       Function &F;
       std::map<std::string, std::pair<Value*, LocTy> > ForwardRefVals;
       std::map<unsigned, std::pair<Value*, LocTy> > ForwardRefValIDs;
-      std::vector<Value*> NumberedVals;
+      NumberedValues NumberedVals;
 
       /// FunctionNumber - If this is an unnamed function, this is the slot
       /// number of it, otherwise it is -1.
       int FunctionNumber;
+
     public:
-      PerFunctionState(LLParser &p, Function &f, int functionNumber);
+      PerFunctionState(LLParser &p, Function &f, int functionNumber,
+                       ArrayRef<unsigned> UnnamedArgNums);
       ~PerFunctionState();
 
       Function &getFunction() const { return F; }
@@ -590,9 +609,12 @@ namespace llvm {
       ArgInfo(LocTy L, Type *ty, AttributeSet Attr, const std::string &N)
           : Loc(L), Ty(ty), Attrs(Attr), Name(N) {}
     };
-    bool parseArgumentList(SmallVectorImpl<ArgInfo> &ArgList, bool &IsVarArg);
-    bool parseFunctionHeader(Function *&Fn, bool IsDefine);
-    bool parseFunctionBody(Function &Fn);
+    bool parseArgumentList(SmallVectorImpl<ArgInfo> &ArgList,
+                           SmallVectorImpl<unsigned> &UnnamedArgNums,
+                           bool &IsVarArg);
+    bool parseFunctionHeader(Function *&Fn, bool IsDefine,
+                             SmallVectorImpl<unsigned> &UnnamedArgNums);
+    bool parseFunctionBody(Function &Fn, ArrayRef<unsigned> UnnamedArgNums);
     bool parseBasicBlock(PerFunctionState &PFS);
 
     enum TailCallType { TCT_None, TCT_Tail, TCT_MustTail };
diff --git a/llvm/lib/AsmParser/LLParser.cpp b/llvm/lib/AsmParser/LLParser.cpp
index fb9e1ba875e1fa..ccce7c2392602b 100644
--- a/llvm/lib/AsmParser/LLParser.cpp
+++ b/llvm/lib/AsmParser/LLParser.cpp
@@ -572,7 +572,8 @@ bool LLParser::parseDeclare() {
   }
 
   Function *F;
-  if (parseFunctionHeader(F, false))
+  SmallVector<unsigned> UnnamedArgNums;
+  if (parseFunctionHeader(F, false, UnnamedArgNums))
     return true;
   for (auto &MD : MDs)
     F->addMetadata(MD.first, *MD.second);
@@ -586,8 +587,10 @@ bool LLParser::parseDefine() {
   Lex.Lex();
 
   Function *F;
-  return parseFunctionHeader(F, true) || parseOptionalFunctionMetadata(*F) ||
-         parseFunctionBody(*F);
+  SmallVector<unsigned> UnnamedArgNums;
+  return parseFunctionHeader(F, true, UnnamedArgNums) ||
+         parseOptionalFunctionMetadata(*F) ||
+         parseFunctionBody(*F, UnnamedArgNums);
 }
 
 /// parseGlobalType
@@ -2925,6 +2928,15 @@ bool LLParser::parseOptionalOperandBundles(
   return false;
 }
 
+bool LLParser::checkValueID(LocTy Loc, StringRef Kind, StringRef Prefix,
+                            unsigned NextID, unsigned ID) const {
+  if (ID < NextID)
+    return error(Loc, Kind + " expected to be numbered '" + Prefix +
+                          Twine(NextID) + "' or greater");
+
+  return false;
+}
+
 /// parseArgumentList - parse the argument list for a function type or function
 /// prototype.
 ///   ::= '(' ArgTypeListI ')'
@@ -2935,6 +2947,7 @@ bool LLParser::parseOptionalOperandBundles(
 ///   ::= ArgType (',' ArgType)*
 ///
 bool LLParser::parseArgumentList(SmallVectorImpl<ArgInfo> &ArgList,
+                                 SmallVectorImpl<unsigned> &UnnamedArgNums,
                                  bool &IsVarArg) {
   unsigned CurValID = 0;
   IsVarArg = false;
@@ -2961,12 +2974,19 @@ bool LLParser::parseArgumentList(SmallVectorImpl<ArgInfo> &ArgList,
     if (Lex.getKind() == lltok::LocalVar) {
       Name = Lex.getStrVal();
       Lex.Lex();
-    } else if (Lex.getKind() == lltok::LocalVarID) {
-      if (Lex.getUIntVal() != CurValID)
-        return error(TypeLoc, "argument expected to be numbered '%" +
-                                  Twine(CurValID) + "'");
-      ++CurValID;
-      Lex.Lex();
+    } else {
+      unsigned ArgID;
+      if (Lex.getKind() == lltok::LocalVarID) {
+        ArgID = Lex.getUIntVal();
+        if (checkValueID(TypeLoc, "argument", "%", CurValID, ArgID))
+          return true;
+        Lex.Lex();
+      } else {
+        ArgID = CurValID;
+      }
+
+      UnnamedArgNums.push_back(ArgID);
+      CurValID = ArgID + 1;
     }
 
     if (!FunctionType::isValidArgumentType(ArgTy))
@@ -2995,13 +3015,17 @@ bool LLParser::parseArgumentList(SmallVectorImpl<ArgInfo> &ArgList,
         Name = Lex.getStrVal();
         Lex.Lex();
       } else {
+        unsigned ArgID;
         if (Lex.getKind() == lltok::LocalVarID) {
-          if (Lex.getUIntVal() != CurValID)
-            return error(TypeLoc, "argument expected to be numbered '%" +
-                                      Twine(CurValID) + "'");
+          ArgID = Lex.getUIntVal();
+          if (checkValueID(TypeLoc, "argument", "%", CurValID, ArgID))
+            return true;
           Lex.Lex();
+        } else {
+          ArgID = CurValID;
         }
-        ++CurValID;
+        UnnamedArgNums.push_back(ArgID);
+        CurValID = ArgID + 1;
         Name = "";
       }
 
@@ -3027,7 +3051,8 @@ bool LLParser::parseFunctionType(Type *&Result) {
 
   SmallVector<ArgInfo, 8> ArgList;
   bool IsVarArg;
-  if (parseArgumentList(ArgList, IsVarArg))
+  SmallVector<unsigned> UnnamedArgNums;
+  if (parseArgumentList(ArgList, UnnamedArgNums, IsVarArg))
     return true;
 
   // Reject names on the arguments lists.
@@ -3263,13 +3288,18 @@ bool LLParser::parseTargetExtType(Type *&Result) {
 //===----------------------------------------------------------------------===//
 
 LLParser::PerFunctionState::PerFunctionState(LLParser &p, Function &f,
-                                             int functionNumber)
+                                             int functionNumber,
+                                             ArrayRef<unsigned> UnnamedArgNums)
   : P(p), F(f), FunctionNumber(functionNumber) {
 
   // Insert unnamed arguments into the NumberedVals list.
-  for (Argument &A : F.args())
-    if (!A.hasName())
-      NumberedVals.push_back(&A);
+  auto It = UnnamedArgNums.begin();
+  for (Argument &A : F.args()) {
+    if (!A.hasName()) {
+      unsigned ArgNum = *It++;
+      NumberedVals.add(ArgNum, &A);
+    }
+  }
 }
 
 LLParser::PerFunctionState::~PerFunctionState() {
@@ -3350,7 +3380,7 @@ Value *LLParser::PerFunctionState::getVal(const std::string &Name, Type *Ty,
 
 Value *LLParser::PerFunctionState::getVal(unsigned ID, Type *Ty, LocTy Loc) {
   // Look this name up in the normal function symbol table.
-  Value *Val = ID < NumberedVals.size() ? NumberedVals[ID] : nullptr;
+  Value *Val = NumberedVals.get(ID);
 
   // If this is a forward reference for the value, see if we already created a
   // forward ref record.
@@ -3398,11 +3428,11 @@ bool LLParser::PerFunctionState::setInstName(int NameID,
   if (NameStr.empty()) {
     // If neither a name nor an ID was specified, just use the next ID.
     if (NameID == -1)
-      NameID = NumberedVals.size();
+      NameID = NumberedVals.getNext();
 
-    if (unsigned(NameID) != NumberedVals.size())
-      return P.error(NameLoc, "instruction expected to be numbered '%" +
-                                  Twine(NumberedVals.size()) + "'");
+    if (P.checkValueID(NameLoc, "instruction", "%", NumberedVals.getNext(),
+                       NameID))
+      return true;
 
     auto FI = ForwardRefValIDs.find(NameID);
     if (FI != ForwardRefValIDs.end()) {
@@ -3417,7 +3447,7 @@ bool LLParser::PerFunctionState::setInstName(int NameID,
       ForwardRefValIDs.erase(FI);
     }
 
-    NumberedVals.push_back(Inst);
+    NumberedVals.add(NameID, Inst);
     return false;
   }
 
@@ -3464,15 +3494,15 @@ BasicBlock *LLParser::PerFunctionState::defineBB(const std::string &Name,
                                                  int NameID, LocTy Loc) {
   BasicBlock *BB;
   if (Name.empty()) {
-    if (NameID != -1 && unsigned(NameID) != NumberedVals.size()) {
-      P.error(Loc, "label expected to be numbered '" +
-                       Twine(NumberedVals.size()) + "'");
-      return nullptr;
+    if (NameID != -1) {
+      if (P.checkValueID(Loc, "label", "", NumberedVals.getNext(), NameID))
+        return nullptr;
+    } else {
+      NameID = NumberedVals.getNext();
     }
-    BB = getBB(NumberedVals.size(), Loc);
+    BB = getBB(NameID, Loc);
     if (!BB) {
-      P.error(Loc, "unable to create block numbered '" +
-                       Twine(NumberedVals.size()) + "'");
+      P.error(Loc, "unable to create block numbered '" + Twine(NameID) + "'");
       return nullptr;
     }
   } else {
@@ -3489,8 +3519,8 @@ BasicBlock *LLParser::PerFunctionState::defineBB(const std::string &Name,
 
   // Remove the block from forward ref sets.
   if (Name.empty()) {
-    ForwardRefValIDs.erase(NumberedVals.size());
-    NumberedVals.push_back(BB);
+    ForwardRefValIDs.erase(NameID);
+    NumberedVals.add(NameID, BB);
   } else {
     // BB forward references are already in the function symbol table.
     ForwardRefVals.erase(Name);
@@ -5934,7 +5964,8 @@ bool LLParser::parseTypeAndBasicBlock(BasicBlock *&BB, LocTy &Loc,
 ///       OptionalCallingConv OptRetAttrs OptUnnamedAddr Type GlobalName
 ///       '(' ArgList ')' OptAddrSpace OptFuncAttrs OptSection OptionalAlign
 ///       OptGC OptionalPrefix OptionalPrologue OptPersonalityFn
-bool LLParser::parseFunctionHeader(Function *&Fn, bool IsDefine) {
+bool LLParser::parseFunctionHeader(Function *&Fn, bool IsDefine,
+                                   SmallVectorImpl<unsigned> &UnnamedArgNums) {
   // parse the linkage.
   LocTy LinkageLoc = Lex.getLoc();
   unsigned Linkage;
@@ -6022,7 +6053,7 @@ bool LLParser::parseFunctionHeader(Function *&Fn, bool IsDefine) {
   Constant *PersonalityFn = nullptr;
   Comdat *C;
 
-  if (parseArgumentList(ArgList, IsVarArg) ||
+  if (parseArgumentList(ArgList, UnnamedArgNums, IsVarArg) ||
       parseOptionalUnnamedAddr(UnnamedAddr) ||
       parseOptionalProgramAddrSpace(AddrSpace) ||
       parseFnAttributeValuePairs(FuncAttrs, FwdRefAttrGrps, false,
@@ -6217,7 +6248,8 @@ bool LLParser::PerFunctionState::resolveForwardRefBlockAddresses() {
 
 /// parseFunctionBody
 ///   ::= '{' BasicBlock+ UseListOrderDirective* '}'
-bool LLParser::parseFunctionBody(Function &Fn) {
+bool LLParser::parseFunctionBody(Function &Fn,
+                                 ArrayRef<unsigned> UnnamedArgNums) {
   if (Lex.getKind() != lltok::lbrace)
     return tokError("expected '{' in function body");
   Lex.Lex();  // eat the {.
@@ -6225,7 +6257,7 @@ bool LLParser::parseFunctionBody(Function &Fn) {
   int FunctionNumber = -1;
   if (!Fn.hasName()) FunctionNumber = NumberedVals.size()-1;
 
-  PerFunctionState PFS(*this, Fn, FunctionNumber);
+  PerFunctionState PFS(*this, Fn, FunctionNumber, UnnamedArgNums);
 
   // Resolve block addresses and allow basic blocks to be forward-declared
   // within this function.
diff --git a/llvm/test/Assembler/call-nonzero-program-addrspace-2.ll b/llvm/test/Assembler/call-nonzero-program-addrspace-2.ll
index f5c5437d8221e6..bc600d56db51b2 100644
--- a/llvm/test/Assembler/call-nonzero-program-addrspace-2.ll
+++ b/llvm/test/Assembler/call-nonzero-program-addrspace-2.ll
@@ -3,7 +3,7 @@
 
 ; Check that numbered variables in a nonzero program address space 200 can be used in a call instruction
 
-define i8 @test_unnamed(ptr, ptr addrspace(42) %0) {
+define i8 @test_unnamed(ptr, ptr addrspace(42) %1) {
   ; Calls with explicit address spaces are fine:
   call addrspace(0) i8 %0(i32 0)
   call addrspace(42) i8 %1(i32 0)
diff --git a/llvm/test/Assembler/invalid-arg-num-1.ll b/llvm/test/Assembler/invalid-arg-num-1.ll
deleted file mode 100644
index ee13c339dec448..00000000000000
--- a/llvm/test/Assembler/invalid-arg-num-1.ll
+++ /dev/null
@@ -1,6 +0,0 @@
-; RUN: not llvm-as < %s -o /dev/null 2>&1 | FileCheck %s
-
-; CHECK: error: argument expected to be numbered '%1'
-define void @foo(i32 %0, i32 %5) {
-  ret void
-}
diff --git a/llvm/test/Assembler/invalid-arg-num-2.ll b/llvm/test/Assembler/invalid-arg-num-2.ll
deleted file mode 100644
index 6ecb0033512d00..00000000000000
--- a/llvm/test/Assembler/invalid-arg-num-2.ll
+++ /dev/null
@@ -1,6 +0,0 @@
-; RUN: not llvm-as < %s -o /dev/null 2>&1 | FileCheck %s
-
-; CHECK: error: argument expected to be numbered '%1'
-define void @foo(i8 %0, i32 %named, i32 %2) {
-  ret void
-}
diff --git a/llvm/test/Assembler/invalid-arg-num-3.ll b/llvm/test/Assembler/invalid-arg-num-3.ll
deleted file mode 100644
index f1638365630be5..00000000000000
--- a/llvm/test/Assembler/invalid-arg-num-3.ll
+++ /dev/null
@@ -1,6 +0,0 @@
-; RUN: not llvm-as < %s -o /dev/null 2>&1 | FileCheck %s
-
-; CHECK: error: argument expected to be numbered '%0'
-define void @foo(i8 %1) {
-  ret void
-}
diff --git a/llvm/test/Assembler/invalid-block-label-num.ll b/llvm/test/Assembler/invalid-block-label-num.ll
deleted file mode 100644
index 4f02300849e68f..00000000000000
--- a/llvm/test/Assembler/invalid-block-label-num.ll
+++ /dev/null
@@ -1,7 +0,0 @@
-; RUN: not llvm-as < %s 2>&1 | FileCheck %s
-
-define void @f () {
-1:
-; CHECK: error: label expected to be numbered '0'
-  ret void
-}
diff --git a/llvm/test/Assembler/skip-value-numbers-invalid.ll b/llvm/test/Assembler/skip-value-numbers-invalid.ll
new file mode 100644
index 00000000000000..67e6b10196bc8e
--- /dev/null
+++ b/llvm/test/Assembler/skip-value-numbers-invalid.ll
@@ -0,0 +1,31 @@
+; RUN: split-file %s %t
+; RUN: not llvm-as < %s %t/instr_smaller_id.ll 2>&1 | FileCheck %s --check-prefix=INSTR-SMALLER-ID
+; RUN: not llvm-as < %s %t/arg_smaller_id.ll 2>&1 | FileCheck %s --check-prefix=ARG-SMALLER-ID
+; RUN: not llvm-as < %s %t/block_smaller_id.ll 2>&1 | FileCheck %s --check-prefix=BLOCK-SMALLER-ID
+
+;--- instr_smaller_id.ll
+
+; INSTR-SMALLER-ID: error: instruction expected to be numbered '%11' or greater
+define i32 @test() {
+  %10 = add i32 1, 2
+  %5 = add i32 %10, 3
+  ret i32 %5
+}
+
+;--- arg_smaller_id.ll
+
+; ARG-SMALLER-ID: error: argument expected to be numbered '%11' or greater
+define i32 @test(i32 %10, i32 %5) {
+  ret i32 %5
+}
+
+;--- block_smaller_id.ll
+
+; BLOCK-SMALLER-ID: error: label expected to be numbered '11' or greater
+define i32 @test() {
+10:
+  br label %5
+
+5:
+  ret i32 0
+}
diff --git a/llvm/test/Assembler/skip-value-numbers.ll b/llvm/test/Assembler/skip-value-numbers.ll
new file mode 100644
index 00000000000000..d74d8d502709b9
--- /dev/null
+++ b/llvm/test/Assembler/skip-value-numbers.ll
@@ -0,0 +1,79 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: -p --version 4
+; RUN: opt -S < %s | FileCheck %s
+
+define i32 @instr() {
+; CHECK-LABEL: define i32 @instr() {
+; CHECK-NEXT:    %1 = add i32 1, 2
+; CHECK-NEXT:    %2 = add i32 %1, 3
+; CHECK-NEXT:    %3 = add i32 %2, 4
+; CHECK-NEXT:    ret i32 %3
+;
+  %10 = add i32 1, 2
+  %20 = add i32 %10, 3
+  %30 = add i32 %20, 4
+  ret i32 %30
+}
+
+define i32 @instr_some_implicit() {
+; CHECK-LABEL: define i32 @instr_some_implicit() {
+; CHECK-NEXT:    %1 = add i32 1, 2
+; CHECK-NEXT:    %2 = add i32 3, 4
+; CHECK-NEXT:    %3 = add i32 %1, %2
+; CHECK-NEXT:    ret i32 %3
+;
+  add i32 1, 2
+  %10 = add i32 3, 4
+  add i32 %1, %10
+  ret i32 %11
+}
+
+define i32 @args(i32 %0, i32 %10, i32 %20) {
+; CHECK-LABEL: define i32 @args(i32 %0, i32 %1, i32 %2) {
+; CHECK-NEXT:    %4 = add i32 %0, %1
+; CHECK-NEXT:    %5 = add i32 %4, %2
+; CHECK-NEXT:    ret i32 %5
+;
+  %30 = add i32 %0, %10
+  %31 = add i32 %30, %20
+  ret i32 %31
+}
+
+define i32 @args_some_implicit(i32, i32 %10, i32) {
+; CHECK-LABEL: define i32 @args_some_implicit(i32 %0, i32 %1, i32 %2) {
+; CHECK-NEXT:    %4 = add i32 %0, %1
+; CHECK-NEXT:    %5 = add i32 %2, %4
+; CHECK-NEXT:    ret i32 %5
+;
+  add i32 %0, %10
+  %20 = add i32 %11, %13
+  ret i32 %20
+}
+
+define i32 @blocks() {
+; CHECK-LABEL: define i32 @blocks() {
+; CHECK-NEXT:    br label %1
+; CHECK:       1:
+; CHECK-NEXT:    ret i32 0
+;
+10:
+  br label %20
+
+20:
+  ret i32 0
+}
+
+define i32 @blocks_some_implicit() {
+; CHECK-LABEL: define i32 @blocks_some_implicit() {
+; CHECK-NEXT:    br label %1
+; CHECK:       1:
+; CHECK-NEXT:    br label %2
+; CHECK:       2:
+; CHECK-NEXT:    ret i32 0
+;
+  br label %10
+
+10:
+  br label %11
+
+  ret i32 0
+}

nhaehnle

Nice, I've run into this issue in the past and it was always slightly annoying to have to go back and get out the instnamer. LGTM.

Side note: the duplicate argument parsing in parseArgumentList is really awkward.

nikic · 2024-01-19T14:04:24Z

Side note: the duplicate argument parsing in parseArgumentList is really awkward.

Agreed. I've pushed 0f20d5a to deduplicate this.

llvm#78171 added support for local value numbers to have gaps. This extends the support for global value numbers (for globals and functions). This means that it is now possible to delete an unnamed global definition/declaration without breaking the IR. This is a lot less common than unnamed local values, but it seems like something we should support for consistency. (Unnamed globals are also common in rustc IR).

#78171 added support for non-consecutive local value numbers. This extends the support for global value numbers (for globals and functions). This means that it is now possible to delete an unnamed global definition/declaration without breaking the IR. This is a lot less common than unnamed local values, but it seems like something we should support for consistency. (Unnamed globals are used a lot in Rust though.)

nikic requested review from fhahn, nhaehnle and dtcxzyw January 15, 2024 15:38

nikic commented Jan 15, 2024

View reviewed changes

nikic added 2 commits January 16, 2024 09:37

use map for numbered values, remove skip restriction

68383aa

nikic force-pushed the skip-value-ids branch from 15e4407 to 68383aa Compare January 17, 2024 10:40

update langref

db7bcbb

llvmbot added the llvm:ir label Jan 17, 2024

nhaehnle approved these changes Jan 18, 2024

View reviewed changes

nikic merged commit 6f37114 into llvm:main Jan 19, 2024
4 of 6 checks passed

nikic deleted the skip-value-ids branch January 19, 2024 13:55

nikic mentioned this pull request Jan 30, 2024

[AsmParser] Support non-consecutive global value numbers #80013

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AsmParser] Don't require value numbers to be consecutive #78171

[AsmParser] Don't require value numbers to be consecutive #78171

nikic commented Jan 15, 2024

nikic Jan 15, 2024

github-actions bot commented Jan 15, 2024 •

edited

nikic commented Jan 17, 2024

dtcxzyw commented Jan 17, 2024

llvmbot commented Jan 17, 2024

nhaehnle left a comment

nikic commented Jan 19, 2024

[AsmParser] Don't require value numbers to be consecutive #78171

[AsmParser] Don't require value numbers to be consecutive #78171

Conversation

nikic commented Jan 15, 2024

nikic Jan 15, 2024

Choose a reason for hiding this comment

github-actions bot commented Jan 15, 2024 • edited

nikic commented Jan 17, 2024

dtcxzyw commented Jan 17, 2024

llvmbot commented Jan 17, 2024

nhaehnle left a comment

Choose a reason for hiding this comment

nikic commented Jan 19, 2024

github-actions bot commented Jan 15, 2024 •

edited