Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AsmParser] Don't require value numbers to be consecutive #78171

Merged
merged 3 commits into from
Jan 19, 2024

Conversation

nikic
Copy link
Contributor

@nikic nikic commented Jan 15, 2024

Currently, the IR parser requires that %n style numbered values are consecutive. This means that the IR becomes invalid as soon as you remove an instruction, argument or block. This makes it very annoying to modify IR without running it through instnamer first.

I don't think there is any good reason to impose this requirement. This PR relaxes it to allow value IDs to be non-consecutive, but it still keeps the requirement that they're increasing (i.e. you can't skip a value number and then assign it later).

I've kept the underlying representation the same and added a guard against skipping large ranges to guard against degenerate cases where the NumberedVals vector would become huge. If people prefer, I could change NumberedVals from a vector to a map + a counter instead, in which case we could support arbitrary numbers without large increases in memory usage.

This only implements support for skipping numbers for local values. We should extend this to global values in the future as well.

@@ -3,7 +3,7 @@

; Check that numbered variables in a nonzero program address space 200 can be used in a call instruction

define i8 @test_unnamed(ptr, ptr addrspace(42) %0) {
define i8 @test_unnamed(ptr, ptr addrspace(42) %1) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bug in the previous implementation: While %0 is written here, it was actually interpreted as %1.

Copy link

github-actions bot commented Jan 15, 2024

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 480cc413b7f7e73f90646e5feeb598e36e4e9565 db7bcbb2ba2add13b0e39abd12c7b0baf45be7f4 -- llvm/include/llvm/AsmParser/LLParser.h llvm/lib/AsmParser/LLParser.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/AsmParser/LLParser.cpp b/llvm/lib/AsmParser/LLParser.cpp
index ccce7c2392..f06e06c6d2 100644
--- a/llvm/lib/AsmParser/LLParser.cpp
+++ b/llvm/lib/AsmParser/LLParser.cpp
@@ -3290,7 +3290,7 @@ bool LLParser::parseTargetExtType(Type *&Result) {
 LLParser::PerFunctionState::PerFunctionState(LLParser &p, Function &f,
                                              int functionNumber,
                                              ArrayRef<unsigned> UnnamedArgNums)
-  : P(p), F(f), FunctionNumber(functionNumber) {
+    : P(p), F(f), FunctionNumber(functionNumber) {
 
   // Insert unnamed arguments into the NumberedVals list.
   auto It = UnnamedArgNums.begin();

Currently, the IR parser requires that %n style numbered values
are consecutive. This means that the IR becomes invalid as soon
as you remove an instruction, argument or block. This makes it
very annoying to modify IR without running it through instnamer
first.

I don't think there is any good reason to impose this requirement.
This PR relaxes it to allow value IDs to be non-consecutive, but
it still keeps the requirement that they're increasing (i.e. you
can't skip a value number and then assign it later).

I've kept the underlying representation the same and added a guard
against skipping large ranges to guard against degenerate cases
where the NumberedVals vector would become huge. If people prefer,
I could change NumberedVals from a vector to a map + a counter
instead, in which case we could support arbitrary numbers without
large increases in memory usage.

This only implements support for skipping numbers for local values.
We should extend this to global values in the future as well.
@nikic
Copy link
Contributor Author

nikic commented Jan 17, 2024

I've kept the underlying representation the same and added a guard against skipping large ranges to guard against degenerate cases where the NumberedVals vector would become huge. If people prefer, I could change NumberedVals from a vector to a map + a counter instead, in which case we could support arbitrary numbers without large increases in memory usage.

I went ahead and did this. The restriction on value skipping seemed pretty odd, and I don't think we really care about the performance difference between using a vector and a map when it comes to IR parsing.

@dtcxzyw
Copy link
Member

dtcxzyw commented Jan 17, 2024

I don't think there is any good reason to impose this requirement. This PR relaxes it to allow value IDs to be non-consecutive, but it still keeps the requirement that they're increasing (i.e. you can't skip a value number and then assign it later).

#. Unnamed temporaries are numbered sequentially (using a per-function
incrementing counter, starting with 0). Note that basic blocks and unnamed
function parameters are included in this numbering. For example, if the
entry basic block is not given a label name and all function parameters are
named, then it will get number 0.

Could you please update the LangRef as well?

@llvmbot
Copy link
Collaborator

llvmbot commented Jan 17, 2024

@llvm/pr-subscribers-llvm-ir

Author: Nikita Popov (nikic)

Changes

Currently, the IR parser requires that %n style numbered values are consecutive. This means that the IR becomes invalid as soon as you remove an instruction, argument or block. This makes it very annoying to modify IR without running it through instnamer first.

I don't think there is any good reason to impose this requirement. This PR relaxes it to allow value IDs to be non-consecutive, but it still keeps the requirement that they're increasing (i.e. you can't skip a value number and then assign it later).

I've kept the underlying representation the same and added a guard against skipping large ranges to guard against degenerate cases where the NumberedVals vector would become huge. If people prefer, I could change NumberedVals from a vector to a map + a counter instead, in which case we could support arbitrary numbers without large increases in memory usage.

This only implements support for skipping numbers for local values. We should extend this to global values in the future as well.


Full diff: https://github.com/llvm/llvm-project/pull/78171.diff

10 Files Affected:

  • (modified) llvm/docs/LangRef.rst (+7-5)
  • (modified) llvm/include/llvm/AsmParser/LLParser.h (+27-5)
  • (modified) llvm/lib/AsmParser/LLParser.cpp (+69-37)
  • (modified) llvm/test/Assembler/call-nonzero-program-addrspace-2.ll (+1-1)
  • (removed) llvm/test/Assembler/invalid-arg-num-1.ll (-6)
  • (removed) llvm/test/Assembler/invalid-arg-num-2.ll (-6)
  • (removed) llvm/test/Assembler/invalid-arg-num-3.ll (-6)
  • (removed) llvm/test/Assembler/invalid-block-label-num.ll (-7)
  • (added) llvm/test/Assembler/skip-value-numbers-invalid.ll (+31)
  • (added) llvm/test/Assembler/skip-value-numbers.ll (+79)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index d881deb30049a2..0e64ca88fc2f37 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -128,11 +128,13 @@ lexical features of LLVM:
 #. Comments are delimited with a '``;``' and go until the end of line.
 #. Unnamed temporaries are created when the result of a computation is
    not assigned to a named value.
-#. Unnamed temporaries are numbered sequentially (using a per-function
-   incrementing counter, starting with 0). Note that basic blocks and unnamed
-   function parameters are included in this numbering. For example, if the
-   entry basic block is not given a label name and all function parameters are
-   named, then it will get number 0.
+#. By default, unnamed temporaries are numbered sequentially (using a
+   per-function incrementing counter, starting with 0). However, when explicitly
+   specifying temporary numbers, it is allowed to skip over numbers.
+
+   Note that basic blocks and unnamed function parameters are included in this
+   numbering. For example, if the entry basic block is not given a label name
+   and all function parameters are named, then it will get number 0.
 
 It also shows a convention that we follow in this document. When
 demonstrating instructions, we will follow an instruction with a comment
diff --git a/llvm/include/llvm/AsmParser/LLParser.h b/llvm/include/llvm/AsmParser/LLParser.h
index 54bc3e582e01ae..b0d02eac8e672d 100644
--- a/llvm/include/llvm/AsmParser/LLParser.h
+++ b/llvm/include/llvm/AsmParser/LLParser.h
@@ -203,6 +203,9 @@ namespace llvm {
     bool error(LocTy L, const Twine &Msg) const { return Lex.Error(L, Msg); }
     bool tokError(const Twine &Msg) const { return error(Lex.getLoc(), Msg); }
 
+    bool checkValueID(LocTy L, StringRef Kind, StringRef Prefix,
+                      unsigned NextID, unsigned ID) const;
+
     /// Restore the internal name and slot mappings using the mappings that
     /// were created at an earlier parsing stage.
     void restoreParsingState(const SlotMapping *Slots);
@@ -448,19 +451,35 @@ namespace llvm {
     bool parseFunctionType(Type *&Result);
     bool parseTargetExtType(Type *&Result);
 
+    class NumberedValues {
+      DenseMap<unsigned, Value *> Vals;
+      unsigned NextUnusedID = 0;
+
+    public:
+      unsigned getNext() const { return NextUnusedID; }
+      Value *get(unsigned ID) const { return Vals.lookup(ID); }
+      void add(unsigned ID, Value *V) {
+        assert(ID >= NextUnusedID && "Invalid value ID");
+        Vals.insert({ID, V});
+        NextUnusedID = ID + 1;
+      }
+    };
+
     // Function Semantic Analysis.
     class PerFunctionState {
       LLParser &P;
       Function &F;
       std::map<std::string, std::pair<Value*, LocTy> > ForwardRefVals;
       std::map<unsigned, std::pair<Value*, LocTy> > ForwardRefValIDs;
-      std::vector<Value*> NumberedVals;
+      NumberedValues NumberedVals;
 
       /// FunctionNumber - If this is an unnamed function, this is the slot
       /// number of it, otherwise it is -1.
       int FunctionNumber;
+
     public:
-      PerFunctionState(LLParser &p, Function &f, int functionNumber);
+      PerFunctionState(LLParser &p, Function &f, int functionNumber,
+                       ArrayRef<unsigned> UnnamedArgNums);
       ~PerFunctionState();
 
       Function &getFunction() const { return F; }
@@ -590,9 +609,12 @@ namespace llvm {
       ArgInfo(LocTy L, Type *ty, AttributeSet Attr, const std::string &N)
           : Loc(L), Ty(ty), Attrs(Attr), Name(N) {}
     };
-    bool parseArgumentList(SmallVectorImpl<ArgInfo> &ArgList, bool &IsVarArg);
-    bool parseFunctionHeader(Function *&Fn, bool IsDefine);
-    bool parseFunctionBody(Function &Fn);
+    bool parseArgumentList(SmallVectorImpl<ArgInfo> &ArgList,
+                           SmallVectorImpl<unsigned> &UnnamedArgNums,
+                           bool &IsVarArg);
+    bool parseFunctionHeader(Function *&Fn, bool IsDefine,
+                             SmallVectorImpl<unsigned> &UnnamedArgNums);
+    bool parseFunctionBody(Function &Fn, ArrayRef<unsigned> UnnamedArgNums);
     bool parseBasicBlock(PerFunctionState &PFS);
 
     enum TailCallType { TCT_None, TCT_Tail, TCT_MustTail };
diff --git a/llvm/lib/AsmParser/LLParser.cpp b/llvm/lib/AsmParser/LLParser.cpp
index fb9e1ba875e1fa..ccce7c2392602b 100644
--- a/llvm/lib/AsmParser/LLParser.cpp
+++ b/llvm/lib/AsmParser/LLParser.cpp
@@ -572,7 +572,8 @@ bool LLParser::parseDeclare() {
   }
 
   Function *F;
-  if (parseFunctionHeader(F, false))
+  SmallVector<unsigned> UnnamedArgNums;
+  if (parseFunctionHeader(F, false, UnnamedArgNums))
     return true;
   for (auto &MD : MDs)
     F->addMetadata(MD.first, *MD.second);
@@ -586,8 +587,10 @@ bool LLParser::parseDefine() {
   Lex.Lex();
 
   Function *F;
-  return parseFunctionHeader(F, true) || parseOptionalFunctionMetadata(*F) ||
-         parseFunctionBody(*F);
+  SmallVector<unsigned> UnnamedArgNums;
+  return parseFunctionHeader(F, true, UnnamedArgNums) ||
+         parseOptionalFunctionMetadata(*F) ||
+         parseFunctionBody(*F, UnnamedArgNums);
 }
 
 /// parseGlobalType
@@ -2925,6 +2928,15 @@ bool LLParser::parseOptionalOperandBundles(
   return false;
 }
 
+bool LLParser::checkValueID(LocTy Loc, StringRef Kind, StringRef Prefix,
+                            unsigned NextID, unsigned ID) const {
+  if (ID < NextID)
+    return error(Loc, Kind + " expected to be numbered '" + Prefix +
+                          Twine(NextID) + "' or greater");
+
+  return false;
+}
+
 /// parseArgumentList - parse the argument list for a function type or function
 /// prototype.
 ///   ::= '(' ArgTypeListI ')'
@@ -2935,6 +2947,7 @@ bool LLParser::parseOptionalOperandBundles(
 ///   ::= ArgType (',' ArgType)*
 ///
 bool LLParser::parseArgumentList(SmallVectorImpl<ArgInfo> &ArgList,
+                                 SmallVectorImpl<unsigned> &UnnamedArgNums,
                                  bool &IsVarArg) {
   unsigned CurValID = 0;
   IsVarArg = false;
@@ -2961,12 +2974,19 @@ bool LLParser::parseArgumentList(SmallVectorImpl<ArgInfo> &ArgList,
     if (Lex.getKind() == lltok::LocalVar) {
       Name = Lex.getStrVal();
       Lex.Lex();
-    } else if (Lex.getKind() == lltok::LocalVarID) {
-      if (Lex.getUIntVal() != CurValID)
-        return error(TypeLoc, "argument expected to be numbered '%" +
-                                  Twine(CurValID) + "'");
-      ++CurValID;
-      Lex.Lex();
+    } else {
+      unsigned ArgID;
+      if (Lex.getKind() == lltok::LocalVarID) {
+        ArgID = Lex.getUIntVal();
+        if (checkValueID(TypeLoc, "argument", "%", CurValID, ArgID))
+          return true;
+        Lex.Lex();
+      } else {
+        ArgID = CurValID;
+      }
+
+      UnnamedArgNums.push_back(ArgID);
+      CurValID = ArgID + 1;
     }
 
     if (!FunctionType::isValidArgumentType(ArgTy))
@@ -2995,13 +3015,17 @@ bool LLParser::parseArgumentList(SmallVectorImpl<ArgInfo> &ArgList,
         Name = Lex.getStrVal();
         Lex.Lex();
       } else {
+        unsigned ArgID;
         if (Lex.getKind() == lltok::LocalVarID) {
-          if (Lex.getUIntVal() != CurValID)
-            return error(TypeLoc, "argument expected to be numbered '%" +
-                                      Twine(CurValID) + "'");
+          ArgID = Lex.getUIntVal();
+          if (checkValueID(TypeLoc, "argument", "%", CurValID, ArgID))
+            return true;
           Lex.Lex();
+        } else {
+          ArgID = CurValID;
         }
-        ++CurValID;
+        UnnamedArgNums.push_back(ArgID);
+        CurValID = ArgID + 1;
         Name = "";
       }
 
@@ -3027,7 +3051,8 @@ bool LLParser::parseFunctionType(Type *&Result) {
 
   SmallVector<ArgInfo, 8> ArgList;
   bool IsVarArg;
-  if (parseArgumentList(ArgList, IsVarArg))
+  SmallVector<unsigned> UnnamedArgNums;
+  if (parseArgumentList(ArgList, UnnamedArgNums, IsVarArg))
     return true;
 
   // Reject names on the arguments lists.
@@ -3263,13 +3288,18 @@ bool LLParser::parseTargetExtType(Type *&Result) {
 //===----------------------------------------------------------------------===//
 
 LLParser::PerFunctionState::PerFunctionState(LLParser &p, Function &f,
-                                             int functionNumber)
+                                             int functionNumber,
+                                             ArrayRef<unsigned> UnnamedArgNums)
   : P(p), F(f), FunctionNumber(functionNumber) {
 
   // Insert unnamed arguments into the NumberedVals list.
-  for (Argument &A : F.args())
-    if (!A.hasName())
-      NumberedVals.push_back(&A);
+  auto It = UnnamedArgNums.begin();
+  for (Argument &A : F.args()) {
+    if (!A.hasName()) {
+      unsigned ArgNum = *It++;
+      NumberedVals.add(ArgNum, &A);
+    }
+  }
 }
 
 LLParser::PerFunctionState::~PerFunctionState() {
@@ -3350,7 +3380,7 @@ Value *LLParser::PerFunctionState::getVal(const std::string &Name, Type *Ty,
 
 Value *LLParser::PerFunctionState::getVal(unsigned ID, Type *Ty, LocTy Loc) {
   // Look this name up in the normal function symbol table.
-  Value *Val = ID < NumberedVals.size() ? NumberedVals[ID] : nullptr;
+  Value *Val = NumberedVals.get(ID);
 
   // If this is a forward reference for the value, see if we already created a
   // forward ref record.
@@ -3398,11 +3428,11 @@ bool LLParser::PerFunctionState::setInstName(int NameID,
   if (NameStr.empty()) {
     // If neither a name nor an ID was specified, just use the next ID.
     if (NameID == -1)
-      NameID = NumberedVals.size();
+      NameID = NumberedVals.getNext();
 
-    if (unsigned(NameID) != NumberedVals.size())
-      return P.error(NameLoc, "instruction expected to be numbered '%" +
-                                  Twine(NumberedVals.size()) + "'");
+    if (P.checkValueID(NameLoc, "instruction", "%", NumberedVals.getNext(),
+                       NameID))
+      return true;
 
     auto FI = ForwardRefValIDs.find(NameID);
     if (FI != ForwardRefValIDs.end()) {
@@ -3417,7 +3447,7 @@ bool LLParser::PerFunctionState::setInstName(int NameID,
       ForwardRefValIDs.erase(FI);
     }
 
-    NumberedVals.push_back(Inst);
+    NumberedVals.add(NameID, Inst);
     return false;
   }
 
@@ -3464,15 +3494,15 @@ BasicBlock *LLParser::PerFunctionState::defineBB(const std::string &Name,
                                                  int NameID, LocTy Loc) {
   BasicBlock *BB;
   if (Name.empty()) {
-    if (NameID != -1 && unsigned(NameID) != NumberedVals.size()) {
-      P.error(Loc, "label expected to be numbered '" +
-                       Twine(NumberedVals.size()) + "'");
-      return nullptr;
+    if (NameID != -1) {
+      if (P.checkValueID(Loc, "label", "", NumberedVals.getNext(), NameID))
+        return nullptr;
+    } else {
+      NameID = NumberedVals.getNext();
     }
-    BB = getBB(NumberedVals.size(), Loc);
+    BB = getBB(NameID, Loc);
     if (!BB) {
-      P.error(Loc, "unable to create block numbered '" +
-                       Twine(NumberedVals.size()) + "'");
+      P.error(Loc, "unable to create block numbered '" + Twine(NameID) + "'");
       return nullptr;
     }
   } else {
@@ -3489,8 +3519,8 @@ BasicBlock *LLParser::PerFunctionState::defineBB(const std::string &Name,
 
   // Remove the block from forward ref sets.
   if (Name.empty()) {
-    ForwardRefValIDs.erase(NumberedVals.size());
-    NumberedVals.push_back(BB);
+    ForwardRefValIDs.erase(NameID);
+    NumberedVals.add(NameID, BB);
   } else {
     // BB forward references are already in the function symbol table.
     ForwardRefVals.erase(Name);
@@ -5934,7 +5964,8 @@ bool LLParser::parseTypeAndBasicBlock(BasicBlock *&BB, LocTy &Loc,
 ///       OptionalCallingConv OptRetAttrs OptUnnamedAddr Type GlobalName
 ///       '(' ArgList ')' OptAddrSpace OptFuncAttrs OptSection OptionalAlign
 ///       OptGC OptionalPrefix OptionalPrologue OptPersonalityFn
-bool LLParser::parseFunctionHeader(Function *&Fn, bool IsDefine) {
+bool LLParser::parseFunctionHeader(Function *&Fn, bool IsDefine,
+                                   SmallVectorImpl<unsigned> &UnnamedArgNums) {
   // parse the linkage.
   LocTy LinkageLoc = Lex.getLoc();
   unsigned Linkage;
@@ -6022,7 +6053,7 @@ bool LLParser::parseFunctionHeader(Function *&Fn, bool IsDefine) {
   Constant *PersonalityFn = nullptr;
   Comdat *C;
 
-  if (parseArgumentList(ArgList, IsVarArg) ||
+  if (parseArgumentList(ArgList, UnnamedArgNums, IsVarArg) ||
       parseOptionalUnnamedAddr(UnnamedAddr) ||
       parseOptionalProgramAddrSpace(AddrSpace) ||
       parseFnAttributeValuePairs(FuncAttrs, FwdRefAttrGrps, false,
@@ -6217,7 +6248,8 @@ bool LLParser::PerFunctionState::resolveForwardRefBlockAddresses() {
 
 /// parseFunctionBody
 ///   ::= '{' BasicBlock+ UseListOrderDirective* '}'
-bool LLParser::parseFunctionBody(Function &Fn) {
+bool LLParser::parseFunctionBody(Function &Fn,
+                                 ArrayRef<unsigned> UnnamedArgNums) {
   if (Lex.getKind() != lltok::lbrace)
     return tokError("expected '{' in function body");
   Lex.Lex();  // eat the {.
@@ -6225,7 +6257,7 @@ bool LLParser::parseFunctionBody(Function &Fn) {
   int FunctionNumber = -1;
   if (!Fn.hasName()) FunctionNumber = NumberedVals.size()-1;
 
-  PerFunctionState PFS(*this, Fn, FunctionNumber);
+  PerFunctionState PFS(*this, Fn, FunctionNumber, UnnamedArgNums);
 
   // Resolve block addresses and allow basic blocks to be forward-declared
   // within this function.
diff --git a/llvm/test/Assembler/call-nonzero-program-addrspace-2.ll b/llvm/test/Assembler/call-nonzero-program-addrspace-2.ll
index f5c5437d8221e6..bc600d56db51b2 100644
--- a/llvm/test/Assembler/call-nonzero-program-addrspace-2.ll
+++ b/llvm/test/Assembler/call-nonzero-program-addrspace-2.ll
@@ -3,7 +3,7 @@
 
 ; Check that numbered variables in a nonzero program address space 200 can be used in a call instruction
 
-define i8 @test_unnamed(ptr, ptr addrspace(42) %0) {
+define i8 @test_unnamed(ptr, ptr addrspace(42) %1) {
   ; Calls with explicit address spaces are fine:
   call addrspace(0) i8 %0(i32 0)
   call addrspace(42) i8 %1(i32 0)
diff --git a/llvm/test/Assembler/invalid-arg-num-1.ll b/llvm/test/Assembler/invalid-arg-num-1.ll
deleted file mode 100644
index ee13c339dec448..00000000000000
--- a/llvm/test/Assembler/invalid-arg-num-1.ll
+++ /dev/null
@@ -1,6 +0,0 @@
-; RUN: not llvm-as < %s -o /dev/null 2>&1 | FileCheck %s
-
-; CHECK: error: argument expected to be numbered '%1'
-define void @foo(i32 %0, i32 %5) {
-  ret void
-}
diff --git a/llvm/test/Assembler/invalid-arg-num-2.ll b/llvm/test/Assembler/invalid-arg-num-2.ll
deleted file mode 100644
index 6ecb0033512d00..00000000000000
--- a/llvm/test/Assembler/invalid-arg-num-2.ll
+++ /dev/null
@@ -1,6 +0,0 @@
-; RUN: not llvm-as < %s -o /dev/null 2>&1 | FileCheck %s
-
-; CHECK: error: argument expected to be numbered '%1'
-define void @foo(i8 %0, i32 %named, i32 %2) {
-  ret void
-}
diff --git a/llvm/test/Assembler/invalid-arg-num-3.ll b/llvm/test/Assembler/invalid-arg-num-3.ll
deleted file mode 100644
index f1638365630be5..00000000000000
--- a/llvm/test/Assembler/invalid-arg-num-3.ll
+++ /dev/null
@@ -1,6 +0,0 @@
-; RUN: not llvm-as < %s -o /dev/null 2>&1 | FileCheck %s
-
-; CHECK: error: argument expected to be numbered '%0'
-define void @foo(i8 %1) {
-  ret void
-}
diff --git a/llvm/test/Assembler/invalid-block-label-num.ll b/llvm/test/Assembler/invalid-block-label-num.ll
deleted file mode 100644
index 4f02300849e68f..00000000000000
--- a/llvm/test/Assembler/invalid-block-label-num.ll
+++ /dev/null
@@ -1,7 +0,0 @@
-; RUN: not llvm-as < %s 2>&1 | FileCheck %s
-
-define void @f () {
-1:
-; CHECK: error: label expected to be numbered '0'
-  ret void
-}
diff --git a/llvm/test/Assembler/skip-value-numbers-invalid.ll b/llvm/test/Assembler/skip-value-numbers-invalid.ll
new file mode 100644
index 00000000000000..67e6b10196bc8e
--- /dev/null
+++ b/llvm/test/Assembler/skip-value-numbers-invalid.ll
@@ -0,0 +1,31 @@
+; RUN: split-file %s %t
+; RUN: not llvm-as < %s %t/instr_smaller_id.ll 2>&1 | FileCheck %s --check-prefix=INSTR-SMALLER-ID
+; RUN: not llvm-as < %s %t/arg_smaller_id.ll 2>&1 | FileCheck %s --check-prefix=ARG-SMALLER-ID
+; RUN: not llvm-as < %s %t/block_smaller_id.ll 2>&1 | FileCheck %s --check-prefix=BLOCK-SMALLER-ID
+
+;--- instr_smaller_id.ll
+
+; INSTR-SMALLER-ID: error: instruction expected to be numbered '%11' or greater
+define i32 @test() {
+  %10 = add i32 1, 2
+  %5 = add i32 %10, 3
+  ret i32 %5
+}
+
+;--- arg_smaller_id.ll
+
+; ARG-SMALLER-ID: error: argument expected to be numbered '%11' or greater
+define i32 @test(i32 %10, i32 %5) {
+  ret i32 %5
+}
+
+;--- block_smaller_id.ll
+
+; BLOCK-SMALLER-ID: error: label expected to be numbered '11' or greater
+define i32 @test() {
+10:
+  br label %5
+
+5:
+  ret i32 0
+}
diff --git a/llvm/test/Assembler/skip-value-numbers.ll b/llvm/test/Assembler/skip-value-numbers.ll
new file mode 100644
index 00000000000000..d74d8d502709b9
--- /dev/null
+++ b/llvm/test/Assembler/skip-value-numbers.ll
@@ -0,0 +1,79 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: -p --version 4
+; RUN: opt -S < %s | FileCheck %s
+
+define i32 @instr() {
+; CHECK-LABEL: define i32 @instr() {
+; CHECK-NEXT:    %1 = add i32 1, 2
+; CHECK-NEXT:    %2 = add i32 %1, 3
+; CHECK-NEXT:    %3 = add i32 %2, 4
+; CHECK-NEXT:    ret i32 %3
+;
+  %10 = add i32 1, 2
+  %20 = add i32 %10, 3
+  %30 = add i32 %20, 4
+  ret i32 %30
+}
+
+define i32 @instr_some_implicit() {
+; CHECK-LABEL: define i32 @instr_some_implicit() {
+; CHECK-NEXT:    %1 = add i32 1, 2
+; CHECK-NEXT:    %2 = add i32 3, 4
+; CHECK-NEXT:    %3 = add i32 %1, %2
+; CHECK-NEXT:    ret i32 %3
+;
+  add i32 1, 2
+  %10 = add i32 3, 4
+  add i32 %1, %10
+  ret i32 %11
+}
+
+define i32 @args(i32 %0, i32 %10, i32 %20) {
+; CHECK-LABEL: define i32 @args(i32 %0, i32 %1, i32 %2) {
+; CHECK-NEXT:    %4 = add i32 %0, %1
+; CHECK-NEXT:    %5 = add i32 %4, %2
+; CHECK-NEXT:    ret i32 %5
+;
+  %30 = add i32 %0, %10
+  %31 = add i32 %30, %20
+  ret i32 %31
+}
+
+define i32 @args_some_implicit(i32, i32 %10, i32) {
+; CHECK-LABEL: define i32 @args_some_implicit(i32 %0, i32 %1, i32 %2) {
+; CHECK-NEXT:    %4 = add i32 %0, %1
+; CHECK-NEXT:    %5 = add i32 %2, %4
+; CHECK-NEXT:    ret i32 %5
+;
+  add i32 %0, %10
+  %20 = add i32 %11, %13
+  ret i32 %20
+}
+
+define i32 @blocks() {
+; CHECK-LABEL: define i32 @blocks() {
+; CHECK-NEXT:    br label %1
+; CHECK:       1:
+; CHECK-NEXT:    ret i32 0
+;
+10:
+  br label %20
+
+20:
+  ret i32 0
+}
+
+define i32 @blocks_some_implicit() {
+; CHECK-LABEL: define i32 @blocks_some_implicit() {
+; CHECK-NEXT:    br label %1
+; CHECK:       1:
+; CHECK-NEXT:    br label %2
+; CHECK:       2:
+; CHECK-NEXT:    ret i32 0
+;
+  br label %10
+
+10:
+  br label %11
+
+  ret i32 0
+}

Copy link
Collaborator

@nhaehnle nhaehnle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I've run into this issue in the past and it was always slightly annoying to have to go back and get out the instnamer. LGTM.

Side note: the duplicate argument parsing in parseArgumentList is really awkward.

@nikic nikic merged commit 6f37114 into llvm:main Jan 19, 2024
4 of 6 checks passed
@nikic nikic deleted the skip-value-ids branch January 19, 2024 13:55
@nikic
Copy link
Contributor Author

nikic commented Jan 19, 2024

Side note: the duplicate argument parsing in parseArgumentList is really awkward.

Agreed. I've pushed 0f20d5a to deduplicate this.

nikic added a commit to nikic/llvm-project that referenced this pull request Jan 30, 2024
llvm#78171 added support for
local value numbers to have gaps. This extends the support for
global value numbers (for globals and functions).

This means that it is now possible to delete an unnamed global
definition/declaration without breaking the IR.

This is a lot less common than unnamed local values, but it seems
like something we should support for consistency. (Unnamed globals
are also common in rustc IR).
nikic added a commit to nikic/llvm-project that referenced this pull request Jan 30, 2024
llvm#78171 added support for
local value numbers to have gaps. This extends the support for
global value numbers (for globals and functions).

This means that it is now possible to delete an unnamed global
definition/declaration without breaking the IR.

This is a lot less common than unnamed local values, but it seems
like something we should support for consistency. (Unnamed globals
are also common in rustc IR).
nikic added a commit that referenced this pull request Jan 31, 2024
#78171 added support for
non-consecutive local value numbers. This extends the support for global
value numbers (for globals and functions).

This means that it is now possible to delete an unnamed global
definition/declaration without breaking the IR.

This is a lot less common than unnamed local values, but it seems like
something we should support for consistency. (Unnamed globals are used a
lot in Rust though.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants