Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RemoveDIs][DebugInfo] Add DPValue checks to the verifier, prepare DPValue for parsing support #79810

Merged
merged 4 commits into from
Feb 27, 2024

Conversation

SLTozer
Copy link
Contributor

@SLTozer SLTozer commented Jan 29, 2024

As part of the RemoveDIs project, this patch adds support for checking DPValues in the verifier. Although this is not strictly parsing-related, and we currently automatically convert back to the old debug info format immediately after parsing, we are approaching the point where the we can operate end-to-end in the new debug info format, at which point it is appropriate that we can actually validate modules in the new format.
This patch also contains some changes that aren't strictly parsing-related, but are necessary class refactors for parsing support, and are used in the verifier checks (i.e. changing the DILocalVariable field to be a tracking MD reference, and adding a Verifier check to confirm that it is a DILocalVariable).
This patch is intended to follow the printing patch (#79281), but is presented here as a standalone patch for ease-of-review.

@llvmbot
Copy link
Collaborator

llvmbot commented Jan 29, 2024

@llvm/pr-subscribers-debuginfo

@llvm/pr-subscribers-llvm-ir

Author: Stephen Tozer (SLTozer)

Changes

As part of the RemoveDIs project, this patch adds support for checking DPValues in the verifier. Although this is not strictly parsing-related, and we currently automatically convert back to the old debug info format immediately after parsing, we are approaching the point where the we can operate end-to-end in the new debug info format, at which point it is appropriate that we can actually validate modules in the new format.
This patch also contains some changes that aren't strictly parsing-related, but are necessary class refactors for parsing support, and are used in the verifier checks (i.e. changing the DILocalVariable field to be a tracking MD reference, and adding a Verifier check to confirm that it is a DILocalVariable).
This patch is intended to follow the printing patch (#79281), but is presented here as a standalone patch for ease-of-review.


Full diff: https://github.com/llvm/llvm-project/pull/79810.diff

4 Files Affected:

  • (modified) llvm/include/llvm/IR/DebugProgramInstruction.h (+4-3)
  • (modified) llvm/lib/IR/AsmWriter.cpp (+3-3)
  • (modified) llvm/lib/IR/DebugProgramInstruction.cpp (+6)
  • (modified) llvm/lib/IR/Verifier.cpp (+97)
diff --git a/llvm/include/llvm/IR/DebugProgramInstruction.h b/llvm/include/llvm/IR/DebugProgramInstruction.h
index 737417fb9b9a542..74b42d1e81f938a 100644
--- a/llvm/include/llvm/IR/DebugProgramInstruction.h
+++ b/llvm/include/llvm/IR/DebugProgramInstruction.h
@@ -81,7 +81,7 @@ class DPValue : public ilist_node<DPValue>, private DebugValueUser {
   // DebugValueUser superclass instead. The referred to Value can either be a
   // ValueAsMetadata or a DIArgList.
 
-  DILocalVariable *Variable;
+  TrackingMDNodeRef Variable;
   DIExpression *Expression;
   DebugLoc DbgLoc;
   DIExpression *AddressExpression;
@@ -219,7 +219,7 @@ class DPValue : public ilist_node<DPValue>, private DebugValueUser {
   void addVariableLocationOps(ArrayRef<Value *> NewValues,
                               DIExpression *NewExpr);
 
-  void setVariable(DILocalVariable *NewVar) { Variable = NewVar; }
+  void setVariable(DILocalVariable *NewVar);
 
   void setExpression(DIExpression *NewExpr) { Expression = NewExpr; }
 
@@ -240,7 +240,8 @@ class DPValue : public ilist_node<DPValue>, private DebugValueUser {
   void setKillLocation();
   bool isKillLocation() const;
 
-  DILocalVariable *getVariable() const { return Variable; }
+  DILocalVariable *getVariable() const;
+  MDNode *getRawVariable() const { return Variable; }
 
   DIExpression *getExpression() const { return Expression; }
 
diff --git a/llvm/lib/IR/AsmWriter.cpp b/llvm/lib/IR/AsmWriter.cpp
index 3c15784a0ed5eba..80e089c8caa13d2 100644
--- a/llvm/lib/IR/AsmWriter.cpp
+++ b/llvm/lib/IR/AsmWriter.cpp
@@ -1138,10 +1138,10 @@ void SlotTracker::processFunctionMetadata(const Function &F) {
 }
 
 void SlotTracker::processDPValueMetadata(const DPValue &DPV) {
-  CreateMetadataSlot(DPV.getVariable());
-  CreateMetadataSlot(DPV.getDebugLoc());
+  CreateMetadataSlot(DPV.getRawVariable());
+  CreateMetadataSlot(DPV.getDebugLoc().getAsMDNode());
   if (DPV.isDbgAssign()) {
-    CreateMetadataSlot(DPV.getAssignID());
+    CreateMetadataSlot(cast<MDNode>(DPV.getRawAssignID()));
   }
 }
 
diff --git a/llvm/lib/IR/DebugProgramInstruction.cpp b/llvm/lib/IR/DebugProgramInstruction.cpp
index fd234685d5fd4bd..38faf90064e3cb3 100644
--- a/llvm/lib/IR/DebugProgramInstruction.cpp
+++ b/llvm/lib/IR/DebugProgramInstruction.cpp
@@ -111,6 +111,8 @@ DPValue *DPValue::createLinkedDPVAssign(Instruction *LinkedInstr, Value *Val,
   return NewDPVAssign;
 }
 
+void DPValue::setVariable(DILocalVariable *NewVar) { Variable.reset(NewVar); }
+
 iterator_range<DPValue::location_op_iterator> DPValue::location_ops() const {
   auto *MD = getRawLocation();
   // If a Value has been deleted, the "location" for this DPValue will be
@@ -249,6 +251,10 @@ bool DPValue::isKillLocation() const {
          any_of(location_ops(), [](Value *V) { return isa<UndefValue>(V); });
 }
 
+DILocalVariable *DPValue::getVariable() const {
+  return cast<DILocalVariable>(Variable.get());
+}
+
 std::optional<uint64_t> DPValue::getFragmentSizeInBits() const {
   if (auto Fragment = getExpression()->getFragmentInfo())
     return Fragment->SizeInBits;
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index 91cf91fbc788bd9..5f0b04d425898d1 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -178,6 +178,26 @@ struct VerifierSupport {
       V->print(*OS, MST, false);
   }
 
+  void Write(DPValue::LocationType Type) {
+    switch (Type) {
+    case DPValue::LocationType::Value:
+      *OS << "value";
+      break;
+    case DPValue::LocationType::Declare:
+      *OS << "declare";
+      break;
+    case DPValue::LocationType::Assign:
+      *OS << "assign";
+      break;
+    case DPValue::LocationType::End:
+      *OS << "end";
+      break;
+    case DPValue::LocationType::Any:
+      *OS << "any";
+      break;
+    };
+  }
+
   void Write(const Metadata *MD) {
     if (!MD)
       return;
@@ -522,6 +542,7 @@ class Verifier : public InstVisitor<Verifier>, VerifierSupport {
 
   void visitTemplateParams(const MDNode &N, const Metadata &RawParams);
 
+  void visit(DPValue &DPV);
   // InstVisitor overrides...
   using InstVisitor<Verifier>::visit;
   void visit(Instruction &I);
@@ -650,6 +671,8 @@ class Verifier : public InstVisitor<Verifier>, VerifierSupport {
   } while (false)
 
 void Verifier::visit(Instruction &I) {
+  for (auto &DPV : I.getDbgValueRange())
+    visit(DPV);
   for (unsigned i = 0, e = I.getNumOperands(); i != e; ++i)
     Check(I.getOperand(i) != nullptr, "Operand is null", &I);
   InstVisitor<Verifier>::visit(I);
@@ -6152,6 +6175,80 @@ static DISubprogram *getSubprogram(Metadata *LocalScope) {
   return nullptr;
 }
 
+void Verifier::visit(DPValue &DPV) {
+  CheckDI(DPV.getType() == DPValue::LocationType::Value ||
+              DPV.getType() == DPValue::LocationType::Declare ||
+              DPV.getType() == DPValue::LocationType::Assign,
+          "invalid #dbg record type", &DPV, DPV.getType());
+  StringRef Kind;
+  switch (DPV.getType()) {
+  case DPValue::LocationType::Value:
+    Kind = "value";
+    break;
+  case DPValue::LocationType::Declare:
+    Kind = "declare";
+    break;
+  case DPValue::LocationType::Assign:
+    Kind = "assign";
+    break;
+  default:
+    llvm_unreachable("Tried to print a DPValue with an invalid LocationType!");
+  };
+  auto *MD = DPV.getRawLocation();
+  CheckDI(isa<ValueAsMetadata>(MD) || isa<DIArgList>(MD) ||
+              (isa<MDNode>(MD) && !cast<MDNode>(MD)->getNumOperands()),
+          "invalid #dbg_" + Kind + " address/value", &DPV, MD);
+  CheckDI(isa<DILocalVariable>(DPV.getRawVariable()),
+          "invalid #dbg_" + Kind + " variable", &DPV, DPV.getRawVariable());
+  CheckDI(DPV.getExpression(), "missing #dbg_" + Kind + " expression", &DPV,
+          DPV.getExpression());
+
+  if (DPV.isDbgAssign()) {
+    CheckDI(isa<DIAssignID>(DPV.getRawAssignID()),
+            "invalid #dbg_assign DIAssignID", &DPV, DPV.getRawAssignID());
+    const auto *RawAddr = DPV.getRawAddress();
+    CheckDI(
+        isa<ValueAsMetadata>(RawAddr) ||
+            (isa<MDNode>(RawAddr) && !cast<MDNode>(RawAddr)->getNumOperands()),
+        "invalid #dbg_assign address", &DPV, DPV.getRawAddress());
+    CheckDI(DPV.getAddressExpression(),
+            "missing #dbg_assign address expression", &DPV,
+            DPV.getAddressExpression());
+    // All of the linked instructions should be in the same function as DPV.
+    for (Instruction *I : at::getAssignmentInsts(&DPV))
+      CheckDI(DPV.getFunction() == I->getFunction(),
+              "inst not in same function as #dbg_assign", I, &DPV);
+  }
+
+  if (MDNode *N = DPV.getDebugLoc().getAsMDNode()) {
+    CheckDI(isa<DILocation>(N), "invalid #dbg_" + Kind + " location", &DPV, N);
+    visitDILocation(*cast<DILocation>(N));
+  }
+
+  BasicBlock *BB = DPV.getParent();
+  Function *F = BB ? BB->getParent() : nullptr;
+
+  // The scopes for variables and !dbg attachments must agree.
+  DILocalVariable *Var = DPV.getVariable();
+  DILocation *Loc = DPV.getDebugLoc();
+  CheckDI(Loc, "missing #dbg_" + Kind + " DILocation", &DPV, BB, F);
+
+  DISubprogram *VarSP = getSubprogram(Var->getRawScope());
+  DISubprogram *LocSP = getSubprogram(Loc->getRawScope());
+  if (!VarSP || !LocSP)
+    return; // Broken scope chains are checked elsewhere.
+
+  CheckDI(VarSP == LocSP,
+          "mismatched subprogram between #dbg_" + Kind +
+              " variable and DILocation",
+          &DPV, BB, F, Var, Var->getScope()->getSubprogram(), Loc,
+          Loc->getScope()->getSubprogram());
+
+  // This check is redundant with one in visitLocalVariable().
+  CheckDI(isType(Var->getRawType()), "invalid type ref", Var,
+          Var->getRawType());
+}
+
 void Verifier::visitVPIntrinsic(VPIntrinsic &VPI) {
   if (auto *VPCast = dyn_cast<VPCastIntrinsic>(&VPI)) {
     auto *RetTy = cast<VectorType>(VPCast->getType());

Copy link
Member

@jmorse jmorse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the checking, is it not the case that CheckDI will try to print out the DPValue that's broken anyway? IMO trying to form the intrinsic-name at runtime is too much effort if it's about to be printed out. IIRC the verifier does run at runtime in normal compilations, thus this is on a (not especially) hot path.

Is it worth shuffling in the code in BasicBlock::validateDbgValues into the main Verifier now too?

llvm/include/llvm/IR/DebugProgramInstruction.h Outdated Show resolved Hide resolved
Kind = "assign";
break;
default:
llvm_unreachable("Tried to print a DPValue with an invalid LocationType!");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will CheckDI cause this function to return if there's an error? If it doesn't, and the earlier CheckDI falls down to here then we'll have a verifier error that's always an assertion failure too, which seems too brittle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will cause it to return if there's no error, which is why I felt safe to add this - it'll be removed though anyway due to the string printing changes, so either way it'll be fine.

llvm/lib/IR/Verifier.cpp Show resolved Hide resolved
llvm/lib/IR/Verifier.cpp Show resolved Hide resolved
llvm/lib/IR/Verifier.cpp Show resolved Hide resolved
@SLTozer
Copy link
Contributor Author

SLTozer commented Feb 22, 2024

IMO trying to form the intrinsic-name at runtime is too much effort if it's about to be printed out.

SGTM! (We currently do this string business for debug intrinsics, but I think that's just an improvement for DbgRecords to capitalize on).

@SLTozer
Copy link
Contributor Author

SLTozer commented Feb 23, 2024

Is it worth shuffling in the code in BasicBlock::validateDbgValues into the main Verifier now too?

I've got this done now, but just a question for you since you added it - currently this check is a hard error in the verifier, should we change it now to a debug info error instead?

Copy link

github-actions bot commented Feb 23, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Member

@jmorse jmorse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM,

I've got this done now, but just a question for you since you added it - currently this check is a hard error in the verifier, should we change it now to a debug info error instead?

I'm not sure -- I'd say that for any situation where the debug-info is totally broken and we can only ever expect to crash, there should be a hard error. "Break hard, break fast" gets us better detection of problems instead of having them present later.

For situations where the debug-info isn't correct, but in a way that makes it meaningless / stupid / unreliable rather than causing a crash, those should be debug-info dropping errors. (I think you've done fine at picking between the two!).

@SLTozer SLTozer merged commit ed35ad1 into llvm:main Feb 27, 2024
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants