Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce DIExpressionOptimizer #69769

Closed

Conversation

rastogishubham
Copy link
Contributor

@rastogishubham rastogishubham commented Oct 20, 2023

DIExpressionOptimizer is a lightweight class that can be used to optimize DIExpressions that can get very large, by involving some simple constant folding and removing appends of unnecessary operations.

For example:

If we have a DIExpression: {dwarf::DW_OP_LLVM_arg, 0, dwarf::DW_OP_constu, 2, dwarf::DW_OP_plus} and want to append {dwarf::DW_OP_constu, 3, dwarf::DW_OP_plus}

Instead of having the final DIExpression be {dwarf::DW_OP_LLVM_arg, 0, dwarf::DW_OP_constu, 2, dwarf::DW_OP_plus, dwarf::DW_OP_constu, 3, dwarf::DW_OP_plus}.

It will be reduced to {dwarf::DW_OP_LLVM_arg, 0, dwarf::DW_OP_constu, 5, dwarf::DW_OP_plus}

There is an implementation for DW_OP_mul and DW_OP_plus_uconst as well

It also just flat out does not append {DW_OP_constu, 1, DW_OP_mul}, {DW_OP_constu, 1, DW_OP_div}, {DW_OP_constu, 0, DW_OP_plus}, {DW_OP_constu, 0, DW_OP_minus}, {DW_OP_constu, 0, DW_OP_shl}, {DW_OP_constu, 0, DW_OP_shr}

Hopefully this can be improved upon by the larger community to add more optimizations in the future.

This patch is part of a stacked diff, I was not sure how to do this any better because github squashes multiple commits in one PR, but the PR this depends on is:

#69768

DIExpressionOptimizer is a lightweight class that can be used to
optimize DIExpressions that can get very large, by involving some simple
constant folding, and removing appends of unecessary operations.
@llvmbot
Copy link
Collaborator

llvmbot commented Oct 20, 2023

@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-debuginfo

Author: Shubham Sandeep Rastogi (rastogishubham)

Changes

DIExpressionOptimizer is a lightweight class that can be used to optimize DIExpressions that can get very large, by involving some simple constant folding and removing appends of unnecessary operations.


Full diff: https://github.com/llvm/llvm-project/pull/69769.diff

3 Files Affected:

  • (modified) llvm/include/llvm/IR/DebugInfoMetadata.h (+103)
  • (modified) llvm/lib/CodeGen/AsmPrinter/DwarfExpression.h (-61)
  • (modified) llvm/lib/IR/DebugInfoMetadata.cpp (+137)
diff --git a/llvm/include/llvm/IR/DebugInfoMetadata.h b/llvm/include/llvm/IR/DebugInfoMetadata.h
index b347664883fd9f2..b9df644e44ccf36 100644
--- a/llvm/include/llvm/IR/DebugInfoMetadata.h
+++ b/llvm/include/llvm/IR/DebugInfoMetadata.h
@@ -3073,6 +3073,109 @@ template <> struct DenseMapInfo<DIExpression::FragmentInfo> {
   static bool isEqual(const FragInfo &A, const FragInfo &B) { return A == B; }
 };
 
+class DIExpressionOptimizer {
+
+  // When looking at a DIExpression such as {DW_OP_constu, 1, DW_OP_constu, 2,
+  // DW_OP_plus} and trying to append {DW_OP_consts, 3, DW_OP_minus}
+  // NewMathOperator = DW_OP_minus
+  // OperandRight = 3
+  // OperatorRight = DW_OP_consts
+  // CurrMathOperator = DW_OP_plus
+  // OperandLeft = 2
+  // OperandLeft = DW_OP_constu
+
+  /// The math operator of the new subexpression being appended to the
+  /// DIExpression.
+  uint64_t NewMathOperator = 0;
+
+  /// The operand of the new subexpression being appended to the DIExpression.
+  uint64_t OperandRight;
+
+  /// The operator attached to OperandRight.
+  uint64_t OperatorRight = 0;
+
+  /// The math operator at the end of the current DIExpression.
+  uint64_t CurrMathOperator = 0;
+
+  /// The operand at the end of the current DIExpression at the top of the
+  /// stack.
+  uint64_t OperandLeft;
+
+  /// The operator attached to OperandLeft.
+  uint64_t OperatorLeft = 0;
+
+  bool operatorIsCommutative(uint64_t Operator);
+
+  SmallVector<uint64_t> getOperatorLocations(ArrayRef<uint64_t> Ops);
+
+  void reset();
+
+public:
+  bool operatorCanBeOptimized(uint64_t Operator);
+  void optimize(SmallVectorImpl<uint64_t> &NewOps, ArrayRef<uint64_t> Ops);
+};
+
+/// Holds a DIExpression and keeps track of how many operands have been consumed
+/// so far.
+class DIExpressionCursor {
+  DIExpression::expr_op_iterator Start, End;
+
+public:
+  DIExpressionCursor(const DIExpression *Expr) {
+    if (!Expr) {
+      assert(Start == End);
+      return;
+    }
+    Start = Expr->expr_op_begin();
+    End = Expr->expr_op_end();
+  }
+
+  DIExpressionCursor(ArrayRef<uint64_t> Expr)
+      : Start(Expr.begin()), End(Expr.end()) {}
+
+  DIExpressionCursor(const DIExpressionCursor &) = default;
+
+  /// Consume one operation.
+  std::optional<DIExpression::ExprOperand> take() {
+    if (Start == End)
+      return std::nullopt;
+    return *(Start++);
+  }
+
+  /// Consume N operations.
+  void consume(unsigned N) { std::advance(Start, N); }
+
+  /// Return the current operation.
+  std::optional<DIExpression::ExprOperand> peek() const {
+    if (Start == End)
+      return std::nullopt;
+    return *(Start);
+  }
+
+  /// Return the next operation.
+  std::optional<DIExpression::ExprOperand> peekNext() const {
+    if (Start == End)
+      return std::nullopt;
+
+    auto Next = Start.getNext();
+    if (Next == End)
+      return std::nullopt;
+
+    return *Next;
+  }
+
+  /// Determine whether there are any operations left in this expression.
+  operator bool() const { return Start != End; }
+
+  DIExpression::expr_op_iterator begin() const { return Start; }
+  DIExpression::expr_op_iterator end() const { return End; }
+
+  /// Retrieve the fragment information, if any.
+  std::optional<DIExpression::FragmentInfo> getFragmentInfo() const {
+    return DIExpression::getFragmentInfo(Start, End);
+  }
+};
+
 /// Global variables.
 ///
 /// TODO: Remove DisplayName.  It's always equal to Name.
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfExpression.h b/llvm/lib/CodeGen/AsmPrinter/DwarfExpression.h
index 667a9efc6f6c04f..4daa78b15b8e298 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfExpression.h
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfExpression.h
@@ -31,67 +31,6 @@ class DIELoc;
 class TargetRegisterInfo;
 class MachineLocation;
 
-/// Holds a DIExpression and keeps track of how many operands have been consumed
-/// so far.
-class DIExpressionCursor {
-  DIExpression::expr_op_iterator Start, End;
-
-public:
-  DIExpressionCursor(const DIExpression *Expr) {
-    if (!Expr) {
-      assert(Start == End);
-      return;
-    }
-    Start = Expr->expr_op_begin();
-    End = Expr->expr_op_end();
-  }
-
-  DIExpressionCursor(ArrayRef<uint64_t> Expr)
-      : Start(Expr.begin()), End(Expr.end()) {}
-
-  DIExpressionCursor(const DIExpressionCursor &) = default;
-
-  /// Consume one operation.
-  std::optional<DIExpression::ExprOperand> take() {
-    if (Start == End)
-      return std::nullopt;
-    return *(Start++);
-  }
-
-  /// Consume N operations.
-  void consume(unsigned N) { std::advance(Start, N); }
-
-  /// Return the current operation.
-  std::optional<DIExpression::ExprOperand> peek() const {
-    if (Start == End)
-      return std::nullopt;
-    return *(Start);
-  }
-
-  /// Return the next operation.
-  std::optional<DIExpression::ExprOperand> peekNext() const {
-    if (Start == End)
-      return std::nullopt;
-
-    auto Next = Start.getNext();
-    if (Next == End)
-      return std::nullopt;
-
-    return *Next;
-  }
-
-  /// Determine whether there are any operations left in this expression.
-  operator bool() const { return Start != End; }
-
-  DIExpression::expr_op_iterator begin() const { return Start; }
-  DIExpression::expr_op_iterator end() const { return End; }
-
-  /// Retrieve the fragment information, if any.
-  std::optional<DIExpression::FragmentInfo> getFragmentInfo() const {
-    return DIExpression::getFragmentInfo(Start, End);
-  }
-};
-
 /// Base class containing the logic for constructing DWARF expressions
 /// independently of whether they are emitted into a DIE or into a .debug_loc
 /// entry.
diff --git a/llvm/lib/IR/DebugInfoMetadata.cpp b/llvm/lib/IR/DebugInfoMetadata.cpp
index f7f36129ec8557c..dfda851b77777ad 100644
--- a/llvm/lib/IR/DebugInfoMetadata.cpp
+++ b/llvm/lib/IR/DebugInfoMetadata.cpp
@@ -1734,6 +1734,143 @@ const DIExpression *DIExpression::extractAddressClass(const DIExpression *Expr,
   return Expr;
 }
 
+SmallVector<uint64_t>
+DIExpressionOptimizer::getOperatorLocations(ArrayRef<uint64_t> Ops) {
+  SmallVector<uint64_t> OpLocs;
+  uint64_t Loc = 0;
+  DIExpressionCursor Cursor(Ops);
+
+  while (Cursor.peek()) {
+    OpLocs.push_back(Loc);
+    auto Size = Cursor.peek()->getSize();
+    Loc += Size;
+    Cursor.consume(1);
+  }
+
+  return OpLocs;
+}
+
+bool DIExpressionOptimizer::operatorIsCommutative(uint64_t Operator) {
+  return Operator == dwarf::DW_OP_mul || Operator == dwarf::DW_OP_plus;
+}
+
+bool DIExpressionOptimizer::operatorCanBeOptimized(uint64_t Operator) {
+  switch (Operator) {
+  case dwarf::DW_OP_plus:
+  case dwarf::DW_OP_plus_uconst:
+  case dwarf::DW_OP_minus:
+  case dwarf::DW_OP_mul:
+  case dwarf::DW_OP_div:
+  case dwarf::DW_OP_shl:
+  case dwarf::DW_OP_shr:
+    return true;
+  default:
+    return false;
+  }
+}
+
+void DIExpressionOptimizer::reset() {
+  NewMathOperator = 0;
+  CurrMathOperator = 0;
+  OperatorLeft = 0;
+  OperatorRight = 0;
+}
+
+void DIExpressionOptimizer::optimize(SmallVectorImpl<uint64_t> &NewOps,
+                                     ArrayRef<uint64_t> Ops) {
+
+  if (Ops.size() == 2 && Ops[0] == dwarf::DW_OP_plus_uconst) {
+    // Convert a {DW_OP_plus_uconst, <constant value>} expression to
+    // {DW_OP_constu, <constant value>, DW_OP_plus}
+    NewMathOperator = dwarf::DW_OP_plus;
+    OperandRight = Ops[1];
+    OperatorRight = dwarf::DW_OP_constu;
+  } else if (Ops.size() == 3) {
+    NewMathOperator = Ops[2];
+    OperandRight = Ops[1];
+    OperatorRight = Ops[0];
+  } else {
+    // Currently, DIExpressionOptimizer only supports arithmetic operations
+    // such as Ops = {DW_OP_constu, 1, DW_OP_mul}
+    NewOps.append(Ops.begin(), Ops.end());
+    return;
+  }
+
+  if (OperatorRight != dwarf::DW_OP_constu &&
+      OperatorRight != dwarf::DW_OP_consts) {
+    NewOps.append(Ops.begin(), Ops.end());
+    return;
+  }
+
+  uint64_t CurrOperator;
+
+  if ((NewMathOperator == dwarf::DW_OP_mul ||
+       NewMathOperator == dwarf::DW_OP_div) &&
+      OperandRight == 1) {
+    // It is a multiply or divide by 1
+    reset();
+    return;
+  }
+  if ((NewMathOperator == dwarf::DW_OP_plus ||
+       NewMathOperator == dwarf::DW_OP_minus ||
+       NewMathOperator == dwarf::DW_OP_shl ||
+       NewMathOperator == dwarf::DW_OP_shr) &&
+      OperandRight == 0) {
+    // It is a add, subtract, shift left, or shift right with 0
+    reset();
+    return;
+  }
+
+  // Get the locations of the operators in the DIExpression
+  auto OperatorLocs = getOperatorLocations(NewOps);
+
+  for (auto It = OperatorLocs.rbegin(); It != OperatorLocs.rend(); It++) {
+    // Only look at the last two operators of the DIExpression to see if the new
+    // operators can be constant folded with them.
+    if (std::distance(OperatorLocs.rbegin(), It) > 1)
+      break;
+    CurrOperator = NewOps[*It];
+    if (CurrOperator == dwarf::DW_OP_mul || CurrOperator == dwarf::DW_OP_div ||
+        CurrOperator == dwarf::DW_OP_plus ||
+        CurrOperator == dwarf::DW_OP_minus) {
+      CurrMathOperator = CurrOperator;
+      continue;
+    }
+    if (CurrOperator == dwarf::DW_OP_plus_uconst &&
+        NewMathOperator == dwarf::DW_OP_plus) {
+      NewOps[*It + 1] = OperandRight + NewOps[*It + 1];
+      reset();
+      return;
+    }
+    if (CurrOperator == dwarf::DW_OP_constu ||
+        CurrOperator == dwarf::DW_OP_consts) {
+      OperatorLeft = CurrOperator;
+      OperandLeft = NewOps[*It + 1];
+
+      if (NewMathOperator == CurrMathOperator &&
+          operatorIsCommutative(NewMathOperator)) {
+        switch (NewMathOperator) {
+        case dwarf::DW_OP_plus:
+          NewOps[*It + 1] = OperandRight + OperandLeft;
+          break;
+        case dwarf::DW_OP_mul:
+          NewOps[*It + 1] = OperandRight * OperandLeft;
+          break;
+        default:
+          llvm_unreachable("Operator is not multiplication or addition!");
+        }
+        reset();
+        return;
+      }
+      break;
+    }
+    break;
+  }
+  NewOps.append(Ops.begin(), Ops.end());
+  reset();
+  return;
+}
+
 DIExpression *DIExpression::prepend(const DIExpression *Expr, uint8_t Flags,
                                     int64_t Offset) {
   SmallVector<uint64_t, 8> Ops;

@pogo59
Copy link
Collaborator

pogo59 commented Oct 20, 2023

how does this relate to (or supersede) DIExpression::constantFold() ?

@rastogishubham
Copy link
Contributor Author

Hi @pogo59

It seems like DIExpression::constantFold only works on DW_OP_LLVM_convert operations. While this is folding mathematical expressions such as two DW_OP_plus in succession

@pogo59
Copy link
Collaborator

pogo59 commented Oct 23, 2023

I suspect we would prefer not to have two separate constant folders. Can you combine them?

Copy link
Collaborator

@adrian-prantl adrian-prantl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in general this looks very useful.
Can you explain why DIExpressionOptimizer needs to keep state?

@@ -3073,6 +3073,109 @@ template <> struct DenseMapInfo<DIExpression::FragmentInfo> {
static bool isEqual(const FragInfo &A, const FragInfo &B) { return A == B; }
};

class DIExpressionOptimizer {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a Doxygen comment for the entire class with just a sentence on what this does?

}

bool DIExpressionOptimizer::operatorIsCommutative(uint64_t Operator) {
return Operator == dwarf::DW_OP_mul || Operator == dwarf::DW_OP_plus;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about DW_OP_and, or, xor?

}
}

void DIExpressionOptimizer::reset() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed for performance reasons? If not, it is easier to reason about the code if a fresh DIExpressionOptimizer is created for each use.

void DIExpressionOptimizer::optimize(SmallVectorImpl<uint64_t> &NewOps,
ArrayRef<uint64_t> Ops) {

if (Ops.size() == 2 && Ops[0] == dwarf::DW_OP_plus_uconst) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why == 2 and not >= 2?


/// The math operator of the new subexpression being appended to the
/// DIExpression.
uint64_t NewMathOperator = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about calling MathOperator more generically BinOp, or BinaryOperator, so all operations that take two arguments fit the description?

if (OperatorRight != dwarf::DW_OP_constu &&
OperatorRight != dwarf::DW_OP_consts) {
NewOps.append(Ops.begin(), Ops.end());
return;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the state preserved in this case but not in all the others?

@adrian-prantl
Copy link
Collaborator

I suspect we would prefer not to have two separate constant folders. Can you combine them?

+1, that sounds like a good idea!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants