[AssignmentTracking] Skip large types in redundant debug info pruning #74329

OCHyams · 2023-12-04T15:22:25Z

Fix #74189 (crash report).

The pruning code uses a BitVector to track which parts of a variable have been
defined in order to find redundant debug records. BitVector uses a u32 to track
size; variable of types with a bit-size greater than max(u32) ish* can't be
represented using a BitVector.

Fix the assertion by introducing a limit on type size. Improve performance by
bringing the limit down to a sensible number and tracking byte-sizes instead
of bit-sizes.

Skipping variables in this pruning code doesn't cause debug info correctness issues;
it just means there may be some extra redundant debug records.

(*) max(u32) - 63 due to BitVector::NumBitWords implementation.

llvmbot · 2023-12-04T15:22:52Z

@llvm/pr-subscribers-debuginfo

Author: Orlando Cazalet-Hyams (OCHyams)

Changes

The pruning code uses a BitVector for varibles. BitVector supports sizes less than
max(uint32)-63, so skip equal and larger types. Fixes #74189 (crash report).

The reason it's max(uint32)-63 and not max(uint32) is because of BitVector::NumBitWords:

  unsigned NumBitWords(unsigned S) const {
    return (S + BITWORD_SIZE-1) / BITWORD_SIZE;
  }

The numerator evaluation yields an incorrect value when S + BITWORD_SIZE wraps to >= 1. I'm hesitant to change BitVector code, so have gone with this defensive fix.

Full diff: https://github.com/llvm/llvm-project/pull/74329.diff

2 Files Affected:

(modified) llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp (+4-1)
(added) llvm/test/DebugInfo/assignment-tracking/X86/large-type.ll (+58)

diff --git a/llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp b/llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp
index f00528023c91d..2349fadb1ea6d 100644
--- a/llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp
+++ b/llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp
@@ -2296,8 +2296,11 @@ removeRedundantDbgLocsUsingBackwardScan(const BasicBlock *BB,
           getAggregate(FnVarLocs.getVariable(RIt->VariableID));
       uint64_t SizeInBits = Aggr.first->getSizeInBits().value_or(0);
 
-      if (SizeInBits == 0) {
+      const uint64_t MaxSize = std::numeric_limits<unsigned>::max() - 63;
+      if (SizeInBits == 0 || SizeInBits > MaxSize) {
         // If the size is unknown (0) then keep this location def to be safe.
+        // Do the same for defs of very large variables, which can't be
+        // represented with a BitVector.
         NewDefsReversed.push_back(*RIt);
         continue;
       }
diff --git a/llvm/test/DebugInfo/assignment-tracking/X86/large-type.ll b/llvm/test/DebugInfo/assignment-tracking/X86/large-type.ll
new file mode 100644
index 0000000000000..5fa7ead58574d
--- /dev/null
+++ b/llvm/test/DebugInfo/assignment-tracking/X86/large-type.ll
@@ -0,0 +1,58 @@
+; RUN: llc %s -stop-after=finalize-isel -o - \
+; RUN: | FileCheck %s --implicit-check-not=DBG_
+
+;; Based on optimized IR from C source:
+;; int main () {
+;;   char a1[__INT_MAX__];
+;;   a1[__INT_MAX__ - 1] = 5;
+;;   return a1[__INT_MAX__ - 1];
+;; }
+;;
+;; Check obscenely large types don't cause a crash.
+; CHECK: DBG_VALUE 5, $noreg, ![[#]], !DIExpression(DW_OP_LLVM_fragment, 4294967280, 8)
+; CHECK: DBG_VALUE 6, $noreg, ![[#]], !DIExpression(DW_OP_LLVM_fragment, 0, 8)
+; CHECK: DBG_VALUE 7, $noreg, ![[#]], !DIExpression(DW_OP_LLVM_fragment, 0, 8)
+
+define dso_local i32 @main() local_unnamed_addr !dbg !10 {
+entry:
+;; FIXME: SROA currently creates incorrect fragments if bit_offset > max(u32),
+;; with and without assignment-tracking.
+  tail call void @llvm.dbg.value(metadata i8 5, metadata !15, metadata !DIExpression(DW_OP_LLVM_fragment, 4294967280, 8)), !dbg !20
+;; These two were inserted by hand.
+  tail call void @llvm.dbg.value(metadata i8 6, metadata !22, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 8)), !dbg !20
+  tail call void @llvm.dbg.value(metadata i8 7, metadata !23, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 8)), !dbg !20
+  ret i32 5, !dbg !21
+}
+
+declare void @llvm.dbg.value(metadata, metadata, metadata)
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!2, !3, !4, !5, !6, !7, !8}
+!llvm.ident = !{!9}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "clang version 18.0.0", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: false, nameTableKind: None)
+!1 = !DIFile(filename: "test.c", directory: "/")
+!2 = !{i32 7, !"Dwarf Version", i32 5}
+!3 = !{i32 2, !"Debug Info Version", i32 3}
+!4 = !{i32 1, !"wchar_size", i32 4}
+!5 = !{i32 8, !"PIC Level", i32 2}
+!6 = !{i32 7, !"PIE Level", i32 2}
+!7 = !{i32 7, !"uwtable", i32 2}
+!8 = !{i32 7, !"debug-info-assignment-tracking", i1 true}
+!9 = !{!"clang version 18.0.0"}
+!10 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 3, type: !11, scopeLine: 4, flags: DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !14)
+!11 = !DISubroutineType(types: !12)
+!12 = !{!13}
+!13 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
+!14 = !{!15}
+!15 = !DILocalVariable(name: "a1", scope: !10, file: !1, line: 5, type: !16)
+!16 = !DICompositeType(tag: DW_TAG_array_type, baseType: !17, size: 17179869176, elements: !18)
+!17 = !DIBasicType(name: "char", size: 8, encoding: DW_ATE_signed_char)
+!18 = !{!19}
+!19 = !DISubrange(count: 2147483647)
+!20 = !DILocation(line: 0, scope: !10)
+!21 = !DILocation(line: 7, column: 3, scope: !10)
+!22 = !DILocalVariable(name: "a2", scope: !10, file: !1, line: 5, type: !16)
+!23 = !DILocalVariable(name: "a3", scope: !10, file: !1, line: 5, type: !16)
+!24 = !DICompositeType(tag: DW_TAG_array_type, baseType: !17, size: 4294967232, elements: !18)
+!25 = !DICompositeType(tag: DW_TAG_array_type, baseType: !17, size: 4294967233, elements: !18)

The pruning code uses a BitVector for varibles. BitVector supports sizes less than max(uint32)-63, so skip equal and larger types. Fixes llvm#74189 (crash report). The reason it's `max(uint32)-63` and not `max(uint32)` is because of `BitVector::NumBitWords`: unsigned NumBitWords(unsigned S) const { return (S + BITWORD_SIZE-1) / BITWORD_SIZE; } The numerator evaluation yields an incorrect value when `S + BITWORD_SIZE` wraps to `>= 1`

jryans

Overall reasonable, but see comments below.

Also, there's a typo in the commit and PR title: "AssignemntTracking" should be "AssignmentTracking"

jryans · 2023-12-04T17:27:54Z

llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp

@@ -2296,8 +2296,11 @@ removeRedundantDbgLocsUsingBackwardScan(const BasicBlock *BB,
          getAggregate(FnVarLocs.getVariable(RIt->VariableID));
      uint64_t SizeInBits = Aggr.first->getSizeInBits().value_or(0);

-      if (SizeInBits == 0) {
+      const uint64_t MaxSize = std::numeric_limits<unsigned>::max() - 63;


Is there some constant etc. from the BitVector code that could be reused here to compute this 63 value? It feels like a random magic number at the moment.

Can do - I'll change this if we keep the cutoff tied to bitvector.

nikic · 2023-12-04T19:46:48Z

Independently of the crash, doesn't this mean that this code will be incredibly slow for large types?

OCHyams · 2023-12-05T08:19:51Z

Thanks both for the reviews.

@nikic

Independently of the crash, doesn't this mean that this code will be incredibly slow for large types?

There's probably a point at which pruning becomes unprofitable. We could lower the threshold - do you have a ballpark figure you'd suggest?

jmorse · 2023-12-05T15:04:42Z

This sounds good, while agreeing with the comments so far. It's tempting to say that "anyone who's allocating a massively sized stack variable is asking for trouble" and to not support that situation well. Unfortunately that's also exactly the scenario where developers probably want their debugger to work well. What are the implications of over-sized types being ignored by this code, are there any correctness issues or is it just efficiency?

OCHyams · 2023-12-05T16:19:17Z

What are the implications of over-sized types being ignored by this code, are there any correctness issues or is it just efficiency?

Just efficiency.

As well as choosing a smaller size limit, another easy win here could be to use byte sizes rather than bit sizes, which would bring down the bitvector sizes in all cases (at the cost accidentally not pruning where some fragments don't fit into bytes exactly, but that'll probably be a very rare situation and doesn't impose any correctness problems).

I can also have a look to see if there's something else we can use. Ideally we actually want something like an interval map, though IntervalMap as it exists now is not useful for this since it doesn't support overlapping insertion (and also might be a bit heavy weight too).

jryans

Thanks, the issues I mentioned appear to be resolved. I have no opinion on the details of exactly what cutoff to use, so I'll leave that for others to weigh in on.

Issues resolved

nikic · 2023-12-07T14:50:21Z

llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp

@@ -2295,33 +2295,41 @@ removeRedundantDbgLocsUsingBackwardScan(const BasicBlock *BB,
      DebugAggregate Aggr =
          getAggregate(FnVarLocs.getVariable(RIt->VariableID));
      uint64_t SizeInBits = Aggr.first->getSizeInBits().value_or(0);
+      uint64_t SizeInBytes = SizeInBits + 7 / 8;


(Just to embarrass my friend: Note that 7/8 == 0. But divideCeil fixes that.)

nikic · 2023-12-07T14:51:43Z

llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp


      DIExpression::FragmentInfo Fragment =
          RIt->Expr->getFragmentInfo().value_or(
              DIExpression::FragmentInfo(SizeInBits, 0));
      bool InvalidFragment = Fragment.endInBits() > SizeInBits;
+      uint64_t StartInBytes = Fragment.startInBits() + 7 / 8;


Not familiar with this code, but shouldn't the start be rounded down?

Yep that's right, nice catch.

OCHyams · 2023-12-07T16:20:39Z

Address outstanding feedback
Update summary

SLTozer

Got a small question about the BitVector FIXME, but otherwise this LGTM.

SLTozer · 2023-12-11T11:26:13Z

llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp

        // If the size is unknown (0) then keep this location def to be safe.
+        // Do the same for defs of large variables, which would be expensive
+        // to represent with a BitVector.
+        // FIXME: Don't use a BitVector.


Is there a reason to not use a BitVector here, given that as discussed we don't lose correctness, and for the vast majority of cases we are well below this limit?

No I suppose not. I was thinking it'd be good to replace it with some custom interval map (one with properties that make using it worth it here), but that's a hypothetical data structure for which we don't know the performance characteristics. More investigation would be required - it's not clear that the solution in this patch would be better or worse, so I'll remove the comment.

OCHyams requested a review from jmorse December 4, 2023 15:22

llvmbot added the debuginfo label Dec 4, 2023

OCHyams mentioned this pull request Dec 4, 2023

Clang-18 crash when compiled with -O2 -g -c. #74189

Closed

OCHyams force-pushed the large-types branch from 989bdf6 to 2af2bf6 Compare December 4, 2023 16:07

jryans previously requested changes Dec 4, 2023

View reviewed changes

OCHyams changed the title ~~[AssignemntTracking] Skip large types in redundant debug info pruning~~ [AssignmentTracking] Skip large types in redundant debug info pruning Dec 5, 2023

Bring cutoff down, track bytes rather than bits

6c8bb29

OCHyams requested a review from nikic December 6, 2023 14:39

jryans reviewed Dec 6, 2023

View reviewed changes

nikic reviewed Dec 7, 2023

View reviewed changes

address comments

eb75b66

SLTozer approved these changes Dec 11, 2023

View reviewed changes

rm comment

a1a8e34

OCHyams merged commit 5d55839 into llvm:main Dec 11, 2023
3 of 4 checks passed

OCHyams deleted the large-types branch January 5, 2024 10:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AssignmentTracking] Skip large types in redundant debug info pruning #74329

[AssignmentTracking] Skip large types in redundant debug info pruning #74329

OCHyams commented Dec 4, 2023 •

edited

llvmbot commented Dec 4, 2023

jryans left a comment

jryans Dec 4, 2023

OCHyams Dec 5, 2023

nikic commented Dec 4, 2023

OCHyams commented Dec 5, 2023

jmorse commented Dec 5, 2023

OCHyams commented Dec 5, 2023

jryans left a comment

nikic Dec 7, 2023

pogo59 Dec 7, 2023

nikic Dec 7, 2023

OCHyams Dec 7, 2023

OCHyams commented Dec 7, 2023

SLTozer left a comment

SLTozer Dec 11, 2023

OCHyams Dec 11, 2023

[AssignmentTracking] Skip large types in redundant debug info pruning #74329

[AssignmentTracking] Skip large types in redundant debug info pruning #74329

Conversation

OCHyams commented Dec 4, 2023 • edited

llvmbot commented Dec 4, 2023

jryans left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nikic commented Dec 4, 2023

OCHyams commented Dec 5, 2023

jmorse commented Dec 5, 2023

OCHyams commented Dec 5, 2023

jryans left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

OCHyams commented Dec 7, 2023

SLTozer left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

OCHyams commented Dec 4, 2023 •

edited