[Symbolizer] Support for Missing Line Numbers. #82240

Merged (22 commits) on Aug 5, 2024

ampandey-1995 (Contributor) commented Feb 19, 2024

When LLVM Symbolizer symbolizes addresses in optimized binaries, it sometimes reports a missing line number (line 0). This can happen because the compiler cannot always map an instruction to a source line after optimization. The symbolizer should handle these cases gracefully.

This patch adds a '--skip-line-zero' option that makes the symbolizer report the nearest non-zero line number instead.
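To picture the proposed `--skip-line-zero` behavior, here is a minimal C++ sketch. It does not use LLVM's `DWARFDebugLine` classes: `Row`, `Sequence`, `findRow`, and `lookupSkipLineZero` are hypothetical, simplified stand-ins. The sketch locates the row covering an address and, if that row's line is 0, falls back to the closest preceding row in the same sequence with a non-zero line. Reading "nearest" as "nearest preceding" is an assumption of this sketch, not something the description settles.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified line-table row: the address where the row starts
// and the source line it maps to (0 means "no source line").
struct Row {
  uint64_t Address;
  uint32_t Line;
};

// Hypothetical, simplified sequence: rows sorted by address that together
// cover [Rows.front().Address, EndAddress).
struct Sequence {
  std::vector<Row> Rows;
  uint64_t EndAddress;
};

// Returns the index of the row covering Addr, or std::nullopt if Addr lies
// outside the sequence.
std::optional<size_t> findRow(const Sequence &Seq, uint64_t Addr) {
  assert(std::is_sorted(Seq.Rows.begin(), Seq.Rows.end(),
                        [](const Row &L, const Row &R) {
                          return L.Address < R.Address;
                        }) &&
         "rows must be sorted by address");
  if (Seq.Rows.empty() || Addr < Seq.Rows.front().Address ||
      Addr >= Seq.EndAddress)
    return std::nullopt;
  // First row starting after Addr; the row just before it covers Addr.
  auto It = std::upper_bound(
      Seq.Rows.begin(), Seq.Rows.end(), Addr,
      [](uint64_t A, const Row &R) { return A < R.Address; });
  return static_cast<size_t>(It - Seq.Rows.begin()) - 1;
}

// Returns the line for Addr; if that line is 0, returns the line of the
// closest preceding row in the same sequence with a non-zero line.
// Returns 0 when Addr is outside the sequence or no such row exists.
uint32_t lookupSkipLineZero(const Sequence &Seq, uint64_t Addr) {
  std::optional<size_t> Idx = findRow(Seq, Addr);
  if (!Idx)
    return 0;
  for (size_t I = *Idx + 1; I-- > 0;)
    if (Seq.Rows[I].Line != 0)
      return Seq.Rows[I].Line;
  return 0;
}
```

With rows loosely modeled on `main` in the test file added by this patch (line 4 at 0x0, line 0 at 0xa, line 8 at 0x10, assuming the first three instructions encode to 10 bytes), looking up 0xa yields line 4 instead of 0.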

llvmbot (Collaborator) commented Feb 19, 2024

@llvm/pr-subscribers-lld-wasm
@llvm/pr-subscribers-lld-macho
@llvm/pr-subscribers-llvm-binary-utilities
@llvm/pr-subscribers-lld-coff
@llvm/pr-subscribers-debuginfo

@llvm/pr-subscribers-lld

Author: None (ampandey-1995)

Changes

When LLVM Symbolizer symbolizes addresses in optimized binaries, it sometimes reports a missing line number (line 0). This can happen because the compiler cannot always map an instruction to a source line after optimization. The symbolizer should handle these cases gracefully.

This patch adds an '-approximate-line-info=<before/after>' option that makes the symbolizer report the nearest non-zero line number before or after the given address instead.
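Here is a minimal sketch of the `before`/`after` search, written over row indices. It is an illustration, not the patch's code. `Row`, `NoRow`, and `approximateRow` are hypothetical names, `Rows` is assumed to hold one sequence's rows without its end_sequence marker, and the enum only mirrors the three `ApproximateLineKind` values the patch adds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors the three values of the ApproximateLineKind enum added by the patch.
enum class ApproximateLineKind { None, Before, After };

// Hypothetical row type: start address and line (0 means "no source line").
struct Row {
  uint64_t Address;
  uint32_t Line;
};

// Hypothetical "no row" sentinel, playing the role of UnknownRowIndex.
constexpr size_t NoRow = static_cast<size_t>(-1);

// Rows holds one sequence's rows (no end_sequence marker) and Index is the
// row covering the looked-up address, or NoRow if there is none. If that
// row has line 0, step one row at a time in the requested direction until
// a non-zero line is found; if none is found, keep the original row.
size_t approximateRow(const std::vector<Row> &Rows, size_t Index,
                      ApproximateLineKind Kind) {
  if (Index == NoRow)
    return NoRow; // Check the sentinel before indexing into Rows.
  assert(Index < Rows.size() && "row index out of range");
  if (Kind == ApproximateLineKind::None || Rows[Index].Line != 0)
    return Index;
  if (Kind == ApproximateLineKind::Before) {
    for (size_t I = Index; I-- > 0;)
      if (Rows[I].Line != 0)
        return I;
  } else {
    for (size_t I = Index + 1; I < Rows.size(); ++I)
      if (Rows[I].Line != 0)
        return I;
  }
  return Index;
}
```

Stepping over row indices visits each row at most once and never leaves the sequence. In contrast, the loops in the `lookupAddressImpl` hunk move the address one byte at a time and rerun `findRowInSeq` on each step. The sketch also checks the sentinel before indexing, whereas the hunk reads `Rows[RowIndex]` without first checking whether `findRowInSeq` returned `UnknownRowIndex`.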


Full diff: https://github.com/llvm/llvm-project/pull/82240.diff

12 Files Affected:

  • (modified) bolt/lib/Core/BinaryFunction.cpp (+2-1)
  • (modified) lld/Common/DWARF.cpp (+2-1)
  • (modified) llvm/docs/CommandGuide/llvm-symbolizer.rst (+4)
  • (modified) llvm/include/llvm/DebugInfo/DIContext.h (+5-3)
  • (modified) llvm/include/llvm/DebugInfo/DWARF/DWARFDebugLine.h (+10-6)
  • (modified) llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h (+2)
  • (modified) llvm/lib/DebugInfo/DWARF/DWARFContext.cpp (+5-5)
  • (modified) llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp (+28-7)
  • (modified) llvm/lib/DebugInfo/Symbolize/Symbolize.cpp (+6-2)
  • (added) llvm/test/tools/llvm-symbolizer/approximate-line-info.s (+142)
  • (modified) llvm/tools/llvm-symbolizer/Opts.td (+1)
  • (modified) llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp (+8)
diff --git a/bolt/lib/Core/BinaryFunction.cpp b/bolt/lib/Core/BinaryFunction.cpp
index 54f2f9d972a461..0b37eab302cdc9 100644
--- a/bolt/lib/Core/BinaryFunction.cpp
+++ b/bolt/lib/Core/BinaryFunction.cpp
@@ -192,7 +192,8 @@ static SMLoc findDebugLineInformationForInstructionAt(
 
   SMLoc NullResult = DebugLineTableRowRef::NULL_ROW.toSMLoc();
   uint32_t RowIndex = LineTable->lookupAddress(
-      {Address, object::SectionedAddress::UndefSection});
+      {Address, object::SectionedAddress::UndefSection},
+      DILineInfoSpecifier::ApproximateLineKind::None);
   if (RowIndex == LineTable->UnknownRowIndex)
     return NullResult;
 
diff --git a/lld/Common/DWARF.cpp b/lld/Common/DWARF.cpp
index 2cd8ca4575dee5..8e1e9c6e530157 100644
--- a/lld/Common/DWARF.cpp
+++ b/lld/Common/DWARF.cpp
@@ -93,7 +93,8 @@ std::optional<DILineInfo> DWARFCache::getDILineInfo(uint64_t offset,
   DILineInfo info;
   for (const llvm::DWARFDebugLine::LineTable *lt : lineTables) {
     if (lt->getFileLineInfoForAddress(
-            {offset, sectionIndex}, nullptr,
+            {offset, sectionIndex},
+            DILineInfoSpecifier::ApproximateLineKind::None, nullptr,
             DILineInfoSpecifier::FileLineInfoKind::AbsoluteFilePath, info))
       return info;
   }
diff --git a/llvm/docs/CommandGuide/llvm-symbolizer.rst b/llvm/docs/CommandGuide/llvm-symbolizer.rst
index 59c0ab6d196ace..ee60d97babcbc7 100644
--- a/llvm/docs/CommandGuide/llvm-symbolizer.rst
+++ b/llvm/docs/CommandGuide/llvm-symbolizer.rst
@@ -216,6 +216,10 @@ OPTIONS
   This can be used to perform lookups as if the object were relocated by the
   offset.
 
+.. option:: --approximate-line-info=<before/after>
+
+  Search the object for the approximate non-zero line number nearest to a given address.
+
 .. option:: --basenames, -s
 
   Print just the file's name without any directories, instead of the
diff --git a/llvm/include/llvm/DebugInfo/DIContext.h b/llvm/include/llvm/DebugInfo/DIContext.h
index 288ddf77bdfda7..d3a625b31d2060 100644
--- a/llvm/include/llvm/DebugInfo/DIContext.h
+++ b/llvm/include/llvm/DebugInfo/DIContext.h
@@ -152,14 +152,16 @@ struct DILineInfoSpecifier {
     RelativeFilePath,
     AbsoluteFilePath
   };
+  enum ApproximateLineKind { None, Before, After };
   using FunctionNameKind = DINameKind;
-
   FileLineInfoKind FLIKind;
   FunctionNameKind FNKind;
+  ApproximateLineKind ALKind;
 
   DILineInfoSpecifier(FileLineInfoKind FLIKind = FileLineInfoKind::RawValue,
-                      FunctionNameKind FNKind = FunctionNameKind::None)
-      : FLIKind(FLIKind), FNKind(FNKind) {}
+                      FunctionNameKind FNKind = FunctionNameKind::None,
+                      ApproximateLineKind ALKind = ApproximateLineKind::None)
+      : FLIKind(FLIKind), FNKind(FNKind), ALKind(ALKind) {}
 
   inline bool operator==(const DILineInfoSpecifier &RHS) const {
     return FLIKind == RHS.FLIKind && FNKind == RHS.FNKind;
diff --git a/llvm/include/llvm/DebugInfo/DWARF/DWARFDebugLine.h b/llvm/include/llvm/DebugInfo/DWARF/DWARFDebugLine.h
index ce3bae6a1760c2..cb3531b75730f1 100644
--- a/llvm/include/llvm/DebugInfo/DWARF/DWARFDebugLine.h
+++ b/llvm/include/llvm/DebugInfo/DWARF/DWARFDebugLine.h
@@ -240,7 +240,9 @@ class DWARFDebugLine {
 
     /// Returns the index of the row with file/line info for a given address,
     /// or UnknownRowIndex if there is no such row.
-    uint32_t lookupAddress(object::SectionedAddress Address) const;
+    uint32_t
+    lookupAddress(object::SectionedAddress Address,
+                  DILineInfoSpecifier::ApproximateLineKind LineKind) const;
 
     bool lookupAddressRange(object::SectionedAddress Address, uint64_t Size,
                             std::vector<uint32_t> &Result) const;
@@ -266,10 +268,10 @@ class DWARFDebugLine {
 
     /// Fills the Result argument with the file and line information
     /// corresponding to Address. Returns true on success.
-    bool getFileLineInfoForAddress(object::SectionedAddress Address,
-                                   const char *CompDir,
-                                   DILineInfoSpecifier::FileLineInfoKind Kind,
-                                   DILineInfo &Result) const;
+    bool getFileLineInfoForAddress(
+        object::SectionedAddress Address,
+        DILineInfoSpecifier::ApproximateLineKind LineKind, const char *CompDir,
+        DILineInfoSpecifier::FileLineInfoKind Kind, DILineInfo &Result) const;
 
     /// Extracts directory name by its Entry in include directories table
     /// in prologue. Returns true on success.
@@ -301,7 +303,9 @@ class DWARFDebugLine {
     getSourceByIndex(uint64_t FileIndex,
                      DILineInfoSpecifier::FileLineInfoKind Kind) const;
 
-    uint32_t lookupAddressImpl(object::SectionedAddress Address) const;
+    uint32_t
+    lookupAddressImpl(object::SectionedAddress Address,
+                      DILineInfoSpecifier::ApproximateLineKind LineKind) const;
 
     bool lookupAddressRangeImpl(object::SectionedAddress Address, uint64_t Size,
                                 std::vector<uint32_t> &Result) const;
diff --git a/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h b/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h
index 11a169cfc20a69..7b560f4b7dbb2f 100644
--- a/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h
+++ b/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h
@@ -44,6 +44,7 @@ using namespace object;
 
 using FunctionNameKind = DILineInfoSpecifier::FunctionNameKind;
 using FileLineInfoKind = DILineInfoSpecifier::FileLineInfoKind;
+using ApproximateLineKind = DILineInfoSpecifier::ApproximateLineKind;
 
 class CachedBinary;
 
@@ -52,6 +53,7 @@ class LLVMSymbolizer {
   struct Options {
     FunctionNameKind PrintFunctions = FunctionNameKind::LinkageName;
     FileLineInfoKind PathStyle = FileLineInfoKind::AbsoluteFilePath;
+    ApproximateLineKind ApproximateLineNumbers = ApproximateLineKind::None;
     bool UseSymbolTable = true;
     bool Demangle = true;
     bool RelativeAddresses = false;
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp b/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
index b7297c18da7c99..9bf7dbd0acc109 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
@@ -1742,8 +1742,8 @@ DILineInfo DWARFContext::getLineInfoForAddress(object::SectionedAddress Address,
   if (Spec.FLIKind != FileLineInfoKind::None) {
     if (const DWARFLineTable *LineTable = getLineTableForUnit(CU)) {
       LineTable->getFileLineInfoForAddress(
-          {Address.Address, Address.SectionIndex}, CU->getCompilationDir(),
-          Spec.FLIKind, Result);
+          {Address.Address, Address.SectionIndex}, Spec.ALKind,
+          CU->getCompilationDir(), Spec.FLIKind, Result);
     }
   }
 
@@ -1838,7 +1838,7 @@ DWARFContext::getInliningInfoForAddress(object::SectionedAddress Address,
       DILineInfo Frame;
       LineTable = getLineTableForUnit(CU);
       if (LineTable && LineTable->getFileLineInfoForAddress(
-                           {Address.Address, Address.SectionIndex},
+                           {Address.Address, Address.SectionIndex}, Spec.ALKind,
                            CU->getCompilationDir(), Spec.FLIKind, Frame))
         InliningInfo.addFrame(Frame);
     }
@@ -1865,8 +1865,8 @@ DWARFContext::getInliningInfoForAddress(object::SectionedAddress Address,
         // For the topmost routine, get file/line info from line table.
         if (LineTable)
           LineTable->getFileLineInfoForAddress(
-              {Address.Address, Address.SectionIndex}, CU->getCompilationDir(),
-              Spec.FLIKind, Frame);
+              {Address.Address, Address.SectionIndex}, Spec.ALKind,
+              CU->getCompilationDir(), Spec.FLIKind, Frame);
       } else {
         // Otherwise, use call file, call line and call column from
         // previous DIE in inlined chain.
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp b/llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp
index 28f05644a3aa11..c6baad8ee9b5ea 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp
@@ -1297,10 +1297,11 @@ uint32_t DWARFDebugLine::LineTable::findRowInSeq(
 }
 
 uint32_t DWARFDebugLine::LineTable::lookupAddress(
-    object::SectionedAddress Address) const {
+    object::SectionedAddress Address,
+    DILineInfoSpecifier::ApproximateLineKind LineKind) const {
 
   // Search for relocatable addresses
-  uint32_t Result = lookupAddressImpl(Address);
+  uint32_t Result = lookupAddressImpl(Address, LineKind);
 
   if (Result != UnknownRowIndex ||
       Address.SectionIndex == object::SectionedAddress::UndefSection)
@@ -1308,11 +1309,12 @@ uint32_t DWARFDebugLine::LineTable::lookupAddress(
 
   // Search for absolute addresses
   Address.SectionIndex = object::SectionedAddress::UndefSection;
-  return lookupAddressImpl(Address);
+  return lookupAddressImpl(Address, LineKind);
 }
 
 uint32_t DWARFDebugLine::LineTable::lookupAddressImpl(
-    object::SectionedAddress Address) const {
+    object::SectionedAddress Address,
+    DILineInfoSpecifier::ApproximateLineKind LineKind) const {
   // First, find an instruction sequence containing the given address.
   DWARFDebugLine::Sequence Sequence;
   Sequence.SectionIndex = Address.SectionIndex;
@@ -1321,7 +1323,24 @@ uint32_t DWARFDebugLine::LineTable::lookupAddressImpl(
                                       DWARFDebugLine::Sequence::orderByHighPC);
   if (It == Sequences.end() || It->SectionIndex != Address.SectionIndex)
     return UnknownRowIndex;
-  return findRowInSeq(*It, Address);
+
+  uint32_t RowIndex = findRowInSeq(*It, Address);
+  if (LineKind == DILineInfoSpecifier::ApproximateLineKind::Before) {
+    for (auto SeqInst = Sequence.HighPC; SeqInst >= It->LowPC; --SeqInst) {
+      if (Rows[RowIndex].Line)
+        break;
+      Address.Address--;
+      RowIndex = findRowInSeq(*It, Address);
+    }
+  } else if (LineKind == DILineInfoSpecifier::ApproximateLineKind::After) {
+    for (auto SeqInst = Sequence.HighPC; SeqInst < It->HighPC; ++SeqInst) {
+      if (Rows[RowIndex].Line)
+        break;
+      Address.Address++;
+      RowIndex = findRowInSeq(*It, Address);
+    }
+  }
+  return RowIndex;
 }
 
 bool DWARFDebugLine::LineTable::lookupAddressRange(
@@ -1461,10 +1480,12 @@ bool DWARFDebugLine::Prologue::getFileNameByIndex(
 }
 
 bool DWARFDebugLine::LineTable::getFileLineInfoForAddress(
-    object::SectionedAddress Address, const char *CompDir,
+    object::SectionedAddress Address,
+    DILineInfoSpecifier::ApproximateLineKind LineKind, const char *CompDir,
     FileLineInfoKind Kind, DILineInfo &Result) const {
   // Get the index of row we're looking for in the line table.
-  uint32_t RowIndex = lookupAddress(Address);
+  uint32_t RowIndex = lookupAddress(Address, LineKind);
+
   if (RowIndex == -1U)
     return false;
   // Take file number and line/column from the row.
diff --git a/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp b/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
index 5f29226c14b705..18be137d91d694 100644
--- a/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
+++ b/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
@@ -71,7 +71,9 @@ LLVMSymbolizer::symbolizeCodeCommon(const T &ModuleSpecifier,
     ModuleOffset.Address += Info->getModulePreferredBase();
 
   DILineInfo LineInfo = Info->symbolizeCode(
-      ModuleOffset, DILineInfoSpecifier(Opts.PathStyle, Opts.PrintFunctions),
+      ModuleOffset,
+      DILineInfoSpecifier(Opts.PathStyle, Opts.PrintFunctions,
+                          Opts.ApproximateLineNumbers),
       Opts.UseSymbolTable);
   if (Opts.Demangle)
     LineInfo.FunctionName = DemangleName(LineInfo.FunctionName, Info);
@@ -116,7 +118,9 @@ Expected<DIInliningInfo> LLVMSymbolizer::symbolizeInlinedCodeCommon(
     ModuleOffset.Address += Info->getModulePreferredBase();
 
   DIInliningInfo InlinedContext = Info->symbolizeInlinedCode(
-      ModuleOffset, DILineInfoSpecifier(Opts.PathStyle, Opts.PrintFunctions),
+      ModuleOffset,
+      DILineInfoSpecifier(Opts.PathStyle, Opts.PrintFunctions,
+                          Opts.ApproximateLineNumbers),
       Opts.UseSymbolTable);
   if (Opts.Demangle) {
     for (int i = 0, n = InlinedContext.getNumberOfFrames(); i < n; i++) {
diff --git a/llvm/test/tools/llvm-symbolizer/approximate-line-info.s b/llvm/test/tools/llvm-symbolizer/approximate-line-info.s
new file mode 100644
index 00000000000000..b7d56b0e64534c
--- /dev/null
+++ b/llvm/test/tools/llvm-symbolizer/approximate-line-info.s
@@ -0,0 +1,142 @@
+# REQUIRES: x86-registered-target
+
+# RUN: llvm-mc -g -filetype=obj -triple=x86_64-pc-linux %s -o %t.o
+# RUN: llvm-symbolizer --obj=%t.o 0xa | FileCheck --check-prefix=APPROX-NONE %s
+# RUN: llvm-symbolizer --obj=%t.o --approximate-line-info=before 0xa | FileCheck --check-prefix=APPROX-BEFORE %s
+# RUN: llvm-symbolizer --obj=%t.o --approximate-line-info=after 0xa | FileCheck --check-prefix=APPROX-AFTER %s
+
+# APPROX-NONE: main
+# APPROX-NONE-NEXT: /home/ampandey/test-hip/main.c:0:6
+# APPROX-BEFORE: main
+# APPROX-BEFORE-NEXT: /home/ampandey/test-hip/main.c:4:6
+# APPROX-AFTER: main
+# APPROX-AFTER-NEXT: /home/ampandey/test-hip/main.c:8:2
+
+## Generated from C Code
+##
+## int foo = 0;
+## int x=89;
+## int main() {
+## if(x)
+##  return foo;
+## else
+##  return x;
+## }
+##
+## clang -S -O3 -gline-tables-only --target=x86_64-pc-linux
+
+	.text
+	.file	"main.c"
+	.globl	main                            # -- Begin function main
+	.p2align	4, 0x90
+	.type	main,@function
+main:                                   # @main
+.Lfunc_begin0:
+	.file	0 "/home/ampandey/test-hip" "main.c" md5 0x26c3fbaea8e6febaf09ef44d37ec5ecc
+	.cfi_startproc
+# %bb.0:                                # %entry
+	.loc	0 4 6 prologue_end              # main.c:4:6
+	movl	x(%rip), %eax
+	testl	%eax, %eax
+	je	.LBB0_2
+# %bb.1:                                # %entry
+	.loc	0 0 6 is_stmt 0                 # main.c:0:6
+	movl	foo(%rip), %eax
+.LBB0_2:                                # %entry
+	.loc	0 8 2 is_stmt 1                 # main.c:8:2
+	retq
+.Ltmp0:
+.Lfunc_end0:
+	.size	main, .Lfunc_end0-main
+	.cfi_endproc
+                                        # -- End function
+	.type	foo,@object                     # @foo
+	.bss
+	.globl	foo
+	.p2align	2, 0x0
+foo:
+	.long	0                               # 0x0
+	.size	foo, 4
+
+	.type	x,@object                       # @x
+	.data
+	.globl	x
+	.p2align	2, 0x0
+x:
+	.long	89                              # 0x59
+	.size	x, 4
+
+	.section	.debug_abbrev,"",@progbits
+	.byte	1                               # Abbreviation Code
+	.byte	17                              # DW_TAG_compile_unit
+	.byte	0                               # DW_CHILDREN_no
+	.byte	37                              # DW_AT_producer
+	.byte	37                              # DW_FORM_strx1
+	.byte	19                              # DW_AT_language
+	.byte	5                               # DW_FORM_data2
+	.byte	3                               # DW_AT_name
+	.byte	37                              # DW_FORM_strx1
+	.byte	114                             # DW_AT_str_offsets_base
+	.byte	23                              # DW_FORM_sec_offset
+	.byte	16                              # DW_AT_stmt_list
+	.byte	23                              # DW_FORM_sec_offset
+	.byte	27                              # DW_AT_comp_dir
+	.byte	37                              # DW_FORM_strx1
+	.byte	17                              # DW_AT_low_pc
+	.byte	27                              # DW_FORM_addrx
+	.byte	18                              # DW_AT_high_pc
+	.byte	6                               # DW_FORM_data4
+	.byte	115                             # DW_AT_addr_base
+	.byte	23                              # DW_FORM_sec_offset
+	.byte	0                               # EOM(1)
+	.byte	0                               # EOM(2)
+	.byte	0                               # EOM(3)
+	.section	.debug_info,"",@progbits
+.Lcu_begin0:
+	.long	.Ldebug_info_end0-.Ldebug_info_start0 # Length of Unit
+.Ldebug_info_start0:
+	.short	5                               # DWARF version number
+	.byte	1                               # DWARF Unit Type
+	.byte	8                               # Address Size (in bytes)
+	.long	.debug_abbrev                   # Offset Into Abbrev. Section
+	.byte	1                               # Abbrev [1] 0xc:0x17 DW_TAG_compile_unit
+	.byte	0                               # DW_AT_producer
+	.short	29                              # DW_AT_language
+	.byte	1                               # DW_AT_name
+	.long	.Lstr_offsets_base0             # DW_AT_str_offsets_base
+	.long	.Lline_table_start0             # DW_AT_stmt_list
+	.byte	2                               # DW_AT_comp_dir
+	.byte	0                               # DW_AT_low_pc
+	.long	.Lfunc_end0-.Lfunc_begin0       # DW_AT_high_pc
+	.long	.Laddr_table_base0              # DW_AT_addr_base
+.Ldebug_info_end0:
+	.section	.debug_str_offsets,"",@progbits
+	.long	16                              # Length of String Offsets Set
+	.short	5
+	.short	0
+.Lstr_offsets_base0:
+	.section	.debug_str,"MS",@progbits,1
+.Linfo_string0:
+	.asciz	"clang version 19.0.0git (git@github.com:ampandey-1995/llvm-project.git 6751baed8d1ee8c5fd12fe5a06aa67275fc1ebf6)" # string offset=0
+.Linfo_string1:
+	.asciz	"main.c"                        # string offset=113
+.Linfo_string2:
+	.asciz	"/home/ampandey/test-hip"       # string offset=120
+	.section	.debug_str_offsets,"",@progbits
+	.long	.Linfo_string0
+	.long	.Linfo_string1
+	.long	.Linfo_string2
+	.section	.debug_addr,"",@progbits
+	.long	.Ldebug_addr_end0-.Ldebug_addr_start0 # Length of contribution
+.Ldebug_addr_start0:
+	.short	5                               # DWARF version number
+	.byte	8                               # Address size
+	.byte	0                               # Segment selector size
+.Laddr_table_base0:
+	.quad	.Lfunc_begin0
+.Ldebug_addr_end0:
+	.ident	"clang version 19.0.0git (git@github.com:ampandey-1995/llvm-project.git 6751baed8d1ee8c5fd12fe5a06aa67275fc1ebf6)"
+	.section	".note.GNU-stack","",@progbits
+	.addrsig
+	.section	.debug_line,"",@progbits
+.Lline_table_start0:
diff --git a/llvm/tools/llvm-symbolizer/Opts.td b/llvm/tools/llvm-symbolizer/Opts.td
index edc80bfe59673b..80ec4721c45e00 100644
--- a/llvm/tools/llvm-symbolizer/Opts.td
+++ b/llvm/tools/llvm-symbolizer/Opts.td
@@ -17,6 +17,7 @@ def grp_mach_o : OptionGroup<"kind">,
                  HelpText<"llvm-symbolizer Mach-O Specific Options">;
 
 def addresses : F<"addresses", "Show address before line information">;
+defm approximate_line_info : Eq<"approximate-line-info", "Find approximate non-zero line number information nearest to the given address.">, Values<"<before/after>">;
 defm adjust_vma
     : Eq<"adjust-vma", "Add specified offset to object file addresses">,
       MetaVarName<"<offset>">;
diff --git a/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp b/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
index b98bdbc388faf2..530dbdfd5c8b5e 100644
--- a/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
+++ b/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
@@ -482,6 +482,14 @@ int llvm_symbolizer_main(int argc, char **argv, const llvm::ToolContext &) {
   } else {
     Opts.PathStyle = DILineInfoSpecifier::FileLineInfoKind::AbsoluteFilePath;
   }
+  StringRef ApproximateLineKindVal =
+      Args.getLastArgValue(OPT_approximate_line_info_EQ);
+  Opts.ApproximateLineNumbers =
+      ApproximateLineKindVal == "before"
+          ? DILineInfoSpecifier::ApproximateLineKind::Before
+      : ApproximateLineKindVal == "after"
+          ? DILineInfoSpecifier::ApproximateLineKind::After
+          : DILineInfoSpecifier::ApproximateLineKind::None;
   Opts.DebugFileDirectory = Args.getAllArgValues(OPT_debug_file_directory_EQ);
   Opts.DefaultArch = Args.getLastArgValue(OPT_default_arch_EQ).str();
   Opts.Demangle = Args.hasFlag(OPT_demangle, OPT_no_demangle, !IsAddr2Line);
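
The `llvm-symbolizer.cpp` hunk maps the option value with a nested conditional, so any unrecognized value (for example a typo such as `befor`) silently becomes `None`. The minimal standard-C++ sketch below shows a stricter mapping. The function name `parseApproximateLineInfo` is hypothetical, the caller is assumed to turn `std::nullopt` into an error message, and inside LLVM itself `llvm::StringSwitch` would likely be the more idiomatic tool.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

enum class ApproximateLineKind { None, Before, After };

// Hypothetical helper: maps the --approximate-line-info value to the enum.
// An empty value (option absent) means None; any other unrecognized value
// yields std::nullopt so the caller can report an error instead of
// silently ignoring a typo.
std::optional<ApproximateLineKind>
parseApproximateLineInfo(std::string_view Value) {
  if (Value.empty())
    return ApproximateLineKind::None;
  if (Value == "before")
    return ApproximateLineKind::Before;
  if (Value == "after")
    return ApproximateLineKind::After;
  return std::nullopt;
}
```

Treating the empty string as `None` matches what the hunk does when the option is absent, because `getLastArgValue` returns an empty string by default.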

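The `DIContext.h` hunk adds an `ALKind` member to `DILineInfoSpecifier`, but the unchanged `operator==` in its context still compares only `FLIKind` and `FNKind`. As a result, two specifiers that differ only in `ALKind` compare equal. This diff does not show any caller that relies on that equality. If one does, a version that compares all three fields might look like the sketch below. `LineInfoSpec` is a hypothetical stand-in with abbreviated enumerator lists, and its defaults follow the constructor defaults shown in the hunk.

```cpp
#include <cassert>

// Hypothetical stand-in for DILineInfoSpecifier after the patch; the
// enumerator lists are abbreviated for illustration.
struct LineInfoSpec {
  enum class FileLineInfoKind { None, RawValue, AbsoluteFilePath };
  enum class FunctionNameKind { None, ShortName, LinkageName };
  enum class ApproximateLineKind { None, Before, After };

  FileLineInfoKind FLIKind = FileLineInfoKind::RawValue;
  FunctionNameKind FNKind = FunctionNameKind::None;
  ApproximateLineKind ALKind = ApproximateLineKind::None;

  // Compare every field, including the newly added ALKind.
  bool operator==(const LineInfoSpec &RHS) const {
    return FLIKind == RHS.FLIKind && FNKind == RHS.FNKind &&
           ALKind == RHS.ALKind;
  }
  bool operator!=(const LineInfoSpec &RHS) const { return !(*this == RHS); }
};
```
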
+  }
+  return RowIndex;
 }
 
 bool DWARFDebugLine::LineTable::lookupAddressRange(
@@ -1461,10 +1480,12 @@ bool DWARFDebugLine::Prologue::getFileNameByIndex(
 }
 
 bool DWARFDebugLine::LineTable::getFileLineInfoForAddress(
-    object::SectionedAddress Address, const char *CompDir,
+    object::SectionedAddress Address,
+    DILineInfoSpecifier::ApproximateLineKind LineKind, const char *CompDir,
     FileLineInfoKind Kind, DILineInfo &Result) const {
   // Get the index of row we're looking for in the line table.
-  uint32_t RowIndex = lookupAddress(Address);
+  uint32_t RowIndex = lookupAddress(Address, LineKind);
+
   if (RowIndex == -1U)
     return false;
   // Take file number and line/column from the row.
diff --git a/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp b/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
index 5f29226c14b705..18be137d91d694 100644
--- a/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
+++ b/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
@@ -71,7 +71,9 @@ LLVMSymbolizer::symbolizeCodeCommon(const T &ModuleSpecifier,
     ModuleOffset.Address += Info->getModulePreferredBase();
 
   DILineInfo LineInfo = Info->symbolizeCode(
-      ModuleOffset, DILineInfoSpecifier(Opts.PathStyle, Opts.PrintFunctions),
+      ModuleOffset,
+      DILineInfoSpecifier(Opts.PathStyle, Opts.PrintFunctions,
+                          Opts.ApproximateLineNumbers),
       Opts.UseSymbolTable);
   if (Opts.Demangle)
     LineInfo.FunctionName = DemangleName(LineInfo.FunctionName, Info);
@@ -116,7 +118,9 @@ Expected<DIInliningInfo> LLVMSymbolizer::symbolizeInlinedCodeCommon(
     ModuleOffset.Address += Info->getModulePreferredBase();
 
   DIInliningInfo InlinedContext = Info->symbolizeInlinedCode(
-      ModuleOffset, DILineInfoSpecifier(Opts.PathStyle, Opts.PrintFunctions),
+      ModuleOffset,
+      DILineInfoSpecifier(Opts.PathStyle, Opts.PrintFunctions,
+                          Opts.ApproximateLineNumbers),
       Opts.UseSymbolTable);
   if (Opts.Demangle) {
     for (int i = 0, n = InlinedContext.getNumberOfFrames(); i < n; i++) {
diff --git a/llvm/test/tools/llvm-symbolizer/approximate-line-info.s b/llvm/test/tools/llvm-symbolizer/approximate-line-info.s
new file mode 100644
index 00000000000000..b7d56b0e64534c
--- /dev/null
+++ b/llvm/test/tools/llvm-symbolizer/approximate-line-info.s
@@ -0,0 +1,142 @@
+# REQUIRES: x86-registered-target
+
+# RUN: llvm-mc -g -filetype=obj -triple=x86_64-pc-linux %s -o %t.o
+# RUN: llvm-symbolizer --obj=%t.o 0xa | FileCheck --check-prefix=APPROX-NONE %s
+# RUN: llvm-symbolizer --obj=%t.o --approximate-line-info=before 0xa | FileCheck --check-prefix=APPROX-BEFORE %s
+# RUN: llvm-symbolizer --obj=%t.o --approximate-line-info=after 0xa | FileCheck --check-prefix=APPROX-AFTER %s
+
+# APPROX-NONE: main
+# APPROX-NONE-NEXT: /home/ampandey/test-hip/main.c:0:6
+# APPROX-BEFORE: main
+# APPROX-BEFORE-NEXT: /home/ampandey/test-hip/main.c:4:6
+# APPROX-AFTER: main
+# APPROX-AFTER-NEXT: /home/ampandey/test-hip/main.c:8:2
+
+## Generated from C Code
+##
+## int foo = 0;
+## int x=89;
+## int main() {
+## if(x)
+##  return foo;
+## else
+##  return x;
+## }
+##
+## clang -S -O3 -gline-tables-only --target=x86_64-pc-linux
+
+	.text
+	.file	"main.c"
+	.globl	main                            # -- Begin function main
+	.p2align	4, 0x90
+	.type	main,@function
+main:                                   # @main
+.Lfunc_begin0:
+	.file	0 "/home/ampandey/test-hip" "main.c" md5 0x26c3fbaea8e6febaf09ef44d37ec5ecc
+	.cfi_startproc
+# %bb.0:                                # %entry
+	.loc	0 4 6 prologue_end              # main.c:4:6
+	movl	x(%rip), %eax
+	testl	%eax, %eax
+	je	.LBB0_2
+# %bb.1:                                # %entry
+	.loc	0 0 6 is_stmt 0                 # main.c:0:6
+	movl	foo(%rip), %eax
+.LBB0_2:                                # %entry
+	.loc	0 8 2 is_stmt 1                 # main.c:8:2
+	retq
+.Ltmp0:
+.Lfunc_end0:
+	.size	main, .Lfunc_end0-main
+	.cfi_endproc
+                                        # -- End function
+	.type	foo,@object                     # @foo
+	.bss
+	.globl	foo
+	.p2align	2, 0x0
+foo:
+	.long	0                               # 0x0
+	.size	foo, 4
+
+	.type	x,@object                       # @x
+	.data
+	.globl	x
+	.p2align	2, 0x0
+x:
+	.long	89                              # 0x59
+	.size	x, 4
+
+	.section	.debug_abbrev,"",@progbits
+	.byte	1                               # Abbreviation Code
+	.byte	17                              # DW_TAG_compile_unit
+	.byte	0                               # DW_CHILDREN_no
+	.byte	37                              # DW_AT_producer
+	.byte	37                              # DW_FORM_strx1
+	.byte	19                              # DW_AT_language
+	.byte	5                               # DW_FORM_data2
+	.byte	3                               # DW_AT_name
+	.byte	37                              # DW_FORM_strx1
+	.byte	114                             # DW_AT_str_offsets_base
+	.byte	23                              # DW_FORM_sec_offset
+	.byte	16                              # DW_AT_stmt_list
+	.byte	23                              # DW_FORM_sec_offset
+	.byte	27                              # DW_AT_comp_dir
+	.byte	37                              # DW_FORM_strx1
+	.byte	17                              # DW_AT_low_pc
+	.byte	27                              # DW_FORM_addrx
+	.byte	18                              # DW_AT_high_pc
+	.byte	6                               # DW_FORM_data4
+	.byte	115                             # DW_AT_addr_base
+	.byte	23                              # DW_FORM_sec_offset
+	.byte	0                               # EOM(1)
+	.byte	0                               # EOM(2)
+	.byte	0                               # EOM(3)
+	.section	.debug_info,"",@progbits
+.Lcu_begin0:
+	.long	.Ldebug_info_end0-.Ldebug_info_start0 # Length of Unit
+.Ldebug_info_start0:
+	.short	5                               # DWARF version number
+	.byte	1                               # DWARF Unit Type
+	.byte	8                               # Address Size (in bytes)
+	.long	.debug_abbrev                   # Offset Into Abbrev. Section
+	.byte	1                               # Abbrev [1] 0xc:0x17 DW_TAG_compile_unit
+	.byte	0                               # DW_AT_producer
+	.short	29                              # DW_AT_language
+	.byte	1                               # DW_AT_name
+	.long	.Lstr_offsets_base0             # DW_AT_str_offsets_base
+	.long	.Lline_table_start0             # DW_AT_stmt_list
+	.byte	2                               # DW_AT_comp_dir
+	.byte	0                               # DW_AT_low_pc
+	.long	.Lfunc_end0-.Lfunc_begin0       # DW_AT_high_pc
+	.long	.Laddr_table_base0              # DW_AT_addr_base
+.Ldebug_info_end0:
+	.section	.debug_str_offsets,"",@progbits
+	.long	16                              # Length of String Offsets Set
+	.short	5
+	.short	0
+.Lstr_offsets_base0:
+	.section	.debug_str,"MS",@progbits,1
+.Linfo_string0:
+	.asciz	"clang version 19.0.0git (git@github.com:ampandey-1995/llvm-project.git 6751baed8d1ee8c5fd12fe5a06aa67275fc1ebf6)" # string offset=0
+.Linfo_string1:
+	.asciz	"main.c"                        # string offset=113
+.Linfo_string2:
+	.asciz	"/home/ampandey/test-hip"       # string offset=120
+	.section	.debug_str_offsets,"",@progbits
+	.long	.Linfo_string0
+	.long	.Linfo_string1
+	.long	.Linfo_string2
+	.section	.debug_addr,"",@progbits
+	.long	.Ldebug_addr_end0-.Ldebug_addr_start0 # Length of contribution
+.Ldebug_addr_start0:
+	.short	5                               # DWARF version number
+	.byte	8                               # Address size
+	.byte	0                               # Segment selector size
+.Laddr_table_base0:
+	.quad	.Lfunc_begin0
+.Ldebug_addr_end0:
+	.ident	"clang version 19.0.0git (git@github.com:ampandey-1995/llvm-project.git 6751baed8d1ee8c5fd12fe5a06aa67275fc1ebf6)"
+	.section	".note.GNU-stack","",@progbits
+	.addrsig
+	.section	.debug_line,"",@progbits
+.Lline_table_start0:
diff --git a/llvm/tools/llvm-symbolizer/Opts.td b/llvm/tools/llvm-symbolizer/Opts.td
index edc80bfe59673b..80ec4721c45e00 100644
--- a/llvm/tools/llvm-symbolizer/Opts.td
+++ b/llvm/tools/llvm-symbolizer/Opts.td
@@ -17,6 +17,7 @@ def grp_mach_o : OptionGroup<"kind">,
                  HelpText<"llvm-symbolizer Mach-O Specific Options">;
 
 def addresses : F<"addresses", "Show address before line information">;
+defm approximate_line_info : Eq<"approximate-line-info","Find approximate non-zero line number information nearest to given address.">,Values<"<before/after>">;
 defm adjust_vma
     : Eq<"adjust-vma", "Add specified offset to object file addresses">,
       MetaVarName<"<offset>">;
diff --git a/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp b/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
index b98bdbc388faf2..530dbdfd5c8b5e 100644
--- a/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
+++ b/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
@@ -482,6 +482,14 @@ int llvm_symbolizer_main(int argc, char **argv, const llvm::ToolContext &) {
   } else {
     Opts.PathStyle = DILineInfoSpecifier::FileLineInfoKind::AbsoluteFilePath;
   }
+  StringRef ApproximateLineKindVal =
+      Args.getLastArgValue(OPT_approximate_line_info_EQ);
+  Opts.ApproximateLineNumbers =
+      ApproximateLineKindVal == "before"
+          ? DILineInfoSpecifier::ApproximateLineKind::Before
+      : ApproximateLineKindVal == "after"
+          ? DILineInfoSpecifier::ApproximateLineKind::After
+          : DILineInfoSpecifier::ApproximateLineKind::None;
   Opts.DebugFileDirectory = Args.getAllArgValues(OPT_debug_file_directory_EQ);
   Opts.DefaultArch = Args.getLastArgValue(OPT_default_arch_EQ).str();
   Opts.Demangle = Args.hasFlag(OPT_demangle, OPT_no_demangle, !IsAddr2Line);

jh7370 (Collaborator) commented Feb 19, 2024

Hi @ampandey-1995, how does this relate to #71032?

ampandey-1995 (Contributor Author) commented Feb 19, 2024

This patch provides functionality similar to #71032, somewhat refined based on the comments from @jh7370 and @dwblaikie.

jh7370 (Collaborator) commented Feb 19, 2024

Okay, is there a reason you haven't just updated that PR rather than create an entirely new one?

ampandey-1995 (Contributor Author) commented Feb 19, 2024

Apologies, I will close the old PR.

gbreynoo (Collaborator) commented

Hi @ampandey-1995, I'm glad to see this come up again.

There was some discussion in the previous PR that we didn't get to the bottom of, so I'll state it here:
@jh7370 mentioned the potential different methods of "Use of the last address before the current one with a non-zero line value" vs "The last line table entry before the one for the specified address, with a non-zero line value". See #71032 (comment).

I think the llvm-symbolizer command guide should clearly specify the method used to derive the line number estimation. Maybe the help output as well.

There is also the question of whether it would be useful to have both methods available to the user, as outlined by @jh7370 in #71032 (comment). I'm torn: although I see the value in giving the user the option, in most cases just outputting the previous entry in the line table would probably be a good enough estimate, and it would keep this from becoming too complicated.

I also think it should be clearer in the output of llvm-symbolizer when it is an approximate output value. You can input multiple addresses into llvm-symbolizer in one invocation so it would be useful to see which outputs required an estimate vs which are accurate.

ampandey-1995 changed the title from "Support for Missing Line Numbers." to "[Symbolizer] Support for Missing Line Numbers." on Feb 19, 2024
ampandey-1995 (Contributor Author) commented Feb 22, 2024

> Hi @ampandey-1995, I'm glad to see this come up again.

Thanks @gbreynoo for reviewing the patch.

> There was some discussion in the previous PR that we didn't get to the bottom of, so I'll state it here: @jh7370 mentioned the potential different methods of "Use of the last address before the current one with a non-zero line value" vs "The last line table entry before the one for the specified address, with a non-zero line value". See #71032 (comment).

Thanks again for pointing out @jh7370's comments. I think this patch is closer to the second method, which queries line table entries to extract meaningful line information.

Previously, #71032 was based on incrementing or decrementing the address, starting from the address that has no line information. That approach doesn't fit well, because querying llvm-symbolizer for every address took many symbolizer API calls, each invoking DWARF APIs to extract line information. Also, if the object contains no debug information, llvm-symbolizer keeps calling into the DWARF APIs because it never finds a non-zero line, which sometimes hangs the llvm-symbolizer tool itself.

The current patch instead looks up the address (which has no line information) by inspecting the line table sequence bounded by [lowPC, highPC]. The search usually covers [lowPC, SearchPC] if "before" is passed as the value of --approximate-line-info, or [SearchPC, highPC] if "after" is passed.

> I think the llvm-symbolizer command guide should clearly specify the method used to derive the line number estimation. Maybe the help output as well.

OK, I will update the command guide and help output.

> There is also the question of whether it would be useful to have both methods available to the user, as outlined by @jh7370 in #71032 (comment). I'm torn: although I see the value in giving the user the option, in most cases just outputting the previous entry in the line table would probably be a good enough estimate, and it would keep this from becoming too complicated.

I agree that querying the line table is a good estimation method, and it is also better in terms of performance.

> I also think it should be clearer in the output of llvm-symbolizer when it is an approximate output value. You can input multiple addresses into llvm-symbolizer in one invocation so it would be useful to see which outputs required an estimate vs which are accurate.

Yes, thanks, I will do that. Is it OK to attach a tag such as (approximate), similar to (inlined by), at the end of the line:column information?

gbreynoo (Collaborator) commented

That sounds good to me. Since there is no equivalent functionality in addr2line to follow, I think you are right to follow the inline output behavior.

ampandey-AMD and others added 20 commits July 30, 2024 14:42
LLVM Symbolizer reports missing line numbers in some cases when it
symbolizes addresses in optimized binaries. This can happen because the
compiler sometimes cannot map an instruction to a line number due to
optimizations. The symbolizer should handle those cases gracefully.

Add an option '--skip-line-zero' to the symbolizer to report the nearest
non-zero line number.
Update the command guide and help output. Add an (approximate) tag to
approximated line information output.
1. Simplify loop logic.
2. Add logic to search for non-zero line information only up to function
   boundaries.
1. Add approximate-line-handcrafted.s test case.
1. Add test case approximate-line-handcrafted.s.
2. Add descriptive comments for RUN lines.
1. Remove check-lines 'APPROX-*'.
2. Add new check 'MULTIPLE-ROWS'.
1. Remove .debug_str section.
2. Remove non-CU DIEs.
dwblaikie (Collaborator) left a comment

Looks good, with one nit/requested assertion.

Thanks again for your patience/persistence.

llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp (resolved review thread)
dwblaikie (Collaborator) commented

Looks good - can you commit this yourself, or do you need someone to commit it on your behalf?

ampandey-1995 (Contributor Author) commented Aug 5, 2024

I have commit access to llvm-project with this GitHub account.

@dwblaikie Thanks again for the approval.

I'll merge after the pre-merge CI passes.

ampandey-1995 merged commit 0886440 into llvm:main on Aug 5, 2024
8 checks passed
banach-space pushed a commit to banach-space/llvm-project that referenced this pull request Aug 7, 2024
Co-authored-by: Amit Pandey <amit.pandey@amd.com>
kstoimenov pushed a commit to kstoimenov/llvm-project that referenced this pull request Aug 15, 2024
9 participants