-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[gsymutil] Fix how invalid LLVM_stmt_seq is set and used #164015
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-debuginfo Author: Peter Rong (DataCorrupted) ChangesPreviously, invalid offset is set to UINT64_MAX, this is not right when DWARF32, which leads to incorrect debug into in GSYM, the branch:
will always be true. Patch is 44.09 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/164015.diff 4 Files Affected:
diff --git a/llvm/include/llvm/BinaryFormat/Dwarf.h b/llvm/include/llvm/BinaryFormat/Dwarf.h
index 5039a3fe7ecc7..8b5c895788371 100644
--- a/llvm/include/llvm/BinaryFormat/Dwarf.h
+++ b/llvm/include/llvm/BinaryFormat/Dwarf.h
@@ -173,7 +173,7 @@ enum DecimalSignEncoding {
};
enum EndianityEncoding {
- // Endianity attribute values
+// Endianity attribute values
#define HANDLE_DW_END(ID, NAME) DW_END_##NAME = ID,
#include "llvm/BinaryFormat/Dwarf.def"
DW_END_lo_user = 0x40,
@@ -1128,6 +1128,9 @@ struct FormParams {
uint8_t getDwarfOffsetByteSize() const {
return dwarf::getDwarfOffsetByteSize(Format);
}
+ inline uint64_t getDwarfMaxOffset() const {
+ return (getDwarfOffsetByteSize() == 4) ? UINT32_MAX : UINT64_MAX;
+ }
explicit operator bool() const { return Version && AddrSize; }
};
diff --git a/llvm/lib/DWARFLinker/Classic/DWARFLinker.cpp b/llvm/lib/DWARFLinker/Classic/DWARFLinker.cpp
index 8052773812a2c..8637b55c78f9c 100644
--- a/llvm/lib/DWARFLinker/Classic/DWARFLinker.cpp
+++ b/llvm/lib/DWARFLinker/Classic/DWARFLinker.cpp
@@ -2427,11 +2427,13 @@ void DWARFLinker::DIECloner::generateLineTableForUnit(CompileUnit &Unit) {
uint64_t OrigStmtSeq = StmtSeq.get();
// 1. Get the original row index from the stmt list offset.
auto OrigRowIter = SeqOffToOrigRow.find(OrigStmtSeq);
+ const uint64_t InvalidOffset =
+ Unit.getOrigUnit().getFormParams().getDwarfMaxOffset();
// Check whether we have an output sequence for the StmtSeq offset.
// Some sequences are discarded by the DWARFLinker if they are invalid
// (empty).
if (OrigRowIter == SeqOffToOrigRow.end()) {
- StmtSeq.set(UINT64_MAX);
+ StmtSeq.set(InvalidOffset);
continue;
}
size_t OrigRowIndex = OrigRowIter->second;
@@ -2441,7 +2443,7 @@ void DWARFLinker::DIECloner::generateLineTableForUnit(CompileUnit &Unit) {
if (NewRowIter == OrigRowToNewRow.end()) {
// If the original row index is not found in the map, update the
// stmt_sequence attribute to the 'invalid offset' magic value.
- StmtSeq.set(UINT64_MAX);
+ StmtSeq.set(InvalidOffset);
continue;
}
diff --git a/llvm/lib/DebugInfo/GSYM/DwarfTransformer.cpp b/llvm/lib/DebugInfo/GSYM/DwarfTransformer.cpp
index fa39603437dd9..2d723fa91a164 100644
--- a/llvm/lib/DebugInfo/GSYM/DwarfTransformer.cpp
+++ b/llvm/lib/DebugInfo/GSYM/DwarfTransformer.cpp
@@ -82,7 +82,6 @@ struct llvm::gsym::CUInfo {
}
};
-
static DWARFDie GetParentDeclContextDIE(DWARFDie &Die) {
if (DWARFDie SpecDie =
Die.getAttributeValueAsReferencedDie(dwarf::DW_AT_specification)) {
@@ -170,7 +169,7 @@ getQualifiedNameIndex(DWARFDie &Die, uint64_t Language, GsymCreator &Gsym) {
// templates
if (ParentName.front() == '<' && ParentName.back() == '>')
Name = "{" + ParentName.substr(1, ParentName.size() - 2).str() + "}" +
- "::" + Name;
+ "::" + Name;
else
Name = ParentName.str() + "::" + Name;
}
@@ -320,12 +319,16 @@ static void convertFunctionLineTable(OutputAggregator &Out, CUInfo &CUI,
// Attempt to retrieve DW_AT_LLVM_stmt_sequence if present.
std::optional<uint64_t> StmtSeqOffset;
if (auto StmtSeqAttr = Die.find(llvm::dwarf::DW_AT_LLVM_stmt_sequence)) {
- // The `DW_AT_LLVM_stmt_sequence` attribute might be set to `UINT64_MAX`
- // when it refers to an empty line sequence. In such cases, the DWARF linker
- // will exclude the empty sequence from the final output and assign
- // `UINT64_MAX` to the `DW_AT_LLVM_stmt_sequence` attribute.
- uint64_t StmtSeqVal = dwarf::toSectionOffset(StmtSeqAttr, UINT64_MAX);
- if (StmtSeqVal != UINT64_MAX)
+ // The `DW_AT_LLVM_stmt_sequence` attribute might be set to an invalid
+ // sentinel value when it refers to an empty line sequence. In such cases,
+ // the DWARF linker will exclude the empty sequence from the final output
+ // and assign the sentinel value to the `DW_AT_LLVM_stmt_sequence`
+ // attribute. The sentinel value is UINT32_MAX for DWARF32 and UINT64_MAX
+ // for DWARF64.
+ const uint64_t InvalidOffset =
+ Die.getDwarfUnit()->getFormParams().getDwarfMaxOffset();
+ uint64_t StmtSeqVal = dwarf::toSectionOffset(StmtSeqAttr, InvalidOffset);
+ if (StmtSeqVal != InvalidOffset)
StmtSeqOffset = StmtSeqVal;
}
@@ -436,7 +439,7 @@ static void convertFunctionLineTable(OutputAggregator &Out, CUInfo &CUI,
// Skip multiple line entries for the same file and line.
auto LastLE = FI.OptLineTable->last();
if (LastLE && LastLE->File == FileIdx && LastLE->Line == Row.Line)
- continue;
+ continue;
// Only push a row if it isn't an end sequence. End sequence markers are
// included for the last address in a function or the last contiguous
// address in a sequence.
@@ -726,8 +729,8 @@ llvm::Error DwarfTransformer::verify(StringRef GsymPath,
for (uint32_t I = 0; I < NumAddrs; ++I) {
auto FuncAddr = Gsym->getAddress(I);
if (!FuncAddr)
- return createStringError(std::errc::invalid_argument,
- "failed to extract address[%i]", I);
+ return createStringError(std::errc::invalid_argument,
+ "failed to extract address[%i]", I);
auto FI = Gsym->getFunctionInfo(*FuncAddr);
if (!FI)
@@ -742,8 +745,7 @@ llvm::Error DwarfTransformer::verify(StringRef GsymPath,
if (!LR)
return LR.takeError();
- auto DwarfInlineInfos =
- DICtx.getInliningInfoForAddress(SectAddr, DLIS);
+ auto DwarfInlineInfos = DICtx.getInliningInfoForAddress(SectAddr, DLIS);
uint32_t NumDwarfInlineInfos = DwarfInlineInfos.getNumberOfFrames();
if (NumDwarfInlineInfos == 0) {
DwarfInlineInfos.addFrame(
@@ -781,8 +783,7 @@ llvm::Error DwarfTransformer::verify(StringRef GsymPath,
continue;
}
- for (size_t Idx = 0, count = LR->Locations.size(); Idx < count;
- ++Idx) {
+ for (size_t Idx = 0, count = LR->Locations.size(); Idx < count; ++Idx) {
const auto &gii = LR->Locations[Idx];
if (Idx < NumDwarfInlineInfos) {
const auto &dii = DwarfInlineInfos.getFrame(Idx);
diff --git a/llvm/test/tools/llvm-gsymutil/ARM_AArch64/macho-gsym-merged-callsites-dsym.yaml b/llvm/test/tools/llvm-gsymutil/ARM_AArch64/macho-gsym-merged-callsites-dsym.yaml
index 42ef7f301e536..c2b4ef00d7bd2 100644
--- a/llvm/test/tools/llvm-gsymutil/ARM_AArch64/macho-gsym-merged-callsites-dsym.yaml
+++ b/llvm/test/tools/llvm-gsymutil/ARM_AArch64/macho-gsym-merged-callsites-dsym.yaml
@@ -4,6 +4,7 @@
# RUN: split-file %s %t
# RUN: yaml2obj %t/merged_callsites.dSYM.yaml -o %t/merged_callsites.dSYM
+# The object file is manually tempered with such that the LLVM_stmt_seq of function_bad_stmt_seq is 0xFFFFFFFF.
# RUN: llvm-gsymutil --num-threads=1 --convert=%t/merged_callsites.dSYM --merged-functions --callsites-yaml-file=%t/callsites.yaml -o %t/call_sites_dSYM.gsym
# RUN: llvm-gsymutil --num-threads=1 --convert=%t/merged_callsites.dSYM --merged-functions --dwarf-callsites -o %t/dwarf_call_sites_dSYM.gsym
@@ -36,7 +37,13 @@
# CHECK-MERGED-CALLSITES: CallSites (by relative return offset):
# CHECK-MERGED-CALLSITES-NEXT: 0x[[#%.4x,]] Flags[None] MatchRegex[function2_copy1]
-# CHECK-MERGED-CALLSITES: FunctionInfo @ 0x[[#%x,MAIN:]]: [0x[[#%x,MAIN_START:]] - 0x[[#%x,MAIN_END:]]) "main"
+# If we don't do anything, a function with bad LLVM_stmt_seq won't have any call site filter
+# More importantly, the line number will be at the function definition.
+# CHECK-MERGED-CALLSITES: FunctionInfo @ 0x[[#%x,BAD_STMT_SEQ:]]: [0x[[#%x,BAD_STMT_SEQ_START:]] - 0x[[#%x,BAD_STMT_SEQ_END:]]) "function_bad_stmt_seq"
+# CHECK-MERGED-CALLSITES-NEXT: LineTable:
+# CHECK-MERGED-CALLSITES-NEXT: 0x00000001000003b0 ./merged_funcs_test.cpp:65
+
+# CHECK-MERGED-CALLSITES-NEXT: FunctionInfo @ 0x[[#%x,MAIN:]]: [0x[[#%x,MAIN_START:]] - 0x[[#%x,MAIN_END:]]) "main"
# CHECK-MERGED-CALLSITES: CallSites (by relative return offset):
# CHECK-MERGED-CALLSITES-NEXT: 0x[[#%.4x,]] Flags[None] MatchRegex[function1]
# CHECK-MERGED-CALLSITES-NEXT: 0x[[#%.4x,]] Flags[None] MatchRegex[function2_copy2]
@@ -44,16 +51,17 @@
# CHECK-MERGED-CALLSITES-NEXT: 0x[[#%.4x,]] Flags[None] MatchRegex[function3_copy2]
# CHECK-MERGED-CALLSITES-NEXT: 0x[[#%.4x,]] Flags[None] MatchRegex[function2_copy1]
-
### Check that we can correctly resove merged functions using callstacks:
### Resolve two callstacks containing merged functions.
### We use the value obtained from `CallSites:[FILTER]` to pass to the next call to `llvm-gsymutil` via `--merged-functions-filter`.
### The callstacks resolve differently based on the merged functions filter.
-### 0x00000001000003d0 => 0x000000010000037c => 0x000000010000035c => 0x0000000100000340
-### 0x00000001000003e8 =========================> 0x000000010000035c => 0x0000000100000340
+### 0x00000001000003d8 => 0x000000010000037c => 0x000000010000035c => 0x0000000100000340
+### 0x00000001000003f0 =========================> 0x000000010000035c => 0x0000000100000340
+###
+### We added a new function, for main function, +8 for line numbers, +0x8 for addresses.
-# RUN: llvm-gsymutil %t/dwarf_call_sites_dSYM.gsym --merged-functions --address=0x00000001000003d0 | FileCheck --check-prefix=CHECK-C1 %s
-# CHECK-C1: 0x00000001000003d0: main + 32 @ ./merged_funcs_test.cpp:63
+# RUN: llvm-gsymutil %t/dwarf_call_sites_dSYM.gsym --merged-functions --address=0x00000001000003d8 | FileCheck --check-prefix=CHECK-C1 %s
+# CHECK-C1: 0x00000001000003d8: main + 32 @ ./merged_funcs_test.cpp:71
# CHECK-C1-NEXT: CallSites: function2_copy2
# RUN: llvm-gsymutil %t/dwarf_call_sites_dSYM.gsym --merged-functions --address=0x000000010000037c --merged-functions-filter="function2_copy2" | FileCheck --check-prefix=CHECK-C2 %s
@@ -73,9 +81,9 @@
### ----------------------------------------------------------------------------------------------------------------------------------
### Resolve the 2nd call stack - the 2nd and 3rd addresses are the same but they resolve to a different function because of the filter
-# RUN: llvm-gsymutil %t/dwarf_call_sites_dSYM.gsym --address=0x00000001000003e8 --merged-functions | FileCheck --check-prefix=CHECK-C5 %s
-# CHECK-C5: Found 1 function at address 0x00000001000003e8:
-# CHECK-C5-NEXT: 0x00000001000003e8: main + 56 @ ./merged_funcs_test.cpp:64
+# RUN: llvm-gsymutil %t/dwarf_call_sites_dSYM.gsym --address=0x00000001000003f0 --merged-functions | FileCheck --check-prefix=CHECK-C5 %s
+# CHECK-C5: Found 1 function at address 0x00000001000003f0:
+# CHECK-C5-NEXT: 0x00000001000003f0: main + 56 @ ./merged_funcs_test.cpp:72
# CHECK-C5-NEXT: CallSites: function3_copy2
# RUN: llvm-gsymutil %t/dwarf_call_sites_dSYM.gsym --merged-functions --address=0x000000010000035c --merged-functions-filter="function3_copy2" | FileCheck --check-prefix=CHECK-C6 %s
@@ -87,6 +95,9 @@
# CHECK-C7: Found 1 function at address 0x0000000100000340:
# CHECK-C7-NEXT: 0x0000000100000340: function4_copy2 + 8 @ ./merged_funcs_test.cpp:14
+# RUN: llvm-gsymutil %t/dwarf_call_sites_dSYM.gsym --address=0x00000001000003b4 --merged-functions | FileCheck --check-prefix=CHECK-BAD %s
+# CHECK-BAD: Found 1 function at address 0x00000001000003b4:
+# CHECK-BAD-NEXT: 0x00000001000003b4: function_bad_stmt_seq + 4 @ ./merged_funcs_test.cpp:65
#--- merged_funcs_test.cpp
#define ATTRIB extern "C" __attribute__((noinline))
@@ -148,6 +159,14 @@ ATTRIB int function1(int a) {
return result;
}
+// Intentional multi-line function definition
+ATTRIB
+int function_bad_stmt_seq(
+ int a
+) {
+ return a + 1;
+}
+
int main() {
int sum = 0;
sum += function1(1);
@@ -155,6 +174,7 @@ int main() {
sum += function4_copy2(4);
sum += function3_copy2(41);
sum += function2_copy1(11);
+ sum += function_bad_stmt_seq(sum);
return sum;
}
@@ -229,7 +249,7 @@ FileHeader:
LoadCommands:
- cmd: LC_UUID
cmdsize: 24
- uuid: 4C4C441A-5555-3144-A124-E395C0E7AA96
+ uuid: 4C4C44C9-5555-3144-A16C-82A90B2087B3
- cmd: LC_BUILD_VERSION
cmdsize: 24
platform: 1
@@ -239,9 +259,9 @@ LoadCommands:
- cmd: LC_SYMTAB
cmdsize: 24
symoff: 4096
- nsyms: 10
- stroff: 4256
- strsize: 156
+ nsyms: 11
+ stroff: 4272
+ strsize: 179
- cmd: LC_SEGMENT_64
cmdsize: 72
segname: __PAGEZERO
@@ -268,7 +288,7 @@ LoadCommands:
- sectname: __text
segname: __TEXT
addr: 0x100000338
- size: 208
+ size: 228
offset: 0x0
align: 2
reloff: 0x0
@@ -277,7 +297,7 @@ LoadCommands:
reserved1: 0x0
reserved2: 0x0
reserved3: 0x0
- content: CFFAEDFE0C000001000000000A00000008000000C005000000000000000000001B000000180000004C4C441A55553144A124E395C0E7AA9632000000180000000100000000000B0000000B00000000000200000018000000001000000A000000A01000009C00000019000000480000005F5F504147455A45524F00000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000019000000980000005F5F54455854000000000000000000000000000001000000
+ content: CFFAEDFE0C000001000000000A00000008000000C005000000000000000000001B000000180000004C4C44C955553144A16C82A90B2087B332000000180000000100000000000B0000000B00000000000200000018000000001000000B000000B0100000B300000019000000480000005F5F504147455A45524F00000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000019000000980000005F5F544558540000000000000000000000000000010000000040000000000000000000000000000000000000
- cmd: LC_SEGMENT_64
cmdsize: 152
segname: __DATA
@@ -308,7 +328,7 @@ LoadCommands:
vmaddr: 4295000064
vmsize: 4096
fileoff: 4096
- filesize: 316
+ filesize: 355
maxprot: 1
initprot: 1
nsects: 0
@@ -319,7 +339,7 @@ LoadCommands:
vmaddr: 4295004160
vmsize: 4096
fileoff: 8192
- filesize: 3507
+ filesize: 3884
maxprot: 7
initprot: 3
nsects: 11
@@ -328,7 +348,7 @@ LoadCommands:
- sectname: __debug_line
segname: __DWARF
addr: 0x100009000
- size: 323
+ size: 357
offset: 0x2000
align: 0
reloff: 0x0
@@ -339,9 +359,9 @@ LoadCommands:
reserved3: 0x0
- sectname: __debug_aranges
segname: __DWARF
- addr: 0x100009143
+ addr: 0x100009165
size: 48
- offset: 0x2143
+ offset: 0x2165
align: 0
reloff: 0x0
nreloc: 0
@@ -351,9 +371,9 @@ LoadCommands:
reserved3: 0x0
- sectname: __debug_loc
segname: __DWARF
- addr: 0x100009173
- size: 1026
- offset: 0x2173
+ addr: 0x100009195
+ size: 1083
+ offset: 0x2195
align: 0
reloff: 0x0
nreloc: 0
@@ -361,12 +381,12 @@ LoadCommands:
reserved1: 0x0
reserved2: 0x0
reserved3: 0x0
- content: 00000000000000000800000000000000010050080000000000000014000000000000000400A301509F0000000000000000000000000000000004000000000000000C0000000000000001005800000000000000000000000000000000080000000000000014000000000000000100500000000000000000000000000000000000000000000000000800000000000000010050080000000000000014000000000000000400A301509F0000000000000000000000000000000004000000000000000C0000000000000001005800000000000000000000000000000000080000000000000014000000000000000100500000000000000000000000000000000014000000000000002000000000000000010050200000000000000034000000000000000400A301509F00000000000000000000000000000000240000000000000034000000000000000100500000000000000000000000000000000014000000000000002000000000000000010050200000000000000034000000000000000400A301509F00000000000000000000000000000000240000000000000034000000000000000100500000000000000000000000000000000034000000000000004000000000000000010050400000000000000058000000000000000400A301509F000000000000000000000000000000003C0000000000000040000000000000000300707E9F000000000000000000000000000000004400000000000000580000000000000001005000000000000000000000000000000000340000000000000040000000000000000300707E9F00000000000000000000000000000000440000000000000058000000000000000100500000000000000000000000000000000034000000000000004000000000000000010050400000000000000058000000000000000400A301509F000000000000000000000000000000003C0000000000000040000000000000000300707E9F000000000000000000000000000000004400000000000000580000000000000001005000000000000000000000000000000000340000000000000040000000000000000300707E9F00000000000000000000000000000000440000000000000058000000000000000100500000000000000000000000000000000058000000000000006400000000000000010050640000000000000078000000000000000400A301509F00000000000000000000000000000000680000000000000078000000000000000100500000000000000000000000000000000084000000000000009000000000000000030011009F90000000000000009C000000000000000100639C00000000000000A800000000000000010064B800000000000000C00000000000000001006300000000000000000000000000000000
+ content: 00000000000000000800000000000000010050080000000000000014000000000000000400A301509F0000000000000000000000000000000004000000000000000C0000000000000001005800000000000000000000000000000000080000000000000014000000000000000100500000000000000000000000000000000000000000000000000800000000000000010050080000000000000014000000000000000400A301509F0000000000000000000000000000000004000000000000000C0000000000000001005800000000000000000000000000000000080000000000000014000000000000000100500000000000000000000000000000000014000000000000002000000000000000010050200000000000000034000000000000000400A301509F00000000000000000000000000000000240000000000000034000000000000000100500000000000000000000000000000000014000000000000002000000000000000010050200000000000000034000000000000000400A301509F00000000000000000000000000000000240000000000000034000000000000000100500000000000000000000000000000000034000000000000004000000000000000010050400000000000000058000000000000000400A301509F000000000000000000000000000000003C0000000000000040000000000000000300707E9F000000000000000000000000000000004400000000000000580000000000000001005000000000000000000000000000000000340000000000000040000000000000000300707E9F00000000000000000000000000000000440000000000000058000000000000000100500000000000000000000000000000000034000000000000004000000000000000010050400000000000000058000000000000000400A301509F000000000000000000000000000000003C0000000000000040000000000000000300707E9F000000000000000000000000000000004400000000000000580000000000000001005000000000000000000000000000000000340000000000000040000000000000000300707E9F00000000000000000000000000000000440000000000000058000000000000000100500000000000000000000000000000000058000000000000006400000000000000010050640000000000000078000000000000000400A301509F00000000000000000000000000000000680000000000000078000000000000000100500000000000000000000000000000000078000000000000007C000000000000000100507C0000000000000080000000000000000400A301509F000000000000000000000000000000008C000000000000009800000000000000030011009F9800000000000000A400000000000000010063A400000000000000B000000000000000010064C000000000000000D40000000000000001006300000000000000000000000000000000
- sectname: __debug_info
segname: __DWARF
- addr: ...
[truncated]
|
@llvm/pr-subscribers-llvm-binary-utilities Author: Peter Rong (DataCorrupted) ChangesPreviously, invalid offset is set to UINT64_MAX, this is not right when DWARF32, which leads to incorrect debug into in GSYM, the branch:
will always be true. Patch is 44.09 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/164015.diff 4 Files Affected:
diff --git a/llvm/include/llvm/BinaryFormat/Dwarf.h b/llvm/include/llvm/BinaryFormat/Dwarf.h
index 5039a3fe7ecc7..8b5c895788371 100644
--- a/llvm/include/llvm/BinaryFormat/Dwarf.h
+++ b/llvm/include/llvm/BinaryFormat/Dwarf.h
@@ -173,7 +173,7 @@ enum DecimalSignEncoding {
};
enum EndianityEncoding {
- // Endianity attribute values
+// Endianity attribute values
#define HANDLE_DW_END(ID, NAME) DW_END_##NAME = ID,
#include "llvm/BinaryFormat/Dwarf.def"
DW_END_lo_user = 0x40,
@@ -1128,6 +1128,9 @@ struct FormParams {
uint8_t getDwarfOffsetByteSize() const {
return dwarf::getDwarfOffsetByteSize(Format);
}
+ inline uint64_t getDwarfMaxOffset() const {
+ return (getDwarfOffsetByteSize() == 4) ? UINT32_MAX : UINT64_MAX;
+ }
explicit operator bool() const { return Version && AddrSize; }
};
diff --git a/llvm/lib/DWARFLinker/Classic/DWARFLinker.cpp b/llvm/lib/DWARFLinker/Classic/DWARFLinker.cpp
index 8052773812a2c..8637b55c78f9c 100644
--- a/llvm/lib/DWARFLinker/Classic/DWARFLinker.cpp
+++ b/llvm/lib/DWARFLinker/Classic/DWARFLinker.cpp
@@ -2427,11 +2427,13 @@ void DWARFLinker::DIECloner::generateLineTableForUnit(CompileUnit &Unit) {
uint64_t OrigStmtSeq = StmtSeq.get();
// 1. Get the original row index from the stmt list offset.
auto OrigRowIter = SeqOffToOrigRow.find(OrigStmtSeq);
+ const uint64_t InvalidOffset =
+ Unit.getOrigUnit().getFormParams().getDwarfMaxOffset();
// Check whether we have an output sequence for the StmtSeq offset.
// Some sequences are discarded by the DWARFLinker if they are invalid
// (empty).
if (OrigRowIter == SeqOffToOrigRow.end()) {
- StmtSeq.set(UINT64_MAX);
+ StmtSeq.set(InvalidOffset);
continue;
}
size_t OrigRowIndex = OrigRowIter->second;
@@ -2441,7 +2443,7 @@ void DWARFLinker::DIECloner::generateLineTableForUnit(CompileUnit &Unit) {
if (NewRowIter == OrigRowToNewRow.end()) {
// If the original row index is not found in the map, update the
// stmt_sequence attribute to the 'invalid offset' magic value.
- StmtSeq.set(UINT64_MAX);
+ StmtSeq.set(InvalidOffset);
continue;
}
diff --git a/llvm/lib/DebugInfo/GSYM/DwarfTransformer.cpp b/llvm/lib/DebugInfo/GSYM/DwarfTransformer.cpp
index fa39603437dd9..2d723fa91a164 100644
--- a/llvm/lib/DebugInfo/GSYM/DwarfTransformer.cpp
+++ b/llvm/lib/DebugInfo/GSYM/DwarfTransformer.cpp
@@ -82,7 +82,6 @@ struct llvm::gsym::CUInfo {
}
};
-
static DWARFDie GetParentDeclContextDIE(DWARFDie &Die) {
if (DWARFDie SpecDie =
Die.getAttributeValueAsReferencedDie(dwarf::DW_AT_specification)) {
@@ -170,7 +169,7 @@ getQualifiedNameIndex(DWARFDie &Die, uint64_t Language, GsymCreator &Gsym) {
// templates
if (ParentName.front() == '<' && ParentName.back() == '>')
Name = "{" + ParentName.substr(1, ParentName.size() - 2).str() + "}" +
- "::" + Name;
+ "::" + Name;
else
Name = ParentName.str() + "::" + Name;
}
@@ -320,12 +319,16 @@ static void convertFunctionLineTable(OutputAggregator &Out, CUInfo &CUI,
// Attempt to retrieve DW_AT_LLVM_stmt_sequence if present.
std::optional<uint64_t> StmtSeqOffset;
if (auto StmtSeqAttr = Die.find(llvm::dwarf::DW_AT_LLVM_stmt_sequence)) {
- // The `DW_AT_LLVM_stmt_sequence` attribute might be set to `UINT64_MAX`
- // when it refers to an empty line sequence. In such cases, the DWARF linker
- // will exclude the empty sequence from the final output and assign
- // `UINT64_MAX` to the `DW_AT_LLVM_stmt_sequence` attribute.
- uint64_t StmtSeqVal = dwarf::toSectionOffset(StmtSeqAttr, UINT64_MAX);
- if (StmtSeqVal != UINT64_MAX)
+ // The `DW_AT_LLVM_stmt_sequence` attribute might be set to an invalid
+ // sentinel value when it refers to an empty line sequence. In such cases,
+ // the DWARF linker will exclude the empty sequence from the final output
+ // and assign the sentinel value to the `DW_AT_LLVM_stmt_sequence`
+ // attribute. The sentinel value is UINT32_MAX for DWARF32 and UINT64_MAX
+ // for DWARF64.
+ const uint64_t InvalidOffset =
+ Die.getDwarfUnit()->getFormParams().getDwarfMaxOffset();
+ uint64_t StmtSeqVal = dwarf::toSectionOffset(StmtSeqAttr, InvalidOffset);
+ if (StmtSeqVal != InvalidOffset)
StmtSeqOffset = StmtSeqVal;
}
@@ -436,7 +439,7 @@ static void convertFunctionLineTable(OutputAggregator &Out, CUInfo &CUI,
// Skip multiple line entries for the same file and line.
auto LastLE = FI.OptLineTable->last();
if (LastLE && LastLE->File == FileIdx && LastLE->Line == Row.Line)
- continue;
+ continue;
// Only push a row if it isn't an end sequence. End sequence markers are
// included for the last address in a function or the last contiguous
// address in a sequence.
@@ -726,8 +729,8 @@ llvm::Error DwarfTransformer::verify(StringRef GsymPath,
for (uint32_t I = 0; I < NumAddrs; ++I) {
auto FuncAddr = Gsym->getAddress(I);
if (!FuncAddr)
- return createStringError(std::errc::invalid_argument,
- "failed to extract address[%i]", I);
+ return createStringError(std::errc::invalid_argument,
+ "failed to extract address[%i]", I);
auto FI = Gsym->getFunctionInfo(*FuncAddr);
if (!FI)
@@ -742,8 +745,7 @@ llvm::Error DwarfTransformer::verify(StringRef GsymPath,
if (!LR)
return LR.takeError();
- auto DwarfInlineInfos =
- DICtx.getInliningInfoForAddress(SectAddr, DLIS);
+ auto DwarfInlineInfos = DICtx.getInliningInfoForAddress(SectAddr, DLIS);
uint32_t NumDwarfInlineInfos = DwarfInlineInfos.getNumberOfFrames();
if (NumDwarfInlineInfos == 0) {
DwarfInlineInfos.addFrame(
@@ -781,8 +783,7 @@ llvm::Error DwarfTransformer::verify(StringRef GsymPath,
continue;
}
- for (size_t Idx = 0, count = LR->Locations.size(); Idx < count;
- ++Idx) {
+ for (size_t Idx = 0, count = LR->Locations.size(); Idx < count; ++Idx) {
const auto &gii = LR->Locations[Idx];
if (Idx < NumDwarfInlineInfos) {
const auto &dii = DwarfInlineInfos.getFrame(Idx);
diff --git a/llvm/test/tools/llvm-gsymutil/ARM_AArch64/macho-gsym-merged-callsites-dsym.yaml b/llvm/test/tools/llvm-gsymutil/ARM_AArch64/macho-gsym-merged-callsites-dsym.yaml
index 42ef7f301e536..c2b4ef00d7bd2 100644
--- a/llvm/test/tools/llvm-gsymutil/ARM_AArch64/macho-gsym-merged-callsites-dsym.yaml
+++ b/llvm/test/tools/llvm-gsymutil/ARM_AArch64/macho-gsym-merged-callsites-dsym.yaml
@@ -4,6 +4,7 @@
# RUN: split-file %s %t
# RUN: yaml2obj %t/merged_callsites.dSYM.yaml -o %t/merged_callsites.dSYM
+# The object file is manually tempered with such that the LLVM_stmt_seq of function_bad_stmt_seq is 0xFFFFFFFF.
# RUN: llvm-gsymutil --num-threads=1 --convert=%t/merged_callsites.dSYM --merged-functions --callsites-yaml-file=%t/callsites.yaml -o %t/call_sites_dSYM.gsym
# RUN: llvm-gsymutil --num-threads=1 --convert=%t/merged_callsites.dSYM --merged-functions --dwarf-callsites -o %t/dwarf_call_sites_dSYM.gsym
@@ -36,7 +37,13 @@
# CHECK-MERGED-CALLSITES: CallSites (by relative return offset):
# CHECK-MERGED-CALLSITES-NEXT: 0x[[#%.4x,]] Flags[None] MatchRegex[function2_copy1]
-# CHECK-MERGED-CALLSITES: FunctionInfo @ 0x[[#%x,MAIN:]]: [0x[[#%x,MAIN_START:]] - 0x[[#%x,MAIN_END:]]) "main"
+# If we don't do anything, a function with bad LLVM_stmt_seq won't have any call site filter
+# More importantly, the line number will be at the function definition.
+# CHECK-MERGED-CALLSITES: FunctionInfo @ 0x[[#%x,BAD_STMT_SEQ:]]: [0x[[#%x,BAD_STMT_SEQ_START:]] - 0x[[#%x,BAD_STMT_SEQ_END:]]) "function_bad_stmt_seq"
+# CHECK-MERGED-CALLSITES-NEXT: LineTable:
+# CHECK-MERGED-CALLSITES-NEXT: 0x00000001000003b0 ./merged_funcs_test.cpp:65
+
+# CHECK-MERGED-CALLSITES-NEXT: FunctionInfo @ 0x[[#%x,MAIN:]]: [0x[[#%x,MAIN_START:]] - 0x[[#%x,MAIN_END:]]) "main"
# CHECK-MERGED-CALLSITES: CallSites (by relative return offset):
# CHECK-MERGED-CALLSITES-NEXT: 0x[[#%.4x,]] Flags[None] MatchRegex[function1]
# CHECK-MERGED-CALLSITES-NEXT: 0x[[#%.4x,]] Flags[None] MatchRegex[function2_copy2]
@@ -44,16 +51,17 @@
# CHECK-MERGED-CALLSITES-NEXT: 0x[[#%.4x,]] Flags[None] MatchRegex[function3_copy2]
# CHECK-MERGED-CALLSITES-NEXT: 0x[[#%.4x,]] Flags[None] MatchRegex[function2_copy1]
-
### Check that we can correctly resove merged functions using callstacks:
### Resolve two callstacks containing merged functions.
### We use the value obtained from `CallSites:[FILTER]` to pass to the next call to `llvm-gsymutil` via `--merged-functions-filter`.
### The callstacks resolve differently based on the merged functions filter.
-### 0x00000001000003d0 => 0x000000010000037c => 0x000000010000035c => 0x0000000100000340
-### 0x00000001000003e8 =========================> 0x000000010000035c => 0x0000000100000340
+### 0x00000001000003d8 => 0x000000010000037c => 0x000000010000035c => 0x0000000100000340
+### 0x00000001000003f0 =========================> 0x000000010000035c => 0x0000000100000340
+###
+### We added a new function, for main function, +8 for line numbers, +0x8 for addresses.
-# RUN: llvm-gsymutil %t/dwarf_call_sites_dSYM.gsym --merged-functions --address=0x00000001000003d0 | FileCheck --check-prefix=CHECK-C1 %s
-# CHECK-C1: 0x00000001000003d0: main + 32 @ ./merged_funcs_test.cpp:63
+# RUN: llvm-gsymutil %t/dwarf_call_sites_dSYM.gsym --merged-functions --address=0x00000001000003d8 | FileCheck --check-prefix=CHECK-C1 %s
+# CHECK-C1: 0x00000001000003d8: main + 32 @ ./merged_funcs_test.cpp:71
# CHECK-C1-NEXT: CallSites: function2_copy2
# RUN: llvm-gsymutil %t/dwarf_call_sites_dSYM.gsym --merged-functions --address=0x000000010000037c --merged-functions-filter="function2_copy2" | FileCheck --check-prefix=CHECK-C2 %s
@@ -73,9 +81,9 @@
### ----------------------------------------------------------------------------------------------------------------------------------
### Resolve the 2nd call stack - the 2nd and 3rd addresses are the same but they resolve to a different function because of the filter
-# RUN: llvm-gsymutil %t/dwarf_call_sites_dSYM.gsym --address=0x00000001000003e8 --merged-functions | FileCheck --check-prefix=CHECK-C5 %s
-# CHECK-C5: Found 1 function at address 0x00000001000003e8:
-# CHECK-C5-NEXT: 0x00000001000003e8: main + 56 @ ./merged_funcs_test.cpp:64
+# RUN: llvm-gsymutil %t/dwarf_call_sites_dSYM.gsym --address=0x00000001000003f0 --merged-functions | FileCheck --check-prefix=CHECK-C5 %s
+# CHECK-C5: Found 1 function at address 0x00000001000003f0:
+# CHECK-C5-NEXT: 0x00000001000003f0: main + 56 @ ./merged_funcs_test.cpp:72
# CHECK-C5-NEXT: CallSites: function3_copy2
# RUN: llvm-gsymutil %t/dwarf_call_sites_dSYM.gsym --merged-functions --address=0x000000010000035c --merged-functions-filter="function3_copy2" | FileCheck --check-prefix=CHECK-C6 %s
@@ -87,6 +95,9 @@
# CHECK-C7: Found 1 function at address 0x0000000100000340:
# CHECK-C7-NEXT: 0x0000000100000340: function4_copy2 + 8 @ ./merged_funcs_test.cpp:14
+# RUN: llvm-gsymutil %t/dwarf_call_sites_dSYM.gsym --address=0x00000001000003b4 --merged-functions | FileCheck --check-prefix=CHECK-BAD %s
+# CHECK-BAD: Found 1 function at address 0x00000001000003b4:
+# CHECK-BAD-NEXT: 0x00000001000003b4: function_bad_stmt_seq + 4 @ ./merged_funcs_test.cpp:65
#--- merged_funcs_test.cpp
#define ATTRIB extern "C" __attribute__((noinline))
@@ -148,6 +159,14 @@ ATTRIB int function1(int a) {
return result;
}
+// Intentional multi-line function definition
+ATTRIB
+int function_bad_stmt_seq(
+ int a
+) {
+ return a + 1;
+}
+
int main() {
int sum = 0;
sum += function1(1);
@@ -155,6 +174,7 @@ int main() {
sum += function4_copy2(4);
sum += function3_copy2(41);
sum += function2_copy1(11);
+ sum += function_bad_stmt_seq(sum);
return sum;
}
@@ -229,7 +249,7 @@ FileHeader:
LoadCommands:
- cmd: LC_UUID
cmdsize: 24
- uuid: 4C4C441A-5555-3144-A124-E395C0E7AA96
+ uuid: 4C4C44C9-5555-3144-A16C-82A90B2087B3
- cmd: LC_BUILD_VERSION
cmdsize: 24
platform: 1
@@ -239,9 +259,9 @@ LoadCommands:
- cmd: LC_SYMTAB
cmdsize: 24
symoff: 4096
- nsyms: 10
- stroff: 4256
- strsize: 156
+ nsyms: 11
+ stroff: 4272
+ strsize: 179
- cmd: LC_SEGMENT_64
cmdsize: 72
segname: __PAGEZERO
@@ -268,7 +288,7 @@ LoadCommands:
- sectname: __text
segname: __TEXT
addr: 0x100000338
- size: 208
+ size: 228
offset: 0x0
align: 2
reloff: 0x0
@@ -277,7 +297,7 @@ LoadCommands:
reserved1: 0x0
reserved2: 0x0
reserved3: 0x0
- content: CFFAEDFE0C000001000000000A00000008000000C005000000000000000000001B000000180000004C4C441A55553144A124E395C0E7AA9632000000180000000100000000000B0000000B00000000000200000018000000001000000A000000A01000009C00000019000000480000005F5F504147455A45524F00000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000019000000980000005F5F54455854000000000000000000000000000001000000
+ content: CFFAEDFE0C000001000000000A00000008000000C005000000000000000000001B000000180000004C4C44C955553144A16C82A90B2087B332000000180000000100000000000B0000000B00000000000200000018000000001000000B000000B0100000B300000019000000480000005F5F504147455A45524F00000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000019000000980000005F5F544558540000000000000000000000000000010000000040000000000000000000000000000000000000
- cmd: LC_SEGMENT_64
cmdsize: 152
segname: __DATA
@@ -308,7 +328,7 @@ LoadCommands:
vmaddr: 4295000064
vmsize: 4096
fileoff: 4096
- filesize: 316
+ filesize: 355
maxprot: 1
initprot: 1
nsects: 0
@@ -319,7 +339,7 @@ LoadCommands:
vmaddr: 4295004160
vmsize: 4096
fileoff: 8192
- filesize: 3507
+ filesize: 3884
maxprot: 7
initprot: 3
nsects: 11
@@ -328,7 +348,7 @@ LoadCommands:
- sectname: __debug_line
segname: __DWARF
addr: 0x100009000
- size: 323
+ size: 357
offset: 0x2000
align: 0
reloff: 0x0
@@ -339,9 +359,9 @@ LoadCommands:
reserved3: 0x0
- sectname: __debug_aranges
segname: __DWARF
- addr: 0x100009143
+ addr: 0x100009165
size: 48
- offset: 0x2143
+ offset: 0x2165
align: 0
reloff: 0x0
nreloc: 0
@@ -351,9 +371,9 @@ LoadCommands:
reserved3: 0x0
- sectname: __debug_loc
segname: __DWARF
- addr: 0x100009173
- size: 1026
- offset: 0x2173
+ addr: 0x100009195
+ size: 1083
+ offset: 0x2195
align: 0
reloff: 0x0
nreloc: 0
@@ -361,12 +381,12 @@ LoadCommands:
reserved1: 0x0
reserved2: 0x0
reserved3: 0x0
- content: 00000000000000000800000000000000010050080000000000000014000000000000000400A301509F0000000000000000000000000000000004000000000000000C0000000000000001005800000000000000000000000000000000080000000000000014000000000000000100500000000000000000000000000000000000000000000000000800000000000000010050080000000000000014000000000000000400A301509F0000000000000000000000000000000004000000000000000C0000000000000001005800000000000000000000000000000000080000000000000014000000000000000100500000000000000000000000000000000014000000000000002000000000000000010050200000000000000034000000000000000400A301509F00000000000000000000000000000000240000000000000034000000000000000100500000000000000000000000000000000014000000000000002000000000000000010050200000000000000034000000000000000400A301509F00000000000000000000000000000000240000000000000034000000000000000100500000000000000000000000000000000034000000000000004000000000000000010050400000000000000058000000000000000400A301509F000000000000000000000000000000003C0000000000000040000000000000000300707E9F000000000000000000000000000000004400000000000000580000000000000001005000000000000000000000000000000000340000000000000040000000000000000300707E9F00000000000000000000000000000000440000000000000058000000000000000100500000000000000000000000000000000034000000000000004000000000000000010050400000000000000058000000000000000400A301509F000000000000000000000000000000003C0000000000000040000000000000000300707E9F000000000000000000000000000000004400000000000000580000000000000001005000000000000000000000000000000000340000000000000040000000000000000300707E9F00000000000000000000000000000000440000000000000058000000000000000100500000000000000000000000000000000058000000000000006400000000000000010050640000000000000078000000000000000400A301509F00000000000000000000000000000000680000000000000078000000000000000100500000000000000000000000000000000084000000000000009000000000000000030011009F90000000000000009C000000000000000100639C00000000000000A800000000000000010064B800000000000000C00000000000000001006300000000000000000000000000000000
+ content: 00000000000000000800000000000000010050080000000000000014000000000000000400A301509F0000000000000000000000000000000004000000000000000C0000000000000001005800000000000000000000000000000000080000000000000014000000000000000100500000000000000000000000000000000000000000000000000800000000000000010050080000000000000014000000000000000400A301509F0000000000000000000000000000000004000000000000000C0000000000000001005800000000000000000000000000000000080000000000000014000000000000000100500000000000000000000000000000000014000000000000002000000000000000010050200000000000000034000000000000000400A301509F00000000000000000000000000000000240000000000000034000000000000000100500000000000000000000000000000000014000000000000002000000000000000010050200000000000000034000000000000000400A301509F00000000000000000000000000000000240000000000000034000000000000000100500000000000000000000000000000000034000000000000004000000000000000010050400000000000000058000000000000000400A301509F000000000000000000000000000000003C0000000000000040000000000000000300707E9F000000000000000000000000000000004400000000000000580000000000000001005000000000000000000000000000000000340000000000000040000000000000000300707E9F00000000000000000000000000000000440000000000000058000000000000000100500000000000000000000000000000000034000000000000004000000000000000010050400000000000000058000000000000000400A301509F000000000000000000000000000000003C0000000000000040000000000000000300707E9F000000000000000000000000000000004400000000000000580000000000000001005000000000000000000000000000000000340000000000000040000000000000000300707E9F00000000000000000000000000000000440000000000000058000000000000000100500000000000000000000000000000000058000000000000006400000000000000010050640000000000000078000000000000000400A301509F00000000000000000000000000000000680000000000000078000000000000000100500000000000000000000000000000000078000000000000007C000000000000000100507C0000000000000080000000000000000400A301509F000000000000000000000000000000008C000000000000009800000000000000030011009F9800000000000000A400000000000000010063A400000000000000B000000000000000010064C000000000000000D40000000000000001006300000000000000000000000000000000
- sectname: __debug_info
segname: __DWARF
- addr: ...
[truncated]
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
@JDevlieghere Let me know your thoughts please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine to me, but let's give @alx32 a chance to chime in too, to confirm nothing is explicitly expecting the old offset and would break.
@@ -4,6 +4,7 @@ | |||
|
|||
# RUN: split-file %s %t | |||
# RUN: yaml2obj %t/merged_callsites.dSYM.yaml -o %t/merged_callsites.dSYM | |||
# The object file is manually tempered with such that the LLVM_stmt_seq of function_bad_stmt_seq is 0xFFFFFFFF. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# The object file is manually tempered with such that the LLVM_stmt_seq of function_bad_stmt_seq is 0xFFFFFFFF. | |
# The object file is manually tampered with such that the LLVM_stmt_seq of function_bad_stmt_seq is 0xFFFFFFFF. |
Thanks. @alx32 is on leave and I'm not sure whether he can get the notification (reason I popped up and started fixing things) . I'll wait until Monday to land it anyway. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for fixing.
Previously, invalid offset is set to UINT64_MAX, this is not right when DWARF32, which leads to incorrect debug into in GSYM, the branch:
will always be true.
In this PR, commit 1 sets up a test that demonstrates the problem, commit 2 fixes it.
Diffing commit 1 and 2 in this PR shows how after the PR the symbolicated line number changed from function definition to function body