Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[llvm-dwp] Fix merging of debug_str_offsets with multiple contributions #90461

Merged

Conversation

molar
Copy link
Contributor

@molar molar commented Apr 29, 2024

This pull request will change the merging of debug_str_offset to merge per contribution and correctly copy over each contribution header to the merged section. I have added some test data which is in dwarf5 format as this is where the section contribution header was introduced, as far as i can tell.

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 29, 2024

@llvm/pr-subscribers-debuginfo

Author: Morten larsen (molar)

Changes

This pull request will change the merging of debug_str_offset to merge per contribution and correctly copy over each contribution header to the merged section. I have added some test data which is in dwarf5 format as this is where the section contribution header was introduced, as far as i can tell.


Full diff: https://github.com/llvm/llvm-project/pull/90461.diff

5 Files Affected:

  • (modified) llvm/lib/DWP/DWP.cpp (+33-9)
  • (added) llvm/test/tools/llvm-dwp/Inputs/merge_v5/notypes/ab.dwp ()
  • (added) llvm/test/tools/llvm-dwp/Inputs/merge_v5/notypes/c.dwo ()
  • (modified) llvm/test/tools/llvm-dwp/X86/merge.test (+5)
  • (added) llvm/test/tools/llvm-dwp/X86/merge_v5.test (+48)
diff --git a/llvm/lib/DWP/DWP.cpp b/llvm/lib/DWP/DWP.cpp
index 77bd22d1f071c6..fecd184ca68a86 100644
--- a/llvm/lib/DWP/DWP.cpp
+++ b/llvm/lib/DWP/DWP.cpp
@@ -415,6 +415,17 @@ Expected<InfoSectionUnitHeader> parseInfoSectionUnitHeader(StringRef Info) {
   return Header;
 }
 
+static void writeNewOffsetsTo(MCStreamer &Out, DataExtractor &Data,
+                              DenseMap<uint64_t, uint32_t> &OffsetRemapping,
+                              uint64_t &Offset, uint64_t &Size) {
+
+  while (Offset < Size) {
+    auto OldOffset = Data.getU32(&Offset);
+    auto NewOffset = OffsetRemapping[OldOffset];
+    Out.emitIntValue(NewOffset, 4);
+  }
+}
+
 void writeStringsAndOffsets(MCStreamer &Out, DWPStringPool &Strings,
                             MCSection *StrOffsetSection,
                             StringRef CurStrSection,
@@ -439,17 +450,30 @@ void writeStringsAndOffsets(MCStreamer &Out, DWPStringPool &Strings,
 
   Out.switchSection(StrOffsetSection);
 
-  uint64_t HeaderSize = debugStrOffsetsHeaderSize(Data, Version);
   uint64_t Offset = 0;
   uint64_t Size = CurStrOffsetSection.size();
-  // FIXME: This can be caused by bad input and should be handled as such.
-  assert(HeaderSize <= Size && "StrOffsetSection size is less than its header");
-  // Copy the header to the output.
-  Out.emitBytes(Data.getBytes(&Offset, HeaderSize));
-  while (Offset < Size) {
-    auto OldOffset = Data.getU32(&Offset);
-    auto NewOffset = OffsetRemapping[OldOffset];
-    Out.emitIntValue(NewOffset, 4);
+  if (Version > 4) {
+    while (Offset < Size) {
+      uint64_t HeaderSize = debugStrOffsetsHeaderSize(Data, Version);
+      assert(HeaderSize <= Size - Offset &&
+             "StrOffsetSection size is less than its header");
+
+      uint64_t ContributionEnd = 0;
+      uint64_t ContributionSize = 0;
+      uint64_t HeaderLengthOffset = Offset;
+      if (HeaderSize == 8) {
+        ContributionSize = Data.getU32(&HeaderLengthOffset);
+      } else if (HeaderSize == 16) {
+        HeaderLengthOffset += 4; // skip the dwarf64 marker
+        ContributionSize = Data.getU64(&HeaderLengthOffset);
+      }
+      ContributionEnd = ContributionSize + HeaderLengthOffset;
+      Out.emitBytes(Data.getBytes(&Offset, HeaderSize));
+      writeNewOffsetsTo(Out, Data, OffsetRemapping, Offset, ContributionEnd);
+    }
+
+  } else {
+    writeNewOffsetsTo(Out, Data, OffsetRemapping, Offset, Size);
   }
 }
 
diff --git a/llvm/test/tools/llvm-dwp/Inputs/merge_v5/notypes/ab.dwp b/llvm/test/tools/llvm-dwp/Inputs/merge_v5/notypes/ab.dwp
new file mode 100644
index 00000000000000..a2388f0e37dff0
Binary files /dev/null and b/llvm/test/tools/llvm-dwp/Inputs/merge_v5/notypes/ab.dwp differ
diff --git a/llvm/test/tools/llvm-dwp/Inputs/merge_v5/notypes/c.dwo b/llvm/test/tools/llvm-dwp/Inputs/merge_v5/notypes/c.dwo
new file mode 100644
index 00000000000000..e58a4b0c34d0bb
Binary files /dev/null and b/llvm/test/tools/llvm-dwp/Inputs/merge_v5/notypes/c.dwo differ
diff --git a/llvm/test/tools/llvm-dwp/X86/merge.test b/llvm/test/tools/llvm-dwp/X86/merge.test
index 0cf56bd858a04b..25a1315ee63f33 100644
--- a/llvm/test/tools/llvm-dwp/X86/merge.test
+++ b/llvm/test/tools/llvm-dwp/X86/merge.test
@@ -44,3 +44,8 @@ CHECK:    Index  Signature INFO                                        ABBREV
 CHECK-DAG:       [[DWOC]]  [0x00000000[[#COFF]], 0x00000000[[#AOFF]]) [0x0000[[CAOFF]], 0x0000[[AAOFF]]) [0x00000000, 0x00000011) [0x00000000, 0x00000018)
 CHECK-DAG:       [[DWOA]]  [0x00000000[[#AOFF]], 0x00000000[[#BOFF]]) [0x0000[[AAOFF]], 0x0000[[BAOFF]]) [0x00000011, 0x00000022) [0x00000018, 0x00000028)
 CHECK-DAG:       [[DWOB]]  [0x00000000[[#BOFF]], 0x00000000[[#XOFF]]) [0x0000[[BAOFF]], 0x000000c3)      [0x00000022, 0x00000033) [0x00000028, 0x0000003c)
+
+CHECK-LABEL: .debug_str_offsets.dwo contents:
+CHECK: Contribution size = 24, Format = DWARF32, Version = 4
+CHECK: Contribution size = 16, Format = DWARF32, Version = 4
+CHECK: Contribution size = 20, Format = DWARF32, Version = 4
diff --git a/llvm/test/tools/llvm-dwp/X86/merge_v5.test b/llvm/test/tools/llvm-dwp/X86/merge_v5.test
new file mode 100644
index 00000000000000..ea9ef030e5d05b
--- /dev/null
+++ b/llvm/test/tools/llvm-dwp/X86/merge_v5.test
@@ -0,0 +1,48 @@
+RUN: llvm-dwp %p/../Inputs/merge_v5/notypes/c.dwo %p/../Inputs/merge_v5/notypes/ab.dwp -o - | \
+RUN:   llvm-dwarfdump -v - | FileCheck --check-prefix=CHECK %s
+
+DWP from a DWO (c.dwo) and a DWP (ab.dwp, created from a.dwo and b.dwo)
+Make sure the entries for A and B are updated correctly when read/processed from ab.dwp
+a.cpp:
+  struct foo { };
+  foo a;
+
+b.cpp:
+  struct bar { };
+  void b(bar) {
+  }
+
+c.cpp:
+  typedef int baz;
+  baz c() {
+  }
+
+CHECK-LABEL: .debug_abbrev.dwo contents:
+CHECK-LABEL: Abbrev table for offset:
+CHECK: 0x0000[[CAOFF:.*]]
+CHECK-LABEL: Abbrev table for offset:
+CHECK: 0x0000[[AAOFF:.*]]
+CHECK-LABEL: Abbrev table for offset:
+CHECK: 0x0000[[BAOFF:.*]]
+
+CHECK: .debug_info.dwo contents:
+CHECK: 0x[[#%.8x,COFF:]]:
+CHECK-LABEL: Compile Unit: length = {{.*}}, format = DWARF32, version = 0x0005, unit_type = DW_UT_split_compile, abbr_offset =
+CHECK:         0x[[CAOFF]], addr_size = 0x08, DWO_id = 0x[[DWOC:.*]] (next unit at 0x[[#%.8x,BOFF:]])
+CHECK: [[#BOFF]]:
+CHECK-LABEL: Compile Unit: length = {{.*}}, format = DWARF32, version = 0x0005, unit_type = DW_UT_split_compile, abbr_offset =
+CHECK:         0x[[BAOFF]], addr_size = 0x08, DWO_id = 0x[[DWOB:.*]] (next unit at 0x[[#%.8x,AOFF:]])
+CHECK: [[#AOFF]]:
+CHECK-LABEL: Compile Unit: length = {{.*}}, format = DWARF32, version = 0x0005, unit_type = DW_UT_split_compile, abbr_offset =
+CHECK:         0x[[AAOFF]], addr_size = 0x08, DWO_id = 0x[[DWOA:.*]] (next unit at 0x[[#%.8x,XOFF:]])
+
+CHECK-LABEL: .debug_cu_index contents:
+CHECK:    Index Signature          INFO                                     ABBREV                                  STR_OFFSETS 
+CHECK-DAG:       [[DWOC]]  [0x00000000[[#COFF]], 0x00000000[[#BOFF]]) [0x0000[[CAOFF]], 0x0000[[AAOFF]]) [0x00000000, 0x00000024) 
+CHECK-DAG:       [[DWOB]]  [0x00000000[[#BOFF]], 0x00000000[[#AOFF]]) [0x0000[[BAOFF]],  
+CHECK-DAG:       [[DWOA]]  [0x00000000[[#AOFF]], 0x00000000[[#XOFF]]) [0x0000[[AAOFF]], 
+
+CHECK-LABEL: .debug_str_offsets.dwo contents:
+CHECK: Contribution size = 32, Format = DWARF32, Version = 5
+CHECK: Contribution size = 24, Format = DWARF32, Version = 5
+CHECK: Contribution size = 28, Format = DWARF32, Version = 5

@molar
Copy link
Contributor Author

molar commented May 13, 2024

Ping @dwblaikie kindly requesting a review

@dwblaikie
Copy link
Collaborator

Thanks!

Generally I think we're moving away from checked in binary object files - could you make the test from checked in assembly instead (perhaps using @MaskRay's new https://llvm.org/docs/TestingGuide.html#elaborated-tests perhaps)

So if I'm understanding correctly - you're saying llvm-dwp did not work when an input was a DWP and was DWARFv5? huh, surprising oversight... oh, I guess not that many people use dwp files as input to dwp actions, so I can see how some people would've been working along fine with this state of affairs.

@felipepiovezan
Copy link
Contributor

felipepiovezan commented May 14, 2024

Generally I think we're moving away from checked in binary object files - could you make the test from checked in assembly instead (perhaps using @MaskRay's new https://llvm.org/docs/TestingGuide.html#elaborated-tests perhaps)

I think the dwarf YAML support should work for debug_str_offsets too?

@molar molar force-pushed the molar_fix_dwp_merging_debug_str_offsets_upstream branch 3 times, most recently from 7847693 to 120b7b1 Compare May 17, 2024 17:45
llvm/test/tools/llvm-dwp/X86/merge_v5.test Outdated Show resolved Hide resolved
Comment on lines 34 to 38
CHECK-LABEL: .debug_str_offsets.dwo contents:
CHECK: Contribution size = 32, Format = DWARF32, Version = 5
CHECK: Contribution size = 24, Format = DWARF32, Version = 5
CHECK: Contribution size = 28, Format = DWARF32, Version = 5

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the old behavior would've been a single contribution, and now there's three? /might/ be worth checking the content too, but I wouldn't insist on it/I could see it going either way, the length probably provides a fair amount of checking (like we didn't emit all the strings into one contribution, and then emit two empty headers)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a verbose check of the contents, let me know if there is a better way to do it, other than just comparing against the output.

@molar
Copy link
Contributor Author

molar commented May 22, 2024

Thanks!

Generally I think we're moving away from checked in binary object files - could you make the test from checked in assembly instead (perhaps using @MaskRay's new https://llvm.org/docs/TestingGuide.html#elaborated-tests perhaps)

So if I'm understanding correctly - you're saying llvm-dwp did not work when an input was a DWP and was DWARFv5? huh, surprising oversight... oh, I guess not that many people use dwp files as input to dwp actions, so I can see how some people would've been working along fine with this state of affairs.

Thank you for looking at this PR.

We encounter this bug when using bazel along with llvm-dwp tool to generate .dwp files for our top-level binaries. In bazel it will switch to a tree merging when the number of .dwo inputs is larger than 100. It does not use the -e flag together with the binary which does the correct thing.

https://github.com/bazelbuild/bazel/blob/7048ef5eb5395679241f2321a8104b26ecee01c0/src/main/starlark/builtins_bzl/common/cc/cc_debug_helper.bzl#L129

Copy link
Collaborator

@dwblaikie dwblaikie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks!

@MaskRay
Copy link
Member

MaskRay commented May 28, 2024

(Might be useful to use [DWP] Fix ... or [llvm-dwp] Fix ... for the title)

@molar molar force-pushed the molar_fix_dwp_merging_debug_str_offsets_upstream branch from 8b4abbe to d843f12 Compare June 11, 2024 07:39
@molar molar changed the title Fix merging of debug_str_offsets with multiple contributions [llvm-dwp] Fix merging of debug_str_offsets with multiple contributions Jun 11, 2024
@dwblaikie
Copy link
Collaborator

Do you need someone to merge this for you?

@molar
Copy link
Contributor Author

molar commented Jun 25, 2024

Do you need someone to merge this for you?

Yes please

@dwblaikie dwblaikie merged commit 699cd9a into llvm:main Jun 25, 2024
6 checks passed
Copy link

@molar Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested
by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as
the builds can include changes from many authors. It is not uncommon for your
change to be included in a build that fails due to someone else's changes, or
infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself.
This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
…ns (llvm#90461)

This pull request will change the merging of ``debug_str_offset`` to
merge per contribution and correctly copy over each contribution header
to the merged section. I have added some test data which is in dwarf5
format as this is where the section contribution header was introduced,
as far as i can tell.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants