Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LLDB][ELF] Fix section unification to not just use names. #90099

Merged
merged 5 commits into from
May 1, 2024

Conversation

al45tair
Copy link
Contributor

Section unification cannot just use names, because it's valid for ELF binaries to have multiple sections with the same name. We should check other section properties too.

Fixes #88001.

rdar://124467787

Section unification cannot just use names, because it's valid for ELF
binaries to have multiple sections with the same name.  We should check
other section properties too.

rdar://124467787
@llvmbot
Copy link
Collaborator

llvmbot commented Apr 25, 2024

@llvm/pr-subscribers-lldb

Author: Alastair Houghton (al45tair)

Changes

Section unification cannot just use names, because it's valid for ELF binaries to have multiple sections with the same name. We should check other section properties too.

Fixes #88001.

rdar://124467787


Full diff: https://github.com/llvm/llvm-project/pull/90099.diff

1 Files Affected:

  • (modified) lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp (+53-11)
diff --git a/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp b/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
index 0d95a1c12bde35..60fc68c96bcc1c 100644
--- a/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
+++ b/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
@@ -1854,6 +1854,49 @@ class VMAddressProvider {
 };
 }
 
+namespace {
+  // We have to do this because ELF doesn't have section IDs, and also
+  // doesn't require section names to be unique.  (We use the section index
+  // for section IDs, but that isn't guaranteed to be the same in separate
+  // debug images.)
+  SectionSP FindMatchingSection(const SectionList &section_list,
+                                SectionSP section) {
+    SectionSP sect_sp;
+
+    addr_t vm_addr = section->GetFileAddress();
+    ConstString name = section->GetName();
+    offset_t file_size = section->GetFileSize();
+    offset_t byte_size = section->GetByteSize();
+    SectionType type = section->GetType();
+    bool thread_specific = section->IsThreadSpecific();
+    uint32_t permissions = section->GetPermissions();
+    uint32_t alignment = section->GetLog2Align();
+
+    for (auto sect_iter = section_list.begin();
+         sect_iter != section_list.end();
+         ++sect_iter) {
+      if ((*sect_iter)->GetName() == name
+          && (*sect_iter)->GetType() == type
+          && (*sect_iter)->IsThreadSpecific() == thread_specific
+          && (*sect_iter)->GetPermissions() == permissions
+          && (*sect_iter)->GetFileSize() == file_size
+          && (*sect_iter)->GetByteSize() == byte_size
+          && (*sect_iter)->GetFileAddress() == vm_addr
+          && (*sect_iter)->GetLog2Align() == alignment) {
+        sect_sp = *sect_iter;
+        break;
+      } else {
+        sect_sp = FindMatchingSection((*sect_iter)->GetChildren(),
+                                      section);
+        if (sect_sp)
+          break;
+      }
+    }
+
+    return sect_sp;
+  }
+}
+
 void ObjectFileELF::CreateSections(SectionList &unified_section_list) {
   if (m_sections_up)
     return;
@@ -2067,10 +2110,8 @@ unsigned ObjectFileELF::ParseSymbols(Symtab *symtab, user_id_t start_id,
   SectionList *module_section_list =
       module_sp ? module_sp->GetSectionList() : nullptr;
 
-  // Local cache to avoid doing a FindSectionByName for each symbol. The "const
-  // char*" key must came from a ConstString object so they can be compared by
-  // pointer
-  std::unordered_map<const char *, lldb::SectionSP> section_name_to_section;
+  // Cache the section mapping
+  std::unordered_map<lldb::SectionSP, lldb::SectionSP> section_map;
 
   unsigned i;
   for (i = 0; i < num_symbols; ++i) {
@@ -2275,14 +2316,15 @@ unsigned ObjectFileELF::ParseSymbols(Symtab *symtab, user_id_t start_id,
 
     if (symbol_section_sp && module_section_list &&
         module_section_list != section_list) {
-      ConstString sect_name = symbol_section_sp->GetName();
-      auto section_it = section_name_to_section.find(sect_name.GetCString());
-      if (section_it == section_name_to_section.end())
+      auto section_it = section_map.find(symbol_section_sp);
+      if (section_it == section_map.end()) {
         section_it =
-            section_name_to_section
-                .emplace(sect_name.GetCString(),
-                         module_section_list->FindSectionByName(sect_name))
-                .first;
+          section_map
+              .emplace(symbol_section_sp,
+                       FindMatchingSection(*module_section_list,
+                                           symbol_section_sp))
+              .first;
+      }
       if (section_it->second)
         symbol_section_sp = section_it->second;
     }

Copy link

github-actions bot commented Apr 25, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test for this with obj2yaml?

lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp Outdated Show resolved Hide resolved
Comment on lines 1879 to 1885
&& (*sect_iter)->GetType() == type
&& (*sect_iter)->IsThreadSpecific() == thread_specific
&& (*sect_iter)->GetPermissions() == permissions
&& (*sect_iter)->GetFileSize() == file_size
&& (*sect_iter)->GetByteSize() == byte_size
&& (*sect_iter)->GetFileAddress() == vm_addr
&& (*sect_iter)->GetLog2Align() == alignment) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it could be an operator == in the Section class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to do that, because I didn't want to give the impression that this was a general purpose way to compare two Sections. While it checks many of the section attributes, it doesn't check all of them (intentionally in the case of which object they come from), and I think that would be surprising behaviour from an == operator.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I correct in understanding that this code is used when unifying the sections of a separate debug file with a stripped file? If so, then I believe this comparison is too strict. An object file produced by objcopy --only-keep-debug will only have placeholder sections instead of the sections that contained actual code. That means (at least) their file size and file offset will be different from that in the original file.

I think it would be sufficient to use the file address (called virtual address in ELF), possibly confirmed by section name, as the unification key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did wonder about exactly how strict we want to be (and exactly which things to compare). This isn't testing file offset, but you're right that it does check file size and maybe it shouldn't.

Copy link
Contributor Author

@al45tair al45tair Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@labath Let's get rid of the file sizes; the other things should all be the same, correct?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The section type can also change. You can see that with .text, because there we have a name-based fallback, but with anything else, it changes from "code" to "regular":

  Showing 1 subsections
    Index: 0
    ID: 0x6
    Name: .foo
    Type: regular
    Permissions: r-x
    Thread specific: no
$ bin/lldb-test object-file /tmp/a.out  | grep -e foo -C 3
  Showing 1 subsections
    Index: 0
    ID: 0x6
    Name: .foo
    Type: code
    Permissions: r-x
    Thread specific: no

The other fields are probably fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I've removed the section type test as well.

lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp Outdated Show resolved Hide resolved
lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp Outdated Show resolved Hide resolved
@al45tair
Copy link
Contributor Author

Can we add a test for this with obj2yaml?

Not sure what you're thinking here. We can't use yaml2obj to generate an object with this problem because that tool assumes symbols are in a uniquely named section (which is mostly reasonable), and obj2yaml doesn't exercise the code in LLDB. I'll take a look and see what we can do as a test.

Fixed a couple of nits from review, and fixed up formatting.

Also added a test.

rdar://124467787
@labath
Copy link
Collaborator

labath commented Apr 26, 2024

Can we add a test for this with obj2yaml?

Not sure what you're thinking here. We can't use yaml2obj to generate an object with this problem because that tool assumes symbols are in a uniquely named section

this test seems to indicate that's possible (the trick appears to me in (n) name tag suffixes). Can you give that a shot?

@al45tair
Copy link
Contributor Author

this test seems to indicate that's possible (the trick appears to me in (n) name tag suffixes). Can you give that a shot?

I've actually just included a binary file that exhibits the problem.

As pointed out by @labath, if you use `objcopy --only-keep-debug`, only
placeholder sections will be present so the file sizes won't match.

We already ignore file offsets when doing the comparison.

rdar://124467787
@labath
Copy link
Collaborator

labath commented Apr 26, 2024

this test seems to indicate that's possible (the trick appears to me in (n) name tag suffixes). Can you give that a shot?

I've actually just included a binary file that exhibits the problem.

I saw that, but a textual test is definitely preferable, particularly after the linux xz fiasco. This wouldn't be the first test binary in our repo, but in this case, I think it actually adds a lot of value, since yaml2obj operates on the same level as what you are testing (so you can see the input of the test eyeball-verify that the expected output makes sense).

@al45tair
Copy link
Contributor Author

I saw that, but a textual test is definitely preferable, particularly after the linux xz fiasco. This wouldn't be the first test binary in our repo, but in this case, I think it actually adds a lot of value, since yaml2obj operates on the same level as what you are testing (so you can see the input of the test eyeball-verify that the expected output makes sense).

:-) Fair point (though this is very much not like the xz incident — I'm a colleague of Jonas's at Apple, albeit in a different team, with a fairly longstanding history of contributing to OSS projects, and doing something like that would be… very career limiting).

I'll have a go at changing things to use yaml2obj.

Rather than including a binary, use `yaml2obj` for the duplicate section
name test.

rdar://124467787
@labath
Copy link
Collaborator

labath commented Apr 29, 2024

I saw that, but a textual test is definitely preferable, particularly after the linux xz fiasco. This wouldn't be the first test binary in our repo, but in this case, I think it actually adds a lot of value, since yaml2obj operates on the same level as what you are testing (so you can see the input of the test eyeball-verify that the expected output makes sense).

:-) Fair point (though this is very much not like the xz incident — I'm a colleague of Jonas's at Apple, albeit in a different team, with a fairly longstanding history of contributing to OSS projects, and doing something like that would be… very career limiting).

I'll have a go at changing things to use yaml2obj.

The new test looks great.

@al45tair
Copy link
Contributor Author

@labath @JDevlieghere Are we happy with this PR now? If so, I'll cherry pick it to the various Apple forks; I need it in order to merge swiftlang/swift#72061, which we need for both my fully statically linked Swift SDK work and for Amazon Linux 2023 support in Swift.

The section type can change when using `objcopy --only-keep-debug`,
apparently.

rdar://124467787
Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[LLDB][ELF] LLDB gets confused if there are multiple .text sections in a binary
4 participants