Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[lld/MachO] Fix assert on unsorted data-in-code entries #81758

Merged
merged 2 commits into from
Feb 16, 2024

Conversation

nico
Copy link
Contributor

@nico nico commented Feb 14, 2024

When the data-in-code entries are in separate sections, they are not guaranteed to be sorted. In particular, 68b1cc36f3df marked some libc++ string functions as noinline, which leads to global ctors involving strings now producing data-in-code sections in __TEXT,__StaticInit, which is why this now happens in practice.

Since data-in-code entries are relatively rare and small, just sort them. No observed performance impact.

See also crbug.com/41487860

When the data-in-code entries are in separate sections, they are
not guaranteed to be sorted. In particular, 68b1cc36f3df marked
some libc++ string functions as noinline, which leads to global
ctors involving strings now producing data-in-code sections in
__TEXT,__StaticInit, which is why this now happens in practice.

Since data-in-code entries are relatively rare and small, just
sort them. No observed performance impact.

See also crbug.com/41487860
@llvmbot
Copy link
Collaborator

llvmbot commented Feb 14, 2024

@llvm/pr-subscribers-lld

@llvm/pr-subscribers-lld-macho

Author: Nico Weber (nico)

Changes

When the data-in-code entries are in separate sections, they are not guaranteed to be sorted. In particular, 68b1cc36f3df marked some libc++ string functions as noinline, which leads to global ctors involving strings now producing data-in-code sections in __TEXT,__StaticInit, which is why this now happens in practice.

Since data-in-code entries are relatively rare and small, just sort them. No observed performance impact.

See also crbug.com/41487860


Full diff: https://github.com/llvm/llvm-project/pull/81758.diff

2 Files Affected:

  • (modified) lld/MachO/SyntheticSections.cpp (+8-5)
  • (modified) lld/test/MachO/data-in-code.s (+17-5)
diff --git a/lld/MachO/SyntheticSections.cpp b/lld/MachO/SyntheticSections.cpp
index 544847d3d448c4..6dbf27034f115e 100644
--- a/lld/MachO/SyntheticSections.cpp
+++ b/lld/MachO/SyntheticSections.cpp
@@ -1050,10 +1050,13 @@ static std::vector<MachO::data_in_code_entry> collectDataInCodeEntries() {
     if (entries.empty())
       continue;
 
-    assert(is_sorted(entries, [](const data_in_code_entry &lhs,
-                                 const data_in_code_entry &rhs) {
+    std::vector<MachO::data_in_code_entry> sortedEntries;
+    sortedEntries.assign(entries.begin(), entries.end());
+    llvm::sort(sortedEntries, [](const data_in_code_entry &lhs,
+                                      const data_in_code_entry &rhs) {
       return lhs.offset < rhs.offset;
-    }));
+    });
+
     // For each code subsection find 'data in code' entries residing in it.
     // Compute the new offset values as
     // <offset within subsection> + <subsection address> - <__TEXT address>.
@@ -1066,12 +1069,12 @@ static std::vector<MachO::data_in_code_entry> collectDataInCodeEntries() {
           continue;
         const uint64_t beginAddr = section->addr + subsec.offset;
         auto it = llvm::lower_bound(
-            entries, beginAddr,
+            sortedEntries, beginAddr,
             [](const MachO::data_in_code_entry &entry, uint64_t addr) {
               return entry.offset < addr;
             });
         const uint64_t endAddr = beginAddr + isec->getSize();
-        for (const auto end = entries.end();
+        for (const auto end = sortedEntries.end();
              it != end && it->offset + it->length <= endAddr; ++it)
           dataInCodeEntries.push_back(
               {static_cast<uint32_t>(isec->getVA(it->offset - beginAddr) -
diff --git a/lld/test/MachO/data-in-code.s b/lld/test/MachO/data-in-code.s
index 49aa7655a84b0c..1a09359cd26767 100644
--- a/lld/test/MachO/data-in-code.s
+++ b/lld/test/MachO/data-in-code.s
@@ -6,7 +6,7 @@
 # RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/bar.s -o %t/bar.o
 # RUN: %lld -lSystem %t/foo.o %t/bar.o -o %t/main.exe
 # RUN: llvm-otool -l %t/main.exe > %t/objdump
-# RUN: llvm-objdump --macho --data-in-code %t/main.exe >> %t/objdump
+# RUN: llvm-otool -Gv %t/main.exe >> %t/objdump
 # RUN: FileCheck %s < %t/objdump
 
 # CHECK-LABEL:  sectname __text
@@ -18,12 +18,13 @@
 # CHECK-LABEL:  cmd LC_DATA_IN_CODE
 # CHECK-NEXT:   cmdsize 16
 # CHECK-NEXT:   dataoff
-# CHECK-NEXT:   datasize 16
+# CHECK-NEXT:   datasize 24
 
-# CHECK-LABEL:  Data in code table (2 entries)
+# CHECK-LABEL:  Data in code table (3 entries)
 # CHECK-NEXT:   offset length kind
 # CHECK-NEXT:   [[#%x,TEXT + 28]] 24 JUMP_TABLE32
-# CHECK-NEXT:   [[#%x,TEXT + 68]] 12 JUMP_TABLE32
+# CHECK-NEXT:   [[#%x,TEXT + 68]] 8 JUMP_TABLE32
+# CHECK-NEXT:   [[#%x,TEXT + 84]] 12 JUMP_TABLE32
 
 # RUN: %lld -lSystem %t/foo.o %t/bar.o -no_data_in_code_info -o %t/main.exe
 # RUN: llvm-otool -l %t/main.exe | FileCheck --check-prefix=OMIT %s
@@ -32,11 +33,22 @@
 
 # RUN: %lld -lSystem %t/foo.o %t/bar.o -no_data_in_code_info -data_in_code_info -o %t/main.exe
 # RUN: llvm-otool -l %t/main.exe > %t/objdump
-# RUN: llvm-objdump --macho --data-in-code %t/main.exe >> %t/objdump
+# RUN: llvm-otool -Gv %t/main.exe >> %t/objdump
 # RUN: FileCheck %s < %t/objdump
 
 #--- foo.s
 .text
+.section __TEXT,__StaticInit,regular,pure_instructions
+.p2align 4, 0x90
+_some_init_function:
+retq
+.p2align 2, 0x90
+.data_region jt32
+.long 0
+.long 0
+.end_data_region
+
+.section __TEXT,__text,regular,pure_instructions
 .globl _main
 .p2align 4, 0x90
 _main:

Copy link

github-actions bot commented Feb 14, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@nico nico requested a review from BertalanD February 16, 2024 12:07
Copy link
Member

@BertalanD BertalanD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

It's a bit disappointing that the assembler does not guarantee that entries be sorted in an object file when multiple sections are involved, but oh well.

@nico nico merged commit 624ea34 into llvm:main Feb 16, 2024
4 checks passed
@nico nico deleted the lld-crash-fix branch February 16, 2024 12:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants