@int3
Contributor

@int3 int3 commented Oct 22, 2025

We typically shouldn't get this, but when we do (e.g. in #139439) we should error out gracefully instead of crashing.

Note that we are stricter than ld64 here; ld64 appears to be able to handle section offsets that point outside literal sections if the end result is a valid pointer to another section in the input object file. Supporting this would probably be a pain given our current design, and it seems like enough of an edge case that it's not worth it.
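
To make the difference between the two policies concrete, here is a minimal sketch. It is not code from lld or ld64: `SectionRange`, `isInsideSection`, and `findContainingSection` are hypothetical names, and the "lenient" function only models the ld64 behavior as described above (an out-of-section offset is tolerated if the resulting address lands in some other section of the same object file).

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified model of a section in an input object file.
struct SectionRange {
  std::string name;
  uint64_t addr; // start address within the object file
  uint64_t size; // size in bytes
};

// Strict policy (the one this PR enforces for word literal sections):
// the offset must fall inside the section being referenced.
bool isInsideSection(const SectionRange &sec, uint64_t off) {
  return off < sec.size;
}

// Lenient policy (an assumption about what ld64 appears to tolerate):
// accept the reference if sec.addr + off lands inside any section of
// the same file, and report which one.
std::optional<std::string>
findContainingSection(const std::vector<SectionRange> &sections,
                      const SectionRange &sec, uint64_t off) {
  uint64_t target = sec.addr + off;
  for (const SectionRange &s : sections)
    if (target >= s.addr && target - s.addr < s.size)
      return s.name;
  return std::nullopt;
}
```

Under the strict policy, an offset equal to the section size is already an error, even though the lenient lookup might resolve it to the start of whatever section happens to follow.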

@llvmbot
Member

llvmbot commented Oct 22, 2025

@llvm/pr-subscribers-lld-macho

Author: Jez Ng (int3)

Changes

We typically shouldn't get this, but when we do (e.g. in #139439) we should error out gracefully instead of crashing.

Note that we are stricter than ld64 here; ld64 appears to be able to handle section offsets that point outside literal sections if the end result is a valid pointer to another section in the input object file. Supporting this would probably be a pain given our current design, and it seems like enough of an edge case that it's not worth it.


Full diff: https://github.com/llvm/llvm-project/pull/164660.diff

2 Files Affected:

  • (modified) lld/MachO/InputSection.cpp (+3)
  • (added) lld/test/MachO/invalid/bad-offsets.s (+41)
diff --git a/lld/MachO/InputSection.cpp b/lld/MachO/InputSection.cpp
index b173e14cc86a8..2b2d28ef63e2d 100644
--- a/lld/MachO/InputSection.cpp
+++ b/lld/MachO/InputSection.cpp
@@ -348,6 +348,9 @@ WordLiteralInputSection::WordLiteralInputSection(const Section &section,
 }
 
 uint64_t WordLiteralInputSection::getOffset(uint64_t off) const {
+  if (off >= data.size())
+    fatal(toString(this) + ": offset is outside the section");
+
   auto *osec = cast<WordLiteralSection>(parent);
   const uintptr_t buf = reinterpret_cast<uintptr_t>(data.data());
   switch (sectionType(getFlags())) {
diff --git a/lld/test/MachO/invalid/bad-offsets.s b/lld/test/MachO/invalid/bad-offsets.s
new file mode 100644
index 0000000000000..1942765dc7498
--- /dev/null
+++ b/lld/test/MachO/invalid/bad-offsets.s
@@ -0,0 +1,41 @@
+## Test that we properly detect and report out-of-bounds offsets in literal sections.
+## We're intentionally testing fatal errors (for malformed input files), and
+## fatal errors aren't supported for testing when main is run twice.
+# XFAIL: main-run-twice
+
+# REQUIRES: x86
+# RUN: rm -rf %t; split-file %s %t
+
+## Test WordLiteralInputSection bounds checking
+# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/word-literal.s -o %t/word-literal.o
+# RUN: not %lld -dylib %t/word-literal.o -o /dev/null 2>&1 | FileCheck %s --check-prefix=WORD
+
+## Test CStringInputSection bounds checking
+# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/cstring.s -o %t/cstring.o
+# RUN: not %lld -dylib %t/cstring.o -o /dev/null 2>&1 | FileCheck %s --check-prefix=CSTRING
+
+# WORD: error: {{.*}}word-literal.o:(__literal4): offset is outside the section
+# CSTRING: error: {{.*}}cstring.o:(__cstring): offset is outside the section
+
+#--- word-literal.s
+## Create a 4-byte literal section with a reference that points past the end
+.section __TEXT,__literal4,4byte_literals
+_literal:
+  .word 0x01020304
+
+.text
+.globl _main
+_main:
+  .long _literal + 4
+
+#--- cstring.s
+## Create a cstring section with a reference that points past the end
+.cstring
+_str:
+  .asciz "foo"
+
+.text
+.globl _main
+_main:
+  ## Reference past the null terminator (offset 4 in a 4-byte string including null)
+  .long _str + 4
\ No newline at end of file

@llvmbot
Member

llvmbot commented Oct 22, 2025

@llvm/pr-subscribers-lld

Author: Jez Ng (int3)


@int3 int3 marked this pull request as draft October 22, 2025 17:13
@int3 int3 marked this pull request as ready for review October 22, 2025 17:49
@int3 int3 requested review from ellishg and oontvoo October 22, 2025 20:27
@ellishg
Contributor

ellishg commented Oct 24, 2025

I also see ConcatInputSection::getOffset()

uint64_t getOffset(uint64_t off) const override { return outSecOff + off; }

and CStringInputSection::getOffset()

uint64_t CStringInputSection::getOffset(uint64_t off) const {
  const StringPiece &piece = getStringPiece(off);
  uint64_t addend = off - piece.inSecOff;
  return piece.outSecOff + addend;
}

For completeness, can we add this check to those functions too?

@int3
Contributor Author

int3 commented Oct 25, 2025

CStringInputSection::getOffset() does this check via getStringPiece(). The test I added in this PR exercises this code path too.
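
As a rough illustration of how a piece lookup can double as the bounds check, here is a self-contained sketch. It is not lld's implementation: `Piece` and `CStringSectionSketch` are hypothetical stand-ins, `std::out_of_range` stands in for lld's `fatal()`, and the binary search assumes the pieces are sorted by `inSecOff` with the first piece starting at offset 0.

```cpp
#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for lld's StringPiece: where a string starts in the
// input section and where it was placed in the output section.
struct Piece {
  uint64_t inSecOff;
  uint64_t outSecOff;
};

struct CStringSectionSketch {
  uint64_t size;             // total size of the input section in bytes
  std::vector<Piece> pieces; // sorted by inSecOff; first piece at offset 0

  // Find the piece containing `off`, rejecting offsets past the section end.
  const Piece &getStringPiece(uint64_t off) const {
    if (off >= size)
      throw std::out_of_range("offset is outside the section");
    // First piece that starts after `off`; the one before it contains `off`.
    auto it = std::upper_bound(
        pieces.begin(), pieces.end(), off,
        [](uint64_t o, const Piece &p) { return o < p.inSecOff; });
    return *(it - 1);
  }

  // Same arithmetic as the CStringInputSection::getOffset() quoted above.
  uint64_t getOffset(uint64_t off) const {
    const Piece &piece = getStringPiece(off);
    uint64_t addend = off - piece.inSecOff;
    return piece.outSecOff + addend;
  }
};
```

Because every call to `getOffset()` goes through `getStringPiece()`, the check cannot be bypassed, which is why no separate check is needed in `getOffset()` itself.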

I am reluctant to add the check for ConcatInputSection::getOffset since 1) it is likely a hot code path and 2) it is not technically an error to ask for an offset that's outside the subsection. As mentioned in the top post, ld64 seems to handle it fine even for literal sections. It's only because we look up the section contents as part of WordLiteralInputSection::getOffset that it becomes an issue.
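
The distinction can be sketched as follows. This is a simplified model, not lld's code: `concatOffset` and `wordLiteralOffset` are hypothetical free functions, the `std::map` stands in for however the output section records deduplicated literals, and `std::out_of_range` stands in for `fatal()`. Only 4-byte literals are modeled.

```cpp
#include <cstdint>
#include <cstring>
#include <map>
#include <stdexcept>
#include <vector>

// Pure arithmetic, like the ConcatInputSection::getOffset() quoted above:
// no memory is read, so an out-of-range `off` cannot crash here.
uint64_t concatOffset(uint64_t outSecOff, uint64_t off) {
  return outSecOff + off;
}

// Hypothetical model of a deduplicated 4-byte literal section: the output
// offset is found by reading the literal's value out of the input data.
uint64_t wordLiteralOffset(const std::vector<uint8_t> &data,
                           const std::map<uint32_t, uint64_t> &outputOffsets,
                           uint64_t off) {
  // Without this check, the memcpy below would read past the buffer.
  if (off >= data.size())
    throw std::out_of_range("offset is outside the section");
  uint64_t base = off & ~uint64_t{3}; // start of the literal containing off
  if (data.size() - base < sizeof(uint32_t))
    throw std::out_of_range("truncated literal");
  uint32_t value;
  std::memcpy(&value, data.data() + base, sizeof(value));
  // Output position of the deduplicated literal, plus the offset within it.
  return outputOffsets.at(value) + (off - base);
}
```

The first function stays check-free and cheap; the second has to validate `off` because it dereferences the input buffer before it can compute anything.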

Contributor

@ellishg ellishg left a comment

LGTM
