
aokblast
Contributor

@aokblast aokblast commented Oct 7, 2025

An ELF file can use an extended header when e_phnum, e_shnum, or e_shstrndx does not fit in its 16-bit field. In that case, the ELF header holds a sentinel value and the real value is stored in section header 0: sh_info for the program header count, sh_size for the section count, and sh_link for the section name string table index.

We implement this so that programs relying on ELFFile::program_headers() get the correct number of segments. Consumers also no longer need to inspect section 0 themselves; they can call getPhNum() instead.
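A minimal sketch of the lookup described above, assuming simplified, hypothetical structs (`EhdrFields`, `Section0Fields`) rather than LLVM's real `Elf_Ehdr_Impl`/`Elf_Shdr_Impl` types. The sentinel constants are the gABI values; `needsSection0()` and `resolveCounts()` are illustrative helpers, not part of the `ELFFile` API. Each of the three sentinels is checked independently.

```cpp
#include <cstdint>

// Sentinel values defined by the ELF gABI.
constexpr uint16_t PN_XNUM = 0xffff;    // e_phnum: real count is in sh_info
constexpr uint16_t SHN_UNDEF = 0;       // e_shnum: real count is in sh_size
constexpr uint16_t SHN_XINDEX = 0xffff; // e_shstrndx: real index is in sh_link

// Hypothetical, simplified views of the ELF header and section header 0.
// They are not LLVM's Elf_Ehdr_Impl / Elf_Shdr_Impl types.
struct EhdrFields {
  uint64_t e_shoff; // 0 means there is no section header table
  uint16_t e_phnum;
  uint16_t e_shnum;
  uint16_t e_shstrndx;
};

struct Section0Fields {
  uint64_t sh_size;
  uint32_t sh_link;
  uint32_t sh_info;
};

struct ResolvedCounts {
  uint64_t phnum;
  uint64_t shnum;
  uint32_t shstrndx;
};

// True if any header field holds a sentinel that must be looked up in
// section header 0. Without a section header table there is no section 0.
constexpr bool needsSection0(const EhdrFields &h) {
  return h.e_shoff != 0 &&
         (h.e_phnum == PN_XNUM || h.e_shnum == SHN_UNDEF ||
          h.e_shstrndx == SHN_XINDEX);
}

// Replaces each sentinel with the value stored in section header 0.
// Precondition: sec0 was read from the file whenever needsSection0(h) holds.
constexpr ResolvedCounts resolveCounts(const EhdrFields &h,
                                       const Section0Fields &sec0) {
  ResolvedCounts r{h.e_phnum, h.e_shnum, h.e_shstrndx};
  if (!needsSection0(h))
    return r;
  if (h.e_phnum == PN_XNUM)
    r.phnum = sec0.sh_info;
  if (h.e_shnum == SHN_UNDEF)
    r.shnum = sec0.sh_size;
  if (h.e_shstrndx == SHN_XINDEX)
    r.shstrndx = sec0.sh_link;
  return r;
}
```

Resolving all three values once, up front, lets a getPhNum()-style getter return a plain number instead of repeating the section 0 lookup at every call site.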

@llvmbot
Member

llvmbot commented Oct 7, 2025

@llvm/pr-subscribers-llvm-binary-utilities

@llvm/pr-subscribers-lld

Author: None (aokblast)

Changes

An ELF file can use an extended header when e_phnum, e_shnum, or e_shstrndx does not fit in its 16-bit field; the real values are then stored in section header 0, whose corresponding fields are at least 32 bits wide. Most ELF writers, such as lld, already have a mechanism to synthesize this special section 0, but the parser side has no equivalent infrastructure, so this patch adds it.

We also update some test cases. For expected-error cases, the error is now reported earlier, when the ELFFile is created, so the expected messages lose the "unable to continue dumping, the file is corrupt:" prefix. For expected-success cases, the output changes because more than 65535 sections are now supported.
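On the writer side, the encoding is the mirror image. The sketch below uses a hypothetical `encodeCounts()` helper, not lld's actual code, and follows the thresholds visible in the lld hunk of this patch: `numSections` includes the null section 0 (as `ctx.outputSections.size() + 1` does), and it assumes a section header table is present, since otherwise there is no section 0 to hold the overflowed values.

```cpp
#include <cstdint>

// Sentinel values and thresholds from the ELF gABI.
constexpr uint32_t SHN_LORESERVE = 0xff00;
constexpr uint16_t SHN_UNDEF = 0;
constexpr uint16_t SHN_XINDEX = 0xffff;
constexpr uint16_t PN_XNUM = 0xffff;

// Hypothetical result type: the 16-bit ELF header fields plus the values that
// must be written into section header 0 when a field overflows.
struct EncodedCounts {
  uint16_t e_phnum = 0;
  uint16_t e_shnum = 0;
  uint16_t e_shstrndx = 0;
  uint64_t sec0_sh_size = 0; // real section count if e_shnum == SHN_UNDEF
  uint32_t sec0_sh_link = 0; // real .shstrtab index if e_shstrndx == SHN_XINDEX
  uint32_t sec0_sh_info = 0; // real phdr count if e_phnum == PN_XNUM
};

// numSections counts the null section 0 itself.
constexpr EncodedCounts encodeCounts(uint64_t numSections, uint32_t shstrndx,
                                     uint32_t numPhdrs) {
  EncodedCounts e{};
  if (numSections >= SHN_LORESERVE) {
    e.e_shnum = SHN_UNDEF;
    e.sec0_sh_size = numSections;
  } else {
    e.e_shnum = static_cast<uint16_t>(numSections);
  }
  if (shstrndx >= SHN_LORESERVE) {
    e.e_shstrndx = SHN_XINDEX;
    e.sec0_sh_link = shstrndx;
  } else {
    e.e_shstrndx = static_cast<uint16_t>(shstrndx);
  }
  if (numPhdrs >= PN_XNUM) {
    e.e_phnum = PN_XNUM;
    e.sec0_sh_info = numPhdrs;
  } else {
    e.e_phnum = static_cast<uint16_t>(numPhdrs);
  }
  return e;
}
```

Each overflowed value lands in a different field of section 0, so the three extensions can be combined freely in the same file.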


Patch is 21.57 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/162288.diff

8 Files Affected:

  • (modified) lld/ELF/SyntheticSections.cpp (-1)
  • (modified) lld/ELF/Writer.cpp (+10-1)
  • (modified) llvm/include/llvm/Object/ELF.h (+43-10)
  • (modified) llvm/include/llvm/Object/ELFTypes.h (+5)
  • (modified) llvm/test/Object/invalid.test (+2-2)
  • (modified) llvm/test/tools/llvm-objcopy/ELF/many-sections.test (+1-1)
  • (modified) llvm/test/tools/llvm-readobj/ELF/file-headers.test (+63-60)
  • (modified) llvm/tools/llvm-readobj/ELFDumper.cpp (+19-13)
diff --git a/lld/ELF/SyntheticSections.cpp b/lld/ELF/SyntheticSections.cpp
index bbf4b29a9fda58..71294782d9a31d 100644
--- a/lld/ELF/SyntheticSections.cpp
+++ b/lld/ELF/SyntheticSections.cpp
@@ -4428,7 +4428,6 @@ void elf::writeEhdr(Ctx &ctx, uint8_t *buf, Partition &part) {
   eHdr->e_version = EV_CURRENT;
   eHdr->e_flags = ctx.arg.eflags;
   eHdr->e_ehsize = sizeof(typename ELFT::Ehdr);
-  eHdr->e_phnum = part.phdrs.size();
   eHdr->e_shentsize = sizeof(typename ELFT::Shdr);
 
   if (!ctx.arg.relocatable) {
diff --git a/lld/ELF/Writer.cpp b/lld/ELF/Writer.cpp
index 4fa80397cbfa74..713d8279aab541 100644
--- a/lld/ELF/Writer.cpp
+++ b/lld/ELF/Writer.cpp
@@ -2907,7 +2907,8 @@ template <class ELFT> void Writer<ELFT>::writeHeader() {
   // the value. The sentinel values and fields are:
   // e_shnum = 0, SHdrs[0].sh_size = number of sections.
   // e_shstrndx = SHN_XINDEX, SHdrs[0].sh_link = .shstrtab section index.
-  auto *sHdrs = reinterpret_cast<Elf_Shdr *>(ctx.bufferStart + eHdr->e_shoff);
+  // e_phnum = PN_XNUM (0xFFFF), SHdrs[0].sh_info = number of program headers.
+  auto *sHdrs = reinterpret_cast<Elf_Shdr *>(ctx.bufferStart + eHdr->e_shoff);
   size_t num = ctx.outputSections.size() + 1;
   if (num >= SHN_LORESERVE)
     sHdrs->sh_size = num;
@@ -2922,6 +2923,14 @@ template <class ELFT> void Writer<ELFT>::writeHeader() {
     eHdr->e_shstrndx = strTabIndex;
   }
 
+  num = part.phdrs.size();
+  if (num >= 0xFFFF) {
+    eHdr->e_phnum = 0xFFFF;
+    sHdrs->sh_info = num;
+  } else {
+    eHdr->e_phnum = num;
+  }
+
   for (OutputSection *sec : ctx.outputSections)
     sec->writeHeaderTo<ELFT>(++sHdrs);
 }
diff --git a/llvm/include/llvm/Object/ELF.h b/llvm/include/llvm/Object/ELF.h
index 59f63eb6b5bb69..4374924371d1cd 100644
--- a/llvm/include/llvm/Object/ELF.h
+++ b/llvm/include/llvm/Object/ELF.h
@@ -278,9 +278,17 @@ class ELFFile {
   std::vector<Elf_Shdr> FakeSections;
   SmallString<0> FakeSectionStrings;
 
+  // Handle extended header in section 0
+  Elf_Word e_phnum;
+  Elf_Word e_shnum;
+  Elf_Word e_shstrndx;
+
   ELFFile(StringRef Object);
 
 public:
+  const Elf_Word getPhNum() const { return e_phnum; }
+  const Elf_Word getShNum() const { return e_shnum; }
+  const Elf_Word getShStrNdx() const { return e_shstrndx; }
   const Elf_Ehdr &getHeader() const {
     return *reinterpret_cast<const Elf_Ehdr *>(base());
   }
@@ -379,22 +387,21 @@ class ELFFile {
 
   /// Iterate over program header table.
   Expected<Elf_Phdr_Range> program_headers() const {
-    if (getHeader().e_phnum && getHeader().e_phentsize != sizeof(Elf_Phdr))
+    if (e_phnum && getHeader().e_phentsize != sizeof(Elf_Phdr))
       return createError("invalid e_phentsize: " +
                          Twine(getHeader().e_phentsize));
 
-    uint64_t HeadersSize =
-        (uint64_t)getHeader().e_phnum * getHeader().e_phentsize;
+    uint64_t HeadersSize = (uint64_t)e_phnum * getHeader().e_phentsize;
     uint64_t PhOff = getHeader().e_phoff;
     if (PhOff + HeadersSize < PhOff || PhOff + HeadersSize > getBufSize())
       return createError("program headers are longer than binary of size " +
                          Twine(getBufSize()) + ": e_phoff = 0x" +
                          Twine::utohexstr(getHeader().e_phoff) +
-                         ", e_phnum = " + Twine(getHeader().e_phnum) +
+                         ", e_phnum = " + Twine(e_phnum) +
                          ", e_phentsize = " + Twine(getHeader().e_phentsize));
 
     auto *Begin = reinterpret_cast<const Elf_Phdr *>(base() + PhOff);
-    return ArrayRef(Begin, Begin + getHeader().e_phnum);
+    return ArrayRef(Begin, Begin + e_phnum);
   }
 
   /// Get an iterator over notes in a program header.
@@ -772,7 +779,7 @@ template <class ELFT>
 Expected<StringRef>
 ELFFile<ELFT>::getSectionStringTable(Elf_Shdr_Range Sections,
                                      WarningHandler WarnHandler) const {
-  uint32_t Index = getHeader().e_shstrndx;
+  uint32_t Index = e_shstrndx;
   if (Index == ELF::SHN_XINDEX) {
     // If the section name string table section index is greater than
     // or equal to SHN_LORESERVE, then the actual index of the section name
@@ -889,7 +896,12 @@ Expected<uint64_t> ELFFile<ELFT>::getDynSymtabSize() const {
   return 0;
 }
 
-template <class ELFT> ELFFile<ELFT>::ELFFile(StringRef Object) : Buf(Object) {}
+template <class ELFT> ELFFile<ELFT>::ELFFile(StringRef Object) : Buf(Object) {
+  auto Header = getHeader();
+  e_phnum = Header.e_phnum;
+  e_shnum = Header.e_shnum;
+  e_shstrndx = Header.e_shstrndx;
+}
 
 template <class ELFT>
 Expected<ELFFile<ELFT>> ELFFile<ELFT>::create(StringRef Object) {
@@ -897,7 +909,29 @@ Expected<ELFFile<ELFT>> ELFFile<ELFT>::create(StringRef Object) {
     return createError("invalid buffer: the size (" + Twine(Object.size()) +
                        ") is smaller than an ELF header (" +
                        Twine(sizeof(Elf_Ehdr)) + ")");
-  return ELFFile(Object);
+  ELFFile Result(Object);
+
+  //
+  // sections() parses the total number of sections, taking the extended
+  // header into account.
+  //
+  if (Result.getHeader().HasHeaderExtension()) {
+    auto TableOrErr = Result.sections();
+    if (!TableOrErr)
+      return TableOrErr.takeError();
+    if ((*TableOrErr).size() == 0)
+      return Result;
+    auto SecOrErr = object::getSection<ELFT>(*TableOrErr, 0);
+    if (!SecOrErr)
+      return SecOrErr.takeError();
+    if (Result.e_phnum == 0xFFFF)
+      Result.e_phnum = (*SecOrErr)->sh_info;
+    if (Result.e_shnum == ELF::SHN_UNDEF)
+      Result.e_shnum = (*SecOrErr)->sh_size;
+    if (Result.e_shstrndx == ELF::SHN_XINDEX)
+      Result.e_shstrndx = (*SecOrErr)->sh_link;
+  }
+  return Result;
 }
 
 /// Used by llvm-objdump -d (which needs sections for disassembly) to
@@ -940,7 +974,6 @@ Expected<typename ELFT::ShdrRange> ELFFile<ELFT>::sections() const {
   if (getHeader().e_shentsize != sizeof(Elf_Shdr))
     return createError("invalid e_shentsize in ELF header: " +
                        Twine(getHeader().e_shentsize));
-
   const uint64_t FileSize = Buf.size();
   if (SectionTableOffset + sizeof(Elf_Shdr) > FileSize ||
       SectionTableOffset + (uintX_t)sizeof(Elf_Shdr) < SectionTableOffset)
@@ -956,7 +989,7 @@ Expected<typename ELFT::ShdrRange> ELFFile<ELFT>::sections() const {
   const Elf_Shdr *First =
       reinterpret_cast<const Elf_Shdr *>(base() + SectionTableOffset);
 
-  uintX_t NumSections = getHeader().e_shnum;
+  uintX_t NumSections = e_shnum;
   if (NumSections == 0)
     NumSections = First->sh_size;
 
diff --git a/llvm/include/llvm/Object/ELFTypes.h b/llvm/include/llvm/Object/ELFTypes.h
index 5a26e2fc314586..232f6be9b4c498 100644
--- a/llvm/include/llvm/Object/ELFTypes.h
+++ b/llvm/include/llvm/Object/ELFTypes.h
@@ -529,6 +529,11 @@ struct Elf_Ehdr_Impl {
 
   unsigned char getFileClass() const { return e_ident[ELF::EI_CLASS]; }
   unsigned char getDataEncoding() const { return e_ident[ELF::EI_DATA]; }
+  bool HasHeaderExtension() const {
+    return (e_phnum == 0xFFFF || e_shnum == ELF::SHN_UNDEF ||
+            e_shstrndx == ELF::SHN_XINDEX) &&
+           e_shoff != 0;
+  }
 };
 
 template <endianness Endianness>
diff --git a/llvm/test/Object/invalid.test b/llvm/test/Object/invalid.test
index 58ec3cbeadd192..2bf23b45cdbb8d 100644
--- a/llvm/test/Object/invalid.test
+++ b/llvm/test/Object/invalid.test
@@ -556,7 +556,7 @@ Sections:
 # RUN: yaml2obj --docnum=25 %s -o %t25
 # RUN: not llvm-readobj -h %t25 2>&1 | FileCheck -DFILE=%t25 --check-prefix=INVALID-SEC-NUM1 %s
 
-# INVALID-SEC-NUM1: error: '[[FILE]]': unable to continue dumping, the file is corrupt: invalid section header table offset (e_shoff = 0x58) or invalid number of sections specified in the first section header's sh_size field (0x3ffffffffffffff)
+# INVALID-SEC-NUM1: error: '[[FILE]]': invalid section header table offset (e_shoff = 0x58) or invalid number of sections specified in the first section header's sh_size field (0x3ffffffffffffff)
 
 --- !ELF
 FileHeader:
@@ -575,7 +575,7 @@ Sections:
 # RUN: yaml2obj --docnum=26 %s -o %t26
 # RUN: not llvm-readobj -h %t26 2>&1 | FileCheck -DFILE=%t26 --check-prefix=INVALID-SEC-NUM2 %s
 
-# INVALID-SEC-NUM2: error: '[[FILE]]': unable to continue dumping, the file is corrupt: invalid number of sections specified in the NULL section's sh_size field (288230376151711744)
+# INVALID-SEC-NUM2: error: '[[FILE]]': invalid number of sections specified in the NULL section's sh_size field (288230376151711744)
 
 --- !ELF
 FileHeader:
diff --git a/llvm/test/tools/llvm-objcopy/ELF/many-sections.test b/llvm/test/tools/llvm-objcopy/ELF/many-sections.test
index 6622db237026fa..8b49454f985785 100644
--- a/llvm/test/tools/llvm-objcopy/ELF/many-sections.test
+++ b/llvm/test/tools/llvm-objcopy/ELF/many-sections.test
@@ -6,7 +6,7 @@ RUN: llvm-readobj --file-headers --sections --symbols %t2 | FileCheck %s
 RUN: llvm-readelf --symbols %t2 | FileCheck --check-prefix=SYMS %s
 
 ## The ELF header should have e_shnum == 0 and e_shstrndx == SHN_XINDEX.
-# CHECK:        SectionHeaderCount: 0
+# CHECK:        SectionHeaderCount: 0 (65540)
 # CHECK-NEXT:   StringTableSectionIndex: 65535
 
 ## The first section header should store the real section header count and
diff --git a/llvm/test/tools/llvm-readobj/ELF/file-headers.test b/llvm/test/tools/llvm-readobj/ELF/file-headers.test
index 97ab9f092b2287..d2fbed1b756564 100644
--- a/llvm/test/tools/llvm-readobj/ELF/file-headers.test
+++ b/llvm/test/tools/llvm-readobj/ELF/file-headers.test
@@ -143,64 +143,67 @@ FileHeader:
 # RUN: yaml2obj %s --docnum=4 -o %t.invalid1
 # RUN: not llvm-readobj --file-headers %t.invalid1 2>&1 \
 # RUN:  | FileCheck %s --implicit-check-not=warning: -DFILE=%t.invalid1 \
-# RUN:    -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-LLVM
+# RUN:    -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-LLVM-TC1
 # RUN: not llvm-readelf --file-headers %t.invalid1 2>&1 \
 # RUN:  | FileCheck %s --implicit-check-not=warning: -DFILE=%t.invalid1 \
-# RUN:    -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-GNU
-
-# INVALID-LLVM:      File: [[FILE]]
-# INVALID-LLVM-NEXT: Format: elf64-unknown
-# INVALID-LLVM-NEXT: Arch: unknown
-# INVALID-LLVM-NEXT: AddressSize: 64bit
-# INVALID-LLVM-NEXT: LoadName: <Not found>
-# INVALID-LLVM-NEXT: ElfHeader {
-# INVALID-LLVM-NEXT:   Ident {
-# INVALID-LLVM-NEXT:     Magic: (7F 45 4C 46)
-# INVALID-LLVM-NEXT:     Class: 64-bit (0x2)
-# INVALID-LLVM-NEXT:     DataEncoding: LittleEndian (0x1)
-# INVALID-LLVM-NEXT:     FileVersion: 1
-# INVALID-LLVM-NEXT:     OS/ABI: SystemV (0x0)
-# INVALID-LLVM-NEXT:     ABIVersion: 0
-# INVALID-LLVM-NEXT:     Unused: (00 00 00 00 00 00 00)
-# INVALID-LLVM-NEXT:   }
-# INVALID-LLVM-NEXT:   Type: Relocatable (0x1)
-# INVALID-LLVM-NEXT:   Machine: EM_NONE (0x0)
-# INVALID-LLVM-NEXT:   Version: 1
-# INVALID-LLVM-NEXT:   Entry: 0x0
-# INVALID-LLVM-NEXT:   ProgramHeaderOffset: 0x0
-# INVALID-LLVM-NEXT:   SectionHeaderOffset: 0x1000
-# INVALID-LLVM-NEXT:   Flags [ (0x0)
-# INVALID-LLVM-NEXT:   ]
-# INVALID-LLVM-NEXT:   HeaderSize: 64
-# INVALID-LLVM-NEXT:   ProgramHeaderEntrySize: 0
-# INVALID-LLVM-NEXT:   ProgramHeaderCount: 0
-# INVALID-LLVM-NEXT:   SectionHeaderEntrySize: 64
-# INVALID-LLVM-NEXT:   SectionHeaderCount: [[SECHDRCOUNT]]
-# INVALID-LLVM-NEXT:   StringTableSectionIndex: [[SECHDRSTRTABINDEX]]
-# INVALID-LLVM-NEXT: }
-# INVALID-LLVM-NEXT: error: '[[FILE]]': unable to continue dumping, the file is corrupt: section header table goes past the end of the file: e_shoff = 0x1000
-
-# INVALID-GNU:      ELF Header:
-# INVALID-GNU-NEXT:   Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
-# INVALID-GNU-NEXT:   Class:                             ELF64
-# INVALID-GNU-NEXT:   Data:                              2's complement, little endian
-# INVALID-GNU-NEXT:   Version:                           1 (current)
-# INVALID-GNU-NEXT:   OS/ABI:                            UNIX - System V
-# INVALID-GNU-NEXT:   ABI Version:                       0
-# INVALID-GNU-NEXT:   Type:                              REL (Relocatable file)
-# INVALID-GNU-NEXT:   Machine:                           None
-# INVALID-GNU-NEXT:   Version:                           0x1
-# INVALID-GNU-NEXT:   Entry point address:               0x0
-# INVALID-GNU-NEXT:   Start of program headers:          0 (bytes into file)
-# INVALID-GNU-NEXT:   Start of section headers:          4096 (bytes into file)
-# INVALID-GNU-NEXT:   Flags:                             0x0
-# INVALID-GNU-NEXT:   Size of this header:               64 (bytes)
-# INVALID-GNU-NEXT:   Size of program headers:           0 (bytes)
-# INVALID-GNU-NEXT:   Number of program headers:         0
-# INVALID-GNU-NEXT:   Size of section headers:           64 (bytes)
-# INVALID-GNU-NEXT:   Number of section headers:         [[SECHDRCOUNT]]
-# INVALID-GNU-NEXT:   Section header string table index: [[SECHDRSTRTABINDEX]]
-# INVALID-GNU-NEXT: error: '[[FILE]]': unable to continue dumping, the file is corrupt: section header table goes past the end of the file: e_shoff = 0x1000
+# RUN:    -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-GNU-TC1
+
+# INVALID-LLVM-TC1:      File: [[FILE]]
+# INVALID-LLVM-TC1-NEXT: Format: elf64-unknown
+# INVALID-LLVM-TC1-NEXT: Arch: unknown
+# INVALID-LLVM-TC1-NEXT: AddressSize: 64bit
+# INVALID-LLVM-TC1-NEXT: LoadName: <Not found>
+# INVALID-LLVM-TC1-NEXT: ElfHeader {
+# INVALID-LLVM-TC1-NEXT:   Ident {
+# INVALID-LLVM-TC1-NEXT:     Magic: (7F 45 4C 46)
+# INVALID-LLVM-TC1-NEXT:     Class: 64-bit (0x2)
+# INVALID-LLVM-TC1-NEXT:     DataEncoding: LittleEndian (0x1)
+# INVALID-LLVM-TC1-NEXT:     FileVersion: 1
+# INVALID-LLVM-TC1-NEXT:     OS/ABI: SystemV (0x0)
+# INVALID-LLVM-TC1-NEXT:     ABIVersion: 0
+# INVALID-LLVM-TC1-NEXT:     Unused: (00 00 00 00 00 00 00)
+# INVALID-LLVM-TC1-NEXT:   }
+# INVALID-LLVM-TC1-NEXT:   Type: Relocatable (0x1)
+# INVALID-LLVM-TC1-NEXT:   Machine: EM_NONE (0x0)
+# INVALID-LLVM-TC1-NEXT:   Version: 1
+# INVALID-LLVM-TC1-NEXT:   Entry: 0x0
+# INVALID-LLVM-TC1-NEXT:   ProgramHeaderOffset: 0x0
+# INVALID-LLVM-TC1-NEXT:   SectionHeaderOffset: 0x1000
+# INVALID-LLVM-TC1-NEXT:   Flags [ (0x0)
+# INVALID-LLVM-TC1-NEXT:   ]
+# INVALID-LLVM-TC1-NEXT:   HeaderSize: 64
+# INVALID-LLVM-TC1-NEXT:   ProgramHeaderEntrySize: 0
+# INVALID-LLVM-TC1-NEXT:   ProgramHeaderCount: 0
+# INVALID-LLVM-TC1-NEXT:   SectionHeaderEntrySize: 64
+# INVALID-LLVM-TC1-NEXT:   SectionHeaderCount: [[SECHDRCOUNT]]
+# INVALID-LLVM-TC1-NEXT:   StringTableSectionIndex: [[SECHDRSTRTABINDEX]]
+# INVALID-LLVM-TC1-NEXT: }
+# INVALID-LLVM-TC1-NEXT: error: '[[FILE]]': unable to continue dumping, the file is corrupt: section header table goes past the end of the file: e_shoff = 0x1000
+
+# INVALID-GNU-TC1:      ELF Header:
+# INVALID-GNU-TC1-NEXT:   Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
+# INVALID-GNU-TC1-NEXT:   Class:                             ELF64
+# INVALID-GNU-TC1-NEXT:   Data:                              2's complement, little endian
+# INVALID-GNU-TC1-NEXT:   Version:                           1 (current)
+# INVALID-GNU-TC1-NEXT:   OS/ABI:                            UNIX - System V
+# INVALID-GNU-TC1-NEXT:   ABI Version:                       0
+# INVALID-GNU-TC1-NEXT:   Type:                              REL (Relocatable file)
+# INVALID-GNU-TC1-NEXT:   Machine:                           None
+# INVALID-GNU-TC1-NEXT:   Version:                           0x1
+# INVALID-GNU-TC1-NEXT:   Entry point address:               0x0
+# INVALID-GNU-TC1-NEXT:   Start of program headers:          0 (bytes into file)
+# INVALID-GNU-TC1-NEXT:   Start of section headers:          4096 (bytes into file)
+# INVALID-GNU-TC1-NEXT:   Flags:                             0x0
+# INVALID-GNU-TC1-NEXT:   Size of this header:               64 (bytes)
+# INVALID-GNU-TC1-NEXT:   Size of program headers:           0 (bytes)
+# INVALID-GNU-TC1-NEXT:   Number of program headers:         0
+# INVALID-GNU-TC1-NEXT:   Size of section headers:           64 (bytes)
+# INVALID-GNU-TC1-NEXT:   Number of section headers:         [[SECHDRCOUNT]]
+# INVALID-GNU-TC1-NEXT:   Section header string table index: [[SECHDRSTRTABINDEX]]
+# INVALID-GNU-TC1-NEXT: error: '[[FILE]]': unable to continue dumping, the file is corrupt: section header table goes past the end of the file: e_shoff = 0x1000
+
+# INVALID-LLVM-TC2: error: '[[FILE]]': section header table goes past the end of the file: e_shoff = 0x1000
+# INVALID-GNU-TC2: error: '[[FILE]]': section header table goes past the end of the file: e_shoff = 0x1000
 
 --- !ELF
 FileHeader:
@@ -222,14 +225,14 @@ Sections:
 ## Check we don't dump anything except the file header when the section header table can't be read.
 
 # RUN: not llvm-readobj -a %t.invalid1 2>&1 \
-# RUN:  | FileCheck %s -DFILE=%t.invalid1 -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-LLVM
+# RUN:  | FileCheck %s -DFILE=%t.invalid1 -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-LLVM-TC1
 # RUN: not llvm-readelf -a %t.invalid1 2>&1 \
-# RUN:  | FileCheck %s -DFILE=%t.invalid1 -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-GNU
+# RUN:  | FileCheck %s -DFILE=%t.invalid1 -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-GNU-TC1
 
 ## Check what we print when e_shnum == 0, e_shstrndx == SHN_XINDEX and the section header table can't be read.
 
 # RUN: yaml2obj %s -DSHNUM=0 -DSHSTRNDX=0xffff --docnum=4 -o %t.invalid2
 # RUN: not llvm-readobj --file-headers %t.invalid2 2>&1 \
-# RUN:  | FileCheck %s -DFILE=%t.invalid2 -DSECHDRCOUNT="<?>" -DSECHDRSTRTABINDEX="<?>" --check-prefix=INVALID-LLVM
+# RUN:  | FileCheck %s -DFILE=%t.invalid2 -DSECHDRCOUNT="<?>" -DSECHDRSTRTABINDEX="<?>" --check-prefix=INVALID-LLVM-TC2
 # RUN: not llvm-readelf --file-headers %t.invalid2 2>&1 \
-# RUN:  | FileCheck %s -DFILE=%t.invalid2 -DSECHDRCOUNT="<?>" -DSECHDRSTRTABINDEX="<?>" --check-prefix=INVALID-GNU
+# RUN:  | FileCheck %s -DFILE=%t.invalid2 -DSECHDRCOUNT="<?>" -DSECHDRSTRTABINDEX="<?>" --check-prefix=INVALID-GNU-TC2
diff --git a/llvm/tools/llvm-readobj/ELFDumper.cpp b/llvm/tools/llvm-readobj/ELFDumper.cpp
index ab93316907cc6f..1cfa138d7a7ea7 100644
--- a/llvm/tools/llvm-readobj/ELFDumper.cpp
+++ b/llvm/tools/llvm-readobj/ELFDumper.cpp
@@ -3575,9 +3575,16 @@ static inline void printFields(formatted_raw_ostream &OS, StringRef Str1,
 template <class ELFT>
 static std::string getSectionHeadersNumString(const ELFFile<ELFT> &Obj,
                                               StringRef FileName) {
-  const typename ELFT::Ehdr &ElfHeader = Obj.getHeader();
-  if (ElfHeader.e_shnum != 0)
-    return to_string(ElfHeader.e_shnum);
+  if (Obj.getHeader().e_shnum != 0) {
+    std::string Result;
+    if (Obj.getHeader().e_shnum != Obj.getShNum())
+      raw_string_ostream(Result)
+          << format("%u (%u)", static_cast<unsigned>(Obj.getHeader().e_shnum),
+                    static_cast<unsigned>(Obj.getShNum()));
+    else
+      raw_string_ostream(Result) << Obj.getHeader().e_shnum;
+    return Result;
+  }
 
   Expected<ArrayRef<typename ELFT::Shdr>> ArrOrErr = Obj.sections();
   if (!ArrOrErr) {
@@ -3595,9 +3602,10 @@ static std::string getSectionHeadersNumString(const ELFFile<ELFT> &Obj,
 template <class ELFT>
 static std::string getSectionHeaderTableIndexString(const ELFFile<ELFT> &Obj,
                                                     StringRef FileName) {
-  const typename ELFT::Ehdr &ElfHeader = Obj.getHeader();
-  if (ElfHeader.e_shstrndx != SHN_XINDEX)
-    return to_string(ElfHeader.e_shstrndx);
+  auto strndx = Obj.getHeader().e_shstrndx;
+
+  if (strndx != SHN_XINDEX)
+    return to_string(strndx);
 
   Expected<ArrayRef<typename ELFT::Shdr>> ArrOrErr = Obj.sections();
   if (!ArrOrErr) {
@@ -3609,8 +3617,7 @@ static std::string getSectionHeaderTableIndexString(const ELFFile<ELFT> &Obj,
 
   if (ArrOrErr->empty())
     return "65535 (corrupt: out of range)";
-  return to_string(ElfHeader.e_shstrndx) + " (" +
-         to_string((*ArrOrErr)[0].sh_link) + ")";
+  return to_string(strndx) + " (" + to_string(Obj.getShStrNdx()) + ")";
 }
 
 static const EnumEntry<unsigned> *getObjectFileEnumEntry(unsigned Type) {
@@ -3765,7 +3772,7 @@ template <class ELFT> void GNUELFDumper<ELFT>::printFileHeaders() {
   printFields(OS, "Size of this header:", Str);
   Str = to_string(e.e_phentsize) + " (bytes)";
   printFields(OS, "Size of program headers:", Str);
- ...
[truncated]
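With `e_phnum` now resolved into a 32-bit `Elf_Word`, the bounds check in `program_headers()` still depends on widening to `uint64_t` before multiplying. A minimal sketch of that check as a standalone, hypothetical `phdrTableFits()` function (not an LLVM API), assuming the count has already been resolved through section 0:

```cpp
#include <cstdint>

// Hypothetical sketch of the program header table bounds check, using the
// resolved (up to 32-bit) program header count.
constexpr bool phdrTableFits(uint64_t phoff, uint32_t phnum,
                             uint16_t phentsize, uint64_t fileSize) {
  // A 32-bit count times a 16-bit entry size fits in 48 bits: no overflow.
  uint64_t tableSize = static_cast<uint64_t>(phnum) * phentsize;
  // Reject an offset that wraps around, then compare against the file size.
  return phoff + tableSize >= phoff && phoff + tableSize <= fileSize;
}
```

Because the product cannot overflow, only the addition needs a wraparound check, which is the same shape as the `PhOff + HeadersSize < PhOff` test in the patch.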

@llvmbot
Member

llvmbot commented Oct 7, 2025

@llvm/pr-subscribers-lld-elf

Author: None (aokblast)

-# INVALID-LLVM-NEXT:     FileVersion: 1
-# INVALID-LLVM-NEXT:     OS/ABI: SystemV (0x0)
-# INVALID-LLVM-NEXT:     ABIVersion: 0
-# INVALID-LLVM-NEXT:     Unused: (00 00 00 00 00 00 00)
-# INVALID-LLVM-NEXT:   }
-# INVALID-LLVM-NEXT:   Type: Relocatable (0x1)
-# INVALID-LLVM-NEXT:   Machine: EM_NONE (0x0)
-# INVALID-LLVM-NEXT:   Version: 1
-# INVALID-LLVM-NEXT:   Entry: 0x0
-# INVALID-LLVM-NEXT:   ProgramHeaderOffset: 0x0
-# INVALID-LLVM-NEXT:   SectionHeaderOffset: 0x1000
-# INVALID-LLVM-NEXT:   Flags [ (0x0)
-# INVALID-LLVM-NEXT:   ]
-# INVALID-LLVM-NEXT:   HeaderSize: 64
-# INVALID-LLVM-NEXT:   ProgramHeaderEntrySize: 0
-# INVALID-LLVM-NEXT:   ProgramHeaderCount: 0
-# INVALID-LLVM-NEXT:   SectionHeaderEntrySize: 64
-# INVALID-LLVM-NEXT:   SectionHeaderCount: [[SECHDRCOUNT]]
-# INVALID-LLVM-NEXT:   StringTableSectionIndex: [[SECHDRSTRTABINDEX]]
-# INVALID-LLVM-NEXT: }
-# INVALID-LLVM-NEXT: error: '[[FILE]]': unable to continue dumping, the file is corrupt: section header table goes past the end of the file: e_shoff = 0x1000
-
-# INVALID-GNU:      ELF Header:
-# INVALID-GNU-NEXT:   Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
-# INVALID-GNU-NEXT:   Class:                             ELF64
-# INVALID-GNU-NEXT:   Data:                              2's complement, little endian
-# INVALID-GNU-NEXT:   Version:                           1 (current)
-# INVALID-GNU-NEXT:   OS/ABI:                            UNIX - System V
-# INVALID-GNU-NEXT:   ABI Version:                       0
-# INVALID-GNU-NEXT:   Type:                              REL (Relocatable file)
-# INVALID-GNU-NEXT:   Machine:                           None
-# INVALID-GNU-NEXT:   Version:                           0x1
-# INVALID-GNU-NEXT:   Entry point address:               0x0
-# INVALID-GNU-NEXT:   Start of program headers:          0 (bytes into file)
-# INVALID-GNU-NEXT:   Start of section headers:          4096 (bytes into file)
-# INVALID-GNU-NEXT:   Flags:                             0x0
-# INVALID-GNU-NEXT:   Size of this header:               64 (bytes)
-# INVALID-GNU-NEXT:   Size of program headers:           0 (bytes)
-# INVALID-GNU-NEXT:   Number of program headers:         0
-# INVALID-GNU-NEXT:   Size of section headers:           64 (bytes)
-# INVALID-GNU-NEXT:   Number of section headers:         [[SECHDRCOUNT]]
-# INVALID-GNU-NEXT:   Section header string table index: [[SECHDRSTRTABINDEX]]
-# INVALID-GNU-NEXT: error: '[[FILE]]': unable to continue dumping, the file is corrupt: section header table goes past the end of the file: e_shoff = 0x1000
+# RUN:    -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-GNU-TC1
+
+# INVALID-LLVM-TC1:      File: [[FILE]]
+# INVALID-LLVM-TC1-NEXT: Format: elf64-unknown
+# INVALID-LLVM-TC1-NEXT: Arch: unknown
+# INVALID-LLVM-TC1-NEXT: AddressSize: 64bit
+# INVALID-LLVM-TC1-NEXT: LoadName: <Not found>
+# INVALID-LLVM-TC1-NEXT: ElfHeader {
+# INVALID-LLVM-TC1-NEXT:   Ident {
+# INVALID-LLVM-TC1-NEXT:     Magic: (7F 45 4C 46)
+# INVALID-LLVM-TC1-NEXT:     Class: 64-bit (0x2)
+# INVALID-LLVM-TC1-NEXT:     DataEncoding: LittleEndian (0x1)
+# INVALID-LLVM-TC1-NEXT:     FileVersion: 1
+# INVALID-LLVM-TC1-NEXT:     OS/ABI: SystemV (0x0)
+# INVALID-LLVM-TC1-NEXT:     ABIVersion: 0
+# INVALID-LLVM-TC1-NEXT:     Unused: (00 00 00 00 00 00 00)
+# INVALID-LLVM-TC1-NEXT:   }
+# INVALID-LLVM-TC1-NEXT:   Type: Relocatable (0x1)
+# INVALID-LLVM-TC1-NEXT:   Machine: EM_NONE (0x0)
+# INVALID-LLVM-TC1-NEXT:   Version: 1
+# INVALID-LLVM-TC1-NEXT:   Entry: 0x0
+# INVALID-LLVM-TC1-NEXT:   ProgramHeaderOffset: 0x0
+# INVALID-LLVM-TC1-NEXT:   SectionHeaderOffset: 0x1000
+# INVALID-LLVM-TC1-NEXT:   Flags [ (0x0)
+# INVALID-LLVM-TC1-NEXT:   ]
+# INVALID-LLVM-TC1-NEXT:   HeaderSize: 64
+# INVALID-LLVM-TC1-NEXT:   ProgramHeaderEntrySize: 0
+# INVALID-LLVM-TC1-NEXT:   ProgramHeaderCount: 0
+# INVALID-LLVM-TC1-NEXT:   SectionHeaderEntrySize: 64
+# INVALID-LLVM-TC1-NEXT:   SectionHeaderCount: [[SECHDRCOUNT]]
+# INVALID-LLVM-TC1-NEXT:   StringTableSectionIndex: [[SECHDRSTRTABINDEX]]
+# INVALID-LLVM-TC1-NEXT: }
+# INVALID-LLVM-TC1-NEXT: error: '[[FILE]]': unable to continue dumping, the file is corrupt: section header table goes past the end of the file: e_shoff = 0x1000
+
+# INVALID-GNU-TC1:      ELF Header:
+# INVALID-GNU-TC1-NEXT:   Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
+# INVALID-GNU-TC1-NEXT:   Class:                             ELF64
+# INVALID-GNU-TC1-NEXT:   Data:                              2's complement, little endian
+# INVALID-GNU-TC1-NEXT:   Version:                           1 (current)
+# INVALID-GNU-TC1-NEXT:   OS/ABI:                            UNIX - System V
+# INVALID-GNU-TC1-NEXT:   ABI Version:                       0
+# INVALID-GNU-TC1-NEXT:   Type:                              REL (Relocatable file)
+# INVALID-GNU-TC1-NEXT:   Machine:                           None
+# INVALID-GNU-TC1-NEXT:   Version:                           0x1
+# INVALID-GNU-TC1-NEXT:   Entry point address:               0x0
+# INVALID-GNU-TC1-NEXT:   Start of program headers:          0 (bytes into file)
+# INVALID-GNU-TC1-NEXT:   Start of section headers:          4096 (bytes into file)
+# INVALID-GNU-TC1-NEXT:   Flags:                             0x0
+# INVALID-GNU-TC1-NEXT:   Size of this header:               64 (bytes)
+# INVALID-GNU-TC1-NEXT:   Size of program headers:           0 (bytes)
+# INVALID-GNU-TC1-NEXT:   Number of program headers:         0
+# INVALID-GNU-TC1-NEXT:   Size of section headers:           64 (bytes)
+# INVALID-GNU-TC1-NEXT:   Number of section headers:         [[SECHDRCOUNT]]
+# INVALID-GNU-TC1-NEXT:   Section header string table index: [[SECHDRSTRTABINDEX]]
+# INVALID-GNU-TC1-NEXT: error: '[[FILE]]': unable to continue dumping, the file is corrupt: section header table goes past the end of the file: e_shoff = 0x1000
+
+# INVALID-LLVM-TC2: error: '[[FILE]]': section header table goes past the end of the file: e_shoff = 0x1000
+# INVALID-GNU-TC2: error: '[[FILE]]': section header table goes past the end of the file: e_shoff = 0x1000
 
 --- !ELF
 FileHeader:
@@ -222,14 +225,14 @@ Sections:
 ## Check we don't dump anything except the file header when the section header table can't be read.
 
 # RUN: not llvm-readobj -a %t.invalid1 2>&1 \
-# RUN:  | FileCheck %s -DFILE=%t.invalid1 -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-LLVM
+# RUN:  | FileCheck %s -DFILE=%t.invalid1 -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-LLVM-TC1
 # RUN: not llvm-readelf -a %t.invalid1 2>&1 \
-# RUN:  | FileCheck %s -DFILE=%t.invalid1 -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-GNU
+# RUN:  | FileCheck %s -DFILE=%t.invalid1 -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-GNU-TC1
 
 ## Check what we print when e_shnum == 0, e_shstrndx == SHN_XINDEX and the section header table can't be read.
 
 # RUN: yaml2obj %s -DSHNUM=0 -DSHSTRNDX=0xffff --docnum=4 -o %t.invalid2
 # RUN: not llvm-readobj --file-headers %t.invalid2 2>&1 \
-# RUN:  | FileCheck %s -DFILE=%t.invalid2 -DSECHDRCOUNT="<?>" -DSECHDRSTRTABINDEX="<?>" --check-prefix=INVALID-LLVM
+# RUN:  | FileCheck %s -DFILE=%t.invalid2 -DSECHDRCOUNT="<?>" -DSECHDRSTRTABINDEX="<?>" --check-prefix=INVALID-LLVM-TC2
 # RUN: not llvm-readelf --file-headers %t.invalid2 2>&1 \
-# RUN:  | FileCheck %s -DFILE=%t.invalid2 -DSECHDRCOUNT="<?>" -DSECHDRSTRTABINDEX="<?>" --check-prefix=INVALID-GNU
+# RUN:  | FileCheck %s -DFILE=%t.invalid2 -DSECHDRCOUNT="<?>" -DSECHDRSTRTABINDEX="<?>" --check-prefix=INVALID-GNU-TC2
diff --git a/llvm/tools/llvm-readobj/ELFDumper.cpp b/llvm/tools/llvm-readobj/ELFDumper.cpp
index ab93316907cc6..1cfa138d7a7ea 100644
--- a/llvm/tools/llvm-readobj/ELFDumper.cpp
+++ b/llvm/tools/llvm-readobj/ELFDumper.cpp
@@ -3575,9 +3575,16 @@ static inline void printFields(formatted_raw_ostream &OS, StringRef Str1,
 template <class ELFT>
 static std::string getSectionHeadersNumString(const ELFFile<ELFT> &Obj,
                                               StringRef FileName) {
-  const typename ELFT::Ehdr &ElfHeader = Obj.getHeader();
-  if (ElfHeader.e_shnum != 0)
-    return to_string(ElfHeader.e_shnum);
+  if (Obj.getHeader().e_shnum != 0) {
+    std::string Result;
+    if (Obj.getHeader().e_shnum != Obj.getShNum())
+      raw_string_ostream(Result)
+          << format("%x (%x)", static_cast<int>(Obj.getHeader().e_shnum),
+                    static_cast<int>(Obj.getShNum()));
+    else
+      raw_string_ostream(Result) << Obj.getHeader().e_shnum;
+    return Result;
+  }
 
   Expected<ArrayRef<typename ELFT::Shdr>> ArrOrErr = Obj.sections();
   if (!ArrOrErr) {
@@ -3595,9 +3602,10 @@ static std::string getSectionHeadersNumString(const ELFFile<ELFT> &Obj,
 template <class ELFT>
 static std::string getSectionHeaderTableIndexString(const ELFFile<ELFT> &Obj,
                                                     StringRef FileName) {
-  const typename ELFT::Ehdr &ElfHeader = Obj.getHeader();
-  if (ElfHeader.e_shstrndx != SHN_XINDEX)
-    return to_string(ElfHeader.e_shstrndx);
+  auto strndx = Obj.getHeader().e_shstrndx;
+
+  if (strndx != SHN_XINDEX)
+    return to_string(strndx);
 
   Expected<ArrayRef<typename ELFT::Shdr>> ArrOrErr = Obj.sections();
   if (!ArrOrErr) {
@@ -3609,8 +3617,7 @@ static std::string getSectionHeaderTableIndexString(const ELFFile<ELFT> &Obj,
 
   if (ArrOrErr->empty())
     return "65535 (corrupt: out of range)";
-  return to_string(ElfHeader.e_shstrndx) + " (" +
-         to_string((*ArrOrErr)[0].sh_link) + ")";
+  return to_string(strndx) + " (" + to_string(Obj.getShStrNdx()) + ")";
 }
 
 static const EnumEntry<unsigned> *getObjectFileEnumEntry(unsigned Type) {
@@ -3765,7 +3772,7 @@ template <class ELFT> void GNUELFDumper<ELFT>::printFileHeaders() {
   printFields(OS, "Size of this header:", Str);
   Str = to_string(e.e_phentsize) + " (bytes)";
   printFields(OS, "Size of program headers:", Str);
-  Str = to_string...
[truncated]

@jh7370 jh7370 requested a review from MaskRay October 7, 2025 14:55
Collaborator

@jh7370 jh7370 left a comment

The build is failing. Please fix it.

I've not looked at the change or the classes in depth, but the code already tries to handle large section counts (see, e.g., line 993 in ELF.h, which you're trying to modify). What exactly are you trying to achieve that this code doesn't already do?

@aokblast
Contributor Author

aokblast commented Oct 7, 2025

Thanks for your reply! I forgot to enable lld. I am currently compiling and running the tests.

Some programs, like readelf, are currently unable to display 65535 or more program headers. I think it is better to support this at the Support/Object level. Without doing it there, program_headers() in Object/ELF.h also cannot iterate over 65535 or more segments, and many programs rely on it to iterate over all program headers without reading the count themselves.

For sections, I handle them together with segments for consistency. Line 993 should ideally be removed, but it cannot be: the first time we call getSection(), sections() needs to know the correct number of sections so that getSection(0) doesn't fail with an out-of-bounds error.
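One way around this ordering problem is to read section 0 straight from e_shoff, without going through sections() and its count-dependent bounds check; reading one header only requires that a single entry fits in the file. The following is a minimal sketch in plain C++, not LLVM's API: Shdr64 and readSection0 are hypothetical names, and endianness conversion is omitted (host-endian input is assumed).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical, simplified 64-bit section header. The field order matches
// Elf64_Shdr; byte-order conversion is deliberately left out.
struct Shdr64 {
  uint32_t sh_name, sh_type;
  uint64_t sh_flags, sh_addr, sh_offset, sh_size;
  uint32_t sh_link, sh_info;
  uint64_t sh_addralign, sh_entsize;
};
static_assert(sizeof(Shdr64) == 64, "Elf64_Shdr is 64 bytes");

// Reads only the 0th section header. This needs nothing but e_shoff and the
// buffer size, so it can run before the real section count is known.
std::optional<Shdr64> readSection0(const std::vector<uint8_t> &Buf,
                                   uint64_t ShOff) {
  if (ShOff == 0)
    return std::nullopt; // No section header table at all.
  // Overflow-safe form of: ShOff + sizeof(Shdr64) <= Buf.size()
  if (ShOff > Buf.size() || Buf.size() - ShOff < sizeof(Shdr64))
    return std::nullopt;
  Shdr64 S;
  std::memcpy(&S, Buf.data() + ShOff, sizeof(S));
  return S;
}
```

Once section 0 is available, the real counts can be resolved first, and only then does sections() need to validate the full table.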

@jh7370
Collaborator

jh7370 commented Oct 7, 2025

Why do you want to support so many program headers? Just because the spec allows it doesn't mean we need to support it if there's no use case for it.

@aokblast
Contributor Author

aokblast commented Oct 7, 2025

It is related to the core dump design in FreeBSD (see #132216). FreeBSD writes each mmap as a segment in a core dump, so the core dump of a program with 65535 or more mmaps has to use the extended header to store the actual number of mmaps.
I am not sure whether they have CI that checks all segment mappings (phdrs), but I think they could add it, since we already have test cases that check 65535+ sections and their section headers.
If you think it is more appropriate to fix this in readelf itself, I can do that, since the CI mentioned in the issue only compares the number of sections, not their addresses. I just think it would be great to do "check whether the extended header exists -> parse sections -> check that section 0 exists -> get the correct section and segment info" once in Support/Object instead of everyone keeping their own copy. But I don't insist on that.
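The resolution flow described above can be condensed into one helper. This is a minimal illustration in plain C++ rather than LLVM's ELFFile API; EhdrCounts, Section0, ResolvedCounts and resolveCounts are hypothetical names. The field mapping (sh_info for the program header count, sh_size for the section count, sh_link for the string table index) follows the extended numbering convention documented for Solaris and used in the PN_XNUM proposal.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Escape values from the ELF gABI and the PN_XNUM extension.
constexpr uint16_t PN_XNUM = 0xffff;
constexpr uint16_t SHN_XINDEX = 0xffff;

// Hypothetical subsets of Elf64_Ehdr / Elf64_Shdr with only the needed fields.
struct EhdrCounts {
  uint64_t e_shoff;
  uint16_t e_phnum, e_shnum, e_shstrndx;
};
struct Section0 {
  uint64_t sh_size; // real section count when e_shnum == 0
  uint32_t sh_link; // real string table index when e_shstrndx == SHN_XINDEX
  uint32_t sh_info; // real program header count when e_phnum == PN_XNUM
};
struct ResolvedCounts {
  uint64_t PhNum, ShNum;
  uint32_t ShStrNdx;
};

// Returns std::nullopt when an escape value is present but section 0 could
// not be read, i.e. the real value cannot be determined.
std::optional<ResolvedCounts> resolveCounts(const EhdrCounts &H,
                                            const std::optional<Section0> &S0) {
  ResolvedCounts R{H.e_phnum, H.e_shnum, H.e_shstrndx};
  bool PhEscaped = H.e_phnum == PN_XNUM;
  bool ShEscaped = H.e_shnum == 0 && H.e_shoff != 0;
  bool StrEscaped = H.e_shstrndx == SHN_XINDEX;
  if (!PhEscaped && !ShEscaped && !StrEscaped)
    return R; // Plain 16-bit values; section 0 is never consulted.
  if (!S0)
    return std::nullopt;
  if (PhEscaped)
    R.PhNum = S0->sh_info;
  if (ShEscaped)
    R.ShNum = S0->sh_size;
  if (StrEscaped)
    R.ShStrNdx = S0->sh_link;
  return R;
}
```

Keeping the three escape checks independent means a file that extends only one field never has its other fields reinterpreted.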

@MaskRay
Member

MaskRay commented Oct 8, 2025

Please remove the lld/ELF change. For linker output we don't intend to support the PN_XNUM program header feature. We also don't accept features without a test.

Link: https://groups.google.com/g/generic-abi/c/-J3lNY8ZKkU ("PN_XNUM extension for program headers")

Suggested title: [Object,ELF] Implement PN_XNUM extension for program headers

Collaborator

@jh7370 jh7370 left a comment

Thanks for the explanation. I agree that there's value in having the program header count extension in the parsing and dumping code.

In addition to removing the lld changes, I think this change warrants being split into at least two PRs: one (or more) that does the refactoring (if needed) so that extended program headers and section headers can be handled together, and another that actually implements the program header functionality. You will also probably want a separate change specifically for llvm-objcopy, so that it can write objects containing many program headers.
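For the llvm-objcopy follow-up, the writer needs the inverse of the parsing logic. A minimal sketch with hypothetical struct and function names (not LLVM's API), using the gABI thresholds: a program header count at or above PN_XNUM, or a section count or string table index at or above SHN_LORESERVE, is replaced by an escape value in the ELF header and the real value is stored in section 0.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint16_t PN_XNUM = 0xffff;
constexpr uint16_t SHN_LORESERVE = 0xff00;
constexpr uint16_t SHN_XINDEX = 0xffff;

// Hypothetical header subsets; a real writer would fill the full structs.
struct EhdrCounts {
  uint16_t e_phnum, e_shnum, e_shstrndx;
};
struct Section0 {
  uint64_t sh_size;
  uint32_t sh_link, sh_info;
};

// Encodes the real counts into the 16-bit header fields, spilling values that
// do not fit into section 0. Assumes the writer always emits a section header
// table (non-zero e_shoff) whenever any value spills, so section 0 exists.
void encodeCounts(uint32_t PhNum, uint64_t ShNum, uint32_t ShStrNdx,
                  EhdrCounts &H, Section0 &S0) {
  S0 = Section0{0, 0, 0};
  if (PhNum >= PN_XNUM) {
    H.e_phnum = PN_XNUM;
    S0.sh_info = PhNum;
  } else {
    H.e_phnum = static_cast<uint16_t>(PhNum);
  }
  if (ShNum >= SHN_LORESERVE) {
    H.e_shnum = 0;
    S0.sh_size = ShNum;
  } else {
    H.e_shnum = static_cast<uint16_t>(ShNum);
  }
  if (ShStrNdx >= SHN_LORESERVE) {
    H.e_shstrndx = SHN_XINDEX;
    S0.sh_link = ShStrNdx;
  } else {
    H.e_shstrndx = static_cast<uint16_t>(ShStrNdx);
  }
}
```

The second case in the usage below mirrors the many-sections test, which expects e_shnum == 0 and e_shstrndx == SHN_XINDEX.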

@@ -278,9 +278,17 @@ class ELFFile {
std::vector<Elf_Shdr> FakeSections;
SmallString<0> FakeSectionStrings;

// Handle extended header in section 0
Collaborator

This comment doesn't really add anything.

Contributor Author

Fixed. Thanks!

Comment on lines 282 to 284
Elf_Word e_phnum;
Elf_Word e_shnum;
Elf_Word e_shstrndx;
Collaborator

Since these values aren't the raw values in the header, let's not confuse things by naming them after those fields.

Contributor Author

Fixed. Thanks!

@@ -889,15 +896,42 @@ Expected<uint64_t> ELFFile<ELFT>::getDynSymtabSize() const {
return 0;
}

-template <class ELFT> ELFFile<ELFT>::ELFFile(StringRef Object) : Buf(Object) {}
+template <class ELFT> ELFFile<ELFT>::ELFFile(StringRef Object) : Buf(Object) {
+  auto Header = getHeader();
Collaborator

Please refer to the LLVM style guide about using auto.

Contributor Author

Fixed. Thanks!

@@ -529,6 +529,11 @@ struct Elf_Ehdr_Impl {

unsigned char getFileClass() const { return e_ident[ELF::EI_CLASS]; }
unsigned char getDataEncoding() const { return e_ident[ELF::EI_DATA]; }
bool HasHeaderExtension() const {
Collaborator

Please review the LLVM style guide regarding naming conventions.

Contributor Author

Fixed. Thanks!

Member

@MaskRay MaskRay Oct 9, 2025

header isn't clear: it could mean the ELF header, program header, or section header.

perhaps hasPhdrNumExtension

0xFFFF => PN_XNUM, a new constant in include/llvm/BinaryFormat/ELF.h

The condition (e_phnum == 0xFFFF || e_shnum == ELF::SHN_UNDEF || ELF::SHN_XINDEX == e_phnum) && e_shoff != 0; doesn't look correct.
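One reading of what is wrong with that condition: the third clause compares e_phnum against SHN_XINDEX, which repeats the first clause (both are 0xffff), so e_shstrndx is never examined; and a single combined flag cannot tell callers which field to resolve. A minimal sketch of per-field predicates, assuming the PN_XNUM constant suggested above; the Ehdr struct and all function names other than hasPhdrNumExtension are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Proposed constant; the value 0xffff comes from the PN_XNUM extension.
constexpr uint16_t PN_XNUM = 0xffff;
constexpr uint16_t SHN_UNDEF = 0;
constexpr uint16_t SHN_XINDEX = 0xffff;

// Hypothetical subset of Elf64_Ehdr.
struct Ehdr {
  uint64_t e_shoff;
  uint16_t e_phnum, e_shnum, e_shstrndx;
};

// The condition as written in the patch, kept only for comparison.
bool patchCondition(const Ehdr &H) {
  return (H.e_phnum == 0xFFFF || H.e_shnum == SHN_UNDEF ||
          SHN_XINDEX == H.e_phnum) &&
         H.e_shoff != 0;
}

// One possible split into independent, per-field predicates.
bool hasPhdrNumExtension(const Ehdr &H) { return H.e_phnum == PN_XNUM; }
bool hasShdrNumExtension(const Ehdr &H) {
  return H.e_shnum == SHN_UNDEF && H.e_shoff != 0;
}
bool hasShStrNdxExtension(const Ehdr &H) {
  return H.e_shstrndx == SHN_XINDEX;
}
```

In the usage below, a header whose only escaped field is e_shstrndx is missed by the combined condition but detected by hasShStrNdxExtension.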

Contributor Author

@aokblast aokblast Oct 9, 2025

Hello, the last issue has been fixed locally. I am still working through problems in objcopy, so it may take some time before I upload it.

ELFFile Result(Object);

//
// sections() parse the total number of sections by considering the
Collaborator

This comment doesn't seem to make sense to me.

Contributor Author

Fixed. Thanks!

ELFFile(StringRef Object);

public:
const Elf_Word getPhNum() const { return e_phnum; }
Collaborator

const on a value return type doesn't make sense.

Contributor Author

Fixed. Thanks!

@@ -772,7 +779,7 @@ template <class ELFT>
Expected<StringRef>
ELFFile<ELFT>::getSectionStringTable(Elf_Shdr_Range Sections,
WarningHandler WarnHandler) const {
-  uint32_t Index = getHeader().e_shstrndx;
+  uint32_t Index = e_shstrndx;
Collaborator

Changes like this seem a bit muddled to me, in that I'd expect e_shstrndx to have been resolved to the real index as part of constructing ELFFile, meaning you don't then need to compare it against SHN_XINDEX below.

It suggests to me that either you've got code left around from earlier versions of the change, or you aren't resolving the real values early enough. I don't want to end up in a situation where people will have to study the code to know whether e_shstrndx etc needs to be checked against SHN_XINDEX. This may mean avoiding calls to certain functions during the ELFFile constructor, at least until this resolution has taken place.

Contributor Author

Sorry, my mistake. I forgot to remove the following if condition entirely. I originally thought that we could not call certain functions, such as those returning Expected<>, in the constructor. Everything else is safe after the assignment in create().
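The pattern discussed here, where the constructor cannot fail and a static factory performs the fallible resolution, can be sketched as follows. This is a minimal illustration only: File, create and the std::variant error channel are hypothetical stand-ins for ELFFile<ELFT>::create and Expected<>. The counts are resolved before the object is handed out, so no caller ever needs to compare against SHN_XINDEX.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <variant>

// Hypothetical header subsets.
struct Ehdr {
  uint64_t e_shoff;
  uint16_t e_phnum, e_shnum, e_shstrndx;
};
struct Section0 {
  uint64_t sh_size;
  uint32_t sh_link, sh_info;
};

class File {
public:
  // All fallible work lives here, so the constructor itself never fails.
  static std::variant<File, std::string>
  create(const Ehdr &H, const std::optional<Section0> &S0) {
    bool PhEscaped = H.e_phnum == 0xffff;                  // PN_XNUM
    bool ShEscaped = H.e_shnum == 0 && H.e_shoff != 0;
    bool StrEscaped = H.e_shstrndx == 0xffff;              // SHN_XINDEX
    if ((PhEscaped || ShEscaped || StrEscaped) && !S0)
      return std::string("extended numbering used but section 0 is unreadable");
    File F(H);
    if (PhEscaped)
      F.PhNum = S0->sh_info;
    if (ShEscaped)
      F.ShNum = S0->sh_size;
    if (StrEscaped)
      F.ShStrNdx = S0->sh_link;
    return F;
  }

  // Callers only ever see resolved values.
  uint64_t getPhNum() const { return PhNum; }
  uint64_t getShNum() const { return ShNum; }
  uint32_t getShStrNdx() const { return ShStrNdx; }

private:
  explicit File(const Ehdr &H)
      : PhNum(H.e_phnum), ShNum(H.e_shnum), ShStrNdx(H.e_shstrndx) {}
  uint64_t PhNum, ShNum;
  uint32_t ShStrNdx;
};
```

The trade-off is that construction now fails for files whose section 0 is unreadable, which is the early-error behaviour questioned in the review.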

@@ -556,7 +556,7 @@ Sections:
# RUN: yaml2obj --docnum=25 %s -o %t25
# RUN: not llvm-readobj -h %t25 2>&1 | FileCheck -DFILE=%t25 --check-prefix=INVALID-SEC-NUM1 %s

-# INVALID-SEC-NUM1: error: '[[FILE]]': unable to continue dumping, the file is corrupt: invalid section header table offset (e_shoff = 0x58) or invalid number of sections specified in the first section header's sh_size field (0x3ffffffffffffff)
+# INVALID-SEC-NUM1: error: '[[FILE]]': invalid section header table offset (e_shoff = 0x58) or invalid number of sections specified in the first section header's sh_size field (0x3ffffffffffffff)
Collaborator

I'd like to preserve the status quo, if possible, regarding the error messages. The refactoring should be entirely non-functional from an end user perspective.

Contributor Author

@aokblast aokblast Oct 8, 2025

Hmm, I am a little uncertain about this. The ELFFile::sections() error can now be detected while ELFFile is being initialized, since we actually try to read the section headers at that point. As a result, some errors, like this one, are reported earlier: previously this error occurred partway through dumping rather than while constructing ELFFile. Should we silently ignore it during construction, or report it right away? If we ignore it, we preserve the original behavior and let a later step handle it, but we end up creating an incorrect ELFFile. If we want to emit the error from ELFFile, we need modifications like the one here.
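Whichever point the error is reported at, the check itself is a bounds test on the section count taken from section 0. The values in the INVALID-SEC-NUM1 test show why it must be written carefully: with e_shoff = 0x58, a count of 0x3ffffffffffffff and 64-byte entries, a naive e_shoff + count * entsize wraps around to 0x18 in 64-bit arithmetic and would pass. A minimal overflow-safe sketch; sectionTableFits is a hypothetical name, not an LLVM function.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if a table of NumSections entries of EntSize bytes each,
// starting at ShOff, lies entirely inside a file of FileSize bytes.
// Division is used instead of multiplication so nothing can overflow.
bool sectionTableFits(uint64_t ShOff, uint64_t NumSections, uint64_t EntSize,
                      uint64_t FileSize) {
  assert(EntSize != 0 && "entry size must be validated separately");
  if (ShOff > FileSize)
    return false;
  uint64_t Avail = FileSize - ShOff;
  return NumSections <= Avail / EntSize;
}
```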

Collaborator

Can we lazily populate the information?

One of the issues we've had in the past with llvm-readobj and other dumping tools is it would error out early because it couldn't read some part of the file, long before it actually needed to display the relevant file information, which meant in turn it became hard to impossible to diagnose the problem when a file was malformed in some way. In practice, when we dump the file headers, whether the entire section header table is readable is irrelevant, for example. We should only need to read the 0th section header to do certain things. I hope that makes sense.
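A minimal sketch of the lazy approach, assuming a callback that reads section 0 on demand (all names are hypothetical, not LLVM API). The raw header stays available even when the section header table is unreadable, and section 0 is read at most once, only when an escaped count is actually requested. getShStrNdx would follow the same shape and is omitted for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>

// Hypothetical header subsets.
struct Ehdr {
  uint64_t e_shoff;
  uint16_t e_phnum, e_shnum, e_shstrndx;
};
struct Section0 {
  uint64_t sh_size;
  uint32_t sh_link, sh_info;
};

class LazyCounts {
public:
  using Reader = std::function<std::optional<Section0>()>;
  LazyCounts(Ehdr Hdr, Reader R) : H(Hdr), ReadS0(std::move(R)) {}

  // Dumping the file header never touches the section header table.
  const Ehdr &rawHeader() const { return H; }

  // std::nullopt means "escaped, but section 0 could not be read".
  std::optional<uint64_t> getPhNum() {
    if (H.e_phnum != 0xffff) // not PN_XNUM
      return H.e_phnum;
    const std::optional<Section0> &S0 = section0();
    if (!S0)
      return std::nullopt;
    return S0->sh_info;
  }
  std::optional<uint64_t> getShNum() {
    if (H.e_shnum != 0 || H.e_shoff == 0)
      return H.e_shnum;
    const std::optional<Section0> &S0 = section0();
    if (!S0)
      return std::nullopt;
    return S0->sh_size;
  }

private:
  // Reads and caches section 0 on first use (including a failed read).
  const std::optional<Section0> &section0() {
    if (!Loaded) {
      Cached = ReadS0();
      Loaded = true;
    }
    return Cached;
  }

  Ehdr H;
  Reader ReadS0;
  bool Loaded = false;
  std::optional<Section0> Cached;
};
```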

@@ -6,7 +6,7 @@ RUN: llvm-readobj --file-headers --sections --symbols %t2 | FileCheck %s
RUN: llvm-readelf --symbols %t2 | FileCheck --check-prefix=SYMS %s

## The ELF header should have e_shnum == 0 and e_shstrndx == SHN_XINDEX.
-# CHECK: SectionHeaderCount: 0
+# CHECK: SectionHeaderCount: 0 (65540)
Collaborator

Is this change actually related to the code changes you've made, or is it just tightening up the test and is already the current behaviour? If the latter, please put it in a separate PR.

Contributor Author

Yes, it is part of the readelf (readobj) change. I will move it to a separate PR. Thanks!

@@ -143,64 +143,67 @@ FileHeader:
# RUN: yaml2obj %s --docnum=4 -o %t.invalid1
# RUN: not llvm-readobj --file-headers %t.invalid1 2>&1 \
# RUN: | FileCheck %s --implicit-check-not=warning: -DFILE=%t.invalid1 \
-# RUN: -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-LLVM
+# RUN: -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-LLVM-TC1
Collaborator

-TC1 is a meaningless suffix. Instead, rename the whole prefix to better describe the specific case that you're interested in, e.g. something like BAD-SHOFF-LLVM. I'd also add a comment at the start of each test case explaining what the case is testing.

In an ELF file, there is an optional extended header for phnum, shnum,
and shstrndx values that do not fit in 16 bits. This extended header
uses section 0 to record these fields in at least 32 bits. We implement this
feature so that programs relying on ELFFile::program_headers() get the
correct number of segments. Consumers also no longer have to check
section 0 themselves; instead, they can use getPhNum().
@aokblast
Contributor Author

aokblast commented Oct 9, 2025

It seems that GitHub does not support submitting a patch based on another branch in my local tree, so I sent another PR with two commits: #162648.

BTW, should I provide tests for this patch even though the readobj patch already tests it?

Also, thank you both for helping review my patch!

@aokblast aokblast changed the title [Object][ELF] Support extended header for Object Parser in ELF [Object,ELF] Impelment PN_XNUM extension for program headers Oct 10, 2025