[Object,ELF] Impelment PN_XNUM extension for program headers #162288

aokblast · 2025-10-07T14:06:43Z

In ELF file, there is a possible extended header for those phnum, shnum,
and shstrndx larger than the maximum of 16 bits. This extended header
use section 0 to record these fields in 32 bits.

We implment this feature so that programs rely on ELFFile::program_headers() can get the
correct number of segments. Also, the consumers don't have to check the
section 0 themselve, insteead, they can use the getPhNum() as an
alternative.

llvmbot · 2025-10-07T14:46:27Z

@llvm/pr-subscribers-llvm-binary-utilities

@llvm/pr-subscribers-lld

Author: None (aokblast)

Changes

In ELF file, there is a possible extended header for those phnum, shnum, and shstrndx larger than the maximum of 16 bits. This extended header use section 0 to record these fields in 32 bits. For most of the ELF writers like lld, we already have the mechanism to synthesis this special section 0. However, the parser part don't have such infra and therefore we add it.

Also, we modify some test cases. For those expected-error test cases, their error emission get early. For those expected-correct test cases, we modify the output since we support more than 65535 sections now.

Patch is 21.57 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/162288.diff

8 Files Affected:

(modified) lld/ELF/SyntheticSections.cpp (-1)
(modified) lld/ELF/Writer.cpp (+10-1)
(modified) llvm/include/llvm/Object/ELF.h (+43-10)
(modified) llvm/include/llvm/Object/ELFTypes.h (+5)
(modified) llvm/test/Object/invalid.test (+2-2)
(modified) llvm/test/tools/llvm-objcopy/ELF/many-sections.test (+1-1)
(modified) llvm/test/tools/llvm-readobj/ELF/file-headers.test (+63-60)
(modified) llvm/tools/llvm-readobj/ELFDumper.cpp (+19-13)

diff --git a/lld/ELF/SyntheticSections.cpp b/lld/ELF/SyntheticSections.cpp
index bbf4b29a9fda58..71294782d9a31d 100644
--- a/lld/ELF/SyntheticSections.cpp
+++ b/lld/ELF/SyntheticSections.cpp
@@ -4428,7 +4428,6 @@ void elf::writeEhdr(Ctx &ctx, uint8_t *buf, Partition &part) {
   eHdr->e_version = EV_CURRENT;
   eHdr->e_flags = ctx.arg.eflags;
   eHdr->e_ehsize = sizeof(typename ELFT::Ehdr);
-  eHdr->e_phnum = part.phdrs.size();
   eHdr->e_shentsize = sizeof(typename ELFT::Shdr);
 
   if (!ctx.arg.relocatable) {
diff --git a/lld/ELF/Writer.cpp b/lld/ELF/Writer.cpp
index 4fa80397cbfa74..713d8279aab541 100644
--- a/lld/ELF/Writer.cpp
+++ b/lld/ELF/Writer.cpp
@@ -2907,7 +2907,8 @@ template <class ELFT> void Writer<ELFT>::writeHeader() {
   // the value. The sentinel values and fields are:
   // e_shnum = 0, SHdrs[0].sh_size = number of sections.
   // e_shstrndx = SHN_XINDEX, SHdrs[0].sh_link = .shstrtab section index.
-  auto *sHdrs = reinterpret_cast<Elf_Shdr *>(ctx.bufferStart + eHdr->e_shoff);
+  // e_phnum = 0xFFFF, SHdrs[0]
+  auto *sHdrs = reinterpret_cast<Elf_Shdr *>(ctx.bufferStart + eHdr->e_smhoff);
   size_t num = ctx.outputSections.size() + 1;
   if (num >= SHN_LORESERVE)
     sHdrs->sh_size = num;
@@ -2922,6 +2923,14 @@ template <class ELFT> void Writer<ELFT>::writeHeader() {
     eHdr->e_shstrndx = strTabIndex;
   }
 
+  num = part.phdrs.size();
+  if (num >= 0xFFFF) {
+    eHdr->e_phnum = 0xFFFF;
+    sHdrs->sh_info = num;
+  } else {
+    eHdr->e_phnum = num;
+  }
+
   for (OutputSection *sec : ctx.outputSections)
     sec->writeHeaderTo<ELFT>(++sHdrs);
 }
diff --git a/llvm/include/llvm/Object/ELF.h b/llvm/include/llvm/Object/ELF.h
index 59f63eb6b5bb69..4374924371d1cd 100644
--- a/llvm/include/llvm/Object/ELF.h
+++ b/llvm/include/llvm/Object/ELF.h
@@ -278,9 +278,17 @@ class ELFFile {
   std::vector<Elf_Shdr> FakeSections;
   SmallString<0> FakeSectionStrings;
 
+  // Handle extended header in section 0
+  Elf_Word e_phnum;
+  Elf_Word e_shnum;
+  Elf_Word e_shstrndx;
+
   ELFFile(StringRef Object);
 
 public:
+  const Elf_Word getPhNum() const { return e_phnum; }
+  const Elf_Word getShNum() const { return e_shnum; }
+  const Elf_Word getShStrNdx() const { return e_shstrndx; }
   const Elf_Ehdr &getHeader() const {
     return *reinterpret_cast<const Elf_Ehdr *>(base());
   }
@@ -379,22 +387,21 @@ class ELFFile {
 
   /// Iterate over program header table.
   Expected<Elf_Phdr_Range> program_headers() const {
-    if (getHeader().e_phnum && getHeader().e_phentsize != sizeof(Elf_Phdr))
+    if (e_phnum && getHeader().e_phentsize != sizeof(Elf_Phdr))
       return createError("invalid e_phentsize: " +
                          Twine(getHeader().e_phentsize));
 
-    uint64_t HeadersSize =
-        (uint64_t)getHeader().e_phnum * getHeader().e_phentsize;
+    uint64_t HeadersSize = (uint64_t)e_phnum * getHeader().e_phentsize;
     uint64_t PhOff = getHeader().e_phoff;
     if (PhOff + HeadersSize < PhOff || PhOff + HeadersSize > getBufSize())
       return createError("program headers are longer than binary of size " +
                          Twine(getBufSize()) + ": e_phoff = 0x" +
                          Twine::utohexstr(getHeader().e_phoff) +
-                         ", e_phnum = " + Twine(getHeader().e_phnum) +
+                         ", e_phnum = " + Twine(e_phnum) +
                          ", e_phentsize = " + Twine(getHeader().e_phentsize));
 
     auto *Begin = reinterpret_cast<const Elf_Phdr *>(base() + PhOff);
-    return ArrayRef(Begin, Begin + getHeader().e_phnum);
+    return ArrayRef(Begin, Begin + e_phnum);
   }
 
   /// Get an iterator over notes in a program header.
@@ -772,7 +779,7 @@ template <class ELFT>
 Expected<StringRef>
 ELFFile<ELFT>::getSectionStringTable(Elf_Shdr_Range Sections,
                                      WarningHandler WarnHandler) const {
-  uint32_t Index = getHeader().e_shstrndx;
+  uint32_t Index = e_shstrndx;
   if (Index == ELF::SHN_XINDEX) {
     // If the section name string table section index is greater than
     // or equal to SHN_LORESERVE, then the actual index of the section name
@@ -889,7 +896,12 @@ Expected<uint64_t> ELFFile<ELFT>::getDynSymtabSize() const {
   return 0;
 }
 
-template <class ELFT> ELFFile<ELFT>::ELFFile(StringRef Object) : Buf(Object) {}
+template <class ELFT> ELFFile<ELFT>::ELFFile(StringRef Object) : Buf(Object) {
+  auto Header = getHeader();
+  e_phnum = Header.e_phnum;
+  e_shnum = Header.e_shnum;
+  e_shstrndx = Header.e_shstrndx;
+}
 
 template <class ELFT>
 Expected<ELFFile<ELFT>> ELFFile<ELFT>::create(StringRef Object) {
@@ -897,7 +909,29 @@ Expected<ELFFile<ELFT>> ELFFile<ELFT>::create(StringRef Object) {
     return createError("invalid buffer: the size (" + Twine(Object.size()) +
                        ") is smaller than an ELF header (" +
                        Twine(sizeof(Elf_Ehdr)) + ")");
-  return ELFFile(Object);
+  ELFFile Result(Object);
+
+  //
+  // sections() parse the total number of sections by considering the
+  // extended headers
+  //
+  if (Result.getHeader().HasHeaderExtension()) {
+    auto TableOrErr = Result.sections();
+    if (!TableOrErr)
+      return TableOrErr.takeError();
+    if ((*TableOrErr).size() == 0)
+      return Result;
+    auto SecOrErr = object::getSection<ELFT>(*TableOrErr, 0);
+    if (!SecOrErr)
+      return SecOrErr.takeError();
+    if (Result.e_phnum == 0xFFFF)
+      Result.e_phnum = (*SecOrErr)->sh_info;
+    if (Result.e_shnum == ELF::SHN_UNDEF)
+      Result.e_shnum = (*SecOrErr)->sh_size;
+    if (Result.e_shstrndx == ELF::SHN_XINDEX)
+      Result.e_shstrndx = (*SecOrErr)->sh_link;
+  }
+  return Result;
 }
 
 /// Used by llvm-objdump -d (which needs sections for disassembly) to
@@ -940,7 +974,6 @@ Expected<typename ELFT::ShdrRange> ELFFile<ELFT>::sections() const {
   if (getHeader().e_shentsize != sizeof(Elf_Shdr))
     return createError("invalid e_shentsize in ELF header: " +
                        Twine(getHeader().e_shentsize));
-
   const uint64_t FileSize = Buf.size();
   if (SectionTableOffset + sizeof(Elf_Shdr) > FileSize ||
       SectionTableOffset + (uintX_t)sizeof(Elf_Shdr) < SectionTableOffset)
@@ -956,7 +989,7 @@ Expected<typename ELFT::ShdrRange> ELFFile<ELFT>::sections() const {
   const Elf_Shdr *First =
       reinterpret_cast<const Elf_Shdr *>(base() + SectionTableOffset);
 
-  uintX_t NumSections = getHeader().e_shnum;
+  uintX_t NumSections = e_shnum;
   if (NumSections == 0)
     NumSections = First->sh_size;
 
diff --git a/llvm/include/llvm/Object/ELFTypes.h b/llvm/include/llvm/Object/ELFTypes.h
index 5a26e2fc314586..232f6be9b4c498 100644
--- a/llvm/include/llvm/Object/ELFTypes.h
+++ b/llvm/include/llvm/Object/ELFTypes.h
@@ -529,6 +529,11 @@ struct Elf_Ehdr_Impl {
 
   unsigned char getFileClass() const { return e_ident[ELF::EI_CLASS]; }
   unsigned char getDataEncoding() const { return e_ident[ELF::EI_DATA]; }
+  bool HasHeaderExtension() const {
+    return (e_phnum == 0xFFFF || e_shnum == ELF::SHN_UNDEF ||
+            ELF::SHN_XINDEX == e_phnum) &&
+           e_shoff != 0;
+  }
 };
 
 template <endianness Endianness>
diff --git a/llvm/test/Object/invalid.test b/llvm/test/Object/invalid.test
index 58ec3cbeadd192..2bf23b45cdbb8d 100644
--- a/llvm/test/Object/invalid.test
+++ b/llvm/test/Object/invalid.test
@@ -556,7 +556,7 @@ Sections:
 # RUN: yaml2obj --docnum=25 %s -o %t25
 # RUN: not llvm-readobj -h %t25 2>&1 | FileCheck -DFILE=%t25 --check-prefix=INVALID-SEC-NUM1 %s
 
-# INVALID-SEC-NUM1: error: '[[FILE]]': unable to continue dumping, the file is corrupt: invalid section header table offset (e_shoff = 0x58) or invalid number of sections specified in the first section header's sh_size field (0x3ffffffffffffff)
+# INVALID-SEC-NUM1: error: '[[FILE]]': invalid section header table offset (e_shoff = 0x58) or invalid number of sections specified in the first section header's sh_size field (0x3ffffffffffffff)
 
 --- !ELF
 FileHeader:
@@ -575,7 +575,7 @@ Sections:
 # RUN: yaml2obj --docnum=26 %s -o %t26
 # RUN: not llvm-readobj -h %t26 2>&1 | FileCheck -DFILE=%t26 --check-prefix=INVALID-SEC-NUM2 %s
 
-# INVALID-SEC-NUM2: error: '[[FILE]]': unable to continue dumping, the file is corrupt: invalid number of sections specified in the NULL section's sh_size field (288230376151711744)
+# INVALID-SEC-NUM2: error: '[[FILE]]': invalid number of sections specified in the NULL section's sh_size field (288230376151711744)
 
 --- !ELF
 FileHeader:
diff --git a/llvm/test/tools/llvm-objcopy/ELF/many-sections.test b/llvm/test/tools/llvm-objcopy/ELF/many-sections.test
index 6622db237026fa..8b49454f985785 100644
--- a/llvm/test/tools/llvm-objcopy/ELF/many-sections.test
+++ b/llvm/test/tools/llvm-objcopy/ELF/many-sections.test
@@ -6,7 +6,7 @@ RUN: llvm-readobj --file-headers --sections --symbols %t2 | FileCheck %s
 RUN: llvm-readelf --symbols %t2 | FileCheck --check-prefix=SYMS %s
 
 ## The ELF header should have e_shnum == 0 and e_shstrndx == SHN_XINDEX.
-# CHECK:        SectionHeaderCount: 0
+# CHECK:        SectionHeaderCount: 0 (65540)
 # CHECK-NEXT:   StringTableSectionIndex: 65535
 
 ## The first section header should store the real section header count and
diff --git a/llvm/test/tools/llvm-readobj/ELF/file-headers.test b/llvm/test/tools/llvm-readobj/ELF/file-headers.test
index 97ab9f092b2287..d2fbed1b756564 100644
--- a/llvm/test/tools/llvm-readobj/ELF/file-headers.test
+++ b/llvm/test/tools/llvm-readobj/ELF/file-headers.test
@@ -143,64 +143,67 @@ FileHeader:
 # RUN: yaml2obj %s --docnum=4 -o %t.invalid1
 # RUN: not llvm-readobj --file-headers %t.invalid1 2>&1 \
 # RUN:  | FileCheck %s --implicit-check-not=warning: -DFILE=%t.invalid1 \
-# RUN:    -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-LLVM
+# RUN:    -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-LLVM-TC1
 # RUN: not llvm-readelf --file-headers %t.invalid1 2>&1 \
 # RUN:  | FileCheck %s --implicit-check-not=warning: -DFILE=%t.invalid1 \
-# RUN:    -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-GNU
-
-# INVALID-LLVM:      File: [[FILE]]
-# INVALID-LLVM-NEXT: Format: elf64-unknown
-# INVALID-LLVM-NEXT: Arch: unknown
-# INVALID-LLVM-NEXT: AddressSize: 64bit
-# INVALID-LLVM-NEXT: LoadName: <Not found>
-# INVALID-LLVM-NEXT: ElfHeader {
-# INVALID-LLVM-NEXT:   Ident {
-# INVALID-LLVM-NEXT:     Magic: (7F 45 4C 46)
-# INVALID-LLVM-NEXT:     Class: 64-bit (0x2)
-# INVALID-LLVM-NEXT:     DataEncoding: LittleEndian (0x1)
-# INVALID-LLVM-NEXT:     FileVersion: 1
-# INVALID-LLVM-NEXT:     OS/ABI: SystemV (0x0)
-# INVALID-LLVM-NEXT:     ABIVersion: 0
-# INVALID-LLVM-NEXT:     Unused: (00 00 00 00 00 00 00)
-# INVALID-LLVM-NEXT:   }
-# INVALID-LLVM-NEXT:   Type: Relocatable (0x1)
-# INVALID-LLVM-NEXT:   Machine: EM_NONE (0x0)
-# INVALID-LLVM-NEXT:   Version: 1
-# INVALID-LLVM-NEXT:   Entry: 0x0
-# INVALID-LLVM-NEXT:   ProgramHeaderOffset: 0x0
-# INVALID-LLVM-NEXT:   SectionHeaderOffset: 0x1000
-# INVALID-LLVM-NEXT:   Flags [ (0x0)
-# INVALID-LLVM-NEXT:   ]
-# INVALID-LLVM-NEXT:   HeaderSize: 64
-# INVALID-LLVM-NEXT:   ProgramHeaderEntrySize: 0
-# INVALID-LLVM-NEXT:   ProgramHeaderCount: 0
-# INVALID-LLVM-NEXT:   SectionHeaderEntrySize: 64
-# INVALID-LLVM-NEXT:   SectionHeaderCount: [[SECHDRCOUNT]]
-# INVALID-LLVM-NEXT:   StringTableSectionIndex: [[SECHDRSTRTABINDEX]]
-# INVALID-LLVM-NEXT: }
-# INVALID-LLVM-NEXT: error: '[[FILE]]': unable to continue dumping, the file is corrupt: section header table goes past the end of the file: e_shoff = 0x1000
-
-# INVALID-GNU:      ELF Header:
-# INVALID-GNU-NEXT:   Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
-# INVALID-GNU-NEXT:   Class:                             ELF64
-# INVALID-GNU-NEXT:   Data:                              2's complement, little endian
-# INVALID-GNU-NEXT:   Version:                           1 (current)
-# INVALID-GNU-NEXT:   OS/ABI:                            UNIX - System V
-# INVALID-GNU-NEXT:   ABI Version:                       0
-# INVALID-GNU-NEXT:   Type:                              REL (Relocatable file)
-# INVALID-GNU-NEXT:   Machine:                           None
-# INVALID-GNU-NEXT:   Version:                           0x1
-# INVALID-GNU-NEXT:   Entry point address:               0x0
-# INVALID-GNU-NEXT:   Start of program headers:          0 (bytes into file)
-# INVALID-GNU-NEXT:   Start of section headers:          4096 (bytes into file)
-# INVALID-GNU-NEXT:   Flags:                             0x0
-# INVALID-GNU-NEXT:   Size of this header:               64 (bytes)
-# INVALID-GNU-NEXT:   Size of program headers:           0 (bytes)
-# INVALID-GNU-NEXT:   Number of program headers:         0
-# INVALID-GNU-NEXT:   Size of section headers:           64 (bytes)
-# INVALID-GNU-NEXT:   Number of section headers:         [[SECHDRCOUNT]]
-# INVALID-GNU-NEXT:   Section header string table index: [[SECHDRSTRTABINDEX]]
-# INVALID-GNU-NEXT: error: '[[FILE]]': unable to continue dumping, the file is corrupt: section header table goes past the end of the file: e_shoff = 0x1000
+# RUN:    -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-GNU-TC1
+
+# INVALID-LLVM-TC1:      File: [[FILE]]
+# INVALID-LLVM-TC1-NEXT: Format: elf64-unknown
+# INVALID-LLVM-TC1-NEXT: Arch: unknown
+# INVALID-LLVM-TC1-NEXT: AddressSize: 64bit
+# INVALID-LLVM-TC1-NEXT: LoadName: <Not found>
+# INVALID-LLVM-TC1-NEXT: ElfHeader {
+# INVALID-LLVM-TC1-NEXT:   Ident {
+# INVALID-LLVM-TC1-NEXT:     Magic: (7F 45 4C 46)
+# INVALID-LLVM-TC1-NEXT:     Class: 64-bit (0x2)
+# INVALID-LLVM-TC1-NEXT:     DataEncoding: LittleEndian (0x1)
+# INVALID-LLVM-TC1-NEXT:     FileVersion: 1
+# INVALID-LLVM-TC1-NEXT:     OS/ABI: SystemV (0x0)
+# INVALID-LLVM-TC1-NEXT:     ABIVersion: 0
+# INVALID-LLVM-TC1-NEXT:     Unused: (00 00 00 00 00 00 00)
+# INVALID-LLVM-TC1-NEXT:   }
+# INVALID-LLVM-TC1-NEXT:   Type: Relocatable (0x1)
+# INVALID-LLVM-TC1-NEXT:   Machine: EM_NONE (0x0)
+# INVALID-LLVM-TC1-NEXT:   Version: 1
+# INVALID-LLVM-TC1-NEXT:   Entry: 0x0
+# INVALID-LLVM-TC1-NEXT:   ProgramHeaderOffset: 0x0
+# INVALID-LLVM-TC1-NEXT:   SectionHeaderOffset: 0x1000
+# INVALID-LLVM-TC1-NEXT:   Flags [ (0x0)
+# INVALID-LLVM-TC1-NEXT:   ]
+# INVALID-LLVM-TC1-NEXT:   HeaderSize: 64
+# INVALID-LLVM-TC1-NEXT:   ProgramHeaderEntrySize: 0
+# INVALID-LLVM-TC1-NEXT:   ProgramHeaderCount: 0
+# INVALID-LLVM-TC1-NEXT:   SectionHeaderEntrySize: 64
+# INVALID-LLVM-TC1-NEXT:   SectionHeaderCount: [[SECHDRCOUNT]]
+# INVALID-LLVM-TC1-NEXT:   StringTableSectionIndex: [[SECHDRSTRTABINDEX]]
+# INVALID-LLVM-TC1-NEXT: }
+# INVALID-LLVM-TC1-NEXT: error: '[[FILE]]': unable to continue dumping, the file is corrupt: section header table goes past the end of the file: e_shoff = 0x1000
+
+# INVALID-GNU-TC1:      ELF Header:
+# INVALID-GNU-TC1-NEXT:   Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
+# INVALID-GNU-TC1-NEXT:   Class:                             ELF64
+# INVALID-GNU-TC1-NEXT:   Data:                              2's complement, little endian
+# INVALID-GNU-TC1-NEXT:   Version:                           1 (current)
+# INVALID-GNU-TC1-NEXT:   OS/ABI:                            UNIX - System V
+# INVALID-GNU-TC1-NEXT:   ABI Version:                       0
+# INVALID-GNU-TC1-NEXT:   Type:                              REL (Relocatable file)
+# INVALID-GNU-TC1-NEXT:   Machine:                           None
+# INVALID-GNU-TC1-NEXT:   Version:                           0x1
+# INVALID-GNU-TC1-NEXT:   Entry point address:               0x0
+# INVALID-GNU-TC1-NEXT:   Start of program headers:          0 (bytes into file)
+# INVALID-GNU-TC1-NEXT:   Start of section headers:          4096 (bytes into file)
+# INVALID-GNU-TC1-NEXT:   Flags:                             0x0
+# INVALID-GNU-TC1-NEXT:   Size of this header:               64 (bytes)
+# INVALID-GNU-TC1-NEXT:   Size of program headers:           0 (bytes)
+# INVALID-GNU-TC1-NEXT:   Number of program headers:         0
+# INVALID-GNU-TC1-NEXT:   Size of section headers:           64 (bytes)
+# INVALID-GNU-TC1-NEXT:   Number of section headers:         [[SECHDRCOUNT]]
+# INVALID-GNU-TC1-NEXT:   Section header string table index: [[SECHDRSTRTABINDEX]]
+# INVALID-GNU-TC1-NEXT: error: '[[FILE]]': unable to continue dumping, the file is corrupt: section header table goes past the end of the file: e_shoff = 0x1000
+
+# INVALID-LLVM-TC2: error: '[[FILE]]': section header table goes past the end of the file: e_shoff = 0x1000
+# INVALID-GNU-TC2: error: '[[FILE]]': section header table goes past the end of the file: e_shoff = 0x1000
 
 --- !ELF
 FileHeader:
@@ -222,14 +225,14 @@ Sections:
 ## Check we don't dump anything except the file header when the section header table can't be read.
 
 # RUN: not llvm-readobj -a %t.invalid1 2>&1 \
-# RUN:  | FileCheck %s -DFILE=%t.invalid1 -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-LLVM
+# RUN:  | FileCheck %s -DFILE=%t.invalid1 -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-LLVM-TC1
 # RUN: not llvm-readelf -a %t.invalid1 2>&1 \
-# RUN:  | FileCheck %s -DFILE=%t.invalid1 -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-GNU
+# RUN:  | FileCheck %s -DFILE=%t.invalid1 -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-GNU-TC1
 
 ## Check what we print when e_shnum == 0, e_shstrndx == SHN_XINDEX and the section header table can't be read.
 
 # RUN: yaml2obj %s -DSHNUM=0 -DSHSTRNDX=0xffff --docnum=4 -o %t.invalid2
 # RUN: not llvm-readobj --file-headers %t.invalid2 2>&1 \
-# RUN:  | FileCheck %s -DFILE=%t.invalid2 -DSECHDRCOUNT="<?>" -DSECHDRSTRTABINDEX="<?>" --check-prefix=INVALID-LLVM
+# RUN:  | FileCheck %s -DFILE=%t.invalid2 -DSECHDRCOUNT="<?>" -DSECHDRSTRTABINDEX="<?>" --check-prefix=INVALID-LLVM-TC2
 # RUN: not llvm-readelf --file-headers %t.invalid2 2>&1 \
-# RUN:  | FileCheck %s -DFILE=%t.invalid2 -DSECHDRCOUNT="<?>" -DSECHDRSTRTABINDEX="<?>" --check-prefix=INVALID-GNU
+# RUN:  | FileCheck %s -DFILE=%t.invalid2 -DSECHDRCOUNT="<?>" -DSECHDRSTRTABINDEX="<?>" --check-prefix=INVALID-GNU-TC2
diff --git a/llvm/tools/llvm-readobj/ELFDumper.cpp b/llvm/tools/llvm-readobj/ELFDumper.cpp
index ab93316907cc6f..1cfa138d7a7ea7 100644
--- a/llvm/tools/llvm-readobj/ELFDumper.cpp
+++ b/llvm/tools/llvm-readobj/ELFDumper.cpp
@@ -3575,9 +3575,16 @@ static inline void printFields(formatted_raw_ostream &OS, StringRef Str1,
 template <class ELFT>
 static std::string getSectionHeadersNumString(const ELFFile<ELFT> &Obj,
                                               StringRef FileName) {
-  const typename ELFT::Ehdr &ElfHeader = Obj.getHeader();
-  if (ElfHeader.e_shnum != 0)
-    return to_string(ElfHeader.e_shnum);
+  if (Obj.getHeader().e_shnum != 0) {
+    std::string Result;
+    if (Obj.getHeader().e_shnum != Obj.getShNum())
+      raw_string_ostream(Result)
+          << format("%x (%x)", static_cast<int>(Obj.getHeader().e_shnum),
+                    static_cast<int>(Obj.getShNum()));
+    else
+      raw_string_ostream(Result) << Obj.getHeader().e_shnum;
+    return Result;
+  }
 
   Expected<ArrayRef<typename ELFT::Shdr>> ArrOrErr = Obj.sections();
   if (!ArrOrErr) {
@@ -3595,9 +3602,10 @@ static std::string getSectionHeadersNumString(const ELFFile<ELFT> &Obj,
 template <class ELFT>
 static std::string getSectionHeaderTableIndexString(const ELFFile<ELFT> &Obj,
                                                     StringRef FileName) {
-  const typename ELFT::Ehdr &ElfHeader = Obj.getHeader();
-  if (ElfHeader.e_shstrndx != SHN_XINDEX)
-    return to_string(ElfHeader.e_shstrndx);
+  auto strndx = Obj.getHeader().e_shstrndx;
+
+  if (strndx != SHN_XINDEX)
+    return to_string(strndx);
 
   Expected<ArrayRef<typename ELFT::Shdr>> ArrOrErr = Obj.sections();
   if (!ArrOrErr) {
@@ -3609,8 +3617,7 @@ static std::string getSectionHeaderTableIndexString(const ELFFile<ELFT> &Obj,
 
   if (ArrOrErr->empty())
     return "65535 (corrupt: out of range)";
-  return to_string(ElfHeader.e_shstrndx) + " (" +
-         to_string((*ArrOrErr)[0].sh_link) + ")";
+  return to_string(strndx) + " (" + to_string(Obj.getShStrNdx()) + ")";
 }
 
 static const EnumEntry<unsigned> *getObjectFileEnumEntry(unsigned Type) {
@@ -3765,7 +3772,7 @@ template <class ELFT> void GNUELFDumper<ELFT>::printFileHeaders() {
   printFields(OS, "Size of this header:", Str);
   Str = to_string(e.e_phentsize) + " (bytes)";
   printFields(OS, "Size of program headers:", Str);
- ...
[truncated]

llvmbot · 2025-10-07T14:46:28Z

@llvm/pr-subscribers-lld-elf

Author: None (aokblast)

Changes

In ELF file, there is a possible extended header for those phnum, shnum, and shstrndx larger than the maximum of 16 bits. This extended header use section 0 to record these fields in 32 bits. For most of the ELF writers like lld, we already have the mechanism to synthesis this special section 0. However, the parser part don't have such infra and therefore we add it.

Also, we modify some test cases. For those expected-error test cases, their error emission get early. For those expected-correct test cases, we modify the output since we support more than 65535 sections now.

Patch is 21.56 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/162288.diff

8 Files Affected:

(modified) lld/ELF/SyntheticSections.cpp (-1)
(modified) lld/ELF/Writer.cpp (+10-1)
(modified) llvm/include/llvm/Object/ELF.h (+43-10)
(modified) llvm/include/llvm/Object/ELFTypes.h (+5)
(modified) llvm/test/Object/invalid.test (+2-2)
(modified) llvm/test/tools/llvm-objcopy/ELF/many-sections.test (+1-1)
(modified) llvm/test/tools/llvm-readobj/ELF/file-headers.test (+63-60)
(modified) llvm/tools/llvm-readobj/ELFDumper.cpp (+19-13)

diff --git a/lld/ELF/SyntheticSections.cpp b/lld/ELF/SyntheticSections.cpp
index bbf4b29a9fda5..71294782d9a31 100644
--- a/lld/ELF/SyntheticSections.cpp
+++ b/lld/ELF/SyntheticSections.cpp
@@ -4428,7 +4428,6 @@ void elf::writeEhdr(Ctx &ctx, uint8_t *buf, Partition &part) {
   eHdr->e_version = EV_CURRENT;
   eHdr->e_flags = ctx.arg.eflags;
   eHdr->e_ehsize = sizeof(typename ELFT::Ehdr);
-  eHdr->e_phnum = part.phdrs.size();
   eHdr->e_shentsize = sizeof(typename ELFT::Shdr);
 
   if (!ctx.arg.relocatable) {
diff --git a/lld/ELF/Writer.cpp b/lld/ELF/Writer.cpp
index 4fa80397cbfa7..713d8279aab54 100644
--- a/lld/ELF/Writer.cpp
+++ b/lld/ELF/Writer.cpp
@@ -2907,7 +2907,8 @@ template <class ELFT> void Writer<ELFT>::writeHeader() {
   // the value. The sentinel values and fields are:
   // e_shnum = 0, SHdrs[0].sh_size = number of sections.
   // e_shstrndx = SHN_XINDEX, SHdrs[0].sh_link = .shstrtab section index.
-  auto *sHdrs = reinterpret_cast<Elf_Shdr *>(ctx.bufferStart + eHdr->e_shoff);
+  // e_phnum = 0xFFFF, SHdrs[0]
+  auto *sHdrs = reinterpret_cast<Elf_Shdr *>(ctx.bufferStart + eHdr->e_smhoff);
   size_t num = ctx.outputSections.size() + 1;
   if (num >= SHN_LORESERVE)
     sHdrs->sh_size = num;
@@ -2922,6 +2923,14 @@ template <class ELFT> void Writer<ELFT>::writeHeader() {
     eHdr->e_shstrndx = strTabIndex;
   }
 
+  num = part.phdrs.size();
+  if (num >= 0xFFFF) {
+    eHdr->e_phnum = 0xFFFF;
+    sHdrs->sh_info = num;
+  } else {
+    eHdr->e_phnum = num;
+  }
+
   for (OutputSection *sec : ctx.outputSections)
     sec->writeHeaderTo<ELFT>(++sHdrs);
 }
diff --git a/llvm/include/llvm/Object/ELF.h b/llvm/include/llvm/Object/ELF.h
index 59f63eb6b5bb6..4374924371d1c 100644
--- a/llvm/include/llvm/Object/ELF.h
+++ b/llvm/include/llvm/Object/ELF.h
@@ -278,9 +278,17 @@ class ELFFile {
   std::vector<Elf_Shdr> FakeSections;
   SmallString<0> FakeSectionStrings;
 
+  // Handle extended header in section 0
+  Elf_Word e_phnum;
+  Elf_Word e_shnum;
+  Elf_Word e_shstrndx;
+
   ELFFile(StringRef Object);
 
 public:
+  const Elf_Word getPhNum() const { return e_phnum; }
+  const Elf_Word getShNum() const { return e_shnum; }
+  const Elf_Word getShStrNdx() const { return e_shstrndx; }
   const Elf_Ehdr &getHeader() const {
     return *reinterpret_cast<const Elf_Ehdr *>(base());
   }
@@ -379,22 +387,21 @@ class ELFFile {
 
   /// Iterate over program header table.
   Expected<Elf_Phdr_Range> program_headers() const {
-    if (getHeader().e_phnum && getHeader().e_phentsize != sizeof(Elf_Phdr))
+    if (e_phnum && getHeader().e_phentsize != sizeof(Elf_Phdr))
       return createError("invalid e_phentsize: " +
                          Twine(getHeader().e_phentsize));
 
-    uint64_t HeadersSize =
-        (uint64_t)getHeader().e_phnum * getHeader().e_phentsize;
+    uint64_t HeadersSize = (uint64_t)e_phnum * getHeader().e_phentsize;
     uint64_t PhOff = getHeader().e_phoff;
     if (PhOff + HeadersSize < PhOff || PhOff + HeadersSize > getBufSize())
       return createError("program headers are longer than binary of size " +
                          Twine(getBufSize()) + ": e_phoff = 0x" +
                          Twine::utohexstr(getHeader().e_phoff) +
-                         ", e_phnum = " + Twine(getHeader().e_phnum) +
+                         ", e_phnum = " + Twine(e_phnum) +
                          ", e_phentsize = " + Twine(getHeader().e_phentsize));
 
     auto *Begin = reinterpret_cast<const Elf_Phdr *>(base() + PhOff);
-    return ArrayRef(Begin, Begin + getHeader().e_phnum);
+    return ArrayRef(Begin, Begin + e_phnum);
   }
 
   /// Get an iterator over notes in a program header.
@@ -772,7 +779,7 @@ template <class ELFT>
 Expected<StringRef>
 ELFFile<ELFT>::getSectionStringTable(Elf_Shdr_Range Sections,
                                      WarningHandler WarnHandler) const {
-  uint32_t Index = getHeader().e_shstrndx;
+  uint32_t Index = e_shstrndx;
   if (Index == ELF::SHN_XINDEX) {
     // If the section name string table section index is greater than
     // or equal to SHN_LORESERVE, then the actual index of the section name
@@ -889,7 +896,12 @@ Expected<uint64_t> ELFFile<ELFT>::getDynSymtabSize() const {
   return 0;
 }
 
-template <class ELFT> ELFFile<ELFT>::ELFFile(StringRef Object) : Buf(Object) {}
+template <class ELFT> ELFFile<ELFT>::ELFFile(StringRef Object) : Buf(Object) {
+  auto Header = getHeader();
+  e_phnum = Header.e_phnum;
+  e_shnum = Header.e_shnum;
+  e_shstrndx = Header.e_shstrndx;
+}
 
 template <class ELFT>
 Expected<ELFFile<ELFT>> ELFFile<ELFT>::create(StringRef Object) {
@@ -897,7 +909,29 @@ Expected<ELFFile<ELFT>> ELFFile<ELFT>::create(StringRef Object) {
     return createError("invalid buffer: the size (" + Twine(Object.size()) +
                        ") is smaller than an ELF header (" +
                        Twine(sizeof(Elf_Ehdr)) + ")");
-  return ELFFile(Object);
+  ELFFile Result(Object);
+
+  //
+  // sections() parse the total number of sections by considering the
+  // extended headers
+  //
+  if (Result.getHeader().HasHeaderExtension()) {
+    auto TableOrErr = Result.sections();
+    if (!TableOrErr)
+      return TableOrErr.takeError();
+    if ((*TableOrErr).size() == 0)
+      return Result;
+    auto SecOrErr = object::getSection<ELFT>(*TableOrErr, 0);
+    if (!SecOrErr)
+      return SecOrErr.takeError();
+    if (Result.e_phnum == 0xFFFF)
+      Result.e_phnum = (*SecOrErr)->sh_info;
+    if (Result.e_shnum == ELF::SHN_UNDEF)
+      Result.e_shnum = (*SecOrErr)->sh_size;
+    if (Result.e_shstrndx == ELF::SHN_XINDEX)
+      Result.e_shstrndx = (*SecOrErr)->sh_link;
+  }
+  return Result;
 }
 
 /// Used by llvm-objdump -d (which needs sections for disassembly) to
@@ -940,7 +974,6 @@ Expected<typename ELFT::ShdrRange> ELFFile<ELFT>::sections() const {
   if (getHeader().e_shentsize != sizeof(Elf_Shdr))
     return createError("invalid e_shentsize in ELF header: " +
                        Twine(getHeader().e_shentsize));
-
   const uint64_t FileSize = Buf.size();
   if (SectionTableOffset + sizeof(Elf_Shdr) > FileSize ||
       SectionTableOffset + (uintX_t)sizeof(Elf_Shdr) < SectionTableOffset)
@@ -956,7 +989,7 @@ Expected<typename ELFT::ShdrRange> ELFFile<ELFT>::sections() const {
   const Elf_Shdr *First =
       reinterpret_cast<const Elf_Shdr *>(base() + SectionTableOffset);
 
-  uintX_t NumSections = getHeader().e_shnum;
+  uintX_t NumSections = e_shnum;
   if (NumSections == 0)
     NumSections = First->sh_size;
 
diff --git a/llvm/include/llvm/Object/ELFTypes.h b/llvm/include/llvm/Object/ELFTypes.h
index 5a26e2fc31458..232f6be9b4c49 100644
--- a/llvm/include/llvm/Object/ELFTypes.h
+++ b/llvm/include/llvm/Object/ELFTypes.h
@@ -529,6 +529,11 @@ struct Elf_Ehdr_Impl {
 
   unsigned char getFileClass() const { return e_ident[ELF::EI_CLASS]; }
   unsigned char getDataEncoding() const { return e_ident[ELF::EI_DATA]; }
+  bool HasHeaderExtension() const {
+    return (e_phnum == 0xFFFF || e_shnum == ELF::SHN_UNDEF ||
+            ELF::SHN_XINDEX == e_phnum) &&
+           e_shoff != 0;
+  }
 };
 
 template <endianness Endianness>
diff --git a/llvm/test/Object/invalid.test b/llvm/test/Object/invalid.test
index 58ec3cbeadd19..2bf23b45cdbb8 100644
--- a/llvm/test/Object/invalid.test
+++ b/llvm/test/Object/invalid.test
@@ -556,7 +556,7 @@ Sections:
 # RUN: yaml2obj --docnum=25 %s -o %t25
 # RUN: not llvm-readobj -h %t25 2>&1 | FileCheck -DFILE=%t25 --check-prefix=INVALID-SEC-NUM1 %s
 
-# INVALID-SEC-NUM1: error: '[[FILE]]': unable to continue dumping, the file is corrupt: invalid section header table offset (e_shoff = 0x58) or invalid number of sections specified in the first section header's sh_size field (0x3ffffffffffffff)
+# INVALID-SEC-NUM1: error: '[[FILE]]': invalid section header table offset (e_shoff = 0x58) or invalid number of sections specified in the first section header's sh_size field (0x3ffffffffffffff)
 
 --- !ELF
 FileHeader:
@@ -575,7 +575,7 @@ Sections:
 # RUN: yaml2obj --docnum=26 %s -o %t26
 # RUN: not llvm-readobj -h %t26 2>&1 | FileCheck -DFILE=%t26 --check-prefix=INVALID-SEC-NUM2 %s
 
-# INVALID-SEC-NUM2: error: '[[FILE]]': unable to continue dumping, the file is corrupt: invalid number of sections specified in the NULL section's sh_size field (288230376151711744)
+# INVALID-SEC-NUM2: error: '[[FILE]]': invalid number of sections specified in the NULL section's sh_size field (288230376151711744)
 
 --- !ELF
 FileHeader:
diff --git a/llvm/test/tools/llvm-objcopy/ELF/many-sections.test b/llvm/test/tools/llvm-objcopy/ELF/many-sections.test
index 6622db237026f..8b49454f98578 100644
--- a/llvm/test/tools/llvm-objcopy/ELF/many-sections.test
+++ b/llvm/test/tools/llvm-objcopy/ELF/many-sections.test
@@ -6,7 +6,7 @@ RUN: llvm-readobj --file-headers --sections --symbols %t2 | FileCheck %s
 RUN: llvm-readelf --symbols %t2 | FileCheck --check-prefix=SYMS %s
 
 ## The ELF header should have e_shnum == 0 and e_shstrndx == SHN_XINDEX.
-# CHECK:        SectionHeaderCount: 0
+# CHECK:        SectionHeaderCount: 0 (65540)
 # CHECK-NEXT:   StringTableSectionIndex: 65535
 
 ## The first section header should store the real section header count and
diff --git a/llvm/test/tools/llvm-readobj/ELF/file-headers.test b/llvm/test/tools/llvm-readobj/ELF/file-headers.test
index 97ab9f092b228..d2fbed1b75656 100644
--- a/llvm/test/tools/llvm-readobj/ELF/file-headers.test
+++ b/llvm/test/tools/llvm-readobj/ELF/file-headers.test
@@ -143,64 +143,67 @@ FileHeader:
 # RUN: yaml2obj %s --docnum=4 -o %t.invalid1
 # RUN: not llvm-readobj --file-headers %t.invalid1 2>&1 \
 # RUN:  | FileCheck %s --implicit-check-not=warning: -DFILE=%t.invalid1 \
-# RUN:    -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-LLVM
+# RUN:    -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-LLVM-TC1
 # RUN: not llvm-readelf --file-headers %t.invalid1 2>&1 \
 # RUN:  | FileCheck %s --implicit-check-not=warning: -DFILE=%t.invalid1 \
-# RUN:    -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-GNU
-
-# INVALID-LLVM:      File: [[FILE]]
-# INVALID-LLVM-NEXT: Format: elf64-unknown
-# INVALID-LLVM-NEXT: Arch: unknown
-# INVALID-LLVM-NEXT: AddressSize: 64bit
-# INVALID-LLVM-NEXT: LoadName: <Not found>
-# INVALID-LLVM-NEXT: ElfHeader {
-# INVALID-LLVM-NEXT:   Ident {
-# INVALID-LLVM-NEXT:     Magic: (7F 45 4C 46)
-# INVALID-LLVM-NEXT:     Class: 64-bit (0x2)
-# INVALID-LLVM-NEXT:     DataEncoding: LittleEndian (0x1)
-# INVALID-LLVM-NEXT:     FileVersion: 1
-# INVALID-LLVM-NEXT:     OS/ABI: SystemV (0x0)
-# INVALID-LLVM-NEXT:     ABIVersion: 0
-# INVALID-LLVM-NEXT:     Unused: (00 00 00 00 00 00 00)
-# INVALID-LLVM-NEXT:   }
-# INVALID-LLVM-NEXT:   Type: Relocatable (0x1)
-# INVALID-LLVM-NEXT:   Machine: EM_NONE (0x0)
-# INVALID-LLVM-NEXT:   Version: 1
-# INVALID-LLVM-NEXT:   Entry: 0x0
-# INVALID-LLVM-NEXT:   ProgramHeaderOffset: 0x0
-# INVALID-LLVM-NEXT:   SectionHeaderOffset: 0x1000
-# INVALID-LLVM-NEXT:   Flags [ (0x0)
-# INVALID-LLVM-NEXT:   ]
-# INVALID-LLVM-NEXT:   HeaderSize: 64
-# INVALID-LLVM-NEXT:   ProgramHeaderEntrySize: 0
-# INVALID-LLVM-NEXT:   ProgramHeaderCount: 0
-# INVALID-LLVM-NEXT:   SectionHeaderEntrySize: 64
-# INVALID-LLVM-NEXT:   SectionHeaderCount: [[SECHDRCOUNT]]
-# INVALID-LLVM-NEXT:   StringTableSectionIndex: [[SECHDRSTRTABINDEX]]
-# INVALID-LLVM-NEXT: }
-# INVALID-LLVM-NEXT: error: '[[FILE]]': unable to continue dumping, the file is corrupt: section header table goes past the end of the file: e_shoff = 0x1000
-
-# INVALID-GNU:      ELF Header:
-# INVALID-GNU-NEXT:   Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
-# INVALID-GNU-NEXT:   Class:                             ELF64
-# INVALID-GNU-NEXT:   Data:                              2's complement, little endian
-# INVALID-GNU-NEXT:   Version:                           1 (current)
-# INVALID-GNU-NEXT:   OS/ABI:                            UNIX - System V
-# INVALID-GNU-NEXT:   ABI Version:                       0
-# INVALID-GNU-NEXT:   Type:                              REL (Relocatable file)
-# INVALID-GNU-NEXT:   Machine:                           None
-# INVALID-GNU-NEXT:   Version:                           0x1
-# INVALID-GNU-NEXT:   Entry point address:               0x0
-# INVALID-GNU-NEXT:   Start of program headers:          0 (bytes into file)
-# INVALID-GNU-NEXT:   Start of section headers:          4096 (bytes into file)
-# INVALID-GNU-NEXT:   Flags:                             0x0
-# INVALID-GNU-NEXT:   Size of this header:               64 (bytes)
-# INVALID-GNU-NEXT:   Size of program headers:           0 (bytes)
-# INVALID-GNU-NEXT:   Number of program headers:         0
-# INVALID-GNU-NEXT:   Size of section headers:           64 (bytes)
-# INVALID-GNU-NEXT:   Number of section headers:         [[SECHDRCOUNT]]
-# INVALID-GNU-NEXT:   Section header string table index: [[SECHDRSTRTABINDEX]]
-# INVALID-GNU-NEXT: error: '[[FILE]]': unable to continue dumping, the file is corrupt: section header table goes past the end of the file: e_shoff = 0x1000
+# RUN:    -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-GNU-TC1
+
+# INVALID-LLVM-TC1:      File: [[FILE]]
+# INVALID-LLVM-TC1-NEXT: Format: elf64-unknown
+# INVALID-LLVM-TC1-NEXT: Arch: unknown
+# INVALID-LLVM-TC1-NEXT: AddressSize: 64bit
+# INVALID-LLVM-TC1-NEXT: LoadName: <Not found>
+# INVALID-LLVM-TC1-NEXT: ElfHeader {
+# INVALID-LLVM-TC1-NEXT:   Ident {
+# INVALID-LLVM-TC1-NEXT:     Magic: (7F 45 4C 46)
+# INVALID-LLVM-TC1-NEXT:     Class: 64-bit (0x2)
+# INVALID-LLVM-TC1-NEXT:     DataEncoding: LittleEndian (0x1)
+# INVALID-LLVM-TC1-NEXT:     FileVersion: 1
+# INVALID-LLVM-TC1-NEXT:     OS/ABI: SystemV (0x0)
+# INVALID-LLVM-TC1-NEXT:     ABIVersion: 0
+# INVALID-LLVM-TC1-NEXT:     Unused: (00 00 00 00 00 00 00)
+# INVALID-LLVM-TC1-NEXT:   }
+# INVALID-LLVM-TC1-NEXT:   Type: Relocatable (0x1)
+# INVALID-LLVM-TC1-NEXT:   Machine: EM_NONE (0x0)
+# INVALID-LLVM-TC1-NEXT:   Version: 1
+# INVALID-LLVM-TC1-NEXT:   Entry: 0x0
+# INVALID-LLVM-TC1-NEXT:   ProgramHeaderOffset: 0x0
+# INVALID-LLVM-TC1-NEXT:   SectionHeaderOffset: 0x1000
+# INVALID-LLVM-TC1-NEXT:   Flags [ (0x0)
+# INVALID-LLVM-TC1-NEXT:   ]
+# INVALID-LLVM-TC1-NEXT:   HeaderSize: 64
+# INVALID-LLVM-TC1-NEXT:   ProgramHeaderEntrySize: 0
+# INVALID-LLVM-TC1-NEXT:   ProgramHeaderCount: 0
+# INVALID-LLVM-TC1-NEXT:   SectionHeaderEntrySize: 64
+# INVALID-LLVM-TC1-NEXT:   SectionHeaderCount: [[SECHDRCOUNT]]
+# INVALID-LLVM-TC1-NEXT:   StringTableSectionIndex: [[SECHDRSTRTABINDEX]]
+# INVALID-LLVM-TC1-NEXT: }
+# INVALID-LLVM-TC1-NEXT: error: '[[FILE]]': unable to continue dumping, the file is corrupt: section header table goes past the end of the file: e_shoff = 0x1000
+
+# INVALID-GNU-TC1:      ELF Header:
+# INVALID-GNU-TC1-NEXT:   Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
+# INVALID-GNU-TC1-NEXT:   Class:                             ELF64
+# INVALID-GNU-TC1-NEXT:   Data:                              2's complement, little endian
+# INVALID-GNU-TC1-NEXT:   Version:                           1 (current)
+# INVALID-GNU-TC1-NEXT:   OS/ABI:                            UNIX - System V
+# INVALID-GNU-TC1-NEXT:   ABI Version:                       0
+# INVALID-GNU-TC1-NEXT:   Type:                              REL (Relocatable file)
+# INVALID-GNU-TC1-NEXT:   Machine:                           None
+# INVALID-GNU-TC1-NEXT:   Version:                           0x1
+# INVALID-GNU-TC1-NEXT:   Entry point address:               0x0
+# INVALID-GNU-TC1-NEXT:   Start of program headers:          0 (bytes into file)
+# INVALID-GNU-TC1-NEXT:   Start of section headers:          4096 (bytes into file)
+# INVALID-GNU-TC1-NEXT:   Flags:                             0x0
+# INVALID-GNU-TC1-NEXT:   Size of this header:               64 (bytes)
+# INVALID-GNU-TC1-NEXT:   Size of program headers:           0 (bytes)
+# INVALID-GNU-TC1-NEXT:   Number of program headers:         0
+# INVALID-GNU-TC1-NEXT:   Size of section headers:           64 (bytes)
+# INVALID-GNU-TC1-NEXT:   Number of section headers:         [[SECHDRCOUNT]]
+# INVALID-GNU-TC1-NEXT:   Section header string table index: [[SECHDRSTRTABINDEX]]
+# INVALID-GNU-TC1-NEXT: error: '[[FILE]]': unable to continue dumping, the file is corrupt: section header table goes past the end of the file: e_shoff = 0x1000
+
+# INVALID-LLVM-TC2: error: '[[FILE]]': section header table goes past the end of the file: e_shoff = 0x1000
+# INVALID-GNU-TC2: error: '[[FILE]]': section header table goes past the end of the file: e_shoff = 0x1000
 
 --- !ELF
 FileHeader:
@@ -222,14 +225,14 @@ Sections:
 ## Check we don't dump anything except the file header when the section header table can't be read.
 
 # RUN: not llvm-readobj -a %t.invalid1 2>&1 \
-# RUN:  | FileCheck %s -DFILE=%t.invalid1 -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-LLVM
+# RUN:  | FileCheck %s -DFILE=%t.invalid1 -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-LLVM-TC1
 # RUN: not llvm-readelf -a %t.invalid1 2>&1 \
-# RUN:  | FileCheck %s -DFILE=%t.invalid1 -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-GNU
+# RUN:  | FileCheck %s -DFILE=%t.invalid1 -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-GNU-TC1
 
 ## Check what we print when e_shnum == 0, e_shstrndx == SHN_XINDEX and the section header table can't be read.
 
 # RUN: yaml2obj %s -DSHNUM=0 -DSHSTRNDX=0xffff --docnum=4 -o %t.invalid2
 # RUN: not llvm-readobj --file-headers %t.invalid2 2>&1 \
-# RUN:  | FileCheck %s -DFILE=%t.invalid2 -DSECHDRCOUNT="<?>" -DSECHDRSTRTABINDEX="<?>" --check-prefix=INVALID-LLVM
+# RUN:  | FileCheck %s -DFILE=%t.invalid2 -DSECHDRCOUNT="<?>" -DSECHDRSTRTABINDEX="<?>" --check-prefix=INVALID-LLVM-TC2
 # RUN: not llvm-readelf --file-headers %t.invalid2 2>&1 \
-# RUN:  | FileCheck %s -DFILE=%t.invalid2 -DSECHDRCOUNT="<?>" -DSECHDRSTRTABINDEX="<?>" --check-prefix=INVALID-GNU
+# RUN:  | FileCheck %s -DFILE=%t.invalid2 -DSECHDRCOUNT="<?>" -DSECHDRSTRTABINDEX="<?>" --check-prefix=INVALID-GNU-TC2
diff --git a/llvm/tools/llvm-readobj/ELFDumper.cpp b/llvm/tools/llvm-readobj/ELFDumper.cpp
index ab93316907cc6..1cfa138d7a7ea 100644
--- a/llvm/tools/llvm-readobj/ELFDumper.cpp
+++ b/llvm/tools/llvm-readobj/ELFDumper.cpp
@@ -3575,9 +3575,16 @@ static inline void printFields(formatted_raw_ostream &OS, StringRef Str1,
 template <class ELFT>
 static std::string getSectionHeadersNumString(const ELFFile<ELFT> &Obj,
                                               StringRef FileName) {
-  const typename ELFT::Ehdr &ElfHeader = Obj.getHeader();
-  if (ElfHeader.e_shnum != 0)
-    return to_string(ElfHeader.e_shnum);
+  if (Obj.getHeader().e_shnum != 0) {
+    std::string Result;
+    if (Obj.getHeader().e_shnum != Obj.getShNum())
+      raw_string_ostream(Result)
+          << format("%x (%x)", static_cast<int>(Obj.getHeader().e_shnum),
+                    static_cast<int>(Obj.getShNum()));
+    else
+      raw_string_ostream(Result) << Obj.getHeader().e_shnum;
+    return Result;
+  }
 
   Expected<ArrayRef<typename ELFT::Shdr>> ArrOrErr = Obj.sections();
   if (!ArrOrErr) {
@@ -3595,9 +3602,10 @@ static std::string getSectionHeadersNumString(const ELFFile<ELFT> &Obj,
 template <class ELFT>
 static std::string getSectionHeaderTableIndexString(const ELFFile<ELFT> &Obj,
                                                     StringRef FileName) {
-  const typename ELFT::Ehdr &ElfHeader = Obj.getHeader();
-  if (ElfHeader.e_shstrndx != SHN_XINDEX)
-    return to_string(ElfHeader.e_shstrndx);
+  auto strndx = Obj.getHeader().e_shstrndx;
+
+  if (strndx != SHN_XINDEX)
+    return to_string(strndx);
 
   Expected<ArrayRef<typename ELFT::Shdr>> ArrOrErr = Obj.sections();
   if (!ArrOrErr) {
@@ -3609,8 +3617,7 @@ static std::string getSectionHeaderTableIndexString(const ELFFile<ELFT> &Obj,
 
   if (ArrOrErr->empty())
     return "65535 (corrupt: out of range)";
-  return to_string(ElfHeader.e_shstrndx) + " (" +
-         to_string((*ArrOrErr)[0].sh_link) + ")";
+  return to_string(strndx) + " (" + to_string(Obj.getShStrNdx()) + ")";
 }
 
 static const EnumEntry<unsigned> *getObjectFileEnumEntry(unsigned Type) {
@@ -3765,7 +3772,7 @@ template <class ELFT> void GNUELFDumper<ELFT>::printFileHeaders() {
   printFields(OS, "Size of this header:", Str);
   Str = to_string(e.e_phentsize) + " (bytes)";
   printFields(OS, "Size of program headers:", Str);
-  Str = to_string...
[truncated]

jh7370

The build is failing. Please fix it.

I've not looked at the change or the classes in depth, but the code already does try to handle large section counts, for example (see e.g. line 993 in ELF.h that you're trying to modify). What exactly are you trying to achieve that this code doesn't already do?

aokblast · 2025-10-07T15:32:05Z

The build is failing. Please fix it.

I've not looked at the change or the classes in depth, but the code already does try to handle large section counts, for example (see e.g. line 993 in ELF.h that you're trying to modify). What exactly are you trying to achieve that this code doesn't already do?

Thanks for your reply! I forget to enable lld. I am currently compiling and running test.

Some program, like readelf, currently unable to display 65535+ program headers. I think it is better for us to support this from our Support/Object level. Also, without doing this in Support/Object level, the program_headers() in Object/ELF.h also cannot iterate over 65535+ segments, which many programs rely on this to iterate over all program headers without reading the size themselve.

For section, I just handle them together with segment so it would be more consistent. Line 993 should be removed but cannot since when we use getSection() first time, we need to know the correct number of section in section() so that getSection(0) won't complain about OutOfBound.

jh7370 · 2025-10-07T16:07:22Z

Why do you want to support so many program headers? Just because the spec allows us doesn't mean we need to support it, if there's no use case for it.

aokblast · 2025-10-07T16:14:21Z

Why do you want to support so many program headers? Just because the spec allows us doesn't mean we need to support it, if there's no use case for it.

It is related to the coredump design in FreeBSD. See: #132216. FreeBSD regards a mmap as a segment in coredump. Therefore, a program has 65535 more mmaps actually uses ExtendedHeader to store the actual number of mmaps and its address.
I am not sure if they have CI to check all segment mapping (phdrs), but I think it is possible for them to do so as we have test-cases to check 65535+ sections and the section headers.
If you think that it is more proper to fix in readelf itself, I can do it since the CI that the issue reported only compare the sections number insteead of their address. I just think that it would be great if we can do "check ExtendedHeader exists -> parse sections -> check section0 exists -> get correct section and segment info" in Support/Object instead of everybody has their own copies. But I am not insist on that.

MaskRay · 2025-10-08T06:26:56Z

Please remove the lld/ELF change. For linker output we don't intend to support the PN_XNUM program header feature. We also don't accept features without a test.

Link: https://groups.google.com/g/generic-abi/c/-J3lNY8ZKkU ("PN_XNUM extension for program headers")

Suggested title: [Object,ELF] Impelment PN_XNUM extension for program headers

jh7370

Thanks for the explanation. I agree that there's value in having the program header count extension in the parsing and dumping code.

In addition to removing the lld changes, I think this change warrants being split into at leasr two PRs, one (or more) which does the refactoring (if needed) such that both extended program headers and section headers can be handled all together and another that actually implements the program header functionality. You will also probably want a separate change specifically for llvm-objcopy, so that it can write objects containing many program headers.

jh7370 · 2025-10-08T07:14:03Z

llvm/include/llvm/Object/ELF.h

@@ -278,9 +278,17 @@ class ELFFile {
  std::vector<Elf_Shdr> FakeSections;
  SmallString<0> FakeSectionStrings;

+  // Handle extended header in section 0


This comment doesn't really add anything.

Fixed. Thanks!

jh7370 · 2025-10-08T07:15:14Z

llvm/include/llvm/Object/ELF.h

+  Elf_Word e_phnum;
+  Elf_Word e_shnum;
+  Elf_Word e_shstrndx;


Since these values aren't the raw values in the header, let's not confuse things by naming them after those fields.

Fixed. Thanks!

jh7370 · 2025-10-08T07:18:11Z

llvm/include/llvm/Object/ELF.h

@@ -889,15 +896,42 @@ Expected<uint64_t> ELFFile<ELFT>::getDynSymtabSize() const {
  return 0;
 }

-template <class ELFT> ELFFile<ELFT>::ELFFile(StringRef Object) : Buf(Object) {}
+template <class ELFT> ELFFile<ELFT>::ELFFile(StringRef Object) : Buf(Object) {
+  auto Header = getHeader();


Please refer to the LLVM style guide about using auto.

Fixed. Thanks!

jh7370 · 2025-10-08T07:19:07Z

llvm/include/llvm/Object/ELFTypes.h

@@ -529,6 +529,11 @@ struct Elf_Ehdr_Impl {

  unsigned char getFileClass() const { return e_ident[ELF::EI_CLASS]; }
  unsigned char getDataEncoding() const { return e_ident[ELF::EI_DATA]; }
+  bool HasHeaderExtension() const {


Please review the LLVM style guide regarding naming conventions.

Fixed. Thanks!

header isn't clear - it could mean elf header, program header, or section header.

perhaps hasPhdrNumExtension

0xFFFF => PN_XNUM, a new constant in include/llvm/BinaryFormat/ELF.h

The condition (e_phnum == 0xFFFF || e_shnum == ELF::SHN_UNDEF || ELF::SHN_XINDEX == e_phnum) && e_shoff != 0; doesn't look correct.

Hello the last issue have been fixed locally. I am still figuring out problems in objcopy so I might need some time to upload it.

jh7370 · 2025-10-08T07:21:33Z

llvm/include/llvm/Object/ELF.h

+  ELFFile Result(Object);
+
+  //
+  // sections() parse the total number of sections by considering the


This comment doesn't seem to make sense to me.

Fixed. Thanks!

jh7370 · 2025-10-08T07:23:35Z

llvm/include/llvm/Object/ELF.h

  ELFFile(StringRef Object);

 public:
+  const Elf_Word getPhNum() const { return e_phnum; }


const on a value return type doesn't make sense.

Fixed. Thanks!

jh7370 · 2025-10-08T07:30:43Z

llvm/include/llvm/Object/ELF.h

@@ -772,7 +779,7 @@ template <class ELFT>
 Expected<StringRef>
 ELFFile<ELFT>::getSectionStringTable(Elf_Shdr_Range Sections,
                                     WarningHandler WarnHandler) const {
-  uint32_t Index = getHeader().e_shstrndx;
+  uint32_t Index = e_shstrndx;


Changes like this seem a bit muddled to me, in that I'd expect e_shstrndx to have been resolved to the real index as part of constructing ELFFile, meaning you don't then need to compare it against SHN_XINDEX below.

It suggests to me that either you've got code left around from earlier versions of the change, or you aren't resolving the real values early enough. I don't want to end up in a situation where people will have to study the code to know whether e_shstrndx etc needs to be checked against SHN_XINDEX. This may mean avoiding calls to certain functions during the ELFFile constructor, at least until this resolution has taken place.

Sorry, it is my bad. I forget to delete the following if condition throughtly. I thought that we cannot call specific function like the ones with Expected<> return type in constructor originally. All other parts is safe after the assignment in create().

jh7370 · 2025-10-08T07:32:25Z

llvm/test/Object/invalid.test

@@ -556,7 +556,7 @@ Sections:
 # RUN: yaml2obj --docnum=25 %s -o %t25
 # RUN: not llvm-readobj -h %t25 2>&1 | FileCheck -DFILE=%t25 --check-prefix=INVALID-SEC-NUM1 %s

-# INVALID-SEC-NUM1: error: '[[FILE]]': unable to continue dumping, the file is corrupt: invalid section header table offset (e_shoff = 0x58) or invalid number of sections specified in the first section header's sh_size field (0x3ffffffffffffff)
+# INVALID-SEC-NUM1: error: '[[FILE]]': invalid section header table offset (e_shoff = 0x58) or invalid number of sections specified in the first section header's sh_size field (0x3ffffffffffffff)


I'd like to preserve the status quo, if possible, regarding the error messages. The refactoring should be entirely non-functional from an end user perspective.

Emm, I am a little bit uncertain about this. Now, the ELFFile::section() error can be detected in the initialization step of ELFFile since we actually try to read it. Therefore, it makes some error happen early like this case since this error was happen in the middle when dumping instead of constructing ELFFile. Should we ignore silently in construction? Or should we report it right away? If we ignore it, we can preserve the original behavior and allow later step to handle it but we actually create an incorrect ELFFile. If we want to emit the error in ELFFile, we should do some modification like here.

Can we lazily populate the information?

One of the issues we've had in the past with llvm-readobj and other dumping tools is it would error out early because it couldn't read some part of the file, long before it actually needed to display the relevant file information, which meant in turn it became hard to impossible to diagnose the problem when a file was malformed in some way. In practice, when we dump the file headers, whether the entire section header table is readable is irrelevant, for example. We should only need to read the 0th section header to do certain things. I hope that makes sense.

jh7370 · 2025-10-08T07:33:46Z

llvm/test/tools/llvm-objcopy/ELF/many-sections.test

@@ -6,7 +6,7 @@ RUN: llvm-readobj --file-headers --sections --symbols %t2 | FileCheck %s
 RUN: llvm-readelf --symbols %t2 | FileCheck --check-prefix=SYMS %s

 ## The ELF header should have e_shnum == 0 and e_shstrndx == SHN_XINDEX.
-# CHECK:        SectionHeaderCount: 0
+# CHECK:        SectionHeaderCount: 0 (65540)


Is this change actually related to the code changes you've made, or is it just tightening up the test and is already the current behaviour? If the latter, please put it in a separate PR.

Yes, it is part of the readelf (readobj). I would put to seperated PR. Thanks!

jh7370 · 2025-10-08T07:35:34Z

llvm/test/tools/llvm-readobj/ELF/file-headers.test

@@ -143,64 +143,67 @@ FileHeader:
 # RUN: yaml2obj %s --docnum=4 -o %t.invalid1
 # RUN: not llvm-readobj --file-headers %t.invalid1 2>&1 \
 # RUN:  | FileCheck %s --implicit-check-not=warning: -DFILE=%t.invalid1 \
-# RUN:    -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-LLVM
+# RUN:    -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-LLVM-TC1


-TC1 is a meaningless suffix. Instead, rename the whole prefix to better describe the specific case that you're interested in, e.g. something like BAD-SHOFF-LLVM. I'd also add a comment at the start of each test case explaining what the case is testing.

llvm/tools/llvm-readobj/ELFDumper.cpp

In ELF file, there is a possible extended header for those phnum, shnum, and shstrndx larger than the maximum of 16 bits. This extended header use section 0 to record these fields in 32 bits. We implment this feature so that programs rely on ELFFile::program_headers() can get the correct number of segments. Also, the consumers don't have to check the section 0 themselve, insteead, they can use the getPhNum() as an alternative.

aokblast · 2025-10-09T12:49:27Z

It seems that github does not have features that can submit patches based on another branch on my local tree. SO I send another PR with two commits. #162648

BTW, should I provide tests for this patch while the readobj patch have already tested it?

Also, thanks for you two to help me review my patch!

aokblast mentioned this pull request Oct 7, 2025

llvm-readelf fails to report > 65535 segments #132216

Open

aokblast force-pushed the elf/fix_65535err branch 2 times, most recently from 7124063 to a4ae837 Compare October 7, 2025 14:45

llvmbot added lld lld:ELF llvm:binary-utilities labels Oct 7, 2025

jh7370 requested a review from MaskRay October 7, 2025 14:55

jh7370 requested changes Oct 7, 2025

View reviewed changes

aokblast force-pushed the elf/fix_65535err branch from a4ae837 to 125374a Compare October 7, 2025 18:02

jh7370 reviewed Oct 8, 2025

View reviewed changes

MaskRay reviewed Oct 9, 2025

View reviewed changes

llvm/tools/llvm-readobj/ELFDumper.cpp Outdated Show resolved Hide resolved

MaskRay reviewed Oct 9, 2025

View reviewed changes

llvm/tools/llvm-readobj/ELFDumper.cpp Outdated Show resolved Hide resolved

aokblast force-pushed the elf/fix_65535err branch from 125374a to d2740bc Compare October 9, 2025 12:30

aokblast changed the title ~~[Object][ELF] Support extended header for Object Parser in ELF~~ [Object,ELF] Impelment PN_XNUM extension for program headers Oct 10, 2025

[Object,ELF] Impelment PN_XNUM extension for program headers #162288

Are you sure you want to change the base?

[Object,ELF] Impelment PN_XNUM extension for program headers #162288

Uh oh!

Conversation

aokblast commented Oct 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Oct 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Oct 7, 2025

Uh oh!

jh7370 left a comment

Choose a reason for hiding this comment

Uh oh!

aokblast commented Oct 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jh7370 commented Oct 7, 2025

Uh oh!

aokblast commented Oct 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MaskRay commented Oct 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jh7370 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MaskRay Oct 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aokblast Oct 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aokblast Oct 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aokblast commented Oct 7, 2025 •

edited

Loading

llvmbot commented Oct 7, 2025 •

edited

Loading

aokblast commented Oct 7, 2025 •

edited

Loading

aokblast commented Oct 7, 2025 •

edited

Loading

MaskRay commented Oct 8, 2025 •

edited

Loading

MaskRay Oct 9, 2025 •

edited

Loading

aokblast Oct 9, 2025 •

edited

Loading

aokblast Oct 8, 2025 •

edited

Loading

aokblast commented Oct 9, 2025 •

edited

Loading