Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix .bss section accumulated in ELF file #78265

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

deepa2015
Copy link

Linker fix for issue #78261

The file offsets are incorrectly computed for sections considered after .bss section. The .bss section is accumulated in ELF file.
The computeFileOffset() is modified to fix this issue.

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Collaborator

llvmbot commented Jan 16, 2024

@llvm/pr-subscribers-lld-elf

@llvm/pr-subscribers-lld

Author: None (deepa2015)

Changes

Linker fix for issue #78261

The file offsets are incorrectly computed for sections considered after .bss section. The .bss section is accumulated in ELF file.
The computeFileOffset() is modified to fix this issue.


Full diff: https://github.com/llvm/llvm-project/pull/78265.diff

5 Files Affected:

  • (modified) lld/ELF/Writer.cpp (+9-3)
  • (modified) lld/ELF/Writer.h (+2-1)
  • (modified) lld/test/ELF/linkerscript/sections-nonalloc.s (+11-11)
  • (modified) lld/test/ELF/linkerscript/sections.s (+3-3)
  • (modified) lld/test/ELF/x86-64-section-layout.s (+5-5)
diff --git a/lld/ELF/Writer.cpp b/lld/ELF/Writer.cpp
index dfec5e07301a74a..59c240a0c893b05 100644
--- a/lld/ELF/Writer.cpp
+++ b/lld/ELF/Writer.cpp
@@ -2630,16 +2630,22 @@ static uint64_t computeFileOffset(OutputSection *os, uint64_t off) {
   // increasing rather than setting to zero.
   if (os->type == SHT_NOBITS &&
       (!Out::tlsPhdr || Out::tlsPhdr->firstSec != os))
-     return off;
+  {
+    // Update no_bits size in the current segment
+    if (os->ptLoad)
+      os->ptLoad->p_nobits +=  os->size;
+    return off;
+  }
 
   // If the section is not in a PT_LOAD, we just have to align it.
   if (!os->ptLoad)
      return alignToPowerOf2(off, os->addralign);
 
   // If two sections share the same PT_LOAD the file offset is calculated
-  // using this formula: Off2 = Off1 + (VA2 - VA1).
+  // using this formula: Off2 = Off1 + (VA2 - VA1 - p_nobits).
   OutputSection *first = os->ptLoad->firstSec;
-  return first->offset + os->addr - first->addr;
+
+  return first->offset + (os->addr - first->addr - os->ptLoad->p_nobits);
 }
 
 template <class ELFT> void Writer<ELFT>::assignFileOffsetsBinary() {
diff --git a/lld/ELF/Writer.h b/lld/ELF/Writer.h
index aac8176d90989f3..6f5ab6ce3202f26 100644
--- a/lld/ELF/Writer.h
+++ b/lld/ELF/Writer.h
@@ -37,7 +37,8 @@ struct PhdrEntry {
   uint32_t p_align = 0;
   uint32_t p_type = 0;
   uint32_t p_flags = 0;
-
+  // NO_BITS size processed in the output section..
+  uint64_t p_nobits =0;
   OutputSection *firstSec = nullptr;
   OutputSection *lastSec = nullptr;
   bool hasLMA = false;
diff --git a/lld/test/ELF/linkerscript/sections-nonalloc.s b/lld/test/ELF/linkerscript/sections-nonalloc.s
index a0669f701d8c90c..03de378c3b2a4c3 100644
--- a/lld/test/ELF/linkerscript/sections-nonalloc.s
+++ b/lld/test/ELF/linkerscript/sections-nonalloc.s
@@ -37,18 +37,18 @@
 # CHECK1-NEXT: [ 0]           NULL     0000000000000000 000000 000000 00      0
 # CHECK1-NEXT: [ 1] .text     PROGBITS 00000000000000b0 0000b0 000001 00  AX  0
 # CHECK1-NEXT: [ 2] .bss      NOBITS   00000000000000b1 0000b1 000001 00  WA  0
-# CHECK1-NEXT: [ 3] data1     PROGBITS 00000000000000b2 0000b2 000001 00  WA  0
-# CHECK1-NEXT: [ 4] data3     PROGBITS 00000000000000b3 0000b3 000001 00  WA  0
-# CHECK1-NEXT: [ 5] other1    PROGBITS 0000000000000000 0000b8 000001 00      0
-# CHECK1-NEXT: [ 6] other2    PROGBITS 0000000000000000 0000c0 000001 00      0
-# CHECK1-NEXT: [ 7] other3    PROGBITS 0000000000000000 0000d0 000001 00      0
-# CHECK1-NEXT: [ 8] .symtab   SYMTAB   0000000000000000 0000d8 000030 18     10
-# CHECK1-NEXT: [ 9] .shstrtab STRTAB   0000000000000000 000108 00004d 00      0
-# CHECK1-NEXT: [10] .strtab   STRTAB   0000000000000000 000155 000008 00      0
-# CHECK1-NEXT: [11] data2     PROGBITS 00000000000000b4 0000b4 000001 00  WA  0
+# CHECK1-NEXT: [ 3] data1     PROGBITS 00000000000000b2 0000b1 000001 00  WA  0
+# CHECK1-NEXT: [ 4] data3     PROGBITS 00000000000000b3 0000b2 000001 00  WA  0
+# CHECK1-NEXT: [ 5] other1    PROGBITS 0000000000000000 0000b4 000001 00      0
+# CHECK1-NEXT: [ 6] other2    PROGBITS 0000000000000000 0000b8 000001 00      0
+# CHECK1-NEXT: [ 7] other3    PROGBITS 0000000000000000 0000c0 000001 00      0
+# CHECK1-NEXT: [ 8] .symtab   SYMTAB   0000000000000000 0000c8 000030 18     10
+# CHECK1-NEXT: [ 9] .shstrtab STRTAB   0000000000000000 0000f8 00004d 00      0
+# CHECK1-NEXT: [10] .strtab   STRTAB   0000000000000000 000145 000008 00      0
+# CHECK1-NEXT: [11] data2     PROGBITS 00000000000000b4 0000b3 000001 00  WA  0
 # CHECK1:      Type       Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
-# CHECK1-NEXT: LOAD       0x000000 0x0000000000000000 0x0000000000000000 0x0000b5 0x0000b5 RWE 0x1000
-# CHECK1-NEXT: 0x60000000 0x0000b8 0x0000000000000000 0x0000000000000000 0x000009 0x000001     0x8
+# CHECK1-NEXT: LOAD       0x000000 0x0000000000000000 0x0000000000000000 0x0000b4 0x0000b5 RWE 0x1000
+# CHECK1-NEXT: 0x60000000 0x0000b4 0x0000000000000000 0x0000000000000000 0x000005 0x000001     0x8
 
 #--- a.lds
 SECTIONS {
diff --git a/lld/test/ELF/linkerscript/sections.s b/lld/test/ELF/linkerscript/sections.s
index 539aa9c1705888b..6ac435f74b99739 100644
--- a/lld/test/ELF/linkerscript/sections.s
+++ b/lld/test/ELF/linkerscript/sections.s
@@ -79,12 +79,12 @@
 # SEP-BY-NONALLOC:      [ 1] .text     PROGBITS 0000000000000000 001000 00000e 00  AX
 # SEP-BY-NONALLOC-NEXT: [ 2] .data     PROGBITS 000000000000000e 00100e 000020 00  WA
 # SEP-BY-NONALLOC-NEXT: [ 3] .bss      NOBITS   000000000000002e 00102e 000002 00  WA
-# SEP-BY-NONALLOC-NEXT: [ 4] .comment  PROGBITS 0000000000000000 001033 000008 01  MS
-# SEP-BY-NONALLOC:      [ 8] other     PROGBITS 0000000000000030 001030 000003 00  WA
+# SEP-BY-NONALLOC-NEXT: [ 4] .comment  PROGBITS 0000000000000000 001031 000008 01  MS
+# SEP-BY-NONALLOC:      [ 8] other     PROGBITS 0000000000000030 00102e 000003 00  WA
 
 # SEP-BY-NONALLOC:      Type      Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
 # SEP-BY-NONALLOC-NEXT: LOAD      0x001000 0x0000000000000000 0x0000000000000000 0x00000e 0x00000e R E 0x1000
-# SEP-BY-NONALLOC-NEXT: LOAD      0x00100e 0x000000000000000e 0x000000000000000e 0x000025 0x000025 RW  0x1000
+# SEP-BY-NONALLOC-NEXT: LOAD      0x00100e 0x000000000000000e 0x000000000000000e 0x000023 0x000025 RW  0x1000
 # SEP-BY-NONALLOC-NEXT: GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0
 
 # Input section pattern contains additional semicolon.
diff --git a/lld/test/ELF/x86-64-section-layout.s b/lld/test/ELF/x86-64-section-layout.s
index 37201279fa0a5d0..9d60510a0d13b65 100644
--- a/lld/test/ELF/x86-64-section-layout.s
+++ b/lld/test/ELF/x86-64-section-layout.s
@@ -57,17 +57,17 @@
 # CHECK2-NEXT: .tbss.1    NOBITS          0000000000200307 000306 000001 00 WAT  0   0  1
 # CHECK2-NEXT: .data      PROGBITS        0000000000200306 000306 000001 00  WA  0   0  1
 # CHECK2-NEXT: .bss       NOBITS          0000000000200307 000307 001800 00  WA  0   0  1
-# CHECK2-NEXT: .ldata     PROGBITS        0000000000201b07 001b07 000002 00 WAl  0   0  1
-# CHECK2-NEXT: .ldata2    PROGBITS        0000000000201b09 001b09 000001 00 WAl  0   0  1
-# CHECK2-NEXT: .lbss      NOBITS          0000000000201b0a 001b0a 000002 00 WAl  0   0  1
-# CHECK2-NEXT: .comment   PROGBITS        0000000000000000 001b0a {{.*}} 01  MS  0   0  1
+# CHECK2-NEXT: .ldata     PROGBITS        0000000000201b07 000307 000002 00 WAl  0   0  1
+# CHECK2-NEXT: .ldata2    PROGBITS        0000000000201b09 000309 000001 00 WAl  0   0  1
+# CHECK2-NEXT: .lbss      NOBITS          0000000000201b0a 00030a 000002 00 WAl  0   0  1
+# CHECK2-NEXT: .comment   PROGBITS        0000000000000000 00030a {{.*}} 01  MS  0   0  1
 
 # CHECK2:      Program Headers:
 # CHECK2-NEXT:   Type  Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
 # CHECK2-NEXT:   PHDR  0x000040 0x0000000000200040 0x0000000000200040 {{.*}}   {{.*}}   R   0x8
 # CHECK2-NEXT:   LOAD  0x000000 0x0000000000200000 0x0000000000200000 0x000304 0x000304 R   0x1000
 # CHECK2-NEXT:   LOAD  0x000304 0x0000000000200304 0x0000000000200304 0x000001 0x000001 R E 0x1000
-# CHECK2-NEXT:   LOAD  0x000305 0x0000000000200305 0x0000000000200305 0x001805 0x001807 RW  0x1000
+# CHECK2-NEXT:   LOAD  0x000305 0x0000000000200305 0x0000000000200305 0x000005 0x001807 RW  0x1000
 # CHECK2-NEXT:   TLS   0x000305 0x0000000000200305 0x0000000000200305 0x000001 0x000003 R   0x1
 
 #--- a.s

Copy link

github-actions bot commented Jan 16, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@smithp35
Copy link
Collaborator

Thanks for the patch. I think we'll need to have a discussion on what we want the behaviour of LLD to be before evaluating a potential fix, I think we'll also want some new test cases that describe that behaviour.

For example in a single PT_LOAD region if .bss precedes non .bss then if the size of the .bss is not added to the file-offset then the .bss when created may (when the .bss VMA and LMA are the same) overwrite the contents of the following non .bss section, however if the size is added then creating the .bss can be done.

    .data VMA 0x1000 LMA 0x1000 Size 4
    .bss VMA 0x1004 LMA 0x1004 Size 4
    /* .bss when created will overwrite following section if we don't increase file offset (LMA here) */
    /* If VMA was something different say 0x1000000 then LMA 0x1004 is fine */
    .more_data VMA 0x1008 LMA 0x1004 Size 4 

In summary if all the VMAs in a PT_LOAD are monotonically ascending and the sections are contiguous in VMA and VMA == LMA then it is probably better to reserve space for .bss.

I'm also a bit concerned that subtracting the .bss size seen so far from (os->addr - first->addr) might cause problems with alignment (in the file). For example:

    .data VMA 0x1000 size 4
    .bss VMA 0x1004 size 4
    .other VMA 0x1010 (align 0x10) size 4

If we just subtract 4 then we'll have altered the file offset alignment.

I do agree that there are problems lurking in computeFileOffsets though. In particular I think that MEMORY and PHDRS lets people create non-monotonically ascending VMA in a PT_LOAD program header which I don't think computeFileOffsets works well for. I've seen a couple of issues filed for that.

I'm planning to do some investigations into how GNU ld and LLD handle address assignment over the next few weeks for a FOSDEM presentation in February. Will hopefully know the bounds of what works and what doesn't by then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants