[lld/ELF] Add documentation on large sections #82560

aeubanks · 2024-02-22T00:50:48Z

Fixes #82438

llvmbot · 2024-02-22T00:51:23Z

@llvm/pr-subscribers-lld

@llvm/pr-subscribers-lld-elf

Author: Arthur Eubanks (aeubanks)

Changes

Fixes #82438

Full diff: https://github.com/llvm/llvm-project/pull/82560.diff

4 Files Affected:

(added) lld/docs/ELF/large_sections.rst (+42)
(added) lld/docs/ELF/large_sections_nopic.png ()
(added) lld/docs/ELF/large_sections_pic.png ()
(modified) lld/docs/index.rst (+1)

diff --git a/lld/docs/ELF/large_sections.rst b/lld/docs/ELF/large_sections.rst
new file mode 100644
index 00000000000000..2bcb9b63faddb7
--- /dev/null
+++ b/lld/docs/ELF/large_sections.rst
@@ -0,0 +1,42 @@
+Large sections
+==============
+
+When linking very large binaries, lld may report relocation overflows like
+
+::
+
+  relocation R_X86_64_PC32 out of range: 2158227201 is not in [-2147483648, 2147483647]
+
+This happens when running into architectural limitations. For example, in x86-64
+PIC code, a reference to a static global variable is typically done with a
+``R_X86_64_PC32`` relocation, which is a 32-bit signed offset from the PC. That
+means if the global is laid out further than 2GB (2^31 bytes) from the
+instruction referencing it, we run into a relocation overflow.
+
+Some code models offer a tradeoff between relocation pressure and performance.
+For example, x86-64's medium code model splits globals into small and large
+globals depending on if they are over a certain size. Large globals are placed
+further away from text and we use 64-bit references to refer to them.
+
+Large globals are placed in separate sections from small globals, and those
+sections have a "large" section flag, e.g. ``SHF_X86_64_LARGE`` for x86-64. The
+linker places large sections on the outer edges of the binary, making sure they
+do not affect the distance of small globals to text. The large versions
+of ``.rodata``, ``.bss``, and ``.data`` are ``.lrodata``, ``.lbss``, and
+``.ldata``, and they are laid out as follows:
+
+.. image:: large_sections_pic.png
+
+We try to keep the number of PT_LOAD segments to a minimum, so we place large
+sections next to the small sections with the same RWX permissions when possible.
+
+``.lbss`` is right after ``.bss`` so that they are merged together and we
+minimize the number of segments with ``p_memsz > p_filesz``.
+
+Note that the above applies to PIC code. For non-PIC code with absolute
+relocations instead of relative relocations, 32-bit relocations typically assume
+that symbols are in the lower 2GB of the address space. So for non-PIC code,
+large sections should be placed after all small sections. ``-z
+lrodata-after-bss`` changes the layout to be:
+
+.. image:: large_sections_nopic.png
diff --git a/lld/docs/ELF/large_sections_nopic.png b/lld/docs/ELF/large_sections_nopic.png
new file mode 100644
index 00000000000000..10ec06f20d5727
Binary files /dev/null and b/lld/docs/ELF/large_sections_nopic.png differ
diff --git a/lld/docs/ELF/large_sections_pic.png b/lld/docs/ELF/large_sections_pic.png
new file mode 100644
index 00000000000000..92976b5f78096b
Binary files /dev/null and b/lld/docs/ELF/large_sections_pic.png differ
diff --git a/lld/docs/index.rst b/lld/docs/index.rst
index 6281b6893cc19b..f5f9802fd974e6 100644
--- a/lld/docs/index.rst
+++ b/lld/docs/index.rst
@@ -169,4 +169,5 @@ document soon.
    ELF/linker_script
    ELF/start-stop-gc
    ELF/warn_backrefs
+   ELF/large_sections
    MachO/index
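
To make the overflow in the new document's example concrete, here is a minimal sketch of the range check behind that error message. It assumes the usual psABI formula for `R_X86_64_PC32` (`S + A - P`); the addresses below are hypothetical and were chosen only so the result matches the value in the message. This is not lld's actual implementation.

```python
from typing import Optional

INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def pc32_value(symbol_addr: int, addend: int, place: int) -> int:
    """Value a PC-relative relocation stores: S + A - P."""
    return symbol_addr + addend - place


def check_pc32(value: int) -> Optional[str]:
    """Return an lld-style message if value does not fit in a signed 32-bit field."""
    if INT32_MIN <= value <= INT32_MAX:
        return None
    return (f"relocation R_X86_64_PC32 out of range: {value} "
            f"is not in [{INT32_MIN}, {INT32_MAX}]")


# Hypothetical addresses: the instruction is near the start of the image,
# the referenced global sits a little more than 2 GiB above it.
value = pc32_value(symbol_addr=2158231301, addend=-4, place=4096)
print(check_pc32(value))
```

Anything placed more than 2^31 - 1 bytes above the referencing instruction (after the addend is applied) fails the check, which is why the layout of large data relative to text matters.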

MaskRay · 2024-02-22T01:45:41Z

lld/docs/ELF/large_sections_nopic.png

This is 16KB. Is there a way to decrease the size (this is a permanent repo size increase)? Would an SVG help?

I'm confused by the arcs. What do they mean?

I've changed the images to just display one layout without any arrows. Now they're 2-3KB each.

(an svg of the original pictures was larger at ~40KB)

I think I drew this on a whiteboard originally, and the intention of the arcs was to signal that data above the large data threshold from the original small data section would move to the large data section, keeping overall binary size the same, but changing the layout to make the small section portion of the binary more compact.

Maybe it would be better expressed with two colors (blue/red), where blue indicates small data (below the threshold) and red indicates large data (above the threshold), and the .l-prefixed sections are all red.

That was the idea, but let's not make that diagram a requirement.
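
The idea behind the arcs (data above the threshold moves into the `.l`-prefixed sections while the total size stays the same) can be sketched as follows. This is only an illustration: the threshold value, the global names, and the exact boundary condition (strictly greater than the threshold) are assumptions, not what the compiler is guaranteed to do.

```python
from collections import defaultdict

# Small section -> its large counterpart, as named in the documentation.
LARGE_SECTION = {".rodata": ".lrodata", ".data": ".ldata", ".bss": ".lbss"}


def assign_section(base_section: str, size: int, threshold: int) -> str:
    """Pick the output section for a global under a medium-code-model-style split."""
    if size > threshold:  # assumption: "over a certain size" means strictly greater
        return LARGE_SECTION[base_section]
    return base_section


def split_globals(globals_, threshold):
    """Sum the bytes that land in each section for (name, section, size) tuples."""
    totals = defaultdict(int)
    for _name, base_section, size in globals_:
        totals[assign_section(base_section, size, threshold)] += size
    return dict(totals)


# Hypothetical globals and threshold, for illustration only.
THRESHOLD = 65536
GLOBALS = [
    ("counter", ".data", 8),
    ("lookup_table", ".rodata", 1 << 20),
    ("scratch_buffer", ".bss", 1 << 30),
    ("message", ".rodata", 32),
]

for section, size in sorted(split_globals(GLOBALS, THRESHOLD).items()):
    print(f"{section:10s} {size}")
```

The per-section totals change, but their sum does not: the split only relabels where the bytes go, which is what lets the linker pack the small sections closer to text.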

MaskRay · 2024-02-22T01:51:22Z

lld/docs/ELF/large_sections.rst

+Note that the above applies to PIC code. For non-PIC code with absolute
+relocations instead of relative relocations, 32-bit relocations typically assume
+that symbols are in the lower 2GB of the address space. So for non-PIC code,
+large sections should be placed after all small sections. ``-z


I think "should" is too strong. This is an edge case that very few groups run into anyway. Perhaps "You can specify -z .... to place ..."

Large data should not contribute to relocation pressure. Putting .lrodata at the beginning of non-PIC binaries increases relocation pressure. So I think "should" is appropriate here. I did change it to say "less common non-PIC code".

.lrodata doesn't increase relocation pressure. It does not change relocation pressure, since it's a spin-off from .rodata.

I'm not understanding your comment. Perhaps we have different definitions of relocation pressure? My definition is that a symbol contributes to relocation pressure if it pushes 32-bit relocations toward overflowing. Putting .lrodata at the beginning of a non-PIC binary pushes small symbols toward the end of the lower 2GB.

Switching from -fno-pic -mcmodel=small to -fno-pic -mcmodel=medium alleviates relocation overflow pressure for .ldata .lbss which are split off from .data .bss. This switch does not alleviate .lrodata, which is split off from .rodata. Quite significant -fno-pic -mcmodel=small users benefit from the default layout. -z lrodata-after-bss likely helps more, but we should not say they should do this.

I don't understand the concern, and it seems this is a sticking point. I encourage you to use a real-time communication medium to come to consensus.

Quite significant -fno-pic -mcmodel=small users benefit from the default layout.

I don't understand this. The only benefit I can think of is having one fewer segment / program header, which seems pretty marginal.

I don't have data ready, but in our experience, read-only data is significantly larger than writable global data (.data & .bss), so I think moving a -fno-pic build from small to medium with the current layout will have very little benefit.

"So for non-PIC code, large sections should be placed after all small sections."

I don't think there is a large discrepancy here. I just pointed out that "should" is too strong. The alternative

"So for non-PIC code (which requires -no-pie linking), you can specify -z lrodata-after-bss to place large read-only data sections after .bss to alleviate ..."

is smoother.

I see, you're saying that for a given binary, laying out lbss/ldata after bss/data already helps with relocation pressure. Laying out lrodata at the end is just another helpful thing you can do for non-PIC binaries.

My perspective is that we've already decided which symbols should not contribute toward 32-bit relocation pressure by marking them as large.
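
The two positions in this thread can be compared with a rough model of where the small sections end up in a non-PIC link. Everything here is a simplification made for illustration: the section sizes are invented, alignment and headers are ignored, the image is assumed to start at address 0, and the relative order of the large sections at the tail is an assumption (it does not affect the result). Only two facts are taken from the discussion: `.lrodata` sits at the beginning in the default layout, and `-z lrodata-after-bss` moves it after the small sections.

```python
MIB = 1 << 20
GIB = 1 << 30
LOW_2GIB = 1 << 31  # 32-bit absolute relocations assume symbols below this

LARGE = {".lrodata", ".lbss", ".ldata"}

# Assumed orders; the order among trailing large sections is a guess.
DEFAULT_ORDER = [".lrodata", ".rodata", ".text", ".data", ".bss", ".lbss", ".ldata"]
LRODATA_AFTER_BSS_ORDER = [".rodata", ".text", ".data", ".bss", ".lbss", ".lrodata", ".ldata"]


def layout(order, sizes, base=0):
    """Assign consecutive [start, end) address ranges, ignoring alignment."""
    spans, addr = {}, base
    for name in order:
        size = sizes.get(name, 0)
        spans[name] = (addr, addr + size)
        addr += size
    return spans


def small_sections_end(order, sizes, base=0):
    """Highest end address of any small (non-large) section."""
    spans = layout(order, sizes, base)
    return max(end for name, (_start, end) in spans.items() if name not in LARGE)


def small_sections_fit(order, sizes, base=0):
    """True if every small section lies within the lower 2 GiB."""
    return small_sections_end(order, sizes, base) <= LOW_2GIB


# Invented sizes: read-only large data dominates, as described above.
SIZES = {
    ".lrodata": 3 * GIB // 2,
    ".rodata": 256 * MIB,
    ".text": 512 * MIB,
    ".data": 64 * MIB,
    ".bss": 64 * MIB,
    ".lbss": 1 * GIB,
    ".ldata": 1 * GIB,
}

for label, order in [("default", DEFAULT_ORDER),
                     ("-z lrodata-after-bss", LRODATA_AFTER_BSS_ORDER)]:
    end = small_sections_end(order, SIZES)
    print(f"{label}: small sections end at {end // MIB} MiB, "
          f"fit below 2 GiB: {small_sections_fit(order, SIZES)}")
```

With these made-up sizes, the default layout already keeps `.lbss` and `.ldata` out of the way, but `.lrodata` at the front still pushes the small sections past the 2 GiB mark, while the alternative layout keeps them well below it. Whether that difference matters for a given binary depends on how large its read-only data actually is.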

[lld/ELF] Add documentation on large sections

b605fc0

Fixes llvm#82438

aeubanks requested review from rnk and MaskRay February 22, 2024 00:50

llvmbot added lld lld:ELF labels Feb 22, 2024

MaskRay reviewed Feb 22, 2024

address comments

089d11e

MaskRay approved these changes Feb 23, 2024

actually commit the right pngs

a7469cc

aeubanks merged commit 4bc3b35 into llvm:main Feb 26, 2024
5 checks passed

aeubanks deleted the large-docs branch February 26, 2024 17:12
