LLD over-aligns strings and wastes space #54036

Closed
int3 opened this issue Feb 23, 2022 · 2 comments
@int3
Contributor

int3 commented Feb 23, 2022

The over-alignment hack was added here: https://reviews.llvm.org/D104835

The overhead of over-alignment is minor on Chromium, but it seems significant on at least one of our internal builds (making __cstring approx. 50% larger).

@llvmbot
Collaborator

llvmbot commented Feb 23, 2022

@llvm/issue-subscribers-lld-macho

@EugeneZelenko EugeneZelenko removed the lld label Feb 23, 2022
@int3
Contributor Author

int3 commented Feb 24, 2022

#50135 explains why the hack was added in the first place.

@int3 int3 self-assigned this Mar 8, 2022
@int3 int3 closed this as completed in 4308f03 Mar 10, 2022
gbaraldi pushed a commit to JuliaLang/llvm-project that referenced this issue Jan 6, 2023
Previously, we aligned every cstring to 16 bytes as a temporary hack to
deal with llvm#50135. However, it
was highly wasteful in terms of binary size.
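
To give a feel for the cost, here is a rough sketch that compares a section
in which every string starts on a 16-byte boundary against the same strings
packed back to back. The string lengths are made up for illustration and are
not measurements from any real build; `align_up` and `section_size` are
hypothetical helpers, not LLD functions.

```python
def align_up(addr: int, align: int) -> int:
    """Round addr up to the next multiple of align."""
    return (addr + align - 1) // align * align


def section_size(lengths, align):
    """Total size of a section whose strings each start on an `align` boundary.

    `lengths` are byte sizes including the NUL terminator.
    """
    cursor = 0
    for n in lengths:
        cursor = align_up(cursor, align) + n
    return cursor


# Hypothetical cstring sizes (NUL included); none of them needs alignment.
lengths = [6, 4, 10, 3]

padded = section_size(lengths, 16)  # the old "align everything to 16" hack
packed = section_size(lengths, 1)   # no padding at all
print(f"16-byte aligned: {padded} bytes, packed: {packed} bytes")
```

Short strings suffer the most, since the padding can exceed the string
itself.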

To recap, in contrast to ELF, which puts strings that need different
alignments into different sections, `clang`'s Mach-O backend puts them
all in one section.  Strings that need to be aligned have the .p2align
directive emitted before them, which simply translates into zero padding
in the object file. In other words, we have to infer the alignment of
the cstrings from their addresses.
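
A minimal sketch of that inference, assuming the section itself is placed at
an address that honors its own alignment, so that only the string's offset
within the section matters. `inferred_alignment` is a hypothetical name and
this is not the LLD source. The result is capped at the section alignment
because nothing beyond that is guaranteed by the input.

```python
def inferred_alignment(section_align: int, offset: int) -> int:
    """Largest power of two dividing both section_align and offset.

    section_align must be a power of two. OR-ing it into the offset caps the
    result at the section alignment (and handles offset 0).
    """
    combined = section_align | offset
    return combined & -combined  # isolate the lowest set bit


# Offsets within a 16-byte-aligned __cstring section.
for off in (0, 8, 18, 23, 32):
    print(off, inferred_alignment(16, off))
```

Note that a string which merely happens to land on an even offset is
indistinguishable from one that was explicitly `.p2align`ed, so the inferred
value is an upper bound on what the string actually needs.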

We differ slightly from ld64 in how we've chosen to align these
cstrings. Both LLD and ld64 preserve the number of trailing zeros in
each cstring's address in the input object files. When deduplicating
identical cstrings, both linkers pick the cstring whose address has more
trailing zeros, and preserve the alignment of that address in the final
binary. However, ld64 goes a step further and also preserves the offset
of the cstring from the last section-aligned address.  I.e. if a cstring
is at offset 18 in the input, with a section alignment of 16, then both
LLD and ld64 will ensure the final address is 2-byte aligned (since
`18 == 16 + 2`). But ld64 will also ensure that the final address is of
the form `16 * k + 2` for some integer `k` (which implies 2-byte alignment).
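
The difference can be sketched as two placement functions. This is a
simplified model of the behavior described above, not code from either
linker; all function names are hypothetical, and `cursor` stands for the next
free address in the output section.

```python
def align_up(addr: int, align: int) -> int:
    """Round addr up to the next multiple of align."""
    return (addr + align - 1) // align * align


def inferred_alignment(section_align: int, offset: int) -> int:
    """Largest power of two dividing both section_align and offset."""
    combined = section_align | offset
    return combined & -combined


def place_lld_style(cursor: int, section_align: int, offset: int) -> int:
    """Next address that keeps the input address's trailing zeros."""
    return align_up(cursor, inferred_alignment(section_align, offset))


def place_ld64_style(cursor: int, section_align: int, offset: int) -> int:
    """Next address of the form section_align * k + (offset % section_align)."""
    remainder = offset % section_align
    return cursor + (remainder - cursor) % section_align


# The cstring from the text: offset 18 in a 16-byte-aligned section,
# placed when the output section already holds 5 bytes.
cursor = 5
lld_addr = place_lld_style(cursor, 16, 18)
ld64_addr = place_ld64_style(cursor, 16, 18)
print(f"LLD-style: {lld_addr}, ld64-style: {ld64_addr}")
```

In this model the LLD-style placement needs 1 byte of padding while the
ld64-style placement needs 13, which is where the "few more bytes" of savings
come from.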

Note that ld64's heuristic means that a dedup'ed cstring's final address is
dependent on the order of the input object files. E.g. if in addition to the
cstring at offset 18 above, we have a duplicate one in another file with a
`.cstring` section alignment of 2 and an offset of zero, then ld64 will pick
the cstring from the object file earlier on the command line (since both have
the same number of trailing zeros in their address). So the final cstring may
either be at some address `16 * k + 2` or at some address `2 * k`.
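
The order dependence can be illustrated with a small model. It assumes, per
the description above, that ties in trailing zeros go to the copy seen first;
the function names are hypothetical and the code is a sketch, not ld64's
implementation.

```python
def inferred_alignment(section_align: int, offset: int) -> int:
    """Largest power of two dividing both section_align and offset."""
    combined = section_align | offset
    return combined & -combined


def pick_ld64_style(copies):
    """Choose among duplicate cstrings, given in command-line order.

    Each copy is (section_align, offset). Returns the (modulus, remainder)
    constraint that the final address must satisfy.
    """
    best = copies[0]
    for cand in copies[1:]:
        # Strictly greater: on a tie, the earlier copy wins.
        if inferred_alignment(*cand) > inferred_alignment(*best):
            best = cand
    section_align, offset = best
    return section_align, offset % section_align


def pick_lld_style(copies):
    """Only the largest inferred alignment is kept, so order is irrelevant."""
    return max(inferred_alignment(sa, off) for sa, off in copies)


a = (16, 18)  # offset 18 in a 16-byte-aligned section
b = (2, 0)    # offset 0 in a 2-byte-aligned section

print(pick_ld64_style([a, b]), pick_ld64_style([b, a]))
print(pick_lld_style([a, b]), pick_lld_style([b, a]))
```

Swapping the two inputs changes the ld64-style constraint from
`16 * k + 2` to `2 * k`, while the LLD-style result is 2-byte alignment
either way.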

I've opted not to follow this behavior primarily for implementation
simplicity, and secondarily to save a few more bytes. It's not clear to me
that preserving the section alignment + offset is ever necessary, and there
are many cases that are clearly redundant. In particular, if an x86_64 object
file contains some strings that are accessed via aligned SIMD loads (e.g.
`movaps`), then the .cstring section in the object file will be
16-byte-aligned (since such instructions require 16-byte-aligned memory
operands). However, there will
typically also be other cstrings in the same file that aren't used via SIMD
and don't need this alignment. They will be emitted at some arbitrary address
`A`, but ld64 will treat them as being 16-byte aligned with an offset of
`A % 16`.

I have verified that the two repros in llvm#50135
work well with the new alignment behavior.

Fixes llvm#54036.

Reviewed By: #lld-macho, oontvoo

Differential Revision: https://reviews.llvm.org/D121342