
LTO awareness #11

Merged
jserv merged 2 commits into main from lto-aware
Apr 30, 2026


@jserv
Owner

@jserv jserv commented Apr 30, 2026

No description provided.

jserv added 2 commits April 30, 2026 16:58
Replace scripts/text-rollup.py (which silently bucketed 99.7% as
<unknown> whenever the production build dropped DWARF) with
scripts/subsystem-rollup.py, an LTO-aware reporter that:

- strips GCC clone suffixes (.lto_priv/.constprop/.isra/.part/
  .cold/.localalias, including stacked combinations) before
  bucket lookup so specialized clones credit their parent
  function's source dir;
- deduplicates IPA-ICF aliases by start address (ld.bfd has no
  --icf, so all folding is GCC-side and surfaces as multiple
  distinct names sharing one address) into a normalized
  <icf-merged> bucket;
- uses addr2line -p -f -i with "(inlined by)" continuations to
  attribute each address to the OUTERMOST inline frame ("where
  the code lives in the image");
- filters by ELF section so .init.text/.exit.text/.head.text do
  not inflate the rollup by ~7%;
- routes linker-emitted bytes outside any nm symbol to
  <compiler-partition> rather than dropping them;
- lex-normalizes . and .. segments via posixpath.normpath before
  bucketing, so cross-tree paths like lib/../scripts/dtc/libfdt/...
  attribute to where the source actually lives (scripts) instead
  of the meaningless depth-2 key lib/..;
- exits 2 on missing DWARF instead of silently emitting garbage.
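
Three of the steps in the list above (clone-suffix stripping, ICF alias folding by start address, and path normalization before bucketing) can be sketched as follows. This is a minimal illustration, not the code in scripts/subsystem-rollup.py: the function names, the tuple layout, the optional numeric index on suffixes, the choice to compare names after stripping, and the file name example.c are all assumptions made for the example.

```python
import posixpath
import re
from collections import defaultdict

# Suffix set copied from the list above. The optional numeric index
# (".constprop.0") and the stacking behaviour are assumptions about
# how GCC names its clones.
_CLONE_RE = re.compile(
    r"(?:\.(?:lto_priv|constprop|isra|part|cold|localalias)(?:\.\d+)?)+$"
)


def strip_clone_suffix(name):
    """Return the parent function name for a GCC clone symbol."""
    return _CLONE_RE.sub("", name)


def merge_icf_aliases(symbols):
    """Collapse symbols that share a start address.

    symbols: iterable of (addr, size, name) tuples (hypothetical layout).
    Returns a list of (addr, size, label) sorted by address. Names are
    compared after clone stripping, which is an assumption.
    """
    by_addr = defaultdict(list)
    for addr, size, name in symbols:
        by_addr[addr].append((size, strip_clone_suffix(name)))
    merged = []
    for addr in sorted(by_addr):
        entries = by_addr[addr]
        names = {name for _, name in entries}
        label = names.pop() if len(names) == 1 else "<icf-merged>"
        size = max(size for size, _ in entries)
        merged.append((addr, size, label))
    return merged


def bucket_for(path, depth=1):
    """Bucket a source path by its first `depth` normalized segments."""
    parts = posixpath.normpath(path).split("/")
    return "/".join(parts[:depth])
```

Normalizing before splitting is what keeps a cross-tree path from landing in a key such as lib/..; the depth parameter is only there to show that the same helper could serve a second-level rollup.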

Outputs land next to section-sizes.txt: subsystem-rollup.txt
(table consumed by the gate), subsystem-rollup-bars.svg (sorted
horizontal bars), subsystem-rollup-tree.html (D3 treemap with
hover-to-drill).

Add --deep BUCKET flag (repeatable) plus --deep-output PATH to
emit two new sibling artifacts: subsystem-rollup-deep.txt (tab-
delimited per-bucket 2nd-level subdirectory rollup + top-N source
files, machine-parseable) and subsystem-rollup-deep.html (styled
tables with sticky header, in-page nav between buckets,
proportional share bars, hover tooltips). kernel-size-report.sh
invokes it with --deep kernel --deep lib. --deep-output rejects
.html-suffixed paths because the HTML sibling derives via
with_suffix(".html") and would otherwise alias the .txt silently.
The _esc helper escapes both " and ' in addition to <>&, so values
are safe in attribute contexts as well as tag content.
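
The --deep-output aliasing check and the quote-aware escaping described above might look roughly like this. It is a sketch under assumptions: deep_output_paths and esc are hypothetical names (the real helper is called _esc and its body is not shown here), and html.escape from the standard library stands in for whatever the script actually does.

```python
import html
from pathlib import Path


def deep_output_paths(arg):
    """Return (txt_path, html_path) for a --deep-output argument.

    A .html argument is rejected because the HTML sibling is derived
    with with_suffix(".html") and would resolve to the same file.
    """
    txt = Path(arg)
    if txt.suffix.lower() == ".html":
        raise ValueError(f"--deep-output must not end in .html: {arg}")
    return txt, txt.with_suffix(".html")


def esc(value):
    """Escape &, <, >, double and single quotes for HTML output."""
    return html.escape(str(value), quote=True)
```

With quote=True, html.escape also rewrites " and ', which is what makes the result usable inside an attribute value and not only between tags.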

Add per-bucket budget gate: scripts/check-subsystem-budget.py
diffs subsystem-rollup.txt against configs/subsystem-budget.txt
with a default +/- 2% noise band per bucket (LTO re-decides what
to inline between rebuilds, so identical sources still produce
small fluctuations). Wired into kernel-size-report.sh after the
rollup as a warn-only stage: a breach prints to stderr and writes
subsystem-budget.txt with bucket-by-bucket status, but does not
abort the build. Total-bytes threshold remains the coarse gate;
this layer answers WHICH bucket regressed. The shipped
configs/subsystem-budget.txt has all rules commented out -- the
operator pins ceilings
5-10% above observed sizes after one diagnostic run.
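
A minimal sketch of the per-bucket comparison, assuming the rollup and the budget have already been parsed into dictionaries of bucket name to bytes. The function name, the status strings, and the reading of "+/- 2%" as a symmetric band around the pinned value are assumptions; the file formats are not reproduced here.

```python
import sys


def check_budget(sizes, budgets, band=0.02):
    """Compare measured bucket sizes against pinned budgets.

    Returns {bucket: (size, budget, status)} where status is "ok",
    "over", or "under". Sizes within +/- band of the budget are "ok".
    """
    report = {}
    for bucket, budget in budgets.items():
        size = sizes.get(bucket, 0)
        if size > budget * (1 + band):
            status = "over"
        elif size < budget * (1 - band):
            status = "under"
        else:
            status = "ok"
        report[bucket] = (size, budget, status)
    return report


def warn_only(report):
    """Print breaches to stderr and always return 0 (warn-only stage)."""
    for bucket, (size, budget, status) in sorted(report.items()):
        if status == "over":
            print(f"budget breach: {bucket} {size} > {budget}", file=sys.stderr)
    return 0
```

Returning 0 regardless of the outcome mirrors the warn-only wiring: the report names the bucket that moved, while the total-bytes threshold stays the gate that can fail the build.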

qemu-trace-to-orderfile.py learns LTO-clone normalization for the
bootcost view (matches the rollup's clone-suffix stripping so the
same function's hot-path hits do not split across N specialized
clone names) and emits kernel_bootcost.txt rolled up by
context_switch / scheduler / syscall_entry / exec_path /
fork_clone / softirq_irq buckets. collect-kernel-profile.sh adds
the new artifact to its cleanup and output lists.
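
To illustrate why clone normalization matters for the bootcost view, the sketch below merges hit counts across clone names and then rolls them up by bucket. The function-to-bucket table, the "<other>" fallback, and the (name, count) input shape are hypothetical; the mapping used by qemu-trace-to-orderfile.py is not shown in this message.

```python
import re
from collections import Counter

# Same suffix set as the rollup; the numeric index is an assumption.
_CLONE_RE = re.compile(
    r"(?:\.(?:lto_priv|constprop|isra|part|cold|localalias)(?:\.\d+)?)+$"
)

# Hypothetical mapping, for illustration only.
BUCKET_OF = {
    "__schedule": "scheduler",
    "finish_task_switch": "context_switch",
    "copy_process": "fork_clone",
    "do_execveat_common": "exec_path",
}


def rollup_hits(hits):
    """Merge per-symbol hit counts by parent function, then by bucket.

    hits: iterable of (symbol_name, count) pairs.
    Returns (per_function, per_bucket) as Counter objects.
    """
    per_function = Counter()
    for name, count in hits:
        per_function[_CLONE_RE.sub("", name)] += count
    per_bucket = Counter()
    for func, count in per_function.items():
        per_bucket[BUCKET_OF.get(func, "<other>")] += count
    return per_function, per_bucket
```

Without the normalization step, copy_process.isra.0 and copy_process.constprop.0 would show up as two unrelated cold entries, not as one hot function.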

Multi-model review (Gemini + Codex) caught the lex-normalize gap,
the _esc quote-escape gap, and the --deep-output aliasing risk
before merge -- all three are addressed above.

The embedded initramfs is gzip-compressed
(CONFIG_INITRAMFS_COMPRESSION_GZIP=y, runtime decompressor
lib/zlib_inflate 4,588 bytes), but olddefconfig was silently
restoring upstream "default y" for every other RD_* selector.
The new sub-bucket rollup surfaced the cost: lib/zstd 36,942
bytes, lib/lz4 10,972 bytes, lib/xz 6,598 bytes -- ~54KB of
decompressor library code with no consumer on this target.
RD_ZSTD also pulls lib/xxhash.c (~3KB), which cascades out
automatically.

Add explicit "# CONFIG_RD_ZSTD is not set" / RD_LZ4 / RD_XZ
disables to the inline kernel .config block, and mirror them
into the existing positive olddefconfig-survivor verification.
RD_LZMA / RD_BZIP2 / RD_LZO are deferred (RD_LZO has 728 bytes
of measured cost; LZMA and BZIP2 have zero in this build and
matter only for hygiene).

Add a negative-guard loop that fails the build if any of
ZSTD_DECOMPRESS, ZSTD_COMMON, LZ4_DECOMPRESS, XZ_DEC, XXHASH,
DECOMPRESS_ZSTD, DECOMPRESS_LZ4, or DECOMPRESS_XZ survive
olddefconfig as =y. The decompressor libraries are hidden bools
selected by the RD_* options, but a future fs/ or net/ enable
(e.g. squashfs+zstd) could re-pull them through a different
selector -- the guard catches that drift loudly so the size win
does not silently regress. XXHASH is in the list because the
Kconfig comment claims the cascade covers it; including it
tightens the guard to match the claim, and no other in-tree
consumer enabled here would keep it selected.
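
The negative guard amounts to scanning the post-olddefconfig .config for any of the listed symbols set to =y. The build script most likely does this in shell; the Python version below is only an illustration, and the function name and return shape are assumptions.

```python
# Symbol list copied from the paragraph above.
FORBIDDEN = (
    "ZSTD_DECOMPRESS", "ZSTD_COMMON", "LZ4_DECOMPRESS", "XZ_DEC",
    "XXHASH", "DECOMPRESS_ZSTD", "DECOMPRESS_LZ4", "DECOMPRESS_XZ",
)


def surviving_symbols(config_text, symbols=FORBIDDEN):
    """Return the forbidden symbols that are still =y in a .config.

    Matching is exact on the symbol name, so CONFIG_XZ_DEC_X86=y does
    not count as CONFIG_XZ_DEC=y. Lines such as
    "# CONFIG_RD_ZSTD is not set" are ignored.
    """
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and line.endswith("=y"):
            enabled.add(line[len("CONFIG_"):-len("=y")])
    return [sym for sym in symbols if sym in enabled]
```

A caller would fail the build when the returned list is non-empty, which is what turns a silent re-selection through some other Kconfig path into a loud error.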

Result: linux.axf 1,303,072 -> 1,229,344 bytes (-73,728 / -5.7%
in three steps; cumulative -32.4% vs the pre-pruning baseline).
QEMU boot-test (scripts/validate-qemu.sh) passes.
@jserv jserv merged commit 3bf761f into main Apr 30, 2026
2 checks passed
@jserv jserv deleted the lto-aware branch April 30, 2026 09:24