Skip irrelevant foreign impls when building the specialization graph by xmakro · Pull Request #157281 · rust-lang/rust

xmakro · 2026-06-02T03:18:57Z

What this does

specialization_graph_provider enumerates every impl of a trait, including all foreign impls, and records each one into the graph. For a leaf crate this is dominated by foreign impls: a crate that locally defines a handful of Debug/Clone impls still pulls in every foreign impl Debug for <std type>, decoding each one's trait ref via impl_trait_ref and reading impl_parent.

This skips foreign non-blanket impls that cannot affect local coherence. The work list is built from trait_impls_of and keeps:

all blanket impls, and
every non-blanket impl in a simplified-self bucket that contains at least one local impl.

Foreign impls whose bucket holds no local impl are never recorded.
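
The filtering rule above can be sketched with a toy model. This is not the rustc code: `Impl`, `TraitImpls`, `work_list` and the string-keyed buckets are hypothetical stand-ins for the data that `trait_impls_of` returns, and the local-blanket exception is taken from the commit message.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a simplified self type, e.g. "Vec" or "Box".
type Bucket = &'static str;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Impl {
    name: &'static str,
    is_local: bool,
}

/// Toy analogue of the per-trait impl table: blanket impls plus
/// non-blanket impls grouped by simplified self type.
struct TraitImpls {
    blanket: Vec<Impl>,
    non_blanket: BTreeMap<Bucket, Vec<Impl>>,
}

/// Builds the work list: every blanket impl, plus every non-blanket impl
/// whose bucket contains at least one local impl.
fn work_list(impls: &TraitImpls) -> Vec<Impl> {
    let mut kept = impls.blanket.clone();
    // Exception described in the commit message: a local blanket impl is
    // checked against every child, so nothing is skipped when one exists.
    let has_local_blanket = impls.blanket.iter().any(|i| i.is_local);
    for bucket in impls.non_blanket.values() {
        if has_local_blanket || bucket.iter().any(|i| i.is_local) {
            kept.extend(bucket.iter().copied());
        }
    }
    kept
}

/// Illustrative input: one foreign-only bucket and one mixed bucket.
fn example() -> TraitImpls {
    let mut non_blanket = BTreeMap::new();
    non_blanket.insert(
        "Vec",
        vec![Impl { name: "foreign impl for Vec<T>", is_local: false }],
    );
    non_blanket.insert(
        "Box",
        vec![
            Impl { name: "foreign impl for Box<Foreign>", is_local: false },
            Impl { name: "local impl for Box<Local>", is_local: true },
        ],
    );
    TraitImpls {
        blanket: vec![Impl { name: "foreign blanket impl", is_local: false }],
        non_blanket,
    }
}

fn main() {
    for i in work_list(&example()) {
        println!("{}", i.name);
    }
}
```

In this sketch the `Vec` bucket is dropped because it holds no local impl, while the `Box` bucket is kept in full, including its foreign member.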

Why it is sound

Overlap checking of a local impl only consults blanket impls and the non-blanket impls sharing its simplified self type (filtered_children). The orphan rules forbid a local blanket impl of a foreign trait, so a foreign non-blanket impl can only matter to local coherence if its bucket also contains a local impl, and those buckets are kept in full.

The graph's parent map is read in exactly one place, Ancestors::next. For a foreign impl the parent is now read lazily from crate metadata via impl_parent instead of from the graph, consistent with how foreign parent chains are already resolved for impls reachable only as ancestors. Specialization preserves the head constructor (a specializing impl shares the parent's simplified self type, or the parent is blanket), so the parent of any kept non-blanket impl is itself always kept and no graph node becomes unreachable.
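
The lazy parent lookup can be illustrated with a simplified model. Everything here is an assumption made for illustration: `ImplId`, `Graph`, `Metadata` and this `Ancestors` struct are toy types, not the compiler's, and the real iterator also yields the trait itself at the end of the chain, which this sketch omits.

```rust
use std::collections::HashMap;

/// Toy impl identifier: crate number plus an index inside that crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ImplId {
    krate: u32,
    index: u32,
}

const LOCAL_CRATE: u32 = 0;

/// Toy specialization graph: only recorded impls appear in `parent`.
struct Graph {
    parent: HashMap<ImplId, ImplId>,
}

/// Hypothetical stand-in for reading `impl_parent` from crate metadata.
struct Metadata {
    impl_parent: HashMap<ImplId, ImplId>,
}

struct Ancestors<'a> {
    graph: &'a Graph,
    meta: &'a Metadata,
    current: Option<ImplId>,
}

impl Iterator for Ancestors<'_> {
    type Item = ImplId;

    fn next(&mut self) -> Option<ImplId> {
        let cur = self.current.take()?;
        // Local impls use the graph's parent map; foreign impls are
        // resolved from metadata, so they need no graph entry.
        self.current = if cur.krate == LOCAL_CRATE {
            self.graph.parent.get(&cur).copied()
        } else {
            self.meta.impl_parent.get(&cur).copied()
        };
        Some(cur)
    }
}

/// Illustrative chain: a local impl specializing a foreign impl, which in
/// turn specializes another foreign impl.
fn example_chain() -> Vec<ImplId> {
    let local = ImplId { krate: LOCAL_CRATE, index: 1 };
    let foreign_child = ImplId { krate: 1, index: 7 };
    let foreign_root = ImplId { krate: 1, index: 3 };
    let graph = Graph { parent: HashMap::from([(local, foreign_child)]) };
    let meta = Metadata {
        impl_parent: HashMap::from([(foreign_child, foreign_root)]),
    };
    let chain: Vec<ImplId> =
        Ancestors { graph: &graph, meta: &meta, current: Some(local) }.collect();
    chain
}

fn main() {
    println!("{:?}", example_chain());
}
```

The foreign impls in the chain are reached even though neither has an entry in the toy graph's parent map, which is the property the soundness argument relies on.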

What was measured

Two stage1 librustc_driver shared objects were built from this tree, one at the parent commit (baseline) and one with the change. For each run only the .so is swapped, so nothing else differs. The metric is callgrind instruction reads (Ir), which is deterministic (no run-to-run noise). Ir is attributed to the rustc process with the largest Ir, which is the crate's own rustc --crate-name <crate> invocation; build scripts and proc-macro host compiles are excluded. Two scenarios were measured per crate:

From scratch (CARGO_INCREMENTAL=0): dependencies prebuilt, then the crate compiled cold (cargo check to warm deps, touch the crate's own sources, cargo check under callgrind). This models clean builds and CI, where every crate is compiled from source.
Incremental, unchanged (CARGO_INCREMENTAL=1): build once to populate the on-disk incremental cache, then bump source mtimes only (no content change) and rebuild under callgrind. This models the steady-state warm edit loop where the previous session's caches are reused.

From scratch

crate	baseline Ir	with change Ir	delta Ir	delta %
tokio 1.38	810,555,980	754,459,060	-56,096,920	-6.92%
regex 1.10	1,247,973,341	1,183,361,228	-64,612,113	-5.18%
ripgrep 14.1	3,125,603,677	3,058,667,865	-66,935,812	-2.14%
syn 2.0	5,532,483,754	5,481,826,160	-50,657,594	-0.92%
rayon 1.10	9,131,456,998	9,073,720,082	-57,736,916	-0.63%
serde 1.0	12,074,820,787	12,027,900,440	-46,920,347	-0.39%
wasmi 0.35	24,118,407,063	24,052,136,488	-66,270,575	-0.27%

The absolute saving is similar across crates (roughly 47M to 67M Ir); the percentage tracks how foreign-impl-heavy a crate is relative to its own code.
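
The relationship between a roughly constant absolute saving and a shrinking percentage can be checked directly from the table. A minimal sketch, assuming the percentage column is the Ir difference divided by the baseline Ir; `delta_percent` is a helper name introduced here, and the three rows are copied from the table above.

```rust
/// Relative change of `after` with respect to `baseline`, in percent.
fn delta_percent(baseline: u64, after: u64) -> f64 {
    (after as f64 - baseline as f64) / baseline as f64 * 100.0
}

fn main() {
    // (crate, baseline Ir, with-change Ir), taken from the table above.
    let rows = [
        ("tokio 1.38", 810_555_980u64, 754_459_060u64),
        ("regex 1.10", 1_247_973_341, 1_183_361_228),
        ("serde 1.0", 12_074_820_787, 12_027_900_440),
    ];
    for (name, base, after) in rows {
        let delta_ir = after as i64 - base as i64;
        println!("{}: {} Ir, {:.2}%", name, delta_ir, delta_percent(base, after));
    }
}
```

The three absolute deltas stay within about 47M to 65M Ir while the baseline grows by more than an order of magnitude, which is why the percentage falls from tokio to serde.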

Incremental, unchanged

crate	baseline Ir	with change Ir	delta Ir	delta %
regex 1.10	429,085,543	381,953,940	-47,131,603	-10.98%
tokio 1.38	382,531,263	342,721,889	-39,809,374	-10.41%
ripgrep 14.1	1,027,211,095	974,595,117	-52,615,978	-5.12%

The warm rebuild shows a larger percentage even though specialization_graph_of is cache_on_disk and its result is reloaded rather than recomputed. The saving here is not from the provider: recording each foreign impl creates an impl_parent and an impl_trait_header dep-graph node, and skipping them leaves the serialized dependency graph many thousands of nodes smaller, so try_mark_previous_green has less to validate on every incremental session. The absolute saving is comparable to the from-scratch case, against a much smaller total, hence the higher percentage.

Wall clock

The Ir figures above are the deterministic signal. Native cargo check wall time was also measured the same way (same .so hot-swap, baseline and with-change trials interleaved so thermal drift cancels), reporting the median and best (min) per crate. Wall time carries real run-to-run noise, so it is weaker evidence than Ir, but it confirms the direction and rough size.
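
How a median delta and a best delta can be derived from per-trial timings is sketched below. The sketch assumes the best delta compares the minimum of each set of trials, which the description does not state explicitly. The timings in `main` are made-up round numbers for illustration, not values from these measurements, and the helper names are introduced here.

```rust
/// Median of a non-empty slice of timings, in seconds.
fn median(samples: &[f64]) -> f64 {
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let n = sorted.len();
    if n % 2 == 1 {
        sorted[n / 2]
    } else {
        (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
    }
}

/// Best (minimum) timing of a slice.
fn best(samples: &[f64]) -> f64 {
    samples.iter().copied().fold(f64::INFINITY, f64::min)
}

/// Relative change of `after` with respect to `baseline`, in percent.
fn delta_percent(baseline: f64, after: f64) -> f64 {
    (after - baseline) / baseline * 100.0
}

fn main() {
    // Made-up timings for illustration only.
    let baseline = [1.00, 1.10, 0.90, 1.05, 0.95];
    let with_change = [0.90, 0.99, 0.81, 0.95, 0.85];

    let median_delta = delta_percent(median(&baseline), median(&with_change));
    let best_delta = delta_percent(best(&baseline), best(&with_change));
    println!("median delta {:.1}%, best delta {:.1}%", median_delta, best_delta);
}
```

Reporting both values is a common way to bound noise: the median resists outliers, while the minimum approximates the run with the least interference.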

Incremental, unchanged (re-check after bumping source mtimes, 21 trials):

crate	baseline median	with change median	median delta	best delta
regex 1.10	0.0700 s	0.0623 s	-11.0%	-7.4%
tokio 1.38	0.0718 s	0.0690 s	-3.9%	-6.2%
ripgrep 14.1	0.1586 s	0.1519 s	-4.2%	-3.9%
serde 1.0	0.3562 s	0.3506 s	-1.6%	-1.7%
syn 2.0	0.2327 s	0.2346 s	+0.8%	-0.2%

From scratch (clean check of the crate plus its dependencies, 7 trials):

crate	baseline median	with change median	median delta	best delta
tokio 1.38	0.1734 s	0.1661 s	-4.2%	-3.8%
ripgrep 14.1	3.3812 s	3.3373 s	-1.3%	-1.3%
regex 1.10	2.4611 s	2.4494 s	-0.5%	-0.4%

The incremental, unchanged wall time tracks the Ir saving on the foreign-impl-heavy crates (regex about -11%, ripgrep and tokio in the -4 to -6% range) and falls into the noise floor on the largest crates (serde about -1.6%, syn indistinguishable from zero). This matches the deterministic Ir: the absolute saving is a fixed few tens of millions of Ir, so it shrinks as a fraction of a larger rebuild.

The from-scratch wall time is the whole-project check, so the saving on any single crate's front end is diluted by recompiling every dependency and by cargo's own overhead. tokio, whose default-feature check is essentially just the tokio crate, keeps most of the effect (about -4%); regex and ripgrep, which pull in several dependency crates, dilute to roughly -0.5 to -1.3%.

Check vs build

The work removed is front-end coherence and trait analysis, so a full cargo build sees the same absolute saving, with codegen then diluting the percentage. The figures above are cargo check, so they are the upper bound on the proportional effect.

These are local measurements (deterministic callgrind Ir plus native wall clock); marking the PR as draft so a perf run can confirm on the full suite.

Testing

The specialization, coherence, traits, negative-impls, associated-types, impl-trait, auto-traits and marker_trait_attr UI suites pass locally with no failures (3264 tests).
The standard library, which uses min_specialization, builds cleanly with the modified compiler.
cargo check diagnostics are byte-identical to baseline on regex, syn, serde and ripgrep.

The specialization graph is built to overlap-check local impls and to walk specialization parent chains. A local non-blanket impl is only overlap-checked against blanket impls and against non-blanket impls that share its simplified self type, and the orphan rules forbid local blanket impls of foreign traits. A foreign non-blanket impl is therefore irrelevant to local coherence unless its simplified-self bucket also contains a local impl.

Build the work list from `trait_impls_of` and skip foreign non-blanket impls whose bucket holds no local impl, instead of enumerating every impl of the trait. For a typical leaf crate this skips the large majority of impls (for example the many foreign `Debug` and `Clone` impls), avoiding the `impl_trait_ref` decode and `impl_parent` read that recording each one would otherwise require.

A local blanket impl is the one exception: it is overlap-checked against every child rather than just its own bucket, so the foreign non-blanket impls are kept whenever one is present. Orphan rules forbid local blanket impls of foreign traits, so this only retains them for crates that are already going to be rejected, where it avoids reporting a spurious extra overlap on top of the orphan error.

Specialization parents of foreign impls are now resolved lazily from crate metadata in `Ancestors::next`, since a skipped foreign impl is no longer present in the graph's `parent` map. This is sound because specialization preserves the head constructor, so the parent of any kept non-blanket impl is itself always kept.

rustbot added the S-waiting-on-author and T-compiler labels on Jun 2, 2026

xmakro force-pushed the perf/spec-graph-skip-foreign-impls branch from 290c594 to 52e8f33 on June 2, 2026 16:55

xmakro wants to merge 1 commit into rust-lang:main from xmakro:perf/spec-graph-skip-foreign-impls
