Skip irrelevant foreign impls when building the specialization graph#157281
Draft
xmakro wants to merge 1 commit into
Draft
Skip irrelevant foreign impls when building the specialization graph#157281xmakro wants to merge 1 commit into
xmakro wants to merge 1 commit into
Conversation
This comment has been minimized.
This comment has been minimized.
The specialization graph is built to overlap-check local impls and to walk specialization parent chains. A local non-blanket impl is only overlap-checked against blanket impls and against non-blanket impls that share its simplified self type, and the orphan rules forbid local blanket impls of foreign traits. A foreign non-blanket impl is therefore irrelevant to local coherence unless its simplified-self bucket also contains a local impl. Build the work list from `trait_impls_of` and skip foreign non-blanket impls whose bucket holds no local impl, instead of enumerating every impl of the trait. For a typical leaf crate this skips the large majority of impls (for example the many foreign `Debug` and `Clone` impls), avoiding the `impl_trait_ref` decode and `impl_parent` read that recording each one would otherwise require. A local blanket impl is the one exception: it is overlap-checked against every child rather than just its own bucket, so the foreign non-blanket impls are kept whenever one is present. Orphan rules forbid local blanket impls of foreign traits, so this only retains them for crates that are already going to be rejected, where it avoids reporting a spurious extra overlap on top of the orphan error. Specialization parents of foreign impls are now resolved lazily from crate metadata in `Ancestors::next`, since a skipped foreign impl is no longer present in the graph's `parent` map. This is sound because specialization preserves the head constructor, so the parent of any kept non-blanket impl is itself always kept.
290c594 to
52e8f33
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this does
specialization_graph_providerenumerates every impl of a trait, including all foreign impls, and records each one into the graph. For a leaf crate this is dominated by foreign impls: a crate that locally defines a handful ofDebug/Cloneimpls still pulls in every foreignimpl Debug for <std type>, decoding each one's trait ref viaimpl_trait_refand readingimpl_parent.This skips foreign non-blanket impls that cannot affect local coherence. The work list is built from
trait_impls_ofand keeps:Foreign impls whose bucket holds no local impl are never recorded.
Why it is sound
Overlap checking of a local impl only consults blanket impls and the non-blanket impls sharing its simplified self type (
filtered_children). The orphan rules forbid a local blanket impl of a foreign trait, so a foreign non-blanket impl can only matter to local coherence if its bucket also contains a local impl, and those buckets are kept in full.The graph's
parentmap is read in exactly one place,Ancestors::next. For a foreign impl the parent is now read lazily from crate metadata viaimpl_parentinstead of from the graph, consistent with how foreign parent chains are already resolved for impls reachable only as ancestors. Specialization preserves the head constructor (a specializing impl shares the parent's simplified self type, or the parent is blanket), so the parent of any kept non-blanket impl is itself always kept and no graph node becomes unreachable.What was measured
Two stage1
librustc_drivershared objects were built from this tree, one at the parent commit (baseline) and one with the change. For each run only the.sois swapped, so nothing else differs. The metric is callgrind instruction reads (Ir), which is deterministic (no run-to-run noise), attributed to the rustc process with the largest Ir, that is the crate's ownrustc --crate-name <crate>invocation; build scripts and proc-macro host compiles are excluded. Two scenarios per crate:CARGO_INCREMENTAL=0): dependencies prebuilt, then the crate compiled cold (cargo checkto warm deps,touchthe crate's own sources,cargo checkunder callgrind). This models clean builds and CI, where every crate is compiled from source.CARGO_INCREMENTAL=1): build once to populate the on-disk incremental cache, then bump source mtimes only (no content change) and rebuild under callgrind. This models the steady-state warm edit loop where the previous session's caches are reused.From scratch
The absolute saving is similar across crates (roughly 47M to 67M Ir); the percentage tracks how foreign-impl-heavy a crate is relative to its own code.
Incremental, unchanged
The warm rebuild shows a larger percentage even though
specialization_graph_ofiscache_on_diskand its result is reloaded rather than recomputed. The saving here is not from the provider: recording each foreign impl creates animpl_parentand animpl_trait_headerdep-graph node, and skipping them leaves the serialized dependency graph many thousands of nodes smaller, sotry_mark_previous_greenhas less to validate on every incremental session. The absolute saving is comparable to the from-scratch case, against a much smaller total, hence the higher percentage.Wall clock
The Ir figures above are the deterministic signal. Native
cargo checkwall time was also measured the same way (same.sohot-swap, baseline and with-change trials interleaved so thermal drift cancels), reporting the median and best (min) per crate. Wall time carries real run-to-run noise, so it is weaker evidence than Ir, but it confirms the direction and rough size.Incremental, unchanged (re-check after bumping source mtimes, 21 trials):
From scratch (clean check of the crate plus its dependencies, 7 trials):
The incremental, unchanged wall time tracks the Ir saving on the foreign-impl-heavy crates (regex about -11%, ripgrep and tokio in the -4 to -6% range) and falls into the noise floor on the largest crates (serde about -1.6%, syn indistinguishable from zero). This matches the deterministic Ir: the absolute saving is a fixed few tens of millions of Ir, so it shrinks as a fraction of a larger rebuild.
The from-scratch wall time is the whole-project check, so the saving on any single crate's front end is diluted by recompiling every dependency and by cargo's own overhead. tokio, whose default-feature check is essentially just the tokio crate, keeps most of the effect (about -4%); regex and ripgrep, which pull in several dependency crates, dilute to roughly -0.5 to -1.3%.
Check vs build
The work removed is front-end coherence and trait analysis, so a full
cargo buildsees the same absolute saving, with codegen then diluting the percentage. The figures above arecargo check, so they are the upper bound on the proportional effect.These are local measurements (deterministic callgrind Ir plus native wall clock); marking the PR as draft so a perf run can confirm on the full suite.
Testing
min_specialization, builds cleanly with the modified compiler.cargo checkdiagnostics are byte-identical to baseline on regex, syn, serde and ripgrep.