
Relink don't rebuild: improve public api hashing using an rmeta reachability graph#156940

Draft
susitsm wants to merge 50 commits into rust-lang:main from susitsm:rdr-rmeta-reachability-graph

@susitsm susitsm commented May 25, 2026

Rmeta reachability graph

This is a work in progress! I opened it a bit prematurely to motivate speeding up the review of #155871. The features work in the few tests added, but the PR needs a lot of cleanup and more tests before review.

This PR builds on #155871 and moves public API hashing from sound but not very useful to sound and useful. The features:

  1. The public hash does not change when private items are added. One caveat: adding a private item before a public item changes that public item's span, which does change the public hash. If private code lives in separate files from public code, or is only added at the end of a file, the public hash stays stable.
  2. Changing a public item in a dependency does not cause the public hash of the local crate to change, unless that public item is reexported here. This allows recompilation to be cut off 2 crates downstream when the public hash of a crate changes, instead of propagating all the way and requiring recompiles everywhere.
  3. It is future proof

How this is achieved reliably

For simplicity, this explanation only uses DefId-s as id-s. There are a few other kinds of id-s, like ExpnId, but the mechanism is the same. It also uses a simplified definition of public items.

First, let's define what a public item is: a public item is something that is accessible through the rmeta. A private item is an item that is not public.

The implementation is based on a simple observation: rmeta does not provide queries like all_def_ids that iterate over all DefId-s in the crate. It stores most data in tables which can only be accessed by calling queries that take a DefId as input and return some output. So most information is only accessible through maps from a DefId to some data, with some exceptions. These exceptions are our root nodes, which are always reachable. The crate root is always a root node.
A reachability graph can be constructed at encoding time: whenever an id -> data mapping is encoded in rmeta, every id encoded as part of data is recorded as the target of an edge from id. For example, if id is a public function, we will always add edges to its span and the types in its signature. If it also has mir encoded, we add all types/functions/spans occurring inside. For any associated data we store in tables, encoder callbacks make sure that those edges will automatically appear in the graph.
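
The edge recording described above can be pictured with a small toy model. This is only a sketch: `Id`, `Encoder`, `encode_entry` and `reachable` are hypothetical names made up for illustration, not the types or callbacks used in this PR, and the real encoder discovers the mentioned ids through its callbacks rather than taking them as an argument.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for a `DefId`; the real type lives in rustc.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Id(pub u32);

/// Toy encoder: every table write also records which ids the data mentions.
#[derive(Default)]
pub struct Encoder {
    /// For each key, the ids that occur inside the data stored under it.
    pub edges: BTreeMap<Id, BTreeSet<Id>>,
    /// Ids that can be obtained without a key, such as the crate root.
    pub roots: BTreeSet<Id>,
}

impl Encoder {
    /// Encode `key -> data`; `mentioned` lists every id occurring inside `data`.
    pub fn encode_entry(&mut self, key: Id, mentioned: &[Id]) {
        self.edges
            .entry(key)
            .or_default()
            .extend(mentioned.iter().copied());
    }

    /// All ids reachable from the roots by following the recorded edges.
    pub fn reachable(&self) -> BTreeSet<Id> {
        let mut seen = self.roots.clone();
        let mut stack: Vec<Id> = seen.iter().copied().collect();
        while let Some(id) = stack.pop() {
            if let Some(targets) = self.edges.get(&id) {
                for &target in targets {
                    // `insert` returns true only the first time we see `target`.
                    if seen.insert(target) {
                        stack.push(target);
                    }
                }
            }
        }
        seen
    }
}
```

In this model an entry whose key is never mentioned by anything reachable from a root stays outside the `reachable` set, which is the property the public hash relies on.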

After the graph is constructed, each node has its local hash as the combined hash of all data saved in these tables. Then the strongly connected components are calculated and a public hash is assigned to each scc: the local hash of everything inside the connected component plus the public hash from each of its dependencies. The public hash of a node is essentially the hash of everything reachable starting from it.
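
A minimal sketch of the hashing step, assuming nodes are numbered `0..n`, local hashes are plain `u64` values, and `DefaultHasher` stands in for the compiler's stable hasher. The function name `public_hashes` and the `Tarjan` helper are illustrative, not code from this PR. It uses Tarjan's algorithm, which finishes a component only after every component it points to, so dependency hashes are always available when needed.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy combiner; slices hash their length too, so the two parts cannot blur.
fn combine(locals: &[u64], deps: &[u64]) -> u64 {
    let mut h = DefaultHasher::new();
    locals.hash(&mut h);
    deps.hash(&mut h);
    h.finish()
}

struct Tarjan<'a> {
    edges: &'a [Vec<usize>],
    local: &'a [u64],
    index: Vec<Option<usize>>,
    low: Vec<usize>,
    on_stack: Vec<bool>,
    stack: Vec<usize>,
    next: usize,
    scc_of: Vec<usize>, // node -> component id, usize::MAX until assigned
    scc_hash: Vec<u64>, // component id -> public hash
}

impl Tarjan<'_> {
    fn visit(&mut self, v: usize) {
        let (edges, local) = (self.edges, self.local);
        self.index[v] = Some(self.next);
        self.low[v] = self.next;
        self.next += 1;
        self.stack.push(v);
        self.on_stack[v] = true;
        for &w in &edges[v] {
            let seen = self.index[w];
            match seen {
                None => {
                    self.visit(w);
                    self.low[v] = self.low[v].min(self.low[w]);
                }
                Some(iw) if self.on_stack[w] => self.low[v] = self.low[v].min(iw),
                Some(_) => {} // w belongs to an already finished component
            }
        }
        if Some(self.low[v]) != self.index[v] {
            return; // v is not the root of its component
        }
        let mut members = Vec::new();
        loop {
            let w = self.stack.pop().expect("v is still on the stack");
            self.on_stack[w] = false;
            members.push(w);
            if w == v {
                break;
            }
        }
        members.sort_unstable();
        let id = self.scc_hash.len();
        for &m in &members {
            self.scc_of[m] = id;
        }
        let locals: Vec<u64> = members.iter().map(|&m| local[m]).collect();
        // Public hashes of the components this one points to (all finished already).
        let mut deps: Vec<u64> = members
            .iter()
            .flat_map(|&m| edges[m].iter().copied())
            .filter(|&w| self.scc_of[w] != id)
            .map(|w| self.scc_hash[self.scc_of[w]])
            .collect();
        deps.sort_unstable();
        deps.dedup();
        self.scc_hash.push(combine(&locals, &deps));
    }
}

/// Public hash of every node: the hash of its strongly connected component.
pub fn public_hashes(local: &[u64], edges: &[Vec<usize>]) -> Vec<u64> {
    let n = local.len();
    let mut t = Tarjan {
        edges,
        local,
        index: vec![None; n],
        low: vec![0; n],
        on_stack: vec![false; n],
        stack: Vec::new(),
        next: 0,
        scc_of: vec![usize::MAX; n],
        scc_hash: Vec::new(),
    };
    for v in 0..n {
        if t.index[v].is_none() {
            t.visit(v);
        }
    }
    (0..n).map(|v| t.scc_hash[t.scc_of[v]]).collect()
}
```

With edges `0 -> 1`, `1 -> 2`, `2 -> 1` and an isolated node `3`, nodes 1 and 2 share one public hash, and changing the local hash of node 3 leaves the public hash of node 0 untouched.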

Finally, each LocalDefId reachable from root nodes has its public hash (the public hash of its scc) saved in rmeta, which facilitates feature 2. The public hash of a crate does not depend on the full public hash of its dependencies, only their global part (like lang items and externally implementable items) plus the hash of items that are reachable through this rmeta.
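
As a rough illustration of how a downstream crate could consume those per-item hashes, the sketch below hashes only the dependency's global part plus the items the local rmeta actually references. `DepRmeta`, `global_hash`, `item_hashes` and `dep_contribution` are hypothetical names, and `DefaultHasher` again stands in for a stable hasher.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical view of what a dependency's rmeta exposes for hashing.
pub struct DepRmeta {
    /// Hash of the global part (lang items and similar).
    pub global_hash: u64,
    /// Public hash saved for each reachable item, keyed by a toy item index.
    pub item_hashes: BTreeMap<u32, u64>,
}

/// Contribution of one dependency to the local crate's public hash.
/// Only the items listed in `referenced` are taken into account.
pub fn dep_contribution(dep: &DepRmeta, referenced: &[u32]) -> u64 {
    let mut ids = referenced.to_vec();
    ids.sort_unstable();
    ids.dedup();
    let mut h = DefaultHasher::new();
    dep.global_hash.hash(&mut h);
    for id in ids {
        id.hash(&mut h);
        // A missing entry hashes as `None`, so it stays distinguishable.
        dep.item_hashes.get(&id).hash(&mut h);
    }
    h.finish()
}
```

Under these assumptions, a change to an item the local crate never mentions leaves the contribution unchanged, which is what lets the rebuild stop propagating.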

Improving it reliably

The graph constructed from the rmeta has redundant edges. A DefId occurring in a result of a query does not mean it is actually used to access all data reachable through that DefId.
An example of this is how ExpnData was changed: its parent_module field is only used to check visibilities, never used as an input to a query. Its macro_def_id field is set to LOCAL_CRATE for the root expansion, but that is only used in queries when the returned DefId is of a macro. Changing their types to typed DefId-s which only allow specific operations removed those from the rmeta reachability graph.
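
The typed-id idea can be sketched as a newtype that hides the raw id and exposes only the one operation that is needed. Everything here is an assumption for illustration: `VisibilityModId` and `is_accessible_from` are made-up names, and the `DefId` struct is a toy copy, not the compiler's type.

```rust
/// Toy copy of a `DefId`; the real one lives in rustc.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct DefId {
    pub krate: u32,
    pub index: u32,
}

/// Hypothetical wrapper: a module id that can only be used for visibility checks.
/// There is no accessor for the inner `DefId`, so it cannot be passed to a query,
/// and the encoder does not need to record an edge for it.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct VisibilityModId(DefId);

impl VisibilityModId {
    pub fn new(id: DefId) -> Self {
        Self(id)
    }

    /// True if `module` is this module or is nested inside it.
    /// `parent_of` is a caller-supplied lookup for the parent module.
    pub fn is_accessible_from(
        self,
        module: DefId,
        parent_of: impl Fn(DefId) -> Option<DefId>,
    ) -> bool {
        let mut current = Some(module);
        while let Some(m) = current {
            if m == self.0 {
                return true;
            }
            current = parent_of(m);
        }
        false
    }
}
```

The design point is that the restriction is enforced by the type: code holding a `VisibilityModId` has no way to reach the data behind the id, so treating it as a graph edge would be redundant.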

susitsm added 30 commits May 8, 2026 14:37
…end on public_api_hash instead of crate_hash
…lic hashes in the resolver

When the resolver resolved transitive dependencies, it used the
`crate_hash` of dependencies saved inside the rmeta of downstream crates
to locate them. With public api hashing enabled, it is not sound to save
that hash into downstream crates. Only the public hash can be saved.
This modifies the locator to find all crates with the given public hash,
but look for conflicting crates using the (public, private) hash pair. So
if there are multiple crates with the same public hash but different
private hashes (which should only happen if there was some kind of hash
collision while making the public hashes or the StableCrateId which is
included in them), it will be reported as having multiple candidates for
the crate.
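
The lookup rule from this commit message can be sketched as follows. This is a toy model under stated assumptions: `Candidate`, `Located` and `locate` are hypothetical names, hashes are plain `u64` values, and the real locator deals with much more (paths, crate kinds, error reporting).

```rust
/// Hypothetical candidate crate found on the search path.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Candidate {
    pub path: String,
    pub public_hash: u64,
    pub private_hash: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Located<'a> {
    NotFound,
    Found(&'a Candidate),
    /// Same public hash but differing private hashes: ambiguous.
    Multiple(Vec<&'a Candidate>),
}

/// Select by public hash, then detect conflicts via the (public, private) pair.
pub fn locate<'a>(candidates: &'a [Candidate], wanted_public: u64) -> Located<'a> {
    let matching: Vec<&Candidate> = candidates
        .iter()
        .filter(|c| c.public_hash == wanted_public)
        .collect();
    match matching.first().copied() {
        None => Located::NotFound,
        Some(first) => {
            if matching.iter().all(|c| c.private_hash == first.private_hash) {
                Located::Found(first)
            } else {
                Located::Multiple(matching)
            }
        }
    }
}
```

Candidates that agree on both hashes are treated here as the same crate seen more than once, which is an assumption of the sketch rather than something the commit message states.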
…fIndex, which is interpreted as LocalDefId for now
susitsm added 20 commits May 25, 2026 19:07
…ehind the private hash, as these are only used during linking
…to return the correct reachable_non_generics implementations
…api_hash, and in sessions requiring a linking step
…lic hash of downstream crates, unless they reexport that item
@rustbot rustbot added A-attributes Area: Attributes (`#[…]`, `#![…]`) A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels May 25, 2026
