Relink don't rebuild: improve public api hashing using an rmeta reachability graph by susitsm · Pull Request #156940 · rust-lang/rust

susitsm · 2026-05-25T22:17:46Z

Rmeta reachability graph

This is a work in progress! I opened it a bit prematurely to help speed up the review of #155871. The features work in the few tests added so far, but the PR needs a lot of cleanup and more tests before review.

This PR builds on #155871 and moves it from sound but not very useful to sound and useful. The features:

The public hash does not change when private items are added. One caveat: adding a private item before a public item changes that public item's span, which changes the public hash. It still works if private code lives in separate files from public code, or is only added at the end of a file.
Changing a public item in a dependency does not cause the public hash of the local crate to change, unless that public item is reexported here. This allows recompilation to be cut off 2 crates downstream when the public hash of a crate changes, instead of propagating all the way and requiring recompiles everywhere.
It is future-proof.

How this is achieved reliably

For simplicity, this explanation only uses DefId-s as id-s. There are a few other kinds of id-s, like ExpnId, but the mechanism is the same. It also uses a simplified definition of public items.

First, let's define what a public item is: a public item is something that is accessible through the rmeta. A private item is an item that is not public.

The implementation is based on a simple observation: rmeta does not provide queries like all_def_ids which iterate over all DefId-s in the crate. It stores most data in tables which can only be accessed by calling queries that take a DefId as input and return some output. So, with some exceptions, information is only accessible through maps from a DefId to some data. Those exceptions are our root nodes, which are always reachable. The crate root is always a root node.
A reachability graph can be constructed at encoding time: whenever an id -> data mapping is encoded in rmeta, every id that occurs inside data is recorded as an edge from id. For example, if id is a public function, we will always add edges to its span and the types in its signature. If it also has mir encoded, we add all types/functions/spans occurring inside. For any associated data we store in tables, encoder callbacks make sure that those edges automatically appear in the graph.
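The edge recording described above can be sketched as follows. This is only an illustration, not the code in this PR: `Id`, `Encoder`, `encode` and `example_reachable` are hypothetical names, and a plain integer stands in for rustc's `DefId`.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for rustc's `DefId`; only an index is kept here.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Id(pub u32);

/// Toy encoder: each `encode` call stands for storing one `id -> data` entry
/// and records an edge from `id` to every id mentioned inside that data.
#[derive(Default)]
pub struct Encoder {
    pub roots: BTreeSet<Id>,
    pub edges: BTreeMap<Id, BTreeSet<Id>>,
}

impl Encoder {
    /// Marks `id` as always reachable (for example the crate root).
    pub fn add_root(&mut self, id: Id) {
        self.roots.insert(id);
    }

    /// Records the ids that occur in the data encoded for `id`.
    pub fn encode(&mut self, id: Id, mentioned: &[Id]) {
        self.edges.entry(id).or_default().extend(mentioned.iter().copied());
    }

    /// Returns every id reachable from the root nodes.
    pub fn reachable(&self) -> BTreeSet<Id> {
        let mut seen = BTreeSet::new();
        let mut stack: Vec<Id> = self.roots.iter().copied().collect();
        while let Some(id) = stack.pop() {
            if seen.insert(id) {
                if let Some(targets) = self.edges.get(&id) {
                    stack.extend(targets.iter().copied());
                }
            }
        }
        seen
    }
}

/// Example: the crate root exposes `pub_fn`, whose signature mentions `pub_ty`.
/// `private_fn` is encoded too, but nothing reachable mentions it.
pub fn example_reachable() -> BTreeSet<Id> {
    let (root, pub_fn, pub_ty, private_fn, helper_ty) = (Id(0), Id(1), Id(2), Id(3), Id(4));
    let mut enc = Encoder::default();
    enc.add_root(root);
    enc.encode(root, &[pub_fn]);
    enc.encode(pub_fn, &[pub_ty]);
    enc.encode(private_fn, &[helper_ty]);
    enc.reachable()
}
```

In this toy model `private_fn` and `helper_ty` never show up in the reachable set, which is the property the public hash relies on.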

After the graph is constructed, each node gets a local hash: the combined hash of all data saved for it in these tables. Then the strongly connected components (SCCs) are calculated and a public hash is assigned to each SCC: the local hashes of everything inside the component combined with the public hash of each SCC it depends on. The public hash of a node is essentially the hash of everything reachable from it.
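A minimal sketch of this hashing step, under stated assumptions: nodes are plain indices, local hashes are given as `u64` values, and std's `DefaultHasher` stands in for rustc's stable hasher. `Graph`, `Tarjan` and `public_hashes` are hypothetical names. It uses Tarjan's algorithm, which finishes SCCs in reverse topological order, so every dependency already has its hash when an SCC is finished.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy graph: node `i` has local hash `local[i]` and successors `edges[i]`.
pub struct Graph {
    pub local: Vec<u64>,
    pub edges: Vec<Vec<usize>>,
}

/// State for Tarjan's SCC algorithm, extended with one hash per finished SCC.
struct Tarjan<'a> {
    g: &'a Graph,
    index: Vec<Option<usize>>,
    low: Vec<usize>,
    on_stack: Vec<bool>,
    stack: Vec<usize>,
    next: usize,
    scc_of: Vec<usize>, // node -> SCC id
    scc_hash: Vec<u64>, // SCC id -> public hash
}

impl<'a> Tarjan<'a> {
    fn visit(&mut self, v: usize) {
        let g = self.g; // copy the shared reference so `self` stays free to mutate
        self.index[v] = Some(self.next);
        self.low[v] = self.next;
        self.next += 1;
        self.stack.push(v);
        self.on_stack[v] = true;
        for &w in &g.edges[v] {
            let seen = self.index[w];
            match seen {
                None => {
                    self.visit(w);
                    self.low[v] = self.low[v].min(self.low[w]);
                }
                Some(iw) if self.on_stack[w] => self.low[v] = self.low[v].min(iw),
                Some(_) => {}
            }
        }
        if self.index[v] != Some(self.low[v]) {
            return; // `v` is not the root of an SCC
        }
        // Pop the finished SCC; everything it points at outside itself is already hashed.
        let id = self.scc_hash.len();
        let mut members = Vec::new();
        loop {
            let w = self.stack.pop().expect("v is still on the stack");
            self.on_stack[w] = false;
            self.scc_of[w] = id;
            members.push(w);
            if w == v {
                break;
            }
        }
        let mut locals: Vec<u64> = members.iter().map(|&m| g.local[m]).collect();
        locals.sort_unstable();
        let mut dep_sccs: Vec<usize> = Vec::new();
        for &m in &members {
            for &w in &g.edges[m] {
                if self.scc_of[w] != id {
                    dep_sccs.push(self.scc_of[w]);
                }
            }
        }
        dep_sccs.sort_unstable();
        dep_sccs.dedup();
        let mut dep_hashes: Vec<u64> = dep_sccs.iter().map(|&s| self.scc_hash[s]).collect();
        dep_hashes.sort_unstable();
        // Public hash of the SCC: local hashes of its members plus hashes of its dependencies.
        let mut h = DefaultHasher::new();
        locals.hash(&mut h);
        dep_hashes.hash(&mut h);
        self.scc_hash.push(h.finish());
    }
}

/// Returns the public hash of every node (the hash of its SCC).
pub fn public_hashes(g: &Graph) -> Vec<u64> {
    let n = g.local.len();
    let mut t = Tarjan {
        g,
        index: vec![None; n],
        low: vec![0; n],
        on_stack: vec![false; n],
        stack: Vec::new(),
        next: 0,
        scc_of: vec![usize::MAX; n],
        scc_hash: Vec::new(),
    };
    for v in 0..n {
        if t.index[v].is_none() {
            t.visit(v);
        }
    }
    t.scc_of.iter().map(|&s| t.scc_hash[s]).collect()
}
```

With edges `0 -> 1`, `1 -> 2`, `2 -> 1` and `3 -> 1`, nodes 1 and 2 form one SCC and share a hash. Changing the local hash of node 3 leaves the hash of node 0 alone, while changing node 2 changes it.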

Finally, each LocalDefId reachable from root nodes has its public hash (the public hash of its SCC) saved in rmeta, which facilitates the second feature above. The public hash of a crate does not depend on the full public hash of its dependencies, only on their global part (like lang items and externally implementable items) plus the hashes of the items that are reachable through this rmeta.
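To illustrate why this cuts off recompilation, here is a rough sketch of what one dependency could contribute to the local crate's public hash. The `DepMeta` struct, its fields and `dep_contribution` are assumptions made for the example, and `DefaultHasher` again stands in for the real stable hasher.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// What a dependency's rmeta exposes in this sketch (field names are hypothetical).
pub struct DepMeta {
    /// Hash of crate-global data such as lang items.
    pub global_hash: u64,
    /// Public hash saved per item, keyed by a toy item index.
    pub item_hashes: BTreeMap<u32, u64>,
}

/// Contribution of one dependency to the local crate's public hash:
/// its global part plus only the items the local rmeta actually references.
pub fn dep_contribution(dep: &DepMeta, referenced: &[u32]) -> u64 {
    let mut items: Vec<u32> = referenced.to_vec();
    items.sort_unstable();
    items.dedup();
    let mut h = DefaultHasher::new();
    dep.global_hash.hash(&mut h);
    for item in items {
        // An item missing from the map is hashed as `None`.
        (item, dep.item_hashes.get(&item)).hash(&mut h);
    }
    h.finish()
}
```

If the dependency changes an item that the local rmeta never references, the contribution stays the same, so crates further downstream see no change.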

Improving it reliably

The graph constructed from the rmeta has redundant edges. A DefId occurring in the result of a query does not mean it is actually used to access all data reachable through that DefId.
An example of this is how ExpnData was changed: its parent_module field is only used to check visibilities, never as an input to a query. Its macro_def_id field is set to LOCAL_CRATE for the root expansion, but that is only used in queries when the returned DefId is that of a macro. Changing their types to typed DefId-s, which only allow specific operations, removed those edges from the rmeta reachability graph.
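The typed-id idea can be sketched with a newtype. The name VisibilityDefId comes from the commits in this PR, but the API below (`new`, `is_accessible_from`, the toy `ModuleTree` and `query_key`) is an assumption made for illustration and is not the actual rustc interface.

```rust
/// Hypothetical stand-in for rustc's `DefId` (crate number plus item index).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DefId {
    pub krate: u32,
    pub index: u32,
}

/// Toy module tree for one crate: `parent[i]` is the parent module of module `i`.
pub struct ModuleTree {
    pub parent: Vec<Option<u32>>,
}

impl ModuleTree {
    /// Returns true if `module` is `ancestor` or is nested inside it.
    pub fn is_descendant_of(&self, mut module: u32, ancestor: u32) -> bool {
        loop {
            if module == ancestor {
                return true;
            }
            match self.parent[module as usize] {
                Some(p) => module = p,
                None => return false,
            }
        }
    }
}

/// Wraps a `DefId` so that it can only answer visibility questions.
/// The inner id is private, so it cannot be passed on as a query key.
#[derive(Clone, Copy, Debug)]
pub struct VisibilityDefId(DefId);

impl VisibilityDefId {
    pub fn new(restricted_to: DefId) -> Self {
        VisibilityDefId(restricted_to)
    }

    /// Is an item restricted to this module visible from module `from`?
    pub fn is_accessible_from(self, from: DefId, tree: &ModuleTree) -> bool {
        self.0.krate == from.krate && tree.is_descendant_of(from.index, self.0.index)
    }
}

/// Stand-in for a query keyed by `DefId`. A `VisibilityDefId` does not
/// type-check here, so an encoder has no reason to record an edge for it.
pub fn query_key(id: DefId) -> (u32, u32) {
    (id.krate, id.index)
}
```

Because the wrapper offers no way back to a plain `DefId`, the type system documents that nothing can be looked up through it, which is what justifies dropping the edge.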

…lic hashes in the resolver

When the resolver resolved transitive dependencies, it used the `crate_hash` of each dependency, saved inside the rmeta of downstream crates, to locate it. With public api hashing enabled, it is not sound to save that hash into downstream crates; only the public hash can be saved. This modifies the locator to find all crates with the given public hash, but to look for conflicting crates using the (public, private) hash pair. So if there are multiple crates with the same public hash but different private hashes (which should only happen if there was some kind of hash collision while making the public hashes or the StableCrateId, which is included in them), this is reported as having multiple candidates for the crate.
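The lookup rule described in this commit message can be sketched like this. `Candidate`, `Located` and `locate` are hypothetical names, and treating candidates with an identical (public, private) pair as copies of the same crate is an assumption of the sketch, not something taken from rustc's actual locator.

```rust
/// One crate file found on the search path (fields are hypothetical).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Candidate {
    pub path: String,
    pub public_hash: u64,
    pub private_hash: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Located<'a> {
    NotFound,
    Found(&'a Candidate),
    MultipleCandidates(Vec<&'a Candidate>),
}

/// Finds the crate with the wanted public hash. Candidates are matched by
/// public hash only, but conflicts are detected on the (public, private) pair.
pub fn locate<'a>(candidates: &'a [Candidate], wanted_public: u64) -> Located<'a> {
    let matching: Vec<&'a Candidate> = candidates
        .iter()
        .filter(|c| c.public_hash == wanted_public)
        .collect();
    let first = match matching.first() {
        Some(&c) => c,
        None => return Located::NotFound,
    };
    // Assumption: identical (public, private) pairs are copies of the same crate.
    if matching.iter().all(|c| c.private_hash == first.private_hash) {
        Located::Found(first)
    } else {
        Located::MultipleCandidates(matching)
    }
}
```

Two files with the same public hash but different private hashes end up in `MultipleCandidates`, which mirrors the "multiple candidates" report described above.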

susitsm added 30 commits May 8, 2026 14:37

rdr: add RDRHashes field to CrateRoot

7db1240

rdr: add public_api_hash unstable option

aac2ec2

derive HashStable for TargetModifier

2e274d5

derive HashStable for DeniedPartialMitigation

ab1fd6c

implement StableHash for TargetTuple

b3c3fda

tidy: move TargetTuple to a new file

c46e842

rdr: implement rmeta public api hash as the stable hash of (almost) all encoded data

346cfd4

Add public api hash debug logs

04c1362

rdr: elaborate on how EII-s should be included in the public api hash

cfdfcc5

rdr: document the process of hashing new fields added to CrateRoot

5989a84

rdr: document hashed_lazy_array

f8e21df

rdr: hash spans as if they had no parents (spans are encoded into the rmeta without parents)

504f859

rdr: add the public_api_hash query. Queries provided by rmeta now depend on public_api_hash instead of crate_hash

294e8a3

rdr: add rustc_public_hash_unchanged and rustc_public_hash_changed attributes for testing

f3a45e4

rdr: add incremental test exercising rustc_public_hash_changed and rustc_public_hash_unchanged attributes

53ced96

rdr: use public_api_hash when hashing dependencies

949b4d4

rdr: use public_api_hash in the rmeta headers

bdf7e0b

rdr: add hash FIXME comments to LazyTables fields

d292f0b

record encoded crate nums

0380de9

rdr: more comments of what to include where

aa0a2b8

rdr: start building reachable graph

69d2f86

rdr: document what exportable_items is used for

8895a17

rdr: save TraitImpls defids as (u32,u32) to avoid encoding them as DefIndex, which is interpreted as LocalDefId for now

646aeef

use SyntaxContext instead of raw u32 in HygieneEncodeContext::encode

f8f2b58

hash stuff reachable through the API

243669c

rdr: save public api hashes of DefId-s and ExpnId-s to the metadata

f204233

rdr: hash the public hashes, make sure hash retrieval works for metadatas without rdr

baaaef7

rdr: remove stripped cfg items from metadata

f9137c0

rdr: only record public module children in rmeta

3a7edd6

susitsm added 20 commits May 25, 2026 19:07

rdr: add VisibilityDefId which is only usable to check visibilities

fdd50d5

rdr: do not include VisibilityDefId in the public hash graph

a785d80

rdr: debug log the hash graph

ed3bebe

add -Zls=public_hash for printing the metadata public hash

86f06ec

rdr: do not hash the def_path_hash_map

a4868f7

rdr: remove LocalExpnId::ROOT -> CRATE_DEF_ID edge from the public hash graph

3d506a4

rdr: add reachablity to the debug representation of IndexGraph

f6906bf

rdr: only include public_global_hash instead of the full public hash when hashing crate dependencies

6d8b170

rdr: remove traits from rmeta

98e4c4a

rdr: add some comments to the impls field of HashableCrateRoot

1458d18

rdr: disable upstream_monomorphizations when public_api_hash is enabled

5a9be6c

rdr: document exported_non_generic_symbols and exported_generic_symbols hashing

3ae9712

rdr: move is_reachable_non_generic into a table

3e4f2c7

rdr: add the is_reachable_non_generic_with_export_level_c query

bedb5cf

rdr: move exported_generic_symbols and exported_non_generic_symbols behind the private hash, as these are only used when during linking

9c1fc78

rdr: assert that reachable_non_generics is not used when public api hashing is enabled

23fd17c

rdr: save into the rmeta whether public_api_hash was enabled, use it to return the correct reachable_non_generics implementations

2127868

rdr: allow access to private hash for rmetas compiled without public_api_hash, and in sessions requiring a linking step

f5ac7fc

rdr: add test testing that changes to a private file does not change the public api hash

621b5c4

rdr: test that changing public items in crate does not change the public hash of downstream crates, unless they reexport that item

33aec4a

susitsm mentioned this pull request May 26, 2026

Relink don't rebuild: add a baseline, sound implementation that can be incrementally improved #155871

Open

susitsm wants to merge 50 commits into rust-lang:main from susitsm:rdr-rmeta-reachability-graph
