Relink don't rebuild: improve public api hashing using an rmeta reachability graph #156940
Draft
susitsm wants to merge 50 commits into
… rmeta without parents)
…end on public_api_hash instead of crate_hash
…tributes for testing
…stc_public_hash_unchanged attributes
…lic hashes in the resolver When the resolver resolved transitive dependencies, it used the `crate_hash` of dependencies saved inside the rmeta of downstream crates to locate them. With public api hashing enabled, it is not sound to save that hash into downstream crates; only the public hash can be saved. This modifies the locator to find all crates with the given public hash, but look for conflicting crates using the (public, private) hash pair. So if there are multiple crates with the same public hash but different private hashes (which should only happen if there was some kind of hash collision while making the public hashes or the StableCrateId which is included in it), this will be reported as having multiple candidates for the crate.
…fIndex, which is interpreted as LocalDefId for now
…when hashing crate dependencies
…ehind the private hash, as these are only used during linking
…ashing is enabled
…to return the correct reachable_non_generics implementations
…api_hash, and in sessions requiring a linking step
…the public api hash
…lic hash of downstream crates, unless they reexport that item
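The locator change described in the resolver commit above can be illustrated with a minimal sketch. `Candidate`, `LocateError` and `locate` are hypothetical names made up for this illustration and are not the actual rustc locator types; the sketch only assumes that every crate file found on disk carries a (public, private) hash pair.

```rust
/// Hypothetical description of one crate file found on disk.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Candidate {
    pub public_hash: u64,
    pub private_hash: u64,
    pub path: String,
}

/// Hypothetical error type, not the real rustc diagnostics.
#[derive(Debug, PartialEq, Eq)]
pub enum LocateError {
    NotFound,
    /// Same public hash, different private hashes: paths of all matches.
    MultipleCandidates(Vec<String>),
}

/// Find a dependency by the public hash saved in a downstream rmeta.
pub fn locate(found: &[Candidate], wanted_public: u64) -> Result<&Candidate, LocateError> {
    // Step 1: select every crate with the requested public hash.
    let matching: Vec<&Candidate> =
        found.iter().filter(|c| c.public_hash == wanted_public).collect();
    let first = match matching.first() {
        Some(&first) => first,
        None => return Err(LocateError::NotFound),
    };
    // Step 2: conflicts are detected on the (public, private) pair, so
    // several files of the same build (e.g. rlib and rmeta) do not clash.
    if matching.iter().any(|c| c.private_hash != first.private_hash) {
        let paths = matching.iter().map(|c| c.path.clone()).collect();
        return Err(LocateError::MultipleCandidates(paths));
    }
    Ok(first)
}
```

In this sketch two files with the same pair resolve to the first match, while two files that agree only on the public hash are reported as multiple candidates.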
Rmeta reachability graph
This is a work in progress! I decided to open it a bit prematurely to motivate speeding up the review of #155871. The features work in the few tests added, but it needs lots of cleanup and tests before review.
This PR builds on #155871. The PR moves it from sound but not too useful to sound and useful. The features:
How this is achieved reliably
For simplicity, this explanation only uses `DefId`-s as id-s. There are a few other kinds of id-s, like `ExpnId`, but the mechanism is the same. It also uses a simplified definition of public items.
First, let's define what a public item is: a public item is something that is accessible through the rmeta. A private item is an item that is not public.
The implementation is based on a simple observation: rmeta does not provide queries like `all_def_ids` which iterate over all `DefId`-s in the crate. It stores most data in tables which can only be accessed by calling queries that require a `DefId` as input and return some output. So most information is only accessible through maps that map a `DefId` to some data, with some exceptions. These exceptions are our root nodes, which are always reachable. The crate root is always a root node.
A reachability graph can be constructed at encoding time: whenever an `id -> data` mapping is encoded in rmeta, every id encoded with `data` is recorded as an edge from `id`. For example, if `id` is a public function, we will always add edges to its span and the types in its signature. If it also has `mir` encoded, we add all types/functions/spans occurring inside. For any associated data we store in tables, encoder callbacks make sure that those edges will automatically appear in the graph.
After the graph is constructed, each node has its local hash as the combined hash of all data saved in these tables. Then the strongly connected components are calculated and a public hash is assigned to each scc: the local hash of everything inside the connected component plus the public hash from each of its dependencies. The public hash of a node is essentially the hash of everything reachable starting from it.
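As a rough illustration of the scc hashing step, here is a minimal self-contained sketch. It is not the code from this PR: `ReachGraph`, `public_hashes` and `combine` are hypothetical names, plain indices stand in for `DefId`-s, the standard library `DefaultHasher` stands in for the stable hasher, and Tarjan's algorithm is assumed as the scc algorithm.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the encoder's graph; indices play the role of `DefId`s.
pub struct ReachGraph {
    /// Hash of the data encoded for each node.
    pub local_hash: Vec<u64>,
    /// `edges[n]` lists the ids encoded inside the data of node `n`.
    pub edges: Vec<Vec<usize>>,
}

const UNREACHED: usize = usize::MAX;

struct Tarjan<'a> {
    g: &'a ReachGraph,
    index: Vec<usize>, // discovery index, UNREACHED if never visited
    low: Vec<usize>,
    on_stack: Vec<bool>,
    stack: Vec<usize>,
    next: usize,
    scc_of: Vec<usize>, // scc number, UNREACHED if never visited
    scc_count: usize,
}

impl Tarjan<'_> {
    fn visit(&mut self, v: usize) {
        self.index[v] = self.next;
        self.low[v] = self.next;
        self.next += 1;
        self.stack.push(v);
        self.on_stack[v] = true;
        for i in 0..self.g.edges[v].len() {
            let w = self.g.edges[v][i];
            if self.index[w] == UNREACHED {
                self.visit(w);
                self.low[v] = self.low[v].min(self.low[w]);
            } else if self.on_stack[w] {
                self.low[v] = self.low[v].min(self.index[w]);
            }
        }
        if self.low[v] == self.index[v] {
            // `v` is the root of an scc: pop its members off the stack.
            loop {
                let w = self.stack.pop().expect("v is still on the stack");
                self.on_stack[w] = false;
                self.scc_of[w] = self.scc_count;
                if w == v {
                    break;
                }
            }
            self.scc_count += 1;
        }
    }
}

fn combine(locals: &[u64], deps: &[u64]) -> u64 {
    let mut h = DefaultHasher::new();
    locals.hash(&mut h);
    deps.hash(&mut h);
    h.finish()
}

/// Public hash of every node reachable from `roots`; `None` for private nodes.
pub fn public_hashes(g: &ReachGraph, roots: &[usize]) -> Vec<Option<u64>> {
    let n = g.local_hash.len();
    let mut t = Tarjan {
        g, index: vec![UNREACHED; n], low: vec![0; n], on_stack: vec![false; n],
        stack: Vec::new(), next: 0, scc_of: vec![UNREACHED; n], scc_count: 0,
    };
    for &r in roots {
        if t.index[r] == UNREACHED {
            t.visit(r);
        }
    }
    let mut members = vec![Vec::new(); t.scc_count];
    for v in (0..n).filter(|&v| t.scc_of[v] != UNREACHED) {
        members[t.scc_of[v]].push(v);
    }
    // Tarjan completes successor sccs first, so their hashes are already known.
    let mut scc_hash = vec![0u64; t.scc_count];
    for c in 0..t.scc_count {
        let locals: Vec<u64> = members[c].iter().map(|&v| g.local_hash[v]).collect();
        let mut deps: Vec<u64> = members[c].iter()
            .flat_map(|&v| g.edges[v].iter().map(|&w| t.scc_of[w]))
            .filter(|&d| d != c)
            .map(|d| scc_hash[d])
            .collect();
        deps.sort_unstable();
        deps.dedup();
        scc_hash[c] = combine(&locals, &deps);
    }
    t.scc_of.iter().map(|&c| if c == UNREACHED { None } else { Some(scc_hash[c]) }).collect()
}
```

With the crate root passed as the only entry of `roots`, nodes that are never reached get `None`, so changing their local hash leaves every public hash untouched, while a change inside a reachable scc propagates to everything that can reach it.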
Finally, each `LocalDefId` reachable from root nodes has its public hash (the public hash of its scc) saved in rmeta, which facilitates feature 2. The public hash of a crate does not depend on the full public hash of its dependencies, only their global part (like lang items and externally implementable items) plus the hash of items that are reachable through this rmeta.
Improving it reliably
The graph constructed from the rmeta has redundant edges. A `DefId` occurring in a result of a query does not mean it is actually used to access all data reachable through that `DefId`.
An example of this is how `ExpnData` was changed: its `parent_module` field is only used to check visibilities, never used as an input to a query. Its `macro_def_id` field is set to `LOCAL_CRATE` for the root expansion, but that is only used in queries when the returned `DefId` is of a macro. Changing their types to typed `DefId`-s which only allow specific operations removed those from the rmeta reachability graph.
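The idea of a typed `DefId` that drops out of the graph can be sketched as follows. All names here (`VisibilityOnlyDefId`, `EdgeRecorder`, the `Encode` trait and the simplified `DefId`) are hypothetical stand-ins invented for this illustration, not the types used in the PR; the sketch only assumes that edges are recorded when an id is encoded.

```rust
/// Simplified stand-in for rustc's `DefId`.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct DefId(pub u32);

/// Hypothetical typed id: it can be compared against a module, but the raw
/// `DefId` is never handed out, so it cannot be used as a query input.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct VisibilityOnlyDefId(DefId);

impl VisibilityOnlyDefId {
    pub fn new(id: DefId) -> Self {
        Self(id)
    }

    /// The only supported operation: does this id name `module`?
    pub fn is(self, module: DefId) -> bool {
        self.0 == module
    }
}

/// Hypothetical recorder for reachability edges `(from, to)`.
#[derive(Default)]
pub struct EdgeRecorder {
    pub edges: Vec<(DefId, DefId)>,
}

/// Hypothetical encoder callback, called while the data of `from` is encoded.
pub trait Encode {
    fn encode(&self, from: DefId, rec: &mut EdgeRecorder);
}

impl Encode for DefId {
    // A plain `DefId` may be fed back into queries, so it becomes an edge.
    fn encode(&self, from: DefId, rec: &mut EdgeRecorder) {
        rec.edges.push((from, *self));
    }
}

impl Encode for VisibilityOnlyDefId {
    // Nothing reachable through this id can be observed downstream,
    // so no edge is recorded for it.
    fn encode(&self, _from: DefId, _rec: &mut EdgeRecorder) {}
}
```

Encoding a plain `DefId` adds an edge, while encoding the restricted wrapper adds none, which is the effect described above for `parent_module`.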