Only use the new node hashmap for anonymous nodes. #112469

cjgillot · 2023-06-09T15:35:06Z

Split from #109050

The duplication check is made opt-in with -Zincremental-verify-ich.

rustbot · 2023-06-09T15:35:14Z

Some changes occurred in compiler/rustc_codegen_cranelift

michaelwoerister

Thanks for splitting this out into a more limited set of changes, @cjgillot! That should make it easier to make progress.

I've left a few comments below where I think changes are needed to preserve the semantics of the checks or to make things easier to understand. Otherwise the PR looks good to me, except for one thing:

The current version does not contain the lightweight check for duplicate dep-nodes during graph deserialization. That means that we would basically stop detecting duplicate dep-nodes altogether because the more accurate checks are now behind a flag that nobody uses. As long as (non-anonymous) dep-nodes are still defined to have a 1:1 relationship to query invocations, we want to know about DepNode collisions, right?

I'm not sure what's the best reaction to a DepNode collision being detected. Some options are:

- Ignore the incremental cache and emit a warning, encouraging re-running with -Zincremental-verify-ich and creating a bug report.
- Same as above, but only on nightly. Silently ignore the cache on non-nightly.
- Instead of a warning, make the compiler ICE with a message mentioning -Zincremental-verify-ich.
- ICE on nightly, silently ignore the cache on stable.
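To make the lightweight check concrete, here is a minimal sketch of detecting duplicates while the lookup index is built during deserialization. It is not the rustc implementation: `NodeKey`, `build_index`, and `load_or_discard` are hypothetical names, and the `kind`/`hash` pair only stands in for a DepNode's kind and fingerprint. The reaction modeled here is the "ignore the cache" option; the warning and ICE variants would differ only in what happens on the error path.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a serialized dep-node: a kind tag plus a stable hash.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct NodeKey {
    kind: u16,
    hash: u128,
}

/// Builds the key -> index map while "deserializing" the node list.
/// Returns `Err` with the offending key as soon as a duplicate is seen.
fn build_index(nodes: &[NodeKey]) -> Result<HashMap<NodeKey, u32>, NodeKey> {
    let mut index = HashMap::with_capacity(nodes.len());
    for (i, &key) in nodes.iter().enumerate() {
        // `insert` returns the previous value if the key was already present.
        if index.insert(key, i as u32).is_some() {
            return Err(key);
        }
    }
    Ok(index)
}

/// One possible reaction: treat the cache as unusable and start from scratch.
fn load_or_discard(nodes: &[NodeKey]) -> Option<HashMap<NodeKey, u32>> {
    build_index(nodes).ok()
}
```

The check costs nothing beyond the map insertions that building the index needs anyway, which is why it can stay enabled unconditionally in this sketch.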

compiler/rustc_query_system/src/dep_graph/graph.rs

michaelwoerister · 2023-07-14T15:24:15Z

I think this looks great now!

I'd still like us to have some form of the duplicates check during deserialization, so that the signal on hash collisions doesn't go completely silent. However, I realize that it's kind of a hard UX question that I don't want this to be blocked on.

What do you think about this:

1. We add the check and if a duplicate is detected, we just silently ignore the cache. That way we are on the safe side correctness-wise.
2. We merge the PR with that behavior.
3. We (or just I) re-visit the question of how to deal with detected duplicates once I'm back from vacation in two weeks.

bors · 2023-09-07T04:24:48Z

☔ The latest upstream changes (presumably #110050) made this pull request unmergeable. Please resolve the merge conflicts.

Zoxc · 2023-09-25T00:30:45Z

compiler/rustc_query_system/src/dep_graph/graph.rs

@@ -1163,7 +1196,7 @@ impl<K: DepKind> CurrentDepGraph<K> {
                record_graph,
                record_stats,
            )),
-            new_node_to_index: Sharded::new(|| {
+            anon_node_to_index: Sharded::new(|| {
                FxHashMap::with_capacity_and_hasher(
                    new_node_count_estimate / sharded::SHARDS,


This size estimate is probably a bit off now, but I'm not sure what we'd replace it with.

Zoxc · 2023-09-25T00:31:51Z

compiler/rustc_query_system/src/dep_graph/graph.rs

@@ -1200,20 +1243,19 @@ impl<K: DepKind> CurrentDepGraph<K> {
        edges: EdgesVec,
        current_fingerprint: Fingerprint,
    ) -> DepNodeIndex {


This function should probably be renamed alloc_new_node as it no longer does interning.

Zoxc · 2023-09-25T00:33:48Z

compiler/rustc_query_system/src/dep_graph/graph.rs

+        if let Some(ref nodes_newly_allocated_in_current_session) =
+            self.nodes_newly_allocated_in_current_session
+        {
+            if !nodes_newly_allocated_in_current_session.lock().insert(key) {


I'd put cold_path around this.
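As an illustration of the suggestion, the sketch below moves the duplicate branch out of line. It makes several assumptions: `cold_path` is defined locally (modeled on the helper of the same name in rustc_data_structures, whose exact signature is not shown in this thread), `Tracker` and `record` are hypothetical, and `std::sync::Mutex` replaces the compiler's own lock type. Where the real check would presumably panic, this version logs and returns `false`.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Local helper marking a closure as rarely executed, so the optimizer
/// keeps its body out of the hot path.
#[cold]
#[inline(never)]
fn cold_path<R>(f: impl FnOnce() -> R) -> R {
    f()
}

/// Hypothetical tracker; `None` means the duplicate check is disabled.
struct Tracker {
    nodes_newly_allocated_in_current_session: Option<Mutex<HashSet<u128>>>,
}

impl Tracker {
    /// Records `key`; returns `false` if it had already been recorded.
    fn record(&self, key: u128) -> bool {
        if let Some(seen) = &self.nodes_newly_allocated_in_current_session {
            if !seen.lock().unwrap().insert(key) {
                // Duplicates are expected to be very rare, so handle them out of line.
                return cold_path(|| {
                    eprintln!("duplicate dep-node key: {key:#x}");
                    false
                });
            }
        }
        true
    }
}
```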

Zoxc · 2023-09-25T01:13:27Z

I have a branch that builds the index on-demand, which means by the time duplicates are detected, we can no longer ignore the incremental cache. That would only be compatible with the always ICE option.

cjgillot · 2023-09-26T16:58:02Z

> I have a branch that builds the index on-demand, which means by the time duplicates are detected, we can no longer ignore the incremental cache. That would only be compatible with the always ICE option.

That would also be compatible with the "drop duplicates from index" option, wouldn't it?

Zoxc · 2023-09-26T21:26:26Z

Yeah.
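A rough sketch of what "drop duplicates from index" could mean: any key that appears more than once is left out of the index entirely, so a later lookup misses and the node is treated as absent from the previous graph. The function name and the bare `u128` keys are assumptions made for illustration, not the data structures used in the PR.

```rust
use std::collections::HashMap;

/// Builds a key -> position index, dropping every key that occurs more than once.
fn build_index_dropping_duplicates(keys: &[u128]) -> HashMap<u128, usize> {
    // `None` marks a key that has been seen at least twice.
    let mut slots: HashMap<u128, Option<usize>> = HashMap::with_capacity(keys.len());
    for (pos, &key) in keys.iter().enumerate() {
        slots
            .entry(key)
            .and_modify(|slot| *slot = None)
            .or_insert(Some(pos));
    }
    // Keep only the keys that stayed unambiguous.
    slots
        .into_iter()
        .filter_map(|(key, slot)| slot.map(|pos| (key, pos)))
        .collect()
}
```

Because ambiguous keys simply disappear from the index, this approach never needs to discard the cache after the fact, which is what makes it compatible with building the index on demand.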

michaelwoerister · 2023-12-04T09:39:13Z

I noticed that dropping duplicates from the index might not catch all cases: If the hash collision occurs between a node from the previous graph and a node that is newly allocated in the current session, looking up the new node would wrongly find the previous node, right?

I'm working on a proof of concept implementation of an alternative approach (inspired by @cjgillot's work in #109050), which does not have DepNodes and result fingerprints at all, except for reconstructible DepNodes (i.e. DefPathHashes, HirIds, etc). I hope to have the POC implementation ready later this week. It won't be nice enough to be merged but it should allow us to do a perf run so we can see if that approach is viable at all. If perf doesn't rule it out and nobody finds a fatal flaw in the approach, we could discuss in the wg-incr-comp Zulip how it might fit with other in-progress work.

michaelwoerister · 2023-12-07T14:46:57Z

OK, I ran the experiment mentioned above in #118667 and I think we can safely call that a failure 🙂

Given that the quick check during deserialization also is not as complete as we thought¹, I think it's time to re-evaluate. I'm proposing the following:

- We default to not building the nodes_newly_allocated_in_current_session map for the production compiler, so that regular users don't have to pay its memory cost.
- When debug assertions are enabled, we do default to building the map and running the associated assertions. They can still be turned off with -Zincremental-verify-ich=no or forced with -Zincremental-verify-ich=yes, even in a production compiler.
- In a separate PR, we add an additional check for query key hash collisions that can be enabled regardless of whether incr. comp. is turned on. We can then enable the check for various kinds of CI builds to get good coverage. This should help catch faulty HashStable implementations before they get merged. The check would also be off by default. It would just go through each query table separately, checking if the stable hashes of query keys collide.² I can take care of implementing this and it would not block this PR.
- We could still do the cheap collision check during deserialization and just silently throw away the cache if there is a collision. It would not catch everything, but if it does find something, the cache is definitely corrupted in some way. I'm not sure about this point, but with the additional checking described above, I don't think it is as important as before.
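The first two points amount to a default that follows debug assertions and can be overridden in either direction. A minimal sketch, assuming the flag is modeled as an `Option<bool>` (both function names are hypothetical):

```rust
/// Decides whether to build the duplicate-tracking map.
/// `flag` models an optional yes/no setting of the verification flag;
/// `None` means the user did not pass it.
fn should_track_new_nodes(flag: Option<bool>, debug_assertions: bool) -> bool {
    // An explicit flag always wins; otherwise follow debug assertions.
    flag.unwrap_or(debug_assertions)
}

/// Same decision, using the debug-assertion setting of the current build.
fn should_track_new_nodes_for_this_build(flag: Option<bool>) -> bool {
    should_track_new_nodes(flag, cfg!(debug_assertions))
}
```

Under this scheme a production compiler skips the map unless the flag forces it, while a build with debug assertions builds it unless the flag disables it.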

What do you think?

¹ Correct me if I'm wrong about this.
² We could turn the check off for any queries that are declared anonymous. That should be safe, as long as the query key type does not also occur in a query result type, where the hash collision would also cause trouble.

Zoxc · 2024-03-09T07:16:52Z

@cjgillot Do you mind if I pick up this PR?

Zoxc · 2024-03-09T07:59:54Z

> In a separate PR, we add an additional check for query key hash collisions, that can be enabled independently of incr. comp. being turned on or not. We can then enable the check for various kinds of CI builds to get good coverage.

It seems this would be a good idea anyway, as there seem to be quite a few hash issues sneaking through CI already: https://github.com/rust-lang/rust/issues?q=is%3Aissue+is%3Aopen+forcing+query+with+already+existing

Zoxc · 2024-03-09T08:05:44Z

> I noticed that dropping duplicates from the index might not catch all cases

I don't think it would catch fewer cases than we currently do; it just catches them later. The current check also does not catch all cases. We're trying to catch mismatches between Eq and HashStable impls for query keys, but we only have the HashStable result from the previous session, which limits us to catching mismatches within the current session unless we also store the query keys in the incremental cache.
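To show the kind of mismatch in question, here is a small sketch that scans the keys of one session and reports two keys that are distinct under `Eq` but share a stable hash. `find_collision` is a hypothetical helper, and the `stable_hash` closure merely stands in for whatever fingerprint a HashStable impl would produce; the check that eventually landed in the compiler may be structured quite differently.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::fmt::Debug;

/// Returns the first pair of keys that differ under `Eq` but hash identically.
fn find_collision<K: Eq + Clone + Debug>(
    keys: &[K],
    stable_hash: impl Fn(&K) -> u64,
) -> Option<(K, K)> {
    let mut seen: HashMap<u64, K> = HashMap::new();
    for key in keys {
        match seen.entry(stable_hash(key)) {
            Entry::Occupied(e) => {
                // Same hash as an earlier key: only a problem if the keys differ.
                if e.get() != key {
                    return Some((e.get().clone(), key.clone()));
                }
            }
            Entry::Vacant(e) => {
                e.insert(key.clone());
            }
        }
    }
    None
}
```

A hash function that ignores part of the key, for example one that hashes only the first field of a tuple, is reported as soon as two keys differ only in the ignored part. Note that this needs the keys themselves, which is exactly what the previous session's cache does not contain.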

Verify that query keys result in unique dep nodes: This implements checking that query keys result in unique dep nodes, as mentioned in rust-lang#112469. We could do a perf check to see how expensive this is. r? `@michaelwoerister`


rustbot assigned michaelwoerister Jun 9, 2023

cjgillot mentioned this pull request Jun 9, 2023

Only use the new DepNode hashmap for anonymous nodes. #109050

Open

michaelwoerister reviewed Jun 12, 2023


michaelwoerister added S-waiting-on-author and removed S-waiting-on-review labels Jun 16, 2023

cjgillot added 2 commits July 14, 2023 08:07

Only use the new node hashmap for anonymous nodes.

8b4fa75

Recover comment.

4409a51

cjgillot force-pushed the graph-anon-hashmap branch from 018664e to ff9f2a0 Compare July 14, 2023 09:30

cjgillot added 3 commits July 14, 2023 09:41

Fortify check.

3ca19f6

Do not cfg-gate the check.

3a8b357

Recover another check.

9227452

cjgillot force-pushed the graph-anon-hashmap branch from ff9f2a0 to 9227452 Compare July 14, 2023 09:51

Zoxc reviewed Sep 25, 2023


Zoxc mentioned this pull request Mar 9, 2024

Verify that query keys result in unique dep nodes #122227

Merged
