internal: skip associated items and call `generic_implements_goal` earlier in method resolution #16498

davidbarsky · 2024-02-06T00:47:06Z

This is a small change whose correctness I'm not entirely convinced of, but based on the discussion in #6432 and my testing in https://github.com/oxidecomputer/omicron, this seems to provide faster completions, especially when it comes to flyimport-based ones.

Without these changes, I'm seeing completions take roughly 808ms cold and 570ms warm. With these changes, I'm seeing completions take 473ms cold and 55-90ms warm.

…rlier in method resolution

davidbarsky · 2024-02-06T00:59:08Z

(Note that these numbers are coming from my M1 Pro-based MacBook Pro. Your experience may vary.)

davidbarsky · 2024-02-06T01:44:50Z

This change, however, does regress running ~/.cargo/bin/rust-analyzer analysis-stats . on rust-analyzer pretty substantially:

Before

~/.cargo/bin/rust-analyzer analysis-stats .
Database loaded:     15.66s, 0b (metadata 540.38ms, 0b; build 2.15s, 0b)
  item trees: 1179
Item Tree Collection: 793.09ms, 0b
  crates: 47, mods: 1002, decls: 26108, bodies: 24240, adts: 1843, consts: 1121
Item Collection:     9.63s, 0b
  exprs: 707465, ??ty: 39 (0%), ?ty: 126 (0%), !ty: 3                     pats: 163636, ??ty: 4 (0%), ?ty: 4 (0%), !ty: 0                       Inference:           34.66s, 0b                                         MIR lowering:        6.42s, 0b
Mir failed bodies: 19 (0%)
Data layouts:        58.03ms, 0b
Failed data layouts: 129 (7%)
Const evaluation:    318.78ms, 0b
Failed const evals: 0 (0%)
Total:               56.95s, 0b

After

~/.cargo/bin/rust-analyzer analysis-stats .
Database loaded:     8.88s, 0b (metadata 1.10s, 0b; build 2.26s, 0b)
  item trees: 1179
Item Tree Collection: 794.24ms, 0b
  crates: 47, mods: 1002, decls: 26108, bodies: 24240, adts: 1843, consts: 1121
Item Collection:     9.87s, 0b
  exprs: 707470, ??ty: 39 (0%), ?ty: 126 (0%), !ty: 3                     pats: 163638, ??ty: 4 (0%), ?ty: 4 (0%), !ty: 0                       Inference:           132.21s, 0b                                        MIR lowering:        6.58s, 0b
Mir failed bodies: 19 (0%)
Data layouts:        54.77ms, 0b
Failed data layouts: 129 (7%)
Const evaluation:    385.90ms, 0b
Failed const evals: 0 (0%)
Total:               155.07s, 0b

Veykril · 2024-02-06T14:17:22Z

That slow down does not surprise me, trait solving is expensive, and now you are solving before checking the validity of a candidate. I'm more surprised that the change actually helps with completion speed here in your scenario.

Veykril · 2024-02-06T14:32:08Z

From a quick look we can (very very slightly only) optimize is_valid_candidate for this call path though. We know that our assoc items here come from a trait, meaning this branch here cannot be hit ever

rust-analyzer/crates/hir-ty/src/method_resolution.rs

Lines 1403 to 1414 in e2cfed1

    
           if let ItemContainerId::ImplId(impl_id) = c.lookup(db.upcast()).container { 
        
               let self_ty_matches = table.run_in_snapshot(|table| { 
        
                   let expected_self_ty = TyBuilder::impl_self_ty(db, impl_id) 
        
                       .fill_with_inference_vars(table) 
        
                       .build(); 
        
                   table.unify(&expected_self_ty, self_ty) 
        
               }); 
        
               if !self_ty_matches { 
        
                   cov_mark::hit!(const_candidate_self_type_mismatch); 
        
                   return IsValidCandidate::No; 
        
               } 
        
           }

as well as

rust-analyzer/crates/hir-ty/src/method_resolution.rs

Lines 1480 to 1529 in e2cfed1

    
           if let ItemContainerId::ImplId(impl_id) = container { 
        
               // We need to consider the bounds on the impl to distinguish functions of the same name 
        
               // for a type. 
        
               let predicates = db.generic_predicates(impl_id.into()); 
        
               let goals = predicates.iter().map(|p| { 
        
                   let (p, b) = p 
        
                       .clone() 
        
                       .substitute(Interner, &impl_subst) 
        
                       // Skipping the inner binders is ok, as we don't handle quantified where 
        
                       // clauses yet. 
        
                       .into_value_and_skipped_binders(); 
        
                   stdx::always!(b.len(Interner) == 0); 
        
                   p.cast::<Goal>(Interner) 
        
               }); 
        
               for goal in goals.clone() { 
        
                   let in_env = InEnvironment::new(&table.trait_env.env, goal); 
        
                   let canonicalized = table.canonicalize(in_env); 
        
                   let solution = table.db.trait_solve( 
        
                       table.trait_env.krate, 
        
                       table.trait_env.block, 
        
                       canonicalized.value.clone(), 
        
                   ); 
        
                   match solution { 
        
                       Some(Solution::Unique(canonical_subst)) => { 
        
                           canonicalized.apply_solution( 
        
                               table, 
        
                               Canonical { 
        
                                   binders: canonical_subst.binders, 
        
                                   value: canonical_subst.value.subst, 
        
                               }, 
        
                           ); 
        
                       } 
        
                       Some(Solution::Ambig(Guidance::Definite(substs))) => { 
        
                           canonicalized.apply_solution(table, substs); 
        
                       } 
        
                       Some(_) => (), 
        
                       None => return IsValidCandidate::No, 
        
                   } 
        
               } 
        
               for goal in goals { 
        
                   if table.try_obligation(goal).is_none() { 
        
                       return IsValidCandidate::No; 
        
                   } 
        
               } 
        
               IsValidCandidate::Yes

additionally the visibility check here

rust-analyzer/crates/hir-ty/src/method_resolution.rs

Lines 1397 to 1402 in e2cfed1

    
           if let Some(from_module) = visible_from_module { 
        
               if !db.const_visibility(c).is_visible_from(db.upcast(), from_module) { 
        
                   cov_mark::hit!(const_candidate_not_visible); 
        
                   return IsValidCandidate::NotVisible; 
        
               } 
        
           }

and here

rust-analyzer/crates/hir-ty/src/method_resolution.rs

Lines 1440 to 1445 in e2cfed1

    
           if let Some(from_module) = visible_from_module { 
        
               if !db.function_visibility(fn_id).is_visible_from(db.upcast(), from_module) { 
        
                   cov_mark::hit!(autoderef_candidate_not_visible); 
        
                   return IsValidCandidate::NotVisible; 
        
               } 
        
           }

should be unnecessary. The visibility of trait assoc items is the visibility of the trait which we can either check once prior to looping, or ignore if the check is implicitly done somewhere already (not 100% about that). This gets rid of a few database accesses as well as inserting extra query dependencies.

(Note these are only irrelevant for this one call path here, not in general for the other callers of the is_valid_ functions)

davidbarsky · 2024-02-06T16:11:36Z

That slow down does not surprise me, trait solving is expensive, and now you are solving before checking the validity of a candidate. I'm more surprised that the change actually helps with completion speed here in your scenario.

Going off a previous profile I ran, rust-analyzer is doing a lot of is_valid_candidate checks—for 300 traits, rust-analyzer did 13856 is_valid_candidate calls. The approach in this PR reduced the number of is_valid_candidate calls to 372.

(Note these are only irrelevant for this one call path here, not in general for the other callers of the is_valid_ functions)

I'll try out your suggested changes! If it works out well, I'll inline the reduced is_valid_ functions into a single function and document that function should only be called from iterate_trait_method_candidates.

davidbarsky · 2024-02-06T17:20:36Z

I'll try out your suggested changes! If it works out well, I'll inline the reduced is_valid_ functions into a single function and document that function should only be called from iterate_trait_method_candidates.

You are correct that the optimizations are slight, but they are noticeable! Thanks for the tip.

Veykril · 2024-02-07T16:25:26Z

I'm not happy merging the first change of this fwiw. The slow down is just too big for this to be considered.

davidbarsky · 2024-02-13T18:30:56Z

Closing this PR in favor of opening a new one in order to make the conversation history more legible.

…iting-orphan-rules, r=Veykril performance: Speed up Method Completions By Taking Advantage of Orphan Rules (Continues #16498) This PR speeds up method completions by doing two things without regressing `analysis-stats`[^1]: - Filter candidate traits prior to calling `iterate_path_candidates` by relying on orphan rules (see below for a slightly more in-depth explanation). When generating completions [on `slog::Logger`](https://github.com/oxidecomputer/omicron/blob/5e9e59c312cd787ba50cf6776f220951917dfa14/common/src/ledger.rs#L78) in `oxidecomputer/omicron` as a test, this PR halved my completion times—it's now 454ms cold and 281ms warm. Before this PR, it was 808ms cold and 579ms warm. - Inline some of the method candidate checks into `is_valid_method_candidate` and remove some unnecessary visibility checks. This was suggested by `@Veykril` in [this comment](#16498 (comment)). We filter candidate traits by taking advantage of orphan rules. For additional details, I'll rely on `@WaffleLapkin's` explanation [from Zulip](https://rust-lang.zulipchat.com/#narrow/stream/185405-t-compiler.2Frust-analyzer/topic/Trait.20Checking/near/420942417): > A type `A` can only implements traits which > 1. Have a blanket implementation (`impl<T> Trait for T {}`) > 2. Have implementation for `A` (`impl Trait for A {}`) > > Blanket implementation can only exist in `Trait`'s crate. Implementation for `A` can only exist in `A`'s or `Trait`'s crate. Big thanks to Waffle for its keen observation! --- I think some additional improvements are possible: - `for_trait_and_self_ty` seemingly does not distinguish between `&T`, `&mut T`, or `T`, resulting in seemingly irrelevant traits like `tokio::io::AsyncWrite` being being included for, e.g., `&slog::Logger`. I don't know they're being considered due to the [autoref/autoderef behavior](https://github.com/rust-lang/rust-analyzer/blob/a02a219773629686bd8ff123ca1aa995fa50d976/crates/hir-ty/src/method_resolution.rs#L945-L962), but I wonder if it'd make sense to filter by mutability earlier and not consider trait implementations that require `&mut T` when we only have a `&T`. - The method completions [spend a _lot_ of time in unification](https://rust-lang.zulipchat.com/#narrow/stream/185405-t-compiler.2Frust-analyzer/topic/Trait.20Checking/near/421072356), and while there might be low-hanging fruit there, it might make more sense to wait for the new trait solver in `rustc`. I dunno. [^1]: The filtering occurs outside of typechecking, after all.

internal: skip associated items and call generic_implements_goal ea…

e2cfed1

…rlier in method resolution

rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 6, 2024

davidbarsky requested review from flodiebold and Veykril February 6, 2024 00:47

internal: apply Lukas' suggested optimizations

c3613d3

Veykril added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 8, 2024

davidbarsky closed this Feb 13, 2024

davidbarsky mentioned this pull request Feb 13, 2024

performance: Speed up Method Completions By Taking Advantage of Orphan Rules #16555

Merged

davidbarsky deleted the david/skip-associtems-in-method-resolution branch February 13, 2024 20:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

internal: skip associated items and call `generic_implements_goal` earlier in method resolution #16498

internal: skip associated items and call `generic_implements_goal` earlier in method resolution #16498

davidbarsky commented Feb 6, 2024 •

edited

davidbarsky commented Feb 6, 2024

davidbarsky commented Feb 6, 2024

Veykril commented Feb 6, 2024 •

edited

Veykril commented Feb 6, 2024

davidbarsky commented Feb 6, 2024 •

edited

davidbarsky commented Feb 6, 2024

Veykril commented Feb 7, 2024

davidbarsky commented Feb 13, 2024

internal: skip associated items and call generic_implements_goal earlier in method resolution #16498

internal: skip associated items and call generic_implements_goal earlier in method resolution #16498

Conversation

davidbarsky commented Feb 6, 2024 • edited

davidbarsky commented Feb 6, 2024

davidbarsky commented Feb 6, 2024

Veykril commented Feb 6, 2024 • edited

Veykril commented Feb 6, 2024

davidbarsky commented Feb 6, 2024 • edited

davidbarsky commented Feb 6, 2024

Veykril commented Feb 7, 2024

davidbarsky commented Feb 13, 2024

internal: skip associated items and call `generic_implements_goal` earlier in method resolution #16498

internal: skip associated items and call `generic_implements_goal` earlier in method resolution #16498

davidbarsky commented Feb 6, 2024 •

edited

Veykril commented Feb 6, 2024 •

edited

davidbarsky commented Feb 6, 2024 •

edited