Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal: skip associated items and call generic_implements_goal earlier in method resolution #16498

Conversation

davidbarsky
Copy link
Contributor

@davidbarsky davidbarsky commented Feb 6, 2024

This is a small change whose correctness I'm not entirely convinced of, but based on the discussion in #6432 and my testing in https://github.com/oxidecomputer/omicron, this seems to provide faster completions, especially when it comes to flyimport-based ones.

Without these changes, I'm seeing completions take roughly 808ms cold and 570ms warm. With these changes, I'm seeing completions take 473ms cold and 55-90ms warm.

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 6, 2024
@davidbarsky
Copy link
Contributor Author

(Note that these numbers are coming from my M1 Pro-based MacBook Pro. Your experience may vary.)

@davidbarsky
Copy link
Contributor Author

This change, however, does regress running ~/.cargo/bin/rust-analyzer analysis-stats . on rust-analyzer pretty substantially:

Before
~/.cargo/bin/rust-analyzer analysis-stats .
Database loaded:     15.66s, 0b (metadata 540.38ms, 0b; build 2.15s, 0b)
  item trees: 1179
Item Tree Collection: 793.09ms, 0b
  crates: 47, mods: 1002, decls: 26108, bodies: 24240, adts: 1843, consts: 1121
Item Collection:     9.63s, 0b
  exprs: 707465, ??ty: 39 (0%), ?ty: 126 (0%), !ty: 3                     pats: 163636, ??ty: 4 (0%), ?ty: 4 (0%), !ty: 0                       Inference:           34.66s, 0b                                         MIR lowering:        6.42s, 0b
Mir failed bodies: 19 (0%)
Data layouts:        58.03ms, 0b
Failed data layouts: 129 (7%)
Const evaluation:    318.78ms, 0b
Failed const evals: 0 (0%)
Total:               56.95s, 0b
After
~/.cargo/bin/rust-analyzer analysis-stats .
Database loaded:     8.88s, 0b (metadata 1.10s, 0b; build 2.26s, 0b)
  item trees: 1179
Item Tree Collection: 794.24ms, 0b
  crates: 47, mods: 1002, decls: 26108, bodies: 24240, adts: 1843, consts: 1121
Item Collection:     9.87s, 0b
  exprs: 707470, ??ty: 39 (0%), ?ty: 126 (0%), !ty: 3                     pats: 163638, ??ty: 4 (0%), ?ty: 4 (0%), !ty: 0                       Inference:           132.21s, 0b                                        MIR lowering:        6.58s, 0b
Mir failed bodies: 19 (0%)
Data layouts:        54.77ms, 0b
Failed data layouts: 129 (7%)
Const evaluation:    385.90ms, 0b
Failed const evals: 0 (0%)
Total:               155.07s, 0b

@Veykril
Copy link
Member

Veykril commented Feb 6, 2024

That slow down does not surprise me, trait solving is expensive, and now you are solving before checking the validity of a candidate. I'm more surprised that the change actually helps with completion speed here in your scenario.

@Veykril
Copy link
Member

Veykril commented Feb 6, 2024

From a quick look we can (very very slightly only) optimize is_valid_candidate for this call path though. We know that our assoc items here come from a trait, meaning this branch here cannot be hit ever

if let ItemContainerId::ImplId(impl_id) = c.lookup(db.upcast()).container {
let self_ty_matches = table.run_in_snapshot(|table| {
let expected_self_ty = TyBuilder::impl_self_ty(db, impl_id)
.fill_with_inference_vars(table)
.build();
table.unify(&expected_self_ty, self_ty)
});
if !self_ty_matches {
cov_mark::hit!(const_candidate_self_type_mismatch);
return IsValidCandidate::No;
}
}
as well as
if let ItemContainerId::ImplId(impl_id) = container {
// We need to consider the bounds on the impl to distinguish functions of the same name
// for a type.
let predicates = db.generic_predicates(impl_id.into());
let goals = predicates.iter().map(|p| {
let (p, b) = p
.clone()
.substitute(Interner, &impl_subst)
// Skipping the inner binders is ok, as we don't handle quantified where
// clauses yet.
.into_value_and_skipped_binders();
stdx::always!(b.len(Interner) == 0);
p.cast::<Goal>(Interner)
});
for goal in goals.clone() {
let in_env = InEnvironment::new(&table.trait_env.env, goal);
let canonicalized = table.canonicalize(in_env);
let solution = table.db.trait_solve(
table.trait_env.krate,
table.trait_env.block,
canonicalized.value.clone(),
);
match solution {
Some(Solution::Unique(canonical_subst)) => {
canonicalized.apply_solution(
table,
Canonical {
binders: canonical_subst.binders,
value: canonical_subst.value.subst,
},
);
}
Some(Solution::Ambig(Guidance::Definite(substs))) => {
canonicalized.apply_solution(table, substs);
}
Some(_) => (),
None => return IsValidCandidate::No,
}
}
for goal in goals {
if table.try_obligation(goal).is_none() {
return IsValidCandidate::No;
}
}
IsValidCandidate::Yes

additionally the visibility check here

if let Some(from_module) = visible_from_module {
if !db.const_visibility(c).is_visible_from(db.upcast(), from_module) {
cov_mark::hit!(const_candidate_not_visible);
return IsValidCandidate::NotVisible;
}
}
and here
if let Some(from_module) = visible_from_module {
if !db.function_visibility(fn_id).is_visible_from(db.upcast(), from_module) {
cov_mark::hit!(autoderef_candidate_not_visible);
return IsValidCandidate::NotVisible;
}
}

should be unnecessary. The visibility of trait assoc items is the visibility of the trait which we can either check once prior to looping, or ignore if the check is implicitly done somewhere already (not 100% about that). This gets rid of a few database accesses as well as inserting extra query dependencies.

(Note these are only irrelevant for this one call path here, not in general for the other callers of the is_valid_ functions)

@davidbarsky
Copy link
Contributor Author

davidbarsky commented Feb 6, 2024

That slow down does not surprise me, trait solving is expensive, and now you are solving before checking the validity of a candidate. I'm more surprised that the change actually helps with completion speed here in your scenario.

Going off a previous profile I ran, rust-analyzer is doing a lot of is_valid_candidate checks—for 300 traits, rust-analyzer did 13856 is_valid_candidate calls. The approach in this PR reduced the number of is_valid_candidate calls to 372.

(Note these are only irrelevant for this one call path here, not in general for the other callers of the is_valid_ functions)

I'll try out your suggested changes! If it works out well, I'll inline the reduced is_valid_ functions into a single function and document that function should only be called from iterate_trait_method_candidates.

@davidbarsky
Copy link
Contributor Author

I'll try out your suggested changes! If it works out well, I'll inline the reduced is_valid_ functions into a single function and document that function should only be called from iterate_trait_method_candidates.

You are correct that the optimizations are slight, but they are noticeable! Thanks for the tip.

@Veykril
Copy link
Member

Veykril commented Feb 7, 2024

I'm not happy merging the first change of this fwiw. The slow down is just too big for this to be considered.

@Veykril Veykril added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 8, 2024
@davidbarsky
Copy link
Contributor Author

Closing this PR in favor of opening a new one in order to make the conversation history more legible.

@davidbarsky davidbarsky deleted the david/skip-associtems-in-method-resolution branch February 13, 2024 20:28
bors added a commit that referenced this pull request Feb 26, 2024
…iting-orphan-rules, r=Veykril

performance: Speed up Method Completions By Taking Advantage of Orphan Rules

(Continues #16498)

This PR speeds up method completions by doing two things without regressing `analysis-stats`[^1]:
- Filter candidate traits prior to calling `iterate_path_candidates` by relying on orphan rules (see below for a slightly more in-depth explanation). When generating completions [on `slog::Logger`](https://github.com/oxidecomputer/omicron/blob/5e9e59c312cd787ba50cf6776f220951917dfa14/common/src/ledger.rs#L78) in `oxidecomputer/omicron` as a test, this PR halved my completion times—it's now 454ms cold and 281ms warm. Before this PR, it was 808ms cold and 579ms warm.
- Inline some of the method candidate checks into `is_valid_method_candidate` and remove some unnecessary visibility checks. This was suggested by `@Veykril` in [this comment](#16498 (comment)).

We filter candidate traits by taking advantage of orphan rules. For additional details, I'll rely on `@WaffleLapkin's` explanation  [from Zulip](https://rust-lang.zulipchat.com/#narrow/stream/185405-t-compiler.2Frust-analyzer/topic/Trait.20Checking/near/420942417):

> A type `A` can only implements traits which
> 1. Have a blanket implementation (`impl<T> Trait for T {}`)
> 2. Have implementation for `A` (`impl Trait for A {}`)
>
> Blanket implementation can only exist in `Trait`'s crate. Implementation for `A` can only exist in `A`'s or `Trait`'s crate.

Big thanks to Waffle for its keen observation!

---

I think some additional improvements are possible:
- `for_trait_and_self_ty` seemingly does not distinguish between `&T`, `&mut T`, or `T`, resulting in seemingly irrelevant traits like `tokio::io::AsyncWrite` being being included for, e.g., `&slog::Logger`. I don't know they're being considered due to the [autoref/autoderef behavior](https://github.com/rust-lang/rust-analyzer/blob/a02a219773629686bd8ff123ca1aa995fa50d976/crates/hir-ty/src/method_resolution.rs#L945-L962), but I wonder if it'd make sense to filter by mutability earlier and not consider trait implementations that require `&mut T` when we only have a `&T`.
- The method completions [spend a _lot_ of time in unification](https://rust-lang.zulipchat.com/#narrow/stream/185405-t-compiler.2Frust-analyzer/topic/Trait.20Checking/near/421072356), and while there might be low-hanging fruit there, it might make more sense to wait for the new trait solver in `rustc`. I dunno.

[^1]: The filtering occurs outside of typechecking, after all.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants