[perf] cache type info for ParamEnv #123058

lukas-code · 2024-03-25T20:49:43Z

This is an attempt to mitigate some of the perf regressions in #122553 (comment), but seems worth to test and land separately, since it is mostly unrelated to that PR.

lukas-code · 2024-03-25T20:54:00Z

compiler/rustc_middle/src/ty/list.rs

+        let data_ptr = ptr::addr_of!(self.skel.data).cast::<T>();
+        // SAFETY: `data_ptr` has the same provenance as `self` and can therefore
+        // access the `self.len` elements stored at `self.data`.
+        // Note that we specifically don't reborrow `&self.skel.data`, because that
+        // would give us a pointer with provenance over 0 bytes.
+        unsafe { slice::from_raw_parts(data_ptr, self.skel.len) }


This was UB before.

lqd · 2024-03-25T20:58:17Z

@bors try @rust-timer queue

bors · 2024-03-25T20:59:27Z

⌛ Trying commit 4a99cdc with merge be2a772...

[perf] cache type info for ParamEnv This is an attempt to mitigate some of the perf regressions in rust-lang#122553 (comment), but seems worth to test and land separately. r? `@ghost`

compiler/rustc_middle/src/ty/list.rs

bors · 2024-03-26T01:14:27Z

💥 Test timed out

lukas-code · 2024-03-26T10:42:05Z

uuh idk what bors is on about, the artifact looks like it got built? Let's try this:

☀️ Try build successful - checks-actions
Build commit: be2a772 (be2a772ea7aa3f1114b45c8fb9dd2a2466a5c152)

lukas-code · 2024-03-26T10:42:52Z

well that worked i guess?

rust-timer · 2024-03-26T12:16:37Z

Finished benchmarking commit (be2a772): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.4%	[0.3%, 0.6%]	4
Regressions ❌ (secondary)	0.5%	[0.5%, 0.5%]	3
Improvements ✅ (primary)	-1.7%	[-5.0%, -0.2%]	20
Improvements ✅ (secondary)	-0.3%	[-0.4%, -0.2%]	4
All ❌✅ (primary)	-1.3%	[-5.0%, 0.6%]	24

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.2%	[0.6%, 1.5%]	4
Regressions ❌ (secondary)	2.7%	[2.4%, 2.9%]	2
Improvements ✅ (primary)	-0.6%	[-0.9%, -0.4%]	3
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.4%	[-0.9%, 1.5%]	7

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.0%	[1.0%, 1.0%]	1
Regressions ❌ (secondary)	5.6%	[2.2%, 9.0%]	2
Improvements ✅ (primary)	-1.5%	[-2.9%, -0.4%]	12
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.4%	[-2.9%, 1.0%]	13

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 672.58s -> 671.55s (-0.15%)
Artifact size: 315.08 MiB -> 315.13 MiB (0.01%)

lukas-code · 2024-03-26T13:18:19Z

The regression in token-stream-stress looks like it's probably noise:

The regressions in bitmaps might be legit, but to me it seems like the improvements outweigh the regressions here.

r? @lcnr or compiler

bors · 2024-03-26T21:14:51Z

☔ The latest upstream changes (presumably #123098) made this pull request unmergeable. Please resolve the merge conflicts.

bors · 2024-04-02T01:54:09Z

☔ The latest upstream changes (presumably #123327) made this pull request unmergeable. Please resolve the merge conflicts.

lcnr · 2024-04-04T10:44:13Z

compiler/rustc_middle/src/ty/list.rs

+#[repr(C)]
+pub struct ListWithCachedTypeInfo<T> {
+    skel: ListWithCachedTypeInfoSkeleton<T>,
+    opaque: OpaqueListContents,


why is that moved out of ListSkeleton?

Because accessing a field with an extern type tail will always panic, because we don't know the align of the extern type and therefore the padding before the field. So basically if we just put List as the tail of ListWithCachedTypeInfo, then any attempt to access the list will panic.

that does not seem right to me 🤔 it should be totally fine to access an unsized field which contains an extern type as long as you never try to do a place-to-value conversion. I don't think we should ever copy ListSkeleton to the stack. Which use of ListSkeleton causes issues if you keep the current setup of having OpaqueListContents as a field of it?

https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=a47ebf1e7a6ce3908b3ab3fcdd6ecc8d

Hmmm apparently there is a special case if the field with an extern type tail is the only field. Which i guess makes sense since there can't be any padding in that case. If you change the outer type to have some extra data, your example will panic: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=92bafecde22b9fd72a3ddb9d9018bc68

I don't think we should ever copy ListSkeleton to the stack

I agree. If my code does that it's not intended.

I've added some comments to the *Skeleton types to hopefully clear this up a bit.

Relevant diff: https://github.com/rust-lang/rust/compare/c4f1f7aa586451154c669711abc9c4d9ad9b1075..fcc477fbd0871d8fab045bd7e9524172ab10e51c

lcnr

r=me after explaining the changes in ty/list.rs to me 🤔

compiler/rustc_middle/src/ty/list.rs

lcnr

🤔 the panic with extern types is annoying, but I understand the issue now.

Looking at this again, we now duplicate the unsafe code for List which is somewhat intricate and I am a bit worried that it ends up getting out of sync. Could we instead have a single type:

struct RawList<H, T> {
    header: H,
    length: usize,
    data: [T; 0],
    opaque: OpaqueListConten,
}

This should avoid both the "extern type alignment" issue and aloows us to remove the code duplication.

Sorry for not thinking of that during the first review round 😅

lcnr · 2024-04-06T15:58:44Z

@bors r+ rollup=never

bors · 2024-04-06T15:58:47Z

📌 Commit 94d61d8 has been approved by lcnr

It is now in the queue for this repository.

bors · 2024-04-07T02:07:23Z

⌛ Testing commit 94d61d8 with merge 087ae97...

bors · 2024-04-07T04:08:37Z

☀️ Test successful - checks-actions
Approved by: lcnr
Pushing 087ae97 to master...

rust-timer · 2024-04-07T06:56:00Z

Finished benchmarking commit (087ae97): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.9%	[-4.9%, -0.2%]	47
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-0.9%	[-4.9%, -0.2%]	47

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	1.0%	[1.0%, 1.0%]	1
Improvements ✅ (primary)	-1.3%	[-1.3%, -1.3%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.3%	[-1.3%, -1.3%]	1

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-2.2%	[-3.4%, -0.8%]	9
Improvements ✅ (secondary)	-2.8%	[-2.9%, -2.7%]	2
All ❌✅ (primary)	-2.2%	[-3.4%, -0.8%]	9

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 669.732s -> 668.846s (-0.13%)
Artifact size: 318.22 MiB -> 318.48 MiB (0.08%)

nnethercote · 2024-05-09T07:08:27Z

I stumbled across this today while looking at List. I see RawList has been added without much explanation in the comments. It seems to allow an arbitrary header to be included before the list length. How/why is this a speed win in some cases? It seems very strange to augment a generic container type with an arbitrary value of a different type.

lukas-code · 2024-05-09T10:57:48Z

There are some functions in the trait solver that use the TypeVisitableExt methods on a ParamEnv to find out some property about the where clauses that are assumed to be true during trait solving.

For example here we disable caching here if the where clauses contain inference variables (which can happen due to canonicalization).

rust/compiler/rustc_trait_selection/src/traits/select/mod.rs

Line 1508 in 5f8c17d

if param_env.has_infer() {

Or here we find out whether inference has happened on the where clauses by resolving the inference variables. This has a shortcut if there are no inference variables.

rust/compiler/rustc_trait_selection/src/traits/select/mod.rs

Line 574 in 5f8c17d

if this.infcx.resolve_vars_if_possible(goal) != goal {

Before this PR, the ParamEnv was represented roughly like this:

               ------------------------------------------------------------
Clause         | type flags | outer binder | stable hash | predicate kind |
               ------------------------------------------------------------
               ^
               |
               +------------+
                            |
               -------------------------------------
List<Clause>   | length | clause 1 | clause 2 | ... 
               -------------------------------------
                    ^
                    |
                    |
               --------------------------
ParamEnv       | pointer   | reveal bit |
               --------------------------

So to find out the type flags or outer binder of the whole ParamEnv we'd have to iterate over all of the clauses. For some crates, in particular diesel, the ParamEnvs can get rather large (order of magnitude 100..1000) and iterating over them several times for every trait that we have to check adds up.

The main thing this PR does is add ListWithCachedTypeInfo, which stores the type flags and outer binder directly in front of the List, and use that new type in ParamEnv:

               ------------------------------------------------------------
Clause         | type flags | outer binder | stable hash | predicate kind |
               ------------------------------------------------------------
               ^
               |
               +--------------------------------------------------------+
                                                                        |
                                 -----------------------------------------------------------------
ListWithCachedTypeInfo<Clause>   | type flags | outer binder | length | clause 1 | clause 2 | ... 
                                 -----------------------------------------------------------------
                                  ^
                                  |
                    +-------------+
                    |
               --------------------------
ParamEnv       | pointer   | reveal bit |
               --------------------------

This way, we can compute e.g. param_env.has_infer() in O(1) rather than O(n).

RawList only exists to share code between List<T> (= RawList<(), T>) and ListWithCachedTypeInfo<T> (= RawList<TypeInfo, T>).

nnethercote · 2024-05-17T07:22:05Z

Thanks for the explanation. In this bit:

                                 -----------------------------------------------------------------
ListWithCachedTypeInfo<Clause>   | type flags | outer binder | length | clause 1 | clause 2 | ... 
                                 -----------------------------------------------------------------

are the type flags/outer binder computed as a combination of the same-named fields from all the clauses?

lukas-code · 2024-05-17T09:27:28Z

are the type flags/outer binder computed as a combination of the same-named fields from all the clauses?

Exactly. The flags of the list is the bit-or of all the flags from the clauses and the outer binder of the list is the maximum of the outer binders of the clauses.

They are computed when the ListWithCachedTypeInfo<Clauses> is constructed here:

rust/compiler/rustc_middle/src/ty/context.rs

Lines 406 to 412 in 1189851

    
           let flags = super::flags::FlagComputation::for_clauses(clauses); 
        
           InternedInSet(ListWithCachedTypeInfo::from_arena( 
        
               &*self.arena, 
        
               flags.into(), 
        
               clauses, 
        
           ))

rust/compiler/rustc_middle/src/ty/flags.rs

Lines 38 to 45 in 1189851

    
           pub fn for_clauses(clauses: &[ty::Clause<'_>]) -> FlagComputation { 
        
               let mut result = FlagComputation::new(); 
        
               for c in clauses { 
        
                   result.add_flags(c.as_predicate().flags()); 
        
                   result.add_exclusive_binder(c.as_predicate().outer_exclusive_binder()); 
        
               } 
        
               result 
        
           }

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 25, 2024

lukas-code commented Mar 25, 2024

View reviewed changes

This comment has been minimized.

Sign in to view

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 25, 2024

lukas-code commented Mar 25, 2024

View reviewed changes

compiler/rustc_middle/src/ty/list.rs Outdated Show resolved Hide resolved

This comment has been minimized.

Sign in to view

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Mar 26, 2024

lukas-code force-pushed the clauses branch from 4a99cdc to 2c5df92 Compare March 26, 2024 12:26

lukas-code mentioned this pull request Mar 26, 2024

make Thin a supertrait of Sized #122553

Open

lukas-code marked this pull request as ready for review March 26, 2024 13:16

rustbot assigned lcnr Mar 26, 2024

lukas-code force-pushed the clauses branch from 2c5df92 to 8c8f0a9 Compare March 26, 2024 14:00

lukas-code force-pushed the clauses branch from 8c8f0a9 to ee92e20 Compare March 26, 2024 23:33

lcnr reviewed Apr 4, 2024

View reviewed changes

compiler/rustc_middle/src/ty/list.rs Outdated Show resolved Hide resolved

lukas-code force-pushed the clauses branch 2 times, most recently from c4f1f7a to 2cdccba Compare April 4, 2024 16:18

cache type info for ParamEnv

fcc477f

lukas-code force-pushed the clauses branch from 2cdccba to fcc477f Compare April 4, 2024 16:33

lcnr reviewed Apr 6, 2024

View reviewed changes

add RawList

94d61d8

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 6, 2024

bors added the merged-by-bors This PR was explicitly merged by bors. label Apr 7, 2024

bors merged commit 087ae97 into rust-lang:master Apr 7, 2024
12 checks passed

rustbot added this to the 1.79.0 milestone Apr 7, 2024

This was referenced Apr 7, 2024

Implement minimal, internal-only pattern types in the type system #120131

Merged

Explicitly don't normalize param-env in new solver #122403

Open

rustbot removed the perf-regression Performance regression. label Apr 7, 2024

lukas-code deleted the clauses branch April 7, 2024 08:57

pnkfelix mentioned this pull request Apr 9, 2024

hir: Drop owner's own item-local id (zero) from parenting tables #123415

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[perf] cache type info for ParamEnv #123058

[perf] cache type info for ParamEnv #123058

lukas-code commented Mar 25, 2024 •

edited

Loading

lukas-code Mar 25, 2024

lqd commented Mar 25, 2024

This comment has been minimized.

bors commented Mar 25, 2024

bors commented Mar 26, 2024

lukas-code commented Mar 26, 2024

This comment has been minimized.

lukas-code commented Mar 26, 2024

rust-timer commented Mar 26, 2024

lukas-code commented Mar 26, 2024

bors commented Mar 26, 2024

bors commented Apr 2, 2024

lcnr Apr 4, 2024

lukas-code Apr 4, 2024

lcnr Apr 4, 2024

lukas-code Apr 4, 2024 •

edited

Loading

lukas-code Apr 4, 2024

lcnr left a comment

lcnr left a comment

lcnr commented Apr 6, 2024

bors commented Apr 6, 2024

bors commented Apr 7, 2024

bors commented Apr 7, 2024

rust-timer commented Apr 7, 2024

nnethercote commented May 9, 2024

lukas-code commented May 9, 2024

nnethercote commented May 17, 2024

lukas-code commented May 17, 2024

[perf] cache type info for ParamEnv #123058

[perf] cache type info for ParamEnv #123058

Conversation

lukas-code commented Mar 25, 2024 • edited Loading

Choose a reason for hiding this comment

lqd commented Mar 25, 2024

This comment has been minimized.

bors commented Mar 25, 2024

bors commented Mar 26, 2024

lukas-code commented Mar 26, 2024

This comment has been minimized.

lukas-code commented Mar 26, 2024

rust-timer commented Mar 26, 2024

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

lukas-code commented Mar 26, 2024

bors commented Mar 26, 2024

bors commented Apr 2, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lukas-code Apr 4, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lcnr left a comment

Choose a reason for hiding this comment

lcnr left a comment

Choose a reason for hiding this comment

lcnr commented Apr 6, 2024

bors commented Apr 6, 2024

bors commented Apr 7, 2024

bors commented Apr 7, 2024

rust-timer commented Apr 7, 2024

Overall result: ✅ improvements - no action needed

nnethercote commented May 9, 2024

lukas-code commented May 9, 2024

nnethercote commented May 17, 2024

lukas-code commented May 17, 2024

lukas-code commented Mar 25, 2024 •

edited

Loading

lukas-code Apr 4, 2024 •

edited

Loading