Skip duplicate stable crate ID encoding into metadata #119238

Merged: 1 commit merged on Dec 24, 2023

Mark-Simulacrum (Member) commented Dec 23, 2023

Instead, we store just the local crate hash as a bare u64. On decoding,
we recombine it with the crate's stable crate ID stored separately in
metadata. The end result is that we save ~8 bytes/DefIndex in metadata
size.

One key detail here is that we no longer distinguish in encoded metadata
between present and non-present DefPathHashes. Previously it was highly
likely that we could distinguish them, since we used DefPathHash::default(),
an all-zero representation, for absent entries. In theory even that was
fallible, as nothing strictly prevents the StableCrateId from being zero.
In review it was pointed out that we should never have a missing hash for a
DefIndex anyway, so this shouldn't matter.
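
As a rough illustration of the scheme described above (not the actual rustc code), the sketch below stores the crate ID once and writes only the 8-byte local hash per DefIndex, then recombines the two on decode. The `StableCrateId`, `DefPathHash`, and `Metadata` types and the `encode`/`decode` functions are simplified, hypothetical stand-ins; the real types and the rmeta table layout differ.

```rust
/// Hypothetical stand-in for rustc's `StableCrateId` (a 64-bit crate identifier).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct StableCrateId(pub u64);

/// Hypothetical stand-in for `DefPathHash`: the crate ID plus a crate-local hash.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DefPathHash {
    pub stable_crate_id: StableCrateId,
    pub local_hash: u64,
}

/// Simplified metadata: the crate ID is stored once, followed by one
/// little-endian 8-byte local hash per DefIndex.
pub struct Metadata {
    pub stable_crate_id: StableCrateId,
    pub local_hashes: Vec<u8>,
}

/// Writes only the local half of each hash (8 bytes instead of 16).
pub fn encode(stable_crate_id: StableCrateId, hashes: &[DefPathHash]) -> Metadata {
    let mut local_hashes = Vec::with_capacity(hashes.len() * 8);
    for hash in hashes {
        // Assumption of this sketch: every hash in a crate shares one crate ID.
        assert_eq!(hash.stable_crate_id, stable_crate_id);
        local_hashes.extend_from_slice(&hash.local_hash.to_le_bytes());
    }
    Metadata { stable_crate_id, local_hashes }
}

/// Recombines the stored local hash with the crate ID kept elsewhere in metadata.
pub fn decode(meta: &Metadata, def_index: usize) -> DefPathHash {
    let start = def_index * 8;
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(&meta.local_hashes[start..start + 8]);
    DefPathHash {
        stable_crate_id: meta.stable_crate_id,
        local_hash: u64::from_le_bytes(bytes),
    }
}
```

Note that in this layout an all-zero slot decodes to a valid-looking hash with a local hash of zero, which is the loss of the present/non-present distinction mentioned above.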

@rustbot
Collaborator

rustbot commented Dec 23, 2023

r? @wesleywiser

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 23, 2023
@Mark-Simulacrum
Member Author

@bors try @rust-timer queue


@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 23, 2023
@bors
Contributor

bors commented Dec 23, 2023

⌛ Trying commit 5b3116c with merge ce5fed6...

bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 23, 2023
… r=<try>

Specialize DefPathHash table to skip encoding crate IDs

The current implementation is ad-hoc and likely should be replaced with a non-table based approach (i.e., fully pulling out DefPathHash from the rmeta table infrastructure, of which we use ~none now), but this was an easy way to get an initial PR out.

The main pending question is whether the assumption made here, that there is exactly one shared prefix, is accurate. If not, is it right that the number should typically be small? (If so, a deduplication scheme of which this is a special case almost certainly makes sense.)

We encode a lot of these (1000s) so the savings of 8 bytes/hash add up quickly. Opening this PR to get opinions more on the general idea and to run perf on whether the underlying impl will perform OK.

@bors
Contributor

bors commented Dec 23, 2023

💔 Test failed - checks-actions

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 23, 2023
@Mark-Simulacrum
Member Author

@bors try @rust-timer queue

@bors
Contributor

bors commented Dec 23, 2023

⌛ Trying commit 7ad670d with merge 86e86bf...

@Mark-Simulacrum Mark-Simulacrum changed the title Specialize DefPathHash table to skip encoding crate IDs Skip duplicate stable crate ID encoding into metadata Dec 23, 2023
@bors
Contributor

bors commented Dec 23, 2023

☀️ Try build successful - checks-actions
Build commit: 86e86bf (86e86bf5b3bd2fea322b3be421af5bb093b284bb)


// N.B. this means that we can't distinguish between non-present items and a present but zero
// local hash item. In practice the compiler shouldn't care about non-present items in a foreign
// crate, so this should be OK. If we do start to care we should most likely adjust our hashing
// to reserve a bit (e.g., NonZeroU64).
Contributor


On the contrary: having a DefIndex with no associated DefPathHash should be a bug. So it's not an issue if we cannot distinguish the default: it must not appear.

Member Author


Hm, I'm not 100% sure that is upheld today. An earlier version of this PR hit some asserts in encoding that looked like zeros in the crate hash, which I took to mean that defaults were present somewhere. See #119238 (comment) and this assert: rust-lang-ci@5b3116c#diff-6c291b96ad60b2cdaad44471b95c96d98c10d2eb9ed8dab44fe9e21ef7f1bd5eR508

I've updated the comment for now and haven't dug deeper into re-introducing asserts.
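
For reference, the "reserve a bit (e.g., NonZeroU64)" idea from the code comment under review could look roughly like the sketch below. It is only an illustration of that alternative, not something this PR implements; `reserve_zero`, `encode_slot`, and `decode_slot` are hypothetical names.

```rust
use std::num::NonZeroU64;

/// Hypothetical helper: force a raw 64-bit hash into the non-zero range so
/// that an all-zero slot can keep meaning "not present".
pub fn reserve_zero(raw: u64) -> NonZeroU64 {
    // A raw hash of 0 is mapped onto 1, giving up one value of the hash space.
    NonZeroU64::new(raw.max(1)).expect("max(1) is never zero")
}

/// Encodes an optional local hash into a fixed 8-byte slot; `None` becomes zeros.
pub fn encode_slot(hash: Option<NonZeroU64>) -> [u8; 8] {
    hash.map_or(0, NonZeroU64::get).to_le_bytes()
}

/// Decodes a slot; an all-zero slot is unambiguously "not present".
pub fn decode_slot(bytes: [u8; 8]) -> Option<NonZeroU64> {
    NonZeroU64::new(u64::from_le_bytes(bytes))
}
```

Because `Option<NonZeroU64>` occupies the same 8 bytes as a bare `u64`, this would keep the size savings while restoring the present/non-present distinction, should the compiler ever need it.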

compiler/rustc_metadata/src/rmeta/encoder.rs (outdated, resolved)
@rust-timer
Collaborator

Finished benchmarking commit (86e86bf): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.0% | [-2.8%, -0.1%] | 114 |
| Improvements ✅ (secondary) | -1.7% | [-4.8%, -0.0%] | 72 |
| All ❌✅ (primary) | -1.0% | [-2.8%, -0.1%] | 114 |

Bootstrap: 669.674s -> 670.033s (0.05%)
Artifact size: 312.80 MiB -> 312.62 MiB (-0.06%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 23, 2023
bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 23, 2023
Remove metadata decoding DefPathHash cache

My expectation is that this cache is largely useless. Decoding a DefPathHash from metadata is essentially a pair of memory loads - there's no heavyweight processing involved. Caching it behind a HashMap just adds extra cost and incurs hashing overheads for the indices.

Based on rust-lang#119238.
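
To make the "pair of memory loads" argument concrete, here is a minimal sketch comparing a direct decode with a cached one. `CrateMetadata` and both methods are hypothetical simplifications; the cache that the follow-up removes is not shown in this thread, so its shape here is an assumption.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical, simplified view of one crate's decoded metadata.
pub struct CrateMetadata {
    pub stable_crate_id: u64,
    pub local_hashes: Vec<u64>,
    /// Assumed shape of a DefPathHash cache keyed by DefIndex.
    pub cache: RefCell<HashMap<usize, (u64, u64)>>,
}

impl CrateMetadata {
    /// Direct decode: one load for the crate ID, one for the local hash.
    pub fn def_path_hash(&self, def_index: usize) -> (u64, u64) {
        (self.stable_crate_id, self.local_hashes[def_index])
    }

    /// Cached decode: hashes the index and probes the map, and on a miss
    /// still performs the same two loads before inserting the result.
    pub fn def_path_hash_cached(&self, def_index: usize) -> (u64, u64) {
        let mut cache = self.cache.borrow_mut();
        let value = *cache
            .entry(def_index)
            .or_insert_with(|| self.def_path_hash(def_index));
        value
    }
}
```

Both methods return the same value; the cached path only adds hashing and map bookkeeping on top of the direct lookup, which is the overhead the commit message refers to.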
@Mark-Simulacrum Mark-Simulacrum removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Dec 23, 2023
@Mark-Simulacrum Mark-Simulacrum added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 23, 2023
@cjgillot
Contributor

Great!
@bors r+

@bors
Contributor

bors commented Dec 24, 2023

📌 Commit 6630d69 has been approved by cjgillot

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 24, 2023
bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 24, 2023
Only store StableCrateId once in DefPathTable.

rust-lang#119238 made me think of this.

cc `@Mark-Simulacrum`
@bors
Contributor

bors commented Dec 24, 2023

⌛ Testing commit 6630d69 with merge 8a671d9...

bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 24, 2023
… r=cjgillot

@bors
Contributor

bors commented Dec 24, 2023

💥 Test timed out

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Dec 24, 2023
@lqd
Member

lqd commented Dec 24, 2023

@bors retry

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 24, 2023
@bors
Contributor

bors commented Dec 24, 2023

⌛ Testing commit 6630d69 with merge cf64273...

@rust-log-analyzer
Collaborator

A job failed! Check out the build log: (web) (plain)


@bors
Contributor

bors commented Dec 24, 2023

☀️ Test successful - checks-actions
Approved by: cjgillot
Pushing cf64273 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Dec 24, 2023
@bors bors merged commit cf64273 into rust-lang:master Dec 24, 2023
12 checks passed
@rustbot rustbot added this to the 1.77.0 milestone Dec 24, 2023
@rust-timer
Collaborator

Finished benchmarking commit (cf64273): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 3.3% | [1.9%, 4.7%] | 2 |
| Improvements ✅ (primary) | -0.7% | [-0.7%, -0.7%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.7% | [-0.7%, -0.7%] | 1 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -8.1% | [-11.3%, -2.2%] | 7 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.0% | [-2.8%, -0.1%] | 114 |
| Improvements ✅ (secondary) | -1.7% | [-4.8%, -0.0%] | 72 |
| All ❌✅ (primary) | -1.0% | [-2.8%, -0.1%] | 114 |

Bootstrap: 671.528s -> 670.535s (-0.15%)
Artifact size: 312.79 MiB -> 312.63 MiB (-0.05%)

@Mark-Simulacrum Mark-Simulacrum deleted the def-hash-efficiency branch December 29, 2023 12:53