Use a specialized varint + bitpacking scheme for DepGraph encoding #110050

Merged (2 commits), Sep 7, 2023


saethlin
Member

@saethlin saethlin commented Apr 7, 2023

The previous scheme here uses leb128 to encode the edge tables that represent the incr comp dependency graph. The problem with that scheme is that leb128 has overhead for larger values (each byte carries only 7 payload bits) and generally relies on the distribution of encoded values being heavily skewed towards small values. That is definitely not the case for dep node indices: they are handed out sequentially and cover the whole range, so the distribution is actually biased in the opposite direction. Most dep node indices are large.
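As a rough illustration of that point, the sketch below encodes sequentially assigned indices with unsigned LEB128 and counts how many bytes each one takes. The graph size of 1,000,000 nodes is an arbitrary assumption, and `leb128_encode`/`leb128_len` are hypothetical helpers written for this example, not rustc's encoder.

```rust
/// Unsigned LEB128: 7 payload bits per byte; the high bit is set on every byte except the last.
fn leb128_encode(mut value: u32, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Number of bytes LEB128 needs for `value`.
fn leb128_len(value: u32) -> usize {
    let mut buf = Vec::new();
    leb128_encode(value, &mut buf);
    buf.len()
}

fn main() {
    // Assumed graph size; indices are handed out sequentially, so every value in 0..n occurs.
    let n: u32 = 1_000_000;
    // histogram[k] = how many indices take k bytes (a u32 never needs more than 5).
    let mut histogram = [0usize; 6];
    for index in 0..n {
        histogram[leb128_len(index)] += 1;
    }
    for (len, count) in histogram.iter().enumerate() {
        if *count > 0 {
            println!("{} byte(s): {} indices", len, count);
        }
    }
}
```

With 1,000,000 indices this reports 128 one-byte, 16256 two-byte and 983616 three-byte encodings, so about 98% of the range pays for a 3-byte encoding that holds only 21 payload bits.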

This PR implements a different varint encoding scheme. Instead of applying varint encoding to individual dep node indices (which is extremely branchy), we now apply it per node.

While being built, each node now stores its edges in a SmallVec, with a bit of extra logic to track the maximum edge index. Then we varint encode the whole batch, using one byte width for all of the node's edges. This is a gamble: we save space by claiming only 2 bits per node instead of ~3 bits per edge, which is a nice savings, but it has to balance out against the overhead in nodes with many edges, where a single large index forces every one of that node's edge indices to be encoded with unnecessary bytes.
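A minimal sketch of the per-node width idea, assuming 32-bit indices written little-endian: pick the smallest byte width (1 to 4) that fits the node's largest edge, then write every edge with that width. `bytes_per_index` and `encode_edges` are hypothetical names, and this version rescans the edges for the maximum instead of tracking it while edges are pushed.

```rust
/// Smallest byte width (1..=4) that can hold every edge index of one node.
fn bytes_per_index(edges: &[u32]) -> usize {
    let max = edges.iter().copied().max().unwrap_or(0);
    // Number of significant bits in the largest index (0 when max == 0).
    let bits = (32 - max.leading_zeros()) as usize;
    // Round up to whole bytes, but always use at least one byte.
    std::cmp::max((bits + 7) / 8, 1)
}

/// Writes every edge with the node's shared width: the low `width` bytes of its
/// little-endian form. Returns the width so the caller can store `width - 1` in 2 bits.
fn encode_edges(edges: &[u32], out: &mut Vec<u8>) -> usize {
    let width = bytes_per_index(edges);
    for &edge in edges {
        out.extend_from_slice(&edge.to_le_bytes()[..width]);
    }
    width
}

fn main() {
    let cases: [&[u32]; 2] = [&[3_000_000, 3_000_001], &[5, 70_000, 12]];
    for edges in cases.iter() {
        let mut out = Vec::new();
        let width = encode_edges(edges, &mut out);
        println!("{:?}: {} byte(s) per edge, {} bytes total", edges, width, out.len());
    }
}
```

The two cases show both sides of the gamble. `[3_000_000, 3_000_001]` takes 6 bytes where LEB128 would take 8 (each value needs 22 bits, one more than a 3-byte LEB128 encoding holds). `[5, 70_000, 12]` is the losing case: one 17-bit edge forces 9 bytes, versus 5 bytes of LEB128.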

Then, to keep the runtime overhead of this encoding scheme down, we deserialize our indices by loading 4 bytes for each one and then masking off the bytes that aren't ours. This is far less code and far fewer branches than leb128, but it relies on having some readable bytes past the end of each edge list. We explicitly add such padding to the in-memory data during decoding. We also do this decoding lazily, which turns the dense on-disk encoding into a peak memory reduction.
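The fixed-size read can be sketched as follows, assuming little-endian packing and a per-node width between 1 and 4. `DEP_NODE_SIZE` and `DEP_NODE_PAD` mirror the constants quoted later in the review (the value 4 for `DEP_NODE_SIZE` is an assumption based on the 4-byte load described here); `read_edge` and `pack` are hypothetical helpers. Two nodes are packed back to back so that a read from the first node visibly pulls in bytes of the second, which the mask then discards.

```rust
/// Assumed: a serialized index is at most 4 bytes, and 3 bytes of slack let a 4-byte
/// load of a trailing 1-byte edge stay in bounds.
const DEP_NODE_SIZE: usize = 4;
const DEP_NODE_PAD: usize = DEP_NODE_SIZE - 1;

/// Decodes edge `i` of a node whose edges start at `start` and are `width` (1..=4) bytes
/// each. Always loads DEP_NODE_SIZE bytes, then masks off bytes belonging to later data.
fn read_edge(data: &[u8], start: usize, width: usize, i: usize) -> u32 {
    let pos = start + i * width;
    let mut raw = [0u8; DEP_NODE_SIZE];
    raw.copy_from_slice(&data[pos..pos + DEP_NODE_SIZE]);
    let mask = u32::MAX >> ((DEP_NODE_SIZE - width) * 8);
    u32::from_le_bytes(raw) & mask
}

/// Hypothetical helper: packs `edges` at `width` bytes each, little-endian.
fn pack(edges: &[u32], width: usize, out: &mut Vec<u8>) {
    for &edge in edges {
        out.extend_from_slice(&edge.to_le_bytes()[..width]);
    }
}

fn main() {
    // Node A: small indices, 1 byte each. Node B: needs 3 bytes per index.
    let (a, a_width) = ([7u32, 200, 42], 1);
    let (b, b_width) = ([70_000u32, 3], 3);

    let mut data = Vec::new();
    let a_start = data.len();
    pack(&a, a_width, &mut data);
    let b_start = data.len();
    pack(&b, b_width, &mut data);
    // Padding so that a 4-byte load of B's last edge stays inside `data`.
    data.extend_from_slice(&[0u8; DEP_NODE_PAD]);

    // Reading A's last edge loads bytes that belong to B; the mask removes them.
    let a_decoded: Vec<u32> = (0..a.len()).map(|i| read_edge(&data, a_start, a_width, i)).collect();
    let b_decoded: Vec<u32> = (0..b.len()).map(|i| read_edge(&data, b_start, b_width, i)).collect();
    assert_eq!(a_decoded, a.to_vec());
    assert_eq!(b_decoded, b.to_vec());
    println!("{:?} {:?}", a_decoded, b_decoded);
}
```

The mask is computed with a shift rather than a branch on `width`, which keeps the decode path to one load, one shift and one `and`.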

Then we apply a bit-packing scheme. Since #115391, the serialized DepKind has unused bits (currently there are 7!), so we use those bits to store the 2 bits we need for the byte width of each node's edges, and use the remaining bits to store the length of the edge list, if it fits.
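The header packing can be sketched like this. It assumes the layout from the bit-field comment quoted later in the review (edge count in the low bits, then 2 bits of width, then the kind in the high bits) and a 9-bit DepKind, which is what 16 bits minus the 7 unused bits leaves. How an edge count that does not fit is signalled is an assumption of this sketch: the inline field stores `len + 1`, and 0 means the length is written separately. `pack_header` and `unpack_header` are hypothetical names.

```rust
// Assumed layout of a 16-bit header:
//   bits 0..LEN_BITS             inline edge count + 1 (0 = count stored separately)
//   bits LEN_BITS..LEN_BITS+2    bytes per edge index, minus 1
//   bits LEN_BITS+2..16          DepKind
const KIND_BITS: u32 = 9;
const WIDTH_BITS: u32 = 2;
const LEN_BITS: u32 = 16 - KIND_BITS - WIDTH_BITS; // 5
const LEN_MASK: u16 = (1 << LEN_BITS) - 1; // 0b11111
const MAX_INLINE_LEN: usize = LEN_MASK as usize - 1; // 30

fn pack_header(kind: u16, width: usize, len: usize) -> u16 {
    assert!(kind < (1 << KIND_BITS));
    assert!((1..=4).contains(&width));
    let len_field = if len <= MAX_INLINE_LEN { len as u16 + 1 } else { 0 };
    (kind << (LEN_BITS + WIDTH_BITS)) | (((width - 1) as u16) << LEN_BITS) | len_field
}

/// Returns (kind, width, Some(len)), or (kind, width, None) if the length must be read separately.
fn unpack_header(header: u16) -> (u16, usize, Option<usize>) {
    let kind = header >> (LEN_BITS + WIDTH_BITS);
    let width = ((header >> LEN_BITS) & 0b11) as usize + 1;
    let len = (header & LEN_MASK).checked_sub(1).map(usize::from);
    (kind, width, len)
}

fn main() {
    let header = pack_header(300, 3, 12);
    println!("{:#018b} -> {:?}", header, unpack_header(header));
    // 100 edges do not fit in the 5-bit field, so the length has to be stored elsewhere.
    println!("{:?}", unpack_header(pack_header(300, 3, 100)));
}
```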

r? @nnethercote

@saethlin saethlin added the S-experimental Status: Ongoing experiment that does not require reviewing and won't be merged in its current state. label Apr 7, 2023
@rustbot rustbot added A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Apr 7, 2023
@saethlin
Member Author

saethlin commented Apr 7, 2023

@bors try @rust-timer queue


@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 7, 2023
@bors
Contributor

bors commented Apr 7, 2023

⌛ Trying commit 0e05ccdfbc10f96411b16b0503b6234f701f1ab9 with merge cf71c3fe420cb3197ee017eaebcddfe19bc54720...


@bors
Contributor

bors commented Apr 7, 2023

💔 Test failed - checks-actions

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 7, 2023

@saethlin
Member Author

saethlin commented Apr 7, 2023

@bors try @rust-timer queue


@bors
Contributor

bors commented Apr 7, 2023

⌛ Trying commit 58405b6d16dcf0f5d865664b70c57e1a2ae240d1 with merge c297bb0502ac94c26a93f91543d3680a54b51ad5...

@bors
Contributor

bors commented Apr 7, 2023

☀️ Try build successful - checks-actions
Build commit: c297bb0502ac94c26a93f91543d3680a54b51ad5 (c297bb0502ac94c26a93f91543d3680a54b51ad5)



@rust-timer
Collaborator

Finished benchmarking commit (c297bb0502ac94c26a93f91543d3680a54b51ad5): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.4% | [0.2%, 0.6%] | 8 |
| Regressions ❌ (secondary) | 0.4% | [0.2%, 0.7%] | 14 |
| Improvements ✅ (primary) | -1.1% | [-2.5%, -0.4%] | 76 |
| Improvements ✅ (secondary) | -1.0% | [-2.1%, -0.4%] | 17 |
| All ❌✅ (primary) | -0.9% | [-2.5%, 0.6%] | 84 |

Max RSS (memory usage)


This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 1.6% | [1.6%, 1.6%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

Cycles


This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.8% | [0.8%, 6.1%] | 18 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -3.1% | [-3.1%, -3.1%] | 1 |
| All ❌✅ (primary) | 2.8% | [0.8%, 6.1%] | 18 |

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Apr 7, 2023
@Kobzol
Contributor

Kobzol commented Apr 8, 2023

The icount reductions are nice, but cycles and wall-time don't look that good.

@rustbot rustbot removed S-waiting-on-perf Status: Waiting on a perf run to be completed. perf-regression Performance regression. labels Sep 4, 2023
@saethlin saethlin changed the title Use a specialized varint encoding for the incr comp dep graph Use a specialized varint + bitpacking scheme for DepGraph encoding Sep 4, 2023
@saethlin saethlin added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. S-experimental Status: Ongoing experiment that does not require reviewing and won't be merged in its current state. labels Sep 4, 2023
@saethlin saethlin marked this pull request as ready for review September 4, 2023 21:58
Contributor

@nnethercote nnethercote left a comment


You have a lot of good comments on individual fields and functions, but it would be great to have a top-level comment somewhere containing much of the information from the PR description, which provides motivation and a high-level view.

TaskDepsRef::Allow(deps) => edges.extend(deps.lock().reads.iter().copied()),
TaskDepsRef::Allow(deps) => {
for index in deps.lock().reads.iter().copied() {
edges.push(index);
Contributor


Would it be hard to impl Extend?

Member Author


Nope! I really did not think of it at the time.

/// Amount of padding we need to add to the edge list data so that we can retrieve every
/// SerializedDepNodeIndex with a fixed-size read then mask.
const DEP_NODE_PAD: usize = DEP_NODE_SIZE - 1;
/// Amount of bits we need to store the number of used bytes in a SerializedDepNodeIndex.
Contributor


Suggested change
/// Amount of bits we need to store the number of used bytes in a SerializedDepNodeIndex.
/// Number of bits we need to store the number of used bytes in a SerializedDepNodeIndex.

Member Author

@saethlin saethlin Sep 7, 2023


I was trying to avoid saying "number" twice in a sentence, but if that construction reads well to you I agree that "amount" is a bit of an odd word to use in this context.

let mut nodes = IndexVec::with_capacity(node_count);
let mut fingerprints = IndexVec::with_capacity(node_count);
let mut edge_list_indices = IndexVec::with_capacity(node_count);
let mut edge_list_data = Vec::with_capacity(edge_count);
// This slightly over-estimates the amount of bytes used for all the edge data but never by
// more than ~6%, because over-estimation only occurs for large nodes.
Contributor


Why does it overestimate? Where does the ~6% come from?

Member Author


Good question. This comment needed to be updated; it was referring to an encoding scheme from a few iterations ago.

// Bit fields are
// 0..? length of the edge
// ?..?+2 bytes per index
// ?+2..16 kind
Contributor


Use M and N instead of ?, perhaps?

Member Author


Done!

@nnethercote
Contributor

r=me with the comments addressed, thanks.

@bors delegate=saethlin

@bors
Contributor

bors commented Sep 5, 2023

✌️ @saethlin, you can now approve this pull request!

If @nnethercote told you to "r=me" after making some further change, please make that change, then do @bors r=@nnethercote

RalfJung pushed a commit to RalfJung/miri that referenced this pull request Sep 6, 2023
Encode DepKind as u16

The derived Encodable/Decodable impls serialize/deserialize as a varint, which results in a lot of code size around the encoding/decoding of these types that isn't justified: the full range of values here is rather small but doesn't quite fit into a `u8`. Growing _all_ serialized `DepKind` to 2 bytes costs us on average 1% size in the incr comp dep graph, which I plan to recoup in rust-lang/rust#110050 by taking advantage of the unused bits in all the serialized `DepKind`.

r? `@nnethercote`
@saethlin
Member Author

saethlin commented Sep 7, 2023

@bors r=nnethercote

@bors
Copy link
Contributor

bors commented Sep 7, 2023

📌 Commit 469dc8f has been approved by nnethercote

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 7, 2023
@bors
Contributor

bors commented Sep 7, 2023

⌛ Testing commit 469dc8f with merge f00c139...

@bors
Contributor

bors commented Sep 7, 2023

☀️ Test successful - checks-actions
Approved by: nnethercote
Pushing f00c139 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 7, 2023
@bors bors merged commit f00c139 into rust-lang:master Sep 7, 2023
12 checks passed
@rustbot rustbot added this to the 1.74.0 milestone Sep 7, 2023
@saethlin saethlin deleted the better-u32-encoding branch September 7, 2023 03:59
@rust-timer
Collaborator

Finished benchmarking commit (f00c139): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 0.5% | [0.3%, 0.8%] | 4 |
| Improvements ✅ (primary) | -1.7% | [-5.8%, -0.3%] | 104 |
| Improvements ✅ (secondary) | -1.4% | [-2.9%, -0.5%] | 32 |
| All ❌✅ (primary) | -1.7% | [-5.8%, -0.3%] | 104 |

Max RSS (memory usage)


This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.4% | [-2.3%, -1.0%] | 43 |
| Improvements ✅ (secondary) | -1.8% | [-2.0%, -1.4%] | 6 |
| All ❌✅ (primary) | -1.4% | [-2.3%, -1.0%] | 43 |

Cycles


This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.0% | [1.3%, 2.6%] | 7 |
| Regressions ❌ (secondary) | 1.7% | [1.5%, 2.0%] | 2 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 2.0% | [1.3%, 2.6%] | 7 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 628.991s -> 628.823s (-0.03%)
Artifact size: 317.97 MiB -> 317.79 MiB (-0.06%)

@rustbot rustbot added the perf-regression Performance regression. label Sep 7, 2023
@lqd
Member

lqd commented Sep 7, 2023

As seen in the previous perf runs, this is a case where the reductions in icounts don't seem to match cycles and walltime, so investigating the small secondary regressions is probably not needed.

@pnkfelix
Member

  • on its surface, the improvements to instruction counts here clearly outweigh the regressions
  • it is worth noting that the cycle counts did not see the same trends;
    there were zero improvements and 7 primary regressions to cycle counts.
  • still, marking as triaged; this PR has gone through enough performance evaluation already.

@rustbot label: +perf-regression-triaged

@rustbot rustbot added the perf-regression-triaged The performance regression has been triaged. label Sep 13, 2023
Labels
A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.