Use a specialized varint + bitpacking scheme for DepGraph encoding #110050

saethlin · 2023-04-07T13:15:22Z

The previous scheme here used leb128 to encode the edge tables that represent the incr comp dependency graph. The problem with that scheme is that leb128 has overhead for larger values, and generally relies on the distribution of encoded values being heavily skewed towards smaller values. That is definitely not the case for dep node indices: they are handed out sequentially and cover the whole range, so the distribution is actually biased in the opposite direction. Most dep node indices are large.
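To make the size argument concrete, here is a small sketch comparing two byte counts for a `u32` index. The first is what unsigned LEB128 spends, at 7 payload bits per byte. The second is the smallest little-endian byte width that holds the same value. The helper names are hypothetical, not rustc APIs.

```rust
/// Bytes needed by unsigned LEB128: 7 payload bits per byte,
/// the high bit of each byte is a continuation flag.
fn leb128_len(mut v: u32) -> usize {
    let mut len = 1;
    while v >= 0x80 {
        v >>= 7;
        len += 1;
    }
    len
}

/// Bytes needed by a plain little-endian encoding truncated to the
/// significant bytes (at least 1, at most 4 for a u32).
fn fixed_len(v: u32) -> usize {
    let bits = 32 - v.leading_zeros() as usize;
    ((bits + 7) / 8).max(1)
}
```

For 100, 20_000, 1_000_000 and 3_000_000, LEB128 needs 1, 3, 3 and 4 bytes, while the fixed width needs 1, 2, 3 and 3. Large, densely allocated indices are exactly where LEB128's per-byte continuation bit hurts.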

This PR implements a different varint encoding scheme. Instead of applying varint encoding to individual dep node indices (which is extremely branchy), we now apply it per node.

While being built, each node now stores its edges in a SmallVec with a bit of extra logic to track the maximum edge index. Then we varint-encode the whole batch, writing every edge of the node with the same byte width. This is a gamble: we save space by claiming only 2 bits per node instead of ~3 bits per edge, which is a nice saving, but it has to outweigh the overhead when a single large index in a node with many edges forces unnecessary bytes into every one of that node's edge indices.
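A minimal sketch of the per-node encoding described above. It assumes edges are `u32` indices written little-endian. Each edge is truncated to one shared width between 1 and 4 bytes, chosen from the node's largest index. The function names are hypothetical and this is not the rustc implementation.

```rust
/// Smallest byte width (1..=4) that can hold `max`.
fn bytes_needed(max: u32) -> usize {
    let bits = 32 - max.leading_zeros() as usize;
    ((bits + 7) / 8).max(1)
}

/// Appends all of one node's edges to `out` using a single shared width,
/// and returns that width. A caller would then store (width - 1) in the
/// node's 2-bit header field.
fn encode_node_edges(edges: &[u32], out: &mut Vec<u8>) -> usize {
    let max = edges.iter().copied().max().unwrap_or(0);
    let width = bytes_needed(max);
    for &edge in edges {
        // Keep only the low `width` bytes; the dropped high bytes are zero.
        out.extend_from_slice(&edge.to_le_bytes()[..width]);
    }
    width
}
```

Encoding `[3, 70_000, 12]` produces 9 bytes at width 3. This shows the gamble in miniature: the small indices 3 and 12 each pay for 3 bytes because one large index shares their node.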

Then, to keep the runtime overhead of this encoding scheme down, we deserialize each index by loading 4 bytes and then masking off the bytes that aren't ours. This is much less code, with far fewer branches, than leb128, but it relies on having some readable bytes past the end of each edge list. We explicitly add such padding to the in-memory data during decoding. We also do this decoding lazily, which turns the dense on-disk encoding into a peak memory reduction.
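A sketch of the fixed-size read and mask, assuming the width-per-node layout sketched above. `DEP_NODE_SIZE` and `DEP_NODE_PAD` mirror the constant names quoted later in the review. The value 4 is taken from "loading 4 bytes". `read_edge` and `with_padding` are hypothetical helpers.

```rust
const DEP_NODE_SIZE: usize = 4;
/// Padding so that a 4-byte load starting at the last 1-byte edge stays in bounds.
const DEP_NODE_PAD: usize = DEP_NODE_SIZE - 1;

/// Reads edge `i` of a node whose edges start at `start` and use `width` bytes each.
/// Always loads DEP_NODE_SIZE bytes, which may include bytes of the next edge
/// or the padding, then masks away everything above `width` bytes.
fn read_edge(data: &[u8], start: usize, width: usize, i: usize) -> u32 {
    let offset = start + i * width;
    let mut bytes = [0u8; DEP_NODE_SIZE];
    bytes.copy_from_slice(&data[offset..offset + DEP_NODE_SIZE]);
    let raw = u32::from_le_bytes(bytes);
    // width is 1..=4, so the shift is 0..=24 and never overflows.
    let mask = u32::MAX >> ((DEP_NODE_SIZE - width) * 8);
    raw & mask
}

/// Appends the zero padding that makes the unconditional 4-byte load safe.
fn with_padding(mut data: Vec<u8>) -> Vec<u8> {
    data.resize(data.len() + DEP_NODE_PAD, 0);
    data
}
```

Without the padding, reading the last edge of `[3, 0, 0, 0x70, 0x11, 0x01, 12, 0, 0]` at width 3 would slice past the end of the buffer and panic. The decode has no per-byte continuation checks, only one load, one shift and one AND.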

Then we apply a bit-packing scheme. Since #115391, the serialized DepKind has unused bits (currently there are 7!). We use 2 of them to store the byte width of each node's edges, and the remaining bits to store the length of the edge list, if it fits.
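A sketch of one way to pack the node header into a `u16`, following the bit layout quoted in the review thread. Bits `0..M` hold the edge count, bits `M..M+2` hold (bytes per index - 1), and bits `M+2..16` hold the kind. With 7 unused bits, the kind needs 9 bits, which leaves M = 5. How an overlong edge list is signalled is not described here. This sketch assumes a hypothetical all-ones sentinel meaning "length stored elsewhere". All names are illustrative.

```rust
const KIND_BITS: u32 = 9;
const WIDTH_BITS: u32 = 2;
const LEN_BITS: u32 = 16 - KIND_BITS - WIDTH_BITS; // M = 5
const LEN_MASK: u16 = (1 << LEN_BITS) - 1;
/// Assumption: an all-ones length field means the length did not fit.
const LEN_SENTINEL: u16 = LEN_MASK;

fn pack_header(kind: u16, bytes_per_index: usize, len: usize) -> u16 {
    assert!(kind < (1 << KIND_BITS));
    assert!((1..=4).contains(&bytes_per_index));
    let len_field = if len < LEN_SENTINEL as usize { len as u16 } else { LEN_SENTINEL };
    (kind << (LEN_BITS + WIDTH_BITS))
        | (((bytes_per_index - 1) as u16) << LEN_BITS)
        | len_field
}

/// Returns (kind, bytes_per_index, inline length if it fit).
fn unpack_header(h: u16) -> (u16, usize, Option<usize>) {
    let kind = h >> (LEN_BITS + WIDTH_BITS);
    let width = ((h >> LEN_BITS) & 0b11) as usize + 1;
    let len_field = h & LEN_MASK;
    let len = if len_field == LEN_SENTINEL { None } else { Some(len_field as usize) };
    (kind, width, len)
}
```

Storing `width - 1` lets the four possible widths (1 to 4 bytes) fit exactly in the 2-bit field.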

r? @nnethercote

saethlin · 2023-04-07T13:15:47Z

@bors try @rust-timer queue

bors · 2023-04-07T13:15:57Z

⌛ Trying commit 0e05ccdfbc10f96411b16b0503b6234f701f1ab9 with merge cf71c3fe420cb3197ee017eaebcddfe19bc54720...

bors · 2023-04-07T13:27:23Z

💔 Test failed - checks-actions

saethlin · 2023-04-07T13:57:38Z

@bors try @rust-timer queue

bors · 2023-04-07T13:57:47Z

⌛ Trying commit 58405b6d16dcf0f5d865664b70c57e1a2ae240d1 with merge c297bb0502ac94c26a93f91543d3680a54b51ad5...

bors · 2023-04-07T15:40:31Z

☀️ Try build successful - checks-actions
Build commit: c297bb0502ac94c26a93f91543d3680a54b51ad5 (c297bb0502ac94c26a93f91543d3680a54b51ad5)

rust-timer · 2023-04-07T17:01:00Z

Finished benchmarking commit (c297bb0502ac94c26a93f91543d3680a54b51ad5): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.4%	[0.2%, 0.6%]	8
Regressions ❌ (secondary)	0.4%	[0.2%, 0.7%]	14
Improvements ✅ (primary)	-1.1%	[-2.5%, -0.4%]	76
Improvements ✅ (secondary)	-1.0%	[-2.1%, -0.4%]	17
All ❌✅ (primary)	-0.9%	[-2.5%, 0.6%]	84

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	1.6%	[1.6%, 1.6%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	2.8%	[0.8%, 6.1%]	18
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-3.1%	[-3.1%, -3.1%]	1
All ❌✅ (primary)	2.8%	[0.8%, 6.1%]	18

Kobzol · 2023-04-08T09:52:36Z

The icount reductions are nice, but cycles and wall-time don't look that good.

nnethercote

You have a lot of good comments on individual fields and functions, but it would be great to have a top-level comment somewhere containing much of the information from the PR description, which provides motivation and a high-level view.

nnethercote · 2023-09-05T04:18:02Z

compiler/rustc_query_system/src/dep_graph/graph.rs

-                TaskDepsRef::Allow(deps) => edges.extend(deps.lock().reads.iter().copied()),
+                TaskDepsRef::Allow(deps) => {
+                    for index in deps.lock().reads.iter().copied() {
+                        edges.push(index);


Would it be hard to impl Extend?

saethlin · 2023-09-07

Nope! I really did not think of it at the time.
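A sketch of what such an `Extend` impl could look like. It uses a hypothetical stand-in for the per-node edge buffer: a `Vec` (the PR uses a SmallVec) plus the running maximum index. Routing `extend` through `push` keeps the maximum up to date, so the call site can keep its original one-liner, `edges.extend(deps.lock().reads.iter().copied())`.

```rust
/// Hypothetical edge buffer that tracks the largest index seen so far,
/// so the encoder can pick a byte width without a second pass.
#[derive(Default, Debug)]
struct EdgesVec {
    max: u32,
    edges: Vec<u32>,
}

impl EdgesVec {
    fn push(&mut self, edge: u32) {
        self.max = self.max.max(edge);
        self.edges.push(edge);
    }
}

impl Extend<u32> for EdgesVec {
    fn extend<I: IntoIterator<Item = u32>>(&mut self, iter: I) {
        // Going through `push` keeps `max` in sync with the contents.
        for edge in iter {
            self.push(edge);
        }
    }
}
```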

nnethercote · 2023-09-05T04:18:53Z

compiler/rustc_query_system/src/dep_graph/serialized.rs

+/// Amount of padding we need to add to the edge list data so that we can retrieve every
+/// SerializedDepNodeIndex with a fixed-size read then mask.
+const DEP_NODE_PAD: usize = DEP_NODE_SIZE - 1;
+/// Amount of bits we need to store the number of used bytes in a SerializedDepNodeIndex.


Suggested change

/// Amount of bits we need to store the number of used bytes in a SerializedDepNodeIndex.

/// Number of bits we need to store the number of used bytes in a SerializedDepNodeIndex.

saethlin · 2023-09-07

I was trying to avoid saying "number" twice in a sentence, but if that construction reads well to you, I agree that "amount" is a bit of an odd word to use in this context.

nnethercote · 2023-09-05T04:20:18Z

compiler/rustc_query_system/src/dep_graph/serialized.rs

        let mut nodes = IndexVec::with_capacity(node_count);
        let mut fingerprints = IndexVec::with_capacity(node_count);
        let mut edge_list_indices = IndexVec::with_capacity(node_count);
-        let mut edge_list_data = Vec::with_capacity(edge_count);
+        // This slightly over-estimates the amount of bytes used for all the edge data but never by
+        // more than ~6%, because over-estimation only occurs for large nodes.


Why does it overestimate? Where does the ~6% come from?

saethlin · 2023-09-07

Good question. This comment needed to be updated; it was referring to an encoding scheme from a few iterations ago.

nnethercote · 2023-09-05T04:39:32Z

compiler/rustc_query_system/src/dep_graph/serialized.rs

+// Bit fields are
+// 0..?    length of the edge
+// ?..?+2  bytes per index
+// ?+2..16 kind


Use M and N instead of ?, perhaps?

nnethercote · 2023-09-05T04:42:40Z

r=me with the comments addressed, thanks.

@bors delegate=saethlin

bors · 2023-09-05T04:42:43Z

✌️ @saethlin, you can now approve this pull request!

If @nnethercote told you to "r=me" after making some further change, please make that change, then do @bors r=@nnethercote

Encode DepKind as u16

The derived Encodable/Decodable impls serialize/deserialize as a varint, which results in a lot of code size around the encoding/decoding of these types that isn't justified: the full range of values here is rather small but doesn't quite fit into a `u8`. Growing _all_ serialized `DepKind` to 2 bytes costs us on average 1% size in the incr comp dep graph, which I plan to recoup in rust-lang/rust#110050 by taking advantage of the unused bits in all the serialized `DepKind`. r? `@nnethercote`

saethlin · 2023-09-07T01:23:17Z

@bors r=nnethercote

bors · 2023-09-07T01:23:18Z

📌 Commit 469dc8f has been approved by nnethercote

It is now in the queue for this repository.

bors · 2023-09-07T02:09:44Z

⌛ Testing commit 469dc8f with merge f00c139...

bors · 2023-09-07T03:56:35Z

☀️ Test successful - checks-actions
Approved by: nnethercote
Pushing f00c139 to master...

rust-timer · 2023-09-07T08:15:21Z

Finished benchmarking commit (f00c139): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	0.5%	[0.3%, 0.8%]	4
Improvements ✅ (primary)	-1.7%	[-5.8%, -0.3%]	104
Improvements ✅ (secondary)	-1.4%	[-2.9%, -0.5%]	32
All ❌✅ (primary)	-1.7%	[-5.8%, -0.3%]	104

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.4%	[-2.3%, -1.0%]	43
Improvements ✅ (secondary)	-1.8%	[-2.0%, -1.4%]	6
All ❌✅ (primary)	-1.4%	[-2.3%, -1.0%]	43

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	2.0%	[1.3%, 2.6%]	7
Regressions ❌ (secondary)	1.7%	[1.5%, 2.0%]	2
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	2.0%	[1.3%, 2.6%]	7

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 628.991s -> 628.823s (-0.03%)
Artifact size: 317.97 MiB -> 317.79 MiB (-0.06%)

lqd · 2023-09-07T08:32:24Z

As seen in the previous perf runs, this is a case where the reductions in icounts don't seem to match cycles and walltime, so investigating the small secondary regressions is probably not needed.

pnkfelix · 2023-09-13T19:16:17Z

On its surface, the improvements to instruction counts here clearly outweigh the regressions.
It is worth noting that the cycle counts did not see the same trend:
there were zero improvements and 7 primary regressions in cycle counts.
Still, marking as triaged; this PR has gone through enough performance evaluation already.

@rustbot label: +perf-regression-triaged

saethlin added the S-experimental Status: Ongoing experiment that does not require reviewing and won't be merged in its current state. label Apr 7, 2023

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 7, 2023

bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 7, 2023

saethlin force-pushed the better-u32-encoding branch from 0e05ccd to 58405b6 Compare April 7, 2023 13:34

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Apr 7, 2023

saethlin mentioned this pull request Apr 7, 2023

Optimize integer decoding a bit #109867

Closed

Noratrieb reviewed Apr 7, 2023

compiler/rustc_serialize/src/serialize.rs (outdated, resolved)

scottmcm reviewed Apr 8, 2023

compiler/rustc_serialize/src/serialize.rs (outdated, resolved)

saethlin force-pushed the better-u32-encoding branch from 58405b6 to 44cb85f Compare April 19, 2023 01:34

rustbot removed S-waiting-on-perf Status: Waiting on a perf run to be completed. perf-regression Performance regression. labels Sep 4, 2023

saethlin changed the title ~~Use a specialized varint encoding for the incr comp dep graph~~ Use a specialized varint + bitpacking scheme for DepGraph encoding Sep 4, 2023

rustbot assigned nnethercote Sep 4, 2023

saethlin marked this pull request as ready for review September 4, 2023 21:58

nnethercote reviewed Sep 5, 2023

Add comments with the same level of detail as the PR description

469dc8f

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 7, 2023

bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 7, 2023

bors merged commit f00c139 into rust-lang:master Sep 7, 2023
12 checks passed

rustbot added this to the 1.74.0 milestone Sep 7, 2023

saethlin deleted the better-u32-encoding branch September 7, 2023 03:59

This was referenced Sep 7, 2023

Only use the new node hashmap for anonymous nodes. #112469

Open

Only use the new DepNode hashmap for anonymous nodes. #109050

Open

rustbot added the perf-regression Performance regression. label Sep 7, 2023

rustbot added the perf-regression-triaged The performance regression has been triaged. label Sep 13, 2023
