Optimize layout calculations in HashMap #51340

Amanieu · 2018-06-04T08:21:05Z

This now produces the same assembly code as the previous implementation.

cc #51163 @nnethercote @gnzlbg @andjo403

rust-highfive · 2018-06-04T08:21:09Z

r? @kennytm

(rust_highfive has picked a reviewer for you, use r? to override)

rust-highfive · 2018-06-04T08:21:10Z

Warning

These commits modify submodules.

Amanieu · 2018-06-04T08:23:21Z

(stupid git pull doesn't update submodules by default...)

Playground link for playing around with the generated code: https://play.rust-lang.org/?gist=b233a122e3223db47521f5b0a0a3911c&version=nightly&mode=release

Amanieu · 2018-06-04T08:23:49Z

r? @SimonSapin

rust-highfive · 2018-06-04T08:32:33Z

The job x86_64-gnu-llvm-3.9 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.

[00:03:29]    Compiling std_unicode v0.0.0 (file:///checkout/src/libstd_unicode)
[00:03:30]    Compiling alloc_system v0.0.0 (file:///checkout/src/liballoc_system)
[00:03:30]    Compiling panic_abort v0.0.0 (file:///checkout/src/libpanic_abort)
[00:03:35]    Compiling panic_unwind v0.0.0 (file:///checkout/src/libpanic_unwind)
[00:03:36] error[E0425]: cannot find function `align_of` in this scope
[00:03:36]    --> libstd/collections/hash/table.rs:663:60
[00:03:36]     |
[00:03:36] 663 |         (layout, hashes.size() + hashes.padding_needed_for(align_of::<A>()))
[00:03:36]     |                                                            ^^^^^^^^ not found in this scope
[00:03:36] help: possible candidate is found in another module, you can import it into scope
[00:03:36] 11  | use core::mem::align_of;
[00:03:36]     |
[00:03:36] 
[00:03:36] error[E0412]: cannot find type `A` in this scope
[00:03:36] error[E0412]: cannot find type `A` in this scope
[00:03:36]    --> libstd/collections/hash/table.rs:663:71
[00:03:36]     |
[00:03:36] 663 |         (layout, hashes.size() + hashes.padding_needed_for(align_of::<A>()))
[00:03:36]     |                                                                       ^ did you mean `K`?
[00:03:39] error: aborting due to 2 previous errors
[00:03:39] 
[00:03:39] Some errors occurred: E0412, E0425.
[00:03:39] For more information about an error, try `rustc --explain E0412`.
[00:03:39] For more information about an error, try `rustc --explain E0412`.
[00:03:39] error: Could not compile `std`.
[00:03:39] 
[00:03:39] Caused by:
[00:03:39]   process didn't exit successfully: `/checkout/obj/build/bootstrap/debug/rustc --crate-name std libstd/lib.rs --color always --error-format json --crate-type dylib --crate-type rlib --emit=dep-info,link -C prefer-dynamic -C opt-level=2 --cfg feature="alloc_jemalloc" --cfg feature="backtrace" --cfg feature="jemalloc" --cfg feature="panic-unwind" --cfg feature="panic_unwind" -C metadata=fc6b9a3d7065b2e2 -C extra-filename=-fc6b9a3d7065b2e2 --out-dir /checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps --target x86_64-unknown-linux-gnu -L dependency=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps -L dependency=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/release/deps --extern rustc_asan=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/librustc_asan-a72c0fa0b2d5877e.rlib --extern std_unicode=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/libstd_unicode-18039a1361938ca9.rlib --extern rustc_msan=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/librustc_msan-ce1ae9dbeb02af61.rlib --extern rustc_lsan=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/librustc_lsan-a2f6e9f263c82267.rlib --extern unwind=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/libunwind-bd02867d7573c11e.rlib --extern alloc_system=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/liballoc_system-386e278237d725f5.rlib --extern core=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/libcore-e9cdce497aae9e81.rlib --extern panic_abort=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/libpanic_abort-5f4d07ea9b3edda4.rlib --extern panic_unwind=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/libpanic_unwind-2bd12fd5ac9768f9.rlib --extern rustc_tsan=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/librustc_tsan-ccd411645051f1ef.rlib --extern libc=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/liblibc-223f966213a60acb.rlib --extern alloc_jemalloc=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/liballoc_jemalloc-5e6c2ac297f2c2e3.rlib --extern alloc=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/liballoc-d6cd5f8b78fddf12.rlib --extern compiler_builtins=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/libcompiler_builtins-90a13cda2e54742f.rlib -L native=/checkout/obj/build/x86_64-unknown-linux-gnu/native/libbacktrace/ -L native=/checkout/obj/build/x86_64-unknown-linux-gnu/native/libbacktrace -l static=backtrace -l static=backtrace -l dl -l rt -l pthread -L native=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/build/compiler_builtins-ffd422941bf53e42/out -L native=/checkout/obj/build/x86_64-unknown-linux-gnu/native/jemalloc/lib` (exit code: 101)
[00:03:39] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "build" "--target" "x86_64-unknown-linux-gnu" "-j" "4" "--release" "--locked" "--color" "always" "--features" "panic-unwind jemalloc backtrace" "--manifest-path" "/checkout/src/libstd/Cargo.toml" "--message-format" "json"
[00:03:39] expected success, got: exit code: 101
[00:03:39] thread 'main' panicked at 'cargo must succeed', bootstrap/compile.rs:1091:9
[00:03:39] travis_fold:end:stage0-std

[00:03:39] travis_time:end:stage0-std:start=1528101059815394637,finish=1528101100038715281,duration=40223320644


[00:03:39] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:03:39] Build completed unsuccessfully in 0:00:41
[00:03:39] make: *** [tidy] Error 1
[00:03:39] Makefile:79: recipe for target 'tidy' failed

The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:21f9eaf0
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

gnzlbg · 2018-06-04T08:41:07Z

src/libstd/collections/hash/table.rs

+    hashes.extend(pairs).map(|(layout, _)| {
+        // LLVM seems to have trouble properly const-propagating pairs.align(),
+        // possibly due to the use of NonZeroUsize. This little hack allows it
+        // to generate optimal code.


This is weird, the NonZeroUsize::get(self) function is marked #[inline] and the constructor is a const fn. Did you by chance happen to look at the generated LLVM-IR for calculate_layout ?

Try the playground link that I posted above. If you remove the map then it generates worse code. What seems to happen is that LLVM will emit an overflowing multiply to calculate pairs.size, and will use the overflow flag to set pairs.align to either 0 (LayoutErr) or the actual alignment.

Perhaps the comment should include the second sentence of that explanation?

Every caller of calculate_layout only use one of the returned Layout or offset. Now that they’re mostly calculated separately, aren’t we better off having two separate functions?

To me this feels like more of a temporary hack to deal with a LLVM bug, rather than a permanent solution. Logically the offset calculation belongs as part of the layout calculation.

@nnethercote I added a reference to #51346 in the comment.

nnethercote · 2018-06-04T09:53:09Z

Nice! Hopefully this will fix the rustc perf regression. rustc uses hash tables a lot :)

gnzlbg · 2018-06-04T10:18:45Z

Avoiding NonZeroUsize: https://play.rust-lang.org/?gist=419e4d6cee534087573e17943b02b674&version=nightly&mode=release

playground::new:
	push	rax
	mov	rsi, rdi
	mov	ecx, 8
	mov	rax, rsi
	mul	rcx
	mov	rcx, rax
	jo	.LBB3_3
	mov	edx, 16
	mov	rax, rsi
	mul	rdx
	jo	.LBB3_3
	add	rax, rcx
	jb	.LBB3_3
	mov	rax, rcx
	pop	rcx
	ret

.LBB3_3:
	call	core::result::unwrap_failed
	ud2

which is at least shorter (don't know if better) than using NonZeroUsize: https://play.rust-lang.org/?gist=9c5e1054b61976dffc49b2d26c86ad82&version=nightly&mode=release

playground::new:
	mov	ecx, 8
	xor	esi, esi
	mov	rax, rdi
	mul	rcx
	mov	r8, rax
	setno	r9b
	jo	.LBB3_7
	mov	edx, 16
	xor	ecx, ecx
	mov	rax, rdi
	mul	rdx
	setno	dl
	jo	.LBB3_7
	mov	sil, r9b
	shl	rsi, 3
	mov	cl, dl
	shl	rcx, 3
	cmp	rcx, rsi
	cmovae	rsi, rcx
	lea	rdx, [r8 + rcx]
	add	rdx, -1
	neg	rcx
	and	rcx, rdx
	sub	rcx, r8
	add	rcx, r8
	jb	.LBB3_7
	add	rax, rcx
	jb	.LBB3_7
	mov	rdx, rsi
	neg	rdx
	cmp	rax, rdx
	ja	.LBB3_7
	test	rsi, rsi
	je	.LBB3_7
	lea	rax, [rsi + 15]
	and	rax, rsi
	jne	.LBB3_7
	mov	rax, rcx
	ret

.LBB3_7:
	push	rbp
	mov	rbp, rsp
	call	core::result::unwrap_failed
	ud2

However, given that both situations should just generate:

playground::original:
	lea	rax, [8*rdi]
	ret

I don't think that NonZeroUsize is the root of the issue here.

Amanieu · 2018-06-04T11:50:03Z

new is the implementation that is currently in rust right now (added by #51163). The one that you should be looking at is new_optimized which calls unreachable_unchecked if a LayoutErr occurs..

gnzlbg · 2018-06-04T13:11:30Z

@Amanieu that one was as good as the original IIRC (that is, this PR fixes the perf regression).

My point was that maybe it is worth fixing whatever codegen bug is causing the perf regression in the first place because, while we might add these workarounds in std, I don't think users of Layout outside of std will.

You mentioned the regression might be caused by making align NonZeroUsize, but while that has an impact on the generated code, that does not appear to be the cause (and reverting that has little effect). I think it would be worth it to open an issue to at least try to get to the bottom of it.

Amanieu · 2018-06-04T13:16:44Z

@gnzlbg If you use my original playground link and remove the .map then you will see that the generated code is not the same:

playground::new_optimized:
	movl	$16, %edx
	xorl	%ecx, %ecx
	movq	%rdi, %rax
	mulq	%rdx
	setno	%cl
	shlq	$3, %rcx
	leaq	(%rcx,%rdi,8), %rax
	addq	$-1, %rax
	negq	%rcx
	andq	%rax, %rcx
	movq	%rcx, %rax
	retq

playground::original:
	leaq	(,%rdi,8), %rax
	retq

This is the codegen bug that I was talking about, which is caused by NonZeroUsize. This discussion was originally started because you were wondering why the .map call was necessary.

gnzlbg · 2018-06-04T13:19:09Z

Can you explain how does that show that it is caused by NonZeroUsize ? In my two playground links I have the code, once using NonZeroUsize (https://play.rust-lang.org/?gist=9c5e1054b61976dffc49b2d26c86ad82&version=nightly&mode=release), and once using usize instead (https://play.rust-lang.org/?gist=419e4d6cee534087573e17943b02b674&version=nightly&mode=release). The assembly code generated for new without the .map looks like the one you show in both cases.

Amanieu · 2018-06-04T13:27:55Z

It's obvious if you look at the generated LLVM IR:

define i64 @_ZN10playground13new_optimized17hf103a5e804c97ad6E(i64 %c) unnamed_addr #2 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
start:
  %0 = shl i64 %c, 3

  // This computes the alignment result for Layout::repeat.
  // It checks if the multiplication overflows and returns either 0 (LayoutErr) or 8 (self.align).
  %1 = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %c, i64 16) #7
  %2 = extractvalue { i64, i1 } %1, 1
  %spec.select.i15.i = select i1 %2, i64 0, i64 8

  // This calculates the offset and Layout::padding_for in Layout::extend.
  // The codegen bug is here: At this point, the alignment can *only* be 8, not 0
  // since Err(LayoutErr) would have returned early before this point (due to ?).
  // But LLVM fails to constant-propagate this for some reason.
  %3 = add i64 %0, -1
  %4 = add i64 %3, %spec.select.i15.i
  %5 = sub nsw i64 0, %spec.select.i15.i
  %6 = and i64 %4, %5
  ret i64 %6
}

gnzlbg · 2018-06-04T13:28:00Z

Looking only at new_optimized with NonZeroUsize: https://play.rust-lang.org/?gist=e7cd2987b841d62657a4c8d114abea59&version=nightly&mode=release

playground::new_optimized:
	mov	rcx, rdi
	mov	edx, 8
	mov	rax, rcx
	mul	rdx
	mov	rsi, rax
	mov	edx, 16
	xor	edi, edi
	mov	rax, rcx
	mul	rdx
	setno	dil
	lea	rax, [rsi + 8*rdi]
	add	rax, -1
	shl	rdi, 3
	neg	rdi
	and	rdi, rax
	mov	rax, rdi
	ret

and without NonZeroUsize: https://play.rust-lang.org/?gist=25ce1213448b27d155bffd30c4adc831&version=nightly&mode=release

playground::new_optimized:
	lea	rax, [8*rdi]
	ret

It looks like not using NonZeroUsize makes a hell of a difference.

Mark-Simulacrum · 2018-06-04T14:49:41Z

@bors p=1 for when this is approved

This now produces the same assembly code as the previous implementation.

SimonSapin · 2018-06-04T17:05:15Z

Let’s land this to fix the perf regression, but I think it is rather unfortunate an somewhat defeats the point of doing #51163 in the first place.

@bors r+

bors · 2018-06-04T17:05:16Z

📌 Commit b69724f has been approved by SimonSapin

bors · 2018-06-04T18:36:59Z

⌛ Testing commit b69724f with merge 41affd0...

@nnethercote

Optimize layout calculations in HashMap This now produces the same assembly code as the previous implementation. cc #51163 @nnethercote @gnzlbg @andjo403

bors · 2018-06-04T21:27:37Z

☀️ Test successful - status-appveyor, status-travis
Approved by: SimonSapin
Pushing 41affd0 to master...

nnethercote · 2018-06-05T02:11:42Z

I just did some local measurements with Cachegrind and the numbers are looking much better -- the regression from #51163 looks to be entirely fixed. Thank you!

nikic · 2018-06-05T06:58:37Z

Perf link: http://perf.rust-lang.org/compare.html?start=cdc193db993eadab74d9ea439613e8f9d3725ac7&end=41affd03eb169830773cd1b11efda562ab81fad0&stat=instructions%3Au

This doesn't seem to show any difference though. Am using it incorrectly?

Mark-Simulacrum · 2018-06-05T13:52:19Z

That should be accurate; I'm not sure why we're seeing a difference from @nnethercote's local benchmarks.

Amanieu · 2018-06-05T14:01:31Z

Note that #51226 also significantly improve the performance since it marked several Layout functions as inlinable.

nnethercote · 2018-06-06T00:49:28Z

My measurements were with Cachegrind rather than perf-stat... that shouldn't make any difference though.

nnethercote · 2018-06-07T23:29:07Z

Can anyone think of reasons why I would see improvements on my machine, but the rustc-perf machine would not? Are the clang versions guaranteed to be the same?

Mark-Simulacrum · 2018-06-07T23:45:04Z

Clang is not installed on the perf collector (though not sure what specific clang you are referring to...); gcc is version 5.4.0.

nnethercote · 2018-06-08T00:43:20Z

Sorry, I meant LLVM, not clang. Are the LLVM versions guaranteed to be the same?

Mark-Simulacrum · 2018-06-08T01:09:44Z

Yes, LLVM should be version equivalent (by default), though the way it's compiled might be different, e.g. using gcc locally vs. clang on Travis. I suppose that could play a role, but it would be decidedly odd.

rust-highfive assigned kennytm Jun 4, 2018

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 4, 2018

Amanieu force-pushed the hashmap_layout2 branch from a9aeb0a to 3d8bca8 Compare June 4, 2018 08:22

rust-highfive assigned SimonSapin and unassigned kennytm Jun 4, 2018

Amanieu force-pushed the hashmap_layout2 branch 2 times, most recently from 7bf6245 to 0b56145 Compare June 4, 2018 08:26

Amanieu force-pushed the hashmap_layout2 branch from 0b56145 to 7311634 Compare June 4, 2018 08:34

gnzlbg reviewed Jun 4, 2018

View reviewed changes

Amanieu mentioned this pull request Jun 4, 2018

NonZero prevents values from being const-propagated properly #51346

Closed

Optimize layout calculations in HashMap

b69724f

This now produces the same assembly code as the previous implementation.

Amanieu force-pushed the hashmap_layout2 branch from 7311634 to b69724f Compare June 4, 2018 15:00

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 4, 2018

bors added a commit that referenced this pull request Jun 4, 2018

Auto merge of #51340 - Amanieu:hashmap_layout2, r=SimonSapin

41affd0

Optimize layout calculations in HashMap This now produces the same assembly code as the previous implementation. cc #51163 @nnethercote @gnzlbg @andjo403

bors merged commit b69724f into rust-lang:master Jun 4, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize layout calculations in HashMap #51340

Optimize layout calculations in HashMap #51340

Amanieu commented Jun 4, 2018

rust-highfive commented Jun 4, 2018

rust-highfive commented Jun 4, 2018

Amanieu commented Jun 4, 2018

Amanieu commented Jun 4, 2018

rust-highfive commented Jun 4, 2018

gnzlbg Jun 4, 2018

Amanieu Jun 4, 2018

nnethercote Jun 4, 2018

SimonSapin Jun 4, 2018

Amanieu Jun 4, 2018

Amanieu Jun 4, 2018

nnethercote commented Jun 4, 2018

gnzlbg commented Jun 4, 2018

Amanieu commented Jun 4, 2018

gnzlbg commented Jun 4, 2018 •

edited

Amanieu commented Jun 4, 2018

gnzlbg commented Jun 4, 2018 •

edited

Amanieu commented Jun 4, 2018 •

edited

gnzlbg commented Jun 4, 2018

Mark-Simulacrum commented Jun 4, 2018

SimonSapin commented Jun 4, 2018

bors commented Jun 4, 2018

bors commented Jun 4, 2018

bors commented Jun 4, 2018

nnethercote commented Jun 5, 2018

nikic commented Jun 5, 2018

Mark-Simulacrum commented Jun 5, 2018

Amanieu commented Jun 5, 2018

nnethercote commented Jun 6, 2018

nnethercote commented Jun 7, 2018

Mark-Simulacrum commented Jun 7, 2018

nnethercote commented Jun 8, 2018

Mark-Simulacrum commented Jun 8, 2018

Optimize layout calculations in HashMap #51340

Optimize layout calculations in HashMap #51340

Conversation

Amanieu commented Jun 4, 2018

rust-highfive commented Jun 4, 2018

rust-highfive commented Jun 4, 2018

Amanieu commented Jun 4, 2018

Amanieu commented Jun 4, 2018

rust-highfive commented Jun 4, 2018

gnzlbg Jun 4, 2018

Choose a reason for hiding this comment

Amanieu Jun 4, 2018

Choose a reason for hiding this comment

nnethercote Jun 4, 2018

Choose a reason for hiding this comment

SimonSapin Jun 4, 2018

Choose a reason for hiding this comment

Amanieu Jun 4, 2018

Choose a reason for hiding this comment

Amanieu Jun 4, 2018

Choose a reason for hiding this comment

nnethercote commented Jun 4, 2018

gnzlbg commented Jun 4, 2018

Amanieu commented Jun 4, 2018

gnzlbg commented Jun 4, 2018 • edited

Amanieu commented Jun 4, 2018

gnzlbg commented Jun 4, 2018 • edited

Amanieu commented Jun 4, 2018 • edited

gnzlbg commented Jun 4, 2018

Mark-Simulacrum commented Jun 4, 2018

SimonSapin commented Jun 4, 2018

bors commented Jun 4, 2018

bors commented Jun 4, 2018

bors commented Jun 4, 2018

nnethercote commented Jun 5, 2018

nikic commented Jun 5, 2018

Mark-Simulacrum commented Jun 5, 2018

Amanieu commented Jun 5, 2018

nnethercote commented Jun 6, 2018

nnethercote commented Jun 7, 2018

Mark-Simulacrum commented Jun 7, 2018

nnethercote commented Jun 8, 2018

Mark-Simulacrum commented Jun 8, 2018

gnzlbg commented Jun 4, 2018 •

edited

gnzlbg commented Jun 4, 2018 •

edited

Amanieu commented Jun 4, 2018 •

edited