Improve BinaryHeap performance #78857

SkiFire13 · 2020-11-07T22:04:53Z

By changing the condition in the loops from child < end to child < end - 1, we're guaranteed that right = child + 1 < end. Since the index of the bigger sibling can then be found with an arithmetic operation, we can remove a branch from the loop body. The case where there's no right child, i.e. child == end - 1, is instead handled outside the loop, after it ends. Note that if the loop ends early we can use return instead of break, since the check child == end - 1 would surely fail.
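The loop shape described above can be sketched on a plain slice. This is a minimal illustration, not the standard library's code: the free function signature and the use of safe indexing and `swap` are assumptions made for readability (the real implementation works through an internal hole abstraction), and `saturating_sub` is added here only so that `end == 0` does not underflow.

```rust
/// Sift the element at `pos` down within the max-heap stored in `data[..end]`.
/// Hypothetical free-function version; requires `end <= data.len()`.
fn sift_down_range<T: Ord>(data: &mut [T], mut pos: usize, end: usize) {
    let mut child = 2 * pos + 1;
    // `child < end - 1` guarantees that the right child `child + 1` exists.
    while child < end.saturating_sub(1) {
        // Branchless pick of the bigger sibling: add 1 when the right
        // child is at least as large as the left one.
        child += (data[child] <= data[child + 1]) as usize;
        // Heap property already holds: `return` is fine because the
        // trailing `child == end - 1` check could not succeed here.
        if data[pos] >= data[child] {
            return;
        }
        data.swap(pos, child);
        pos = child;
        child = 2 * pos + 1;
    }
    // Only a left child is present (`child == end - 1`): handle it once,
    // outside the loop.
    if child + 1 == end && data[pos] < data[child] {
        data.swap(pos, child);
    }
}
```

Calling it for every index from `len / 2 - 1` down to `0` turns an arbitrary slice into a max-heap, which is a convenient way to exercise both the loop and the trailing single-child case.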

I've also removed a call to <[T]>::swap that was hiding a bound check that wasn't being optimized away by LLVM.
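As a rough sketch of the swap change, assuming the swap in question is the root/last-element exchange in a heap-sort style loop: replacing the slice `swap` with `ptr::swap` avoids the index check, provided the indices are known to be valid. The function name `heap_sort` and the safe sift-down helper are hypothetical and only there to make the example self-contained.

```rust
use std::ptr;

/// Safe sift-down helper (max-heap) restricted to `data[..end]`.
fn sift_down_range<T: Ord>(data: &mut [T], mut pos: usize, end: usize) {
    let mut child = 2 * pos + 1;
    while child < end.saturating_sub(1) {
        child += (data[child] <= data[child + 1]) as usize;
        if data[pos] >= data[child] {
            return;
        }
        data.swap(pos, child);
        pos = child;
        child = 2 * pos + 1;
    }
    if child + 1 == end && data[pos] < data[child] {
        data.swap(pos, child);
    }
}

/// Hypothetical stand-in for a heap-sort loop: heapify, then repeatedly
/// move the maximum to the back and restore the heap on the prefix.
fn heap_sort<T: Ord>(mut data: Vec<T>) -> Vec<T> {
    let n = data.len();
    for i in (0..n / 2).rev() {
        sift_down_range(&mut data, i, n);
    }
    let mut end = n;
    while end > 1 {
        end -= 1;
        // SAFETY: `end` goes from `n - 1` down to 1, so both index 0 and
        // index `end` are in bounds for `data`.
        unsafe {
            let p = data.as_mut_ptr();
            ptr::swap(p, p.add(end));
        }
        sift_down_range(&mut data, 0, end);
    }
    data
}
```

The `SAFETY` comment states the invariant that makes the unchecked swap sound, which is the kind of inline rationale a reviewer would expect next to a new `unsafe` block.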

A quick benchmark on my PC shows that the gains are pretty significant:

name	before ns/iter	after ns/iter	diff ns/iter	diff %	speedup
find_smallest_1000	352,565	260,098	-92,467	-26.23%	x 1.36
from_vec	676,795	473,934	-202,861	-29.97%	x 1.43
into_sorted_vec	469,511	304,275	-165,236	-35.19%	x 1.54
pop	483,198	373,778	-109,420	-22.64%	x 1.29

The other 2 benchmarks for BinaryHeap (peek_mut_deref_mut and push) weren't impacted and as such didn't show any significant change.

rust-highfive · 2020-11-07T22:04:56Z

r? @KodrAus

(rust_highfive has picked a reviewer for you, use r? to override)

tavianator · 2020-11-09T16:56:41Z

I don't know if this makes a difference performance-wise (or even readability-wise), but you could avoid the comparisons with end - 1 by computing the right child by default instead of the left (child = 2 * pos + 2).
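For comparison, the right-child-first variant suggested here might look like the following. This is a sketch under assumptions: the function name `sift_down_range_right` is made up, and safe slice indexing stands in for whatever the real implementation uses.

```rust
/// Variant that tracks the right child (`2 * pos + 2`) so the loop
/// condition is simply `right < end`. Hypothetical free function;
/// requires `end <= data.len()`.
fn sift_down_range_right<T: Ord>(data: &mut [T], mut pos: usize, end: usize) {
    let mut right = 2 * pos + 2;
    while right < end {
        // Both children exist; step back to the left child when it is
        // strictly greater than the right one.
        let child = right - (data[right - 1] > data[right]) as usize;
        if data[pos] >= data[child] {
            return;
        }
        data.swap(pos, child);
        pos = child;
        right = 2 * pos + 2;
    }
    // `right == end` means only the left child (`right - 1`) exists.
    if right == end && data[pos] < data[right - 1] {
        data.swap(pos, right - 1);
    }
}
```

The comparison against `end - 1` disappears from the loop condition, at the cost of reading `right - 1` inside the loop.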

SkiFire13 · 2020-11-09T19:46:47Z

@tavianator I didn't think of that, but I wouldn't expect it to make much of a difference (the compiler stores end - 1 in a register before the loop, so no other calculation is required inside). I benched it anyway, and surprisingly it even performs worse.

name	left child ns/iter	right child ns/iter	diff ns/iter	diff %	speedup
find_smallest_1000	260,098	268,056	+7,958	+3.06%	x 0.97
from_vec	473,934	533,500	+59,566	+12.57%	x 0.89
into_sorted_vec	304,275	365,226	+60,951	+20.03%	x 0.83
pop	373,778	392,372	+18,594	+4.97%	x 0.95

I also tested a reduced version on godbolt, and it looks like LLVM is doing a worse job at optimizing sift_down_range's loop (13 vs 16 instructions for a u64, that's about 23% more instructions!), which might explain the slowdown. In particular, hole.get(child - 1) is not optimized as well as hole.get(child + 1) previously was.

tavianator · 2020-11-09T20:11:02Z

Huh, thanks for checking. That codegen is kinda odd.

KodrAus

Nice! I just left one comment to include your rationale for the new unsafe block inline. r=me after that.

library/alloc/src/collections/binary_heap.rs

SkiFire13 · 2020-11-09T22:16:20Z

I've added the safety comment, I hope it's clear enough. I'm tempted to document other unsafe usages as well, but that should probably have its own PR.

r=KodrAus

Xanewok · 2020-11-11T20:21:11Z

@bors r=KodrAus

(You need to mention the bot and be either in the reviewer set or be delegated to, see cheatsheet at https://bors.rust-lang.org/)

bors · 2020-11-11T20:21:13Z

📌 Commit 387568c has been approved by KodrAus

bors · 2020-11-12T03:25:29Z

⌛ Testing commit 387568c with merge d75378aab28aecf8b1521a40409823dd9c0f928b...

bors · 2020-11-12T03:33:21Z

💔 Test failed - checks-actions

kennytm · 2020-11-12T08:41:15Z

@bors retry

Improve BinaryHeap performance

By changing the condition in the loops from `child < end` to `child < end - 1`, we're guaranteed that `right = child + 1 < end`. Since the index of the bigger sibling can then be found with an arithmetic operation, we can remove a branch from the loop body. The case where there's no right child, i.e. `child == end - 1`, is instead handled outside the loop, after it ends. Note that if the loop ends early we can use `return` instead of `break`, since the check `child == end - 1` would surely fail.

I've also removed a call to `<[T]>::swap` that was hiding a bound check that [wasn't being optimized by LLVM](https://godbolt.org/z/zrhdGM).

A quick benchmark on my PC shows that the gains are pretty significant:

| name               | before ns/iter | after ns/iter | diff ns/iter | diff %  | speedup |
|--------------------|----------------|---------------|--------------|---------|---------|
| find_smallest_1000 | 352,565        | 260,098       | -92,467      | -26.23% | x 1.36  |
| from_vec           | 676,795        | 473,934       | -202,861     | -29.97% | x 1.43  |
| into_sorted_vec    | 469,511        | 304,275       | -165,236     | -35.19% | x 1.54  |
| pop                | 483,198        | 373,778       | -109,420     | -22.64% | x 1.29  |

The other 2 benchmarks for `BinaryHeap` (`peek_mut_deref_mut` and `push`) weren't impacted and as such didn't show any significant change.

Rollup of 7 pull requests

Successful merges:

- rust-lang#76730 (Fix rustdoc rendering of by-value mutable arguments in async fn)
- rust-lang#78836 (Implement destructuring assignment for structs and slices)
- rust-lang#78857 (Improve BinaryHeap performance)
- rust-lang#78950 (Add asm register information for SPIR-V)
- rust-lang#78970 (update rustfmt to v1.4.25)
- rust-lang#78972 (Update cargo)
- rust-lang#78987 (extend min_const_generics param ty tests)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup

* Remove useless branches from sift_down_range loop
* Remove branches from sift_down_to_bottom loop
* Remove useless bound checks from into_sorted_vec

518: Remove unpredicted branch from kmerge::sift_down r=jswrenn a=SkiFire13

This is pretty much a port from rust-lang/rust#78857

Compared with the previous implementation, this adds more branches for bound checks which aren't present on the stdlib version, however they should be predicted almost always. The speedup should come from the removal of an unpredictable branch from the loop body, in favor of boolean arithmetic. The benchmarks seem to agree:

```
before:

test kmerge default ... bench:        6812 ns/iter (+/- 18)
test kmerge tenway  ... bench:      223673 ns/iter (+/- 769)

after:

test kmerge default ... bench:        6212 ns/iter (+/- 43)
test kmerge tenway  ... bench:      190700 ns/iter (+/- 419)
```

Co-authored-by: Giacomo Stevanato <giaco.stevanato@gmail.com>
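A comparator-based sift-down using the same boolean arithmetic might look roughly like this. It is a sketch only: the signature, the `less_than` closure parameter, and the min-heap orientation are assumptions for illustration, not a copy of the itertools code. The slice indexing keeps its bound checks, which matches the trade-off described in the message above.

```rust
/// Sift `heap[index]` down in a heap ordered by `less_than`
/// (the smallest element according to `less_than` sits at the root).
/// Hypothetical signature for illustration.
fn sift_down<T, S>(heap: &mut [T], index: usize, mut less_than: S)
where
    S: FnMut(&T, &T) -> bool,
{
    let mut pos = index;
    let mut child = 2 * pos + 1;
    // Both children exist while `child + 1 < heap.len()`.
    while child + 1 < heap.len() {
        // Boolean arithmetic replaces the hard-to-predict branch:
        // move to the right child when it orders before the left one.
        child += less_than(&heap[child + 1], &heap[child]) as usize;
        // Stop once the parent already orders before (or ties with) the child.
        if !less_than(&heap[child], &heap[pos]) {
            return;
        }
        heap.swap(pos, child);
        pos = child;
        child = 2 * pos + 1;
    }
    // A lone left child is handled once, after the loop.
    if child + 1 == heap.len() && less_than(&heap[child], &heap[pos]) {
        heap.swap(pos, child);
    }
}
```

Passing `|a, b| a < b` gives a min-heap; passing `|a, b| a > b` gives a max-heap with the same code path.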

SkiFire13 added 3 commits November 7, 2020 22:20

Remove branches from sift_down_to_bottom loop

6dfcf9a

Remove useless branches from sift_down_range loop

25b3f61

Remove useless bound checks from into_sorted_vec

8d15753

rust-highfive assigned KodrAus Nov 7, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 7, 2020

SkiFire13 mentioned this pull request Nov 8, 2020

Suboptimal performance of BinaryHeap::append #77433

Closed

KodrAus reviewed Nov 9, 2020

Added SAFETY comment as request

387568c

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 11, 2020

bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Nov 12, 2020

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 12, 2020

m-ou-se mentioned this pull request Nov 12, 2020

Rollup of 7 pull requests #78992

Closed

m-ou-se mentioned this pull request Nov 12, 2020

Rollup of 7 pull requests #78998

Merged

bors merged commit 4088981 into rust-lang:master Nov 13, 2020

rustbot added this to the 1.49.0 milestone Nov 13, 2020

SkiFire13 deleted the bheap-opt branch November 13, 2020 09:32

SkiFire13 mentioned this pull request Dec 29, 2020

Port rust-lang/rust#78857 - Improve BinaryHeap performance sekineh/binary-heap-plus-rs#28

Merged

SkiFire13 mentioned this pull request Jan 20, 2021

Remove unpredicted branch from kmerge::sift_down rust-itertools/itertools#518

Merged
