-
Notifications
You must be signed in to change notification settings - Fork 13.4k
Avoid bounds checking at slice::binary_search #30917
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
(rust_highfive has picked a reviewer for you, use r? to override) |
d594bef
to
af7b3d7
Compare
Is there any benchmark data for this change? |
Sorry, wrong buttons. I thought this would be a no brainer. If necessary I can arrange some tests to show the improvements. I saw some in my burst trie implementation a while a go. |
On the topic of correctly implementing binary search being a no brainer: http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html ;) But more seriously, any time we move away from safe code it definitely makes sense to have some definitive benchmarks (and, ideally, we'd be testing those, though we don't quite have the infrastructure for it right now). |
Just to be extra clear, this doesn't change the logic at all, just use the unchecked get instead of index operator. Anyway I'll produce some numbers tomorrow. |
Yes, but it does raise the stakes on that logic being correct. :)
Sounds great! |
Small but noticeable improvement. Here are the results in my machine. Code used in Gist As for the algorithm I checked it for the common mistakes and I believe it's correct.
|
On the topic of profiling array search being a no brainer: http://cglab.ca/~morin/misc/arraylayout-v2/ 😸 |
Well, I don't think changing memory layouts apply here. There's all sort of fancy ways to try (microarchitecture dependent) speed up the binary search itself like using conditional moves but those are only suitable when searching over integers. If anyone knows how to improve a generic version like this it'd be nice though. |
Would it be possible to get rid of the redundant check by updating the slice on which we are doing the search instead of keeping track of its boundaries manually? |
I was curious what was going on here in terms of codegen as cases like this typically seem like LLVM could get all the optimizations in the right place. I whipped up an example of what's going on right now in both versions of this core function, and the only difference in IR is listed below. Otherwise all basic blocks are the same between the two versions. ; version using get_unchecked()
while_body:
%base.05 = phi i64 [ 0, %entry-block ], [ %base.1, %join ]
%lim.04 = phi i64 [ 3, %entry-block ], [ %11, %join ]
%3 = lshr i64 %lim.04, 1
%4 = add i64 %base.05, %3
%5 = icmp eq i64 %4, 2
%6 = icmp ult i64 %4, 2
%..i.i = select i1 %6, i8 -1, i8 1
%sret_slot.0.i.i = select i1 %5, i8 0, i8 %..i.i
switch i8 %sret_slot.0.i.i, label %match_else [
i8 0, label %match_case
i8 -1, label %match_case3
i8 1, label %join
]
; version using safe indexing
while_body:
%base.07 = phi i64 [ 0, %entry-block ], [ %base.1, %join ]
%lim.06 = phi i64 [ 3, %entry-block ], [ %12, %join ]
%3 = lshr i64 %lim.06, 1
%4 = add i64 %base.07, %3
%5 = icmp ugt i64 %4, 2
br i1 %5, label %cond.i, label %"_ZN3bar16_$LT$closure$GT$12closure.3826E.exit", !prof !0
cond.i:
%.lcssa = phi i64 [ %4, %while_body ]
tail call void @_ZN9panicking18panic_bounds_check20hd44ea11c616af168XYLE(...)
unreachable
"_ZN3bar16_$LT$closure$GT$12closure.3826E.exit":
%6 = icmp eq i64 %4, 2
%7 = icmp ult i64 %4, 2
%..i.i = select i1 %7, i8 -1, i8 1
%sret_slot.0.i.i = select i1 %6, i8 0, i8 %..i.i
switch i8 %sret_slot.0.i.i, label %match_else [
i8 0, label %match_case
i8 -1, label %match_case3
i8 1, label %join
] So looking at these two snippets, LLVM today cannot assert that this is a constant branch which always takes the false one: br i1 %5, label %cond.i, label %"_ZN3bar16_$LT$closure$GT$12closure.3826E.exit", !prof !0 With the change to if base + (lim >> 1) > 2 {
panic!()
} It is essentially ensuring that the index is less than the length of the slice, which is in this case 3. To me this does seem like something that's actually pretty hard for a compiler to statically determine. There's quite a few changes to I would prefer to reason to ourselves that this property is indeed always true (before we start resorting to Overall, the wins seems so negligible here and this is such a tricky area that I'd somewhat lean more towards sticking with the safe version. |
@alexcrichton you correctly say, the main issue is that LLVM does not manage to prove |
This gives an easy ~10% improvement on a few cases. It's easily a win-win to me. It's not like there's no unsafe blocks in core/slice.rs already (34). |
@arthurprs Some of those NB: Actually I would love if we had a way to track when LLVM learns how to optimise these checks so that we can remove |
@arthurprs to me at least this may not necessarily be a win-win. While everything we've tested on has shown that this continues to work, this quote from @nikomatsakis's link is particularly telling:
This just goes to show that unsafe code is incredibly tricky at its core and needs to be taken very lightly. The way LLVM does transformations is typically relatively simple (e.g. easy to verify), so if we could coerce LLVM into proving that the bounds check isn't needed here that'd be the best situation. If that doesn't happen then there's some nontrivial logic which means the proof the bounds check isn't necessary is more complicated (e.g. we can't trivially say "oh it's faster let's merge!"). I'm wary here because |
If you merge the sound unchecked indexing through “generativity” framework you can implement binary search safely without bounds checks.. 😄 (code is here!) Jokes aside, this change seems to be something libstd should do.. we write |
@bluss I tried again doing the slice thing after your example (in particular, splitting at len>>1, then checking the size of the tail, which is guaranteed to be at least as big as the tail): #![crate_type="lib"]
use std::cmp::Ordering::*;
pub fn binary_search_by_split(a: &[i32]) -> Result<usize, usize> {
let mut s = a;
let mut base = 0usize;
loop {
let (head, tail) = s.split_at(s.len() >> 1);
if tail.is_empty() {
return Err(base)
}
match tail[0].cmp(&0) {
Equal => return Ok(base),
Less => {
base += head.len() + 1;
s = &tail[1..];
}
Greater => s = head,
}
}
} This still has a check which might cause a panic ( Notice that in order to completely get rid of bound checks, the binary search code should still be modified, but it would then be possible to do so without any unsafe code. |
@ranma42 I love it! You can use .split_first() to make it even prettier. |
@ranma42 nice! Out of curiosity, is it the That seems at least pretty easily verifiable by a human in terms of not having any problems, so I'd be fine changing to unsafe indexing there as well. |
@bluss Thanks for pointing that out, I updated my local example and checked that everything works just fine with @alexcrichton Yes, the call to In fact, I even try to go in the opposite direction: find out why checks in libcore/libstd are not optimised and push the optimisations in LLVM, so that we can eventually get rid of |
@ranma42 related by example, I remember that a case of |
If the code in the standard library were adapted to what @ranma42 has pasted then I'd be fine with the unsafe indexing in |
These are some neat alternatives! @alexcrichton what do you mean with unsafe indexing in split_at? |
@arthurprs oh so in @ranma42's example all the bounds checks are elided except for the call to |
@arthurprs yes it sounds like once we upgrade LLVM @ranma42's version will have no bounds checks in optimized code. It is true yeah as well that today we don't have unchecked slicing for slices (but we probably should...) |
Ok sounds great! Yeah, I think so. I use from_raw_parts a lot more than I'd like. |
Question, with the current llvm wouldn't "/ 2" optimize the bounds checking out? I tried it but it doesn't. Judging by the @ranma42 commit message and the diff I was assuming that it would. Here are the results for the new version
|
af7b3d7
to
6f2c507
Compare
@arthurprs LLVM is clever enough to realise that the costly division in |
@ranma42 Thank you, that's very interesting. |
6f2c507
to
0986933
Compare
if tail.is_empty() { | ||
return Err(base) | ||
} | ||
match f(&tail[0]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use split_first here instead? It will do the job of .is_empty(), getting the first element, and slicing out the &tail[1..], all in one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compiler adds another branch, I'm not sure why.
assembly diff https://www.diffchecker.com/5p8siwcp
if let Some((pivot, rest)) = tail.split_first() {
match f(pivot) {
Equal => return Ok(base + head.len()),
Less => {
base += head.len() + 1;
s = rest;
}
Greater => s = head,
}
} else {
return Err(base)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ranma42 do you know what's the case here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not know why the code should not be the same.
Looking at the IR, the LLVM basic block that generates that jump seems to be
"_ZN5slice12_$u5b$T$u5d$8split_at21h15129678983694567549E.exit": ; preds = %loop_body
%5 = getelementptr inbounds i32, i32* %s.sroa.0.0.ph, i32 %3
%6 = icmp eq i32 %s.sroa.7.0, %3
%switchtmp = icmp eq i32* %5, null
%or.cond = or i1 %6, %switchtmp
br i1 %or.cond, label %match_case, label %match_case4
@arthurprs the new, slice-based version is pretty neat. Thanks for pushing on that. |
0986933
to
35c8672
Compare
35c8672
to
7e5b9d7
Compare
Avoid bounds checking for binary search. All calculated indexes are safe and the branch is useless.
This commit extends the patterns recognised by InstSimplify to also handle (x >> y) <= x in the same way as (x /u y) <= x. The missing optimisation was found investigating why LLVM did not optimise away bound checks in a binary search: rust-lang/rust#30917 Patch by Andrea Canciani! Differential Revision: http://reviews.llvm.org/D16402
💔 We still have the bounds check there, I'm unsure why since we have the optimization in rust llvm branch https://github.com/rust-lang/llvm/blob/rust-llvm-2016-07-09/lib/Analysis/InstructionSimplify.cpp#L2826 Can somebody familiar with llvm take a quick look?
becomes
|
From the Rust code, it ends up as %3 = lshr i64 %s.sroa.7.0.i.i, 1
%4 = icmp ult i64 %s.sroa.7.0.i.i, %3 and the optimization is looking for ( |
Good catch, it could be that. |
@arthurprs Can you open a new issue for this bug? |
Summary: Extends InstSimplify to handle both `x >=u x >> y` and `x >=u x udiv y`. This is a folloup of rL258422 and rust-lang/rust#30917 where llvm failed to optimize away the bounds checking in a binary search. Patch by Arthur Silva! Reviewers: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25941 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285228 91177308-0d34-0410-b5e6-96231b3b80d8
Just an update. llvm just committed a patch that should finally solve this https://github.com/llvm-project/llvm/commit/cf6e9a81f676b3e3885f86af704e834ef5c04264 |
Summary: Extends InstSimplify to handle both `x >=u x >> y` and `x >=u x udiv y`. This is a folloup of rL258422 and rust-lang/rust#30917 where llvm failed to optimize away the bounds checking in a binary search. Patch by Arthur Silva! Reviewers: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25941 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285228 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Extends InstSimplify to handle both `x >=u x >> y` and `x >=u x udiv y`. This is a folloup of rL258422 and rust-lang/rust#30917 where llvm failed to optimize away the bounds checking in a binary search. Patch by Arthur Silva! Reviewers: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25941 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285228 91177308-0d34-0410-b5e6-96231b3b80d8
Avoid bounds checking for binary search. All calculated indexes are safe and the branch is useless.