Slice equality is slow #16913
Comments
huonw added the I-slow label on Sep 1, 2014
The call to `next` results in the following IR:

```llvm
match_case.i:                                     ; preds = %"_ZN5slice57Items$LT$$x27a$C$$x20T$GT$.Iterator$LT$$BP$$x27a$x20T$GT$4next21h11506221690941188551E.exit"
  %sret_slot.sroa.0.0.i.lcssa310 = phi i8* [ %sret_slot.sroa.0.0.i, %"_ZN5slice57Items$LT$$x27a$C$$x20T$GT$.Iterator$LT$$BP$$x27a$x20T$GT$4next21h11506221690941188551E.exit" ]
  %60 = icmp eq i8* %sret_slot.sroa.0.0.i.lcssa310, null
  br i1 %60, label %.noexc76, label %then-block-191-.i.loopexit309

match_case6.i:                                    ; preds = %"_ZN5slice57Items$LT$$x27a$C$$x20T$GT$.Iterator$LT$$BP$$x27a$x20T$GT$4next21h11506221690941188551E.exit"
  %61 = icmp eq i8* %sret_slot.sroa.0.0.i, null
  br i1 %61, label %then-block-191-.i.loopexit, label %.noexc197

.noexc197:                                        ; preds = %match_case6.i
  %62 = bitcast i8* %sret_slot.sroa.0.0.i to i64*
  %63 = bitcast i8* %sret_slot.sroa.0.0.i201 to i64*
  %64 = load i64* %63, align 8
  %65 = load i64* %62, align 8
  %66 = icmp eq i64 %64, %65
  br i1 %66, label %loop_body.i, label %then-block-191-.i.loopexit
```

cc @zwarich -- Could the NullCheckElim pass handle that?

That aside, even then, AFAICT LLVM has no optimization to turn such loops into calls to memcmp yet (it's a TODO in the LoopIdiomRecognize xform), so we probably lose out because of that.
The custom_eq method can be improved slightly by using raw pointers (C++-style iteration). But even then, the gap to memcmp is vast: for u8 elements it's 10x.
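The thread doesn't show `custom_eq` itself, so as a hedged sketch (with a hypothetical name, and using today's `pointer::add` rather than the `offset` of the era), raw-pointer iteration over two slices might look like this:

```rust
// Hypothetical sketch of "C++ style" raw-pointer iteration: walk two
// pointers toward an end pointer instead of indexing, so there are no
// per-iteration bounds checks for the optimizer to eliminate.
fn custom_eq_raw<T: PartialEq>(a: &[T], b: &[T]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    unsafe {
        let mut p = a.as_ptr();
        let mut q = b.as_ptr();
        let end = p.add(a.len());
        while p != end {
            if *p != *q {
                return false;
            }
            p = p.add(1);
            q = q.add(1);
        }
    }
    true
}
```

Even in this form the loop compares one element at a time, which is why it cannot match a vectorized memcmp.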
Hm, I don't see that much of a difference for u8 elements. The (unmodified) custom_eq is about as fast as memcmp for me.

I implemented memcmp_eq like this:

```rust
fn memcmp_eq<'a, T: PartialEq>(a: &'a [T], b: &'a [T]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    unsafe {
        // memcmp takes a length in bytes, so scale by the element size
        rlibc::memcmp(
            a.as_ptr() as *const _,
            b.as_ptr() as *const _,
            a.len() * std::mem::size_of::<T>(),
        ) == 0
    }
}
```
Try libc's memcmp instead.
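A minimal sketch of that suggestion (hypothetical function name), declaring the C library's `memcmp` directly rather than going through rlibc's plain-Rust implementation, restricted to byte slices so the length needs no scaling:

```rust
use std::os::raw::{c_int, c_void};

extern "C" {
    // The C standard library's memcmp; glibc's version is heavily
    // optimized (word-at-a-time and SIMD paths).
    fn memcmp(s1: *const c_void, s2: *const c_void, n: usize) -> c_int;
}

fn libc_memcmp_eq(a: &[u8], b: &[u8]) -> bool {
    a.len() == b.len()
        && unsafe {
            memcmp(
                a.as_ptr() as *const c_void,
                b.as_ptr() as *const c_void,
                a.len(),
            ) == 0
        }
}
```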
The LLVM loop idiom pass doesn't know how to generate memcmp calls yet.
thestinger added the A-codegen and A-LLVM labels on Oct 10, 2014
cmr self-assigned this on Mar 25, 2015
Is this fixed by #26884?
It seems to be fixed for the original reporter's case. But slice equality still doesn't vectorize properly or compare well to glibc's memcmp (for byte slices), so room for improvement remains.
@dotdash Not sure why it doesn't vectorize in nightly: https://play.rust-lang.org/?gist=38c5ef4ccf66898cc261&version=nightly

```rust
pub fn compare(a: &[u8], b: &[u8]) -> bool {
    a == b
}
```
cmr removed their assignment on Jan 5, 2016
As of today this is still true.
Related bug in LLVM: https://llvm.org/bugs/show_bug.cgi?id=16332

Basically the problem is that the LLVM vectoriser does not know how to optimise loops with multiple exits, or (equivalently) with termination guards that are not simple constraints on the index variable.
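To illustrate that distinction (a hedged sketch, not the actual standard-library implementation): the early-exit loop below has two ways out of the loop body, the shape the vectoriser rejects, while the variant that merely accumulates a difference flag has a single exit on a simple counted bound and can be vectorised:

```rust
// Early-exit form: the `return false` inside the loop is a second
// loop exit, which the LLVM loop vectorizer does not handle.
fn eq_early_exit(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    for i in 0..a.len() {
        if a[i] != b[i] {
            return false;
        }
    }
    true
}

// Single-exit form: OR together the XOR of every byte pair and test
// once at the end. The loop exits only when the counter is exhausted,
// at the cost of always scanning the whole slice.
fn eq_single_exit(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut acc = 0u8;
    for i in 0..a.len() {
        acc |= a[i] ^ b[i];
    }
    acc == 0
}
```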
saschagrunert referenced this issue on Jan 31, 2017: Added compile time optimization for bytewise slice comparison #39422 (closed)
Seems to be fixed now with some specialisation (the slow comparison, that is, not the LLVM bug).
Thanks @nagisa, closing!
nham commented Sep 1, 2014

If I run the following code with `rustc -O --test src/slice_equality_slow.rs && ./slice_equality_slow --bench`, then I get:

I ran into this because I was able to speed up the naive string matching algorithm in `core::str` by replacing the slice equality with a `for` loop and comparing each component manually.
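That workaround might look like the following sketch (hypothetical names, shown over byte slices rather than the `core::str` internals): the inner loop compares components manually instead of testing each window with slice equality.

```rust
// Hedged sketch of naive substring search with a manual inner
// comparison loop, i.e. the replacement for
// `&haystack[i..i + needle.len()] == needle`.
fn contains_naive(haystack: &[u8], needle: &[u8]) -> bool {
    let (h, n) = (haystack.len(), needle.len());
    if n == 0 {
        return true;
    }
    if n > h {
        return false;
    }
    'outer: for i in 0..=h - n {
        for j in 0..n {
            if haystack[i + j] != needle[j] {
                continue 'outer; // mismatch: slide the window forward
            }
        }
        return true; // every component of this window matched
    }
    false
}
```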