Optimize string searching using two way search (WIP) #14135
Conversation
huonw reviewed May 12, 2014
    impl<'a> MatchIndices<'a> {
        // This is split out into a separate function so that it will be duplicated,
        // allowing there to be fewer branches in the loop.
        #[inline(always)]
brson (Contributor), May 12, 2014
We also strongly discourage #[inline(always)] because it is easy to get wrong and make code much worse. This should be changed to #[inline] at the least.
lifthrasiir (Contributor), May 13, 2014
I guess the force-inlining attribute is for making two copies of next_inner specialized for longPeriod (otherwise a hot loop will continuously test longPeriod). How about making an explicit macro to produce two copies and removing the attribute?
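A minimal sketch of what the macro-based duplication suggested here might look like. All names (`Searcher`, `next_inner_long`, `next_inner_short`, the field names) are illustrative rather than taken from the PR, and the loop body is a placeholder; the point is only that the flag becomes a compile-time constant inside each generated copy.

```rust
// Hypothetical sketch: generate two copies of the function, one per value
// of the long-period flag, so the hot loop never re-tests it at runtime.
struct Searcher {
    crit_pos: usize,
    memory: usize,
}

macro_rules! make_next_inner {
    ($name:ident, $long_period:expr) => {
        impl Searcher {
            fn $name(&mut self, needle: &[u8]) -> usize {
                // `$long_period` expands to a literal `true`/`false`, so
                // this branch folds away at compile time in each copy.
                let start = if $long_period {
                    self.crit_pos
                } else {
                    self.crit_pos.max(self.memory)
                };
                // ... the real matching loop would go here; we return a
                // placeholder value derived from `start` ...
                start + needle.len()
            }
        }
    };
}

make_next_inner!(next_inner_long, true);
make_next_inner!(next_inner_short, false);

fn main() {
    let mut s = Searcher { crit_pos: 2, memory: 5 };
    assert_eq!(s.next_inner_long(b"abc"), 5); // start = crit_pos = 2
    assert_eq!(s.next_inner_short(b"abc"), 8); // start = max(2, 5) = 5
}
```

The trade-off is the same code duplication that `#[inline(always)]` causes, just made explicit in the source.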
huonw (Member), May 13, 2014
Both approaches are essentially equivalent, with the same problem (i.e. the problem with inline(always) is the code bloat it causes, a problem with a macro too, not something specific to the act of inlining).
gereeter (Author, Contributor), May 14, 2014
@lifthrasiir is correct - I was intending to specialize next_inner on longPeriod. I haven't gotten around to benchmarking the difference yet, but I assumed it was worthwhile given that glibc manually inlines and specializes both this and maximal_suffix (for which I'm using a similar trick). Regardless, I downgraded these to inline from inline(always) while refactoring the code, and it didn't seem to affect performance much - I think that it is inlining anyway, as it can easily see that both functions are only called twice.
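For readers unfamiliar with the maximal_suffix routine mentioned here: below is a sketch of the maximal-suffix computation at the heart of two-way search, following the published Crochemore-Perrin algorithm (an illustration of the technique under that assumption, not this PR's exact code). It returns the start index and period of the maximal suffix of `arr` under one of two byte orderings; two-way search runs it under both orderings and takes the larger index as the critical position.

```rust
// Crochemore-Perrin maximal-suffix computation (sketch). Returns
// (index, period) of the maximal suffix of `arr` under the ordering
// selected by `order_greater`.
fn maximal_suffix(arr: &[u8], order_greater: bool) -> (usize, usize) {
    let mut left = 0; // start of the current candidate suffix
    let mut right = 1; // start of the suffix being compared against it
    let mut offset = 0; // how far into the current period we have matched
    let mut period = 1; // period of the candidate suffix so far

    while let Some(&a) = arr.get(right + offset) {
        // `left + offset` is in bounds whenever `right + offset` is.
        let b = arr[left + offset];
        if (a < b && !order_greater) || (a > b && order_greater) {
            // Candidate suffix wins; period grows to the whole prefix.
            right += offset + 1;
            offset = 0;
            period = right - left;
        } else if a == b {
            // Advance through a repetition of the current period.
            if offset + 1 == period {
                right += offset + 1;
                offset = 0;
            } else {
                offset += 1;
            }
        } else {
            // Challenger wins; restart the candidate from here.
            left = right;
            right += 1;
            offset = 0;
            period = 1;
        }
    }
    (left, period)
}

fn main() {
    // For "aab": under the "greater" ordering the maximal suffix is the
    // whole string (index 0, period 3); under the other ordering it is
    // the suffix "b" (index 2, period 1).
    assert_eq!(maximal_suffix(b"aab", true), (0, 3));
    assert_eq!(maximal_suffix(b"aab", false), (2, 1));
}
```

Like next_inner, this function is called exactly twice (once per ordering), which is consistent with the observation that the compiler inlines it even without inline(always).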
huonw reviewed May 12, 2014
    // See if the right part of the needle matches
    let start = if longPeriod { self.critPos } else { cmp::max(self.critPos, self.memory) };
    for i in range(start, needle.len()) {
huonw (Member), May 12, 2014
Possibly faster by avoiding (some) bounds checks: for (i, needle_byte) in needle.iter().enumerate(). Or maybe even

    let iter = needle.iter().zip(haystack.slice_from(self.position).iter());
    for (i, (needle_byte, haystack_byte)) in iter.enumerate() { ... }
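In today's Rust syntax, the zip-based comparison suggested above might look like this sketch. `right_matches` is an illustrative helper, not a function from the PR, and it assumes the haystack tail is at least as long as the needle (since `zip` silently stops at the shorter iterator).

```rust
// Zipping the needle and the shifted haystack lets the optimizer elide
// the per-index bounds checks that `needle[i]` / `haystack[position + i]`
// indexing would require.
fn right_matches(needle: &[u8], haystack: &[u8], position: usize, start: usize) -> bool {
    needle
        .iter()
        .zip(&haystack[position..]) // pairs needle[i] with haystack[position + i]
        .skip(start)                // begin at the critical position
        .all(|(n, h)| n == h)
}

fn main() {
    let haystack: &[u8] = b"xxabcdyy";
    // Aligning the needle at position 2 and comparing from index 1 on:
    assert!(right_matches(b"abcd", haystack, 2, 1));
    assert!(!right_matches(b"abzd", haystack, 2, 1)); // 'z' mismatches 'c'
}
```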
This is cool! How fast is it on the Pride & Prejudice benchmark I used in #14107?
C:
Rust:
This code is far better than before, but it still needs work.
alexcrichton added a commit to alexcrichton/rust that referenced this pull request, May 14, 2014
alexcrichton added a commit to alexcrichton/rust that referenced this pull request, May 15, 2014
brson commented on 39cb5b1, May 16, 2014
r+
saw approval from brson
merging gereeter/rust/two-way-search = 39cb5b1 into auto
fast-forwarding master to auto = cea4803
gereeter commented May 12, 2014
This changes the previously naive string searching algorithm to a two-way search like glibc's, which should be faster on average while still maintaining worst-case linear time complexity. This fixes #14107. Note that I don't think this should be merged yet, as this is the only approach to speeding up search that I've tried; it's worth considering options like Boyer-Moore or adding a bad character shift table to this. However, the benchmarks look quite good so far:
Except for the case specifically designed to be optimal for the naive algorithm (bench_contains_equal), this gets performance as good as or better than the previous code.