Optimize string searching using two way search (WIP) #14135

Merged
merged 2 commits into from May 16, 2014

5 participants
@gereeter
Contributor

gereeter commented May 12, 2014

This changes the previously naive string searching algorithm to a two-way search like glibc, which should be faster on average while still maintaining worst case linear time complexity. This fixes #14107. Note that I don't think this should be merged yet, as this is the only approach to speeding up search I've tried - it's worth considering options like Boyer-Moore or adding a bad character shift table to this. However, the benchmarks look quite good so far:

test str::bench::bench_contains_bad_naive                   ... bench:       290 ns/iter (+/- 12)     from 1309 ns/iter (+/- 36)
test str::bench::bench_contains_equal                       ... bench:       479 ns/iter (+/- 10)     from  137 ns/iter (+/- 2)
test str::bench::bench_contains_short_long                  ... bench:      2844 ns/iter (+/- 105)    from 5473 ns/iter (+/- 14)
test str::bench::bench_contains_short_short                 ... bench:        55 ns/iter (+/- 4)      from   57 ns/iter (+/- 6)

Except for the case specifically designed to be optimal for the naive algorithm (bench_contains_equal), this performs as well as or better than the previous code.
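For readers unfamiliar with why a "bad naive" case exists at all, here is a minimal sketch of a naive byte search together with a comparison counter. The benchmark inputs are not shown in this thread, so the repeated-`a` haystack used in the usage note is an assumption about the kind of input such a benchmark would exercise, and `naive_find` and `naive_comparisons` are hypothetical names, not the functions in this PR. It is written in current Rust syntax rather than the 2014 dialect of the diff.

```rust
/// Naive substring search over bytes: try every start position in turn.
/// Worst case is O(haystack.len() * needle.len()) byte comparisons.
fn naive_find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0);
    }
    if needle.len() > haystack.len() {
        return None;
    }
    (0..=haystack.len() - needle.len())
        .find(|&start| &haystack[start..start + needle.len()] == needle)
}

/// Counts the byte comparisons a naive search performs up to the first
/// match (or the end of the haystack), to make the quadratic case visible.
fn naive_comparisons(haystack: &[u8], needle: &[u8]) -> usize {
    let mut count = 0;
    if needle.is_empty() || needle.len() > haystack.len() {
        return count;
    }
    for start in 0..=haystack.len() - needle.len() {
        let mut matched = true;
        for (h, n) in haystack[start..].iter().zip(needle) {
            count += 1;
            if h != n {
                matched = false;
                break;
            }
        }
        if matched {
            break;
        }
    }
    count
}
```

With a haystack of twenty `a` bytes and the needle `aaaab`, every one of the 16 candidate positions matches four bytes before failing on the fifth, so `naive_comparisons` reports 80 comparisons. A two-way search is meant to avoid that repeated re-scanning while keeping worst-case linear time.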

impl<'a> MatchIndices<'a> {
// This is split out into a separate function so that it will be duplicated,
// allowing there to be fewer branches in the loop.
#[inline(always)]

@huonw

huonw May 12, 2014

Member

This is quite a large function, does inlining actually make it faster?

@brson

brson May 12, 2014

Contributor

We also strongly discourage #[inline(always)] because it is easy to get wrong and make code much worse. This should be changed to #[inline] at the least.

@brson

brson May 12, 2014

Contributor

But also it should not be inline without evidence.

@lifthrasiir

lifthrasiir May 13, 2014

Contributor

I guess the force-inlining attribute is for making two copies of next_inner specialized for longPeriod (otherwise a hot loop will continuously test longPeriod). How about making an explicit macro to produce two copies and removing the attribute?

@huonw

huonw May 13, 2014

Member

Both approaches are essentially equivalent, with equal problems: the problem with inline(always) is the code bloat it causes (a problem with a macro too), not something specific to the act of inlining.

@gereeter

gereeter May 14, 2014

Author Contributor

@lifthrasiir is correct - I was intending to specialize next_inner on longPeriod. I haven't gotten around to benchmarking the difference yet, but I assumed it was worthwhile given that glibc manually inlines and specializes both this and maximal_suffix (for which I'm using a similar trick). Regardless, I downgraded these to inline from inline(always) while refactoring the code, and it didn't seem to affect performance much - I think that it is inlining anyway, as it can easily see that both functions are only called twice.
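The specialization trick discussed in this thread can be sketched roughly as follows. This is not the PR's `next_inner`; the loop body is a placeholder that only reuses the `start` selection from the diff, and `sum_starts` and `sum_starts_inner` are hypothetical names. The idea is to branch on the flag once, outside the loop, and pass a literal at each call site so that an inlined copy can have the per-iteration test folded away. Whether the optimizer actually does this is not guaranteed, which is why benchmarking was asked for above.

```rust
use std::cmp;

/// Placeholder for a hot loop that tests `long_period` on every iteration.
/// `#[inline]` is only a hint; the compiler may or may not inline it.
#[inline]
fn sum_starts_inner(crit_pos: usize, memories: &[usize], long_period: bool) -> usize {
    let mut total = 0;
    for &memory in memories {
        // Same selection as the `let start = ...` line in the diff.
        let start = if long_period {
            crit_pos
        } else {
            cmp::max(crit_pos, memory)
        };
        total += start;
    }
    total
}

/// Branch once, then call the inner function with a constant flag.
/// After inlining, each call site can become a copy of the loop in which
/// `long_period` is known at compile time.
fn sum_starts(crit_pos: usize, memories: &[usize], long_period: bool) -> usize {
    if long_period {
        sum_starts_inner(crit_pos, memories, true)
    } else {
        sum_starts_inner(crit_pos, memories, false)
    }
}
```

For example, `sum_starts(3, &[1, 5, 2], true)` returns 9 and `sum_starts(3, &[1, 5, 2], false)` returns 11. A macro that stamps out two copies would give the same two loops, with the same code-size cost.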


// See if the right part of the needle matches
let start = if longPeriod { self.critPos } else { cmp::max(self.critPos, self.memory) };
for i in range(start, needle.len()) {

@huonw

huonw May 12, 2014

Member

Possibly faster by avoiding (some) bounds checks: for (i, needle_byte) in needle.iter().enumerate(). Or maybe even

let iter = needle.iter().zip(haystack.slice_from(self.position).iter());

for (i, (needle_byte,haystack_byte)) in iter.enumerate() { ... }
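A rough sketch of the two loop styles being compared, in current Rust syntax (`haystack[position..]` in place of `slice_from`). The function names are hypothetical, and both versions assume the caller guarantees `start <= needle.len()` and `position + needle.len() <= haystack.len()`. That second assumption matters for the zipped form, because `zip` silently stops at the shorter side instead of panicking. Whether the iterator form actually removes bounds checks depends on the optimizer, so it should be measured rather than assumed.

```rust
/// Index-based loop, as in the diff: each `needle[i]` and
/// `haystack[position + i]` access may carry its own bounds check.
/// Returns the needle index of the first mismatch, or None if all match.
fn first_mismatch_indexed(
    haystack: &[u8],
    needle: &[u8],
    position: usize,
    start: usize,
) -> Option<usize> {
    debug_assert!(position + needle.len() <= haystack.len());
    for i in start..needle.len() {
        if needle[i] != haystack[position + i] {
            return Some(i);
        }
    }
    None
}

/// Iterator-based variant: slice once up front, then walk both slices
/// in lockstep so no per-element indexing is needed.
fn first_mismatch_zipped(
    haystack: &[u8],
    needle: &[u8],
    position: usize,
    start: usize,
) -> Option<usize> {
    debug_assert!(position + needle.len() <= haystack.len());
    let window = &haystack[position + start..];
    needle[start..]
        .iter()
        .zip(window)
        .position(|(n, h)| n != h)
        .map(|offset| start + offset)
}
```

Because the loop in the diff begins at `start` rather than 0, the zipped form slices both sides by `start` and adds it back to the reported offset.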

@huonw
Member

huonw commented May 12, 2014

This is cool!

How fast is it on the Pride & Prejudice benchmark I used in #14107?

@gereeter

Contributor Author

gereeter commented May 12, 2014

C:

real    0m0.093s
user    0m0.081s
sys     0m0.012s

Rust:

real    0m0.241s
user    0m0.241s
sys     0m0.000s

This code is far better than before, but it still needs work.

Jonathan S
Added substring searching benchmarks.
test str::bench::bench_contains_bad_naive                   ... bench:      1309 ns/iter (+/- 36)
test str::bench::bench_contains_equal                       ... bench:       137 ns/iter (+/- 2)
test str::bench::bench_contains_short_long                  ... bench:      5473 ns/iter (+/- 14)
test str::bench::bench_contains_short_short                 ... bench:        57 ns/iter (+/- 6)

alexcrichton added a commit to alexcrichton/rust that referenced this pull request May 14, 2014

Test fixes from rollup
Closes rust-lang#14210 (Make Vec.truncate() resilient against failure in Drop)
Closes rust-lang#14206 (Register new snapshots)
Closes rust-lang#14205 (use sched_yield on linux and freebsd)
Closes rust-lang#14204 (Add a crate for missing stubs from libcore)
Closes rust-lang#14203 (shootout-mandelbrot: Either 10-20% or 80-100% improvement.)
Closes rust-lang#14201 (Render not_found with an absolute path to the rust stylesheet)
Closes rust-lang#14198 (update valgrind headers)
Closes rust-lang#14174 (Optimize common path of Once::doit)
Closes rust-lang#14162 (Print 'rustc' and 'rustdoc' as the command name for --version)
Closes rust-lang#14145 (Better strict version hash (SVH) computation)
Closes rust-lang#14135 (Optimize string searching using two way search (WIP))
Closes rust-lang#14133 (define Eq,TotalEq,Ord,TotalOrd for &mut T)
Closes rust-lang#14121 (Make some NullablePointer enums FFI-compatible with the base pointer type.)

alexcrichton added a commit to alexcrichton/rust that referenced this pull request May 15, 2014

Test fixes from rollup

Jonathan S
Switched to the two-way algorithm for string searching
test str::bench::bench_contains_bad_naive                   ... bench:       300 ns/iter (+/- 12)     from 1309 ns/iter (+/- 36)
test str::bench::bench_contains_equal                       ... bench:       154 ns/iter (+/- 7)      from  137 ns/iter (+/- 2)
test str::bench::bench_contains_short_long                  ... bench:      2998 ns/iter (+/- 74)     from 5473 ns/iter (+/- 14)
test str::bench::bench_contains_short_short                 ... bench:        65 ns/iter (+/- 2)      from   57 ns/iter (+/- 6)

@brson

brson commented on 39cb5b1 May 16, 2014

r+

@bors

Contributor

bors commented on 39cb5b1 May 16, 2014

saw approval from brson
at gereeter@39cb5b1

Contributor

bors replied May 16, 2014

merging gereeter/rust/two-way-search = 39cb5b1 into auto

Contributor

bors replied May 16, 2014

gereeter/rust/two-way-search = 39cb5b1 merged ok, testing candidate = cea4803

Contributor

bors replied May 16, 2014

fast-forwarding master to auto = cea4803

bors added a commit that referenced this pull request May 16, 2014

auto merge of #14135 : gereeter/rust/two-way-search, r=brson

@bors bors closed this May 16, 2014

@bors bors merged commit 39cb5b1 into rust-lang:master May 16, 2014

2 checks passed

continuous-integration/travis-ci The Travis CI build passed
default all tests passed

@gereeter gereeter deleted the gereeter:two-way-search branch Dec 17, 2015
