Use memchr for str::find(char) #46735

Manishearth · 2017-12-14T21:57:38Z

This is a 10x improvement for searching for characters.

This also contains the patches from #46713. Feel free to land both separately or together.

Manishearth · 2017-12-14T21:59:46Z

I haven't really tested this much; there are probably failures. I'll do a second pass of self-review once I know we pass all tests (from Travis).

Manishearth · 2017-12-14T22:02:50Z

The memchr crate is even faster because it links to glibc's memchr, which uses SIMD and other fancy stuff. libcore can't link to this so to get these wins we'll have to do a SIMD impl ourselves.

#[bench]
fn find_char(b: &mut Bencher) {
    let x = test::black_box("Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.");
    b.iter(|| test::black_box(x.find('/')));
}


#[bench]
fn find_char_memchr(b: &mut Bencher) {
    let x = test::black_box("Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.");
    b.iter(|| test::black_box(memchr::memchr(b'/', x.as_bytes())));
}

Before:

running 2 tests
test find_char        ... bench:         593 ns/iter (+/- 201)
test find_char_memchr ... bench:           9 ns/iter (+/- 1)

After:

running 2 tests
test find_char        ... bench:          57 ns/iter (+/- 12)
test find_char_memchr ... bench:           9 ns/iter (+/- 1)
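
For context, one common portable way to write a memchr without SIMD is to test eight bytes at a time with plain integer arithmetic. The sketch below illustrates that general trick only; it is not taken from this PR, and the names `memchr_swar` and `contains_zero_byte` are made up for the example.

```rust
const LO: u64 = 0x0101_0101_0101_0101;
const HI: u64 = 0x8080_8080_8080_8080;

/// True if any of the eight bytes in `word` is zero.
fn contains_zero_byte(word: u64) -> bool {
    (word.wrapping_sub(LO) & !word & HI) != 0
}

/// Return the index of the first byte equal to `needle`, if any.
fn memchr_swar(needle: u8, haystack: &[u8]) -> Option<usize> {
    // `needle` repeated in every byte of a 64-bit word.
    let pattern = LO * needle as u64;
    let mut offset = 0;
    for chunk in haystack.chunks_exact(8) {
        let mut bytes = [0u8; 8];
        bytes.copy_from_slice(chunk);
        let word = u64::from_ne_bytes(bytes);
        // XOR zeroes exactly the bytes that equal `needle`.
        if contains_zero_byte(word ^ pattern) {
            break;
        }
        offset += 8;
    }
    // Scan the chunk that contains a match (or the short tail) byte by byte.
    haystack[offset..]
        .iter()
        .position(|&b| b == needle)
        .map(|i| offset + i)
}
```

The fast path skips eight bytes per iteration as long as no byte matches, which is where the win over a byte-by-byte loop comes from.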

Manishearth · 2017-12-14T22:12:43Z

This does not bring improvements for multibyte chars or for str::find(str). We can bring improvements for these, but it's tricky.

For a str that starts with an ASCII char we can do something similar to what's done here (and then use the original algorithm to finish the match).

When the thing we're searching for is not ASCII we can still search for the first byte. However, for most UTF-8 text the first byte will be fairly uniform: Arabic text will usually use 0xD8 or 0xD9, Korean will use 0xEA, 0xEB, 0xEC, or 0xED, Devanagari usually uses 0xE0, etc. This means that memchr will have lots of false positives; we'll get lots of hits on the first byte and then have to check the second byte. This amount of stutter will probably make memchr's (minor) fixed overhead significant, and destroy any perf gains we might get.
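
The uniformity of leading bytes is easy to check. A small sketch (the helper name `leading_bytes` and the sample strings are illustrative, not from the PR):

```rust
/// Collect the distinct leading (first) UTF-8 bytes used by the
/// multibyte characters of `s`, in ascending order.
fn leading_bytes(s: &str) -> Vec<u8> {
    let mut seen: Vec<u8> = Vec::new();
    for c in s.chars().filter(|c| c.len_utf8() > 1) {
        let mut buf = [0u8; 4];
        let first = c.encode_utf8(&mut buf).as_bytes()[0];
        if !seen.contains(&first) {
            seen.push(first);
        }
    }
    seen.sort();
    seen
}
```

For the Devanagari sample "जलद कोल्हा" this yields only 0xE0, and for the Arabic sample "مرحبا" only 0xD8 and 0xD9, so a memchr on the first byte would stop at nearly every character.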

Searching for the second byte or even better, the last byte, might work better. But I'm not sure if I want to write that code right now, and the tradeoffs are a bit trickier there :)
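
A minimal sketch of the last-byte idea, assuming a plain `Iterator::position` scan as a stand-in for memchr. The function name is hypothetical and this is not the code that landed in libcore:

```rust
/// Find `needle` in `haystack` by scanning for the LAST byte of its UTF-8
/// encoding and then verifying the bytes that precede each hit.
fn find_char_by_last_byte(haystack: &str, needle: char) -> Option<usize> {
    let mut buf = [0u8; 4];
    let encoded = needle.encode_utf8(&mut buf).as_bytes();
    let len = encoded.len();
    let last = encoded[len - 1];
    let bytes = haystack.as_bytes();

    let mut from = 0;
    while from < bytes.len() {
        // Stand-in for memchr(last, &bytes[from..]).
        let hit = from + bytes[from..].iter().position(|&b| b == last)?;
        let end = hit + 1;
        // Candidate: do the `len` bytes ending at `hit` spell the needle?
        if end >= len && &bytes[end - len..end] == encoded {
            return Some(end - len);
        }
        // False positive: resume just past this byte.
        from = end;
    }
    None
}
```

Because the haystack is valid UTF-8, a full byte-for-byte match of the encoded needle can only occur at a character boundary, so the returned index should agree with `str::find`.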

Manishearth · 2017-12-18T12:10:47Z

Bench numbers do not materially change with the UTF-8 changes. I did come up with a pathological case: searching a Devanagari string for ä (whose last byte is found often in Devanagari text). It ends up more than 2x slower (620 ns vs. 1,672 ns below) because every other character is a false positive hit, entirely negating memchr's win.

I think this pathological case is OK; it will only arise when mixing languages, and only for very specific characters.

I can check some form of these benchmarks into tree if y'all feel it necessary.

$ cargo bench
test find_char                            ... bench:         603 ns/iter (+/- 203)
test find_char_memchr                     ... bench:          10 ns/iter (+/- 3)
test find_multibyte_char_found            ... bench:         376 ns/iter (+/- 67)
test find_multibyte_char_notfound         ... bench:         618 ns/iter (+/- 129)
test find_multibyte_string_multibyte_char ... bench:         719 ns/iter (+/- 137)
test find_multibyte_string_pathological   ... bench:         620 ns/iter (+/- 98)

$ cargo +x-stage2 bench
test find_char                            ... bench:          67 ns/iter (+/- 45)
test find_char_memchr                     ... bench:          10 ns/iter (+/- 1)
test find_multibyte_char_found            ... bench:          50 ns/iter (+/- 12)
test find_multibyte_char_notfound         ... bench:          74 ns/iter (+/- 20)
test find_multibyte_string_multibyte_char ... bench:          74 ns/iter (+/- 20)
test find_multibyte_string_pathological   ... bench:       1,672 ns/iter (+/- 348)

Code:

#[bench]
fn find_char(b: &mut Bencher) {
    let x = test::black_box("Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.");
    b.iter(|| test::black_box(x.find('/')));
}

#[bench]
fn find_char_memchr(b: &mut Bencher) {
    let x = test::black_box("Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.");
    b.iter(|| test::black_box(memchr::memchr(b'/', x.as_bytes())));
}

#[bench]
fn find_multibyte_char_found(b: &mut Bencher) {
    let x = test::black_box("Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, ก remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.");
    b.iter(|| test::black_box(x.find('ก')));
}

#[bench]
fn find_multibyte_char_notfound(b: &mut Bencher) {
    let x = test::black_box("Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.");
    b.iter(|| test::black_box(x.find('ก')));
}

#[bench]
fn find_multibyte_string_multibyte_char(b: &mut Bencher) {
    let x = test::black_box("जलद कोल्हा आळशी कुत्रा वरुन उडी मारली, जलद कोल्हा आळशी कुत्रा वरुन उडी मारली, जलद कोल्हा आळशी कुत्रा वरुन उडी मारली, जलद कोल्हा आळशी कुत्रा वरुन उडी मारली, जलद कोल्हा आळशी कुत्रा वरुन उडी मारली, जलद कोल्हा आळशी कुत्रा वरुन उडी मारली, जलद कोल्हा आळशी कुत्रा वरुन उडी मारली");
    b.iter(|| test::black_box(x.find('ग'))); // not in the string
}

#[bench]
fn find_multibyte_string_pathological(b: &mut Bencher) {
    let x = test::black_box("जलद कोल्हा आळशी कुत्रा वरुन उडी मारली, जलद कोल्हा आळशी कुत्रा वरुन उडी मारली, जलद कोल्हा आळशी कुत्रा वरुन उडी मारली, जलद कोल्हा आळशी कुत्रा वरुन उडी मारली, जलद कोल्हा आळशी कुत्रा वरुन उडी मारली, जलद कोल्हा आळशी कुत्रा वरुन उडी मारली, जलद कोल्हा आळशी कुत्रा वरुन उडी मारली");
    b.iter(|| test::black_box(x.find('ä'))); // ä's last byte is found often in Devanagari text
}
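
To see why the pathological benchmark stutters, it helps to count how many haystack bytes equal the needle's last byte. A small sketch (the helper name `last_byte_candidates` is illustrative only):

```rust
/// Count how many bytes of `haystack` equal the last UTF-8 byte of `needle`.
/// Each such byte is a candidate that a last-byte search has to verify.
fn last_byte_candidates(haystack: &str, needle: char) -> usize {
    let mut buf = [0u8; 4];
    let encoded = needle.encode_utf8(&mut buf).as_bytes();
    let last = encoded[encoded.len() - 1];
    haystack.bytes().filter(|&b| b == last).count()
}
```

'ä' encodes as 0xC3 0xA4, and each character of "जलद" encodes as 0xE0 0xA4 followed by one more byte, so searching that word for 'ä' produces one candidate per character even though the character never occurs. Searching it for 'ग' (last byte 0x97) produces none.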

Manishearth · 2017-12-18T12:15:34Z

If we really care about the pathological case it can be avoided by having some check in the loop that after X false positives falls back to regular "loop on next" behavior.

I don't think we should, though.

We could also write some monster SSE-enabled memchr that can search for up to 4 byte units. I'm not doing that.
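
For illustration, the false-positive fallback described above could look roughly like this. It is a sketch under assumptions: the threshold value and all names are hypothetical, `Iterator::position` stands in for memchr, and nothing like this was added in the PR.

```rust
/// Hypothetical threshold; the thread only says "after X false positives".
const MAX_FALSE_POSITIVES: usize = 8;

fn find_char_with_fallback(haystack: &str, needle: char) -> Option<usize> {
    let mut buf = [0u8; 4];
    let encoded = needle.encode_utf8(&mut buf).as_bytes();
    let len = encoded.len();
    let last = encoded[len - 1];
    let bytes = haystack.as_bytes();

    let mut from = 0;
    let mut false_positives = 0;
    while from < bytes.len() {
        if false_positives >= MAX_FALSE_POSITIVES {
            // Too much stutter: finish with a plain char-by-char scan.
            // Round DOWN to a char boundary so that a needle straddling
            // `from` is not skipped; everything before it was already checked.
            let mut start = from;
            while !haystack.is_char_boundary(start) {
                start -= 1;
            }
            return haystack[start..]
                .char_indices()
                .find(|&(_, c)| c == needle)
                .map(|(i, _)| start + i);
        }
        // Stand-in for memchr(last, &bytes[from..]).
        let hit = from + bytes[from..].iter().position(|&b| b == last)?;
        let end = hit + 1;
        if end >= len && &bytes[end - len..end] == encoded {
            return Some(end - len);
        }
        false_positives += 1;
        from = end;
    }
    None
}
```

The extra counter and branch sit on the false-positive path only, but they do add code to the loop, which is part of the tradeoff being weighed here.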

Manishearth · 2017-12-21T00:19:23Z

@bors-servo try

bors · 2017-12-21T00:19:40Z

⌛ Trying commit 9b92a44 with merge afb0c20...

Use memchr for str::find(char)

This is a 10x improvement for searching for characters.

This also contains the patches from #46713. Feel free to land both separately or together.

cc @mystor @alexcrichton

r? @bluss

fixes #46693

bors · 2017-12-21T01:49:05Z

☀️ Test successful - status-travis
State: approved= try=True

Manishearth · 2017-12-21T01:52:01Z

@rust-lang/infra could I get a perf.rlo diff result from this try push?

nagisa · 2018-01-01T13:12:21Z

cc @Mark-Simulacrum ^

BurntSushi · 2018-01-01T14:09:46Z

> When the thing we're searching for is not ASCII we can still search for the first byte. However, for most UTF-8 text the first byte will be fairly uniform: Arabic text will usually use 0xD8 or 0xD9, Korean will use 0xEA, 0xEB, 0xEC, or 0xED, Devanagari usually uses 0xE0, etc. This means that memchr will have lots of false positives; we'll get lots of hits on the first byte and then have to check the second byte. This amount of stutter will probably make memchr's (minor) fixed overhead significant, and destroy any perf gains we might get.

> Searching for the second byte or even better, the last byte, might work better. But I'm not sure if I want to write that code right now, and the tradeoffs are a bit trickier there :)

Searching for the last byte is indeed a better heuristic on UTF-8 than searching for the first byte. You'd be in good company (GNU grep does that). But the last byte is still arbitrary. This is why the regex crate ranks every byte in order of what it believes is rare. Leading UTF-8 bytes are considered common while trailing bytes aren't. But you also get things like "z is rarer than a," which it commonly is. So the memchr is applied to the rarest byte in the pattern. Of course, you still wind up with pathological cases when the frequency rank doesn't match the corpus, but this will always be true when using memchr without analyzing the haystack beforehand (which obviously doesn't make sense in this specific domain of text search). That code is here: https://github.com/rust-lang/regex/blob/9c790659c4e83e3497c6f2d14a818b3a69654d5f/src/literals.rs#L379-L514

(To be clear, I think the frequency rank stuff is probably overkill for searching a single char and would probably just stick to the last byte. Different story if you tackled str::find(str) though. Do we really not already use memchr in str::find(str) though?)
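
A toy sketch of the "search for the rarest byte" idea described above. The rank values below are invented for illustration and are NOT the regex crate's table; the function names are hypothetical and `Iterator::position` again stands in for memchr.

```rust
/// Toy frequency rank: a lower value means "assumed rarer".
/// These numbers are made up for this example.
fn toy_rank(b: u8) -> u8 {
    match b {
        0xC0..=0xFF => 250,               // UTF-8 leading bytes: assumed common
        b' ' | b'e' | b't' | b'a' => 240, // very frequent in English text
        b'z' | b'q' | b'x' | b'j' => 20,  // infrequent letters
        0x80..=0xBF => 60,                // UTF-8 continuation bytes
        _ => 128,
    }
}

/// Return (offset, byte) of the lowest-ranked byte in `pattern`.
fn rarest_byte(pattern: &[u8]) -> Option<(usize, u8)> {
    pattern
        .iter()
        .copied()
        .enumerate()
        .min_by_key(|&(_, b)| toy_rank(b))
}

/// Substring search that scans for the rarest byte, then verifies the
/// whole pattern around each hit.
fn find_bytes(haystack: &[u8], pattern: &[u8]) -> Option<usize> {
    let (offset, rare) = match rarest_byte(pattern) {
        Some(found) => found,
        None => return Some(0), // an empty pattern matches at the start
    };
    // A match starting at `s` has its rare byte at `s + offset`,
    // so hits before `offset` can be ignored.
    let mut from = offset;
    while from < haystack.len() {
        let hit = from + haystack[from..].iter().position(|&b| b == rare)?;
        let start = hit - offset;
        if haystack[start..].starts_with(pattern) {
            return Some(start);
        }
        from = hit + 1;
    }
    None
}
```

With this toy table, the pattern "amazing" is searched via its 'z', and 'ä' via its trailing byte 0xA4 rather than its leading byte 0xC3.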

BurntSushi · 2018-01-01T14:15:26Z

src/libcore/str/pattern.rs

+    #[inline]
+    fn next(&mut self) -> SearchStep {
+        let old_finger = self.finger;
+        let slice = unsafe { self.haystack.get_unchecked(old_finger..self.haystack.len()) };


BurntSushi: Do the various bounds check elisions actually help here? I've tried eliding them in my own substring search algorithms and it meets with variable success.

Manishearth: I don't think they do, but I haven't checked and it seemed pretty easy to keep that invariant. I can check if you want.

BurntSushi: My general position has been to not elide bounds checks unless I'm pretty sure that it matters. If it were me, I'd remove the unsafe. :)

Manishearth: I'll do some checking later this week, for now I'll land it.
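
For comparison, here is a checked and an unchecked version of the slicing in question. This is a sketch that assumes the haystack is a `&str` and the finger is a byte offset; the free-function names are hypothetical.

```rust
/// Safe version: the range is checked and the call panics if `finger`
/// is past the end or not on a char boundary.
fn rest_checked(haystack: &str, finger: usize) -> &str {
    &haystack[finger..]
}

/// Unchecked version, mirroring the line in the diff. The caller must
/// guarantee that `finger <= haystack.len()` and that `finger` lies on a
/// char boundary; otherwise the behavior is undefined.
fn rest_unchecked(haystack: &str, finger: usize) -> &str {
    debug_assert!(haystack.is_char_boundary(finger));
    unsafe { haystack.get_unchecked(finger..haystack.len()) }
}
```

Both return the same slice when the invariant holds; whether the unchecked form is measurably faster would have to be benchmarked, as discussed above.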

BurntSushi · 2018-01-01T14:19:07Z

This LGTM! Nice work @Manishearth :-)

Manishearth · 2018-01-01T14:30:16Z

> You'd be in good company (GNU grep does that).

yay :)

> To be clear, I think the frequency rank stuff is probably overkill for searching a single char and would probably just stick to the last byte.

phew

that sounds trickier to get right 😄

> Do we really not already use memchr in str::find(str) though?

Yeah, we use an interesting but non-memchr-based algorithm. I considered retrofitting the existing memchr'd .find(char) into .find(str), but that would mean losing the existing algorithm, which makes the wins iffier (not to mention that memchr wins very little if you're stuttering the algorithm all the time, which is far likelier with a .find(str) built on top of .find(char)).

> This LGTM! Nice work @Manishearth :-)

can this be landed r=you? I've made a small mistake which I need to rectify, aside from that it seems basically ready. Or should we wait for second review?

BurntSushi · 2018-01-01T14:50:49Z

@Manishearth Yeah r=me sounds great.

Mark-Simulacrum · 2018-01-01T16:21:22Z

Perf queued; in the future please ping me directly.

Manishearth · 2018-01-01T16:40:00Z

@bors r=burntsushi

bors · 2018-01-01T16:40:01Z

📌 Commit 5cf5516 has been approved by burntsushi

bors · 2018-01-01T19:04:40Z

⌛ Testing commit 5cf5516 with merge b65f0be...

bors · 2018-01-01T21:47:07Z

☀️ Test successful - status-appveyor, status-travis
Approved by: burntsushi
Pushing b65f0be to master...

jesseschalken · 2018-02-22T09:41:43Z

src/libcore/slice/memchr.rs

+    rep
+}
+
+/// Return the first index matching the byte `a` in `text`.


jesseschalken: a is meant to be x?

Manishearth: yeah, fixing

Manishearth: someone fixed it already

Manishearth added 7 commits December 13, 2017 01:15

Move rust memchr impl to libcore

2bf0df7

Use memchr in [u8]::contains

f8f2888

Support 16 bit platforms

1d818a4

Remove the unused ascii_only field in CharEqSearcher

4550ea7

Split out char searcher from MultiCharSearcher

72cab5e

Move CharSearcher to its own section in the file

585ad9f

Fill in forward searcher impl for char

d9dc44a

rust-highfive assigned bluss Dec 14, 2017

This was referenced Dec 14, 2017

Use memchr to speed up [u8]::contains 3x #46713

Merged

str::find(char) is slower than it ought ot be #46693

Closed

kennytm added the S-waiting-on-review and T-libs-api labels Dec 15, 2017

Fill in reverse searcher impl for char

f865164

Manishearth force-pushed the memchr-find branch from 93216f1 to f865164 Compare December 16, 2017 20:06

Add memchr search support for multibyte characters

75c07a3

Manishearth force-pushed the memchr-find branch from 876e2a1 to 75c07a3 Compare December 18, 2017 09:59

Manishearth added 2 commits December 18, 2017 03:47

Add simple test for pattern API

efcc447

Add simple search test for pattern API

bc55355

Add stresstests for shared bytes for pattern API

9b92a44

mystor mentioned this pull request Dec 18, 2017

Support meaningful spans in the stable version of proc-macro2 dtolnay/proc-macro2#36

Merged

Pass tidy for tests

85919a0

Manishearth force-pushed the memchr-find branch from b6f2d90 to 85919a0 Compare December 25, 2017 09:11

BurntSushi reviewed Jan 1, 2018

handle overflow/underflow in index offsets

5cf5516

rust-lang deleted a comment from BubbaSheen Jan 1, 2018

bors merged commit 5cf5516 into rust-lang:master Jan 1, 2018

kennytm mentioned this pull request Jan 4, 2018

Memory corruption (?) from tower-grpc-build on nightly #47175

Closed

jesseschalken reviewed Feb 22, 2018

Manishearth deleted the memchr-find branch February 22, 2018 17:16

killercup mentioned this pull request Jul 23, 2018

Change single char str patterns to chars #52646

Merged
