Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poor codegen for string search (i.e., memchr) with iterators #94573

Open
mcy opened this issue Mar 3, 2022 · 7 comments
Open

Poor codegen for string search (i.e., memchr) with iterators #94573

mcy opened this issue Mar 3, 2022 · 7 comments
Labels
C-discussion Category: Discussion or questions that doesn't represent real issues. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@mcy
Copy link
Contributor

mcy commented Mar 3, 2022

A colleague provided me with this benchmark, which compares the following equivalent C++ and Rust:

bool HasMoreThanTwoDashes(std::string_view sv) {
    return sv.find_first_of('-', sv.find_first_of('-')+1) != std::string_view::npos;
}
fn has_more_than_two_dashes(s: &[u8]) -> bool {
  sl.iter().filter(|&&c| c == b'-').count() > 2
}

C++ performs 20x better in the microbenchmark. We haven't tried very hard to figure out why, but our suspicion is that Rust's implementation of memchr is worse than the one provided by the system that ran the benchmark. C++ also bails out early, and Rust does not, but that appears not to matter because the dashes are at the far end of the strings.

We're not actually sure what CPU quick-bench ran this with, but I can run it on my (icelake, I think) Xeon later. I think the difference in memchr implementations is worth investigating regardless.

@mcy
Copy link
Contributor Author

mcy commented Mar 3, 2022

(I should mention that this used the newest stable rustc and clang as of writing: 1.59 and 13, resp.)

@gskt17
Copy link

gskt17 commented Mar 3, 2022

OK, I godbolted this and am working through the assembler code. It's definitely using SSE on -C opt-level=2 and above, though it doesn't look much like the implementation that I would expect; I would have thought it would just count the number of bytes equal to b'-', but I can't wrap my head around it at the moment. I'm certainly not seeing any subtraction or comparisons going on there.

@panstromek
Copy link
Contributor

panstromek commented Mar 3, 2022

Those functions seem different, am I missing something? C++ version returns true when there are at least 2 dashes, while Rust version returns true when there are at least 3 dashes.

@panstromek
Copy link
Contributor

Also, when first sv.find_first_of('-') returns npos (no dashes), +1 overflows and the whole string is searched again, which is interesting, but it should still return the correct result.

@SkiFire13
Copy link
Contributor

SkiFire13 commented Mar 4, 2022

Note that count will fully consume the iterator, while you probably want it to stop when the second b'-' is found. This should a tiny bit better, though it's still not as fast as it should be.

fn has_more_than_two_dashes(s: &[u8]) -> bool {
  sl.iter().filter(|&&c| c == b'-').nth(1).is_some()
}

I think C++'s find_first_of is specialized to use an optimized memchr, while the Rust version can't do that because filter takes a closure that could do anything, not just a plain equality check. This needs either a specialized method that takes an element to search for, as opposed to a closure (see also #84058 and #56345 for some past attempts), or something that acts like a plain equality closure and can be specialized upon.

Edit: not to say this is what you should do, but I find interesting that you can manually inline memchr in Rust, without using unsafe code, and slightly beat the C++ version.

@panstromek
Copy link
Contributor

There's also memchr crate, which was created to address similar performance issues.

@jyn514 jyn514 added I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Apr 9, 2023
@jyn514
Copy link
Member

jyn514 commented Apr 9, 2023

I am not convinced this should be considered a bug; like @panstromek said, if performance matters you can switch in memchr. And the Rust code is not quite equivalent to the C++ anyway, it should be something like s1[s1.iter().position('-')?..].iter().position('-').is_some().

@jyn514 jyn514 added the C-discussion Category: Discussion or questions that doesn't represent real issues. label Apr 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-discussion Category: Discussion or questions that doesn't represent real issues. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants