-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Simplify #5104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify #5104
Conversation
|
|
||
| fn find_cr(src: &[u8]) -> Option<usize> { | ||
| src.iter().enumerate().find_map(|(idx, &b)| if b == b'\r' { Some(idx) } else { None }) | ||
| src.iter().zip(src.iter().skip(1)).position(|it| it == (&b'\r', &b'\n')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the original idea here was that find_cr should auto-vectorise easily (call into memchr, really), but apparently that's not the case :-(
For max performance, we should pull memchr from crates.io here, but we don't need max performance now
bors r+
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://godbolt.org/z/2j3urJ -- using a &str gives us memchr, as that is hard-coded in the stdlib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose slice iter should override the default implementations of find/position and friends, but for some reason I (didn't check it) it most probably uses the default impl with try_fold() trickery under the hood which rustc is not powerful enough to minimize...
Btw, this function is heuristic since I suppose there might be wide unicode characters where the code point of \r is one of their byte footprint
|
utf-8 is a prefix free self-synchronising encoding, ascii character can't
be confused with a part of wide character (all bytes of wide chars have the
msb set)
…On Sun, 28 Jun 2020 at 03:37, Veetaha ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In crates/rust-analyzer/src/line_endings.rs
<#5104 (comment)>
:
> @@ -46,19 +46,7 @@ impl LineEndings {
return (src, LineEndings::Dos);
fn find_crlf(src: &[u8]) -> Option<usize> {
- let mut search_idx = 0;
- while let Some(idx) = find_cr(&src[search_idx..]) {
- if src[search_idx..].get(idx + 1) != Some(&b'\n') {
- search_idx += idx + 1;
- continue;
- }
- return Some(search_idx + idx);
- }
- None
- }
-
- fn find_cr(src: &[u8]) -> Option<usize> {
- src.iter().enumerate().find_map(|(idx, &b)| if b == b'\r' { Some(idx) } else { None })
+ src.iter().zip(src.iter().skip(1)).position(|it| it == (&b'\r', &b'\n'))
I suppose slice iter should override the default implementations of
find/position and friends, but for some reason I (didn't check it) it most
probably uses the default impl with try_fold() trickery under the hood
which rustc is not powerful enough to minimize...
Btw, this function is heuristic since I suppose there might be wide
unicode characters where the code point of \r is one of their byte
footprint
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#5104 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AANB3M46FVHECR7J4B6ONLTRY2NGTANCNFSM4OKHE37A>
.
|
No description provided.