Simplify #5104

Veetaha · 2020-06-27T22:30:58Z

No description provided.

matklad · 2020-06-28T01:04:38Z

crates/rust-analyzer/src/line_endings.rs

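```diff
@@ -46,19 +46,7 @@ impl LineEndings {
         return (src, LineEndings::Dos);
 
         fn find_crlf(src: &[u8]) -> Option<usize> {
-            let mut search_idx = 0;
-            while let Some(idx) = find_cr(&src[search_idx..]) {
-                if src[search_idx..].get(idx + 1) != Some(&b'\n') {
-                    search_idx += idx + 1;
-                    continue;
-                }
-                return Some(search_idx + idx);
-            }
-            None
-        }
-
-        fn find_cr(src: &[u8]) -> Option<usize> {
-            src.iter().enumerate().find_map(|(idx, &b)| if b == b'\r' { Some(idx) } else { None })
+            src.iter().zip(src.iter().skip(1)).position(|it| it == (&b'\r', &b'\n'))
```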
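A quick way to check that the one-liner behaves like the removed loop is to run both on a few inputs. This is a sketch: the old `find_crlf`/`find_cr` pair and the new body are copied from the diff, but they are renamed `find_crlf_old`/`find_crlf_new` here purely for illustration, and the test inputs are made up.

```rust
// Old approach (from the removed lines of the diff): find the next '\r'
// with `find_cr`, then check whether a '\n' follows it.
fn find_crlf_old(src: &[u8]) -> Option<usize> {
    let mut search_idx = 0;
    while let Some(idx) = find_cr(&src[search_idx..]) {
        if src[search_idx..].get(idx + 1) != Some(&b'\n') {
            // Lone '\r': resume the search just past it.
            search_idx += idx + 1;
            continue;
        }
        return Some(search_idx + idx);
    }
    None
}

fn find_cr(src: &[u8]) -> Option<usize> {
    src.iter().enumerate().find_map(|(idx, &b)| if b == b'\r' { Some(idx) } else { None })
}

// New approach from the PR: walk adjacent byte pairs and return the index
// of the first ('\r', '\n') pair.
fn find_crlf_new(src: &[u8]) -> Option<usize> {
    src.iter().zip(src.iter().skip(1)).position(|it| it == (&b'\r', &b'\n'))
}

fn main() {
    // Hypothetical inputs covering empty input, a lone '\r', no line
    // endings, Unix endings, a lone '\r' before a CRLF, and "\r\r\n".
    let cases: [&[u8]; 6] = [
        b"",
        b"\r",
        b"no line endings",
        b"unix\nonly\n",
        b"lone\rcr then\r\ncrlf",
        b"\r\r\n",
    ];
    for &src in cases.iter() {
        assert_eq!(find_crlf_old(src), find_crlf_new(src));
    }
    println!("old and new find_crlf agree on all cases");
}
```

The zip-based version trades the explicit restart logic for a pairwise scan, which is why it reads so much shorter; the discussion below is about whether either form gets compiled down to memchr.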


I think the original idea here was that find_cr should auto-vectorise easily (call into memchr, really), but apparently that's not the case :-(

https://godbolt.org/z/JH8yaK

For maximum performance we should pull in memchr from crates.io here, but we don't need maximum performance right now.

bors r+

https://godbolt.org/z/2j3urJ: using a &str gives us memchr, as that is hard-coded in the stdlib.
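A sketch of what the &str route could look like, assuming the input is already valid UTF-8 (in rust-analyzer it comes from a `String`). The function name `find_crlf` here is hypothetical. It locates '\r' with `str::find` and a `char` pattern, which the standard library implements with a memchr-style byte search; that is an implementation detail of std, so treat the codegen claim as something to re-check on godbolt rather than a guarantee.

```rust
// Find the first "\r\n" in a string slice. Locating '\r' goes through
// `str::find` with a `char` pattern (memchr-backed in current std); the
// following byte is then checked by hand.
fn find_crlf(src: &str) -> Option<usize> {
    let bytes = src.as_bytes();
    let mut search_idx = 0;
    while let Some(idx) = src[search_idx..].find('\r') {
        let cr = search_idx + idx;
        if bytes.get(cr + 1) == Some(&b'\n') {
            return Some(cr);
        }
        // Lone '\r': continue just past it. '\r' is a single byte, so
        // `cr + 1` is always a char boundary and the slice above stays valid.
        search_idx = cr + 1;
    }
    None
}

fn main() {
    let text = "first line\r\nsecond line";
    assert_eq!(find_crlf(text), Some(10));
    // Same answer as searching for the two-byte pattern directly.
    assert_eq!(find_crlf(text), text.find("\r\n"));
}
```

If the '\r' fast path doesn't matter, `src.find("\r\n")` gives the same answer in a single call, though it goes through the general substring searcher rather than the single-`char` one.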

Veetaha · 2020-06-28

I suppose the slice iterator should override the default implementations of find/position and friends, but (I didn't check) it most probably uses the default impl with try_fold() trickery under the hood, which rustc is not powerful enough to optimize away...
By the way, this function is only a heuristic: I suppose there might be wide Unicode characters whose byte encoding contains the code point of \r.
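To make the try_fold remark concrete, here is a simplified sketch of how a `position`-style method can be built on `Iterator::try_fold`, threading the running index through the fold and breaking out with `ControlFlow::Break` on the first match. The name `position_via_try_fold` is hypothetical; this illustrates the general pattern, not the standard library's actual source, and it says nothing about what rustc generates for it.

```rust
use std::ops::ControlFlow;

// Generic `position` built on `try_fold`: the accumulator is the index of
// the current element, and a match short-circuits the fold with `Break`.
fn position_via_try_fold<I, P>(mut iter: I, mut pred: P) -> Option<usize>
where
    I: Iterator,
    P: FnMut(I::Item) -> bool,
{
    let result = iter.try_fold(0usize, |idx, item| {
        if pred(item) {
            ControlFlow::Break(idx)
        } else {
            ControlFlow::Continue(idx + 1)
        }
    });
    match result {
        ControlFlow::Break(idx) => Some(idx),
        ControlFlow::Continue(_) => None,
    }
}

fn main() {
    let src = b"ab\r\ncd";
    // The same zipped pair iterator that the PR's one-liner uses.
    let pairs = src.iter().zip(src.iter().skip(1));
    let idx = position_via_try_fold(pairs, |it| it == (&b'\r', &b'\n'));
    assert_eq!(idx, Some(2));
    // Agrees with the standard library's `Iterator::position`.
    let std_idx = src.iter().zip(src.iter().skip(1)).position(|it| it == (&b'\r', &b'\n'));
    assert_eq!(idx, std_idx);
}
```

Whether the optimizer can see through the closure and the `ControlFlow` plumbing is exactly the open question in the comment; the godbolt links above are the way to check.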

bors · 2020-06-28T01:11:58Z

Build succeeded:

matklad · 2020-06-28T01:39:37Z

UTF-8 is a prefix-free, self-synchronising encoding: an ASCII character can't be confused with part of a wide character (all bytes of wide characters have the MSB set).
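This property can be checked exhaustively. The sketch below encodes every multi-byte Unicode scalar value with `char::encode_utf8` and verifies that each resulting byte has the high bit (0x80) set, so a byte-level search for b'\r' (0x0D) can only ever hit a real carriage return. The helper name `multibyte_chars_avoid_ascii` is made up for this illustration.

```rust
// Returns true if every byte of every multi-byte UTF-8 character has its
// most significant bit set, i.e. can never equal an ASCII byte such as
// b'\r' (0x0D) or b'\n' (0x0A).
fn multibyte_chars_avoid_ascii() -> bool {
    let mut buf = [0u8; 4];
    // Walk all scalar values; `char::from_u32` returns None for surrogates,
    // so `filter_map` skips them.
    (0..=char::MAX as u32)
        .filter_map(char::from_u32)
        .filter(|c| c.len_utf8() > 1)
        .all(|c| c.encode_utf8(&mut buf).bytes().all(|b| (b & 0x80) != 0))
}

fn main() {
    assert!(multibyte_chars_avoid_ascii());
    // Consequently, scanning raw bytes for b'\r' in text with wide
    // characters finds the real carriage return: "日本" is 6 bytes.
    let s = "日本\r\n語";
    let idx = s.as_bytes().iter().position(|&b| b == b'\r');
    assert_eq!(idx, Some(6));
}
```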


Simplify

39a58ed

matklad reviewed Jun 28, 2020


bors bot merged commit 0e0fb81 into rust-lang:master Jun 28, 2020

2 participants

Simplify #5104

Simplify #5104

Uh oh!

Conversation

Veetaha commented Jun 27, 2020

Uh oh!

matklad Jun 28, 2020

Choose a reason for hiding this comment

Uh oh!

matklad Jun 28, 2020

Choose a reason for hiding this comment

Uh oh!

Veetaha Jun 28, 2020

Choose a reason for hiding this comment

Uh oh!

bors bot commented Jun 28, 2020

Uh oh!

matklad commented Jun 28, 2020 via email

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants