Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

str::is_char_boundary - slight optimization #84751

Merged
merged 2 commits into from
May 15, 2021

Conversation

Soveu
Copy link
Contributor

@Soveu Soveu commented Apr 30, 2021

Current str::is_char_boundary implementation emits slightly more instructions, because it includes an additional branch for index == s.len()

pub fn is_char_boundary(s: &str, index: usize) -> bool {
    if index == 0 || index == s.len() {
        return true;
    }
    match s.as_bytes().get(index) {
        None => false,
        Some(&b) => (b as i8) >= -0x40,
    }
}

Just changing the place of index == s.len() merges it with index < s.len() from s.as_bytes().get(index)

pub fn is_char_boundary2(s: &str, index: usize) -> bool {
    if index == 0 {
        return true;
    }

    match s.as_bytes().get(index) {
        // For some reason, LLVM likes this comparison here more
        None => index == s.len(),
        // This is bit magic equivalent to: b < 128 || b >= 192
        Some(&b) => (b as i8) >= -0x40,
    }
}

This one has better codegen on every platform, except powerpc

x86 codegen

example::is_char_boundary:
        mov     al, 1
        test    rdx, rdx
        je      .LBB0_5
        cmp     rsi, rdx
        je      .LBB0_5
        cmp     rsi, rdx
        jbe     .LBB0_3
        cmp     byte ptr [rdi + rdx], -65
        setg    al
.LBB0_5:
        ret
.LBB0_3:
        xor     eax, eax
        ret

example::is_char_boundary2:
        test    rdx, rdx
        je      .LBB1_1
        cmp     rsi, rdx
        jbe     .LBB1_4
        cmp     byte ptr [rdi + rdx], -65
        setg    al
        ret
.LBB1_1:  ; technically this branch is the same as LBB1_4
        mov     al, 1
        ret
.LBB1_4:
        sete    al
        ret

aarch64 codegen

example::is_char_boundary:
        mov     x8, x0
        mov     w0, #1
        cbz     x2, .LBB0_4
        cmp     x1, x2
        b.eq    .LBB0_4
        b.ls    .LBB0_5
        ldrsb   w8, [x8, x2]
        cmn     w8, #65
        cset    w0, gt
.LBB0_4:
        ret
.LBB0_5:
        mov     w0, wzr
        ret

example::is_char_boundary2:
        cbz     x2, .LBB1_3
        cmp     x1, x2
        b.ls    .LBB1_4
        ldrsb   w8, [x0, x2]
        cmn     w8, #65
        cset    w0, gt
        ret
.LBB1_3:
        mov     w0, #1
        ret
.LBB1_4:
        cset    w0, eq
        ret

riscv64gc codegen

example::is_char_boundary:
seqz a3, a2
xor a4, a1, a2
seqz a4, a4
or a4, a4, a3
addi a3, zero, 1
bnez a4, .LBB0_3
bgeu a2, a1, .LBB0_4
add a0, a0, a2
lb a0, 0(a0)
addi a1, zero, -65
slt a3, a1, a0
.LBB0_3:
mv a0, a3
ret
.LBB0_4:
mv a0, zero
ret

example::is_char_boundary2:
beqz a2, .LBB1_3
bgeu a2, a1, .LBB1_4
add a0, a0, a2
lb a0, 0(a0)
addi a1, zero, -65
slt a0, a1, a0
ret
.LBB1_3:
addi a0, zero, 1
ret
.LBB1_4:
xor a0, a1, a2
seqz a0, a0
ret

Link to godbolt

@rustbot label: A-codegen

@rustbot rustbot added the A-codegen Area: Code generation label Apr 30, 2021
@rust-highfive
Copy link
Collaborator

r? @yaahc

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 30, 2021
@jyn514 jyn514 added I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue. and removed A-codegen Area: Code generation labels Apr 30, 2021
@jyn514
Copy link
Member

jyn514 commented Apr 30, 2021

@Soveu fyi - A-codegen is usually used for the codegen part of the compiler itself, not for changes to the standard library that affect performance.

@Soveu
Copy link
Contributor Author

Soveu commented Apr 30, 2021

btw, can I add I-slow label myself or only members can do it?

@jyn514
Copy link
Member

jyn514 commented Apr 30, 2021

You should be able to - try pinging rustbot with label: +I-slow?

@Soveu
Copy link
Contributor Author

Soveu commented Apr 30, 2021

I was thinking all the time that only members can add those fancy red I-something labels 👀
@rustbot label: +I-slow

@klensy
Copy link
Contributor

klensy commented Apr 30, 2021

Slightly less cryptic version https://godbolt.org/z/Y9o86ch6h

pub fn is_char_boundary(s: &str, index: usize) -> bool {
    match s.as_bytes().get(index) {
        None => false,
        Some(_) if index == 0 || index == s.len() => true,
        Some(&b) => (b as i8) >= -0x40,
    }
}

but this swaps order of cmp and test, better benchmark all versions to check actual difference.

@Soveu
Copy link
Contributor Author

Soveu commented Apr 30, 2021

Slightly less cryptic version https://godbolt.org/z/Y9o86ch6h

pub fn is_char_boundary(s: &str, index: usize) -> bool {
    match s.as_bytes().get(index) {
        None => false,
        Some(_) if index == 0 || index == s.len() => true,
        Some(&b) => (b as i8) >= -0x40,
    }
}

but this swaps order of cmp and test, better benchmark all versions to check actual difference.

index == s.len() will never fire if get(index) returned Some

@klensy
Copy link
Contributor

klensy commented Apr 30, 2021

index == s.len() will never fire if get(index) returned Some

Upps

pub fn is_char_boundary(s: &str, index: usize) -> bool {
    match s.as_bytes().get(index) {
        None if index == s.len() => true,
        None => false,
        Some(_) if index == 0 => true,
        Some(&b) => (b as i8) >= -0x40,
    }
}

But this looks bad.

@Soveu
Copy link
Contributor Author

Soveu commented Apr 30, 2021

I will soon add some comments

@Soveu
Copy link
Contributor Author

Soveu commented Apr 30, 2021

Plus, I want to test how it affects s.get(..i) and s.get(i..) which rely heavily on returning early if index == 0 or index == s.len() (and LLVM actually optimizing it)

@Soveu
Copy link
Contributor Author

Soveu commented Apr 30, 2021

Both on opt-level=3 and opt-level=2 there is only one bound check in get(..i) and get(i..).
On opt-level=1 it's more complicated:

  • functions aren't inlined by default, so no cross-function optimizations can occur
  • when forcing it via #[inline(always)], the new implementation loses at get(i..), because the index == s.len() is "hidden" after s.as_bytes().get(i)

@klensy
Copy link
Contributor

klensy commented May 1, 2021

pub fn is_char_boundary(s: &str, index: usize) -> bool {
    if index == 0 {
        true
    } else if index < s.len() {
        unsafe { *s.as_bytes().get_unchecked(index) as i8 >= -0x40 }
    } else if index == s.len() {
        true
    } else {
        false
    }
}

@Soveu
Copy link
Contributor Author

Soveu commented May 12, 2021

@yaahc ?

@Amanieu
Copy link
Member

Amanieu commented May 14, 2021

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 14, 2021
@bors
Copy link
Contributor

bors commented May 14, 2021

⌛ Trying commit 7bd9d9f with merge b8c3fd3abbd324a8ce3163e2d72ad2da5bb38d83...

@bors
Copy link
Contributor

bors commented May 14, 2021

☀️ Try build successful - checks-actions
Build commit: b8c3fd3abbd324a8ce3163e2d72ad2da5bb38d83 (b8c3fd3abbd324a8ce3163e2d72ad2da5bb38d83)

@rust-timer
Copy link
Collaborator

Queued b8c3fd3abbd324a8ce3163e2d72ad2da5bb38d83 with parent 69b352e, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (b8c3fd3abbd324a8ce3163e2d72ad2da5bb38d83): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 14, 2021
@Amanieu
Copy link
Member

Amanieu commented May 14, 2021

Perf changes seem to be in the noise.

@bors r+ rollup=maybe

@bors
Copy link
Contributor

bors commented May 14, 2021

📌 Commit 7bd9d9f has been approved by Amanieu

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 14, 2021
@Soveu
Copy link
Contributor Author

Soveu commented May 14, 2021

Perf changes seem to be in the noise.

I was kinda expecting it, it's not a huge thing. What about binary size?

bors added a commit to rust-lang-ci/rust that referenced this pull request May 15, 2021
…laumeGomez

Rollup of 4 pull requests

Successful merges:

 - rust-lang#84751 (str::is_char_boundary - slight optimization)
 - rust-lang#85185 (Generate not more docs than necessary)
 - rust-lang#85324 (Warn about unused `pub` fields in non-`pub` structs)
 - rust-lang#85329 (fix version_str comment)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 62b834f into rust-lang:master May 15, 2021
@rustbot rustbot added this to the 1.54.0 milestone May 15, 2021
@matthieu-m
Copy link

@klensy I'm surprised by those constructs:

        None if index == s.len() => true,
        None => false,
    if index == s.len() {
        true
    } else {
        false
    }

Naively, I would expect to just return index == s.len() rather branching on it to select true or false.

Is there any particular reason for the unusual branch?

@klensy
Copy link
Contributor

klensy commented May 21, 2021

@matthieu-m

All that if branches is straightforward walk over index, starting from 0 and up, checking current state, but don't require to think about None => index == self.len() trick. Plus, that compiles to exactly the same asm, if i remember that correctly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants