Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[rustc_span][perf] Remove unnecessary string joins and allocs. #114351

Merged
merged 1 commit into from Aug 4, 2023

Conversation

ttsugriy
Copy link
Contributor

@ttsugriy ttsugriy commented Aug 2, 2023

Comparing vectors of string parts yields the same result but avoids unnecessary join and potential allocation for resulting String. This code is cold so it's unlikely to have any measurable impact, but considering but since it's also simpler, why not? :)

Comparing vectors of string parts yields the same result but avoids
unnecessary `join` and potential allocation for resulting `String`.
This code is cold so it's unlikely to have any measurable impact, but
considering but since it's also simpler, why not? :)
@rustbot
Copy link
Collaborator

rustbot commented Aug 2, 2023

r? @fee1-dead

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Aug 2, 2023
@lqd
Copy link
Member

lqd commented Aug 2, 2023

Do you know if the call site immediately above

fn find_match_by_sorted_words(iter_names: &[Symbol], lookup: &str) -> Option<Symbol> {
iter_names.iter().fold(None, |result, candidate| {
if sort_by_words(candidate.as_str()) == sort_by_words(lookup) {
Some(*candidate)
} else {
result
}
})
}
fn sort_by_words(name: &str) -> String {
is actually calling sort_by_words once per iteration for the same lookup input or is it LICMed away ?

@ttsugriy
Copy link
Contributor Author

ttsugriy commented Aug 2, 2023

Do you know if the call site immediately above

fn find_match_by_sorted_words(iter_names: &[Symbol], lookup: &str) -> Option<Symbol> {
iter_names.iter().fold(None, |result, candidate| {
if sort_by_words(candidate.as_str()) == sort_by_words(lookup) {
Some(*candidate)
} else {
result
}
})
}
fn sort_by_words(name: &str) -> String {

is actually calling sort_by_words once per iteration for the same lookup input or is it LICMed away ?

I assumed that this fold would be desugared into a loop and compiler wouldn't have much difficulties to hoist this computation out of the loop. But apparently I was wrong according to https://godbolt.org/z/frs8Kj1rq
In particular the left version uses:

.LBB16_3:
        mov     rbx, qword ptr [r13]
        mov     r14, qword ptr [r13 + 8]
        lea     rdi, [rsp + 40]
        mov     rsi, rbx
        mov     rdx, r14
        call    example::sort_by_words
        lea     rdi, [rsp + 64]
        mov     rsi, qword ptr [rsp + 8]
        mov     rdx, qword ptr [rsp + 16]
        call    example::sort_by_words
        mov     rdi, qword ptr [rsp + 40]
        mov     rdx, qword ptr [rsp + 56]
        mov     rsi, qword ptr [rsp + 64]
        cmp     rdx, qword ptr [rsp + 80]
        mov     qword ptr [rsp + 32], rdi
        mov     qword ptr [rsp + 24], rsi
        jne     .LBB16_5
        call    qword ptr [rip + bcmp@GOTPCREL]
        test    eax, eax
        sete    al
        mov     dword ptr [rsp + 4], eax
        mov     rsi, qword ptr [rsp + 72]
        test    rsi, rsi
        jne     .LBB16_8
        jmp     .LBB16_9

2 calls to sort_by_words and the right side only a single one:

.LBB16_3:
        mov     r13, qword ptr [r15]
        mov     r14, qword ptr [r15 + 8]
        lea     rdi, [rsp + 64]
        mov     rsi, r13
        mov     rdx, r14
        call    example::sort_by_words
        mov     rdi, qword ptr [rsp + 64]
        mov     rdx, qword ptr [rsp + 16]
        cmp     qword ptr [rsp + 80], rdx
        mov     qword ptr [rsp + 32], rdi
        jne     .LBB16_5
        mov     rsi, qword ptr [rsp + 8]
        call    qword ptr [rip + bcmp@GOTPCREL]
        test    eax, eax
        sete    bpl
        mov     rsi, qword ptr [rsp + 72]
        test    rsi, rsi
        jne     .LBB16_8
        jmp     .LBB16_9

Amusingly the total number of assembly lines is the same, but one of the versions clearly seems to be more performant :)

In any case, since hosting is not related to current change, I'll send a separate PR once this one is merged.

ttsugriy added a commit to ttsugriy/rust that referenced this pull request Aug 3, 2023
@lqd commented on rust-lang#114351 asking
if `sort_by_words(lookup)` is computed repeatedly. I was assuming that
rustc should have no difficulties to hoist it automatically outside of
the loop to avoid repeated pure computation, but according to
 https://godbolt.org/z/frs8Kj1rq it seems like I was wrong:
original version seems to have 2 calls per loop iteration
```
.LBB16_3:
        mov     rbx, qword ptr [r13]
        mov     r14, qword ptr [r13 + 8]
        lea     rdi, [rsp + 40]
        mov     rsi, rbx
        mov     rdx, r14
        call    example::sort_by_words
        lea     rdi, [rsp + 64]
        mov     rsi, qword ptr [rsp + 8]
        mov     rdx, qword ptr [rsp + 16]
        call    example::sort_by_words
        mov     rdi, qword ptr [rsp + 40]
        mov     rdx, qword ptr [rsp + 56]
        mov     rsi, qword ptr [rsp + 64]
        cmp     rdx, qword ptr [rsp + 80]
        mov     qword ptr [rsp + 32], rdi
        mov     qword ptr [rsp + 24], rsi
        jne     .LBB16_5
        call    qword ptr [rip + bcmp@GOTPCREL]
        test    eax, eax
        sete    al
        mov     dword ptr [rsp + 4], eax
        mov     rsi, qword ptr [rsp + 72]
        test    rsi, rsi
        jne     .LBB16_8
        jmp     .LBB16_9
```
but the manually hoisted version just 1:
```
.LBB16_3:
        mov     r13, qword ptr [r15]
        mov     r14, qword ptr [r15 + 8]
        lea     rdi, [rsp + 64]
        mov     rsi, r13
        mov     rdx, r14
        call    example::sort_by_words
        mov     rdi, qword ptr [rsp + 64]
        mov     rdx, qword ptr [rsp + 16]
        cmp     qword ptr [rsp + 80], rdx
        mov     qword ptr [rsp + 32], rdi
        jne     .LBB16_5
        mov     rsi, qword ptr [rsp + 8]
        call    qword ptr [rip + bcmp@GOTPCREL]
        test    eax, eax
        sete    bpl
        mov     rsi, qword ptr [rsp + 72]
        test    rsi, rsi
        jne     .LBB16_8
        jmp     .LBB16_9
```
This code is probably not very hot, but there is no reason to leave
such a low hanging fruit.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Aug 3, 2023
[rustc_span][perf] Hoist lookup sorted by words out of the loop.

`@lqd` commented on rust-lang#114351 asking if `sort_by_words(lookup)` is computed repeatedly. I was assuming that rustc should have no difficulties to hoist it automatically outside of the loop to avoid repeated pure computation, but according to
 https://godbolt.org/z/frs8Kj1rq it seems like I was wrong:
original version seems to have 2 calls per loop iteration
```
.LBB16_3:
        mov     rbx, qword ptr [r13]
        mov     r14, qword ptr [r13 + 8]
        lea     rdi, [rsp + 40]
        mov     rsi, rbx
        mov     rdx, r14
        call    example::sort_by_words
        lea     rdi, [rsp + 64]
        mov     rsi, qword ptr [rsp + 8]
        mov     rdx, qword ptr [rsp + 16]
        call    example::sort_by_words
        mov     rdi, qword ptr [rsp + 40]
        mov     rdx, qword ptr [rsp + 56]
        mov     rsi, qword ptr [rsp + 64]
        cmp     rdx, qword ptr [rsp + 80]
        mov     qword ptr [rsp + 32], rdi
        mov     qword ptr [rsp + 24], rsi
        jne     .LBB16_5
        call    qword ptr [rip + bcmp@GOTPCREL]
        test    eax, eax
        sete    al
        mov     dword ptr [rsp + 4], eax
        mov     rsi, qword ptr [rsp + 72]
        test    rsi, rsi
        jne     .LBB16_8
        jmp     .LBB16_9
```
but the manually hoisted version just 1:
```
.LBB16_3:
        mov     r13, qword ptr [r15]
        mov     r14, qword ptr [r15 + 8]
        lea     rdi, [rsp + 64]
        mov     rsi, r13
        mov     rdx, r14
        call    example::sort_by_words
        mov     rdi, qword ptr [rsp + 64]
        mov     rdx, qword ptr [rsp + 16]
        cmp     qword ptr [rsp + 80], rdx
        mov     qword ptr [rsp + 32], rdi
        jne     .LBB16_5
        mov     rsi, qword ptr [rsp + 8]
        call    qword ptr [rip + bcmp@GOTPCREL]
        test    eax, eax
        sete    bpl
        mov     rsi, qword ptr [rsp + 72]
        test    rsi, rsi
        jne     .LBB16_8
        jmp     .LBB16_9
```
This code is probably not very hot, but there is no reason to leave such a low hanging fruit.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Aug 3, 2023
[rustc_span][perf] Hoist lookup sorted by words out of the loop.

``@lqd`` commented on rust-lang#114351 asking if `sort_by_words(lookup)` is computed repeatedly. I was assuming that rustc should have no difficulties to hoist it automatically outside of the loop to avoid repeated pure computation, but according to
 https://godbolt.org/z/frs8Kj1rq it seems like I was wrong:
original version seems to have 2 calls per loop iteration
```
.LBB16_3:
        mov     rbx, qword ptr [r13]
        mov     r14, qword ptr [r13 + 8]
        lea     rdi, [rsp + 40]
        mov     rsi, rbx
        mov     rdx, r14
        call    example::sort_by_words
        lea     rdi, [rsp + 64]
        mov     rsi, qword ptr [rsp + 8]
        mov     rdx, qword ptr [rsp + 16]
        call    example::sort_by_words
        mov     rdi, qword ptr [rsp + 40]
        mov     rdx, qword ptr [rsp + 56]
        mov     rsi, qword ptr [rsp + 64]
        cmp     rdx, qword ptr [rsp + 80]
        mov     qword ptr [rsp + 32], rdi
        mov     qword ptr [rsp + 24], rsi
        jne     .LBB16_5
        call    qword ptr [rip + bcmp@GOTPCREL]
        test    eax, eax
        sete    al
        mov     dword ptr [rsp + 4], eax
        mov     rsi, qword ptr [rsp + 72]
        test    rsi, rsi
        jne     .LBB16_8
        jmp     .LBB16_9
```
but the manually hoisted version just 1:
```
.LBB16_3:
        mov     r13, qword ptr [r15]
        mov     r14, qword ptr [r15 + 8]
        lea     rdi, [rsp + 64]
        mov     rsi, r13
        mov     rdx, r14
        call    example::sort_by_words
        mov     rdi, qword ptr [rsp + 64]
        mov     rdx, qword ptr [rsp + 16]
        cmp     qword ptr [rsp + 80], rdx
        mov     qword ptr [rsp + 32], rdi
        jne     .LBB16_5
        mov     rsi, qword ptr [rsp + 8]
        call    qword ptr [rip + bcmp@GOTPCREL]
        test    eax, eax
        sete    bpl
        mov     rsi, qword ptr [rsp + 72]
        test    rsi, rsi
        jne     .LBB16_8
        jmp     .LBB16_9
```
This code is probably not very hot, but there is no reason to leave such a low hanging fruit.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Aug 3, 2023
[rustc_span][perf] Hoist lookup sorted by words out of the loop.

```@lqd``` commented on rust-lang#114351 asking if `sort_by_words(lookup)` is computed repeatedly. I was assuming that rustc should have no difficulties to hoist it automatically outside of the loop to avoid repeated pure computation, but according to
 https://godbolt.org/z/frs8Kj1rq it seems like I was wrong:
original version seems to have 2 calls per loop iteration
```
.LBB16_3:
        mov     rbx, qword ptr [r13]
        mov     r14, qword ptr [r13 + 8]
        lea     rdi, [rsp + 40]
        mov     rsi, rbx
        mov     rdx, r14
        call    example::sort_by_words
        lea     rdi, [rsp + 64]
        mov     rsi, qword ptr [rsp + 8]
        mov     rdx, qword ptr [rsp + 16]
        call    example::sort_by_words
        mov     rdi, qword ptr [rsp + 40]
        mov     rdx, qword ptr [rsp + 56]
        mov     rsi, qword ptr [rsp + 64]
        cmp     rdx, qword ptr [rsp + 80]
        mov     qword ptr [rsp + 32], rdi
        mov     qword ptr [rsp + 24], rsi
        jne     .LBB16_5
        call    qword ptr [rip + bcmp@GOTPCREL]
        test    eax, eax
        sete    al
        mov     dword ptr [rsp + 4], eax
        mov     rsi, qword ptr [rsp + 72]
        test    rsi, rsi
        jne     .LBB16_8
        jmp     .LBB16_9
```
but the manually hoisted version just 1:
```
.LBB16_3:
        mov     r13, qword ptr [r15]
        mov     r14, qword ptr [r15 + 8]
        lea     rdi, [rsp + 64]
        mov     rsi, r13
        mov     rdx, r14
        call    example::sort_by_words
        mov     rdi, qword ptr [rsp + 64]
        mov     rdx, qword ptr [rsp + 16]
        cmp     qword ptr [rsp + 80], rdx
        mov     qword ptr [rsp + 32], rdi
        jne     .LBB16_5
        mov     rsi, qword ptr [rsp + 8]
        call    qword ptr [rip + bcmp@GOTPCREL]
        test    eax, eax
        sete    bpl
        mov     rsi, qword ptr [rsp + 72]
        test    rsi, rsi
        jne     .LBB16_8
        jmp     .LBB16_9
```
This code is probably not very hot, but there is no reason to leave such a low hanging fruit.
github-actions bot pushed a commit to rust-lang/miri that referenced this pull request Aug 4, 2023
[rustc_span][perf] Hoist lookup sorted by words out of the loop.

```@lqd``` commented on rust-lang/rust#114351 asking if `sort_by_words(lookup)` is computed repeatedly. I was assuming that rustc should have no difficulties to hoist it automatically outside of the loop to avoid repeated pure computation, but according to
 https://godbolt.org/z/frs8Kj1rq it seems like I was wrong:
original version seems to have 2 calls per loop iteration
```
.LBB16_3:
        mov     rbx, qword ptr [r13]
        mov     r14, qword ptr [r13 + 8]
        lea     rdi, [rsp + 40]
        mov     rsi, rbx
        mov     rdx, r14
        call    example::sort_by_words
        lea     rdi, [rsp + 64]
        mov     rsi, qword ptr [rsp + 8]
        mov     rdx, qword ptr [rsp + 16]
        call    example::sort_by_words
        mov     rdi, qword ptr [rsp + 40]
        mov     rdx, qword ptr [rsp + 56]
        mov     rsi, qword ptr [rsp + 64]
        cmp     rdx, qword ptr [rsp + 80]
        mov     qword ptr [rsp + 32], rdi
        mov     qword ptr [rsp + 24], rsi
        jne     .LBB16_5
        call    qword ptr [rip + bcmp@GOTPCREL]
        test    eax, eax
        sete    al
        mov     dword ptr [rsp + 4], eax
        mov     rsi, qword ptr [rsp + 72]
        test    rsi, rsi
        jne     .LBB16_8
        jmp     .LBB16_9
```
but the manually hoisted version just 1:
```
.LBB16_3:
        mov     r13, qword ptr [r15]
        mov     r14, qword ptr [r15 + 8]
        lea     rdi, [rsp + 64]
        mov     rsi, r13
        mov     rdx, r14
        call    example::sort_by_words
        mov     rdi, qword ptr [rsp + 64]
        mov     rdx, qword ptr [rsp + 16]
        cmp     qword ptr [rsp + 80], rdx
        mov     qword ptr [rsp + 32], rdi
        jne     .LBB16_5
        mov     rsi, qword ptr [rsp + 8]
        call    qword ptr [rip + bcmp@GOTPCREL]
        test    eax, eax
        sete    bpl
        mov     rsi, qword ptr [rsp + 72]
        test    rsi, rsi
        jne     .LBB16_8
        jmp     .LBB16_9
```
This code is probably not very hot, but there is no reason to leave such a low hanging fruit.
@fee1-dead
Copy link
Member

@bors r+ rollup

@bors
Copy link
Contributor

bors commented Aug 4, 2023

📌 Commit f74eee2 has been approved by fee1-dead

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 4, 2023
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Aug 4, 2023
[rustc_span][perf] Remove unnecessary string joins and allocs.

Comparing vectors of string parts yields the same result but avoids unnecessary `join` and potential allocation for resulting `String`. This code is cold so it's unlikely to have any measurable impact, but considering but since it's also simpler, why not? :)
bors added a commit to rust-lang-ci/rust that referenced this pull request Aug 4, 2023
…iaskrgr

Rollup of 9 pull requests

Successful merges:

 - rust-lang#113945 (Fix wrong span for trait selection failure error reporting)
 - rust-lang#114351 ([rustc_span][perf] Remove unnecessary string joins and allocs.)
 - rust-lang#114418 (bump parking_lot to 0.12)
 - rust-lang#114434 (Improve spans for indexing expressions)
 - rust-lang#114450 (Fix ICE failed to get layout for ReferencesError)
 - rust-lang#114461 (Fix unwrap on None)
 - rust-lang#114462 (interpret: add mplace_to_ref helper method)
 - rust-lang#114472 (Reword `confusable_idents` lint)
 - rust-lang#114477 (Account for `Rc` and `Arc` when suggesting to clone)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 23e86f6 into rust-lang:master Aug 4, 2023
11 checks passed
@rustbot rustbot added this to the 1.73.0 milestone Aug 4, 2023
@ttsugriy ttsugriy deleted the sort-by-words branch August 4, 2023 22:24
@ttsugriy
Copy link
Contributor Author

ttsugriy commented Aug 7, 2023

btw, just for fun, I've also wrote a benchmark and got

     Running benches/sort_by_words_benchmark.rs (target/release/deps/sort_by_words_benchmark-5adee33fa37ce7e2)
Benchmarking multiply add/original: Collecting 100 samples in estimated 5.0006 s (27M iteratio
multiply add/original   time:   [172.85 ns 177.30 ns 182.83 ns]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
Benchmarking multiply add/proposed: Collecting 100 samples in estimated 5.0000 s (61M iteratio
multiply add/proposed   time:   [81.106 ns 81.605 ns 82.261 ns]
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe

although the numbers may be different for different strings.

flip1995 pushed a commit to flip1995/rust that referenced this pull request Aug 11, 2023
…iaskrgr

Rollup of 9 pull requests

Successful merges:

 - rust-lang#113945 (Fix wrong span for trait selection failure error reporting)
 - rust-lang#114351 ([rustc_span][perf] Remove unnecessary string joins and allocs.)
 - rust-lang#114418 (bump parking_lot to 0.12)
 - rust-lang#114434 (Improve spans for indexing expressions)
 - rust-lang#114450 (Fix ICE failed to get layout for ReferencesError)
 - rust-lang#114461 (Fix unwrap on None)
 - rust-lang#114462 (interpret: add mplace_to_ref helper method)
 - rust-lang#114472 (Reword `confusable_idents` lint)
 - rust-lang#114477 (Account for `Rc` and `Arc` when suggesting to clone)

r? `@ghost`
`@rustbot` modify labels: rollup
calebcartwright pushed a commit to calebcartwright/rust that referenced this pull request Oct 23, 2023
…iaskrgr

Rollup of 9 pull requests

Successful merges:

 - rust-lang#113945 (Fix wrong span for trait selection failure error reporting)
 - rust-lang#114351 ([rustc_span][perf] Remove unnecessary string joins and allocs.)
 - rust-lang#114418 (bump parking_lot to 0.12)
 - rust-lang#114434 (Improve spans for indexing expressions)
 - rust-lang#114450 (Fix ICE failed to get layout for ReferencesError)
 - rust-lang#114461 (Fix unwrap on None)
 - rust-lang#114462 (interpret: add mplace_to_ref helper method)
 - rust-lang#114472 (Reword `confusable_idents` lint)
 - rust-lang#114477 (Account for `Rc` and `Arc` when suggesting to clone)

r? `@ghost`
`@rustbot` modify labels: rollup
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants