Use more optimal Ord implementation for integers #63767

Merged
merged 3 commits into rust-lang:master from lzutao:integer-ord-suboptimal on Aug 22, 2019

@lzutao
Contributor

commented Aug 21, 2019

Closes #63758
r? @nagisa

Compare results

(godbolt link: https://godbolt.org/z/dsbczy)

Old assembly:

example::cmp1:
  mov eax, dword ptr [rdi]
  mov ecx, dword ptr [rsi]
  cmp eax, ecx
  setae dl
  add dl, dl
  add dl, -1
  xor esi, esi
  cmp eax, ecx
  movzx eax, dl
  cmove eax, esi
  ret

New assembly:

example::cmp2:
  mov eax, dword ptr [rdi]
  xor ecx, ecx
  cmp eax, dword ptr [rsi]
  seta cl
  mov eax, 255
  cmovae eax, ecx
  ret
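
A minimal Rust sketch of the two branch orderings that the assembly above appears to correspond to. The source behind the godbolt link is not reproduced in this thread, so the function bodies and the names `cmp_eq_first` and `cmp_lt_first` are assumptions inferred from the instruction sequences (pointer arguments, 32-bit unsigned compares), not the exact code in libcore.

```rust
use std::cmp::Ordering;

/// Hypothetical "equality first" ordering: test `==`, then `<`.
/// Assumed to resemble what produced `example::cmp1`.
pub fn cmp_eq_first(a: &u32, b: &u32) -> Ordering {
    if *a == *b {
        Ordering::Equal
    } else if *a < *b {
        Ordering::Less
    } else {
        Ordering::Greater
    }
}

/// Hypothetical "less-than first" ordering: test `<`, then `==`.
/// Assumed to resemble what produced `example::cmp2`.
pub fn cmp_lt_first(a: &u32, b: &u32) -> Ordering {
    if *a < *b {
        Ordering::Less
    } else if *a == *b {
        Ordering::Equal
    } else {
        Ordering::Greater
    }
}
```

Both functions return the same `Ordering` for every input. Only the order of the tests differs, which, per this PR, is what changes the generated code.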

Old llvm-mca statistics:

Iterations:        100
Instructions:      1100
Total Cycles:      243
Total uOps:        1300

Dispatch Width:    6
uOps Per Cycle:    5.35
IPC:               4.53
Block RThroughput: 2.2

New llvm-mca statistics:

Iterations:        100
Instructions:      700
Total Cycles:      217
Total uOps:        1100

Dispatch Width:    6
uOps Per Cycle:    5.07
IPC:               3.23
Block RThroughput: 1.8
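
The derived figures in both reports follow from the raw counts: IPC is instructions divided by total cycles, and uOps per cycle is total uOps divided by total cycles. A small sketch that recomputes them, assuming the hypothetical names `McaSummary`, `OLD` and `NEW` (they are not part of llvm-mca):

```rust
/// Raw counts copied from an llvm-mca summary (hypothetical helper type).
pub struct McaSummary {
    pub iterations: u32,
    pub instructions: u32,
    pub total_cycles: u32,
    pub total_uops: u32,
}

impl McaSummary {
    /// Instructions retired per cycle.
    pub fn ipc(&self) -> f64 {
        f64::from(self.instructions) / f64::from(self.total_cycles)
    }

    /// Micro-ops issued per cycle.
    pub fn uops_per_cycle(&self) -> f64 {
        f64::from(self.total_uops) / f64::from(self.total_cycles)
    }

    /// Instructions in one iteration of the analyzed block.
    pub fn instructions_per_iteration(&self) -> f64 {
        f64::from(self.instructions) / f64::from(self.iterations)
    }

    /// Average cycles spent per iteration of the analyzed block.
    pub fn cycles_per_iteration(&self) -> f64 {
        f64::from(self.total_cycles) / f64::from(self.iterations)
    }
}

/// Counts from the "Old llvm-mca statistics" report above.
pub const OLD: McaSummary = McaSummary {
    iterations: 100,
    instructions: 1100,
    total_cycles: 243,
    total_uops: 1300,
};

/// Counts from the "New llvm-mca statistics" report above.
pub const NEW: McaSummary = McaSummary {
    iterations: 100,
    instructions: 700,
    total_cycles: 217,
    total_uops: 1100,
};
```

Recomputing gives 11 vs 7 instructions per iteration and 2.43 vs 2.17 cycles per iteration, so in this model the lower IPC of the new version reflects fewer instructions rather than slower execution.
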
@nagisa

Contributor

commented Aug 21, 2019

@bors r+

@bors

Contributor

commented Aug 21, 2019

📌 Commit 0337cc1 has been approved by nagisa

@matthiaskrgr

Member

commented Aug 21, 2019

Could you add a comment explaining that the ordering is performance critical here (perhaps with a link to the original ticket)?
This should make sure that it is not changed back to something slower by accident.

@hellow554

Contributor

commented Aug 21, 2019

What about adding an assembly test case as well to prevent regressions (e.g. due to other optimizations)?

@lzutao

Contributor Author

commented Aug 21, 2019

> What about adding an assembly test case as well to prevent regressions (e.g. due to other optimizations)?

I don't know how. Could you give me some pointers?

@hellow554

Contributor

commented Aug 21, 2019

@lzutao You can take a look at https://rust-lang.github.io/rustc-guide/tests/intro.html and the codegen test cases in https://github.com/rust-lang/rust/tree/master/src/test/codegen, especially this one, I guess:

// compile-flags: -C no-prepopulate-passes

#![crate_type = "lib"]
#![feature(core_intrinsics)]

use std::intrinsics::{fadd_fast, fsub_fast, fmul_fast, fdiv_fast, frem_fast};

// CHECK-LABEL: @add
#[no_mangle]
pub fn add(x: f32, y: f32) -> f32 {
// CHECK: fadd float
// CHECK-NOT: fast
    x + y
}

// CHECK-LABEL: @addition
#[no_mangle]
pub fn addition(x: f32, y: f32) -> f32 {
// CHECK: fadd fast float
    unsafe {
        fadd_fast(x, y)
    }
}

// CHECK-LABEL: @subtraction
#[no_mangle]
pub fn subtraction(x: f32, y: f32) -> f32 {
// CHECK: fsub fast float
    unsafe {
        fsub_fast(x, y)
    }
}

// CHECK-LABEL: @multiplication
#[no_mangle]
pub fn multiplication(x: f32, y: f32) -> f32 {
// CHECK: fmul fast float
    unsafe {
        fmul_fast(x, y)
    }
}

// CHECK-LABEL: @division
#[no_mangle]
pub fn division(x: f32, y: f32) -> f32 {
// CHECK: fdiv fast float
    unsafe {
        fdiv_fast(x, y)
    }
}

@lzutao

Contributor Author

commented Aug 21, 2019

Hi @hellow554, I wrote a simple test for this. But honestly, I don't know how to match on this LLVM IR output:

; ModuleID = 'integer_cmp.3a1fbbbh-cgu.0'
source_filename = "integer_cmp.3a1fbbbh-cgu.0"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; core::cmp::impls::<impl core::cmp::Ord for i64>::cmp
; Function Attrs: inlinehint nonlazybind uwtable
define internal i8 @"_ZN4core3cmp5impls48_$LT$impl$u20$core..cmp..Ord$u20$for$u20$i64$GT$3cmp17h54b2deb40875460aE"(i64* noalias readonly align 8 dereferenceable(8) %self, i64* noalias readonly align 8 dereferenceable(8) %other) unnamed_addr #0 {
start:
  %_0 = alloca i8, align 1
  %0 = load i64, i64* %self, align 8
  %1 = load i64, i64* %other, align 8
  %2 = icmp slt i64 %0, %1
  br i1 %2, label %bb2, label %bb1

bb1:                                              ; preds = %start
  %3 = load i64, i64* %self, align 8
  %4 = load i64, i64* %other, align 8
  %5 = icmp sgt i64 %3, %4
  br i1 %5, label %bb4, label %bb3

bb2:                                              ; preds = %start
  store i8 -1, i8* %_0, align 1
  br label %bb6

bb3:                                              ; preds = %bb1
  store i8 0, i8* %_0, align 1
  br label %bb5

bb4:                                              ; preds = %bb1
  store i8 1, i8* %_0, align 1
  br label %bb5

bb5:                                              ; preds = %bb3, %bb4
  br label %bb6

bb6:                                              ; preds = %bb5, %bb2
  %6 = load i8, i8* %_0, align 1, !range !1
  ret i8 %6
}

; core::cmp::impls::<impl core::cmp::Ord for u32>::cmp
; Function Attrs: inlinehint nonlazybind uwtable
define internal i8 @"_ZN4core3cmp5impls48_$LT$impl$u20$core..cmp..Ord$u20$for$u20$u32$GT$3cmp17h1dd39efa68a3677aE"(i32* noalias readonly align 4 dereferenceable(4) %self, i32* noalias readonly align 4 dereferenceable(4) %other) unnamed_addr #0 {
start:
  %_0 = alloca i8, align 1
  %0 = load i32, i32* %self, align 4
  %1 = load i32, i32* %other, align 4
  %2 = icmp ult i32 %0, %1
  br i1 %2, label %bb2, label %bb1

bb1:                                              ; preds = %start
  %3 = load i32, i32* %self, align 4
  %4 = load i32, i32* %other, align 4
  %5 = icmp ugt i32 %3, %4
  br i1 %5, label %bb4, label %bb3

bb2:                                              ; preds = %start
  store i8 -1, i8* %_0, align 1
  br label %bb6

bb3:                                              ; preds = %bb1
  store i8 0, i8* %_0, align 1
  br label %bb5

bb4:                                              ; preds = %bb1
  store i8 1, i8* %_0, align 1
  br label %bb5

bb5:                                              ; preds = %bb3, %bb4
  br label %bb6

bb6:                                              ; preds = %bb5, %bb2
  %6 = load i8, i8* %_0, align 1, !range !1
  ret i8 %6
}

; Function Attrs: nonlazybind uwtable
define i8 @cmp_signed(i64, i64) unnamed_addr #1 {
start:
  %b = alloca i64, align 8
  %a = alloca i64, align 8
  store i64 %0, i64* %a, align 8
  store i64 %1, i64* %b, align 8
; call core::cmp::impls::<impl core::cmp::Ord for i64>::cmp
  %2 = call i8 @"_ZN4core3cmp5impls48_$LT$impl$u20$core..cmp..Ord$u20$for$u20$i64$GT$3cmp17h54b2deb40875460aE"(i64* noalias readonly align 8 dereferenceable(8) %a, i64* noalias readonly align 8 dereferenceable(8) %b), !range !1
  br label %bb1

bb1:                                              ; preds = %start
  ret i8 %2
}

; Function Attrs: nonlazybind uwtable
define i8 @cmp_unsigned(i32, i32) unnamed_addr #1 {
start:
  %b = alloca i32, align 4
  %a = alloca i32, align 4
  store i32 %0, i32* %a, align 4
  store i32 %1, i32* %b, align 4
; call core::cmp::impls::<impl core::cmp::Ord for u32>::cmp
  %2 = call i8 @"_ZN4core3cmp5impls48_$LT$impl$u20$core..cmp..Ord$u20$for$u20$u32$GT$3cmp17h1dd39efa68a3677aE"(i32* noalias readonly align 4 dereferenceable(4) %a, i32* noalias readonly align 4 dereferenceable(4) %b), !range !1
  br label %bb1

bb1:                                              ; preds = %start
  ret i8 %2
}

attributes #0 = { inlinehint nonlazybind uwtable "probe-stack"="__rust_probestack" "target-cpu"="x86-64" }
attributes #1 = { nonlazybind uwtable "probe-stack"="__rust_probestack" "target-cpu"="x86-64" }

!llvm.module.flags = !{!0}

!0 = !{i32 2, !"RtLibUseGOT", i32 1}
!1 = !{i8 -1, i8 2}
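
The IR above is unoptimized: `Ord::cmp` is still an out-of-line call and every value goes through an `alloca`, so there is little to match on. One possible approach, sketched below, is to compile the test with optimizations so that `cmp` is inlined into `cmp_signed` and `cmp_unsigned`, and then check for the comparison instruction. The `compile-flags` line and the `CHECK` patterns are assumptions that would need adjusting against the real optimized output; this is not necessarily the test that was merged.

```rust
// compile-flags: -O --crate-type=lib
// NOTE: the flags above and the CHECK patterns below are illustrative
// assumptions, not verified against actual compiler output.

use std::cmp::Ordering;

// CHECK-LABEL: @cmp_signed
#[no_mangle]
pub fn cmp_signed(a: i64, b: i64) -> Ordering {
// CHECK: icmp slt i64
    a.cmp(&b)
}

// CHECK-LABEL: @cmp_unsigned
#[no_mangle]
pub fn cmp_unsigned(a: u32, b: u32) -> Ordering {
// CHECK: icmp ult i32
    a.cmp(&b)
}
```

Each `CHECK-LABEL` anchors the following `CHECK` lines to one function, so a pattern for `cmp_signed` cannot accidentally match inside `cmp_unsigned`.
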

Review thread on src/test/codegen/integer-cmp.rs (outdated, resolved)

@lzutao force-pushed the lzutao:integer-ord-suboptimal branch from c184fa0 to f5b16f6 on Aug 21, 2019

@lzutao

Contributor Author

commented Aug 22, 2019

The CI is green.

@nagisa

Contributor

commented Aug 22, 2019

@bors r+

@bors

Contributor

commented Aug 22, 2019

📌 Commit f5b16f6 has been approved by nagisa

Centril added a commit to Centril/rust that referenced this pull request Aug 22, 2019
Rollup merge of rust-lang#63767 - lzutao:integer-ord-suboptimal, r=nagisa

bors added a commit that referenced this pull request Aug 22, 2019
Auto merge of #63807 - Centril:rollup-b8lo8ct, r=Centril
Rollup of 7 pull requests

Successful merges:

 - #63624 (When declaring a declarative macro in an item it's only accessible inside it)
 - #63737 (Fix naming misspelling)
 - #63767 (Use more optimal Ord implementation for integers)
 - #63782 (Fix confusion in theme picker functions)
 - #63788 (Add amanjeev to rustc-guide toolstate)
 - #63796 (Tweak E0308 on opaque types)
 - #63805 (Apply few Clippy suggestions)

Failed merges:

r? @ghost

@bors merged commit f5b16f6 into rust-lang:master on Aug 22, 2019

4 checks passed

 - pr Build #20190821.47 succeeded
 - pr (Linux mingw-check) Linux mingw-check succeeded
 - pr (Linux x86_64-gnu-llvm-6.0) Linux x86_64-gnu-llvm-6.0 succeeded
 - pr (LinuxTools) LinuxTools succeeded

@lzutao deleted the lzutao:integer-ord-suboptimal branch on Aug 23, 2019
