Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

match generates more branch instructions #122726

Open
DianQK opened this issue Mar 19, 2024 · 5 comments
Open

match generates more branch instructions #122726

DianQK opened this issue Mar 19, 2024 · 5 comments
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@DianQK
Copy link
Member

DianQK commented Mar 19, 2024

I tried this code:

fn foo(a: Option<i32>, b: Option<i32>) -> i32 {
  match (a, b) {
    (None, None) => 0,
    (Some(a), Some(b)) => a + b,
    (Some(a), _) => a,
    _ => 1,
  }
}z

The generated assembly code:

foo:
        xor     eax, eax
        test    edi, edi
        je      .LBB0_1
        cmp     edx, 1
        cmove   eax, ecx
        add     eax, esi
        ret
.LBB0_1:
        test    edx, edx
        setne   al
        ret

godbolt: https://rust.godbolt.org/z/7dGEccqcW.

We can get less code with alive2 validation:

tgt:                                    # @tgt
        test    edx, edx
        cmove   ecx, edx
        lea     eax, [rcx + rsi]
        test    edi, edi
        cmove   eax, edx
        ret

godbolt (llc): https://llvm.godbolt.org/z/331WYovKG

Meta

rustc --version --verbose:

rustc 1.78.0-nightly (3cbb93223 2024-03-13)
binary: rustc
commit-hash: 3cbb93223f33024db464a4df27a13c7cce870173
commit-date: 2024-03-13
host: x86_64-unknown-linux-gnu
release: 1.78.0-nightly
LLVM version: 18.1.0

@rustbot modify labels: +C-optimization +A-codegen +A-LLVM +I-slow

@rustbot rustbot added needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-slow Issue: Problems and improvements with respect to performance of generated code. labels Mar 19, 2024
@DianQK
Copy link
Member Author

DianQK commented Mar 19, 2024

There should be multiple ways to solve this. Let me try one first.

@rustbot claim

@jieyouxu jieyouxu removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Mar 19, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Mar 19, 2024
[WIP] Use assume rather than range metadata

Fixes rust-lang#122726.

Currently, range can only be used in load, call and invoke instructions. Due to SROA being run before other passes, even in the simplest IR, LLVM cannot infer that `%i1` is 0.

```llvm
define noundef i32 `@src(i32` noundef %arg) {
  %i = alloca i32, align 4
  store i32 %arg, ptr %i, align 4
  %i1 = load i32, ptr %i, align 4, !range !0
  ret i32 %i1
}
```

https://alive2.llvm.org/ce/z/MjsH9b

r? `@ghost`
@erikdesjardins
Copy link
Contributor

erikdesjardins commented Mar 20, 2024

#122664 marginally improves the example, removing the branch, but it doesn't get down to the optimal code:

define noundef i32 @foo(i1 noundef %0, i32 %1, i1 noundef %2, i32 %3) unnamed_addr #0 {
start:
  %. = zext i1 %2 to i32
  %4 = select i1 %2, i32 %3, i32 0
  %spec.select = add i32 %4, %1
  %_0.0 = select i1 %0, i32 %spec.select, i32 %.
  ret i32 %_0.0
}
foo:
  mov    r8d, edx
  and    r8d, 1
  xor    eax, eax
  test   dl, 1
  cmovne eax, ecx
  add    eax, esi
  test   dil, 1
  cmove  eax, r8d
  ret

@DianQK
Copy link
Member Author

DianQK commented Mar 20, 2024

Hmm, possibly related to backend handling i1?

@DianQK
Copy link
Member Author

DianQK commented Mar 21, 2024

I think the problem is that there is no register equivalent to the i1 on x86 and other platforms. We need additional instructions for this safe conversion.

For codegen/backend, the following IR may be considered equivalent:

define noundef i32 @foo(i32 noundef %0, i32 %1, i32 noundef %2, i32 %3) unnamed_addr #0 {
start:
  %_2 = trunc i32 %2 to i1
  %. = zext i1 %_2 to i32
  %4 = select i1 %_2, i32 %3, i32 0
  %spec.select = add i32 %4, %1
  %_0 = trunc i32 %0 to i1
  %_0.0 = select i1 %_0, i32 %spec.select, i32 %.
  ret i32 %_0.0
}

I would be concerned about the actual impact of the i1 even though the IR looks good.

@jieyouxu
Copy link
Contributor

jieyouxu commented Mar 21, 2024

#122664 marginally improves the example, removing the branch, but it doesn't get down to the optimal code

Unfortunately after experimenting, I don't think it's possible to represent enum discriminants as i1 without significant regressions because we can't annotate a { i1, i32 } in return position with zeroext; and if I do annotate an i1 discriminant as zeroext in argument position, then a ton of ABI compatibilty tests blow up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants