Common branch folding for enum types fails when there are more than three enum variants. #121719

Closed
ZhennanWu opened this issue Feb 28, 2024 · 4 comments · Fixed by #121665
Labels
C-bug, C-optimization, I-slow, P-medium, regression-untriaged, S-has-mcve, T-compiler

Comments

ZhennanWu commented Feb 28, 2024

I tried this code:
https://godbolt.org/z/Yh7Px49jh

```rust
// `x: f64` comes first in every payload; `y` has a different type in each
// struct so that the variant layouts differ.
#[repr(C)]
pub struct A {
    x: f64,
    y: u64,
}
#[repr(C)]
pub struct B {
    x: f64,
    y: u32,
}
#[repr(C)]
pub struct C {
    x: f64,
    y: u16,
}
#[repr(C)]
pub struct D {
    x: f64,
    y: u8,
}

pub enum E {
    A(A),
    B(B),
    C(C),
    D(D),
}

impl E {
    #[inline(never)]
    pub fn x(&self) -> &f64 {
        match self {
            E::A(A { x, .. }) | E::B(B { x, .. }) | E::C(C { x, .. }) | E::D(D { x, .. }) => x,
        }
    }
}
```

I expected to see this happen:

The `E::x` function should be optimized into a single `add` (or `lea`) instruction, since the `x` field sits at the same offset in every variant.
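One way to check the premise is to measure, at runtime, how far `x` sits from the start of the enum value in each variant. The sketch below reuses the demo types. The helper `x_offset` is hypothetical. The default (`repr(Rust)`) enum layout is unspecified, so the sketch only reports what the current compiler chose and checks that all variants agree; it does not assert a particular number. It uses `std::mem::offset_of!`, which has been stable since Rust 1.77.

```rust
use std::mem::offset_of;

#[repr(C)]
pub struct A { x: f64, y: u64 }
#[repr(C)]
pub struct B { x: f64, y: u32 }
#[repr(C)]
pub struct C { x: f64, y: u16 }
#[repr(C)]
pub struct D { x: f64, y: u8 }

pub enum E { A(A), B(B), C(C), D(D) }

impl E {
    pub fn x(&self) -> &f64 {
        match self {
            E::A(A { x, .. }) | E::B(B { x, .. }) | E::C(C { x, .. }) | E::D(D { x, .. }) => x,
        }
    }
}

/// Hypothetical helper: byte distance from the start of `e` to the `f64` returned by `E::x`.
fn x_offset(e: &E) -> usize {
    (e.x() as *const f64 as usize) - (e as *const E as usize)
}

fn main() {
    // `x` is the first field of each `#[repr(C)]` payload, so it is at offset 0 inside every struct.
    assert_eq!(offset_of!(A, x), 0);
    assert_eq!(offset_of!(B, x), 0);
    assert_eq!(offset_of!(C, x), 0);
    assert_eq!(offset_of!(D, x), 0);

    let values = [
        E::A(A { x: 1.0, y: 1 }),
        E::B(B { x: 2.0, y: 2 }),
        E::C(C { x: 3.0, y: 3 }),
        E::D(D { x: 4.0, y: 4 }),
    ];

    // Offset of `x` relative to the whole enum value (discriminant plus payload).
    let offsets: Vec<usize> = values.iter().map(x_offset).collect();
    println!("offset of x in each variant: {offsets:?}");
    assert!(offsets.iter().all(|&o| o == offsets[0]));
}
```

If every variant reports the same offset, each match arm in `E::x` computes the same address. That is the opportunity the optimizer is expected to exploit.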

Instead, this happened:

When the enum has four or more variants, the generated assembly is:

```asm
example::E::x:
        mov     rax, rdi
        mov     rcx, qword ptr [rdi]
        lea     rdx, [rip + .LJTI0_0]
        movsxd  rcx, dword ptr [rdx + 4*rcx]
        add     rcx, rdx
        jmp     rcx
.LBB0_1:
        add     rax, 8
        ret
.LJTI0_0:
        .long   .LBB0_1-.LJTI0_0
        .long   .LBB0_1-.LJTI0_0
        .long   .LBB0_1-.LJTI0_0
        .long   .LBB0_1-.LJTI0_0
```

When the enum has three variants, the branches are folded, but the output still contains leftover instructions (a `test`/`je`/`cmp` sequence whose result is never used):

```asm
example::E::x:
        mov     rax, rdi
        mov     rcx, qword ptr [rdi]
        test    rcx, rcx
        je      .LBB0_2
        cmp     ecx, 1
.LBB0_2:
        add     rax, 8
        ret
```

When it has two variants, it finally produces idiomatic code:

```asm
example::E::x:
        lea     rax, [rdi + 8]
        ret
```

Interestingly, rustc generates idiomatic code when:

  1. The enum has no more than three variants.
  2. The enum variants share the same layout. (In the demo, the field `y` is given a different type in each struct to break this.)
  3. (Edit) The rustc version is 1.64 or earlier.
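For comparison, condition 2 can be sketched as the same demo with `y: u64` in every payload, so all four variants share one layout. Per the report, this shape compiles to the idiomatic single `lea`. That is the author's observation and is not verified by this snippet. The code only shows the source pattern and checks that the accessor returns the right field.

```rust
// Same shape as the original demo, except every payload uses `y: u64`,
// so all four variants share one layout.
#[repr(C)]
pub struct A { x: f64, y: u64 }
#[repr(C)]
pub struct B { x: f64, y: u64 }
#[repr(C)]
pub struct C { x: f64, y: u64 }
#[repr(C)]
pub struct D { x: f64, y: u64 }

pub enum E { A(A), B(B), C(C), D(D) }

impl E {
    #[inline(never)]
    pub fn x(&self) -> &f64 {
        match self {
            E::A(A { x, .. }) | E::B(B { x, .. }) | E::C(C { x, .. }) | E::D(D { x, .. }) => x,
        }
    }
}

fn main() {
    let values = [
        E::A(A { x: 1.0, y: 10 }),
        E::B(B { x: 2.0, y: 20 }),
        E::C(C { x: 3.0, y: 30 }),
        E::D(D { x: 4.0, y: 40 }),
    ];
    for (i, e) in values.iter().enumerate() {
        // Variant `i` was constructed with `x == i + 1`.
        assert_eq!(*e.x(), (i + 1) as f64);
    }
}
```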

Meta

Godbolt, rustc 1.76.0 with `-C opt-level=3 -C target-cpu=x86-64-v3`

Edit: After trying more Godbolt settings, I found that this is a regression introduced sometime between Rust 1.64 and Rust 1.65.

ZhennanWu added the C-bug label Feb 28, 2024
rustbot added the needs-triage label Feb 28, 2024
jieyouxu added the T-compiler, S-has-mcve, and C-optimization labels and removed the needs-triage label Feb 28, 2024
ZhennanWu (Author) commented:

@jieyouxu Please add the regression label, as Rust 1.64 can optimize this code. Rust 1.64 even managed to produce idiomatic code for my project, which has 11 complex enum variants.

jieyouxu added the regression-untriaged label Feb 28, 2024
rustbot added the I-prioritize label Feb 28, 2024
DianQK (Member) commented Feb 28, 2024

#121665 might fix this issue.

erikdesjardins (Contributor) commented:

Indeed it does; I added a test in 4016510.

apiraino (Contributor) commented:

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-medium +I-slow

rustbot added the I-slow and P-medium labels and removed the I-prioritize label Feb 28, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Mar 2, 2024
Always generate GEP i8 / ptradd for struct offsets

This implements rust-lang#98615, and goes a bit further to remove `struct_gep` entirely.

Upstream LLVM is in the beginning stages of [migrating to `ptradd`](https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699). LLVM 19 will [canonicalize](llvm/llvm-project#68882) all constant-offset GEPs to i8, which has roughly the same effect as this change.

Fixes rust-lang#121719.

Split out from rust-lang#121577.

r? `@nikic`
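The commit message above describes replacing struct-typed GEPs with `GEP i8` / `ptradd`. In other words, a field is addressed by a constant byte offset instead of by a (struct type, field index) pair. The Rust sketch below is only a source-level analogy and does not show rustc's actual codegen. `x_of_a` projects a field by name and type, while the hypothetical helper `x_by_offset` computes "base pointer plus N bytes". Because `x` is at offset 0 in every `#[repr(C)]` payload, the byte-offset form is the same computation regardless of the struct type. This is presumably why LLVM can now recognize the match arms in `E::x` as identical and merge them.

```rust
use std::mem::offset_of;

#[repr(C)]
struct A {
    x: f64,
    y: u64,
}

#[repr(C)]
struct D {
    x: f64,
    y: u8,
}

/// Typed projection: the address is described as "field `x` of `A`".
/// Loosely comparable to a struct-typed GEP, whose element type differs per struct.
fn x_of_a(a: &A) -> *const f64 {
    &a.x
}

/// Hypothetical helper: byte-offset projection, "base pointer plus `offset` bytes".
/// Loosely comparable to `getelementptr i8` / `ptradd` with a constant offset.
///
/// # Safety
/// `base + offset` must point to a valid, aligned `f64` inside the same allocation.
unsafe fn x_by_offset<T>(base: *const T, offset: usize) -> *const f64 {
    unsafe { base.cast::<u8>().add(offset).cast::<f64>() }
}

fn main() {
    let a = A { x: 1.5, y: 7 };
    let d = D { x: 2.5, y: 3 };

    // Both structs place `x` at the same byte offset, so the byte-offset
    // computation is identical for A and D; only the base pointer differs.
    assert_eq!(offset_of!(A, x), offset_of!(D, x));

    let pa = unsafe { x_by_offset(&a as *const A, offset_of!(A, x)) };
    let pd = unsafe { x_by_offset(&d as *const D, offset_of!(D, x)) };

    // The byte-offset address agrees with the typed projection.
    assert_eq!(pa, x_of_a(&a));
    println!("{} {}", unsafe { *pa }, unsafe { *pd });
}
```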
bors closed this as completed in 70aa0b8 Mar 4, 2024
GuillaumeGomez pushed a commit to GuillaumeGomez/rust that referenced this issue Mar 5, 2024

6 participants