Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AArch64][X86] Suboptimal code for combining overflow flags #92609

Open
Kmeakin opened this issue May 17, 2024 · 2 comments
Open

[AArch64][X86] Suboptimal code for combining overflow flags #92609

Kmeakin opened this issue May 17, 2024 · 2 comments

Comments

@Kmeakin
Copy link
Contributor

Kmeakin commented May 17, 2024

Compiler explorer: https://godbolt.org/z/4avEfzPP1

This code performs a sequence of 3 adds and sets w0 if none of the adds overflow.
The generated code calculates !(add1_overflowed | add2_overflowed).
It would be more efficient to generate (add1_didnt_overflow & add2_didnt_overflow)

#[no_mangle]
pub fn checked_add3(a: u32, b: u32, c: u32) -> Option<u32> {
    a.checked_add(b)?.checked_add(c)
}

generated AArch64 code:

checked_add3:
        adds    w8, w0, w1
        cset    w9, hs
        adds    w1, w8, w2
        csinc   w8, w9, wzr, lo
        eor     w0, w8, #0x1
        ret

Optimal AArch64 code:

checked_add3:
        adds    w8, w0, w1
        cset    w9, lo
        adds    w1, w8, w2
        csel    w0, w9, wzr, lo
        ret

The generated code for x86 is similarly suboptimal:

checked_add3:
        add     edi, esi
        setb    cl
        add     edi, edx
        setb    al
        or      al, cl
        xor     al, 1
        ret

The code for performing 4 checked additions is even worse:

checked_add4:
        adds    w8, w0, w1
        cset    w9, hs
        adds    w10, w2, w3
        cset    w11, hs
        adds    w8, w8, w10
        cset    w10, lo
        tst     w11, #0x1
        csel    w8, w8, w8, ne
        csel    w10, wzr, w10, ne
        tst     w9, #0x1
        csel    w1, w8, w8, ne
        csel    w0, wzr, w10, ne
        ret
@llvmbot
Copy link
Collaborator

llvmbot commented May 17, 2024

@llvm/issue-subscribers-backend-aarch64

Author: Karl Meakin (Kmeakin)

This code performs a sequence of 3 adds and sets `w0` if none of the adds overflow. The generated code calculates `!(add1_overflowed | add2_overflowed)`. It would be more efficient to generate `(add1_didnt_overflow & add2_didnt_overflow)`
#[no_mangle]
pub fn checked_add3(a: u32, b: u32, c: u32) -&gt; Option&lt;u32&gt; {
    a.checked_add(b)?.checked_add(c)
}

generated AArch64 code:

checked_add3:
        adds    w8, w0, w1
        cset    w9, hs
        adds    w1, w8, w2
        csinc   w8, w9, wzr, lo
        eor     w0, w8, #<!-- -->0x1
        ret

Optimal AArch64 code:

checked_add2:
        adds    w1, w0, w1
        cset    w0, lo
        ret

checked_add3:
        adds    w8, w0, w1
        cset    w9, lo
        adds    w1, w8, w2
        csel    w0, w9, wzr, lo
        ret

The generated code for x86 is similarly suboptimal:

checked_add3:
        add     edi, esi
        setb    cl
        add     edi, edx
        setb    al
        or      al, cl
        xor     al, 1
        ret

@llvmbot
Copy link
Collaborator

llvmbot commented May 17, 2024

@llvm/issue-subscribers-backend-x86

Author: Karl Meakin (Kmeakin)

This code performs a sequence of 3 adds and sets `w0` if none of the adds overflow. The generated code calculates `!(add1_overflowed | add2_overflowed)`. It would be more efficient to generate `(add1_didnt_overflow & add2_didnt_overflow)`
#[no_mangle]
pub fn checked_add3(a: u32, b: u32, c: u32) -&gt; Option&lt;u32&gt; {
    a.checked_add(b)?.checked_add(c)
}

generated AArch64 code:

checked_add3:
        adds    w8, w0, w1
        cset    w9, hs
        adds    w1, w8, w2
        csinc   w8, w9, wzr, lo
        eor     w0, w8, #<!-- -->0x1
        ret

Optimal AArch64 code:

checked_add2:
        adds    w1, w0, w1
        cset    w0, lo
        ret

checked_add3:
        adds    w8, w0, w1
        cset    w9, lo
        adds    w1, w8, w2
        csel    w0, w9, wzr, lo
        ret

The generated code for x86 is similarly suboptimal:

checked_add3:
        add     edi, esi
        setb    cl
        add     edi, edx
        setb    al
        or      al, cl
        xor     al, 1
        ret

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants