Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RISCV] Inefficient code generated for byte atomicrmw xchg when the input is a constant zero #64090

Closed
Amanieu opened this issue Jul 25, 2023 · 10 comments

Comments

@Amanieu
Copy link
Contributor

Amanieu commented Jul 25, 2023

Noticed while working on rust-lang/rust#114034. When the replacement value is known to be zero then we should lower to an amoand instead of a LL/SC loop.

IR

define zeroext i8 @swap_zero(ptr %0) {
  %2 = atomicrmw xchg ptr %0, i8 0 monotonic
  ret i8 %2
}

Current output

swap_zero:                              # @swap_zero
        andi    a1, a0, -4
        slli    a0, a0, 3
        li      a2, 255
        sllw    a2, a2, a0
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        lr.w    a3, (a1)
        li      a4, 0
        xor     a4, a3, a4
        and     a4, a4, a2
        xor     a4, a3, a4
        sc.w    a4, a4, (a1)
        bnez    a4, .LBB0_1
        srlw    a0, a3, a0
        andi    a0, a0, 255
        ret

Ideal output

swap_zero:                              # @swap_zero
        andi    a1, a0, -4
        slli    a0, a0, 3
        li      a2, 255
        sllw    a2, a2, a0
        not     a2, a2
        amoand  a1, a2, (a1)
        srlw    a0, a1, a0
        andi    a0, a0, 255
        ret
@llvmbot
Copy link
Collaborator

llvmbot commented Jul 25, 2023

@llvm/issue-subscribers-backend-risc-v

@riking
Copy link

riking commented Jul 25, 2023

If you black_box the false, we get a single byte stack spill followed by nearly the desired codegen: https://rust.godbolt.org/z/dWPPbqEhP

@topperc
Copy link
Collaborator

topperc commented Jul 25, 2023

InstCombine aggressively turns atomicrmw and with 0 into atomicrmw xchg. Maybe AtomicExpandPass needs to reverse this for partial atomics based on the target.

@appujee
Copy link
Contributor

appujee commented Jul 25, 2023

Do you have llc or opt command that will repro the above assembly?

From https://godbolt.org/z/54r8h56cq

llc generated __atomic_exchange_1 instead.

@topperc
Copy link
Collaborator

topperc commented Jul 25, 2023

Do you have llc or opt command that will repro the above assembly?

From https://godbolt.org/z/54r8h56cq

llc generated __atomic_exchange_1 instead.

-mattr=+a

@appujee
Copy link
Contributor

appujee commented Jul 25, 2023

Will generating amoswap followed by andi reg, reg, 255 have the same semantics?

@topperc
Copy link
Collaborator

topperc commented Jul 25, 2023

Will generating amoswap followed by andi reg, reg, 255 have the same semantics?

No. We need to write 0 to a single byte. amoswap writes 32 or 64 bits, but we don't know what value to put in the other bytes until after the read. Using amoand also writes 32 or 64 bits, but would allow to mask out the single byte that needs to be zeroed.

@appujee
Copy link
Contributor

appujee commented Jul 26, 2023

ah, i see there is only amoswap.w and amoswap.d versions

@appujee
Copy link
Contributor

appujee commented Jul 26, 2023

Just confirming that we do generate amoswap for 32 bits. https://godbolt.org/z/8MxMWe5aE

@asb
Copy link
Contributor

asb commented Aug 1, 2023

One candidate approach posted as D156801 (sorry Craig, only seeing now you self-assigned this bug last night).

@asb asb closed this as completed in be0dac2 Aug 2, 2023
doru1004 pushed a commit to doru1004/llvm-project that referenced this issue Aug 3, 2023
doru1004 pushed a commit to doru1004/llvm-project that referenced this issue Aug 3, 2023
As noted in <llvm#64090>, it's
more efficient to lower a partword 'atomicrmw xchg a, 0` to and amoand
with appropriate mask. There are a range of possible ways to go about
this - e.g. writing a combine based on the
`llvm.riscv.masked.atomicrmw.xchg` intrinsic, or introducing a new
interface to AtomicExpandPass to allow target-specific atomics
conversions, or trying to lift the conversion into AtomicExpandPass
itself based on querying some target hook. Ultimately I've gone with
what appears to be the simplest approach - just covering this case in
emitMaskedAtomicRMWIntrinsic. I perhaps should have given that hook a
different name way back when it was introduced.

This also handles the `atomicrmw xchg a, -1` case suggested by Craig
during review.

Fixes llvm#64090

Differential Revision: https://reviews.llvm.org/D156801
razmser pushed a commit to razmser/llvm-project that referenced this issue Sep 8, 2023
razmser pushed a commit to razmser/llvm-project that referenced this issue Sep 8, 2023
As noted in <llvm#64090>, it's
more efficient to lower a partword 'atomicrmw xchg a, 0` to and amoand
with appropriate mask. There are a range of possible ways to go about
this - e.g. writing a combine based on the
`llvm.riscv.masked.atomicrmw.xchg` intrinsic, or introducing a new
interface to AtomicExpandPass to allow target-specific atomics
conversions, or trying to lift the conversion into AtomicExpandPass
itself based on querying some target hook. Ultimately I've gone with
what appears to be the simplest approach - just covering this case in
emitMaskedAtomicRMWIntrinsic. I perhaps should have given that hook a
different name way back when it was introduced.

This also handles the `atomicrmw xchg a, -1` case suggested by Craig
during review.

Fixes llvm#64090

Differential Revision: https://reviews.llvm.org/D156801
SixWeining pushed a commit that referenced this issue Sep 26, 2023
Similar to D156801 for RISCV.

Link: rust-lang/rust#114034
Link: #64090

Reviewed By: SixWeining, xen0n

Differential Revision: https://reviews.llvm.org/D159252
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants