[RISCV] Inefficient code generated for byte atomicrmw xchg when the input is a constant zero #64090

Amanieu · 2023-07-25T01:06:11Z

Noticed while working on rust-lang/rust#114034. When the replacement value is known to be zero then we should lower to an amoand instead of a LL/SC loop.

IR

define zeroext i8 @swap_zero(ptr %0) {
  %2 = atomicrmw xchg ptr %0, i8 0 monotonic
  ret i8 %2
}

Current output

swap_zero:                              # @swap_zero
        andi    a1, a0, -4
        slli    a0, a0, 3
        li      a2, 255
        sllw    a2, a2, a0
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        lr.w    a3, (a1)
        li      a4, 0
        xor     a4, a3, a4
        and     a4, a4, a2
        xor     a4, a3, a4
        sc.w    a4, a4, (a1)
        bnez    a4, .LBB0_1
        srlw    a0, a3, a0
        andi    a0, a0, 255
        ret

Ideal output

swap_zero:                              # @swap_zero
        andi    a1, a0, -4
        slli    a0, a0, 3
        li      a2, 255
        sllw    a2, a2, a0
        not     a2, a2
        amoand  a1, a2, (a1)
        srlw    a0, a1, a0
        andi    a0, a0, 255
        ret

The text was updated successfully, but these errors were encountered:

llvmbot · 2023-07-25T01:07:39Z

@llvm/issue-subscribers-backend-risc-v

riking · 2023-07-25T21:52:04Z

If you black_box the false, we get a single byte stack spill followed by nearly the desired codegen: https://rust.godbolt.org/z/dWPPbqEhP

topperc · 2023-07-25T22:36:56Z

InstCombine aggressively turns atomicrmw and with 0 into atomicrmw xchg. Maybe AtomicExpandPass needs to reverse this for partial atomics based on the target.

appujee · 2023-07-25T23:01:34Z

Do you have llc or opt command that will repro the above assembly?

From https://godbolt.org/z/54r8h56cq

llc generated __atomic_exchange_1 instead.

topperc · 2023-07-25T23:02:25Z

Do you have llc or opt command that will repro the above assembly?

From https://godbolt.org/z/54r8h56cq

llc generated __atomic_exchange_1 instead.

-mattr=+a

appujee · 2023-07-25T23:06:59Z

Will generating amoswap followed by andi reg, reg, 255 have the same semantics?

topperc · 2023-07-25T23:14:21Z

Will generating amoswap followed by andi reg, reg, 255 have the same semantics?

No. We need to write 0 to a single byte. amoswap writes 32 or 64 bits, but we don't know what value to put in the other bytes until after the read. Using amoand also writes 32 or 64 bits, but would allow to mask out the single byte that needs to be zeroed.

appujee · 2023-07-26T00:08:38Z

ah, i see there is only amoswap.w and amoswap.d versions

appujee · 2023-07-26T16:27:18Z

Just confirming that we do generate amoswap for 32 bits. https://godbolt.org/z/8MxMWe5aE

#64090 <#64090> A forthcoming patch addresses these cases.

asb · 2023-08-01T14:55:50Z

One candidate approach posted as D156801 (sorry Craig, only seeing now you self-assigned this bug last night).

llvm#64090 <llvm#64090> A forthcoming patch addresses these cases.

As noted in <llvm#64090>, it's more efficient to lower a partword 'atomicrmw xchg a, 0` to and amoand with appropriate mask. There are a range of possible ways to go about this - e.g. writing a combine based on the `llvm.riscv.masked.atomicrmw.xchg` intrinsic, or introducing a new interface to AtomicExpandPass to allow target-specific atomics conversions, or trying to lift the conversion into AtomicExpandPass itself based on querying some target hook. Ultimately I've gone with what appears to be the simplest approach - just covering this case in emitMaskedAtomicRMWIntrinsic. I perhaps should have given that hook a different name way back when it was introduced. This also handles the `atomicrmw xchg a, -1` case suggested by Craig during review. Fixes llvm#64090 Differential Revision: https://reviews.llvm.org/D156801

llvm#64090 <llvm#64090> A forthcoming patch addresses these cases.

As noted in <llvm#64090>, it's more efficient to lower a partword 'atomicrmw xchg a, 0` to and amoand with appropriate mask. There are a range of possible ways to go about this - e.g. writing a combine based on the `llvm.riscv.masked.atomicrmw.xchg` intrinsic, or introducing a new interface to AtomicExpandPass to allow target-specific atomics conversions, or trying to lift the conversion into AtomicExpandPass itself based on querying some target hook. Ultimately I've gone with what appears to be the simplest approach - just covering this case in emitMaskedAtomicRMWIntrinsic. I perhaps should have given that hook a different name way back when it was introduced. This also handles the `atomicrmw xchg a, -1` case suggested by Craig during review. Fixes llvm#64090 Differential Revision: https://reviews.llvm.org/D156801

Similar to D156801 for RISCV. Link: rust-lang/rust#114034 Link: #64090 Reviewed By: SixWeining, xen0n Differential Revision: https://reviews.llvm.org/D159252

github-actions bot added the new issue label Jul 25, 2023

EugeneZelenko added backend:RISC-V missed-optimization and removed new issue labels Jul 25, 2023

SUPERCILEX mentioned this issue Jul 29, 2023

AtomicBool layout changes to become amo biased rather than bool biased rust-lang/libs-team#249

Closed

topperc self-assigned this Aug 1, 2023

asb added a commit that referenced this issue Aug 1, 2023

[RISCV][test] Add atomicrmw test cases for suboptimal codegen report in

fdac86c

#64090 <#64090> A forthcoming patch addresses these cases.

asb closed this as completed in be0dac2 Aug 2, 2023

doru1004 pushed a commit to doru1004/llvm-project that referenced this issue Aug 3, 2023

[RISCV][test] Add atomicrmw test cases for suboptimal codegen report in

5731a40

llvm#64090 <llvm#64090> A forthcoming patch addresses these cases.

razmser pushed a commit to razmser/llvm-project that referenced this issue Sep 8, 2023

[RISCV][test] Add atomicrmw test cases for suboptimal codegen report in

4ef7bf9

llvm#64090 <llvm#64090> A forthcoming patch addresses these cases.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RISCV] Inefficient code generated for byte atomicrmw xchg when the input is a constant zero #64090

[RISCV] Inefficient code generated for byte atomicrmw xchg when the input is a constant zero #64090

Amanieu commented Jul 25, 2023

llvmbot commented Jul 25, 2023

riking commented Jul 25, 2023

topperc commented Jul 25, 2023

appujee commented Jul 25, 2023

topperc commented Jul 25, 2023

appujee commented Jul 25, 2023

topperc commented Jul 25, 2023

appujee commented Jul 26, 2023

appujee commented Jul 26, 2023

asb commented Aug 1, 2023

[RISCV] Inefficient code generated for byte atomicrmw xchg when the input is a constant zero #64090

[RISCV] Inefficient code generated for byte atomicrmw xchg when the input is a constant zero #64090

Comments

Amanieu commented Jul 25, 2023

IR

Current output

Ideal output

llvmbot commented Jul 25, 2023

riking commented Jul 25, 2023

topperc commented Jul 25, 2023

appujee commented Jul 25, 2023

topperc commented Jul 25, 2023

appujee commented Jul 25, 2023

topperc commented Jul 25, 2023

appujee commented Jul 26, 2023

appujee commented Jul 26, 2023

asb commented Aug 1, 2023