[RISCV] Missed opportunity in memory overlap check idiom #56518

Closed
preames opened this issue Jul 13, 2022 · 5 comments

Comments

@preames
Collaborator

preames commented Jul 13, 2022

The LoopVectorizer will emit a memory overlap check of the form:

define i1 @reduced(ptr %c, ptr %a, ptr %b) {
entry:
  %b14 = ptrtoint ptr %b to i64
  %a13 = ptrtoint ptr %a to i64
  %c12 = ptrtoint ptr %c to i64
  %vscale = call i64 @llvm.vscale.i64()
  %sub2 = sub i64 %c12, %a13
  %diff.check = icmp ult i64 %sub2, %vscale
  %sub3 = sub i64 %c12, %b14
  %diff.check15 = icmp ult i64 %sub3, %vscale
  %conflict.rdx = or i1 %diff.check, %diff.check15
  ret i1 %conflict.rdx
}

declare i64 @llvm.vscale.i64()
declare void @foo()
declare void @bar()

./llc -march=riscv64 -mattr=+v,+m,+zba,+zbb < vector_overlap.ll -O3 currently results in:

	csrr	a3, vlenb
	srli	a3, a3, 3
	sub	a1, a0, a1
	sltu	a1, a1, a3
	sub	a0, a0, a2
	sltu	a0, a0, a3
	or	a0, a1, a0
	ret

Unless I'm missing something, we should be able to rewrite this as:

	csrr	a3, vlenb
	srli	a3, a3, 3
	sub	a1, a0, a1
	sub	a0, a0, a2
	minu	a0, a0, a1
	sltu	a0, a0, a3
	ret

This does require zbb, but the command line above explicitly includes that.
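
For completeness, the suggested rewrite rests on the unsigned identity (a < c) || (b < c) <=> minu(a, b) < c. Here is a minimal C sketch (my own illustration, not part of the reported test case) that exhaustively verifies it for 8-bit values:

#include <assert.h>
#include <stdint.h>

/* Unsigned minimum, mirroring the minu instruction from zbb. */
static uint8_t minu8(uint8_t a, uint8_t b) { return a < b ? a : b; }

int main(void) {
  /* Exhaustively check (a < c) || (b < c)  ==  minu(a, b) < c
     over all 8-bit unsigned triples. */
  for (unsigned a = 0; a <= 255; a++)
    for (unsigned b = 0; b <= 255; b++)
      for (unsigned c = 0; c <= 255; c++)
        assert(((a < c) || (b < c)) == (minu8(a, b) < c));
  return 0;
}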

Separately, there appears to be something weird going on with block placement and branch inversion. Compare the following inputs and outputs:

define void @test(ptr %c, ptr %a, ptr %b) {
entry:
  %b14 = ptrtoint ptr %b to i64
  %a13 = ptrtoint ptr %a to i64
  %c12 = ptrtoint ptr %c to i64
  %vscale = call i64 @llvm.vscale.i64()
  %sub2 = sub i64 %c12, %a13
  %diff.check = icmp ult i64 %sub2, %vscale
  %sub3 = sub i64 %c12, %b14
  %diff.check15 = icmp ult i64 %sub3, %vscale
  %conflict.rdx = or i1 %diff.check, %diff.check15
  br i1 %conflict.rdx, label %taken, label %untaken

taken:
  call void @foo()
  ret void

untaken:
  call void @bar()
  ret void
}

define void @test2(ptr %c, ptr %a, ptr %b) {
entry:
  %b14 = ptrtoint ptr %b to i64
  %a13 = ptrtoint ptr %a to i64
  %c12 = ptrtoint ptr %c to i64
  %vscale = call i64 @llvm.vscale.i64()
  %sub2 = sub i64 %c12, %a13
  %diff.check = icmp ult i64 %sub2, %vscale
  %sub3 = sub i64 %c12, %b14
  %diff.check15 = icmp ult i64 %sub3, %vscale
  %conflict.rdx = or i1 %diff.check, %diff.check15
  br i1 %conflict.rdx, label %taken, label %untaken

untaken:
  call void @bar()
  ret void

taken:
  call void @foo()
  ret void
}

declare i64 @llvm.vscale.i64()
declare void @foo()
declare void @bar()

Produces:

test:                                   # @test
	.cfi_startproc
# %bb.0:                                # %entry
	addi	sp, sp, -16
	.cfi_def_cfa_offset 16
	sd	ra, 8(sp)                       # 8-byte Folded Spill
	.cfi_offset ra, -8
	csrr	a3, vlenb
	srli	a3, a3, 3
	sub	a1, a0, a1
	sltu	a1, a1, a3
	xori	a1, a1, 1 # <-- huh?
	sub	a0, a0, a2
	sltu	a0, a0, a3
	xori	a0, a0, 1 # <-- huh?
	and	a0, a1, a0  # <-- huh?
	bnez	a0, .LBB0_2
# %bb.1:                                # %taken
	call	foo@plt
	ld	ra, 8(sp)                       # 8-byte Folded Reload
	addi	sp, sp, 16
	ret
.LBB0_2:                                # %untaken
	call	bar@plt
	ld	ra, 8(sp)                       # 8-byte Folded Reload
	addi	sp, sp, 16
	ret
.Lfunc_end0:
	.size	test, .Lfunc_end0-test
	.cfi_endproc
                                        # -- End function
	.globl	test2                           # -- Begin function test2
	.p2align	2
	.type	test2,@function
test2:                                  # @test2
	.cfi_startproc
# %bb.0:                                # %entry
	addi	sp, sp, -16
	.cfi_def_cfa_offset 16
	sd	ra, 8(sp)                       # 8-byte Folded Spill
	.cfi_offset ra, -8
	csrr	a3, vlenb
	srli	a3, a3, 3
	sub	a1, a0, a1
	sltu	a1, a1, a3
	sub	a0, a0, a2
	sltu	a0, a0, a3
	or	a0, a1, a0
	beqz	a0, .LBB1_2
# %bb.1:                                # %taken
	call	foo@plt
	ld	ra, 8(sp)                       # 8-byte Folded Reload
	addi	sp, sp, 16
	ret
.LBB1_2:                                # %untaken
	call	bar@plt
	ld	ra, 8(sp)                       # 8-byte Folded Reload
	addi	sp, sp, 16
	ret

I believe the second is a separate issue. I'm filing them together only because I'm not sure whether it is somehow related to the first one. We can split it into its own bug if it turns out not to be.

@llvmbot
Collaborator

llvmbot commented Jul 13, 2022

@llvm/issue-subscribers-backend-risc-v

@topperc
Collaborator

topperc commented Jul 13, 2022

The xori with 1 comes from isel because we don't have sgeu, so we invert an sltu with an xori. There's nothing downstream that can apply De Morgan's law to it.
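
(For illustration, the inversion described here is just the unsigned identity a >= b == (a < b) ^ 1; a quick C sketch of my own that checks it:)

#include <assert.h>

int main(void) {
  /* RISC-V has sltu (set if less-than, unsigned) but no sgeu,
     so isel materializes a >= b as (a < b) ^ 1.
     Exhaustive 8-bit check of that identity. */
  for (unsigned a = 0; a <= 255; a++)
    for (unsigned b = 0; b <= 255; b++)
      assert((a >= b) == ((a < b) ^ 1));
  return 0;
}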

@iabg-sc
Contributor

iabg-sc commented Sep 6, 2022

I am investigating this issue and have already created patterns for @reduced, @test, and @test2. For @reduced and @test2,

sltu	a1, a1, a3
sltu	a0, a0, a3
or     a0, a1, a0

can be changed directly to

minu  a0, a0, a1
sltu  a0, a0, a3

but in the case of @test, the condition was transformed with De Morgan's law: ((a < c) or (b < c)) becomes !((a >= c) and (b >= c)), which can therefore be folded to min(a, b) >= c. Since we do not have sgeu, the logic above is lowered as (min(a, b) < c) xor 1.
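
To spell that chain out, here is a C sketch of my own that checks each step of the derivation exhaustively for 8-bit unsigned values:

#include <assert.h>
#include <stdint.h>

static uint8_t minu8(uint8_t a, uint8_t b) { return a < b ? a : b; }

int main(void) {
  for (unsigned a = 0; a <= 255; a++)
    for (unsigned b = 0; b <= 255; b++)
      for (unsigned c = 0; c <= 255; c++) {
        /* The inverted branch condition in @test. */
        int inv = !((a < c) || (b < c));
        /* De Morgan: !((a < c) || (b < c)) == (a >= c) && (b >= c). */
        assert(inv == ((a >= c) && (b >= c)));
        /* Fold the two compares: min(a, b) >= c. */
        assert(inv == (minu8(a, b) >= c));
        /* No sgeu, so lower as (min(a, b) < c) xor 1. */
        assert(inv == ((minu8(a, b) < c) ^ 1));
      }
  return 0;
}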

Additionally, there are many more patterns to which this kind of optimization applies:

Old                                   New
i = a < c; j = b < c; res = i or j    min(a, b) < c
i = a < c; j = b < c; res = i and j   max(a, b) < c
i = a > c; j = b > c; res = i or j    max(a, b) > c
i = a > c; j = b > c; res = i and j   min(a, b) > c

Plus, a < b is the same as b > a, which gives 4x4 = 16 variants, and the same with non-strict comparisons gives 32 variants. I am not sure whether we have to handle all of these cases, as it is mostly copy-paste.
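
Extending the 8-bit brute-force check to all four base rows of the table (again a sketch of my own, not the actual DAGCombine code):

#include <assert.h>
#include <stdint.h>

static uint8_t minu8(uint8_t a, uint8_t b) { return a < b ? a : b; }
static uint8_t maxu8(uint8_t a, uint8_t b) { return a > b ? a : b; }

int main(void) {
  /* One assert per row of the Old/New table above. */
  for (unsigned a = 0; a <= 255; a++)
    for (unsigned b = 0; b <= 255; b++)
      for (unsigned c = 0; c <= 255; c++) {
        assert(((a < c) || (b < c))  == (minu8(a, b) < c));
        assert(((a < c) && (b < c)) == (maxu8(a, b) < c));
        assert(((a > c) || (b > c))  == (maxu8(a, b) > c));
        assert(((a > c) && (b > c)) == (minu8(a, b) > c));
      }
  return 0;
}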

@iabg-sc
Contributor

iabg-sc commented Sep 20, 2022

Patch with the possible cases is up for review:
https://reviews.llvm.org/D134277

@iabg-sc
Contributor

iabg-sc commented Dec 23, 2022

Done.
Commit
