[RISCV] Missed opportunity in memory overlap check idiom #56518

Closed
preames opened this issue Jul 13, 2022 · 5 comments

Comments

@preames
Collaborator

preames commented Jul 13, 2022

The LoopVectorizer will emit a memory overlap check of the form:

define i1 @reduced(ptr %c, ptr %a, ptr %b) {
entry:
  %b14 = ptrtoint ptr %b to i64
  %a13 = ptrtoint ptr %a to i64
  %c12 = ptrtoint ptr %c to i64
  %vscale = call i64 @llvm.vscale.i64()
  %sub2 = sub i64 %c12, %a13
  %diff.check = icmp ult i64 %sub2, %vscale
  %sub3 = sub i64 %c12, %b14
  %diff.check15 = icmp ult i64 %sub3, %vscale
  %conflict.rdx = or i1 %diff.check, %diff.check15
  ret i1 %conflict.rdx
}

declare i64 @llvm.vscale.i64()
declare void @foo()
declare void @bar()

./llc -march=riscv64 -mattr=+v,+m,+zba,+zbb < vector_overlap.ll -O3 currently results in:

	csrr	a3, vlenb
	srli	a3, a3, 3
	sub	a1, a0, a1
	sltu	a1, a1, a3
	sub	a0, a0, a2
	sltu	a0, a0, a3
	or	a0, a1, a0
	ret

Unless I'm missing something, we should be able to rewrite this as:

	csrr	a3, vlenb
	srli	a3, a3, 3
	sub	a1, a0, a1
	sub	a0, a0, a2
	minu	a0, a0, a1
	sltu	a0, a0, a3
	ret

This does require zbb, but the command line above explicitly includes that.
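
For completeness, the suggested rewrite rests on the unsigned identity (a < c) || (b < c) <=> minu(a, b) < c. Here is a minimal C sketch (my own illustration, not part of the reported test case) that exhaustively verifies it for 8-bit values:

#include <assert.h>
#include <stdint.h>

/* Unsigned minimum, mirroring the minu instruction from zbb. */
static uint8_t minu8(uint8_t a, uint8_t b) { return a < b ? a : b; }

int main(void) {
  /* Exhaustively check (a < c) || (b < c)  ==  minu(a, b) < c
     over all 8-bit unsigned triples. */
  for (unsigned a = 0; a <= 255; a++)
    for (unsigned b = 0; b <= 255; b++)
      for (unsigned c = 0; c <= 255; c++)
        assert(((a < c) || (b < c)) == (minu8(a, b) < c));
  return 0;
}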

Separately, there appears to be something weird going on with block placement and branch inversion. Compare the following inputs and outputs:

define void @test(ptr %c, ptr %a, ptr %b) {
entry:
  %b14 = ptrtoint ptr %b to i64
  %a13 = ptrtoint ptr %a to i64
  %c12 = ptrtoint ptr %c to i64
  %vscale = call i64 @llvm.vscale.i64()
  %sub2 = sub i64 %c12, %a13
  %diff.check = icmp ult i64 %sub2, %vscale
  %sub3 = sub i64 %c12, %b14
  %diff.check15 = icmp ult i64 %sub3, %vscale
  %conflict.rdx = or i1 %diff.check, %diff.check15
  br i1 %conflict.rdx, label %taken, label %untaken

taken:
  call void @foo()
  ret void

untaken:
  call void @bar()
  ret void
}

define void @test2(ptr %c, ptr %a, ptr %b) {
entry:
  %b14 = ptrtoint ptr %b to i64
  %a13 = ptrtoint ptr %a to i64
  %c12 = ptrtoint ptr %c to i64
  %vscale = call i64 @llvm.vscale.i64()
  %sub2 = sub i64 %c12, %a13
  %diff.check = icmp ult i64 %sub2, %vscale
  %sub3 = sub i64 %c12, %b14
  %diff.check15 = icmp ult i64 %sub3, %vscale
  %conflict.rdx = or i1 %diff.check, %diff.check15
  br i1 %conflict.rdx, label %taken, label %untaken

untaken:
  call void @bar()
  ret void

taken:
  call void @foo()
  ret void
}

declare i64 @llvm.vscale.i64()
declare void @foo()
declare void @bar()

Produces:

test:                                   # @test
	.cfi_startproc
# %bb.0:                                # %entry
	addi	sp, sp, -16
	.cfi_def_cfa_offset 16
	sd	ra, 8(sp)                       # 8-byte Folded Spill
	.cfi_offset ra, -8
	csrr	a3, vlenb
	srli	a3, a3, 3
	sub	a1, a0, a1
	sltu	a1, a1, a3
	xori	a1, a1, 1 # <-- huh?
	sub	a0, a0, a2
	sltu	a0, a0, a3
	xori	a0, a0, 1 # <-- huh?
	and	a0, a1, a0  # <-- huh?
	bnez	a0, .LBB0_2
# %bb.1:                                # %taken
	call	foo@plt
	ld	ra, 8(sp)                       # 8-byte Folded Reload
	addi	sp, sp, 16
	ret
.LBB0_2:                                # %untaken
	call	bar@plt
	ld	ra, 8(sp)                       # 8-byte Folded Reload
	addi	sp, sp, 16
	ret
.Lfunc_end0:
	.size	test, .Lfunc_end0-test
	.cfi_endproc
                                        # -- End function
	.globl	test2                           # -- Begin function test2
	.p2align	2
	.type	test2,@function
test2:                                  # @test2
	.cfi_startproc
# %bb.0:                                # %entry
	addi	sp, sp, -16
	.cfi_def_cfa_offset 16
	sd	ra, 8(sp)                       # 8-byte Folded Spill
	.cfi_offset ra, -8
	csrr	a3, vlenb
	srli	a3, a3, 3
	sub	a1, a0, a1
	sltu	a1, a1, a3
	sub	a0, a0, a2
	sltu	a0, a0, a3
	or	a0, a1, a0
	beqz	a0, .LBB1_2
# %bb.1:                                # %taken
	call	foo@plt
	ld	ra, 8(sp)                       # 8-byte Folded Reload
	addi	sp, sp, 16
	ret
.LBB1_2:                                # %untaken
	call	bar@plt
	ld	ra, 8(sp)                       # 8-byte Folded Reload
	addi	sp, sp, 16
	ret

I believe the second is a separate issue. I'm filing them together only because I'm not sure whether it is somehow related to the first one. We can split it into its own bug if it turns out not to be.

@llvmbot
Collaborator

llvmbot commented Jul 13, 2022

@llvm/issue-subscribers-backend-risc-v

@topperc
Collaborator

topperc commented Jul 13, 2022

The xori with 1 comes from isel because we don't have sgeu, so we invert an sltu with an xori. There's nothing downstream that can apply De Morgan's law to it.
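
(For illustration, the inversion described here is just the unsigned identity a >= b == (a < b) ^ 1; a quick C sketch of my own that checks it:)

#include <assert.h>

int main(void) {
  /* RISC-V has sltu (set if less-than, unsigned) but no sgeu,
     so isel materializes a >= b as (a < b) ^ 1.
     Exhaustive 8-bit check of that identity. */
  for (unsigned a = 0; a <= 255; a++)
    for (unsigned b = 0; b <= 255; b++)
      assert((a >= b) == ((a < b) ^ 1));
  return 0;
}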

@iabg-sc
Contributor

iabg-sc commented Sep 6, 2022

I am investigating this issue and have already created patterns for @reduced, @test, and @test2. For @reduced and @test2,

sltu	a1, a1, a3
sltu	a0, a0, a3
or     a0, a1, a0

can be changed directly to

minu  a0, a0, a1
sltu  a0, a0, a3

but in the case of @test, the condition was transformed with De Morgan's law: ((a < c) or (b < c)) becomes !((a >= c) and (b >= c)), which can therefore be folded to min(a, b) >= c. Since we do not have sgeu, the logic above is lowered as (min(a, b) < c) xor 1.
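
To spell that chain out, here is a C sketch of my own that checks each step of the derivation exhaustively for 8-bit unsigned values:

#include <assert.h>
#include <stdint.h>

static uint8_t minu8(uint8_t a, uint8_t b) { return a < b ? a : b; }

int main(void) {
  for (unsigned a = 0; a <= 255; a++)
    for (unsigned b = 0; b <= 255; b++)
      for (unsigned c = 0; c <= 255; c++) {
        /* The inverted branch condition in @test. */
        int inv = !((a < c) || (b < c));
        /* De Morgan: !((a < c) || (b < c)) == (a >= c) && (b >= c). */
        assert(inv == ((a >= c) && (b >= c)));
        /* Fold the two compares: min(a, b) >= c. */
        assert(inv == (minu8(a, b) >= c));
        /* No sgeu, so lower as (min(a, b) < c) xor 1. */
        assert(inv == ((minu8(a, b) < c) ^ 1));
      }
  return 0;
}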

Additionally, there are many more patterns to which this kind of optimization applies:

Old                                   New
i = a < c; j = b < c; res = i or j    min(a, b) < c
i = a < c; j = b < c; res = i and j   max(a, b) < c
i = a > c; j = b > c; res = i or j    max(a, b) > c
i = a > c; j = b > c; res = i and j   min(a, b) > c

Plus, a < b is the same as b > a, which gives 4x4 = 16 variants, and the same with non-strict comparisons gives 32 variants. I am not sure whether we have to handle all of these cases, as it is mostly copy-paste.
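
Extending the 8-bit brute-force check to all four base rows of the table (again a sketch of my own, not the actual DAGCombine code):

#include <assert.h>
#include <stdint.h>

static uint8_t minu8(uint8_t a, uint8_t b) { return a < b ? a : b; }
static uint8_t maxu8(uint8_t a, uint8_t b) { return a > b ? a : b; }

int main(void) {
  /* One assert per row of the Old/New table above. */
  for (unsigned a = 0; a <= 255; a++)
    for (unsigned b = 0; b <= 255; b++)
      for (unsigned c = 0; c <= 255; c++) {
        assert(((a < c) || (b < c))  == (minu8(a, b) < c));
        assert(((a < c) && (b < c)) == (maxu8(a, b) < c));
        assert(((a > c) || (b > c))  == (maxu8(a, b) > c));
        assert(((a > c) && (b > c)) == (minu8(a, b) > c));
      }
  return 0;
}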

@iabg-sc
Contributor

iabg-sc commented Sep 20, 2022

Patch with the possible cases is up for review:
https://reviews.llvm.org/D134277

@iabg-sc
Contributor

iabg-sc commented Dec 23, 2022

Done.
Commit
