[RISCV][POC] Should we be using ADD for disjoint or? #155669
Conversation
This is, at the moment, mostly a prompt for discussion as opposed to a patch I intend to land.

I happened to notice that c.add allows more registers than the c.or instruction. Thus, by using ADD instead of OR for a disjoint OR, we have the possibility of emitting more compressible instructions. I can see that this does happen in a couple of places in the tests, but I don't have a compelling example or anything.

If we wanted to do this with less test churn and confusion, we could do something like introduce an OR_DISJOINT pseudo and lower it to either OR or ADD extremely late, based on which registers got used.
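As background for why the substitution is legal: "disjoint" means no bit position is set in both operands, so adding them produces no carries and ADD computes exactly the same result as OR. A minimal C++ check of that identity:

```cpp
#include <cassert>
#include <cstdint>

int main() {
  // Disjoint operands: no bit position is set in both values.
  uint32_t hi = 0x2345u << 16;    // only upper-half bits set
  uint32_t lo = 0x1234u;          // only lower-half bits set
  assert((hi & lo) == 0);         // disjoint, so...
  assert((hi | lo) == (hi + lo)); // ...no carries: OR and ADD agree
  return 0;
}
```

This is the same guarantee the `or disjoint` IR flag encodes, and it is what `orDisjoint(N)` checks in the PatFrag in the patch below.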
@llvm/pr-subscribers-backend-risc-v
Author: Philip Reames (preames)
Changes: (same as the description above.) Patch is 501.67 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155669.diff — 82 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.td b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
index 23f5a848137c4..5b0973f13eb58 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
@@ -1465,11 +1465,12 @@ def : PatGprUimmLog2XLen<shl, SLLI>;
def : PatGprUimmLog2XLen<srl, SRLI>;
def : PatGprUimmLog2XLen<sra, SRAI>;
-// Select 'or' as ADDI if the immediate bits are known to be 0 in $rs1. This
-// can improve compressibility.
def riscv_or_disjoint : PatFrag<(ops node:$lhs, node:$rhs), (or node:$lhs, node:$rhs),[{
return orDisjoint(N);
}]>;
+// Select 'or' as ADD or ADDI if known disjoint. There is no c.ori, and c.add allows
+// more registers than c.or.
+def : PatGprGpr<riscv_or_disjoint, ADD>;
def : PatGprSimm12<riscv_or_disjoint, ADDI>;
def add_like : PatFrags<(ops node:$lhs, node:$rhs),
diff --git a/llvm/test/CodeGen/RISCV/add-before-shl.ll b/llvm/test/CodeGen/RISCV/add-before-shl.ll
index 35a39b89a2cb7..4cb25fcaf1286 100644
--- a/llvm/test/CodeGen/RISCV/add-before-shl.ll
+++ b/llvm/test/CodeGen/RISCV/add-before-shl.ll
@@ -172,14 +172,14 @@ define i128 @add_wide_operand(i128 %a) nounwind {
; RV32I-NEXT: srli a5, a2, 29
; RV32I-NEXT: slli a6, a3, 3
; RV32I-NEXT: srli a3, a3, 29
-; RV32I-NEXT: or a5, a6, a5
+; RV32I-NEXT: add a5, a6, a5
; RV32I-NEXT: slli a6, a4, 3
-; RV32I-NEXT: or a3, a6, a3
+; RV32I-NEXT: add a3, a6, a3
; RV32I-NEXT: lui a6, 128
; RV32I-NEXT: srli a4, a4, 29
; RV32I-NEXT: slli a1, a1, 3
; RV32I-NEXT: slli a2, a2, 3
-; RV32I-NEXT: or a1, a1, a4
+; RV32I-NEXT: add a1, a1, a4
; RV32I-NEXT: add a1, a1, a6
; RV32I-NEXT: sw a2, 0(a0)
; RV32I-NEXT: sw a5, 4(a0)
@@ -192,7 +192,7 @@ define i128 @add_wide_operand(i128 %a) nounwind {
; RV64I-NEXT: srli a2, a0, 61
; RV64I-NEXT: slli a1, a1, 3
; RV64I-NEXT: slli a0, a0, 3
-; RV64I-NEXT: or a1, a1, a2
+; RV64I-NEXT: add a1, a1, a2
; RV64I-NEXT: addi a2, zero, 1
; RV64I-NEXT: slli a2, a2, 51
; RV64I-NEXT: add a1, a1, a2
@@ -208,18 +208,18 @@ define i128 @add_wide_operand(i128 %a) nounwind {
; RV32C-NEXT: add a6, a4, a5
; RV32C-NEXT: srli a5, a2, 29
; RV32C-NEXT: slli a4, a3, 3
-; RV32C-NEXT: c.or a4, a5
+; RV32C-NEXT: c.add a4, a5
; RV32C-NEXT: srli a5, a1, 29
; RV32C-NEXT: c.srli a3, 29
; RV32C-NEXT: c.slli a1, 3
; RV32C-NEXT: c.slli a2, 3
; RV32C-NEXT: c.slli a6, 3
-; RV32C-NEXT: c.or a1, a3
-; RV32C-NEXT: or a3, a6, a5
+; RV32C-NEXT: c.add a1, a3
+; RV32C-NEXT: c.add a5, a6
; RV32C-NEXT: c.sw a2, 0(a0)
; RV32C-NEXT: c.sw a4, 4(a0)
; RV32C-NEXT: c.sw a1, 8(a0)
-; RV32C-NEXT: c.sw a3, 12(a0)
+; RV32C-NEXT: c.sw a5, 12(a0)
; RV32C-NEXT: c.jr ra
;
; RV64C-LABEL: add_wide_operand:
@@ -227,7 +227,7 @@ define i128 @add_wide_operand(i128 %a) nounwind {
; RV64C-NEXT: srli a2, a0, 61
; RV64C-NEXT: c.slli a1, 3
; RV64C-NEXT: c.slli a0, 3
-; RV64C-NEXT: c.or a1, a2
+; RV64C-NEXT: c.add a1, a2
; RV64C-NEXT: c.li a2, 1
; RV64C-NEXT: c.slli a2, 51
; RV64C-NEXT: c.add a1, a2
diff --git a/llvm/test/CodeGen/RISCV/addcarry.ll b/llvm/test/CodeGen/RISCV/addcarry.ll
index ff0d1e75c746c..4ba8ba9701c17 100644
--- a/llvm/test/CodeGen/RISCV/addcarry.ll
+++ b/llvm/test/CodeGen/RISCV/addcarry.ll
@@ -32,13 +32,13 @@ define i64 @addcarry(i64 %x, i64 %y) nounwind {
; RISCV32-NEXT: # %bb.3:
; RISCV32-NEXT: sub a5, a5, a0
; RISCV32-NEXT: .LBB0_4:
-; RISCV32-NEXT: slli a5, a5, 30
-; RISCV32-NEXT: srli a1, a4, 2
+; RISCV32-NEXT: slli a1, a5, 30
+; RISCV32-NEXT: srli a3, a4, 2
; RISCV32-NEXT: slli a4, a4, 30
; RISCV32-NEXT: mul a0, a0, a2
-; RISCV32-NEXT: or a1, a5, a1
+; RISCV32-NEXT: add a1, a1, a3
; RISCV32-NEXT: srli a0, a0, 2
-; RISCV32-NEXT: or a0, a4, a0
+; RISCV32-NEXT: add a0, a4, a0
; RISCV32-NEXT: ret
%tmp = call i64 @llvm.smul.fix.i64(i64 %x, i64 %y, i32 2);
ret i64 %tmp;
diff --git a/llvm/test/CodeGen/RISCV/alu64.ll b/llvm/test/CodeGen/RISCV/alu64.ll
index c7938a718de70..eac57b8f0c5f2 100644
--- a/llvm/test/CodeGen/RISCV/alu64.ll
+++ b/llvm/test/CodeGen/RISCV/alu64.ll
@@ -120,7 +120,7 @@ define i64 @slli(i64 %a) nounwind {
; RV32I: # %bb.0:
; RV32I-NEXT: srli a2, a0, 25
; RV32I-NEXT: slli a1, a1, 7
-; RV32I-NEXT: or a1, a1, a2
+; RV32I-NEXT: add a1, a1, a2
; RV32I-NEXT: slli a0, a0, 7
; RV32I-NEXT: ret
%1 = shl i64 %a, 7
@@ -137,7 +137,7 @@ define i64 @srli(i64 %a) nounwind {
; RV32I: # %bb.0:
; RV32I-NEXT: slli a2, a1, 24
; RV32I-NEXT: srli a0, a0, 8
-; RV32I-NEXT: or a0, a0, a2
+; RV32I-NEXT: add a0, a0, a2
; RV32I-NEXT: srli a1, a1, 8
; RV32I-NEXT: ret
%1 = lshr i64 %a, 8
@@ -154,7 +154,7 @@ define i64 @srai(i64 %a) nounwind {
; RV32I: # %bb.0:
; RV32I-NEXT: slli a2, a1, 23
; RV32I-NEXT: srli a0, a0, 9
-; RV32I-NEXT: or a0, a0, a2
+; RV32I-NEXT: add a0, a0, a2
; RV32I-NEXT: srai a1, a1, 9
; RV32I-NEXT: ret
%1 = ashr i64 %a, 9
diff --git a/llvm/test/CodeGen/RISCV/and-negpow2-cmp.ll b/llvm/test/CodeGen/RISCV/and-negpow2-cmp.ll
index b16672d3c4a16..ecdcbbe043d9a 100644
--- a/llvm/test/CodeGen/RISCV/and-negpow2-cmp.ll
+++ b/llvm/test/CodeGen/RISCV/and-negpow2-cmp.ll
@@ -8,7 +8,7 @@ define i1 @test1(i64 %x) {
; RV32-NEXT: slli a2, a1, 2
; RV32-NEXT: srli a0, a0, 30
; RV32-NEXT: srai a1, a1, 30
-; RV32-NEXT: or a0, a0, a2
+; RV32-NEXT: add a0, a0, a2
; RV32-NEXT: xori a0, a0, -2
; RV32-NEXT: not a1, a1
; RV32-NEXT: or a0, a0, a1
diff --git a/llvm/test/CodeGen/RISCV/avgceils.ll b/llvm/test/CodeGen/RISCV/avgceils.ll
index 64410fad6029a..6e3fe4ace89af 100644
--- a/llvm/test/CodeGen/RISCV/avgceils.ll
+++ b/llvm/test/CodeGen/RISCV/avgceils.ll
@@ -189,7 +189,7 @@ define i64 @test_fixed_i64(i64 %a0, i64 %a1) nounwind {
; RV32I-NEXT: slli a1, a1, 31
; RV32I-NEXT: srli a3, a3, 1
; RV32I-NEXT: sub a4, a4, a2
-; RV32I-NEXT: or a3, a3, a1
+; RV32I-NEXT: add a3, a3, a1
; RV32I-NEXT: sltu a1, a0, a3
; RV32I-NEXT: sub a1, a4, a1
; RV32I-NEXT: sub a0, a0, a3
@@ -220,7 +220,7 @@ define i64 @test_ext_i64(i64 %a0, i64 %a1) nounwind {
; RV32I-NEXT: slli a1, a1, 31
; RV32I-NEXT: srli a3, a3, 1
; RV32I-NEXT: sub a4, a4, a2
-; RV32I-NEXT: or a3, a3, a1
+; RV32I-NEXT: add a3, a3, a1
; RV32I-NEXT: sltu a1, a0, a3
; RV32I-NEXT: sub a1, a4, a1
; RV32I-NEXT: sub a0, a0, a3
diff --git a/llvm/test/CodeGen/RISCV/avgceilu.ll b/llvm/test/CodeGen/RISCV/avgceilu.ll
index 1c1d1cbfd12cb..3bde33d51f329 100644
--- a/llvm/test/CodeGen/RISCV/avgceilu.ll
+++ b/llvm/test/CodeGen/RISCV/avgceilu.ll
@@ -185,7 +185,7 @@ define i64 @test_fixed_i64(i64 %a0, i64 %a1) nounwind {
; RV32I-NEXT: slli a1, a1, 31
; RV32I-NEXT: srli a3, a3, 1
; RV32I-NEXT: sub a4, a4, a2
-; RV32I-NEXT: or a3, a3, a1
+; RV32I-NEXT: add a3, a3, a1
; RV32I-NEXT: sltu a1, a0, a3
; RV32I-NEXT: sub a1, a4, a1
; RV32I-NEXT: sub a0, a0, a3
@@ -216,7 +216,7 @@ define i64 @test_ext_i64(i64 %a0, i64 %a1) nounwind {
; RV32I-NEXT: slli a1, a1, 31
; RV32I-NEXT: srli a3, a3, 1
; RV32I-NEXT: sub a4, a4, a2
-; RV32I-NEXT: or a3, a3, a1
+; RV32I-NEXT: add a3, a3, a1
; RV32I-NEXT: sltu a1, a0, a3
; RV32I-NEXT: sub a1, a4, a1
; RV32I-NEXT: sub a0, a0, a3
diff --git a/llvm/test/CodeGen/RISCV/avgfloors.ll b/llvm/test/CodeGen/RISCV/avgfloors.ll
index b321f4c2f2939..4de15fb0e7220 100644
--- a/llvm/test/CodeGen/RISCV/avgfloors.ll
+++ b/llvm/test/CodeGen/RISCV/avgfloors.ll
@@ -175,7 +175,7 @@ define i64 @test_fixed_i64(i64 %a0, i64 %a1) nounwind {
; RV32I-NEXT: xor a4, a0, a2
; RV32I-NEXT: slli a1, a1, 31
; RV32I-NEXT: srli a4, a4, 1
-; RV32I-NEXT: or a1, a4, a1
+; RV32I-NEXT: add a1, a4, a1
; RV32I-NEXT: and a2, a0, a2
; RV32I-NEXT: add a0, a2, a1
; RV32I-NEXT: sltu a1, a0, a2
@@ -206,7 +206,7 @@ define i64 @test_ext_i64(i64 %a0, i64 %a1) nounwind {
; RV32I-NEXT: xor a4, a0, a2
; RV32I-NEXT: slli a1, a1, 31
; RV32I-NEXT: srli a4, a4, 1
-; RV32I-NEXT: or a1, a4, a1
+; RV32I-NEXT: add a1, a4, a1
; RV32I-NEXT: and a2, a0, a2
; RV32I-NEXT: add a0, a2, a1
; RV32I-NEXT: sltu a1, a0, a2
diff --git a/llvm/test/CodeGen/RISCV/avgflooru.ll b/llvm/test/CodeGen/RISCV/avgflooru.ll
index 2e56f3359434c..ef1867e6d049a 100644
--- a/llvm/test/CodeGen/RISCV/avgflooru.ll
+++ b/llvm/test/CodeGen/RISCV/avgflooru.ll
@@ -176,8 +176,8 @@ define i64 @test_fixed_i64(i64 %a0, i64 %a1) nounwind {
; RV32I-NEXT: srli a3, a1, 1
; RV32I-NEXT: slli a4, a1, 31
; RV32I-NEXT: srli a0, a0, 1
-; RV32I-NEXT: or a1, a3, a2
-; RV32I-NEXT: or a0, a0, a4
+; RV32I-NEXT: add a1, a3, a2
+; RV32I-NEXT: add a0, a0, a4
; RV32I-NEXT: ret
;
; RV64I-LABEL: test_fixed_i64:
@@ -209,8 +209,8 @@ define i64 @test_ext_i64(i64 %a0, i64 %a1) nounwind {
; RV32I-NEXT: srli a3, a1, 1
; RV32I-NEXT: slli a4, a1, 31
; RV32I-NEXT: srli a0, a0, 1
-; RV32I-NEXT: or a1, a3, a2
-; RV32I-NEXT: or a0, a0, a4
+; RV32I-NEXT: add a1, a3, a2
+; RV32I-NEXT: add a0, a0, a4
; RV32I-NEXT: ret
;
; RV64I-LABEL: test_ext_i64:
diff --git a/llvm/test/CodeGen/RISCV/bfloat-arith.ll b/llvm/test/CodeGen/RISCV/bfloat-arith.ll
index 871b43e61df50..c5ed98dee861d 100644
--- a/llvm/test/CodeGen/RISCV/bfloat-arith.ll
+++ b/llvm/test/CodeGen/RISCV/bfloat-arith.ll
@@ -79,7 +79,7 @@ define bfloat @fsgnj_bf16(bfloat %a, bfloat %b) nounwind {
; RV32IZFBFMIN-NEXT: fmv.x.h a1, fa0
; RV32IZFBFMIN-NEXT: slli a1, a1, 17
; RV32IZFBFMIN-NEXT: srli a1, a1, 17
-; RV32IZFBFMIN-NEXT: or a0, a1, a0
+; RV32IZFBFMIN-NEXT: add a0, a1, a0
; RV32IZFBFMIN-NEXT: fmv.h.x fa0, a0
; RV32IZFBFMIN-NEXT: ret
;
@@ -91,7 +91,7 @@ define bfloat @fsgnj_bf16(bfloat %a, bfloat %b) nounwind {
; RV64IZFBFMIN-NEXT: fmv.x.h a1, fa0
; RV64IZFBFMIN-NEXT: slli a1, a1, 49
; RV64IZFBFMIN-NEXT: srli a1, a1, 49
-; RV64IZFBFMIN-NEXT: or a0, a1, a0
+; RV64IZFBFMIN-NEXT: add a0, a1, a0
; RV64IZFBFMIN-NEXT: fmv.h.x fa0, a0
; RV64IZFBFMIN-NEXT: ret
%1 = call bfloat @llvm.copysign.bf16(bfloat %a, bfloat %b)
@@ -133,7 +133,7 @@ define bfloat @fsgnjn_bf16(bfloat %a, bfloat %b) nounwind {
; RV32IZFBFMIN-NEXT: fmv.x.h a1, fa0
; RV32IZFBFMIN-NEXT: slli a1, a1, 17
; RV32IZFBFMIN-NEXT: srli a1, a1, 17
-; RV32IZFBFMIN-NEXT: or a0, a1, a0
+; RV32IZFBFMIN-NEXT: add a0, a1, a0
; RV32IZFBFMIN-NEXT: fmv.h.x fa0, a0
; RV32IZFBFMIN-NEXT: ret
;
@@ -150,7 +150,7 @@ define bfloat @fsgnjn_bf16(bfloat %a, bfloat %b) nounwind {
; RV64IZFBFMIN-NEXT: fmv.x.h a1, fa0
; RV64IZFBFMIN-NEXT: slli a1, a1, 49
; RV64IZFBFMIN-NEXT: srli a1, a1, 49
-; RV64IZFBFMIN-NEXT: or a0, a1, a0
+; RV64IZFBFMIN-NEXT: add a0, a1, a0
; RV64IZFBFMIN-NEXT: fmv.h.x fa0, a0
; RV64IZFBFMIN-NEXT: ret
%1 = fadd bfloat %a, %b
diff --git a/llvm/test/CodeGen/RISCV/bswap-bitreverse.ll b/llvm/test/CodeGen/RISCV/bswap-bitreverse.ll
index 1605e686e9177..e56e57b88a0be 100644
--- a/llvm/test/CodeGen/RISCV/bswap-bitreverse.ll
+++ b/llvm/test/CodeGen/RISCV/bswap-bitreverse.ll
@@ -26,7 +26,7 @@ define i16 @test_bswap_i16(i16 %a) nounwind {
; RV32I-NEXT: slli a1, a0, 8
; RV32I-NEXT: slli a0, a0, 16
; RV32I-NEXT: srli a0, a0, 24
-; RV32I-NEXT: or a0, a1, a0
+; RV32I-NEXT: add a0, a1, a0
; RV32I-NEXT: ret
;
; RV64I-LABEL: test_bswap_i16:
@@ -34,7 +34,7 @@ define i16 @test_bswap_i16(i16 %a) nounwind {
; RV64I-NEXT: slli a1, a0, 8
; RV64I-NEXT: slli a0, a0, 48
; RV64I-NEXT: srli a0, a0, 56
-; RV64I-NEXT: or a0, a1, a0
+; RV64I-NEXT: add a0, a1, a0
; RV64I-NEXT: ret
;
; RV32ZB-LABEL: test_bswap_i16:
@@ -61,11 +61,11 @@ define i32 @test_bswap_i32(i32 %a) nounwind {
; RV32I-NEXT: addi a2, a2, -256
; RV32I-NEXT: and a1, a1, a2
; RV32I-NEXT: and a2, a0, a2
-; RV32I-NEXT: or a1, a1, a3
+; RV32I-NEXT: add a1, a1, a3
; RV32I-NEXT: slli a2, a2, 8
; RV32I-NEXT: slli a0, a0, 24
-; RV32I-NEXT: or a0, a0, a2
-; RV32I-NEXT: or a0, a0, a1
+; RV32I-NEXT: add a0, a0, a2
+; RV32I-NEXT: add a0, a0, a1
; RV32I-NEXT: ret
;
; RV64I-LABEL: test_bswap_i32:
@@ -76,11 +76,11 @@ define i32 @test_bswap_i32(i32 %a) nounwind {
; RV64I-NEXT: addi a2, a2, -256
; RV64I-NEXT: and a1, a1, a2
; RV64I-NEXT: and a2, a0, a2
-; RV64I-NEXT: or a1, a1, a3
+; RV64I-NEXT: add a1, a1, a3
; RV64I-NEXT: slli a2, a2, 8
; RV64I-NEXT: slliw a0, a0, 24
-; RV64I-NEXT: or a0, a0, a2
-; RV64I-NEXT: or a0, a0, a1
+; RV64I-NEXT: add a0, a0, a2
+; RV64I-NEXT: add a0, a0, a1
; RV64I-NEXT: ret
;
; RV32ZB-LABEL: test_bswap_i32:
@@ -106,20 +106,20 @@ define i64 @test_bswap_i64(i64 %a) nounwind {
; RV32I-NEXT: srli a5, a0, 8
; RV32I-NEXT: addi a3, a3, -256
; RV32I-NEXT: and a2, a2, a3
-; RV32I-NEXT: or a2, a2, a4
+; RV32I-NEXT: add a2, a2, a4
; RV32I-NEXT: srli a4, a0, 24
; RV32I-NEXT: and a5, a5, a3
-; RV32I-NEXT: or a4, a5, a4
+; RV32I-NEXT: add a4, a5, a4
; RV32I-NEXT: slli a5, a1, 24
; RV32I-NEXT: and a1, a1, a3
; RV32I-NEXT: slli a1, a1, 8
-; RV32I-NEXT: or a1, a5, a1
+; RV32I-NEXT: add a1, a5, a1
; RV32I-NEXT: and a3, a0, a3
; RV32I-NEXT: slli a0, a0, 24
; RV32I-NEXT: slli a3, a3, 8
-; RV32I-NEXT: or a3, a0, a3
-; RV32I-NEXT: or a0, a1, a2
-; RV32I-NEXT: or a1, a3, a4
+; RV32I-NEXT: add a3, a0, a3
+; RV32I-NEXT: add a0, a1, a2
+; RV32I-NEXT: add a1, a3, a4
; RV32I-NEXT: ret
;
; RV64I-LABEL: test_bswap_i64:
@@ -131,24 +131,24 @@ define i64 @test_bswap_i64(i64 %a) nounwind {
; RV64I-NEXT: lui a5, 4080
; RV64I-NEXT: addi a2, a2, -256
; RV64I-NEXT: and a1, a1, a2
-; RV64I-NEXT: or a1, a1, a3
+; RV64I-NEXT: add a1, a1, a3
; RV64I-NEXT: srli a3, a0, 8
; RV64I-NEXT: and a4, a4, a5
; RV64I-NEXT: srliw a3, a3, 24
; RV64I-NEXT: slli a3, a3, 24
-; RV64I-NEXT: or a3, a3, a4
+; RV64I-NEXT: add a3, a3, a4
; RV64I-NEXT: srliw a4, a0, 24
; RV64I-NEXT: and a5, a0, a5
; RV64I-NEXT: and a2, a0, a2
; RV64I-NEXT: slli a0, a0, 56
; RV64I-NEXT: slli a4, a4, 32
; RV64I-NEXT: slli a5, a5, 24
-; RV64I-NEXT: or a4, a5, a4
+; RV64I-NEXT: add a4, a5, a4
; RV64I-NEXT: slli a2, a2, 40
-; RV64I-NEXT: or a1, a3, a1
-; RV64I-NEXT: or a0, a0, a2
-; RV64I-NEXT: or a0, a0, a4
-; RV64I-NEXT: or a0, a0, a1
+; RV64I-NEXT: add a1, a3, a1
+; RV64I-NEXT: add a0, a0, a2
+; RV64I-NEXT: add a0, a0, a4
+; RV64I-NEXT: add a0, a0, a1
; RV64I-NEXT: ret
;
; RV32ZB-LABEL: test_bswap_i64:
@@ -176,31 +176,31 @@ define i7 @test_bitreverse_i7(i7 %a) nounwind {
; RV32I-NEXT: and a1, a1, a2
; RV32I-NEXT: and a2, a0, a2
; RV32I-NEXT: slli a0, a0, 24
-; RV32I-NEXT: or a1, a1, a3
+; RV32I-NEXT: add a1, a1, a3
; RV32I-NEXT: lui a3, 61681
; RV32I-NEXT: slli a2, a2, 8
-; RV32I-NEXT: or a0, a0, a2
+; RV32I-NEXT: add a0, a0, a2
; RV32I-NEXT: lui a2, 209715
; RV32I-NEXT: addi a3, a3, -241
-; RV32I-NEXT: or a0, a0, a1
+; RV32I-NEXT: add a0, a0, a1
; RV32I-NEXT: srli a1, a0, 4
; RV32I-NEXT: and a0, a0, a3
; RV32I-NEXT: and a1, a1, a3
; RV32I-NEXT: lui a3, 344064
; RV32I-NEXT: addi a2, a2, 819
; RV32I-NEXT: slli a0, a0, 4
-; RV32I-NEXT: or a0, a1, a0
+; RV32I-NEXT: add a0, a1, a0
; RV32I-NEXT: srli a1, a0, 2
; RV32I-NEXT: and a0, a0, a2
; RV32I-NEXT: and a1, a1, a2
; RV32I-NEXT: lui a2, 348160
; RV32I-NEXT: slli a0, a0, 2
-; RV32I-NEXT: or a0, a1, a0
+; RV32I-NEXT: add a0, a1, a0
; RV32I-NEXT: srli a1, a0, 1
; RV32I-NEXT: and a0, a0, a2
; RV32I-NEXT: and a1, a1, a3
; RV32I-NEXT: slli a0, a0, 1
-; RV32I-NEXT: or a0, a1, a0
+; RV32I-NEXT: add a0, a1, a0
; RV32I-NEXT: srli a0, a0, 25
; RV32I-NEXT: ret
;
@@ -215,49 +215,49 @@ define i7 @test_bitreverse_i7(i7 %a) nounwind {
; RV64I-NEXT: srliw a7, a0, 24
; RV64I-NEXT: addi a2, a2, -256
; RV64I-NEXT: and a1, a1, a2
-; RV64I-NEXT: or a1, a1, a3
+; RV64I-NEXT: add a1, a1, a3
; RV64I-NEXT: lui a3, 61681
; RV64I-NEXT: and a4, a4, a5
; RV64I-NEXT: srliw a6, a6, 24
; RV64I-NEXT: slli a6, a6, 24
-; RV64I-NEXT: or a4, a6, a4
+; RV64I-NEXT: add a4, a6, a4
; RV64I-NEXT: lui a6, 209715
; RV64I-NEXT: and a5, a0, a5
; RV64I-NEXT: slli a7, a7, 32
; RV64I-NEXT: addi a3, a3, -241
; RV64I-NEXT: addi a6, a6, 819
; RV64I-NEXT: slli a5, a5, 24
-; RV64I-NEXT: or a5, a5, a7
+; RV64I-NEXT: add a5, a5, a7
; RV64I-NEXT: slli a7, a3, 32
; RV64I-NEXT: add a3, a3, a7
; RV64I-NEXT: slli a7, a6, 32
; RV64I-NEXT: add a6, a6, a7
-; RV64I-NEXT: or a1, a4, a1
+; RV64I-NEXT: add a1, a4, a1
; RV64I-NEXT: and a2, a0, a2
; RV64I-NEXT: slli a0, a0, 56
; RV64I-NEXT: slli a2, a2, 40
-; RV64I-NEXT: or a0, a0, a2
+; RV64I-NEXT: add a0, a0, a2
; RV64I-NEXT: li a2, 21
-; RV64I-NEXT: or a0, a0, a5
+; RV64I-NEXT: add a0, a0, a5
; RV64I-NEXT: li a4, 85
; RV64I-NEXT: slli a2, a2, 58
; RV64I-NEXT: slli a4, a4, 56
-; RV64I-NEXT: or a0, a0, a1
+; RV64I-NEXT: add a0, a0, a1
; RV64I-NEXT: srli a1, a0, 4
; RV64I-NEXT: and a0, a0, a3
; RV64I-NEXT: and a1, a1, a3
; RV64I-NEXT: slli a0, a0, 4
-; RV64I-NEXT: or a0, a1, a0
+; RV64I-NEXT: add a0, a1, a0
; RV64I-NEXT: srli a1, a0, 2
; RV64I-NEXT: and a0, a0, a6
; RV64I-NEXT: and a1, a1, a6
; RV64I-NEXT: slli a0, a0, 2
-; RV64I-NEXT: or a0, a1, a0
+; RV64I-NEXT: add a0, a1, a0
; RV64I-NEXT: srli a1, a0, 1
; RV64I-NEXT: and a0, a0, a4
; RV64I-NEXT: and a1, a1, a2
; RV64I-NEXT: slli a0, a0, 1
-; RV64I-NEXT: or a0, a1, a0
+; RV64I-NEXT: add a0, a1, a0
; RV64I-NEXT: srli a0, a0, 57
; RV64I-NEXT: ret
;
@@ -272,19 +272,19 @@ define i7 @test_bitreverse_i7(i7 %a) nounwind {
; RV32ZBB-NEXT: lui a1, 209715
; RV32ZBB-NEXT: addi a1, a1, 819
; RV32ZBB-NEXT: slli a0, a0, 4
-; RV32ZBB-NEXT: or a0, a2, a0
+; RV32ZBB-NEXT: add a0, a2, a0
; RV32ZBB-NEXT: srli a2, a0, 2
; RV32ZBB-NEXT: and a0, a0, a1
; RV32ZBB-NEXT: and a1, a2, a1
; RV32ZBB-NEXT: lui a2, 344064
; RV32ZBB-NEXT: slli a0, a0, 2
-; RV32ZBB-NEXT: or a0, a1, a0
+; RV32ZBB-NEXT: add a0, a1, a0
; RV32ZBB-NEXT: lui a1, 348160
; RV32ZBB-NEXT: and a1, a0, a1
; RV32ZBB-NEXT: srli a0, a0, 1
; RV32ZBB-NEXT: and a0, a0, a2
; RV32ZBB-NEXT: slli a1, a1, 1
-; RV32ZBB-NEXT: or a0, a0, a1
+; RV32ZBB-NEXT: add a0, a0, a1
; RV32ZBB-NEXT: srli a0, a0, 25
; RV32ZBB-NEXT: ret
;
@@ -304,7 +304,7 @@ define i7 @test_bitreverse_i7(i7 %a) nounwind {
; RV64ZBB-NEXT: and a0, a0, a1
; RV64ZBB-NEXT: li a1, 21
; RV64ZBB-NEXT: slli a0, a0, 4
-; RV64ZBB-NEXT: or a0, a3, a0
+; RV64ZBB-NEXT: add a0, a3, a0
; RV64ZBB-NEXT: srli a3, a0, 2
; RV64ZBB-NEXT: and a0, a0, a2
; RV64ZBB-NEXT: and a2, a3, a2
@@ -312,12 +312,12 @@ define i7 @test_bitreverse_i7(i7 %a) nounwind {
; RV64ZBB-NEXT: slli a1, a1, 58
; RV64ZBB-NEXT: slli a3, a3, 56
; RV64ZBB-NEXT: slli a0, a0, 2
-; RV64ZBB-NEXT: or a0, a2, a0
+; RV64ZBB-NEXT: add a0, a2, a0
; RV64ZBB-NEXT: srli a2, a0, 1
; RV64ZBB-NEXT: and a0, a0, a3
; RV64ZBB-NEXT: and a1, a2, a1
; RV64ZBB-NEXT: slli a0, a0, 1
-; RV64ZBB-NEXT: or a0, a1, a0
+; RV64ZBB-NEXT: add a0, a1, a0
; RV64ZBB-NEXT: srli a0, a0, 57
; RV64ZBB-NEXT: ret
;
@@ -345,17 +345,17 @@ define i8 @test_bitreverse_i8(i8 %a) nounwind {
; RV32I-NEXT: slli a0, a0, 24
; RV32I-NEXT: slli a1, a1, 4
; RV32I-NEXT: srli a0, a0, 28
-; RV32I-NEXT: or a0, a0, a1
+; RV32I-NEXT: add a0, a0, a1
; RV32I-NEXT: andi a1, a0, 51
; RV32I-NEXT: srli a0, a0, 2
; RV32I-NEXT: slli a1, a1, 2
; RV32I-NEXT: andi a0, a0, 51
-; RV32I-NEXT: or a0, a0, a1
+; RV32I-NEXT: add a0, a0, a1
; RV32I-NEXT: andi a1, a0, 85
; RV32I-NEXT: srli a0, a0, 1
; RV32I-NEXT: slli a1, a1, 1
; RV32I-NEXT: andi a0, a0, 85
-; RV32I-NEXT: or a0, a0, a1
+; RV32I-NEXT: add a0, a0, a1
; RV32I-NEXT: ret
;
; RV64I-LABEL: test_bitreverse_i8:
@@ -364,17 +364,17 @@ define i8 @test_bitreverse_i8(i8 %a) nounwind {
; RV64I-NEXT: slli a0, a0, 56
; RV64I-NEXT: slli a1, a1, 4
; RV64I-NEXT: srli a0, a0, 60
-; RV64I-NEXT: or a0, a0, a1
+; RV64I-NEXT: add a...
[truncated]
Do you have measurements for code size savings?
Are there cases where we could potentially leverage ADDW?
…tions c.or requires that all of its operands be in the GPRC register class, but c.add does not. As a result, we can use c.add for a disjoint or to allow additional compression. This patch does the transform extremely late (when converting to MCInst) so that we only emit an OR as an ADD if the difference actually reduces code size. I haven't touched the register allocator hint mechanism (yet), so this only catches cases which naturally end up reusing one of the source registers. This is a (likely much better) alternative to llvm#155669. When I first wrote that, I hadn't realized that we already propagate disjoint onto the RISCV::OR MachineInstr node. Note there is a small correctness risk with this change - if we forgot to drop the disjoint flag somewhere, this could cause miscompiles, and I don't think we have another use of the flag this late in the backend.
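The late-lowering decision described above boils down to a register-class check. Here is a rough, standalone C++ sketch of that decision; the helpers (`inGPRC`, `fitsCOr`, `fitsCAdd`, `lowerDisjointOr`) are illustrative names, not the actual LLVM MCInst-lowering code. The encoding constraints follow the C extension: c.or is CA-format and needs both registers in x8–x15, while c.add is CR-format and accepts any nonzero register.

```cpp
#include <cstdio>

// Register numbers follow the RISC-V convention x0..x31; x8..x15 is the
// GPRC subset addressable by most 16-bit (compressed) encodings.
static bool inGPRC(unsigned R) { return R >= 8 && R <= 15; }

// c.or rd, rs2 (CA format): requires rd == rs1 and both rd, rs2 in GPRC.
static bool fitsCOr(unsigned Rd, unsigned Rs1, unsigned Rs2) {
  return Rd == Rs1 && inGPRC(Rd) && inGPRC(Rs2);
}

// c.add rd, rs2 (CR format): requires rd == rs1; rd, rs2 any nonzero reg.
static bool fitsCAdd(unsigned Rd, unsigned Rs1, unsigned Rs2) {
  return Rd == Rs1 && Rd != 0 && Rs2 != 0;
}

// Late choice for a disjoint OR: only rewrite to ADD when that turns a
// 4-byte instruction into a 2-byte one; otherwise keep OR to limit churn.
static const char *lowerDisjointOr(unsigned Rd, unsigned Rs1, unsigned Rs2) {
  if (!fitsCOr(Rd, Rs1, Rs2) && fitsCAdd(Rd, Rs1, Rs2))
    return "c.add";
  return fitsCOr(Rd, Rs1, Rs2) ? "c.or" : "or";
}

int main() {
  // x10/x11 (a0/a1) are in GPRC: c.or already works, no rewrite needed.
  printf("%s\n", lowerDisjointOr(10, 10, 11)); // c.or
  // x16/x17 (a6/a7) are outside GPRC: only c.add can compress this.
  printf("%s\n", lowerDisjointOr(16, 16, 17)); // c.add
  return 0;
}
```

This matches, for example, the RV32C hunk above where `or a3, a6, a5` becomes `c.add a5, a6`: a6 (x16) is outside GPRC, so only the CR-format c.add can compress it.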
I realized we already propagate the disjoint flag into MI, and have put up an alternate patch with a much more localized test impact. See: #156044
Nope, this was based on a random observation.
This one is tricky if we move to the late scheme. We don't have an ORW, so we basically have to do this eagerly in ISEL.