AMDGPU: Stop implementing shouldCoalesce #168988
base: main
Conversation
Use the default, which freely coalesces anything it can. This mostly shows improvements, with a handful of regressions. The main concern is whether introducing wider registers is more likely to push register usage up to the next occupancy tier.
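For context, the override being deleted gated coalescing on the width of the merged register class. Below is a minimal standalone sketch of that heuristic; the helper name and the `main` driver are illustrative, not LLVM code, and only the size comparison mirrors the deleted `SIRegisterInfo::shouldCoalesce` body. With the override gone, AMDGPU falls back to the generic `TargetRegisterInfo::shouldCoalesce` hook, which, as far as I understand the current default, simply allows the coalesce.

```cpp
// Standalone illustration (not LLVM source): the size-based heuristic the
// removed override applied, expressed over plain bit widths.
#include <cstdio>

// Returns true if the old AMDGPU heuristic would have allowed coalescing
// into a register class of NewSize bits, given the sizes of the source and
// destination classes being merged.
static bool oldAMDGPUShouldCoalesce(unsigned SrcSize, unsigned DstSize,
                                    unsigned NewSize) {
  // Allow narrow results unconditionally; allow a wider result only if one
  // of the inputs was already at least that wide.
  return NewSize <= 128 || NewSize <= SrcSize || NewSize <= DstSize;
}

int main() {
  // Merging into a 128-bit tuple: allowed under both the old and new behavior.
  std::printf("%d\n", oldAMDGPUShouldCoalesce(64, 128, 128));  // prints 1
  // Two 128-bit halves merging into a 256-bit tuple: the old hook refused
  // this; the inherited default now permits it.
  std::printf("%d\n", oldAMDGPUShouldCoalesce(128, 128, 256)); // prints 0
  return 0;
}
```

The second case is where the occupancy concern comes from: the coalescer may now build wider tuples than the old heuristic allowed.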
@llvm/pr-subscribers-llvm-globalisel @llvm/pr-subscribers-llvm-regalloc

Author: Matt Arsenault (arsenm)

Changes

AMDGPU: Stop implementing shouldCoalesce

Use the default, which freely coalesces anything it can.

Rename test regression

Patch is 9.33 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/168988.diff

66 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
index 5484fab3efdcc..ad3828fba2187 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
@@ -3755,20 +3755,6 @@ bool SIRegisterInfo::isAGPR(const MachineRegisterInfo &MRI,
return RC && isAGPRClass(RC);
}
-bool SIRegisterInfo::shouldCoalesce(MachineInstr *MI,
- const TargetRegisterClass *SrcRC,
- unsigned SubReg,
- const TargetRegisterClass *DstRC,
- unsigned DstSubReg,
- const TargetRegisterClass *NewRC,
- LiveIntervals &LIS) const {
- // TODO: This should be more aggressive, but be more cautious with very wide
- // tuples.
- unsigned NewSize = getRegSizeInBits(*NewRC);
- return NewSize <= 128 || NewSize <= getRegSizeInBits(*SrcRC) ||
- NewSize <= getRegSizeInBits(*DstRC);
-}
-
unsigned SIRegisterInfo::getRegPressureLimit(const TargetRegisterClass *RC,
MachineFunction &MF) const {
unsigned MinOcc = ST.getOccupancyWithWorkGroupSizes(MF).first;
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.h b/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
index bb8a80f811d4c..2e2916f68f584 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
@@ -346,14 +346,6 @@ class SIRegisterInfo final : public AMDGPUGenRegisterInfo {
ArrayRef<int16_t> getRegSplitParts(const TargetRegisterClass *RC,
unsigned EltSize) const;
- bool shouldCoalesce(MachineInstr *MI,
- const TargetRegisterClass *SrcRC,
- unsigned SubReg,
- const TargetRegisterClass *DstRC,
- unsigned DstSubReg,
- const TargetRegisterClass *NewRC,
- LiveIntervals &LIS) const override;
-
unsigned getRegPressureLimit(const TargetRegisterClass *RC,
MachineFunction &MF) const override;
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/mul.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/mul.ll
index 3eecaccf0308f..5347110468d9c 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/mul.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/mul.ll
@@ -2404,63 +2404,62 @@ define i256 @v_mul_i256(i256 %num, i256 %den) {
; GFX7: ; %bb.0:
; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX7-NEXT: v_mad_u64_u32 v[16:17], s[4:5], v0, v14, 0
+; GFX7-NEXT: v_mul_lo_u32 v28, v4, v11
; GFX7-NEXT: v_mul_lo_u32 v29, v3, v12
-; GFX7-NEXT: v_mul_lo_u32 v30, v2, v13
; GFX7-NEXT: v_mad_u64_u32 v[18:19], s[4:5], v1, v13, v[16:17]
+; GFX7-NEXT: v_mul_lo_u32 v30, v2, v13
+; GFX7-NEXT: v_mul_lo_u32 v27, v5, v10
; GFX7-NEXT: v_mad_u64_u32 v[16:17], s[4:5], v2, v12, v[18:19]
-; GFX7-NEXT: v_mad_u64_u32 v[18:19], s[4:5], v0, v10, 0
-; GFX7-NEXT: v_mad_u64_u32 v[20:21], s[4:5], v3, v11, v[16:17]
-; GFX7-NEXT: v_mad_u64_u32 v[16:17], s[4:5], v1, v9, v[18:19]
-; GFX7-NEXT: v_mad_u64_u32 v[18:19], s[6:7], v4, v10, v[20:21]
-; GFX7-NEXT: v_mad_u64_u32 v[20:21], s[6:7], v0, v12, 0
-; GFX7-NEXT: v_mad_u64_u32 v[22:23], s[6:7], v5, v9, v[18:19]
-; GFX7-NEXT: v_mad_u64_u32 v[24:25], s[6:7], v1, v11, v[20:21]
-; GFX7-NEXT: v_cndmask_b32_e64 v20, 0, 1, s[4:5]
-; GFX7-NEXT: v_mad_u64_u32 v[18:19], vcc, v2, v8, v[16:17]
-; GFX7-NEXT: v_addc_u32_e32 v28, vcc, 0, v20, vcc
-; GFX7-NEXT: v_mad_u64_u32 v[16:17], vcc, v2, v10, v[24:25]
-; GFX7-NEXT: v_cndmask_b32_e64 v20, 0, 1, s[6:7]
-; GFX7-NEXT: v_addc_u32_e32 v20, vcc, 0, v20, vcc
-; GFX7-NEXT: v_mad_u64_u32 v[26:27], s[8:9], v6, v8, v[22:23]
-; GFX7-NEXT: v_mad_u64_u32 v[22:23], vcc, v3, v9, v[16:17]
-; GFX7-NEXT: v_addc_u32_e32 v16, vcc, 0, v20, vcc
-; GFX7-NEXT: v_mad_u64_u32 v[20:21], vcc, v4, v8, v[22:23]
-; GFX7-NEXT: v_mov_b32_e32 v22, v26
-; GFX7-NEXT: v_addc_u32_e32 v23, vcc, 0, v16, vcc
-; GFX7-NEXT: v_mad_u64_u32 v[16:17], vcc, v0, v13, v[21:22]
-; GFX7-NEXT: v_mad_u64_u32 v[21:22], s[12:13], v0, v11, v[19:20]
-; GFX7-NEXT: v_mad_u64_u32 v[19:20], s[4:5], v1, v12, v[16:17]
-; GFX7-NEXT: v_mul_lo_u32 v26, v4, v11
-; GFX7-NEXT: v_mul_lo_u32 v25, v5, v10
-; GFX7-NEXT: v_mad_u64_u32 v[16:17], s[6:7], v2, v11, v[19:20]
-; GFX7-NEXT: v_mul_lo_u32 v24, v6, v9
-; GFX7-NEXT: v_mad_u64_u32 v[11:12], s[8:9], v3, v10, v[16:17]
-; GFX7-NEXT: v_mad_u64_u32 v[16:17], s[10:11], v4, v9, v[11:12]
-; GFX7-NEXT: v_cndmask_b32_e64 v4, 0, 1, s[12:13]
-; GFX7-NEXT: v_mad_u64_u32 v[11:12], s[12:13], v1, v10, v[21:22]
-; GFX7-NEXT: v_addc_u32_e64 v4, s[12:13], 0, v4, s[12:13]
-; GFX7-NEXT: v_mad_u64_u32 v[19:20], s[12:13], v2, v9, v[11:12]
-; GFX7-NEXT: v_mul_lo_u32 v10, v1, v14
-; GFX7-NEXT: v_mad_u64_u32 v[13:14], s[14:15], v5, v8, v[16:17]
+; GFX7-NEXT: v_mad_u64_u32 v[18:19], s[4:5], v3, v11, v[16:17]
+; GFX7-NEXT: v_mad_u64_u32 v[16:17], s[4:5], v0, v10, 0
+; GFX7-NEXT: v_mad_u64_u32 v[20:21], s[4:5], v4, v10, v[18:19]
+; GFX7-NEXT: v_mad_u64_u32 v[22:23], s[4:5], v1, v9, v[16:17]
+; GFX7-NEXT: v_mad_u64_u32 v[16:17], s[6:7], v5, v9, v[20:21]
+; GFX7-NEXT: v_cndmask_b32_e64 v26, 0, 1, s[4:5]
+; GFX7-NEXT: v_mad_u64_u32 v[18:19], vcc, v2, v8, v[22:23]
+; GFX7-NEXT: v_mad_u64_u32 v[22:23], s[4:5], v6, v8, v[16:17]
+; GFX7-NEXT: v_mad_u64_u32 v[16:17], s[4:5], v0, v12, 0
+; GFX7-NEXT: v_mad_u64_u32 v[20:21], s[4:5], v1, v11, v[16:17]
+; GFX7-NEXT: v_cndmask_b32_e64 v24, 0, 1, s[4:5]
+; GFX7-NEXT: v_mad_u64_u32 v[16:17], s[4:5], v2, v10, v[20:21]
+; GFX7-NEXT: v_addc_u32_e64 v20, s[4:5], 0, v24, s[4:5]
+; GFX7-NEXT: v_mad_u64_u32 v[24:25], s[4:5], v3, v9, v[16:17]
+; GFX7-NEXT: v_addc_u32_e64 v16, s[4:5], 0, v20, s[4:5]
+; GFX7-NEXT: v_mad_u64_u32 v[20:21], s[4:5], v4, v8, v[24:25]
+; GFX7-NEXT: v_addc_u32_e64 v24, s[4:5], 0, v16, s[4:5]
+; GFX7-NEXT: v_mad_u64_u32 v[16:17], s[4:5], v0, v13, v[21:22]
+; GFX7-NEXT: v_mul_lo_u32 v25, v6, v9
+; GFX7-NEXT: v_mad_u64_u32 v[21:22], s[6:7], v1, v12, v[16:17]
+; GFX7-NEXT: v_mad_u64_u32 v[16:17], s[8:9], v2, v11, v[21:22]
+; GFX7-NEXT: v_mad_u64_u32 v[21:22], s[10:11], v3, v10, v[16:17]
+; GFX7-NEXT: v_mad_u64_u32 v[16:17], s[14:15], v0, v11, v[19:20]
+; GFX7-NEXT: v_mad_u64_u32 v[12:13], s[12:13], v4, v9, v[21:22]
+; GFX7-NEXT: v_addc_u32_e32 v4, vcc, 0, v26, vcc
+; GFX7-NEXT: v_mad_u64_u32 v[19:20], vcc, v1, v10, v[16:17]
+; GFX7-NEXT: v_cndmask_b32_e64 v6, 0, 1, s[14:15]
+; GFX7-NEXT: v_addc_u32_e32 v6, vcc, 0, v6, vcc
+; GFX7-NEXT: v_mad_u64_u32 v[10:11], vcc, v2, v9, v[19:20]
; GFX7-NEXT: v_mad_u64_u32 v[16:17], s[16:17], v0, v8, 0
-; GFX7-NEXT: v_addc_u32_e64 v2, s[12:13], 0, v4, s[12:13]
-; GFX7-NEXT: v_mad_u64_u32 v[21:22], s[12:13], v3, v8, v[19:20]
-; GFX7-NEXT: v_addc_u32_e64 v5, s[12:13], 0, v2, s[12:13]
-; GFX7-NEXT: v_mad_u64_u32 v[2:3], s[12:13], v0, v9, v[17:18]
-; GFX7-NEXT: v_cndmask_b32_e64 v4, 0, 1, s[12:13]
+; GFX7-NEXT: v_addc_u32_e32 v2, vcc, 0, v6, vcc
+; GFX7-NEXT: v_mad_u64_u32 v[19:20], vcc, v3, v8, v[10:11]
+; GFX7-NEXT: v_mad_u64_u32 v[21:22], s[14:15], v5, v8, v[12:13]
+; GFX7-NEXT: v_addc_u32_e32 v5, vcc, 0, v2, vcc
+; GFX7-NEXT: v_mad_u64_u32 v[2:3], s[16:17], v0, v9, v[17:18]
+; GFX7-NEXT: v_cndmask_b32_e64 v6, 0, 1, s[16:17]
; GFX7-NEXT: v_mul_lo_u32 v0, v0, v15
-; GFX7-NEXT: v_mad_u64_u32 v[11:12], s[12:13], v1, v8, v[2:3]
-; GFX7-NEXT: v_addc_u32_e64 v3, s[12:13], v4, v21, s[12:13]
-; GFX7-NEXT: v_addc_u32_e64 v4, s[12:13], v28, v22, s[12:13]
-; GFX7-NEXT: v_addc_u32_e64 v5, s[12:13], v5, v13, s[12:13]
-; GFX7-NEXT: v_addc_u32_e64 v6, s[12:13], v23, v14, s[12:13]
-; GFX7-NEXT: v_addc_u32_e64 v0, s[12:13], v27, v0, s[12:13]
-; GFX7-NEXT: v_addc_u32_e64 v0, s[12:13], v0, v10, s[14:15]
-; GFX7-NEXT: v_addc_u32_e64 v0, s[10:11], v0, v30, s[10:11]
-; GFX7-NEXT: v_addc_u32_e64 v0, s[8:9], v0, v29, s[8:9]
-; GFX7-NEXT: v_addc_u32_e64 v0, s[6:7], v0, v26, s[6:7]
-; GFX7-NEXT: v_addc_u32_e64 v0, s[4:5], v0, v25, s[4:5]
-; GFX7-NEXT: v_addc_u32_e32 v0, vcc, v0, v24, vcc
+; GFX7-NEXT: v_mad_u64_u32 v[11:12], vcc, v1, v8, v[2:3]
+; GFX7-NEXT: v_addc_u32_e32 v3, vcc, v6, v19, vcc
+; GFX7-NEXT: v_mul_lo_u32 v10, v1, v14
+; GFX7-NEXT: v_addc_u32_e32 v4, vcc, v4, v20, vcc
+; GFX7-NEXT: v_addc_u32_e32 v5, vcc, v5, v21, vcc
+; GFX7-NEXT: v_addc_u32_e32 v6, vcc, v24, v22, vcc
+; GFX7-NEXT: v_addc_u32_e32 v0, vcc, v23, v0, vcc
+; GFX7-NEXT: v_addc_u32_e64 v0, vcc, v0, v10, s[14:15]
+; GFX7-NEXT: v_addc_u32_e64 v0, vcc, v0, v30, s[12:13]
+; GFX7-NEXT: v_addc_u32_e64 v0, vcc, v0, v29, s[10:11]
+; GFX7-NEXT: v_addc_u32_e64 v0, vcc, v0, v28, s[8:9]
+; GFX7-NEXT: v_addc_u32_e64 v0, vcc, v0, v27, s[6:7]
+; GFX7-NEXT: v_addc_u32_e64 v0, vcc, v0, v25, s[4:5]
; GFX7-NEXT: v_mad_u64_u32 v[9:10], s[4:5], v7, v8, v[0:1]
; GFX7-NEXT: v_mov_b32_e32 v0, v16
; GFX7-NEXT: v_mov_b32_e32 v1, v11
@@ -2472,63 +2471,62 @@ define i256 @v_mul_i256(i256 %num, i256 %den) {
; GFX8: ; %bb.0:
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX8-NEXT: v_mad_u64_u32 v[16:17], s[4:5], v0, v14, 0
+; GFX8-NEXT: v_mul_lo_u32 v28, v4, v11
; GFX8-NEXT: v_mul_lo_u32 v29, v3, v12
-; GFX8-NEXT: v_mul_lo_u32 v30, v2, v13
; GFX8-NEXT: v_mad_u64_u32 v[18:19], s[4:5], v1, v13, v[16:17]
+; GFX8-NEXT: v_mul_lo_u32 v30, v2, v13
+; GFX8-NEXT: v_mul_lo_u32 v27, v5, v10
; GFX8-NEXT: v_mad_u64_u32 v[16:17], s[4:5], v2, v12, v[18:19]
-; GFX8-NEXT: v_mad_u64_u32 v[18:19], s[4:5], v0, v10, 0
-; GFX8-NEXT: v_mad_u64_u32 v[20:21], s[4:5], v3, v11, v[16:17]
-; GFX8-NEXT: v_mad_u64_u32 v[16:17], s[4:5], v1, v9, v[18:19]
-; GFX8-NEXT: v_mad_u64_u32 v[18:19], s[6:7], v4, v10, v[20:21]
-; GFX8-NEXT: v_mad_u64_u32 v[20:21], s[6:7], v0, v12, 0
-; GFX8-NEXT: v_mad_u64_u32 v[22:23], s[6:7], v5, v9, v[18:19]
-; GFX8-NEXT: v_mad_u64_u32 v[24:25], s[6:7], v1, v11, v[20:21]
-; GFX8-NEXT: v_cndmask_b32_e64 v20, 0, 1, s[4:5]
-; GFX8-NEXT: v_mad_u64_u32 v[18:19], vcc, v2, v8, v[16:17]
-; GFX8-NEXT: v_addc_u32_e32 v28, vcc, 0, v20, vcc
-; GFX8-NEXT: v_mad_u64_u32 v[16:17], vcc, v2, v10, v[24:25]
-; GFX8-NEXT: v_cndmask_b32_e64 v20, 0, 1, s[6:7]
-; GFX8-NEXT: v_addc_u32_e32 v20, vcc, 0, v20, vcc
-; GFX8-NEXT: v_mad_u64_u32 v[26:27], s[8:9], v6, v8, v[22:23]
-; GFX8-NEXT: v_mad_u64_u32 v[22:23], vcc, v3, v9, v[16:17]
-; GFX8-NEXT: v_addc_u32_e32 v16, vcc, 0, v20, vcc
-; GFX8-NEXT: v_mad_u64_u32 v[20:21], vcc, v4, v8, v[22:23]
-; GFX8-NEXT: v_mov_b32_e32 v22, v26
-; GFX8-NEXT: v_addc_u32_e32 v23, vcc, 0, v16, vcc
-; GFX8-NEXT: v_mad_u64_u32 v[16:17], vcc, v0, v13, v[21:22]
-; GFX8-NEXT: v_mad_u64_u32 v[21:22], s[12:13], v0, v11, v[19:20]
-; GFX8-NEXT: v_mad_u64_u32 v[19:20], s[4:5], v1, v12, v[16:17]
-; GFX8-NEXT: v_mul_lo_u32 v26, v4, v11
-; GFX8-NEXT: v_mul_lo_u32 v25, v5, v10
-; GFX8-NEXT: v_mad_u64_u32 v[16:17], s[6:7], v2, v11, v[19:20]
-; GFX8-NEXT: v_mul_lo_u32 v24, v6, v9
-; GFX8-NEXT: v_mad_u64_u32 v[11:12], s[8:9], v3, v10, v[16:17]
-; GFX8-NEXT: v_mad_u64_u32 v[16:17], s[10:11], v4, v9, v[11:12]
-; GFX8-NEXT: v_cndmask_b32_e64 v4, 0, 1, s[12:13]
-; GFX8-NEXT: v_mad_u64_u32 v[11:12], s[12:13], v1, v10, v[21:22]
-; GFX8-NEXT: v_addc_u32_e64 v4, s[12:13], 0, v4, s[12:13]
-; GFX8-NEXT: v_mad_u64_u32 v[19:20], s[12:13], v2, v9, v[11:12]
-; GFX8-NEXT: v_mul_lo_u32 v10, v1, v14
-; GFX8-NEXT: v_mad_u64_u32 v[13:14], s[14:15], v5, v8, v[16:17]
+; GFX8-NEXT: v_mad_u64_u32 v[18:19], s[4:5], v3, v11, v[16:17]
+; GFX8-NEXT: v_mad_u64_u32 v[16:17], s[4:5], v0, v10, 0
+; GFX8-NEXT: v_mad_u64_u32 v[20:21], s[4:5], v4, v10, v[18:19]
+; GFX8-NEXT: v_mad_u64_u32 v[22:23], s[4:5], v1, v9, v[16:17]
+; GFX8-NEXT: v_mad_u64_u32 v[16:17], s[6:7], v5, v9, v[20:21]
+; GFX8-NEXT: v_cndmask_b32_e64 v26, 0, 1, s[4:5]
+; GFX8-NEXT: v_mad_u64_u32 v[18:19], vcc, v2, v8, v[22:23]
+; GFX8-NEXT: v_mad_u64_u32 v[22:23], s[4:5], v6, v8, v[16:17]
+; GFX8-NEXT: v_mad_u64_u32 v[16:17], s[4:5], v0, v12, 0
+; GFX8-NEXT: v_mad_u64_u32 v[20:21], s[4:5], v1, v11, v[16:17]
+; GFX8-NEXT: v_cndmask_b32_e64 v24, 0, 1, s[4:5]
+; GFX8-NEXT: v_mad_u64_u32 v[16:17], s[4:5], v2, v10, v[20:21]
+; GFX8-NEXT: v_addc_u32_e64 v20, s[4:5], 0, v24, s[4:5]
+; GFX8-NEXT: v_mad_u64_u32 v[24:25], s[4:5], v3, v9, v[16:17]
+; GFX8-NEXT: v_addc_u32_e64 v16, s[4:5], 0, v20, s[4:5]
+; GFX8-NEXT: v_mad_u64_u32 v[20:21], s[4:5], v4, v8, v[24:25]
+; GFX8-NEXT: v_addc_u32_e64 v24, s[4:5], 0, v16, s[4:5]
+; GFX8-NEXT: v_mad_u64_u32 v[16:17], s[4:5], v0, v13, v[21:22]
+; GFX8-NEXT: v_mul_lo_u32 v25, v6, v9
+; GFX8-NEXT: v_mad_u64_u32 v[21:22], s[6:7], v1, v12, v[16:17]
+; GFX8-NEXT: v_mad_u64_u32 v[16:17], s[8:9], v2, v11, v[21:22]
+; GFX8-NEXT: v_mad_u64_u32 v[21:22], s[10:11], v3, v10, v[16:17]
+; GFX8-NEXT: v_mad_u64_u32 v[16:17], s[14:15], v0, v11, v[19:20]
+; GFX8-NEXT: v_mad_u64_u32 v[12:13], s[12:13], v4, v9, v[21:22]
+; GFX8-NEXT: v_addc_u32_e32 v4, vcc, 0, v26, vcc
+; GFX8-NEXT: v_mad_u64_u32 v[19:20], vcc, v1, v10, v[16:17]
+; GFX8-NEXT: v_cndmask_b32_e64 v6, 0, 1, s[14:15]
+; GFX8-NEXT: v_addc_u32_e32 v6, vcc, 0, v6, vcc
+; GFX8-NEXT: v_mad_u64_u32 v[10:11], vcc, v2, v9, v[19:20]
; GFX8-NEXT: v_mad_u64_u32 v[16:17], s[16:17], v0, v8, 0
-; GFX8-NEXT: v_addc_u32_e64 v2, s[12:13], 0, v4, s[12:13]
-; GFX8-NEXT: v_mad_u64_u32 v[21:22], s[12:13], v3, v8, v[19:20]
-; GFX8-NEXT: v_addc_u32_e64 v5, s[12:13], 0, v2, s[12:13]
-; GFX8-NEXT: v_mad_u64_u32 v[2:3], s[12:13], v0, v9, v[17:18]
-; GFX8-NEXT: v_cndmask_b32_e64 v4, 0, 1, s[12:13]
+; GFX8-NEXT: v_addc_u32_e32 v2, vcc, 0, v6, vcc
+; GFX8-NEXT: v_mad_u64_u32 v[19:20], vcc, v3, v8, v[10:11]
+; GFX8-NEXT: v_mad_u64_u32 v[21:22], s[14:15], v5, v8, v[12:13]
+; GFX8-NEXT: v_addc_u32_e32 v5, vcc, 0, v2, vcc
+; GFX8-NEXT: v_mad_u64_u32 v[2:3], s[16:17], v0, v9, v[17:18]
+; GFX8-NEXT: v_cndmask_b32_e64 v6, 0, 1, s[16:17]
; GFX8-NEXT: v_mul_lo_u32 v0, v0, v15
-; GFX8-NEXT: v_mad_u64_u32 v[11:12], s[12:13], v1, v8, v[2:3]
-; GFX8-NEXT: v_addc_u32_e64 v3, s[12:13], v4, v21, s[12:13]
-; GFX8-NEXT: v_addc_u32_e64 v4, s[12:13], v28, v22, s[12:13]
-; GFX8-NEXT: v_addc_u32_e64 v5, s[12:13], v5, v13, s[12:13]
-; GFX8-NEXT: v_addc_u32_e64 v6, s[12:13], v23, v14, s[12:13]
-; GFX8-NEXT: v_addc_u32_e64 v0, s[12:13], v27, v0, s[12:13]
-; GFX8-NEXT: v_addc_u32_e64 v0, s[12:13], v0, v10, s[14:15]
-; GFX8-NEXT: v_addc_u32_e64 v0, s[10:11], v0, v30, s[10:11]
-; GFX8-NEXT: v_addc_u32_e64 v0, s[8:9], v0, v29, s[8:9]
-; GFX8-NEXT: v_addc_u32_e64 v0, s[6:7], v0, v26, s[6:7]
-; GFX8-NEXT: v_addc_u32_e64 v0, s[4:5], v0, v25, s[4:5]
-; GFX8-NEXT: v_addc_u32_e32 v0, vcc, v0, v24, vcc
+; GFX8-NEXT: v_mad_u64_u32 v[11:12], vcc, v1, v8, v[2:3]
+; GFX8-NEXT: v_addc_u32_e32 v3, vcc, v6, v19, vcc
+; GFX8-NEXT: v_mul_lo_u32 v10, v1, v14
+; GFX8-NEXT: v_addc_u32_e32 v4, vcc, v4, v20, vcc
+; GFX8-NEXT: v_addc_u32_e32 v5, vcc, v5, v21, vcc
+; GFX8-NEXT: v_addc_u32_e32 v6, vcc, v24, v22, vcc
+; GFX8-NEXT: v_addc_u32_e32 v0, vcc, v23, v0, vcc
+; GFX8-NEXT: v_addc_u32_e64 v0, vcc, v0, v10, s[14:15]
+; GFX8-NEXT: v_addc_u32_e64 v0, vcc, v0, v30, s[12:13]
+; GFX8-NEXT: v_addc_u32_e64 v0, vcc, v0, v29, s[10:11]
+; GFX8-NEXT: v_addc_u32_e64 v0, vcc, v0, v28, s[8:9]
+; GFX8-NEXT: v_addc_u32_e64 v0, vcc, v0, v27, s[6:7]
+; GFX8-NEXT: v_addc_u32_e64 v0, vcc, v0, v25, s[4:5]
; GFX8-NEXT: v_mad_u64_u32 v[9:10], s[4:5], v7, v8, v[0:1]
; GFX8-NEXT: v_mov_b32_e32 v0, v16
; GFX8-NEXT: v_mov_b32_e32 v1, v11
@@ -2540,63 +2538,62 @@ define i256 @v_mul_i256(i256 %num, i256 %den) {
; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: v_mad_u64_u32 v[16:17], s[4:5], v0, v14, 0
+; GFX9-NEXT: v_mul_lo_u32 v28, v4, v11
; GFX9-NEXT: v_mul_lo_u32 v29, v3, v12
-; GFX9-NEXT: v_mul_lo_u32 v30, v2, v13
; GFX9-NEXT: v_mad_u64_u32 v[18:19], s[4:5], v1, v13, v[16:17]
+; GFX9-NEXT: v_mul_lo_u32 v30, v2, v13
+; GFX9-NEXT: v_mul_lo_u32 v27, v5, v10
; GFX9-NEXT: v_mad_u64_u32 v[16:17], s[4:5], v2, v12, v[18:19]
-; GFX9-NEXT: v_mad_u64_u32 v[18:19], s[4:5], v0, v10, 0
-; GFX9-NEXT: v_mad_u64_u32 v[20:21], s[4:5], v3, v11, v[16:17]
-; GFX9-NEXT: v_mad_u64_u32 v[16:17], s[4:5], v1, v9, v[18:19]
-; GFX9-NEXT: v_mad_u64_u32 v[18:19], s[6:7], v4, v10, v[20:21]
-; GFX9-NEXT: v_mad_u64_u32 v[20:21], s[6:7], v0, v12, 0
-; GFX9-NEXT: v_mad_u64_u32 v[22:23], s[6:7], v5, v9, v[18:19]
-; GFX9-NEXT: v_mad_u64_u32 v[24:25], s[6:7], v1, v11, v[20:21]
-; GFX9-NEXT: v_cndmask_b32_e64 v20, 0, 1, s[4:5]
-; GFX9-NEXT: v_mad_u64_u32 v[18:19], vcc, v2, v8, v[16:17]
-; GFX9-NEXT: v_addc_co_u32_e32 v28, vcc, 0, v20, vcc
-; GFX9-NEXT: v_mad_u64_u32 v[16:17], vcc, v2, v10, v[24:25]
-; GFX9-NEXT: v_cndmask_b32_e64 v20, 0, 1, s[6:7]
-; GFX9-NEXT: v_addc_co_u32_e32 v20, vcc, 0, v20, vcc
-; GFX9-NEXT: v_mad_u64_u32 v[26:27], s[8:9], v6, v8, v[22:23]
-; GFX9-NEXT: v_mad_u64_u32 v[22:23], vcc, v3, v9, v[16:17]
-; GFX9-NEXT: v_addc_co_u32_e32 v16, vcc, 0, v20, vcc
-; GFX9-NEXT: v_mad_u64_u32 v[20:21], vcc, v4, v8, v[22:23]
-; GFX9-NEXT: v_mov_b32_e32 v22, v26
-; GFX9-NEXT: v_addc_co_u32_e32 v23, vcc, 0, v16, vcc
-; GFX9-NEXT: v_mad_u64_u32 v[16:17], vcc, v0, v13, v[21:22]
-; GFX9-NEXT: v_mad_u64_u32 v[21:22], s[12:13], v0, v11, v[19:20]
-; GFX9-NEXT: v_mad_u64_u32 v[19:20], s[4:5], v1, v12, v[16:17]
-; GFX9-NEXT: v_mul_lo_u32 v26, v4, v11
-; GFX9-NEXT: v_mul_lo_u32 v25, v5, v10
-; GFX9-NEXT: v_mad_u64_u32 v[16:17], s[6:7], v2, v11, v[19:20]
-; GFX9-NEXT: v_mul_lo_u32 v24, v6, v9
-; GFX9-NEXT: v_mad_u64_u32 v[11:12], s[8:9], v3, v10, v[16:17]
-; GFX9-NEXT: v_mad_u64_u32 v[16:17], s[10:11], v4, v9, v[11:12]
-; GFX9-NEXT: v_cndmask_b32_e64 v4, 0, 1, s[12:13]
-; GFX9-NEXT: v_mad_u64_u32 v[11:12], s[12:13], v1, v10, v[21:22]
-; GFX9-NEXT: v_addc_co_u32_e64 v4, s[12:13], 0, v4, s[12:13]
-; GFX9-NEXT: v_mad_u64_u32 v[19:20], s[12:13], v2, v9, v[11:12]
-; GFX9-NEXT: v_mul_lo_u32 v10, v1, v14
-; GFX9-NEXT: v_mad_u64_u32 v[13:14], s[14:15], v5, v8, v[16:17]
+; GFX9-NEXT: v_mad_u64_u32 v[18:19], s[4:5], v3, v11, v[16:17]
+; GFX9-NEXT: v_mad_u64_u32 v[16:17], s[4:5], v0, v10, 0
+; GFX9-NEXT: v_mad_u64_u32 v[20:21], s[4:5], v4, v10, v[18:19]
+; GFX9-NEXT: v_mad_u64_u32 v[22:23], s[4:5], v1, v9, v[16:17]
+; GFX9-NEXT: v_mad_u64_u32 v[16:17], s[6:7], v5, v9, v[20:21]
+; GFX9-NEXT: v_cndmask_b32_e64 v26, 0, 1, s[4:5]
+; GFX9-NEXT: v_mad_u64_u32 v[18:19], vcc, v2, v8, v[22:23]
+; GFX9-NEXT: v_mad_u64_u32 v[22:23], s[4:5], v6, v8, v[16:17]
+; GFX9-NEXT: v_mad_u64_u32 v[16:17], s[4:5], v0, v12, 0
+; GFX9-NEXT: v_mad_u64_u32 v[20:21], s[4:5], v1, v11, v[16:17]
+; GFX9-NEXT: v_cndmask_b32_e64 v24, 0, 1, s[4:5]
+; GFX9-NEXT: v_mad_u64_u32 v[16:17], s[4:5], v2, v10, v[20:21]
+; GFX9-NEXT: v_addc_co_u32_e64 v20, s[4:5], 0, v24, s[4:5]
+; GFX9-NEXT: v_mad_u64_u32 v[24:25], s[4:5], v3, v9, v[16:17]
+; GFX9-NEXT: v_addc_co_u32_e64 v16, s[4:5], 0, v20, s[4:5]
+; GFX9-NEXT: v_mad_u64_u32 v[20:21], s[4:5], v4, v8, v[24:25]
+; GFX9-NEXT: v_addc_co_u32_e64 v24, s[4:5], 0, v16, s[4:5]
+; GFX9-NEXT: v_mad_u64_u32 v[16:17], s[4:5], v0, v13, v[21:22]
+; GFX9-NEXT: v_mul_lo_u32 v25, v6, v9
+; GFX9-NEXT: v_mad_u64_u32 v[21:22], s[6:7], v1, v12, v[16:17]
+; GFX9-NEXT: v_mad_u64_u32 v[16:17], s[8:9], v2, v11, v[21:22]
+; GFX9-NEXT: v_mad_u64_u32 v[21:22], s[10:11], v3, v10, v[16:17]
+; GFX9-NEXT: v_mad_u64_u32 v[16:17], s[14:15], v0, v11, v[19:20]
+; GFX9-NEXT: v_mad_u64_u32 v[12:13], s[12:13], v4, v9, v[21:22]
+; GFX9-NEXT: v_addc_co_u32_e32 v4, vcc, 0, v26, vcc
+; GFX9-NEXT: v_mad_u64_u32 v[19:20], vcc, v1, v10, v[16:17]
+; GFX9-NEXT: v_cndmask_b32_e64 v6, 0, 1, s[14:15]
+; GFX9-NEXT: v_addc_co_u32_e32 v6, vcc, 0, v6, vcc
+; GFX9-NEXT: v_mad_u64_u32 v[10:11], vcc, v2, v9, v[19:20]
; GFX9-NEXT: v_mad_u64_u32 v[16:17], s[16:17],...
[truncated]