Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CodeGen] Simplify updateLiveIn in MachineSink #79831

Merged
merged 1 commit into from
Feb 15, 2024

Conversation

jayfoad
Copy link
Contributor

@jayfoad jayfoad commented Jan 29, 2024

When a whole register is added a basic block's liveins, use
LaneBitmask::getAll for the live lanes instead of trying to calculate an
accurate mask of the lanes that comprise the register.

This simplifies the code and matches other places where a whole register
is marked as livein.

This also avoids problems when regunits that are synthesized by TableGen
to represent ad hoc aliasing have a lane mask of 0.

Fixes #78942

When a whole register is added a basic block's liveins, use
LaneBitmask::getAll for the live lanes instead of trying to calculate an
accurate mask of the lanes that comprise the register.

This simplifies the code and matches other places where a whole register
is marked as livein.

This also avoids problems when regunits that are synthesized by TableGen
to represent ad hoc aliasing have a lane mask of 0.

Fixes llvm#78942
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 29, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Jay Foad (jayfoad)

Changes

When a whole register is added a basic block's liveins, use
LaneBitmask::getAll for the live lanes instead of trying to calculate an
accurate mask of the lanes that comprise the register.

This simplifies the code and matches other places where a whole register
is marked as livein.

This also avoids problems when regunits that are synthesized by TableGen
to represent ad hoc aliasing have a lane mask of 0.

Fixes #78942


Full diff: https://github.com/llvm/llvm-project/pull/79831.diff

4 Files Affected:

  • (modified) llvm/lib/CodeGen/MachineSink.cpp (+2-7)
  • (modified) llvm/test/CodeGen/AMDGPU/av_spill_cross_bb_usage.mir (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/branch-folding-implicit-def-subreg.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/postra-machine-sink.mir (+1-1)
diff --git a/llvm/lib/CodeGen/MachineSink.cpp b/llvm/lib/CodeGen/MachineSink.cpp
index e7e8f602683480..c3a1d3759882d8 100644
--- a/llvm/lib/CodeGen/MachineSink.cpp
+++ b/llvm/lib/CodeGen/MachineSink.cpp
@@ -1949,13 +1949,8 @@ static void updateLiveIn(MachineInstr *MI, MachineBasicBlock *SuccBB,
   for (unsigned DefReg : DefedRegsInCopy)
     for (MCPhysReg S : TRI->subregs_inclusive(DefReg))
       SuccBB->removeLiveIn(S);
-  for (auto U : UsedOpsInCopy) {
-    Register SrcReg = MI->getOperand(U).getReg();
-    LaneBitmask Mask;
-    for (MCRegUnitMaskIterator S(SrcReg, TRI); S.isValid(); ++S)
-      Mask |= (*S).second;
-    SuccBB->addLiveIn(SrcReg, Mask);
-  }
+  for (auto U : UsedOpsInCopy)
+    SuccBB->addLiveIn(MI->getOperand(U).getReg());
   SuccBB->sortUniqueLiveIns();
 }
 
diff --git a/llvm/test/CodeGen/AMDGPU/av_spill_cross_bb_usage.mir b/llvm/test/CodeGen/AMDGPU/av_spill_cross_bb_usage.mir
index 65648bacd55679..c1da29ecc2c2f5 100644
--- a/llvm/test/CodeGen/AMDGPU/av_spill_cross_bb_usage.mir
+++ b/llvm/test/CodeGen/AMDGPU/av_spill_cross_bb_usage.mir
@@ -47,7 +47,7 @@ body:             |
   ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
-  ; GCN-NEXT:   liveins: $exec:0x000000000000000F, $sgpr30, $sgpr31, $vgpr0:0x0000000000000003, $vgpr1:0x0000000000000003, $vgpr2:0x0000000000000003, $vgpr3:0x0000000000000003, $vgpr4:0x0000000000000003, $vgpr5:0x0000000000000003, $vgpr6:0x0000000000000003, $vgpr7:0x0000000000000003, $vgpr8:0x0000000000000003, $vgpr9:0x0000000000000003, $vgpr40, $sgpr30_sgpr31, $vgpr10_vgpr11:0x000000000000000F, $vgpr14_vgpr15:0x000000000000000F, $vgpr41_vgpr42:0x000000000000000F, $vgpr43_vgpr44:0x000000000000000F, $vgpr45_vgpr46:0x000000000000000F, $vgpr56_vgpr57:0x000000000000000F, $vgpr58_vgpr59:0x000000000000000F, $vgpr60_vgpr61:0x000000000000000F
+  ; GCN-NEXT:   liveins: $exec, $sgpr30, $sgpr31, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7, $vgpr8, $vgpr9, $vgpr40, $sgpr30_sgpr31, $vgpr10_vgpr11:0x000000000000000F, $vgpr14_vgpr15:0x000000000000000F, $vgpr41_vgpr42:0x000000000000000F, $vgpr43_vgpr44:0x000000000000000F, $vgpr45_vgpr46:0x000000000000000F, $vgpr56_vgpr57:0x000000000000000F, $vgpr58_vgpr59:0x000000000000000F, $vgpr60_vgpr61:0x000000000000000F
   ; GCN-NEXT: {{  $}}
   ; GCN-NEXT:   renamable $vgpr57 = COPY $vgpr9, implicit $exec
   ; GCN-NEXT:   renamable $vgpr56 = COPY $vgpr8, implicit $exec
diff --git a/llvm/test/CodeGen/AMDGPU/branch-folding-implicit-def-subreg.ll b/llvm/test/CodeGen/AMDGPU/branch-folding-implicit-def-subreg.ll
index 5a128c7541d1ec..24ef8ce1beb2db 100644
--- a/llvm/test/CodeGen/AMDGPU/branch-folding-implicit-def-subreg.ll
+++ b/llvm/test/CodeGen/AMDGPU/branch-folding-implicit-def-subreg.ll
@@ -808,7 +808,7 @@ define amdgpu_kernel void @f1(ptr addrspace(1) %arg, ptr addrspace(1) %arg1, i64
   ; GFX90A-NEXT: {{  $}}
   ; GFX90A-NEXT: bb.58:
   ; GFX90A-NEXT:   successors: %bb.7(0x80000000)
-  ; GFX90A-NEXT:   liveins: $exec:0x000000000000000F, $sgpr12, $sgpr13, $sgpr14, $sgpr15:0x0000000000000003, $sgpr23:0x0000000000000003, $vgpr30, $vgpr31, $sgpr4_sgpr5, $sgpr6_sgpr7:0x000000000000000F, $sgpr8_sgpr9, $sgpr10_sgpr11, $sgpr24_sgpr25, $sgpr26_sgpr27, $sgpr28_sgpr29, $sgpr42_sgpr43, $sgpr16_sgpr17_sgpr18_sgpr19:0x00000000000000F0, $sgpr20_sgpr21_sgpr22_sgpr23:0x000000000000003C, $vgpr2_vgpr3:0x000000000000000F, $vgpr10_vgpr11:0x000000000000000F, $vgpr18_vgpr19:0x000000000000000F, $vgpr20_vgpr21:0x000000000000000F, $vgpr22_vgpr23:0x000000000000000F, $vgpr24_vgpr25:0x000000000000000F, $sgpr0_sgpr1_sgpr2_sgpr3
+  ; GFX90A-NEXT:   liveins: $exec, $sgpr12, $sgpr13, $sgpr14, $sgpr15, $sgpr23, $vgpr30, $vgpr31, $sgpr4_sgpr5, $sgpr6_sgpr7:0x000000000000000F, $sgpr8_sgpr9, $sgpr10_sgpr11, $sgpr24_sgpr25, $sgpr26_sgpr27, $sgpr28_sgpr29, $sgpr42_sgpr43, $sgpr16_sgpr17_sgpr18_sgpr19:0x00000000000000F0, $sgpr20_sgpr21_sgpr22_sgpr23:0x000000000000003C, $vgpr2_vgpr3:0x000000000000000F, $vgpr10_vgpr11:0x000000000000000F, $vgpr18_vgpr19:0x000000000000000F, $vgpr20_vgpr21:0x000000000000000F, $vgpr22_vgpr23:0x000000000000000F, $vgpr24_vgpr25:0x000000000000000F, $sgpr0_sgpr1_sgpr2_sgpr3
   ; GFX90A-NEXT: {{  $}}
   ; GFX90A-NEXT:   renamable $vgpr15 = COPY killed renamable $sgpr23, implicit $exec
   ; GFX90A-NEXT:   renamable $vgpr17 = COPY killed renamable $sgpr15, implicit $exec
diff --git a/llvm/test/CodeGen/AMDGPU/postra-machine-sink.mir b/llvm/test/CodeGen/AMDGPU/postra-machine-sink.mir
index ff7a40bfde9580..86863c31753642 100644
--- a/llvm/test/CodeGen/AMDGPU/postra-machine-sink.mir
+++ b/llvm/test/CodeGen/AMDGPU/postra-machine-sink.mir
@@ -5,7 +5,7 @@
 # CHECK-LABEL: bb.0:
 # CHECK: renamable $sgpr1 = COPY renamable $sgpr2
 # CHECK-LABEL: bb.1:
-# CHECK: liveins: $sgpr0_sgpr1:0x000000000000000F
+# CHECK: liveins: $sgpr0_sgpr1
 # CHECK: renamable $vgpr1_vgpr2 = COPY renamable $sgpr0_sgpr1
 
 ---

@jayfoad jayfoad requested a review from kaz7 January 29, 2024 14:22
@jayfoad
Copy link
Contributor Author

jayfoad commented Jan 29, 2024

This is an alternative to #79462. Both will fix #78942 but in different ways.

for (auto U : UsedOpsInCopy) {
Register SrcReg = MI->getOperand(U).getReg();
LaneBitmask Mask;
for (MCRegUnitMaskIterator S(SrcReg, TRI); S.isValid(); ++S)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this add the test case from VE?

@jayfoad
Copy link
Contributor Author

jayfoad commented Feb 5, 2024

Should this add the test case from VE?

It would be nice to have a VE test case but the only one I've seen so far is main...kaz7:llvm-project:computeregunitlanemasks-test which is specific to the TableGen fix (#79462) - it is no use for this fix. @kaz7

@kaz7
Copy link
Contributor

kaz7 commented Feb 9, 2024

This patch avoids the problem I mentioned. However, it won't solve the problem, in my opinion. We don't want to have 0x00000000 lanemask value for general registers, don't we? #79835 (#79462) correct such problems. Please consider to apply #79835 too, please.

I think this patch itself is a good enhancement. So, I appreciate if you review both, this and #79835.

I'll try to create a regression test for this patch too.

@jayfoad
Copy link
Contributor Author

jayfoad commented Feb 9, 2024

This patch avoids the problem I mentioned. However, it won't solve the problem, in my opinion. We don't want to have 0x00000000 lanemask value for general registers, don't we?

We're talking about special regunits that are only created to represent ad hoc aliasing between two registers. What should the lanemask be for such a regunit - 0 or -1 or something else? I don't think there is a single correct answer to that question. Maybe the lanemask should be poison and we should make sure that no-one ever depends on its value?

@qcolombet @MatzeB

@qcolombet
Copy link
Collaborator

This simplifies the code and matches other places where a whole register is marked as livein.

What you are saying is other places don't bother getting this extra detail.

What does this impact?
For instance, does this makes the verifier less accurate, etc.?

This also avoids problems when regunits that are synthesized by TableGen
to represent ad hoc aliasing have a lane mask of 0.

I have mixed feeling about that.
On one hand, I feel we are working around a real problem. On the other hand, I feel that using aliases in this way (ad hoc) is an abuse of the system.

@jayfoad
Copy link
Contributor Author

jayfoad commented Feb 14, 2024

This simplifies the code and matches other places where a whole register is marked as livein.

What you are saying is other places don't bother getting this extra detail.

Yes. LaneBitmask::getAll is used as a default in a lot of places, so I assume that it should always be treated identically to the more accurate mask that we used to calculate here.

What does this impact? For instance, does this makes the verifier less accurate, etc.?

I don't know for sure. I just know that it does not have much effect on the existing lit tests.

This also avoids problems when regunits that are synthesized by TableGen
to represent ad hoc aliasing have a lane mask of 0.

I have mixed feeling about that. On one hand, I feel we are working around a real problem. On the other hand, I feel that using aliases in this way (ad hoc) is an abuse of the system.

You mean the whole tablegen Register field list<Register> Aliases = []; mechanism is an abuse of the system? Maybe, but it seems like quite a lot of work to get rid of the existing in-tree uses.

By the way I have yet another idea for fixing the reported problem. Currently if a leaf register has an ad hoc alias, tablegen will give it a regunit to represent the alias but it will not give it its own "leaf" regunit. My idea is that we could change that, so that leaf registers always get a unique leaf regunit, in addition to any regunits for ad hoc aliasing. This is something that I need anyway for another project: converting MRI / TRI to track reserved regunits instead of reserved registers.

@qcolombet
Copy link
Collaborator

Sounds good let’s go with this patch and see if anybody complains about the removal of the lanes in the live ins. I expect that people that care about that would use the LiveIIntervals anyway.

You mean the whole tablegen Register field list<Register> Aliases = []; mechanism is an abuse of the system? Maybe, but it seems like quite a lot of work to get rid of the existing in-tree uses.

I mean the part where they don’t actually share a regunit.
And to be clear I’m not advocating for removing the aliases field.

By the way I have yet another idea for fixing the reported […]

Sounds like a plan, although I’m a little worried about the impact on things like pressure sets.
I’d say give it a try and we’ll see how it goes!

@jayfoad
Copy link
Contributor Author

jayfoad commented Feb 15, 2024

You mean the whole tablegen Register field list<Register> Aliases = []; mechanism is an abuse of the system? Maybe, but it seems like quite a lot of work to get rid of the existing in-tree uses.

I mean the part where they don’t actually share a regunit. And to be clear I’m not advocating for removing the aliases field.

I was not aware of any uses of Aliases when the registers already share some regunit. Wouldn't that be redundant? I thought every use of Aliases was deemed to be "ad hoc aliasing".

@jayfoad jayfoad merged commit 2df652a into llvm:main Feb 15, 2024
5 checks passed
@jayfoad jayfoad deleted the simplify-machinesink-updatelivein-2 branch February 15, 2024 10:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Calculation of regunitmasks seems wrong
5 participants