Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DAG] Add legalization handling for AVGCEIL/AVGFLOOR nodes #92096

Merged
merged 1 commit into from
Jun 12, 2024

Conversation

RKSimon
Copy link
Collaborator

@RKSimon RKSimon commented May 14, 2024

Always match AVG patterns pre-legalization, and use TargetLowering::expandAVG to expand again during legalization.

I've removed the X86 custom AVGCEILU pattern detection and replaced with combines to try and convert other AVG nodes to AVGCEILU.

Copy link

github-actions bot commented May 14, 2024

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 0e346eeac676d909402abe01fb23248bb3efc5e0 58c869b8dd4bf1f2929d06bc244ee97b3bde5fa1 -- llvm/include/llvm/CodeGen/TargetLowering.h llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp llvm/lib/Target/X86/X86ISelLowering.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
index f435a36305..fb4ac238e3 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@@ -2823,9 +2823,11 @@ void DAGTypeLegalizer::ExpandIntegerResult(SDNode *N, unsigned ResNo) {
   case ISD::USHLSAT: ExpandIntRes_SHLSAT(N, Lo, Hi); break;
 
   case ISD::AVGCEILS:
-  case ISD::AVGCEILU: 
+  case ISD::AVGCEILU:
   case ISD::AVGFLOORS:
-  case ISD::AVGFLOORU: ExpandIntRes_AVG(N, Lo, Hi); break;
+  case ISD::AVGFLOORU:
+    ExpandIntRes_AVG(N, Lo, Hi);
+    break;
 
   case ISD::SMULFIX:
   case ISD::SMULFIXSAT:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 82c39f4613..f561e80e25 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -479,7 +479,7 @@ private:
   void ExpandIntRes_SADDSUBO          (SDNode *N, SDValue &Lo, SDValue &Hi);
   void ExpandIntRes_UADDSUBO          (SDNode *N, SDValue &Lo, SDValue &Hi);
   void ExpandIntRes_XMULO             (SDNode *N, SDValue &Lo, SDValue &Hi);
-  void ExpandIntRes_AVG               (SDNode *N, SDValue &Lo, SDValue &Hi);
+  void ExpandIntRes_AVG(SDNode *N, SDValue &Lo, SDValue &Hi);
   void ExpandIntRes_ADDSUBSAT         (SDNode *N, SDValue &Lo, SDValue &Hi);
   void ExpandIntRes_SHLSAT            (SDNode *N, SDValue &Lo, SDValue &Hi);
   void ExpandIntRes_MULFIX            (SDNode *N, SDValue &Lo, SDValue &Hi);

@RKSimon RKSimon force-pushed the legal-avg branch 6 times, most recently from c5278b3 to 57017b3 Compare May 14, 2024 14:38
@RKSimon RKSimon force-pushed the legal-avg branch 4 times, most recently from 016927e to 2f9a4fb Compare May 16, 2024 09:56
RKSimon added a commit that referenced this pull request May 16, 2024
…n looking for a splat constant

Limit the isConstOrConstSplat call to the vector elements we care about

Noticed while investigating regressions in #92096
@RKSimon RKSimon force-pushed the legal-avg branch 2 times, most recently from ad8ab1e to 9c6aa40 Compare May 16, 2024 15:18
if (KnownAmt.isConstant() && KnownAmt.getConstant().ult(VTBits))
Tmp = std::min<uint64_t>(Tmp + KnownAmt.getConstant().getZExtValue(),
VTBits);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an unrelated change?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If only... its to fix a thumb2 regression as it lowers v2i64 constant as bitcast(v4i32 constant)

Once this draft has addressed all the regressions I'll turn my attention to pulling out some of these changes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is proving tricky to pull out - but I've confirmed that it doesn't cause any notable compile time diff - as we fallback to ComputeKnownBits call which will call computeKnownBits on the shift amount anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't there any helper simpler than computeKnownBits that can look through bitcasts to find a constant?

If you are going to use computeKnownBits, why not use KnownAmt.getMinValue() instead of KnownAmt.getConstant()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to getMaxValue() (for upper bound) + getMinValue() (for min sign extension) - the shift amount isn't just a bitcast(v4i32 constant) hidden constant, so we do need the abilities of computeKnownBits.

We could update getValidMinimumShiftAmountConstant (et. al) to return std::optional<APInt> to allow it to fallback to computeKnownBits, although that would mean the function would return a value that might not actual exist in the shift amount, I don't think we've used that property but it would still be a change.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've created #93182 as a possible cleanup for this (the pull requests are independent though so we can go with the above approach for now). #93182 should get analyzed up by llvm-compile-time-tracker in the next hour or so.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should use getMinValue in both places. It doesn't matter if we don't know an upper bound for the shift amount.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I'm not even sure what the ult check is for, except perhaps to guard against Tmp + getMinValue() overflowing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, its mainly just a sanity/overflow check (somebody always comes along with a i1024 fuzz test or something eventually that makes getZExtValue() assert or cause weird getLimitedValue() behaviour).

Using getMaxValue() was mainly to try and keep closer to the behaviour of getValidMinimumShiftAmountConstant which doesn't accept out of bounds shift amounts.

RKSimon added a commit that referenced this pull request May 19, 2024
…tternMatch

No need for this to be vector specific, and its more likely that scalar cases will appear after #92096
@RKSimon
Copy link
Collaborator Author

RKSimon commented Jun 3, 2024

ping - #93182 is now finished, so this PR is ready to go.

@RKSimon
Copy link
Collaborator Author

RKSimon commented Jun 6, 2024

ping? any objections to me getting this committed now please?

@RKSimon RKSimon force-pushed the legal-avg branch 3 times, most recently from 54b366e to fa32106 Compare June 7, 2024 12:15
@RKSimon
Copy link
Collaborator Author

RKSimon commented Jun 7, 2024

@jayfoad any more comments?

Copy link
Contributor

@jayfoad jayfoad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No objection from me. The logic looks good. But I don't feel I know enough about any of the affected targets to approve it.

@RKSimon
Copy link
Collaborator Author

RKSimon commented Jun 7, 2024

@davemgreen @goldsteinn any objections?

@RKSimon
Copy link
Collaborator Author

RKSimon commented Jun 12, 2024

ping?

Copy link
Contributor

@jayfoad jayfoad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Always match AVG patterns pre-legalization, and use TargetLowering::expandAVG to expand again during legalization.

I've removed the X86 custom AVGCEILU pattern detection and replaced with combines to try and convert other AVG nodes to AVGCEILU.
@RKSimon RKSimon merged commit ea2ee5d into llvm:main Jun 12, 2024
3 of 6 checks passed
@RKSimon RKSimon deleted the legal-avg branch June 12, 2024 13:11
@dtcxzyw
Copy link
Member

dtcxzyw commented Jun 12, 2024

Hi @RKSimon, I think this patch causes some regressions on riscv: dtcxzyw/llvm-codegen-benchmark@97ad8e7

Reproducer:

; llc -mtriple=riscv64 test.ll -o -
define signext i64 @func000000000000002b(i32 signext %0) #0 {
entry:
  %1 = zext nneg i32 %0 to i64
  %2 = add nsw i64 %1, -1
  %3 = lshr i64 %2, 1
  %4 = add nuw nsw i64 %3, 1
  %5 = and i64 %4, 9223372036854775806
  ret i64 %5
}

Before (74f200b):

func000000000000002b:
        addi    a0, a0, -1
        srli    a0, a0, 1
        addi    a0, a0, 1
        andi    a0, a0, -2
        ret

After (47afa10):

func000000000000002b:
        addi    a0, a0, -1
        srli    a0, a0, 1
        addi    a0, a0, 1
        li      a1, -3
        srli    a1, a1, 1
        and     a0, a0, a1
        ret

@dtcxzyw
Copy link
Member

dtcxzyw commented Jun 12, 2024

Hi @RKSimon, I think this patch causes some regressions on riscv: dtcxzyw/llvm-codegen-benchmark@97ad8e7

Reproducer:

; llc -mtriple=riscv64 test.ll -o -
define signext i64 @func000000000000002b(i32 signext %0) #0 {
entry:
  %1 = zext nneg i32 %0 to i64
  %2 = add nsw i64 %1, -1
  %3 = lshr i64 %2, 1
  %4 = add nuw nsw i64 %3, 1
  %5 = and i64 %4, 9223372036854775806
  ret i64 %5
}

Before (74f200b):

func000000000000002b:
        addi    a0, a0, -1
        srli    a0, a0, 1
        addi    a0, a0, 1
        andi    a0, a0, -2
        ret

After (47afa10):

func000000000000002b:
        addi    a0, a0, -1
        srli    a0, a0, 1
        addi    a0, a0, 1
        li      a1, -3
        srli    a1, a1, 1
        and     a0, a0, a1
        ret
SelectionDAG has 17 nodes:
  t0: ch,glue = EntryToken
                  t2: i64,ch = CopyFromReg t0, Register:i64 %0
                t4: i64 = AssertSext t2, ValueType:ch:i32
              t5: i32 = truncate t4
            t6: i64 = sign_extend t5
          t8: i64 = add nsw t6, Constant:i64<-1>
        t10: i64 = srl t8, Constant:i64<1>
      t11: i64 = add nuw nsw t10, Constant:i64<1>
    t13: i64 = and t11, Constant:i64<9223372036854775806>
  t15: ch,glue = CopyToReg t0, Register:i64 $x10, t13
  t16: ch = RISCVISD::RET_GLUE t15, Register:i64 $x10, t15:1



Combining: t16: ch = RISCVISD::RET_GLUE t15, Register:i64 $x10, t15:1

Combining: t15: ch,glue = CopyToReg t0, Register:i64 $x10, t13

Combining: t14: i64 = Register $x10

Combining: t13: i64 = and t11, Constant:i64<9223372036854775806>
Creating constant: t17: i32 = Constant<-1>
Creating new node: t18: i32 = avgfloors t5, Constant:i32<-1>
Creating new node: t19: i64 = sign_extend t18

@RKSimon
Copy link
Collaborator Author

RKSimon commented Jun 12, 2024

cheers - looking at this now

@vitalybuka
Copy link
Collaborator

Probably by this patch as this is the only one in DAG in the blame list
https://lab.llvm.org/buildbot/#/builders/237/builds/7908

Can you please fix or revert?

FYI @fmayer

@dtcxzyw
Copy link
Member

dtcxzyw commented Jun 12, 2024

Probably by this patch as this is the only one in DAG in the blame list https://lab.llvm.org/buildbot/#/builders/237/builds/7908

Can you please fix or revert?

FYI @fmayer

Should be fixed by ca33796.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants