Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CGP] Permit tail call optimization on undefined return value #82419

Merged

Conversation

antoniofrighetto
Copy link
Contributor

We should be able to freely allow tail call optimization on undefined values as well.

Fixes: #82387.

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 20, 2024

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-backend-x86

Author: Antonio Frighetto (antoniofrighetto)

Changes

We should be able to freely allow tail call optimization on undefined values as well.

Fixes: #82387.


Full diff: https://github.com/llvm/llvm-project/pull/82419.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/CodeGenPrepare.cpp (+3-2)
  • (modified) llvm/test/CodeGen/X86/tailcall-cgp-dup.ll (+22)
diff --git a/llvm/lib/CodeGen/CodeGenPrepare.cpp b/llvm/lib/CodeGen/CodeGenPrepare.cpp
index 4036f18dbc6794..feefe87f406365 100644
--- a/llvm/lib/CodeGen/CodeGenPrepare.cpp
+++ b/llvm/lib/CodeGen/CodeGenPrepare.cpp
@@ -2686,8 +2686,9 @@ bool CodeGenPrepare::dupRetToEnableTailCallOpts(BasicBlock *BB,
             attributesPermitTailCall(F, CI, RetI, *TLI)) {
           // Either we return void or the return value must be the first
           // argument of a known intrinsic or library function.
-          if (!V || (isIntrinsicOrLFToBeTailCalled(TLInfo, CI) &&
-                     V == CI->getArgOperand(0))) {
+          if (!V || isa<UndefValue>(V) ||
+              (isIntrinsicOrLFToBeTailCalled(TLInfo, CI) &&
+               V == CI->getArgOperand(0))) {
             TailCallBBs.push_back(Pred);
           }
         }
diff --git a/llvm/test/CodeGen/X86/tailcall-cgp-dup.ll b/llvm/test/CodeGen/X86/tailcall-cgp-dup.ll
index 401ed9f7bc5a9e..92811c87f5623f 100644
--- a/llvm/test/CodeGen/X86/tailcall-cgp-dup.ll
+++ b/llvm/test/CodeGen/X86/tailcall-cgp-dup.ll
@@ -362,8 +362,30 @@ return:
   ret ptr %src
 }
 
+@i = global i32 0, align 4
+
+define i32 @undef_tailc() nounwind {
+; CHECK-LABEL: undef_tailc:
+; CHECK:       ## %bb.0:
+; CHECK-NEXT:    cmpl $0, _i(%rip)
+; CHECK-NEXT:    jne _qux ## TAILCALL
+; CHECK-NEXT:  ## %bb.1:
+; CHECK-NEXT:    retq
+  %1 = load i32, ptr @i, align 4
+  %2 = icmp eq i32 %1, 0
+  br i1 %2, label %5, label %3
+
+3:
+  %4 = tail call i32 @qux()
+  br label %5
+
+5:
+  ret i32 undef
+}
+
 declare void @llvm.memcpy.p0.p0.i64(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i64, i1)
 declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1)
 declare noalias ptr @malloc(i64)
 declare ptr @strcpy(ptr noalias returned writeonly, ptr noalias nocapture readonly)
 declare ptr @baz(ptr, ptr)
+declare i32 @qux()

@@ -2686,8 +2686,9 @@ bool CodeGenPrepare::dupRetToEnableTailCallOpts(BasicBlock *BB,
attributesPermitTailCall(F, CI, RetI, *TLI)) {
// Either we return void or the return value must be the first
// argument of a known intrinsic or library function.
if (!V || (isIntrinsicOrLFToBeTailCalled(TLInfo, CI) &&
V == CI->getArgOperand(0))) {
if (!V || isa<UndefValue>(V) ||
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just leave V as nullptr back at line 2602?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the explicit UndefValue check is fine. Especially if we also want to extend the phi case above to handle undef incoming values later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we ever hit the phi case? Looking at https://clang.godbolt.org/z/eq8vTxvzM, it seems that in the first case it suffices one non-undef incoming value to tail call, whereas the second case is refined by CGP as follows:

define dso_local i32 @foo() local_unnamed_addr #0 {
  %1 = load i32, ptr @i, align 4
  switch i32 %1, label %6 [
    i32 2, label %2
    i32 5, label %4
  ]

2:                                                ; preds = %0
  %3 = tail call i32 @bar() #2
  br label %6

4:                                                ; preds = %0
  %5 = tail call i32 @qux() #2
  ret i32 %5

6:                                                ; preds = %2, %0
  ret i32 undef
}

Thus getting tail called as part of this change.

@widlarizer
Copy link
Contributor

Confirmed on RISC-V to fix the reproducer for the issue mentioned above, thanks a lot for the fast PR!

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

llvm/test/CodeGen/X86/tailcall-cgp-dup.ll Outdated Show resolved Hide resolved
@@ -2686,8 +2686,9 @@ bool CodeGenPrepare::dupRetToEnableTailCallOpts(BasicBlock *BB,
attributesPermitTailCall(F, CI, RetI, *TLI)) {
// Either we return void or the return value must be the first
// argument of a known intrinsic or library function.
if (!V || (isIntrinsicOrLFToBeTailCalled(TLInfo, CI) &&
V == CI->getArgOperand(0))) {
if (!V || isa<UndefValue>(V) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the explicit UndefValue check is fine. Especially if we also want to extend the phi case above to handle undef incoming values later.

@widlarizer
Copy link
Contributor

  • TODO: fix tests

From CI: Failed Tests (3):
LLVM :: CodeGen/AArch64/addsub.ll
LLVM :: CodeGen/AArch64/callbr-asm-obj-file.ll
LLVM :: CodeGen/RISCV/pr51206.ll

We may freely allow tail call optzs on undef values as well.

Fixes: llvm#82387.
@antoniofrighetto antoniofrighetto merged commit 25e7e8d into llvm:main Feb 22, 2024
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missed tail call optimization on undefined return value for all architectures
5 participants