Skip to content

Conversation

@LuoYuanke
Copy link
Contributor

When switch from fast isel to dag isel the input value is from llvm IR instruction.
If the instruction is call we should get the calling convention of the callee and
pass it to RegsForValue::getCopyFromRegs, so that it can deduce the right RegisterVT
of the returned value of the callee.

When switch from fast isel to dag isel the input value is from llvm IR instruction.
If the instruction is call we should get the calling convention of the callee and
pass it to RegsForValue::getCopyFromRegs, so that it can deduce the right RegisterVT
of the returned value of the callee.
@llvmbot llvmbot added backend:X86 llvm:SelectionDAG SelectionDAGISel as well labels Oct 22, 2025
@llvmbot
Copy link
Member

llvmbot commented Oct 22, 2025

@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-selectiondag

Author: Luo Yuanke (LuoYuanke)

Changes

When switch from fast isel to dag isel the input value is from llvm IR instruction.
If the instruction is call we should get the calling convention of the callee and
pass it to RegsForValue::getCopyFromRegs, so that it can deduce the right RegisterVT
of the returned value of the callee.


Full diff: https://github.com/llvm/llvm-project/pull/164565.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+6-1)
  • (added) llvm/test/CodeGen/X86/bf16-fast-isel.ll (+19)
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 20a0efd3afa1c..14f9fdfddf658 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -1977,8 +1977,13 @@ SDValue SelectionDAGBuilder::getValueImpl(const Value *V) {
   if (const Instruction *Inst = dyn_cast<Instruction>(V)) {
     Register InReg = FuncInfo.InitializeRegForValue(Inst);
 
+    std::optional<CallingConv::ID> CallConv = std::nullopt;
+    auto *Callee = dyn_cast<CallInst>(Inst);
+    if (Callee && !Callee->isInlineAsm())
+      CallConv = Callee->getCallingConv();
+
     RegsForValue RFV(*DAG.getContext(), TLI, DAG.getDataLayout(), InReg,
-                     Inst->getType(), std::nullopt);
+                     Inst->getType(), CallConv);
     SDValue Chain = DAG.getEntryNode();
     return RFV.getCopyFromRegs(DAG, FuncInfo, getCurSDLoc(), Chain, nullptr, V);
   }
diff --git a/llvm/test/CodeGen/X86/bf16-fast-isel.ll b/llvm/test/CodeGen/X86/bf16-fast-isel.ll
new file mode 100644
index 0000000000000..43622566d2974
--- /dev/null
+++ b/llvm/test/CodeGen/X86/bf16-fast-isel.ll
@@ -0,0 +1,19 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s
+
+define i8 @test(ptr %f) nounwind {
+; CHECK-LABEL: test:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    pushq %rax
+; CHECK-NEXT:    callq foo@PLT
+; CHECK-NEXT:    callq bar@PLT
+; CHECK-NEXT:    popq %rcx
+; CHECK-NEXT:    retq
+entry:
+  %call = call bfloat @foo(ptr %f)
+  %call2 = call noundef zeroext i8 @bar(bfloat %call)
+  ret i8 %call2
+}
+
+declare bfloat @foo(ptr %f)
+declare zeroext i8 @bar(bfloat)

%call2 = call noundef zeroext i8 @bar(bfloat %call)
ret i8 %call2
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the same testcase wit Han indirect call. Can you also test each case with a non-default calling convention

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added test case for indirect call and fast calling convention.

if (const Instruction *Inst = dyn_cast<Instruction>(V)) {
Register InReg = FuncInfo.InitializeRegForValue(Inst);

std::optional<CallingConv::ID> CallConv = std::nullopt;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::optional<CallingConv::ID> CallConv = std::nullopt;
std::optional<CallingConv::ID> CallConv;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 1982 to 1983
if (Callee && !Callee->isInlineAsm())
CallConv = Callee->getCallingConv();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is broken for indirect calls. You should take the calling convention directly from the callsite, not the callee

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the callsite instruction. Let me change the variable name to CI.

@LuoYuanke LuoYuanke requested a review from arsenm October 22, 2025 07:52
@phoebewang
Copy link
Contributor

Does this solve #134222?

@@ -0,0 +1,66 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you missing -O0 here?

Copy link
Contributor Author

@LuoYuanke LuoYuanke Oct 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Let me add "--fast-isel" option for the test case. The issue happens when switching from fast isel to dag isel where fast isel fails to select call instruction.

@LuoYuanke
Copy link
Contributor Author

Does this solve #134222?

Solve the same issue of https://godbolt.org/z/15K5Pb7GG which generate incorrect instruction "call __truncsfbf2@PLT". The callee return bf16 value through xmm0, so don't need to convert from f32 to bf16.

@LuoYuanke LuoYuanke requested a review from phoebewang October 22, 2025 08:17
@phoebewang
Copy link
Contributor

Does this solve #134222?

Solve the same issue of https://godbolt.org/z/15K5Pb7GG which generate incorrect instruction "call __truncsfbf2@PLT". The callee return bf16 value through xmm0, so don't need to convert from f32 to bf16.

Great! I spent some time on it, but didn't find a concise solution.

Copy link
Contributor

@phoebewang phoebewang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Contributor

@KanRobert KanRobert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@LuoYuanke LuoYuanke merged commit 5afe29e into llvm:main Oct 22, 2025
10 checks passed
@LuoYuanke LuoYuanke deleted the fast-isel-bug branch October 22, 2025 09:11
@phoebewang
Copy link
Contributor

@LuoYuanke Would you help verify the initial problems reported in #134222 and #151692 and close them if they are solved?

@LuoYuanke
Copy link
Contributor Author

@LuoYuanke Would you help verify the initial problems reported in #134222 and #151692 and close them if they are solved?

Sure, I'll verify the two issues.

Register InReg = FuncInfo.InitializeRegForValue(Inst);

std::optional<CallingConv::ID> CallConv;
auto *CI = dyn_cast<CallInst>(Inst);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should also be CallBase?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean CallBase can cover invoke, callbr instructions which also have cconv, but CallInst only cover call instruction?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try to create a test case for callbr, but I got llc crash with below case. It seems x86 don't support half or bfloat type in inline assembly.

define half @test1(float %x) {
entry:
  %ret = callbr half asm "vxorps $1, $1, $0 jmp ${2:l}", "=x,x,!i"(float %x)
          to label %normal [label %abnormal]
normal:
  ret half %ret

abnormal:
  ret half 1.0
}

error: couldn't allocate output register for constraint 'x'

I also got llc crash when running below test case for invoke.

declare bfloat @pers(...)

declare bfloat @foo();

define bfloat @test_lp() personality ptr @pers {
entry:
  %1 = invoke bfloat @foo()
          to label %then unwind label %hotlp

then:
  ret bfloat %1

hotlp:
  %2 = landingpad { ptr, bfloat }
          cleanup
  br label %lpret

lpret:
  %3 = extractvalue { ptr, bfloat } %2, 1
  ret bfloat %3
}

So I'd like to keep it as CallInst for now. When bfloat is better supported in callbr and invoke, we can review CallInst and support more call-alike instruction.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

callbr is special but invoke is straightforward. You should also fix whatever the crash is

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This absolutely needs to be CallBase, leaving this like this is just waiting for someone else to waste time debugging this in the future

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let me create a PR to change it.

Copy link
Contributor Author

@LuoYuanke LuoYuanke Oct 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a PR #164769, but it is difficult to create a test case. We need to have a call instruction to use the output of invoke instruction, besides, the call and invoke instruction should be in the same basic block to trigger the issue. Below code can be a base code for the test case.

declare i32 @pers(...)
declare bfloat @foo();

define i32 @test_lp() personality ptr @pers {
entry:
  %1 = invoke bfloat @foo()
          to label %then unwind label %hotlp

then:
  %call = call i32 @bar2(bfloat %1)
  ret i32 %call

hotlp:
  %2 = landingpad { ptr, i32 }
          cleanup
  br label %lpret

lpret:
  %3 = extractvalue { ptr, i32 } %2, 1
  ret i32 %3
}

declare i32 @bar2(bfloat)

LuoYuanke added a commit that referenced this pull request Oct 25, 2025
This is to follow the discussion in
#164565
CallBase can cover more call-like instructions which carry caling
convention flag.

Co-authored-by: Yuanke Luo <ykluo@birentech.com>
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 25, 2025
This is to follow the discussion in
llvm/llvm-project#164565
CallBase can cover more call-like instructions which carry caling
convention flag.

Co-authored-by: Yuanke Luo <ykluo@birentech.com>
dvbuka pushed a commit to dvbuka/llvm-project that referenced this pull request Oct 27, 2025
)

When switch from fast isel to dag isel the input value is from llvm IR
instruction.
If the instruction is call we should get the calling convention of the
callee and
pass it to RegsForValue::getCopyFromRegs, so that it can deduce the
right RegisterVT
of the returned value of the callee.

---------

Co-authored-by: Yuanke Luo <ykluo@birentech.com>
dvbuka pushed a commit to dvbuka/llvm-project that referenced this pull request Oct 27, 2025
This is to follow the discussion in
llvm#164565
CallBase can cover more call-like instructions which carry caling
convention flag.

Co-authored-by: Yuanke Luo <ykluo@birentech.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend:X86 llvm:SelectionDAG SelectionDAGISel as well

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants