Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[X86][vectorcall] Do not consume register for indirect return value #97939

Merged
merged 2 commits into from
Jul 8, 2024

Conversation

phoebewang
Copy link
Contributor

This is how MSVC handles it. https://godbolt.org/z/Eav3vx7cd

@llvmbot llvmbot added clang Clang issues not falling into any other category backend:X86 clang:codegen labels Jul 7, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Jul 7, 2024

@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-clang

Author: Phoebe Wang (phoebewang)

Changes

This is how MSVC handles it. https://godbolt.org/z/Eav3vx7cd


Full diff: https://github.com/llvm/llvm-project/pull/97939.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/Targets/X86.cpp (+1-1)
  • (modified) clang/test/CodeGen/vectorcall.c (+1-1)
diff --git a/clang/lib/CodeGen/Targets/X86.cpp b/clang/lib/CodeGen/Targets/X86.cpp
index 3146caba1c615..3c0947589ce3d 100644
--- a/clang/lib/CodeGen/Targets/X86.cpp
+++ b/clang/lib/CodeGen/Targets/X86.cpp
@@ -469,7 +469,7 @@ bool X86_32ABIInfo::canExpandIndirectArgument(QualType Ty) const {
 ABIArgInfo X86_32ABIInfo::getIndirectReturnResult(QualType RetTy, CCState &State) const {
   // If the return value is indirect, then the hidden argument is consuming one
   // integer register.
-  if (State.FreeRegs) {
+  if (State.CC != llvm::CallingConv::X86_VectorCall && State.FreeRegs) {
     --State.FreeRegs;
     if (!IsMCUABI)
       return getNaturalAlignIndirectInReg(RetTy);
diff --git a/clang/test/CodeGen/vectorcall.c b/clang/test/CodeGen/vectorcall.c
index 71dc3b0b9585a..cab7fc0972d7b 100644
--- a/clang/test/CodeGen/vectorcall.c
+++ b/clang/test/CodeGen/vectorcall.c
@@ -90,7 +90,7 @@ struct HVA4 __vectorcall hva6(struct HVA4 a, struct HVA4 b) { return b;}
 // X64: define dso_local x86_vectorcallcc %struct.HVA4 @"\01hva6@@128"(%struct.HVA4 inreg %a.coerce, ptr noundef %b)
 
 struct HVA5 __vectorcall hva7(void) {struct HVA5 a = {}; return a;}
-// X86: define dso_local x86_vectorcallcc void @"\01hva7@@0"(ptr dead_on_unwind inreg noalias writable sret(%struct.HVA5) align 16 %agg.result)
+// X86: define dso_local x86_vectorcallcc void @"\01hva7@@0"(ptr dead_on_unwind noalias writable sret(%struct.HVA5) align 16 %agg.result)
 // X64: define dso_local x86_vectorcallcc void @"\01hva7@@0"(ptr dead_on_unwind noalias writable sret(%struct.HVA5) align 16 %agg.result)
 
 v4f32 __vectorcall hva8(v4f32 a, v4f32 b, v4f32 c, v4f32 d, int e, v4f32 f) {return f;}

Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like MSVC also applies this rule to fastcall.

Maybe put a boolean in the "state" to try to group together the code for specific conventions, instead of directly checking the CC.

@phoebewang
Copy link
Contributor Author

It looks like MSVC also applies this rule to fastcall.

Good catch, done!

Maybe put a boolean in the "state" to try to group together the code for specific conventions, instead of directly checking the CC.

There are 3 special conventions here: vectorcall, fastcall and regcall. We sometime check them all, sometime check one or two of them. I think it's impossible to use one boolean to group them.

Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I meant, at the beginning of X86_32ABIInfo::computeInfo there's a chain of if statements that set up the properties of different calling conventions, and maybe some bits could be set there. If you don't think that makes sense, though, it's fine.

@phoebewang
Copy link
Contributor Author

I meant, at the beginning of X86_32ABIInfo::computeInfo there's a chain of if statements that set up the properties of different calling conventions, and maybe some bits could be set there. If you don't think that makes sense, though, it's fine.

Thanks @efriedma-quic! I prefer to using the convention name directly. I think it's more readable than hiding them under a state bit.

@phoebewang phoebewang merged commit 7e01e64 into llvm:main Jul 8, 2024
3 of 5 checks passed
@phoebewang phoebewang deleted the regcall branch July 8, 2024 13:23
@rnk
Copy link
Collaborator

rnk commented Jul 8, 2024

Thanks! I was going to say, surely this can't be right, I was pretty confident that sret pointers consume registers for these conventions, but it looks like I was mistaken. I think this bug and code dates to 661f35b from 2014, which I think made the C++ and "C" sret return paths similar.

You can see from this example that when C++ rules force the use of an sret indirect return, a register is consumed:
https://godbolt.org/z/W3fdn8s5Y

I think this reveals how there is more separation between the C++ and C layers of the MSVC compiler than we have in Clang. We've seen similar examples of this kind of issue on AAarch64, where we have to choose between X0 and X8 depending on whether we're doing sret for C++ reasons or architectural reasons.

@phoebewang
Copy link
Contributor Author

You can see from this example that when C++ rules force the use of an sret indirect return, a register is consumed:
https://godbolt.org/z/W3fdn8s5Y

Could you explain more about the example? I didn't find where the register consumed. Don't both copy_trivial and copy_nontrivial use the same esp + 4 to pass the pointer of sret?

@rnk
Copy link
Collaborator

rnk commented Jul 9, 2024

I lost the __fastcall or __vectorcall during editing, here's a fixed link: https://godbolt.org/z/46j33z8bc

You can see the load from [edx] and store to [ecx]:

NonTrivial copy_nontrivial(NonTrivial *) PROC   ; copy_nontrivial, COMDAT
        mov     eax, DWORD PTR [edx]
        mov     DWORD PTR [ecx], eax
...

@phoebewang
Copy link
Contributor Author

I lost the __fastcall or __vectorcall during editing, here's a fixed link: https://godbolt.org/z/46j33z8bc

You can see the load from [edx] and store to [ecx]:

NonTrivial copy_nontrivial(NonTrivial *) PROC   ; copy_nontrivial, COMDAT
        mov     eax, DWORD PTR [edx]
        mov     DWORD PTR [ecx], eax
...

Got it, thanks for the information!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 clang:codegen clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants