[X86][vectorcall] Do not consume register for indirect return value #97939
This is how MSVC handles it. https://godbolt.org/z/Eav3vx7cd
@llvm/pr-subscribers-clang-codegen @llvm/pr-subscribers-clang
Author: Phoebe Wang (phoebewang)
Full diff: https://github.com/llvm/llvm-project/pull/97939.diff
2 Files Affected:
diff --git a/clang/lib/CodeGen/Targets/X86.cpp b/clang/lib/CodeGen/Targets/X86.cpp
index 3146caba1c615..3c0947589ce3d 100644
--- a/clang/lib/CodeGen/Targets/X86.cpp
+++ b/clang/lib/CodeGen/Targets/X86.cpp
@@ -469,7 +469,7 @@ bool X86_32ABIInfo::canExpandIndirectArgument(QualType Ty) const {
ABIArgInfo X86_32ABIInfo::getIndirectReturnResult(QualType RetTy, CCState &State) const {
// If the return value is indirect, then the hidden argument is consuming one
// integer register.
- if (State.FreeRegs) {
+ if (State.CC != llvm::CallingConv::X86_VectorCall && State.FreeRegs) {
--State.FreeRegs;
if (!IsMCUABI)
return getNaturalAlignIndirectInReg(RetTy);
diff --git a/clang/test/CodeGen/vectorcall.c b/clang/test/CodeGen/vectorcall.c
index 71dc3b0b9585a..cab7fc0972d7b 100644
--- a/clang/test/CodeGen/vectorcall.c
+++ b/clang/test/CodeGen/vectorcall.c
@@ -90,7 +90,7 @@ struct HVA4 __vectorcall hva6(struct HVA4 a, struct HVA4 b) { return b;}
// X64: define dso_local x86_vectorcallcc %struct.HVA4 @"\01hva6@@128"(%struct.HVA4 inreg %a.coerce, ptr noundef %b)
struct HVA5 __vectorcall hva7(void) {struct HVA5 a = {}; return a;}
-// X86: define dso_local x86_vectorcallcc void @"\01hva7@@0"(ptr dead_on_unwind inreg noalias writable sret(%struct.HVA5) align 16 %agg.result)
+// X86: define dso_local x86_vectorcallcc void @"\01hva7@@0"(ptr dead_on_unwind noalias writable sret(%struct.HVA5) align 16 %agg.result)
// X64: define dso_local x86_vectorcallcc void @"\01hva7@@0"(ptr dead_on_unwind noalias writable sret(%struct.HVA5) align 16 %agg.result)
v4f32 __vectorcall hva8(v4f32 a, v4f32 b, v4f32 c, v4f32 d, int e, v4f32 f) {return f;}
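The register accounting that the X86.cpp hunk changes can be sketched as a small standalone model. This is only an illustration: `CallConv`, `CCState`, and `sretInReg` are hypothetical stand-ins rather than Clang's real types, and the MCU ABI branch from the real function is left out.

```cpp
#include <cassert>

// Hypothetical stand-ins for the calling-convention enum and CCState;
// these are illustrative names, not Clang's actual definitions.
enum class CallConv { C, FastCall, VectorCall, RegCall };

struct CCState {
  CallConv CC;
  unsigned FreeRegs; // integer registers still available for arguments
};

// Returns true if the hidden sret pointer is passed in a register.
// Mirrors the condition in the diff: under vectorcall the hidden pointer
// never takes one of the free integer registers.
bool sretInReg(CCState &State) {
  if (State.CC != CallConv::VectorCall && State.FreeRegs) {
    --State.FreeRegs; // the hidden argument consumes one integer register
    return true;
  }
  return false;
}
```

With this model, a vectorcall state keeps all of its free registers for the explicit arguments, while the other conventions give one up to the hidden return pointer, as before the patch.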
It looks like MSVC also applies this rule to fastcall.
Maybe put a boolean in the "state" to try to group together the code for specific conventions, instead of directly checking the CC.
Good catch, done!
There are 3 special conventions here: vectorcall, fastcall and regcall. We sometimes check all of them and sometimes only one or two, so I don't think a single boolean can group them.
LGTM
I meant, at the beginning of X86_32ABIInfo::computeInfo there's a chain of if statements that set up the properties of different calling conventions, and maybe some bits could be set there. If you don't think that makes sense, though, it's fine.
Thanks @efriedma-quic! I prefer using the convention name directly. I think it's more readable than hiding it under a state bit.
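For comparison, the alternative raised in review (setting a per-convention property once, where the conventions are set up, and checking that bit later) might look roughly like the sketch below. Everything here is hypothetical: `Conv`, `State`, `SRetUsesReg`, `setupState`, and `consumeRegForSRet` are made-up names, the register counts are illustrative, and the PR as discussed checks the convention directly instead.

```cpp
#include <cassert>

// Hypothetical names throughout; not Clang's real CCState or enum.
enum class Conv { CDecl, FastCall, VectorCall, RegCall };

struct State {
  unsigned FreeRegs = 0;   // illustrative count of free integer registers
  bool SRetUsesReg = true; // hypothetical flag: may sret take a register?
};

// One place that sets per-convention properties, in the spirit of the
// if-chain at the beginning of X86_32ABIInfo::computeInfo.
State setupState(Conv CC) {
  State S;
  if (CC == Conv::FastCall) {
    S.FreeRegs = 2;
    S.SRetUsesReg = false; // per review: MSVC applies the rule to fastcall too
  } else if (CC == Conv::VectorCall) {
    S.FreeRegs = 2;
    S.SRetUsesReg = false;
  } else if (CC == Conv::RegCall) {
    S.FreeRegs = 5;
  }
  return S;
}

// The later check consults the flag instead of naming conventions.
bool consumeRegForSRet(State &S) {
  if (S.SRetUsesReg && S.FreeRegs) {
    --S.FreeRegs;
    return true;
  }
  return false;
}
```

The trade-off is the one discussed above: the flag keeps convention-specific decisions in one place, while a direct check keeps the convention name visible at the point of use.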
Thanks! I was going to say, surely this can't be right, I was pretty confident that sret pointers consume registers for these conventions, but it looks like I was mistaken. I think this bug and code dates to 661f35b from 2014, which I think made the C++ and "C" sret return paths similar. You can see from this example that when C++ rules force the use of an sret indirect return, a register is consumed: I think this reveals how there is more separation between the C++ and C layers of the MSVC compiler than we have in Clang. We've seen similar examples of this kind of issue on AArch64, where we have to choose between X0 and X8 depending on whether we're doing sret for C++ reasons or architectural reasons.
Could you explain more about the example? I didn't find where the register is consumed. Don't both
I lost the You can see the load from [edx] and store to [ecx]:
Got it, thanks for the information!