Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ThinLTO] Possible missed import of functions for indirect-call-promotion when linkage-name differs from mangled-name for local-linkage functions #74565

Closed
minglotus-6 opened this issue Dec 6, 2023 · 11 comments · Fixed by #76994
Assignees
Labels
LTO Link time optimization (regular/full LTO or ThinLTO)

Comments

@minglotus-6
Copy link
Contributor

minglotus-6 commented Dec 6, 2023

Global identifiers has the format [filename:]<mangled-name>, and its MD5 hash are known as GUIDs to compute the import of referenced symbols in ThinLTO.

When the referenced symbols is a callee of indirect call (whether the callee is in the same IR module as caller, or from a different IR module), the reference edge comes from PGO profiles as the MD5 hash of IRPGO function names, which has the format [filename;]<linkage-name> (more details)

PR/74008 would update the delimiter to be ;, but it doesn't fix the case when <mangled-name> and <linkage-name> are different. Note [filename;] prefix exists for local-linkage functions only, while the difference of <mangled-name> and <linkage-name> might exist for external functions according to the Mangler impl.

Looking at the code (Mangler used to compute IRPGO names), <linkage-name> (e.g., considering linkage and calling-convention) and <mangled-name> (e.g., emitted by Clang for C++) could be different. This difference means possible missed import of functions as indirect-call-promotion callees (or when reference edge needs to be deduced from raw profiles using a similar approach to ICP).

Create this issue so the missed import issue (and thereby missed opt) is tracked.

@teresajohnson
Copy link
Contributor

Global identifiers has the format [filename:], and its MD5 hash are known as GUIDs to compute the import of referenced symbols in ThinLTO.

Should this be updated to indicate that it will use ";" once your other patch is in? I.e. to focus the fact that this is not about that difference.

Also, you mention importing here, but I would specifically specify this is for ThinLTO, where the ICP target is in a different module.

@minglotus-6 minglotus-6 changed the title Possible missed import of functions for indirect-call-promotion when linkage-name differs from mangled-name for local-linkage functions [ThinLTO] Possible missed import of functions for indirect-call-promotion when linkage-name differs from mangled-name for local-linkage functions Dec 6, 2023
@minglotus-6
Copy link
Contributor Author

minglotus-6 commented Dec 6, 2023

Global identifiers has the format [filename:], and its MD5 hash are known as GUIDs to compute the import of referenced symbols in ThinLTO.

Should this be updated to indicate that it will use ";" once your other patch is in? I.e. to focus the fact that this is not about that difference.

I updated the issue description to mention it would be updated to [filename;]<mangled-name> by pr/74008.

Also, you mention importing here, but I would specifically specify this is for ThinLTO, where the ICP target is in a different module.

Added [ThinTLO] in the title, and more details in the description.

@EugeneZelenko EugeneZelenko added LTO Link time optimization (regular/full LTO or ThinLTO) and removed new issue labels Dec 6, 2023
@ellishg
Copy link
Contributor

ellishg commented Dec 6, 2023

Thanks @minglotus-6 for creating this issue. Could you create a small test case of the missed ICP callees? It would really help me understand the problem.

Also, I think you have some typos that I'm a bit confused about: "...it doesn't fix the case when and are different", "while the difference of and might exist".

@minglotus-6
Copy link
Contributor Author

Could you create a small test case of the missed ICP callees? It would really help me understand the problem.

Test case Transforms/PGOProfile/thinlto_indirect_call_promotion.ll in PR/74008 showcases one type of missed function import when semicolon/colon cause MD5 hash differences between pgo func names and global identifiers. Hopefully this showcases the problem how missed imports happens.

The key thing in getting a test case here is to get a executable (thereby raw profiles) where a function's linkage-name differs from mangled-name, which would cause a missed import.

Some findings while trying to do construct a test case:

  • Getting an indirect callee with specially-handled MS functions would be a start step (inferring from mangler implementation).
  • Clang frontend uses private linkage mainly for global variables (not functions), so it'd be tricky to get a C++ executable with private-linkage function IR (from my attempt to reverse-engineer how to do so, so take this with a grain of salt).

I'm wondering if you come across source code examples where IR functions are generated with private linkage (therefore difference between mangled name and linkage-name) in objective-c++ or swift (for the function-ordering use case). If yes, that could be a start to construct the test case.

Also, I think you have some typos that I'm a bit confused about: "...it doesn't fix the case when and are different", "while the difference of and might exist".

Fixed it now (It seems content inside a pair of angled bracket (i.e., < >) won't render but surrounding them with backticks would.)

@ellishg
Copy link
Contributor

ellishg commented Dec 7, 2023

Fixed it now (It seems content inside a pair of angled bracket (i.e., < >) won't render but surrounding them with backticks would.)

Ah I see. I hope you don't mind I fixed this in other areas too.

@minglotus-6
Copy link
Contributor Author

I fixed this in other areas too.

Of course not. Thanks for the correction!

@minglotus-6
Copy link
Contributor Author

I took another attempt only to find out it isn't very practical to synthesis an executable that contains functions with specific microsoft-faststdcall-mangling, on many (x86-64 or arm64) windows machines.

Here are some uninteresting details and how I come to this finding

  • The \01 prefix is added for a simple function from MinGW clang (https://godbolt.org/z/49Kxq95q9) and this \01 prefix is supposed to prevent IRMangler from further name mangling. Functions compiled for some 32-bit windows machine (https://godbolt.org/z/vW61KaPjb) doesn't have \01 prefix but not sure how easy it is to access these machines (not familiar myself)
  • Looking at clang/lib/AST/Mangle.cpp, \01 prefix will be added iff target-specific UserLabelPrefix is not empty (source code).
  • Looking at clang/lib/Basic/Targets/X86.h and clang/lib/Basic/Targets/AArch64.cpp, quite many windows 64-bit targets sets the UserLabelPrefix to non-empty strings (examples for x86_64darwin and arm64), meaning functions compiled for these machines would have the \01 prefix and will not be mangled by IR Manger :(

@ellishg
Copy link
Contributor

ellishg commented Dec 21, 2023

Based on our discussion in #74008 (comment) it seems like one solution could be to change GlobalValue::getGlobalIdentifier() to use linkage names like we do in InstrProf::getIRPGOFuncName(). But we can't use Mangler().getNameWithPrefix() directly because this callsite does not have the GV.

GlobalValue::getGlobalIdentifier(ValueName, Linkage, SourceFileName);

It does, however, have the linkage. After briefly looking at Mangler::getNameWithPrefix(), I suspect that we can cover 99% of cases with this info.

void Mangler::getNameWithPrefix(raw_ostream &OS, const GlobalValue *GV,
bool CannotUsePrivateLabel) const {
ManglerPrefixTy PrefixTy = Default;
assert(GV != nullptr && "Invalid Global Value");
if (GV->hasPrivateLinkage()) {
if (CannotUsePrivateLabel)
PrefixTy = LinkerPrivate;
else
PrefixTy = Private;
}
const DataLayout &DL = GV->getParent()->getDataLayout();
if (!GV->hasName()) {
// Get the ID for the global, assigning a new one if we haven't got one
// already.
unsigned &ID = AnonGlobalIDs[GV];
if (ID == 0)
ID = AnonGlobalIDs.size();
// Must mangle the global into a unique ID.
getNameWithPrefixImpl(OS, "__unnamed_" + Twine(ID), DL, PrefixTy);
return;
}
StringRef Name = GV->getName();
char Prefix = DL.getGlobalPrefix();
// Mangle functions with Microsoft calling conventions specially. Only do
// this mangling for x86_64 vectorcall and 32-bit x86.
const Function *MSFunc = dyn_cast_or_null<Function>(GV->getAliaseeObject());
// Don't add byte count suffixes when '\01' or '?' are in the first
// character.
if (Name.starts_with("\01") ||
(DL.doNotMangleLeadingQuestionMark() && Name.starts_with("?")))
MSFunc = nullptr;
CallingConv::ID CC =
MSFunc ? MSFunc->getCallingConv() : (unsigned)CallingConv::C;
if (!DL.hasMicrosoftFastStdCallMangling() &&
CC != CallingConv::X86_VectorCall)
MSFunc = nullptr;
if (MSFunc) {
if (CC == CallingConv::X86_FastCall)
Prefix = '@'; // fastcall functions have an @ prefix instead of _.
else if (CC == CallingConv::X86_VectorCall)
Prefix = '\0'; // vectorcall functions have no prefix.
}
getNameWithPrefixImpl(OS, Name, PrefixTy, DL, Prefix);
if (!MSFunc)
return;
// If we are supposed to add a microsoft-style suffix for stdcall, fastcall,
// or vectorcall, add it. These functions have a suffix of @N where N is the
// cumulative byte size of all of the parameters to the function in decimal.
if (CC == CallingConv::X86_VectorCall)
OS << '@'; // vectorcall functions use a double @ suffix.
FunctionType *FT = MSFunc->getFunctionType();
if (hasByteCountSuffix(CC) &&
// "Pure" variadic functions do not receive @0 suffix.
(!FT->isVarArg() || FT->getNumParams() == 0 ||
(FT->getNumParams() == 1 && MSFunc->hasStructRetAttr())))
addByteCountSuffix(OS, MSFunc, DL);
}

So my main question is: are we ok with changing the behavior of GlobalValue::getGlobalIdentifier() to use linkage names?

@minglotus-6
Copy link
Contributor Author

minglotus-6 commented Dec 21, 2023

I think changing GlobalValue::getGlobalIdentifier to use linkage names would work for existing callers. The existing callers use getGlobalIdentifier to get a unique name (and sometimes compute a GUID from the name), but they don't depend on the name being <mangled-name>.

It does, however, have the linkage.

Yes getGlobalIdentifier takes linkage as a parameter. To compute a <linkage-name>, the mangling mode parsed from data layout string is needed as well. For example, the global prefix is _ for {MachO, WinCOFFX86} and \0 (i.e., no prefix) for the rest. This prefix is added to linkage name of external functions if it's not \0. It would be a fair chunk of work to implement this.

@ellishg
Copy link
Contributor

ellishg commented Dec 27, 2023

I had hoped that we could pass a DataLayout to GlobalValue::getGlobalIdentifier() so we can get the <linkage-name>. Unfortunately this seems very difficult because the callsite at ModuleSummaryIndexBitcodeReader::setValueGUID() does not have a way to access a DataLayout, even though we do save the TargetTriple a bit up the callstack in ThinLTOCodeGenerator.

@ellishg
Copy link
Contributor

ellishg commented Dec 27, 2023

I'm not sure of a good path forward, but I'll list some options I can think of.

  1. We change GlobalValue::getGlobalIdentifier() to use a linkage name instead of a mangled name. Like I said above, this requires accessing the DataLayout from ModuleSummaryIndexBitcodeReader::setValueGUID() somehow.
  2. We intercept the value profile function name before we compute the GUID and convert the linkage name to a mangled name. GUIDs would remain the same, but global identifiers in IRPGO profiles would still use linkage names.
  3. We revert IRPGO profiles back to using mangled names. llvm-profdata order would stop working because profiles don't record function linkage types and we won't be able to convert mangled names to linkage names. This is required because lld/ld64 expects linkage names to be passed via --symbol-ordering-file/-order-file.
    3 a. A workaround could be to teach lld/ld64 to ignore some prefixes (_, l_, etc) in the orderfile, but I'm not sure if this would break existing use cases.
    3 b. We could also add an option for IRPGO to use linkage names instead of mangled names and users would have to understand that this breaks value profiling.

ellishg added a commit to ellishg/llvm-project that referenced this issue Jan 4, 2024
Change the format of IRPGO counter names to `[<filepath>;]<mangled-name>` which is computed by `GlobalValue::getGlobalIdentifier()` to fix llvm#74565.

In fe05193 (https://reviews.llvm.org/D156569) the format of IRPGO counter names was changed to be `[<filepath>;]<linkage-name>` where `<linkage-name>` is `F.getName()` with some prefix, e.g., `_` or `l_` on Mach-O (it is confusing that `<linkage-name>` is computed with `Mangler().getNameWithPrefix()` while `<mangled-name>` is just `F.getName()`). We discovered in llvm#74565 that this causes some missed import issues on some targets and llvm#74008 is a partial fix.

Since `<mangled-name>` may not match the `<linkage-name>` on some targets like Mach-O, we will need to post-process the output of `llvm-profdata order` before passing to the linker via `-order_file`.

Profiles generated after fe05193 will become stale after this diff, but I think this is acceptable since that patch after the LLVM 18 cut which has not been released yet.
ellishg added a commit that referenced this issue Jan 5, 2024
Change the format of IRPGO counter names to
`[<filepath>;]<mangled-name>` which is computed by
`GlobalValue::getGlobalIdentifier()` to fix #74565.

In fe05193
(https://reviews.llvm.org/D156569) the format of IRPGO counter names was
changed to be `[<filepath>;]<linkage-name>` where `<linkage-name>` is
basically `F.getName()` with some prefix, e.g., `_` or `l_` on Mach-O
(yes, it is confusing that `<linkage-name>` is computed with
`Mangler().getNameWithPrefix()` while `<mangled-name>` is just
`F.getName()`). We discovered in #74565 that this causes some missed
import issues on some targets and #74008 is a partial fix.

Since `<mangled-name>` may not match the `<linkage-name>` on some
targets like Mach-O, we will need to post-process the output of
`llvm-profdata order` before passing to the linker via `-order_file`.

Profiles generated after fe05193 will
become stale after this diff, but I think this is acceptable since that
patch landed after the LLVM 18 cut which hasn't been released yet.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LTO Link time optimization (regular/full LTO or ThinLTO)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants