DISubprogram has declaration pointing to itself #59241
@llvm/issue-subscribers-debuginfo
@llvm/issue-subscribers-openmp
If you out-of-line the functions (rather than defining them within the class, though maybe still marking them as inline), can you still reproduce it? Perhaps then you can reorder them and reproduce the failure even without openmp? (This might require other techniques to move when the ctors are instantiated while leaving their written definitions where they are currently.)
Thanks for the suggestion. If I change the source as follows:

```cpp
class Request final {
public:
  Request();
};
class Response final {
public:
  Response();
};
class LoggingAllowedFields {
public:
  template <typename T>
  static const int loggingFieldsForType();
  template <>
  const int loggingFieldsForType<Request>();
  template <>
  const int loggingFieldsForType<Response>();
};
int makeLogDataRequest(const Request& payload);
int makeLogDataResponse(const Response& payload);
Request::Request() {}
Response *log(Request* request) {
  makeLogDataRequest(*request);
  Response *response = new Response{};
  makeLogDataResponse(*response);
  return response;
}
int makeLogDataRequest(const Request& payload) {
  return LoggingAllowedFields::loggingFieldsForType<Request>();
}
Response::Response() {}
int makeLogDataResponse(const Response& payload) {
  return LoggingAllowedFields::loggingFieldsForType<Response>();
}
```

and build it with

It seems to crash at the same place.
Yeah, it looks like the IR is invalid; if this is compiled with `-emit-llvm`, it bails out:
And I think it might be relatively recent, so your best bet is probably to bisect it, see what revision introduced the problem, and follow up in post-commit review there.
We found this issue when trying out clang-15 internally, and it also reproduces on clang-12 after back-porting https://reviews.llvm.org/D106084 (it produces smaller object files) or just forcing
cc: @amykhuang
Ah, yeah, sorry, I'd sync'd back 1000 revisions and thought it didn't reproduce, but I messed that up somehow. So, yeah, it's probably somewhat inherent to ctor homing. But that means it can probably be reproduced with other homing strategies too... let's see.
Seems to reproduce without ctor homing?
Looks like this happened around Clang 10? https://godbolt.org/z/5zejTh9s3
Got it down a bit further & still crashing: https://godbolt.org/z/nzj7v16xa
It seems it compiles fine on clang-9 but fails on clang-10 and up. I'll do some bisecting.
Bisected to https://reviews.llvm.org/D69743
Oh, yeah, the call site stuff adding in declarations for entities we hadn't previously tried to produce declarations for did trip over a few things; not too surprised that there are still latent issues like this. @vedantk any chance you could take a look at this?
The trouble starts immediately after debug info generation for the call. Looking at the history, it seems this fits into a known pattern of bugs that historically were dealt with by adding (ad hoc) escape hatches for "bad" kinds of callees (statics, inlines):
It's been years now since I've had time to work on debug info in particular or LLVM in general, and I haven't kept up with how the project has evolved, so I'm scratching my head about what to do here. It seems to me that the easy/lazy stopgap fix would be to add yet another escape hatch (maybe if `CalleeDecl->getTemplateKind()` is some known "bad" type?). That doesn't seem ideal, though, as it just papers over the deeper issue: the `finalizeSubprogram` step is unable to create a uniqued `DISubprogram`. That's an issue I have even less background knowledge to lean on, but maybe @adrian-prantl has some pointers? My role at Apple has changed, so unfortunately I'm unlikely to be able to take a deeper look at this any time soon.
From:

```cpp
if (!CalleeDecl->isStatic() && !CalleeDecl->isInlined())
  EmitFunctionDecl(CalleeDecl, CalleeDecl->getLocation(), CalleeType, Func);
```

no
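The existing escape hatch quoted above, and the stopgap floated earlier of also skipping template specializations, can be sketched as a plain predicate. This is a toy model only: `CalleeInfo`, `TemplateKind` and both functions are hypothetical stand-ins, not Clang's `FunctionDecl` API.

```cpp
// Toy model; CalleeInfo and TemplateKind are hypothetical, not Clang's API.
enum class TemplateKind { NonTemplate, FunctionTemplateSpecialization };

struct CalleeInfo {
  bool isStatic;
  bool isInlined;
  TemplateKind templateKind;
};

// Mirrors the existing check: no call-site declaration for static or inline callees.
bool emitsCallSiteDecl(const CalleeInfo& c) { return !c.isStatic && !c.isInlined; }

// Hypothetical stopgap: additionally skip function template specializations.
bool emitsCallSiteDeclWithStopgap(const CalleeInfo& c) {
  return emitsCallSiteDecl(c) &&
         c.templateKind != TemplateKind::FunctionTemplateSpecialization;
}
```

A predicate like this only hides the symptom for one more class of callees; it does not explain why the subprogram could not be uniqued in the first place.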
No worries @vedantk, thanks for the context you've provided.
Thanks for the explanation. Just noticed it returns
As @vedantk suggested, we can tighten the constraint here, like

```cpp
struct t2;
struct t3;
struct t1 {
  template <typename T>
  static const int f1();
  template <>
  const int f1<t2>();
  template <>
  const int f1<t3>();
};
struct t3 { };
t3 v3;
t1 v1;
struct t2 { };
t2 v2;
void f2() {
  t1::f1<t2>(); // bad
  t1::f1<t3>(); // good
}
```

I think the thing goes wrong with decl at callsite
At the callsite, a new
This creates a self-reference cycle when the temporary node is replaced and RAUW'd later, during `MDNode::handleChangedOperand`:

```cpp
void MDNode::handleChangedOperand(void *Ref, Metadata *New) {
  // ...
  // Drop uniquing for self-reference cycles and deleted constants.
  if (New == this || (!New && Old && isa<ConstantAsMetadata>(Old))) {
    if (!isResolved())
      resolve();
    storeDistinctInContext();
    return;
  }
  // ...
}
```

I wonder if there is a way to break the cycle. Also, what does it mean for a declaration (!36) to point to another declaration (!10)? Can we simply make the
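A toy model of why the self-reference cycle ends uniquing: nodes are looked up by the identity of their operands, so when an operand is replaced by the node itself, the node is pulled out of the table and stops being uniqued, mirroring the `New == this` branch of `MDNode::handleChangedOperand` quoted above. `Node`, `OpsLess` and `Uniquer` are hypothetical names; this is a simplified sketch, not LLVM's metadata API.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <vector>

// Toy model (hypothetical, not LLVM's API): nodes are uniqued by operand identity.
struct Node {
  std::vector<Node*> ops;
  bool uniqued = false;  // true while the node is registered in the uniquing table
};

// Order operand lists with std::less so pointer comparison is well defined.
struct OpsLess {
  bool operator()(const std::vector<Node*>& a, const std::vector<Node*>& b) const {
    return std::lexicographical_compare(a.begin(), a.end(), b.begin(), b.end(),
                                        std::less<Node*>());
  }
};

class Uniquer {
 public:
  // Return the uniqued node with these operands, creating it if needed.
  Node* getOrCreate(const std::vector<Node*>& ops) {
    auto it = table_.find(ops);
    if (it != table_.end()) return it->second;
    Node* n = make(ops);
    n->uniqued = true;
    table_[ops] = n;
    return n;
  }

  // Create a node that never enters the table (a stand-in for a temporary).
  Node* makeTemporary(const std::vector<Node*>& ops) { return make(ops); }

  // Replace one operand and re-key the node; a self-reference drops uniquing.
  void replaceOperand(Node* n, std::size_t idx, Node* newOp) {
    if (n->uniqued) {
      auto it = table_.find(n->ops);
      if (it != table_.end() && it->second == n) table_.erase(it);
    }
    n->ops[idx] = newOp;
    if (!n->uniqued) return;
    if (newOp == n) {
      n->uniqued = false;  // self-referential nodes cannot be uniqued
      return;
    }
    table_[n->ops] = n;  // simplification: a real uniquer would merge collisions
  }

 private:
  Node* make(const std::vector<Node*>& ops) {
    nodes_.push_back(std::make_unique<Node>());
    nodes_.back()->ops = ops;
    return nodes_.back().get();
  }

  std::map<std::vector<Node*>, Node*, OpsLess> table_;
  std::vector<std::unique_ptr<Node>> nodes_;
};
```

In the toy, once a node refers to itself there is no stable key to look it up by, which is the same reason LLVM stores such a node as distinct.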
This is currently blocking our internal toolchain upgrade. I am OK with skipping templated functions for the moment, even though it is a strong condition. @dwblaikie, do you have any suggestions?
@aprantl @JDevlieghere, it sounds like Apple originally implemented the call site stuff? Is it something you've still got interest in/time to look at a good fix for this bug? Otherwise it's likely we'll end up with a fairly coarse-grained fix, disabling call site info for calls to function template instantiations or something similar?
Sorry for the delay, I have something I need to finish first, but I'd like to have a look at this soon!
I've been trying to answer this question and the docs don't seem very clear on whether this is supposed to be allowed or not. I'll experiment with forbidding this and check what breaks. In the meantime, I also tracked down where a declaration pointing to another declaration is created, and it is in this function:

```cpp
clang::CodeGen::CGDebugInfo::EmitFunctionDecl(...) {
  // ...
  llvm::DISubprogram::DISPFlags SPFlags = llvm::DISubprogram::SPFlagZero;
  if (CGM.getLangOpts().Optimize)
    SPFlags |= llvm::DISubprogram::SPFlagOptimized;

  llvm::DINodeArray Annotations = CollectBTFDeclTagAnnotations(D);
  llvm::DISubroutineType *STy = getOrCreateFunctionType(D, FnType, Unit);
  // The debugger was stopped on this statement.
  llvm::DISubprogram *SP =
      DBuilder.createFunction(FDContext, Name, LinkageName, Unit, LineNo, STy,
                              ScopeLine, Flags, SPFlags, TParamsArray.get(),
                              getFunctionDeclaration(D), nullptr, Annotations);
}
```

Note that
Turns out the call to
Yeah, I think this is where the decl-references-decl happens. I wonder if we could create a new subprogram for the decl only if none exists (does this match the expectation that the subprogram for a decl should always be unique?). Even though we'd use a subprogram with an incomplete type, in the example, the one with only a fwd decl of
Also, none of the
Seems worth a go, then. Though some manual testing of surrounding cases would probably be worthwhile (try call site debug info with a plain function declaration, with a member function declared/defined before or after the call, etc.).
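A hedged sketch of a translation unit covering the surrounding cases suggested above: a plain function declared before and defined after the call, a member function declared in the class and defined after the call, one defined in the class, one defined out of line before the call, and a static member function template. All names are made up. `[[gnu::noinline]]` is there only to try to keep the calls from being inlined, so that call site entries have something to describe. The intended use is to build it with the flags from this report (for example `-O1 -g2`, optionally `-fopenmp`), both with `-emit-llvm` and to an object file, and check that it compiles cleanly.

```cpp
// Hypothetical manual test input; all names are made up for illustration.

// Plain function: declared before the call, defined after it.
[[gnu::noinline]] int plainFn(int x);

struct S {
  // Member function declared in the class, defined after the call.
  [[gnu::noinline]] int declaredMember(int x);

  // Member function defined in the class (implicitly inline).
  [[gnu::noinline]] int inClassMember(int x) { return x + 3; }

  // Static member function template, instantiated at the call.
  template <typename T>
  [[gnu::noinline]] static int tmpl(T x) { return static_cast<int>(x) + 4; }
};

// Member function defined out of line before its call.
struct R {
  [[gnu::noinline]] int definedBefore(int x);
};
int R::definedBefore(int x) { return x + 5; }

// The calls that should each get a call site entry.
int callAll(int x) {
  S s;
  R r;
  return plainFn(x) + s.declaredMember(x) + s.inClassMember(x) +
         S::tmpl<int>(x) + r.definedBefore(x);
}

int plainFn(int x) { return x + 1; }
int S::declaredMember(int x) { return x + 2; }
```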
I'll give those a try! I'll also compile Clang itself before/after the patch and see if there is any difference whatsoever in debug information.
An optimized build/otherwise with call site debug info enabled?
Oh, good point! I had forgotten that the whole issue started with call site debug info, which requires optimizations.
Good news: using the proposed patch to rebuild clang, I checked the dwarfdump of libLLVMSupport, and the DWARF is identical (ignoring the attributes decl_file/producer/comp_dir). I'll open a patch in Phabricator.
https://reviews.llvm.org/D143921 @apolloww I'm not sure what your username is in Phab.
Subscribed. And thanks for amending the doc as well.
The function `CGDebugInfo::EmitFunctionDecl` is supposed to create a declaration (never a _definition_) of a subprogram. This is made evident by the fact that the SPFlags never have the "Definition" bit set by that function. However, when `EmitFunctionDecl` calls `DIBuilder::createFunction`, it still tries to fill the "Declaration" argument by passing it the result of `getFunctionDeclaration(D)`. This queries an internal cache of previously created declarations and, for most code paths, returns nullptr; all is good.

However, as reported in [0], there are pathological cases in which we attempt to recreate a declaration, so the cache query succeeds, resulting in a subprogram declaration whose declaration field points to another declaration. Through a series of RAUWs, the declaration field ends up pointing to the SP itself. Self-referential MDNodes can't be uniqued, which causes the verifier to fail (declarations must be uniqued).

We can argue that the caller should check the cache first, but this is not a correctness issue (declarations are uniqued anyway). The bug is that `CGDebugInfo::EmitFunctionDecl` should always pass `nullptr` to the declaration argument of `DIBuilder::createFunction`, expressing the fact that declarations don't point to other declarations. AFAICT there is no reasonable meaning for a declaration that points to another declaration.

This looks a lot like a copy-paste mistake that has survived for ~10 years, since other places in this file have the exact same call almost token-by-token.

I've tested this by compiling LLVMSupport with and without the patch, at O2 and O0, and comparing the dwarfdump of the lib. The dumps are identical modulo the attributes decl_file/producer/comp_dir.

[0]: #59241

Differential Revision: https://reviews.llvm.org/D143921
(cherry picked from commit 997dc7e)
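A toy model of the change described in the commit message: before the fix, the call-site path asks the declaration cache for the `Declaration` argument and can get back another declaration; after it, `nullptr` is always passed. `ToySubprogram`, `ToyCodeGen` and its methods are hypothetical stand-ins for `CGDebugInfo::EmitFunctionDecl`, `getFunctionDeclaration` and `DIBuilder::createFunction`; they do not reproduce the real signatures.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a DISubprogram.
struct ToySubprogram {
  std::string name;
  bool isDefinition;
  const ToySubprogram* declaration;  // should only ever be set on definitions
};

class ToyCodeGen {
 public:
  explicit ToyCodeGen(bool passNullDeclaration)
      : passNullDeclaration_(passNullDeclaration) {}

  // Models emitting a class type whose member declarations get cached.
  void registerMemberDeclaration(const std::string& name) {
    declCache_[name] = create(name, /*isDefinition=*/false, nullptr);
  }

  // Stand-in for getFunctionDeclaration(D): query previously created declarations.
  const ToySubprogram* cachedDeclaration(const std::string& name) const {
    auto it = declCache_.find(name);
    return it == declCache_.end() ? nullptr : it->second;
  }

  // Stand-in for EmitFunctionDecl: always produces a declaration, never a definition.
  const ToySubprogram* emitFunctionDecl(const std::string& name) {
    const ToySubprogram* declArg =
        passNullDeclaration_ ? nullptr : cachedDeclaration(name);
    return create(name, /*isDefinition=*/false, declArg);
  }

 private:
  const ToySubprogram* create(const std::string& name, bool isDefinition,
                              const ToySubprogram* decl) {
    storage_.push_back(std::make_unique<ToySubprogram>(
        ToySubprogram{name, isDefinition, decl}));
    return storage_.back().get();
  }

  bool passNullDeclaration_;
  std::map<std::string, const ToySubprogram*> declCache_;
  std::vector<std::unique_ptr<ToySubprogram>> storage_;
};
```

In the toy, the cache hit is what produces a declaration whose `declaration` points to another declaration; in the reported bug, later RAUWs then turned that link into a self-reference, which the verifier rejects.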
@llvm/issue-subscribers-clang-codegen
Repro source:

run the command:

Stack overflow inside `constructCallSiteEntryDIEs` from the asm printer pass when constructing the call site
- callee: `LoggingAllowedFields::loggingFieldsForType<Request>()`
- caller: `makeLogData<Request>(Request const&)`

with the following debug info:

Notice that the `declaration` field of the `DISubprogram` for `loggingFieldsForType<Response>` points to itself, which leads to an infinite call stack.

The self-referencing `DISubprogram` exists in the unoptimized IR straight from clang codegen, so it is reproducible with the following cmd as well:

In order to reproduce, I need all of `-fopenmp`, `-O1` (or any `-O` level >= 1) and `-g2`:
- `-fopenmp` affects which declarations go into `CodeGenModule::DeferredDeclsToEmit` and their order. The source has nothing to do with openmp, but this flag is applied universally in our internal code repo. As explained below, I think a certain order of the declarations causes the issue.
- The `-O` level seems to affect debug info verbosity. If I give `-O0`, less debug info is generated and the issue is gone.
- `-g2` implies `-debug-info-kind=constructor`. If I give `-debug-info-kind=limited`, the issue is gone.

Below is what I've found so far after digging into clang codegen.
When `-fopenmp` is given, the order of function codegen is:
1. Codegen function `log(Request*)`: create a temporary `DICompositeType` for `Response` and store it in `TypeCache`. It has the `DIFlagFwdDecl` flag.
2. Codegen function `int makeLogData<Request>(Request const&)`: the type definition of `loggingFieldsForType` is created. This in turn creates a `DISubprogram` for `loggingFieldsForType<Response>()`, which uses the type created in the previous step for its `templateParams` field. This node is stored in the `DISubprograms` hashmap, using the `templateParams` as part of the key.
3. Codegen function for the constructor `Response::Response()`: because of ctor homing, it creates the complete type for `Response` and replaces the previously created temporary node with a distinct `DICompositeType` node.
4. Codegen function `int makeLogData<Response>(Response const&)`:
   - `TypeCache` for type `Response` now returns the distinct node.
   - The `DISubprograms` hashmap lookup for `loggingFieldsForType<Response>()` does not return anything, because the key value has been updated.
   - A new `DISubprogram` for `loggingFieldsForType<Response>()` is created.

If `-fopenmp` is not given, the order of function codegen changes: #3 (`Response::Response()`, the constructor) is now generated before #2, so in #4 the search hits in the hashmap.

I am not sure if openmp is to blame here; it looks to me like it exposes an issue in debug info generation when ctor homing is used.
cc: @dwblaikie @ayermolo
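The order dependence in steps 1 to 4 of the analysis above can be sketched with a toy cache. The key assumption of this sketch: the subprogram lookup is keyed by the identity of the cached type node, so replacing the forward-declared `Response` node with a new complete node makes a later lookup miss and create a second subprogram. `ToyDebugInfo` and its methods are hypothetical; this is a simplified model of the reporter's analysis, not Clang's `CGDebugInfo` or LLVM's uniquing tables.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy nodes; not Clang/LLVM types.
struct TypeNode {
  bool fwdDecl;
};
struct SubprogramNode {
  const TypeNode* templateParam;  // part of the lookup key
};

class ToyDebugInfo {
 public:
  // Step 1: return the cached type node, creating a forward declaration if none exists.
  const TypeNode* getOrCreateType(const std::string& name) {
    auto it = typeCache_.find(name);
    if (it != typeCache_.end()) return it->second;
    return typeCache_[name] = newType(/*fwdDecl=*/true);
  }

  // Step 3: ctor homing emits the complete type as a new node and
  // repoints the type cache at it.
  const TypeNode* completeType(const std::string& name) {
    auto it = typeCache_.find(name);
    if (it != typeCache_.end() && !it->second->fwdDecl) return it->second;
    return typeCache_[name] = newType(/*fwdDecl=*/false);
  }

  // Steps 2 and 4: look up the subprogram keyed by the current type node,
  // creating a new one on a miss.
  const SubprogramNode* getOrCreateSubprogram(const std::string& typeName) {
    const TypeNode* key = getOrCreateType(typeName);
    auto it = subprograms_.find(key);
    if (it != subprograms_.end()) return it->second;
    subprogramStorage_.push_back(
        std::make_unique<SubprogramNode>(SubprogramNode{key}));
    return subprograms_[key] = subprogramStorage_.back().get();
  }

  std::size_t numSubprograms() const { return subprogramStorage_.size(); }

 private:
  const TypeNode* newType(bool fwdDecl) {
    typeStorage_.push_back(std::make_unique<TypeNode>(TypeNode{fwdDecl}));
    return typeStorage_.back().get();
  }

  std::map<std::string, const TypeNode*> typeCache_;
  std::map<const TypeNode*, const SubprogramNode*> subprograms_;
  std::vector<std::unique_ptr<TypeNode>> typeStorage_;
  std::vector<std::unique_ptr<SubprogramNode>> subprogramStorage_;
};
```

Running the steps in the reported `-fopenmp` order (1, 2, 3, 4) yields two subprograms for `Response`, while emitting the constructor first yields one. The toy only shows why the lookup misses in one order and hits in the other; in the report, the extra subprogram created at step 4 is the one that later ends up with a self-referencing `declaration`.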