cmd/compile: closure func name changes when inlining #60324
Comments
Not only k8s: leaky goroutine tests, for instance, may fail because whitelisting doesn't work correctly anymore.
Hi Austin, since you are the original author of CL 479095, assigning this bug to you. If you would prefer that I work on this particular cleanup, please assign back to me. Thanks, Than.
This was noticed before: #55980
Is it okay if we just arrange that the
I don't think I understand the suggestion, but we want the inlining-disabled name to appear in profiles, stack traces, and symbol tables. I don't think changing just runtime.Func will do that.
For what it's worth, we have the name of the lexically enclosing function at the point of the closure creation, because if the program crashed at that PC we would show that frame in the stack trace marked [inlined].
Yes, it's easy to track the original closure symbol name. That's not a problem. What I'm trying to understand is what you want done with that. We currently duplicate the closure and underlying function text, because they can be optimized differently due to different escape analysis and variable capture. So the duplicated closure needs its own linker symbol.
Does it? I thought the linker addressed all the symbols by table index now, so if we made it a symbol internal to the package being compiled, it would not collide with any others with the same name.
I am interpreting the previous comment as my not having answered your questions, so I will elaborate a bit here. The function names reported in profiles, stack traces, symbol tables, disassembly, and so on should not depend on how aggressively we inline. That is, the names that we get with no inlining at all are the "official" names, and inlining should preserve them. I understand that, in terms of the example at the top, if we do:
that there will be three copies of the actual text symbol generated when f is inlined, and that the texts may actually differ across the 3 calls. But I think the new linker's ability to address symbols by table index instead of name should mean that it's completely fine to have those three text symbols all use the same name: the original name they'd have used without inlining.
@cherrymui How should the compiler create multiple obj.LSyms that are kept separate but all end up with the same linkname in the symbol tables?
As background, we originally gave closures opaque names like 'func•1', but that was extremely unhelpful in all the places I mentioned where function names appear, not to mention in tools that didn't support UTF-8. Issue #8291 was to give the closures more useful names, which Dmitri did in CL 3964. Inlining of functions containing closure creation is essentially a regression of that issue. It may seem like closures just need a name, any name, but that's not true. The names carry useful information that has been lost since
Please retract that accusation. The stack trace for your test program is the same in Go 1.19 as in Go 1.20: https://go.dev/play/p/yjP9hrxZ_K0?v=goprev
It was not an accusation, but I apologize nonetheless. My memory was that one of unified IR's important inlining advances was to enable inlining of closure-containing functions. I misremembered: we started inlining those in Go 1.17, so obviously that must have been with the original IR. In any event, as unified IR has made it possible to do more aggressive inlining, we lose more and more of this information. This is not a criticism of unified IR. My point is only to explain the context of how we got here. All the changes individually make sense, but this is a rough edge where they aren't quite working well enough together.
Thank you.
Ack. That's what I'm trying to understand: how should this work? As I've said already, we need multiple linker symbols (i.e., obj.LSyms), because in general the inlined closure can get optimized differently than the original closure. You suggest the multiple obj.LSyms should just have identical names in the symbol tables, etc. (i.e., use the same linknames). To my knowledge, that's not something we do anywhere else in the compiler, so I don't know how to invoke cmd/internal/obj to make that happen. That's why I asked @cherrymui how to do that. However, I remember she's on vacation, so if any other linker experts can advise on this, that would be great. @golang/compiler
I think it would be good enough for Go 1.21 if we embed the inlining stack in the name (eliding package names). For example, if F inlines f, which inlines g, which inlines h, which creates a closure, the function would be named F.f.g.h.func1. As long as the same counter is used for all the funcs, it won't matter if that sequence appears again: it would get a different trailing number. So if you had
The four text symbols created in main would be named
Thanks, that seems doable for 1.21.
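A minimal sketch of the naming scheme proposed above, assuming the compiler knows the inlining chain from the outermost (physical) function down to the function that lexically contains the closure. `closureNamer` and its `name` method are hypothetical helpers for illustration, not cmd/compile code. The point it shows is that one counter is shared by all closures in the outermost function, so a repeated chain still gets a distinct suffix.

```go
package main

import (
	"fmt"
	"strings"
)

// closureNamer is a hypothetical helper that hands out closure names for one
// outermost function. A single counter is shared by every closure created in
// that function, so a repeated inlining chain still gets a unique suffix.
type closureNamer struct {
	pkg   string // package name, written once as the prefix
	count int    // shared counter for all closures in this function
}

// name builds pkg.F.f.g.h.funcN from an inlining chain. chain[0] is the
// outermost function; the last element is the function that lexically
// contains the closure. Package names are elided from the chain.
func (n *closureNamer) name(chain []string) string {
	n.count++
	return fmt.Sprintf("%s.%s.func%d", n.pkg, strings.Join(chain, "."), n.count)
}

func main() {
	namer := &closureNamer{pkg: "main"}
	// F inlines f, which inlines g, which inlines h; h creates a closure.
	fmt.Println(namer.name([]string{"F", "f", "g", "h"}))
	// The same chain appearing again gets a different trailing number.
	fmt.Println(namer.name([]string{"F", "f", "g", "h"}))
}
```

This prints `main.F.f.g.h.func1` and then `main.F.f.g.h.func2`.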
That's not ambiguity, because pkg.A can only be one of those two things.
I agree with Matthew's observation that it would be more helpful to reverse the breadcrumbs. My point was that if we have F inlined into G inlined into H as well as F inlined into J inlined into K, it is much more helpful to name them with some representation that includes (F,G,H) and (F,J,K) than it is to name them F$1 and F$2.
OK, but for example debug/gosym doesn't know that and gets them wrong.
@aarzilli Do you mean just debug/gosym.Sym.BaseName, or other code in debug/gosym too? What do you use BaseName for?
If we were to make the inlining breadcrumbs unambiguously parseable from the symbol name, then in a traceback, we could print only the "original" name by default, and include the full name if and only if we're showing full PCs in the traceback (I'm thinking as additional information at the end of the line, not replacing the original name). That seems to be the exact case where the actual symbol name may matter. Perhaps we suppress the PC offset if we're only printing the original name to avoid confusion. This is a more refined version of the point I made in #60324 (comment).
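To make the idea concrete, here is a sketch that assumes a purely hypothetical encoding in which the original (no-inlining) name is followed by an `@` separator and then the inlining breadcrumbs. cmd/compile does not use this encoding, and `originalName` and `displayName` are illustrative names. The traceback prints only the original name by default and appends the full symbol when full PCs are requested.

```go
package main

import (
	"fmt"
	"strings"
)

// breadcrumbSep is a hypothetical separator between the original closure
// name and its inlining breadcrumbs; cmd/compile does not use this encoding.
const breadcrumbSep = "@"

// originalName returns the name the closure would have had with inlining
// disabled: everything before the (hypothetical) breadcrumb separator.
func originalName(sym string) string {
	orig, _, _ := strings.Cut(sym, breadcrumbSep)
	return orig
}

// displayName picks the name to print for a traceback frame: only the
// original name by default, with the full symbol appended at the end of
// the line when full PCs are being shown.
func displayName(sym string, fullPCs bool) string {
	orig := originalName(sym)
	if !fullPCs || orig == sym {
		return orig
	}
	return orig + " [" + sym + "]"
}

func main() {
	sym := "main.f.func1" + breadcrumbSep + "main.main"
	fmt.Println(displayName(sym, false)) // default traceback
	fmt.Println(displayName(sym, true))  // traceback with full PCs
}
```

The separator must be a character that cannot otherwise appear in a symbol name. That requirement is what "unambiguously parseable" amounts to here.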
BaseName and ReceiverName. Delve doesn't use debug/gosym, exactly, but it has its own equivalent reimplementation. It uses it to find functions with a partial name match (for example, to set a breakpoint). It has never needed to work correctly with closures so far, but it would be nice to know that it could. Looking around with Sourcegraph I can't find many users of debug/gosym, but GoRE, for example, could be affected by this problem as well. It's hard to say whether there are other half-reimplementations floating around that are affected. Personally, as a user of Go, I agree that
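The misparse is easy to reproduce without a binary by setting only the `Name` field of `debug/gosym.Sym` and calling its `PackageName`, `ReceiverName`, and `BaseName` methods. Based on how those methods split on dots, a closure name like `main.f.func1` comes back as a method `func1` on receiver `f`. An extra inlining component then ends up inside the "receiver". The second closure name below is only an illustration of such a component, not necessarily what any Go release emits.

```go
package main

import (
	"debug/gosym"
	"fmt"
)

// describe splits a symbol name the way debug/gosym does. Only the Name
// field of gosym.Sym is set; no binary or symbol table is needed.
func describe(name string) (pkg, recv, base string) {
	s := gosym.Sym{Name: name}
	return s.PackageName(), s.ReceiverName(), s.BaseName()
}

func main() {
	for _, name := range []string{
		"main.(*T).M",       // an ordinary method: receiver (*T), base M
		"main.f.func1",      // a closure in f, reported as "method" func1 on "receiver" f
		"main.main.f.func1", // illustrative closure name with an inlining component
	} {
		pkg, recv, base := describe(name)
		fmt.Printf("%-18s pkg=%q recv=%q base=%q\n", name, pkg, recv, base)
	}
}
```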
It occurs to me that this case of inlining is just one example of function multicompilation/multiversioning. In this case, we're compiling multiple versions of the function because they may be optimized differently in different inlining contexts. But this is a technique we've considered for other things, like multiversioning for escape analysis and for access to CPU features that are detected at runtime. In fact, there's another case where we do this already: GC shape stenciling. We had to disambiguate the symbols in that case, too. But in that case we decided to hide the disambiguator in tracebacks and only show the "user" name. I think that was the right choice (though we still show PCs, which might be a mistake 😅). In that case, the disambiguator was pretty meaningless to users, and I could see an argument that the inlining breadcrumbs are more meaningful, but I'm not sure I buy that these cases are so different.
Aren't generics the same? Maybe it would be neat to also include, in stack traces, the concrete type for which a generic function got compiled?
Sorry, that's exactly what I meant by "GC shape stenciling" (I can see how that wouldn't be obvious!) The thing is, the symbol name doesn't include the concrete type for which a generic function got instantiated. What it includes is much more abstract and not particularly helpful to the user.
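For reference, the "hide the disambiguator" approach for generics shows up in tracebacks as `Name[...]` in place of the shape type arguments. The helper below is a simplified standalone imitation of that display rule, not the runtime's implementation. It assumes at most one bracketed section per name, and the shape names in the example inputs are illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// userName hides the instantiation part of a symbol name for display,
// keeping "[...]" so the reader can still tell it is a generic function.
// It assumes one bracketed section, from the first '[' to the last ']'.
func userName(sym string) string {
	i := strings.IndexByte(sym, '[')
	j := strings.LastIndexByte(sym, ']')
	if i < 0 || j < i {
		return sym // not an instantiated name; show it unchanged
	}
	return sym[:i] + "[...]" + sym[j+1:]
}

func main() {
	fmt.Println(userName("main.Map[go.shape.int,go.shape.string]"))
	fmt.Println(userName("main.Map[go.shape.int].func1"))
	fmt.Println(userName("main.f.func1"))
}
```

This prints `main.Map[...]`, `main.Map[...].func1`, and `main.f.func1`. An analogous rule for inlined closures would need the inlining breadcrumbs to be equally easy to locate in the name.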
Hi, just to weigh in on this comment by @mdempsky #60324 (comment).
I agree with the whole comment because I'm writing such a tool within the IDE, and it is sometimes not possible to correctly resolve the symbol there. Stack traces in particular have the file name, but it might be transformed (trimmed or otherwise rewritten; Bazel, for example, does that with the default ruleset). Having a marker to identify inlined functions, so that it's possible to handle them (maybe skip them), could help.
As inlining gets better and better, closure names get worse and worse. One of the closure names generated for https://github.com/go-delve/delve/blob/master/_fixtures/rangeoverfunc.go looks like this:
The full string does not show up in stack traces, but only because the symbol name parser in Go doesn't understand it either: it thinks everything between the first
I've tried a couple of times to just give them duplicate symbol names, but without success.
That symbol name is remarkably unpleasant! For reference, it comes from this function, which has four levels of range-over-func loops. Each of these is an Even ignoring all of the type parameters in the symbol name (several of which could be different, but aren't in this particular case), our current naming scheme clearly scales super-linearly here, which is rather concerning:
I wonder if James Joyce would have been able to pack a novel into these symbols using carefully crafted closures...?
Change https://go.dev/cl/639515 mentions this issue:
Thanks @aarzilli for the CL! It originally seemed like a good idea to me. But as I read and think more, I'm less sure about it, given the complexity of the CL. I think the fundamental issue is that we need a name that is user friendly, and we need a name for symbol resolution. At least for closures, I think it is still achievable to use one name for both. Reading the comment above, it is clear that for the closure name, the source-level function where the closure is defined is important. This is the name that users want to see. The physical function that contains the closure after inlining seems less important, and we could omit it (if we can make symbol resolution work). So we probably want to name the closures along these lines. We need some mechanism to make symbol resolution work when the closure's containing function is inlined into multiple places. Currently we rely on naming it uniquely, so that different inlined versions of the closure get different names. But this is where the clutter comes from. Given that we now use indices for symbol resolution in the linker, I think this doesn't really require unique names. Not all symbols are resolved by indices. But for closures, I think it is 100% index-resolvable. After inlining, closures can only be directly referenced in the containing package (a closure can be passed to other packages as a value, but that is not a direct symbol reference), and indices are sufficient for this case. It might be an issue for external linking if the closure symbols have the same name, but I think we can solve that by marking them static. This way, we can make the closure have the same name regardless of inlining, determined by the source-level containing function. One catch is that currently the compiler doesn't like two symbols with the same name in the same package, even if they are resolved solely by index. But that is an internal detail we can fix.
If for some reason using the same name doesn't work, or it confuses users or tools, we can try using a name that is mainly based on the source-level containing function, with a short suffix for deduplication. Say, for a closure defined in For the next step I'm going to try making the closure symbols static and ensure they are always resolved by index. If that works, we can start changing the names, without worrying about uniqueness. Thanks.
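To illustrate why index-based resolution removes the need for unique closure names, here is a toy model. It is a hypothetical simplification, not cmd/link's loader. References within a package are indices into a slice, and only non-static symbols are registered by name, so two static copies of the same source-level closure, optimized differently in two inlining contexts, can share one user-facing name.

```go
package main

import "fmt"

// sym is a toy stand-in for a linker symbol, not cmd/link's representation.
type sym struct {
	name   string
	static bool   // static: local to its package, never looked up by name
	body   string // stands in for the (possibly differently optimized) code
}

// pkgSyms is a toy per-package symbol table. References between symbols are
// indices into syms, so names only need to be unique for symbols that are
// looked up by name (for example, by other packages or an external linker).
type pkgSyms struct {
	syms   []sym
	byName map[string]int // non-static symbols only
}

// add appends s and returns the index that references should use.
// Duplicate names are rejected only for non-static symbols.
func (p *pkgSyms) add(s sym) (int, error) {
	if !s.static {
		if _, dup := p.byName[s.name]; dup {
			return -1, fmt.Errorf("duplicate non-static symbol %q", s.name)
		}
		if p.byName == nil {
			p.byName = make(map[string]int)
		}
		p.byName[s.name] = len(p.syms)
	}
	p.syms = append(p.syms, s)
	return len(p.syms) - 1, nil
}

func main() {
	var p pkgSyms
	// Two copies of the same source-level closure share one name.
	a, _ := p.add(sym{name: "main.f.func1", static: true, body: "captured var escapes"})
	b, _ := p.add(sym{name: "main.f.func1", static: true, body: "captured var on stack"})
	fmt.Println(a, p.syms[a].name, p.syms[a].body)
	fmt.Println(b, p.syms[b].name, p.syms[b].body)

	// A non-static symbol still needs a unique name.
	p.add(sym{name: "main.F"})
	if _, err := p.add(sym{name: "main.F"}); err != nil {
		fmt.Println("error:", err)
	}
}
```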
Without inlining, the stack shows main.f.func1 panicking, which is approximately correct: it's the 1st closure inside main.f. With inlining, the stack instead shows main.main.func1, because now it's the 1st closure inside main.main due to inlining of f. However, that's a much less useful name, since the closure is still lexically inside main.f. If there is some way to preserve the old name even when inlining, that would be a good idea.
This is the source of a Kubernetes test failure using Go 1.21, because a more complicated function was not being inlined in Go 1.20 but is being inlined in Go 1.21 after CL 492017. The test expected to see "newHandler" on the stack when a closure lexically inside newHandler panicked.
If there's something we can do here, we probably should, both to keep the closure names helpful and to keep tests like the Kubernetes one passing.
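The report's original test program is not reproduced in this thread. What follows is a hypothetical reconstruction that shows the effect without panicking. It asks the runtime for the name of a closure created inside f via `runtime.FuncForPC`. The helper name `funcName` is illustrative. The exact name printed with inlining enabled depends on the Go version, so only the inlining-disabled result is stated with confidence.

```go
package main

import (
	"fmt"
	"reflect"
	"runtime"
)

// f returns a closure that is lexically inside f.
func f() func() {
	return func() {}
}

// funcName reports the symbol name the runtime associates with a func value.
func funcName(fn func()) string {
	return runtime.FuncForPC(reflect.ValueOf(fn).Pointer()).Name()
}

func main() {
	// With inlining disabled (go run -gcflags=-l .) this prints main.f.func1.
	// With inlining enabled, the result depends on the Go version; on the
	// versions described in this issue it became main.main.func1.
	fmt.Println(funcName(f()))
}
```

Comparing `go run .` with `go run -gcflags=-l .` is a quick way to check whether a given toolchain preserves the lexical name.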