runtime: missed deferred calls [1.16 backport] #43941
Comments
Somewhat more minimized test case: After the It seems like it's particularly sensitive to stack frame layout. In particular, the issue was I needed to keep making matching changes to both of the main function literals within |
Having stared at these cases a bunch, I'm starting to think they're likely duplicates. In each of the reproducers, there's some code of the shape:
The differences are all in the code around the above. I suspect the above code is just corrupting the defer stack, and then the surrounding code just influences how that corruption ultimately manifests. Most commonly it's simply "bad defer entry in panic" (#43920), but when the stars align (or stacks align?) it manifests instead as issues like #43941 and #43942. |
A small variant:

package main

func main() {
	defer func() {
		println(recover().(int))
	}()
	func() {
		func() (_ [2]int) { type _ int; return }()
		func() {
			defer func() {
				defer func() {
					recover()
				}()
				defer panic(3)
				panic(2)
			}()
			defer func() {
				recover()
			}()
			panic(1)
		}()
		defer func() {}()
	}()
	var x = 123
	func() {
		defer print(x) // not executed!
		func() {
			defer func() {}()
			panic(4)
		}()
	}()
}
This needs to be fixed. |
@go101 I think that your example is a different problem. Please open a different issue for it. Thanks. |
@danscales The test case in the initial comment appears to work with Go 1.14, Go 1.17, and tip. It fails with Go 1.15 and Go 1.16. Go 1.15 is no longer supported. Do you think it would be feasible to backport the fix (whatever it is) to Go 1.16? |
@ianlancetaylor My test case is based on @mdempsky's one: #43941 (comment), I think they are same in principle. |
@go101 They may be the same in principle but they are clearly different in practice. The original test case in this issue works on Go 1.17 and on tip. Your test case, if I understand it correctly, fails on Go 1.17 and tip. Therefore the two test cases, though they may be similar, demonstrate different problems. Please open a new issue. Thanks. |
@ianlancetaylor OK, I did the bisection for the test case in the original comment. As you say, it worked in 1.14, which is when fast defers appeared. It started not working with this change: https://go-review.googlesource.com/c/go/+/229601 [cmd/compile: move fixVariadicCall from walk to order]. That seems very unrelated, so maybe that change revealed an existing bug with defers. Then, not unexpectedly, the test started working again with: https://go-review.googlesource.com/c/go/+/310175 [internal/buildcfg: enable regabidefer by default], which enabled a major change in how defers are implemented by @aclements (in service of regabi, I believe). So, I don't think there is an obvious simple fix for 1.16, but I will investigate further. If I don't see anything fairly obvious/easy to fix, I would probably lean toward not back-porting a fix for 1.16, since this case is so unusual. |
I agree that if there isn't a simple fix we should just move on, since it works in Go 1.17. Thanks for looking. |
I'm really surprised that enabling regabidefer would have fixed a missed defer. It was a significant change to how defers are invoked, but not to our overall handling of the defer stack. |
It might be that once we understand #48898 we'll understand why regabidefer fixed this test program. |
Change https://golang.org/cl/356011 mentions this issue: |
This issue is currently in Go1.16.10 milestone and has a release-blocker label, which means the release of Go 1.16.10 will be blocked on it next week. It doesn't have a CherryPickCandidate label, so we missed it in our review of backports. @ianlancetaylor Based on your latest comment above, it sounds like you may no longer think this issue needs to block Go 1.16.10, is that right? We should either add CherryPickCandidate so this can be reviewed, or move this back to Go 1.18 milestone and ask gopherbot to make a 1.16 backport issue from it. |
The fix ends up being fairly simple; see the fix for Go 1.18 that is about to go into master: https://go-review.googlesource.com/c/go/+/356011 . Ignoring comments and tests, only 3 lines of code change. However, I would lean toward not backporting to 1.16.10. We will get several months of soak time for Go 1.18 before it goes out, whereas we would not get that for a point release. Also, the test cases are quite elaborate and probably unlikely in "real life". If really desired, maybe we can put it into the next Go 1.16 point release after the Go 1.18 fix has been checked into master for a few weeks? |
Fix two defer bugs related to adding/removing open defer entries.

The bugs relate to the way that we add and remove open defer entries from the defer chain. At the point of a panic, when we want to start processing defer entries in order during the panic process, we need to add entries to the defer chain for stack frames with open defers, since the normal fast-defer code does not add these entries. We do this by calling addOneOpenDeferFrame() at the beginning of each time around the defer loop in gopanic(). Those defer entries get sorted with other open and non-open-coded defer frames.

However, the tricky part is that we also need to remove defer entries if they end up not being needed because of a recover (which means we are back to executing the defer code inline at function exits). But we need to deal with multiple panics and in-process defers on the stack, so we can't just remove all open-coded defers from the defer chain during a recover.

The fix (and new invariant) is that we should not add any open-coded defers to the defer chain that are higher up the stack than an open-coded defer that is in progress. We know that open-coded defer will still be run until completed, and when it is completed, then a more outer frame will be added (if there is one). This fits with existing code in gopanic that only removes open-coded defer entries up to any defer in progress.

These bugs were because of the previous inconsistency between adding and removing open defer entries, which meant that stale defer entries could be left on the list, in these unusual cases with both recursive panics plus multiple independent (non-nested) cases of panic & recover.

The test for #48898 was difficult to add to defer_test.go (while keeping the failure mode), so I added it as a go/test/fixedbug test instead.
Fixes #43920
Updates #43941
Fixes #48898

Change-Id: I593b77033e08c33094315abf8089fbc4cab07376
Reviewed-on: https://go-review.googlesource.com/c/go/+/356011
Trust: Dan Scales <danscales@google.com>
Trust: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Austin Clements <austin@google.com>
I'm fine either way. |
OK, we don't need to check into 1.16.10 right now. As I said, let's see how the fix in 1.18 goes. I'm removing this as a release blocker for 1.16.10. It doesn't look like 1.16.11 exists yet, but I would move it to that milestone when that exists. |
@danscales Some time has passed here, and this fix (CL 356011) had a chance to be included in 1.18 betas and RC 1. Do you think there's enough signal to be able to tell whether it'd be safe and worthwhile to backport it to Go 1.16.15 now? I'll edit this issue to fit the backport issue format and add CherryPickCandidate label, that way we'll get a chance to see it in upcoming meetings. (Otherwise it's likely to be missed.) |
@dmitshur Yes, I think it is fine to be backported to Go 1.16.15 now! Thanks for following up on this. It's also possible to leave it out of Go 1.16.15, depending on the consensus in your meetings, since the only known/reported case of the bug is for the complex example at the top of this issue that was created by a form of fuzzing. |
Go 1.16 is very close to the end of its release cycle. The team members think that approving this could possibly introduce a regression at the end of a cycle. Because of that, this is considered a risky change. |
This program should run successfully: https://play.golang.org/p/XIVsN45Oyzz
It does with gccgo and with cmd/compile using -gcflags=-N, but not when compiled normally. It also still fails at CL 286712, PS 3.
The problem is the "defer step(12)" and "defer step(13)" calls are being missed. Instead, execution jumps directly to the "step(14)" defer.
Sorry the reproducer is still so large. I'm having trouble minimizing it any further. It seems particularly sensitive to changes that I wouldn't expect to have any effect.
/cc @danscales @randall77