runtime: panic: non-empty mark queue after concurrent mark (Go1.14, Go1.15) #41303
Comments
( wcfs go unit tests are just regular go unit tests - they do not play with virtual memory at all, and, to my knowledge, there is no Cgo involved. At that level FUSE is also not used. ) |
Unfortunately our test system hit a similar crash with go1.14.9:
full output
Unfortunately I don't have a core dump this time. |
The go1.14.9 crash environment: go env, go version, uname, lsb_release
|
(again with go1.14.9)
full output
@aclements, @mknyszek, do you think this is related to #40641 / #40642 / #40643 ? |
I'm sorry to say that the GC crash also happened with
full output
In other words #40641 / #40642 / #40643 seem to be unrelated. /cc @aclements, @mknyszek |
Sadly, I could also trigger the same crash on 1.15.3 (i.e. go1.15 with #40643 included):
full output
go version, go env, uname, ...
This time there is the core file. |
Just in case, have you run this test under the race detector? |
@networkimprov, here you are (these are the normal and race runs; there is no GC failure): .../wcfs$ DBTail_SEED=1602769537289632682 go test -v
=== RUN TestZBlk
2020/10/15 20:39:04 zodb: FIXME: open testdata/zblk.fs: raw cache is not ready for invalidations -> NoCache forced
--- PASS: TestZBlk (0.03s)
=== RUN TestΔBTail
2020/10/15 20:39:05 zodb: FIXME: open /tmp/δBTail217888886/1.fs: raw cache is not ready for invalidations -> NoCache forced
--- PASS: TestΔBTail (2.72s)
=== RUN TestΔBTreeAllStructs
δbtail_test.go:1123: # maxdepth=2 maxsplit=1 nkeys=5 n=10 seed=1602769537289632682
2020/10/15 20:39:08 zodb: FIXME: open /tmp/δBTail128030557/1.fs: raw cache is not ready for invalidations -> NoCache forced
--- PASS: TestΔBTreeAllStructs (46.42s)
=== RUN TestIntSets
--- PASS: TestIntSets (0.00s)
=== RUN TestKVDiff
--- PASS: TestKVDiff (0.00s)
=== RUN TestKVTxt
--- PASS: TestKVTxt (0.00s)
PASS
ok lab.nexedi.com/nexedi/wendelin.core/wcfs 49.175s
.../wcfs$ DBTail_SEED=1602769537289632682 go test -v -race
=== RUN TestZBlk
2020/10/15 20:40:12 zodb: FIXME: open testdata/zblk.fs: raw cache is not ready for invalidations -> NoCache forced
--- PASS: TestZBlk (0.10s)
=== RUN TestΔBTail
2020/10/15 20:40:13 zodb: FIXME: open /tmp/δBTail980323143/1.fs: raw cache is not ready for invalidations -> NoCache forced
--- PASS: TestΔBTail (13.17s)
=== RUN TestΔBTreeAllStructs
δbtail_test.go:1123: # maxdepth=2 maxsplit=1 nkeys=5 n=10 seed=1602769537289632682
2020/10/15 20:40:27 zodb: FIXME: open /tmp/δBTail188139514/1.fs: raw cache is not ready for invalidations -> NoCache forced
--- PASS: TestΔBTreeAllStructs (220.55s)
=== RUN TestIntSets
--- PASS: TestIntSets (0.00s)
=== RUN TestKVDiff
--- PASS: TestKVDiff (0.00s)
=== RUN TestKVTxt
--- PASS: TestKVTxt (0.00s)
PASS
ok lab.nexedi.com/nexedi/wendelin.core/wcfs 233.857s |
I suspect the bug is related to concurrency, and in particular that high surrounding OS load tends to increase the probability of it happening. This is likely because internal Go scheduling here is intertwined with other processes: the test spawns another Python process and communicates with it in a synchronous manner. With this in mind I've found a way to trigger the GC crash more quickly by running several instances of the test simultaneously. Reproduction script: gogccrash. |
I've dug into this a bit:
I currently suspect that the only unsafe place in my code is the source of this GC crash. My guess is that rematerializing the object queues more work for the GC to mark, and if that catches the GC at a time when it thinks it has finished marking, the result is this crash. But I have not looked inside for details yet. I'm pausing my analysis here. |
That's definitely an unsupported use of unsafe. |
@randall77, thanks for the feedback. To clarify about "rematerialize": when this happens we know for sure that the object is still alive, because there is synchronization on reference state in between the finalizer and the rematerialization. As this issue shows, some synchronization also needs to be done with respect to the garbage collector as well. (*) yes, stacks can be moved, but on-heap objects are not moved, and weak.Ref is supposed to work for on-heap objects only. |
Also, maybe I'm missing something, but I don't see any other way to implement weak references. |
Right. This code will force heap allocation because, among other things, passing anything to
I don't either. Go is not designed to support weak references. Just imagining one possible bug here. See the comment at line 119; it wonders whether we need a write barrier. This code looks like it doesn't, as the write at line 121 is to the stack, not the heap, and stack writes don't need write barriers. But if |
Weak references and finalizers have the same power. To implement weak references with finalizers, use an intermediate object. The catch is that you need explicit strong pointers as well as explicit weak pointers. // Package weak manages weak references to pointers.
// In order to use this with a particular pointer, you must use
// strong pointers, defined in this package.
// You may also use weak pointers.
// If all the strong pointers to some value are garbage collected,
// then the value may be collected, even if weak pointers still exist.
package weak
import "runtime"
// Strong is a strong pointer to some object.
// To fetch the pointer, use the Get method.
type Strong struct {
p *intermediate
}
// Get returns the value to which s points.
func (s *Strong) Get() interface{} {
return s.p.val
}
// Weak returns a weak reference to the value to which Strong points.
func (s *Strong) Weak() *Weak {
return &Weak{s.p}
}
// clear is a finalizer for s.
func (s *Strong) clear() {
s.p.val = nil
}
// Create a strong pointer to an object.
func MakeStrong(val interface{}) *Strong {
r := &Strong{&intermediate{val}}
runtime.SetFinalizer(r, func(s *Strong) { s.clear() })
return r
}
// Weak is a weak reference to some value.
type Weak struct {
p *intermediate
}
// Get returns the value to which w points.
// If there are no remaining Strong pointers to the value,
// Get may return nil.
func (w *Weak) Get() interface{} {
return w.p.val
}
// intermediate is used to implement weak references.
type intermediate struct {
val interface{}
} |
@randall77, thanks for the feedback and for commenting on the potential trickiness regarding the write barrier.
Explicitly using Strong is indeed the price for not having to use unsafe. That indeed works; thanks a lot, again, for the example. However the price might be too high: it requires all users of an API to use Strong. So, still, is there maybe a way to synchronize with the garbage collector, and to notify it properly about a rematerialized pointer, so that it does not see such rematerialization as a breakage of the marking invariant? |
Well, technically all that is required is that there be at least one live Strong pointer. As @randall77 said, your code is an unsupported use of unsafe. If you want to be able to safely rematerialize a pointer other than through the mechanisms documented at https://golang.org/pkg/unsafe, that would be a language change proposal that should go through the proposal process. But I think it is extremely unlikely that any such proposal would be accepted. |
@ianlancetaylor, thanks for the feedback.
Unfortunately it is not the case: the objects - that The bug I'm talking about here is of exactly the same nature that Connection.objtab is trying to avoid by using weak pointers in the first place: if for an object A
Now, if another object B2 references the on-disk database object A by A's OID, and that object B2 is accessed in the program, then when the program tries to access/create the live object for A in RAM via the B2->A database link, it will look into Connection.objtab[A.oid], see that the weak pointer there is nil, and create another in-RAM object for A. This breaks the isomorphism between in-database and in-RAM objects and so eventually leads to database corruption. The only solution I see is
With 1, by "everywhere" I really mean everywhere - for every pointer that potentially originates from a Strong pointer. But then Strongs would just replace all regular Go pointers...
I see... |
Hello up there.
Today, while working on wendelin.core I've got
panic: non-empty mark queue after concurrent mark
several times while running unit tests. It is hard for me to find time to reduce the problem to a minimal example, but I just cannot ignore this kind of garbage-collector bug, so I decided to report at least what I can. Please find details about the GC crash below. I also attach the dumped core in case it could be useful for analysing what was going on (no confidential data in there). Thanks beforehand,
Kirill
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
No data
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
(wendelin.core at revision=cc216b8c)
cd wcfs
export GOTRACEBACK=crash
ulimit -c unlimited
for i in `seq 100`; do go test -v || break; done
What did you expect to see?
No Go runtime crashes
What did you see instead?
Go runtime crashed with
full output
core.zip