When tracing is enabled, the trace begins with a sequence of GoCreate/GoWaiting/GoInSyscall events describing the state of all the extant goroutines. This is done with the world stopped.
On system call exit, we wait until the world has started again to write the GoSysExit event.
The trace reader requires that the GoSysExit event be preceded by an event putting the goroutine into the system call. During ordinary execution that event is the GoSysCall event; at the beginning of a trace that is the GoInSyscall event.
As written, the trace has this property.
However, the tracer reader overwrites the time stamp on the GoSysExit event with the time stamp of the actual return from the system call, recorded in the trace file as ev.Args. The actual system call exit can happen concurrently with the generation of the initial trace events, and in real time before the writing of the corresponding goroutine's GoInSyscall event. The entries are ordered correctly by recorded time stamp, but then when the reader smashes the GoSysExit event's recorded time stamp with this earlier time stamp and sorts the entries, it ends up with the GoSysExit before the GoInSyscall.
This makes the log reader unhappy, which makes many of the trace tests flaky.
CL https://go-review.googlesource.com/11180 works around this by making the error message about bad syscall exits contain the text 'SKIP TEST' and then if the test gets an error with that text, it calls t.Skip instead of t.Fatal.
But this needs a proper fix for the release: presumably the same problem makes the traces unusable in real life should this occur.
The text was updated successfully, but these errors were encountered: