cmd/trace: "failed to parse trace: no consistent ordering of events possible" #29707

Open
256dpi opened this issue Jan 12, 2019 · 10 comments

@256dpi 256dpi commented Jan 12, 2019

What version of Go are you using (go version)?

$ go version
go version go1.11.4 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/256dpi/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/256dpi/Development/Go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/Cellar/go/1.11.4/libexec"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.11.4/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/mj/lp3r462x6tqfd4v_93bvh65h0000gn/T/go-build000614795=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I ran a network-intensive Go program (closed source) with a fairly light workload.

What did you expect to see?

I expect to be able to obtain a trace profile and explore it with the viewer program.

What did you see instead?

I obtained the trace with wget -O trace.out "http://localhost:6060/debug/pprof/trace?seconds=10". Running go tool trace trace.out then failed with:

2019/01/12 12:37:26 Parsing trace...
failed to parse trace: no consistent ordering of events possible
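
For readers who want to follow the same capture path from Go instead of wget, here is a minimal sketch. It assumes Go 1.16 or later (for io.ReadAll) and uses an in-process httptest server in place of the closed-source program; fetchTrace and newTraceServer are hypothetical helper names, not part of any library. It only shows how the trace bytes are obtained and does not by itself reproduce the parse failure.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newTraceServer starts a throwaway HTTP server that exposes only the
// execution trace handler from net/http/pprof (hypothetical helper).
func newTraceServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return httptest.NewServer(mux)
}

// fetchTrace asks the pprof trace endpoint at baseURL for a trace of the
// given duration and returns the raw bytes (hypothetical helper).
func fetchTrace(baseURL string, seconds int) ([]byte, error) {
	url := fmt.Sprintf("%s/debug/pprof/trace?seconds=%d", baseURL, seconds)
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	srv := newTraceServer()
	defer srv.Close()

	data, err := fetchTrace(srv.URL, 1)
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Printf("captured %d bytes of trace data\n", len(data))
}
```

Writing the returned bytes to trace.out and running go tool trace trace.out would then exercise the same parser that reports the error above.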
Author

@256dpi 256dpi commented Jan 15, 2019

I should add that I was testing a binary linked with C code (cgo). Is it possible that tracing does not work properly in cgo builds?

@bcmills bcmills changed the title from "failed to parse trace: no consistent ordering of events possible" to "net/http/pprof: "failed to parse trace: no consistent ordering of events possible"" Jan 30, 2019
Member

@bcmills bcmills commented Jan 30, 2019

CC @rsc @hyangah @matloob @dvyukov for pprof and trace

@bcmills bcmills added this to the Go1.13 milestone Jan 30, 2019
@bcmills bcmills changed the title from "net/http/pprof: "failed to parse trace: no consistent ordering of events possible"" to "cmd/trace: "failed to parse trace: no consistent ordering of events possible"" Jan 30, 2019
Contributor

@hyangah hyangah commented Jan 30, 2019

#16755 was the first one that came to my mind, but I've never seen the issue with darwin/amd64, so I am not sure. I am not aware of cmd/trace issues involving cgo.

@256dpi is it possible to share the captured trace with me?

Contributor

@AlexRouSg AlexRouSg commented May 4, 2019

@hyangah I just ran into this and made a repo.

Basically, it happens when you export a Go function to C and then call that function from a thread created by C.

package test

/*
	#include <pthread.h>

	extern void* callback(void*);
	typedef void* (*cb)(void*);

	static void testCallback(cb cb) {
		pthread_t thread_id;
		pthread_create(&thread_id, NULL, cb, NULL);
		pthread_join(thread_id, NULL);
	}
*/
import "C"
import (
	"context"
	"runtime/trace"
	"time"
	"unsafe"
)

var traceCtx, traceTask = trace.NewTask(context.Background(), "Debug")

func test() {
	C.testCallback(C.cb(C.callback))
}

//export callback
func callback(unsafe.Pointer) unsafe.Pointer {
	defer trace.StartRegion(traceCtx, "callback").End()
	time.Sleep(time.Millisecond)

	return nil
}
Member

@bcmills bcmills commented Jun 26, 2019

Here's a related flake in runtime/trace.TestTraceStressStartStop on the freebsd-amd64-12_0 builder:
https://build.golang.org/log/65ddd079bfd87715555a97b296b41dfa64ff12b6

--- FAIL: TestTraceStressStartStop (1.14s)
    trace_test.go:147: failed to parse trace: no consistent ordering of events possible
FAIL
FAIL	runtime/trace	1.574s

@let4be let4be commented Feb 17, 2020

Any update? I cannot parse a trace created on arm64 when running the tool on an x86 machine.


@gopherbot gopherbot commented May 19, 2020

Change https://golang.org/cl/234617 mentions this issue: runtime: synchronize StartTrace and StopTrace with sysmon

gopherbot pushed a commit that referenced this issue May 21, 2020
Currently sysmon is not stopped when the world is stopped, which is
in general a difficult thing to do. The result of this is that when
tracing starts and the value of trace.enabled changes, it's possible
for sysmon to fail to emit an event when it really should. This leads to
traces which the execution trace parser deems inconsistent.

Fix this by putting all of sysmon's work behind a new lock sysmonlock.
StartTrace and StopTrace both acquire this lock after stopping the world
but before performing any work in order to ensure sysmon sees the
required state change in tracing. This change is expected to slow down
StartTrace and StopTrace, but will help ensure consistent traces are
generated.

Updates #29707.
Fixes #38794.

Change-Id: I64c58e7c3fd173cd5281ffc208d6db24ff6c0284
Reviewed-on: https://go-review.googlesource.com/c/go/+/234617
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
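
The locking idea in the commit message above can be illustrated with a toy model. This is not the runtime's code: the tracer type, monitorMu, monitorTick, and runDemo are all hypothetical names, and a plain goroutine stands in for sysmon. The sketch only shows the pattern described, namely that the background monitor does all of its work under a lock, and that start and stop take the same lock before flipping the enabled flag, so the monitor never acts on a stale view of that flag.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// tracer is a toy stand-in for trace state; it is not the Go runtime's type.
type tracer struct {
	monitorMu sync.Mutex // plays the role the commit gives to sysmonlock
	enabled   bool       // guarded by monitorMu
	events    []string   // guarded by monitorMu
}

// monitorTick models one iteration of a sysmon-like loop: all of its work
// happens under the lock, so it sees a consistent value of enabled.
func (t *tracer) monitorTick(i int) {
	t.monitorMu.Lock()
	defer t.monitorMu.Unlock()
	if t.enabled {
		t.events = append(t.events, fmt.Sprintf("tick %d", i))
	}
}

// start enables tracing while holding the monitor lock.
func (t *tracer) start() {
	t.monitorMu.Lock()
	defer t.monitorMu.Unlock()
	t.events = append(t.events[:0], "start")
	t.enabled = true
}

// stop disables tracing while holding the monitor lock and returns a copy
// of the recorded events.
func (t *tracer) stop() []string {
	t.monitorMu.Lock()
	defer t.monitorMu.Unlock()
	t.enabled = false
	t.events = append(t.events, "stop")
	return append([]string(nil), t.events...)
}

// runDemo runs the monitor concurrently with one start/stop cycle.
func runDemo() []string {
	var t tracer
	done := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 0; ; i++ {
			select {
			case <-done:
				return
			default:
				t.monitorTick(i)
				time.Sleep(100 * time.Microsecond)
			}
		}
	}()
	t.start()
	time.Sleep(5 * time.Millisecond)
	events := t.stop()
	close(done)
	wg.Wait()
	return events
}

func main() {
	events := runDemo()
	fmt.Println("first:", events[0], "last:", events[len(events)-1])
}
```

In this model every recorded tick falls strictly between "start" and "stop", which is the kind of consistency the parser needs. As the commit message notes, the price is that starting and stopping may have to wait for the monitor to release the lock.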
Contributor

@davies davies commented Jun 22, 2020

I tried with the latest master branch, and the problem still occurs:

$ go version
go version devel +60f7876 Sat Jun 20 08:40:13 2020 +0000 linux/amd64

The Go code is built as a shared object (.so) and called from Java. We can reproduce the problem every time under normal workload, but it does not occur when the process is idle.

Contributor

@entombedvirus entombedvirus commented Jul 20, 2020

I can repro this issue on our production services consistently:

go version
go version go1.14.4 linux/amd64

I've also tried running gotip tool trace to parse trace output generated by a go1.14.4 binary, and that gives the same result. I haven't tried generating trace output with gotip because I want to avoid running gotip in production. Please let me know if access to the trace output would help with debugging.
