Skip to content

runtime: cputicks has low precision on arm #9725

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dvyukov opened this issue Jan 29, 2015 · 16 comments
Closed

runtime: cputicks has low precision on arm #9725

dvyukov opened this issue Jan 29, 2015 · 16 comments
Milestone

Comments

@dvyukov
Copy link
Member

dvyukov commented Jan 29, 2015

runtime/os_linux_arm.go:

func cputicks() int64 {
// Currently cputicks() is used in blocking profiler and to seed fastrand1().
// nanotime() is a poor approximation of CPU ticks that is enough for the profiler.
// randomNumber provides better seeding of fastrand1.
return nanotime() + int64(randomNumber)
}

This causes runtime/pprof trace tests failures:

--- FAIL: TestTrace (0.00s)
trace_test.go:56: failed to parse trace: g 3 is not runnable before traceEvGoWaiting (offset 48, time 30528)
--- FAIL: TestTraceStress (1.32s)
trace_test.go:151: failed to parse trace: g 3 is not runnable before traceEvGoWaiting (offset 75185, time 30528)
--- FAIL: TestTraceSymbolize (0.04s)
trace_test.go:199: failed to parse trace: g 3 is not runnable before traceEvGoWaiting (offset 48, time 30481)

http://build.golang.org/log/413c7741597ae53221e88070d1b5a70c3a365b03
http://build.golang.org/log/46208c3a96add5a5e5b15ed36ca1d5aa09f7173b

Tracer expects that cputicks are globally monotonic and has cycle precision (subsequent events has different timestamps). The former is probably true for this impl, but the latter is not true.

@dvyukov
Copy link
Member Author

dvyukov commented Jan 29, 2015

/cc @crawshaw @minux @davecheney

@dvyukov dvyukov changed the title runtime: cputicks has low precision runtime: cputicks has low precision on arm Jan 29, 2015
@crawshaw
Copy link
Member

ARMv6 and ARMv7 has a cycle counter, but it is privileged

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0211i/Bihcgfcf.html

What about using the perf_event_open syscall on linux?

http://man7.org/linux/man-pages/man2/perf_event_open.2.html

@dvyukov
Copy link
Member Author

dvyukov commented Jan 29, 2015

As far as I see, it requires using read syscall to access cpu cycle counter. This may be too expensive for tracer. However, we don't have better solutions for now.
I was thinking about converting a low-frequency monotonic timer into monotonically increasing:

var prevTicks int64

func cputicks() int64 {
for {
x := nanotime()
p := prevTicks
if x <= p {
x = p + 1
}
if cas(&prevTicks, p, x) {
return x
}
}
}

This also can be slow and obviously will give locally skewed traces.

@crawshaw
Copy link
Member

Right now nanotime is a syscall on linux/arm, so a read from perf_event_open shouldn't be any more expensive. Your way might be better if we switch to VSDO for time on linux/arm, but I cannot work out when it was introduced into linux for arm.

@mwhudson
Copy link
Contributor

I don't think linux/arm has a VDSO function for time. There are some kernel patches floating around but nothing in Linus' tree that I can see. (I've had the reasons this is hard on ARM explained to me a few times but never really understood it...)

@davecheney
Copy link
Contributor

Dmitry, you mentioned that your trace infrastructure is based on chrome. If
this is correct, how does chromeos on arm handle this requirement?
On 29 Jan 2015 23:43, "Dmitry Vyukov" notifications@github.com wrote:

/cc @crawshaw https://github.com/crawshaw @minux
https://github.com/minux @davecheney https://github.com/davecheney


Reply to this email directly or view it on GitHub
#9725 (comment).

@davecheney
Copy link
Contributor

Another weird thing is the test reliably fails on my dual core Pandaboard,
but passes on the quad core imx6 machine I have. This might just be dumb
luck.
On 29 Jan 2015 23:43, "Dmitry Vyukov" notifications@github.com wrote:

/cc @crawshaw https://github.com/crawshaw @minux
https://github.com/minux @davecheney https://github.com/davecheney


Reply to this email directly or view it on GitHub
#9725 (comment).

@dvyukov
Copy link
Member Author

dvyukov commented Jul 29, 2015

@davecheney Here is how chromium does that:
#9892

@dvyukov
Copy link
Member Author

dvyukov commented Jul 29, 2015

@mwhudson clock_gettime on arm/arm64 is even already documented in man pages:
http://man7.org/linux/man-pages/man7/vdso.7.html
@crawshaw Can you check what kind of precision __kernel_clock_gettime(CLOCK_MONOTONIC) gives on arm/arm64?

@matt2909
Copy link
Contributor

did anyone follow up on this?

@minux
Copy link
Member

minux commented Oct 28, 2015 via email

@davecheney
Copy link
Contributor

even if the clock_gettime exported in vDSO is faster,
we can't always use it as it's only exported for ARM
since v4.1 of the kernel.

Wow, that's fresh. Most SoC's are lucky if they can do better than some moldy version of 3.0 that seems to ship with all iMX6 platforms.

@matt2909
Copy link
Contributor

@minux
Copy link
Member

minux commented Oct 28, 2015 via email

@matt2909
Copy link
Contributor

Yes my bad, I read arm, as arm in general. 32bit arm is 4.1, arm64 is much earlier

@rsc
Copy link
Contributor

rsc commented Nov 23, 2015

It sounds like this is not easy to fix (because of hardware privilege issues). Also the new trace sequence numbers may mean we don't need to fix it, since these things don't all compare equal anymore. I'm going to mark this as closed unless there's some reason to keep it open.

@rsc rsc closed this as completed Nov 23, 2015
@golang golang locked and limited conversation to collaborators Nov 27, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

8 participants