Description
To get the current time in nanoseconds, the runtime implements runtime.nanotime as a call to the OS (the vdso on GOOS=linux, libc on GOOS=darwin).
Profiling and tracing benefit from efficient access to a clock, but don't require a particular scale or offset. (We're in the habit of rescaling it after the fact, by comparing against another clock.) On GOARCH=amd64, the runtime implements runtime.cputicks as RDTSCP (or RDTSC plus memory fences). It's a bit faster to only read the timer, than to read the timer and then do scaling math.
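The "rescale after the fact" idea can be sketched as a linear mapping between two calibration points. This is only an illustration, not the runtime's actual code: the sample type, the ticksToNanos helper, and the numbers in main are all hypothetical, and the sketch assumes the tick counter runs at a constant frequency between the two samples.

```go
package main

import "fmt"

// sample pairs a raw tick-counter reading with a nanosecond clock reading
// taken at approximately the same moment. Hypothetical type for this sketch.
type sample struct {
	ticks uint64
	nanos int64
}

// ticksToNanos maps a raw tick value onto the nanosecond timeline by linear
// interpolation between two calibration samples. It assumes the counter
// frequency was constant between start and end.
func ticksToNanos(start, end sample, t uint64) int64 {
	dTicks := float64(int64(end.ticks - start.ticks))
	if dTicks == 0 {
		// No tick progress between the samples: no scale can be derived.
		return start.nanos
	}
	dNanos := float64(end.nanos - start.nanos)
	// The unsigned subtraction wraps; converting to int64 recovers a signed
	// offset, so ticks slightly before start are handled too.
	offset := float64(int64(t - start.ticks))
	return start.nanos + int64(offset*dNanos/dTicks)
}

func main() {
	// Made-up calibration points: 24000 ticks elapsed over 1ms.
	start := sample{ticks: 1000, nanos: 5_000_000}
	end := sample{ticks: 25_000, nanos: 6_000_000}
	fmt.Println(ticksToNanos(start, end, 13_000))
}
```

The hot path only stores the raw tick value; the division and multiplication happen later, once per event, when the data is read back.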
But on GOARCH=arm64, for GOOS=linux, darwin, and several others, we implement runtime.cputicks as a call to runtime.nanotime. It looks like AArch64's CNTVCT_EL0 register [1] might have what we need for cputicks (monotonic, also monotonic across cores, static frequency). The Linux vdso [2] appears to use it (plus ISB), or the self-synchronizing version CNTVCTSS_EL0.
There's also CNTVCT for GOARCH=arm [3].
Initial benchmarking on an Apple M1 (darwin) and on Raspberry Pi models 5 and 3B (linux) shows that reading CNTVCT_EL0 is a bit faster than calling nanotime (14 vs 24ns, 28 vs 43ns, and 45 vs 126ns on those three platforms, respectively). I don't know how much of a difference this will make in complete applications, but cheaper clocks mean less worry when adding profiling/tracing points.
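For anyone who wants a rough feel for per-call clock cost on their own machine, here is a minimal sketch of the measurement shape. It is not the benchmark behind the numbers above: perCall and sink are hypothetical names, and time.Now().UnixNano() is only a stand-in, since runtime.nanotime and a raw CNTVCT_EL0 read are not reachable from ordinary Go code without assembly or linkname tricks.

```go
package main

import (
	"fmt"
	"time"
)

// sink accumulates results so the compiler cannot discard the clock reads.
var sink int64

// perCall returns the average elapsed time of one call to read,
// measured over n calls. The result includes loop and call overhead.
func perCall(n int, read func() int64) time.Duration {
	start := time.Now()
	for i := 0; i < n; i++ {
		sink += read()
	}
	return time.Since(start) / time.Duration(n)
}

func main() {
	// Stand-in clock: the public time API, not runtime.nanotime itself.
	d := perCall(1_000_000, func() int64 { return time.Now().UnixNano() })
	fmt.Printf("approx. %v per call\n", d)
}
```

A real comparison would more likely use the testing package's benchmark support and an assembly stub for the register read; this loop only shows the idea.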
CC @golang/runtime @golang/arm
[2] https://elixir.bootlin.com/linux/v6.9/source/arch/arm64/include/asm/vdso/gettimeofday.h#L69