runtime: use CLOCK_MONOTONIC_FAST on FreeBSD? #22942

Closed
bradfitz opened this issue Nov 30, 2017 · 7 comments

4 participants
@bradfitz
Member

commented Nov 30, 2017

sys_freebsd_amd64.s says:

TEXT runtime·nanotime(SB), NOSPLIT, $32
        MOVL    $232, AX
        // We can use CLOCK_MONOTONIC_FAST here when we drop
        // support for FreeBSD 8-STABLE.
        MOVQ    $4, DI          // CLOCK_MONOTONIC
        LEAQ    8(SP), SI
        SYSCALL
        MOVQ    8(SP), AX       // sec
        MOVQ    16(SP), DX      // nsec

        // sec is in AX, nsec in DX
        // return nsec in AX
        IMULQ   $1000000000, AX
        ADDQ    DX, AX
        MOVQ    AX, ret+0(FP)
        RET
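The IMULQ/ADDQ pair at the end of the routine folds the seconds/nanoseconds pair that clock_gettime writes to the stack into a single nanosecond count. A minimal Go sketch of that arithmetic follows. The timespec type and nanosFromTimespec helper are hypothetical illustrations, not runtime APIs. The clock IDs come from FreeBSD's time.h: CLOCK_MONOTONIC is 4 (matching the MOVQ $4 above) and the FreeBSD-specific CLOCK_MONOTONIC_FAST is 12.

```go
package main

import "fmt"

// FreeBSD clock IDs; CLOCK_MONOTONIC_FAST is FreeBSD-specific.
const (
	clockMonotonic     = 4  // the value loaded into DI in the assembly
	clockMonotonicFast = 12 // the proposed replacement
)

// timespec mirrors the two 64-bit fields the kernel writes at 8(SP) and 16(SP).
// Hypothetical illustration type, not the runtime's own definition.
type timespec struct {
	sec  int64
	nsec int64
}

// nanosFromTimespec does what IMULQ $1000000000, AX / ADDQ DX, AX do:
// scale seconds to nanoseconds, then add the nanosecond remainder.
func nanosFromTimespec(ts timespec) int64 {
	return ts.sec*1_000_000_000 + ts.nsec
}

func main() {
	fmt.Println(nanosFromTimespec(timespec{sec: 3, nsec: 250})) // 3000000250
	fmt.Println(clockMonotonic, clockMonotonicFast)             // 4 12
}
```

Switching clocks would only change the constant passed in DI; the conversion stays the same.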

We now require FreeBSD 10.3+.

Switch to CLOCK_MONOTONIC_FAST?

I don't know what that is.

@bradfitz bradfitz added this to the Go1.11 milestone Nov 30, 2017

@paulzhol

Member

commented Dec 1, 2017

CLOCK_MONOTONIC_FAST will use getnanouptime, while CLOCK_MONOTONIC will use nanouptime (see https://github.com/freebsd/freebsd/blob/release/11.1.0/sys/kern/kern_time.c#L345).

Functions with the "get" prefix returns a less precise result
much faster than the functions without "get" prefix and should
be used where a precision of 1/hz seconds is acceptable or where
performance is priority. (NB: "precision", not "resolution" !)

according to https://github.com/freebsd/freebsd/blob/master/sys/sys/time.h#L450

In a nutshell, nanouptime also reads the TSC/HPET/ACPI timecounter (as configured by the system) and combines its value with the pre-calculated "timehand" value, which is all that getnanouptime uses.
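The difference can be modeled in a few lines. This is a simplified, hypothetical model, not FreeBSD's code: it assumes a single snapshot holding the uptime at the last clock tick plus the counter value read at that moment, and it converts counter ticks with plain integer division where the kernel uses a 64-bit fixed-point scale. The getnanouptime-like path returns the snapshot as-is (so it can lag by up to a tick), while the nanouptime-like path also reads the counter and adds the elapsed delta.

```go
package main

import "fmt"

// timehands is a simplified stand-in for the kernel's per-tick snapshot.
// Field names are hypothetical.
type timehands struct {
	uptimeNs    int64  // uptime recorded at the last tick
	counterBase uint64 // timecounter value read at that tick
	counterHz   uint64 // timecounter frequency (e.g. TSC or HPET)
}

// fastUptime models getnanouptime: return the cached snapshot only.
func (th *timehands) fastUptime() int64 {
	return th.uptimeNs
}

// preciseUptime models nanouptime: take a fresh counter reading (passed in
// as counterNow) and add the time elapsed since the snapshot. Assumes the
// delta is small (about one tick), so the multiplication cannot overflow.
func (th *timehands) preciseUptime(counterNow uint64) int64 {
	delta := counterNow - th.counterBase
	return th.uptimeNs + int64(delta*1_000_000_000/th.counterHz)
}

func main() {
	// A 10 MHz counter: one count is 100 ns.
	th := &timehands{uptimeNs: 5_000_000, counterBase: 100, counterHz: 10_000_000}
	fmt.Println(th.fastUptime())              // 5000000
	fmt.Println(th.preciseUptime(100 + 2500)) // 5250000
}
```

The extra work in the precise path is the counter read itself, which is where the hardware (TSC vs. HPET vs. ACPI) cost shows up.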

I don't have any numbers, but since we're already paying the syscall cost, maybe we should let it do the full work?

@domodwyer


commented Jan 29, 2018

Hi all,

I just wanted to share some numbers to help this discussion along. We're running:

  • FreeBSD 11.0-RELEASE-p1 #0 r306420
  • kern.eventtimer.timer: HPET
  • kern.hz: 1000
  • go version go1.9.3 freebsd/amd64

We have a frontend HTTP component that handles large amounts of incoming traffic that is then placed into various backends (DB, queues, etc). While profiling CPU stalls we found that 16% was attributed to hpet_get_timecount():

  PMC: [RESOURCE_STALLS.ANY] Samples: 278326 (100.0%) , 73 unresolved

%SAMP IMAGE      FUNCTION             CALLERS
 16.0 kernel     hpet_get_timecount   binuptime:7.8 nanouptime:4.1 nanotime:1.6

After patching to use CLOCK_MONOTONIC_FAST it halved:

  PMC: [RESOURCE_STALLS.ANY] Samples: 1128771 (100.0%) , 5067 unresolved

%SAMP IMAGE      FUNCTION             CALLERS
 8.4 kernel     hpet_get_timecount   binuptime:7.1 nanotime:1.0

Even though we're not CPU bound, the change above made a pretty decent improvement to our 99th-percentile latency, largely because the packages we use to communicate with the backends call time.Now() while holding locks, either directly or indirectly via context.WithDeadline() and similar functions.

Running a Go benchmark that calls time.Now():

benchmark              old ns/op     new ns/op     delta
BenchmarkTimeNow-8     1513          916           -39.46%
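A benchmark like this can be reproduced with a sketch along the following lines. It assumes the standard library's testing.Benchmark, which runs a benchmark function outside of "go test"; the helper names benchTimeNow and deltaPercent are hypothetical. deltaPercent shows how the "delta" column is derived: (916 - 1513) / 1513 is about -39.46%.

```go
package main

import (
	"fmt"
	"testing"
	"time"
)

// sink keeps the compiler from optimizing the time.Now() call away.
var sink time.Time

// benchTimeNow measures time.Now() like a BenchmarkTimeNow function would,
// but can be called from an ordinary program.
func benchTimeNow() testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = time.Now()
		}
	})
}

// deltaPercent computes the relative change from old to new, in percent.
func deltaPercent(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}

func main() {
	r := benchTimeNow()
	fmt.Printf("%d ns/op\n", r.NsPerOp())
	fmt.Printf("%.2f%%\n", deltaPercent(1513, 916)) // -39.46%
}
```

Absolute ns/op will depend on the machine and the configured timecounter, so only before/after comparisons on the same host are meaningful.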

I understand that the reduced precision is an important consideration, but with the default kern.hz of 1000 this works out to losing sub-millisecond precision. I'm not sure how many people expect more precision than that, but it isn't an issue for us. We're quite happy running a patched Go binary if this change doesn't make it into master; I just thought it would be helpful to share!

Dom

@paulzhol

Member

commented Feb 9, 2018

@domodwyer I'm working on https://golang.org/cl/93156 as an alternative. The initial version targets kern.timecounter.hardware=TSC-low, though.

I wanted to note that the libc implementation does not differentiate between _FAST and _PRECISE (the default if _FAST is not used). Both get the last timehand provided by the kernel and then read the timecounter to compute the delta.
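The userland path described here reads a kernel-published snapshot and the hardware counter, and has to retry if the kernel replaces the snapshot mid-read. Below is a hypothetical Go sketch of that retry loop, loosely modeled on the generation-counter scheme FreeBSD uses for its timehands (a generation of 0 meaning "update in progress"). It simplifies heavily: one snapshot instead of a ring, integer division instead of a fixed-point scale, and a caller-supplied readCounter function in place of an actual TSC read. All names are assumptions.

```go
package main

import (
	"fmt"
	"sync/atomic" // typed atomics require Go 1.19+
)

// sharedTimehands stands in for the snapshot the kernel exposes to userland.
// gen == 0 means the kernel is in the middle of updating it.
type sharedTimehands struct {
	gen         atomic.Uint32
	uptimeNs    atomic.Int64
	counterBase atomic.Uint64
	counterHz   atomic.Uint64
}

// readUptime reads the snapshot plus a fresh counter value, retrying until
// the generation is non-zero and unchanged across the whole read.
func readUptime(th *sharedTimehands, readCounter func() uint64) int64 {
	for {
		gen := th.gen.Load()
		if gen == 0 {
			continue // update in progress; spin and retry
		}
		base := th.uptimeNs.Load()
		cbase := th.counterBase.Load()
		hz := th.counterHz.Load()
		now := readCounter()
		if th.gen.Load() != gen {
			continue // snapshot changed underneath us; retry
		}
		return base + int64((now-cbase)*1_000_000_000/hz)
	}
}

// publish models the kernel-side update: mark busy, write, bump generation.
func publish(th *sharedTimehands, uptimeNs int64, counterBase, counterHz uint64) {
	next := th.gen.Load() + 1
	if next == 0 {
		next = 1 // 0 is reserved for "update in progress"
	}
	th.gen.Store(0)
	th.uptimeNs.Store(uptimeNs)
	th.counterBase.Store(counterBase)
	th.counterHz.Store(counterHz)
	th.gen.Store(next)
}

func main() {
	th := &sharedTimehands{}
	publish(th, 5_000_000, 100, 10_000_000)
	fmt.Println(readUptime(th, func() uint64 { return 2600 })) // 5250000
}
```

Because the same read sequence is used whether or not the caller asked for _FAST, the speed comes from avoiding the syscall and from the counter being cheap to read, not from skipping the delta.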

I asked about this on efnet #bsdcode, and the answer I got was:

jilles: __vdso_clock_gettime always uses the TSC which is quite fast to begin with
jilles: _FAST was originally mainly created to avoid slow hardware like the i8254

I'm assuming HPET is similarly considered fast when its counter is read through mmap.

@domodwyer


commented Feb 12, 2018

Hi @paulzhol

Using vdso and skipping the syscall entirely sounds like a great idea. If you'd like us to run a comparison when you're ready, just let us know. As you say, I would expect similar gains with HPET as the source, though I think we could probably use TSC-low in the Go components without any problem.

Dom

@gopherbot


commented Apr 19, 2018

Change https://golang.org/cl/108095 mentions this issue: runtime: FreeBSD fast clock_gettime HPET timecounter support

@paulzhol

Member

commented Apr 19, 2018

@domodwyer, the TSC code is in master, so you can give it a try.
https://golang.org/cl/108095 adds HPET timecounter support. It is not nearly as fast as the TSC version, but it still takes around 20% fewer ns/op than the syscall path on my AMD FX-8300.

If you can use TSC you really should, but that depends on the hardware/hypervisor providing:

kern.timecounter.smp_tsc: 1
kern.timecounter.invariant_tsc: 1
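A small sketch for checking that requirement from a script or tool, assuming the input is the "name: value" text that FreeBSD's sysctl prints for kern.timecounter.smp_tsc and kern.timecounter.invariant_tsc. The helper name tscUsable is hypothetical; a program running on FreeBSD could instead query the sysctls directly.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// tscUsable reports whether captured sysctl output shows both
// kern.timecounter.smp_tsc and kern.timecounter.invariant_tsc set to 1.
// A missing entry counts as not set.
func tscUsable(sysctlOutput string) bool {
	want := map[string]bool{
		"kern.timecounter.smp_tsc":       false,
		"kern.timecounter.invariant_tsc": false,
	}
	sc := bufio.NewScanner(strings.NewReader(sysctlOutput))
	for sc.Scan() {
		name, value, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue
		}
		name = strings.TrimSpace(name)
		if _, tracked := want[name]; tracked {
			want[name] = strings.TrimSpace(value) == "1"
		}
	}
	for _, set := range want {
		if !set {
			return false
		}
	}
	return true
}

func main() {
	out := "kern.timecounter.smp_tsc: 1\nkern.timecounter.invariant_tsc: 1\n"
	fmt.Println(tscUsable(out)) // true
}
```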

@gopherbot gopherbot closed this in 58c231f Apr 26, 2018

@domodwyer


commented Apr 26, 2018

Hi @paulzhol

The patch makes a substantial difference! My environment is now running FreeBSD 11.1-RELEASE #0 r321309 but the relevant sysctls are the same.

Below are comparisons between tags/go1.10.1 and 58c231f running a simple time.Now() benchmark:

name     old time/op  new time/op  delta
Time-40   469ns ± 0%    99ns ± 1%  -79.01%  (p=0.000 n=9+10)

That is an impressive difference, thanks very much for the hard work!

Dom

@golang golang locked and limited conversation to collaborators Apr 26, 2019
