Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: hangup in umtx_op system call when changing system clock #17168

Closed
RichardLaos opened this issue Sep 20, 2016 · 9 comments

Comments

Projects
None yet
6 participants
@RichardLaos
Copy link

commented Sep 20, 2016

Version: go1.6.2 freebsd/amd64
Environment:
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="freebsd"
GOOS="freebsd"
GOPATH=""
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/freebsd_amd64"
GO15VENDOREXPERIMENT="1"
CC="cc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0"
CXX="clang++"
CGO_ENABLED="1"

When running the following code sample:

package main

import (
    "time"
    "fmt"
)

func main() {
    var t time.Time

    for {
        x := time.After(5 * time.Second)
        t = <-x
        fmt.Println(t)
    }
}

In between 5 second output intervals adjust the system time backwards and notice the output. FreeBSD is documented as an OS that supports monotonic timers so I would expect the output to happen 5 seconds after the last line. However, it waits until the system time catches up with the scheduled time. The code above works as expected on another OS that supports monotonic timers (Windows).

Also, I tried calling clock_gettime() directly with CLOCK_MONOTONIC on the same system and was able to get monotonic time. Based on the source of sys_freebsd_amd64.s it doesn't seem like clock_gettime() is used inside runtime·nanotime as in sys_linux_amd64.s.

@bradfitz bradfitz changed the title Timers do not use monotonic time on FreeBSD runtime: timers do not use monotonic time on FreeBSD Sep 20, 2016

@bradfitz bradfitz added this to the Go1.8Maybe milestone Sep 20, 2016

@mikioh

This comment has been minimized.

Copy link
Contributor

commented Sep 21, 2016

@RichardLaos

It doesn't seem like clock_gettime() is used inside runtime·nanotime as in sys_linux_amd64.s

I guess it's unlikely and the culprit stays in neighborhood. Can you run your snippet with GOTRACEBACK=all, twiddle the running system clock, and then send SIGABRT to the process? Hope it generates useful information for debugging.

go build sample.go

export GOTRACEBACK=all
./sample

<twiddle system clock>

pkill -f sample
@RichardLaos

This comment has been minimized.

Copy link
Author

commented Sep 21, 2016

Thanks for taking a look at this, Here is the trace:

./sample
2016-09-21 07:54:03.004995496 -0700 PDT
2016-09-21 07:54:08.016407831 -0700 PDT
2016-09-21 07:54:13.072836995 -0700 PDT
SIGABRT: abort
PC=0x45545e m=5

goroutine 0 [idle]:
runtime.sys_umtx_op(0xc820070110, 0xf, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x449ca0, ...)
        /usr/local/go/src/runtime/sys_freebsd_amd64.s:21 +0x1e
runtime.futexsleep1(0xc820070110, 0x0, 0xffffffffffffffff)
        /usr/local/go/src/runtime/os1_freebsd.go:50 +0xa7
runtime.futexsleep.func1()
        /usr/local/go/src/runtime/os1_freebsd.go:38 +0x30
runtime.systemstack(0xc82005de60)
        /usr/local/go/src/runtime/asm_amd64.s:307 +0xab
runtime.futexsleep(0xc820070110, 0x0, 0xffffffffffffffff)
        /usr/local/go/src/runtime/os1_freebsd.go:39 +0x35
runtime.notesleep(0xc820070110)
        /usr/local/go/src/runtime/lock_futex.go:145 +0xa4
runtime.stopm()
        /usr/local/go/src/runtime/proc.go:1538 +0x10b
runtime.findrunnable(0xc820016000, 0x0)
        /usr/local/go/src/runtime/proc.go:1976 +0x739
runtime.schedule()
        /usr/local/go/src/runtime/proc.go:2075 +0x24f
runtime.mstart1()
        /usr/local/go/src/runtime/proc.go:1108 +0x118
runtime.mstart()
        /usr/local/go/src/runtime/proc.go:1068 +0x72

goroutine 1 [chan receive]:
main.main()
        /kbox/src/sample.go:13 +0x8a

rax    0x4
rbx    0x0
rcx    0x0
rdx    0x0
rdi    0xc820070110
rsi    0xf
rbp    0xc820070110
rsp    0xc82005ddd8
r8     0x0
r9     0xc820016000
r10    0x0
r11    0x0
r12    0x2c
r13    0xc820070000
r14    0x0
r15    0x8
rip    0x45545e
rflags 0x287
cs     0x43
fs     0x13
gs     0x1b

@mikioh mikioh changed the title runtime: timers do not use monotonic time on FreeBSD runtime: hangup in umtx_op system call when changing system clock Sep 21, 2016

@appleby

This comment has been minimized.

Copy link
Contributor

commented Sep 28, 2016

It seems this issue is due to a change in the way the _umtx_op syscall interprets it's arguments.

The tl;dr is that due to the way the runtime calls _umtx_op, the timeout it specifies is based on CLOCK_REALTIME rather than CLOCK_MONOTONIC.

The relevant FreeBSD commit is 232144, which I think landed in the 10.0.0 release.

For reference, this is the prototype for _umtx_op. The parameters relevant to this discussion are uaddr1 and uaddr2.

int _umtx_op(void *obj, int op, u_long val, void *uaddr1, void *uaddr2);

And here is how we call it from futexsleep1 in runtime/os_freebsd.go:

ret := sys_umtx_op(addr, _UMTX_OP_WAIT_UINT_PRIVATE, val, nil, tsp)

That is, uaddr1=nil, and uaddr2 is a timespec pointer.

Prior to the above mentioned FreeBSD commit, going back at least as far as release 8.0.0, _umtx_op always considered uaddr2 to be a (possibly NULL) struct timespec *. Based on a quick scan of old kernel sources, my understanding is that it used a monotonic timer (nanouptime(9)/getnanouptime(9)) for timeout calculations, but I didn't verify this with testing.

After revision 232144, the above call will result in __umtx_op_wait_uint_private constructing a struct _umtx_time with it's _clockid set to CLOCK_REALTIME. If you want CLOCK_MONOTONIC, you must construct your own struct _umtx_time and pass it as uaddr2, setting uaddr1 to something larger than sizeof(struct timespec).

The relevant listing from sys/kern/kern_umtx.c. This function is called from __umtx_op_wait_uint_private with addr=uaddr2 and size=(size_t)uaddr1.

static inline int
umtx_copyin_umtx_time(const void *addr, size_t size, struct _umtx_time *tp)
{
        int error;

        if (size <= sizeof(struct timespec)) {
                tp->_clockid = CLOCK_REALTIME;
                tp->_flags = 0;
                error = copyin(addr, &tp->_timeout, sizeof(struct timespec));
        } else
                error = copyin(addr, tp, sizeof(struct _umtx_time));
        if (error != 0)
                return (error);
        if (tp->_timeout.tv_sec < 0 ||
            tp->_timeout.tv_nsec >= 1000000000 || tp->_timeout.tv_nsec < 0)
                return (EINVAL);
        return (0);
}

I modified runtime.futexsleep1 to pass sys_umtx_op a struct _umtx_time * with _clockid=CLOCK_MONOTONIC and verified that this did resolve the issue on FreeBSD 10.3 / amd64. That is, the timers from OP's test program above continue to fire every 5 seconds even when twiddling the system clock during execution.

I'm posting this wall of text here rather than mailing a change list because I'm sufficiently ignorant of the golang runtime that making even a small change makes me nervous, especially when it's difficult for me test against the full range of supported FreeBSD versions / architectures. If the maintainers don't mind beating acceptable code out of me via code review, just say the word and I'll mail a CL.

@bradfitz

This comment has been minimized.

Copy link
Member

commented Sep 28, 2016

@appleby, thanks for the investigation!

Our FreeBSD minimum requirements are kinda undefined: https://golang.org/wiki/MinimumRequirements

@ianlancetaylor, any interest in reviewing a CL if @appleby sends one?

@adg, any opinions on which version(s) of FreeBSD we officially support?

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Sep 28, 2016

I'm happy to look at a CL, but do we know what happens when we pass a large uaddr1 to an older version of FreeBSD? If it fails with EINVAL or ENOSYS or something, then we can easily fall back. Otherwise we will need to look at the kernel version to decide what to do.

Or we could decide to drop support for older versions of FreeBSD. I'm not a FreeBSD user myself so I have no opinion about that.

@appleby

This comment has been minimized.

Copy link
Contributor

commented Sep 28, 2016

My understanding is that uaddr1 was ignored completely in revisions prior to 231989, going back at least to release 8.0.0 (the oldest I checked).

231989 was followed 3 days later by 232144, but the semantics of uaddr1 between those revisions does differ, and I'm not certain which versions of FreeBSD, if any, contain 231989.

As far as I can tell, any version cut from a RELEASE branch will either ignore uaddr1 or treat is as a size_t (I'll double check this).

On the other hand, I believe this would be a breaking change for any system running an old STABLE or CURRENT cut after 231989 but before 232144. I don't know enough about FreeBSD releases to know whether any such releases actually exists.

@appleby

This comment has been minimized.

Copy link
Contributor

commented Sep 28, 2016

My svn-fu is weak, but it looks like none of the releng, release, or stable branches contain r231989 that do no also contain r232144. The earliest branches these revisions appear in are stable/10 and releng/10.0.

I will spin up at least a couple of older FreeBSD VMs for testing before I mail the CL. It might take a day or two as I am behind a slow link, so downloading the images takes a while.

@gopherbot

This comment has been minimized.

Copy link

commented Sep 30, 2016

CL https://golang.org/cl/30154 mentions this issue.

@gopherbot gopherbot closed this in 360f2e4 Oct 1, 2016

@mikioh mikioh modified the milestones: Go1.7.2, Go1.8Maybe Oct 14, 2016

@gopherbot

This comment has been minimized.

Copy link

commented Oct 17, 2016

CL https://golang.org/cl/31269 mentions this issue.

gopherbot pushed a commit that referenced this issue Oct 17, 2016

[release-branch.go1.7] runtime: sleep on CLOCK_MONOTONIC in futexslee…
…p1 on freebsd

In FreeBSD 10.0, the _umtx_op syscall was changed to allow sleeping on
any supported clock, but the default clock was switched from a monotonic
clock to CLOCK_REALTIME.

Prior to 10.0, the __umtx_op_wait* functions ignored the fourth argument
to _umtx_op (uaddr1), expected the fifth argument (uaddr2) to be a
struct timespec pointer, and used a monotonic clock (nanouptime(9)) for
timeout calculations.

Since 10.0, if callers want a clock other than CLOCK_REALTIME, they must
call _umtx_op with uaddr1 set to a value greater than sizeof(struct
timespec), and with uaddr2 as pointer to a struct _umtx_time, rather
than a timespec. Callers can set the _clockid field of the struct
_umtx_time to request the clock they want.

The relevant FreeBSD commit:
    https://svnweb.freebsd.org/base?view=revision&revision=232144

Fixes #17168

Change-Id: I3dd7b32b683622b8d7b4a6a8f9eb56401bed6bdf
Reviewed-on: https://go-review.googlesource.com/30154
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-on: https://go-review.googlesource.com/31269

ceseo added a commit to powertechpreview/go that referenced this issue Dec 1, 2016

[release-branch.go1.7] runtime: sleep on CLOCK_MONOTONIC in futexslee…
…p1 on freebsd

In FreeBSD 10.0, the _umtx_op syscall was changed to allow sleeping on
any supported clock, but the default clock was switched from a monotonic
clock to CLOCK_REALTIME.

Prior to 10.0, the __umtx_op_wait* functions ignored the fourth argument
to _umtx_op (uaddr1), expected the fifth argument (uaddr2) to be a
struct timespec pointer, and used a monotonic clock (nanouptime(9)) for
timeout calculations.

Since 10.0, if callers want a clock other than CLOCK_REALTIME, they must
call _umtx_op with uaddr1 set to a value greater than sizeof(struct
timespec), and with uaddr2 as pointer to a struct _umtx_time, rather
than a timespec. Callers can set the _clockid field of the struct
_umtx_time to request the clock they want.

The relevant FreeBSD commit:
    https://svnweb.freebsd.org/base?view=revision&revision=232144

Fixes golang#17168

Change-Id: I3dd7b32b683622b8d7b4a6a8f9eb56401bed6bdf
Reviewed-on: https://go-review.googlesource.com/30154
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-on: https://go-review.googlesource.com/31269

@golang golang locked and limited conversation to collaborators Oct 17, 2017

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.