time: wall and monotonic clocks get out of sync #27090

@dshearer

What version of Go are you using (go version)?

go1.10.3 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

Running in a Docker container. Here's the output of go env on the container:

GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/go"
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build937894194=/tmp/go-build -gno-record-gcc-switches"

Strangely, I have not seen, or heard reports of, this happening outside of Docker.

What did you do?

My program must run a task on a schedule, so it uses the "time" package to compute the next run time and to wait until then. Here's an example:

package main

import (
	"fmt"
	"time"
)

func main() {
	now := time.Now()

	for {
		fmt.Println("")
		nrt := now.Add(time.Second * 5)
		fmt.Printf("Now: %v; Next run time: %v\n", now.String(), nrt.String())

		for now.Before(nrt) {
			sleepDur := nrt.Sub(now)
			fmt.Printf("Sleeping for %v\n", sleepDur)

			afterChan := time.After(sleepDur)
			now = <-afterChan
			fmt.Printf("Awoke at %v\n", now.String())
		}

		// do task
		fmt.Printf("Doing task at %v (next run time: %v)\n", now.String(), nrt.String())
	}
}

I ran it in a Docker container (version 18.06.0-ce-mac70):

FROM golang:1.10-alpine

WORKDIR /
COPY main.go /
RUN go build /main.go

ENTRYPOINT ["/main"]

What did you expect to see?

For every "Doing task at X (next run time: Y)" line, X should be >= Y.

What did you see instead?

After a few iterations, I see "Doing task at X (next run time: Y)" lines where X < Y. Example:

Doing task at 2018-08-20 00:09:51.9754029 +0000 UTC m=+60.021665301 (next run time: 2018-08-20 00:09:52.0083237 +0000 UTC m=+60.021022301)

Analysis

This does not always happen, and usually only after a few iterations. As I mentioned above, I have only seen this in Docker containers. With this example program, the times will only be off by tens of milliseconds.

Here's a longer output sample, with 3 iterations:

Now: 2018-08-20 00:49:33.8226258 +0000 UTC m=+550.358479801; Next run time: 2018-08-20 00:49:38.8226258 +0000 UTC m=+555.358479801
Sleeping for 5s
Awoke at 2018-08-20 00:49:38.8275073 +0000 UTC m=+555.363361401
Doing task at 2018-08-20 00:49:38.8275073 +0000 UTC m=+555.363361401 (next run time: 2018-08-20 00:49:38.8226258 +0000 UTC m=+555.358479801)

Now: 2018-08-20 00:49:38.8275073 +0000 UTC m=+555.363361401; Next run time: 2018-08-20 00:49:43.8275073 +0000 UTC m=+560.363361401
Sleeping for 5s
Awoke at 2018-08-20 00:49:43.8283399 +0000 UTC m=+560.364194401
Doing task at 2018-08-20 00:49:43.8283399 +0000 UTC m=+560.364194401 (next run time: 2018-08-20 00:49:43.8275073 +0000 UTC m=+560.363361401)

Now: 2018-08-20 00:49:43.8283399 +0000 UTC m=+560.364194401; Next run time: 2018-08-20 00:49:48.8283399 +0000 UTC m=+565.364194401
Sleeping for 5s
Awoke at 2018-08-20 00:49:48.799623 +0000 UTC m=+565.368983701
Doing task at 2018-08-20 00:49:48.799623 +0000 UTC m=+565.368983701 (next run time: 2018-08-20 00:49:48.8283399 +0000 UTC m=+565.364194401)

The bug shows up in the last iteration, in which Before claims that 00:49:48.799623 is not before 00:49:48.8283399. Interestingly, while Before is incorrect in terms of the wall-clock times, it is correct in terms of the monotonic times.

The last iteration began with now == 00:49:43.8283399 (m=+560.364194401). It then slept, and woke when the After channel passed it a new now value of 00:49:48.799623 (m=+565.368983701). Note that the difference in wall-clock time is 4.9712831 sec, while the difference in monotonic time is 5.0047893 sec, a gap of about 33.5 ms. So it seems that the time lib is returning time values in which the relation between the monotonic and wall clocks shifts slightly. In other words, at least one of these clocks is not keeping time properly.

Cc. @rsc

Background

I run the Jobber project, which is an enhanced cron that can be run in Docker. This bug caused some users' jobs to run twice: once a second or two before the scheduled time, and once at the scheduled time. See dshearer/jobber#192.

Labels: NeedsInvestigation (Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.)