time: wall and monotonic clocks get out of sync #27090
Both the wall clock time and the monotonic time come from calling
I've reproduced this in C by reimplementing much of the Go time API. It's in https://github.com/dshearer/golang-issue-27090. This repo also includes an equivalent program in Go.
I think the problem is that the realtime clock and the monotonic clock cannot be read simultaneously. Instead, you must make two calls to clock_gettime, and some amount of time passes between those calls. In a container, this interval can be large enough to cause this bug.
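To see why the gap between the two reads matters on its own, here is a minimal Go model with two fake clocks. The names fakeClocks, now, and offsets are hypothetical, invented for illustration; this is not how Go's runtime reads its clocks. Both fake clocks tick at exactly the same rate, yet the wall-minus-mono offset captured by each reading shifts by however long elapsed between the two reads:

```go
package main

import "fmt"

// fakeClocks is a hypothetical model, not Go's runtime: both clocks
// advance at exactly the same rate and differ only in their origin.
type fakeClocks struct {
	elapsed    int64 // true elapsed time, in ns
	wallOrigin int64 // wall-clock value when elapsed == 0
}

// now mimics two back-to-back clock reads: wall first, then mono,
// with gap ns passing between the two reads.
func (c *fakeClocks) now(gap int64) (wall, mono int64) {
	wall = c.wallOrigin + c.elapsed
	c.elapsed += gap
	mono = c.elapsed
	return wall, mono
}

// offsets takes one reading per gap (with 5000 ns of idle time after
// each reading) and returns wall - mono for each reading.
func offsets(c *fakeClocks, gaps []int64) []int64 {
	out := make([]int64, 0, len(gaps))
	for _, gap := range gaps {
		wall, mono := c.now(gap)
		out = append(out, wall-mono)
		c.elapsed += 5000
	}
	return out
}

func main() {
	c := &fakeClocks{wallOrigin: 1_000_000}
	// The clocks never drift, yet the recorded offset changes with the gap.
	fmt.Println(offsets(c, []int64{10, 500, 40})) // [999990 999500 999960]
}
```

In this model the recorded offset is always wallOrigin minus the gap, so any variation in the gap shows up directly as an apparent change in the relationship between the clocks.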
Here's the output of the Go program (https://github.com/dshearer/golang-issue-27090/blob/master/go/main.go) running in a container:
In each iteration, it sleeps for 5 seconds. The two diffs ("Wall diff" and "Mono diff") show how much time has passed according to the wall and mono components of the time structs. (I made a hack to get the mono component.)
And here's the output of the C program (https://github.com/dshearer/golang-issue-27090/blob/master/c/main.c) running in a similar container:
Although some time passes between the calls to
By the way, thanks for recreating the problem in C. That is very helpful, and it suggests that there may not be much we can do about it in Go.
The clocks are read in the same order, but the amount of time between the reads varies. That means that times returned by time.Now present the two clocks in an inconsistent relationship with each other. For illustration, let's pretend that the time struct just has two numbers, one for each clock. I first call time.Now and get start = {wall: A, mono: B}. I then compute the time I want to wake up by adding 5 units to start: wakeup = {wall: A+5, mono: B+5}. I then sleep, and then call time.Now. But now I get curr = {wall: A', mono: B'} such that A' - A != B' - B. Eventually, after several iterations, I'll have a curr such that B' > B+5 but A' < A+5. In other words, curr is after wakeup according to the monotonic clock, but before it according to the wall clock.
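The thought experiment can be written down directly. In this sketch, pretendTime and its methods are hypothetical stand-ins for the two numbers in the example. beforeMono mirrors how time.Time.Before behaves when both values carry monotonic readings (the time package documents that the comparison then uses the monotonic readings alone), and beforeWall compares the wall readings only:

```go
package main

import "fmt"

// pretendTime follows the thought experiment: one number per clock.
type pretendTime struct{ wall, mono int64 }

// add advances both readings by the same amount, like time.Time.Add.
func (t pretendTime) add(d int64) pretendTime {
	return pretendTime{wall: t.wall + d, mono: t.mono + d}
}

// beforeMono compares monotonic readings only.
func (t pretendTime) beforeMono(u pretendTime) bool { return t.mono < u.mono }

// beforeWall compares wall readings only.
func (t pretendTime) beforeWall(u pretendTime) bool { return t.wall < u.wall }

func main() {
	start := pretendTime{wall: 100, mono: 20}  // A = 100, B = 20
	wakeup := start.add(5)                    // {105, 25}
	curr := pretendTime{wall: 104, mono: 26} // A'-A = 4, B'-B = 6
	fmt.Println(curr.beforeMono(wakeup), curr.beforeWall(wakeup)) // false true
}
```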
I reproduced the problem in C by reproducing how Go uses clock_gettime. So the bug (assuming we agree this is a bug) is in how Go uses clock_gettime. In fact, it seems this problem was anticipated in the design doc (https://github.com/golang/proposal/blob/master/design/12914-monotonic.md):
Thanks for the explanation. I don't see how Go could use
Oops. Didn't mean to close it. FYI, here's the next paragraph in the design doc:
I can't figure out how to implement this fix without doing strange hacks to find these coefficients. I hate to say it, but I think the root problem is the decision to put both clocks in one datatype :(
- A bug in Go's time package (golang/go#27090) was causing jobs to be executed multiple times. In this commit, we provide a workaround.
I wouldn't have expected the times to agree to begin with in any containerized or virtualized environment. To me, this is "working as expected".
Let's put it this way. The time API allows the following expression to eval to true in some cases:
I'd say that this makes the time API internally inconsistent. This is not limited to containers; it's just more likely to happen in them. If this is acceptable to you all, then there's no bug.
I think this is expected, though rare.
What version of Go are you using (go version)?
go1.10.3 linux/amd64
Does this issue reproduce with the latest release?
Indeed.
What operating system and processor architecture are you using (go env)?
Running in a Docker container. Here's the output of go env on the container:
Strangely, I have not seen, or heard reports of, this happening outside of Docker.
What did you do?
My program must do some task on a schedule. So it uses the "time" lib to compute the next time to do the task, and to wait till that time. Here's an example:
I ran it in a Docker container (version 18.06.0-ce-mac70):
What did you expect to see?
For every "Doing task at X (next run time: Y)" line, X should be >= Y.
What did you see instead?
After a few iterations, I see "Doing task at X (next run time: Y)" lines where X < Y. Example:
Analysis
This does not always happen, and usually only after a few iterations. As I mentioned above, I have only seen this in Docker containers. With this example program, the times will only be off by tens of milliseconds.
Here's a longer output sample, with 3 iterations:
The bug shows up in the last iteration, in which Before claims that 00:49:48.799623 is not before 00:49:48.8283399. Interestingly, while Before is incorrect in terms of the wall-clock times, it is correct in terms of the monotonic times.
The last iteration began with now == 00:49:43.8283399 (m=+560.364194401). It then slept, and woke when the After channel passed it a new now value of 00:49:48.799623 (m=+565.368983701). Note that the difference in wall-clock time is 4.9712831 sec, while the difference in monotonic time is 5.0047893 sec. So it seems that the time lib is returning time values in which the relation between the monotonic and wall clocks changes a bit. In other words, one of these clocks is not properly keeping time.
Cc @rsc
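The arithmetic in this analysis can be checked with time.Duration by entering the logged values as nanoseconds (diffs is a hypothetical helper; the wall values are seconds within minute 00:49). The skew between the two diffs comes out to about 33.5 ms, consistent with the "tens of milliseconds" mentioned earlier:

```go
package main

import (
	"fmt"
	"time"
)

// diffs recomputes the wall diff, mono diff, and their skew from the
// values logged in the last iteration.
func diffs() (wall, mono, skew time.Duration) {
	wallStart := 43828339900 * time.Nanosecond  // 00:49:43.8283399
	wallEnd := 48799623000 * time.Nanosecond    // 00:49:48.799623
	monoStart := 560364194401 * time.Nanosecond // m=+560.364194401
	monoEnd := 565368983701 * time.Nanosecond   // m=+565.368983701

	wall = wallEnd - wallStart
	mono = monoEnd - monoStart
	return wall, mono, mono - wall
}

func main() {
	fmt.Println(diffs()) // 4.9712831s 5.0047893s 33.5062ms
}
```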
Background
I run the Jobber project, which is an enhanced cron that can be run in Docker. This bug caused some users' jobs to run twice: once a second or two before the scheduled time, and once at the scheduled time. See dshearer/jobber#192.