time: clock drift on Windows 2008r2 w/ version >= 1.9 #24489

dennisdupont · 2018-03-22T16:56:43Z

What version of Go are you using (`go version`)?

Occurs on v1.9 and above

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (`go env`)?

windows/amd64
Issue occurs on win2008r2, but not on win2012 (tested) or win2016 (according to consul forum comments)
Issue occurs on domain attached or standalone server.

What did you do?

Clock drift was noticed on a deployed consul cluster (see hashicorp/consul#3925 for excruciating details). Determined it started with consul v0.9.3 and existed in the latest. That was when they switched to go v1.9. So we downgraded to v0.9.2 and problem disappeared.

The major applicable change in go 1.9 seemed to be the monotonic clock changes, so I experimented with the go version. If consul v0.9.3 is built with go 1.8 the problem also does not exist.

With help from a consul contributor we were able to create a small snippet to reproduce the issue:
https://play.golang.org/p/4y79262HSrJ

The clock drift is measured the same way as with the production servers, running w32tm:
>w32tm /stripchart /computer:10.60.1.25 /dataonly /samples:100
As soon as the test starts running you can see drift.

What did you expect to see?

A stable clock

What did you see instead?

Significant clock drift
Here is an example run:

C:\Users\Administrator>w32tm /stripchart /computer:208.88.126.235 /dataonly /samples:100
Tracking 208.88.126.235 [208.88.126.235:123].
Collecting 100 samples.
The current time is 3/22/2018 9:45:25 AM.
09:45:25, +00.0104974s
09:45:27, +00.0085572s
09:45:29, +00.0080007s
09:45:31, +00.0022288s
09:45:33, +00.0070934s
09:45:35, -00.0778244s <== test started
09:45:38, -00.1392391s
09:45:40, -00.3150037s
09:45:42, -00.4225186s
09:45:44, -00.4935759s
09:45:46, -00.6112448s
09:45:48, -00.7180814s
09:45:50, -00.8264958s
09:45:52, -00.9447071s
09:45:54, -01.0553810s <== over 1 second offset in ~20 seconds
09:45:56, -01.1570893s
09:45:58, -01.2324556s

This keeps growing until (S)NTP starts fighting the drift, but we have seen it as high as ~180 seconds, enough to cause Kerberos auth failures.
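To put a number on the run above, here is a minimal sketch that parses two of the `w32tm /stripchart /dataonly` lines and computes the change in offset per elapsed second. The names `sample`, `parseSample` and `driftRate` are made up for this illustration and are not part of any tool or library. It assumes both samples fall on the same day.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// sample is one line of `w32tm /stripchart /dataonly` output.
// (Hypothetical helper type, for illustration only.)
type sample struct {
	at     time.Time     // time of day of the sample
	offset time.Duration // reported offset from the reference server
}

// parseSample parses a line such as "09:45:35, -00.0778244s".
// Anything after the offset field (for example an annotation) is ignored.
func parseSample(line string) (sample, error) {
	parts := strings.SplitN(line, ",", 2)
	if len(parts) != 2 {
		return sample{}, fmt.Errorf("malformed line %q", line)
	}
	at, err := time.Parse("15:04:05", strings.TrimSpace(parts[0]))
	if err != nil {
		return sample{}, err
	}
	fields := strings.Fields(parts[1])
	if len(fields) == 0 {
		return sample{}, fmt.Errorf("missing offset in %q", line)
	}
	secs, err := strconv.ParseFloat(strings.TrimSuffix(fields[0], "s"), 64)
	if err != nil {
		return sample{}, err
	}
	return sample{at: at, offset: time.Duration(secs * float64(time.Second))}, nil
}

// driftRate returns the change in offset, in seconds, per second of
// elapsed time between two samples. It returns 0 if no time elapsed.
func driftRate(first, last sample) float64 {
	elapsed := last.at.Sub(first.at).Seconds()
	if elapsed == 0 {
		return 0
	}
	return (last.offset - first.offset).Seconds() / elapsed
}

func main() {
	first, err := parseSample("09:45:35, -00.0778244s <== test started")
	if err != nil {
		panic(err)
	}
	last, err := parseSample("09:45:58, -01.2324556s")
	if err != nil {
		panic(err)
	}
	fmt.Printf("drift: %.1f ms per second\n", driftRate(first, last)*1000)
}
```

For the two samples used here (09:45:35 and 09:45:58), the offset changes by about 1.15 s over 23 s, which is roughly 50 ms lost per second, or about 5%.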


dgryski · 2018-03-22T20:12:03Z

@alexbrainman

dennisdupont · 2018-03-22T23:09:52Z

There are a couple of things I tried against a 1.9 version (in order):

1. commented out osRelax and changed osinit to set _timeBeginPeriod to 1 (as in v1.8)
2. altered os_windows.go to use the systime(), nanotime() and unixnano() implementations from 1.8
3. commented out the setting of _timeBeginPeriod in osinit

None of these fixed the issue (although I am a neophyte, so take it with a grain of salt).

alexbrainman · 2018-03-24T00:47:38Z

> 09:45:54, -01.0553810s <== over 1 second offset in ~20 seconds

That's terrible Muriel.

Reading

hashicorp/consul#3925 (comment)

https://stackoverflow.com/questions/102064/clock-drift-on-windows

https://bugs.java.com/view_bug.do?bug_id=5005837

https://support.microsoft.com/en-us/help/821893/the-system-clock-may-run-fast-when-you-use-the-acpi-power-management-t

The only suggestion that comes to my mind is that, starting from go1.9, we call timeBeginPeriod / timeEndPeriod much more often. Perhaps that makes your computer time drift. You can easily test that theory by changing the osRelax function in runtime to do nothing.

> commented out osRelax and changed osinit to set _timeBeginPeriod to 1 (as in v1.8)

Have you tried to do something like that:

diff --git a/src/runtime/os_windows.go b/src/runtime/os_windows.go
index 415ec0c..4295947 100644
--- a/src/runtime/os_windows.go
+++ b/src/runtime/os_windows.go
@@ -284,6 +284,8 @@ const osRelaxMinNS = 60 * 1e6
 // if we're already using the CPU, but if all Ps are idle there's no
 // need to consume extra power to drive the high-res timer.
 func osRelax(relax bool) uint32 {
+       return 0
+
        if relax {
                return uint32(stdcall1(_timeEndPeriod, 1))
        } else {

?

But I am not a real doctor. Perhaps @aclements will be more helpful.

Alex

Carrotman42 · 2018-03-28T05:12:05Z

I haven't run the example playground, but looking at the code there are two places the code comments don't match the implementation (afaict). I can't speak to what that means for this issue; I just wanted to mention what I noticed.

There's a "default" branch in the select where you're supposed to be waiting for that timer to send something down the channel. This causes the select to not block, and you won't wait for the tick.

At the end of the for loop it says you're trying to clear the timers. I suggest setting the slice to nil instead: setting it to timers[:] unfortunately doesn't do anything useful.

as · 2018-03-28T05:39:13Z

Comment doesn't match the code.

// Clear timers slice
timers = timers[:]

Should be

timers = timers[:0]

Additionally, this program seems to create an ever-increasing number of timers that it iterates through, checking each for done-ness. As I now see @Carrotman42 has already stated, the default case is triggered if the uncleared value is checked or the timer hasn't fired yet. If the timer hasn't fired yet, it likely gets read on the next iteration and contains a stale timestamp.
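For reference, a small sketch of the three slice expressions discussed in the comments above, using a plain `[]int` in place of the timers slice. The helper names `resliceAll`, `truncate` and `drop` are hypothetical and exist only for this example.

```go
package main

import "fmt"

// resliceAll mirrors `timers = timers[:]`: same length, same backing array.
func resliceAll(s []int) []int { return s[:] }

// truncate mirrors `timers = timers[:0]`: length 0, backing array kept for reuse.
func truncate(s []int) []int { return s[:0] }

// drop mirrors `timers = nil`: length 0, backing array released.
func drop(s []int) []int { return nil }

func main() {
	s := []int{1, 2, 3}

	a := resliceAll(s)
	fmt.Println(len(a), cap(a)) // 3 3

	b := truncate(s)
	fmt.Println(len(b), cap(b)) // 0 3

	c := drop(s)
	fmt.Println(len(c), cap(c)) // 0 0
}
```

Both `[:0]` and `nil` empty the slice. The difference is that `[:0]` keeps the backing array, so any pointers stored in it stay reachable until they are overwritten, while `nil` lets the array be garbage collected.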

alexbrainman · 2018-04-02T08:03:24Z

> Comment doesn't match the code.

True, but that still does not explain computer clock drift.

@dennisdupont could you change your program to remove lines 30-31 and replace line 38 with timers = timers[:0], and tell us if that makes any difference to computer clock.

Thank you.

Alex

aclements · 2018-04-03T02:26:54Z

I may be confused here, but surely this can't be anything but a Windows kernel bug (not saying that Go isn't tickling it)? No user process should be capable of causing the system clock to skew.

If I were to guess what would trigger a kernel timekeeping issue, I would definitely start with the timeBeginPeriod/timeEndPeriod. But Go definitely isn't the only thing that's constantly switching the time period. Simply retrieving the time would be far down my list of suspects, since that doesn't even enter the kernel except on Wine.

How loaded is the system? Could it be thrashing so bad that it causes huge delays? Do we trust w32tm's report?

dennisdupont · 2018-04-16T16:01:05Z

@aclements - the system I test on is not loaded at all; it is very idle. I have been using w32tm on quite a few servers around the center and am pretty sure it is accurate. Also, the drifts are consistent with the visible clocks (remote Windows vs. my desktop vs. other servers, etc.).

dennisdupont · 2018-12-11T00:04:27Z

@ianlancetaylor - This was marked for a couple of milestones, but I don't see any comments regarding a root cause or solution. Seems it also has been referenced by a couple of others (albeit one was on win2003).

ianlancetaylor · 2018-12-11T03:39:28Z

@dennisdupont I don't think anyone knows. Like @aclements , this seems to me like a Windows kernel bug. I don't see how simply fetching the time could cause clock drift.

I also don't see other reports of this problem. If this only affects a ten-year-old version of Windows, then the reality is that, while we would be happy to accept a fix, we're unlikely to develop a fix ourselves.

seankhliao · 2023-01-28T12:05:42Z

Windows 2008 R2 support will be removed in 1.21 #57003

ALTree added the OS-Windows and NeedsInvestigation labels Mar 22, 2018

ALTree changed the title ~~Clock drift on Windows 2008r2 w/ version >= 1.9~~ time: clock drift on Windows 2008r2 w/ version >= 1.9 Mar 22, 2018

andybons added this to the Go1.11 milestone Mar 26, 2018

alexbrainman mentioned this issue May 7, 2018

runtime: go causes Windows 2003 server clocks to run fast #25268

Closed

ruflin mentioned this issue Jun 13, 2018

Clock drift on Windows Server 2008 R2 when running latest version of winlogbeat (or any version > 6.1.0) elastic/beats#7308

Closed

ianlancetaylor modified the milestones: Go1.11, Go1.12 Jun 29, 2018

ianlancetaylor added the help wanted label Dec 11, 2018

ianlancetaylor modified the milestones: Go1.12, Go1.13 Dec 11, 2018

andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019

rsc modified the milestones: Go1.14, Backlog Oct 9, 2019

seankhliao closed this as not planned Jan 28, 2023

golang locked and limited conversation to collaborators Jan 28, 2024

gopherbot added the FrozenDueToAge label Jan 28, 2024
