Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: use CLOCK_BOOTIME, not CLOCK_MONOTONIC, when possible #24595

Open
ianlancetaylor opened this issue Mar 29, 2018 · 34 comments

Comments

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Mar 29, 2018

On GNU/Linux since 2.6.39 clock_gettime supports CLOCK_BOOTTIME, which is equivalent to CLOCK_MONOTONIC except that it also accounts for time when the computer is asleep. We should prefer CLOCK_BOOTTIME to CLOCK_MONOTONIC when it is available. That will permit time.Time.Sub to return the correct value across periods when the computer has gone to sleep.

See #23178.

@ianlancetaylor ianlancetaylor added this to the Go1.11 milestone Mar 29, 2018
@rsc

This comment has been minimized.

Copy link
Contributor

@rsc rsc commented Mar 30, 2018

Define "correct". I am not convinced.

@rsc

This comment has been minimized.

Copy link
Contributor

@rsc rsc commented Mar 30, 2018

Sorry. That was too short.

I think it's plausible that some people will want to exclude the time when the computer is asleep. I will also note that - I believe - the monotonic time always matches runtime.nanotime, and I'm not sure the runtime timers necessarily want to include the time when the computer is asleep.

Or imagine calling time.Ticker(1*time.Hour) to do some kind of cleanup once per hour. If the computer was off for 50 minutes, do you want the tick after 10 minutes of program execution or not? I'm not sure you do.

I can also imagine programs depending on one or the other behavior. If all current systems agree that monotonic time does not include time the computer is asleep, then I'm not sure we should introduce a Linux-specific variation. (I'm assuming - maybe incorrectly - that other systems typically don't support this.)

My point is only that I don't think it's a slam dunk we should definitely do this. I'd like to know more about what other systems do. For example, do Python and Java's monotonic time measurements include time spent asleep? How many operating systems provide access to time spent asleep?

@mpx

This comment has been minimized.

Copy link
Contributor

@mpx mpx commented Apr 1, 2018

There is an interesting comparison to be made with suspending a single Go process (SIGSTOP/delay/SIGCONT). Over a shorter time frame an overloaded system might look similar as well - real time passes without any opportunities to process events due to cpu starvation.

Processes can't control whether they experience some form of cpu starvation, either via system suspend, process suspend, or overload. They probably need to handle events occuring without much processing time to be reliable. Eg, a process doing an hourly cleanup needs to handle the cleanup occurring "too soon" otherwise it should track against amount processed, not real time.

It is surprising that t1.Sub(t0) doesn't return duration in realtime, but an ambiguous "time spent while system (not necessarily process) was running". Other languages without Go's monotonic aware time.Now will return the expected realtime duration in a similar situation.

@mpx

This comment has been minimized.

Copy link
Contributor

@mpx mpx commented Apr 16, 2018

CLOCK_MONOTONIC will behave the same as CLOCK_BOOTTIME as of Linux 4.17 (advancing time while suspended). Details here.

This may be a good argument to use CLOCK_BOOTTIME for consistency between kernels.

@ash2k

This comment has been minimized.

Copy link

@ash2k ash2k commented Apr 16, 2018

It is surprising that t1.Sub(t0) doesn't return duration in realtime, but an ambiguous "time spent while system (not necessarily process) was running". Other languages without Go's monotonic aware time.Now will return the expected realtime duration in a similar situation.

If I understand correctly, this is also a change in behaviour compared to the previous implementation when time.Now was not monotonic aware.

@zx2c4

This comment has been minimized.

Copy link
Contributor

@zx2c4 zx2c4 commented May 4, 2018

Oh, I just opened a duplicate of this. Yes, please make this change. Without it, implementing certain network protocols reliably for Android devices is really miserable.

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented May 4, 2018

Change https://golang.org/cl/111356 mentions this issue: runtime/linux: use CLOCK_BOOTTIME in nanotime()

@bradfitz

This comment has been minimized.

Copy link
Member

@bradfitz bradfitz commented May 4, 2018

https://github.com/golang/go/wiki/MinimumRequirements#linux says we currently require Kernel version 2.6.23.

Would this require bumping our minimum kernel version?

Or would we use this conditionally at runtime?

@zx2c4

This comment has been minimized.

Copy link
Contributor

@zx2c4 zx2c4 commented May 4, 2018

BOOTTIME requires 2.6.39 indeed. I just checked old kernels, and fortunately clock_gettime returns -EINVAL if you pass it too new of a value, which means you could easily try BOOTTIME and fall back to MONOTONIC, if you're on a kernel that doesn't support it. Or you could just raise the minimum kernel requirement. Either way is fine with me. (It's worth noting that the oldest kernel supported by kernel.org is 3.2.)

@slrz

This comment has been minimized.

Copy link

@slrz slrz commented May 7, 2018

Note that the change to make CLOCK_MONOTONIC behave like BOOTTIME in Linux 4.17 got reverted quickly due to causing compatibility issues. It won't happen, at least not for a very long time, if ever.

https://www.spinics.net/lists/linux-tip-commits/msg43709.html

@zx2c4

This comment has been minimized.

Copy link
Contributor

@zx2c4 zx2c4 commented May 7, 2018

Bummer, though still not a reason for us to inherit Linux's legacy problems.

@slrz

This comment has been minimized.

Copy link

@slrz slrz commented May 7, 2018

Agreed. It's just that the same kind of issue (like the systemd watchdog timeouts triggering on resume) may very well lurk in Go programs, too. It's probably worth a try, reverting the change before release if things blow up.

@zx2c4

This comment has been minimized.

Copy link
Contributor

@zx2c4 zx2c4 commented May 9, 2018

Welp, this is unfortunate, but I'm doing this now downstream: https://git.zx2c4.com/wireguard-android/commit/?id=7f8799a4d44058d2fe0981841b8b6d825f97aee7

(The good news is that it actually works pretty well.)

@eliasnaur

This comment has been minimized.

Copy link
Contributor

@eliasnaur eliasnaur commented May 20, 2018

FWIW, macOS added mach_continuous_time in macOS 10.12. It is like mach_absolute_time but advances during sleep.

@FiloSottile FiloSottile added NeedsDecision and removed NeedsFix labels May 22, 2018
@FiloSottile

This comment has been minimized.

Copy link
Member

@FiloSottile FiloSottile commented May 22, 2018

Tagging this NeedsDecision as the CL is blocked on @rsc approving.

While I see the risk of having the same timeout issues the Linux kernel had, I think our case is different because it’s exposed through an interface that normally refers to real time. The best argument I’ve seen for this change is that without it time.Since can return very surprising results.

@zx2c4

This comment has been minimized.

Copy link
Contributor

@zx2c4 zx2c4 commented May 22, 2018

The best argument I’ve seen

It also makes stateful network protocols extremely difficult to implement without this, particularly cryptographic ones, where cleaning up old keys is important.

@rsc

This comment has been minimized.

Copy link
Contributor

@rsc rsc commented Jun 25, 2018

OK, feel free to try this early in Go 1.12.

@rsc rsc modified the milestones: Go1.11, Go1.12 Jun 25, 2018
@zx2c4

This comment has been minimized.

Copy link
Contributor

@zx2c4 zx2c4 commented Jun 25, 2018

Okay. I'll set a reminder for myself to do this when the 1.12 tree opens in August.

@zx2c4

This comment has been minimized.

Copy link
Contributor

@zx2c4 zx2c4 commented Sep 4, 2018

OK, feel free to try this early in Go 1.12.

This has now been submitted at https://go-review.googlesource.com/c/go/+/111356 for merging.

@odeke-em

This comment has been minimized.

Copy link
Member

@odeke-em odeke-em commented Feb 4, 2019

Thank you @ianlancetaylor for the report, others for chiming in and thank you @zx2c4 for the CL 111356!

I just commented on that CL that since we are nearing for releasing cutting time for Go1.12 but also
from the 15X slow down from this CL I am going to punt this to Go1.13 to give ample time for more discussions and perhaps changes in Android.

@odeke-em odeke-em modified the milestones: Go1.12, Go1.13 Feb 4, 2019
@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
@smasher164 smasher164 modified the milestones: Backlog, Go1.14 Oct 11, 2019
@andybons

This comment has been minimized.

Copy link
Member

@andybons andybons commented Oct 15, 2019

It's too late in the cycle for this to land in 1.14 (trusting the early-in-cycle classification). @zx2c4 if you still plan to work on this you are more than welcome to do so, but it can't land until the 1.15 tree opens in Feb 2020. Feel free to move it to the Go1.15 milestone if that's the case.

@andybons andybons modified the milestones: Go1.14, Unplanned Oct 15, 2019
@zx2c4

This comment has been minimized.

Copy link
Contributor

@zx2c4 zx2c4 commented Oct 15, 2019

Indeed it's too late, and this actually requires some other changes too: we have to use futex and semaphores with sleep that corresponds with BOOTTIME.

@andybons

This comment has been minimized.

Copy link
Member

@andybons andybons commented Oct 15, 2019

Thanks for the update, @zx2c4 :)

@networkimprov

This comment has been minimized.

Copy link

@networkimprov networkimprov commented Oct 20, 2019

Maybe this calls for a new stdlib API, since changing the way time.Timer works isn't workable for all callers.

How about type time.WallTimer?

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor Author

@ianlancetaylor ianlancetaylor commented Oct 20, 2019

I believe that most systems are moving toward incrementing monotonic time while the system is asleep. As @mpx notes above, the Linux kernel is making CLOCK_MONOTONIC the same as CLOCK_BOOTTIME.

I'm not convinced that we need two different kinds of timers.

@zx2c4

This comment has been minimized.

Copy link
Contributor

@zx2c4 zx2c4 commented Oct 20, 2019

I'm not convinced that we need two different kinds of timers.

I'm not convinced either. One is fine. And we should simply make them all function as CLOCK_BOOTTIME. Windows is already there. The other platforms will require a bit of poking and prodding. I'll see if I can find some time to have another try at it soon, since I'm shipping a patched Go runtime anyway for this.

@networkimprov

This comment has been minimized.

Copy link

@networkimprov networkimprov commented Oct 20, 2019

The Linux kernel change was apparently abandoned; #24595 (comment)

I don't understand why you'd eliminate the absolute timer. I use timers in select for channel receive ops where I'm waiting on an ack. If the timer fires, I retry whatever wants the ack. If the system sleeps, I wouldn't have got the ack, so I wouldn't want the timer to fire on resume.

If Go timers don't work the same on Windows as other platforms, that's a likely source of porting problems, and it's not documented, yikes.

@gabzim

This comment has been minimized.

Copy link

@gabzim gabzim commented Oct 21, 2019

If the system sleeps, I wouldn't have got the ack, so I wouldn't want the timer to fire on resume.

This is just one use case, there are plenty with different behavior needed. Personally, I think it's not intuitive that we use a clock that's not advanced when the computer sleeps. IMHO, the computer sleeping throughout execution is a detail that should be transparent to you, otherwise you can't trust your timers.

Given this code:

func main() {
	now := time.Now()
	tenMinsFromNow := 10 * time.Minute
	timer := time.NewTimer(tenMinsFromNow)

	logrus.WithField("now", now.Format(time.RFC1123)).WithField("shouldFireAt", now.Add(tenMinsFromNow).Format(time.RFC1123)).WithField("expectedDelay", tenMinsFromNow).Infof("starting alarm...")
	firedAt := <- timer.C

	logrus.WithField("now", time.Now().Format(time.RFC1123)).WithField("actuallyFiredAt", firedAt.Format(time.RFC1123)).WithField("actualDelay", firedAt.Sub(now)).Infof("ending alarm...")
}

if after <-time.C you go to sleep, when you come back the timer will fire much later than the given time (depending how much you sleep) and actualDelay will still read 10 mins (because that's the monotonic clock difference). In my use case, I'm triggering notifications & alarms to users, I don't care if the computer sleeps in between, I need those notifications to go out at the right time or when the computer wakes, the fact that it slept in between the time when the alarm was set and the time it was fired is irrelevant to me.

The end result is that timers are no longer predictable. If I set it to fire in 10 mins, will it? who knows, maybe, maybe it'll fire in 15 mins, maybe 20. Using CLOCK_MONOTIC means time stops (when computer sleeps), I think it's safer to say that the more intuitive use case is one in which time doesn't stop.

@networkimprov

This comment has been minimized.

Copy link

@networkimprov networkimprov commented Oct 21, 2019

@gabzim you missed my suggestion for type WallTimer above, which does what you want.

@mpx

This comment has been minimized.

Copy link
Contributor

@mpx mpx commented Oct 22, 2019

Program necessarily receive control some time after the delay specified, ideally not too long afterwards. However, there are many reasons why a program may not get control "quickly":

  • CPU starvation / scheduling, memory paging,..
  • SIGSTOP / SIGCONT
  • Suspend / resume

The best we can hope for is that the OS/runtime doesn't unnecessarily delay providing control to the program. Programs need to be written to cope with this uncertainly and can't assume some amount of processing has occurred in-between timers firing, otherwise they are likely buggy.

In this case, I don't think we need a new timer, we just need the runtime to provide control as close to wallclock duration as is practical. At least, that allows the program to delay further if it is more appropriate.

@networkimprov

This comment has been minimized.

Copy link

@networkimprov networkimprov commented Oct 22, 2019

@as has suggested the API func NewTimerAt(t Time) *Timer here #35012 (comment)

A variation of that is func NewTimerWall(d Duration) *Timer which is easier to apply to existing code.
EDIT: alternative name: func NewTimerBoot(d Duration) *Timer

Either would provide the functionality requested by this issue, and both are preferable to my previous suggestion of type WallTimer.

@networkimprov

This comment has been minimized.

Copy link

@networkimprov networkimprov commented Nov 9, 2019

The Windows runtime was changed in 1.13.3 to the boot-time model, without reference to this discussion. Microsoft has asked us to revert that. See #35482

@rsc @ianlancetaylor there is clearly a need for a boot-time timer; would the Go team look favorably on a proposal for func NewTimerBoot(d Duration) *Timer or similar?

@mpx

This comment has been minimized.

Copy link
Contributor

@mpx mpx commented Nov 12, 2019

The arguments behind selecting BOOTTIME or MONOTONIC are fairly nuanced (with the only difference being across suspend). I'm not convinced there is that much value adding a second timer, and even if it's added, that developers will evaluate or correctly determine the best choice for their application.

time.Time.Sub and time.Since currently return invalid results across suspend. time.Time needs to use BOOTTIME to fix this. The current situation is surprising and likely to cause hard to debug effects. This is unique to Go, so wouldn't have been a factor on other platforms when deciding the balance between BOOTTIME and MONOTONIC.

If time.Time uses BOOTTIME, it makes sense that timers would too.

This isn't perfect in all situations, there are advantages/disadvantages in both directions. With MONOTONIC timers may expire long after the application intended with no recourse. With BOOTTIME the timer may expire before the application has done "enough work" (but it can at least can adapt and set another timer).

With BOOTTIME buggy use of timers may fail, with MONOTONIC correct use of timers may fail (Eg, #25248, #35012).

@networkimprov

This comment has been minimized.

Copy link

@networkimprov networkimprov commented Nov 12, 2019

Quoting @jstarks from #35447 (comment):

The Windows kernel team changed timer behavior in Windows 8 to stop advancing relative timeouts on wake. Otherwise when you open your laptop lid, every timer in the system goes off all at once and you get a bunch of unpredictable errors. Software is generally written to assume that local processes will make forward progress over reasonable time periods, and if they don't then something is wrong. When the machine is asleep, this assumption is violated. By making relative timers behave like threads, so that they both run together or they both don't, the illusion is maintained. You can claim these programs are buggy, but they obviously exist. Watchdog timers are well-known constructs.

This was a conscious design decision in Windows, and so it's disappointing to see the Go runtime second guess this several years later in a bug fix. [see #34130]

As far as behavior on Linux, there is clearly no consensus in issue #24595, which discusses this same problem. And indeed you can see that the CLOCK_MONOTONIC/CLOCK_BOOTTIME convergence was reverted in the kernel exactly because of the reason we stopped advancing time in Windows: random code has random failures due to timeouts. See https://lkml.org/lkml/2018/4/26/929 for a brief justification.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.