Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

time: Timer.Reset is not possible to use correctly #14038

Closed
dvyukov opened this issue Jan 20, 2016 · 14 comments

Comments

@dvyukov
Copy link
Member

commented Jan 20, 2016

I saw two common usage patterns of Timer.Reset:

First is ensuring that a sequence of events happens in a timely fashion:

timeout := time.NewTimer(T)
for {
  if !timeout.Reset(T) {
    <-timeout.C
  }
  select {
  case <- ...: ...
  case <-timeout.C: ...
  }
}

Second is connection timeout with an ability to change the timeout:

timeout := time.NewTimer(T)

// goroutine 1
for {
  select {
  case <- ...: ...
  case <-timeout.C: ...
  }
}

// goroutine 2
  if !timeout.Reset(T2) {
    <-timeout.C
  }

There are several things that can go wrong here depending on timings:

  1. Reset returns false; one element is already in the channel (the old one); timer goroutine sends another (corresponding to the new timeout), it is discarded since the channel is full; now we drain the channel to read out the old value; as the result timeout will never happen as the new value was discarded.
  2. Reset returns false; one element is already in the channel (the old one); timer goroutine sends another (corresponding to the new timeout), it is discarded since the channel is full; goroutine 1 reads out from the channel; goroutine 2 hangs forever trying to drain the channel.

I've found several cases of this bug in our internal codebase, one even with a TODO by @bradfitz describing this race.

Timer.Reset does the following:

func (t *Timer) Reset(d Duration) bool {
    if t.r.f == nil {
        panic("time: Reset called on uninitialized Timer")
    }
    w := when(d)
    active := stopTimer(&t.r)
    t.r.when = w
    startTimer(&t.r)
    return active
}

What would be possible to use correctly is:

func (t *Timer) Reset2(d Duration) bool {
    if t.r.f == nil {
        panic("time: Reset called on uninitialized Timer")
    }
    w := when(d)
    active := stopTimer(&t.r)
        if !active {
                <-t.C
        }
    t.r.when = w
    startTimer(&t.r)
    return active
}

@rsc @Sajmani @aclements

@rsc

This comment has been minimized.

Copy link
Contributor

commented Jan 20, 2016

I disagree with the title. It is possible to use correctly. People often don't, but that's kind of a separate issue.

The suggested Reset2 has its own race. If the Reset is concurrent with an active read on t.C, it is possible that stopTimer returns false but then the other goroutine's read on t.C takes the value before the <-t.C line in Reset2, causing Reset2 to hang.

I believe this issue is primarily a duplicate of #11513. Perhaps we should do a better job of explaining how to use Reset correctly, but I don't think it would help to add a Reset2.

@dvyukov

This comment has been minimized.

Copy link
Member Author

commented Jan 20, 2016

Yes, I realized that Reset2 does not fix the second example, while it fixes the first one.

Sorry, but your "corrected" version in #11513 (comment) has exactly the race I described and can hang forever (assuming that case <-time.After(t2) is actually an untrusted channel which can hang).

@rsc

This comment has been minimized.

Copy link
Contributor

commented May 18, 2016

FWIW, on a careful re-reading of Dmitriy's description of the races, it does seem like Reset is not possible to use correctly. The fundamental issue is that if Reset tells the caller that there is an event to read, there is now a race between the caller consuming that event in order to fully reset the channel and the new timer expiring, which will attempt to send a new event to the channel. For a long timeout this is likely never going to be an issue, but for short timeouts you could imagine it happening.

Not thinking about what to do about this right now. It might be that we have to change Reset to reach in and drain the channel itself, as if implemented to start with if !t.Stop() { select { case <-t.C: ; default: } }. As I commented in #11513, this would be a visible change. But it might be justified by Reset being fundamentally unusable otherwise.

Maybe for Go 1.8. I'd like to think about this again when I've had a good week's sleep.

@rsc rsc modified the milestones: Go1.8, Go1.7 May 18, 2016

@rsc rsc removed the Documentation label May 18, 2016

@dvyukov

This comment has been minimized.

Copy link
Member Author

commented May 18, 2016

If I remember correctly, separate Stop and Start would solve the race. The crucial point is to be able to execute code (drain) in between Stop and Start.

@rsc

This comment has been minimized.

Copy link
Contributor

commented May 18, 2016

Hmm, so maybe Reset doesn't have to change, you just have to tell people you should always make sure the timer is stopped before calling Reset (so that Reset = Start) and never Reset a running timer. Anyway, for Go 1.8.

@gopherbot

This comment has been minimized.

Copy link

commented May 31, 2016

CL https://golang.org/cl/23575 mentions this issue.

gopherbot pushed a commit that referenced this issue Jun 6, 2016

time: warn about correct use of a Timer's Stop/Reset methods
Updates #14038
Fixes #14383

Change-Id: Icf6acb7c5d13ff1d3145084544c030a778482a38
Reviewed-on: https://go-review.googlesource.com/23575
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@noblehng

This comment has been minimized.

Copy link

commented Oct 4, 2016

I think this change to documentation is not accurate. Here is a example with a slightly change from the first example of OP that adopt the doc, which does not concurrently receive on timer.C, and it deadlock:

package main

import (
     "time"
)

func main() {
    timer := time.NewTimer(1 * time.Second)

    for i := 0; i < 2; i++ {
            if !timer.Stop() {
                    <-timer.C
            }
            timer.Reset(1 * time.Second)
            <-timer.C
    }
}

I think the problem is that, the first call of both Reset() and Stop() only tells whether the old timer has expired, but not whether the event has been consumed. To reset old timer in normal path even when it could still have been expired before next operation, the first example of OP could be changed to:

timeout := time.NewTimer(T)
for {
    timeout.Reset(T)

    select {
    case <- ...: ...
        if !timeout.Stop() {
            <-timeout.C
        }
    case <-timeout.C: ...
    }
}

Or don't do that but just use a longer timeout duration and accept a concurrently timeout is still a timeout.

The current behavior of Reset() should be that it changes the timer to expire after duration d if the timer have not been expired or the expire event had been consumed but not in between. But this way its return value will be inconsistent, for example:

package main

import (
    "time"
)

func main() {
    timer := time.NewTimer(1 * time.Second)
    time.Sleep(2 * time.Second)
    println(timer.Reset(1 * time.Second))
    println(timer.Reset(1 * time.Second))
}

will print true for the second Reset().

There are still use cases for this behavior without first draining the old timer's channel, for example:

// goroutine 1
for {
    if err := doSomething(ctx); err != nil {
        return err
    }
    timer.Reset(d)
}

// goroutine 2
<-timer.C
cancel() // call the CancelFunc of ctx
@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Oct 4, 2016

It's not clear to me that the current doc is inaccurate. It says "to reuse an active timer." If you have already run <-timer.C, should that timer be considered active?

@noblehng

This comment has been minimized.

Copy link

commented Oct 4, 2016

Oh sorry, my bad. I was confused by the example in the doc.

Edit:
It is still unclear to me what "active" means now. I was assuming it means the timer have not been expired, but in this case it means its event have not been consumed. Maybe the "It returns true if the timer had been active, false if the timer had expired or been stopped." part of Reset()'s doc needs to be changed or removed too?

@quentinmit

This comment has been minimized.

Copy link
Contributor

commented Oct 5, 2016

@rsc @ianlancetaylor What is left to do here? Are we going to change APIs or documentation, or is the existing change sufficient?

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Oct 6, 2016

Ping @dvyukov

@dvyukov

This comment has been minimized.

Copy link
Member Author

commented Oct 6, 2016

I don't think that adding more methods is useful here, it won't help with the old methods.
But I would not expect lots of developers to re-read all docs on every release. Was this mentioned in release notes?
Do we want to add a check to vet? Leave a note for 2.0?

@rsc rsc self-assigned this Oct 6, 2016

@gopherbot

This comment has been minimized.

Copy link

commented Oct 18, 2016

CL https://golang.org/cl/31350 mentions this issue.

@r0l1

This comment has been minimized.

Copy link

commented Dec 6, 2016

Just implemented a small fix, which can be used as a drop-in replacement.

https://github.com/desertbit/timer

@golang golang locked and limited conversation to collaborators Dec 6, 2017

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
8 participants
You can’t perform that action at this time.