time: make Timer/Ticker channels not receivable with old values after Stop or Reset returns #37196
Comments
For an example program illustrating why a user might reasonably expect (and attempt to make use of) this behavior, see https://play.golang.org/p/mDkMG67ehAI. |
Thanks for filing this. I also found this behavior puzzling and apparently I got it wrong too as I was expecting no values to be sent after If, as I understand now, a value might be sent after the call to |
If you know that no other goroutine is receiving on the (The |
In the current implementation, this can probably be done by changing |
Added to proposal process to resolve whether to do this, after mention in #38945. |
This made its first appearance in the proposal minutes last week.
This would succeed today, but if we clear the channel during On the other hand, there is technically a race here, and it's not guaranteed that this snippet never blocks, especially on a loaded system. The correct snippet, even today, is:
That example would be unaffected, because Stop would pull the buffered element back out of the channel and then return We could make the change at the start of a release cycle and be ready to roll it back if real (as opposed to hypothetical) problems arose. Thoughts? |
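For reference, here is a minimal sketch of the check-then-drain pattern this comment contrasts with an unconditional receive: drain t.C only when Stop reports that the timer had already fired. The helper names (stopAndDrain, channelEmpty, expiredThenStopped) and the durations are illustrative assumptions, not code from this thread. The sketch assumes a single goroutine owns the timer and has not already received from t.C.

```go
package main

import (
	"fmt"
	"time"
)

// stopAndDrain stops t and, if Stop reports that the timer already fired,
// receives the pending value so that t.C is left empty.
// Assumption: no other goroutine receives from t.C, and the caller has not
// already received the value (otherwise the receive would block forever).
func stopAndDrain(t *time.Timer) {
	if !t.Stop() {
		<-t.C
	}
}

// channelEmpty reports whether a receive from t.C would block right now.
func channelEmpty(t *time.Timer) bool {
	select {
	case <-t.C:
		return false
	default:
		return true
	}
}

// expiredThenStopped lets a short timer expire without receiving from it,
// stops it with the fixup, and reports whether the channel ended up empty.
func expiredThenStopped() bool {
	t := time.NewTimer(10 * time.Millisecond)
	time.Sleep(50 * time.Millisecond) // well past expiry
	stopAndDrain(t)
	return channelEmpty(t)
}

func main() {
	fmt.Println("channel empty after Stop:", expiredThenStopped())
}
```

If Stop itself pulled the buffered value back and returned true, as suggested above, the body of the if would simply never run, so code written this way would keep working.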
While I like the behavior of pulling the value back out of the channel and having For example, this program today is not reported as a race and the receive is guaranteed to never block, but would deadlock if

    package main

    import (
    	"fmt"
    	"time"
    )

    func main() {
    	t := time.NewTimer(1 * time.Second)
    	for len(t.C) == 0 {
    		time.Sleep(time.Millisecond)
    	}
    	t.Stop()
    	<-t.C
    	fmt.Println("ok")
    }

That is why I had restricted my original proposal to guarantee only that no value is sent after |
Sorry, I'm not sure I understand what it means to guarantee that no value is sent on the channel after |
@ianlancetaylor, the program given in https://play.golang.org/p/Wm1x8DmYoQo should run indefinitely without reporting any delayed times. Instead, it reports a nonzero rate of delayed timer sends. On my Xeon workstation:
In all cases, the failure mode is the same: at some point after Instead, I propose that the send to the channel, if it occurs at all, ought to happen before the return from |
Thanks for the example. If the code is written per the documentation of And now I see how this happens. It's because in I don't see an easy fix in the current implementation. We can't leave the timer in |
Would it make sense to change the timer to the (Or would that potentially induce starvation for a |
The same send-after- (Code in https://play.golang.org/p/Y_Hz4xkYr07, but it doesn't reproduce there due to the Playground's |
It's actually a bit more complicated than I thought. If we leave the timer status as |
Let's try to separate the discussion of semantics from implementation. The docs say:
The proposed change in semantics is to make it that no receive can ever happen after t.Stop returns. That would mean that under the assumption - "assuming the program has not received from t.C already" - t.Stop would never return false anymore. So the above code would no longer be required (and if left alone would just never execute the if body). This would avoid the problem of people not knowing to write that fixup code or not understanding the subtlety involved. Certainly we've all confused ourselves enough just thinking through this over the years. That's the proposed semantic change. There are also questions about how to implement that - it's not just the easy "pull the item back out of the buffer" that I suggested before - but certainly it is possible. Let's figure out if we agree about the semantic change. (Another side-effect would probably be that len(t.C) would always return 0, meaning the channel would appear to be unbuffered. But we've never promised buffering in the docs. The buffering is only there because I was trying to avoid blocking the timer thread. It's a hack.) |
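To make the proposed semantics concrete, here is a user-level sketch of a timer whose Stop pulls an unreceived value back out of the channel, so that no receive can succeed once Stop has returned. Everything here (pullbackTimer, newPullbackTimer, stopAfterExpiry) is hypothetical and built on time.AfterFunc; it is not the runtime implementation under discussion and says nothing about how the real change would be made.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// pullbackTimer is a hypothetical wrapper illustrating the proposed semantics.
type pullbackTimer struct {
	C chan time.Time // capacity 1, so fire never blocks

	mu      sync.Mutex
	t       *time.Timer
	sent    bool // a value was placed in C
	stopped bool // Stop has been called
}

func newPullbackTimer(d time.Duration) *pullbackTimer {
	pt := &pullbackTimer{C: make(chan time.Time, 1)}
	pt.t = time.AfterFunc(d, pt.fire)
	return pt
}

// fire runs at most once, in its own goroutine, when the duration elapses.
func (pt *pullbackTimer) fire() {
	pt.mu.Lock()
	defer pt.mu.Unlock()
	if pt.stopped {
		return // Stop won the race: never send
	}
	pt.C <- time.Now()
	pt.sent = true
}

// Stop reports true if it prevented the value from being received: either the
// value was never sent, or it was still buffered and has been pulled back.
// Once Stop returns, C is empty and stays empty.
func (pt *pullbackTimer) Stop() bool {
	pt.mu.Lock()
	defer pt.mu.Unlock()
	if pt.stopped {
		return false
	}
	pt.stopped = true
	pt.t.Stop()
	if !pt.sent {
		return true
	}
	select {
	case <-pt.C:
		return true // unreceived value pulled back out
	default:
		return false // the program already received it
	}
}

// stopAfterExpiry lets the timer expire unreceived, stops it, and reports
// Stop's result and whether C is empty afterwards.
func stopAfterExpiry() (stopped, empty bool) {
	pt := newPullbackTimer(10 * time.Millisecond)
	time.Sleep(50 * time.Millisecond)
	stopped = pt.Stop()
	select {
	case <-pt.C:
		return stopped, false
	default:
		return stopped, true
	}
}

func main() {
	fmt.Println(stopAfterExpiry())
}
```

Because the send and the pull-back happen under the same mutex, there is no window in which Stop has returned and a stale value can still arrive, which is the property the fixup code was trying to approximate.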
I agree that if we could change the semantics to ensure that no receive is possible after However, I don't see any way to do that without breaking the example in https://play.golang.org/p/r77N1PfXuu5. That program explicitly relies on the buffering behaviors of Disallowing a receive after So the only way I see to make that change in semantics would be to declare that this example program is already incorrect, because it is relying on undocumented behavior regardless of the stability of that behavior. I could be wrong, but given how long this behavior has been in place I suspect that changing it would break real programs. |
Do you know of any programs that call |
I do not know of any specifically, but given the 9-year interval and Hyrum's law I would be surprised if they did not exist. If we are willing to assume that such programs do not exist (or declare that they are invalid if they do exist, because the buffering behavior was never documented), then preventing the receive seems ok. |
Above, we identified that one possible way that the semantic change being discussed might break code is if the code used My intuition was that using The corpus I am using is a collection of the latest of each major version of each module listed in the Go index (index.golang.org) as of late March. There are 134,485 modules with distinct paths in my corpus. Of those, 17,021 mention Of the 15 modules applying The semantic change proposed in this issue would make all the buggy code correct, assuming we apply the same change to
Use of len to simulate non-blocking select
github.com/packetzoom/logslammer
github.com/packetzoom/logslammer version v0.0.3 output/elasticsearch/elasticsearch.go (April 7, 2016)
This code seems to be using the The semantic change proposed in this issue would make the code never process ticks, since
Use of len for logging/debugging prints
github.com/uber/cherami-server
github.com/uber/cherami-server version v1.28.1 services/outputhost/messagecache.go (November 29, 2017) has a
And the `updatePumpHealth` method does:
The logging would never trigger anymore, and the pump health update would register the channel as “full” for being unbuffered (length and capacity both 0). However, the repository's README says that the project is “deprecated and not maintained.”
Racy use of len to drain channel after Stop
github.com/Freezerburn/go-coroutine
github.com/Freezerburn/go-coroutine v1.0.1 src/github.com/Freezerburn/coroutine/embed.go:
This use of The proposed semantic change to There is another instance of the pattern later in the file:
Like The same analysis applies: if the The current latest commit rewrites both of those code snippets to use a helper that is different (avoiding
The same analysis applies, since a non-blocking select is just a less racy “
github.com/MatrixAINetwork/go-matrix
github.com/MatrixAINetwork/go-matrix v1.1.7 p2p/buckets.go (May 20, 2020):
This code is trying to make sure that each iteration of the select falls into the timeout case after 60 seconds. Same race as in previous example; proposed semantic change applies the same fix.
github.com/cobolbaby/log-agent
github.com/cobolbaby/log-agent 71f7f9f watchdog/watchdog.go:
Same race; same fix.
github.com/dcarbone/go-cacheman
github.com/dcarbone/go-cacheman v1.1.2 key_manager.go:
Same race; same fix (appears four times in file).
github.com/myENA/consultant/v2
github.com/myENA/consultant/v2 v2.1.1 service.go:
Same race; same fix (appears three times in file).
github.com/smartcontractkit/chainlink
github.com/smartcontractkit/chainlink v0.7.8 core/services/fluxmonitor/flux_monitor.go:
Same race; same fix. This is of course not the code suggested by the documentation, although it does match the (incorrect) linked blog post. This code was removed without comment in v0.8.0.
github.com/vntchain/go-vnt
github.com/vntchain/go-vnt v0.6.4-alpha.6 producer/worker.go:
Same race; same fix.
github.com/eBay/akutan
github.com/eBay/akutan 9a750f2 src/github.com/ebay/akutan/util/clocks/wall.go:
Same race; same fix.
github.com/piotrnar/gocoin
github.com/piotrnar/gocoin 820d7ad client/main.go:
Same race; same fix.
Racy use of len to drain channel before Reset
github.com/qlcchain/go-qlc
github.com/qlcchain/go-qlc v1.3.5 p2p/sync.go:
This is the usual pattern from the previous section, except there is no call to If we change
Racy use of len to drain channel after Reset
github.com/chenjie199234/Corelib
github.com/chenjie199234/Corelib 2d8c16542cbe logger/logger.go:
This is like the previous example but the draining happens after This entire package was removed on April 20.
Racy use of len to drain channel without Stop/Reset
github.com/indeedsecurity/carbonbeat
github.com/indeedsecurity/carbonbeat v1.0.0 app/carbonbeat.go:
Strictly speaking, this code is not wrong, but it is also not useful.
github.com/indeedsecurity/carbonbeat/v2
github.com/indeedsecurity/carbonbeat/v2 v2.0.2 app/carbonbeat.go contains the same code. |
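As a rough illustration of the stop, drain, and re-arm pattern that most of the surveyed modules were attempting, here is a sketch that uses a non-blocking select rather than a len check. The helper names (safeReset, staleAfterReset) and the durations are assumptions for illustration and are not taken from any of the modules listed above. Under the semantics current at the time of this discussion, the sketch narrows the race rather than removing it: a value sent just after Stop returns false would still slip past the non-blocking select.

```go
package main

import (
	"fmt"
	"time"
)

// safeReset stops t, discards a pending value if one is already buffered,
// and re-arms the timer for d. The select never blocks, so it is safe even
// if the caller has already received the value.
// Assumption: only one goroutine uses t.
func safeReset(t *time.Timer, d time.Duration) {
	if !t.Stop() {
		select {
		case <-t.C: // drop the stale value
		default: // nothing buffered
		}
	}
	t.Reset(d)
}

// staleAfterReset lets a timer expire unreceived, resets it to a long
// duration, and reports whether a stale value is still receivable.
func staleAfterReset() bool {
	t := time.NewTimer(10 * time.Millisecond)
	defer t.Stop()
	time.Sleep(50 * time.Millisecond) // expire without receiving
	safeReset(t, time.Hour)
	select {
	case <-t.C:
		return true // a value from before the Reset leaked through
	default:
		return false
	}
}

func main() {
	fmt.Println("stale value after Reset:", staleAfterReset())
}
```

With the semantic change proposed in this issue, Stop would report true in this scenario and the drain branch would be dead code, so the helper would reduce to Stop followed by Reset.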
For what it's worth, seeing all that buggy code as evidence of people struggling to use the current t.Stop and t.Reset correctly in single-goroutine use cases only makes me that much more convinced we should try the proposed change. |
Discussion ongoing, so leaving this for another week, but it seems headed for likely accept. Please speak up if you think this is a bad idea. |
I think this is an excellent idea and I'd bet that I've written at least one piece of code which lacks, but needs, the fixup code. |
Is this actually the same issue as this old one? #11513 |
@rogpeppe, yes it certainly subsumes that one. The example I gave there in #11513 (comment) would definitely change but after seeing real uses, I think more code will be fixed than would break. It can be an early in cycle change. Also, in that comment I said it's not the library's job to "throw away values already sent on the channel". The fix I'm suggesting above makes the channel (indistinguishable from) unbuffered instead, so there are no values "already sent". |
Based on the discussion above, this sounds like a likely accept. One possible implementation would be:
There are other more involved implementations (see for example #8898), but this simple change seems like the shortest path to the semantics we've been discussing. |
No change in consensus, so accepting. |
In #14383 (comment), @rsc said:
As far as I can tell, no different bug was filed: the documentation for (*Timer).Stop still says:
go/src/time/sleep.go
Lines 57 to 66 in 7d2473d
and that behavior is still resulting in subtle bugs (#27169). (Note that @rsc himself assumed that a select should work in this way in #14038 (comment).)
I think we should tighten up the invariants of (*Timer).Stop to eliminate this subtlety.
CC @empijei @matttproud @the80srobot @ianlancetaylor