New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
all: Be more stringent with timer resets/stops (ref #9417) #9422
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was pretty casual review (causing a few surprised grins) until I reached folder.go o.O
if runningTests { | ||
// Make the behavior stable when running tests to avoid randomly | ||
// varying test coverage. This ensures, in practice if not in | ||
// theory, that the timer fires and we take the true branch of the | ||
// next if. | ||
runtime.Gosched() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:)
if (err != nil || !success) && f.pullPause < 60*f.pullBasePause() { | ||
// Back off from retrying to pull | ||
f.pullPause *= 2 | ||
|
||
// Pulling failed, try again later. | ||
delay := f.pullPause + time.Since(t0) | ||
l.Infof("Folder %v isn't making sync progress - retrying in %v.", f.Description(), stringutil.NiceDurationString(delay)) | ||
timeutil.ResetTimer(pullFailTimer, delay) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think resetting the fail timer should be inside this condition. Resp. it should be split: Reset the fail timer for (err != nil || !success)
, and inside that condition and before resting the timer have another if condition to increase the pause.
Also I think f.pullPause
is now only used in this loop/serve function, so can be removed from the struct too.
} | ||
|
||
case <-pullFailTimer.C: | ||
f.SchedulePull() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Afaik you also need to schedule a pull instead of calling .pull()
in the next case
. Resp. maybe it's safer to pass the fail timer into pull
and do the resetting there, that way one can't call pull
and forget to handle the timer.
case next := <-f.scanDelay: | ||
l.Debugln(f, "Delaying scan") | ||
f.scanTimer.Reset(next) | ||
|
||
case <-f.scanScheduled: | ||
l.Debugln(f, "Scan was scheduled") | ||
f.scanTimer.Reset(0) | ||
if next > 0 { | ||
l.Debugln(f, "Delaying scan") | ||
} else { | ||
l.Debugln(f, "Scan was scheduled") | ||
} | ||
timeutil.ResetTimer(scanTimer, next) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That scheduling is magic, just a tiny little bit.
I found some concerning things while looking through our timer handling to see if we leaked timers. I didn't find any leaks of tickers, which is good, and most timers were stopped which is also fine. There were some missing stops that I added, but these should not be critical as a normal timer will anyway be collected when it expires. One place created time.After() in a loop which is not great, that was cleaned up.
However, there were some inconsistencies mostly around our use of resetting timers. The documentation is quite clear on what not to do (although it's less clear on the consequences if you do it anyway):
This is impossible to guarantee when doing Reset() from outside the loop that processes the timer in question. Any place where we saved a timer in a struct, looped on a select over it, and did resets from another goroutine would run afoul of this rule.
To handle this better I've rewritten those places to use "local" timers (i.e., timer is declared in the function that loops over it, nobody else can touch it, and we make sure it gets stopped with a defer) and a channel to trigger reset from within that loop instead.
I also added a couple of convenience functions to reset a timer properly and stop a timer properly. Using the stop method is less critical, nothing bad will happen by using just a normal (*timer).Stop(), unless it's for something like a timeout where we want to make sure the timeout isn't trigger after doing the stop because there was already an undrained value in the channel. Then using the new StopAndDrain() will make sure the channel is drained.
I used the non-blocking receive in the convenience functions because it's sometimes hard to know whether the channel is already drained when calling StopAndDrain, we just want to make sure it is drained afterwards.