x/build/internal/workflow: TaskContext.ResetWatchdog is willing to reset expired timer #60444
Labels
Builders
x/build issues (builders, bots, dashboards)
NeedsInvestigation
Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone
watchdogTimer
is currently initialized to a timer that cancels the TaskContext's context when it times out:ResetWatchdog
exists to extend the lifetime of the timer to prevent it from ticking and doing the task's context cancellation. It's currently willing to reset the timer even after the timer has expired and the task's context got cancelled. This alone is odd but fine, but it can affect other places in unexpected ways.There's at least one place that determines whether the task hasn't yet hung by checking if the timer is still running:
Logging via
TaskContext.Printf
callsResetWatchdog
.As a result some tasks end up having logging statements that are slightly load-bearing. If a
Printf
happens after a task has timed out and started to abort (via context cancellation), it'll at minimum cause thestate.err = fmt.Errorf("task did not log for ..., assumed hung", ...)
branch not to trigger. (Maybe that's the entire extent of the side-effects, I hadn't looked further.)All this is to say the logic here can be reworked to be slightly less fragile, and this intersects a bit with #60443.
I spotted this while looking at the code some time ago. Filing a tracking issue so I can close some open windows, but it's low priority. CC @golang/release.
The text was updated successfully, but these errors were encountered: