Skip to content

x/build/internal/workflow: TaskContext.ResetWatchdog is willing to reset expired timer #60444

Open
@dmitshur

Description

@dmitshur

watchdogTimer is currently initialized to a timer that cancels the TaskContext's context when it times out:

func runTask(ctx context.Context, workflowID uuid.UUID, listener Listener, state taskState, args []reflect.Value) taskState {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	tctx := &TaskContext{
		Context:       ctx,
		Logger:        listener.Logger(workflowID, state.def.name),
		TaskName:      state.def.name,
		WorkflowID:    workflowID,
		watchdogTimer: time.AfterFunc(WatchdogDelay, cancel),
	}

	// ...

ResetWatchdog exists to extend the lifetime of the timer to prevent it from ticking and doing the task's context cancellation. It's currently willing to reset the timer even after the timer has expired and the task's context got cancelled. This alone is odd but fine, but it can affect other places in unexpected ways.

There's at least one place that determines whether the task hasn't yet hung by checking if the timer is still running:

if !tctx.watchdogTimer.Stop() {
	state.err = fmt.Errorf("task did not log for %v, assumed hung", WatchdogDelay)

Logging via TaskContext.Printf calls ResetWatchdog.

As a result some tasks end up having logging statements that are slightly load-bearing. If a Printf happens after a task has timed out and started to abort (via context cancellation), it'll at minimum cause the state.err = fmt.Errorf("task did not log for ..., assumed hung", ...) branch not to trigger. (Maybe that's the entire extent of the side-effects, I hadn't looked further.)

All this is to say the logic here can be reworked to be slightly less fragile, and this intersects a bit with #60443.

I spotted this while looking at the code some time ago. Filing a tracking issue so I can close some open windows, but it's low priority. CC @golang/release.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Buildersx/build issues (builders, bots, dashboards)NeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.

    Type

    No type

    Projects

    Status

    Planned

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions