Skip to content

fix starlark goroutine leak in exec() with context.AfterFunc#81

Merged
robbyt merged 3 commits into
mainfrom
rterhaar/starlark-goroutine-leak
May 4, 2026
Merged

fix starlark goroutine leak in exec() with context.AfterFunc#81
robbyt merged 3 commits into
mainfrom
rterhaar/starlark-goroutine-leak

Conversation

@robbyt
Copy link
Copy Markdown
Owner

@robbyt robbyt commented May 4, 2026

Summary

Every Eval() call on the Starlark evaluator spawned a goroutine that blocked on <-ctx.Done() to interrupt the running thread on cancellation. With non-cancellable contexts (e.g. context.Background()), those goroutines never exited and accumulated unboundedly.

This swaps the bare goroutine for context.AfterFunc, which registers a callback and returns a stop function. We defer stop() so the callback is unregistered as soon as the script completes — no leak, regardless of whether the context is cancellable.

Changes

  • engines/starlark/evaluator/evaluator.go — replace the goroutine in exec() with context.AfterFunc(ctx, thread.Cancel) plus defer stop().
  • engines/starlark/evaluator/evaluator_test.go — new TestEval_NoGoroutineLeak regression test: runs 100 evaluations with a non-cancellable context and asserts the goroutine count doesn't grow.

Test plan

  • go test -race ./engines/starlark/... passes locally
  • CI green

Every Eval() call spawned a goroutine blocking on <-ctx.Done(). With
non-cancellable contexts (e.g., context.Background()), these goroutines
leaked unboundedly. Replace with context.AfterFunc which registers a
callback and returns a stop function, avoiding the leak.
Copilot AI review requested due to automatic review settings May 4, 2026 05:53
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 4, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a goroutine leak in the Starlark evaluator’s exec() cancellation handling by replacing a per-evaluation goroutine waiting on ctx.Done() with context.AfterFunc, and adds a regression test intended to ensure goroutine counts don’t grow across many Eval() calls.

Changes:

  • Replace the go func { <-ctx.Done(); thread.Cancel(...) } pattern with context.AfterFunc(...); defer stop() in exec().
  • Add a regression test that performs repeated Eval() calls and checks goroutine-count growth stays bounded.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
engines/starlark/evaluator/evaluator.go Uses context.AfterFunc + defer stop() to prevent unbounded goroutine accumulation when contexts never cancel.
engines/starlark/evaluator/evaluator_test.go Adds a goroutine-leak regression test around repeated Eval() calls.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread engines/starlark/evaluator/evaluator_test.go Outdated
Comment thread engines/starlark/evaluator/evaluator_test.go Outdated
robbyt added 2 commits May 4, 2026 02:01
drop t.Parallel() since runtime.NumGoroutine() reads a process-wide
counter and parallel tests perturb the baseline. switch the loop's
context back to context.Background() so ctx.Done() is nil — the
exact scenario the bare-goroutine bug couldn't escape.
the previous commit reverted t.Context() back to context.Background()
to actually exercise the Done()==nil scenario, but golangci-lint's
usetesting rule flagged it. add a targeted nolint with a pointer to
the rationale comment immediately above.
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented May 4, 2026

@robbyt robbyt merged commit a9579e1 into main May 4, 2026
3 checks passed
@robbyt robbyt deleted the rterhaar/starlark-goroutine-leak branch May 4, 2026 06:10
robbyt added a commit that referenced this pull request May 14, 2026
)

Closes #125

PR #81 added `context.AfterFunc(ctx, thread.Cancel)` to Starlark's
exec() to fix a goroutine leak, but no test asserted that
**cancellation actually halts execution within a bounded time**. Same
gap existed for Risor — the Risor v2 VM checks ctx.Done() at
DefaultContextCheckInterval (1000 instructions) and via a background
goroutine, but nothing locked the behavior in. Extism is already
covered by the execHelper cancellation cases added in PR #121.

Add TestEval_CancellationHaltsExecution to both Risor and Starlark
evaluator test files. The script body is a long-running loop sized so
natural completion would far outrun the 2-second test deadline; only
ctx cancellation can return Eval early.

  - Risor: Risor v2 has no for/while statements (it's a functional
    language with .map/.filter/.each higher-order methods), so the
    long-running construct is `range(1e9).each(x => x)`. The VM's
    periodic ctx-check inside .each's callable.Call propagates
    cancellation.
  - Starlark: lazy `range(1e12)` with a Python-style for-loop. The
    AfterFunc-registered thread.Cancel halts at the next instruction
    boundary.

Both tests:
  1. Launch Eval in a goroutine.
  2. Sleep 50ms so the engine is observably mid-script.
  3. Cancel the context.
  4. Assert Eval returns within 2s with a cancellation-shaped error
     (errors.Is(context.Canceled) OR "context canceled" / "cancel"
     substring — Starlark's thread.Cancel wraps the reason in an
     EvalError that doesn't unwrap to context.Canceled).

Stress-tested locally with `-count=20 -race`: 40 runs, no flakes,
~2 seconds total wall time.

Out of scope: Eval-halts-on-deadline (context.WithTimeout); same
code path as cancel.

Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants