Skip to content

Stronger session isolation and faster E2E tests with parallelization#374

Merged
canercidam merged 11 commits intomainfrom
caner/parallel-e2e
Feb 18, 2026
Merged

Stronger session isolation and faster E2E tests with parallelization#374
canercidam merged 11 commits intomainfrom
caner/parallel-e2e

Conversation

@canercidam
Copy link
Member

@canercidam canercidam commented Feb 18, 2026

  • Port selection in sessions now have an offset of 5 (per port selected) like port = port + concurrent_sessions × 5
  • Session directories are now isolated by their own names instead of defaulting to a devnet directory:
    • The main playground dir is still like $HOME/.local/state/builder-playground
    • The new session paths are like <playground-dir>/sessions/<session-id>
    • There are two new symlinks:
      • <playground-dir>/devnet (keeping old experience to access to the latest dir)
      • <playground-dir>/sessions/latest (new symlink for keeping reference to the latest dir)
  • The docker bridge networks used for each session is now isolated by name as well, like builder-playground-<session-id>
  • Creating new E2E test suite which can:
    • run the playground binary to start a session per test case (in parallel)
    • apply side effects to the services of sessions
    • make checks on the state of the running services
  • The component/integration tests are abandoned in favor of the more complete E2E tests

With these changes, now it is also possible to start multiple sessions on a machine for testing and connect services of different sessions.

Fixes #333

@claude
Copy link
Contributor

claude bot commented Feb 18, 2026

Claude finished @canercidam's task in 2m 39s —— View job


PR Review: Stronger session isolation and faster E2E tests

  • Read and analyze all changed files
  • Review code quality and patterns
  • Check for bugs, edge cases, and security issues
  • Post review feedback with inline comments

Summary

The PR introduces session isolation (ports, directories, networks) and replaces integration tests with a new E2E test framework. The overall direction is good, but there are several issues worth addressing.

Issues Found

Bugs / Correctness:

  1. cmdErrChan consumed in waitForOutput, unavailable to waitForReady (e2e_test.go:152-158): The channel is read in waitForOutput to detect early process exit. Once consumed, waitForReady can no longer detect if the process exited, potentially hanging until timeout.

  2. resp.StatusCode only checked for last endpoint (recipes_test.go:130): The TestBuilderNet test checks status code outside the loop, only validating the final endpoint. Also, defer resp.Body.Close() inside a loop defers all closes until function return, leaking connections during the loop.

Race Conditions / Reliability:

  1. TOCTOU race in CountConcurrentPlaygroundSessions (process.go:15): The ps-based count is a point-in-time snapshot. Two sessions starting simultaneously both see count=0 and get the same offset, defeating the isolation goal.

  2. portOffsetMultiplier = 5 may be too small (local_runner.go:186): A recipe like buildernet has 6+ services, each with 1-4 ports. With an offset of only 5 per concurrent session, port ranges overlap quickly. The reservePort probing provides a fallback, but it undermines the deterministic offset intent.

  3. time.Now().UnixNano() for test dir uniqueness (e2e_test.go:82): Can collide on VMs with coarse clock resolution. t.TempDir() would be both unique and auto-cleaned.

Code Quality:

  1. Fragile process matching (process.go:36): Simple substring matching on ps output can produce false positives (e.g., editing a file whose path contains "builder-playground" and "start").

  2. startupMu serialization pattern (e2e_test.go:24,112-136): The mutex serializes all playground startups but is released as soon as "Waiting for services to get healthy" appears. This is a pragmatic workaround for port conflicts, but it's worth a comment explaining why full parallelism isn't safe (the ps-based counting + offset).

Minor Observations

  • The CI workflow (checks.yaml) was significantly simplified, which is a positive change. The separation of unit tests (go test ./playground/...) from E2E tests (make e2e-test) is clean.
  • The Makefile E2E target correctly builds the binary before running tests.
  • Session directory isolation via sessionID in NewOutput and network naming via builder-playground-<session-id> are straightforward and well-implemented.
  • The deleted components_test.go (343 lines) contained integration tests that are now covered by the E2E suite with better isolation. Good tradeoff.

@claude
Copy link
Contributor

claude bot commented Feb 18, 2026

Claude finished @canercidam's task in 2m 53s —— View job


PR Review (Re-review after updates)

  • Analyze new commits since last review
  • Review all changed files for bugs, security, and code quality
  • Post inline comments on issues found
  • Provide summary feedback

Changes Since Last Review

The new commits (c794643 through b44871b) address several issues from the initial review:

Fixed:

  • cmdErrChan consumption bug — Replaced with processCtx/processErr mutex pattern. waitForOutput and waitForReady now both use p.processCtx.Done() to detect process exit. Clean fix.
  • resp.StatusCode only checked for last endpoint — Status code is now asserted inside the loop per endpoint.
  • defer resp.Body.Close() in loop — Changed to explicit resp.Body.Close() after processing each response.
  • time.Now().UnixNano() for dir uniqueness — Now uses t.TempDir().
  • Session directory isolation — Session dirs are now under sessions/<sessionID> with latest and devnet symlinks for backward compatibility.

Still Present (3 inline comments posted):

  1. Fragile process matching (utils/process.go:34-38) — Substring matching on ps output can produce false positives. The test suite also lacks false-positive test cases. Suggested tightening the matching to verify start/cook appears as a distinct argument token.

  2. TOCTOU race in CountConcurrentPlaygroundSessions (utils/process.go:13-20) — Two sessions starting simultaneously can both see count=0 and get the same offset. The E2E startupMu mitigates this in tests, but production usage has no coordination. The reservePort probe provides a safety net, but the deterministic offset intent is undermined.

  3. portOffsetMultiplier = 5 is too small (local_runner.go:43) — A recipe like buildernet has 6+ services with multiple ports each. With only a 5-port offset between concurrent sessions, ranges overlap quickly. Consider increasing to ~100.

Other Observations

  • The E2E test framework is well-structured. The playgroundInstance abstraction with startPlayground/runPlayground/waitForReady is clean.
  • The startupMu serialization in launchPlayground is a pragmatic workaround. The comment at line 22 (ensures only one playground starts at a time) is minimal — it would benefit from explaining why (the ps-based counting needs the startup phase to be serialized).
  • CI workflow simplification (separating unit tests from E2E) is a good structural change.
  • The cleanup() method correctly handles t.TempDir() output dirs — the redundant os.RemoveAll at line 109 is harmless since t.TempDir() auto-cleans, but could be removed.

@claude
Copy link
Contributor

claude bot commented Feb 18, 2026

Claude finished @canercidam's task in 3m 22s —— View job


PR Review (3rd pass — after latest push)

  • Read and analyze all changed files
  • Check previous review comments and author responses
  • Review for bugs, edge cases, and code quality issues
  • Post review feedback with inline comments

Previous Issues — Status

Most issues from the first two reviews have been addressed:

Issue Status
cmdErrChan consumed by waitForOutput Fixed — replaced with processCtx/processErr mutex pattern
resp.StatusCode only checked for last endpoint Fixed — assertion moved inside the loop
defer resp.Body.Close() in loop Fixed — explicit Close() after each response
time.Now().UnixNano() for dir uniqueness Fixed — uses t.TempDir()
portOffsetMultiplier = 5 too small Resolved — author clarified the offset applies per-port, not per-session range. This is reasonable.
TOCTOU race in CountConcurrentPlaygroundSessions Acknowledged — author accepts the limitation
Fragile process matching Still present — re-flagged with suggestion to add false-positive test cases

New Issues (3 inline comments)

  1. lineBuffer.Contains checks Write buffers individually (e2e/e2e_test.go:35) — If a target string spans two Write calls (due to OS stdout buffering), the match fails. Low risk in practice, but easily fixable by checking the concatenated output.

  2. Redundant os.RemoveAll after t.TempDir() (e2e/e2e_test.go:109) — t.TempDir() auto-cleans on test completion. The explicit removal is unnecessary.

  3. Fragile process matching (utils/process.go:34-37) — Re-flagged. The test suite lacks false-positive test cases to document the known limitation.

Overall

The PR is in good shape after the revisions. The session isolation (directories, networks, ports) is well-implemented. The E2E test framework with playgroundInstance is clean and the startupMu serialization is a pragmatic approach. The remaining inline comments are minor improvements rather than blockers.


@canercidam canercidam merged commit 10b23bc into main Feb 18, 2026
16 checks passed
@canercidam canercidam deleted the caner/parallel-e2e branch February 18, 2026 22:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Turn component tests into full E2E

1 participant