fix(e2e): run binary service tests with sudo -E pv #68

Merged
munezaclovis merged 4 commits into main from claude/issue-67-20260420-1925
Apr 20, 2026
Conversation

@munezaclovis
Contributor

Fixes #67

Previous E2E phases used `sudo -E pv start`, which caused `cleanSitesDirs()` to recreate `~/.pv/config/sites/` as root-owned (0755). When s3-binary.sh and mail-binary.sh then ran `pv start` without sudo, the non-root daemon called `os.RemoveAll` on the root-owned sites dir, which failed with permission denied. That caused `GenerateAllConfigs` to abort, `Start()` to exit, and the PID file to be cleaned up. `server.IsRunning()` then returned false, `service:add s3/mail` printed "daemon not running", and the test ended up inspecting the stale daemon-status.json from Phase 19 (with an empty `supervised` map).

Fix: use `sudo -E pv` for all daemon and service commands in both binary-service scripts, consistent with every other E2E phase that needs a live daemon. Also increase the post-start sleep from 3 s to 8 s to match the pattern used in start-curl.sh / verify-final.sh (FrankenPHP can take up to 5 s to pass its health check).

Generated with Claude Code

Co-authored-by: Clovis <munezaclovis@users.noreply.github.com>
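The ownership mismatch at the heart of this failure can be checked from a script before the daemon is started. A minimal sketch, assuming a stand-in temp directory (the path and the check itself are illustrative, not part of the repo):

```shell
#!/bin/sh
# Sketch: detect whether the sites dir is owned by the current user.
# If a previous sudo run created it as root, a plain (non-root)
# `pv start` cannot remove or rewrite entries inside it.
sites_dir=$(mktemp -d)   # stand-in for ~/.pv/config/sites
owner=$(ls -ld "$sites_dir" | awk '{print $3}')
me=$(id -un)
if [ "$owner" = "$me" ]; then
  echo "ok: sites dir owned by $me"
else
  echo "mismatch: owned by $owner, running as $me; use sudo -E pv"
fi
rm -rf "$sites_dir"
```

Note that a 0755 root-owned directory is readable by everyone, so the failure only shows up on the first write (`os.RemoveAll`), which is why the scripts appeared to start normally before aborting.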
@munezaclovis
Contributor Author

@claude CI still failing. plz check

@claude

claude Bot commented Apr 20, 2026

Claude finished @munezaclovis's task in 4m 17s


CI Fix Pushed

  • Check CI status and identify failing jobs
  • Read failing logs — E2E — S3 binary service lifecycle fails at step 24
  • Diagnose root cause
  • Fix and push

Root Cause

service:add s3 calls server.SignalDaemon() which sends SIGHUP asynchronously — it returns immediately, before the daemon has run Reconcile() and written the updated daemon-status.json. The test then checked the file immediately and found supervised: {} (the initial snapshot from pv start):

FAIL: daemon-status.json does not contain rustfs entry
{ "pid": 7451, "supervised": {} }

Fix (commit 66cc096)

Replaced the immediate grep check with a 20-iteration / 1 s retry loop in both s3-binary.sh and mail-binary.sh — the same pattern already used for the port-reachability checks in those scripts:

for i in $(seq 1 20); do
    if grep -q '"rustfs"' "$HOME/.pv/daemon-status.json" 2>/dev/null; then break; fi
    sleep 1
done
grep -q '"rustfs"' "$HOME/.pv/daemon-status.json" 2>/dev/null || { ... }

This gives the daemon up to 20 s to finish its SIGHUP-triggered reconcile and flush the updated snapshot before the assertion fires.
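The same wait-then-assert idea can be factored into a small helper. A hypothetical sketch (`wait_for` is not part of the repo; the status-file path here is a temp-file stand-in):

```shell
#!/bin/sh
# wait_for SECS CMD...: retry CMD once per second for up to SECS seconds.
# Returns 0 as soon as CMD succeeds, 1 if it never does.
wait_for() {
  secs=$1; shift
  i=0
  while [ "$i" -lt "$secs" ]; do
    "$@" && return 0
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage mirroring the daemon-status check (file contents illustrative):
status_file=$(mktemp)
printf '{"supervised":{"rustfs":{}}}' > "$status_file"
if wait_for 5 grep -q '"rustfs"' "$status_file"; then
  result=found
else
  result=missing
fi
rm -f "$status_file"
```

Inlining the loop, as the commit does, keeps the scripts dependency-free; a helper like this would only pay off if more phases grow similar polling checks.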

SignalDaemon() sends SIGHUP asynchronously — the daemon's reconcile loop
(which writes daemon-status.json with supervised entries) runs concurrently
with service:add returning. Checking the file immediately races against the
reconcile. Add a 20-iteration/1s retry loop matching the pattern used for
port-reachability checks in both binary-service scripts.

Co-authored-by: Clovis <munezaclovis@users.noreply.github.com>
@munezaclovis
Contributor Author

@claude progress! we have a new error

Run scripts/e2e/start-curl.sh
          
   ERROR  
          
  Pv is already running (PID file exists and process is alive).                                                       

==> pv status

  pv vdev

  ● Stopped

  TLD  .test
  DNS  127.0.0.1:10053  HTTPS  :443
  PHP  8.3 · 8.4 (default)

  Sites  5 linked

  ╭──────────────────────────┬────────────────┬─────╮
  │Site                      │Type            │PHP  │
  ├──────────────────────────┼────────────────┼─────┤
  │ https://e2e-static.test/  │ static         │ 8.4 │
  │ https://e2e-php.test/     │ php            │ 8.4 │
  │ https://e2e-laravel.test/ │ laravel        │ 8.4 │
  │ https://e2e-octane.test/  │ laravel-octane │ 8.4 │
  │ https://e2e-php83.test/   │ php            │ 8.3 │
  ╰──────────────────────────┴────────────────┴─────╯
FAIL: server not running
Error: Process completed with exit code 1.

@claude

claude Bot commented Apr 20, 2026

Claude finished @munezaclovis's task in 3m 59s


Fix Pushed

  • Check CI status and identify failing run
  • Download and analyze failure logs
  • Diagnose root cause
  • Fix and push (commit 348275e)

Root Cause

pv install auto-starts the daemon immediately (log line 373: ✓ Daemon installed (starts automatically on login) at 20:05:27). The daemon runs as the runner user and writes a PID file.

start-curl.sh called pv daemon:disable 2>/dev/null || true — this unregisters the launchd service but the daemon process doesn't exit synchronously. FrankenPHP + DNS need time to shut down. The sleep 1 wasn't enough; sudo -E pv start found a live PID and returned "already running".

Fix

Added sudo -E pv stop >/dev/null 2>&1 || true between daemon:disable and pv start in start-curl.sh. The stop command sends SIGTERM and polls for up to 5 s (SIGKILL fallback), so by the time it returns the process is gone and the PID file is cleaned up.

 # Disable daemon if auto-enabled during install — e2e tests use foreground mode with sudo.
 pv daemon:disable 2>/dev/null || true
+# pv install starts the daemon immediately; stop it so the PID file is gone before we start fresh.
+sudo -E pv stop >/dev/null 2>&1 || true
 sleep 1
 
 sudo -E pv start &
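The stop-then-start ordering matters because termination is not instantaneous. An illustrative shell analog of the SIGTERM / poll-up-to-5-s / SIGKILL semantics described for `pv stop` (the real implementation lives in the Go daemon; this sketch only mirrors the shape):

```shell
#!/bin/sh
# SIGTERM first, poll for exit, SIGKILL as a fallback, then reap.
sleep 30 &
pid=$!
kill -TERM "$pid"
i=0
while [ "$i" -lt 5 ]; do
  # ps reports exited-but-unreaped children as Z; treat that as stopped.
  state=$(ps -o stat= -p "$pid" 2>/dev/null)
  case "$state" in ""|*Z*) break ;; esac
  i=$((i + 1)); sleep 1
done
kill -KILL "$pid" 2>/dev/null   # fallback if still running
wait "$pid" 2>/dev/null
stopped=yes
```

Only after the PID file's process is truly gone can the foreground `sudo -E pv start` claim the PID file without hitting the "already running" guard.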


pv install auto-starts the daemon; daemon:disable unregisters the
launchd service but the process stays alive until FrankenPHP and DNS
finish shutting down. The subsequent sudo pv start saw a live PID and
failed. Add an explicit pv stop (waits up to 5 s with SIGTERM/SIGKILL)
between daemon:disable and the foreground start.

Co-authored-by: Clovis <munezaclovis@users.noreply.github.com>
@munezaclovis
Contributor Author

@claude Next:

Run scripts/e2e/mail-binary.sh
==> Phase: Mail binary service (Mailpit) lifecycle
FAIL: pv link for env test
Error: Process completed with exit code 1.

@claude

claude Bot commented Apr 20, 2026

Claude finished @munezaclovis's task in 1m 4s


Fix Pushed

  • Check CI failure — E2E — Mail binary service lifecycle fails at step 25
  • Read mail-binary.sh to diagnose
  • Fix and push (commit d6d0b75)

Root Cause

mail-binary.sh starts the daemon with sudo -E pv start, which creates root-owned config dirs (~/.pv/config/sites/). The pv link call on line 26 ran without sudo, so the non-root process couldn't write to the root-owned registry — the command failed silently and triggered FAIL: pv link for env test.

The cleanup trap already correctly used sudo -E pv unlink, but the forward link call was missing sudo -E.

Fix

-pv link "$ENVTEST_DIR" --name e2e-mail-env >/dev/null 2>&1 || { echo "FAIL: pv link for env test"; exit 1; }
+sudo -E pv link "$ENVTEST_DIR" --name e2e-mail-env >/dev/null 2>&1 || { echo "FAIL: pv link for env test"; exit 1; }
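The scripts use `sudo -E` rather than plain `sudo` because `-E` preserves the invoking user's environment (notably `HOME`), so pv keeps resolving the runner's `~/.pv` instead of root's. We can't assume sudo is available in every sandbox, so this sketch shows the underlying difference with `env -i` (a cleared environment, like plain sudo's env reset) versus a normal child shell (environment kept, like `sudo -E`); `PV_HOME` is a hypothetical variable standing in for anything pv reads from the environment:

```shell
#!/bin/sh
export PV_HOME="$HOME/.pv"
cleared=$(env -i sh -c 'echo "${PV_HOME:-unset}"')    # env reset: variable gone
preserved=$(sh -c 'echo "${PV_HOME:-unset}"')         # env kept: variable survives
echo "cleared=$cleared preserved=$preserved"
```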


Root-owned config dirs (created by sudo -E pv start) block the non-root
pv link call. Consistent with the sudo -E pv unlink already in cleanup.

Co-authored-by: Clovis <munezaclovis@users.noreply.github.com>
@munezaclovis munezaclovis merged commit 649c868 into main Apr 20, 2026
1 check failed
@munezaclovis munezaclovis deleted the claude/issue-67-20260420-1925 branch April 22, 2026 16:32


Development

Successfully merging this pull request may close these issues.

2e2 CI failing
