feat(core): per-port lockfile with stale-lock reclaim (PER-7855 phase 2/3) #2197
Closed
Shivanshu-07 wants to merge 1 commit into master
Phase 2 of PER-7855 CLI QoS hardening — short-circuit "Percy already
running" at command entry instead of failing late and noisily with
EADDRINUSE on `server.listen()`.
New module `core/src/lock.js`:
- `acquireLock({port})` writes `~/.percy/agent-<port>.lock` atomically
via `wx`. Payload is `{pid, port, startedAt}`; mode `0o600` on the
file, `0o700` on the parent dir.
- `LockHeldError` carries `{meta, lockPath}` so the refusal message
can name the live pid + lock path for manual cleanup.
- Stale-lock reclaim: `process.kill(pid, 0)` liveness probe; ESRCH
treated as dead, EPERM as alive-but-foreign. A self-pid lock (left
over by an earlier in-process invocation) is reclaimed without
consulting `process.kill` — we cannot conflict with ourselves.
- Reclaim is unlink + retry-`wx`, NOT rename-based: Windows CI is
pinned to Node 14 (`.github/workflows/windows.yml:15`), where
`fs.renameSync` over an existing target is unreliable.
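
The acquire and reclaim flow described in the bullets above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `core/src/lock.js`: the helper names `holderIsLive`, `readMeta`, and `tryCreate`, the `dir` parameter, the numeric `startedAt` format, and the error message wording are all hypothetical.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

class LockHeldError extends Error {
  constructor(meta, lockPath) {
    super(`lock ${lockPath} is held by pid ${meta.pid}`);
    this.name = 'LockHeldError';
    this.meta = meta;
    this.lockPath = lockPath;
  }
}

// True only when another, still-running process owns the lock.
function holderIsLive(pid) {
  if (!Number.isInteger(pid) || pid <= 0) return false; // unusable payload: stale
  if (pid === process.pid) return false; // our own leftover lock: reclaim, no probe
  try {
    process.kill(pid, 0); // signal 0 checks existence without delivering a signal
    return true;
  } catch (err) {
    return err.code === 'EPERM'; // EPERM: alive but foreign; ESRCH: dead
  }
}

function readMeta(lockPath) {
  try {
    return JSON.parse(fs.readFileSync(lockPath, 'utf8'));
  } catch (err) {
    return null; // a corrupt or already-removed file is handled as stale
  }
}

// Returns true if the file was created, false if it already existed.
function tryCreate(lockPath, payload) {
  try {
    fs.writeFileSync(lockPath, payload, { flag: 'wx', mode: 0o600 });
    return true;
  } catch (err) {
    if (err.code === 'EEXIST') return false;
    throw err;
  }
}

// `dir` is a hypothetical parameter so the sketch can run against a temp directory.
function acquireLock({ port, dir = path.join(os.homedir(), '.percy') }) {
  fs.mkdirSync(dir, { recursive: true, mode: 0o700 });
  const lockPath = path.join(dir, `agent-${port}.lock`);
  const payload = JSON.stringify({ pid: process.pid, port, startedAt: Date.now() });

  if (tryCreate(lockPath, payload)) return { lockPath };

  const meta = readMeta(lockPath);
  if (meta && holderIsLive(meta.pid)) throw new LockHeldError(meta, lockPath);

  // Stale: unlink, then retry the exclusive create (no rename involved).
  try {
    fs.unlinkSync(lockPath);
  } catch (err) {
    if (err.code !== 'ENOENT') throw err;
  }
  if (tryCreate(lockPath, payload)) return { lockPath };

  // Another process won the race between our unlink and the retry.
  throw new LockHeldError(readMeta(lockPath) || { pid: undefined, port }, lockPath);
}
```

The retry is attempted once only; if a competing process recreates the file in the gap, the sketch reports that process as the holder rather than looping.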
`Percy.start()`:
- Acquires the lock as the first step inside `try {` (before
monitoring, proxy detection, queue starts), so a held lock fails
fast.
- Registers a one-shot `process.on('exit')` synchronous unlink as
last-chance cleanup if the process exits without a normal `stop()`.
Phase 3 will replace this with a signal-driven drain.
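
A last-chance cleanup hook of the kind described above might look like the sketch below. The helper name `registerExitCleanup` and its returned deregistration function are hypothetical; the only grounded parts are that the hook is one-shot, runs on `'exit'`, and unlinks synchronously (Node's `'exit'` listeners cannot do asynchronous work).

```javascript
const fs = require('fs');

// Hypothetical helper: register a one-shot, synchronous last-chance unlink.
function registerExitCleanup(lockPath) {
  const onExit = () => {
    try {
      fs.unlinkSync(lockPath);
    } catch (err) {
      // Already removed or not removable; nothing more can be done at exit.
    }
  };
  process.once('exit', onExit);
  // Returned so a normal stop() can drop the hook after releasing the lock.
  return () => process.removeListener('exit', onExit);
}
```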
`Percy.stop()`:
- Releases the lock in the `finally` block, alongside monitoring
teardown. Idempotent: re-running release on an already-released
handle is a no-op.
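
One way to get the idempotent release described above is to track a flag on the handle, as in this sketch. The handle shape `{ lockPath, released }`, the function name `releaseLock`, and the boolean return value are assumptions made for illustration.

```javascript
const fs = require('fs');

// Hypothetical handle shape: { lockPath, released }.
function releaseLock(handle) {
  if (!handle || handle.released) return false; // repeat call: no-op
  handle.released = true;
  try {
    fs.unlinkSync(handle.lockPath);
  } catch (err) {
    if (err.code !== 'ENOENT') throw err; // a missing file already counts as released
  }
  return true;
}
```

Because the second call returns before touching the filesystem, calling it from a `finally` block is safe even when an earlier code path has already released the lock.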
Backwards compatibility: when the lock is held, the start() catch maps
`LockHeldError` to the legacy "Percy is already running or the port X
is in use" message string (downstream tooling may grep for it) AND
also logs the actionable detail (live pid, lockfile path) via
`log.error` so users can recover.
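
The mapping described above could be sketched as a small helper. The name `toLegacyStartError`, the `log` parameter, and the wording of the logged detail are hypothetical; only the legacy message string and the `{meta, lockPath}` fields come from the description.

```javascript
class LockHeldError extends Error {
  constructor(meta, lockPath) {
    super(`lock ${lockPath} is held by pid ${meta.pid}`);
    this.meta = meta;
    this.lockPath = lockPath;
  }
}

// Hypothetical helper: keep the legacy message stable, log the actionable detail.
function toLegacyStartError(err, port, log) {
  if (!(err instanceof LockHeldError)) return err; // unrelated errors pass through
  log.error(`Lock ${err.lockPath} is held by live pid ${err.meta.pid}`);
  return new Error(`Percy is already running or the port ${port} is in use`);
}
```

Keeping the thrown message byte-for-byte identical while routing the new detail through the logger means tooling that greps for the old string should keep matching.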
Test infrastructure (`core/test/helpers/index.js`):
- Added `~/.percy/agent-*` to the mockfs `$bypass` list so lock files
go through the real fs rather than the in-memory mock. Files are
cleaned by `Percy.stop()`'s release path; the self-pid stale
optimization handles same-process collisions during sequential
Jasmine runs.
Tests added: 13 unit specs (`core/test/unit/lock.test.js`) covering
SC3 stale reclaim, SC4 live-foreign refusal, SC5 multi-port,
EPERM-as-alive, corrupt-payload recovery, mkdir-p, mode bits on POSIX,
release idempotency, re-acquire after release.
Origin: docs/brainstorms/2026-04-24-per-7855-cli-qos-hardening-requirements.md
Plan: docs/plans/2026-04-27-001-feat-per-7855-cli-qos-hardening-plan.md
Phase 1: commit e135e9a (network refactors + redaction + hint)
Phase 3 next: signal drain + unhandled-rejection handlers (PER-7855)
Co-Authored-By: Claude Opus 4.7 (1M context, extended thinking) <noreply@anthropic.com>
}

export function lockPathFor(port) {
  return join(os.homedir(), '.percy', `agent-${port}.lock`);
This was referenced Apr 27, 2026
feat(core): SIGINT/SIGTERM graceful drain + unhandled-rejection redaction (PER-7855 phase 3/3)
#2198
Closed
Contributor
Author
Closing in favor of consolidated PR #2199, which contains all three commits (the same content) so review can happen against a single diff.
Summary
Phase 2 of PER-7855. Independent of Phase 1 (#2196) — different files, different concern. Both can land in either order.
Today, after a Percy crash, the next `percy start` on the same port fails late and noisily with `EADDRINUSE` once `server.listen()` runs. This PR adds an upfront per-port lockfile with stale-lock reclaim so the second invocation refuses cleanly with an actionable message naming the live pid and lock-file path.
What's new
Backwards compatibility
When the lock is held, the `start()` catch maps `LockHeldError` to the legacy "Percy is already running or the port $port is in use" message string (downstream tooling may grep for it) AND also `log.error`s the actionable detail (live pid, lockfile path) so users can recover. The existing test at `Percy #start() throws when the port is in use` continues to pass unchanged.
Tests
Test infra change
`core/test/helpers/index.js` — added `/.percy/agent-` to the mockfs `$bypass` list so lock files go through the real fs rather than the in-memory mock. Files are cleaned by `Percy.stop()`'s release path; the self-pid stale optimization in `lock.js` handles same-process collisions during sequential Jasmine runs.
Test run on this branch: 697 specs, 27 pre-existing failures (same 21 `Unit / Install Chromium` + 5 `runDoctorOnFailure` + 1 `API Server when the server is disabled` as on master). All 13 new Lock tests pass; all Percy tests that exercise start/stop continue to pass.
Test plan
`process.kill(pid, 0)` and `wx`+unlink reclaim
Risks
Post-Deploy Monitoring & Validation
Origin / Plan
🤖 Generated with Claude Opus 4.7 (1M context, extended thinking) via Claude Code