Add worker version detection and on-demand update#79
Conversation
- Worker health endpoint now returns version info - Workspace get API includes workerVersion for running workspaces - New updateWorker API endpoint to update worker binary in a workspace - copyPerryWorker prefers installed binary (~/.perry/bin/perry) over dist This enables UI to detect version mismatch between host and workers, and provides a button to update outdated workers on demand. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
| ['sh', '-c', 'pkill -f "perry worker serve" || true'], | ||
| { user: 'workspace' } | ||
| ); | ||
|
|
||
| await this.copyPerryWorker(containerName); | ||
| await this.startWorkerServer(containerName); | ||
| } |
There was a problem hiding this comment.
Bug: The updateWorkerBinary function has a race condition. It kills the old worker but doesn't wait for it to terminate, which can prevent the new worker from starting correctly.
Severity: CRITICAL
🔍 Detailed Analysis
A race condition exists in the updateWorkerBinary function. The function calls pkill to terminate the existing worker process but does not wait for the process to fully shut down and release its port. This leads to two potential failure modes. First, if the old process is still shutting down but its health endpoint is responsive, startWorkerServer will detect a running worker and return early, never starting the new binary. Second, if the health check fails, the attempt to start the new worker may fail with an EADDRINUSE error if the port is still held by the dying process. This failure is silent because the startup command is backgrounded with &, preventing error propagation.
💡 Suggested Fix
Introduce a synchronization mechanism after killing the old worker process to ensure it has fully terminated and released the port before attempting to start the new one. This could be a short sleep, or a more robust solution like polling to confirm the port is free or the process ID is gone. Additionally, avoid backgrounding the startup command with & within execInContainer to ensure that startup errors like EADDRINUSE are properly propagated and handled, rather than failing silently.
🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/workspace/manager.ts#L312-L318
Potential issue: A race condition exists in the `updateWorkerBinary` function. The
function calls `pkill` to terminate the existing worker process but does not wait for
the process to fully shut down and release its port. This leads to two potential failure
modes. First, if the old process is still shutting down but its health endpoint is
responsive, `startWorkerServer` will detect a running worker and return early, never
starting the new binary. Second, if the health check fails, the attempt to start the new
worker may fail with an `EADDRINUSE` error if the port is still held by the dying
process. This failure is silent because the startup command is backgrounded with `&`,
preventing error propagation.
Did we get this right? 👍 / 👎 to inform future reviews.
Reference ID: 8435010
Summary
workerVersionfor running workspacesupdateWorkerAPI endpoint to update worker binary in a workspacecopyPerryWorkerprefers installed binary (~/.perry/bin/perry) over distHow it works
/healthendpoint returns{ status: 'ok', version: '0.3.7', sessionCount: 5 }workerVersionhostVersion(from/info) withworkerVersionworkspaces.updateWorker({ name })which:Test plan
GET /rpc/workspaces/getreturnsworkerVersioncurl http://<ip>:7392/health🤖 Generated with Claude Code