Skip to content

Add worker version detection and on-demand update#79

Merged
gricha merged 1 commit intomainfrom
worker-version-updates
Jan 10, 2026
Merged

Add worker version detection and on-demand update#79
gricha merged 1 commit intomainfrom
worker-version-updates

Conversation

@gricha
Copy link
Copy Markdown
Owner

@gricha gricha commented Jan 10, 2026

Summary

  • Worker health endpoint now returns version info
  • Workspace get API includes workerVersion for running workspaces
  • New updateWorker API endpoint to update worker binary in a workspace
  • copyPerryWorker prefers installed binary (~/.perry/bin/perry) over dist

How it works

  1. Worker /health endpoint returns { status: 'ok', version: '0.3.7', sessionCount: 5 }
  2. When fetching workspace details, API queries worker health and includes workerVersion
  3. UI can compare hostVersion (from /info) with workerVersion
  4. If versions differ, UI shows "Update available" nudge with button
  5. Button calls workspaces.updateWorker({ name }) which:
    • Kills existing worker process
    • Copies new binary from host
    • Restarts worker server

Test plan

  • Start workspace, verify GET /rpc/workspaces/get returns workerVersion
  • Verify worker health endpoint returns version: curl http://<ip>:7392/health
  • Test updateWorker endpoint updates binary and restarts worker

🤖 Generated with Claude Code

- Worker health endpoint now returns version info
- Workspace get API includes workerVersion for running workspaces
- New updateWorker API endpoint to update worker binary in a workspace
- copyPerryWorker prefers installed binary (~/.perry/bin/perry) over dist

This enables UI to detect version mismatch between host and workers,
and provides a button to update outdated workers on demand.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Comment thread src/workspace/manager.ts
Comment on lines +312 to +318
['sh', '-c', 'pkill -f "perry worker serve" || true'],
{ user: 'workspace' }
);

await this.copyPerryWorker(containerName);
await this.startWorkerServer(containerName);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The updateWorkerBinary function has a race condition. It kills the old worker but doesn't wait for it to terminate, which can prevent the new worker from starting correctly.
Severity: CRITICAL

🔍 Detailed Analysis

A race condition exists in the updateWorkerBinary function. The function calls pkill to terminate the existing worker process but does not wait for the process to fully shut down and release its port. This leads to two potential failure modes. First, if the old process is still shutting down but its health endpoint is responsive, startWorkerServer will detect a running worker and return early, never starting the new binary. Second, if the health check fails, the attempt to start the new worker may fail with an EADDRINUSE error if the port is still held by the dying process. This failure is silent because the startup command is backgrounded with &, preventing error propagation.

💡 Suggested Fix

Introduce a synchronization mechanism after killing the old worker process to ensure it has fully terminated and released the port before attempting to start the new one. This could be a short sleep, or a more robust solution like polling to confirm the port is free or the process ID is gone. Additionally, avoid backgrounding the startup command with & within execInContainer to ensure that startup errors like EADDRINUSE are properly propagated and handled, rather than failing silently.

🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/workspace/manager.ts#L312-L318

Potential issue: A race condition exists in the `updateWorkerBinary` function. The
function calls `pkill` to terminate the existing worker process but does not wait for
the process to fully shut down and release its port. This leads to two potential failure
modes. First, if the old process is still shutting down but its health endpoint is
responsive, `startWorkerServer` will detect a running worker and return early, never
starting the new binary. Second, if the health check fails, the attempt to start the new
worker may fail with an `EADDRINUSE` error if the port is still held by the dying
process. This failure is silent because the startup command is backgrounded with `&`,
preventing error propagation.

Did we get this right? 👍 / 👎 to inform future reviews.
Reference ID: 8435010

@gricha gricha merged commit 1d08258 into main Jan 10, 2026
8 checks passed
@gricha gricha deleted the worker-version-updates branch January 10, 2026 17:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant