fix: install pi extension deps on user machine + add doctor command#46
Merged
Conversation
The release workflow ran `npm ci --omit=dev` on the Linux x64 GitHub runner and shipped the resulting node_modules to every user. The @anthropic-ai/claude-agent-sdk package selects its native binary via optionalDependencies, which only resolves correctly when npm runs on the target machine — so macOS users got the Linux x64 native binary and pi silently failed with "Native CLI binary for darwin-arm64 not found", producing empty assistant messages and $0 cost. Three changes: 1. release workflow: stop bundling node_modules in the pi-extension tarball. Ship source + package-lock.json only. 2. install.sh: run `npm ci --omit=dev --include=optional` on the user's machine after extracting the extension. Each user gets the right platform-native SDK binary. Adds a `need_cmd npm` check so missing npm fails fast with a clear error. 3. cmd/doctor.go: new `gsd-cloud doctor` command. Checks claude is installed and logged in, pi is on PATH, the extension is installed with a platform-native SDK binary, machine is paired, and daemon is running. Surfaces the missing-native-binary failure mode that motivated this fix. Updated post-install message to direct users to install Claude Code (if needed) and run `gsd-cloud doctor` for diagnostics. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughThe PR defers npm dependency installation from CI packaging to user install time to ensure platform-appropriate native binaries. A new Changes
Sequence DiagramsequenceDiagram
actor User
participant CLI as gsd-cloud doctor
participant FS as File System
participant Config as Config System
participant Daemon as Daemon Socket
User->>CLI: Invoke doctor command
CLI->>FS: Check claude binary available
FS-->>CLI: Found/Not found
CLI->>FS: Check pi binary runnable
FS-->>CLI: Runnable/Error
CLI->>FS: Verify pi-extension directory exists
FS-->>CLI: Exists/Missing
CLI->>Config: Load machine pairing config
Config-->>CLI: Config loaded or validation failed
CLI->>Daemon: Resolve socket path & query sockapi status
Daemon-->>CLI: Daemon healthy/Unhealthy
CLI->>User: Print per-check report or summary
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes Possibly related PRs
Poem
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
glittercowboy
added a commit
that referenced
this pull request
Apr 27, 2026
#46 changed the release workflow to ship source + package-lock.json instead of bundled node_modules, and added npm ci to install.sh. The auto-updater (internal/update/) is a separate code path that wasn't updated, so daemons that auto-update from a pre-#46 version land in a state with no node_modules and pi exits with: Failed to load extension: Cannot find module '@anthropic-ai/claude-agent-sdk' Unknown provider "claude-cli" Mirror install.sh's behavior in installPiExtensionArchive: after extracting, run npm ci --omit=dev --include=optional in the staging directory before atomic-rename into place. Skip cleanly when no package-lock.json exists so older tarball formats and minimal test fixtures don't blow up. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4 tasks
glittercowboy
added a commit
that referenced
this pull request
Apr 27, 2026
…ors (#51) * fix: pi extension self-heal + atomic service restart + clearer pi errors Three connected fixes for daemon update reliability that surfaced after v0.2.31's auto-update path produced broken pi extensions in the wild. ## Pi extension self-heal on daemon start PR #46 changed release tarballs to ship source-only and added `npm ci` to install.sh. PR #50 then added the same to the auto-updater. But daemons running v0.2.31 ran v0.2.31's broken updater code at update time, landing on v0.2.32+ with the new binary but no `node_modules/`. Pi tasks fail with: Failed to load extension: Cannot find module '@anthropic-ai/claude-agent-sdk' Unknown provider "claude-cli" `internal/update.EnsureExtensionHealthy(extDir)` checks for the platform- native @anthropic-ai/claude-agent-sdk-* binary and runs `npm ci` if absent. Idempotent: no-op when healthy, no-op when extension isn't installed at all. Called from `internal/loop/daemon.go` on every startup, so any stuck v0.2.31 user gets repaired automatically the next time the daemon starts (which launchd's KeepAlive=true triggers within seconds of the broken state). ## Atomic service restart `gsd-cloud update` (and restart, rollback) called `Stop()` then `Start()` which races with launchd's KeepAlive=true and systemd's auto-restart-on- failure. Adds `Restart()` to the service.Platform interface that uses platform-native atomic restart: - launchd: `kickstart -k` (kill if running, then start) - systemd: `systemctl restart` (works whether active or not) Updates `cmd/{update,restart,rollback}.go` to use it. login.go intentionally left alone — its stop/start happens around an interactive auth flow where the explicit pause makes sense. ## Clearer pi executor errors When pi exits with the "extension deps missing" signature, the daemon used to bubble up raw npm-shaped stderr to the relay/UI. Now it returns a human-readable error explaining how to repair (restart daemon for self-heal, or run `gsd-cloud doctor` for diagnostics). Other pi exit modes pass through unchanged. ## Tests - `EnsureExtensionHealthy` no-op on missing extension, no-op when healthy, rejects aggregate-only @anthropic-ai/claude-agent-sdk dir as proof of health (must have a platform-specific subdir with binary) - `piExitError` detects both error signatures, falls through to generic format for other failures - `fakePlatform` in cmd/login_test.go gets Restart() to satisfy interface All 18 daemon test packages green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(update): surface non-IsNotExist stat errors in EnsureExtensionHealthy Per CodeRabbit feedback on #51: previous code returned nil for any os.Stat failure on package.json, masking permission errors and similar transient issues as "extension not installed, nothing to repair." Now only IsNotExist silently no-ops; other failures surface so the caller can log them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced Apr 27, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes the empty-bubble / $0-cost bug for every macOS user. Adds
gsd-cloud doctorfor diagnostics.Root cause: the release workflow ran
npm ci --omit=devon the Linux x64 GitHub runner and shipped the resultingnode_modulesto every user. The@anthropic-ai/claude-agent-sdkpackage selects its native binary viaoptionalDependencies, which only resolves correctly when npm runs on the target machine — so macOS users got the Linux x64 native binary and pi silently failed with "Native CLI binary for darwin-arm64 not found", producing empty assistant messages and $0 cost.Fix:
release workflow stops bundling
node_modulesin the pi-extension tarball. Ships source +package-lock.jsononly.install.sh runs
npm ci --omit=dev --include=optionalon the user's machine after extracting the extension. Each user gets the right platform-native SDK binary. Adds aneed_cmd npmcheck so missing npm fails fast with a clear error.cmd/doctor.gonewgsd-cloud doctorcommand. Checks:Each check prints ✓/✗ with a fix hint. Exits non-zero if any required check fails.
Updated post-install message prompts the user to install Claude Code ("if you don't have it yet") and points to
gsd-cloud doctor.Test plan
go build ./...cleango test ./...all packages passgsd-cloud doctor, send a chat message, confirm non-zero token usageFollow-ups (not in this PR)
npm install -g @mariozechner/pi-coding-agent(228MB consideration — needs design)🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
gsd-cloud doctorcommand to perform system diagnostics, verifying prerequisites, authentication status, platform-specific binaries, and daemon connectivityImprovements