Skip to content

fix: install pi extension deps on user machine + add doctor command#46

Merged
glittercowboy merged 1 commit into
mainfrom
feat/install-bundle-and-doctor
Apr 27, 2026
Merged

fix: install pi extension deps on user machine + add doctor command#46
glittercowboy merged 1 commit into
mainfrom
feat/install-bundle-and-doctor

Conversation

@glittercowboy
Copy link
Copy Markdown
Contributor

@glittercowboy glittercowboy commented Apr 27, 2026

Summary

Fixes the empty-bubble / $0-cost bug for every macOS user. Adds gsd-cloud doctor for diagnostics.

Root cause: the release workflow ran npm ci --omit=dev on the Linux x64 GitHub runner and shipped the resulting node_modules to every user. The @anthropic-ai/claude-agent-sdk package selects its native binary via optionalDependencies, which only resolves correctly when npm runs on the target machine — so macOS users got the Linux x64 native binary and pi silently failed with "Native CLI binary for darwin-arm64 not found", producing empty assistant messages and $0 cost.

Fix:

  1. release workflow stops bundling node_modules in the pi-extension tarball. Ships source + package-lock.json only.

  2. install.sh runs npm ci --omit=dev --include=optional on the user's machine after extracting the extension. Each user gets the right platform-native SDK binary. Adds a need_cmd npm check so missing npm fails fast with a clear error.

  3. cmd/doctor.go new gsd-cloud doctor command. Checks:

    • claude binary on PATH (with --version)
    • claude is logged in (spawns a trivial prompt, detects "Not logged in")
    • pi binary on PATH
    • pi extension installed with a platform-native claude SDK binary in node_modules
    • machine is paired (config.json present and complete)
    • daemon is running (socket reachable, relay connected)

    Each check prints ✓/✗ with a fix hint. Exits non-zero if any required check fails.

Updated post-install message prompts the user to install Claude Code ("if you don't have it yet") and points to gsd-cloud doctor.

Test plan

  • go build ./... clean
  • go test ./... all packages pass
  • install.sh diff reviewed manually (shellcheck-clean structure, atomic install preserved)
  • After merge: cut a release, install on a clean macOS arm64 machine, run gsd-cloud doctor, send a chat message, confirm non-zero token usage

Follow-ups (not in this PR)

  • Bundle pi CLI itself into the tarball so users don't need a separate npm install -g @mariozechner/pi-coding-agent (228MB consideration — needs design)
  • Add post-install CI smoke test that spawns the daemon and asserts a real assistant response (catches this bug class in CI before users)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added gsd-cloud doctor command to perform system diagnostics, verifying prerequisites, authentication status, platform-specific binaries, and daemon connectivity
  • Improvements

    • Installer now automatically installs required platform-specific dependencies for the extension; updated setup documentation with additional configuration requirements

The release workflow ran `npm ci --omit=dev` on the Linux x64 GitHub
runner and shipped the resulting node_modules to every user. The
@anthropic-ai/claude-agent-sdk package selects its native binary via
optionalDependencies, which only resolves correctly when npm runs on
the target machine — so macOS users got the Linux x64 native binary
and pi silently failed with "Native CLI binary for darwin-arm64 not
found", producing empty assistant messages and $0 cost.

Three changes:

1. release workflow: stop bundling node_modules in the pi-extension
   tarball. Ship source + package-lock.json only.
2. install.sh: run `npm ci --omit=dev --include=optional` on the user's
   machine after extracting the extension. Each user gets the right
   platform-native SDK binary. Adds a `need_cmd npm` check so missing
   npm fails fast with a clear error.
3. cmd/doctor.go: new `gsd-cloud doctor` command. Checks claude is
   installed and logged in, pi is on PATH, the extension is installed
   with a platform-native SDK binary, machine is paired, and daemon is
   running. Surfaces the missing-native-binary failure mode that
   motivated this fix.

Updated post-install message to direct users to install Claude Code
(if needed) and run `gsd-cloud doctor` for diagnostics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@glittercowboy glittercowboy enabled auto-merge (squash) April 27, 2026 03:32
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 27, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 96c718e5-3a09-4215-8f0e-07dfaf7e4cfa

📥 Commits

Reviewing files that changed from the base of the PR and between e79a147 and 13368ca.

📒 Files selected for processing (3)
  • .github/workflows/release-daemon.yml
  • cmd/doctor.go
  • scripts/install.sh

📝 Walkthrough

Walkthrough

The PR defers npm dependency installation from CI packaging to user install time to ensure platform-appropriate native binaries. A new doctor CLI command validates runtime component availability and configuration. The installer script now executes npm ci for platform-specific dependencies and updates post-installation instructions.

Changes

Cohort / File(s) Summary
Installation Pipeline
.github/workflows/release-daemon.yml, scripts/install.sh
Workflow now excludes node_modules from tarball and defers dependency installation. Installer runs npm ci --omit=dev --include=optional in pi-extension directory and adds npm as required command; updates next steps output with Claude Code instructions and gsd-cloud doctor validation step.
Diagnostic Command
cmd/doctor.go
New doctor subcommand performs sequential verification checks: claude/pi binary availability, authentication status, pi-extension directory presence, machine pairing configuration validity, and daemon connectivity via socket. Returns per-check reports with fix hints and exits with code 1 on failure.

Sequence Diagram

sequenceDiagram
    actor User
    participant CLI as gsd-cloud doctor
    participant FS as File System
    participant Config as Config System
    participant Daemon as Daemon Socket

    User->>CLI: Invoke doctor command
    CLI->>FS: Check claude binary available
    FS-->>CLI: Found/Not found
    CLI->>FS: Check pi binary runnable
    FS-->>CLI: Runnable/Error
    CLI->>FS: Verify pi-extension directory exists
    FS-->>CLI: Exists/Missing
    CLI->>Config: Load machine pairing config
    Config-->>CLI: Config loaded or validation failed
    CLI->>Daemon: Resolve socket path & query sockapi status
    Daemon-->>CLI: Daemon healthy/Unhealthy
    CLI->>User: Print per-check report or summary
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Possibly related PRs

  • Fix pi provider extension resolution #41: Resolves pi-extension path validation and early error handling; directly complements this PR's packaging/installer changes that alter pi-extension availability and presence verification.

Poem

🐰 npm ci's deferred dance, doctor checks the stance,
Extensions packaged light, binaries checked just right,
Platform-perfect installs bloom, no dependency gloom!

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/install-bundle-and-doctor

Comment @coderabbitai help to get the list of available commands and usage tips.

@glittercowboy glittercowboy merged commit fce5dd0 into main Apr 27, 2026
1 of 2 checks passed
glittercowboy added a commit that referenced this pull request Apr 27, 2026
#46 changed the release workflow to ship source + package-lock.json
instead of bundled node_modules, and added npm ci to install.sh. The
auto-updater (internal/update/) is a separate code path that wasn't
updated, so daemons that auto-update from a pre-#46 version land in a
state with no node_modules and pi exits with:

  Failed to load extension: Cannot find module '@anthropic-ai/claude-agent-sdk'
  Unknown provider "claude-cli"

Mirror install.sh's behavior in installPiExtensionArchive: after
extracting, run npm ci --omit=dev --include=optional in the staging
directory before atomic-rename into place.

Skip cleanly when no package-lock.json exists so older tarball formats
and minimal test fixtures don't blow up.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
glittercowboy added a commit that referenced this pull request Apr 27, 2026
…ors (#51)

* fix: pi extension self-heal + atomic service restart + clearer pi errors

Three connected fixes for daemon update reliability that surfaced after
v0.2.31's auto-update path produced broken pi extensions in the wild.

## Pi extension self-heal on daemon start

PR #46 changed release tarballs to ship source-only and added `npm ci` to
install.sh. PR #50 then added the same to the auto-updater. But daemons
running v0.2.31 ran v0.2.31's broken updater code at update time, landing
on v0.2.32+ with the new binary but no `node_modules/`. Pi tasks fail with:

  Failed to load extension: Cannot find module '@anthropic-ai/claude-agent-sdk'
  Unknown provider "claude-cli"

`internal/update.EnsureExtensionHealthy(extDir)` checks for the platform-
native @anthropic-ai/claude-agent-sdk-* binary and runs `npm ci` if absent.
Idempotent: no-op when healthy, no-op when extension isn't installed at all.
Called from `internal/loop/daemon.go` on every startup, so any stuck v0.2.31
user gets repaired automatically the next time the daemon starts (which
launchd's KeepAlive=true triggers within seconds of the broken state).

## Atomic service restart

`gsd-cloud update` (and restart, rollback) called `Stop()` then `Start()`
which races with launchd's KeepAlive=true and systemd's auto-restart-on-
failure. Adds `Restart()` to the service.Platform interface that uses
platform-native atomic restart:
- launchd: `kickstart -k` (kill if running, then start)
- systemd: `systemctl restart` (works whether active or not)

Updates `cmd/{update,restart,rollback}.go` to use it. login.go intentionally
left alone — its stop/start happens around an interactive auth flow where
the explicit pause makes sense.

## Clearer pi executor errors

When pi exits with the "extension deps missing" signature, the daemon used
to bubble up raw npm-shaped stderr to the relay/UI. Now it returns a
human-readable error explaining how to repair (restart daemon for self-heal,
or run `gsd-cloud doctor` for diagnostics). Other pi exit modes pass
through unchanged.

## Tests

- `EnsureExtensionHealthy` no-op on missing extension, no-op when healthy,
  rejects aggregate-only @anthropic-ai/claude-agent-sdk dir as proof of
  health (must have a platform-specific subdir with binary)
- `piExitError` detects both error signatures, falls through to generic
  format for other failures
- `fakePlatform` in cmd/login_test.go gets Restart() to satisfy interface

All 18 daemon test packages green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(update): surface non-IsNotExist stat errors in EnsureExtensionHealthy

Per CodeRabbit feedback on #51: previous code returned nil for any os.Stat
failure on package.json, masking permission errors and similar transient
issues as "extension not installed, nothing to repair." Now only IsNotExist
silently no-ops; other failures surface so the caller can log them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@glittercowboy glittercowboy deleted the feat/install-bundle-and-doctor branch April 27, 2026 04:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant