
fix(update): guide EACCES manual recovery #83757

Merged
steipete merged 3 commits into openclaw:main from brokemac79:fix/issue-83747-eacces-recovery on May 18, 2026

Conversation

@brokemac79
Contributor

@brokemac79 brokemac79 commented May 18, 2026

Summary

  • add EACCES recovery hints that tell managed-Gateway operators to stop the Gateway before sudo/manual package recovery
  • document the root-owned Linux system-global recovery path: stop Gateway, run the system npm install, refresh the Gateway service, restart, then verify
  • add focused coverage for staged/global install EACCES hint text

Why

Current main can restart the old managed Gateway after a staged package update fails with EACCES. The existing recovery hint points operators toward sudo/manual package recovery, but it does not say to stop the Gateway first. That leaves a window where the running Gateway can try to load core/plugin files while npm is replacing the package tree.

This follows ClawSweeper's narrow guidance on #83747: improve recovery hints/docs and focused tests without changing the updater lifecycle policy.
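
As a rough illustration of the guidance gap described above, the sketch below maps a failure message to recovery hints. It is not the real `inferUpdateFailureHints` code: the function name `sketchEaccesHints`, its signature, and the example path are assumptions made for illustration, and only the hint strings are taken from the terminal output in this PR description.

```typescript
// Hypothetical sketch: NOT the actual inferUpdateFailureHints implementation.
// The hint strings are copied from the terminal output in this PR description.
function sketchEaccesHints(failureMessage: string): string[] {
  // Only permission failures get the EACCES-specific guidance.
  if (!failureMessage.includes("EACCES")) {
    return [];
  }
  return [
    "Detected permission failure (EACCES). Re-run with a writable global prefix or sudo (for system-managed Node installs).",
    // The stop-first hint comes before any concrete install example on purpose.
    "If you recover with sudo/manual package install on a managed Gateway, stop the Gateway first so it does not load files while the package tree is being replaced.",
    "Example: npm config set prefix ~/.local && npm i -g openclaw@latest",
    "System install outline: openclaw gateway stop -> sudo <system-npm> i -g openclaw@latest -> openclaw gateway install --force -> openclaw gateway restart.",
  ];
}

// Illustrative failure message; the path is made up for this example.
const hints = sketchEaccesHints(
  "EACCES: permission denied, mkdtemp '/tmp/example/prefix/lib/node_modules/.openclaw-update-stage-XXXXXX'",
);
for (const hint of hints) {
  console.log(`  - ${hint}`);
}
```

Keeping the hints as plain strings in one place is one way to make them easy to assert on in focused tests, which is the approach this PR takes for coverage.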

Closes #83747

Real behavior proof

  • Behavior or issue addressed: EACCES recovery guidance for root-owned/system-global npm installs now fails closed at the operator level by instructing users to stop the managed Gateway before manual package replacement.
  • Real environment tested: Ubuntu VPS disposable npm-global proof environment, Node v22.22.0, npm 10.9.4, throwaway OPENCLAW_PROFILE=proof83757, throwaway NPM_CONFIG_PREFIX under /tmp. Also validated in a Windows desktop source checkout from upstream main 424c6d0a5, Node v24.13.0, pnpm 11.1.0.
  • Exact steps or command run after this patch: Installed openclaw@latest into a temp npm prefix on the VPS, applied this PR's EACCES hint change to the packaged update CLI bundle in that disposable install, made only the temp prefix's lib/node_modules unwritable to trigger the real global install stage EACCES path, then ran OPENCLAW_PROFILE=proof83757 NPM_CONFIG_PREFIX=<temp-prefix> openclaw update --no-restart --yes --tag latest --timeout 20.
  • Evidence after fix: Redacted terminal output from the disposable VPS OpenClaw setup:
ENV: VPS disposable proof, node=v22.22.0 npm=10.9.4 profile=proof83757
PATCH: disposable install now contains PR #83757 EACCES hint text
OpenClaw 2026.5.18 (50a2481)
COMMAND: OPENCLAW_PROFILE=proof83757 NPM_CONFIG_PREFIX=<temp-prefix> openclaw update --no-restart --yes --tag latest --timeout 20
EXIT_CODE: 1

Update Result: ERROR
  Root: /tmp/openclaw-83757-proof-REDACTED/prefix/lib/node_modules/openclaw
  Reason: global install stage
  Before: 2026.5.18
  After: 2026.5.18

Steps:
  x global install stage (0ms)
      EACCES: permission denied, mkdtemp '/tmp/openclaw-83757-proof-REDACTED/prefix/lib/node_modules/.openclaw-update-stage-S2S8Ob'

Recovery hints:
  - Detected permission failure (EACCES). Re-run with a writable global prefix or sudo (for system-managed Node installs).
  - If you recover with sudo/manual package install on a managed Gateway, stop the Gateway first so it does not load files while the package tree is being replaced.
  - Example: npm config set prefix ~/.local && npm i -g openclaw@latest
  - System install outline: openclaw gateway stop -> sudo <system-npm> i -g openclaw@latest -> openclaw gateway install --force -> openclaw gateway restart.

Total time: 473ms
ASSERT: proof output contains EACCES recovery hints from patched CLI/update path
CLEANUP: removed disposable proof directory
  • Observed result after fix: The real CLI/update global install stage EACCES path prints the new stop-before-manual-recovery guidance and the system install outline with gateway install --force and gateway restart.
  • What was not tested: I did not perform a destructive package replacement against the live production OpenClaw install; this PR intentionally changes guidance/tests only and does not alter package-manager lifecycle behavior. The VPS proof used a disposable temp npm prefix/cache/home/profile only. The temp proof directory was removed after capture. I did not hotfix, cherry-pick into, stop, restart, or otherwise mutate the live VPS OpenClaw install/gateway.
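
The `ASSERT` line in the captured output could be reproduced with a small check along these lines. This is a sketch under assumptions: `extractRecoveryHints` is a hypothetical helper, not part of the OpenClaw CLI or of the proof script actually used, and the sample text is an abbreviated copy of the output above.

```typescript
// Hypothetical helper: collects the "- ..." lines that follow "Recovery hints:".
function extractRecoveryHints(output: string): string[] {
  const lines = output.split("\n");
  const start = lines.findIndex((line) => line.trim() === "Recovery hints:");
  if (start === -1) return [];
  const found: string[] = [];
  for (const line of lines.slice(start + 1)) {
    const trimmed = line.trim();
    // The hint list ends at the first line that is not a "- " bullet.
    if (!trimmed.startsWith("- ")) break;
    found.push(trimmed.slice(2));
  }
  return found;
}

// Abbreviated copy of the terminal output shown in this PR description.
const sampleOutput = [
  "Update Result: ERROR",
  "  Reason: global install stage",
  "",
  "Recovery hints:",
  "  - Detected permission failure (EACCES). Re-run with a writable global prefix or sudo (for system-managed Node installs).",
  "  - If you recover with sudo/manual package install on a managed Gateway, stop the Gateway first so it does not load files while the package tree is being replaced.",
  "",
  "Total time: 473ms",
].join("\n");

const proofHints = extractRecoveryHints(sampleOutput);
const hasStopFirst = proofHints.some((hint) => hint.includes("stop the Gateway first"));
console.log(hasStopFirst ? "ASSERT ok: stop-first hint present" : "ASSERT failed: stop-first hint missing");
```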

Validation

node scripts/run-vitest.mjs src/cli/update-cli/progress.test.ts -- --reporter=verbose
Test Files  1 passed (1)
Tests       6 passed (6)
git diff --check
# no output
codex-review
codex-review clean: no accepted/actionable findings reported

@openclaw-barnacle openclaw-barnacle Bot added docs Improvements or additions to documentation cli CLI command changes size: XS triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels May 18, 2026
@clawsweeper
Contributor

clawsweeper Bot commented May 18, 2026

Codex review: needs maintainer review before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
Adds managed-Gateway stop-first recovery hints for npm EACCES update failures, documents the root-owned Linux recovery sequence, adds focused hint assertions, and updates the changelog.

Reproducibility: yes, at the source-path level: npm global update or global install stage EACCES failures flow through inferUpdateFailureHints, and the linked report plus PR body show real EACCES output for that path. I did not run a destructive root-owned package replacement during this read-only review.

PR rating
Overall: 🦞 diamond lobster
Proof: 🦞 diamond lobster
Patch quality: 🦞 diamond lobster
Summary: Strong focused PR with supplied real CLI proof, targeted tests, and no blocking review findings.

What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

PR egg
✨ Hatched: 🌱 uncommon Sunspot Crabkin

        /\     /\            
      _/  \___/  \_          
     /  ( o   o )  \         
    |      \_/      |        
    |   /\  ===  /\ |        
     \_/  \_____/  \_/       
        _/|_| |_|\_          
       /__| | | |__\         
          ' ' ' '            
         /_/     \_\         
       .-----------.         
      '-------------'        

Rarity: 🌱 uncommon.
Trait: hums during re-review.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. The egg is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • How to hatch it: reach status: 👀 ready for maintainer look or status: 🚀 automerge armed; that usually means sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

Real behavior proof
Sufficient (terminal): The PR body includes redacted after-patch terminal output from a disposable Ubuntu npm-global setup showing the updated EACCES recovery hint in the real CLI path.

Next step before merge
No repair lane is needed because the patch has no blocking findings; maintainers should use normal review and CI gating.

Security
Cleared: The diff changes docs, literal CLI recovery strings, a changelog entry, and focused tests; no concrete security or supply-chain regression was found.

Review details

Best possible solution:

Land the narrow recovery guidance and tests after normal CI and maintainer review, leaving any automatic fail-closed updater lifecycle change to a separate owner decision.

Do we have a high-confidence way to reproduce the issue?

Yes, at the source-path level: npm global update or global install stage EACCES failures flow through inferUpdateFailureHints, and the linked report plus PR body show real EACCES output for that path. I did not run a destructive root-owned package replacement during this read-only review.

Is this the best way to solve the issue?

Yes: updating the CLI recovery hint, docs, and focused tests is the narrow maintainable fix for the guidance gap. Changing whether the updater keeps the Gateway stopped after EACCES would be a separate availability and lifecycle policy decision.

Label justifications:

  • P2: This is a focused CLI/update recovery improvement for a limited Linux root-owned npm install path.

What I checked:

  • Current main CLI hint gap: Current main only prints the generic EACCES writable-prefix/sudo hint and the user-writable npm prefix example; it does not tell managed-Gateway operators to stop the Gateway before manual package replacement. (src/cli/update-cli/progress.ts:84, 583eb711ecb1)
  • Current main lifecycle context: The updater stops a running managed Gateway before package updates, but restarts it after a failed package update, matching the linked recovery-guidance gap rather than requiring this PR to change lifecycle policy. (src/cli/update-cli/update-command.ts:846, 583eb711ecb1)
  • Shipped release gap: The latest release tag v2026.5.18 has the same generic EACCES hints and manual-update docs as current main, so the PR is not obsolete on main or in the shipped release. (src/cli/update-cli/progress.ts:84, 50a2481652b6)
  • PR CLI change: The PR head adds a managed-Gateway stop-first warning and a system install outline to the existing npm global EACCES hint path. (src/cli/update-cli/progress.ts:88, a2883acb74c5)
  • PR test coverage: The PR head extends focused EACCES hint tests for both global update and staged package permission failures to assert the new Gateway and system-npm guidance. (src/cli/update-cli/progress.test.ts:67, a2883acb74c5)
  • PR docs change: The PR head documents stopping the managed Gateway before root-owned Linux npm recovery, reinstalling, refreshing the service, restarting, and verifying health. Public docs: docs/install/updating.md. (docs/install/updating.md:100, a2883acb74c5)
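
The stop-first ordering checked above can be expressed as data with a simple invariant. The sketch below is illustrative only: `RecoveryStep`, `systemRecoverySteps`, `renderOutline`, and `stopsBeforeInstall` are hypothetical names, not identifiers from the PR. The commands come from the "System install outline" hint, with `<system-npm>` kept as the placeholder it is in that hint. The final verify step is omitted because its command is not stated in this conversation.

```typescript
// Hypothetical data model for the documented recovery sequence.
interface RecoveryStep {
  label: string;
  command: string;
}

// Commands copied from the "System install outline" hint text.
const systemRecoverySteps: RecoveryStep[] = [
  { label: "stop", command: "openclaw gateway stop" },
  { label: "install", command: "sudo <system-npm> i -g openclaw@latest" },
  { label: "refresh-service", command: "openclaw gateway install --force" },
  { label: "restart", command: "openclaw gateway restart" },
];

// Joins the commands in order, in the same shape as the CLI hint line.
function renderOutline(steps: RecoveryStep[]): string {
  return `System install outline: ${steps.map((s) => s.command).join(" -> ")}.`;
}

// Invariant: the Gateway must be stopped before the package tree is replaced.
function stopsBeforeInstall(steps: RecoveryStep[]): boolean {
  const stopIndex = steps.findIndex((s) => s.label === "stop");
  const installIndex = steps.findIndex((s) => s.label === "install");
  return stopIndex !== -1 && installIndex !== -1 && stopIndex < installIndex;
}

console.log(renderOutline(systemRecoverySteps));
console.log(stopsBeforeInstall(systemRecoverySteps));
```

A check like `stopsBeforeInstall` guards the one property this guidance depends on: the Gateway is not running while npm replaces the package tree.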

Likely related people:

  • Dallin Romney: Local blame for the current EACCES hint function, managed Gateway update lifecycle, and updating docs points to the same current-checkout baseline commit. (role: recent area contributor; confidence: medium; commits: cf194419c315; files: src/cli/update-cli/progress.ts, src/cli/update-cli/update-command.ts, docs/install/updating.md)
  • Josh Lehman: Recently changed npm managed install/update behavior around package freshness filters, adjacent to this PR's npm global update recovery path. (role: recent adjacent contributor; confidence: medium; commits: 85a3d5312f7d; files: src/infra/update-global.ts, src/infra/update-runner.test.ts, src/infra/npm-install-env.ts)
  • steipete: Prepared the v2026.5.18 release that the linked issue upgraded to and is assigned on this PR, making them a good route for update/release recovery review. (role: adjacent release owner; confidence: medium; commits: 50a2481652b6, 583eb711ecb1; files: CHANGELOG.md, package.json, scripts/notarize-mac-artifact.sh)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 583eb711ecb1.

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal backlog priority with limited blast radius. labels May 18, 2026
@openclaw-barnacle openclaw-barnacle Bot added triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 18, 2026
@brokemac79
Contributor Author

@clawsweeper re-review

@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 18, 2026
@clawsweeper
Contributor

clawsweeper Bot commented May 18, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 18, 2026
@steipete steipete self-assigned this May 18, 2026
@steipete steipete force-pushed the fix/issue-83747-eacces-recovery branch from ad5155a to a2883ac Compare May 18, 2026 23:05
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 18, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 18, 2026
@steipete
Contributor

Maintainer verification before landing.

Behavior addressed: the EACCES recovery guidance now tells users to stop the managed Gateway before replacing the package manually, then restart and verify the service.
Real environment tested: local source checkout on macOS; GitHub Actions on PR head a2883ac.
Exact steps or command run after this patch:

@steipete steipete merged commit 0903fa6 into openclaw:main May 18, 2026
109 of 110 checks passed

Labels

cli CLI command changes docs Improvements or additions to documentation P2 Normal backlog priority with limited blast radius. proof: sufficient ClawSweeper judged the real behavior proof convincing. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. size: XS status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: system npm upgrade can leave running gateway in transient half-swapped package/plugin state

2 participants