Skip to content

Restructure setup phases to improve failure handling and fix bugs#1883

Closed
DavidMiserak wants to merge 2 commits into
garrytan:mainfrom
DavidMiserak:fix/reduce-risk
Closed

Restructure setup phases to improve failure handling and fix bugs#1883
DavidMiserak wants to merge 2 commits into
garrytan:mainfrom
DavidMiserak:fix/reduce-risk

Conversation

@DavidMiserak
Copy link
Copy Markdown

Summary

  • Restructures setup into four phases ordered by failure probability, so a bad network day or missing sudo never leaves users with zero skills installed
  • Makes Playwright/Chromium failures non-fatal — was exit 1, now a named warning with $_PW_FAIL_REASON; skills, hooks, and migrations are already in place before Phase D runs
  • Fixes two bugs found in code review

Phase structure

Phase What runs Fail risk
A Filesystem ops: ~/.gstack/, welcome message, GBrain detection Always succeeds
B Local compute: binary build, skill linking, migrations, GBrain regen Low
C settings.json mutation: team/plan-tune hooks Medium, fully reversible
D Network/sudo/package managers: coreutils, emoji font, Playwright Highest — warn not exit

Previously, a Playwright install failure (network issue, bad mirror, missing sudo) would exit 1 mid-run, leaving skills unregistered. Now Phase D is best-effort; re-running ./setup retries only that phase.

Bug fixes

  • Windows non-ASCII bun path (lines 587, 792, 824): link_codex_skill_dirs, link_factory_skill_dirs, and link_opencode_skill_dirs were calling bun run directly instead of bun_cmd, bypassing the Windows non-ASCII path workaround. Silent failures masked by subsequent directory-existence checks.
  • GBrain regen exit code masked (line 1200): bun_cmd run gen:skill-docs:user ... | tail -3 caused tail's exit 0 to swallow bun's failure, so the || log "warning..." clause never fired. Removed the pipe.

Test plan

  • Run ./setup on a clean install — confirm skills link, hooks install, and Playwright failure (if any) prints a warning instead of aborting
  • Run ./setup --host codex on Windows with non-ASCII username — confirm Codex skill docs generate
  • Run ./setup with GBrain installed — confirm warning fires if gen:skill-docs:user fails

DavidMiserak and others added 2 commits June 5, 2026 11:36
Restructures setup into four phases ordered by failure probability so a
bad network day or missing sudo never leaves users with zero skills:

  Phase A — trivial filesystem ops (always succeed)
  Phase B — local compute: binary build, skill linking, migrations
  Phase C — settings.json mutation (reversible)
  Phase D — network/sudo/package managers (best-effort, warn not exit)

Playwright/Chromium failures are now non-fatal warnings with a named
$_PW_FAIL_REASON. Skills, hooks, and migrations are already installed
before Phase D runs; re-running ./setup retries only the network step.

Also fixes two bugs found in code review:
- Use bun_cmd wrapper (not bare bun) in Codex/Factory/OpenCode skill
  generation helpers — fixes silent failures on Windows with non-ASCII
  bun paths
- Remove | tail -3 pipe in GBrain regen so bun exit code is preserved
  and the error warning actually fires on failure
@trunk-io
Copy link
Copy Markdown

trunk-io Bot commented Jun 5, 2026

Merging to main in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here

@jbetala7
Copy link
Copy Markdown
Contributor

jbetala7 commented Jun 7, 2026

Thanks for digging into the failure-handling here — the two bug fixes are real and worth landing.

Bug fixes (correct, please split out):

  • The bun runbun_cmd run change in link_codex/factory/opencode_skill_dirs (origin/main setup:714, 919, 951) is a genuine bug: bun_cmd is the only path that honors the non-ASCII Windows BUN_CMD copy from prepare_bun_for_windows_compile, and the literal bun run bypasses it, so the regen silently fails on a non-ASCII Windows username (masked by the following dir-existence check).
  • Removing | tail -3 from the gen:skill-docs:user call (origin/main setup:1297) is also correct — the pipe makes the subshell exit status tail's, so the || log "warning…" guard right after it can never fire and bun failures get swallowed.

These two are tiny and independently testable — a focused PR landing just them (with a test that actually exercises the exit-code path, not only a source-string grep) would merge fast.

The phase restructure is hard to land as-is. It moves ~200 lines and renumbers nearly every step, which (a) can't be reviewed as a behavior-preserving move because it also folds in real behavior changes — Playwright exit 1 → best-effort warn, and the GBrain detect/regen split — and (b) structurally conflicts with every other open setup PR. In particular the Playwright "warn don't exit" change overlaps directly with #1838 (pins the Chromium install to the bundled Playwright version, closes #1829), which is already narrowly scoped and issue-linked for that same block.

Suggested split: (1) the two bug fixes, (2) the Playwright warn-don't-exit change as its own issue-linked PR coordinated with #1838, (3) the phase reorder last, against fresh main, with an issue describing the failure modes it fixes. Happy to re-review the bug-fix PR quickly.

@DavidMiserak
Copy link
Copy Markdown
Author

Thanks @jbetala7 — agreed on the split. Carved into three PRs as you
suggested:

  1. fix(setup): route gen:skill-docs through bun_cmd; drop tail -3 pipe that masks bun failures #1898 — the two bug fixes (bun_cmd routing in
    link_{codex,factory,opencode}_skill_dirs + dropping | tail -3
    from gen:skill-docs:user). Four-line setup diff plus 5 tests that
    exercise the exit-code path (stubs bun_cmd to fail, runs the actual
    shell pattern, asserts the || log warning fires only on the fixed
    shape and does NOT fire on the buggy ... | tail -3 shape). No
    source-string grep on its own.

  2. fix(setup): Playwright Chromium failure becomes best-effort warn, not exit 1 #1900 — Playwright exit 1 → best-effort warn with a
    $_PW_FAIL_REASON accumulator. Includes the coordination note with
    fix(setup): pin Playwright browser install to bundled version so headed browse connect works (#1829) #1838 you mentioned; happy to rebase whichever lands second.

  3. refactor(setup): group steps into four phases ordered by failure probability #1903 — the phase reorder, against fresh main, closing setup: Playwright Chromium download failure aborts setup before skills/hooks/migrations land #1902
    which describes the failure modes (skills/hooks/migrations not
    landing when a Playwright download fails before them). The diff
    intentionally does NOT include the bug fixes from fix(setup): route gen:skill-docs through bun_cmd; drop tail -3 pipe that masks bun failures #1898 or the
    Playwright behavior change from fix(setup): Playwright Chromium failure becomes best-effort warn, not exit 1 #1900 — verified by grep so a
    reviewer can read it as "a move + the gbrain detect/regen split that
    the move requires + one safety guard for the split."

Closing this PR — superseded by the three above.

Recommended merge order: #1898 first (tiny, independent), then
#1838 + #1900 (either order), then #1903 last so it rebases on the
others rather than the others rebasing on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants