[FORGE-89] feat(skills,orchestrate): rename /codex → /second-opinion; dispatch via IHarness adapter (P2-T21) by firatcand · Pull Request #205 · firatcand/forge

firatcand · 2026-05-20T22:03:53Z

Summary

Renames the /codex skill to /second-opinion and routes dispatch through a new forge orchestrate second-opinion CLI verb that wraps IHarness.runReview, so settings.agents.review_host_cli (codex vs gemini) is honored at runtime. The user-facing surface becomes host-agnostic; /codex lives one release as a description-only deprecation alias removed in v0.5.0.

New CLI verb forge orchestrate second-opinion — sole boundary that knows about review_host_cli. Reads diff + prompt from files, calls createHarness(host).runReview(diff, prompt, opts), emits a ReviewVerdict envelope.
New canonical skill skills/second-opinion/SKILL.md — six-piece prompt template, calls the CLI verb, documents the envelope shape and error codes.
Old skill skills/codex/SKILL.md rewritten as a ~25-line DEPRECATED alias (description-only, no chalk-warn, no per-session marker — see "AC deviation" below).
/review gains a step 5 — when CRITICAL.md paths are touched AND review_host_cli !== null, the in-process review suggests /second-opinion review-impl for out-of-process, different-model-lineage verification.
Docs/spec sweep — /codex → /second-opinion across CLAUDE.md, README.md, CRITICAL.md, spec/{SPEC,PRD,CONTEXT}.md; new SPEC §1041 §Multi-host review rewrite naming /second-opinion as canonical surface; forge codex-suggest CLI verb / auto_codex_enabled settings field / FORGE_AUTO_CODEX env var intentionally kept (rename deferred to v0.5 — see "Out of scope" below).

Why

With FORGE-88 (IHarness adapter) landed in #187, the /codex skill no longer needs to hardcode codex exec. The host-agnostic rename + adapter dispatch is the closing piece of the P2-T20–T21 sequence — it makes review_host_cli=gemini actually do something at the interactive surface, not just in orchestrator REVIEW phase. Mirrors the established P2-T10 (/push-to-linear → /push-to-tracker) rename pattern.

Decisions resolved in /plan-task (4 forks)

#	Question	Choice	Notes
1	Where does dispatch live?	CLI verb wrapping `createHarness(host).runReview`	Honors CLAUDE.md skill↔verb contract. Verb adds ~244 LOC + tests.
2	Rename `forge codex-suggest` / `auto_codex_enabled` / `FORGE_AUTO_CODEX`?	Defer — only update the printed suggestion string	Avoids settings-schema migration + adopter `.forge/settings.yaml` rewrite. Filed as v0.5 follow-up.
3	Add `/review` → `/second-opinion` trigger on CRITICAL.md hits?	Yes, gated on `review_host_cli !== null`	Literal AC reading + ETHOS principle 6.
4	Deprecation alias mechanism?	Description-only, no chalk-warn, no per-session state	Skill bodies aren't a stable runtime for stateful warning logic. AC #10 (per-session suppression) intentionally dropped — see below.

AC coverage

AC	Status	Notes
1. `skills/second-opinion/SKILL.md` exists; names Codex + Gemini	✓	Frontmatter description mentions both reviewers.
2. Skill body invokes `IHarness.runReview` via settings	✓	Via `forge orchestrate second-opinion` (skill→verb contract).
3. `skills/codex/SKILL.md` kept as deprecation alias	✓	Description-only per decision #4.
4. `/ship` CRITICAL trigger → `/second-opinion`	✓	§Gates.7 + auto-codex suggestion text updated.
5. `/review` CRITICAL trigger → `/second-opinion`	✓	New step 5; gated on `review_host_cli !== null` so adopters who opted out aren't forced.
6. README + CLAUDE.md canonical	✓	TASK loop diagram + critical-paths comment updated.
7. `spec/ORCHESTRATOR.md` generalized	⚠ — no changes	All "Codex" refs there are to the actual vendor CLI (Codex CLI, Codex review #N, Codex subagent, Codex sandbox), not the abstract reviewer role. Generalizing would either be a no-op or wrong. SPEC §Multi-host review WAS rewritten — that's the place where the abstract role lived.
8. E2E codex variant + gemini variant equivalent verdict	✓	`test/integration/cli/orchestrate-second-opinion.test.ts` — full pipeline (settings → createHarness → real adapter → parseHarnessVerdict → envelope) with mocked `spawnSubprocess`.
9. Load-time validation for invalid/unset `review_host_cli`	✓	Envelope codes: `SETTINGS_NOT_FOUND`, `REVIEW_DISABLED`, `INVALID_SETTINGS`.
10. Per-session deprecation warning on first `/codex` invocation	✗ — intentionally dropped	Skill bodies are markdown, not a runtime — per-session state requires a marker file (race-y across multi-window sessions) + ~20 lines of bash for one removed-in-one-release polish. Decision #4 in `/plan-task`: description-only alias, warn-every-time. Track as a follow-up only if a user complains.

/review findings addressed pre-PR

Code-reviewer + security-auditor caught 7 improvements; all fixed in commit 3834c72 before this PR:

MAX_DIFF_BYTES = 1MB + MAX_PROMPT_BYTES = 200KB caps via statSync before readFileSync (avoids pulling a 1GB file into memory to be rejected). Two new unit tests.
Skill body switched to mktemp -t (mode 0600, unguessable names) + explicit rm -f cleanup — eliminates /tmp/forge-${TASK}-*.txt world-readable race.
fakeHarness test fixture now matches ReviewVerdictSchema (drops bogus summary, adds required version: 1).
SPAWN_FAILED + NON_ZERO_EXIT + TIMEOUT retriable-bit assertions locked down.
Integration-test dead ternary ('claude' : 'claude') replaced + misleading "Reset stdout" comment rewritten.
/review SKILL.md line clarifies /second-opinion is a slash command, not a Bash binary.

Security audit also confirmed: no argv injection (positional argv, no shell interpolation), no settings-tampering escalation (gemini-experimental gate enforced at harness ctor), no env-var-driven settings discovery redirect (gitleaks: 0 leaks across 5 commits, 56KB scanned).

Out of scope — v0.5 follow-up to file

Rename forge codex-suggest CLI verb → forge second-opinion-suggest (or similar). Rename codex.auto_codex_enabled settings field → auto_second_opinion_enabled. Rename FORGE_AUTO_CODEX env var → FORGE_AUTO_SECOND_OPINION. Settings-schema migration + adopter .forge/settings.yaml rewrite. Decision #2 in /plan-task; should be a new FORGE ticket before v0.5 cuts.

How to test

npm test — 1306 pass / 0 fail / 19 skipped (was 1302 on main).
npm run typecheck — clean.
npm run build — node dist/bin/forge.cjs orchestrate lists second-opinion verb in mutating band.
node --test --import tsx 'test/integration/cli/orchestrate-second-opinion.test.ts' — AC [P1-T04] Build core/settings.ts loader + hot-reload #8 gate (codex/gemini equivalence).
gitleaks detect --log-opts=main..HEAD — 0 leaks.
Manual smoke: in a worktree with agents.review_host_cli: codex (no FORGE_GEMINI_EXPERIMENTAL), forge orchestrate second-opinion --task FORGE-89 --diff <path> --prompt <path> --json returns a ReviewVerdict envelope. With review_host_cli: null returns REVIEW_DISABLED.

Closes #121 (FORGE-89).

🤖 Generated with Claude Code

… IHarness.runReview (P2-T21) New `forge orchestrate second-opinion` verb is the sole boundary that knows about settings.agents.review_host_cli. Reads diff + prompt from files, calls createHarness(host).runReview, emits a ReviewVerdict JSON envelope. The skill body (renamed in a later step) becomes host-agnostic. Decision: dispatch shape → CLI verb wrapping createHarness per user answer (skill↔verb contract — skills never own state-machine logic). Decision: CLI/env rename → defer per user answer (only printed strings change). - src/schemas/cli-args.ts: SecondOpinionArgsSchema (taskId, diffPath, promptPath, cwd?, timeoutMs?, forgeDir, json). - src/cli/orchestrate/second-opinion.ts: runOrchestrateSecondOpinion + secondOpinionHandler. Tests inject SecondOpinionHarnessFactory to avoid spawning real codex/gemini. - src/cli/orchestrate/index.ts: register handler in VERBS map + HELP_ORDER. - test/unit/cli/orchestrate/second-opinion.test.ts: 11 tests covering arg validation, missing settings/files, REVIEW_DISABLED, codex + gemini happy paths, HarnessError pass-through (BINARY_NOT_FOUND, TIMEOUT retriable, REVIEW_FAILED for non-HarnessError). All 11 tests pass; typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…nion verb (AC #8) Integration test exercising the full pipeline: settings load → createHarness → runReview (real CodexHarness/GeminiHarness) → parseHarnessVerdict (real fenced-JSON path) → envelope emission. Only spawnSubprocess is mocked, returning a fenced ReviewVerdict in stdout. Asserts that the two host variants produce envelopes that differ ONLY in host stamping; verdict outcome, version, and findings shape are equivalent. This is the AC #8 gate (P2-T21 plan §6). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…lers (P2-T21) New skill skills/second-opinion/SKILL.md is the canonical, host-agnostic interactive surface; dispatch goes through `forge orchestrate second-opinion` (landed in earlier commit). Old skills/codex/SKILL.md becomes a description- only deprecation alias, removed in v0.5.0. Decision: /review trigger added → invokes /second-opinion review-impl when CRITICAL.md paths are touched AND review_host_cli is set (gated to respect adopters who opted out via review_host_cli: null), per user answer. Decision: deprecation alias mechanism → description-only, no chalk-warn, no per-session state, per user answer (AC #10 explicitly dropped; will be flagged in PR body). - skills/second-opinion/SKILL.md: canonical body (6-piece prompt template, CLI verb invocation, envelope shape, /ship + /review integration notes). - skills/codex/SKILL.md: ~15-line DEPRECATED redirect to /second-opinion. - skills/ship/SKILL.md: §Gates line 7 + auto-codex hint text → /second-opinion. - skills/plan-task/SKILL.md: auto-codex hint text → /second-opinion. - skills/review/SKILL.md: new step 5 — CRITICAL.md hit + review_host_cli!==null → invoke /second-opinion review-impl. Findings folded into severity bucketing. - src/cli/codex-suggest.ts: suggestion template `/codex` → `/second-opinion`. CLI verb name `forge codex-suggest`, settings field `auto_codex_enabled`, env var FORGE_AUTO_CODEX intentionally kept (rename deferred — user decision; tracked as v0.5 follow-up in plan §7). - test/unit/cli/codex-suggest.test.ts: 7 assertion strings updated. - test/unit/skills/second-opinion.contract.test.ts: 12 contract tests guarding file presence, frontmatter, deprecation alias shape, and cross- skill references. All 27 tests pass; typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…P2-T21) Updates every doc and spec reference that names the user-facing skill, but leaves the underlying `codex.auto_codex_enabled` settings field, `forge codex-suggest` CLI verb, and `FORGE_AUTO_CODEX` env var unchanged per the Step 5 decision (those rename in v0.5 to avoid a settings-schema migration). - CLAUDE.md: critical-paths comment. - CRITICAL.md: header comments (auto-review trigger doc string). - README.md: TASK loop diagram /codex → /second-opinion. - spec/SPEC.md: - Auto-codex hooks table: three rows /codex review-* → /second-opinion review-*; example suggestion-line string updated; added note that the CLI verb / settings field / env var keep their CODEX-suffixed names in v0.4 per FORGE-89 plan §7. - §824: ADR-review optional consult /codex → /second-opinion. - §1041-1047: §Multi-host review fully rewritten — generalized prose, names /second-opinion as canonical interactive surface, calls out Claude exclusion + cursor drop, points at FORGE-89 / P2-T21. - §1062: FORGE_AUTO_CODEX env doc updated to mention the v0.5 rename. - spec/PRD.md: 4 references updated (auto-suggest comment, ADR draft consult, ADR review-decision step, ephemeral-ADR overview). - spec/CONTEXT.md: drift-workflow paragraph. spec/BRIEF.md mention is intentionally retained (historical baseline doc describing the v0.3.x skill surface, where /codex was the actual name). spec/ORCHESTRATOR.md mentions are intentionally retained — they refer to Codex-the-vendor-CLI (Codex CLI, Codex review #N, Codex subagent, Codex sandbox), not the abstract reviewer role. Build smoke: dist/bin/forge.cjs orchestrate help lists `second-opinion` verb. All 1302 tests pass; typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…2-T21) Code-reviewer and security-auditor passes surfaced 7 improvements; all fixed in this commit before /ship. CODE REVIEW: - second-opinion.test.ts fakeHarness: drop bogus `summary` field, add required `version: 1` so the verdict matches ReviewVerdictSchema without needing the `as ReviewVerdict` escape cast. - second-opinion.test.ts: add SPAWN_FAILED + NON_ZERO_EXIT retriable assertions to lock down the verb's HarnessError → envelope mapping. - Strengthen the existing TIMEOUT test to also assert retriable: true. - orchestrate-second-opinion.test.ts (integration): kill the dead ternary (`'claude' : 'claude'`) — both branches were identical; just pin to claude. - orchestrate-second-opinion.test.ts (integration): replace the misleading "Reset stdout for the second variant" comment with an honest explanation of how the second captureStdout(t) call rewires the monkey-patch and why both t.after restores are idempotent. - skills/review/SKILL.md: clarify that /second-opinion is a Claude Code slash command, not a Bash subprocess (prevents future skill authors from writing Bash(/second-opinion ...) which would fail silently). SECURITY AUDIT: - skills/second-opinion/SKILL.md: switch the documented temp-file pattern from predictable /tmp/forge-${TASK}-*.txt (mode 0644, world-readable) to `mktemp -t ...` (mode 0600, unguessable name). Add explicit cleanup step (rm -f) so committed-secret or local-path content doesn't sit in /tmp. - second-opinion.ts: add MAX_DIFF_BYTES (1MB) + MAX_PROMPT_BYTES (200KB) caps via statSync before readFileSync, returning DIFF_TOO_LARGE / PROMPT_TOO_LARGE envelopes. Stops a 1GB diff from being pulled into memory just to be rejected downstream. Precedent: MAX_STDOUT_BYTES in src/harnesses/subprocess.ts. Two new unit tests cover the caps. All 1306 tests pass; typecheck clean; build smoke shows the verb registered. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Brings in FORGE-22 (gc reconciler + retry policy, PR #203) and FORGE-89 (/codex → /second-opinion rename, PR #205). Resolved conflicts in spec/CONTEXT.md + spec/PRD.md by combining our structural admonitions (deferred-to-v0.5 boxes, Decision 10/11 strikethrough+redirect, settings.yaml decisions-block annotation) with their /codex → /second-opinion renames in the preserved v0.5 prose. No semantic loss either side; both hygiene threads compose cleanly.

firatcand and others added 6 commits May 20, 2026 21:49

Merge branch 'main' into feat/FORGE-89

92534a0

firatcand merged commit 6a8ab7d into main May 20, 2026
10 checks passed

firatcand deleted the feat/FORGE-89 branch May 20, 2026 22:09

firatcand mentioned this pull request May 20, 2026

[FOLLOWUP] Rename codex-suggest CLI verb / auto_codex_enabled / FORGE_AUTO_CODEX to match /second-opinion skill (v0.5) #211

Open

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FORGE-89] feat(skills,orchestrate): rename /codex → /second-opinion; dispatch via IHarness adapter (P2-T21)#205

[FORGE-89] feat(skills,orchestrate): rename /codex → /second-opinion; dispatch via IHarness adapter (P2-T21)#205
firatcand merged 6 commits into
mainfrom
feat/FORGE-89

firatcand commented May 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

firatcand commented May 20, 2026

Summary

Why

Decisions resolved in /plan-task (4 forks)

AC coverage

/review findings addressed pre-PR

Out of scope — v0.5 follow-up to file

How to test

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant