Skip to content

[FORGE-89] feat(skills,orchestrate): rename /codex → /second-opinion; dispatch via IHarness adapter (P2-T21)#205

Merged
firatcand merged 6 commits into
mainfrom
feat/FORGE-89
May 20, 2026
Merged

[FORGE-89] feat(skills,orchestrate): rename /codex → /second-opinion; dispatch via IHarness adapter (P2-T21)#205
firatcand merged 6 commits into
mainfrom
feat/FORGE-89

Conversation

@firatcand
Copy link
Copy Markdown
Owner

Summary

Renames the /codex skill to /second-opinion and routes dispatch through a new forge orchestrate second-opinion CLI verb that wraps IHarness.runReview, so settings.agents.review_host_cli (codex vs gemini) is honored at runtime. The user-facing surface becomes host-agnostic; /codex lives one release as a description-only deprecation alias removed in v0.5.0.

  • New CLI verb forge orchestrate second-opinion — sole boundary that knows about review_host_cli. Reads diff + prompt from files, calls createHarness(host).runReview(diff, prompt, opts), emits a ReviewVerdict envelope.
  • New canonical skill skills/second-opinion/SKILL.md — six-piece prompt template, calls the CLI verb, documents the envelope shape and error codes.
  • Old skill skills/codex/SKILL.md rewritten as a ~25-line DEPRECATED alias (description-only, no chalk-warn, no per-session marker — see "AC deviation" below).
  • /review gains a step 5 — when CRITICAL.md paths are touched AND review_host_cli !== null, the in-process review suggests /second-opinion review-impl for out-of-process, different-model-lineage verification.
  • Docs/spec sweep/codex/second-opinion across CLAUDE.md, README.md, CRITICAL.md, spec/{SPEC,PRD,CONTEXT}.md; new SPEC §1041 §Multi-host review rewrite naming /second-opinion as canonical surface; forge codex-suggest CLI verb / auto_codex_enabled settings field / FORGE_AUTO_CODEX env var intentionally kept (rename deferred to v0.5 — see "Out of scope" below).

Why

With FORGE-88 (IHarness adapter) landed in #187, the /codex skill no longer needs to hardcode codex exec. The host-agnostic rename + adapter dispatch is the closing piece of the P2-T20–T21 sequence — it makes review_host_cli=gemini actually do something at the interactive surface, not just in orchestrator REVIEW phase. Mirrors the established P2-T10 (/push-to-linear/push-to-tracker) rename pattern.

Decisions resolved in /plan-task (4 forks)

# Question Choice Notes
1 Where does dispatch live? CLI verb wrapping createHarness(host).runReview Honors CLAUDE.md skill↔verb contract. Verb adds ~244 LOC + tests.
2 Rename forge codex-suggest / auto_codex_enabled / FORGE_AUTO_CODEX? Defer — only update the printed suggestion string Avoids settings-schema migration + adopter .forge/settings.yaml rewrite. Filed as v0.5 follow-up.
3 Add /review/second-opinion trigger on CRITICAL.md hits? Yes, gated on review_host_cli !== null Literal AC reading + ETHOS principle 6.
4 Deprecation alias mechanism? Description-only, no chalk-warn, no per-session state Skill bodies aren't a stable runtime for stateful warning logic. AC #10 (per-session suppression) intentionally dropped — see below.

AC coverage

AC Status Notes
1. skills/second-opinion/SKILL.md exists; names Codex + Gemini Frontmatter description mentions both reviewers.
2. Skill body invokes IHarness.runReview via settings Via forge orchestrate second-opinion (skill→verb contract).
3. skills/codex/SKILL.md kept as deprecation alias Description-only per decision #4.
4. /ship CRITICAL trigger → /second-opinion §Gates.7 + auto-codex suggestion text updated.
5. /review CRITICAL trigger → /second-opinion New step 5; gated on review_host_cli !== null so adopters who opted out aren't forced.
6. README + CLAUDE.md canonical TASK loop diagram + critical-paths comment updated.
7. spec/ORCHESTRATOR.md generalized ⚠ — no changes All "Codex" refs there are to the actual vendor CLI (Codex CLI, Codex review #N, Codex subagent, Codex sandbox), not the abstract reviewer role. Generalizing would either be a no-op or wrong. SPEC §Multi-host review WAS rewritten — that's the place where the abstract role lived.
8. E2E codex variant + gemini variant equivalent verdict test/integration/cli/orchestrate-second-opinion.test.ts — full pipeline (settings → createHarness → real adapter → parseHarnessVerdict → envelope) with mocked spawnSubprocess.
9. Load-time validation for invalid/unset review_host_cli Envelope codes: SETTINGS_NOT_FOUND, REVIEW_DISABLED, INVALID_SETTINGS.
10. Per-session deprecation warning on first /codex invocation ✗ — intentionally dropped Skill bodies are markdown, not a runtime — per-session state requires a marker file (race-y across multi-window sessions) + ~20 lines of bash for one removed-in-one-release polish. Decision #4 in /plan-task: description-only alias, warn-every-time. Track as a follow-up only if a user complains.

/review findings addressed pre-PR

Code-reviewer + security-auditor caught 7 improvements; all fixed in commit 3834c72 before this PR:

  • MAX_DIFF_BYTES = 1MB + MAX_PROMPT_BYTES = 200KB caps via statSync before readFileSync (avoids pulling a 1GB file into memory to be rejected). Two new unit tests.
  • Skill body switched to mktemp -t (mode 0600, unguessable names) + explicit rm -f cleanup — eliminates /tmp/forge-${TASK}-*.txt world-readable race.
  • fakeHarness test fixture now matches ReviewVerdictSchema (drops bogus summary, adds required version: 1).
  • SPAWN_FAILED + NON_ZERO_EXIT + TIMEOUT retriable-bit assertions locked down.
  • Integration-test dead ternary ('claude' : 'claude') replaced + misleading "Reset stdout" comment rewritten.
  • /review SKILL.md line clarifies /second-opinion is a slash command, not a Bash binary.

Security audit also confirmed: no argv injection (positional argv, no shell interpolation), no settings-tampering escalation (gemini-experimental gate enforced at harness ctor), no env-var-driven settings discovery redirect (gitleaks: 0 leaks across 5 commits, 56KB scanned).

Out of scope — v0.5 follow-up to file

Rename forge codex-suggest CLI verb → forge second-opinion-suggest (or similar). Rename codex.auto_codex_enabled settings field → auto_second_opinion_enabled. Rename FORGE_AUTO_CODEX env var → FORGE_AUTO_SECOND_OPINION. Settings-schema migration + adopter .forge/settings.yaml rewrite. Decision #2 in /plan-task; should be a new FORGE ticket before v0.5 cuts.

How to test

  • npm test — 1306 pass / 0 fail / 19 skipped (was 1302 on main).
  • npm run typecheck — clean.
  • npm run buildnode dist/bin/forge.cjs orchestrate lists second-opinion verb in mutating band.
  • node --test --import tsx 'test/integration/cli/orchestrate-second-opinion.test.ts' — AC [P1-T04] Build core/settings.ts loader + hot-reload #8 gate (codex/gemini equivalence).
  • gitleaks detect --log-opts=main..HEAD — 0 leaks.
  • Manual smoke: in a worktree with agents.review_host_cli: codex (no FORGE_GEMINI_EXPERIMENTAL), forge orchestrate second-opinion --task FORGE-89 --diff <path> --prompt <path> --json returns a ReviewVerdict envelope. With review_host_cli: null returns REVIEW_DISABLED.

Closes #121 (FORGE-89).

🤖 Generated with Claude Code

firatcand and others added 6 commits May 20, 2026 21:49
… IHarness.runReview (P2-T21)

New `forge orchestrate second-opinion` verb is the sole boundary that knows
about settings.agents.review_host_cli. Reads diff + prompt from files, calls
createHarness(host).runReview, emits a ReviewVerdict JSON envelope. The skill
body (renamed in a later step) becomes host-agnostic.

Decision: dispatch shape → CLI verb wrapping createHarness per user answer
(skill↔verb contract — skills never own state-machine logic).
Decision: CLI/env rename → defer per user answer (only printed strings change).

- src/schemas/cli-args.ts: SecondOpinionArgsSchema (taskId, diffPath,
  promptPath, cwd?, timeoutMs?, forgeDir, json).
- src/cli/orchestrate/second-opinion.ts: runOrchestrateSecondOpinion +
  secondOpinionHandler. Tests inject SecondOpinionHarnessFactory to avoid
  spawning real codex/gemini.
- src/cli/orchestrate/index.ts: register handler in VERBS map + HELP_ORDER.
- test/unit/cli/orchestrate/second-opinion.test.ts: 11 tests covering arg
  validation, missing settings/files, REVIEW_DISABLED, codex + gemini happy
  paths, HarnessError pass-through (BINARY_NOT_FOUND, TIMEOUT retriable,
  REVIEW_FAILED for non-HarnessError).

All 11 tests pass; typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nion verb (AC #8)

Integration test exercising the full pipeline: settings load → createHarness
→ runReview (real CodexHarness/GeminiHarness) → parseHarnessVerdict (real
fenced-JSON path) → envelope emission. Only spawnSubprocess is mocked,
returning a fenced ReviewVerdict in stdout.

Asserts that the two host variants produce envelopes that differ ONLY in
host stamping; verdict outcome, version, and findings shape are equivalent.
This is the AC #8 gate (P2-T21 plan §6).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lers (P2-T21)

New skill skills/second-opinion/SKILL.md is the canonical, host-agnostic
interactive surface; dispatch goes through `forge orchestrate second-opinion`
(landed in earlier commit). Old skills/codex/SKILL.md becomes a description-
only deprecation alias, removed in v0.5.0.

Decision: /review trigger added → invokes /second-opinion review-impl when
CRITICAL.md paths are touched AND review_host_cli is set (gated to respect
adopters who opted out via review_host_cli: null), per user answer.
Decision: deprecation alias mechanism → description-only, no chalk-warn, no
per-session state, per user answer (AC #10 explicitly dropped; will be
flagged in PR body).

- skills/second-opinion/SKILL.md: canonical body (6-piece prompt template,
  CLI verb invocation, envelope shape, /ship + /review integration notes).
- skills/codex/SKILL.md: ~15-line DEPRECATED redirect to /second-opinion.
- skills/ship/SKILL.md: §Gates line 7 + auto-codex hint text → /second-opinion.
- skills/plan-task/SKILL.md: auto-codex hint text → /second-opinion.
- skills/review/SKILL.md: new step 5 — CRITICAL.md hit + review_host_cli!==null
  → invoke /second-opinion review-impl. Findings folded into severity bucketing.
- src/cli/codex-suggest.ts: suggestion template `/codex` → `/second-opinion`.
  CLI verb name `forge codex-suggest`, settings field `auto_codex_enabled`,
  env var FORGE_AUTO_CODEX intentionally kept (rename deferred — user
  decision; tracked as v0.5 follow-up in plan §7).
- test/unit/cli/codex-suggest.test.ts: 7 assertion strings updated.
- test/unit/skills/second-opinion.contract.test.ts: 12 contract tests
  guarding file presence, frontmatter, deprecation alias shape, and cross-
  skill references.

All 27 tests pass; typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…P2-T21)

Updates every doc and spec reference that names the user-facing skill, but
leaves the underlying `codex.auto_codex_enabled` settings field, `forge
codex-suggest` CLI verb, and `FORGE_AUTO_CODEX` env var unchanged per the
Step 5 decision (those rename in v0.5 to avoid a settings-schema migration).

- CLAUDE.md: critical-paths comment.
- CRITICAL.md: header comments (auto-review trigger doc string).
- README.md: TASK loop diagram /codex → /second-opinion.
- spec/SPEC.md:
  - Auto-codex hooks table: three rows /codex review-* → /second-opinion
    review-*; example suggestion-line string updated; added note that the
    CLI verb / settings field / env var keep their CODEX-suffixed names in
    v0.4 per FORGE-89 plan §7.
  - §824: ADR-review optional consult /codex → /second-opinion.
  - §1041-1047: §Multi-host review fully rewritten — generalized prose,
    names /second-opinion as canonical interactive surface, calls out
    Claude exclusion + cursor drop, points at FORGE-89 / P2-T21.
  - §1062: FORGE_AUTO_CODEX env doc updated to mention the v0.5 rename.
- spec/PRD.md: 4 references updated (auto-suggest comment, ADR draft
  consult, ADR review-decision step, ephemeral-ADR overview).
- spec/CONTEXT.md: drift-workflow paragraph.

spec/BRIEF.md mention is intentionally retained (historical baseline doc
describing the v0.3.x skill surface, where /codex was the actual name).

spec/ORCHESTRATOR.md mentions are intentionally retained — they refer to
Codex-the-vendor-CLI (Codex CLI, Codex review #N, Codex subagent, Codex
sandbox), not the abstract reviewer role.

Build smoke: dist/bin/forge.cjs orchestrate help lists `second-opinion`
verb. All 1302 tests pass; typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…2-T21)

Code-reviewer and security-auditor passes surfaced 7 improvements; all fixed
in this commit before /ship.

CODE REVIEW:
- second-opinion.test.ts fakeHarness: drop bogus `summary` field, add
  required `version: 1` so the verdict matches ReviewVerdictSchema without
  needing the `as ReviewVerdict` escape cast.
- second-opinion.test.ts: add SPAWN_FAILED + NON_ZERO_EXIT retriable
  assertions to lock down the verb's HarnessError → envelope mapping.
- Strengthen the existing TIMEOUT test to also assert retriable: true.
- orchestrate-second-opinion.test.ts (integration): kill the dead ternary
  (`'claude' : 'claude'`) — both branches were identical; just pin to claude.
- orchestrate-second-opinion.test.ts (integration): replace the misleading
  "Reset stdout for the second variant" comment with an honest explanation
  of how the second captureStdout(t) call rewires the monkey-patch and
  why both t.after restores are idempotent.
- skills/review/SKILL.md: clarify that /second-opinion is a Claude Code
  slash command, not a Bash subprocess (prevents future skill authors from
  writing Bash(/second-opinion ...) which would fail silently).

SECURITY AUDIT:
- skills/second-opinion/SKILL.md: switch the documented temp-file pattern
  from predictable /tmp/forge-${TASK}-*.txt (mode 0644, world-readable) to
  `mktemp -t ...` (mode 0600, unguessable name). Add explicit cleanup step
  (rm -f) so committed-secret or local-path content doesn't sit in /tmp.
- second-opinion.ts: add MAX_DIFF_BYTES (1MB) + MAX_PROMPT_BYTES (200KB)
  caps via statSync before readFileSync, returning DIFF_TOO_LARGE /
  PROMPT_TOO_LARGE envelopes. Stops a 1GB diff from being pulled into
  memory just to be rejected downstream. Precedent: MAX_STDOUT_BYTES in
  src/harnesses/subprocess.ts. Two new unit tests cover the caps.

All 1306 tests pass; typecheck clean; build smoke shows the verb registered.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@firatcand firatcand merged commit 6a8ab7d into main May 20, 2026
10 checks passed
@firatcand firatcand deleted the feat/FORGE-89 branch May 20, 2026 22:09
firatcand added a commit that referenced this pull request May 20, 2026
Brings in FORGE-22 (gc reconciler + retry policy, PR #203) and FORGE-89
(/codex → /second-opinion rename, PR #205).

Resolved conflicts in spec/CONTEXT.md + spec/PRD.md by combining our
structural admonitions (deferred-to-v0.5 boxes, Decision 10/11
strikethrough+redirect, settings.yaml decisions-block annotation) with
their /codex → /second-opinion renames in the preserved v0.5 prose.
No semantic loss either side; both hygiene threads compose cleanly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[P2-T21] Rename /codex skill → /second-opinion; dispatch via review adapter

1 participant