
feat(state+rules): rule 22 (consumer-first + RED-first TDD) + parentTaskId fan-out cost rollup + 09-byterover-cli comparison #27

Merged. omerakben merged 8 commits into main from feat/byterover-09-borrows on May 11, 2026

@omerakben
Owner

Summary

Closes the byterover-cli template comparison (session 09) and lands the two consolidated borrows on feat/byterover-09-borrows:

  • Rule 22 (non-negotiable) — consumer-first design (Outside-In) + RED-first TDD ordering for behavior changes. Consolidates byterover-cli B1 + B4 into one structural rule to avoid rule-list bloat (Codex fix-first thread 019e12ec). The detailed RED-first 5-step sequence lives in src/agents/defaults/builder.md for execution; rule 22 is the CLAUDE.md non-negotiable.
  • parentTaskId fan-out cost rollup (B3 hotfix) — threads parentTaskId through state/events.jsonl cost-recorded events so reviewer-panel + debate fan-outs roll up correctly against budgets.global. Repaired the parent-task pairing logic per Codex round 3 closure.
  • 09-byterover-cli comparison artifacts: COMPARISON.md (21-row feature matrix), CODEX_BRIEFING.md, CODEX_RESPONSE.md (8 findings, fix-first verdict), CODEX_PREDESIGN_B3.md (pre-implementation design memo, thread 019e1318), CODEX_FINAL_REVIEW.md (round 3 closure), SYNTHESIS.md (closed; 7 decision points resolved).

Commits

  • e672e9b feat(state,cost,phases): thread parentTaskId for fan-out cost rollup (09-byterover-cli B3)
  • 57a1456 docs(rules): add rule 22 (consumer-first + RED-first TDD); refresh status; influence library
  • aae1e7b docs(builder): RED-first 5-step ordering detail (rule 22(b))
  • a0d377e docs(comparison): add 09-byterover-cli folder + close decision points
  • fcd4bfb fix(cost): repair parentTaskId rollup pairing (Codex round 3 closure)
  • 11c08fd docs(comparison): add Codex final review (Codex round 3 closure)
  • f2a17bf merge(main): integrate 11 merged PRs + resolve CLAUDE.md/README/ROADMAP conflicts

Merge resolution

main advanced by 12 commits (11 merged PRs) while this branch was in flight. Conflicts resolved:

  • CLAUDE.md — combined HEAD's M16/byterover-cli status line with main's 3108 baseline + R0/R1/R2 closure language; kept main's rule 1 intervention-writer authority expansion and rule 16 persona-generation paragraph (both from PR #20 mimir, "docs(mimir): land doc-only borrow subset (B4/B5 + C-MIMIR-1..4)"), and the rule 21 / universal-rules expansion (PR #15 agent-skills round-2, "docs(borrow): agent-skills round-2 — schema-aware citation, trust classification, TDD-as-doubt"); appended this branch's rule 22 at the end. Decisions-live list keeps both bullets (this branch's comparison-folder bullet + main's 05-agent-skills/synthesis bullet).
  • docs/comparison/README.md — auto-merged additions from main's session rows (01-ace, 02-agenticSeek, 07-maestro); added a new 09 | byterover-cli row; removed byterover-cli from the Unaudited backlog list.
  • src/agents/defaults/builder.md — auto-merged. Rule 22(b) RED-first content and main's rule-9 enforcement layer both preserved.
  • tests/build-prompt-composer.test.ts — raised builder.md body upper-bound cap from 6000 to 7000 chars (stale assertion: rule 22(b) RED-first detail + rule-9 enforcement layer additions are intentional content, not regressions).

Test plan

  • bun test: 3198 pass, 2 skip (live xAI gated), 0 fail (3200 across 205 files, 17.84s)
  • No conflict markers remain in any tracked file
  • CLAUDE.md still imports all 22 rules (rule 22 last, rule numbering intact)
  • docs/comparison/README.md lists session 09 with closed verdict
  • src/agents/defaults/builder.md retains rule 22(b) reference at line 69
  • tests/state-events-parent-task-id.test.ts present and passing

omerakben added 7 commits May 10, 2026 14:28
…(09-byterover-cli B3)

Optional parentTaskId on agent_invoked/agent_completed correlates per-call
cost rows back to the parent orchestrator step (REVIEW panel run, REVIEW
debate fire path). Closes the M14/M15 telemetry gap described in
docs/comparison/09-byterover-cli/SYNTHESIS.md and CODEX_RESPONSE F3.

Schema-compatible: existing readers parse new events identically (M10/M12/M13
forward-compat precedent); validator enforces canonical T-NNN pattern when
present; FIFO-by-phase budget pairing untouched. Telemetry-only — no rule-21
parallel-provider surface, no new authority boundary.
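
The "validate only when present" contract described above can be sketched in a few lines. This is an illustration under stated assumptions, not the code in src/state/events.ts: the `AgentEvent` shape and the `validateParentTaskId` helper are hypothetical, and the regex assumes the canonical id is `T-` followed by three or more digits.

```typescript
// Hypothetical event shape; the real schema lives in src/state/schemas.ts.
type AgentEvent = {
  type: "agent_invoked" | "agent_completed";
  phase: string;
  parentTaskId?: string; // optional, so events written before this field stay valid
};

// Assumed canonical task id pattern: "T-" followed by three or more digits.
const TASK_ID_PATTERN = /^T-\d{3,}$/;

// Returns a list of validation errors; an empty list means the event passes.
function validateParentTaskId(event: AgentEvent): string[] {
  if (event.parentTaskId === undefined) return []; // absent is always acceptable
  return TASK_ID_PATTERN.test(event.parentTaskId)
    ? []
    : [`parentTaskId "${event.parentTaskId}" does not match T-NNN`];
}
```

Treating absence as valid is what keeps the change schema-compatible: only events that opt in to the field are held to the pattern.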

- src/providers/types.ts: ProviderRequest.parentTaskId (optional)
- src/state/schemas.ts: agent_invoked + agent_completed variants extended
- src/state/events.ts: validator (T-NNN regex when present)
- src/providers/invoke.ts: wrapper writes through on both events
- src/providers/cost.ts: summarizeByParentTask() — separate report,
  not folded into summarizeBudgetUse to keep budget pairing simple
- src/tools/debate-request.ts: DebateRequestInput.parentTaskId, threaded
  onto opposing + synthesis turns
- src/phases/review.ts: two requestDebate sites set parentTaskId from opts.taskId
- src/phases/review-panel.ts: PanelistInvoker takes optional ctx arg;
  runReviewPanel passes { parentTaskId: opts.upstreamRefs.taskId }
- src/cli/production-seams.ts: productionPanelistInvoker reads ctx.parentTaskId
  and stamps it on the constituent ProviderRequest
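
To make the "separate report" idea concrete, here is a minimal sketch of a per-parent rollup. It is not the actual `summarizeByParentTask()` in src/providers/cost.ts (which, per the tests, also involves FIFO pairing); the `CompletedEvent` fields, the `rollupByParentTask` name, and the `"unattributed"` bucket are assumptions made for illustration.

```typescript
// Hypothetical, simplified cost row; field names are assumptions.
type CompletedEvent = {
  type: "agent_completed";
  agent: string;
  costUsd: number;
  parentTaskId?: string;
};

type ParentRollup = { calls: number; totalUsd: number };

// Groups completed-call cost by parentTaskId. Calls without a parent land in
// an "unattributed" bucket rather than being dropped from the report.
function rollupByParentTask(events: CompletedEvent[]): Map<string, ParentRollup> {
  const out = new Map<string, ParentRollup>();
  for (const ev of events) {
    const key = ev.parentTaskId ?? "unattributed";
    const cur = out.get(key) ?? { calls: 0, totalUsd: 0 };
    out.set(key, { calls: cur.calls + 1, totalUsd: cur.totalUsd + ev.costUsd });
  }
  return out;
}

// Two panelist calls fan out from task T-007; one builder call has no parent.
const costReport = rollupByParentTask([
  { type: "agent_completed", agent: "panelist-a", costUsd: 0.25, parentTaskId: "T-007" },
  { type: "agent_completed", agent: "panelist-b", costUsd: 0.5, parentTaskId: "T-007" },
  { type: "agent_completed", agent: "builder", costUsd: 1 },
]);
```

Keeping this as its own report, instead of folding it into the budget summary, means the existing budget pairing does not need to know about parents at all.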

Codex pre-design memo: docs/comparison/09-byterover-cli/CODEX_PREDESIGN_B3.md
(thread 019e1318, verdict revise-and-implement; corrections applied).

Tests: +20 passing (3108 → 3128).

…atus; influence library

Adopts 09-byterover-cli borrows B1 (Outside-In feature design) and B4
(strict TDD ordering for behavior changes) as a single consolidated
non-negotiable rule per Codex fix-first F5 (avoid rule-list bloat from
21 → 23). The detailed RED-first sequence lives in
src/agents/defaults/builder.md; rule 22 is the structural non-negotiable.

Bundles the agent-skills round 2 carry-over: refresh stale Status line
from v0.13.0-alpha.0 / 1983 tests / PE-1 to v0.17.0-alpha.0 / 3128 tests
(M16 closed; M13/M14/M15/PE-1 closed).

Adds byterover-cli row to the influence library (consumer-first design +
RED-first TDD; parentTaskId fan-out cost rollup).

Adds docs/comparison/ pointer to "Where decisions live."

See docs/comparison/09-byterover-cli/SYNTHESIS.md commit-2 plan.
Codex thread 019e12ec (fix-first → all 8 findings closed in synthesis).

Adds executable detail for the strict-TDD ordering non-negotiable just
landed in CLAUDE.md rule 22(b). Five steps: write the failing test
first, run it to confirm it fails for the right reason, write the
minimal implementation, run it to confirm green, refactor only if green
stays green. Bug-fix tasks must name the reproduction test in `## Notes`.
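
The five-step ordering can be walked through on a toy behavior. The sketch below is illustrative only: `slugify` and the hand-rolled check are hypothetical, and in the repository the same sequence would presumably run as a `bun test` file rather than inline calls.

```typescript
// Step 1 (RED): the test is written before the behavior exists.
// It takes the function under test as an argument so it can run against a stub.
function testSlugifyCollapsesWhitespace(fn: (s: string) => string): boolean {
  return fn("  Hello   World ") === "hello-world";
}

// Step 2: run it against the current behavior (a pass-through stub here) and
// confirm it fails because the behavior is missing, not because of a typo.
const passThroughStub = (s: string): string => s;
const redResult = testSlugifyCollapsesWhitespace(passThroughStub);

// Step 3 (GREEN): write the minimal implementation that satisfies the test.
function slugify(s: string): string {
  return s.trim().toLowerCase().split(/\s+/).join("-");
}

// Step 4: run the same test again and confirm it now passes.
const greenResult = testSlugifyCollapsesWhitespace(slugify);

// Step 5: refactor only while the test from step 1 keeps passing.
console.log({ redResult, greenResult });
```

The point of step 2 is that a test which has never been seen failing gives no evidence that it exercises the new behavior.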

Bundles M8 mutation-gated discipline cross-reference so behavior changes
in mutation-tested code automatically inherit RED-first; rule 22(b)
covers the prompt-level intent for the rest.

Closes 09-byterover-cli SYNTHESIS commit-3 plan; closes the agent-skills
round-2 carry-over for builder validation language.

Self-contained comparison artifacts for the byterover-cli template
(memory-layer CLI vs SDLC-runtime category). Records:

- COMPARISON.md — 21-row feature matrix; 6 borrows priced in rule-20
  sub-surfaces; 10 explicit rejects with reasons.
- CODEX_BRIEFING.md — debate brief; locked answers (do not relitigate);
  recommended landing plan; 8 specific debate prompts.
- CODEX_RESPONSE.md — Codex fix-first verdict, thread 019e12ec; 8 findings
  (1 block-push: B2 invented `code-oz consult` surface; 2 block-next-
  milestone: B2 under-priced + B3 hotfix-not-followup; 2 fix-soon; 3 fyi).
- CODEX_PREDESIGN_B3.md — Codex pre-implementation design memo, thread
  019e1318; revise-and-implement; corrected file map for the patch.
- SYNTHESIS.md — closed; 7 decision points resolved by Claude under
  Ozzy's autonomy grant; 4-commit landing plan shipped on this branch.

Verdict: code-oz exceeds byterover-cli, scoped to SDLC discipline
mechanics. byterover ships more product-mature memory-layer engineering
(daemon, REPL, web UI, MCP, 21 providers, public benchmarks); code-oz
operates in a different category and structurally exceeds on 12
discipline authorities. Three borrows earn their place at v0.17:

  - B3 pre-M17 telemetry hotfix (Commit 1)
  - Rule 22 consolidating B1+B4 (Commit 2)
  - Builder RED-first detail referencing rule 22(b) (Commit 3)

Three deferred (B2 reframed against tool_use.repo_context as M17/M18
contender; B5 AsyncLocalStorage pattern; B6 ESLint boundary). One reject
reclassified (R10 defer-with-high-bar after SHIP).

This PR is self-contained and does not modify docs/comparison/README.md
to avoid colliding with parallel template-comparison sessions; the
README index entry will land via a separate sync commit on main.

…AP conflicts

CLAUDE.md: combined HEAD's M16/byterover-cli status line with main's 3108 baseline + M16 R0/R1/R2 closure language; kept rule 1 intervention-writer authority expansion, rule 16 persona-generation paragraph (from PR #20 mimir), rule 22 (this branch). Decisions-live list keeps both bullets.

docs/comparison/README.md: kept all existing 01/02/07 session rows; added 09 | byterover-cli row; removed byterover-cli from the Unaudited backlog.

tests/build-prompt-composer.test.ts: raised builder.md body upper-bound cap from 6000 to 7000 (rule 22(b) RED-first detail + rule-9 enforcement layer additions are intentional, not regressions).

3198 tests pass, 2 skip (live xAI gated), 0 fail.
Copilot AI review requested due to automatic review settings May 11, 2026 02:55
@coderabbitai
coderabbitai Bot commented May 11, 2026

Warning

Rate limit exceeded

@omerakben has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 35 minutes and 10 seconds before requesting another review.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3c4db9c3-0213-4b5c-b096-c6a6dd172607

📥 Commits

Reviewing files that changed from the base of the PR and between bf5dc5a and aa168ca.

📒 Files selected for processing (22)
  • CLAUDE.md
  • docs/comparison/09-byterover-cli/CODEX_BRIEFING.md
  • docs/comparison/09-byterover-cli/CODEX_FINAL_REVIEW.md
  • docs/comparison/09-byterover-cli/CODEX_PREDESIGN_B3.md
  • docs/comparison/09-byterover-cli/CODEX_RESPONSE.md
  • docs/comparison/09-byterover-cli/COMPARISON.md
  • docs/comparison/09-byterover-cli/SYNTHESIS.md
  • docs/comparison/README.md
  • src/agents/defaults/builder.md
  • src/cli/production-seams.ts
  • src/phases/review-panel.ts
  • src/phases/review.ts
  • src/providers/cost.ts
  • src/providers/invoke.ts
  • src/providers/types.ts
  • src/state/events.ts
  • src/state/schemas.ts
  • src/tools/debate-request.ts
  • tests/build-prompt-composer.test.ts
  • tests/cost-by-parent-task.test.ts
  • tests/provider-invoke-parent-task-id.test.ts
  • tests/state-events-parent-task-id.test.ts

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This PR introduces parentTaskId threading for cost attribution across reviewer panels and debates, alongside new project rules for consumer-first design and RED-first TDD. Feedback recommends renaming a parameter in production-seams.ts to prevent confusion with the InvokeContext type.

  opts: ProductionPanelistInvokerOptions,
): import('../phases/review-panel.ts').PanelistInvoker {
-  return async (cfg, round) => {
+  return async (cfg, round, invokeCtx) => {

medium

The third parameter is named invokeCtx, which shadows the conceptual meaning of InvokeContext used in opts.invokeCtx. In this codebase, InvokeContext is a specific type containing the registry, config, and other runtime dependencies. Naming this parameter ctx or panelCtx would avoid confusion with the orchestrator's invocation context.

Suggested change
-  return async (cfg, round, invokeCtx) => {
+  return async (cfg, round, ctx) => {

Comment on lines +570 to +572
...(invokeCtx?.parentTaskId !== undefined
? { parentTaskId: invokeCtx.parentTaskId }
: {}),

medium

Update the reference to the renamed parameter to maintain consistency and avoid shadowing confusion.

Suggested change
-  ...(invokeCtx?.parentTaskId !== undefined
-    ? { parentTaskId: invokeCtx.parentTaskId }
-    : {}),
+  ...(ctx?.parentTaskId !== undefined
+    ? { parentTaskId: ctx.parentTaskId }
+    : {}),
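
For readers unfamiliar with the conditional-spread idiom in this suggestion, a minimal sketch follows. The `PanelCtx` and `SketchRequest` types and the `buildRequest` helper are hypothetical and stand in for the real seam types; the only point is that the key is omitted entirely when no parent id is supplied, rather than being written as `parentTaskId: undefined`.

```typescript
// Hypothetical stand-ins for the panel context and the provider request.
type PanelCtx = { parentTaskId?: string };
type SketchRequest = { prompt: string; parentTaskId?: string };

// Builds a request, adding parentTaskId only when the caller supplied one.
function buildRequest(prompt: string, ctx?: PanelCtx): SketchRequest {
  return {
    prompt,
    // Spreading {} adds no keys, so the property stays absent by default.
    ...(ctx?.parentTaskId !== undefined ? { parentTaskId: ctx.parentTaskId } : {}),
  };
}

const withParent = buildRequest("review this diff", { parentTaskId: "T-042" });
const withoutParent = buildRequest("review this diff");
```

Omitting the key keeps serialized events identical to their pre-change form whenever no parent is involved.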

Copilot AI left a comment

Pull request overview

Adds a parentTaskId correlation field to provider-call events so fan-out operations (review panels + debates) can be cost-attributed back to a single orchestrator task, and lands the byterover-cli session-09 comparison closure + a new consolidated rule (consumer-first + RED-first TDD).

Changes:

  • Thread parentTaskId through ProviderRequest → agent_invoked / agent_completed, validate T-\d{3,} when present, and add summarizeByParentTask() rollup reporting.
  • Propagate parentTaskId through REVIEW debate fire-paths and REVIEW panel production seam wiring.
  • Add rule 22 + builder “RED-first” execution details; add session-09 comparison artifacts and index row; adjust builder prompt-length test cap.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/state-events-parent-task-id.test.ts | Adds validator coverage for optional parentTaskId on agent events. |
| tests/provider-invoke-parent-task-id.test.ts | Ensures invokeAgent records/omits parentTaskId correctly in events.jsonl. |
| tests/cost-by-parent-task.test.ts | Adds regression/behavior tests for summarizeByParentTask FIFO pairing + rollup. |
| tests/build-prompt-composer.test.ts | Raises builder persona body upper bound to accommodate new builder content. |
| src/tools/debate-request.ts | Threads optional parentTaskId into opposing + synthesis ProviderRequests. |
| src/state/schemas.ts | Extends agent_invoked / agent_completed event variants with optional parentTaskId. |
| src/state/events.ts | Validates parentTaskId against TASK_ID_PATTERN when present. |
| src/providers/types.ts | Adds ProviderRequest.parentTaskId as an optional correlation id. |
| src/providers/invoke.ts | Writes parentTaskId into agent_invoked and mirrors it into agent_completed. |
| src/providers/cost.ts | Introduces summarizeByParentTask() rollup report keyed by parentTaskId. |
| src/phases/review.ts | Sets parentTaskId on REVIEW debate invocations (single + panel-debate branch). |
| src/phases/review-panel.ts | Extends PanelistInvoker seam to accept optional ctx with parentTaskId; passes it through. |
| src/cli/production-seams.ts | Stamps parentTaskId onto panelist ProviderRequests in production invoker when provided. |
| src/agents/defaults/builder.md | Adds executable “RED-first” test ordering steps per rule 22(b). |
| docs/comparison/README.md | Adds session 09 row and removes byterover-cli from unaudited backlog list. |
| docs/comparison/09-byterover-cli/SYNTHESIS.md | Adds/records the session closure + decisions and shipped scope. |
| docs/comparison/09-byterover-cli/COMPARISON.md | Adds byterover-cli comparison matrix and borrow/reject analysis. |
| docs/comparison/09-byterover-cli/CODEX_RESPONSE.md | Adds Codex review output for the comparison. |
| docs/comparison/09-byterover-cli/CODEX_PREDESIGN_B3.md | Adds Codex pre-design memo for the B3 hotfix. |
| docs/comparison/09-byterover-cli/CODEX_FINAL_REVIEW.md | Adds Codex final review / closure notes for the landing batch. |
| docs/comparison/09-byterover-cli/CODEX_BRIEFING.md | Adds the briefing used for Codex review. |
| CLAUDE.md | Updates status line, adds rule 22, and adds byterover-cli to influence library. |

Comment thread src/providers/invoke.ts
Comment on lines +285 to +288
// so reducer pairing keeps the parent correlation across the
// invoke/complete pair (FIFO-by-phase pairs by order, but
// explicit echo lets summarizeByParentTask join cleanly without
// assuming pairing semantics).
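
The "explicit echo" this comment describes can be sketched as a wrapper that stamps the same id on both events of a call. This is a simplified, synchronous illustration under assumptions, not the wrapper in src/providers/invoke.ts: `SketchEvent`, `invokeWithEvents`, and the `costUsd` field are hypothetical.

```typescript
// Hypothetical event union; the real variants live in src/state/schemas.ts.
type SketchEvent =
  | { type: "agent_invoked"; phase: string; parentTaskId?: string }
  | { type: "agent_completed"; phase: string; costUsd: number; parentTaskId?: string };

// Runs a provider call and appends an invoked/completed pair to `log`.
// Synchronous for brevity; a real provider call would be async.
function invokeWithEvents(
  log: SketchEvent[],
  phase: string,
  call: () => { costUsd: number },
  parentTaskId?: string,
): void {
  // Either { parentTaskId } or {}, so the key is absent when no parent is given.
  const parent: { parentTaskId?: string } =
    parentTaskId !== undefined ? { parentTaskId } : {};
  log.push({ type: "agent_invoked", phase, ...parent });
  const result = call();
  // Echo the same id on completion so a reader never infers it from ordering.
  log.push({ type: "agent_completed", phase, costUsd: result.costUsd, ...parent });
}
```

With the id present on the completion event itself, a cost report can group by it directly even if the order-based pairing rules change later.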
Comment thread docs/comparison/README.md Outdated
| 01 | ace | 2026-05-10 | YES, with selective borrows (M17-M20 Reviewer Memory sequence; see SYNTHESIS) | [01-ace/](01-ace/) |
| 02 | agenticSeek | 2026-05-10 | YES, structurally stronger on SDLC authority mechanics that overlap (not "ahead on every"); 4 borrow candidates ranked B3 (conditional on MCP-gap evidence) -> B1 (VERIFY-fail bad-plan telemetry, no plan-mutation authority) -> B4 (local-first OpenAI-compatible provider, demand-gated to PE-2) -> B2 (advisory DEFINE risk/effort hint, no `suggested_path`); substring denylist + memory-compression-as-canonical-state killed; local-first privacy upgraded from off-mission to demand-gated borrow; 3 rounds (Codex `accept-with-modifications` thread `019e12ac` -> 12 round-2 deltas, 10 distinct after merge -> round 3 both Opus and Codex independently report `converged` with 0 deltas, threads `019e131b` / `019e1323`); GPL-3.0 license noted | [02-agenticSeek/](02-agenticSeek/) |
| 07 | maestro | 2026-05-10 | YES, with selective borrows (B1 narrowed wave-verify; B2 heartbeat deferred as projection; B3 PLAN_DIFF blocked on SHIP contract; B4 separated from B5; B5 `outcome=abandoned` use-case-gated; B7 maestro bash loop rejected, `code-oz watch` deferred with contract draft); Codex `fix-first` thread `019e12ee` -- all 6 findings closed in synthesis (rule-21 misapplication corrected -> rule 20, RUN_OUTCOMES schema risk surfaced, SHIP-contract gap identified, Bun-native CI added to deferred set); maestro is the parent template -- three load-bearing rules already absorbed (rules 1/3/4) | [07-maestro/](07-maestro/) |
| 09 | byterover-cli | 2026-05-10 | YES, with selective borrows (B1+B4 consolidated into rule 22 — consumer-first + RED-first TDD; B3 `parentTaskId` fan-out cost rollup shipped on `feat/byterover-09-borrows`; B2 `code-oz consult` deferred to M17/M18 after Codex F1 caught the invented surface; B5/B6 pattern-only; R10 reclassified to defer-with-high-bar); 3 Codex rounds — `fix-first` thread `019e12ec` (8 findings) + pre-design thread `019e1318` + final review (round 3 closure); 3128 offline tests pass | [09-byterover-cli/](09-byterover-cli/) |
Comment thread CLAUDE.md Outdated
`code-oz` is a standalone Bun + TypeScript CLI that boots an adaptive multi-agent software-company simulation over a hybrid phase-graph + agentic sub-orchestration spine. Hard SDLC gates between phases (file-based, schema-validated). Cross-family adversarial review. Non-technical-user intent elicitation at the front. Multi-provider via `IAgentProvider` (Claude / Codex / Gemini SDKs reading CLI OAuth tokens).

Status: **v0.17.0-alpha.0 — M16 closed.** Production CLI completion (per-task cursor, dispatch infra, milestone-level e2e through the binary): `code-oz run`, `approve`, and `doctor` are wired end-to-end across DEFINE → REVIEW; full `resume` command remains M17. 3108 offline tests pass (+402 across M16); live xAI gated behind `CODE_OZ_LIVE_PROVIDER_TESTS=xai` + `CODE_OZ_LIVE_XAI_MODEL=<grok-variant>`. M16 R0/R1/R2 closed (8 production bugs caught by C12 e2e + 4 by Codex R1; per-commit cross-model peer review pattern validated for shared infra). Latest tag pushed: `v0.17.0-alpha.0` (2026-05-10).
Status: **v0.17.0-alpha.0 — M16 closed (production CLI completion).** Per-task-cursor CLI (init/run/dispatchBuild/dispatchVerify/dispatchReview/approve/resume/SHIP) shipped 2026-05-10 with exit-codes contract, prod-seam injection, phase-locks, and full audit-completeness recovery. 3128 offline tests pass (3108 baseline + 20 in 09-byterover-cli B3); live xAI integration test gated behind `CODE_OZ_LIVE_PROVIDER_TESTS=xai` + `CODE_OZ_LIVE_XAI_MODEL=<grok-variant>`. M16 R0/R1/R2 closed (8 production bugs caught by C12 e2e + 4 by Codex R1; per-commit cross-model peer review pattern validated for shared infra). PE-1 (xAI HTTP adapter, v0.13.0-alpha.0), M13 (role-cost policy under `budgets.global`, v0.14.0-alpha.0), M14 (reviewer panel v1, v0.15.0-alpha.0 — first simultaneous-provider surface), M15 (debate-policy scheduler v1, v0.16.0-alpha.0) all closed. PE-2 demand-gated; multi-cloud deferred to v0.2. Latest tag pushed: `v0.17.0-alpha.0` (2026-05-10).
@omerakben omerakben merged commit 1565b67 into main May 11, 2026
@omerakben omerakben deleted the feat/byterover-09-borrows branch May 30, 2026 03:16