docs(comparisons/gptme): gptme template comparison + M17-M20 candidate slots (5 Codex rounds) #22
… synthesis
Lands the four comparison-record files for the gptme template review under docs/comparisons/gptme/:
- COMPARISON.md: structural review of gptme v0.31.x (chat-loop + autocompact + checkpoint + AGENT_FILES + hooks + subagent + plugins) vs code-oz (phase-FSM + cross-family REVIEW + debate runtime + budgets.global + scientist tails + Rule 20/21 authority discipline); final decision matrix after Codex R1 fix-first.
- CODEX_BRIEFING.md: planning-convergence briefing for Codex (gpt-5.5 xhigh, sandbox: read-only): recommended verdict + locked answers + five challenge prompts.
- CODEX_RESPONSE.md: Codex R1 verdict (thread 019e12ed-4038-7fe2-8800-5520e5f2048a). Verdict: fix-first. Narrowed B1 (compaction) + B3 (AGENT_FILES); demoted B2 (checkpoint) to defer; added a new D3 (release-quality eval harness) that the briefer missed. Net: 2 narrowed-borrow / 4 defer / 5 reject.
- SYNTHESIS.md (NEW): single source-of-truth post-debate synthesis. Records the round-2 thread 019e1319-2169-7ab0-8ca7-036d6252fe60 ratifying Option A (RATIFY-ONLY: land the comparison record + roadmap slot reservations only, no implementation). Final aligned decision matrix. Explains why the borrows are not implemented in this PR (Rule 20: one authority per milestone; src/phases/audit.ts does not exist; B1 needs a contract before extending the existing tokensEstimate in the src/providers/manifest.ts:111 neighborhood). Forward-looking M17 (B3) / M18 (B1) / M19+ (B2) / M20+ (D3) candidate slots with measurement plans. Codex alignment statement quoted verbatim.
No source or test code touched. No telemetry schema changes. Only the comparison record lands here; ROADMAP.md slot reservations land in a separate commit per the CLAUDE.md cross-model peer review rule (each milestone gets its own authority boundary).
…ved borrows
Adds a "Template-comparison-derived deferred milestones" subsection under the existing milestones list, between the M16+ deferred line and the Provider-expansion track. Reserves four slots, each as a single-authority milestone per CLAUDE.md Rule 20 with a Rule-21 measurement plan:
- M17 candidate — feat(intake): cross-tool AGENT_FILES discovery + AUDIT/DEFINE opt-in (gptme borrow B3-narrowed). Discovery only; per-file user opt-in; no parent/home walk. Telemetry events: agent_files_discovered, agent_files_accepted, agent_files_rejected, agent_instruction_conflicts. Hard precondition: not before src/phases/audit.ts exists.
- M18 candidate — feat(provider): deterministic context-projection + compaction-opportunity probe (gptme borrow B1-narrowed). Telemetry-only; no LLM resume summarization; no view-branch swap; no automatic provider-context mutation. Extends ProviderContextMetrics (src/providers/manifest.ts:111 neighborhood) with context_projection_tokens, compaction_opportunity_savings_ratio, compaction_skipped_savings_ratio. The discipline rule "no phase artifact may exceed N tokens at gate write" lands first as a separate gate-preflight check. Rule-21 floor: > 0.10 ratio observable across runs.
- M19+ candidate — feat(diagnostic): worktree topology refusal modes (gptme borrow B2-deferred). Diagnostic-only kind classification (clean_run_worktree | dirty_run_worktree | no_worktree | multi_root_repo). NO destructive restore (gptme's git reset --hard is incompatible with code-oz's user-change preservation). On audit-completeness failure: classify and refuse rather than reset.
- M20+ candidate — feat(eval): release/run-quality eval harness (gptme borrow D3, Codex-flagged miss). A separate run-quality evaluation surface, not the unit/integration test suite. Inspired by gptme/docs/evals.rst (model leaderboards, CSV/JSON, Docker, SWE-bench). Trigger: a release-cadence quality regression slips through unit tests after v0.2 stabilizes.
Trail: docs/comparisons/gptme/SYNTHESIS.md.
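Here is a minimal TypeScript sketch of the discovery-plus-opt-in boundary in the AGENT_FILES slot (B3-narrowed) above. It assumes a Node.js runtime. Only the file list and the telemetry event names come from the text. `discoverAgentFiles`, `applyOptIn`, the injectable `exists` check, and the counter shape are hypothetical and are not code-oz APIs. `agent_instruction_conflicts` is left out because it would need a content comparison that the slot does not specify.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// File list as quoted in the slot text (sourced there from gptme/prompts/__init__.py).
const AGENT_FILE_NAMES = [
  "AGENTS.md",
  "CLAUDE.md",
  "COPILOT.md",
  "GEMINI.md",
  ".cursorrules",
  ".windsurfrules",
  ".github/copilot-instructions.md",
] as const;

type AgentFileDecision = "accepted" | "rejected";

// Hypothetical counter shape mirroring three of the proposed telemetry event names.
interface AgentFileCounts {
  agent_files_discovered: number;
  agent_files_accepted: number;
  agent_files_rejected: number;
}

// Discovery only: probes the workspace root and nothing above it (no parent or
// home walk). Files are located, never read or injected into a prompt.
function discoverAgentFiles(
  workspaceRoot: string,
  exists: (path: string) => boolean = existsSync,
): string[] {
  return AGENT_FILE_NAMES.map((name) => join(workspaceRoot, name)).filter((p) => exists(p));
}

// Applies the user's explicit per-file choice. A file with no recorded choice
// counts as rejected, so nothing is accepted by default.
function applyOptIn(
  discovered: readonly string[],
  decisions: ReadonlyMap<string, AgentFileDecision>,
): { accepted: string[]; counts: AgentFileCounts } {
  const accepted = discovered.filter((p) => decisions.get(p) === "accepted");
  return {
    accepted,
    counts: {
      agent_files_discovered: discovered.length,
      agent_files_accepted: accepted.length,
      agent_files_rejected: discovered.length - accepted.length,
    },
  };
}
```

Treating undecided files as rejected is one way to keep the opt-in explicit. Injecting the `exists` check also keeps the sketch testable without touching the disk.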
Three findings closed (1 block-push + 1 fix-soon + 1 nit, all from Codex review thread 019e1323-413a-7612-a767-12fca418fad7):
1. block-push: branch rebased onto current main (e18d127) so the diff no longer appears to delete docs/comparison/06-codex/* (those files landed on main from a parallel session while this branch was being prepared).
2. fix-soon: COMPARISON.md TL;DR + Borrow set heading normalized to the post-debate count "2 narrow-borrow candidates / 4 defer / 5 reject", and the per-item "Borrow now" write-ups marked SUPERSEDED with a pointer to SYNTHESIS.md as the canonical post-debate record.
3. nit: the AI_SOFTWARE_COMPANY_THESIS.md citation in the CODEX_RESPONSE.md B2 line widened from :196 to :196-197 to cover the builders-mutate-isolated-worktrees principle accurately.
Two follow-ups from convergence review (thread 019e132a):
1. fix-soon: R3-2 was not fully closed. Several lines still carried "3 deferred", "Borrow now (narrowed)", or the inline self-correction note. Normalized to "2 narrow-borrow candidates / 4 defer / 5 reject" throughout the COMPARISON.md final borrow set + outcome counts and the CODEX_RESPONSE.md final-borrow line, with an explicit pointer back to SYNTHESIS.md as the canonical post-debate record.
2. nit: SYNTHESIS.md said "four files" but listed five. Corrected to "five files" and updated the closure language to reflect the actual review trail (R1 fix-first -> R2 scope lock -> R3 fix-first -> R4 convergence) instead of the stale "no round 3 required" wording.
After this commit, R3-1, R3-2, and R3-3 are all closed and no R4 finding remains open.
Two remaining mismatches from R5 strict re-scan (thread 019e132f):
1. fix-first: CODEX_RESPONSE.md:69 and :71 still labeled B1/B3 as "Borrow now (narrowed)" in the post-fix "Fixed" column of the classification-changes table. The R2 scope lock demoted both to "narrow-borrow candidate, deferred to own milestone" with M17/M18 slots in ROADMAP. Updated the table cells to match.
2. fix-first: SYNTHESIS.md:81 closure language said R4 "closed two follow-up nits and declared push-clean", but R4 was fix-first. Rewrote the trail to accurately list R1 fix-first -> R2 scope lock -> R3 fix-first -> R4 fix-first -> R5 fix-first -> post-R5 convergence re-check.
… reference in ROADMAP
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 314f3a3149
- **M17 — Reviewer Memory v1.** Kickoff source for the memory-hygiene rubric: `docs/contracts/REVIEWER_MEMORY.md`.
- **M16+ (deferred until measurable need):** Researcher phase-tail (when Lead-persona source verification overflows), parallel builder candidates (security-wedge trigger), multi-opponent debate (when single-opponent proves insufficient on real disagreement cases), Skills layer architecture (when M9/M10 produce duplication pain).
- **Template-comparison-derived deferred milestones (slots reserved 2026-05-10):**
- **M17 candidate — `feat(intake): cross-tool AGENT_FILES discovery + AUDIT/DEFINE opt-in (gptme borrow B3-narrowed).`** Authority boundary: cross-tool agent-instruction-file intake at AUDIT and DEFINE phase entry. Discovery only (file list per `gptme/prompts/__init__.py`: `AGENTS.md`, `CLAUDE.md`, `COPILOT.md`, `GEMINI.md`, `.cursorrules`, `.windsurfrules`, `.github/copilot-instructions.md`). NO parent/home walk. NO automatic prompt injection. Files become available to the user as a confirm UI in AUDIT/DEFINE intake; user accepts or rejects per file. Telemetry events: `agent_files_discovered`, `agent_files_accepted`, `agent_files_rejected`, `agent_instruction_conflicts`. Rule-21 measurement: `agent_files_accepted / agent_files_discovered` rate; intake-question-count delta vs baseline (no AGENT_FILES) on a brownfield corpus. Trail: `docs/comparisons/gptme/SYNTHESIS.md`. Trigger: lands when brownfield AUDIT runtime ships (W4) OR when greenfield DEFINE intake earns the authority. NOT before `src/phases/audit.ts` exists.
Renumber the new candidate milestone to avoid M17 collision
This adds a second M17 entry (M17 candidate) while M17 — Reviewer Memory v1 already exists just above, which makes milestone references ambiguous for planning, reviews, and any tooling or docs that key off milestone IDs. Use a non-conflicting label (for example M18 candidate or a TC-* namespace) so each roadmap slot remains uniquely addressable.
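The collision this comment describes can be caught mechanically. Below is a sketch of a hypothetical ROADMAP lint in TypeScript. It assumes every milestone bullet starts with `- **M<digits>`, as in the ROADMAP lines of this diff. Neither the function nor the regex exists in code-oz. The lint deliberately treats `M16+` and `M16` as the same ID.

```typescript
// Matches bullets such as "- **M17 — ..." or "- **M19+ candidate — ..." and
// captures the bare milestone ID ("M17", "M19"); an optional "+" is ignored.
const MILESTONE_BULLET = /^\s*-\s+\*\*(M\d+)\+?(?=[\s*])/;

// Reports every milestone ID that heads more than one bullet, with the
// 1-based line numbers where it appears.
function duplicateMilestoneIds(roadmap: string): Map<string, number[]> {
  const seen = new Map<string, number[]>();
  roadmap.split("\n").forEach((line, index) => {
    const match = MILESTONE_BULLET.exec(line);
    if (match === null) return;
    const id = match[1];
    const lines = seen.get(id) ?? [];
    lines.push(index + 1);
    seen.set(id, lines);
  });
  // Keep only IDs that appear on two or more bullets.
  return new Map([...seen].filter(([, lines]) => lines.length > 1));
}
```

Run against the ROADMAP lines in this diff, the lint would flag `M17` twice: once for the existing Reviewer Memory entry and once for the new candidate.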
Resolve contradictory trigger and precondition for AGENT_FILES slot
The trigger says this can land when either brownfield AUDIT ships or greenfield DEFINE earns authority, but the same line then requires NOT before src/phases/audit.ts exists, which blocks the DEFINE-only path you just listed. This contradiction makes the activation criteria non-actionable; split the conditions so the DEFINE path can proceed independently or remove the OR branch.
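The contradiction is easiest to see as boolean logic. The sketch below uses hypothetical field names, none of which are code-oz APIs. As written, the `src/phases/audit.ts` precondition is ANDed onto both branches, so the DEFINE branch can never fire on its own. The split form shows one of the two fixes the comment proposes; the other fix is to drop the OR branch entirely.

```typescript
// Hypothetical readiness facts for the AGENT_FILES slot.
interface SlotFacts {
  auditRuntimeShipped: boolean; // brownfield AUDIT runtime (W4) has landed
  auditModuleExists: boolean; // src/phases/audit.ts is present
  defineIntakeHasAuthority: boolean; // greenfield DEFINE intake has earned the authority
}

// The trigger as written: (AUDIT ships OR DEFINE earns authority) AND audit.ts exists.
// Whenever audit.ts is absent, this is false regardless of the DEFINE branch.
function canLandAsWritten(f: SlotFacts): boolean {
  return (f.auditRuntimeShipped || f.defineIntakeHasAuthority) && f.auditModuleExists;
}

// Split form: the audit.ts precondition guards only the AUDIT path,
// so the DEFINE path can proceed independently.
function canLandSplit(f: SlotFacts): { audit: boolean; define: boolean } {
  return {
    audit: f.auditRuntimeShipped && f.auditModuleExists,
    define: f.defineIntakeHasAuthority,
  };
}
```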
Code Review
This pull request documents a structural comparison between gptme and code-oz, including a detailed synthesis of the findings and a revised roadmap for future milestones. The reviewer identified several numbering conflicts where new candidate milestones (M17-M18) overlapped with existing entries in the roadmap. The feedback provides actionable corrections to renumber these slots to M18-M21 across multiple documentation files to ensure a unique and consistent sequence.
| ``` | ||
| User → chat() loop | ||
| ├── prompts/ (system prompt + AGENTS.md/CLAUDE.md/GEMINI.md ingestion) | ||
| ├── tools/ (shell, ipython, patch, browser, vision, computer, subagent, rag, gh, tmux, todo, …) |
This summary is inconsistent with the final verdict reached after the Codex debate. The final count is 2 narrow-borrow candidates, 4 deferred, and 5 rejected, as correctly stated on line 16 and in the final verdict section.
Borrow set: **2 narrow-borrow candidates, 4 deferred, 5 rejected.**
| Narrow borrow candidate, deferred to own milestone | **B1** | deterministic context-size + compaction-opportunity probe only; no LLM summarization, no view-branch swap |
| Narrow borrow candidate, deferred to own milestone | **B3** | AGENT_FILES discovery + explicit AUDIT/DEFINE opt-in; no home/parent walk |
| Defer | **B2** (renamed) | worktree topology/refusal diagnostics; revisit when audit-completeness recovery measurably fails |
| Defer | **D1** | generalized hook lifecycle |
| Defer | **D2** | subagent executor/planner/batch |
| Defer | **D3 (new)** | release-quality eval harness inspired by gptme evals |
Renumbering these slots to resolve the conflict with the existing M17 milestone in the roadmap.
| Narrow borrow candidate, deferred to own milestone | **B1** | deterministic context-size + compaction-opportunity probe only; no LLM summarization, no view-branch swap | M19 candidate |
| Narrow borrow candidate, deferred to own milestone | **B3** | AGENT_FILES discovery + explicit AUDIT/DEFINE opt-in; no home/parent walk | M18 candidate |
| Defer | **B2** (renamed) | worktree topology/refusal diagnostics; revisit when audit-completeness recovery measurably fails | M20+ candidate |
| Defer | **D1** | generalized hook lifecycle | post-v0.2 |
| Defer | **D2** | subagent executor/planner/batch | post-v0.2 |
| Defer | **D3 (new)** | release-quality eval harness inspired by gptme evals | M21+ candidate |
| **B1 — Compaction-opportunity probe** | borrow-deferred-to-own-milestone | Deterministic context projection + compaction-opportunity probe is useful telemetry, but it requires a separate milestone authority because gptme's full engine performs LLM resume summarization and view-branch swaps that violate code-oz's "files in `ProviderRequest.files` are explicit, never silently mutated" discipline. | M18 candidate | Extend the existing `tokensEstimate` field on `ProviderContextMetrics` (`src/providers/manifest.ts:111`) with `context_projection_tokens`, `compaction_opportunity_savings_ratio`, `compaction_skipped_savings_ratio`. Rule-21 ship gate: observed `compaction_opportunity_savings_ratio` distribution > 0.10 across runs before any compaction-action authority is added. |
| **B3 — AGENT_FILES discovery + AUDIT/DEFINE opt-in** | borrow-deferred-to-own-milestone | Cross-tool agent-instruction-file discovery is worth doing, but it requires its own milestone authority because the AUDIT runtime does not yet exist (`src/phases/audit.ts` is absent) and the trust-boundary discipline (no parent/home walk, explicit per-file opt-in) is itself a contract surface. | M17 candidate | Telemetry events `agent_files_discovered`, `agent_files_accepted`, `agent_files_rejected`, `agent_instruction_conflicts`. Rule-21 ship gate: `agent_files_accepted / agent_files_discovered` rate observable; intake-question-count delta vs. baseline (no AGENT_FILES) on a brownfield corpus. |
| **B2 — Worktree topology refusal diagnostics** | defer | gptme's restore primitive (`git reset --hard` + optional `git clean -fd`) is incompatible with code-oz's per-run isolated worktrees and user-change preservation discipline; the worktree IS the checkpoint. Only the topology-classification idea has lift-value. | M19+ candidate | Rule-21 ship gate: count of resumes where audit-completeness recovery would have benefited from `kind`-classification refusal vs. count where current recovery is sufficient. |
| **D1 — Generalized hook lifecycle (16+ types)** | defer | Rule 20 — extension authority. gptme's hook surface is wider than the briefing claimed (transforms, confirmations, elicitation, cwd, cache-invalidation per `gptme/hooks/types.py:61,68,100,103`), and code-oz has exactly one production hook today (`review-scheduler-hook.ts` from M15). Revisit when ≥3 features want to subscribe to the same lifecycle event. | post-v0.2 | (n/a — defer) |
| **D2 — Subagent batch + planner pattern** | defer | Rule 21 — parallel-agent execution surface. gptme's subagent API includes executor/planner modes, parallel/sequential subtasks, subprocess mode, ACP mode, profiles, model routing, and optional isolated worktrees (`gptme/tools/subagent/api.py:32,80,95`); pinned to measurable need before adoption. | post-v0.2 | (n/a — defer) |
| **D3 (new) — Release/run-quality eval harness** | defer | Codex-flagged gap the briefer missed: gptme's `docs/evals.rst` has model leaderboards, CSV/JSON export, Docker guidance, and SWE-bench compatibility; code-oz's offline tests validate orchestration but not live run quality across model/release combos. | M20+ candidate | Rule-21 ship gate: a release-cadence quality regression slips through unit tests, motivating a separate run-quality evaluation surface. |
Renumbering these slots to resolve the conflict with the existing M17 milestone in the roadmap.
| **B1 — Compaction-opportunity probe** | borrow-deferred-to-own-milestone | Deterministic context projection + compaction-opportunity probe is useful telemetry, but it requires a separate milestone authority because gptme's full engine performs LLM resume summarization and view-branch swaps that violate code-oz's "files in `ProviderRequest.files` are explicit, never silently mutated" discipline. | M19 candidate | Extend the existing `tokensEstimate` field on `ProviderContextMetrics` (`src/providers/manifest.ts:111`) with `context_projection_tokens`, `compaction_opportunity_savings_ratio`, `compaction_skipped_savings_ratio`. Rule-21 ship gate: observed `compaction_opportunity_savings_ratio` distribution > 0.10 across runs before any compaction-action authority is added. |
| **B3 — AGENT_FILES discovery + AUDIT/DEFINE opt-in** | borrow-deferred-to-own-milestone | Cross-tool agent-instruction-file discovery is worth doing, but it requires its own milestone authority because the AUDIT runtime does not yet exist (`src/phases/audit.ts` is absent) and the trust-boundary discipline (no parent/home walk, explicit per-file opt-in) is itself a contract surface. | M18 candidate | Telemetry events `agent_files_discovered`, `agent_files_accepted`, `agent_files_rejected`, `agent_instruction_conflicts`. Rule-21 ship gate: `agent_files_accepted / agent_files_discovered` rate observable; intake-question-count delta vs. baseline (no AGENT_FILES) on a brownfield corpus. |
| **B2 — Worktree topology refusal diagnostics** | defer | gptme's restore primitive (`git reset --hard` + optional `git clean -fd`) is incompatible with code-oz's per-run isolated worktrees and user-change preservation discipline; the worktree IS the checkpoint. Only the topology-classification idea has lift-value. | M20+ candidate | Rule-21 ship gate: count of resumes where audit-completeness recovery would have benefited from `kind`-classification refusal vs. count where current recovery is sufficient. |
| **D1 — Generalized hook lifecycle (16+ types)** | defer | Rule 20 — extension authority. gptme's hook surface is wider than the briefing claimed (transforms, confirmations, elicitation, cwd, cache-invalidation per `gptme/hooks/types.py:61,68,100,103`), and code-oz has exactly one production hook today (`review-scheduler-hook.ts` from M15). Revisit when ≥3 features want to subscribe to the same lifecycle event. | post-v0.2 | (n/a — defer) |
| **D2 — Subagent batch + planner pattern** | defer | Rule 21 — parallel-agent execution surface. gptme's subagent API includes executor/planner modes, parallel/sequential subtasks, subprocess mode, ACP mode, profiles, model routing, and optional isolated worktrees (`gptme/tools/subagent/api.py:32,80,95`); pinned to measurable need before adoption. | post-v0.2 | (n/a — defer) |
| **D3 (new) — Release/run-quality eval harness** | defer | Codex-flagged gap the briefer missed: gptme's `docs/evals.rst` has model leaderboards, CSV/JSON export, Docker guidance, and SWE-bench compatibility; code-oz's offline tests validate orchestration but not live run quality across model/release combos. | M21+ candidate | Rule-21 ship gate: a release-cadence quality regression slips through unit tests, motivating a separate run-quality evaluation surface. |
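The B1 ship gate in the table above can be made concrete. The TypeScript sketch below rests on stated assumptions. The record names `context_projection_tokens` and `compaction_opportunity_savings_ratio` but does not define the ratio. So the formula here (the share of `tokensEstimate` that the projection would save) is hypothetical, as is the median reading of "distribution > 0.10 across runs". `compaction_skipped_savings_ratio` is omitted for the same reason. The functions only report; nothing mutates a provider request.

```typescript
// Hypothetical per-run sample; the field names follow the proposed telemetry.
interface ContextSample {
  tokensEstimate: number; // tokens the provider request carries today
  context_projection_tokens: number; // tokens after a deterministic, non-LLM projection
}

const RULE_21_FLOOR = 0.1;

// Assumed definition: fraction of the current request the projection could save.
// Clamped at 0 when the projection is not smaller; 0 for an empty request.
function compactionOpportunitySavingsRatio(s: ContextSample): number {
  if (s.tokensEstimate <= 0) return 0;
  const saved = Math.max(0, s.tokensEstimate - s.context_projection_tokens);
  return saved / s.tokensEstimate;
}

// Median of a non-empty list of numbers.
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Ship gate, assuming "distribution > 0.10" means the median run clears the floor.
// No runs means no evidence, so the gate stays closed.
function clearsRule21Floor(runs: readonly ContextSample[]): boolean {
  if (runs.length === 0) return false;
  return median(runs.map(compactionOpportunitySavingsRatio)) > RULE_21_FLOOR;
}
```

The median is just one defensible reading. A stricter gate could require a lower percentile to clear 0.10 instead.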
**M17 candidate — AGENT_FILES intake authority (B3-narrowed).** Lands the discovery list (`AGENTS.md`, `CLAUDE.md`, `COPILOT.md`, `GEMINI.md`, `.cursorrules`, `.windsurfrules`, `.github/copilot-instructions.md` per `gptme/prompts/__init__.py:23`) at AUDIT and DEFINE phase entry. Discovery only — no parent/home walk (gptme walks home → workspace per `gptme/prompts/workspace.py:121,215,233`; code-oz refuses), no automatic prompt injection. Files become a confirm UI that accepts or rejects per file. New telemetry events (`agent_files_discovered`, `agent_files_accepted`, `agent_files_rejected`, `agent_instruction_conflicts`) extend `PhaseEvent`. Trigger: lands when brownfield AUDIT runtime ships (W4) or when greenfield DEFINE intake earns the authority. Hard precondition: not before `src/phases/audit.ts` exists.
**M18 candidate — Compaction-opportunity probe authority (B1-narrowed).** Telemetry-only context projection that reports compaction opportunity without mutating provider invocations. No LLM resume summarization (gptme's `gptme/tools/autocompact/hook.py:164`), no view-branch swap (`gptme/tools/autocompact/hook.py:128`), no automatic provider-context mutation. The discipline rule "no phase artifact may exceed N tokens at gate write" lands first as a separate gate-preflight check (`src/phases/gate-preflight.ts` extension); the probe extends the existing `ProviderContextMetrics` (`src/providers/manifest.ts:111` neighborhood) with `context_projection_tokens`, `compaction_opportunity_savings_ratio`, `compaction_skipped_savings_ratio`. Trigger: M14 Reviewer-panel + M15 debate-scheduler accumulate large enough contexts to make the > 0.10 floor measurable.
**Parallel deferred slots — M19+ (B2) and M20+ (D3).** B2's worktree topology refusal diagnostics waits on actual operator-intervention evidence in the resume corpus. D3's release/run-quality eval harness waits on a release-cadence quality regression that slips through unit tests. Both are reserved with measurement triggers in `ROADMAP.md`; neither is committed.
Renumbering these slots to resolve the conflict with the existing M17 milestone in the roadmap.
**M18 candidate — AGENT_FILES intake authority (B3-narrowed).** Lands the discovery list (`AGENTS.md`, `CLAUDE.md`, `COPILOT.md`, `GEMINI.md`, `.cursorrules`, `.windsurfrules`, `.github/copilot-instructions.md` per `gptme/prompts/__init__.py:23`) at AUDIT and DEFINE phase entry. Discovery only — no parent/home walk (gptme walks home → workspace per `gptme/prompts/workspace.py:121,215,233`; code-oz refuses), no automatic prompt injection. Files become a confirm UI that accepts or rejects per file. New telemetry events (`agent_files_discovered`, `agent_files_accepted`, `agent_files_rejected`, `agent_instruction_conflicts`) extend `PhaseEvent`. Trigger: lands when brownfield AUDIT runtime ships (W4) or when greenfield DEFINE intake earns the authority. Hard precondition: not before `src/phases/audit.ts` exists.
**M19 candidate — Compaction-opportunity probe authority (B1-narrowed).** Telemetry-only context projection that reports compaction opportunity without mutating provider invocations. No LLM resume summarization (gptme's `gptme/tools/autocompact/hook.py:164`), no view-branch swap (`gptme/tools/autocompact/hook.py:128`), no automatic provider-context mutation. The discipline rule "no phase artifact may exceed N tokens at gate write" lands first as a separate gate-preflight check (`src/phases/gate-preflight.ts` extension); the probe extends the existing `ProviderContextMetrics` (`src/providers/manifest.ts:111` neighborhood) with `context_projection_tokens`, `compaction_opportunity_savings_ratio`, `compaction_skipped_savings_ratio`. Trigger: M14 Reviewer-panel + M15 debate-scheduler accumulate large enough contexts to make the > 0.10 floor measurable.
**Parallel deferred slots — M20+ (B2) and M21+ (D3).** B2's worktree topology refusal diagnostics waits on actual operator-intervention evidence in the resume corpus. D3's release/run-quality eval harness waits on a release-cadence quality regression that slips through unit tests. Both are reserved with measurement triggers in `ROADMAP.md`; neither is committed.
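The gate-preflight rule that lands before the probe ("no phase artifact may exceed N tokens at gate write") could look roughly like the TypeScript sketch below. N, the artifact shape, and the 4-characters-per-token estimator are all assumptions, since the record specifies none of them. A real check would use the provider's own tokenizer. The check returns violations so that the gate can refuse the write rather than silently trim content.

```typescript
// Hypothetical shapes; code-oz's real artifact types are not shown in this PR.
interface PhaseArtifact {
  path: string;
  content: string;
}

interface TokenCapViolation {
  path: string;
  estimatedTokens: number;
  limit: number;
}

// Rough heuristic of about 4 characters per token; an assumption, not a tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Returns every artifact over the cap. An empty result means the gate write may
// proceed; a non-empty one should block it. Nothing is truncated here.
function checkArtifactTokenCap(
  artifacts: readonly PhaseArtifact[],
  maxTokens: number,
  estimate: (text: string) => number = estimateTokens,
): TokenCapViolation[] {
  return artifacts
    .map((a) => ({ path: a.path, estimatedTokens: estimate(a.content), limit: maxTokens }))
    .filter((v) => v.estimatedTokens > maxTokens);
}
```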
- **M17 candidate — `feat(intake): cross-tool AGENT_FILES discovery + AUDIT/DEFINE opt-in (gptme borrow B3-narrowed).`** Authority boundary: cross-tool agent-instruction-file intake at AUDIT and DEFINE phase entry. Discovery only (file list per `gptme/prompts/__init__.py`: `AGENTS.md`, `CLAUDE.md`, `COPILOT.md`, `GEMINI.md`, `.cursorrules`, `.windsurfrules`, `.github/copilot-instructions.md`). NO parent/home walk. NO automatic prompt injection. Files become available to the user as a confirm UI in AUDIT/DEFINE intake; user accepts or rejects per file. Telemetry events: `agent_files_discovered`, `agent_files_accepted`, `agent_files_rejected`, `agent_instruction_conflicts`. Rule-21 measurement: `agent_files_accepted / agent_files_discovered` rate; intake-question-count delta vs baseline (no AGENT_FILES) on a brownfield corpus. Trail: `docs/comparisons/gptme/SYNTHESIS.md`. Trigger: lands when brownfield AUDIT runtime ships (W4) OR when greenfield DEFINE intake earns the authority. NOT before `src/phases/audit.ts` exists.
- **M18 candidate — `feat(provider): deterministic context-projection + compaction-opportunity probe (gptme borrow B1-narrowed).`** Authority boundary: telemetry-only context projection that reports compaction opportunity without mutating provider invocations. NO LLM resume summarization. NO view-branch swap. NO automatic provider-context mutation. The discipline rule "no phase artifact may exceed N tokens at gate write" lands FIRST as a separate gate-preflight check; the probe extends existing `tokensEstimate` (`src/providers/manifest.ts:109` neighborhood, computed at line 111) with `context_projection_tokens`, `compaction_opportunity_savings_ratio`, `compaction_skipped_savings_ratio`. Rule-21 measurement: observed `compaction_opportunity_savings_ratio` distribution across runs > 0.10 floor before any compaction action authority is added. Trail: `docs/comparisons/gptme/SYNTHESIS.md`. Trigger: M14 Reviewer-panel + M15 debate-scheduler runs accumulate large enough contexts to make the floor measurable.
- **M19+ candidate — `feat(diagnostic): worktree topology refusal modes (gptme borrow B2-deferred).`** Authority boundary: diagnostic-only kind-classification of worktree state (`clean_run_worktree | dirty_run_worktree | no_worktree | multi_root_repo`). NO destructive restore primitive (gptme's `git reset --hard` is incompatible with code-oz's user-change preservation). On audit-completeness failure, classify and refuse rather than reset. Trigger: lands when actual resumes show audit-completeness recovery cannot recover from a dirty run worktree without operator intervention. Rule-21 measurement: count of resumes where current recovery is destructive vs. count where classification would have made the next action obvious.
- **M20+ candidate — `feat(eval): release/run-quality eval harness (gptme borrow D3, Codex-flagged).`** Authority boundary: a separate run-quality evaluation suite, not the unit/integration test surface. Inspired by `gptme/docs/evals.rst` (model leaderboards, CSV/JSON export, Docker guidance, SWE-bench compatibility). Validates orchestration AND live run quality across model/release combos. Trigger: when v0.2 stabilizes and a release-cadence quality regression slips through unit tests.
There is a numbering conflict here. M17 is already assigned to "Reviewer Memory v1" on line 382. These new candidate slots should be renumbered (e.g., M18-M21) to maintain a unique sequence in the roadmap.
- **M18 candidate — `feat(intake): cross-tool AGENT_FILES discovery + AUDIT/DEFINE opt-in (gptme borrow B3-narrowed).`** Authority boundary: cross-tool agent-instruction-file intake at AUDIT and DEFINE phase entry. Discovery only (file list per `gptme/prompts/__init__.py`: `AGENTS.md`, `CLAUDE.md`, `COPILOT.md`, `GEMINI.md`, `.cursorrules`, `.windsurfrules`, `.github/copilot-instructions.md`). NO parent/home walk. NO automatic prompt injection. Files become available to the user as a confirm UI in AUDIT/DEFINE intake; user accepts or rejects per file. Telemetry events: `agent_files_discovered`, `agent_files_accepted`, `agent_files_rejected`, `agent_instruction_conflicts`. Rule-21 measurement: `agent_files_accepted / agent_files_discovered` rate; intake-question-count delta vs baseline (no AGENT_FILES) on a brownfield corpus. Trail: `docs/comparisons/gptme/SYNTHESIS.md`. Trigger: lands when brownfield AUDIT runtime ships (W4) OR when greenfield DEFINE intake earns the authority. NOT before `src/phases/audit.ts` exists.
- **M19 candidate — `feat(provider): deterministic context-projection + compaction-opportunity probe (gptme borrow B1-narrowed).`** Authority boundary: telemetry-only context projection that reports compaction opportunity without mutating provider invocations. NO LLM resume summarization. NO view-branch swap. NO automatic provider-context mutation. The discipline rule "no phase artifact may exceed N tokens at gate write" lands FIRST as a separate gate-preflight check; the probe extends existing `tokensEstimate` (`src/providers/manifest.ts:109` neighborhood, computed at line 111) with `context_projection_tokens`, `compaction_opportunity_savings_ratio`, `compaction_skipped_savings_ratio`. Rule-21 measurement: observed `compaction_opportunity_savings_ratio` distribution across runs > 0.10 floor before any compaction action authority is added. Trail: `docs/comparisons/gptme/SYNTHESIS.md`. Trigger: M14 Reviewer-panel + M15 debate-scheduler runs accumulate large enough contexts to make the floor measurable.
- **M20+ candidate — `feat(diagnostic): worktree topology refusal modes (gptme borrow B2-deferred).`** Authority boundary: diagnostic-only kind-classification of worktree state (`clean_run_worktree | dirty_run_worktree | no_worktree | multi_root_repo`). NO destructive restore primitive (gptme's `git reset --hard` is incompatible with code-oz's user-change preservation). On audit-completeness failure, classify and refuse rather than reset. Trigger: lands when actual resumes show audit-completeness recovery cannot recover from a dirty run worktree without operator intervention. Rule-21 measurement: count of resumes where current recovery is destructive vs. count where classification would have made the next action obvious.
- **M21+ candidate — `feat(eval): release/run-quality eval harness (gptme borrow D3, Codex-flagged).`** Authority boundary: a separate run-quality evaluation suite, not the unit/integration test surface. Inspired by `gptme/docs/evals.rst` (model leaderboards, CSV/JSON export, Docker guidance, SWE-bench compatibility). Validates orchestration AND live run quality across model/release combos. Trigger: when v0.2 stabilizes and a release-cadence quality regression slips through unit tests.
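The worktree-topology refusal slot above describes classify-and-refuse instead of reset. A minimal sketch of that shape follows, assuming a hypothetical `WorktreeSnapshot` input and hint strings; only the four kind names come from the roadmap text. The point it illustrates is that every path returns a refusal with a kind-specific next step, and nothing in it can restore or delete files.

```typescript
// Hypothetical sketch: only the four kind names come from the roadmap; the
// snapshot shape, function names, and hint wording are assumptions.
type WorktreeKind =
  | "clean_run_worktree"
  | "dirty_run_worktree"
  | "no_worktree"
  | "multi_root_repo";

interface WorktreeSnapshot {
  repoRoots: string[];            // git roots visible from the run directory
  runWorktreePath: string | null; // null when no run worktree exists
  uncommittedPaths: string[];     // changed paths inside the run worktree
}

function classifyWorktree(s: WorktreeSnapshot): WorktreeKind {
  if (s.repoRoots.length > 1) return "multi_root_repo";
  if (s.runWorktreePath === null) return "no_worktree";
  return s.uncommittedPaths.length > 0 ? "dirty_run_worktree" : "clean_run_worktree";
}

interface Refusal {
  kind: WorktreeKind;
  action: "refuse"; // deliberately no "reset": there is no destructive restore primitive
  hint: string;
}

// On audit-completeness failure: classify, then refuse with a next step; files are untouched.
function refuseOnAuditFailure(s: WorktreeSnapshot): Refusal {
  const kind = classifyWorktree(s);
  const hints: Record<WorktreeKind, string> = {
    clean_run_worktree: "worktree is clean; re-run the audit step",
    dirty_run_worktree: `commit or stash ${s.uncommittedPaths.length} changed path(s), then resume`,
    no_worktree: "no run worktree found; start a fresh run",
    multi_root_repo: "multiple repository roots; choose one explicitly",
  };
  return { kind, action: "refuse", hint: hints[kind] };
}
```

Counting how often the `hint` alone would have been enough to resume is one way to collect the slot's Rule-21 measurement.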
Pull request overview
Doc-only PR that records a gptme vs code-oz template comparison (briefing, Codex response, and synthesis) and reserves roadmap slots for potential future borrows (AGENT_FILES intake, context-projection probe, worktree-topology refusal diagnostics, eval harness).
Changes:
- Add gptme comparison docs: COMPARISON, Codex briefing/response, and post-debate synthesis.
- Update `docs/design/ROADMAP.md` to reserve M17–M20(+)-style candidate slots derived from the gptme comparison.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| docs/design/ROADMAP.md | Adds reserved candidate milestone slots derived from the gptme comparison. |
| docs/comparisons/gptme/COMPARISON.md | New side-by-side comparison + borrow/defer/reject rationale. |
| docs/comparisons/gptme/CODEX_BRIEFING.md | New briefing prompt used for cross-model peer review. |
| docs/comparisons/gptme/CODEX_RESPONSE.md | New Codex round-1 response capturing fix-first findings and revised classification. |
| docs/comparisons/gptme/SYNTHESIS.md | New single-source-of-truth synthesis + rationale for “ratify-only” scope and roadmap slot reservation. |
| | gptme | code-oz |
|---|---|---|
| **Knowledge injection** | Lessons (keyword/tool/pattern auto-load) + Anthropic skills | Universal anti-slop rules + per-persona prompts; no auto-load by keywords |
| **Plugins** | Python entry points; packages tools+hooks+commands | None — agentpacks (skill bundles) only |
| **Subagents** | `subagent` tool: executor + planner + batch + completion-hooks | Phase agents; reviewer panel; debate participants — but not an in-chat subagent surface |
| **CLI agent files** | Loads AGENTS.md/CLAUDE.md/GEMINI.md/COPILOT.md/.cursorrules/.windsurfrules | Loads its own CLAUDE.md only |
- **Template-comparison-derived deferred milestones (slots reserved 2026-05-10):** the same M17 (AGENT_FILES intake), M18 (compaction-opportunity probe), M19+ (worktree topology refusal modes), and M20+ (eval harness) candidate bullets quoted earlier on this page.
| Item | Status | Reason (one sentence) | Target slot | Measurement plan (if borrow) |
|---|---|---|---|---|
| **B1 — Compaction-opportunity probe** | borrow-deferred-to-own-milestone | Deterministic context projection + compaction-opportunity probe is useful telemetry, but it requires a separate milestone authority because gptme's full engine performs LLM resume summarization and view-branch swaps that violate code-oz's "files in `ProviderRequest.files` are explicit, never silently mutated" discipline. | M18 candidate | Extend the existing `tokensEstimate` field on `ProviderContextMetrics` (`src/providers/manifest.ts:111`) with `context_projection_tokens`, `compaction_opportunity_savings_ratio`, `compaction_skipped_savings_ratio`. Rule-21 ship gate: observed `compaction_opportunity_savings_ratio` distribution > 0.10 across runs before any compaction-action authority is added. |
| **B3 — AGENT_FILES discovery + AUDIT/DEFINE opt-in** | borrow-deferred-to-own-milestone | Cross-tool agent-instruction-file discovery is worth doing, but it requires its own milestone authority because the AUDIT runtime does not yet exist (`src/phases/audit.ts` is absent) and the trust-boundary discipline (no parent/home walk, explicit per-file opt-in) is itself a contract surface. | M17 candidate | Telemetry events `agent_files_discovered`, `agent_files_accepted`, `agent_files_rejected`, `agent_instruction_conflicts`. Rule-21 ship gate: `agent_files_accepted / agent_files_discovered` rate observable; intake-question-count delta vs. baseline (no AGENT_FILES) on a brownfield corpus. |
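B1's ship gate says the observed ratio distribution must sit above 0.10 across runs, without saying which statistic. The sketch below assumes the median and a hypothetical minimum sample of 20 runs. Both choices are illustrative, not part of the matrix, and the function names are invented here.

```typescript
// Hypothetical Rule-21 gate for B1: the matrix only says "distribution > 0.10";
// the median statistic and the 20-run minimum are assumptions.
const SAVINGS_RATIO_FLOOR = 0.1;

function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of an empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// True only when enough runs exist and their median ratio clears the floor.
function compactionAuthorityUnlocked(ratios: number[], minRuns = 20): boolean {
  if (ratios.length < minRuns) return false; // not enough evidence yet
  return median(ratios) > SAVINGS_RATIO_FLOOR;
}
```

A median is less sensitive than a mean to a handful of unusually large-context runs, which is why it is picked for this sketch; a percentile would be an equally defensible reading of "distribution".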
**M17 candidate — AGENT_FILES intake authority (B3-narrowed).** Lands the discovery list (`AGENTS.md`, `CLAUDE.md`, `COPILOT.md`, `GEMINI.md`, `.cursorrules`, `.windsurfrules`, `.github/copilot-instructions.md` per `gptme/prompts/__init__.py:23`) at AUDIT and DEFINE phase entry. Discovery only — no parent/home walk (gptme walks home → workspace per `gptme/prompts/workspace.py:121,215,233`; code-oz refuses), no automatic prompt injection. Files become a confirm UI that accepts or rejects per file. New telemetry events (`agent_files_discovered`, `agent_files_accepted`, `agent_files_rejected`, `agent_instruction_conflicts`) extend `PhaseEvent`. Trigger: lands when brownfield AUDIT runtime ships (W4) or when greenfield DEFINE intake earns the authority. Hard precondition: not before `src/phases/audit.ts` exists.
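A minimal sketch of root-only discovery plus the Rule-21 acceptance rate follows. The shape of code-oz's `PhaseEvent` is not shown in this PR, so `AgentFilesEvent` is a standalone, hypothetical union that reuses only the four event names above. The `exists` callback is an assumption that keeps the sketch free of filesystem calls; because the candidate list is fixed and workspace-relative, nothing above the workspace root is ever probed.

```typescript
// Hypothetical sketch: event names and file list come from the slot text;
// the event shapes and function names are assumptions, not PhaseEvent's fields.
const AGENT_FILE_CANDIDATES = [
  "AGENTS.md", "CLAUDE.md", "COPILOT.md", "GEMINI.md",
  ".cursorrules", ".windsurfrules", ".github/copilot-instructions.md",
] as const;

type AgentFilesEvent =
  | { type: "agent_files_discovered"; paths: string[] }
  | { type: "agent_files_accepted"; path: string }
  | { type: "agent_files_rejected"; path: string }
  | { type: "agent_instruction_conflicts"; paths: string[] };

// Discovery only: checks fixed workspace-relative paths, never parent or home dirs.
function discoverAgentFiles(exists: (relPath: string) => boolean): AgentFilesEvent {
  return {
    type: "agent_files_discovered",
    paths: AGENT_FILE_CANDIDATES.filter((p) => exists(p)),
  };
}

// Rule-21 measurement: accepted files / discovered files, or null when nothing was found.
function acceptanceRate(events: AgentFilesEvent[]): number | null {
  let discovered = 0;
  let accepted = 0;
  for (const e of events) {
    if (e.type === "agent_files_discovered") discovered += e.paths.length;
    else if (e.type === "agent_files_accepted") accepted += 1;
  }
  return discovered === 0 ? null : accepted / discovered;
}
```

Discovery emits an event and stops; accepting or rejecting remains a separate user action, which keeps the "no automatic prompt injection" boundary visible in the types.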
**M18 candidate — Compaction-opportunity probe authority (B1-narrowed).** Telemetry-only context projection that reports compaction opportunity without mutating provider invocations. No LLM resume summarization (gptme's `gptme/tools/autocompact/hook.py:164`), no view-branch swap (`gptme/tools/autocompact/hook.py:128`), no automatic provider-context mutation. The discipline rule "no phase artifact may exceed N tokens at gate write" lands first as a separate gate-preflight check (`src/phases/gate-preflight.ts` extension); the probe extends the existing `ProviderContextMetrics` (`src/providers/manifest.ts:111` neighborhood) with `context_projection_tokens`, `compaction_opportunity_savings_ratio`, `compaction_skipped_savings_ratio`. Trigger: M14 Reviewer-panel + M15 debate-scheduler accumulate large enough contexts to make the > 0.10 floor measurable.
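The three probe fields are named but not defined in this synthesis. One possible reading is sketched below, under loudly stated assumptions: each context item carries a deterministic `projectedTokens` estimate, savings on non-explicit items count as opportunity, and savings on explicit `ProviderRequest.files` entries count as skipped because those files are never silently mutated. `ContextItem` and `probeCompaction` are hypothetical names.

```typescript
// Hypothetical sketch: field names mirror the proposed metrics, but ContextItem
// and the opportunity/skipped split are assumptions for illustration only.
interface ContextItem {
  readonly id: string;
  readonly tokens: number;          // tokens as sent today
  readonly projectedTokens: number; // tokens after a deterministic projection
  readonly explicitFile: boolean;   // true for ProviderRequest.files entries
}

interface CompactionProbe {
  context_projection_tokens: number;
  compaction_opportunity_savings_ratio: number;
  compaction_skipped_savings_ratio: number;
}

// Pure, telemetry-only: reads the items and reports; it never edits the request.
function probeCompaction(items: readonly ContextItem[]): CompactionProbe {
  let total = 0, projected = 0, opportunity = 0, skipped = 0;
  for (const item of items) {
    const saving = Math.max(0, item.tokens - item.projectedTokens);
    total += item.tokens;
    if (item.explicitFile) {
      // Explicit files stay as sent, so their possible savings are recorded as skipped.
      projected += item.tokens;
      skipped += saving;
    } else {
      projected += item.tokens - saving;
      opportunity += saving;
    }
  }
  return {
    context_projection_tokens: projected,
    compaction_opportunity_savings_ratio: total === 0 ? 0 : opportunity / total,
    compaction_skipped_savings_ratio: total === 0 ? 0 : skipped / total,
  };
}
```

Under this reading the opportunity ratio is the number the > 0.10 floor would be checked against, while the skipped ratio shows how much potential saving the explicit-files discipline forgoes.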
**Parallel deferred slots — M19+ (B2) and M20+ (D3).** B2's worktree topology refusal diagnostics waits on actual operator-intervention evidence in the resume corpus. D3's release/run-quality eval harness waits on a release-cadence quality regression that slips through unit tests. Both are reserved with measurement triggers in `ROADMAP.md`; neither is committed.
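D3's trigger is a release-to-release quality drop that unit tests miss. A run-quality harness would therefore need at least per-(release, model) pass rates, a regression check, and an export format; gptme's eval docs mention CSV/JSON output. The sketch below is entirely hypothetical: the record fields, the 0.05 tolerance, and the unquoted CSV (which assumes no field contains a comma) are illustrative choices, not a proposed schema.

```typescript
// Hypothetical run-quality record and helpers; all names and thresholds are assumptions.
interface RunQualityResult {
  release: string;
  model: string;
  task: string;
  passed: boolean;
}

function passRate(results: RunQualityResult[], release: string, model: string): number | null {
  const rows = results.filter((r) => r.release === release && r.model === model);
  if (rows.length === 0) return null;
  return rows.filter((r) => r.passed).length / rows.length;
}

// Flags a drop larger than `tolerance` between two releases for one model.
function isRegression(
  results: RunQualityResult[], prev: string, next: string, model: string, tolerance = 0.05,
): boolean {
  const before = passRate(results, prev, model);
  const after = passRate(results, next, model);
  if (before === null || after === null) return false; // no comparable data
  return before - after > tolerance;
}

// Minimal CSV export; assumes no field contains a comma or newline.
function toCsv(results: RunQualityResult[]): string {
  const header = "release,model,task,passed";
  const rows = results.map((r) => [r.release, r.model, r.task, String(r.passed)].join(","));
  return [header, ...rows].join("\n");
}
```

Keeping this separate from the unit/integration suite matches the slot's authority boundary: the harness judges live run outcomes, not code paths.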
3. **Rule 21 measurement plans documented for future milestones.** Each borrow's measurement plan is recorded in the decision matrix above and in the `ROADMAP.md` slot reservation. Implementation cannot land without telemetry first.
4. **No source or test code touched.** Only `docs/comparisons/gptme/*` and `docs/design/ROADMAP.md` are modified. No `src/**`, no `tests/**`, no schema changes.
When the convergence Codex round returns `push-clean`, the PR ships and the slots are reserved. The next milestone (M17 candidate or otherwise) opens its own briefing → debate → implementation → review cycle.
| Briefing recommendation | After Codex R1 fix |
|---|---|
| B3: Borrow now — Cross-tool AGENT_FILES ingestion | **B3: Narrow borrow candidate, deferred to own milestone — AGENT_FILES discovery plus explicit AUDIT/DEFINE opt-in.** No parent/home walk. Cross-tool files are informational until the user accepts them. Measurement: `agent_files_discovered`, `agent_files_accepted`, `agent_files_rejected`, `agent_instruction_conflicts`, intake-question delta. Reserved as M17 candidate slot in `ROADMAP.md`. |
| (none) | **D3 (new): Release/run-quality eval harness inspired by gptme evals.** Defer unless it becomes the single milestone authority. |
| D1, D2, R1, R2, R3, R4, R5 | unchanged |
Borrow set after fix: **2 narrow-borrow candidates (each deferred to its own milestone), 4 deferred (B2 demoted + D1 + D2 + new D3 eval harness), 5 rejected.**
> **Note (post-R2 scope lock, thread `019e1319`):** Codex round 2 ratified Option A — RATIFY-ONLY. The two narrow-borrow candidates above (B1, B3) are reserved for their own future milestones (M17/M18 candidate slots in `docs/design/ROADMAP.md`); they are NOT implemented in this PR. The canonical post-debate settlement is `SYNTHESIS.md`.
Doc-only PR adding gptme template comparison (5 Codex rounds R1-R5 to convergence) and reserving M17-M20 candidate slots for gptme-derived borrows (AGENT_FILES intake, context-projection probe, worktree topology refusal, eval harness).
Merge-pre-staged: combined gptme's M17-M20 ROADMAP slots with main's RULE21_BENCHMARK reference.
Test plan
Part of the AFK release-readiness loop — bundled with 10 already-merged comparison PRs landing on main.