docs(comparisons/gptme): gptme template comparison + M17-M20 candidate slots (5 Codex rounds) by omerakben · Pull Request #22 · omerakben/code-oz

omerakben · 2026-05-11T02:48:38Z

Doc-only PR that adds the gptme template comparison (converged over five Codex review rounds, R1-R5) and reserves M17-M20 candidate slots for gptme-derived borrows (AGENT_FILES intake, context-projection probe, worktree topology refusal, eval harness).

Merge pre-staged: combines gptme's M17-M20 ROADMAP slots with main's RULE21_BENCHMARK reference.

Test plan

- Doc-only changes; no runtime test impact.
- The ROADMAP merge preserves the gptme candidate slots as well as the RULE21_BENCHMARK link.

Part of the AFK release-readiness loop: bundled with 10 already-merged comparison PRs on main.

… synthesis

Lands the four comparison-record files for the gptme template review under docs/comparisons/gptme/:

- COMPARISON.md: structural review of gptme v0.31.x (chat-loop + autocompact + checkpoint + AGENT_FILES + hooks + subagent + plugins) vs. code-oz (phase-FSM + cross-family REVIEW + debate runtime + budgets.global + scientist tails + Rule 20/21 authority discipline); final decision matrix after the Codex R1 fix-first round.
- CODEX_BRIEFING.md: planning-convergence briefing for Codex (gpt-5.5 xhigh, sandbox: read-only): recommended verdict, locked answers, and five challenge prompts.
- CODEX_RESPONSE.md: Codex R1 verdict (thread 019e12ed-4038-7fe2-8800-5520e5f2048a). Verdict: fix-first. Narrowed B1 (compaction) and B3 (AGENT_FILES); demoted B2 (checkpoint) to defer; added a new D3 (release-quality eval harness) that the briefer missed. Net: 2 narrowed-borrow / 4 defer / 5 reject.
- SYNTHESIS.md (NEW): single source-of-truth post-debate synthesis. It records:
  - the round-2 thread 019e1319-2169-7ab0-8ca7-036d6252fe60, which ratifies Option A (RATIFY-ONLY: land the comparison record and roadmap slot reservations only, no implementation);
  - the final aligned decision matrix;
  - why the borrows are not implemented in this PR (Rule 20: one authority per milestone; src/phases/audit.ts does not exist; B1 needs a contract before it extends the existing tokensEstimate near src/providers/manifest.ts:111);
  - forward-looking M17 (B3) / M18 (B1) / M19+ (B2) / M20+ (D3) candidate slots with measurement plans;
  - the Codex alignment statement, quoted verbatim.

No source or test code touched. No telemetry schema changes. Only the comparison record lands here; the ROADMAP.md slot reservations land in a separate commit, per the CLAUDE.md cross-model peer review rule (each milestone gets its own authority boundary).

…ved borrows

Adds a "Template-comparison-derived deferred milestones" subsection under the existing milestones list, between the M16+ deferred line and the Provider-expansion track. It reserves four slots, each a single-authority milestone per CLAUDE.md rule 20 with a Rule-21 measurement plan:

- M17 candidate (feat(intake)): cross-tool AGENT_FILES discovery + AUDIT/DEFINE opt-in (gptme borrow B3-narrowed). Discovery only; per-file user opt-in; no parent/home walk. Telemetry events: agent_files_discovered, agent_files_accepted, agent_files_rejected, agent_instruction_conflicts. Hard precondition: not before src/phases/audit.ts exists.
- M18 candidate (feat(provider)): deterministic context-projection + compaction-opportunity probe (gptme borrow B1-narrowed). Telemetry-only; no LLM resume summarization; no view-branch swap; no automatic provider-context mutation. Extends ProviderContextMetrics (near src/providers/manifest.ts:111) with context_projection_tokens, compaction_opportunity_savings_ratio, compaction_skipped_savings_ratio. The discipline rule "no phase artifact may exceed N tokens at gate write" lands first as a separate gate-preflight check. Rule-21 floor: a > 0.10 ratio observable across runs.
- M19+ candidate (feat(diagnostic)): worktree topology refusal modes (gptme borrow B2-deferred). Diagnostic-only kind classification (clean_run_worktree | dirty_run_worktree | no_worktree | multi_root_repo). NO destructive restore (gptme's git reset --hard is incompatible with code-oz's user-change preservation). On audit-completeness failure: classify and refuse rather than reset.
- M20+ candidate (feat(eval)): release/run-quality eval harness (gptme borrow D3, Codex-flagged miss). A separate run-quality evaluation surface, not the unit/integration test suite. Inspired by gptme/docs/evals.rst (model leaderboards, CSV/JSON, Docker, SWE-bench). Trigger: a release-cadence quality regression slips through unit tests after v0.2 stabilizes.
Trail: docs/comparisons/gptme/SYNTHESIS.md.
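A minimal TypeScript sketch of the M17 discovery rule, to make the trust boundary concrete. It assumes a hypothetical `existsAtRoot` callback (so the logic stays filesystem-agnostic) and a hypothetical `AgentFileDecision` record from the confirm UI; neither is code-oz's actual API. Only the fixed gptme file list is checked at the workspace root, there is no parent/home walk, nothing is injected into a prompt, and the Rule-21 acceptance rate is computed from the user's per-file decisions. Emitting the `agent_files_*` telemetry events is left out.

```typescript
// Discovery list reserved for the M17 candidate (per gptme/prompts/__init__.py).
const AGENT_FILES: readonly string[] = [
  "AGENTS.md",
  "CLAUDE.md",
  "COPILOT.md",
  "GEMINI.md",
  ".cursorrules",
  ".windsurfrules",
  ".github/copilot-instructions.md",
];

// Hypothetical per-file decision recorded by the AUDIT/DEFINE confirm UI.
interface AgentFileDecision {
  path: string;
  accepted: boolean;
}

// Discovery only: each candidate is checked relative to the workspace root.
// There is no parent/home walk, and nothing is read or injected into a prompt.
function discoverAgentFiles(existsAtRoot: (relPath: string) => boolean): string[] {
  return AGENT_FILES.filter((relPath) => existsAtRoot(relPath));
}

// Rule-21 measurement: agent_files_accepted / agent_files_discovered.
// Returns null when nothing was discovered, so an empty run is not reported as 0%.
function acceptanceRate(discovered: string[], decisions: AgentFileDecision[]): number | null {
  if (discovered.length === 0) return null;
  const accepted = decisions.filter(
    (d) => d.accepted && discovered.indexOf(d.path) !== -1,
  ).length;
  return accepted / discovered.length;
}
```

Returning `null` rather than `0` for an empty discovery set keeps repos without any agent files from dragging the acceptance rate down on a brownfield corpus.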

Three findings closed (1 block-push + 1 fix-soon + 1 nit, all from Codex review thread 019e1323-413a-7612-a767-12fca418fad7):

1. block-push: branch rebased onto current main (e18d127) so the diff no longer appears to delete docs/comparison/06-codex/* (those files landed on main from a parallel session while this branch was being prepared).
2. fix-soon: COMPARISON.md TL;DR and Borrow set heading normalized to the post-debate count "2 narrow-borrow candidates / 4 defer / 5 reject"; the per-item "Borrow now" write-ups are marked SUPERSEDED with a pointer to SYNTHESIS.md as the canonical post-debate record.
3. nit: the AI_SOFTWARE_COMPANY_THESIS.md citation in the CODEX_RESPONSE.md B2 line widened from :196 to :196-197 to accurately cover the builders-mutate-isolated-worktrees principle.

Two follow-ups from convergence review (thread 019e132a):

1. fix-soon: R3-2 was not fully closed. Several lines still carried "3 deferred", "Borrow now (narrowed)", or the inline self-correction note. Normalized to "2 narrow-borrow candidates / 4 defer / 5 reject" throughout the COMPARISON.md final borrow set and outcome counts and the CODEX_RESPONSE.md final-borrow line, with an explicit pointer back to SYNTHESIS.md as the canonical post-debate record.
2. nit: SYNTHESIS.md said "four files" but listed five. Corrected to "five files" and updated the closure language to reflect the actual review trail (R1 fix-first -> R2 scope lock -> R3 fix-first -> R4 convergence) instead of the stale "no round 3 required" wording.

After this commit, R3-1, R3-2, and R3-3 are all closed and no R4 finding remains open.

Two remaining mismatches from the R5 strict re-scan (thread 019e132f):

1. fix-first: CODEX_RESPONSE.md:69 and :71 still labeled B1/B3 as "Borrow now (narrowed)" in the post-fix "Fixed" column of the classification-changes table. The R2 scope lock demoted both to "narrow-borrow candidate, deferred to own milestone" with M17/M18 slots in ROADMAP. Updated the table cells to match.
2. fix-first: SYNTHESIS.md:81 closure language said R4 "closed two follow-up nits and declared push-clean", but R4 was fix-first. Rewrote the trail to accurately list R1 fix-first -> R2 scope lock -> R3 fix-first -> R4 fix-first -> R5 fix-first -> post-R5 convergence re-check.

… reference in ROADMAP

coderabbitai · 2026-05-11T02:48:44Z

Warning

Rate limit exceeded

@omerakben has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 55 minutes and 6 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 44249f25-3531-4e79-8e92-b9805f2f833a

📥 Commits

Reviewing files that changed from the base of the PR and between 6724165 and 314f3a3.

📒 Files selected for processing (5)

docs/comparisons/gptme/CODEX_BRIEFING.md
docs/comparisons/gptme/CODEX_RESPONSE.md
docs/comparisons/gptme/COMPARISON.md
docs/comparisons/gptme/SYNTHESIS.md
docs/design/ROADMAP.md


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 314f3a3149


chatgpt-codex-connector · 2026-05-11T02:50:41Z

  - **M17 — Reviewer Memory v1.** Kickoff source for the memory-hygiene rubric: `docs/contracts/REVIEWER_MEMORY.md`.
  - **M16+ (deferred until measurable need):** Researcher phase-tail (when Lead-persona source verification overflows), parallel builder candidates (security-wedge trigger), multi-opponent debate (when single-opponent proves insufficient on real disagreement cases), Skills layer architecture (when M9/M10 produce duplication pain).
+  - **Template-comparison-derived deferred milestones (slots reserved 2026-05-10):**
+    - **M17 candidate — `feat(intake): cross-tool AGENT_FILES discovery + AUDIT/DEFINE opt-in (gptme borrow B3-narrowed).`** Authority boundary: cross-tool agent-instruction-file intake at AUDIT and DEFINE phase entry. Discovery only (file list per `gptme/prompts/__init__.py`: `AGENTS.md`, `CLAUDE.md`, `COPILOT.md`, `GEMINI.md`, `.cursorrules`, `.windsurfrules`, `.github/copilot-instructions.md`). NO parent/home walk. NO automatic prompt injection. Files become available to the user as a confirm UI in AUDIT/DEFINE intake; user accepts or rejects per file. Telemetry events: `agent_files_discovered`, `agent_files_accepted`, `agent_files_rejected`, `agent_instruction_conflicts`. Rule-21 measurement: `agent_files_accepted / agent_files_discovered` rate; intake-question-count delta vs baseline (no AGENT_FILES) on a brownfield corpus. Trail: `docs/comparisons/gptme/SYNTHESIS.md`. Trigger: lands when brownfield AUDIT runtime ships (W4) OR when greenfield DEFINE intake earns the authority. NOT before `src/phases/audit.ts` exists.


Renumber the new candidate milestone to avoid M17 collision

This adds a second M17 entry (M17 candidate) while M17 — Reviewer Memory v1 already exists just above, which makes milestone references ambiguous for planning, reviews, and any tooling or docs that key off milestone IDs. Use a non-conflicting label (for example M18 candidate or a TC-* namespace) so each roadmap slot remains uniquely addressable.
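The collision Codex describes can be caught mechanically before review. A minimal sketch of a hypothetical roadmap lint, assuming milestone bullets follow the `- **M<number> ...` convention visible in the hunk above; open-ended ranges such as `M16+` or `M19+` are skipped because they do not name a single slot. This is not an existing code-oz tool.

```typescript
// Hypothetical lint: report milestone IDs claimed by more than one roadmap bullet.
// Matches "- **M17 — ..." and "- **M17 candidate — ...", but not "- **M16+ ...".
function duplicateMilestoneIds(markdown: string): Map<string, number[]> {
  const seen = new Map<string, number[]>();
  markdown.split("\n").forEach((line, index) => {
    const match = /^\s*-\s+\*\*M(\d+)(?![\d+])/.exec(line);
    if (!match) return;
    const id = `M${match[1]}`;
    const lines = seen.get(id) ?? [];
    lines.push(index + 1); // 1-based line numbers for readable reports
    seen.set(id, lines);
  });
  // Keep only the IDs that appear on two or more bullets.
  return new Map(Array.from(seen.entries()).filter(([, lines]) => lines.length > 1));
}
```

Run against the merged ROADMAP.md, this would flag `M17` on both the Reviewer Memory v1 bullet and the AGENT_FILES candidate bullet, which is exactly the ambiguity the comment asks to remove.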


chatgpt-codex-connector · 2026-05-11T02:50:41Z


Resolve contradictory trigger and precondition for AGENT_FILES slot

The trigger says this can land when either brownfield AUDIT ships or greenfield DEFINE earns authority, but the same line then requires NOT before src/phases/audit.ts exists, which blocks the DEFINE-only path you just listed. This contradiction makes the activation criteria non-actionable; split the conditions so the DEFINE path can proceed independently or remove the OR branch.
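Codex's objection reduces to boolean logic: `(auditShipped || defineEarned) && auditExists` is false for every DEFINE-only state, so the OR branch can never fire on its own. A minimal sketch with hypothetical flag names contrasts the trigger as written with the split form Codex suggests; whether to split the conditions or drop the OR branch remains a planning decision, and this only illustrates the first option.

```typescript
// Hypothetical activation flags for the AGENT_FILES slot.
interface SlotState {
  auditRuntimeShipped: boolean; // brownfield AUDIT runtime (W4) is live
  auditModuleExists: boolean; // src/phases/audit.ts is present
  defineIntakeEarned: boolean; // greenfield DEFINE intake has earned the authority
}

// As written: the DEFINE branch is dead whenever audit.ts is absent.
function canLandAsWritten(s: SlotState): boolean {
  return (s.auditRuntimeShipped || s.defineIntakeEarned) && s.auditModuleExists;
}

// Split form: each path carries only its own precondition.
function canLandSplit(s: SlotState): { audit: boolean; define: boolean } {
  return {
    audit: s.auditRuntimeShipped && s.auditModuleExists,
    define: s.defineIntakeEarned,
  };
}
```

For a DEFINE-only state (`defineIntakeEarned: true`, both AUDIT flags false), `canLandAsWritten` returns `false`, while `canLandSplit` reports `define: true` and `audit: false`.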


gemini-code-assist

Code Review

This pull request documents a structural comparison between gptme and code-oz, including a detailed synthesis of the findings and a revised roadmap for future milestones. The reviewer identified several numbering conflicts where new candidate milestones (M17-M18) overlapped with existing entries in the roadmap. The feedback provides actionable corrections to renumber these slots to M18-M21 across multiple documentation files to ensure a unique and consistent sequence.

gemini-code-assist · 2026-05-11T02:51:00Z

+```
+User → chat() loop
+       ├── prompts/ (system prompt + AGENTS.md/CLAUDE.md/GEMINI.md ingestion)
+       ├── tools/ (shell, ipython, patch, browser, vision, computer, subagent, rag, gh, tmux, todo, …)


This summary is inconsistent with the final verdict reached after the Codex debate. The final count is 2 narrow-borrow candidates, 4 deferred, and 5 rejected, as correctly stated on line 16 and in the final verdict section.

Suggested change

├── tools/ (shell, ipython, patch, browser, vision, computer, subagent, rag, gh, tmux, todo, …)

Borrow set: **2 narrow-borrow candidates, 4 deferred, 5 rejected.**
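This finding, like the R3-2 and R5 fixes in the commit trail, comes from hand-maintained headline counts drifting away from the decision matrix. A minimal sketch, assuming a hypothetical `Classification` tag per matrix row, derives the headline from the rows so the two cannot disagree. The six borrow/defer IDs in the usage below come from the SYNTHESIS.md matrix; the five reject rows are placeholders because their IDs are not quoted in this PR.

```typescript
type Classification = "narrow-borrow" | "defer" | "reject";

interface MatrixRow {
  id: string;
  classification: Classification;
}

// Derive the borrow-set headline from the matrix instead of editing it by hand.
function borrowSetHeadline(rows: MatrixRow[]): string {
  const count = (c: Classification) => rows.filter((r) => r.classification === c).length;
  return `${count("narrow-borrow")} narrow-borrow candidates, ${count("defer")} deferred, ${count("reject")} rejected`;
}
```

With B1 and B3 as narrow-borrow, B2, D1, D2, and D3 as defer, and five reject rows, the function returns `2 narrow-borrow candidates, 4 deferred, 5 rejected`, matching the post-debate count. The sketch does not singularize nouns for a count of 1.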

gemini-code-assist · 2026-05-11T02:51:01Z

+| Narrow borrow candidate, deferred to own milestone | **B1** | deterministic context-size + compaction-opportunity probe only; no LLM summarization, no view-branch swap |
+| Narrow borrow candidate, deferred to own milestone | **B3** | AGENT_FILES discovery + explicit AUDIT/DEFINE opt-in; no home/parent walk |
+| Defer | **B2** (renamed) | worktree topology/refusal diagnostics; revisit when audit-completeness recovery measurably fails |
+| Defer | **D1** | generalized hook lifecycle |
+| Defer | **D2** | subagent executor/planner/batch |
+| Defer | **D3 (new)** | release-quality eval harness inspired by gptme evals |


gemini-code-assist · 2026-05-11T02:51:01Z

+| **B1 — Compaction-opportunity probe** | borrow-deferred-to-own-milestone | Deterministic context projection + compaction-opportunity probe is useful telemetry, but it requires a separate milestone authority because gptme's full engine performs LLM resume summarization and view-branch swaps that violate code-oz's "files in `ProviderRequest.files` are explicit, never silently mutated" discipline. | M18 candidate | Extend the existing `tokensEstimate` field on `ProviderContextMetrics` (`src/providers/manifest.ts:111`) with `context_projection_tokens`, `compaction_opportunity_savings_ratio`, `compaction_skipped_savings_ratio`. Rule-21 ship gate: observed `compaction_opportunity_savings_ratio` distribution > 0.10 across runs before any compaction-action authority is added. |
+| **B3 — AGENT_FILES discovery + AUDIT/DEFINE opt-in** | borrow-deferred-to-own-milestone | Cross-tool agent-instruction-file discovery is worth doing, but it requires its own milestone authority because the AUDIT runtime does not yet exist (`src/phases/audit.ts` is absent) and the trust-boundary discipline (no parent/home walk, explicit per-file opt-in) is itself a contract surface. | M17 candidate | Telemetry events `agent_files_discovered`, `agent_files_accepted`, `agent_files_rejected`, `agent_instruction_conflicts`. Rule-21 ship gate: `agent_files_accepted / agent_files_discovered` rate observable; intake-question-count delta vs. baseline (no AGENT_FILES) on a brownfield corpus. |
+| **B2 — Worktree topology refusal diagnostics** | defer | gptme's restore primitive (`git reset --hard` + optional `git clean -fd`) is incompatible with code-oz's per-run isolated worktrees and user-change preservation discipline; the worktree IS the checkpoint. Only the topology-classification idea has lift-value. | M19+ candidate | Rule-21 ship gate: count of resumes where audit-completeness recovery would have benefited from `kind`-classification refusal vs. count where current recovery is sufficient. |
+| **D1 — Generalized hook lifecycle (16+ types)** | defer | Rule 20 — extension authority. gptme's hook surface is wider than the briefing claimed (transforms, confirmations, elicitation, cwd, cache-invalidation per `gptme/hooks/types.py:61,68,100,103`), and code-oz has exactly one production hook today (`review-scheduler-hook.ts` from M15). Revisit when ≥3 features want to subscribe to the same lifecycle event. | post-v0.2 | (n/a — defer) |
+| **D2 — Subagent batch + planner pattern** | defer | Rule 21 — parallel-agent execution surface. gptme's subagent API includes executor/planner modes, parallel/sequential subtasks, subprocess mode, ACP mode, profiles, model routing, and optional isolated worktrees (`gptme/tools/subagent/api.py:32,80,95`); pinned to measurable need before adoption. | post-v0.2 | (n/a — defer) |
+| **D3 (new) — Release/run-quality eval harness** | defer | Codex-flagged gap the briefer missed: gptme's `docs/evals.rst` has model leaderboards, CSV/JSON export, Docker guidance, and SWE-bench compatibility; code-oz's offline tests validate orchestration but not live run quality across model/release combos. | M20+ candidate | Rule-21 ship gate: a release-cadence quality regression slips through unit tests, motivating a separate run-quality evaluation surface. |


Renumber these slots (to M18-M21) to resolve the conflict with the existing M17 milestone in the roadmap.

Suggested change


| **B1 — Compaction-opportunity probe** | borrow-deferred-to-own-milestone | Deterministic context projection + compaction-opportunity probe is useful telemetry, but it requires a separate milestone authority because gptme's full engine performs LLM resume summarization and view-branch swaps that violate code-oz's "files in `ProviderRequest.files` are explicit, never silently mutated" discipline. | M19 candidate | Extend the existing `tokensEstimate` field on `ProviderContextMetrics` (`src/providers/manifest.ts:111`) with `context_projection_tokens`, `compaction_opportunity_savings_ratio`, `compaction_skipped_savings_ratio`. Rule-21 ship gate: observed `compaction_opportunity_savings_ratio` distribution > 0.10 across runs before any compaction-action authority is added. |

| **B3 — AGENT_FILES discovery + AUDIT/DEFINE opt-in** | borrow-deferred-to-own-milestone | Cross-tool agent-instruction-file discovery is worth doing, but it requires its own milestone authority because the AUDIT runtime does not yet exist (`src/phases/audit.ts` is absent) and the trust-boundary discipline (no parent/home walk, explicit per-file opt-in) is itself a contract surface. | M18 candidate | Telemetry events `agent_files_discovered`, `agent_files_accepted`, `agent_files_rejected`, `agent_instruction_conflicts`. Rule-21 ship gate: `agent_files_accepted / agent_files_discovered` rate observable; intake-question-count delta vs. baseline (no AGENT_FILES) on a brownfield corpus. |

| **B2 — Worktree topology refusal diagnostics** | defer | gptme's restore primitive (`git reset --hard` + optional `git clean -fd`) is incompatible with code-oz's per-run isolated worktrees and user-change preservation discipline; the worktree IS the checkpoint. Only the topology-classification idea has lift-value. | M20+ candidate | Rule-21 ship gate: count of resumes where audit-completeness recovery would have benefited from `kind`-classification refusal vs. count where current recovery is sufficient. |

| **D1 — Generalized hook lifecycle (16+ types)** | defer | Rule 20 — extension authority. gptme's hook surface is wider than the briefing claimed (transforms, confirmations, elicitation, cwd, cache-invalidation per `gptme/hooks/types.py:61,68,100,103`), and code-oz has exactly one production hook today (`review-scheduler-hook.ts` from M15). Revisit when ≥3 features want to subscribe to the same lifecycle event. | post-v0.2 | (n/a — defer) |

| **D2 — Subagent batch + planner pattern** | defer | Rule 21 — parallel-agent execution surface. gptme's subagent API includes executor/planner modes, parallel/sequential subtasks, subprocess mode, ACP mode, profiles, model routing, and optional isolated worktrees (`gptme/tools/subagent/api.py:32,80,95`); pinned to measurable need before adoption. | post-v0.2 | (n/a — defer) |

| **D3 (new) — Release/run-quality eval harness** | defer | Codex-flagged gap the briefer missed: gptme's `docs/evals.rst` has model leaderboards, CSV/JSON export, Docker guidance, and SWE-bench compatibility; code-oz's offline tests validate orchestration but not live run quality across model/release combos. | M21+ candidate | Rule-21 ship gate: a release-cadence quality regression slips through unit tests, motivating a separate run-quality evaluation surface. |
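The B1 row's ship gate is easier to reason about with numbers attached. A minimal sketch under stated assumptions: `reclaimableTokens` is a hypothetical input (how many tokens a compaction could drop), the savings ratio is read as reclaimable over projected tokens, and "distribution > 0.10 across runs" is read as the median exceeding the floor. `compaction_skipped_savings_ratio` is left out because its definition is not given here. The probe only returns numbers; it never touches a provider request.

```typescript
// Two of the fields the B1 row proposes adding next to tokensEstimate.
interface CompactionProbeFields {
  context_projection_tokens: number;
  compaction_opportunity_savings_ratio: number;
}

// Telemetry-only probe: computes the ratio and leaves provider context untouched.
function probeCompaction(projectedTokens: number, reclaimableTokens: number): CompactionProbeFields {
  const ratio = projectedTokens > 0 ? reclaimableTokens / projectedTokens : 0;
  return {
    context_projection_tokens: projectedTokens,
    compaction_opportunity_savings_ratio: Math.min(1, Math.max(0, ratio)),
  };
}

// Rule-21 gate, assuming the "distribution > 0.10" floor is checked on the median.
function clearsRule21Floor(ratios: number[], floor = 0.1): boolean {
  if (ratios.length === 0) return false;
  const sorted = [...ratios].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median = sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  return median > floor;
}
```

Using the median rather than the mean keeps a few very large contexts from clearing the floor on their own; a percentile-based reading would be an equally defensible choice.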

gemini-code-assist · 2026-05-11T02:51:01Z

+**M17 candidate — AGENT_FILES intake authority (B3-narrowed).** Lands the discovery list (`AGENTS.md`, `CLAUDE.md`, `COPILOT.md`, `GEMINI.md`, `.cursorrules`, `.windsurfrules`, `.github/copilot-instructions.md` per `gptme/prompts/__init__.py:23`) at AUDIT and DEFINE phase entry. Discovery only — no parent/home walk (gptme walks home → workspace per `gptme/prompts/workspace.py:121,215,233`; code-oz refuses), no automatic prompt injection. Files become a confirm UI that accepts or rejects per file. New telemetry events (`agent_files_discovered`, `agent_files_accepted`, `agent_files_rejected`, `agent_instruction_conflicts`) extend `PhaseEvent`. Trigger: lands when brownfield AUDIT runtime ships (W4) or when greenfield DEFINE intake earns the authority. Hard precondition: not before `src/phases/audit.ts` exists.
+
+**M18 candidate — Compaction-opportunity probe authority (B1-narrowed).** Telemetry-only context projection that reports compaction opportunity without mutating provider invocations. No LLM resume summarization (gptme's `gptme/tools/autocompact/hook.py:164`), no view-branch swap (`gptme/tools/autocompact/hook.py:128`), no automatic provider-context mutation. The discipline rule "no phase artifact may exceed N tokens at gate write" lands first as a separate gate-preflight check (`src/phases/gate-preflight.ts` extension); the probe extends the existing `ProviderContextMetrics` (`src/providers/manifest.ts:111` neighborhood) with `context_projection_tokens`, `compaction_opportunity_savings_ratio`, `compaction_skipped_savings_ratio`. Trigger: M14 Reviewer-panel + M15 debate-scheduler accumulate large enough contexts to make the > 0.10 floor measurable.
+
+**Parallel deferred slots — M19+ (B2) and M20+ (D3).** B2's worktree topology refusal diagnostics waits on actual operator-intervention evidence in the resume corpus. D3's release/run-quality eval harness waits on a release-cadence quality regression that slips through unit tests. Both are reserved with measurement triggers in `ROADMAP.md`; neither is committed.


Renumber these slots (to M18-M21) to resolve the conflict with the existing M17 milestone in the roadmap.

Suggested change


**M18 candidate — AGENT_FILES intake authority (B3-narrowed).** Lands the discovery list (`AGENTS.md`, `CLAUDE.md`, `COPILOT.md`, `GEMINI.md`, `.cursorrules`, `.windsurfrules`, `.github/copilot-instructions.md` per `gptme/prompts/__init__.py:23`) at AUDIT and DEFINE phase entry. Discovery only — no parent/home walk (gptme walks home → workspace per `gptme/prompts/workspace.py:121,215,233`; code-oz refuses), no automatic prompt injection. Files become a confirm UI that accepts or rejects per file. New telemetry events (`agent_files_discovered`, `agent_files_accepted`, `agent_files_rejected`, `agent_instruction_conflicts`) extend `PhaseEvent`. Trigger: lands when brownfield AUDIT runtime ships (W4) or when greenfield DEFINE intake earns the authority. Hard precondition: not before `src/phases/audit.ts` exists.

**M19 candidate — Compaction-opportunity probe authority (B1-narrowed).** Telemetry-only context projection that reports compaction opportunity without mutating provider invocations. No LLM resume summarization (gptme's `gptme/tools/autocompact/hook.py:164`), no view-branch swap (`gptme/tools/autocompact/hook.py:128`), no automatic provider-context mutation. The discipline rule "no phase artifact may exceed N tokens at gate write" lands first as a separate gate-preflight check (`src/phases/gate-preflight.ts` extension); the probe extends the existing `ProviderContextMetrics` (`src/providers/manifest.ts:111` neighborhood) with `context_projection_tokens`, `compaction_opportunity_savings_ratio`, `compaction_skipped_savings_ratio`. Trigger: M14 Reviewer-panel + M15 debate-scheduler accumulate large enough contexts to make the > 0.10 floor measurable.

**Parallel deferred slots — M20+ (B2) and M21+ (D3).** B2's worktree topology refusal diagnostics waits on actual operator-intervention evidence in the resume corpus. D3's release/run-quality eval harness waits on a release-cadence quality regression that slips through unit tests. Both are reserved with measurement triggers in `ROADMAP.md`; neither is committed.
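The compaction slot depends on the discipline rule "no phase artifact may exceed N tokens at gate write" landing first as a separate gate-preflight check. A minimal sketch of such a check, with hypothetical names (`PhaseArtifact`, `checkArtifactTokenBudget`) that are not the contents of `src/phases/gate-preflight.ts`. The default estimator is a rough four-characters-per-token placeholder, not code-oz's `tokensEstimate`; pass the real estimator in. The check reports oversized artifacts and never truncates or summarizes them.

```typescript
interface PhaseArtifact {
  path: string;
  content: string;
}

interface PreflightViolation {
  path: string;
  estimatedTokens: number;
  limit: number;
}

// Placeholder estimator (~4 characters per token); swap in the real one.
const roughTokenEstimate = (text: string): number => Math.ceil(text.length / 4);

// Runs before the gate write and reports every artifact over the limit,
// leaving the artifacts themselves unchanged.
function checkArtifactTokenBudget(
  artifacts: PhaseArtifact[],
  limit: number,
  estimate: (text: string) => number = roughTokenEstimate,
): PreflightViolation[] {
  return artifacts
    .map((a) => ({ path: a.path, estimatedTokens: estimate(a.content), limit }))
    .filter((v) => v.estimatedTokens > limit);
}
```

An empty result means the gate write may proceed; a non-empty one is a refusal that names each offending artifact and its estimate.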

gemini-code-assist · 2026-05-11T02:51:01Z

+    - **M17 candidate — `feat(intake): cross-tool AGENT_FILES discovery + AUDIT/DEFINE opt-in (gptme borrow B3-narrowed).`** Authority boundary: cross-tool agent-instruction-file intake at AUDIT and DEFINE phase entry. Discovery only (file list per `gptme/prompts/__init__.py`: `AGENTS.md`, `CLAUDE.md`, `COPILOT.md`, `GEMINI.md`, `.cursorrules`, `.windsurfrules`, `.github/copilot-instructions.md`). NO parent/home walk. NO automatic prompt injection. Files become available to the user as a confirm UI in AUDIT/DEFINE intake; user accepts or rejects per file. Telemetry events: `agent_files_discovered`, `agent_files_accepted`, `agent_files_rejected`, `agent_instruction_conflicts`. Rule-21 measurement: `agent_files_accepted / agent_files_discovered` rate; intake-question-count delta vs baseline (no AGENT_FILES) on a brownfield corpus. Trail: `docs/comparisons/gptme/SYNTHESIS.md`. Trigger: lands when brownfield AUDIT runtime ships (W4) OR when greenfield DEFINE intake earns the authority. NOT before `src/phases/audit.ts` exists.
+    - **M18 candidate — `feat(provider): deterministic context-projection + compaction-opportunity probe (gptme borrow B1-narrowed).`** Authority boundary: telemetry-only context projection that reports compaction opportunity without mutating provider invocations. NO LLM resume summarization. NO view-branch swap. NO automatic provider-context mutation. The discipline rule "no phase artifact may exceed N tokens at gate write" lands FIRST as a separate gate-preflight check; the probe extends existing `tokensEstimate` (`src/providers/manifest.ts:109` neighborhood, computed at line 111) with `context_projection_tokens`, `compaction_opportunity_savings_ratio`, `compaction_skipped_savings_ratio`. Rule-21 measurement: observed `compaction_opportunity_savings_ratio` distribution across runs > 0.10 floor before any compaction action authority is added. Trail: `docs/comparisons/gptme/SYNTHESIS.md`. Trigger: M14 Reviewer-panel + M15 debate-scheduler runs accumulate large enough contexts to make the floor measurable.
+    - **M19+ candidate — `feat(diagnostic): worktree topology refusal modes (gptme borrow B2-deferred).`** Authority boundary: diagnostic-only kind-classification of worktree state (`clean_run_worktree | dirty_run_worktree | no_worktree | multi_root_repo`). NO destructive restore primitive (gptme's `git reset --hard` is incompatible with code-oz's user-change preservation). On audit-completeness failure, classify and refuse rather than reset. Trigger: lands when actual resumes show audit-completeness recovery cannot recover from a dirty run worktree without operator intervention. Rule-21 measurement: count of resumes where current recovery is destructive vs. count where classification would have made the next action obvious.
+    - **M20+ candidate — `feat(eval): release/run-quality eval harness (gptme borrow D3, Codex-flagged).`** Authority boundary: a separate run-quality evaluation suite, not the unit/integration test surface. Inspired by `gptme/docs/evals.rst` (model leaderboards, CSV/JSON export, Docker guidance, SWE-bench compatibility). Validates orchestration AND live run quality across model/release combos. Trigger: when v0.2 stabilizes and a release-cadence quality regression slips through unit tests.


There is a numbering conflict here. M17 is already assigned to "Reviewer Memory v1" on line 382. These new candidate slots should be renumbered (e.g., M18-M21) to maintain a unique sequence in the roadmap.

Suggested change


- **M18 candidate — `feat(intake): cross-tool AGENT_FILES discovery + AUDIT/DEFINE opt-in (gptme borrow B3-narrowed).`** Authority boundary: cross-tool agent-instruction-file intake at AUDIT and DEFINE phase entry. Discovery only (file list per `gptme/prompts/__init__.py`: `AGENTS.md`, `CLAUDE.md`, `COPILOT.md`, `GEMINI.md`, `.cursorrules`, `.windsurfrules`, `.github/copilot-instructions.md`). NO parent/home walk. NO automatic prompt injection. Files become available to the user as a confirm UI in AUDIT/DEFINE intake; user accepts or rejects per file. Telemetry events: `agent_files_discovered`, `agent_files_accepted`, `agent_files_rejected`, `agent_instruction_conflicts`. Rule-21 measurement: `agent_files_accepted / agent_files_discovered` rate; intake-question-count delta vs baseline (no AGENT_FILES) on a brownfield corpus. Trail: `docs/comparisons/gptme/SYNTHESIS.md`. Trigger: lands when brownfield AUDIT runtime ships (W4) OR when greenfield DEFINE intake earns the authority. NOT before `src/phases/audit.ts` exists.

- **M19 candidate — `feat(provider): deterministic context-projection + compaction-opportunity probe (gptme borrow B1-narrowed).`** Authority boundary: telemetry-only context projection that reports compaction opportunity without mutating provider invocations. NO LLM resume summarization. NO view-branch swap. NO automatic provider-context mutation. The discipline rule "no phase artifact may exceed N tokens at gate write" lands FIRST as a separate gate-preflight check; the probe extends existing `tokensEstimate` (`src/providers/manifest.ts:109` neighborhood, computed at line 111) with `context_projection_tokens`, `compaction_opportunity_savings_ratio`, `compaction_skipped_savings_ratio`. Rule-21 measurement: observed `compaction_opportunity_savings_ratio` distribution across runs > 0.10 floor before any compaction action authority is added. Trail: `docs/comparisons/gptme/SYNTHESIS.md`. Trigger: M14 Reviewer-panel + M15 debate-scheduler runs accumulate large enough contexts to make the floor measurable.

- **M20+ candidate — `feat(diagnostic): worktree topology refusal modes (gptme borrow B2-deferred).`** Authority boundary: diagnostic-only kind-classification of worktree state (`clean_run_worktree | dirty_run_worktree | no_worktree | multi_root_repo`). NO destructive restore primitive (gptme's `git reset --hard` is incompatible with code-oz's user-change preservation). On audit-completeness failure, classify and refuse rather than reset. Trigger: lands when actual resumes show audit-completeness recovery cannot recover from a dirty run worktree without operator intervention. Rule-21 measurement: count of resumes where current recovery is destructive vs. count where classification would have made the next action obvious.

- **M21+ candidate — `feat(eval): release/run-quality eval harness (gptme borrow D3, Codex-flagged).`** Authority boundary: a separate run-quality evaluation suite, not the unit/integration test surface. Inspired by `gptme/docs/evals.rst` (model leaderboards, CSV/JSON export, Docker guidance, SWE-bench compatibility). Validates orchestration AND live run quality across model/release combos. Trigger: when v0.2 stabilizes and a release-cadence quality regression slips through unit tests.
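A lightweight guard against the collision Gemini points out could scan `ROADMAP.md` for milestone IDs that open more than one bolded entry. The sketch assumes entries start with `**M<n>`, as the candidate slots quoted in this PR do. Whether the existing "Reviewer Memory v1" entry uses the same shape is not shown here. `findDuplicateMilestones` is a hypothetical helper, not part of the repository.

```typescript
// Hypothetical lint helper: groups lines by the first `**M<n>` they contain and
// reports any milestone ID that opens more than one entry.
function findDuplicateMilestones(roadmap: string): Record<string, string[]> {
  const byId: Record<string, string[]> = {};
  for (const line of roadmap.split("\n")) {
    const match = /\*\*(M\d+)/.exec(line); // greedy \d+ keeps "M17" distinct from "M1"
    if (match === null) continue;
    const id = match[1];
    if (!(id in byId)) byId[id] = [];
    byId[id].push(line.trim());
  }
  // Keep only IDs claimed by more than one entry.
  const duplicates: Record<string, string[]> = {};
  for (const id of Object.keys(byId)) {
    if (byId[id].length > 1) duplicates[id] = byId[id];
  }
  return duplicates;
}
```

Run in CI, a non-empty result would block a merge like this one until the slots are renumbered.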

Copilot

Pull request overview

Doc-only PR that records a gptme vs code-oz template comparison (briefing, Codex response, and synthesis) and reserves roadmap slots for potential future borrows (AGENT_FILES intake, context-projection probe, worktree-topology refusal diagnostics, eval harness).

Changes:

- Add gptme comparison docs: COMPARISON, Codex briefing/response, and post-debate synthesis.
- Update docs/design/ROADMAP.md to reserve M17–M20(+)-style candidate slots derived from the gptme comparison.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.


| File | Description |
|---|---|
| docs/design/ROADMAP.md | Adds reserved candidate milestone slots derived from the gptme comparison. |
| docs/comparisons/gptme/COMPARISON.md | New side-by-side comparison + borrow/defer/reject rationale. |
| docs/comparisons/gptme/CODEX_BRIEFING.md | New briefing prompt used for cross-model peer review. |
| docs/comparisons/gptme/CODEX_RESPONSE.md | New Codex round-1 response capturing fix-first findings and revised classification. |
| docs/comparisons/gptme/SYNTHESIS.md | New single-source-of-truth synthesis + rationale for “ratify-only” scope and roadmap slot reservation. |


+| **Knowledge injection** | Lessons (keyword/tool/pattern auto-load) + Anthropic skills | Universal anti-slop rules + per-persona prompts; no auto-load by keywords |
+| **Plugins** | Python entry-points; packages tools+hooks+commands | None — agentpacks (skill bundles) only |
+| **Subagents** | `subagent` tool: executor + planner + batch + completion-hooks | Phase agents; reviewer panel; debate participants — but not an in-chat subagent surface |
+| **CLI agent files** | Loads AGENTS.md/CLAUDE.md/GEMINI.md/COPILOT.md/.cursorrules/.windsurfrules | Loads its own CLAUDE.md only |


+  - **Template-comparison-derived deferred milestones (slots reserved 2026-05-10):**
+    - **M17 candidate — `feat(intake): cross-tool AGENT_FILES discovery + AUDIT/DEFINE opt-in (gptme borrow B3-narrowed).`** Authority boundary: cross-tool agent-instruction-file intake at AUDIT and DEFINE phase entry. Discovery only (file list per `gptme/prompts/__init__.py`: `AGENTS.md`, `CLAUDE.md`, `COPILOT.md`, `GEMINI.md`, `.cursorrules`, `.windsurfrules`, `.github/copilot-instructions.md`). NO parent/home walk. NO automatic prompt injection. Files become available to the user as a confirm UI in AUDIT/DEFINE intake; user accepts or rejects per file. Telemetry events: `agent_files_discovered`, `agent_files_accepted`, `agent_files_rejected`, `agent_instruction_conflicts`. Rule-21 measurement: `agent_files_accepted / agent_files_discovered` rate; intake-question-count delta vs baseline (no AGENT_FILES) on a brownfield corpus. Trail: `docs/comparisons/gptme/SYNTHESIS.md`. Trigger: lands when brownfield AUDIT runtime ships (W4) OR when greenfield DEFINE intake earns the authority. NOT before `src/phases/audit.ts` exists.
+    - **M18 candidate — `feat(provider): deterministic context-projection + compaction-opportunity probe (gptme borrow B1-narrowed).`** Authority boundary: telemetry-only context projection that reports compaction opportunity without mutating provider invocations. NO LLM resume summarization. NO view-branch swap. NO automatic provider-context mutation. The discipline rule "no phase artifact may exceed N tokens at gate write" lands FIRST as a separate gate-preflight check; the probe extends existing `tokensEstimate` (`src/providers/manifest.ts:109` neighborhood, computed at line 111) with `context_projection_tokens`, `compaction_opportunity_savings_ratio`, `compaction_skipped_savings_ratio`. Rule-21 measurement: observed `compaction_opportunity_savings_ratio` distribution across runs > 0.10 floor before any compaction action authority is added. Trail: `docs/comparisons/gptme/SYNTHESIS.md`. Trigger: M14 Reviewer-panel + M15 debate-scheduler runs accumulate large enough contexts to make the floor measurable.
+    - **M19+ candidate — `feat(diagnostic): worktree topology refusal modes (gptme borrow B2-deferred).`** Authority boundary: diagnostic-only kind-classification of worktree state (`clean_run_worktree | dirty_run_worktree | no_worktree | multi_root_repo`). NO destructive restore primitive (gptme's `git reset --hard` is incompatible with code-oz's user-change preservation). On audit-completeness failure, classify and refuse rather than reset. Trigger: lands when actual resumes show audit-completeness recovery cannot recover from a dirty run worktree without operator intervention. Rule-21 measurement: count of resumes where current recovery is destructive vs. count where classification would have made the next action obvious.
+    - **M20+ candidate — `feat(eval): release/run-quality eval harness (gptme borrow D3, Codex-flagged).`** Authority boundary: a separate run-quality evaluation suite, not the unit/integration test surface. Inspired by `gptme/docs/evals.rst` (model leaderboards, CSV/JSON export, Docker guidance, SWE-bench compatibility). Validates orchestration AND live run quality across model/release combos. Trigger: when v0.2 stabilizes and a release-cadence quality regression slips through unit tests.


+| Item | Status | Reason (one sentence) | Target slot | Measurement plan (if borrow) |
+|---|---|---|---|---|
+| **B1 — Compaction-opportunity probe** | borrow-deferred-to-own-milestone | Deterministic context projection + compaction-opportunity probe is useful telemetry, but it requires a separate milestone authority because gptme's full engine performs LLM resume summarization and view-branch swaps that violate code-oz's "files in `ProviderRequest.files` are explicit, never silently mutated" discipline. | M18 candidate | Extend the existing `tokensEstimate` field on `ProviderContextMetrics` (`src/providers/manifest.ts:111`) with `context_projection_tokens`, `compaction_opportunity_savings_ratio`, `compaction_skipped_savings_ratio`. Rule-21 ship gate: observed `compaction_opportunity_savings_ratio` distribution > 0.10 across runs before any compaction-action authority is added. |
+| **B3 — AGENT_FILES discovery + AUDIT/DEFINE opt-in** | borrow-deferred-to-own-milestone | Cross-tool agent-instruction-file discovery is worth doing, but it requires its own milestone authority because the AUDIT runtime does not yet exist (`src/phases/audit.ts` is absent) and the trust-boundary discipline (no parent/home walk, explicit per-file opt-in) is itself a contract surface. | M17 candidate | Telemetry events `agent_files_discovered`, `agent_files_accepted`, `agent_files_rejected`, `agent_instruction_conflicts`. Rule-21 ship gate: `agent_files_accepted / agent_files_discovered` rate observable; intake-question-count delta vs. baseline (no AGENT_FILES) on a brownfield corpus. |
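The B1 row's measurement plan can be made concrete with a hypothetical sketch built on several assumptions:

- `ProviderContextMetrics` is reduced to a minimal shape carrying only `tokensEstimate`. The real interface near `src/providers/manifest.ts:111` is not reproduced here.
- The deterministic projection is taken as an input rather than computed.
- The opportunity ratio is the fraction of estimated tokens the projection would save.
- "Distribution > 0.10" is read as "median above 0.10".

The probe returns a new object and never alters what is sent to the provider, which matches the telemetry-only boundary.

```typescript
// Hypothetical minimal shape; the real ProviderContextMetrics may carry more fields.
interface ProviderContextMetrics {
  tokensEstimate: number;
}

// The three probe fields named in the B1 measurement plan.
interface CompactionProbeMetrics extends ProviderContextMetrics {
  context_projection_tokens: number;
  compaction_opportunity_savings_ratio: number;
  compaction_skipped_savings_ratio: number;
}

const OPPORTUNITY_FLOOR = 0.1; // Rule-21 floor quoted in the decision matrix

// Records what a deterministic projection would save; returns a new object and mutates nothing.
function probeCompaction(
  base: ProviderContextMetrics,
  projectedTokens: number,
): CompactionProbeMetrics {
  const estimate = base.tokensEstimate;
  const savings = Math.max(0, estimate - projectedTokens);
  const appliedTokens = 0; // telemetry-only: no compaction is ever performed
  return {
    ...base,
    context_projection_tokens: projectedTokens,
    compaction_opportunity_savings_ratio: estimate > 0 ? savings / estimate : 0,
    compaction_skipped_savings_ratio: estimate > 0 ? (savings - appliedTokens) / estimate : 0,
  };
}

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Assumed reading of the ship gate: the median opportunity ratio across runs must exceed the floor.
function clearsRule21Floor(runs: CompactionProbeMetrics[]): boolean {
  if (runs.length === 0) return false;
  return median(runs.map((r) => r.compaction_opportunity_savings_ratio)) > OPPORTUNITY_FLOOR;
}
```

A median keeps a few very large debate contexts from clearing the floor on their own. A percentile or a per-phase breakdown would be an equally reasonable reading of the ship gate.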


+**M17 candidate — AGENT_FILES intake authority (B3-narrowed).** Lands the discovery list (`AGENTS.md`, `CLAUDE.md`, `COPILOT.md`, `GEMINI.md`, `.cursorrules`, `.windsurfrules`, `.github/copilot-instructions.md` per `gptme/prompts/__init__.py:23`) at AUDIT and DEFINE phase entry. Discovery only — no parent/home walk (gptme walks home → workspace per `gptme/prompts/workspace.py:121,215,233`; code-oz refuses), no automatic prompt injection. Files become a confirm UI that accepts or rejects per file. New telemetry events (`agent_files_discovered`, `agent_files_accepted`, `agent_files_rejected`, `agent_instruction_conflicts`) extend `PhaseEvent`. Trigger: lands when brownfield AUDIT runtime ships (W4) or when greenfield DEFINE intake earns the authority. Hard precondition: not before `src/phases/audit.ts` exists.
+
+**M18 candidate — Compaction-opportunity probe authority (B1-narrowed).** Telemetry-only context projection that reports compaction opportunity without mutating provider invocations. No LLM resume summarization (gptme's `gptme/tools/autocompact/hook.py:164`), no view-branch swap (`gptme/tools/autocompact/hook.py:128`), no automatic provider-context mutation. The discipline rule "no phase artifact may exceed N tokens at gate write" lands first as a separate gate-preflight check (`src/phases/gate-preflight.ts` extension); the probe extends the existing `ProviderContextMetrics` (`src/providers/manifest.ts:111` neighborhood) with `context_projection_tokens`, `compaction_opportunity_savings_ratio`, `compaction_skipped_savings_ratio`. Trigger: M14 Reviewer-panel + M15 debate-scheduler accumulate large enough contexts to make the > 0.10 floor measurable.
+
+**Parallel deferred slots — M19+ (B2) and M20+ (D3).** B2's worktree topology refusal diagnostics waits on actual operator-intervention evidence in the resume corpus. D3's release/run-quality eval harness waits on a release-cadence quality regression that slips through unit tests. Both are reserved with measurement triggers in `ROADMAP.md`; neither is committed.
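The M17 intake boundary can be sketched as two small pure functions. It has three rules: a fixed discovery list, no parent or home walk, and nothing injected until the user accepts a file. Everything here is hypothetical:

- the function names;
- the injected `fileExists` and `decide` callbacks, which stand in for a filesystem check and the AUDIT/DEFINE confirm UI;
- the modelling of the `agent_files_*` telemetry events as simple counters.

Conflict detection for `agent_instruction_conflicts` is left out.

```typescript
// Discovery list as quoted in this PR from gptme/prompts/__init__.py.
const AGENT_FILE_NAMES: readonly string[] = [
  "AGENTS.md",
  "CLAUDE.md",
  "COPILOT.md",
  "GEMINI.md",
  ".cursorrules",
  ".windsurfrules",
  ".github/copilot-instructions.md",
];

type IntakeDecision = "accepted" | "rejected";

// Counters standing in for the agent_files_* telemetry events.
interface AgentFilesTelemetry {
  agent_files_discovered: number;
  agent_files_accepted: number;
  agent_files_rejected: number;
}

// Discovery only: checks the fixed names relative to the project root; no parent or home walk.
function discoverAgentFiles(fileExists: (relPath: string) => boolean): string[] {
  return AGENT_FILE_NAMES.filter((name) => fileExists(name));
}

// Per-file opt-in: only accepted files are returned; nothing is injected into a prompt here.
function applyIntakeDecisions(
  discovered: string[],
  decide: (file: string) => IntakeDecision,
): { accepted: string[]; telemetry: AgentFilesTelemetry } {
  const accepted: string[] = [];
  let rejected = 0;
  for (const file of discovered) {
    if (decide(file) === "accepted") {
      accepted.push(file);
    } else {
      rejected += 1;
    }
  }
  return {
    accepted,
    telemetry: {
      agent_files_discovered: discovered.length,
      agent_files_accepted: accepted.length,
      agent_files_rejected: rejected,
    },
  };
}

// Rule-21 rate; null when nothing was discovered so empty repos do not read as 0% acceptance.
function acceptanceRate(t: AgentFilesTelemetry): number | null {
  return t.agent_files_discovered === 0 ? null : t.agent_files_accepted / t.agent_files_discovered;
}
```

With `fileExists` injected, the only lookups are the seven fixed names resolved against whatever root the caller chooses. This keeps the no-parent-walk rule visible at the call site rather than buried in directory traversal.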


+3. **Rule 21 measurement plans documented for future milestones.** Each borrow's measurement plan is recorded in the decision matrix above and in the `ROADMAP.md` slot reservation. Implementation cannot land without telemetry first.
+4. **No source or test code touched.** Only `docs/comparisons/gptme/*` and `docs/design/ROADMAP.md` are modified. No `src/**`, no `tests/**`, no schema changes.
+
+When the convergence Codex round returns `push-clean`, the PR ships and the slots are reserved. The next milestone (M17 candidate or otherwise) opens its own briefing → debate → implementation → review cycle.


+| B3: Borrow now — Cross-tool AGENT_FILES ingestion | **B3: Narrow borrow candidate, deferred to own milestone — AGENT_FILES discovery plus explicit AUDIT/DEFINE opt-in.** No parent/home walk. Cross-tool files are informational until the user accepts them. Measurement: `agent_files_discovered`, `agent_files_accepted`, `agent_files_rejected`, `agent_instruction_conflicts`, intake-question delta. Reserved as M17 candidate slot in `ROADMAP.md`. |
+| (none) | **D3 (new): Release/run-quality eval harness inspired by gptme evals.** Defer unless it becomes the single milestone authority. |
+| D1, D2, R1, R2, R3, R4, R5 | unchanged |
+
+Borrow set after fix: **2 narrow-borrow candidates (each deferred to its own milestone), 4 deferred (B2 demoted + D1 + D2 + new D3 eval harness), 5 rejected.**
+
+> **Note (post-R2 scope lock, thread `019e1319`):** Codex round 2 ratified Option A — RATIFY-ONLY. The two narrow-borrow candidates above (B1, B3) are reserved for their own future milestones (M17/M18 candidate slots in `docs/design/ROADMAP.md`); they are NOT implemented in this PR. The canonical post-debate settlement is `SYNTHESIS.md`.


+
+Borrow set after fix: **2 narrow-borrow candidates (each deferred to its own milestone), 4 deferred (B2 demoted + D1 + D2 + new D3 eval harness), 5 rejected.**
+
+> **Note (post-R2 scope lock, thread `019e1319`):** Codex round 2 ratified Option A — RATIFY-ONLY. The two narrow-borrow candidates above (B1, B3) are reserved for their own future milestones (M17/M18 candidate slots in `docs/design/ROADMAP.md`); they are NOT implemented in this PR. The canonical post-debate settlement is `SYNTHESIS.md`.


omerakben added 6 commits May 10, 2026 14:31

merge(main): combine gptme M17-M20 candidate slots + RULE21_BENCHMARK reference in ROADMAP (314f3a3)

Copilot AI review requested due to automatic review settings May 11, 2026 02:48

omerakben merged commit 4ac6f4c into main May 11, 2026
2 checks passed

Copilot started reviewing on behalf of omerakben May 11, 2026 02:49

chatgpt-codex-connector Bot reviewed May 11, 2026

gemini-code-assist Bot reviewed May 11, 2026

Copilot AI reviewed May 11, 2026

This was referenced May 11, 2026

feat(util): allowlisted env reader (B5) + test(integration): cross-family handoff matrix (B4) + 08-pi-mono comparison #26

Merged

feat(prompts+docs): ADR gate + architecture vocabulary affordances (M18 partial) + 10-mattpocock-skills comparison #28

Merged

omerakben deleted the feat/gptme-borrow-set-ratified branch May 30, 2026 03:16


docs(comparisons/gptme): gptme template comparison + M17-M20 candidate slots (5 Codex rounds)#22

omerakben merged 6 commits into main from feat/gptme-borrow-set-ratified

omerakben commented May 11, 2026

coderabbitai Bot commented May 11, 2026

Rate limit exceeded

chatgpt-codex-connector Bot left a comment

chatgpt-codex-connector Bot May 11, 2026

chatgpt-codex-connector Bot May 11, 2026

gemini-code-assist Bot left a comment

gemini-code-assist Bot May 11, 2026

gemini-code-assist Bot May 11, 2026

gemini-code-assist Bot May 11, 2026

gemini-code-assist Bot May 11, 2026

gemini-code-assist Bot May 11, 2026

Copilot AI left a comment

2 participants

	├── tools/ (shell, ipython, patch, browser, vision, computer, subagent, rag, gh, tmux, todo, …)
	Borrow set: 2 narrow-borrow candidates, 4 deferred, 5 rejected.


