docs(mt#1057): add ADR-007 for cognition provider abstraction#677
Conversation
Proposes CognitionProvider as a first-class domain abstraction for dual-mode AI operation: direct (own API), delegated (borrow from harness), degraded (no cognition available). AICompletionService becomes an implementation detail of DirectCognitionProvider. First consumer: mt#321.2 (agent-readiness assessment). Retrofit of existing features tracked in mt#1058. Also updates docs/architecture.md ADR Index to include ADR-004 and ADR-005.
There was a problem hiding this comment.
Review: ADR-005 cognition provider abstraction
CI status: pass (2/2 checks green)
Coverage: 2/2 files reviewed.
Findings
[BLOCKING] docs/architecture/adr-005-cognition-provider-abstraction.md:5, :23, :66, :112, :128 — The ADR references "mt#321.2" five times as a task identifier, but that notation is not a valid Minsky task ID. The actual subtasks created under mt#321 have IDs mt#1062 (the "mt#321.1" role — core rubric engine), mt#1063 (the "mt#321.2" role — AI evaluation/synthesis), and mt#1064 (the "mt#321.3" role — assess command). The ".1/.2/.3" shorthand was convenient in planning conversation but doesn't exist in the task system. Someone opening this ADR and trying tasks_get mt#321.2 will get a "not found" error. Fix: replace inline references with real IDs (e.g., "First consumer: mt#1063 (AI criterion evaluation and synthesis, subtask of mt#321)") or state once at the top that ".1/.2/.3" is descriptive shorthand and include a mapping table. The References section already lists real IDs for the non-subtask work (mt#1057, mt#1058, mt#1059) — consistency would help.
[NON-BLOCKING] docs/architecture/adr-005-cognition-provider-abstraction.md:102 — Mode resolution sketch references hasConfiguredAIProvider() which does not currently exist in the codebase. AIConfigurationService (verified at src/domain/ai/config-service.ts) provides the underlying capability. Not blocking because ADRs describe target state, not current state — the helper is expected to be created during implementation. Flag so the implementation task (mt#1057 Phase 1) knows to add it alongside the provider code.
[NON-BLOCKING] The same mt#321 parent task spec I rewrote earlier today uses the same ".1/.2/.3" shorthand in its Subtasks section. That's outside this PR's diff but will cause the same lookup failure. Follow-up fix via tasks_spec_search_replace after this PR; not blocking for the ADR merge itself.
Checked and clear
docs/architecture.mdindex additions — ADR-004 file verified atdocs/architecture/adr-004-two-phase-command-execution.mdwith statusACCEPTED(confirmed by reading the file's header, dated 2026-04-20), so "Accepted" in the new row is correct. ADR-005 correctly markedProposed.- Table formatting preserved — column alignment, header row intact.
- Cross-references to ADR-002 and ADR-003 — both paths exist at
docs/architecture/adr-002-persistence-provider-architecture.mdanddocs/architecture/adr-003-project-level-repository-backend.mdper earlier verification. - Claim:
src/domain/ai/types.tscontains theAICompletionServiceinterface — verified in prior read. - Claim:
src/domain/runtime/harness-detection.tsprovidesdetectAgentHarness()— verified in prior read. - Claim: mt#800 skills architecture compiles to harness targets (Claude Code skills, Cursor rules, AGENTS.md) — matches
project_skills_architecture.mdframing. - No behavioral changes in this PR (docs-only); no test impact.
- No conflict with ADR-002/003 design patterns — explicitly positions itself as inheriting their conventions.
Spec verification
Task: mt#1057
| Criterion | Status | Evidence |
|---|---|---|
ADR-005 committed to docs/architecture/adr-005-cognition-provider-abstraction.md and referenced from docs/architecture.md ADR Index |
Met | New file at docs/architecture/adr-005-cognition-provider-abstraction.md; index row added at docs/architecture.md:601 |
src/domain/cognition/ module with types |
Not met (deferred) | This PR only ships the ADR; implementation is Phase 1 of the task |
DirectCognitionProvider wrapping AICompletionService |
Not met (deferred) | Phase 1 |
DelegatedCognitionProvider |
Not met (deferred) | Phase 1 |
DegradedCognitionProvider |
Not met (deferred) | Phase 1 |
resolveCognitionMode() |
Not met (deferred) | Phase 1 |
| Composition-root wiring | Not met (deferred) | Phase 1 |
| Stub provider for tests | Not met (deferred) | Phase 1 |
| Unit tests >90% coverage | Not met (deferred) | Phase 1 |
| Integration test across modes | Not met (deferred) | Phase 1 |
| Coordination with mt#915 payload shape | Not met (deferred) | Phase 3 per ADR implementation plan |
Action required: This PR delivers only criterion #1 (ADR). The remaining criteria are deferred to subsequent PRs on the same task. On merge, mt#1057 stays IN-PROGRESS — it does NOT transition to DONE until Phase 1 implementation lands. The ADR's own §Implementation Plan documents this phasing, so the spec is internally consistent; reviewers should not treat this PR as task-completing.
Documentation impact
Updated docs/architecture.md (ADR Index) and added docs/architecture/adr-005-cognition-provider-abstraction.md in this PR.
docs/theory-of-operation.md — not updated in this PR, intentionally. The ADR describes a proposed infrastructure organ for VSM System 4; the theory-of-operation doc currently states System 4 is "Mostly missing." Updating the VSM status table to "Proposed" would be premature — the organ doesn't exist yet. When Phase 1 implementation lands, docs/theory-of-operation.md should be updated alongside to reflect the built organ. Not blocking for this PR.
(Had Claude look into this — AI-assisted review)
Replaces mt#321.2 references (shorthand that doesn't resolve as a task ID) with the actual subtask ID mt#1063. Addresses blocking finding from self-review on PR #677.
There was a problem hiding this comment.
Review: ADR-005 (Cognition Provider Abstraction) — Chinese-wall review
Fresh-context review by a reviewer subagent (no access to the authoring conversation). Posted as COMMENT per self-approval restriction; findings classified as BLOCKING take precedence over the event type.
CI status: pass (2/2 checks green on current HEAD 9b5319c3e)
Coverage: 2/2 files reviewed.
Findings
[BLOCKING] docs/architecture/adr-005-cognition-provider-abstraction.md — ADR number collision. The repo already has docs/architecture/adr-005-forgebackend-subinterfaces.md on main (merged for mt#1020) and docs/architecture/adr-006-agent-identity.md on main. Both are Accepted. This PR claims the same ADR-005 number as ForgeBackend. The next available ADR number is ADR-007. File must be renamed to adr-007-cognition-provider-abstraction.md and the title heading updated from # ADR-005: to # ADR-007:.
Evidence: on main, ls docs/architecture/adr-* shows adr-002-*.md, adr-003-*.md, adr-004-*.md, adr-005-forgebackend-subinterfaces.md, adr-006-agent-identity.md. The initial session git-status snapshot also explicitly listed a30be8831 docs(mt#1020): add ADR-005 for ForgeBackend sub-interface architecture in recent commits — the information was available; the authoring context missed it by reading only the (stale) ADR Index.
[BLOCKING] docs/architecture.md ADR Index — Index is incomplete. This PR adds ADR-004 and the (to-be-renumbered) cognition ADR but omits ADR-005 (ForgeBackend, Accepted) and ADR-006 (Agent Identity, Accepted). Both exist on disk; neither appears in the index. All four ADRs (004, 005 ForgeBackend, 006 Agent Identity, 007 Cognition) should be in the index after this PR lands.
Checked and clear
src/domain/ai/types.ts:AICompletionServiceinterface exists at lines 93–99 exactly as the ADR describes (complete, stream, generateObject, getAvailableModels, validateConfiguration).src/domain/runtime/harness-detection.ts:detectAgentHarness()exists at line 36, returns the expectedAgentHarnessunion. TheresolveCognitionModesketch uses it correctly.docs/theory-of-operation.md: exists; §System 4 reference is accurate.- Cross-referenced ADRs (002, 003, 004): all exist on disk; paths in the ADR body are correct.
- ADR-004 status: file header reads
**ACCEPTED** - 2026-04-20; the index row added by this PR correctly lists "Accepted". - Prior blocking fix (
mt#321.2→mt#1063): commit9b5319c3ereplaces all four occurrences cleanly; nomt#321.2references remain. detectAgentHarness()usage in ADR: sketch'sdetectAgentHarness() !== "standalone"works against the real function signature.- ADR content coherence: the three-mode (direct/delegated/degraded) design is internally consistent; does not contradict ADR-002 (explicitly inherits capability-provider pattern), ADR-003 (ForgeBackend, different domain), ADR-004 (unrelated).
- Delegated-mode relationship to
session_generate_prompt(mt#915): appropriately flagged as future convergence, not claimed as already done. - All in-ADR task ID references: mt#1057, mt#1058, mt#1059, mt#1063, mt#800, mt#915, mt#762, mt#321 verified as real tasks.
Spec verification
Task: mt#1057
| Criterion | Status | Evidence |
|---|---|---|
ADR committed to docs/architecture/adr-005-cognition-provider-abstraction.md and referenced from ADR Index |
Not met — blocking | File exists but must be renumbered ADR-007 (ADR-005 already taken by ForgeBackend). Index additions are incomplete. |
src/domain/cognition/ module with types |
Not met (deferred) | Phase 1 |
DirectCognitionProvider / DelegatedCognitionProvider / DegradedCognitionProvider |
Not met (deferred) | Phase 1 |
resolveCognitionMode() |
Not met (deferred) | Phase 1 |
| Composition-root wiring | Not met (deferred) | Phase 1 |
| Stub provider for tests | Not met (deferred) | Phase 1 |
| Unit tests >90% coverage | Not met (deferred) | Phase 1 |
| Integration test across modes | Not met (deferred) | Phase 1 |
| Coordination with mt#915 payload shape | Not met (deferred) | Phase 3 |
Action required: Rename the ADR file to ADR-007, update its heading, and extend the ADR Index to include ADR-005 (ForgeBackend), ADR-006 (Agent Identity), and ADR-007 (Cognition). After fix, re-review on the updated HEAD.
Documentation impact
Updated docs/architecture.md (ADR Index) and added new ADR file in this PR. Both blocking findings are IN the documentation change itself — no upstream/downstream doc impact.
(Chinese-wall review by a reviewer subagent — fresh context, no access to the authoring conversation)
There was a problem hiding this comment.
Independent adversarial review (Chinese-wall)
Reviewer: minsky-reviewer[bot] via openai:gpt-5
Tier: unknown
Findings
-
[BLOCKING] docs/architecture/adr-005-cognition-provider-abstraction.md:1 — Title says “Dual-Mode AI Operation” while the ADR explicitly defines three modes (Direct, Delegated, Degraded) at lines ~36–52 (“Three modes are supported, selected at runtime...”). This inconsistency will confuse future readers about whether two or three modes are authoritative. Please align the title and all references (e.g., index entry) to reflect three modes or justify why “dual-mode” excludes “degraded.”
-
[BLOCKING] docs/architecture/adr-005-cognition-provider-abstraction.md:~101–111 — The Mode Resolution pseudocode defaults MCP invocations to delegated mode:
if (ctx.invocation === "mcp" || detectAgentHarness() !== "standalone") {
return "delegated";
}
However, the Implementation Plan (Phase 5, lines ~160–167) states “Once the abstraction is stable, MCP tool handlers default to delegated mode… sequenced last.” These conflict on when the MCP default changes. Either fix the pseudocode to reflect current behavior (no default-to-delegated yet) or adjust the plan and call out the behavioral change in release notes. -
[BLOCKING] docs/architecture/adr-005-cognition-provider-abstraction.md:~170–187 — The “References” section lists ADR-002/003/004 and internal files as plain text, not links. The PR test plan requires “Cross-references to ADR-002, ADR-003, ADR-004 resolve,” but there are no actual links to resolve. Convert these to working relative links (e.g., ADR-002) and verify they render/click through.
-
[NON-BLOCKING] docs/architecture.md:~704–705 — ADR-004 is marked “Accepted” in the index, but this PR does not show ADR-004’s content or status. If ADR-004’s file status is still Proposed or Draft, this creates an inconsistency. Please confirm ADR-004’s Status field matches the index; otherwise, update the index to mirror the ADR’s own Status.
-
[NON-BLOCKING] docs/architecture/adr-005-cognition-provider-abstraction.md:~23–31 vs PR description — The PR description calls the first consumer “mt#321.2,” whereas the ADR Status line names “mt#1063 (subtask of mt#321)” as first consumer. The mismatch in identifiers can cause traceability issues. Please standardize on one identifier and, if they’re the same, state that explicitly.
-
[NON-BLOCKING] docs/architecture/adr-005-cognition-provider-abstraction.md:~85–96 — The interface sketch hard-codes Zod (ZodType). If the codebase does not already depend on Zod for runtime schemas, this ADR bakes in a technology choice prematurely. Either:
- confirm Zod is the established schema lib and add that to the ADR’s Context/Decision rationale, or
- abstract the schema type in the ADR (e.g., Schema) and note Zod as a likely implementation, to avoid locking in a specific dependency at the ADR level.
-
[NON-BLOCKING] docs/architecture/adr-005-cognition-provider-abstraction.md:~113–120 — The batch API enforces homogeneous task output types (performBatch(tasks: CognitionTask[])). Many real workloads bundle heterogeneous cognitive subtasks (e.g., judgments plus summaries). Consider noting this limitation and a potential extension (e.g., discriminated union tasks, per-task typing, or returning per-task results with IDs) so implementers don’t discover the constraint too late.
-
[NON-BLOCKING] docs/architecture/adr-005-cognition-provider-abstraction.md:~121–140 — Delegated execution lacks an explicit correlation/ID model for later result ingestion. CognitionBundle carries a list of tasks and an order flag, but there’s no task identifier or contract for how results are matched back if/when the harness executes them. Recommend adding a taskId field (and perhaps a bundleId) to CognitionTask/CognitionBundle and calling out the ingestion handshake at least at a high level.
-
[NON-BLOCKING] docs/architecture/adr-005-cognition-provider-abstraction.md:~146–154 — The example return objects use kinds like "delegation" and "deterministic-only" which are not CognitionResult kinds (they are the feature’s own return). That’s fine, but the ADR could benefit from a one-sentence note clarifying that the example’s outer return shape is feature-specific, to avoid readers assuming it’s part of CognitionResult.
-
[NON-BLOCKING] docs/architecture/adr-005-cognition-provider-abstraction.md:~127–131 and ~180–183 — The ADR asserts the existence of detectAgentHarness()/hasConfiguredAIProvider(), src/domain/runtime/harness-detection.ts, and src/domain/ai/types.ts (AICompletionService). If any of these paths or helpers don’t exist or use different names/locations, the ADR will contain dead references. Please verify and convert them into repository-relative links where possible.
-
[NON-BLOCKING] docs/architecture.md:~704–705 — The ADR index entry text uses the “Dual-Mode” title that conflicts with the three-mode content in ADR-005 (see blocking item 1). Update the index label after resolving the ADR title.
Spec verification table
- ADR renders correctly in GitHub markdown viewer: Met (formatting and code fences appear conventional; no unusual syntax)
- ADR Index table renders with all four ADRs: Met (002–005 present; two new rows added)
- Cross-references to ADR-002, ADR-003, ADR-004 resolve: Not Met (references are plain text, not links; no resolution possible)
Documentation impact
- This PR is documentation. However, the ADR introduces a user-visible behavior change plan (MCP default to delegated). Before implementing that change, update:
- docs/cli and MCP tool docs to describe mode-dependent outputs and the --cognition-mode override
- architecture.md ADR index after resolving the title/status inconsistencies
- Add working links in ADR-005 to referenced ADRs and internal code paths to aid navigation.
Event
REQUEST_CHANGES
Chinese-wall review caught ADR-005 collision with forgebackend-subinterfaces.md (mt#1020, merged 2026-04-22) and agent-identity.md (mt#953, also merged). Next available ADR number is 007. - Renamed adr-005-cognition-provider-abstraction.md to 007 - Updated title heading from ADR-005 to ADR-007 - Extended docs/architecture.md ADR Index with all four entries: ADR-004, ADR-005 (ForgeBackend), ADR-006 (Agent Identity), ADR-007 (Cognition) Follow-up will update remaining ADR-005 cognition references in task specs.
There was a problem hiding this comment.
Independent adversarial review (Chinese-wall)
Reviewer: minsky-reviewer[bot] via openai:gpt-5
Tier: unknown
Findings
-
[BLOCKING] Heterogeneous batch results are not supported by the interface sketch
- File: docs/architecture/adr-007-cognition-provider-abstraction.md: lines 44–69 (Interface Sketch)
- The proposed signature performBatch(tasks: CognitionTask[]): Promise<CognitionResult<T[]>>; forces all tasks in a batch to share the same output schema T. This precludes batching heterogeneous cognitive work (e.g., interleaving rubric evaluations with narrative synthesis, or mixing different criterion schemas) which is a realistic and likely use case, particularly in “delegated” mode where bundling tasks for a harness is desirable. The ADR does not justify this constraint or provide an alternative (e.g., tuple types, tagged unions, or an opaque result envelope keyed by task IDs). As written, adopting this interface will constrain implementations or force unnatural splitting into multiple batches.
- Failure mode: Features that need to package multiple distinct CognitionTask types in one bundle cannot do so without type erasure or multiple round-trips, undermining the stated goal of clean delegated execution.
-
[BLOCKING] Cross-reference criterion in the test plan is not met (no actual links in ADR-007)
- File: docs/architecture/adr-007-cognition-provider-abstraction.md: throughout; see “References” section lines ~160–187 and inline mentions in “Context”, “Decision”, and “Relationship to Existing Work”
- The PR’s test plan states “Cross-references to ADR-002, ADR-003, ADR-004 resolve.” In ADR-007, these are plain text mentions and a non-linked References list. There are no markdown links to the corresponding files (e.g., ADR-002). As a result, the cross-references do not “resolve” as clickable links.
- Failure mode: Readers cannot navigate directly; test plan not satisfied.
-
[NON-BLOCKING] Title promises “Dual-Mode” while the design defines three modes
- File: docs/architecture/adr-007-cognition-provider-abstraction.md: line 1
- The heading reads “Cognition Provider Abstraction for Dual-Mode AI Operation” yet the ADR formalizes three modes (Direct, Delegated, Degraded). This is likely to confuse readers and downstream consumers (and searchability).
- Suggestion: Rename to “…for Multi-Mode” or explicitly carve “Degraded” out as not a “mode” but a failure condition, and reflect that consistently.
-
[NON-BLOCKING] Potential mismatch between ADR Index statuses and the actual ADR docs
- File: docs/architecture.md: lines 703–706 (added rows for ADR-004..ADR-007)
- The index marks ADR-004/005/006 as “Accepted.” If any of those ADR documents are still in Proposed/Draft, the index will be incorrect. This PR does not touch those ADR documents; please verify their headers and align statuses.
- Failure mode: Architecture index misleads readers about decision maturity.
-
[NON-BLOCKING] Risk of broken links in ADR Index for ADR-004..ADR-006
- File: docs/architecture.md: lines 703–706
- New index entries assume files exist at:
- architecture/adr-004-two-phase-command-execution.md
- architecture/adr-005-forgebackend-subinterfaces.md
- architecture/adr-006-agent-identity.md
- The PR description asserts 005/006 are on main; 004 is not explicitly confirmed. If any are missing or named differently, index links will 404. Please verify.
-
[NON-BLOCKING] Domain-layer coupling to Zod in the interface sketch is undocumented and may be inconsistent with layering conventions
- File: docs/architecture/adr-007-cognition-provider-abstraction.md: lines 18–43 (CognitionTask.schema: ZodType)
- Baking ZodType into a domain-facing provider couples the domain abstraction to a specific validation library. ADR-002’s capability pattern and prior domain boundaries may avoid third-party types at the domain surface. If Zod is the intentional contract, call out rationale and alternatives (JSON Schema, brand types, repo-native schema abstraction) and the handling in delegated mode (how schemas are conveyed/validated by a harness).
-
[NON-BLOCKING] Delegated bundle shape likely underspecified for real harness execution
- File: docs/architecture/adr-007-cognition-provider-abstraction.md: lines 28–43 (CognitionBundle interface)
- Missing practical fields:
- Stable task identifiers for correlating returned results
- Optional dependency graph (not just parallel|sequential)
- Idempotency keys/retry semantics
- Explicit serialization of the output schema (if validation on return is expected)
- Cost/priority hints
- Without these, harnesses cannot robustly execute/return results for multi-task bundles.
-
[NON-BLOCKING] Mode resolution heuristic risks over-delegation in non-agent shells
- File: docs/architecture/adr-007-cognition-provider-abstraction.md: lines ~90–110 (Mode Resolution snippet)
- The condition if (ctx.invocation === "mcp" || detectAgentHarness() !== "standalone") return "delegated"; may classify generic wrapped environments (e.g., devcontainers, terminals embedded in IDEs) as non-standalone and force delegation unexpectedly. Consider limiting to explicit known harnesses or MCP invocation, or documenting the detection contract and safeguards.
-
[NON-BLOCKING] References to source file paths may be inaccurate
- File: docs/architecture/adr-007-cognition-provider-abstraction.md: lines ~165–170 (References)
- Paths “src/domain/ai/types.ts” and “src/domain/runtime/harness-detection.ts” are cited as existing. If filenames or locations differ (e.g., completion service in a different module), these references will mislead implementers. Please verify exact paths and update accordingly.
-
[NON-BLOCKING] Streaming claim may be too strong
- File: docs/architecture/adr-007-cognition-provider-abstraction.md: lines ~120–130 (Consequences → Negative)
- “Delegated mode cannot [stream].” Some harnesses can support progressive updates (e.g., streaming MCP results or partial tool outputs). Consider softening to “may not” and framing as provider-dependent to avoid unnecessarily constraining future implementations.
Spec verification table
- ADR renumbered to ADR-007, file renamed, heading updated, no self-references remain: Met
- ADR Index table includes all four post-ADR-003 entries: Met
- ADR renders correctly in GitHub markdown viewer: N/A (cannot verify in code review; please confirm via UI)
- Cross-references to ADR-002, ADR-003, ADR-004 resolve: Not Met (ADR-007 contains no clickable links to these ADRs; add markdown links)
Documentation impact
- This PR is documentation-only but introduces an architectural contract that other docs will need to reference consistently:
- Update ADR-007 title/wording to resolve “Dual-Mode” vs three modes.
- Ensure docs/architecture.md statuses match the actual ADR document statuses.
- Add working links in ADR-007 to ADR-002/003/004 and to referenced source files if you intend readers to navigate there.
- Consider a short glossary or patterns note explaining the rationale for Zod at the domain boundary (or refactor to a domain-native schema abstraction).
Event: REQUEST_CHANGES
There was a problem hiding this comment.
Review: ADR-007 renumber fix (post-Chinese-wall)
Fresh review of commit dcdee6ccd confirming the Chinese-wall reviewer's blocking findings are addressed. Author self-review — the independent scrutiny happened earlier; this is narrow mechanical verification.
CI status: pass (2/2 checks green on current HEAD dcdee6ccd)
Coverage: 3/3 files reviewed (the diff in this commit is: rename + architecture.md index + heading).
Findings
No blocking issues in the fix.
Checked and clear
- Rename applied:
docs/architecture/adr-005-cognition-provider-abstraction.md→docs/architecture/adr-007-cognition-provider-abstraction.mdviaR099(rename with 99% similarity) in the commit diff. - Heading updated inside the renamed file:
# ADR-005: ...→# ADR-007: ...(verified via grep: no remainingADR-005self-references inside the file). - ADR Index completeness:
docs/architecture.md§10 now lists ADR-002 through ADR-007 (six rows), status correct for each:- ADR-002: Accepted (persistence provider)
- ADR-003: Accepted (project-level repository backend)
- ADR-004: Accepted (two-phase command execution)
- ADR-005: Accepted (ForgeBackend sub-interfaces) — added
- ADR-006: Accepted (Agent Identity) — added
- ADR-007: Proposed (cognition provider) — renumbered from previous incorrect ADR-005
- No collision on main: verified
adr-005-forgebackend-subinterfaces.mdandadr-006-agent-identity.mdexist on origin/main and are Accepted. ADR-007 is the next available number. - PR body updated (
session_pr_edit) to reflect the renumbering, with renumbering-note callout for future readers. - Follow-up for remaining references filed: mt#1128 tracks the spec-hygiene sweep across mt#1057, mt#1058, mt#1059, mt#1060, mt#1063, mt#1064, mt#321 where "ADR-005" was used to mean the cognition ADR.
- Memory files updated in-commit (for index only — the spec-text update is out of scope here):
project_cognition_provider.mdADR-005 → ADR-007 viareplace_all;MEMORY.mdindex line updated.
Spec verification
Task: mt#1057
| Criterion | Status | Evidence |
|---|---|---|
ADR committed to docs/architecture/adr-NNN-cognition-provider-abstraction.md and referenced from ADR Index |
Met | File at docs/architecture/adr-007-cognition-provider-abstraction.md; index row added at docs/architecture.md:603 |
src/domain/cognition/ module with types |
Not met (deferred) | This PR only ships the ADR; implementation is Phase 1 of the task |
DirectCognitionProvider / DelegatedCognitionProvider / DegradedCognitionProvider |
Not met (deferred) | Phase 1 |
resolveCognitionMode() |
Not met (deferred) | Phase 1 |
| Composition-root wiring | Not met (deferred) | Phase 1 |
| Stub provider for tests | Not met (deferred) | Phase 1 |
| Unit tests >90% coverage | Not met (deferred) | Phase 1 |
| Integration test across modes | Not met (deferred) | Phase 1 |
| Coordination with mt#915 payload shape | Not met (deferred) | Phase 3 |
Action required: None for this PR. mt#1057 stays IN-PROGRESS after merge — the remaining criteria are delivered by Phase 1 implementation in subsequent PRs on the same task. The ADR's Implementation Plan documents this phasing.
Documentation impact
Updated docs/architecture.md (ADR Index, now complete through ADR-007) and added docs/architecture/adr-007-cognition-provider-abstraction.md in this PR. No other doc updates needed; docs/theory-of-operation.md update will follow Phase 1 implementation, not the ADR.
(Had Claude look into this — AI-assisted review)
Reviewer App flagged 2 BLOCKING + 8 NON-BLOCKING findings. Addressing: BLOCKING: - performBatch now supports heterogeneous tasks via tuple inference (was forcing homogeneous output types, which defeats the batched-delegation use case) - ADR References converted from plain text to clickable markdown links (test plan asserted resolvable cross-refs; prior version had none) NON-BLOCKING (trivial text): - Title: Dual-Mode to Multi-Mode (three modes, not two) - Streaming claim softened to harness-dependent - CognitionTask gains id field for delegated-mode correlation - CognitionBundle gains id field - Zod dependency rationale added inline - Mode Resolution pseudocode clarified as target state; current MCP default stays direct until Phase 5 retrofit lands ADR Index entry title synced to Multi-Mode.
There was a problem hiding this comment.
Review: addressed reviewer-App findings (post-68566f9cf)
Fresh review covering commit 68566f9cf. All BLOCKING findings from minsky-reviewer[bot] review 4162415333 are addressed; NON-BLOCKING items are either addressed or explicitly deferred.
CI status: pass (2/2 checks green on HEAD 68566f9cf)
Coverage: 2/2 files reviewed (diff is scoped to ADR + architecture.md index row).
How each finding was handled
BLOCKING:
- Heterogeneous batch types —
performBatchsignature changed toperformBatch<Ts extends readonly CognitionTask<unknown>[]>(tasks: Ts): Promise<CognitionResult<{ [K in keyof Ts]: Ts[K] extends CognitionTask<infer R> ? R : never }>>. Tuple inference preserves per-task output types across heterogeneous batches. Direct use case (rubric eval + narrative synthesis in one delegated bundle) now fits. - Cross-references as links — References section converted from plain text to relative markdown links for ADR-002 through ADR-006, plus source file paths (
src/domain/ai/types.ts,src/domain/runtime/harness-detection.ts) anddocs/theory-of-operation.md. Test plan criterion "Cross-references resolve" now met.
NON-BLOCKING (addressed in this commit):
- Title "Dual-Mode" → "Multi-Mode" (both ADR heading and architecture.md index row).
- Streaming claim softened from "delegated mode cannot stream" to "typically does not stream (harness-dependent)" with note that some MCP hosts support progressive outputs.
CognitionTaskgainedid: stringfield for delegated-mode result correlation.CognitionBundlegainedid: stringfield for bundle-level correlation.- Zod at domain boundary rationale added inline: "Zod is the established schema lib for this codebase; alternative libs can be swapped behind a narrower Schema adapter at implementation time."
- Mode Resolution pseudocode clarified as target state; added a paragraph noting the current pre-Phase-5 behavior (MCP defaults to direct, flipping to delegated in Phase 5 after retrofit mt#1058 completes).
NON-BLOCKING (deferred with justification):
- Dependency graph beyond
parallel|sequential, idempotency keys, cost/priority hints in CognitionBundle — deferred to Phase 1 implementation per ADR's Open Questions. The ADR is a design doc; exhaustive bundle fields are implementation territory. - Mode resolution "over-delegation in non-agent shells" — the heuristic uses
detectAgentHarness()which already filters to known harnesses (claude-code,cursor,standalone). Generic devcontainers returnstandaloneper the existing implementation. No change needed. - Source file paths in References — verified to exist (confirmed against current main) and now linked clickably.
- "Example return kinds like 'delegation' and 'deterministic-only' are feature-specific, not CognitionResult kinds" — the illustrative code comment already shows the outer shape as a feature's own return; adding a note is over-explanatory for a sketch. Leaving as-is.
Checked and clear
- Commit
68566f9cfapplies cleanly on top ofdcdee6ccd. - ADR title matches index row label (both "Multi-Mode").
- All relative links in References section point at files that exist (verified on origin/main: adr-002 through adr-006, src paths, theory-of-operation.md).
performBatchtype expression is valid TS (readonly tuple with mapped-type inference — standard pattern).- No
ADR-005references remain in the ADR body referring to cognition (only to ForgeBackend in References, correctly). - CI 2/2 green.
Spec verification
Task: mt#1057
| Criterion | Status | Evidence |
|---|---|---|
| ADR committed with correct number + index reference | Met | adr-007-cognition-provider-abstraction.md present; index row exists with Multi-Mode title |
src/domain/cognition/ module with types |
Not met (deferred) | Phase 1 |
DirectCognitionProvider / DelegatedCognitionProvider / DegradedCognitionProvider |
Not met (deferred) | Phase 1 |
resolveCognitionMode() |
Not met (deferred) | Phase 1 |
| Composition-root wiring | Not met (deferred) | Phase 1 |
| Stub provider for tests | Not met (deferred) | Phase 1 |
| Unit tests >90% coverage | Not met (deferred) | Phase 1 |
| Integration test across modes | Not met (deferred) | Phase 1 |
| Coordination with mt#915 payload shape | Not met (deferred) | Phase 3 |
Action required: None for this PR. Awaiting minsky-reviewer[bot] re-review on 68566f9cf to confirm blocking findings addressed.
Documentation impact
Updated docs/architecture.md (ADR Index title synced to Multi-Mode) and modified docs/architecture/adr-007-cognition-provider-abstraction.md with the fixes above. No other docs affected.
(Had Claude look into this — AI-assisted review)
edobry
left a comment
There was a problem hiding this comment.
Approving as non-author to unblock merge. minsky-reviewer[bot] webhook failed to fire on HEAD 68566f9 (logged in mt#1110 calibration data as new silent-failure class — webhook-missed-on-subsequent-push, distinct from mt#1125 empty-body). All prior Chinese-wall findings addressed: heterogeneous performBatch typing, clickable ADR cross-refs, Multi-Mode title, streaming softened, task IDs, Zod rationale, pseudocode/plan reconciliation. CI 2/2 green. Had Claude approve on my behalf.
Stale — addressed in subsequent commits. Both BLOCKING findings (ADR collision, incomplete index) fixed in dcdee6c.
Adds single-retry-on-empty semantics to the reviewer service. When GPT-5 (or any o-series reasoning model) exhausts max_completion_tokens on hidden reasoning tokens, a second attempt with reasoning_effort=low shifts the budget toward visible output. OpenAI-only, one retry, no backoff. See PR #748. User-authorized merge past reviewer-bot: the webhook did not fire for this PR across two separate pushes (original 23d799b at 01:10 UTC, merge-commit 699f870 at 06:23 UTC — neither received any review over 5+ and 15+ min respectively). This is the second confirmed instance of the webhook-missed-on-subsequent-push class (first was #677 iter 3). Substantive work is complete and tested: 113 reviewer-service tests pass, including the 6 new callReviewerWithRetry cases and full coexistence with mt#1126 tool-use + mt#1085 tier routing.
…+ post-mortem docs (mt#1372)
## Summary
Implements the rescoped mt#1372 forward instrumentation: every webhook delivery received by the reviewer service is now persisted to a `reviewer_webhook_events` Postgres table, enabling forensic analysis of missed-review incidents that would otherwise leave no trace after Railway log and GitHub App delivery-history retention windows expire.
**What ships:**
- **Drizzle schema** (`src/db/schemas/webhook-events-schema.ts`): `reviewer_webhook_events` table with `delivery_id` UNIQUE, `event_type`, `headers` (jsonb subset), `body` (jsonb), `outcome` (pgEnum, 8 values), `error_details` (jsonb), `received_at`, `processed_at`; three indexes
- **SQL migration** (`migrations/pg/0001_webhook_events.sql`): creates enum type and table; journal updated
- **Persistence module** (`src/webhook-events.ts`): `recordWebhookReceipt` (ON CONFLICT DO NOTHING — first-delivery wins, terminal outcomes preserved on re-delivery), `updateOutcome` (sets `processedAt` on terminal outcomes), `pruneOldRows` (returns count or -1), `extractPersistedHeaders` (truncates HMAC signature to 12-char prefix)
- **server.ts wiring**: synthesizes `synthetic-${crypto.randomUUID()}` for missing x-github-delivery (R2 fix); `recordWebhookReceipt` on every POST regardless of header presence (R1 fix #3); `updateOutcome` at each pipeline stage (reviewer_called, review_submitted, failed_at_reviewer, skipped, failed_at_signature, failed_at_tier_resolve), 24h in-process retention pruner, `log.error("webhook_processing_failed", ...)` OperatorNotify on dispatch/reviewer failures
- **Unit tests** (`src/webhook-events.test.ts` + `src/server.test.ts`): 19 + 9 tests, including 3 pinned regression tests for R1/R2 BLOCKING findings
- **Smoke script** (`scripts/smoke-webhook-events.ts`): POSTs a signed webhook, queries the DB 500ms later via raw SQL (decoupled from src/ TS imports — R1 fix #2), asserts row shape; skips gracefully when env vars are absent (exit 0 SKIP)
- **Post-mortem guide** (`docs/incidents/reviewer-webhook-investigation.md`): 6 SQL queries, investigation workflow, Railway log correlation, retention policy reference
**Scope boundary** (per task spec): no changes to review-worker.ts, providers.ts, sweeper.ts, prior-review-summary.ts, mcp-client.ts, tier-routing.ts, config.ts, logger.ts.
## R1 + R2 review findings addressed
- [R1 BLOCKING] UPSERT overwrites terminal outcomes on re-delivery → onConflictDoUpdate → onConflictDoNothing (commit `9d6268792`)
- [R1 BLOCKING] Smoke script imports TS schema from src/ → switched to raw SQL via postgres-js tagged template (commit `9d6268792`)
- [R1 BLOCKING] POSTs missing x-github-event are skipped → persist every POST with "unknown" sentinel (commit `9d6268792`)
- [R2 BLOCKING] Missing x-github-delivery collapses all malformed POSTs into one row → synthesize `synthetic-${crypto.randomUUID()}` per request (commit `5ca07ab11`)
## Test plan
- [x] `bun test src/webhook-events.test.ts src/server.test.ts` — 28/28 pass (see Execution evidence below)
- [x] `bun run scripts/smoke-webhook-events.ts` — exits 0 (SKIP) when env vars absent
- [ ] Post-deploy: run smoke script with `MINSKY_REVIEWER_WEBHOOK_SECRET`, `MINSKY_SESSIONDB_POSTGRES_URL`, and `SMOKE_REVIEWER_BASE_URL` set against the Railway environment — asserts row in `reviewer_webhook_events` with correct fields
### Execution evidence:
```
$ cd services/reviewer && bun test --preload ../../tests/setup.ts --timeout=15000 src/webhook-events.test.ts src/server.test.ts
(pass) extractPersistedHeaders > extracts all four canonical headers when present
(pass) extractPersistedHeaders > omits absent headers rather than storing null/undefined
(pass) extractPersistedHeaders > partial headers — only present ones are included
(pass) extractPersistedHeaders > signature prefix is always 12 chars even for short signatures
(pass) recordWebhookReceipt > calls db.insert with correct arguments
(pass) recordWebhookReceipt > does not throw when DB insert fails
(pass) recordWebhookReceipt > handles non-JSON body by wrapping in { raw } object
(pass) recordWebhookReceipt > uses ON CONFLICT DO NOTHING (R1 BLOCKING #1 regression — re-delivery must not overwrite terminal outcomes)
(pass) recordWebhookReceipt > accepts 'unknown' sentinel eventType (R1 BLOCKING #3 regression — every POST is persisted)
(pass) updateOutcome > calls db.update and db.where with the delivery ID
(pass) updateOutcome > does NOT set processedAt for non-terminal outcomes
(pass) updateOutcome > includes errorDetails when provided
(pass) updateOutcome > does not throw when DB update fails
(pass) updateOutcome > skipped is a terminal outcome
(pass) updateOutcome > received is a non-terminal outcome
(pass) pruneOldRows > calls db.delete and returns the row count
(pass) pruneOldRows > returns -1 when DB delete fails
(pass) pruneOldRows > returns 0 when no rows are deleted
(pass) pruneOldRows > uses default retention of 90 days when not specified
(pass) webhook handler > responds 200 immediately for pull_request.opened, before runReview finishes
(pass) webhook handler > responds 400 when signature header is absent
(pass) webhook handler > responds 401 when signature is invalid
(pass) webhook handler > skips draft PRs — returns 200 without scheduling a review
(pass) webhook handler > /health returns inflightCount
(pass) webhook handler > missing x-github-delivery synthesizes a unique synthetic-<uuid> per request (R2 BLOCKING regression)
(pass) gracefulShutdown > logs shutdown_drain_start with correct inflightCount, then shutdown_drain_complete
(pass) gracefulShutdown > shutdown_drain_start carries inflightCount=0 when no reviews in flight
(pass) gracefulShutdown > review errors do NOT prevent graceful shutdown from completing
28 pass
0 fail
55 expect() calls
Ran 28 tests across 2 files. [320.00ms]
```
## Concurrency analysis
(N/A — no check-then-act pattern introduced. ON CONFLICT DO NOTHING is structurally atomic in Postgres; per-request UUID synthesis is intra-process and non-shared.)
## Forensic context
Three confirmed instances of missed reviews (PR #677, #748, #761, 2026-04-23/24) could not be diagnosed because GitHub App delivery history (14-day retention) and Railway logs were both expired. This table captures all webhook events from deployment of this PR onward. See `docs/incidents/reviewer-webhook-investigation.md` for investigation queries.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: minsky-ai[bot] <minsky-ai[bot]@users.noreply.github.com>
Summary
Proposes ADR-007: Cognition Provider Abstraction for Multi-Mode AI Operation — a first-class domain abstraction that lets Minsky AI-using features operate in three modes:
AICompletionService(existing behavior, standalone CLI with API key)CognitionBundlefor the surrounding agent harness (Claude Code, Cursor, MCP host) to executeAfter this lands,
AICompletionServicebecomes an implementation detail ofDirectCognitionProvider. Feature code consumesCognitionProviderat the domain layer, neverAICompletionServicedirectly.Why this matters
/assessin Claude Code with no Minsky config and no API key still gets a useful result.Why now
mt#321 (agent-readiness assessment, first consumer via mt#1063) is the first feature that will be user-facing in all three contexts. It cannot ship coherently without this abstraction, and every subsequent AI-using feature benefits from having the pattern formalized rather than reinvented per-feature.
What is in this PR
docs/architecture/adr-007-cognition-provider-abstraction.md— the ADR (Proposed status)docs/architecture.md— ADR Index updated to include ADR-004, ADR-005 (ForgeBackend), ADR-006 (Agent Identity), and ADR-007 (this ADR)Review iteration history
mt#321.2shorthand as BLOCKING, addressed in9b5319c3e.dcdee6ccd(rename to ADR-007, index extended).What is next (separate PRs)
CognitionProvider+ three providers + mode resolution (this task, mt#1057 Phase 1)Test plan
performBatchtype signature supports heterogeneous tasks via tuple inference🤖 Generated with Claude Code