Skip to content

docs(mt#1057): add ADR-007 for cognition provider abstraction#677

Merged
edobry merged 5 commits into
mainfrom
task/mt-1057
Apr 24, 2026
Merged

docs(mt#1057): add ADR-007 for cognition provider abstraction#677
edobry merged 5 commits into
mainfrom
task/mt-1057

Conversation

@minsky-ai
Copy link
Copy Markdown
Contributor

@minsky-ai minsky-ai Bot commented Apr 22, 2026

Summary

Proposes ADR-007: Cognition Provider Abstraction for Multi-Mode AI Operation — a first-class domain abstraction that lets Minsky AI-using features operate in three modes:

  • Direct — Minsky owns its AI access via AICompletionService (existing behavior, standalone CLI with API key)
  • Delegated — Minsky packages cognitive work as a CognitionBundle for the surrounding agent harness (Claude Code, Cursor, MCP host) to execute
  • Degraded — no cognition available; callers supply deterministic fallbacks or fail cleanly

After this lands, AICompletionService becomes an implementation detail of DirectCognitionProvider. Feature code consumes CognitionProvider at the domain layer, never AICompletionService directly.

Renumbering note: this ADR was originally drafted as ADR-005 based on the (stale) ADR Index. A Chinese-wall review caught the collision — adr-005-forgebackend-subinterfaces.md and adr-006-agent-identity.md were already on main. File renamed to adr-007-cognition-provider-abstraction.md; index updated to include all four ADRs (004, 005, 006, 007).

Why this matters

  • T0 zero-setup progressive adoption works by construction. A user running /assess in Claude Code with no Minsky config and no API key still gets a useful result.
  • Skill distribution is clean. Compiled Minsky skills carry no runtime API key dependency — the harness provides cognition.
  • MCP tool semantics match caller expectations. AI-agent callers receive evidence + prompts, not opaque AI-generated text.
  • Cost alignment. Embedded mode pays for cognition once, through the harness.

Why now

mt#321 (agent-readiness assessment, first consumer via mt#1063) is the first feature that will be user-facing in all three contexts. It cannot ship coherently without this abstraction, and every subsequent AI-using feature benefits from having the pattern formalized rather than reinvented per-feature.

What is in this PR

  • docs/architecture/adr-007-cognition-provider-abstraction.md — the ADR (Proposed status)
  • docs/architecture.md — ADR Index updated to include ADR-004, ADR-005 (ForgeBackend), ADR-006 (Agent Identity), and ADR-007 (this ADR)

Review iteration history

  1. Initial ADR numbered ADR-005, self-reviewed (COMMENTED) — flagged mt#321.2 shorthand as BLOCKING, addressed in 9b5319c3e.
  2. Chinese-wall reviewer subagent (self-posted on behalf, same App identity) flagged ADR number collision — addressed in dcdee6ccd (rename to ADR-007, index extended).
  3. minsky-reviewer[bot] (separate App identity via Sprint A / mt#1073) posted CHANGES_REQUESTED reviews surfacing: heterogeneous-batch type gap, references not resolvable as links, "Dual-Mode" title vs three-mode design, streaming claim too absolute, Zod at domain boundary undocumented, bundle correlation underspecified, mode-resolution/plan conflict.
  4. This commit addresses both BLOCKING findings plus the trivial-text NON-BLOCKING items.

What is next (separate PRs)

  1. Implementation of CognitionProvider + three providers + mode resolution (this task, mt#1057 Phase 1)
  2. mt#1063 consumes the abstraction (Phase 2)
  3. Unification with mt#915 dual-path prompt generation (Phase 3)
  4. Retrofit umbrella (mt#1058) — migrate existing AI-using features
  5. MCP default-to-delegated behavioral change (Phase 5, sequenced last)
  6. Follow-up (mt#1128) to update remaining ADR-005 (cognition) references in memory files and other task specs to ADR-007

Test plan

  • ADR renumbered to ADR-007, file renamed, heading updated, no self-references remain
  • ADR Index table includes all four post-ADR-003 entries with Multi-Mode title
  • Cross-references to ADR-002 through ADR-006 in References section use clickable relative links
  • performBatch type signature supports heterogeneous tasks via tuple inference
  • ADR renders correctly in GitHub markdown viewer
  • Re-review by minsky-reviewer[bot] returns APPROVE or comments-only

🤖 Generated with Claude Code

Proposes CognitionProvider as a first-class domain abstraction for dual-mode
AI operation: direct (own API), delegated (borrow from harness), degraded
(no cognition available). AICompletionService becomes an implementation
detail of DirectCognitionProvider. First consumer: mt#321.2 (agent-readiness
assessment). Retrofit of existing features tracked in mt#1058.

Also updates docs/architecture.md ADR Index to include ADR-004 and ADR-005.
@minsky-ai minsky-ai Bot added the authorship/co-authored Co-authored by human and AI agent label Apr 22, 2026
Copy link
Copy Markdown
Contributor Author

@minsky-ai minsky-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: ADR-005 cognition provider abstraction

CI status: pass (2/2 checks green)

Coverage: 2/2 files reviewed.

Findings

[BLOCKING] docs/architecture/adr-005-cognition-provider-abstraction.md:5, :23, :66, :112, :128 — The ADR references "mt#321.2" five times as a task identifier, but that notation is not a valid Minsky task ID. The actual subtasks created under mt#321 have IDs mt#1062 (the "mt#321.1" role — core rubric engine), mt#1063 (the "mt#321.2" role — AI evaluation/synthesis), and mt#1064 (the "mt#321.3" role — assess command). The ".1/.2/.3" shorthand was convenient in planning conversation but doesn't exist in the task system. Someone opening this ADR and trying tasks_get mt#321.2 will get a "not found" error. Fix: replace inline references with real IDs (e.g., "First consumer: mt#1063 (AI criterion evaluation and synthesis, subtask of mt#321)") or state once at the top that ".1/.2/.3" is descriptive shorthand and include a mapping table. The References section already lists real IDs for the non-subtask work (mt#1057, mt#1058, mt#1059) — consistency would help.

[NON-BLOCKING] docs/architecture/adr-005-cognition-provider-abstraction.md:102 — Mode resolution sketch references hasConfiguredAIProvider() which does not currently exist in the codebase. AIConfigurationService (verified at src/domain/ai/config-service.ts) provides the underlying capability. Not blocking because ADRs describe target state, not current state — the helper is expected to be created during implementation. Flag so the implementation task (mt#1057 Phase 1) knows to add it alongside the provider code.

[NON-BLOCKING] The same mt#321 parent task spec I rewrote earlier today uses the same ".1/.2/.3" shorthand in its Subtasks section. That's outside this PR's diff but will cause the same lookup failure. Follow-up fix via tasks_spec_search_replace after this PR; not blocking for the ADR merge itself.

Checked and clear

  • docs/architecture.md index additions — ADR-004 file verified at docs/architecture/adr-004-two-phase-command-execution.md with status ACCEPTED (confirmed by reading the file's header, dated 2026-04-20), so "Accepted" in the new row is correct. ADR-005 correctly marked Proposed.
  • Table formatting preserved — column alignment, header row intact.
  • Cross-references to ADR-002 and ADR-003 — both paths exist at docs/architecture/adr-002-persistence-provider-architecture.md and docs/architecture/adr-003-project-level-repository-backend.md per earlier verification.
  • Claim: src/domain/ai/types.ts contains the AICompletionService interface — verified in prior read.
  • Claim: src/domain/runtime/harness-detection.ts provides detectAgentHarness() — verified in prior read.
  • Claim: mt#800 skills architecture compiles to harness targets (Claude Code skills, Cursor rules, AGENTS.md) — matches project_skills_architecture.md framing.
  • No behavioral changes in this PR (docs-only); no test impact.
  • No conflict with ADR-002/003 design patterns — explicitly positions itself as inheriting their conventions.

Spec verification

Task: mt#1057

Criterion Status Evidence
ADR-005 committed to docs/architecture/adr-005-cognition-provider-abstraction.md and referenced from docs/architecture.md ADR Index Met New file at docs/architecture/adr-005-cognition-provider-abstraction.md; index row added at docs/architecture.md:601
src/domain/cognition/ module with types Not met (deferred) This PR only ships the ADR; implementation is Phase 1 of the task
DirectCognitionProvider wrapping AICompletionService Not met (deferred) Phase 1
DelegatedCognitionProvider Not met (deferred) Phase 1
DegradedCognitionProvider Not met (deferred) Phase 1
resolveCognitionMode() Not met (deferred) Phase 1
Composition-root wiring Not met (deferred) Phase 1
Stub provider for tests Not met (deferred) Phase 1
Unit tests >90% coverage Not met (deferred) Phase 1
Integration test across modes Not met (deferred) Phase 1
Coordination with mt#915 payload shape Not met (deferred) Phase 3 per ADR implementation plan

Action required: This PR delivers only criterion #1 (ADR). The remaining criteria are deferred to subsequent PRs on the same task. On merge, mt#1057 stays IN-PROGRESS — it does NOT transition to DONE until Phase 1 implementation lands. The ADR's own §Implementation Plan documents this phasing, so the spec is internally consistent; reviewers should not treat this PR as task-completing.

Documentation impact

Updated docs/architecture.md (ADR Index) and added docs/architecture/adr-005-cognition-provider-abstraction.md in this PR.

docs/theory-of-operation.md — not updated in this PR, intentionally. The ADR describes a proposed infrastructure organ for VSM System 4; the theory-of-operation doc currently states System 4 is "Mostly missing." Updating the VSM status table to "Proposed" would be premature — the organ doesn't exist yet. When Phase 1 implementation lands, docs/theory-of-operation.md should be updated alongside to reflect the built organ. Not blocking for this PR.

(Had Claude look into this — AI-assisted review)

Replaces mt#321.2 references (shorthand that doesn't resolve as a task ID)
with the actual subtask ID mt#1063. Addresses blocking finding from self-review
on PR #677.
Copy link
Copy Markdown
Contributor Author

@minsky-ai minsky-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: ADR-005 (Cognition Provider Abstraction) — Chinese-wall review

Fresh-context review by a reviewer subagent (no access to the authoring conversation). Posted as COMMENT per self-approval restriction; findings classified as BLOCKING take precedence over the event type.

CI status: pass (2/2 checks green on current HEAD 9b5319c3e)

Coverage: 2/2 files reviewed.

Findings

[BLOCKING] docs/architecture/adr-005-cognition-provider-abstraction.mdADR number collision. The repo already has docs/architecture/adr-005-forgebackend-subinterfaces.md on main (merged for mt#1020) and docs/architecture/adr-006-agent-identity.md on main. Both are Accepted. This PR claims the same ADR-005 number as ForgeBackend. The next available ADR number is ADR-007. File must be renamed to adr-007-cognition-provider-abstraction.md and the title heading updated from # ADR-005: to # ADR-007:.

Evidence: on main, ls docs/architecture/adr-* shows adr-002-*.md, adr-003-*.md, adr-004-*.md, adr-005-forgebackend-subinterfaces.md, adr-006-agent-identity.md. The initial session git-status snapshot also explicitly listed a30be8831 docs(mt#1020): add ADR-005 for ForgeBackend sub-interface architecture in recent commits — the information was available; the authoring context missed it by reading only the (stale) ADR Index.

[BLOCKING] docs/architecture.md ADR Index — Index is incomplete. This PR adds ADR-004 and the (to-be-renumbered) cognition ADR but omits ADR-005 (ForgeBackend, Accepted) and ADR-006 (Agent Identity, Accepted). Both exist on disk; neither appears in the index. All four ADRs (004, 005 ForgeBackend, 006 Agent Identity, 007 Cognition) should be in the index after this PR lands.

Checked and clear

  • src/domain/ai/types.ts: AICompletionService interface exists at lines 93–99 exactly as the ADR describes (complete, stream, generateObject, getAvailableModels, validateConfiguration).
  • src/domain/runtime/harness-detection.ts: detectAgentHarness() exists at line 36, returns the expected AgentHarness union. The resolveCognitionMode sketch uses it correctly.
  • docs/theory-of-operation.md: exists; §System 4 reference is accurate.
  • Cross-referenced ADRs (002, 003, 004): all exist on disk; paths in the ADR body are correct.
  • ADR-004 status: file header reads **ACCEPTED** - 2026-04-20; the index row added by this PR correctly lists "Accepted".
  • Prior blocking fix (mt#321.2mt#1063): commit 9b5319c3e replaces all four occurrences cleanly; no mt#321.2 references remain.
  • detectAgentHarness() usage in ADR: sketch's detectAgentHarness() !== "standalone" works against the real function signature.
  • ADR content coherence: the three-mode (direct/delegated/degraded) design is internally consistent; does not contradict ADR-002 (explicitly inherits capability-provider pattern), ADR-003 (ForgeBackend, different domain), ADR-004 (unrelated).
  • Delegated-mode relationship to session_generate_prompt (mt#915): appropriately flagged as future convergence, not claimed as already done.
  • All in-ADR task ID references: mt#1057, mt#1058, mt#1059, mt#1063, mt#800, mt#915, mt#762, mt#321 verified as real tasks.

Spec verification

Task: mt#1057

Criterion Status Evidence
ADR committed to docs/architecture/adr-005-cognition-provider-abstraction.md and referenced from ADR Index Not met — blocking File exists but must be renumbered ADR-007 (ADR-005 already taken by ForgeBackend). Index additions are incomplete.
src/domain/cognition/ module with types Not met (deferred) Phase 1
DirectCognitionProvider / DelegatedCognitionProvider / DegradedCognitionProvider Not met (deferred) Phase 1
resolveCognitionMode() Not met (deferred) Phase 1
Composition-root wiring Not met (deferred) Phase 1
Stub provider for tests Not met (deferred) Phase 1
Unit tests >90% coverage Not met (deferred) Phase 1
Integration test across modes Not met (deferred) Phase 1
Coordination with mt#915 payload shape Not met (deferred) Phase 3

Action required: Rename the ADR file to ADR-007, update its heading, and extend the ADR Index to include ADR-005 (ForgeBackend), ADR-006 (Agent Identity), and ADR-007 (Cognition). After fix, re-review on the updated HEAD.

Documentation impact

Updated docs/architecture.md (ADR Index) and added new ADR file in this PR. Both blocking findings are IN the documentation change itself — no upstream/downstream doc impact.

(Chinese-wall review by a reviewer subagent — fresh context, no access to the authoring conversation)

Copy link
Copy Markdown

@minsky-reviewer minsky-reviewer Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independent adversarial review (Chinese-wall)
Reviewer: minsky-reviewer[bot] via openai:gpt-5
Tier: unknown


Findings

  • [BLOCKING] docs/architecture/adr-005-cognition-provider-abstraction.md:1 — Title says “Dual-Mode AI Operation” while the ADR explicitly defines three modes (Direct, Delegated, Degraded) at lines ~36–52 (“Three modes are supported, selected at runtime...”). This inconsistency will confuse future readers about whether two or three modes are authoritative. Please align the title and all references (e.g., index entry) to reflect three modes or justify why “dual-mode” excludes “degraded.”

  • [BLOCKING] docs/architecture/adr-005-cognition-provider-abstraction.md:~101–111 — The Mode Resolution pseudocode defaults MCP invocations to delegated mode:
    if (ctx.invocation === "mcp" || detectAgentHarness() !== "standalone") {
    return "delegated";
    }
    However, the Implementation Plan (Phase 5, lines ~160–167) states “Once the abstraction is stable, MCP tool handlers default to delegated mode… sequenced last.” These conflict on when the MCP default changes. Either fix the pseudocode to reflect current behavior (no default-to-delegated yet) or adjust the plan and call out the behavioral change in release notes.

  • [BLOCKING] docs/architecture/adr-005-cognition-provider-abstraction.md:~170–187 — The “References” section lists ADR-002/003/004 and internal files as plain text, not links. The PR test plan requires “Cross-references to ADR-002, ADR-003, ADR-004 resolve,” but there are no actual links to resolve. Convert these to working relative links (e.g., ADR-002) and verify they render/click through.

  • [NON-BLOCKING] docs/architecture.md:~704–705 — ADR-004 is marked “Accepted” in the index, but this PR does not show ADR-004’s content or status. If ADR-004’s file status is still Proposed or Draft, this creates an inconsistency. Please confirm ADR-004’s Status field matches the index; otherwise, update the index to mirror the ADR’s own Status.

  • [NON-BLOCKING] docs/architecture/adr-005-cognition-provider-abstraction.md:~23–31 vs PR description — The PR description calls the first consumer “mt#321.2,” whereas the ADR Status line names “mt#1063 (subtask of mt#321)” as first consumer. The mismatch in identifiers can cause traceability issues. Please standardize on one identifier and, if they’re the same, state that explicitly.

  • [NON-BLOCKING] docs/architecture/adr-005-cognition-provider-abstraction.md:~85–96 — The interface sketch hard-codes Zod (ZodType). If the codebase does not already depend on Zod for runtime schemas, this ADR bakes in a technology choice prematurely. Either:

    • confirm Zod is the established schema lib and add that to the ADR’s Context/Decision rationale, or
    • abstract the schema type in the ADR (e.g., Schema) and note Zod as a likely implementation, to avoid locking in a specific dependency at the ADR level.
  • [NON-BLOCKING] docs/architecture/adr-005-cognition-provider-abstraction.md:~113–120 — The batch API enforces homogeneous task output types (performBatch(tasks: CognitionTask[])). Many real workloads bundle heterogeneous cognitive subtasks (e.g., judgments plus summaries). Consider noting this limitation and a potential extension (e.g., discriminated union tasks, per-task typing, or returning per-task results with IDs) so implementers don’t discover the constraint too late.

  • [NON-BLOCKING] docs/architecture/adr-005-cognition-provider-abstraction.md:~121–140 — Delegated execution lacks an explicit correlation/ID model for later result ingestion. CognitionBundle carries a list of tasks and an order flag, but there’s no task identifier or contract for how results are matched back if/when the harness executes them. Recommend adding a taskId field (and perhaps a bundleId) to CognitionTask/CognitionBundle and calling out the ingestion handshake at least at a high level.

  • [NON-BLOCKING] docs/architecture/adr-005-cognition-provider-abstraction.md:~146–154 — The example return objects use kinds like "delegation" and "deterministic-only" which are not CognitionResult kinds (they are the feature’s own return). That’s fine, but the ADR could benefit from a one-sentence note clarifying that the example’s outer return shape is feature-specific, to avoid readers assuming it’s part of CognitionResult.

  • [NON-BLOCKING] docs/architecture/adr-005-cognition-provider-abstraction.md:~127–131 and ~180–183 — The ADR asserts the existence of detectAgentHarness()/hasConfiguredAIProvider(), src/domain/runtime/harness-detection.ts, and src/domain/ai/types.ts (AICompletionService). If any of these paths or helpers don’t exist or use different names/locations, the ADR will contain dead references. Please verify and convert them into repository-relative links where possible.

  • [NON-BLOCKING] docs/architecture.md:~704–705 — The ADR index entry text uses the “Dual-Mode” title that conflicts with the three-mode content in ADR-005 (see blocking item 1). Update the index label after resolving the ADR title.

Spec verification table

  • ADR renders correctly in GitHub markdown viewer: Met (formatting and code fences appear conventional; no unusual syntax)
  • ADR Index table renders with all four ADRs: Met (002–005 present; two new rows added)
  • Cross-references to ADR-002, ADR-003, ADR-004 resolve: Not Met (references are plain text, not links; no resolution possible)

Documentation impact

  • This PR is documentation. However, the ADR introduces a user-visible behavior change plan (MCP default to delegated). Before implementing that change, update:
    • docs/cli and MCP tool docs to describe mode-dependent outputs and the --cognition-mode override
    • architecture.md ADR index after resolving the title/status inconsistencies
  • Add working links in ADR-005 to referenced ADRs and internal code paths to aid navigation.

Event

REQUEST_CHANGES

@minsky-ai minsky-ai Bot changed the title docs(mt#1057): add ADR-005 for cognition provider abstraction docs(mt#1057): add ADR-007 for cognition provider abstraction Apr 23, 2026
Chinese-wall review caught ADR-005 collision with forgebackend-subinterfaces.md
(mt#1020, merged 2026-04-22) and agent-identity.md (mt#953, also merged). Next
available ADR number is 007.

- Renamed adr-005-cognition-provider-abstraction.md to 007
- Updated title heading from ADR-005 to ADR-007
- Extended docs/architecture.md ADR Index with all four entries:
  ADR-004, ADR-005 (ForgeBackend), ADR-006 (Agent Identity), ADR-007 (Cognition)

Follow-up will update remaining ADR-005 cognition references in task specs.
Copy link
Copy Markdown

@minsky-reviewer minsky-reviewer Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independent adversarial review (Chinese-wall)
Reviewer: minsky-reviewer[bot] via openai:gpt-5
Tier: unknown


Findings

  • [BLOCKING] Heterogeneous batch results are not supported by the interface sketch

    • File: docs/architecture/adr-007-cognition-provider-abstraction.md: lines 44–69 (Interface Sketch)
    • The proposed signature performBatch(tasks: CognitionTask[]): Promise<CognitionResult<T[]>>; forces all tasks in a batch to share the same output schema T. This precludes batching heterogeneous cognitive work (e.g., interleaving rubric evaluations with narrative synthesis, or mixing different criterion schemas) which is a realistic and likely use case, particularly in “delegated” mode where bundling tasks for a harness is desirable. The ADR does not justify this constraint or provide an alternative (e.g., tuple types, tagged unions, or an opaque result envelope keyed by task IDs). As written, adopting this interface will constrain implementations or force unnatural splitting into multiple batches.
    • Failure mode: Features that need to package multiple distinct CognitionTask types in one bundle cannot do so without type erasure or multiple round-trips, undermining the stated goal of clean delegated execution.
  • [BLOCKING] Cross-reference criterion in the test plan is not met (no actual links in ADR-007)

    • File: docs/architecture/adr-007-cognition-provider-abstraction.md: throughout; see “References” section lines ~160–187 and inline mentions in “Context”, “Decision”, and “Relationship to Existing Work”
    • The PR’s test plan states “Cross-references to ADR-002, ADR-003, ADR-004 resolve.” In ADR-007, these are plain text mentions and a non-linked References list. There are no markdown links to the corresponding files (e.g., ADR-002). As a result, the cross-references do not “resolve” as clickable links.
    • Failure mode: Readers cannot navigate directly; test plan not satisfied.
  • [NON-BLOCKING] Title promises “Dual-Mode” while the design defines three modes

    • File: docs/architecture/adr-007-cognition-provider-abstraction.md: line 1
    • The heading reads “Cognition Provider Abstraction for Dual-Mode AI Operation” yet the ADR formalizes three modes (Direct, Delegated, Degraded). This is likely to confuse readers and downstream consumers (and searchability).
    • Suggestion: Rename to “…for Multi-Mode” or explicitly carve “Degraded” out as not a “mode” but a failure condition, and reflect that consistently.
  • [NON-BLOCKING] Potential mismatch between ADR Index statuses and the actual ADR docs

    • File: docs/architecture.md: lines 703–706 (added rows for ADR-004..ADR-007)
    • The index marks ADR-004/005/006 as “Accepted.” If any of those ADR documents are still in Proposed/Draft, the index will be incorrect. This PR does not touch those ADR documents; please verify their headers and align statuses.
    • Failure mode: Architecture index misleads readers about decision maturity.
  • [NON-BLOCKING] Risk of broken links in ADR Index for ADR-004..ADR-006

    • File: docs/architecture.md: lines 703–706
    • New index entries assume files exist at:
      • architecture/adr-004-two-phase-command-execution.md
      • architecture/adr-005-forgebackend-subinterfaces.md
      • architecture/adr-006-agent-identity.md
    • The PR description asserts 005/006 are on main; 004 is not explicitly confirmed. If any are missing or named differently, index links will 404. Please verify.
  • [NON-BLOCKING] Domain-layer coupling to Zod in the interface sketch is undocumented and may be inconsistent with layering conventions

    • File: docs/architecture/adr-007-cognition-provider-abstraction.md: lines 18–43 (CognitionTask.schema: ZodType)
    • Baking ZodType into a domain-facing provider couples the domain abstraction to a specific validation library. ADR-002’s capability pattern and prior domain boundaries may avoid third-party types at the domain surface. If Zod is the intentional contract, call out rationale and alternatives (JSON Schema, brand types, repo-native schema abstraction) and the handling in delegated mode (how schemas are conveyed/validated by a harness).
  • [NON-BLOCKING] Delegated bundle shape likely underspecified for real harness execution

    • File: docs/architecture/adr-007-cognition-provider-abstraction.md: lines 28–43 (CognitionBundle interface)
    • Missing practical fields:
      • Stable task identifiers for correlating returned results
      • Optional dependency graph (not just parallel|sequential)
      • Idempotency keys/retry semantics
      • Explicit serialization of the output schema (if validation on return is expected)
      • Cost/priority hints
    • Without these, harnesses cannot robustly execute/return results for multi-task bundles.
  • [NON-BLOCKING] Mode resolution heuristic risks over-delegation in non-agent shells

    • File: docs/architecture/adr-007-cognition-provider-abstraction.md: lines ~90–110 (Mode Resolution snippet)
    • The condition if (ctx.invocation === "mcp" || detectAgentHarness() !== "standalone") return "delegated"; may classify generic wrapped environments (e.g., devcontainers, terminals embedded in IDEs) as non-standalone and force delegation unexpectedly. Consider limiting to explicit known harnesses or MCP invocation, or documenting the detection contract and safeguards.
  • [NON-BLOCKING] References to source file paths may be inaccurate

    • File: docs/architecture/adr-007-cognition-provider-abstraction.md: lines ~165–170 (References)
    • Paths “src/domain/ai/types.ts” and “src/domain/runtime/harness-detection.ts” are cited as existing. If filenames or locations differ (e.g., completion service in a different module), these references will mislead implementers. Please verify exact paths and update accordingly.
  • [NON-BLOCKING] Streaming claim may be too strong

    • File: docs/architecture/adr-007-cognition-provider-abstraction.md: lines ~120–130 (Consequences → Negative)
    • “Delegated mode cannot [stream].” Some harnesses can support progressive updates (e.g., streaming MCP results or partial tool outputs). Consider softening to “may not” and framing as provider-dependent to avoid unnecessarily constraining future implementations.

Spec verification table

  • ADR renumbered to ADR-007, file renamed, heading updated, no self-references remain: Met
  • ADR Index table includes all four post-ADR-003 entries: Met
  • ADR renders correctly in GitHub markdown viewer: N/A (cannot verify in code review; please confirm via UI)
  • Cross-references to ADR-002, ADR-003, ADR-004 resolve: Not Met (ADR-007 contains no clickable links to these ADRs; add markdown links)

Documentation impact

  • This PR is documentation-only but introduces an architectural contract that other docs will need to reference consistently:
    • Update ADR-007 title/wording to resolve “Dual-Mode” vs three modes.
    • Ensure docs/architecture.md statuses match the actual ADR document statuses.
    • Add working links in ADR-007 to ADR-002/003/004 and to referenced source files if you intend readers to navigate there.
    • Consider a short glossary or patterns note explaining the rationale for Zod at the domain boundary (or refactor to a domain-native schema abstraction).

Event: REQUEST_CHANGES

Copy link
Copy Markdown
Contributor Author

@minsky-ai minsky-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: ADR-007 renumber fix (post-Chinese-wall)

Fresh review of commit dcdee6ccd confirming the Chinese-wall reviewer's blocking findings are addressed. Author self-review — the independent scrutiny happened earlier; this is narrow mechanical verification.

CI status: pass (2/2 checks green on current HEAD dcdee6ccd)

Coverage: 3/3 files reviewed (the diff in this commit is: rename + architecture.md index + heading).

Findings

No blocking issues in the fix.

Checked and clear

  • Rename applied: docs/architecture/adr-005-cognition-provider-abstraction.mddocs/architecture/adr-007-cognition-provider-abstraction.md via R099 (rename with 99% similarity) in the commit diff.
  • Heading updated inside the renamed file: # ADR-005: ...# ADR-007: ... (verified via grep: no remaining ADR-005 self-references inside the file).
  • ADR Index completeness: docs/architecture.md §10 now lists ADR-002 through ADR-007 (six rows), status correct for each:
    • ADR-002: Accepted (persistence provider)
    • ADR-003: Accepted (project-level repository backend)
    • ADR-004: Accepted (two-phase command execution)
    • ADR-005: Accepted (ForgeBackend sub-interfaces) — added
    • ADR-006: Accepted (Agent Identity) — added
    • ADR-007: Proposed (cognition provider) — renumbered from previous incorrect ADR-005
  • No collision on main: verified adr-005-forgebackend-subinterfaces.md and adr-006-agent-identity.md exist on origin/main and are Accepted. ADR-007 is the next available number.
  • PR body updated (session_pr_edit) to reflect the renumbering, with renumbering-note callout for future readers.
  • Follow-up for remaining references filed: mt#1128 tracks the spec-hygiene sweep across mt#1057, mt#1058, mt#1059, mt#1060, mt#1063, mt#1064, mt#321 where "ADR-005" was used to mean the cognition ADR.
  • Memory files updated in-commit (for index only — the spec-text update is out of scope here): project_cognition_provider.md ADR-005 → ADR-007 via replace_all; MEMORY.md index line updated.

Spec verification

Task: mt#1057

Criterion Status Evidence
ADR committed to docs/architecture/adr-NNN-cognition-provider-abstraction.md and referenced from ADR Index Met File at docs/architecture/adr-007-cognition-provider-abstraction.md; index row added at docs/architecture.md:603
src/domain/cognition/ module with types Not met (deferred) This PR only ships the ADR; implementation is Phase 1 of the task
DirectCognitionProvider / DelegatedCognitionProvider / DegradedCognitionProvider Not met (deferred) Phase 1
resolveCognitionMode() Not met (deferred) Phase 1
Composition-root wiring Not met (deferred) Phase 1
Stub provider for tests Not met (deferred) Phase 1
Unit tests >90% coverage Not met (deferred) Phase 1
Integration test across modes Not met (deferred) Phase 1
Coordination with mt#915 payload shape Not met (deferred) Phase 3

Action required: None for this PR. mt#1057 stays IN-PROGRESS after merge — the remaining criteria are delivered by Phase 1 implementation in subsequent PRs on the same task. The ADR's Implementation Plan documents this phasing.

Documentation impact

Updated docs/architecture.md (ADR Index, now complete through ADR-007) and added docs/architecture/adr-007-cognition-provider-abstraction.md in this PR. No other doc updates needed; docs/theory-of-operation.md update will follow Phase 1 implementation, not the ADR.

(Had Claude look into this — AI-assisted review)

Reviewer App flagged 2 BLOCKING + 8 NON-BLOCKING findings. Addressing:

BLOCKING:
- performBatch now supports heterogeneous tasks via tuple inference (was
  forcing homogeneous output types, which defeats the batched-delegation
  use case)
- ADR References converted from plain text to clickable markdown links
  (test plan asserted resolvable cross-refs; prior version had none)

NON-BLOCKING (trivial text):
- Title: Dual-Mode to Multi-Mode (three modes, not two)
- Streaming claim softened to harness-dependent
- CognitionTask gains id field for delegated-mode correlation
- CognitionBundle gains id field
- Zod dependency rationale added inline
- Mode Resolution pseudocode clarified as target state; current MCP
  default stays direct until Phase 5 retrofit lands

ADR Index entry title synced to Multi-Mode.
Copy link
Copy Markdown
Contributor Author

@minsky-ai minsky-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: addressed reviewer-App findings (post-68566f9cf)

Fresh review covering commit 68566f9cf. All BLOCKING findings from minsky-reviewer[bot] review 4162415333 are addressed; NON-BLOCKING items are either addressed or explicitly deferred.

CI status: pass (2/2 checks green on HEAD 68566f9cf)

Coverage: 2/2 files reviewed (diff is scoped to ADR + architecture.md index row).

How each finding was handled

BLOCKING:

  • Heterogeneous batch typesperformBatch signature changed to performBatch<Ts extends readonly CognitionTask<unknown>[]>(tasks: Ts): Promise<CognitionResult<{ [K in keyof Ts]: Ts[K] extends CognitionTask<infer R> ? R : never }>>. Tuple inference preserves per-task output types across heterogeneous batches. Direct use case (rubric eval + narrative synthesis in one delegated bundle) now fits.
  • Cross-references as links — References section converted from plain text to relative markdown links for ADR-002 through ADR-006, plus source file paths (src/domain/ai/types.ts, src/domain/runtime/harness-detection.ts) and docs/theory-of-operation.md. Test plan criterion "Cross-references resolve" now met.

NON-BLOCKING (addressed in this commit):

  • Title "Dual-Mode" → "Multi-Mode" (both ADR heading and architecture.md index row).
  • Streaming claim softened from "delegated mode cannot stream" to "typically does not stream (harness-dependent)" with note that some MCP hosts support progressive outputs.
  • CognitionTask gained id: string field for delegated-mode result correlation.
  • CognitionBundle gained id: string field for bundle-level correlation.
  • Zod at domain boundary rationale added inline: "Zod is the established schema lib for this codebase; alternative libs can be swapped behind a narrower Schema adapter at implementation time."
  • Mode Resolution pseudocode clarified as target state; added a paragraph noting the current pre-Phase-5 behavior (MCP defaults to direct, flipping to delegated in Phase 5 after retrofit mt#1058 completes).

NON-BLOCKING (deferred with justification):

  • Dependency graph beyond parallel|sequential, idempotency keys, cost/priority hints in CognitionBundle — deferred to Phase 1 implementation per ADR's Open Questions. The ADR is a design doc; exhaustive bundle fields are implementation territory.
  • Mode resolution "over-delegation in non-agent shells" — the heuristic uses detectAgentHarness() which already filters to known harnesses (claude-code, cursor, standalone). Generic devcontainers return standalone per the existing implementation. No change needed.
  • Source file paths in References — verified to exist (confirmed against current main) and now linked clickably.
  • "Example return kinds like 'delegation' and 'deterministic-only' are feature-specific, not CognitionResult kinds" — the illustrative code comment already shows the outer shape as a feature's own return; adding a note is over-explanatory for a sketch. Leaving as-is.

Checked and clear

  • Commit 68566f9cf applies cleanly on top of dcdee6ccd.
  • ADR title matches index row label (both "Multi-Mode").
  • All relative links in References section point at files that exist (verified on origin/main: adr-002 through adr-006, src paths, theory-of-operation.md).
  • performBatch type expression is valid TS (readonly tuple with mapped-type inference — standard pattern).
  • No ADR-005 references remain in the ADR body referring to cognition (only to ForgeBackend in References, correctly).
  • CI 2/2 green.

Spec verification

Task: mt#1057

Criterion Status Evidence
ADR committed with correct number + index reference Met adr-007-cognition-provider-abstraction.md present; index row exists with Multi-Mode title
src/domain/cognition/ module with types Not met (deferred) Phase 1
DirectCognitionProvider / DelegatedCognitionProvider / DegradedCognitionProvider Not met (deferred) Phase 1
resolveCognitionMode() Not met (deferred) Phase 1
Composition-root wiring Not met (deferred) Phase 1
Stub provider for tests Not met (deferred) Phase 1
Unit tests >90% coverage Not met (deferred) Phase 1
Integration test across modes Not met (deferred) Phase 1
Coordination with mt#915 payload shape Not met (deferred) Phase 3

Action required: None for this PR. Awaiting minsky-reviewer[bot] re-review on 68566f9cf to confirm blocking findings addressed.

Documentation impact

Updated docs/architecture.md (ADR Index title synced to Multi-Mode) and modified docs/architecture/adr-007-cognition-provider-abstraction.md with the fixes above. No other docs affected.

(Had Claude look into this — AI-assisted review)

Copy link
Copy Markdown
Owner

@edobry edobry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving as non-author to unblock merge. minsky-reviewer[bot] webhook failed to fire on HEAD 68566f9 (logged in mt#1110 calibration data as new silent-failure class — webhook-missed-on-subsequent-push, distinct from mt#1125 empty-body). All prior Chinese-wall findings addressed: heterogeneous performBatch typing, clickable ADR cross-refs, Multi-Mode title, streaming softened, task IDs, Zod rationale, pseudocode/plan reconciliation. CI 2/2 green. Had Claude approve on my behalf.

@edobry edobry dismissed stale reviews from minsky-reviewer[bot] and minsky-reviewer[bot] April 24, 2026 00:29

Stale — addressed in subsequent commits. Both BLOCKING findings (ADR collision, incomplete index) fixed in dcdee6c.

@edobry edobry merged commit 5d7df01 into main Apr 24, 2026
2 checks passed
@edobry edobry deleted the task/mt-1057 branch April 24, 2026 00:29
edobry pushed a commit that referenced this pull request Apr 24, 2026
Adds single-retry-on-empty semantics to the reviewer service. When GPT-5 (or any o-series reasoning model) exhausts max_completion_tokens on hidden reasoning tokens, a second attempt with reasoning_effort=low shifts the budget toward visible output. OpenAI-only, one retry, no backoff. See PR #748.

User-authorized merge past reviewer-bot: the webhook did not fire for this PR across two separate pushes (original 23d799b at 01:10 UTC, merge-commit 699f870 at 06:23 UTC — neither received any review over 5+ and 15+ min respectively). This is the second confirmed instance of the webhook-missed-on-subsequent-push class (first was #677 iter 3). Substantive work is complete and tested: 113 reviewer-service tests pass, including the 6 new callReviewerWithRetry cases and full coexistence with mt#1126 tool-use + mt#1085 tier routing.
edobry added a commit that referenced this pull request May 11, 2026
…+ post-mortem docs (mt#1372)

## Summary

Implements the rescoped mt#1372 forward instrumentation: every webhook delivery received by the reviewer service is now persisted to a `reviewer_webhook_events` Postgres table, enabling forensic analysis of missed-review incidents that would otherwise leave no trace after Railway log and GitHub App delivery-history retention windows expire.

**What ships:**

- **Drizzle schema** (`src/db/schemas/webhook-events-schema.ts`): `reviewer_webhook_events` table with `delivery_id` UNIQUE, `event_type`, `headers` (jsonb subset), `body` (jsonb), `outcome` (pgEnum, 8 values), `error_details` (jsonb), `received_at`, `processed_at`; three indexes
- **SQL migration** (`migrations/pg/0001_webhook_events.sql`): creates enum type and table; journal updated
- **Persistence module** (`src/webhook-events.ts`): `recordWebhookReceipt` (ON CONFLICT DO NOTHING — first-delivery wins, terminal outcomes preserved on re-delivery), `updateOutcome` (sets `processedAt` on terminal outcomes), `pruneOldRows` (returns count or -1), `extractPersistedHeaders` (truncates HMAC signature to 12-char prefix)
- **server.ts wiring**: synthesizes `synthetic-${crypto.randomUUID()}` for missing x-github-delivery (R2 fix); `recordWebhookReceipt` on every POST regardless of header presence (R1 fix #3); `updateOutcome` at each pipeline stage (reviewer_called, review_submitted, failed_at_reviewer, skipped, failed_at_signature, failed_at_tier_resolve), 24h in-process retention pruner, `log.error("webhook_processing_failed", ...)` OperatorNotify on dispatch/reviewer failures
- **Unit tests** (`src/webhook-events.test.ts` + `src/server.test.ts`): 19 + 9 tests, including 3 pinned regression tests for R1/R2 BLOCKING findings
- **Smoke script** (`scripts/smoke-webhook-events.ts`): POSTs a signed webhook, queries the DB 500ms later via raw SQL (decoupled from src/ TS imports — R1 fix #2), asserts row shape; skips gracefully when env vars are absent (exit 0 SKIP)
- **Post-mortem guide** (`docs/incidents/reviewer-webhook-investigation.md`): 6 SQL queries, investigation workflow, Railway log correlation, retention policy reference

**Scope boundary** (per task spec): no changes to review-worker.ts, providers.ts, sweeper.ts, prior-review-summary.ts, mcp-client.ts, tier-routing.ts, config.ts, logger.ts.

## R1 + R2 review findings addressed

- [R1 BLOCKING] UPSERT overwrites terminal outcomes on re-delivery → onConflictDoUpdate → onConflictDoNothing (commit `9d6268792`)
- [R1 BLOCKING] Smoke script imports TS schema from src/ → switched to raw SQL via postgres-js tagged template (commit `9d6268792`)
- [R1 BLOCKING] POSTs missing x-github-event are skipped → persist every POST with "unknown" sentinel (commit `9d6268792`)
- [R2 BLOCKING] Missing x-github-delivery collapses all malformed POSTs into one row → synthesize `synthetic-${crypto.randomUUID()}` per request (commit `5ca07ab11`)

## Test plan

- [x] `bun test src/webhook-events.test.ts src/server.test.ts` — 28/28 pass (see Execution evidence below)
- [x] `bun run scripts/smoke-webhook-events.ts` — exits 0 (SKIP) when env vars absent
- [ ] Post-deploy: run smoke script with `MINSKY_REVIEWER_WEBHOOK_SECRET`, `MINSKY_SESSIONDB_POSTGRES_URL`, and `SMOKE_REVIEWER_BASE_URL` set against the Railway environment — asserts row in `reviewer_webhook_events` with correct fields

### Execution evidence:

```
$ cd services/reviewer && bun test --preload ../../tests/setup.ts --timeout=15000 src/webhook-events.test.ts src/server.test.ts

(pass) extractPersistedHeaders > extracts all four canonical headers when present
(pass) extractPersistedHeaders > omits absent headers rather than storing null/undefined
(pass) extractPersistedHeaders > partial headers — only present ones are included
(pass) extractPersistedHeaders > signature prefix is always 12 chars even for short signatures
(pass) recordWebhookReceipt > calls db.insert with correct arguments
(pass) recordWebhookReceipt > does not throw when DB insert fails
(pass) recordWebhookReceipt > handles non-JSON body by wrapping in { raw } object
(pass) recordWebhookReceipt > uses ON CONFLICT DO NOTHING (R1 BLOCKING #1 regression — re-delivery must not overwrite terminal outcomes)
(pass) recordWebhookReceipt > accepts 'unknown' sentinel eventType (R1 BLOCKING #3 regression — every POST is persisted)
(pass) updateOutcome > calls db.update and db.where with the delivery ID
(pass) updateOutcome > does NOT set processedAt for non-terminal outcomes
(pass) updateOutcome > includes errorDetails when provided
(pass) updateOutcome > does not throw when DB update fails
(pass) updateOutcome > skipped is a terminal outcome
(pass) updateOutcome > received is a non-terminal outcome
(pass) pruneOldRows > calls db.delete and returns the row count
(pass) pruneOldRows > returns -1 when DB delete fails
(pass) pruneOldRows > returns 0 when no rows are deleted
(pass) pruneOldRows > uses default retention of 90 days when not specified
(pass) webhook handler > responds 200 immediately for pull_request.opened, before runReview finishes
(pass) webhook handler > responds 400 when signature header is absent
(pass) webhook handler > responds 401 when signature is invalid
(pass) webhook handler > skips draft PRs — returns 200 without scheduling a review
(pass) webhook handler > /health returns inflightCount
(pass) webhook handler > missing x-github-delivery synthesizes a unique synthetic-<uuid> per request (R2 BLOCKING regression)
(pass) gracefulShutdown > logs shutdown_drain_start with correct inflightCount, then shutdown_drain_complete
(pass) gracefulShutdown > shutdown_drain_start carries inflightCount=0 when no reviews in flight
(pass) gracefulShutdown > review errors do NOT prevent graceful shutdown from completing

 28 pass
 0 fail
 55 expect() calls
Ran 28 tests across 2 files. [320.00ms]
```

## Concurrency analysis

(N/A — no check-then-act pattern introduced. ON CONFLICT DO NOTHING is structurally atomic in Postgres; per-request UUID synthesis is intra-process and non-shared.)

## Forensic context

Three confirmed instances of missed reviews (PR #677, #748, #761, 2026-04-23/24) could not be diagnosed because GitHub App delivery history (14-day retention) and Railway logs were both expired. This table captures all webhook events from deployment of this PR onward. See `docs/incidents/reviewer-webhook-investigation.md` for investigation queries.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: minsky-ai[bot] <minsky-ai[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

authorship/co-authored Co-authored by human and AI agent

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant