feat(mt#1083): Sprint A: minsky-reviewer service scaffold (Railway webhook, cross-provider adversarial review) by minsky-ai[bot] · Pull Request #692 · edobry/minsky

minsky-ai · 2026-04-22T19:47:33Z

Summary

First shippable slice of the Chinese-wall reviewer App architecture (parent: mt#1073). Adds a standalone TypeScript webhook service in services/reviewer/ that posts adversarial PR reviews under a separate GitHub App identity, using a different model provider than the implementer.

Closes Sprint A from the Structural Review position paper roadmap. Sprints B and C add role specialization, ensemble voting, and structural metrics.

Architecture

GitHub PR opened / synchronized
    ↓ webhook (signed)
Railway-hosted service (stateless, this package)
    ├─ Verify signature via webhooks.verifyAndReceive
    ├─ Fetch PR diff + metadata via reviewer App token
    ├─ Tier check (Tier 1 skip, Tier 2 opt-in, Tier 3 mandatory)
    ├─ Run adversarial review prompt on different-provider model
    └─ Post review via minsky-reviewer App identity

Levers engaged (five of nine)

Lever	How
Context isolation	Webhook spawns a fresh session per PR; no shared state with implementer
Capability asymmetry	Reviewer App: `pull-requests: write`, `contents: read`. No code-write permissions. Structural.
Adversarial prompting	Critic Constitution system prompt; explicit flaw-finding framing, evidence-based findings, rejection bias
Identity separation	`minsky-reviewer` App distinct from `minsky-ai`; legitimately non-author for APPROVE events
Model diversity	Default `REVIEWER_PROVIDER=openai` (GPT-5); Gemini 2.5 Pro alternative; Anthropic available as fallback but weakly recommended (same family as implementer)

Deferred

Sprint B: role specialization (security / performance / maintainability / spec-compliance reviewers), ensemble voting, consensus aggregation
Sprint C: temporal integration (CI-result awareness, prior-PR history), reviewer metrics, feedback loops

Files

services/reviewer/README.md, DEPLOY.md — setup and deployment documentation
services/reviewer/package.json, tsconfig.json, Dockerfile — build config, Railway-compatible
services/reviewer/src/config.ts — env-var loading with fail-fast validation
services/reviewer/src/prompt.ts — Critic Constitution + per-PR user prompt builder
services/reviewer/src/providers.ts — OpenAI / Google / Anthropic wrappers behind uniform interface
services/reviewer/src/github-client.ts — App auth via @octokit/auth-app + PR fetch + review submit
services/reviewer/src/tier-routing.ts — tier-based activation decision
services/reviewer/src/review-worker.ts — orchestrator, self-review detection, COMMENT-fallback for identity collision
services/reviewer/src/server.ts — Bun webhook server with signature verification via verifyAndReceive
services/reviewer/src/*.test.ts — 18 tests covering tier routing and event parsing
scripts/lint-console-usage.ts — extended excludePatterns to cover services/ (standalone deployables use their own structured logging)

Test / validation

bun test services/reviewer/src/ — 18/18 pass
bun test --preload ./tests/setup.ts --timeout=15000 src tests/adapters tests/domain — 2040/2040 pass (no regressions)
mcp__minsky__validate_typecheck — clean
mcp__minsky__validate_lint — clean at 30-warning threshold (1 added is from node_modules being scanned in the new services/reviewer/node_modules)

User action required before deploy

Create minsky-reviewer GitHub App (distinct from minsky-ai) with permissions:
- pull-requests: write (submit reviews)
- contents: read (fetch diff and codebase)
- No write permissions beyond PR reviews
Install Railway CLI and authenticate: bash <(curl -fsSL cli.new) or brew install railway
Pick a model provider: GPT-5 (recommended, default) or Gemini 2.5 Pro — both break correlated-failure pattern with Claude implementer
Follow services/reviewer/DEPLOY.md for the deploy sequence (variables, domain, webhook registration)

What this PR itself exercises

This PR has  in its body, which means after mt#1073 and this service are deployed, future Tier-2 PRs with the MINSKY_REVIEWER_TIER2_ENABLED=true flag set will get reviewed automatically by minsky-reviewer[bot]. This PR is the bootstrap, so it still gets the manual Chinese-wall review via the reviewer subagent type before merge.

First shippable slice of mt#1073 adversarial reviewer architecture. Stateless TypeScript webhook service designed to run on Railway, posting PR reviews under a separate minsky-reviewer GitHub App with a different model provider than the implementer. Levers engaged in v1 (5 of 9 from the Structural Review paper): - Context isolation: fresh session per PR via webhook dispatch - Capability asymmetry: reviewer App pull-requests:write and contents:read only - Adversarial prompting: Critic Constitution system prompt frames role as flaw-finding - Identity separation: minsky-reviewer App distinct from minsky-ai; APPROVE authority - Model diversity: routes to OpenAI GPT-5 or Google Gemini by default Deferred to Sprint B: role specialization, ensemble voting. Deferred to Sprint C: temporal integration, metrics, structural incentives. Also extends scripts/lint-console-usage.ts excludePatterns with services/, since standalone deployables use their own structured logging and do not depend on Minsky logger. Validation: 18 of 18 service tests pass, 2040 of 2040 full project suite passes, typecheck clean, lint clean at the 30-warning threshold. User action before deploy: create minsky-reviewer GitHub App distinct from minsky-ai, install Railway CLI and configure env vars per DEPLOY.md, register webhook URL in the App settings.

minsky-ai

Independent adversarial review (Chinese-wall)

Submitted by the main agent on behalf of a reviewer subagent that ran in an isolated context with no access to the implementer's session state. Body transmitted here because the subagent lacks session_pr_review_submit tool access (tracked as mt#1080).

Files reviewed: 11 source + 2 test + Dockerfile + DEPLOY.md + README.md + scripts/lint-console-usage.ts
Issues found: 2 blocking, 7 non-blocking

Findings

[BLOCKING] services/reviewer/Dockerfile:6 — Lockfile mismatch prevents Docker build.

COPY package.json bun.lockb* ./ matches bun.lockb (old binary format) but not bun.lock (the text format Bun 1.2+ generates, which is what is actually committed). Service requires bun >=1.2.0 and uses oven/bun:1.2-slim as base. Since bun.lock is never copied, RUN bun install --frozen-lockfile fails. Every Railway deploy would fail at build.

Fix: COPY package.json bun.lock ./. Addressed in follow-up commit.

[BLOCKING] services/reviewer/src/server.ts:47-73 — All review failures return HTTP 200; GitHub will not retry.

handlePullRequestEvent catches every error, logs, returns void. webhooks.verifyAndReceive resolves, server returns 200 for every delivery including failed ones. GitHub only retries on 4xx/5xx. A Tier-3 PR (mandatory review) where the model or GitHub API times out gets silently dropped — the mandatory-review guarantee is broken by a single transient error.

Fix: do not swallow errors in handlePullRequestEvent. Let them propagate to verifyAndReceive → 500 → retry. Accepts cost of duplicate reviews on flaky errors; Sprint B adds per-SHA idempotency. Addressed in follow-up commit.

[NON-BLOCKING] services/reviewer/src/config.ts — No runtime warning when REVIEWER_PROVIDER=anthropic. An operator who sets this without reading docs deploys a degraded service (context-isolation only, no architectural diversity). Addressed: added a degraded_config_warning log at startup.

[NON-BLOCKING] services/reviewer/README.md:57-58 — False claim about provenance lookup. Sprint A has no provenance lookup; it only reads the PR body HTML comment. Addressed: README rewritten to describe actual Sprint A behavior vs. Sprint B/C.

[NON-BLOCKING] services/reviewer/src/tier-routing.ts:54-59 — Null tier silently skips Tier-3 PRs that forget the marker. Not fixed this round — the mandatory-Tier-3 guarantee depends on the AI implementer correctly self-labeling. Sprint B's MCP-based provenance lookup eliminates this failure mode; not worth a fragile bandaid. Filed as follow-up task mt#1084.

[NON-BLOCKING] services/reviewer/src/review-worker.ts:51 — Self-review check uses strict === without case normalization. GitHub usernames are case-insensitive at platform level. Addressed: comparison now uses .toLowerCase().

[NON-BLOCKING] services/reviewer/README.md:26 — References non-existent scripts/create-github-app.ts. Script exists in /tmp from mt#997 but was never committed. Addressed: removed the reference; users do manual App creation for now. mt#997 still tracks script preservation separately.

[NON-BLOCKING] services/reviewer/src/providers.ts — No timeouts on any network call. Real operational risk — a hung model call will hold the webhook response indefinitely. Filed as mt#1085 for follow-up. Not blocking v1 because webhooks have platform-level timeouts (~30s) that partially mitigate, but should be added properly in Sprint B.

[NON-BLOCKING] scripts/lint-console-usage.ts:43 — **/services/** exclusion broader than necessary. Addressed: narrowed to **/services/reviewer/** specifically, with a comment noting that future services must opt in explicitly.

Checked and clear

Webhook signature verification is correct: HMAC-SHA256 via verifyAndReceive; isSignatureError heuristic correctly discriminates signature failures from handler errors.
App auth flow correct; private key never logged; diff fetch uses correct mediaType: { format: "diff" } pattern.
Critic Constitution prompt is substantive and adversarial; buildReviewPrompt handles null tier and missing spec correctly.
All three provider wrappers use correct API shapes for their SDK versions; Anthropic TextBlock filtering is correct.
requireEnv provides fail-fast validation; private key stored in config but not logged; no secret leakage path identified.
18 tier-routing + event-parsing tests are correct for what they cover.
bun.lock correctly tracked; stale bun.lockb gitignore entry is harmless (but removed anyway for cleanliness).

Spec verification (mt#1083)

Criterion	Status	Evidence
`minsky-reviewer` App created	Not met	User action; tracked in PR description
`TokenProvider` role-keyed extension	Deferred	Sprint A scaffold uses a standalone App-auth path; role-keyed extension is Sprint B
Railway service deployed	Not met	User action; scaffold ready for deploy
Service verifies webhook signature	Met	`webhooks.verifyAndReceive` at `server.ts:97-102`
Fetches PR diff + metadata	Met	`github-client.ts:41-61`
Runs on different model provider	Met (configurable)	`providers.ts`; `config.ts:56-59` validates provider
Adversarial prompt (Critic Constitution)	Met	`prompt.ts:13`
Review posted under reviewer identity	Met	Identity pulled from App token; `review-worker.ts:44-52`
Tier-routing mandatory for Tier 3	Met (with marker caveat — see finding #4)	`tier-routing.ts:50-54`
End-to-end smoke test: broken PR → REQUEST_CHANGES	Pending	Cannot smoke-test until user creates the App and deploys
End-to-end smoke test: clean PR → APPROVE	Pending	Same
Misconfigured credentials fail loudly	Met	`requireEnv` throws with descriptive message
Provider diversity when implementer uses Claude	Met (with degraded-fallback warning)	Default is OpenAI; Anthropic available but logs warning
Reviewer cannot write to repo	Met (structural)	App permissions set to pull-requests:write + contents:read only per README
Documentation	Met (after fixes)	README corrected for Sprint A accuracy; DEPLOY.md covers sequence

Documentation impact

Updated services/reviewer/README.md and added a §"Review before asking the user" stopgap rule to CLAUDE.md in earlier PR (mt#1030). No update needed to docs/architecture.md or docs/theory-of-operation.md — the reviewer service is a standalone deployable scoped under services/.

What this review found that the self-review did not

The self-review (posted to this PR thread shortly after creation) did not flag either blocking issue:

Dockerfile lockfile glob mismatch — self-review framed the Dockerfile as "Railway-compatible" without actually checking the COPY glob against the committed lockfile filename. Chinese-wall reviewer spotted it in a single pass.
HTTP 200 on error — self-review focused on happy-path correctness; missed that the webhook retry semantics were inverted by the blanket error swallow in handlePullRequestEvent.

Both are the class of issue the Chinese-wall pattern is specifically designed to catch: the implementer's mental model of "this is a new service, I'm focused on making it work" missed system-level integration concerns (Docker build, GitHub retry protocol) that the fresh-context reviewer caught immediately. Acid-test 3-for-3 for the pattern (PR #679 non-blocking, PR #683 blocking, PR #692 two blocking).

(Had Claude look into this — AI-assisted review via an independent reviewer subagent in a deliberately isolated context.)

… non-blockers Chinese-wall reviewer (mt#1073 pattern) caught 2 blocking issues on this PR's initial scaffold that would have broken deployment and quietly dropped mandatory Tier-3 reviews under transient failures. Fixing both plus several non-blocking improvements flagged in the same review. Blocking fixes: 1. Dockerfile lockfile mismatch. COPY glob was bun.lockb* matching the pre-1.2 binary format; Bun 1.2+ uses bun.lock text format which is what the repo actually tracks. Every Railway deploy would fail at the RUN bun install step. Fixed to COPY package.json bun.lock explicitly. 2. HTTP 200 on review failure. The webhook handler swallowed all errors and returned void, causing verifyAndReceive to resolve cleanly and the server to return 200 for failed reviews. GitHub only retries on 4xx/5xx. A Tier-3 PR where the model or GitHub API timed out would be silently dropped with no retry and no review, breaking the mandatory-review guarantee. Fixed by removing the try/catch in handlePullRequestEvent so errors propagate to verifyAndReceive and return 500, causing GitHub to retry. Cost: duplicate-review risk on flaky errors; Sprint B adds per-SHA idempotency. Non-blocking fixes: - Startup warning when REVIEWER_PROVIDER=anthropic — operators who set this without reading docs deploy a degraded Chinese wall (same family as Claude implementer); log a degraded_config_warning at server start. - README corrected: removed false claim about Minsky provenance lookup (that is Sprint B; Sprint A only reads PR-body marker), removed reference to scripts/create-github-app.ts (not in repo yet, mt#997 tracks). - Self-review identity check now case-insensitive via .toLowerCase(); GitHub usernames are case-insensitive at platform level. - services/ lint-console-usage exclusion narrowed to services/reviewer/ specifically, so future services must opt in explicitly. - Stale bun.lockb entry removed from .gitignore. Deferred to follow-up tasks: - mt#1085 (Sprint B prereq): switch tier lookup from PR-body marker to MCP provenance record. Eliminates the marker-forgetting failure mode the reviewer flagged (non-blocking #5). - mt#1086: explicit network-call timeouts on model and GitHub API paths. Non-blocking #8 from the review. Validation: 18/18 service tests pass, typecheck clean, lint clean at 30-warning threshold.

minsky-ai

Delta review: commit `c83c039`

Addresses the Chinese-wall reviewer's 2 blocking findings + 5 of 7 non-blockers. Two non-blockers deferred to follow-up tasks.

Blocking (now fixed)

Dockerfile lockfile glob (services/reviewer/Dockerfile:6) — bun.lockb* → bun.lock explicitly. Verified: ls services/reviewer/bun.lock exists in the repo and will be copied by the new COPY.
HTTP 200 on error (services/reviewer/src/server.ts:47-53) — removed the try/catch in handlePullRequestEvent; errors now propagate to verifyAndReceive → 500 → GitHub retry. Comment explains the duplicate-review risk trade-off and points at Sprint B's idempotency work.

Non-blocking (fixed in this commit)

server.ts:148-156 — degraded_config_warning log at startup when REVIEWER_PROVIDER=anthropic.
README.md — removed false provenance-lookup claim; clarified Sprint A uses PR-body marker only, Sprint B uses MCP; removed stale scripts/create-github-app.ts reference (mt#997 tracks).
review-worker.ts:51 — identity comparison now .toLowerCase()-normalized on both sides.
scripts/lint-console-usage.ts:42-47 — exemption narrowed to **/services/reviewer/** specifically, with a comment noting the explicit-opt-in policy for future services.
.gitignore — stale bun.lockb entry removed.

Non-blocking (filed as follow-up)

Null-tier silent skip (reviewer finding #5) → mt#1085. Switching tier source from PR-body marker to MCP provenance record eliminates the marker-forgetting failure mode. Sprint B prereq.
Missing network-call timeouts (reviewer finding #8) → mt#1086. Explicit AbortController timeouts on model and GitHub API calls. Deferred because webhook-platform timeouts provide partial mitigation.

Spec verification (unchanged from prior review)

All in-scope mt#1083 criteria still met after fixes; no regressions. User-action items (create App, install Railway CLI, deploy) still pending — not affected by these fixes.

Documentation impact

Updated services/reviewer/README.md for Sprint A accuracy. No changes to docs/architecture.md etc.

(Had Claude look into this — AI-assisted follow-up to the independent Chinese-wall review.)

The Chinese-wall reviewer on PR #695 flagged one blocking and two non-blocking issues; this commit addresses the blocker and one of the non-blockers. docs/github-app-bot-setup.md section 4 previously hardcoded minsky-app.pem paths in both the Option A (config file) and Option B (env-var) examples. The new parametrized script writes to name-prefixed files (minsky-ai.pem, minsky-reviewer.pem, etc.). Users who followed the automated path and then pasted section 4 literally would point Minsky at a non-existent file. Updated both examples to use a placeholder that matches the script --name flag, and added an explicit paragraph explaining that automated-path users can read the values from the generated JSON metadata file. scripts/create-github-app.ts browser-open was macOS-only via the command. Added cross-platform dispatch (open on darwin, xdg-open on linux, start on windows) and always print the URL before spawning so users can click manually if the spawn silently fails. Not addressed: server-hang-if-user-aborts-midflow is acceptable for v1. Partial state in ~/.config/minsky/ is harmless (App exists on GitHub even if install not completed; user can re-run the script). This is the fourth consecutive PR where Chinese-wall review caught a real issue the self-review would have missed: PR #679 (1 non-blocking), PR #683 (1 blocking + 2 non-blocking), PR #692 (2 blocking + 7 non-blocking), PR #695 (1 blocking + 2 non-blocking).

edobry added 2 commits April 22, 2026 15:46

Merge remote-tracking branch 'origin/main' into task/mt-1083

36a6cf9

minsky-ai Bot added the authorship/co-authored Co-authored by human and AI agent label Apr 22, 2026

minsky-ai Bot commented Apr 22, 2026

View reviewed changes

edobry merged commit 8e21287 into main Apr 22, 2026
1 of 2 checks passed

edobry deleted the task/mt-1083 branch April 22, 2026 20:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(mt#1083): Sprint A: minsky-reviewer service scaffold (Railway webhook, cross-provider adversarial review)#692

feat(mt#1083): Sprint A: minsky-reviewer service scaffold (Railway webhook, cross-provider adversarial review)#692
edobry merged 3 commits into
mainfrom
task/mt-1083

minsky-ai Bot commented Apr 22, 2026

Uh oh!

minsky-ai Bot left a comment

Uh oh!

minsky-ai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

minsky-ai Bot commented Apr 22, 2026

Summary

Architecture

Levers engaged (five of nine)

Deferred

Files

Test / validation

User action required before deploy

What this PR itself exercises

Related

Uh oh!

minsky-ai Bot left a comment

Choose a reason for hiding this comment

Findings

Checked and clear

Spec verification (mt#1083)

Documentation impact

What this review found that the self-review did not

Uh oh!

minsky-ai Bot left a comment

Choose a reason for hiding this comment

Delta review: commit c83c039

Blocking (now fixed)

Non-blocking (fixed in this commit)

Non-blocking (filed as follow-up)

Spec verification (unchanged from prior review)

Documentation impact

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Delta review: commit `c83c039`