feat(mt#1083): Sprint A: minsky-reviewer service scaffold (Railway webhook, cross-provider adversarial review)#692
Conversation
First shippable slice of mt#1073 adversarial reviewer architecture. Stateless TypeScript webhook service designed to run on Railway, posting PR reviews under a separate minsky-reviewer GitHub App with a different model provider than the implementer. Levers engaged in v1 (5 of 9 from the Structural Review paper): - Context isolation: fresh session per PR via webhook dispatch - Capability asymmetry: reviewer App pull-requests:write and contents:read only - Adversarial prompting: Critic Constitution system prompt frames role as flaw-finding - Identity separation: minsky-reviewer App distinct from minsky-ai; APPROVE authority - Model diversity: routes to OpenAI GPT-5 or Google Gemini by default Deferred to Sprint B: role specialization, ensemble voting. Deferred to Sprint C: temporal integration, metrics, structural incentives. Also extends scripts/lint-console-usage.ts excludePatterns with services/, since standalone deployables use their own structured logging and do not depend on Minsky logger. Validation: 18 of 18 service tests pass, 2040 of 2040 full project suite passes, typecheck clean, lint clean at the 30-warning threshold. User action before deploy: create minsky-reviewer GitHub App distinct from minsky-ai, install Railway CLI and configure env vars per DEPLOY.md, register webhook URL in the App settings.
There was a problem hiding this comment.
Independent adversarial review (Chinese-wall)
Submitted by the main agent on behalf of a reviewer subagent that ran in an isolated context with no access to the implementer's session state. Body transmitted here because the subagent lacks session_pr_review_submit tool access (tracked as mt#1080).
Files reviewed: 11 source + 2 test + Dockerfile + DEPLOY.md + README.md + scripts/lint-console-usage.ts
Issues found: 2 blocking, 7 non-blocking
Findings
[BLOCKING] services/reviewer/Dockerfile:6 — Lockfile mismatch prevents Docker build.
COPY package.json bun.lockb* ./ matches bun.lockb (old binary format) but not bun.lock (the text format Bun 1.2+ generates, which is what is actually committed). Service requires bun >=1.2.0 and uses oven/bun:1.2-slim as base. Since bun.lock is never copied, RUN bun install --frozen-lockfile fails. Every Railway deploy would fail at build.
Fix: COPY package.json bun.lock ./. Addressed in follow-up commit.
[BLOCKING] services/reviewer/src/server.ts:47-73 — All review failures return HTTP 200; GitHub will not retry.
handlePullRequestEvent catches every error, logs, returns void. webhooks.verifyAndReceive resolves, server returns 200 for every delivery including failed ones. GitHub only retries on 4xx/5xx. A Tier-3 PR (mandatory review) where the model or GitHub API times out gets silently dropped — the mandatory-review guarantee is broken by a single transient error.
Fix: do not swallow errors in handlePullRequestEvent. Let them propagate to verifyAndReceive → 500 → retry. Accepts cost of duplicate reviews on flaky errors; Sprint B adds per-SHA idempotency. Addressed in follow-up commit.
[NON-BLOCKING] services/reviewer/src/config.ts — No runtime warning when REVIEWER_PROVIDER=anthropic. An operator who sets this without reading docs deploys a degraded service (context-isolation only, no architectural diversity). Addressed: added a degraded_config_warning log at startup.
[NON-BLOCKING] services/reviewer/README.md:57-58 — False claim about provenance lookup. Sprint A has no provenance lookup; it only reads the PR body HTML comment. Addressed: README rewritten to describe actual Sprint A behavior vs. Sprint B/C.
[NON-BLOCKING] services/reviewer/src/tier-routing.ts:54-59 — Null tier silently skips Tier-3 PRs that forget the marker. Not fixed this round — the mandatory-Tier-3 guarantee depends on the AI implementer correctly self-labeling. Sprint B's MCP-based provenance lookup eliminates this failure mode; not worth a fragile bandaid. Filed as follow-up task mt#1084.
[NON-BLOCKING] services/reviewer/src/review-worker.ts:51 — Self-review check uses strict === without case normalization. GitHub usernames are case-insensitive at platform level. Addressed: comparison now uses .toLowerCase().
[NON-BLOCKING] services/reviewer/README.md:26 — References non-existent scripts/create-github-app.ts. Script exists in /tmp from mt#997 but was never committed. Addressed: removed the reference; users do manual App creation for now. mt#997 still tracks script preservation separately.
[NON-BLOCKING] services/reviewer/src/providers.ts — No timeouts on any network call. Real operational risk — a hung model call will hold the webhook response indefinitely. Filed as mt#1085 for follow-up. Not blocking v1 because webhooks have platform-level timeouts (~30s) that partially mitigate, but should be added properly in Sprint B.
[NON-BLOCKING] scripts/lint-console-usage.ts:43 — **/services/** exclusion broader than necessary. Addressed: narrowed to **/services/reviewer/** specifically, with a comment noting that future services must opt in explicitly.
Checked and clear
- Webhook signature verification is correct: HMAC-SHA256 via
verifyAndReceive;isSignatureErrorheuristic correctly discriminates signature failures from handler errors. - App auth flow correct; private key never logged; diff fetch uses correct
mediaType: { format: "diff" }pattern. - Critic Constitution prompt is substantive and adversarial;
buildReviewPrompthandles null tier and missing spec correctly. - All three provider wrappers use correct API shapes for their SDK versions; Anthropic TextBlock filtering is correct.
requireEnvprovides fail-fast validation; private key stored in config but not logged; no secret leakage path identified.- 18 tier-routing + event-parsing tests are correct for what they cover.
bun.lockcorrectly tracked; stalebun.lockbgitignore entry is harmless (but removed anyway for cleanliness).
Spec verification (mt#1083)
| Criterion | Status | Evidence |
|---|---|---|
minsky-reviewer App created |
Not met | User action; tracked in PR description |
TokenProvider role-keyed extension |
Deferred | Sprint A scaffold uses a standalone App-auth path; role-keyed extension is Sprint B |
| Railway service deployed | Not met | User action; scaffold ready for deploy |
| Service verifies webhook signature | Met | webhooks.verifyAndReceive at server.ts:97-102 |
| Fetches PR diff + metadata | Met | github-client.ts:41-61 |
| Runs on different model provider | Met (configurable) | providers.ts; config.ts:56-59 validates provider |
| Adversarial prompt (Critic Constitution) | Met | prompt.ts:13 |
| Review posted under reviewer identity | Met | Identity pulled from App token; review-worker.ts:44-52 |
| Tier-routing mandatory for Tier 3 | Met (with marker caveat — see finding #4) | tier-routing.ts:50-54 |
| End-to-end smoke test: broken PR → REQUEST_CHANGES | Pending | Cannot smoke-test until user creates the App and deploys |
| End-to-end smoke test: clean PR → APPROVE | Pending | Same |
| Misconfigured credentials fail loudly | Met | requireEnv throws with descriptive message |
| Provider diversity when implementer uses Claude | Met (with degraded-fallback warning) | Default is OpenAI; Anthropic available but logs warning |
| Reviewer cannot write to repo | Met (structural) | App permissions set to pull-requests:write + contents:read only per README |
| Documentation | Met (after fixes) | README corrected for Sprint A accuracy; DEPLOY.md covers sequence |
Documentation impact
Updated services/reviewer/README.md and added a §"Review before asking the user" stopgap rule to CLAUDE.md in earlier PR (mt#1030). No update needed to docs/architecture.md or docs/theory-of-operation.md — the reviewer service is a standalone deployable scoped under services/.
What this review found that the self-review did not
The self-review (posted to this PR thread shortly after creation) did not flag either blocking issue:
- Dockerfile lockfile glob mismatch — self-review framed the Dockerfile as "Railway-compatible" without actually checking the COPY glob against the committed lockfile filename. Chinese-wall reviewer spotted it in a single pass.
- HTTP 200 on error — self-review focused on happy-path correctness; missed that the webhook retry semantics were inverted by the blanket error swallow in
handlePullRequestEvent.
Both are the class of issue the Chinese-wall pattern is specifically designed to catch: the implementer's mental model of "this is a new service, I'm focused on making it work" missed system-level integration concerns (Docker build, GitHub retry protocol) that the fresh-context reviewer caught immediately. Acid-test 3-for-3 for the pattern (PR #679 non-blocking, PR #683 blocking, PR #692 two blocking).
(Had Claude look into this — AI-assisted review via an independent reviewer subagent in a deliberately isolated context.)
… non-blockers Chinese-wall reviewer (mt#1073 pattern) caught 2 blocking issues on this PR's initial scaffold that would have broken deployment and quietly dropped mandatory Tier-3 reviews under transient failures. Fixing both plus several non-blocking improvements flagged in the same review. Blocking fixes: 1. Dockerfile lockfile mismatch. COPY glob was bun.lockb* matching the pre-1.2 binary format; Bun 1.2+ uses bun.lock text format which is what the repo actually tracks. Every Railway deploy would fail at the RUN bun install step. Fixed to COPY package.json bun.lock explicitly. 2. HTTP 200 on review failure. The webhook handler swallowed all errors and returned void, causing verifyAndReceive to resolve cleanly and the server to return 200 for failed reviews. GitHub only retries on 4xx/5xx. A Tier-3 PR where the model or GitHub API timed out would be silently dropped with no retry and no review, breaking the mandatory-review guarantee. Fixed by removing the try/catch in handlePullRequestEvent so errors propagate to verifyAndReceive and return 500, causing GitHub to retry. Cost: duplicate-review risk on flaky errors; Sprint B adds per-SHA idempotency. Non-blocking fixes: - Startup warning when REVIEWER_PROVIDER=anthropic — operators who set this without reading docs deploy a degraded Chinese wall (same family as Claude implementer); log a degraded_config_warning at server start. - README corrected: removed false claim about Minsky provenance lookup (that is Sprint B; Sprint A only reads PR-body marker), removed reference to scripts/create-github-app.ts (not in repo yet, mt#997 tracks). - Self-review identity check now case-insensitive via .toLowerCase(); GitHub usernames are case-insensitive at platform level. - services/ lint-console-usage exclusion narrowed to services/reviewer/ specifically, so future services must opt in explicitly. - Stale bun.lockb entry removed from .gitignore. Deferred to follow-up tasks: - mt#1085 (Sprint B prereq): switch tier lookup from PR-body marker to MCP provenance record. Eliminates the marker-forgetting failure mode the reviewer flagged (non-blocking #5). - mt#1086: explicit network-call timeouts on model and GitHub API paths. Non-blocking #8 from the review. Validation: 18/18 service tests pass, typecheck clean, lint clean at 30-warning threshold.
There was a problem hiding this comment.
Delta review: commit c83c039
Addresses the Chinese-wall reviewer's 2 blocking findings + 5 of 7 non-blockers. Two non-blockers deferred to follow-up tasks.
Blocking (now fixed)
- Dockerfile lockfile glob (
services/reviewer/Dockerfile:6) —bun.lockb*→bun.lockexplicitly. Verified:ls services/reviewer/bun.lockexists in the repo and will be copied by the new COPY. - HTTP 200 on error (
services/reviewer/src/server.ts:47-53) — removed the try/catch inhandlePullRequestEvent; errors now propagate toverifyAndReceive→ 500 → GitHub retry. Comment explains the duplicate-review risk trade-off and points at Sprint B's idempotency work.
Non-blocking (fixed in this commit)
server.ts:148-156—degraded_config_warninglog at startup whenREVIEWER_PROVIDER=anthropic.README.md— removed false provenance-lookup claim; clarified Sprint A uses PR-body marker only, Sprint B uses MCP; removed stalescripts/create-github-app.tsreference (mt#997 tracks).review-worker.ts:51— identity comparison now.toLowerCase()-normalized on both sides.scripts/lint-console-usage.ts:42-47— exemption narrowed to**/services/reviewer/**specifically, with a comment noting the explicit-opt-in policy for future services..gitignore— stalebun.lockbentry removed.
Non-blocking (filed as follow-up)
- Null-tier silent skip (reviewer finding #5) → mt#1085. Switching tier source from PR-body marker to MCP provenance record eliminates the marker-forgetting failure mode. Sprint B prereq.
- Missing network-call timeouts (reviewer finding #8) → mt#1086. Explicit
AbortControllertimeouts on model and GitHub API calls. Deferred because webhook-platform timeouts provide partial mitigation.
Spec verification (unchanged from prior review)
All in-scope mt#1083 criteria still met after fixes; no regressions. User-action items (create App, install Railway CLI, deploy) still pending — not affected by these fixes.
Documentation impact
Updated services/reviewer/README.md for Sprint A accuracy. No changes to docs/architecture.md etc.
(Had Claude look into this — AI-assisted follow-up to the independent Chinese-wall review.)
The Chinese-wall reviewer on PR #695 flagged one blocking and two non-blocking issues; this commit addresses the blocker and one of the non-blockers. docs/github-app-bot-setup.md section 4 previously hardcoded minsky-app.pem paths in both the Option A (config file) and Option B (env-var) examples. The new parametrized script writes to name-prefixed files (minsky-ai.pem, minsky-reviewer.pem, etc.). Users who followed the automated path and then pasted section 4 literally would point Minsky at a non-existent file. Updated both examples to use a placeholder that matches the script --name flag, and added an explicit paragraph explaining that automated-path users can read the values from the generated JSON metadata file. scripts/create-github-app.ts browser-open was macOS-only via the command. Added cross-platform dispatch (open on darwin, xdg-open on linux, start on windows) and always print the URL before spawning so users can click manually if the spawn silently fails. Not addressed: server-hang-if-user-aborts-midflow is acceptable for v1. Partial state in ~/.config/minsky/ is harmless (App exists on GitHub even if install not completed; user can re-run the script). This is the fourth consecutive PR where Chinese-wall review caught a real issue the self-review would have missed: PR #679 (1 non-blocking), PR #683 (1 blocking + 2 non-blocking), PR #692 (2 blocking + 7 non-blocking), PR #695 (1 blocking + 2 non-blocking).
Summary
First shippable slice of the Chinese-wall reviewer App architecture (parent: mt#1073). Adds a standalone TypeScript webhook service in
services/reviewer/that posts adversarial PR reviews under a separate GitHub App identity, using a different model provider than the implementer.Closes Sprint A from the Structural Review position paper roadmap. Sprints B and C add role specialization, ensemble voting, and structural metrics.
Architecture
Levers engaged (five of nine)
pull-requests: write,contents: read. No code-write permissions. Structural.minsky-reviewerApp distinct fromminsky-ai; legitimately non-author for APPROVE eventsREVIEWER_PROVIDER=openai(GPT-5); Gemini 2.5 Pro alternative; Anthropic available as fallback but weakly recommended (same family as implementer)Deferred
Files
services/reviewer/README.md,DEPLOY.md— setup and deployment documentationservices/reviewer/package.json,tsconfig.json,Dockerfile— build config, Railway-compatibleservices/reviewer/src/config.ts— env-var loading with fail-fast validationservices/reviewer/src/prompt.ts— Critic Constitution + per-PR user prompt builderservices/reviewer/src/providers.ts— OpenAI / Google / Anthropic wrappers behind uniform interfaceservices/reviewer/src/github-client.ts— App auth via@octokit/auth-app+ PR fetch + review submitservices/reviewer/src/tier-routing.ts— tier-based activation decisionservices/reviewer/src/review-worker.ts— orchestrator, self-review detection, COMMENT-fallback for identity collisionservices/reviewer/src/server.ts— Bun webhook server with signature verification viaverifyAndReceiveservices/reviewer/src/*.test.ts— 18 tests covering tier routing and event parsingscripts/lint-console-usage.ts— extendedexcludePatternsto coverservices/(standalone deployables use their own structured logging)Test / validation
bun test services/reviewer/src/— 18/18 passbun test --preload ./tests/setup.ts --timeout=15000 src tests/adapters tests/domain— 2040/2040 pass (no regressions)mcp__minsky__validate_typecheck— cleanmcp__minsky__validate_lint— clean at 30-warning threshold (1 added is fromnode_modulesbeing scanned in the newservices/reviewer/node_modules)User action required before deploy
minsky-reviewerGitHub App (distinct fromminsky-ai) with permissions:pull-requests: write(submit reviews)contents: read(fetch diff and codebase)bash <(curl -fsSL cli.new)orbrew install railwayservices/reviewer/DEPLOY.mdfor the deploy sequence (variables, domain, webhook registration)What this PR itself exercises
This PR has
<!-- minsky:tier=2 -->in its body, which means after mt#1073 and this service are deployed, future Tier-2 PRs with theMINSKY_REVIEWER_TIER2_ENABLED=trueflag set will get reviewed automatically byminsky-reviewer[bot]. This PR is the bootstrap, so it still gets the manual Chinese-wall review via thereviewersubagent type before merge.Related
session_pr_review_submit— parallel work on the local path)