feat(mcp): output JSON Schema validation gate in boruna_run (closes #8)#14
Merged
feat(mcp): output JSON Schema validation gate in boruna_run (closes #8)#14
Conversation
Sprint 0.5-S6, pulled forward from 0.5.0. Fourth FleetQ ask shipped in a row (after #3, #6, #5). Lets integrators drop their host-language JSON Schema validation layer — Boruna validates the agent's return value before yielding it, with structured per-path errors. ## What changed - New optional `output_schema: object` parameter on `boruna_run`. Accepts any JSON Schema 2020-12 object. Post-execution gate. - New `error_kind` values: - `validation_failed` (phase: "output_validation") with per-path JSON Pointer errors, truncated and total_errors fields - `invalid_output_schema` for malformed or oversized schemas - `jsonschema = "0.30"` dep in boruna-mcp (default features off — no resolve-http/resolve-file, so $ref can't trigger SSRF/file reads) ## Hard limits (locked, all addressed in review) - 256 KB max schema size (compact JSON bytes). Larger → invalid_output_schema before any compilation. Mirrors the spirit of the 1 MB source cap. - 100 errors max in response (matches TRACE_LIMIT pattern). Pathological schemas can produce thousands; cap protects MCP transport. truncated: true + total_errors: N fields when exceeded. - Draft 2020-12 enforced. Schemas declaring non-2020-12 $schema are REJECTED at parse time, not silently honoured at older-draft semantics (which would change `id` vs `$id`, `exclusiveMinimum` boolean vs number, etc.). Same "reject at parse, don't silently override" pattern as 0.3-S10 max_memory_mb. ## Known limitation (documented loudly) `format_value` emits Boruna records/enums/Some/Ok as wrapper JSON: - Record → {"type":"record","type_id":N,"fields":[positional values]} - Enum → {"type":"enum","type_id":N,"variant":<n>,"payload":...} - Some/Ok/Err → {"option":"Some","value":...} etc. A schema written for the natural object shape (e.g. {"type":"object", "properties":{"name":...}}) will FAIL validation against a record return. The gate is most useful for primitive return types (Int, String, Bool) and homogeneous List/Map containers. Record/enum projection lands in a future sprint. Documented in: - docs/design-output-schema.md - validate_output_against_schema doc comment - mcp-server.md (lands when PR #11 merges) ## Tests - 9 new MCP tests: schema-not-set passthrough, schema-passing, schema- failing-with-path-errors, schema-failing-with-multiple-errors, invalid-schema-rejected, runtime-error-takes-precedence, empty-schema- accepts-anything, $schema-non-2020-12-rejected, $schema-2020-12-accepted, size-limit-rejection, total_errors-field-present. - All 591+ existing workspace tests pass. - clippy --workspace -- -D warnings clean. - cargo fmt --all -- --check clean. ## Review ce-correctness-reviewer surfaced 1 HIGH + 3 MEDIUM findings. All addressed before commit: 1. (HIGH) `validator_for` honored older drafts via $schema → REJECT non-2020-12 $schema at parse time (jsonschema crate honours $schema over with_draft, so silent override impossible). 2. (MED) `format_value` wrapper-format limitation for Record/Enum → documented loudly in design doc, doc comment, and CHANGELOG. 3. (MED) Unbounded errors array → cap at 100 with truncated/total_errors. 4. (MED) No size limit on output_schema → 256 KB cap before compilation. ## What's NOT in this PR (follow-ups) - The user-facing docs/reference/mcp-server.md update for the output_schema parameter and new error kinds. Lives on PR #11's branch which adds that file in the first place; this PR is branched off master. Folds in as a small follow-up after PR #11 merges. - Logical projection of Record/Enum to by-name JSON before validation. Future sprint when there's a real customer ask. - Schema compilation caching (compile-by-hash). Defer until measured. ## Closes - Closes #8 (FleetQ P2: output schema validation as a first-class gate) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the 4 review findings (1 HIGH + 3 MEDIUM) and how each was addressed before commit. Notable: the test-driven discovery that `with_draft` is only a fallback (not a lock) when `\$schema` is set — forced the right fix (reject non-2020-12 \$schema at parse time). Establishes "reject at parse, don't silently override" as a project convention, now applied to: max_memory_mb (0.3-S10), policy invalid strings (0.2.0), output_schema size, non-2020-12 \$schema (0.5-S6). Also locks the wrapper-format limitation in 4 docs (design doc, doc comment, CHANGELOG, PR description) so integrators have honest expectations until logical projection lands in a future sprint. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Sprint 0.5-S6 — closes #8 (FleetQ P2, fourth in a row)
Lets integrators drop their host-language JSON Schema validation layer — Boruna validates the agent's return value before yielding it, with structured per-path errors. Pulled forward from 0.5.0 because FleetQ wanted it in their pipeline; same logic that worked for #6 (lock the surface before 0.5 freezes it).
What's in this PR
MCP surface
Mismatch returns:
{ \"success\": false, \"error_kind\": \"validation_failed\", \"phase\": \"output_validation\", \"message\": \"result does not match output_schema\", \"errors\": [{ \"path\": \"/status\", \"message\": \"...\" }, ...], \"truncated\": false, \"total_errors\": 3, \"steps\": <vm.step_count() at successful completion> }Malformed/oversized schema:
{ \"success\": false, \"error_kind\": \"invalid_output_schema\", \"message\": \"...\" }Hard limits (locked)
Known limitation (documented loudly)
`format_value` emits Boruna records/enums/Some/Ok as wrapper JSON:
A schema written for the natural object shape (e.g. `{"type":"object","properties":{"name":...}}`) will FAIL validation against a record return. The gate is most useful for primitive return types (`Int`, `String`, `Bool`) and homogeneous `List`/`Map` containers. Record/enum projection lands in a future sprint. Documented in design doc, doc comment, CHANGELOG.
Tests
Review
`ce-correctness-reviewer` surfaced 1 HIGH + 3 MEDIUM findings. All addressed before commit:
Security
`jsonschema = "0.30"` added with `default-features = false` — no `resolve-http` or `resolve-file` features, so `$ref` to remote URLs or local files cannot trigger SSRF or arbitrary file reads.
What's NOT in this PR (follow-ups)
Closes
FleetQ status after this PR
4 of 9 P1/P2 asks closed (#3, #5, #6, #8). Only 2 P2 asks remain:
After those, only the 0.3.0 critical-path work (persistent state + dependent sprints) remains as the dominant roadmap focus.
🤖 Generated with Claude Code