
feat(http): record/replay for net.fetch (closes #7) #15

Merged
escapeboy merged 2 commits into master from feat/0.5-s7-net-record-replay on Apr 25, 2026

Conversation

@escapeboy
Owner

Sprint 0.5-S7 — closes #7 (FleetQ P2, fifth in a row)

Boruna scripts are deterministic by design; external HTTP is not. This bridges the gap so agent CI loops become genuinely reproducible. Distinctive selling point per FleetQ: "one of the few runtimes where record/replay would be ergonomic." Pulled forward from 0.5.0 (same pattern as #6, #8).

Surface

```bash
# Record once against the real upstream:
boruna run app.ax --policy allow-all --live \
  --record-net-to fixtures/run-001.tape.json

# Replay forever, no network access:
boruna run app.ax --policy allow-all \
  --replay-net-from fixtures/run-001.tape.json
```

Tape file:
```json
{
  "format_version": 1,
  "transactions": [
    { "method": "GET", "url": "https://api.example.com/users/42",
      "request_body": null, "response_body": "{\"id\":42}" },
    { "method": "POST", "url": "https://api.example.com/events",
      "request_body": "{\"event\":\"click\"}",
      "response_body": "{\"ok\":true}" }
  ]
}
```

Match strategy (locked, in design doc)

  • Strict ordered, key on `(method, url, request_body)`
  • Headers EXCLUDED from match key — auth tokens change between sessions
  • Mismatch → typed error: `position N: method differs / url differs / request_body differs`
  • Exhaustion → typed error: `tape exhausted (N consumed, asked for more at position N)`
  • Under-consumption → silently OK (trailing tape entries unused)
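The match strategy above can be sketched in plain Rust. This is a minimal illustration of the strict ordered (method, url, request_body) key with typed mismatch/exhaustion errors; the type names mirror the PR's vocabulary but the exact signatures are assumptions, not the real `ReplayingHttpHandler` API:

```rust
// Sketch of strict ordered replay matching. Headers are deliberately
// absent from the match key, per the locked design.
#[derive(Debug, Clone, PartialEq)]
struct NetTransaction {
    method: String,
    url: String,
    request_body: Option<String>,
    response_body: String,
}

#[derive(Debug, PartialEq)]
enum ReplayError {
    // Which field differed, and at which tape position.
    Mismatch { position: usize, field: &'static str },
    // The script asked for more transactions than the tape holds.
    Exhausted { consumed: usize },
}

struct ReplayingTape {
    transactions: Vec<NetTransaction>,
    cursor: usize,
}

impl ReplayingTape {
    fn next_response(
        &mut self,
        method: &str,
        url: &str,
        request_body: Option<&str>,
    ) -> Result<String, ReplayError> {
        let tx = self
            .transactions
            .get(self.cursor)
            .ok_or(ReplayError::Exhausted { consumed: self.cursor })?;
        if tx.method != method {
            return Err(ReplayError::Mismatch { position: self.cursor, field: "method" });
        }
        if tx.url != url {
            return Err(ReplayError::Mismatch { position: self.cursor, field: "url" });
        }
        if tx.request_body.as_deref() != request_body {
            return Err(ReplayError::Mismatch { position: self.cursor, field: "request_body" });
        }
        self.cursor += 1;
        Ok(tx.response_body.clone())
    }
}
```

Under-consumption falls out for free: the run simply ends with `cursor` short of `transactions.len()`, and no error is raised.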

Tests

  • 18 new tests in `boruna-vm` (round-trip, in-order match, all three mismatch flavors, exhaustion, under-consumption, default-method-GET, case normalization, mock pass-through, save-on-drop, drop-without-path, parser-agreement, back-to-back-identical-calls for polling scripts, format-version compatibility, save-then-load round trip)
  • All 124+ existing VM tests pass under `--features http`
  • All 591+ existing workspace tests pass
  • `cargo clippy --workspace -- -D warnings` clean (with and without http)
  • `cargo fmt --all -- --check` clean

Review

`ce-correctness-reviewer` surfaced 1 HIGH + 3 MEDIUM findings. All addressed before commit:

| # | Finding | Fix |
|---|---------|-----|
| 1 (HIGH) | Save-on-drop swallows tape errors → CI fixture corruption | CLI write-probe at startup; Drop is the safety net only |
| 2 (MED) | Tape lost on panic mid-recording | Documented as v1 limitation; streaming tape is the future fix |
| 3 (MED) | Missing test for back-to-back identical calls (polling) | Added |
| 4 (MED) | `describe_net_fetch_request` could drift from `HttpHandler` parser | Extracted shared `parse_net_fetch_args` in `http_handler.rs`; both call it. Regression test asserts agreement. |
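The fix for finding 4 is the shared-parser pattern: one parse function, two callers, so the real path and the instrumentation mirror cannot drift. The sketch below illustrates the shape only; the argument layout (positional url, optional method defaulting to GET with case normalization, optional body) is an assumption for illustration, not the real `parse_net_fetch_args` signature:

```rust
// Hypothetical fetch-request shape; the real one lives in http_handler.rs.
#[derive(Debug, PartialEq)]
struct FetchRequest {
    method: String,
    url: String,
    body: Option<String>,
}

// The single shared parser. Default method is GET; method is
// case-normalized so "get" and "GET" produce the same match key.
fn parse_net_fetch_args(args: &[&str]) -> Result<FetchRequest, String> {
    let url = args.first().ok_or("net.fetch: missing url")?.to_string();
    let method = args
        .get(1)
        .map_or("GET".to_string(), |m| m.to_ascii_uppercase());
    let body = args.get(2).map(|b| b.to_string());
    Ok(FetchRequest { method, url, body })
}

// The recording layer's describe helper calls the SAME parser as the real
// handler would, so a comment saying "keep in lock-step" is unnecessary.
fn describe_net_fetch_request(args: &[&str]) -> Result<String, String> {
    let req = parse_net_fetch_args(args)?;
    Ok(format!("{} {}", req.method, req.url))
}
```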

Documented limitations (per review)

  • Request headers NOT in match key (auth tokens change between sessions)
  • Response status/headers NOT recorded (handler returns body only today)
  • Failed transactions NOT taped in v1
  • Non-UTF-8 response bodies inherit `HttpHandler`'s "not valid UTF-8" error
  • Tape file size unbounded (~2× pretty-print multiplier)
  • Panic during record may lose the tape (esp. `panic = "abort"`)

What's NOT in this PR (follow-ups)

  • `boruna workflow run --record-net-to / --replay-net-from` — same machinery; defer to follow-up sprint when there's a workflow-level ask
  • `boruna_run` MCP parameter form — defer until asked
  • Recording response status + headers — requires `HttpHandler::handle_net_fetch` return-type change; defer
  • Recording failed transactions — requires teaching the tape format about errors; defer
  • Header-aware matching — opt-in `match_headers: [...]` config; defer
  • `db.query` / `llm.call` record/replay — each is its own sprint; this PR establishes the pattern

Closes

  • Closes #7 (FleetQ P2: record/replay for net.fetch)
FleetQ status after this PR

5 of 9 P1/P2 asks closed (#3, #5, #6, #7, #8). Only #9 (per-call OpenTelemetry observability) remains as the last small-sprint pick before the big-sprint pivot to `0.3-S2` (persistent state).

🤖 Generated with Claude Code

escapeboy and others added 2 commits April 25, 2026 19:03
Sprint 0.5-S7, pulled forward from 0.5.0. Fifth FleetQ ask shipped in a
row (after #3, #6, #5, #8). Boruna scripts are deterministic by design;
external HTTP is not. This bridges the gap so agent CI loops become
genuinely reproducible.

## Surface

```bash
# Record once against the real upstream:
boruna run app.ax --policy allow-all --live \
  --record-net-to fixtures/run-001.tape.json

# Replay forever, no network access:
boruna run app.ax --policy allow-all \
  --replay-net-from fixtures/run-001.tape.json
```

## What's in this PR

- New module `crates/llmvm/src/net_record_replay.rs` (feature `http`):
  - `NetTransaction { method, url, request_body, response_body }`
  - `NetTape { format_version: 1, transactions: [...] }` with
    save/load and version compatibility check
  - `RecordingHttpHandler` wraps `HttpHandler`; records on each call;
    `with_save_path` constructor enables save-on-drop
  - `ReplayingHttpHandler` serves from a loaded tape; strict ordered
    match on (method, url, request_body); typed errors for mismatch /
    exhaustion; under-consumption is silently OK
- New CLI flags `--record-net-to <FILE>` and `--replay-net-from <FILE>`
  on `boruna run`. Mutually exclusive (clap `conflicts_with`).
  Record requires `--live`; replay overrides `--live`.
- New shared parser `parse_net_fetch_args` in `http_handler.rs` used
  by BOTH the real handler and the recording layer — eliminates the
  silent-drift risk the reviewer flagged on the duplicated parser.
- CLI write-probe: writes an empty tape to the target path BEFORE the
  run starts, so disk errors surface in process exit code instead of
  a stale fixture from a pipeline like `record && verify`.
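The write-probe described above is a small amount of code with outsized value: any disk error surfaces as a nonzero exit before the run starts. A minimal std-only sketch (the empty-tape bytes and function name are assumptions mirroring the documented tape format, not the real CLI code):

```rust
use std::fs;
use std::io;
use std::path::Path;

// An empty but valid tape, matching the documented format_version-1 shape.
const EMPTY_TAPE: &str = "{\n  \"format_version\": 1,\n  \"transactions\": []\n}\n";

// Pre-flight probe: attempt the write BEFORE the run starts. An unwritable
// directory, bad permissions, or a full disk fails here, in the exit code,
// instead of leaving a stale fixture for a `record && verify` pipeline.
fn probe_record_path(path: &Path) -> io::Result<()> {
    fs::write(path, EMPTY_TAPE)
}
```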

## Match strategy (locked, in design doc)

- Strict ordered, key on (method, url, request_body)
- Headers EXCLUDED from match key (auth tokens change between sessions)
- Mismatch returns typed error: `position N: method differs / url
  differs / request_body differs`
- Exhaustion: `tape exhausted (N transactions consumed, ... at position N)`
- Under-consumption: silently OK (trailing tape entries unused)

## Tape format

```json
{ "format_version": 1, "transactions": [ ... ] }
```

Format version is bumped on breaking shape changes; additive ones keep
the version. Loading a tape with an unsupported version returns a typed
error.
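The version-compatibility rule above amounts to a single guard at load time. A sketch, assuming a `u64` version field and a hypothetical error type (the real loader's names may differ):

```rust
// Only version-1 tapes load; anything newer (a breaking shape change)
// returns a typed error rather than misinterpreting the file.
const SUPPORTED_FORMAT_VERSION: u64 = 1;

#[derive(Debug, PartialEq)]
enum TapeLoadError {
    UnsupportedVersion { found: u64, supported: u64 },
}

fn check_format_version(found: u64) -> Result<(), TapeLoadError> {
    if found != SUPPORTED_FORMAT_VERSION {
        return Err(TapeLoadError::UnsupportedVersion {
            found,
            supported: SUPPORTED_FORMAT_VERSION,
        });
    }
    Ok(())
}
```

Additive fields keep the version, so this check stays stable until the transaction shape itself breaks.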

## Tests

- 18 new tests in `boruna-vm` (tape round-trip, in-order match, all
  three mismatch flavors, exhaustion, under-consumption, default-method-
  GET, case normalization, mock pass-through, save-on-drop, drop-without-
  path, parser-agreement-with-http_handler, back-to-back-identical-calls,
  bad-format-version, save-then-load round trip, RecordingHttpHandler
  empty/len helpers).
- All 124+ existing VM tests pass under `--features http`.
- All 591+ existing workspace tests pass.
- `cargo clippy --workspace -- -D warnings` clean (with and without http).
- `cargo fmt --all -- --check` clean.

## Review

`ce-correctness-reviewer` surfaced 1 HIGH + 3 MEDIUM findings. All
addressed before commit:

1. (HIGH) Save-on-drop swallows tape errors → CI fixture corruption.
   Fixed by CLI write-probe at startup; Drop becomes the safety net.
2. (MED) Tape lost on panic mid-recording. Documented as v1 limitation;
   streaming append-only tape is the future fix.
3. (MED) Missing test for back-to-back identical calls. Added.
4. (MED) `describe_net_fetch_request` duplication may drift from
   `HttpHandler::handle_net_fetch`. Extracted shared parser
   `parse_net_fetch_args` in http_handler.rs; both call sites use it.
   Added regression test asserting parser agreement.

## Documented limitations (per review)

- Request headers NOT in match key (auth tokens change between sessions)
- Response status/headers NOT recorded (handler returns body only today)
- Failed transactions NOT taped in v1 (re-recording is user's job)
- Non-UTF-8 response bodies inherit `HttpHandler`'s "not valid UTF-8"
  error — recording cannot capture binary payloads
- Tape file size unbounded (a 100k-call agent → multi-GB JSON, ~2×
  pretty-print multiplier)
- Panic during record may lose tape if Drop doesn't run (esp. under
  `panic = "abort"`)

## Closes

- Closes #7 (FleetQ P2: record/replay for net.fetch)

## FleetQ status after this PR

5 of 9 P1/P2 asks closed (#3, #5, #6, #7, #8). Only #9 (per-call
OpenTelemetry observability) remains as the last small-sprint pick
before the big-sprint pivot to 0.3-S2 (persistent state).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the 4 review findings (1 HIGH + 3 MEDIUM) and how each was
addressed. Notable: the Drop-based-save-with-eprintln correctness gap
(silent stale fixtures in CI && chains) and the paired-parser
extraction (eliminating a silent-drift risk that comments couldn't
enforce).

Establishes two new project conventions:
- For Drop-based side effects, pair with a pre-flight probe at the
  CLI integration point. Drop is ergonomic; pre-flight is exit-code-honest.
- For paired parsers (real path + instrumentation mirror path),
  extract a shared function. Comments saying "keep in lock-step"
  aren't enforceable; a single function IS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@escapeboy escapeboy merged commit e08b9cf into master Apr 25, 2026
2 of 3 checks passed
@escapeboy escapeboy deleted the feat/0.5-s7-net-record-replay branch April 25, 2026 17:25


Development

Successfully merging this pull request may close these issues.

[P2] Record/replay for net.fetch