feat(http): record/replay for net.fetch (closes #7)#15
Merged
Sprint 0.5-S7, pulled forward from 0.5.0. Fifth FleetQ ask shipped in a row (after #3, #6, #5, #8).

Boruna scripts are deterministic by design; external HTTP is not. This bridges the gap so agent CI loops become genuinely reproducible.

## Surface

```bash
# Record once against the real upstream:
boruna run app.ax --policy allow-all --live \
  --record-net-to fixtures/run-001.tape.json

# Replay forever, no network access:
boruna run app.ax --policy allow-all \
  --replay-net-from fixtures/run-001.tape.json
```

## What's in this PR

- New module `crates/llmvm/src/net_record_replay.rs` (feature `http`):
  - `NetTransaction { method, url, request_body, response_body }`
  - `NetTape { format_version: 1, transactions: [...] }` with save/load and a version compatibility check
  - `RecordingHttpHandler` wraps `HttpHandler` and records on each call; the `with_save_path` constructor enables save-on-drop
  - `ReplayingHttpHandler` serves from a loaded tape: strict ordered match on (method, url, request_body); typed errors for mismatch and exhaustion; under-consumption is silently OK
- New CLI flags `--record-net-to <FILE>` and `--replay-net-from <FILE>` on `boruna run`. They are mutually exclusive (clap `conflicts_with`). Record requires `--live`; replay overrides `--live`.
- New shared parser `parse_net_fetch_args` in `http_handler.rs`, used by BOTH the real handler and the recording layer. This eliminates the silent-drift risk the reviewer flagged on the duplicated parser.
- CLI write-probe: writes an empty tape to the target path BEFORE the run starts, so disk errors surface in the process exit code instead of leaving a stale fixture behind in a pipeline like `record && verify`.

## Match strategy (locked, in design doc)

- Strict ordered, keyed on (method, url, request_body)
- Headers EXCLUDED from match key (auth tokens change between sessions)
- Mismatch returns typed error: `position N: method differs / url differs / request_body differs`
- Exhaustion: `tape exhausted (N transactions consumed, ... at position N)`
- Under-consumption: silently OK (trailing tape entries unused)

## Tape format

```json
{ "format_version": 1, "transactions": [ ... ] }
```

Format version is bumped on breaking shape changes; additive ones keep the version. Loading a tape with an unsupported version returns a typed error.

## Tests

- 18 new tests in `boruna-vm` (tape round-trip, in-order match, all three mismatch flavors, exhaustion, under-consumption, default-method-GET, case normalization, mock pass-through, save-on-drop, drop-without-path, parser-agreement-with-http_handler, back-to-back-identical-calls, bad-format-version, save-then-load round trip, RecordingHttpHandler empty/len helpers).
- All 124+ existing VM tests pass under `--features http`.
- All 591+ existing workspace tests pass.
- `cargo clippy --workspace -- -D warnings` clean (with and without http).
- `cargo fmt --all -- --check` clean.

## Review

`ce-correctness-reviewer` surfaced 1 HIGH + 3 MEDIUM findings. All addressed before commit:

1. (HIGH) Save-on-drop swallows tape errors → CI fixture corruption. Fixed by CLI write-probe at startup; Drop becomes the safety net.
2. (MED) Tape lost on panic mid-recording. Documented as v1 limitation; streaming append-only tape is the future fix.
3. (MED) Missing test for back-to-back identical calls. Added.
4. (MED) `describe_net_fetch_request` duplication may drift from `HttpHandler::handle_net_fetch`. Extracted shared parser `parse_net_fetch_args` in `http_handler.rs`; both call sites use it. Added regression test asserting parser agreement.

## Documented limitations (per review)

- Request headers NOT in match key (auth tokens change between sessions)
- Response status/headers NOT recorded (handler returns body only today)
- Failed transactions NOT taped in v1 (re-recording is user's job)
- Non-UTF-8 response bodies inherit `HttpHandler`'s "not valid UTF-8" error, so recording cannot capture binary payloads
- Tape file size unbounded (a 100k-call agent → multi-GB JSON, ~2× pretty-print multiplier)
- Panic during record may lose tape if Drop doesn't run (esp. under `panic = "abort"`)

## Closes

- Closes #7 (FleetQ P2: record/replay for net.fetch)

## FleetQ status after this PR

5 of 9 P1/P2 asks closed (#3, #5, #6, #7, #8). Only #9 (per-call OpenTelemetry observability) remains as the last small-sprint pick before the big-sprint pivot to 0.3-S2 (persistent state).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the 4 review findings (1 HIGH + 3 MEDIUM) and how each was addressed. Notable: the Drop-based-save-with-eprintln correctness gap (silent stale fixtures in CI `&&` chains) and the paired-parser extraction (eliminating a silent-drift risk that comments couldn't enforce).

Establishes two new project conventions:

- For Drop-based side effects, pair with a pre-flight probe at the CLI integration point. Drop is ergonomic; pre-flight is exit-code-honest.
- For paired parsers (real path + instrumentation mirror path), extract a shared function. Comments saying "keep in lock-step" aren't enforceable; a single function IS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
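The first convention (Drop-based save paired with a pre-flight probe) could look roughly like the sketch below. This is a simplified, std-only illustration and not the PR's code: `Recorder`, `probe_writable`, and `run` are hypothetical names, and the tape is written as plain lines instead of the real JSON format.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical recorder: buffers entries and saves them when dropped.
struct Recorder {
    path: PathBuf,
    entries: Vec<String>,
}

impl Recorder {
    fn record(&mut self, entry: &str) {
        self.entries.push(entry.to_string());
    }
}

impl Drop for Recorder {
    fn drop(&mut self) {
        // Drop cannot return an error, so a failed save can only be logged.
        // On its own this would leave a stale fixture behind without
        // affecting the exit code.
        if let Err(e) = fs::write(&self.path, self.entries.join("\n")) {
            eprintln!("warning: failed to save tape: {e}");
        }
    }
}

/// Pre-flight probe: write an empty file to the target path before the
/// run starts, so an unwritable path is reported as a normal error.
fn probe_writable(path: &Path) -> io::Result<()> {
    fs::write(path, "")
}

/// Probe first, then record; the Drop impl saves the tape on the way out.
fn run(path: &Path) -> io::Result<()> {
    probe_writable(path)?;
    let mut recorder = Recorder {
        path: path.to_path_buf(),
        entries: Vec::new(),
    };
    recorder.record("GET https://api.example.com/users/42");
    Ok(())
}

fn main() {
    let path = std::env::temp_dir().join("probe-sketch.tape.txt");
    if let Err(e) = run(&path) {
        eprintln!("error: cannot write tape: {e}");
        std::process::exit(1);
    }
}
```

With the probe in place, a bad path makes the process exit non-zero before any recording happens, so a chain like `record && verify` stops at the first step.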
Sprint 0.5-S7 — closes #7 (FleetQ P2, fifth in a row)
Boruna scripts are deterministic by design; external HTTP is not. This bridges the gap so agent CI loops become genuinely reproducible. Distinctive selling point per FleetQ: "one of the few runtimes where record/replay would be ergonomic." Pulled forward from 0.5.0 (same pattern as #6, #8).
Surface
```bash
# Record once against the real upstream:
boruna run app.ax --policy allow-all --live \
--record-net-to fixtures/run-001.tape.json
# Replay forever, no network access:
boruna run app.ax --policy allow-all \
--replay-net-from fixtures/run-001.tape.json
```
Tape file:
```json
{
"format_version": 1,
"transactions": [
{ "method": "GET", "url": "https://api.example.com/users/42",
"request_body": null, "response_body": "{\"id\":42}" },
{ "method": "POST", "url": "https://api.example.com/events",
"request_body": "{\"event\":\"click\"}",
"response_body": "{\"ok\":true}" }
]
}
```
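The commit message states that loading a tape with an unsupported `format_version` returns a typed error, and that only breaking shape changes bump the version. A minimal sketch of such a check, assuming an exact-match rule, is shown here; `TapeLoadError`, `check_format_version`, and `SUPPORTED_FORMAT_VERSION` are hypothetical names, and JSON parsing is left out to keep the example std-only.

```rust
/// The only tape shape this sketch understands (matches `"format_version": 1`).
const SUPPORTED_FORMAT_VERSION: u32 = 1;

/// Hypothetical typed error for a tape that cannot be loaded.
#[derive(Debug, PartialEq)]
enum TapeLoadError {
    UnsupportedVersion { found: u32, supported: u32 },
}

/// Accept a tape only if its version matches exactly. Additive changes keep
/// the version, so any different number signals a breaking shape change.
fn check_format_version(found: u32) -> Result<(), TapeLoadError> {
    if found == SUPPORTED_FORMAT_VERSION {
        Ok(())
    } else {
        Err(TapeLoadError::UnsupportedVersion {
            found,
            supported: SUPPORTED_FORMAT_VERSION,
        })
    }
}

fn main() {
    // The tape above declares version 1, so it passes.
    println!("{:?}", check_format_version(1));
    // A tape written by a future, incompatible format is rejected.
    println!("{:?}", check_format_version(2));
}
```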
Match strategy (locked, in design doc)
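The strategy summarized in the commit message (strict ordered match keyed on method, URL, and request body, with headers excluded, typed errors for mismatch and exhaustion, and unused trailing entries tolerated) can be sketched roughly as follows. Only the `NetTransaction` field names come from the PR; `Replayer`, `ReplayError`, and `replay` are hypothetical, and the sketch compares methods exactly instead of modeling the default-GET and case-normalization behavior the tests mention.

```rust
/// Field names follow the PR's `NetTransaction`; the rest is illustrative.
#[derive(Debug, Clone, PartialEq)]
struct NetTransaction {
    method: String,
    url: String,
    request_body: Option<String>,
    response_body: String,
}

/// Hypothetical typed errors for the two failure modes.
#[derive(Debug, PartialEq)]
enum ReplayError {
    /// The call at `position` differs from the tape in `field`.
    Mismatch { position: usize, field: &'static str },
    /// More calls were made than the tape holds.
    Exhausted { consumed: usize },
}

struct Replayer {
    tape: Vec<NetTransaction>,
    cursor: usize,
}

impl Replayer {
    fn new(tape: Vec<NetTransaction>) -> Self {
        Replayer { tape, cursor: 0 }
    }

    /// Serve the next taped response if the call matches the entry at the
    /// cursor. Headers are not part of the key, so they are not passed in.
    fn replay(&mut self, method: &str, url: &str, body: Option<&str>) -> Result<String, ReplayError> {
        let position = self.cursor;
        let tx = self
            .tape
            .get(position)
            .ok_or(ReplayError::Exhausted { consumed: position })?;
        // Report the first differing field, in key order.
        let differing = if tx.method != method {
            Some("method")
        } else if tx.url != url {
            Some("url")
        } else if tx.request_body.as_deref() != body {
            Some("request_body")
        } else {
            None
        };
        if let Some(field) = differing {
            return Err(ReplayError::Mismatch { position, field });
        }
        let response = tx.response_body.clone();
        self.cursor += 1;
        Ok(response)
    }
}

fn main() {
    let tape = vec![NetTransaction {
        method: "GET".into(),
        url: "https://api.example.com/users/42".into(),
        request_body: None,
        response_body: "{\"id\":42}".into(),
    }];
    let mut replayer = Replayer::new(tape);
    println!("{:?}", replayer.replay("GET", "https://api.example.com/users/42", None));
    println!("{:?}", replayer.replay("GET", "https://api.example.com/users/42", None));
}
```

Because the cursor only advances on a successful match, a script that stops early simply leaves trailing entries unread, which corresponds to the "under-consumption is silently OK" rule.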
Tests
Review
`ce-correctness-reviewer` surfaced 1 HIGH + 3 MEDIUM findings. All addressed before commit:
Documented limitations (per review)
What's NOT in this PR (follow-ups)
Closes
FleetQ status after this PR
5 of 9 P1/P2 asks closed (#3, #5, #6, #7, #8). Only #9 (per-call OpenTelemetry observability) remains as the last small-sprint pick before the big-sprint pivot to `0.3-S2` (persistent state).
🤖 Generated with Claude Code